Re: XML element and attribute statistics

Marcin Miłkowski Thu, 03 Apr 2014 01:41:25 -0700

On 2014-04-03 01:12, Andriy Rysin wrote:
> On 04/02/2014 04:44 PM, Daniel Naber wrote:
>> On 2014-04-02 19:29, Andriy Rysin wrote:
>>
>>> When I was splitting the grammar.xml file, I spent almost a day
>>> trying to use XML include features to include the component grammar files,
>>> but I must say I was not able to make it work properly in all scenarios:
>> I guess you tried this one?
>> http://wiki.languagetool.org/tips-and-tricks#toc2
>> If that doesn't work, there's no other approach I know of.
> Yes, that's what I tried. I could not make the URL work for both the
> filesystem and the JAR, and I even saw some differences in how the LT code
> and xmllint include files (a simple include that worked for xmllint didn't
> work in LT), so I abandoned that path.
>>
>>> If we can't do that can we consider loading all files together
>>> similarly to how it's done in production code?
>> Mhh, I can't see us doing anything special in production code. All files
>> are handled separately. Are you really 100% sure that these rules
>> actually worked? Or did they maybe work by chance, e.g. because the
>> <unify> wasn't actually needed for the examples you tried?
> Yes, I can confirm that one of the rules (rulegroup id "SAMYI") works
> correctly in 2.5 and takes unification into account.
>
> It looks like PatternRuleTest.validatePatternFile() checks the XML files
> one at a time (loading one, validating it, moving on to the next), while
> JLanguageTool.activateDefaultPatternRules() loads them all into memory,
> which (if I understand correctly) keeps the first grammar.xml (which
> contains the common parts) already loaded and parsed when loading and
> parsing the rest of them.
>
> I guess we have two ways to go from here: adjust the tests to load the files
> and keep them loaded (I am not sure how easy that is; it depends on how
> flexible our XMLValidator is), or change our getRuleFileNames() API to require
> those files to be independent (which may not be very efficient if every
> rule file has to load and parse the same common parts, like
> unification, etc.)
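A minimal sketch of the difference Andriy describes, using hypothetical classes rather than LanguageTool's real PatternRuleTest or JLanguageTool code. RuleFile, validateIndependently() and validateWithSharedContext() are invented for illustration: a per-file check cannot see unification features defined in the common grammar.xml, while loading everything into one shared context can.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not LanguageTool's real classes): a rule file may define
 * unification features and may use features defined in other files.
 */
public class RuleFileLoading {

    record RuleFile(String name, Set<String> definesFeatures, Set<String> usesFeatures) {}

    /** Per-file check: only features defined in the same file are visible. */
    static List<String> validateIndependently(List<RuleFile> files) {
        List<String> errors = new ArrayList<>();
        for (RuleFile file : files) {
            checkUses(file, file.definesFeatures(), errors);
        }
        return errors;
    }

    /** Shared-context check: definitions from earlier files stay visible to later ones. */
    static List<String> validateWithSharedContext(List<RuleFile> files) {
        Set<String> known = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (RuleFile file : files) {
            known.addAll(file.definesFeatures());
            checkUses(file, known, errors);
        }
        return errors;
    }

    // Records an error for every used feature that is not in the visible set.
    private static void checkUses(RuleFile file, Set<String> visible, List<String> errors) {
        for (String feature : file.usesFeatures()) {
            if (!visible.contains(feature)) {
                errors.add(file.name() + ": undefined unification feature '" + feature + "'");
            }
        }
    }

    public static void main(String[] args) {
        List<RuleFile> files = List.of(
                new RuleFile("grammar.xml", Set.of("case", "number"), Set.of()),
                new RuleFile("grammar-nouns.xml", Set.of(), Set.of("case")));
        System.out.println(validateIndependently(files));     // one error for grammar-nouns.xml
        System.out.println(validateWithSharedContext(files)); // no errors
    }
}
```

The shared-context variant only passes because grammar.xml is processed first; that ordering dependency between files is invisible to a test that checks one file at a time.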


Relying on rule files sharing already loaded common parts is error-prone:
imagine someone changes the unification definitions in one file by mistake;
you get inconsistencies, as the most recently loaded unification code would
override the earlier one. There is just one unification set for all grammar
rules. We could switch to separate unification sets, and that would be the
only safe option.

But I think that would be a bad solution, as we'd usually use the same
unification set, and we shouldn't be forced to copy the same code over
and over again.
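If we keep a single unification set, one possible guard (an assumption for illustration, not something this thread says LanguageTool does) is to reject conflicting re-definitions instead of letting whichever file is loaded last silently win. UnificationRegistry and its methods are hypothetical; the sketch contrasts the silent "last one wins" behaviour with a strict variant that accepts identical re-definitions and fails on conflicting ones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical registry for the single, global unification set shared by all
 * grammar files. Not LanguageTool's actual implementation.
 */
public class UnificationRegistry {

    // feature name -> allowed values
    private final Map<String, Set<String>> features = new HashMap<>();
    // feature name -> file that supplied the current definition
    private final Map<String, String> definedIn = new HashMap<>();

    /** Silent variant: whichever file is loaded last wins. */
    void defineOverriding(String file, String feature, Set<String> values) {
        features.put(feature, Set.copyOf(values));
        definedIn.put(feature, file);
    }

    /** Strict variant: an identical re-definition is accepted, a conflicting one fails loudly. */
    void defineStrict(String file, String feature, Set<String> values) {
        Set<String> existing = features.get(feature);
        if (existing == null) {
            features.put(feature, Set.copyOf(values));
            definedIn.put(feature, file);
        } else if (!existing.equals(values)) {
            throw new IllegalStateException("Unification feature '" + feature + "' in " + file
                    + " conflicts with the definition in " + definedIn.get(feature));
        }
    }

    Set<String> valuesOf(String feature) {
        return features.getOrDefault(feature, Set.of());
    }

    public static void main(String[] args) {
        UnificationRegistry silent = new UnificationRegistry();
        silent.defineOverriding("grammar.xml", "case", Set.of("nom", "gen", "dat"));
        silent.defineOverriding("grammar-extra.xml", "case", Set.of("nom"));
        System.out.println(silent.valuesOf("case")); // [nom]: the earlier values are gone

        UnificationRegistry strict = new UnificationRegistry();
        strict.defineStrict("grammar.xml", "case", Set.of("nom", "gen", "dat"));
        try {
            strict.defineStrict("grammar-extra.xml", "case", Set.of("nom"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // reports the conflict and both file names
        }
    }
}
```

The strict check does not remove the need for a single shared definition; it only turns an accidental edit in one file into a load-time error rather than a silent change in rule behaviour.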

Unfortunately, modularizing also means that we need consistent
integration of grammar files. I think the same mechanism should be used
for loading user grammar files, and user grammar files may well be
inconsistent. What do we do in such a case? Simply say "caveat emptor"?
Or should we have additional features that state which files can be
combined with which?
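One way the "which files can be combined" idea could look, assuming each grammar file declared the unification set it was written against. That declaration, GrammarFile and incompatibleFiles() are all invented for illustration and are not part of the current rule format. The loader then refuses to combine files whose declarations disagree.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical check: each grammar file declares which unification set it was
 * written against, and files are only combined if those declarations agree.
 */
public class GrammarCompatibility {

    record GrammarFile(String name, String unificationSetId) {}

    /** Lists every file whose declared unification set differs from the first file's. */
    static List<String> incompatibleFiles(List<GrammarFile> files) {
        List<String> problems = new ArrayList<>();
        if (files.isEmpty()) {
            return problems;
        }
        String expected = files.get(0).unificationSetId();
        for (GrammarFile file : files.subList(1, files.size())) {
            if (!expected.equals(file.unificationSetId())) {
                problems.add(file.name() + " expects unification set '" + file.unificationSetId()
                        + "' but '" + expected + "' is loaded");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<GrammarFile> files = List.of(
                new GrammarFile("grammar.xml", "core-1"),
                new GrammarFile("user-grammar.xml", "core-2"));
        incompatibleFiles(files).forEach(System.out::println);
    }
}
```

Treating the first file as the reference mirrors the current situation where grammar.xml carries the common parts; a user grammar file that declares a different set would be reported instead of being loaded on a "caveat emptor" basis.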

Regards,
Marcin

------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
