Hi all,

On 2013-01-02 20:47, Dominique Pellé wrote:
> Hi Jaume
>
> I wonder whether this is related to the error I had when I tried to
> generate the French POS tag dictionary with Morfologik-1.5.2.
> When I did that, the French disambiguation rule RP-D_J_AMBIG_N
> gave different results causing errors in "ant test". Somehow with
> Morfologik-1.4 it works fine.  What seemed to happen is that the
> POS tags are in different order when using Morfologik-1.5.2 and
> that affects the disambiguation rule RP-D_J_AMBIG_N which
> disambiguates differently. Only 2 rules use unification in French,
> so I would not mind if we could use simpler normal rules without
> disambiguator instead.  I did not write those unification rules in
> French by the way. Not sure who did.

Agnes Souque; she took the rules from An Gramadoir and converted them to our 
emerging XML encoding.

The problem is that these rules were not tested thoroughly. Some might 
not match any sentence at all. Also, unlike rules in the grammar.xml 
files, the rules in the disambiguator are order-sensitive: they cascade, 
so each rule works on the output of the rules before it. It seems that 
some of the disambiguation rules relied implicitly on the order of POS 
tags, because the cascade of rules depended on that order.
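
To make the ordering point concrete, here is a toy Python sketch. It is 
not LanguageTool code: the two function names and the tag strings are 
made up for illustration, and I am only assuming that some rule in the 
cascade effectively behaves like "keep the first matching reading".

```python
def keep_first_matching(readings, prefix):
    """Order-dependent: keeps only the first reading whose tag starts with prefix."""
    for tag in readings:
        if tag.startswith(prefix):
            return [tag]
    return list(readings)  # nothing matched: leave the token untouched


def keep_all_matching(readings, prefix):
    """Order-independent: keeps every matching reading, in a canonical order."""
    kept = sorted(tag for tag in readings if tag.startswith(prefix))
    return kept if kept else sorted(readings)


# The same set of (placeholder) tags, delivered in two different orders,
# as might happen when the dictionary is rebuilt with another version.
order_a = ["N m s", "N f s", "J m s"]
order_b = ["J m s", "N f s", "N m s"]

print(keep_first_matching(order_a, "N"))  # ['N m s']
print(keep_first_matching(order_b, "N"))  # ['N f s']
print(keep_all_matching(order_a, "N") == keep_all_matching(order_b, "N"))  # True
```

The first function gives a different answer for the same set of tags, 
which is the kind of implicit dependency I suspect; the second does not.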

This is hard to debug by inspection, but now that we have the debugging 
output, it should be quite easy to switch on the verbose output for the 
French text that fails with Morfologik 1.5.2 or later and passes with the 
earlier version, and compare the two side by side. I don't think we have 
any bug in morfologik-stemming here; the only change is that we optimize 
the order of entries in a different way, and the application should not 
rely on that order.

Could you make a diff of these two debugging outputs (ignoring the 
failing ant test)?
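
If it helps, a minimal Python sketch for such a comparison, using the 
standard difflib module. The function name, the labels and the sample 
lines are placeholders of mine; the sample lines do not reproduce the 
real verbose output format.

```python
import difflib


def diff_outputs(old_text, new_text,
                 old_label="morfologik-1.4", new_label="morfologik-1.5.2"):
    """Return a unified diff (list of lines) between two debugging outputs."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_label, tofile=new_label,
                                     lineterm=""))


# Placeholder content standing in for the two saved outputs.
old_output = "token-1 TAG_A\ntoken-2 TAG_B TAG_C\ntoken-3 TAG_D"
new_output = "token-1 TAG_A\ntoken-2 TAG_C TAG_B\ntoken-3 TAG_D"

for line in diff_outputs(old_output, new_output):
    print(line)
```

In practice the two strings would be read from the files where the 
verbose output of each run was saved.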

I strongly recommend adding examples to disambiguation rules to ensure 
regression tests pass - just like for grammar rules. Sometimes, quite 
surprisingly, a rule is never matched because the input is already 
disambiguated when it reaches the rule. Changing the order of rules 
sometimes helps.

Regards,
Marcin

> Now how do you suggest to change the unification rules in
> French to work with your changes?  You say it's simple so
> perhaps you already tried something locally without checking-in?
>
> Regards
> -- Dominique
>
>
> On Wed, Jan 2, 2013 at 7:25 PM, Jaume Ortolà i Font
> <jaumeort...@gmail.com <mailto:jaumeort...@gmail.com>> wrote:
>
>     Hi,
>
>     I have made changes in unification in order to solve the problem I
>     explained before. I hope it was understandable. See the attachment.
>
>     With these changes, two errors now appear in French sentences:
>
>     Je ne suis pas la seule. (in FrenchRuleDisambiguatorTest.java)
>     Les composants électroniques peuvent être nettoyés. (in grammar.xml)
>
>     What happens now is that the behaviour of action="unify" in
>     disambiguation rules has changed a little. Readings that don't
>     match the postags in the rule pattern are removed. What is the
>     desirable behavior of action="unify" in these cases? Anyway, I think
>     this question is not relevant at all.
>
>     What we were doing until now with action="unify" was very unclear.
>     We were removing (or not removing) readings that were not explicitly
>     written in the pattern. So nobody was aware of what was really
>     happening. Using action="filterall" with or without unification is
>     much clearer (preferably with unification or with other
>     restrictive conditions). The problems in the French tests can be
>     solved easily with a few changes in the disambiguation rules
>     (changes in the combinations of determiners, adjectives and nouns,
>     and the priority we give them). What do you think, Dominique?
>
>     Regards,
>     Jaume Ortolà
>
>
>
>
>     2013/1/2 Jaume Ortolà i Font <jaumeort...@gmail.com
>     <mailto:jaumeort...@gmail.com>>
>
>         I found a solution. I'm trying to change the code properly.
>
>         Regards,
>         Jaume
>
>
>
>         2013/1/2 Jaume Ortolà i Font <jaumeort...@gmail.com
>         <mailto:jaumeort...@gmail.com>>
>
>             Hi,
>
>             I have found a problem with unification. I'm using this pattern:
>
>               <rule id="DAAN_" name="det + adj + adj + nom">
>                   <pattern>
>                       <unify>
>                           <feature id="nombre"/>
>                           <feature id="genere"/>
>                           <marker>
>                               <token postag="D[^R].*" postag_regexp="yes"/>
>                               <token postag="A.*" postag_regexp="yes"/>
>                               <token postag="A.*" postag_regexp="yes"/>
>                               <token postag="N.*" postag_regexp="yes"/>
>                           </marker>
>                       </unify>
>                   </pattern>
>                   <disambig action="filterall" />
>               </rule>
>
>             The disambig action is not relevant here. The question is
>             whether the pattern matches a sentence or not.
>
>             The sentence I want to match in Catalan is: "La part
>             superior esquerra".
>
>             What's going on here? This sentence matches the pattern
>             Det-Adj-Adj-Noun, and at the same time the sentence is
>             unified in number and gender (feminine singular). However,
>             the pattern as a whole should not match, because the two
>             conditions are satisfied not by the same readings but by
>             different readings.
>
>             The second word of the sentence ("part") is adjective
>             masculine singular (which satisfies the D-A-A-N pattern),
>             and it is noun feminine singular (which satisfies the
>             unification condition). But "part" is not adjective feminine
>             singular which would satisfy the pattern as a whole. So I
>             think the pattern should not match this sentence.
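
The distinction described above can be sketched in a few lines of Python. 
This is only a toy model under stated assumptions, not the logic in 
Unifier.java: the readings, the one-letter tags and both function names 
are invented for illustration, and "superior" is given two explicit 
gender readings to keep the model simple.

```python
PATTERN = ["D", "A", "A", "N"]  # simplified POS prefixes for det + adj + adj + noun

# Each reading is (POS tag, gender, number); placeholder values only.
SENTENCE = [
    ("La", [("D", "f", "s")]),
    ("part", [("A", "m", "s"), ("N", "f", "s")]),
    ("superior", [("A", "m", "s"), ("A", "f", "s")]),
    ("esquerra", [("N", "f", "s"), ("A", "f", "s")]),
]


def agreement(readings):
    """Set of (gender, number) pairs offered by the given readings."""
    return {(gender, number) for _, gender, number in readings}


def naive_match(sentence, pattern):
    """POS and agreement are checked separately, over all readings."""
    shared = None
    for (_, readings), pos in zip(sentence, pattern):
        if not any(tag.startswith(pos) for tag, _, _ in readings):
            return False
        feats = agreement(readings)
        shared = feats if shared is None else shared & feats
    return bool(shared)


def strict_match(sentence, pattern):
    """Agreement is checked only over readings that also match the POS."""
    shared = None
    for (_, readings), pos in zip(sentence, pattern):
        matching = [r for r in readings if r[0].startswith(pos)]
        feats = agreement(matching)
        shared = feats if shared is None else shared & feats
        if not shared:
            return False
    return True


print(naive_match(SENTENCE, PATTERN))   # True
print(strict_match(SENTENCE, PATTERN))  # False
```

The naive check accepts the sentence because "part" supplies the 
adjective from one reading and the feminine singular from another; the 
strict check rejects it, which is the behaviour argued for above.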
>
>             I tried to debug the code (AbstractPatternRule.java,
>             Unifier.java), but to no avail so far.
>
>             PS For debugging I put the rule at the start of the Catalan
>             disambiguation file and the sentence at the start of a JUnit
>             test, inside assertCorrect("").
>
>             Regards,
>             Jaume Ortolà
>
>
>
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>

