Hi,

When using unification, I sometimes get 'out of bounds' errors, and I
think they could be easily prevented. The rule below, and other similar
rules, work fine with their own examples. But when I test them against the
Wikipedia corpus, 'out of bounds' exceptions appear in some places. Below I
copy the rule, the sentence that caused the error, the error message, the
relevant Java code lines and a proposed solution that works for me.

My next step will be to test what happens using unification with sequences
of more than two tokens.

Jaume


THE RULE (in Catalan grammar.xml)

            <rule>
                <pattern>
                    <unify>
                        <feature id="persona"/>
                        <feature id="nombre"/>
                        <marker>
                            <token postag="P0.*" postag_regexp="yes" skip="2"><exception postag="P[0P].*|V.[^MNGP].*|SENT_END" scope="next" postag_regexp="yes" negate_pos="yes"/></token>
                        </marker>
                        <token postag="V.[^MNGP].*" postag_regexp="yes" skip="1"></token>
                    </unify>
                    <token regexp="yes">caure|callar|témer|marxar|albergar|olorar</token>
                </pattern>
                <message>Aquest verb no ha de ser reflexiu. Elimina el pronom '\1'. <suggestion></suggestion>.</message>
                <short>Aquest verb no és reflexiu.</short>
                <example type="incorrect">El nen <marker>es</marker> pot caure.</example>
                <example type="incorrect"><marker>T'</marker>has de callar.</example>
                <example type="incorrect">El nen <marker>se</marker> li pot caure.</example>
                <example type="incorrect">Tu <marker>et</marker> pots caure.</example>
                <example type="correct">El nen pot caure.</example>
                <example type="correct">El nen et pot caure.</example>

            </rule>

THE SENTENCE THAT CAUSED THE ERROR

<S> GNU[GNU]'s[es/P0300000] not[notar/VMIP1S0] unix[unir/VMIP3S0, unir/VMM02S0].[</S>]

THE ERROR MESSAGE

java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
    at java.util.ArrayList.rangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at org.languagetool.rules.patterns.Unifier.checkNext(Unifier.java:234)
    at org.languagetool.rules.patterns.Unifier.isSatisfied(Unifier.java:157)
    at org.languagetool.rules.patterns.Unifier.isUnified(Unifier.java:343)
    at org.languagetool.rules.patterns.AbstractPatternRule.testUnificationAndGroups(AbstractPatternRule.java:212)

Unifier.java, lines 229-236

      if (unifiedNext) {
        if (tokSequence.size() == readingsCounter) {
          tokSequence.add(new AnalyzedTokenReadings(aToken, 0));
        } else {
          tokSequence.get(readingsCounter).addReading(aToken); // <-- the exception is thrown here
        }
        tmpFeaturesFound = tokenFeaturesFound;
      }
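
The failure mode can be reproduced outside LanguageTool. The following is
only a minimal sketch, assuming plain lists of strings as a stand-in for the
AnalyzedTokenReadings sequence; the class UnifierBoundsDemo and the method
addReading are hypothetical names, not LanguageTool API. It shows that the
else branch is unsafe whenever the counter has run ahead of the list size
(here index 2 against size 1, as in the stack trace), whatever the reason
for the counter being ahead.

```java
import java.util.ArrayList;
import java.util.List;

public class UnifierBoundsDemo {

    // Mirrors the shape of the if/else shown above, using plain strings
    // instead of AnalyzedTokenReadings (an assumption for illustration).
    static void addReading(List<List<String>> tokSequence, int readingsCounter, String reading) {
        if (tokSequence.size() == readingsCounter) {
            // Counter points just past the end: start a new entry.
            List<String> readings = new ArrayList<>();
            readings.add(reading);
            tokSequence.add(readings);
        } else {
            // Assumes readingsCounter < size; fails if the counter is larger.
            tokSequence.get(readingsCounter).add(reading);
        }
    }

    public static void main(String[] args) {
        List<List<String>> tokSequence = new ArrayList<>();
        addReading(tokSequence, 0, "es/P0300000"); // size becomes 1
        try {
            addReading(tokSequence, 2, "unir/VMIP3S0"); // index 2, size 1
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```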

THE PROPOSED SOLUTION

          if (readingsCounter < tokSequence.size()) {
            tokSequence.get(readingsCounter).addReading(aToken);
          }   /* else? */
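
One possible way to handle the open "else?" case is to report it to the
caller instead of silently dropping the reading. This is a sketch under
stated assumptions, not the actual Unifier code: it uses lists of strings in
place of AnalyzedTokenReadings, and GuardedAddDemo and addReadingGuarded are
hypothetical names. Whether skipping the reading is the right behaviour for
unification is a separate question that this sketch does not answer.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedAddDemo {

    // Returns true if the reading was stored, false if the counter was
    // out of range (the "else?" case of the proposed solution).
    static boolean addReadingGuarded(List<List<String>> tokSequence, int readingsCounter, String reading) {
        if (tokSequence.size() == readingsCounter) {
            // Counter points just past the end: start a new entry.
            List<String> readings = new ArrayList<>();
            readings.add(reading);
            tokSequence.add(readings);
            return true;
        } else if (readingsCounter >= 0 && readingsCounter < tokSequence.size()) {
            // Counter points at an existing entry: add another reading to it.
            tokSequence.get(readingsCounter).add(reading);
            return true;
        }
        // Counter ran ahead of the sequence: nothing is stored.
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> tokSequence = new ArrayList<>();
        System.out.println(addReadingGuarded(tokSequence, 0, "es/P0300000"));
        System.out.println(addReadingGuarded(tokSequence, 2, "unir/VMIP3S0"));
        System.out.println(tokSequence.size());
    }
}
```

With a boolean result the caller can log the skipped case or treat the
sequence as not unified, rather than failing with an exception.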