Hi

Have a look at the following LanguageTool debug
output, where a token gets the nonsensical POS tag
"N.*" (several times) after a disambiguation rule
is applied.

Is it a bug in the disambiguator?
Or am I writing an incorrect disambiguation rule?

$ echo "An eil" | java -jar \
    languagetool-standalone/target/LanguageTool-2.7-SNAPSHOT/LanguageTool-2.7-SNAPSHOT/languagetool-commandline.jar \
    -c utf-8 -l br -v
Expected text language: Breton
Working on STDIN...
664 rules activated for language Breton
<S> An[mont/V pres 1 s,monet/V pres 1 s,an/D e sp,]
eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,</S>,]<P/>
Disambiguator log:

UR_N:2 eil[eilañ/V pres 3 s,eilañ/V impe 2 s,eil/K e sp o,eil/J,eilañ/SENT_END] ->
eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/SENT_END]


Notice that the token "eil" gets the POS tag "N.*", which
is an invalid POS tag (a token's POS tag is not meant to
be a regexp). Furthermore, it gets that same POS tag
five times after disambiguation.

The disambiguation rule UR_N:2 in
languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/disambiguation.xml
is:

    <rule>
      <pattern>
        <token regexp="yes">u[ln]|a[nlr]</token>
        <marker>
          <token postag="V.*" postag_regexp="yes"/>
        </marker>
      </pattern>
      <disambig action="filter" postag="N.*"/>
    </rule>

The idea of the disambiguation rule is that, if the
word following "an" (or "al", "ar", etc.) is a verb (V.*),
then only its noun POS tags (N.*) are kept, in case
it also happens to be a noun.
But obviously, this is not what's happening here:
"eil" has no noun reading at all, yet it ends up with
the literal tag "N.*" repeated five times.
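To make the expected behaviour concrete, here is a small Python sketch of what I would expect the rule to do. It only models the intent described above; it is not LanguageTool's disambiguator code. The function name `filter_readings`, the (lemma, tag) pair representation and the noun tag "N m s" in the second example are hypothetical, chosen just for illustration, and SENT_END markers are left out for simplicity.

```python
import re

# Hypothetical model of the rule's intent; NOT LanguageTool's implementation.
# A reading is a (lemma, pos_tag) pair; SENT_END markers are omitted.


def filter_readings(readings: list[tuple[str, str]],
                    trigger: str = r"V.*",
                    keep: str = r"N.*") -> list[tuple[str, str]]:
    """Apply the intended filter to one token's readings.

    The rule only fires when some reading fully matches `trigger` (a verb).
    It then keeps the readings whose tag fully matches `keep` (a noun).
    If no reading matches `keep`, the token is left unchanged: a filter
    should never invent a new tag such as the literal string "N.*".
    """
    if not any(re.fullmatch(trigger, tag) for _, tag in readings):
        return readings
    kept = [(lemma, tag) for lemma, tag in readings if re.fullmatch(keep, tag)]
    return kept if kept else readings


# Readings of "eil" taken from the disambiguator log above.
eil = [("eilañ", "V pres 3 s"), ("eilañ", "V impe 2 s"),
       ("eil", "K e sp o"), ("eil", "J")]
print(filter_readings(eil))  # unchanged: "eil" has no noun reading

# Hypothetical token with both a verb reading and a noun reading.
print(filter_readings([("foo", "V pres 3 s"), ("foo", "N m s")]))
# -> [('foo', 'N m s')]
```

In this model, "eil" would simply keep its original readings, rather than ending up with the literal "N.*" tags seen in the log.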

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
