Hi,

Have a look at the following debug output from LanguageTool, where a token gets the nonsensical POS tag "N.*" (multiple times) after a disambiguation rule is applied.
Is it a bug in the disambiguator, or am I writing an incorrect disambiguation rule?

    $ echo "An eil" | java -jar languagetool-standalone/target/LanguageTool-2.7-SNAPSHOT/LanguageTool-2.7-SNAPSHOT/languagetool-commandline.jar -c utf-8 -l br -v
    Expected text language: Breton
    Working on STDIN...
    664 rules activated for language Breton
    <S> An[mont/V pres 1 s,monet/V pres 1 s,an/D e sp,] eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,</S>,]<P/>
    Disambiguator log:

    UR_N:2 eil[eilañ/V pres 3 s,eilañ/V impe 2 s,eil/K e sp o,eil/J,eilañ/SENT_END] -> eil[eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/N.*,eilañ/SENT_END]

Notice that the token "eil" gets the POS tag "N.*" (which is an invalid POS tag; it is not meant to be a regexp) and, furthermore, it gets that same POS tag 5 times after disambiguation.

The disambiguation rule UR_N:2 in languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/disambiguation.xml is:

    <rule>
      <pattern>
        <token regexp="yes">u[ln]|a[nlr]</token>
        <marker>
          <token postag="V.*" postag_regexp="yes"/>
        </marker>
      </pattern>
      <disambig action="filter" postag="N.*"/>
    </rule>

The idea of the disambiguation rule is that, if the word following "an" (or "al", or "ar", etc.) is tagged as a verb (V.*), then only its noun readings (N.*) should be kept, in case it happens to also be a noun. In the example above, "eil" has no noun reading at all, so I would expect its readings to be left unchanged. Instead, every reading has its POS tag replaced by the literal string "N.*", so this is obviously not what's happening here.

Regards,
Dominique
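
P.S. To make the intended behaviour concrete, here is a minimal Python sketch of what I expect the filter action to do. It is not LanguageTool's actual implementation; the function name filter_readings, the lemma "foo" and the noun tag "N m s" in the second example are made up for illustration. The assumption is that a filter keeps only the readings whose POS tag matches the regexp, and leaves the token untouched when no reading matches.

```python
import re
from typing import List, Tuple

# A reading is a (lemma, POS tag) pair, as in the "lemma/POS" debug output.
Reading = Tuple[str, str]


def filter_readings(readings: List[Reading], postag_regex: str) -> List[Reading]:
    """Keep only readings whose POS tag fully matches postag_regex.

    Assumption (hypothetical semantics, not taken from LanguageTool's code):
    if no reading matches, the original readings are returned unchanged.
    """
    pattern = re.compile(postag_regex)
    kept = [r for r in readings if pattern.fullmatch(r[1])]
    return kept if kept else list(readings)


# Readings of "eil" before disambiguation, copied from the log above
# (the SENT_END marker is left out for simplicity).
eil = [
    ("eilañ", "V pres 3 s"),
    ("eilañ", "V impe 2 s"),
    ("eil", "K e sp o"),
    ("eil", "J"),
]

# Hypothetical token that is both a noun and a verb (tags invented).
noun_or_verb = [("foo", "N m s"), ("foo", "V pres 3 s")]

eil_after = filter_readings(eil, "N.*")
noun_or_verb_after = filter_readings(noun_or_verb, "N.*")
```

With these semantics, "eil" would keep its four original readings because none of them is a noun, and the hypothetical token would keep only its noun reading. In neither case would the literal string "N.*" end up as a POS tag.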