In UkrainianWordTokenizer.java I replace the "Unicode apostrophes" U+2019 and U+02BC with the good old single quote (') to unify all apostrophe handling. If the Dutch case is similar, you could borrow this code.
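To illustrate the idea, here is a minimal sketch of that kind of normalization. It is not the actual code from UkrainianWordTokenizer.java; the class name ApostropheNormalizer and the method normalize are hypothetical, chosen only for this example. It assumes the normalization runs on the raw text before tokenizing.

```java
// Hypothetical helper, not part of LanguageTool: maps "Unicode apostrophes"
// to the plain ASCII single quote before the text is tokenized.
final class ApostropheNormalizer {

    private ApostropheNormalizer() {
        // static helper only, no instances
    }

    /**
     * Returns the text with U+2019 (right single quotation mark) and
     * U+02BC (modifier letter apostrophe) replaced by U+0027 (').
     */
    static String normalize(String text) {
        return text
                .replace('\u2019', '\'')  // U+2019 -> '
                .replace('\u02BC', '\''); // U+02BC -> '
    }
}
```

Each replacement swaps one character for one character, so the text length and all character positions stay the same. For Dutch, a word such as zo’n (typed with U+2019) would come out as zo'n, which the existing handling of ' could then pick up.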
Andriy

On 09/02/2014 08:11 AM, R.J. Baars wrote:
> The Dutch tokenizer is a little bit different from the others, because
> of words with a ' in them.
>
> That works fine, unless the text does not have a ', but a ’, which
> happens quite often.
>
> Since I am not able to edit the Java program (little knowledge), could
> someone have a look at this, please?
>
> Ruud
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds. Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel