In UkrainianWordTokenizer.java I replace the "Unicode apostrophes"
U+2019 and U+02BC with the good old single quote (') to unify all
apostrophe handling. If the Dutch case is similar, you could borrow this code.
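
A minimal sketch of that kind of normalization, for illustration only.
The class and method names below (ApostropheNormalizer, normalize) are
hypothetical and are not taken from UkrainianWordTokenizer.java; the
actual code there may be structured differently.

```java
// Hypothetical helper, not the actual LanguageTool code.
final class ApostropheNormalizer {

    private ApostropheNormalizer() {
        // utility class, no instances
    }

    // Maps the typographic apostrophes U+2019 (right single quotation mark)
    // and U+02BC (modifier letter apostrophe) to the ASCII apostrophe U+0027.
    // Each replacement is one char for one char, so the string length and
    // all character offsets stay the same.
    static String normalize(String text) {
        return text
                .replace('\u2019', '\'')
                .replace('\u02BC', '\'');
    }

    public static void main(String[] args) {
        // Dutch sample with typographic apostrophes inside words.
        System.out.println(normalize("auto\u2019s en zo\u2019n"));
    }
}
```

Running the normalization before the text is split into tokens would let
the existing rules for words containing ' apply unchanged. Because the
replacement preserves length, positions reported against the original
text should still line up.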

Andriy

On 09/02/2014 08:11 AM, R.J. Baars wrote:
> The Dutch tokenizer is a little bit different from the others, because
> of words with a ' in them.
>
> That works fine, unless the text contains a ’ instead of a ', which
> happens quite often.
>
> Since I am not able to edit the Java program myself (too little
> knowledge), could someone have a look at this, please?
>
> Ruud
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.  
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

