Re: Current limitations of MorfologikSpeller

2014-09-02 Thread R.J. Baars
I could, if I were able to code. I only do things on the XML level. Ruud > In UkrainianWordTokenizer.java I am replacing "Unicode apostrophes" > U+2019 and U+02BC with the good old single quote (') to unify all apostrophe > handling. If the Dutch case is similar you could borrow this code. > > Andriy >

Bug in disambiguator?

2014-09-02 Thread Dominique Pellé
Hi, have a look at the following debug output of LanguageTool, where a token gets the nonsensical POS tag "N.*" (multiple times) after a disambiguation rule is applied. Is it a bug in the disambiguator? Or am I writing an incorrect disambiguation rule? $ echo "An eil"| java -jar languagetool-standalone

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread Andriy Rysin
In UkrainianWordTokenizer.java I am replacing "Unicode apostrophes" U+2019 and U+02BC with the good old single quote (') to unify all apostrophe handling. If the Dutch case is similar you could borrow this code. Andriy On 09/02/2014 08:11 AM, R.J. Baars wrote: > The Dutch tokenizer is a little bit differ
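
A minimal sketch of the normalization described above, for anyone who wants to try it on Dutch text. This is not the actual code from UkrainianWordTokenizer.java; the class name `ApostropheNormalizer` and the method `normalize` are hypothetical, and the Dutch sample words are only illustrative. It assumes the replacement can be done on the raw string before tokenizing.

```java
// Hypothetical helper, not part of LanguageTool: maps the two
// "Unicode apostrophes" mentioned in the thread to the ASCII single quote.
public final class ApostropheNormalizer {

    private ApostropheNormalizer() {
        // utility class, no instances
    }

    /**
     * Returns the text with U+2019 (right single quotation mark) and
     * U+02BC (modifier letter apostrophe) replaced by U+0027 (').
     * All three are a single char, so the length and the character
     * offsets of the text stay the same.
     */
    public static String normalize(String text) {
        return text
                .replace('\u2019', '\'')
                .replace('\u02BC', '\'');
    }

    public static void main(String[] args) {
        // Illustrative Dutch words containing a typographic apostrophe.
        System.out.println(normalize("zo\u2019n auto\u2019s"));
    }
}
```

Because each replacement is one char for one char, positions reported against the normalized text should still line up with the original input, which matters when rule matches are mapped back to the user's text.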

WMF Individual Engagement Grants

2014-09-02 Thread Daniel Naber
FYI, the Wikimedia Foundation welcomes proposals for their 'Individual Engagement Grants'. If you want to develop the LT Wikipedia integration, this might be interesting to you: https://meta.wikimedia.org/wiki/Grants:IEG For example, one might extend LT WikiCheck to do spell checking on Wikipe

Re: French names list

2014-09-02 Thread Dominique Pellé
R.J. Baars wrote: > Thanks a lot for the tip. I am including it in the Dutch rules. > There is some work in translating the messages. > > Maybe it is possible to standardize the examples (not the messages, these > are significant for users), so translation is easier. > > Ruud Hi Ruud Probably no

Re: French names list

2014-09-02 Thread R.J. Baars
I did part of the work while moving to Dutch. I removed some rules, because in the Netherlands we don't normally translate or transform the names of fantasy people, and some names were really quite local ;-) It is quite easy now to translate to any language. I could start by translating it to English,

French names list

2014-09-02 Thread R.J. Baars
Thanks a lot for the tip. I am including it in the Dutch rules. There is some work in translating the messages. Maybe it is possible to standardize the examples (not the messages, these are significant for users), so translation is easier. Ruud

Sorry, wrong thread, should be: Dutch tokenizer.

2014-09-02 Thread R.J. Baars
> The Dutch tokenizer is a little bit different from the others, because > of words with a ' in it. > > That works fine, unless the text does not have a ', but a ’ , which > happens quite often. > > Since I am not able to edit the java program (little knowledge), could > someone have a look at th

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread R.J. Baars
The Dutch tokenizer is a little bit different from the others, because of words with a ' in it. That works fine, unless the text does not have a ', but a ’ , which happens quite often. Since I am not able to edit the java program (little knowledge), could someone have a look at this please? Ru