I could, if I were able to code. I only work at the XML level.
Ruud
> In UkrainianWordTokenizer.java I am replacing the "Unicode apostrophes"
> U+2019 and U+02BC with the good old single quote (') to unify all apostrophe
> handling. If the Dutch case is similar, you could borrow this code.
>
> Andriy
>
Hi
Have a look at the following debug output
of LanguageTool, where a token gets the nonsensical
POS tag "N.*" (multiple times) after a disambiguation
rule is applied.
Is it a bug in the disambiguator?
Or am I writing an incorrect disambiguation rule?
$ echo "An eil"| java -jar
languagetool-standalone
In UkrainianWordTokenizer.java I am replacing the "Unicode apostrophes"
U+2019 and U+02BC with the good old single quote (') to unify all apostrophe
handling. If the Dutch case is similar, you could borrow this code.
Andriy
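
For readers who have not seen that code, here is a minimal sketch of the kind of normalization described above. It is not the actual UkrainianWordTokenizer code; the class name ApostropheNormalizer and the method normalizeApostrophes are made up for illustration. The idea is only that both typographic apostrophes are mapped to the plain single quote before tokenizing, so the tokenizer has a single character to handle.

```java
public class ApostropheNormalizer {

    // Hypothetical helper (not part of LanguageTool): maps the typographic
    // apostrophes U+2019 and U+02BC to the plain ASCII single quote.
    static String normalizeApostrophes(String text) {
        return text
                .replace('\u2019', '\'')   // right single quotation mark
                .replace('\u02BC', '\'');  // modifier letter apostrophe
    }

    public static void main(String[] args) {
        // Dutch-looking sample input containing both apostrophe variants.
        String input = "z\u2019n auto\u02BCs";
        System.out.println(normalizeApostrophes(input)); // prints: z'n auto's
    }
}
```

A tokenizer could call such a helper on the input text before splitting it, provided the replacement keeps the text length unchanged (it does here, one character for one character), so that token positions still match the original text.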
On 09/02/2014 08:11 AM, R.J. Baars wrote:
> The Dutch tokenizer is a little bit differ
FYI, the Wikimedia Foundation welcomes proposals for their 'Individual
Engagement Grants'. If you want to develop the LT Wikipedia integration,
this might be interesting to you:
https://meta.wikimedia.org/wiki/Grants:IEG
For example, one might extend LT WikiCheck to do spell checking on
Wikipe
R.J. Baars wrote:
> Thanks a lot for the tip. I am including it in the Dutch rules.
> There is some work in translating the messages.
>
> Maybe it is possible to standardize the examples (not the messages, these
> are significant for users), so translation is easier.
>
> Ruud
Hi Ruud
Probably no
I did part of the work while moving to Dutch. I removed some rules,
because in the Netherlands we don't normally translate or transform names of fantasy
people, and some names were really quite local ;-)
It is quite easy now to translate to any language. I could start by
translating it to English,
Thanks a lot for the tip. I am including it in the Dutch rules.
There is some work in translating the messages.
Maybe it is possible to standardize the examples (not the messages, these
are significant for users), so translation is easier.
Ruud
---
> The Dutch tokenizer is a little bit different from the others, because
> of words with a ' in them.
>
> That works fine, unless the text does not have a ', but a ’, which
> happens quite often.
>
> Since I am not able to edit the Java program (little knowledge), could
> someone have a look at th
The Dutch tokenizer is a little bit different from the others, because
of words with a ' in them.
That works fine, unless the text does not have a ', but a ’, which
happens quite often.
Since I am not able to edit the Java program (little knowledge), could
someone have a look at this please?
Ru