Re: MorfologikSpeller

R.J. Baars Thu, 04 Sep 2014 02:20:53 -0700

I set the case options like this:

fsa.dict.speller.ignore-all-uppercase=false


I set this to false because of correct words like LSD. For Dutch, it
would be great if there were a maximum length for ignored all-uppercase
words: LSD would be ignored, AMSTERDAM would not.
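A minimal sketch of the proposed rule, in Python for brevity. It assumes a hypothetical length threshold (`max_ignored_length`, set to 3 here); this is not an existing fsa.dict.speller.* option, only an illustration of "LSD is ignored, AMSTERDAM is not".

```python
def ignore_all_uppercase(word: str, max_ignored_length: int = 3) -> bool:
    """Return True if an all-uppercase word is short enough to skip.

    max_ignored_length is a hypothetical threshold, not a real
    Morfologik speller setting.
    """
    letters = [c for c in word if c.isalpha()]
    is_all_upper = bool(letters) and all(c.isupper() for c in letters)
    return is_all_upper and len(letters) <= max_ignored_length


for w in ["LSD", "AMSTERDAM", "Amsterdam"]:
    # LSD True, AMSTERDAM False, Amsterdam False
    print(w, ignore_all_uppercase(w))
```

Counting only letters keeps tokens with digits (such as F16) from being rejected merely because of their length.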

fsa.dict.speller.ignore-camel-case=true
These are all proper names, so they can be ignored.

fsa.dict.speller.convert-case=false
I will try to set this to true, but the text does not make clear what a
dictionary entry like LSD will do then.
It appears to work as expected ;-) : when words are uppercased in the
dictionary, it requires that form; if not, it suggests the uppercased version.


About the dash at the end of a word:

In Dutch, the details output tells me that the tokenizer keeps the dash
attached to the word:

<S> <S> Een[een/DTe,een/NM,een/NM1,een/NN1d,] afdelings[afdelings/null,]
of[of/CJo,] ander[ander/AJn,] uitje[uitje/NN1r,ui/NN1r,] .[</S>,]

In the English version (which I am testing with), the dash is a separate
token:

<S> Een[Een/null,B-NP-singular] afdelings[afdelings/null,E-NP-singular]
-[-/null,O] of[of/IN,B-PP] ander[Ander/NNP,B-NP-singular]
uitje[uitje/null,E-NP-singular] .[./.,</S>,O]

So the problem comes from mixing the two languages.

I think that makes it necessary to test with the Dutch setup and to
change the code for this.


> On 2014-09-03 20:06, R.J. Baars wrote:
>> I replaced the English dictionary with the newly generated Dutch one.
>>
>> Running the complete list of wrong and correct words through LT works.
>> The output is less structured than I would like, though. When there is
>> no suggestion, the entire suggestion line is missing; also, the word is
>> not recognizable in the output, only underlined, which makes it harder
>> to process. I will have to build a program around this to get the data
>> I need to judge the suggestions. Task for tomorrow.
>>
>> But it works, with the following conclusions:
>> - there are still a lot of words that should have been accepted
>> (compounding parts missing in Hunspell)
>
> Daniel is working on that for German.
>
>> - numbers as a whole (0123456) should be skipped, but ordinal numbers
>> like 100e and words like mp3 and F16 should be checked. As far as I
>> could see, there are no options for that.
>
> Interesting. This is probably a bug, as I don't expect numbers to be
> checked by a spell checker.
>
>> - When a word is completely in uppercase (UPPERCASE), is not in the
>> dictionary, and uppercase words are set not to be accepted, the
>> alternatives Uppercase and uppercase are not suggested.
>
> This is probably because your dictionary is case-sensitive.
>
>>
>> These are not showstoppers, but they are a small step back from Hunspell.
>>
>> Maybe some of these are general issues worth putting on the to-do list.
>
> It seems to me that the number checking is a genuine bug. I never had
> this "check words with numbers" option set, so this is why I didn't
> encounter this.
>
> Regards,
> Marcin
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>
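Regarding the number-checking issue in the quoted report (0123456 versus 100e, mp3, F16): a minimal Python sketch of the requested behaviour, assuming a simple regular expression decides what counts as a "number as a whole". `PURE_NUMBER` and `should_spellcheck` are hypothetical names, not part of LanguageTool.

```python
import re

# Digits only, optionally grouped with "." or "," (e.g. 1.000.000).
# Such tokens are treated as plain numbers and skipped by the speller.
PURE_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def should_spellcheck(token: str) -> bool:
    """Return False for plain numbers, True for everything else."""
    return PURE_NUMBER.fullmatch(token) is None


for tok in ["0123456", "100e", "mp3", "F16"]:
    # 0123456 False, the rest True
    print(tok, should_spellcheck(tok))
```

Using `fullmatch` rather than `search` is the key choice: a token is skipped only if the whole of it is numeric, so mixed tokens like 100e still reach the dictionary.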


