On 2014-09-03 12:30, R.J. Baars wrote:
> Marcin,
>
> For English, there are .info files in /resource/ as well as in
> /resource/hunspell.
> The first seems to be for the tagging dict, the second for the speller.
Ah, of course, there should be one .info file per .dict file. I
thought you were asking about a single dictionary file.

>
> (I would prefer spell-checker for directory name.)
>
> The content of the info file for Dutch should probably be:
> fsa.dict.speller.ignore-numbers=false
> fsa.dict.speller.ignore-all-uppercase=false
> fsa.dict.speller.ignore-camel-case=true
> fsa.dict.speller.ignore-punctuation=false
Note: if you don't have all punctuation in your dictionary, this will
make the speller complain about all commas, colons, hyphens, etc.

> fsa.dict.input-conversion=ij ij, IJ IJ

You need to use normal Unicode here or Java escaping, not HTML escaping.
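
To illustrate the difference, here is a minimal Python sketch that turns
HTML numeric character references into Java-style \uXXXX escapes. It
assumes the original values were the ij ligatures written as &#307; and
&#306;, which is only a guess from context, and the helper name is made
up for this example.

```python
import html

def html_entities_to_java_escapes(value):
    """Decode HTML character references, then escape non-ASCII as \\uXXXX."""
    decoded = html.unescape(value)
    out = []
    for ch in decoded:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)
        else:
            # Characters outside the BMP need a UTF-16 surrogate pair in Java.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)

# Assumed original value (HTML-escaped ij ligatures), not confirmed by the thread.
line = "fsa.dict.input-conversion=&#307; ij, &#306; IJ"
print(html_entities_to_java_escapes(line))
```

Writing the characters directly in a UTF-8 file would avoid the
conversion altogether, as long as the file is read with that encoding.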

> fsa.dict.output-conversion=ij ij, IJ IJ
Do you have such characters in the dictionary file? If not, then you 
don't need the output conversion.

> fsa.dict.speller.runon-words=false
> fsa.dict.speller.locale=nl_NL
> fsa.dict.speller.convert-case=false
> fsa.dict.speller.ignore-diacritics=true
> fsa.dict.speller.replacement-pairs=y ij, ei ij
> fsa.dict.speller.equivalent-chars=
> fsa.dict.frequency-included=true
> fsa.dict.encoding=utf-8
> fsa.dict.separator=
> fsa.dict.author=R. Baars;
>
> I am not sure about separator, equivalent chars and the locale.
The separator is only used internally (usually it's a plus
character). It doesn't really matter unless you want to use "+" as an
entry (and you would have to if "ignore-punctuation" is set to false).

> I don't quite get the difference between diacritics, equivalent chars and
> replacement pairs. Diacritics seems to me to be part of equivalent and is a
> kind of automatic replacement.
Diacritics handling is automatic and faster than replacement pairs. It
is roughly the same as equivalent chars.
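
As a rough illustration of the distinction (not Morfologik's actual
implementation), the Python sketch below treats diacritics as a blanket
normalization step and replacement pairs as explicit substring rewrites
that generate candidates. The function names and the tiny word list are
invented for the example.

```python
import unicodedata

def strip_diacritics(word):
    """Remove combining marks, so that 'á' compares equal to 'a'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def replacement_candidates(word, pairs):
    """Apply each (wrong, right) pair at every position where it matches."""
    candidates = set()
    for wrong, right in pairs:
        start = word.find(wrong)
        while start != -1:
            candidates.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    return candidates

dictionary = {"ijs", "prijs", "een"}   # toy word list, hypothetical
pairs = [("y", "ij"), ("ei", "ij")]    # taken from the proposed .info file

print(strip_diacritics("één"))
print(replacement_candidates("preis", pairs) & dictionary)
```

Diacritic folding needs no per-language rule list, while each
replacement pair has to be listed and tried explicitly, which is one
plausible reason for the speed difference mentioned above.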

> ei ij is a replacement, á and a are taken care of by diacritics, and I
> guess Dutch does not have equivalents ...
>
> Right?
What about apostrophes? Do you want them normalized or not?

Regards,
Marcin

>
>
>
>> On 2014-09-03 10:58, R.J. Baars wrote:
>>> To add the word frequencies, I am directed by the wiki to an address
>>> where there is indeed a frequency list. But it has only 187000 words,
>>> while I have 1.2 million Dutch words and their frequencies myself.
>> Probably the probabilities of their occurrence are quite low. I tried
>> replacing that list with a bigger one for Polish, and it indeed made
>> the dictionary file bigger, but nothing else changed much.
>>
>>> The frequency is just a number; what is expected there? Is this number a
>>> plain ratio, an occurrence count, or something else, like logarithmic?
>>> Will I have to convert to that format, or is a plain word<tab>number an
>>> option too?
>> Log scale, I believe. You might want to filter out some of the lower
>> results, as well, as they don't really help and only make files bigger.
>>
>> Marcin
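
For a word<tab>count list like the one described above, a log-scale
conversion with a cutoff could look roughly like this Python sketch. The
number of buckets, the cutoff, and the function name are assumptions for
illustration; the thread does not specify the exact format the
dictionary builder expects.

```python
import math

def to_log_buckets(counts, buckets=26, min_count=2):
    """Map raw counts to integer buckets 0..buckets-1 on a log scale.

    Words with a count below min_count are dropped to keep the file small.
    """
    kept = {w: c for w, c in counts.items() if c >= min_count}
    if not kept:
        return {}
    max_log = math.log(max(kept.values()))
    result = {}
    for word, count in kept.items():
        if max_log == 0:
            # Every kept word has count 1; put them all in the top bucket.
            result[word] = buckets - 1
        else:
            # Floor, so the most frequent word lands exactly in the top bucket.
            result[word] = int((buckets - 1) * math.log(count) / max_log)
    return result

# Made-up counts, purely for demonstration.
raw = {"de": 1000000, "prijs": 1000, "zeldzaam": 1}
print(to_log_buckets(raw))
```

Raising min_count is the filtering step suggested above: rare words are
removed before the buckets are computed.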
>>
>>> Ruud
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Slashdot TV.
>>> Video for Nerds.  Stuff that matters.
>>> http://tv.slashdot.org/
>>> _______________________________________________
>>> Languagetool-devel mailing list
>>> Languagetool-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>
>>>
>>
>
>

