Marcin,

For English, there are .info files in /resource/ as well as in /resource/hunspell. The first seems to be for the tagging dictionary, the second for the speller.
(I would prefer spell-checker for the directory name.)

The content of the info file for Dutch should probably be:

fsa.dict.speller.ignore-numbers=false
fsa.dict.speller.ignore-all-uppercase=false
fsa.dict.speller.ignore-camel-case=true
fsa.dict.speller.ignore-punctuation=false
fsa.dict.input-conversion=ij ij, IJ IJ
fsa.dict.output-conversion=ij ij, IJ IJ
fsa.dict.speller.runon-words=false
fsa.dict.speller.locale=nl_NL
fsa.dict.speller.convert-case=false
fsa.dict.speller.ignore-diacritics=true
fsa.dict.speller.replacement-pairs=y ij, ei ij
fsa.dict.speller.equivalent-chars=
fsa.dict.frequency-included=true
fsa.dict.encoding=utf-8
fsa.dict.separator=
fsa.dict.author=R. Baars;

I am not sure about the separator, the equivalent chars and the locale.

I don't quite get the difference between diacritics, equivalent chars and replacement pairs. Diacritics seem to me to be a subset of equivalent chars, and a kind of automatic replacement. ei ij is a replacement, á and a are taken care of by diacritics, and I guess Dutch does not have equivalents ... Right?

> On 2014-09-03 10:58, R.J. Baars wrote:
>> To add the word frequencies, I am directed by the wiki to an address
>> where there is indeed a frequency list. But it has only 187000 words,
>> while I have 1.2 million Dutch words and their frequencies myself.
>
> Probably the probability of their occurrence is quite low. I tried
> replacing that list with a bigger one for Polish and my results indeed
> made the dictionary file bigger but nothing else changed much.
>
>> The frequency is just a number; what is expected there? Is this number a
>> plain ratio, an occurrence count, or something else, like logarithmic?
>> Will I have to convert to that format, or is a plain word<tab>number an
>> option too?
>
> Log scale, I believe. You might want to filter out some of the lower
> results, as well, as they don't really help and only make files bigger.
>
> Marcin
>
>> Ruud

------------------------------------------------------------------------------
Slashdot TV.
Video for Nerds. Stuff that matters.
http://tv.slashdot.org/
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
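
P.S. A minimal sketch of the conversion discussed in the quoted thread, assuming the input is a plain word<tab>count list and that a log scale with low counts filtered out is what is wanted. The function names, the min_count threshold and the choice of 26 buckets are all hypothetical and only illustrative; the format the dictionary builder actually expects should be checked against the wiki before relying on this.

```python
import math


def log_bucket(count, max_count, buckets=26):
    """Map a raw occurrence count to an integer bucket 0..buckets-1 on a log scale.

    The number of buckets is an illustrative assumption, not a documented format.
    """
    if count < 1 or max_count < 1:
        return 0
    if max_count == 1:
        return buckets - 1
    ratio = math.log(count) / math.log(max_count)
    return min(buckets - 1, int(ratio * (buckets - 1)))


def convert(lines, min_count=2, buckets=26):
    """Parse word<tab>count lines, drop words rarer than min_count,
    and return a list of (word, bucket) pairs."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, _, raw = line.partition("\t")
        count = int(raw)
        if count >= min_count:
            entries.append((word, count))
    if not entries:
        return []
    max_count = max(count for _, count in entries)
    return [(word, log_bucket(count, max_count, buckets)) for word, count in entries]


if __name__ == "__main__":
    sample = ["de\t1000000", "fiets\t1000", "zeldzaam\t1"]
    for word, bucket in convert(sample):
        print(f"{word}\t{bucket}")
```

Raising min_count is the knob for the filtering step: words below it are dropped before scaling, so they no longer add to the size of the dictionary file.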