Dear list,

I have a huge list[1] of 70,000 (potential) false positives in the
latest de_DE_frami dictionary for Hunspell. Among other sources, the
list is based on the first 10,000 (or so) entries of a machine-generated
list that Ruud Baars sent me a few months ago.

I think this is a valuable resource, but its size makes it difficult
to handle. In particular, there is no guarantee that no true positives
are hidden somewhere in it, and I don't know how to filter out words
that are very rare and would only clutter the dictionary.
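
One way to tackle the size problem, sketched here only as an
illustration: rank the candidates by how often each word occurs in a
reference corpus, drop everything below a minimum count, and review the
rest in fixed-size batches, most frequent first. The frequency table
`freq`, the threshold `min_count`, and the helper names `rank_candidates`
and `batches` are hypothetical. The counts would have to come from some
German corpus, which hunspell_words.txt does not contain. This only
addresses the rare-word problem: a frequent misspelling would still
pass, so each batch still needs a human check for true positives.

```python
from typing import Dict, Iterable, List, Tuple


def rank_candidates(
    candidates: Iterable[str],
    freq: Dict[str, int],
    min_count: int = 5,
) -> List[Tuple[str, int]]:
    """Keep candidate words seen at least min_count times in a
    (hypothetical) reference corpus, most frequent first."""
    kept = []
    for word in set(candidates):  # ignore duplicate entries
        count = freq.get(word, 0)  # unseen words count as 0
        if count >= min_count:
            kept.append((word, count))
    # Descending frequency; ties broken alphabetically for a stable order.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept


def batches(
    items: List[Tuple[str, int]], size: int
) -> List[List[Tuple[str, int]]]:
    """Split the ranked list into review chunks of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Toy example with made-up counts:
freq = {"Abgasskandal": 120, "Zwergpinscher": 8, "Xyzwort": 1}
ranked = rank_candidates(["Zwergpinscher", "Abgasskandal", "Xyzwort"], freq)
print(ranked)  # [('Abgasskandal', 120), ('Zwergpinscher', 8)]
```

Starting with the most frequent words means that even a partial review
would cover the entries most likely to matter to users.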

Any ideas on how this could be made useful for improving German
Hunspell support with reasonable effort?

-- Jan

[1] https://sourceforge.net/p/germandict/code/HEAD/tree/hunspell_words.txt

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
