Hi,

this is just a reminder that the data for statistical error detection exists. Now people just need to use it...

Regards
Daniel

On 2015-09-16 22:50, Daniel Naber wrote:
> Hi,
>
> some time ago, I added a rule for English to detect errors
> statistically, by using large ngram data sets. I've now activated the
> rule for all languages that we have data for: Chinese, French, Italian,
> Russian, and Spanish (German had been activated for some time already).
>
> That means rule developers can add word pairs to the
> 'confusion_sets.txt' file and LT will try to detect wrong usage of
> either word of the pair. Here's how you can use this approach to detect
> errors:
>
> 1.) Download the (large) data from
> http://languagetool.org/download/ngram-data/untested/ for your language
> 2.) Follow the documentation at
> http://wiki.languagetool.org/adding-n-gram-data-rules
>
> This is not a general replacement for writing rules manually, but it's
> often easier and it sometimes works better. In my experience, it's hard
> to tell which word pairs work well with this approach; it's something
> one just has to experiment with.
>
> Please give it a try and let me know if you have feedback or questions.
>
> Regards
> Daniel