Re: using ngram data to detect errors

Daniel Naber Mon, 19 Oct 2015 00:16:31 -0700

Hi,

this is just a reminder that the data for statistical error detection
exists. Now people just need to use it...


Regards
  Daniel

On 2015-09-16 22:50, Daniel Naber wrote:
> Hi,
> 
> some time ago, I added a rule for English that detects errors
> statistically, using large ngram data sets. I've now activated the rule
> for all languages that we have data for: Chinese, French, Italian,
> Russian, and Spanish (German had been activated for some time already).
> 
> That means rule developers can add word pairs to the
> 'confusion_sets.txt' file and LT will try to detect wrong usage of
> either word of the pair. Here's how you can use this approach to detect
> errors:
> 
> 1.) Download the (large) data from
> http://languagetool.org/download/ngram-data/untested/ for your language
> 2.) Follow the documentation at
> http://wiki.languagetool.org/adding-n-gram-data-rules
> 
> This is not a general replacement for writing rules manually, but it's
> often easier and it sometimes works better. In my experience, it's hard
> to tell which word pairs work well with this approach; it's something
> one just has to experiment with.
> 
> Please give it a try and let me know if you have feedback or questions.
> 
> Regards
>   Daniel
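
(Illustration, not part of the original message.) The idea behind the
confusion-pair approach quoted above can be sketched roughly as follows.
This is a toy example under stated assumptions, not LanguageTool's actual
implementation: the trigram counts are made up, the pair is held in memory
rather than read from 'confusion_sets.txt', and the names suggest(),
check() and the factor threshold are hypothetical.

```python
from collections import Counter

# Made-up trigram counts standing in for the real (large) ngram data.
TRIGRAM_COUNTS = Counter({
    ("over", "there", "is"): 900,
    ("over", "their", "is"): 3,
    ("in", "their", "house"): 800,
    ("in", "there", "house"): 2,
})

# Hypothetical confusion pair, kept in memory for this sketch.
CONFUSION_PAIR = ("their", "there")


def suggest(tokens, index, pair=CONFUSION_PAIR, factor=10):
    """Return the other word of the pair if its trigram is at least
    `factor` times more frequent than the trigram as written, else None."""
    word = tokens[index]
    if word not in pair or index == 0 or index == len(tokens) - 1:
        return None
    other = pair[1] if word == pair[0] else pair[0]
    left, right = tokens[index - 1], tokens[index + 1]
    # Add-one smoothing so unseen trigrams never cause a division by zero.
    seen = TRIGRAM_COUNTS[(left, word, right)] + 1
    alternative = TRIGRAM_COUNTS[(left, other, right)] + 1
    return other if alternative / seen >= factor else None


def check(sentence):
    """Return (position, word, suggestion) for every suspicious word."""
    tokens = sentence.lower().split()
    matches = []
    for i in range(len(tokens)):
        suggestion = suggest(tokens, i)
        if suggestion is not None:
            matches.append((i, tokens[i], suggestion))
    return matches
```

With these toy counts, check("Over their is a tree") flags "their" and
suggests "there", while check("We met in their house") reports nothing.
Whether a real pair behaves this cleanly depends on the data, which is why
experimenting with each pair is needed.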


