Daniel Naber <daniel.na...@languagetool.org> wrote:
> Hi,
>
> even though I don't speak French, I've started adding confusion pairs
> for French. Here's an example from fr/confusion_sets.txt:
>
> quand; quant; 1000000 # p=1.000, r=0.662, 186+988, 3grams, 2016-03-29
>
> This means that whenever 'quand' appears, LT checks whether 'quant'
> isn't more probable here using Google ngrams[1] and vice versa.
> '1000000' is a factor to avoid false alarms. p=1.000, r=0.662 means:
> with my evaluation set, this pair has a precision of 1, i.e. it doesn't
> produce any false alarms and a recall of 0.662, i.e. 66,2% of all errors
> are detected.
>
> So far, there are only 9 pairs like this (pris/prix, don/donc, dans/dent
> etc.) but I'm going to add more. I'll do the same for Spanish. Feel free
> to also add pairs. You can check how well a pair works (and find a good
> factor with a low false alarm rate) using ConfusionRuleEvaluator from
> the languagetool-dev module.
>
> Regards
> Daniel
>
> [1] http://wiki.languagetool.org/finding-errors-using-n-gram-data
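
For reference, the confusion_sets.txt line quoted above can be picked apart with a few lines of Python. This is only a sketch, not LanguageTool's own parser: the names ConfusionPair, parse_confusion_line and prefers_alternative are made up for illustration, and the comparison in prefers_alternative is an assumed reading of "a factor to avoid false alarms" (the alternative word has to be at least `factor` times more probable than the word in the text). The "186+988" and "3grams" fields are left unparsed because the message does not explain them.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfusionPair:
    """One line of a confusion_sets.txt file (hypothetical container)."""
    word_a: str
    word_b: str
    factor: int
    precision: Optional[float]  # from "p=..." in the trailing comment, if present
    recall: Optional[float]     # from "r=..." in the trailing comment, if present


def parse_confusion_line(line: str) -> ConfusionPair:
    """Split 'a; b; factor # p=..., r=..., ...' into its parts."""
    data, _, comment = line.partition("#")
    parts = [p.strip() for p in data.split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 'word; word; factor', got: {data!r}")
    word_a, word_b, factor = parts

    # Precision and recall live only in the comment, so treat them as optional.
    p = re.search(r"\bp=(\d+(?:\.\d+)?)", comment)
    r = re.search(r"\br=(\d+(?:\.\d+)?)", comment)
    return ConfusionPair(
        word_a=word_a,
        word_b=word_b,
        factor=int(factor),
        precision=float(p.group(1)) if p else None,
        recall=float(r.group(1)) if r else None,
    )


def prefers_alternative(prob_original: float, prob_alternative: float, factor: int) -> bool:
    """ASSUMPTION: flag an error only if the alternative is more than
    `factor` times as probable as the word that was actually written."""
    return prob_alternative > factor * prob_original


pair = parse_confusion_line(
    "quand; quant; 1000000 # p=1.000, r=0.662, 186+988, 3grams, 2016-03-29"
)
print(pair)
```

Under that assumed rule, a larger factor means fewer alarms overall, which would trade recall for precision.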

Hi Daniel

I have not looked at this yet, but I wonder: could it be the root cause of the French regression tests failing on March 30 and March 31?

Regards
Dominique