Re: French: detecting errors with statistics

Dominique Pellé Fri, 01 Apr 2016 00:20:07 -0700

Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> even though I don't speak French, I've started adding confusion pairs
> for French. Here's an example from fr/confusion_sets.txt:
>
> quand; quant; 1000000                                    # p=1.000, r=0.662, 186+988, 3grams, 2016-03-29
>
> This means that whenever 'quand' appears, LT uses Google ngrams[1] to check
> whether 'quant' would be more probable here, and vice versa.
> '1000000' is a factor to avoid false alarms. p=1.000, r=0.662 means:
> with my evaluation set, this pair has a precision of 1, i.e. it doesn't
> produce any false alarms, and a recall of 0.662, i.e. 66.2% of all errors
> are detected.
>
> So far, there are only 9 pairs like this (pris/prix, don/donc, dans/dent
> etc.) but I'm going to add more. I'll do the same for Spanish. Feel free
> to also add pairs. You can check how well a pair works (and find a good
> factor with a low false alarm rate) using ConfusionRuleEvaluator from
> the languagetool-dev module.
>
> Regards
>   Daniel
>
> [1] http://wiki.languagetool.org/finding-errors-using-n-gram-data
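To make the mechanism described above concrete, here is a minimal Python sketch. It assumes the confusion_sets.txt line format is `word1; word2; factor` followed by an optional `#` comment, and it assumes the factor acts as a plain ratio threshold: the alternative word is reported only if its context probability is more than `factor` times that of the word actually written. All function names, probabilities and examples are hypothetical; in LanguageTool the probabilities come from Google ngram data, and this is not the code of the real rule or of ConfusionRuleEvaluator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfusionPair:
    word1: str
    word2: str
    factor: int


def parse_confusion_line(line: str) -> ConfusionPair:
    """Parse 'quand; quant; 1000000  # comment' (format assumed from the example)."""
    content = line.split("#", 1)[0]          # drop the trailing comment
    parts = [p.strip() for p in content.split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected line: {line!r}")
    return ConfusionPair(parts[0], parts[1], int(parts[2]))


def should_flag(p_written: float, p_alternative: float, factor: int) -> bool:
    """Hypothetical decision rule: report an error only if the alternative is
    more than `factor` times as probable as the written word. A larger factor
    means fewer false alarms, but also fewer detected errors."""
    return p_alternative > factor * p_written


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    # If nothing is flagged there are no false alarms, so precision is taken as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def evaluate(examples, factor: int) -> tuple[float, float]:
    """examples: iterable of (p_written, p_alternative, is_real_error) tuples."""
    tp = fp = fn = 0
    for p_written, p_alt, is_error in examples:
        flagged = should_flag(p_written, p_alt, factor)
        if flagged and is_error:
            tp += 1                          # real error, correctly reported
        elif flagged:
            fp += 1                          # false alarm
        elif is_error:
            fn += 1                          # real error that was missed
    return precision_recall(tp, fp, fn)


# Toy evaluation set (made-up numbers, not the real French data).
EXAMPLES = [
    (1e-12, 5e-5, True),
    (1e-10, 5e-5, True),
    (1e-9, 2e-9, False),
    (3e-8, 1e-6, False),
]

if __name__ == "__main__":
    line = "quand; quant; 1000000  # p=1.000, r=0.662, 186+988, 3grams, 2016-03-29"
    print(parse_confusion_line(line))
    for factor in (10, 1000000):
        print(factor, evaluate(EXAMPLES, factor))
```

On the toy data, a factor of 10 finds both errors but also raises one false alarm (precision 2/3, recall 1.0), while a factor of 1000000 raises no false alarm but misses one error (precision 1.0, recall 0.5). This illustrates the trade-off that tuning the factor is meant to control.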


Hi Daniel

I haven't looked at this yet, but I wonder... could it be the root cause
of the French regression tests failing on March 30 and March 31?

Regards
Dominique

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
