Re: [Languagetool] Performance

Marcin Miłkowski Mon, 19 Nov 2012 08:31:20 -0800

On 2012-11-19 11:39, Dominique Pellé wrote:
> Marcin Miłkowski wrote:
>
>  > > In Breton at least, I sometimes experience a combinatorial
>  > > explosion of rules in order to implement what I want.  Of
>  > > course, having many rules probably slows things down.
>  >
>  > I don't think that having 16 extra rules changes much, as they usually
>  > don't match anyway. The slowdown cannot be due to that.
>
> Yes, it won't provide a big speed up. It's more a matter
> of making it easier to maintain rules.
>
>  > > Another construct which would help to avoid explosion of
>  > > number of rules is a way to be able to perform several
>  > > substitutions.  Here is an example in Breton:
>  > >
>  > >       <rule id="DAM" name="da + ma = da’m">
>  > >        <pattern>
>  > >          <token>da</token>
>  > >          <token>ma</token>
>  > >        </pattern>
>  > >        <message>Gwelloc’h eo skrivañ
>  > > <suggestion>\1’m</suggestion>.</message>
>  > >        <example type="incorrect">Lavaret em eus <marker>da ma</marker>
>  > > zad.</example>
>  > >        <example type="correct">Lavaret em eus da’m zad.</example>
>  > >      </rule>
>  > >
>  > >      <rule id="DAZ" name="da + da = da’z">
>  > >        <pattern>
>  > >          <token>da</token>
>  > >          <token>da</token>
>  > >        </pattern>
>  > >        <message>Gwelloc’h eo skrivañ
>  > > <suggestion>\1’z</suggestion>.</message>
>  > >        <example type="incorrect">Lavaret em eus <marker>da da</marker>
>  > > dad.</example>
>  > >        <example type="correct">Lavaret em eus da’z tad.</example>
>  > >      </rule>
>  > >
>  > > Those 2 rules are almost the same. I wish I could write them in
>  > > one single rule with the pattern....
>  >
>  > Right. You think of conditional search replace (if ma, then ’m; if da,
>  > then ’z). If it were ma -> ’m and da -> ’d, then you could simply
>  > replace to ’$1, where you'd match ([dm]) in the regexp. Now, since you
>  > have ’z as the second replacement, you can try another trick. Simply
>  > make two <match> elements: first for "ma", second for "da", and make
>  > sure they are exclusive. One will produce an empty string, and another
>  > the string you want. I did not test it, but the idea is simple enough.
>  >
>  > The only caveat is that I don't remember what <match> does by default if
>  > it produces an empty string via substitution. For some time, it did
>  > produce the original string in parentheses, but we can change it easily
>  > if it still does (I remember I changed some of this because of
>  > spell-checking).
>
> I remember now trying exactly that a long time ago, hoping it would work,
> but it does not: when a <match> element does not match, it
> unfortunately outputs the token unchanged, rather than outputting an
> empty string, which would be more useful in this case.  I'm not sure
> that behavior can be changed without introducing backward compatibility
> issues.
>
> I remember discussing this "issue" in the mailing list a long time ago.
> I proposed a small patch to improve it which added the optional
> regexp_replace_nomatch="..." attribute to the <match> tag.
> Ha! I found that discussion in the mailing list archive (2011-10-16):
>
> http://sourceforge.net/mailarchive/forum.php?thread_name=CAON-T_gEtK3NYzi4LHDJoyANQ5R6GMmMeCRxVzTi%3DZ-ctaFk-g%40mail.gmail.com&forum_name=languagetool-devel
>
> With that proposal, I could write a single rule:
>
> <rule id="DAZ" name="da + da = da’z, da + ma = da’m">
>    <pattern>
>       <token>da</token>
>       <token regexp="yes">[dm]a</token>
>     </pattern>
>     <message>Gwelloc’h eo skrivañ <suggestion>\1’<match no="2"
> regexp_match="da" regexp_replace="z" regexp_replace_nomatch=""/><match
> no="2" regexp_match="ma" regexp_replace="m"
> regexp_replace_nomatch=""/></suggestion>.</message>
>     <example type="incorrect">Lavaret em eus <marker>da ma</marker>
> zad.</example>
>     <example type="correct">Lavaret em eus da’m zad.</example>
>     <example type="incorrect">Lavaret em eus <marker>da da</marker>
> dad.</example>
>     <example type="correct">Lavaret em eus da’z tad.</example>
> </rule>
>
> You replied privately to me at the time (not in the mailing list), indicating
> that my proposal might be a somewhat hacky solution, and suggesting a
> somewhat more general mechanism.
>
> What this general mechanism would be is unclear to me.


Maybe I was thinking of the spell-check suppression of suggestions
produced by regular expressions. Or of something else, which is now
unimportant.

I think I will implement something like your solution (and check how it 
behaves right now for languages with a spell-check suppression in the 
synthesizer and ones without).
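
To make the difference between the two behaviors concrete, here is a rough
model of the substitution in Python. It is only a sketch, not LanguageTool
code: match_replace and suggest are hypothetical names, regexp_replace_nomatch
is the attribute proposed above (not an existing one), and the assumption is
that a non-matching <match> currently returns the token unchanged, as
described in the quoted message.

```python
import re

def match_replace(token, regexp_match, regexp_replace, regexp_replace_nomatch=None):
    """Hypothetical model of the regexp substitution done by one <match> element."""
    if re.search(regexp_match, token):
        # The regexp matches: substitute as usual.
        return re.sub(regexp_match, regexp_replace, token)
    if regexp_replace_nomatch is None:
        # Assumed current behavior: the token is output unchanged.
        return token
    # Proposed behavior: output the caller-supplied fallback instead.
    return regexp_replace_nomatch

def suggest(first, second, nomatch=""):
    """Build the suggestion from two mutually exclusive <match> elements."""
    return (first + "\u2019"
            + match_replace(second, "da", "z", nomatch)
            + match_replace(second, "ma", "m", nomatch))

print(suggest("da", "ma"))                # da’m
print(suggest("da", "da"))                # da’z
print(suggest("da", "ma", nomatch=None))  # da’mam (unchanged token leaks in)
```

With an empty fallback, exactly one of the two elements contributes text, so
a single rule covers both cases. Without it, the untouched token ends up in
the suggestion, which is why the trick fails today.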

Best,
Marcin
