On 2012-11-19 11:39, Dominique Pellé wrote:
> Marcin Miłkowski wrote:
>
> > > In Breton at least, I sometimes experience a combinatorial
> > > explosion of rules in order to implement what I want. Of
> > > course, many rules probably slow things down.
> >
> > I don't think that having an extra 16 rules changes much, as they usually
> > don't match anyway. The slowdown cannot be due to this.
>
> Yes, it won't provide a big speed-up. It's more a matter
> of making rules easier to maintain.
>
> > > Another construct which would help to avoid an explosion in the
> > > number of rules is a way to perform several
> > > substitutions. Here is an example in Breton:
> > >
> > > <rule id="DAM" name="da + ma = da’m">
> > >   <pattern>
> > >     <token>da</token>
> > >     <token>ma</token>
> > >   </pattern>
> > >   <message>Gwelloc’h eo skrivañ <suggestion>\1’m</suggestion>.</message>
> > >   <example type="incorrect">Lavaret em eus <marker>da ma</marker> zad.</example>
> > >   <example type="correct">Lavaret em eus da’m zad.</example>
> > > </rule>
> > >
> > > <rule id="DAZ" name="da + da = da’z">
> > >   <pattern>
> > >     <token>da</token>
> > >     <token>da</token>
> > >   </pattern>
> > >   <message>Gwelloc’h eo skrivañ <suggestion>\1’z</suggestion>.</message>
> > >   <example type="incorrect">Lavaret em eus <marker>da da</marker> dad.</example>
> > >   <example type="correct">Lavaret em eus da’z tad.</example>
> > > </rule>
> > >
> > > Those 2 rules are almost the same. I wish I could write them as
> > > one single rule with the pattern....
> >
> > Right. You are thinking of a conditional search-and-replace (if ma, then ’m;
> > if da, then ’z). If it were ma -> ’m and da -> ’d, then you could simply
> > replace with ’$1, where you'd match ([dm]) in the regexp. Now, since you
> > have ’z as the second replacement, you can try another trick. Simply
> > make two <match> elements, the first for "ma" and the second for "da", and
> > make sure they are mutually exclusive.
> > One will produce an empty string, and the other
> > the string you want. I did not test it, but the idea is simple enough.
> >
> > The only caveat is that I don't remember what <match> does by default if
> > it produces an empty string via substitution. For some time, it did
> > produce the original string in parentheses, but we can change that easily
> > if it still does (I remember I changed some of this because of
> > spell-checking).
>
> I now remember trying exactly that a long time ago, hoping it would work,
> but it does not: when a <match> element does not match, it
> unfortunately outputs the token unchanged, rather than outputting an
> empty string, which would be more useful in this case. I'm not sure
> that behavior can be changed without introducing backward-compatibility
> issues.
>
> I remember discussing this "issue" on the mailing list a long time ago.
> I proposed a small patch to improve it, which added the optional
> regexp_replace_nomatch="..." attribute to the <match> tag.
> Ha!
> I found that discussion in the mailing list archive (2011-10-16):
>
> http://sourceforge.net/mailarchive/forum.php?thread_name=CAON-T_gEtK3NYzi4LHDJoyANQ5R6GMmMeCRxVzTi%3DZ-ctaFk-g%40mail.gmail.com&forum_name=languagetool-devel
>
> With that proposal, I could write a single rule:
>
> <rule id="DAZ" name="da + da = da’z, da + ma = da’m">
>   <pattern>
>     <token>da</token>
>     <token regexp="yes">[dm]a</token>
>   </pattern>
>   <message>Gwelloc’h eo skrivañ <suggestion>\1’<match no="2"
>     regexp_match="da" regexp_replace="z" regexp_replace_nomatch=""/><match
>     no="2" regexp_match="ma" regexp_replace="m"
>     regexp_replace_nomatch=""/></suggestion>.</message>
>   <example type="incorrect">Lavaret em eus <marker>da ma</marker> zad.</example>
>   <example type="correct">Lavaret em eus da’m zad.</example>
>   <example type="incorrect">Lavaret em eus <marker>da da</marker> dad.</example>
>   <example type="correct">Lavaret em eus da’z tad.</example>
> </rule>
>
> You replied to me privately at the time (not on the mailing list), indicating
> that my proposal might be a somewhat hacky solution, and suggesting a more
> general mechanism instead.
>
> What this general mechanism would be is unclear to me.
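The difference between the current <match> behaviour (a non-matching element outputs the token unchanged) and the proposed regexp_replace_nomatch attribute can be illustrated with a small Java model, Java being LanguageTool's implementation language. This is a hypothetical sketch, not LanguageTool's real code: the class MatchSketch and the methods applyMatch and suggestion are invented for illustration, and it assumes both <match> elements are applied to the second token ("da" or "ma").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the <match> semantics discussed in this thread;
// NOT LanguageTool's real implementation. All names here are invented.
class MatchSketch {

    // Typographic apostrophe (U+2019), as used in the Breton rules.
    static final String APOS = "\u2019";

    // Simulates one <match> element applied to a token.
    // If regexpMatch is found, its matches are replaced by regexpReplace.
    // Otherwise: nomatch == null models the current behaviour (the token is
    // output unchanged); a non-null value models the proposed
    // regexp_replace_nomatch attribute.
    static String applyMatch(String token, String regexpMatch,
                             String regexpReplace, String nomatch) {
        Matcher m = Pattern.compile(regexpMatch).matcher(token);
        if (m.find()) {
            return m.replaceAll(regexpReplace); // replaceAll resets the matcher first
        }
        return (nomatch != null) ? nomatch : token;
    }

    // Models the suggestion \1'<match da->z/><match ma->m/>,
    // with both <match> elements applied to the second token.
    static String suggestion(String first, String second, String nomatch) {
        return first + APOS
                + applyMatch(second, "da", "z", nomatch)
                + applyMatch(second, "ma", "m", nomatch);
    }

    public static void main(String[] args) {
        // Expected outputs below write ' for the U+2019 apostrophe.
        // Proposed behaviour: the non-matching element contributes "".
        System.out.println(suggestion("da", "ma", ""));   // da'm
        System.out.println(suggestion("da", "da", ""));   // da'z
        // Current behaviour: the non-matching element echoes the token.
        System.out.println(suggestion("da", "ma", null)); // da'mam
        System.out.println(suggestion("da", "da", null)); // da'zda
    }
}
```

With an empty nomatch string the two mutually exclusive elements combine into exactly one replacement, which is why a single rule would suffice; with the current behaviour the unmatched element leaks the original token into the suggestion.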
Maybe I was thinking of the spell-check suppression of suggestions produced by regular expressions. Or something else, which is now unimportant. I think I will implement something like your solution (and check how it currently behaves for languages with spell-check suppression in the synthesizer and for those without).

Best,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel