I know it will be simple to generate ignore rules like this, and I will probably do that as soon as such word combinations pop up in the frequency table.
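For illustration, generating such rules from a phrase list could look roughly like the sketch below. It only reproduces the TEL_AVIV rule layout quoted further down; the function names, the way the rule id is derived, and splitting on whitespace (rather than using LanguageTool's own tokenizer) are my assumptions, not anything LanguageTool provides.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def rule_id(phrase):
    # Hypothetical id scheme: "Tel Aviv" -> "TEL_AVIV"
    return re.sub(r"\W+", "_", phrase.strip()).upper()


def ignore_spelling_rule(phrase):
    # Assumption: whitespace splitting is close enough to the real tokenizer
    tokens = phrase.split()
    lines = [
        f"<rule id={quoteattr(rule_id(phrase))} name={quoteattr(phrase)}>",
        "  <pattern>",
    ]
    lines += [f"    <token>{escape(token)}</token>" for token in tokens]
    lines += [
        "  </pattern>",
        '  <disambig action="ignore_spelling"/>',
        "</rule>",
    ]
    return "\n".join(lines)


# Example input; in practice this would come from the frequency table
phrases = ["Tel Aviv", "Buenos Aires"]
print("\n\n".join(ignore_spelling_rule(p) for p in phrases))
```

The output would still need to be pasted into the disambiguation file by hand and checked against the corpus.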
Ruud

On 16-09-14 at 12:01, Marcin Miłkowski wrote:
> On 2014-09-16 at 11:21, R.J. Baars wrote:
>> Marcin,
>>
>> We don't agree. There is a spellchecker, but also a single word ignore
>> list for it.
> Yes, but for multi-words, we'd have to use the disambiguator code
> internally anyway. You ask for yet another notation of the same thing.
>
> Notice also that no spell checker will propose "Tel Aviv" for "Aviv".
> You need to have an XML rule for that. A simple one, to be sure, but
> still an XML rule. I think it's pretty trivial to go through a list of
> such words and create parallel lists of ignore-spelling rules for
> disambiguation and missing-part grammar rules.
>
> Regards,
> Marcin
>
>> There are XML rules, but also a Simplereplace rule, a compounding rule.
>>
>> So apart from the hammer and the screwdriver, there are more tools.
>>
>> But anyway, adding the most frequent ones to the disambiguator works.
>>
>> Getting rid of wrong postags and 10% reported possible spelling errors on
>> the entire corpus is a higher priority.
>> And fixing false positives. Having almost doubled the number of rules is
>> enough for this month.
>>
>> Ruud
>>
>>> On 2014-09-16 at 09:03, R.J. Baars wrote:
>>>> A word like 'Aviv' is not correct unless 'Tel' is before it.
>>>> So it is best to leave Tel and Aviv out of the spell checker.
>>>> That results in spell checking reporting errors for Aviv.
>>>>
>>>> In the disambiguator, there is the option to block that, by making an
>>>> immunizing rule:
>>>>
>>>> <!-- Tel Aviv-->
>>>> <rule id="TEL_AVIV" name="Tel Aviv">
>>>>   <pattern>
>>>>     <token>Tel</token>
>>>>     <token>Aviv</token>
>>>>   </pattern>
>>>>   <disambig action="ignore_spelling"/>
>>>> </rule>
>>>>
>>>> That works perfectly. But then, there are a lot of these word
>>>> combinations. Wouldn't it be better to have a multi-word ignore list for
>>>> the spell checker?
>>>>
>>>> (Or even a multi-word spell checker, not just knowing 'correct' and 'not
>>>> in list', but 'correct', 'incorrect' and 'not in list')
>>> It would not be an enhancement, as this would not give new functionality
>>> but cripple the existing one. Also, the ability to use all XML syntax is
>>> extremely important to me (I use POS tags and regular expressions), so I
>>> wouldn't make use of the multi-word spell checker anyway. So we'd have
>>> to introduce a crippled syntax that would look a little bit different
>>> for a human being but with no meaningful functional change. I don't
>>> think it's worth our time.
>>>
>>> The spell checker is best for checking individual words. Just like a
>>> hammer, it's good for nails, and not for screws. For screws, we have a
>>> screwdriver. For multi-word entities, we have more refined tools, like
>>> tagging and disambiguation and special attributes.
>>>
>>> Best,
>>> Marcin
>>>
>>> ------------------------------------------------------------------------------
>>> Want excitement?
>>> Manually upgrade your production database.
>>> When you want reliability, choose Perforce.
>>> Perforce version control. Predictably reliable.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Languagetool-devel mailing list
>>> Languagetool-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel