On 2012-12-27 18:00, Dominique Pellé wrote:
> Hi
>
> Hunspell spelling checker or FSA spelling dictionaries have no regexp
> mechanism to specify valid words.
>
> It would be useful if we could specify valid words with a regexp.
>
> For example, in French, I'm currently seeing spurious spelling errors for
> first, second, third when written as: 1er, 1ere, 2e, 3e, 4e (etc) (= French
> for: 1st, 2nd, 3rd, 4th...) with LanguageTool-2.0 (release candidate).
> These could easily be specified with a regexp "1ere?|[2-9]e|[1-9][0-9]+e".
>
> Such regexp(s) could be checked in Java before probing the Hunspell
> or FSA spelling dictionary.

This could mean a huge slowdown if we make the mechanism too generic.
Regexes can be very expensive to evaluate, and it is very easy to write
ones that become resource hogs.

> Do you think that would be a useful addition or is there a
> better alternative?

It is also possible, in principle, to use a limited subset of regexes to
build an FSA dictionary. Strictly limited regexes (without backtracking,
groups, greedy quantification, etc.; you should avoid anything that could
open the door to the pumping lemma) can be translated into a finite state
machine. So we might implement a slightly more advanced FSA builder, but
this is probably overkill for such a simple use.

The easiest way is to create a mechanism that generates the finite list of
all words matching a (limited) regex and adds this list to a spelling
dictionary. The list is finite only if the regex has no unbounded
repetition such as + or *. The Viterbi algorithm could probably work for
creating the list of matching words.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
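
To make the "finite list of all words that match a (limited) regex" idea concrete, here is a minimal sketch in Java. It is only an illustration under stated assumptions: the class name LimitedRegexExpander is hypothetical and not part of LanguageTool, the supported syntax is deliberately tiny (literal characters, character classes such as [2-9], an optional marker ?, and top-level alternation with |), and it uses plain left-to-right expansion rather than the Viterbi algorithm. Unbounded quantifiers such as + and * are rejected, because they would make the list infinite.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands a strictly limited pattern into a finite word list. */
final class LimitedRegexExpander {

    /** Returns every string matched by the pattern (alternatives separated by '|'). */
    static List<String> expand(String pattern) {
        List<String> words = new ArrayList<>();
        for (String alternative : pattern.split("\\|")) {
            words.addAll(expandSequence(alternative));
        }
        return words;
    }

    /** Expands one alternative made of literals and classes, each optionally followed by '?'. */
    private static List<String> expandSequence(String seq) {
        List<String> results = new ArrayList<>();
        results.add("");
        int i = 0;
        while (i < seq.length()) {
            List<Character> choices = new ArrayList<>();
            char c = seq.charAt(i);
            if (c == '[') {
                int close = seq.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed character class in: " + seq);
                }
                int j = i + 1;
                while (j < close) {
                    char from = seq.charAt(j);
                    if (j + 2 < close && seq.charAt(j + 1) == '-') {
                        // Range such as 2-9: add every character in it.
                        char to = seq.charAt(j + 2);
                        for (int x = from; x <= to; x++) {
                            choices.add((char) x);
                        }
                        j += 3;
                    } else {
                        choices.add(from);
                        j++;
                    }
                }
                i = close + 1;
            } else if ("+*(){}?.\\".indexOf(c) >= 0) {
                // Anything beyond the tiny supported subset is refused outright.
                throw new IllegalArgumentException("Unsupported syntax '" + c + "' in: " + seq);
            } else {
                choices.add(c);
                i++;
            }
            // A '?' directly after a literal or class makes that element optional.
            boolean optional = i < seq.length() && seq.charAt(i) == '?';
            if (optional) {
                i++;
            }
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                if (optional) {
                    next.add(prefix);
                }
                for (char choice : choices) {
                    next.add(prefix + choice);
                }
            }
            results = next;
        }
        return results;
    }
}
```

Calling LimitedRegexExpander.expand("1ere?|[2-9]e|[1-9][0-9]e") yields 1er, 1ere, 2e to 9e and 10e to 99e. Note that this is a bounded variant of the pattern from the quoted message: [1-9][0-9]e replaces [1-9][0-9]+e, since the + form matches infinitely many words. The resulting list could then be appended to the dictionary source before it is compiled, so nothing regex-related runs at spell-check time.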