On 2012-12-27 18:00, Dominique Pellé wrote:
> Hi
>
> Hunspell spelling checker or FSA spelling dictionaries have no regexp
> mechanism to specify valid words.
>
> It would be useful if we could specify valid words with a regexp.
>
> For example, in French, I'm currently seeing spurious spelling errors for
> first, second, third when written as:  1er, 1ere, 2e, 3e, 4e (etc) (= French
> for: 1st, 2nd, 3rd, 4th...) with LanguageTool-2.0 (release candidate).
> These could easily be specified with a regexp  "1ere?|[2-9]e|[1-9][0-9]+e".
>
> Such regexp(s) could be checked in Java before probing the Hunspell
> or FSA spelling dictionary.

This could mean a huge slowdown if we make the mechanism too generic.
The check would run for every word, and with a backtracking regex
engine it is very easy to write a pattern that becomes a resource hog.
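
A minimal sketch of the check from the quoted proposal, assuming the
pattern is compiled once and reused so that the per-word cost stays
small. The class name OrdinalPrefilter and the knownWords set, which
stands in for the Hunspell or FSA lookup, are hypothetical; this is not
LanguageTool's actual API.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: accept a word if it matches a precompiled regex,
// otherwise fall back to a dictionary lookup.
class OrdinalPrefilter {
    // Compiled once; the regex is the one from the quoted message.
    private static final Pattern ORDINAL =
        Pattern.compile("1ere?|[2-9]e|[1-9][0-9]+e");

    // Stand-in for the real Hunspell/FSA dictionary (assumption).
    private final Set<String> knownWords;

    OrdinalPrefilter(Set<String> knownWords) {
        this.knownWords = knownWords;
    }

    boolean isAccepted(String word) {
        // matches() requires the whole word to match, not just a substring.
        return ORDINAL.matcher(word).matches() || knownWords.contains(word);
    }

    public static void main(String[] args) {
        OrdinalPrefilter filter = new OrdinalPrefilter(Set.of("bonjour"));
        for (String w : new String[] {"1er", "1ere", "2e", "10e", "1e", "xyz"}) {
            System.out.println(w + " -> " + filter.isAccepted(w));
        }
    }
}
```

This pattern has no nested quantifiers, so it should stay cheap; the
risk described above comes from more generic, user-supplied patterns.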

> Do you think that would be a useful addition or is there a
> better alternative?

It is also possible, in principle, to use a limited subset of regexes
to build an FSA dictionary. Strictly limited regexes (no
backreferences, lookaround, or other extensions that take the language
outside the regular languages, which is what the pumping lemma is used
to show) can be translated into a finite-state machine. So we might
implement a slightly more advanced FSA builder. But this is probably
overkill for such a simple use.
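
To illustrate the translation, here is a hand-written deterministic
automaton for the regex from the quoted message. It is only a sketch:
the class name OrdinalDfa and the state names are made up for this
example, and a real FSA builder would derive the states from the
pattern instead of hard-coding them.

```java
// Hand-written DFA for 1ere?|[2-9]e|[1-9][0-9]+e (hypothetical class).
class OrdinalDfa {
    private static final int START = 0, ONE = 1, DIGIT = 2, ONE_E = 3,
                             DIGITS = 4, DONE = 5, ONE_ER = 6, DEAD = -1;

    // One transition of the automaton: current state + input char -> next state.
    private static int step(int state, char c) {
        boolean digit = c >= '0' && c <= '9';
        switch (state) {
            case START:   // nothing read yet
                if (c == '1') return ONE;
                if (c >= '2' && c <= '9') return DIGIT;
                return DEAD;
            case ONE:     // read "1"
                if (c == 'e') return ONE_E;
                return digit ? DIGITS : DEAD;
            case DIGIT:   // read a single digit 2-9
                if (c == 'e') return DONE;
                return digit ? DIGITS : DEAD;
            case ONE_E:   // read "1e", only "r" can follow
                return c == 'r' ? ONE_ER : DEAD;
            case DIGITS:  // read two or more digits; this state loops
                if (c == 'e') return DONE;
                return digit ? DIGITS : DEAD;
            case ONE_ER:  // read "1er" (accepting), "e" may follow
                return c == 'e' ? DONE : DEAD;
            default:      // DONE has no outgoing transitions
                return DEAD;
        }
    }

    static boolean accepts(String word) {
        int state = START;
        for (int i = 0; i < word.length() && state != DEAD; i++) {
            state = step(state, word.charAt(i));
        }
        return state == DONE || state == ONE_ER;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"1er", "1ere", "2e", "11e", "1e", "2ee"}) {
            System.out.println(w + " -> " + accepts(w));
        }
    }
}
```

Note that the DIGITS state loops on itself because of "[0-9]+", so this
automaton accepts infinitely many words. A dictionary that stores a
finite word list would need that repetition bounded.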

The easiest way is to create a mechanism that generates the finite list
of all words that match a (limited) regex, and adds this list to the
spelling dictionary. The list is only finite if repetition is bounded,
so something like "[0-9]+" would need an upper limit. The Viterbi
algorithm could probably work for generating the list.
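
A rough sketch of such a list generator, under several assumptions. It
uses plain left-to-right expansion rather than the Viterbi algorithm
mentioned above, and it only understands a tiny subset invented for
this example: literals, [..] classes with ranges, "?" after a single
atom, and top-level "|". The class name LimitedRegexExpander is
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expander for a tiny regex subset. Unbounded quantifiers
// ('*', '+') are rejected because the resulting word list would be infinite.
class LimitedRegexExpander {

    static List<String> expand(String regex) {
        List<String> words = new ArrayList<>();
        // Top-level alternation only; '|' inside a class is not supported.
        for (String alternative : regex.split("\\|")) {
            words.addAll(expandSequence(alternative));
        }
        return words;
    }

    private static List<String> expandSequence(String seq) {
        List<String> prefixes = new ArrayList<>();
        prefixes.add("");
        int i = 0;
        while (i < seq.length()) {
            char c = seq.charAt(i);
            if (c == '*' || c == '+') {
                throw new IllegalArgumentException("unbounded quantifier: " + c);
            }
            // Read one atom: a character class or a single literal.
            List<Character> choices = new ArrayList<>();
            if (c == '[') {
                int close = seq.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed class");
                }
                String body = seq.substring(i + 1, close);
                for (int j = 0; j < body.length(); j++) {
                    if (j + 2 < body.length() && body.charAt(j + 1) == '-') {
                        // Range such as 2-9: add every character in it.
                        for (char x = body.charAt(j); x <= body.charAt(j + 2); x++) {
                            choices.add(x);
                        }
                        j += 2;
                    } else {
                        choices.add(body.charAt(j));
                    }
                }
                i = close + 1;
            } else {
                choices.add(c);
                i++;
            }
            boolean optional = i < seq.length() && seq.charAt(i) == '?';
            if (optional) {
                i++;
            }
            // Extend every prefix built so far with every choice for this atom.
            List<String> next = new ArrayList<>();
            for (String p : prefixes) {
                if (optional) {
                    next.add(p);
                }
                for (char x : choices) {
                    next.add(p + x);
                }
            }
            prefixes = next;
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // "[0-9]+" from the quoted regex is bounded to one digit here.
        List<String> words = expand("1ere?|[2-9]e|[1-9][0-9]e");
        System.out.println(words.size() + " words, first: " + words.get(0));
    }
}
```

With the repetition bounded to two-digit numbers, the pattern expands to
1er, 1ere, 2e to 9e and 10e to 99e, and that list could then be fed to
the normal dictionary build.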

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel