On Tue, Feb 17, 2009 at 9:50 PM, Jarrett Billingsley <jarrett.billings...@gmail.com> wrote:
> On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <m...@danieldk.org> wrote:
>>
>> Hmmm, define "complex"
>
> \w+([\-+.]\w+)*...@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
>
> This is a simple email regexp. This takes about 4 or 5 seconds to
> compile on my lappy (Pentium M).

Hmm, odd. I have translated that regexp to the syntax of the tool that we used, that is written in Prolog (it is generally a constant factor slower than C/C++/D equivalents). Generating a minimized DFA takes far less than a second. I used the following expression (abstracted a bit with macros):

---
macro(letter, {a..z, 'A'..'Z'}).
macro(punctlet,[{-,+,.},letter+]).
macro(dompunctlet,[{-,.},letter+]).
macro(email,[letter+,punctlet*,@,letter+,dompunctlet*,.,letter+,dompunctlet*]).
---

The software is available from:

http://www.let.rug.nl/~vannoord/Fsa/fsa.html

Take care,
Daniel
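
For anyone who wants to try the expression without installing the FSA toolkit, the macros above can be approximated with an ordinary Python regular expression. This is only a rough sketch: the names LETTER, PUNCTLET, DOMPUNCTLET, EMAIL and is_email are made up here to mirror the macros, and Python's backtracking `re` engine does not build a minimized DFA, so it says nothing about compile times. Note also that `letter` covers ASCII letters only, whereas `\w` in the quoted regexp matches digits and underscore as well.

```python
import re

# Hypothetical Python rendering of the FSA macros; each name mirrors one macro.
LETTER = r"[a-zA-Z]"                    # macro(letter, {a..z, 'A'..'Z'})
PUNCTLET = rf"(?:[-+.]{LETTER}+)"       # macro(punctlet, [{-,+,.}, letter+])
DOMPUNCTLET = rf"(?:[-.]{LETTER}+)"     # macro(dompunctlet, [{-,.}, letter+])

# macro(email, [letter+, punctlet*, @, letter+, dompunctlet*, ., letter+, dompunctlet*])
EMAIL = re.compile(
    rf"{LETTER}+{PUNCTLET}*@{LETTER}+{DOMPUNCTLET}*\.{LETTER}+{DOMPUNCTLET}*"
)


def is_email(text: str) -> bool:
    """Return True if the whole string is accepted by the email expression."""
    return EMAIL.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["me@danieldk.org", "a.b+c@x-y.co.uk", "foo@bar", "user1@example.com"]:
        print(candidate, is_email(candidate))
```

The last candidate is rejected because the digit is not a `letter`; widening LETTER to `\w` would bring the sketch closer to the quoted regexp.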