[Chicken-users] an oddly slow regex ...

2013-10-26 Thread Matt Welland
This regex is so slow that you don't need a timer to see the impact (at
least not on my machine with chicken 4.8.0):

 (string-match [a-z][a-z0-9\\-_.]{0,20} a012345678901234567890123456789)

Changing the {0,20} to + makes it run normally fast so I just replaced the
regex with a string-length and modified the {0,20} to + . I don't
necessarily need a fix for this but it seems like a possible symptom of a
deeper problem so I thought I'd report it.

Pre-compiling the regex didn't seem to make any difference. Just for
completeness I compared with Ruby and the ruby equivalent is (in human
terms) instant.

a012345678901234567890123456789.match(/[a-z][a-z0-9]{0,20}/)

-- 
Matt
-=-
90% of the nations wealth is held by 2% of the people. Bummer to be in the
majority...
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] an oddly slow regex ...

2013-10-26 Thread Matt Welland
On Sat, Oct 26, 2013 at 11:38 AM, Peter Bex peter@xs4all.nl wrote:

 On Sat, Oct 26, 2013 at 10:37:36AM -0700, Matt Welland wrote:
  This regex is so slow that you don't need a timer to see the impact (at
  least not on my machine with chicken 4.8.0):
 
   (string-match [a-z][a-z0-9\\-_.]{0,20}
 a012345678901234567890123456789)
 
  Changing the {0,20} to + makes it run normally fast so I just replaced
 the
  regex with a string-length and modified the {0,20} to + . I don't
  necessarily need a fix for this but it seems like a possible symptom of a
  deeper problem so I thought I'd report it.

 Hi Matt,

 Thanks for your report.  I'm afraid this is a known problem with
 Irregex - to avoid producing a state machine with too many states,
 it will always use a backtracking implementation for all repetition
 counts.

 I think it's best to take a look at how to fix this upstream first.
 Maybe Alex has an idea of how to do that.

  Pre-compiling the regex didn't seem to make any difference.

 That's because the backtracker matches really slowly.

  Just for completeness I compared with Ruby and the ruby equivalent
  is (in human terms) instant.
 
  a012345678901234567890123456789.match(/[a-z][a-z0-9]{0,20}/)

 Thanks for that!  We need more of such real-world test cases.
 Ideally I'd love to have a complete library of regex examples, which
 we can use to optimize irregex further.

 By the way, I note that your Ruby regex isn't exactly the same as the
 one you used in CHICKEN.  Does it make a difference if you use the
 same regex?


Ah, yes, I did initially compare the exact same regex and saw no
discernible speed impact in Ruby. I did a few variations to see if there
was any sensitivity to ^ or $ at the beginning or end of the regex or
variations to the contents of the character group. This seems to be pretty
much a function of the match count {0,20}.

Some coding for performance hints on the regex page would likely be helpful.




 Cheers,
 Peter
 --
 http://www.more-magic.net




-- 
Matt
-=-
90% of the nations wealth is held by 2% of the people. Bummer to be in the
majority...
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users