Re: Anyone using the current regex ops?

Steve Fink Mon, 18 Mar 2002 09:49:38 -0800

On Mon, Mar 18, 2002 at 09:09:59AM -0500, Dan Sugalski wrote:
> At 4:17 PM -0800 3/17/02, Steve Fink wrote:
> >On Sat, Mar 16, 2002 at 04:34:34PM -0500, Dan Sugalski wrote:
> >> Now's your time to speak up, please.
> >
> >Ok, you asked for it. I just committed the regular expression
> >compiler. It has known bugs, but I am completely out of tuits for now,
> >and have been since about the time I announced this thing's existence.
> >:-(
> 
> Fair enough--that's what I wanted to know.


I'm not really enough of a user to merit keeping it in for my sake.
Anyone else?

> >I do wonder what you'd replace the rx opcodes with. I don't see any
> >use for some of the existing opcodes (regex flag setting, zwa_atend,
> >etc.), but doing things like maintaining the backtracking stack using
> >generic opcodes sounds very slow to me.
> 
> Maybe. I don't think so, though, and if some specialized constructs 
> are needed, I think they need to be brought outside the context of 
> the regex engine.

I can certainly agree with this. I don't think there's anything that's
inherently specific to regular expressions, although there may be some
opcodes that would never really end up being used by anything else.
(Though regular expression code now kind of includes parsing code, so
that's a lot.)
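
To make the backtracking-stack question concrete, here is a minimal
sketch in Python (not Parrot assembly, and not the actual rx ops) of a
matcher whose only regex-specific state is a plain stack of
(pc, pos) pairs. The instruction names "char", "split", "jmp" and
"match", the PROG encoding and the run() helper are all assumptions
made up for illustration; they only show what "generic" operations
would have to do on every choice point.

```python
# Hypothetical instruction set, for illustration only:
#   ("char", c)      consume c or fail
#   ("split", x, y)  try x first, remember y as the alternative
#   ("jmp", x)       unconditional jump
#   ("match",)       success
PROG = [
    ("char", "f"),    # 0
    ("char", "o"),    # 1
    ("split", 1, 3),  # 2: o+ loops back to 1, or falls through to 3
    ("char", "b"),    # 3
    ("split", 5, 6),  # 4: a? tries 'a', or skips to 6
    ("char", "a"),    # 5
    ("char", "r"),    # 6
    ("match",),       # 7
]


def run(prog, s, start=0):
    """Match prog against s anchored at start; return end index or -1."""
    stack = [(0, start)]  # the backtracking stack: (pc, pos) pairs
    while stack:
        pc, pos = stack.pop()
        while True:
            op = prog[pc]
            if op[0] == "char":
                if pos < len(s) and s[pos] == op[1]:
                    pc += 1
                    pos += 1
                else:
                    break  # failure: fall back to the saved alternative
            elif op[0] == "split":
                stack.append((op[2], pos))  # save the second branch
                pc = op[1]
            elif op[0] == "jmp":
                pc = op[1]
            elif op[0] == "match":
                return pos
    return -1
```

Every "split" costs a push and every failure costs a pop, which is the
overhead in question when those steps are ordinary opcodes rather than
tailored ones.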

> >The last benchmarking I did on the regex engine (with a single regex,
> >admittedly) put it at somewhat better than half the speed of perl5's
> >engine, which isn't too bad. Do you have newer (worse) numbers?
> 
> Half the speed is pretty abominable. However, code using the regex 
> opcodes to match /fo+ba?r/ runs at the same speed or slower than code 
> using plain positional ords, and the positional ord code sees 
> significant gains when using the JIT. Neither version came within 
> half the speed of perl, though. Both were slower than that.

Probably because you didn't cheat. :-) I was benchmarking my code
against Brent's and winning, so I wrote to him for help, and disabling
a forced canonicalization to UTF-32 reversed the situation. And that's
probably a fairer comparison to perl5.
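
For anyone following along, the "plain positional ords" approach to
/fo+ba?r/ amounts to something like the sketch below. It is written in
Python rather than Parrot assembly, and the function names match_at()
and search() are assumptions for illustration, not anything in the
tree. No backtracking is needed for this particular pattern, because
'o' and 'b' are distinct and so are 'a' and 'r'.

```python
def match_at(s, start=0):
    """Match /fo+ba?r/ anchored at start; return end index or -1."""
    n = len(s)
    i = start
    # f
    if i >= n or ord(s[i]) != ord("f"):
        return -1
    i += 1
    # o+ : at least one 'o', then as many more as are present
    if i >= n or ord(s[i]) != ord("o"):
        return -1
    while i < n and ord(s[i]) == ord("o"):
        i += 1
    # b
    if i >= n or ord(s[i]) != ord("b"):
        return -1
    i += 1
    # a? : optional
    if i < n and ord(s[i]) == ord("a"):
        i += 1
    # r
    if i >= n or ord(s[i]) != ord("r"):
        return -1
    return i + 1


def search(s):
    """Return (start, end) of the first match in s, or None."""
    for start in range(len(s)):
        end = match_at(s, start)
        if end != -1:
            return (start, end)
    return None
```

Each step is just a bounds check, a character fetch and a compare,
which is the kind of straight-line code a JIT handles well.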

> >My compiler would be relatively easy to retarget to general opcodes or
> >another set of regex opcodes, but I am very skeptical that regexen can
> >be sufficiently fast without some tailored opcodes and a regex state
> >PMC.
> 
> Possibly. Regardless, at the moment I'm thinking that the current 
> shot at regexes is, while good, insufficiently compelling over the 
> plain opcodes we have now.

So what's the best way of ensuring that we can easily compare the
speeds of the two? We could move the rx .ops files and supporting .c
and .h files into somewhere in languages/regex, though right now it's
kind of a pain to have .ops files outside of the root directory.
