Benjamin Goldberg:
> Since I don't see anything to save/restore the intstack on subroutine
> calls, I am wondering what happens if a regex has a (?{ CODE }), and
> that CODE calls a regex.  Are we guaranteed that after a regex completes
> (either succeeds or fails) that the intstack is in the same state it
> started?  If not (and remember, exceptions can leave things in odd
> states), what keeps things from being fubared?

No and nothing, respectively.
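
For illustration only, here is a minimal sketch of the kind of guard that is missing: record the stack depth before a nested match and unwind to it afterwards. It assumes a toy fixed-size stack; demo_intstack and every demo_* function are hypothetical names, not Parrot's actual intstack API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size integer stack; NOT Parrot's real intstack. */
#define DEMO_STACK_MAX 64

typedef struct {
    long   items[DEMO_STACK_MAX];
    size_t depth;
} demo_intstack;

static void demo_push(demo_intstack *s, long v) {
    assert(s->depth < DEMO_STACK_MAX);
    s->items[s->depth++] = v;
}

/* Record the current depth before running a nested match. */
static size_t demo_mark(const demo_intstack *s) {
    return s->depth;
}

/* Drop everything pushed since the mark was taken. */
static void demo_unwind(demo_intstack *s, size_t mark) {
    assert(mark <= s->depth);
    s->depth = mark;
}

/* Stand-in for a nested regex that fails and leaves junk behind. */
static int demo_nested_match(demo_intstack *s) {
    demo_push(s, 7);
    demo_push(s, 9);
    return 0; /* reports failure without cleaning up */
}

/* Run the nested match, then restore the caller's view of the stack. */
static int demo_guarded_call(demo_intstack *s) {
    size_t mark = demo_mark(s);
    int ok = demo_nested_match(s);
    demo_unwind(s, mark);
    return ok;
}
```

This only covers a normal return. An exception thrown from inside the nested match would skip demo_unwind unless the handler also unwinds to the mark.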

> Throughout the rx opcode definitions, I see much use of string_index to
> find the character at a particular index.  If the string's encoding uses
> multiple bytes per character, this can be O(N) for each call.  Not good.
>
> Since str->encoding->skip_forward is supposed to be O(N) in terms of how
> many chars are skipped forwards, and since most of those string_index()s
> are one character-index away from an index which was recently accessed,
> we should be able to switch to that for a great speed improvement.

At one point, there was an rx_normalize (or somesuch) op to transcode to
UTF-32 (which IIRC always has four-byte characters) and versions of most of
the operations that would cheat and index into the buffer directly.  I don't
know if that still exists, but I think something like it is a good idea.
That way, you get the choice of paying the price at the beginning and
letting the regex engine cheat later, or keeping your encoding and dealing
with the speed hit.
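
A rough sketch of the trade-off, assuming valid UTF-8 input. All demo_* names are made up for this example and are not the real string_index or skip_forward functions. Indexing from the start costs O(index) per call, skipping forward from a known offset costs O(characters skipped), and a one-time transcode to UTF-32 makes every later lookup a plain array access.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes in the UTF-8 sequence whose lead byte is b (assumes valid input). */
static size_t demo_utf8_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Advance byte offset off by n characters: O(n) in characters skipped. */
static size_t demo_skip_forward(const unsigned char *buf, size_t off, size_t n) {
    while (n-- > 0) off += demo_utf8_len(buf[off]);
    return off;
}

/* Decode the code point starting at byte offset off (assumes valid UTF-8). */
static uint32_t demo_decode(const unsigned char *buf, size_t off) {
    unsigned char b = buf[off];
    size_t len = demo_utf8_len(b);
    uint32_t cp;
    size_t i;
    if (len == 1) return b;
    cp = b & (0xFF >> (len + 1));          /* payload bits of the lead byte */
    for (i = 1; i < len; i++)
        cp = (cp << 6) | (buf[off + i] & 0x3F);
    return cp;
}

/* Index from the start of the buffer every time: O(idx) per call. */
static uint32_t demo_index_slow(const unsigned char *buf, size_t idx) {
    return demo_decode(buf, demo_skip_forward(buf, 0, idx));
}

/* Transcode once to fixed-width UTF-32; afterwards out[i] is O(1).
   The caller must supply room for at least nbytes code points. */
static size_t demo_to_utf32(const unsigned char *buf, size_t nbytes, uint32_t *out) {
    size_t off = 0, n = 0;
    while (off < nbytes) {
        out[n++] = demo_decode(buf, off);
        off += demo_utf8_len(buf[off]);
    }
    return n;
}
```

A matcher that remembers the byte offset of the last character it looked at can call demo_skip_forward(buf, off, 1) instead of demo_index_slow(buf, idx + 1), which is the switch suggested in the quoted text.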

> Given the growing number of things that each
> regex subroutine needs to keep track of ...
> IMHO, this would be a good use for a regex state struct.

Cycle of reincarnation--take a look at early versions of the regex engine.
They all used a struct like this.  I believe it eventually proved faster to
pass things in explicitly, but someone else hacked that in--I never really
agreed with that decision, because I considered the regex struct design to
be much cleaner and more expandable.
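
To make the two styles concrete, here is a small sketch. The struct layout and the demo_rx_* names are invented for this example and are not taken from any version of the engine.

```c
#include <stddef.h>

/* Hypothetical per-match state; field names are illustrative only. */
typedef struct {
    const char *input;   /* subject string */
    size_t      length;  /* length of the subject in characters */
    size_t      pos;     /* current match position */
    int         failed;  /* set once the match has failed */
} demo_rx_state;

/* Struct style: one pointer carries all the state, so adding a field
   later does not change any function signature. */
static void demo_rx_literal(demo_rx_state *rx, char want) {
    if (rx->failed) return;
    if (rx->pos < rx->length && rx->input[rx->pos] == want)
        rx->pos++;
    else
        rx->failed = 1;
}

/* Explicit style: every piece of state is its own argument. */
static int demo_rx_literal_explicit(const char *input, size_t length,
                                    size_t *pos, char want) {
    if (*pos < length && input[*pos] == want) {
        (*pos)++;
        return 1;
    }
    return 0;
}
```

The explicit version avoids an indirection on each access, which may be where the speed difference came from, but every new piece of state means touching every signature.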

Honestly, though, I'm no longer sure the full regex engine is a good idea.
A fast index op, a fast ord op, a character class op, and the intstack are
really all that's needed to make a regex engine from plain Parrot opcodes.
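
As a rough illustration of that idea, the sketch below hand-compiles the equivalent of /[0-9]*5/ (anchored at a start position) from three primitives: an ord lookup, a character class test, and a stack of integers holding backtrack positions. It is written in C rather than Parrot assembly, leaves the index op aside, and all demo_* names are hypothetical.

```c
#include <stddef.h>

#define DEMO_MAX 128

/* "ord": character code at position i. */
static int demo_ord(const char *s, size_t i) {
    return (unsigned char)s[i];
}

/* "character class": is code c an ASCII digit? */
static int demo_is_digit(int c) {
    return c >= '0' && c <= '9';
}

/* Match the equivalent of /[0-9]*5/ anchored at position start.
   Returns the position just past the match, or -1 on failure. */
static long demo_match_digits_then_5(const char *s, size_t len, size_t start) {
    long   stack[DEMO_MAX];  /* backtrack positions, like an intstack */
    size_t depth = 0;
    size_t pos = start;

    /* Greedy [0-9]*: push every position the star could stop at,
       starting with zero repetitions. */
    stack[depth++] = (long)pos;
    while (pos < len && depth < DEMO_MAX && demo_is_digit(demo_ord(s, pos))) {
        pos++;
        stack[depth++] = (long)pos;
    }

    /* Try the literal '5' at the longest option first, popping a
       shorter one each time it fails. */
    while (depth > 0) {
        pos = (size_t)stack[--depth];
        if (pos < len && demo_ord(s, pos) == '5')
            return (long)(pos + 1);
    }
    return -1;
}
```

On "12355x" the star first consumes all five digits, fails to find '5' at the 'x', backs off one position, and matches there.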

--Brent Dax <[EMAIL PROTECTED]>
Perl and Parrot hacker
