Benjamin Goldberg:
> Since I don't see anything to save/restore the intstack on subroutine
> calls, I am wondering what happens if a regex has a (?{ CODE }), and
> that CODE calls a regex. Are we guaranteed that, after a regex completes
> (either succeeds or fails), the intstack is in the same state it
> started in? If not (and remember, exceptions can leave things in odd
> states), what keeps things from being fubared?
No and nothing, respectively.

> Throughout the rx opcode definitions, I see much use of string_index to
> find the character at a particular index. If the string's encoding uses
> multiple bytes per character, this can be O(N) for each call. Not good.
>
> Since str->encoding->skip_forward is supposed to be O(N) in terms of how
> many chars are skipped forwards, and since most of those string_index()s
> are one character-index away from an index which was recently accessed,
> we should be able to switch to that for a great speed improvement.

At one point, there was an rx_normalize (or somesuch) op to transcode to UTF-32 (which always uses four bytes per character) and versions of most of the operations that would cheat and index into the buffer directly. I don't know if that still exists, but I think something like it is a good idea. That way, you get the choice of paying the price at the beginning and letting the regex engine cheat later, or keeping your encoding and dealing with the speed hit.

> Given the growing number of things that each
> regex subroutine needs to keep track of ...
> IMHO, this would be a good use for a regex state struct.

Cycle of reincarnation: take a look at early versions of the regex engine. They all used a struct like this. I believe it eventually proved faster to pass things in explicitly, but someone else hacked that in. I never really agreed with that decision, because I considered the regex struct design to be much cleaner and more expandable.

Honestly, though, I'm no longer sure the full regex engine is a good idea. A fast index op, a fast ord op, a character class op, and the intstack are really all that's needed to make a regex engine from plain Parrot opcodes.

--Brent Dax <[EMAIL PROTECTED]>
Perl and Parrot hacker
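The three access strategies discussed in this thread (indexing from the start on every call, advancing a cursor by the number of characters skipped, and transcoding once to fixed-width UTF-32) can be compared in a minimal C sketch. It assumes valid, NUL-terminated UTF-8 as the variable-width encoding. The names `utf8_index`, `utf8_skip_forward`, `utf8_to_utf32`, and the `utf8_cursor` struct are hypothetical stand-ins written for illustration; they are not Parrot's `string_index`, `skip_forward`, or `rx_normalize`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: byte length of the UTF-8 sequence whose lead byte
 * is b. Assumes valid UTF-8 (no error handling). */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Decode the single code point that starts at p. */
static uint32_t utf8_decode(const unsigned char *p) {
    size_t n = utf8_seq_len(p[0]);
    uint32_t cp;
    switch (n) {
    case 1: return p[0];
    case 2: cp = p[0] & 0x1F; break;
    case 3: cp = p[0] & 0x0F; break;
    default: cp = p[0] & 0x07; break;
    }
    for (size_t i = 1; i < n; i++)
        cp = (cp << 6) | (p[i] & 0x3F);   /* append 6 payload bits */
    return cp;
}

/* Index-from-start access: walks idx characters on every call, O(idx). */
static uint32_t utf8_index(const unsigned char *s, size_t idx) {
    const unsigned char *p = s;
    while (idx--) p += utf8_seq_len(*p);
    return utf8_decode(p);
}

/* Cursor access: cost is proportional to the characters skipped. */
typedef struct { const unsigned char *p; } utf8_cursor;

static void utf8_skip_forward(utf8_cursor *c, size_t n) {
    while (n--) c->p += utf8_seq_len(*c->p);
}

/* Sequential scan via repeated indexing: O(N^2) over N characters. */
static uint64_t sum_by_index(const unsigned char *s, size_t nchars) {
    uint64_t sum = 0;
    for (size_t i = 0; i < nchars; i++) sum += utf8_index(s, i);
    return sum;
}

/* The same scan with a cursor that moves one character at a time: O(N). */
static uint64_t sum_by_cursor(const unsigned char *s, size_t nchars) {
    utf8_cursor c = { s };
    uint64_t sum = 0;
    for (size_t i = 0; i < nchars; i++) {
        sum += utf8_decode(c.p);
        utf8_skip_forward(&c, 1);
    }
    return sum;
}

/* "Normalize first": pay O(N) once to build a fixed-width UTF-32 copy,
 * after which out[i] is a direct O(1) access. Caller must free(). */
static uint32_t *utf8_to_utf32(const unsigned char *s, size_t nchars) {
    uint32_t *out = malloc(nchars * sizeof *out);
    if (!out) return NULL;
    utf8_cursor c = { s };
    for (size_t i = 0; i < nchars; i++) {
        out[i] = utf8_decode(c.p);
        utf8_skip_forward(&c, 1);
    }
    return out;
}
```

Both scans return the same result; the difference is that `sum_by_index` re-walks the prefix for every character, while `sum_by_cursor` reuses the position it already reached. `utf8_to_utf32` trades one up-front pass and extra memory for constant-time indexing afterwards, which is the choice described for the transcoding op.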