Dan Sugalski writes:
: At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
: >Dan Sugalski writes:
: >: Are you speaking of the nodes in regnode.h? I hadn't considered them as
: >: regular perl opcodes--I figured they'd stay internal to the regex engine so
: >: we could keep it reasonably modular.
: >
: >I don't think that's a terribly strong argument--one could justify any
: >number of unfortunate architectural distinctions on the basis of
: >modularity.
: 
: Yeah I know. I mean, look at how those darned bricks have held back 
: architecture all these years! :-P

Hey, I come from the coast where we try to avoid bricks, especially
when they're falling.

: >Plus, I'd argue you can still retain modularity of your
: >code while unifying implementational philosophy.
: 
: I'm not entirely sure of that one--processing a full regex requires the 
: perl interpreter, it's not all that modular.

These days I'm trying to see the regex as just a funny-looking kind of
Perl code.

: Though whether being able to 
: yank out the RE engine and treat it as a standalone library is important 
: enough to warrant being treated as a design goal or not is a separate 
: issue. (I think so, as it also means I can treat it as a black box for the 
: moment so there's less to try and stuff in my head at once)

As a fellow bear of very little brain, I'm just trying to point out that
we already have a good example of the dangers to that approach.

: >It seems to me that the main reason for not considering such a
: >unification is that We've Never Done It That Way Before.  It's as if
: >regular expressions have always been second-class programs, so they'll
: >always be second-class programs, world without end, amen, amen.
: 
: No, not really. The big reasons I wasn't planning on unification are:
: 
: *) It makes the amount of mental space the core interpreter takes up smaller

It may certainly be valuable to (not) think of it that way, but just
don't be surprised if the regex folks come along and borrow a lot of
your opcodes to make things that look like (in C):

    while (s < send && isdigit(*s)) s++;
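
For concreteness, here is a minimal sketch of that loop wrapped in a C function, roughly what a pattern like \d+ might boil down to. The name scan_digits and its signature are hypothetical and are not taken from the Perl source. The cast to unsigned char is there because isdigit() is undefined for negative char values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the Perl source): advance past a run of
 * digits in [s, send) and return how many characters were consumed. */
size_t scan_digits(const char *s, const char *send)
{
    const char *start = s;

    assert(s <= send);  /* caller must pass a valid range */

    /* Cast to unsigned char: isdigit() on a negative char is undefined. */
    while (s < send && isdigit((unsigned char)*s))
        s++;

    return (size_t)(s - start);
}
```

Called with a pointer to "123abc" and a pointer six characters further on, the function stops at the 'a' and reports three characters consumed.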

: *) It can make performance tradeoffs separately from the main perl engine

The option of doing its own thing its own way is always open to an
opcode, but when you do that the option of making efficient use of the
core infrastructure goes away.  As an honorary member of the regex
hacking team, I covet registers.  :-)

: *) We can probably snag the current perl 5 source without much change

Cough, cough.

: *) The current RE engine's scared (or is that scarred?) me off enough that 
: I'd as soon leave it to someone who's more temperamentally suited for such 
: things.

As an honorary member of the temperamentally suited team, allow me to
repeat myself.  Cough, cough.  We can certainly borrow the ideas from
Perl 5's regex engine, but even us temperamentally suited veterans are
sufficiently scared/scarred to want something that works better.

: *) Treating regexes as non-atomic operations brings some serious threading 
: issues into things.

Eh, treating regexes as atomic is the root of the re-entrancy problem.
If the regex has access to local storage, the re-entrancy and threading
problems pretty much solve themselves.

: >The fact that Perl 5's regex engine is a royal pain to deal with should
: >be a warning to us.
: 
: I can think of a couple of reasons that the current engine's a royal pain, 
: and they don't have much to do with it as a separate entity...

Sure, I'm just saying that at least two of those couple of reasons are
that 1) it invents its own opcode storage mechanism, and 2) it uses
globals for efficiency when it should be using some efficient variety
of locals.
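
A minimal sketch of point 2, assuming a hypothetical match_state struct (none of these names come from the Perl 5 source): when each match carries its own state instead of parking it in file-scope globals, two matches can be nested or interleaved without trampling each other.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-match state. Everything a match needs lives here
 * rather than in globals, so each caller owns an independent copy. */
struct match_state {
    const char *input;    /* start of the subject string */
    const char *end;      /* one past the last character */
    const char *pos;      /* current scan position */
    size_t      captured; /* length of the last successful match */
};

/* Point a state at a subject string of the given length. */
void match_init(struct match_state *st, const char *s, size_t len)
{
    st->input = s;
    st->end = s + len;
    st->pos = s;
    st->captured = 0;
}

/* Match one or more digits at the current position.
 * Returns 1 and advances pos on success; returns 0 and leaves
 * the state untouched on failure. */
int match_digits(struct match_state *st)
{
    const char *start = st->pos;

    assert(st->pos <= st->end);

    while (st->pos < st->end && isdigit((unsigned char)*st->pos))
        st->pos++;

    if (st->pos == start)
        return 0;

    st->captured = (size_t)(st->pos - start);
    return 1;
}
```

Because match_digits touches only the struct it is handed, a match started in the middle of another one (or on another thread with its own struct) leaves the outer match's position and capture length alone.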

: >Much of the pain of dealing with the regex engine in Perl 5 has to do
: >with allocation of opcodes and temporary values in a non-standard
: >fashion, and dealing with the resultant non-reentrancy on an ad hoc
: >basis.  We've already tried that experiment, and it sucks.  I don't
: >want to see the regex engine get swept back under the complexity carpet
: >for Perl 6.
: 
: Yeah, but those are mostly issues with the implementation, not with the 
: separation.

That sounds suspiciously like what I'm trying to say.

: >That's a scenario I'd love to avoid.  And if we can manage to store
: >regex opcodes and state using mechanisms similar to ordinary opcodes,
: >maybe we'll not fall back into the situation where the regex engine is
: >understood by only three people, plus or minus four.
: 
: While I'm not sure I agree with it, if that's what you want, then that's 
: what we'll do. Threading will complicate this some, since we'll need to 
: guarantee atomicity across multiple opcodes, something I'd not planned on 
: doing.

How will we guarantee that my $foo stays "my" under threading?  Surely
the same mechanism could serve to keep "my" regex state variables sane
under the same circumstances.  (I oversimplify, of course...)

Larry
