At 5:38 PM +0100 3/19/03, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
>> At the time I run the regex, I can inline things. There's nothing that prevents it. Yes, at compile time it's potentially an issue, since things can be overridden later,

> OK, but that's not how you initially presented it :-)

Then I wasn't clear enough, sorry. This is Perl -- the state of something at compile time is just a suggestion as to how things ultimately work. The state at the time of execution is the only thing that really matters, and I shortcut.


>> you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. Or, rather, you can but the update won't take effect until after the end

> I don't recall having seen such a restriction mentioned in Apoc 5.

I'll nudge Larry to add it explicitly, but in general redefinitions of code that you're in the middle of executing don't take effect immediately, and it's not really any different for regex rules than for subs.
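To illustrate the timing I mean, here's a rough Python sketch (nothing to do with Parrot; all the names are invented): the engine resolves its rule table once when the match starts, so a redefinition made while the match is running only shows up on the next match.

    import re

    RULES = {"alpha": r"[a-z]+", "num": r"[0-9]+"}   # globally visible rules

    def match(seq, text):
        snapshot = dict(RULES)      # rule lookups resolved once, at match start
        pos = 0
        for i, name in enumerate(seq):
            m = re.match(snapshot[name], text[pos:])
            if not m:
                return None
            pos += m.end()
            if i == 0:
                # pretend an embedded block redefined "num" mid-match
                RULES["num"] = r"[a-z]+"
        return text[:pos]

    print(match(["alpha", "num"], "abc123"))   # "abc123" -- this match keeps its snapshot
    print(match(["alpha", "num"], "abc123"))   # None -- the next match sees the new "num"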


> While I'm a big fan of optimization, especially for something like this, I think we should be careful with introducing mandatory restrictions just to aid optimization. ("is inline" will allow such optimizations, of course)

Actually, we should be extraordinarily liberal with the application of restrictions at this phase. It's far easier to lift a restriction later than to impose it later, and I very much want to stomp out any constructs that will force slow code execution. Yes, I may lose, but if I don't try...


My job, after all, is to make it go fast. If you want something that'll require things to be slow then I don't want you to have it. :)

>> There's issues with hypothetical variables and continuations. (And with coroutines as well.) While this is a general issue, they come up most with regexes.

> I'm still curious what you're referring to exactly. I've outlined possible semantics for hypothetical variables in earlier posts that should work.

The issue of hypotheticals is complex.
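The mechanics I worry about look something like this toy Python sketch (invented names, not the real design): a hypothetical binding goes on a trail, and backtracking past the point that set it has to unwind the trail.

    bindings = {}     # currently visible hypothetical bindings
    trail = []        # what to undo if we backtrack

    def hypothesize(name, value):
        trail.append((name, bindings.get(name)))   # remember the old value
        bindings[name] = value

    def backtrack_to(mark):
        while len(trail) > mark:                   # unwind back to the mark
            name, old = trail.pop()
            if old is None:
                bindings.pop(name, None)
            else:
                bindings[name] = old

    mark = len(trail)
    hypothesize("$x", "abc")      # some submatch binds $x...
    print(bindings)               # {'$x': 'abc'}
    backtrack_to(mark)            # ...then the engine backtracks past it
    print(bindings)               # {} -- the hypothesis is unwound

Part of the complexity is what happens when a continuation or coroutine lets you leave and re-enter the match while that trail is in some intermediate state.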


>> We do, after all, want this fast, right?

> Of course, and we should optimize as much as we can - but not optimize *more* than we can. Rules need generic backtracking semantics, and that's what I'm talking about.

No. No, in fact they don't. Rules need very specific backtracking semantics, since rules are fairly specific. We're talking about backtracking in regular expressions, which is a fairly specific generality. If you want to talk about a more general backtracking that's fine, but it won't apply to how regexes backtrack.
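To be concrete about what backtracking means for a regex (a toy Python matcher, nowhere near the real engine): each piece proposes the lengths it could consume, and backtracking just means stepping back and trying the next length for an earlier piece.

    def piece(chars):
        # matcher for one-or-more of the given characters
        def options(text, pos):
            n = 0
            while pos + n < len(text) and text[pos + n] in chars:
                n += 1
            return range(n, 0, -1)          # greedy: longest first
        return options

    def match(pieces, text, pos=0):
        if not pieces:
            return pos == len(text)         # require a full match
        for length in pieces[0](text, pos):
            if match(pieces[1:], text, pos + length):
                return True
        return False                        # caller retries an earlier piece

    # like /[ab]+ b+ a+/ against "aabaa": the first piece has to give back
    # characters before the whole thing can succeed
    print(match([piece("ab"), piece("b"), piece("a")], "aabaa"))   # True

That's a specific, well-bounded kind of re-entry, not arbitrary resumption of arbitrary code.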

> My impression from A5 and A6 is that rules are methods. They're looked up like methods, they can be invoked like methods, etc.

They aren't methods, though. They're not code in general, they're regex constructions in specific. Just because they live in the symbol table and in some cases can be invoked as subs/methods doesn't make them subs or methods, it makes them regex constructs with funky wrappers if you want to use them in a non-regex manner.
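The shape I have in mind is roughly this (a Python sketch with invented classes, not how Parrot lays it out): the rule proper is an engine-level pattern construct, and the thing you can call like a sub or method is only a thin wrapper around it.

    import re

    class RuleNode:
        # what the regex engine actually works with
        def __init__(self, source):
            self.compiled = re.compile(source)

    class RuleWrapper:
        # the "funky wrapper" that lets a rule be invoked like a sub/method
        def __init__(self, node):
            self.node = node
        def __call__(self, text, pos=0):
            m = self.node.compiled.match(text, pos)
            return None if m is None else m.group(0)

    foo = RuleWrapper(RuleNode(r"\w+"))   # lives in the symbol table as "foo"
    print(foo("hello world"))             # "hello" -- called in a non-regex manner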


> I certainly want to be able to write rules myself, manually, when I think it's appropriate; and use these as subrules in other methods. Generic backtracking semantics are needed for that, and should at least conceptually also apply to normal rules.

No, no it shouldn't. Rules are rules for regexes; they are *not* subs. If you want generic backtracking to work, then there can't be any difference between:


rule foo { \w+ }

and
  sub foo { ... }

but there must be. With rules as regex constructs the semantics are much simpler. If we allow rules to be arbitrary code, not only do we have to expose a fair amount of the internals of the regex engine to the sub so it can actually work on the stream and note its position (which is fine, I can do that), but we also need to be able to pause foo in the middle and jump back in while passing in parameters of some sort. Neither continuations nor standard coroutines are sufficient in this instance, since the reinvocation must *both* preserve the state of the code at the time it exited *and* pass in an indication as to what the sub should do. For example, if the foo sub was treated as a rule and we backtrack, should it slurp more or less?
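Here's the shape of the problem as a Python sketch (a generator standing in for the code-as-rule; nothing like this exists in the engine, and the "shorter" protocol is invented): the rule has to sit paused after handing back a match, and when the engine backtracks into it, it has to be resumed *with* an instruction about what to do differently.

    def word_rule(text, pos):
        # yield successively shorter word-ish matches on demand
        end = pos
        while end < len(text) and text[end].isalnum():
            end += 1
        while end > pos:
            want = yield text[pos:end]      # paused here until re-entered
            if want == "shorter":
                end -= 1                    # give back one character
            else:
                return                      # the engine gave up on this rule

    rule = word_rule("abc1 etc", 0)
    print(next(rule))                       # "abc1" -- first, greedy attempt
    print(rule.send("shorter"))             # "abc"  -- backtracked into, told to give back
    print(rule.send("shorter"))             # "ab"

A plain sub call can't do that, and a bare continuation only gets you back to the old state -- it doesn't carry the "slurp less this time" part.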

If rules are just plain regex rules and not potentially arbitrary code, the required semantics are much simpler.

Then there's the issue of being able to return continuations from within arbitrary unnamed blocks, since the block in this:

$foo ~~ m:w/<alpha> {...} <number>/;

should be able to participate in the backtracking activities if we're not drawing a distinction between rules and generic code. (Yeah, the syntax is wrong, but you get the point)
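For instance (a toy Python rendering of that pattern, with \w+ and \d+ standing in for <alpha> and <number>; purely illustrative): the embedded block may already have run for an attempt the engine later backtracks over, so something has to decide whether its effects stick, get undone, or whether the block gets re-entered.

    import re

    effects = []

    def try_match(text):
        for end in range(len(text), 0, -1):              # greedy first piece
            if not re.fullmatch(r"\w+", text[:end]):
                continue
            effects.append("block ran after " + repr(text[:end]))   # the { ... }
            if re.fullmatch(r"\d+", text[end:]):         # the trailing piece
                return text[:end], text[end:]
        return None

    print(try_match("ab12"))   # ('ab1', '2')
    print(effects)             # two entries: the block also ran for the
                               # attempt that was backtracked over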

Ultimately the question is "How do you backtrack into arbitrary code, and how do we know that the arbitrary code can be backtracked into?" My answer is we don't, but I'm not sure how popular that particular answer is.

> When common sub-patterns are inlined, simple regexen will not use runtime subrules at all, so the issue doesn't exist there - that covers everything you would do with regexen in perl 5, for example.

All rules will essentially be inlined at regex invocation time. There may be some indirect rule dispatch, but in a very simplistic form compared to sub dispatch.
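Something like this Python sketch is the flavor I mean (invented names, not the actual compilation scheme): named subrule references get looked up when the match is set up and their compiled forms spliced into a single pattern.

    import re

    RULES = {"alpha": r"[a-z]+", "number": r"[0-9]+"}

    def compile_at_invocation(template):
        # template uses <name> for subrule calls, e.g. "<alpha>-<number>"
        def splice(m):
            return "(?:%s)" % RULES[m.group(1)]          # inline the subrule
        return re.compile(re.sub(r"<(\w+)>", splice, template))

    pat = compile_at_invocation(r"<alpha>-<number>")
    print(pat.pattern)                            # "(?:[a-z]+)-(?:[0-9]+)"
    print(pat.fullmatch("abc-123") is not None)   # True

The indirect-dispatch case is just a lookup through a table like RULES at match time, rather than a full sub/method dispatch.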


> I say, make generic semantics first, and then optimize the heck out of it.

That's fine. I disagree. :) -- Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
