Re: S05 question

Larry Wall Wed, 08 Dec 2004 08:42:46 -0800

On Tue, Dec 07, 2004 at 10:36:53PM -0800, Larry Wall wrote:
: But somehow I expect that when someone writes (<foo>) they probably
: usually meant («foo»).


If we're going to stick with the notion that <foo> captures and something
else doesn't, I'm beginning to think that the other thing isn't «foo» for
a few reasons.  First, if other languages are going to borrow this
notation, they're probably not going to buy into the French quotes.  Second,
I can think of several other possible uses for the French quotes to cure
perceived ills such as the <(...)> vs <{...}> confusion.  Third, it now
bothers me to have a ! without a ?.  So what if «foo» is instead written
<?foo>, meaning you only want to evaluate its success.  (Unlike <!foo>,
it's not zero-width, but that's just how success/failure works.)  So we'd
get things like

    / $<bar> := [ (<?ident>) = (\N+) ]* /

And people would have to get used to seeing ? as non-capturing assertions:

    <?before ...>
    <?after ...>
    <?ws>
    <?sp>
    <?null>

This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
I like it.  It's pretty easy to type, at least on my keyboard.
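Python has no direct equivalent of the proposed syntax, but its re module draws the same three-way distinction, so a rough analogy may help. In this sketch IDENT is a hypothetical ASCII stand-in for <ident>; a named group plays the part of a capturing <ident>, (?:...) plays the part of <?ident> (the text is consumed but nothing is recorded), and a negative lookahead plays the part of a zero-width <!...>.

```python
import re

# Hypothetical stand-in for an <ident> rule (ASCII identifiers only).
IDENT = r"[A-Za-z_]\w*"

# Capturing, like <ident>: the matched text is recorded under a name.
capturing = re.compile(rf"(?P<ident>{IDENT})=(?P<value>[^\n]+)")

# Success-only, like the proposed <?ident>: the identifier is consumed,
# but no group is recorded for it; only the value is captured.
non_capturing = re.compile(rf"(?:{IDENT})=([^\n]+)")

# Zero-width, like <!digit>: the negative lookahead consumes nothing.
zero_width = re.compile(r"(?!\d)\w+")

line = "answer=42"

m1 = capturing.match(line)
print(m1.groupdict())

m2 = non_capturing.match(line)
print(m2.groups())

m3 = zero_width.match("x1")
print(m3.group(0))              # the lookahead itself consumed no text
print(zero_width.match("1x"))   # lookahead fails at position 0
```

The difference between the second and third patterns is the one drawn above: (?:...) still advances through the input, while a lookahead only reports success or failure.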

Now suppose that we extend that "I am a boolean" feeling to

    <?{ code }>

which might take the place of the confusing <(...)>, and make consistent
the notion that we always use {...} to invoke "real" code.
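A code assertion lets ordinary code decide whether a match may proceed, for example accepting a run of digits only if its value is below 256. Python's re has no in-pattern code hook, so the sketch below approximates the idea by filtering whole matches through a predicate. The helper matches_where is hypothetical, and unlike a real <?{ code }> assertion it never makes the engine backtrack and try a different match.

```python
import re
from typing import Callable, Iterator

def matches_where(pattern: str, text: str,
                  ok: Callable[[re.Match], bool]) -> Iterator[re.Match]:
    """Yield the matches of `pattern` in `text` for which `ok` returns True.

    Hypothetical helper: it only filters complete matches, so a rejected
    match is simply dropped rather than retried with backtracking.
    """
    for m in re.finditer(pattern, text):
        if ok(m):
            yield m

# Keep only numbers that would fit in an octet.
octets = [m.group(1)
          for m in matches_where(r"\b(\d+)\b", "10 300 255 256 7",
                                 lambda m: int(m.group(1)) < 256)]
print(octets)
```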

: : Or is it that hypotheticals only bind to things captured by parens?
: : If so, it might need clarification (or perhaps I'm overlooking the part
: : that makes it clear).
: 
: No, I think you just found a blind spot in the design.

I think I'm leaning toward the idea that anything in angles that
begins alpha is a capture to just the alpha part, so the ? prefix is 
merely a no-op that happens to make the assertion not start with an
alpha.  Interestingly, that gives these implicit bindings:

    <after ...>         $<after>        $`
    <before ...>        $<before>       $'

Though that's an argument for changing them to <pre ...> and <post ...>,
I suppose, since if users are going to refer to $<after> in their main
program, it doesn't look like a declarative assertion anymore.
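The naming rule above can be stated as a small function. The following is a hypothetical Python sketch of that rule only: capture_name is an illustrative name, and the sketch assumes ASCII letters and treats the leading run of letters in the assertion body as "the alpha part".

```python
import re
from typing import Optional

def capture_name(assertion: str) -> Optional[str]:
    """Return the implicit capture name for an angle-bracket assertion.

    Hypothetical sketch of the proposed rule: if the body starts with a
    letter, the capture is named after that leading run of letters; a
    non-alpha prefix such as '?' or '!' means no capture at all.
    """
    body = assertion.strip("<>")
    m = re.match(r"[A-Za-z]+", body)
    return m.group(0) if m else None

for a in ["<ident>", "<?ident>", "<!ws>", "<after ...>", "<before ...>"]:
    print(a, "->", capture_name(a))
```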

Another problem we've run into is naming if there are multiple assertions
of the same name.  If the capture name is just the alpha part of the
assertion, then we could allow an optional number, and still recognize
it as a "ws":

    <ws1> <ws2> <ws3>

Except I can well imagine people wanting numbered rules.  Drat.  Could
force people to say <ws_1> if they want that, I suppose.

Or we could use some standard delim for that:

    <ws-1> <ws-2> <ws-3>

which is vaguely reminiscent of our "version" syntax.  Indeed, if we
had quantifications, you might well want to have wildcards <ws-*> and
let the name be filled in rather than autogenerating a list.  But maybe
we just stick with lists in that case.
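If the hyphen delimiter were adopted, splitting a name like ws-2 back into its base rule and index, and numbering repeated assertions, would be mechanical. The helpers split_numbered and number_repeats below are hypothetical Python illustrations, not part of any Perl 6 design document. They number only names that occur more than once, and they produce a flat list of names rather than the <ws-*> wildcard form.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical "-N" suffix syntax from the proposal, e.g. <ws-1> <ws-2>.
# Base names are restricted to ASCII letters for this sketch.
NUMBERED = re.compile(r"([A-Za-z]+)(?:-(\d+))?")

def split_numbered(name: str) -> Tuple[str, Optional[int]]:
    """Split 'ws-2' into ('ws', 2); a bare 'ws' gives ('ws', None)."""
    m = NUMBERED.fullmatch(name)
    if m is None:
        raise ValueError(f"not an assertion name: {name!r}")
    base, num = m.groups()
    return base, (int(num) if num is not None else None)

def number_repeats(names: List[str]) -> List[str]:
    """Append -1, -2, ... to names that occur more than once;
    names that occur only once are left unchanged."""
    totals = Counter(names)
    seen: Counter = Counter()
    out = []
    for n in names:
        if totals[n] > 1:
            seen[n] += 1
            out.append(f"{n}-{seen[n]}")
        else:
            out.append(n)
    return out

print(split_numbered("ws-2"))
print(number_repeats(["ws", "ident", "ws", "ws"]))
```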

For captures of non-alpha assertions, we could say that ? is the same
as "true" (just as with regular operators), and so

    <true-3 +<alpha>-[aeiou]>

would capture to $<true-3>.  (And one could always do an explicit binding
for a different name.)

Actually, I think people would find $<match-3> more meaningful than
$<true-3>.
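For comparison, the character class in that example (a letter that is not a lowercase vowel) captured under an explicit name can be approximated in Python. This is an analogy under stated assumptions: Python group names cannot contain '-', so true_3 stands in for true-3, and the re.ASCII flag restricts \w to [A-Za-z0-9_], which makes [^\W\d_] mean "ASCII letter".

```python
import re

# Rough analogue of <true-3 +<alpha>-[aeiou]>: one ASCII letter that is
# not a lowercase vowel, captured under the (substitute) name true_3.
CONSONANT = re.compile(r"(?P<true_3>[^\W\d_aeiou])", re.ASCII)

found = [m.group("true_3") for m in CONSONANT.finditer("regex 42")]
print(found)
```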

I'm still thinking about what «...» might mean, if anything.  Bonus points
for interpolative and/or word-splitty.

Anyway, that's where I am this week/day/hour/minute/second.

Larry