On Fri, Sep 07, 2007 at 02:45:52AM -0500, Patrick R. Michaud wrote:
: On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
: > Log:
: > old <?foo> is now <+foo> to suppress capture
: > new <?foo> now is zero-width like <!foo>
: 
: I really like the change from <?foo> to <+foo>, but I think there's
: a conflict (or at least some confusion) in the way the new spec is
: worded, especially as it relates to character class sets.

I'm actually still of two minds about whether it's proper to overload <+foo>
like that, and what we end up with may well depend on revisions to the
binding syntax.  But it can be <+foo> for now, assuming we can deal with
the ambiguities you point out.  'Course, by the time we're done with
that, we might well decide <+foo> is a bad plan...

: Both old and new versions of S05 say:
: 
:     If the first character after the identifier is whitespace, the
:     subsequent text (following any whitespace) is passed as a regex, 
:     so   <foo bar>   is more or less equivalent to   <foo(/bar/)>  .
: 
: In the previous version of S05, the non-capturing form of <foo bar>
: would be <?foo bar>.  Here, the whitespace after "foo" indicated
: that "bar" was to be parsed and passed to foo as a regex.
: 
: In the new version of S05, the non-capturing form of <foo bar>
: would seem to be <+foo bar>.  Okay, I can handle that.  However, 
: S05 also says that " <foo+bar-baz> can be written as <+ foo + bar - baz> ".
: Presumably this second form would also allow "<+foo + bar - baz>",
: which seems to conflict slightly with the notion that <+foo bar>
: is the non-capturing form of <foo bar>.  In other words, the
: whitespace character following "<+foo" doesn't seem to be
: sufficient to indicate how the remainder is to be processed --
: we have to look beyond the whitespace for a leading plus or minus.

If we stick with +, one approach might be to simply disallow whitespace
in composite character classes.

: Perhaps S05 is addressing this when it says 
: 
:     An initial identifier is taken as a character class, so the 
:     first character after the identifier doesn't matter in this 
:     case, and you can use whitespace however you like.
: 
: Here I find this wording very unclear -- it doesn't tell me 
: what is distinguishing the "doesn't matter in this case" part
: between <+foo + bar> and <+foo bar>.    

What, me unclear?  How could that happen?  :-)

[Don't answer that...]

: Since the S05 spec has changed so that all punctuation is meta, 
: I'm thinking we may be able to simplify the spec altogether.
: Previously the "whitespace following the identifier" was
: used to distinguish <foo-bar> from <foo -bar>, or <alpha-[Jj]>
: from <alpha -[Jj]>.  Since it's now effectively impossible for 
: a regex to begin with a bare plus or minus character, we may be
: able to alter the "whitespace following identifier" wording such
: that <foo-bar> and <foo - bar> are identical.  Perhaps
: something like:
: 
:   - if the character following the identifier is a left paren,
:     it's a call
: 
:         <foo('bar')>
:         <+foo('bar')>
:         <!foo('bar')>
: 
:   - if the character following the identifier is a colon, the rest
:     of the text (following any whitespace) is passed as a string
: 
:         <foo: bar>             # same as <foo('bar')>
:         <+foo: bar>
:         <!foo: bar>
: 
:   - if the identifier is followed by a plus or minus (with optional
:     intervening whitespace), it's a set of character classes
: 
:         <foo+baz-bar>
:         <foo + baz - bar>      # same thing
:         <+foo + baz - bar>     # also the same
: 
:   - anything else following whitespace is a regex to be passed
: 
:         <foo bar>              # same as <foo(/bar/)>
:         <+foo bar>             # same as <+foo(/bar/)>
:         <!foo bar>             # same as <!foo(/bar/)>
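For illustration, here is a minimal Python sketch of the four proposed rules. It is only a sketch under stated assumptions: `classify` and `Assertion` are hypothetical names invented here, identifiers are assumed to be ASCII-style words, only the `+` and `!` prefixes are handled, and it is not how any actual S05 implementation parses assertions. Note that the character-class test must skip whitespace and peek at the next character, which is the extra lookahead concern raised above.

```python
import re
from typing import NamedTuple, Optional


class Assertion(NamedTuple):
    prefix: str          # '', '+', or '!' (assumed set of prefixes)
    name: str            # the leading identifier
    kind: str            # 'subrule', 'call', 'string', 'charclass', or 'regex'
    arg: Optional[str]   # remaining text handed to the subrule, if any


# Optional prefix, optional whitespace, then an identifier (ASCII-style assumption).
_HEAD = re.compile(r"([+!]?)\s*([A-Za-z_]\w*)")


def classify(body: str) -> Assertion:
    """Classify the text between < and > using the proposed rules (hypothetical helper)."""
    m = _HEAD.match(body)
    if m is None:
        raise ValueError(f"no leading identifier in {body!r}")
    prefix, name = m.group(1), m.group(2)
    rest = body[m.end():]

    if rest.strip() == "":
        # Bare identifier: plain subrule.
        return Assertion(prefix, name, "subrule", None)
    if rest[0] == "(":
        # Left paren immediately after the identifier: a call.
        return Assertion(prefix, name, "call", rest)
    if rest[0] == ":":
        # Colon: the rest (after any whitespace) is a string argument.
        return Assertion(prefix, name, "string", rest[1:].lstrip())

    # Look past any whitespace for a leading + or -: a set of character classes.
    stripped = rest.lstrip()
    if stripped[:1] in ("+", "-"):
        return Assertion(prefix, name, "charclass", stripped)

    if rest[0].isspace():
        # Anything else following whitespace is a regex argument.
        return Assertion(prefix, name, "regex", stripped)

    raise ValueError(f"unrecognized text after identifier: {rest!r}")
```

With these rules `classify("+foo + baz - bar")` and `classify("foo+baz-bar")` both come out as character-class sets, while `classify("+foo bar")` is a regex argument; the two `+foo` cases differ only in what follows the whitespace.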

That's assuming we don't define any metasyntax that starts with + or
- in the future, such as bare +[ a..z ], or +[ ...] as a variant of
[...]+.  And while we could resolve the ambiguity of the second +
by fiat, it would probably be better if the ambiguity didn't arise
in the first place.  If <+foo ...> is going to change the parsing
of ... at all, then it should probably do so consistently, which
means <+foo> is really a bad plan.  (Also, there are already too
many +'s in patterns.)  So while it's cute to generalize <+foo> to
"establish the initial universal set of matches", I suspect it's
likely to change to something else.  Possibilities I've been mulling:

    <~ws>               # "I just want to match as a string"
    <\ws>               # "Don't do the normal thing with the following"
    <.ws>               # "Just call the ws method"
    <=ws>               # "Bind to nothing", assuming <foo=ws> binds $<foo>

Damian points out that it's a little strange for = to enable binding
in the <foo=ws> case but disable it in the <=ws> case.  It would be
possible to make <=ws> mean <ws=ws> and <ws> not capture at all.
Offhand I'd say that would be bad huffmanization, but I need to look
at STD some more.  It also depends on any post-binding syntax
resembling:

    <ws> -> $foo {...}

and whether that is deemed preferable to <foo=ws> or $foo=<ws> or
whatever.  (One nice thing about the post syntax is that we could know
for sure that we're creating a new var, not binding an existing one,
so [] -> $x; might in fact declare $x as a "my" variable that happens
to scope properly under backtracking.  But I digress.)

Other available chars:

    <`ws>
    <^ws>
    <&ws>
    <*ws>
    <-ws>
    <|ws>
    <:ws>
    <;ws>
    </ws>

Larry
