On Fri, Sep 07, 2007 at 02:45:52AM -0500, Patrick R. Michaud wrote:
: On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
: > Log:
: > old <?foo> is now <+foo> to suppress capture
: > new <?foo> now is zero-width like <!foo>
:
: I really like the change from <?foo> to <+foo>, but I think there's
: a conflict (or at least some confusion) in the way the new spec is
: worded, especially as it relates to character class sets.

I'm actually still of two minds whether it's proper to overload <+foo>
like that, and what we end up with may well depend on revisions to the
binding syntax.  But it can be <+foo> for now, assuming we can deal
with the ambiguities you point out.  'Course, by the time we're done
with that, we might well decide <+foo> is a bad plan...

: Both old and new versions of S05 say:
:
:     If the first character after the identifier is whitespace, the
:     subsequent text (following any whitespace) is passed as a regex,
:     so <foo bar> is more or less equivalent to <foo(/bar/)>.
:
: In the previous version of S05, the non-capturing form of <foo bar>
: would be <?foo bar>.  Here, the whitespace after "foo" indicated
: that "bar" was to be parsed and passed to foo as a regex.
:
: In the new version of S05, the non-capturing form of <foo bar>
: would seem to be <+foo bar>.  Okay, I can handle that.  However,
: S05 also says that "<foo+bar-baz> can be written as <+ foo + bar - baz>".
: Presumably this second form would also allow "<+foo + bar - baz>",
: which seems to conflict slightly with the notion that <+foo bar>
: is the non-capturing form of <foo bar>.  In other words, the
: whitespace character following "<+foo" doesn't seem to be
: sufficient to indicate how the remainder is to be processed --
: we have to look beyond the whitespace for a leading plus or minus.

If we stick with +, one approach might be to simply disallow whitespace
in composite character classes.

: Perhaps S05 is addressing this when it says
:
:     An initial identifier is taken as a character class, so the
:     first character after the identifier doesn't matter in this
:     case, and you can use whitespace however you like.
:
: Here I find this wording very unclear -- it doesn't tell me
: what is distinguishing the "doesn't matter in this case" part
: between <+foo + bar> and <+foo bar>.

What, me unclear?  How could that happen?  :-)
[Don't answer that...]

: Since the S05 spec has changed so that all punctuation is meta,
: I'm thinking we may be able to simplify the spec altogether.
: Previously the "whitespace following the identifier" was
: used to distinguish <foo-bar> from <foo -bar>, or <alpha-[Jj]>
: from <alpha -[Jj]>.  Since it's now effectively impossible for
: a regex to begin with a bare plus or minus character, we may be
: able to alter the "whitespace following identifier" wording such
: that <foo-bar> and <foo - bar> are identical.  Perhaps
: something like:
:
:   - if the character following the identifier is a left paren,
:     it's a call
:
:         <foo('bar')>
:         <+foo('bar')>
:         <!foo('bar')>
:
:   - if the character following the identifier is a colon, the rest
:     of the text (following any whitespace) is passed as a string
:
:         <foo: bar>             # same as <foo('bar')>
:         <+foo: bar>
:         <!foo: bar>
:
:   - if the identifier is followed by a plus or minus (with optional
:     intervening whitespace), it's a set of character classes
:
:         <foo+baz-bar>
:         <foo + baz - bar>      # same thing
:         <+foo + baz - bar>     # also the same
:
:   - anything else following whitespace is a regex to be passed
:
:         <foo bar>              # same as <foo(/bar/)>
:         <+foo bar>             # same as <+foo(/bar/)>
:         <!foo bar>             # same as <!foo(/bar/)>

That's assuming we don't define any metasyntax that starts with + or -
in the future, such as bare +[ a..z ], or +[ ...] as a variant of [...]+.
And while we could resolve the ambiguity of the second + by fiat, it
would probably be better if the ambiguity didn't arise in the first place.
If <+foo ...> is going to change the parsing of ... at all, then it
should probably do so consistently, which means <+foo> is really a bad
plan.  (Also, there are already too many +'s in patterns.)

So while it's cute to generalize <+foo> to "establish the initial
universal set of matches", I suspect it's likely to change to
something else.
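
For anyone who wants to poke at the four quoted rules, here is a minimal sketch in Python of how they might be applied to the text between the angle brackets. It is only an illustration of the proposal as quoted, not anything from S05 or any implementation: the function name classify, the returned labels, the "plain" case for a bare identifier, and the simplified notion of an identifier are all assumptions made for the example.

```python
import re

# Hypothetical helper, not part of S05 or any Perl 6 implementation.
# Splits the text between < and > into an optional prefix (+, !, ?),
# a simplified identifier, and whatever follows the identifier.
_ASSERTION = re.compile(
    r"^(?P<prefix>[+!?]?)(?P<name>[A-Za-z_]\w*)(?P<rest>.*)$", re.S
)

def classify(body):
    """Label an assertion body according to the four quoted rules."""
    m = _ASSERTION.match(body)
    if m is None:
        raise ValueError("no leading identifier in %r" % body)
    rest = m.group("rest")
    if rest == "":
        return "plain"      # assumption: bare <foo> with nothing after it
    if rest[0] == "(":
        return "call"       # rule 1: left paren right after the identifier
    if rest[0] == ":":
        return "string"     # rule 2: colon right after the identifier
    if rest.lstrip()[:1] in ("+", "-"):
        return "charclass"  # rule 3: plus/minus, optional whitespace first
    if rest[0].isspace():
        return "regex"      # rule 4: anything else following whitespace
    raise ValueError("unrecognized text after identifier in %r" % body)

for example in ("foo('bar')", "+foo: bar", "foo + baz - bar",
                "+foo + baz - bar", "+foo bar"):
    print(example, "->", classify(example))
```

Note that rule 3 has to be checked before rule 4, which is exactly the look-beyond-the-whitespace step discussed above: the whitespace after the identifier is not enough on its own to decide how the rest is processed.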

Possibilities I've been mulling:

    <~ws>       # "I just want to match as a string"
    <\ws>       # "Don't do the normal thing with the following"
    <.ws>       # "Just call the ws method"
    <=ws>       # "Bind to nothing", assuming <foo=ws> binds $<foo>

Damian points out that it's a little strange for = to enable binding in
the <foo=ws> case but disable it in the <=ws> case.  It would be possible
to make <=ws> mean <ws=ws> and <ws> not capture at all.  Offhand I'd say
that would be bad huffmanization, but I need to look at STD some more.
It also depends on any post-binding syntax resembling:

    <ws> -> $foo {...}

and whether that is deemed preferable to <foo=ws> or $foo=<ws> or whatever.
(One nice thing about the post syntax is that we could know for sure
that we're creating a new var, not binding an existing one, so [] -> $x;
might in fact declare $x as a "my" variable that happens to scope
properly under backtracking.  But I digress.)

Other available chars:

    <`ws>
    <^ws>
    <&ws>
    <*ws>
    <-ws>
    <|ws>
    <:ws>
    <;ws>
    </ws>

Larry