Me writes:
: > Very nice (but, I assume you meant {$foo data})!
:
: I didn't mean that (even if I should have).
:
: Aiui, Mike's final suggestion was that parens end up
: doing all the (ops data) tricks, and braces are used
: purely to do code insertions. (I really liked that idea.)
:
: So:
:
:     Perl 5      Perl6
:     (data)      ( data)
:     (?opsdata)  (ops data)
:     ({})        {}
Hmm. Let me spill a few beans about where I'm going with A5. I've been thinking similar thoughts about the problem of overloading parens so heavily in Perl 5, but I'm going in a slightly different direction with it. The basic principles for the new regexen are:

    * Parens always capture.
    * Braces are always closures.
    * Square brackets are always character classes.
    * Angle brackets are always metasyntax (along with backslash).

So a first whack at the differences might be:

    Old             New
    ---             ---
    //              /<prior>/                                 ???
    ?pat?           /<?f:pat>/                                ???
    /pat/i          m:i/pat/ or /<?i:pat>/ or even m<?i:pat>  ???
    /pat/x          /pat/
    /^pat$/m        /^^pat$$/
    /./s            /<any>/ or /<.>/                          ???
    \p{prop}        <+prop>                                   ???
    \P{prop}        <-prop>                                   ???
    space           <sp> (or \h for "horizontal"?)
    {n,m}           <n,m>
    \t              also <tab>
    \n              also <lf> or <nl> (latter matching logical newline)
    \r              also <cr>
    \f              also <ff>
    \a              also <bell>
    \e              also <esc>
    \033            same
    \x1B            same
    \x{263a}        \x<263a>                                  ???
    \c[             same
    \N{name}        <name>
    \l              same
    \u              same
    \Lstring\E      \L<string>
    \Ustring\E      \U<string>
    \E              gone
    [\040\t]        \h plus any Unicode horizontal whitespace
    [\r\n\ck]       \v plus any Unicode vertical whitespace
    \b              same
    \B              same
    \A              ^
    \Z              same?
    \z              $
    \G              <pos>, but assumed in nested patterns?
    \1              $1
    \Q$var\E        $var always assumed literal, so $1 is literal backref
    $var            <$var> assumed to be regex
    =~ $re          =~ /<$re>/ ouch?
    (??{$rule})     <rule>
    (?{ code })     { code } with failure semantics
    (?#...)         {"..."} :-)
    (?:...)         <:...>
    (?=...)         <before: ...>
    (?!...)         <!before: ...>
    (?<=...)        <after: ...>
    (?<!...)        <!after: ...>
    (?>...)         <grab: ...>
    (?(cond)t|f)    Not sure. Could just use { if ... }

Obviously the <word> and <word:...> syntaxes will be user extensible. We have to be able to support full grammars. I consider it a feature that <foo> looks like a non-terminal in standard BNF notation. I do not consider it a misfeature that <foo> resembles an HTML or XML tag, since most of those languages need to be matched with a fancy rule named <tag> anyway.
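For comparison, here is the Old column in action: a minimal Perl 5 sketch of how many unrelated jobs parentheses do in the current syntax (capturing, grouping, lookahead, backtracking control, embedded code, comments). The sample string, the pattern, and the @seen array are made up for illustration, and it assumes Perl 5.8 or later, where $^N (the most recently closed capture group) is available. Going by the table, every non-capturing form in it would move to braces or angle brackets.

```perl
use strict;
use warnings;

# Hypothetical sample input, plus an array for the embedded code to fill.
my $text = 'if (x > 1) { go }';
my @seen;

# Parenthesized constructs used below, all Perl 5 syntax:
#   (if)      capture                                -> $1
#   (?>...)   independent subexpression, no backtracking into it
#   ([^()]*)  capture of the condition text          -> $2
#   (?{...})  embedded Perl code; $^N is the most recently closed capture
#   (?=...)   lookahead
#   (?:...)   grouping without capturing (the (\w+) inside it is $3)
#   (?#...)   comment
my $ok = $text =~ /
    ^ (if) \s*
    \( (?> ([^()]*) ) \)
    (?{ push @seen, $^N })
    \s* (?= \{ )
    (?: \{ \s* (\w+) \s* \} )
    (?# a comment, also spelled with parens )
/x;

if ($ok) {
    print "matched: keyword=$1 cond=$2 body=$3\n";   # keyword=if cond=x > 1 body=go
}
print "closure saw: @seen\n";                        # closure saw: x > 1
```

Only the three plain groups land in $1, $2 and $3; every other paren in the pattern is doing some other job, which is the overloading the bracket principles above are meant to remove.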
An interesting idea would be that if you say

    m<foo: pat>

or

    m{code}

it's as if you said

    m/<foo: pat>/

or

    m/{code}/

The latter is particularly interesting to me in that I can see uses for patterns that are Perl code at the top level rather than regex literal.

Any closure within a regular expression has full access to the current state object for the match. So most of the RFCs proposing ad hoc mechanisms for saving submatches in various kinds of variables can be handled with closures.

    /(...)(...)(...) { @array = .all } /

or

    /(...) { $first = $+ }
     (...) { $second = $+ }
     (...) { $third = $+ }/

or

    /<IF> (<COND>) (<BLOCK>) { .node = ["if",$1,$2] } /    # shades of yacc

or whatever. Could have a <$foo=...> as syntactic sugar, perhaps. But we need the general mechanism for building up parse trees of arrays of hashes of arrays of arrays of hashes of arrays of hashes of...

I haven't decided yet whether matches embedded in the closure should automatically pick up where the outer match is, or whether there should be some explicit match op to mean that, much like \G only better. I'm thinking when the current topic is a match state, we automatically continue where we left off, and require explicit =~ to start an unrelated match.

I also haven't committed to any particular mechanism for defining a set of related rules in a grammar. Obviously it needs to be a good enough mechanism to parse Perl and its variants, which means it probably needs to be OO based, and you make new grammars by derivation from the base grammar and overriding the rules you want to change.

Sorry if this is a bit delirious--I'm fighting off some kind of infection, and my nights have been shortchanged lately by the neighborhood panhandler who doesn't seem to understand either complicated concepts like "bedtime" or simple concepts like "no".

Larry