Brent Dax writes:
: # ?pat? /<?f:pat>/ ???
: # /pat/i m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???
:
: Whoa, those are moving to the front?!?
The problem with options in general is that they can't easily modify
parsing if they come in back. Now in the particular case of /f and /i,
it probably doesn't matter. But I was trying to see if there was some way
to do away with trailing options altogether. This might even extend to
things like:
qq:s"$interpolates @doesn't %doesn't"
And that's definitely a situation where it changes the parse. Hmm, if
strings have options, they're probably additive, so to add scalar
interpolation you'd want to base it on "q", not "qq":
q:s"$interpolates @doesn't %doesn't"
On the other hand, that doesn't work for the other things like "qr", so
maybe any of :s, :a, :h turn off default interpolations, so qr:a would
only interpolate arrays, for instance.
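For comparison, here is a minimal Perl 5 sketch of the status quo these options would refine. q// interpolates nothing, and qq// interpolates scalars and arrays together, with no way to pick one. The helper interp_scalars_only is hypothetical. It emulates the proposed q:s at run time by looking names up in a caller-supplied hash. The real proposal changes the parse at compile time, which no runtime helper can reproduce.

```perl
use strict;
use warnings;

my $name  = "world";
my @items = ("a", "b");

# q{} interpolates nothing: the sigils stay literal text.
my $single = q{$name @items};

# qq{} interpolates scalars and arrays alike; arrays are joined with $" (a space).
my $double = qq{$name @items};

# Hypothetical stand-in for q:s: expand only $identifiers, taken from a hash.
# Unknown names, @arrays and %hashes are left untouched.
sub interp_scalars_only {
    my ($template, $vars) = @_;
    (my $out = $template) =~ s/\$(\w+)/exists $vars->{$1} ? $vars->{$1} : "\$$1"/ge;
    return $out;
}

my $only_scalars = interp_scalars_only(q{$name @items %h}, { name => $name });

print "$single\n$double\n$only_scalars\n";
```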
: # /pat/x /pat/
: # /^pat$/m /^^pat$$/
:
: That's...odd. Is $$ (the variable) going away?
Maybe. It'd be $*PID if so, since it's truly global to the process.
But if not, we could special case $$ inside regexes, just as we already
special case $ itself.
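A minimal Perl 5 sketch of what the trailing /m option does to ^ and $ today. The proposed ^^ and $$ would express the same thing inside the pattern. The sample text is arbitrary. The sketch also shows $$ in its current role as the process ID.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";

# Without /m, ^ and $ anchor only at the ends of the string
# ($ may also match just before a final newline), so "beta" misses.
my $plain = ($text =~ /^beta$/) ? 1 : 0;

# With /m, ^ and $ also match at each internal line boundary.
my $multi = ($text =~ /^beta$/m) ? 1 : 0;

print "without /m: $plain, with /m: $multi\n";   # without /m: 0, with /m: 1

# Outside a regex, $$ is the current process ID.
print "pid: $$\n";
```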
: # \p{prop} <+prop> ???
: # \P{prop} <-prop> ???
:
: Intriguing.
Yeah, especially when you start stacking them. But maybe we're treading
on [...] territory. It could be argued that <...> is just a generalized
form of POSIX's [:...:] construct.
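A minimal Perl 5 sketch of the constructs being compared. \p{...} and \P{...} test Unicode properties. The POSIX [:...:] class is valid in Perl 5 only inside a bracketed character class. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# \x{e9} is "é"; \p{L} applies Unicode rules, so it counts as a letter.
my $all_letters   = ("caf\x{e9}" =~ /^\p{L}+\z/) ? 1 : 0;

# \P{L} is the complement: here it finds the digits.
my $has_nonletter = ("r2d2" =~ /\P{L}/) ? 1 : 0;

# The POSIX form must sit inside [...], hence the doubled brackets.
my $posix_alpha   = ("abc" =~ /^[[:alpha:]]+\z/) ? 1 : 0;

print "$all_letters $has_nonletter $posix_alpha\n";   # 1 1 1
```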
: # \t also <tab>
: # \n also <lf> or <nl> (latter matching logical newline)
: # \r also <cr>
: # \f also <ff>
: # \a also <bell>
: # \e also <esc>
:
: I can tell you right now that these are going to screw people up.
: They'll try to use these in normal strings and be confused when it
: doesn't work. And you probably won't be able to emit a warning,
: considering how much CGI Perl munches.
I can see pragmatic variants in which those *do* interpolate by default.
And pragmatic variants where they don't.
: # \033 same
: # \x1B same
: # \x{263a} \x<263a> ???
:
: Why? Wouldn't we want the same thing to work in quoted strings? (Or
: are those changing syntaxes too?)
I'm just wondering how far I can drive the principle that {} is always
a closure (even though it isn't). I admit that it's probably overkill
here, which is why there are question marks.
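A quick Perl 5 check that the escape forms marked "same" denote one character: octal, hex, braced hex, \e and \c[ all give ESC. It also shows that the braced \x{...} form is what reaches code points above 0xFF. Nothing beyond a stock perl is assumed.

```perl
use strict;
use warnings;

# Octal, hex, braced hex, mnemonic and control forms of ESC.
my @esc  = ("\033", "\x1B", "\x{1b}", "\e", "\c[");
my $same = (grep { $_ eq chr(27) } @esc) == @esc ? 1 : 0;

# Braces are what let \x reach code points above 0xFF.
my $smiley = "\x{263a}";

printf "all ESC forms equal: %d; smiley is U+%04X\n", $same, ord $smiley;
```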
: # \c[ same
: # \N{name} <name>
: # \l same
: # \u same
: # \Lstring\E \L<string>
: # \Ustring\E \U<string>
:
: So that's changed from whenever you talked about \q{} ?
Possibly. Again, the question is whether {} would more strongly imply
something that's not true. But curlies were so overloaded in Perl 5
that I don't think people are going to necessarily expect them to do
only one thing. Still, if <> are taking over the role of "unmarked
metasyntactic delimiters", maybe they belong here too.
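A minimal Perl 5 sketch of the case-modifying escapes as they work today, in strings and in regexes. These are the forms that \L<...> and \U<...> would respell. The sample words are arbitrary.

```perl
use strict;
use warnings;

my $name = "perl";

my $upper = "\U$name\E rules";     # \U...\E uppercases a span
my $title = "\u$name";             # \u uppercases the next character only
my $lower = "\LSHOUT\E quietly";   # \L...\E lowercases a span
my $first = "\lPERL";              # \l lowercases the next character only

# The same escapes work inside a regex, applied during interpolation.
my $matched = ("PERL" =~ /^\U$name\E\z/) ? 1 : 0;

print "$upper|$title|$lower|$first|$matched\n";   # PERL rules|Perl|shout quietly|pERL|1
```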
: # \E gone
: # [\040\t] \h plus any Unicode horizontal whitespace
: # [\r\n\ck] \v plus any Unicode vertical whitespace
: #
: # \b same
: # \B same
:
: # \A ^
: # \Z same?
: # \z $
:
: Are you sure that optimizes for the common case?
No, I'm not sure, but we have to clean up the \A...\z mess somehow.
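A minimal Perl 5 sketch of the \A...\z mess in question. $ and \Z both tolerate a trailing newline and \z does not. /m changes $ but leaves \Z alone. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my $line = "foo\n";

my $dollar  = ($line =~ /foo$/)  ? 1 : 0;   # 1: $ may match before a final newline
my $big_z   = ($line =~ /foo\Z/) ? 1 : 0;   # 1: \Z behaves like $ without /m
my $small_z = ($line =~ /foo\z/) ? 1 : 0;   # 0: \z is the absolute end only
my $big_a   = ($line =~ /\Afoo/) ? 1 : 0;   # 1: \A is the absolute start

# Under /m, $ also matches before any newline, but \Z still does not.
my $m_dollar = ("foo\nbar" =~ /foo$/m)  ? 1 : 0;   # 1
my $m_big_z  = ("foo\nbar" =~ /foo\Z/m) ? 1 : 0;   # 0
```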
: # \G <pos>, but assumed in nested patterns?
: #
: # \1 $1
: #
: # \Q$var\E $var always assumed literal, so $1 is literal backref
:
: So these are reinterpolated every time you backtrack? Are you *trying*
: to destroy regex performance? :^)
They're not interpolated. They're matched, as in string comparison, just
as backrefs are matched right now.
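A minimal Perl 5 sketch of the two behaviors being contrasted. \Q...\E makes an interpolated variable match literally. A backreference is matched by string comparison against the captured text, not re-interpreted as a pattern. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my $var = "a.c";

# \Q...\E quotes metacharacters, so "." must match a literal dot.
my $lit_hit  = ("a.c" =~ /\Q$var\E/) ? 1 : 0;   # 1
my $lit_miss = ("abc" =~ /\Q$var\E/) ? 1 : 0;   # 0

# A backreference compares against the text group 1 captured; it is not
# recompiled as a pattern, so "a.c" does not match "abc" the second time.
my $same_text = ("a.c a.c" =~ /^(\S+) \1\z/) ? 1 : 0;   # 1
my $diff_text = ("a.c abc" =~ /^(\S+) \1\z/) ? 1 : 0;   # 0
```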
: # $var <$var> assumed to be regex
:
: What if $var is a qr//ed object?
Then it's a pretty easy assumption that it's a regex. :-)
: # =~ $re =~ /<$re>/ ouch?
:
: I don't see the win.
No difference if $re is qr//, but if it's not, that is the syntax for
forcing $re to be interpreted as a regex.
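A minimal Perl 5 sketch of today's default, which the proposal would flip. An interpolated variable is treated as regex source whether it holds a qr// object or a plain string. \Q...\E is how you ask for a literal match instead.

```perl
use strict;
use warnings;

my $compiled = qr/\d+/;   # a compiled regex object
my $source   = '\d+';     # a plain string that happens to hold regex source

print ref($compiled), "\n";   # Regexp

# In Perl 5 both are interpolated as regex source by default...
my $via_qr     = ("abc123" =~ /x|$compiled/) ? 1 : 0;   # 1
my $via_string = ("abc123" =~ /x|$source/)   ? 1 : 0;   # 1

# ...and \Q...\E is the opt-out that treats the string literally.
my $quoted_miss = ("abc123" =~ /\Q$source\E/) ? 1 : 0;  # 0
my $quoted_hit  = ('x\d+y'  =~ /\Q$source\E/) ? 1 : 0;  # 1
```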
: # (??{$rule}) <rule>
: # (?{ code }) { code } with failure semantics
: # (?#...) {"..."} :-)
: # (?:...) <:...>
: # (?=...) <before: ...>
: # (?!...) <!before: ...>
: # (?<=...) <after: ...>
: # (?<!...) <!after: ...>
:
: Cute. (Wait a minute, aren't those reversed?)
Nope, I realized they were ambiguous depending on whether you think of
them as declarative or operational, but I settled on the declarative
reading because it works with their being assertions. All the other
options I could think of are either really clunky or similarly ambiguous.
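A minimal Perl 5 sketch of the declarative reading, using the existing lookaround syntax. The comments map each form to its proposed <before: ...> or <after: ...> spelling. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my $s = "price: 42 USD";

# (?<=...), i.e. <after: ...>: what matches must come after ": ".
my ($after)  = $s =~ /(?<=: )(\d+)/;      # "42"

# (?=...), i.e. <before: ...>: what matches must come before " USD".
my ($before) = $s =~ /(\d+)(?= USD)/;     # "42"

# (?<!...), i.e. <!after: ...>: "42" is excluded when it follows ": ".
my $neg_hit  = ("id 42"     =~ /(?<!: )42/) ? 1 : 0;   # 1
my $neg_miss = ("price: 42" =~ /(?<!: )42/) ? 1 : 0;   # 0
```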
: # (?>...) <grab: ...>
: # (?(cond)t|f) Not sure. Could just use { if ... }
:
: <if(cond):true|false>?
Well, sure, if you're attached to that particular set of punctuation.
But we could also have
<if cond: ...>
<elsif cond: ...>
<else: ...>
On the other hand, I think we'll often see parsers doing things like:
$TERM = qr/{
when cond { /.../ }
when cond { /.../ }
when cond { /.../ }
when cond { /.../ }
when cond { /.../ }
when cond { /.../ }
default { /.../ }
}/;
So maybe the <> version is:
<when cond: ...>
<when cond: ...>
<when cond: ...>
<when cond: ...>
<when cond: ...>
<default: ...>
(assuming the scoping of "break" can be worked out).
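In Perl 5 terms, the two constructs quoted above, (?>...) and (?(cond)t|f), behave roughly as in this minimal sketch. The optional-parentheses pattern is an arbitrary illustration and does not come from the proposal.

```perl
use strict;
use warnings;

# (?>...), i.e. <grab: ...>: once a* has grabbed every "a" it never
# gives one back, so the trailing "a" can no longer match.
my $backtracks = ("aaa" =~ /^a*a/)     ? 1 : 0;   # 1
my $grabbed    = ("aaa" =~ /^(?>a*)a/) ? 1 : 0;   # 0

# (?(1)yes|no): the condition is "did group 1 match?". Accepts "42" and
# "(42)" but rejects the unbalanced "(42" and "42)".
my $re = qr/^(\()?\d+(?(1)\))\z/;
my %result;
for my $input ('42', '(42)', '(42', '42)') {
    $result{$input} = ($input =~ $re) ? 1 : 0;
}
```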
: # Obviously the <word> and <word:...> syntaxes will be user
: # extensible. We have to be able to support full grammars. I
: # consider it a feature that <foo> looks like a non-terminal in
: # standard BNF notation. I do not consider it a misfeature
: # that <foo> resembles an HTML or XML tag, since most of those
: # languages need to be matched with a fancy rule named <tag> anyway.
:
: But that *does* make it harder to define the fancy rules. I could see
: someone defining rules like:
:
: 'gt' => qr/\>/,
: 'lt' => qr/\</
:
: just to get around backslashing everything in sight.
I could see someone saying qr:X or some such.
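For comparison, here is a minimal Perl 5 sketch of the named-rule style Brent describes. The rules are qr// objects kept in a hash and composed by interpolation. The hash %rule and its keys are hypothetical. Perl 5 has no <name> rule syntax, so this only approximates what the proposal would provide.

```perl
use strict;
use warnings;

# Hypothetical rule table; in Perl 5, < and > need no backslashes.
my %rule;
$rule{lt}   = qr/</;
$rule{gt}   = qr/>/;
$rule{name} = qr/[A-Za-z][\w-]*/;

# Compose the rules by interpolating the qr// objects.
$rule{tag}  = qr/$rule{lt}($rule{name})$rule{gt}/;

my ($tagname) = "<emphasis>" =~ $rule{tag};
print "$tagname\n";   # emphasis
```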
: # An interesting idea would be that if you say
: #
: # m<foo: pat>
: #
: # or
: #
: # m{code}
: #
: # it's as if you said
: #
: # m/<foo: pat>/
: #
: # or
: #
: # m/{code}/
:
: I don't know about that one. I often use {} as delimiters on regexen
: because it's a character that doesn't occur in data very often. I think
: the gain of two characters isn't as critical as the loss of options.
:
: Understand, I'm not a regex Luddite. I've been working with yacc and
: lex a lot lately, so I have at least a hint of how powerful formal
: parsing is--and I love all of these features. However, I think that
: syntactically a lot of this is a loss for the average Perl hacker. (Not
: me, not you, and not most of the people on this list--the *average*
: hacker, like the 3s or 4s on PerlMonks.)
:
: The *average* Perl hacker doesn't have much use for embedded code in a
: regex or BNF-like rules. The *average* Perl hacker just wants to do an
: s#<emphasis>(\d{1,3}(\.\d{1,3}){3})</emphasis>#<inet>$1</inet># (an
: early example from "Mastering Regular Expressions"). There's a very
: good chance that he knows exactly what the input data looks like and
: that this will work on it.
:
: For this simple reason, I highly suggest somehow hijacking curlies
: instead, and perhaps making embedded code use two curlies. After all,
: regexes are intimidating enough already. :^)
With respect to Perl 5, I'm trying to unhijack curlies as much as possible.
Larry