Re: Regex and Matched Delimiters

Larry Wall Tue, 23 Apr 2002 10:02:25 -0700

Me writes:
: >     /pat/i m:i/pat/ or /<?i:pat>/ or even m<?i:pat> ???
: 
: Why lose the modifier-following-final-delimiter
: syntax? Is this to avoid a parsing issue, or
: because it's linguistically odd to have a modifier
: at the end?


Haven't decided for sure to lose it, but it does have several problems.
First is the parsing issue, but there's also what in natural language
is called the "end weight" problem.  We often rearrange our sentences
in English so that the short things come first and the long things come
last.  That's why you choose indirect object syntax sometimes and not
others.  Try turning either of these to the other form:

    I gave him a big, smelly tuna-fish and cucumber sandwich.
    I gave the sandwich to a big, smelly tuna fisherman and his dog "Cucumber".

Now, options are always little, so it seems that they should come early.
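Perl 5 already lets you put an option up front, inside the pattern, with (?i:...). The sketch below illustrates the two placements using Python's re module as a stand-in. Python has supported scoped inline flags since 3.6. This assumes re mirrors the Perl 5 behaviour closely enough for comparison. None of it is Perl 6 syntax.

```python
import re

# Stand-in for Perl 5 behaviour (assumption: Python's re mirrors it here).

# Trailing-modifier style, like Perl 5's /pat/i: the flag is given separately.
trailing = re.compile(r"pat", re.IGNORECASE)

# Leading inline style, like Perl 5's (?i:pat): the option is stated first
# and applies only inside the group.
leading = re.compile(r"(?i:pat)tern")

print(bool(trailing.search("PAT")))      # True
print(bool(leading.search("PATtern")))   # True: "pat" is case-insensitive
print(bool(leading.search("PATTERN")))   # False: "tern" is still case-sensitive
```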

: >     /^pat$/m /^^pat$$/
: 
: What's the mnemonic here? It feels the wrong
: way round -- like a single ^ or $ should match
: at newlines, double ^ or $ should only match
: at start/end string.

Well, I thought of it as ^^ or $$ matching potentially multiple places
in the string.

: Ah. The newline matches between the ^^ or $$.
: That works.

Except that the newline doesn't match between the characters; ^^ and $$
are zero-width.  You could say /$$\n^^/ for instance.
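The proposed ^^ and $$ roughly correspond to Perl 5's ^ and $ under /m. The minimal sketch below uses Python's re.MULTILINE as an assumed analogue. It shows that the anchors are zero-width, so the newline has to be matched explicitly, much like the /$$\n^^/ example (written $\n^ here).

```python
import re

# Stand-in for Perl 5's ^ and $ with and without /m (assumed analogue).
text = "alpha\nbeta\ngamma"

# Without MULTILINE, ^ matches only at the start of the string.
print(re.findall(r"^\w+", text))                 # ['alpha']

# With MULTILINE (Perl 5's /m), ^ also matches right after each newline.
print(re.findall(r"^\w+", text, re.MULTILINE))   # ['alpha', 'beta', 'gamma']

# The anchors consume nothing: the newline itself is matched by \n.
boundaries = [m.start() for m in re.finditer(r"$\n^", text, re.MULTILINE)]
print(boundaries)                                # [5, 10]
```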

: Then there's the PID issue. Hmm. How to save $$
: (it is nice for one liners)?

$PID is only two chars worse.  (The * of $*PID is optional.)

: Sorry if this is a dumb suggestion, but could you have
: just one assertion, say ^$, that alternates matching
: just before and just after a newline?

^$ matches a null string.  That aside, I don't think stateful assertions
would be unconfusing in the extreme.

: >     /./s /<any>/ or /<.>/ ???
: 
: I'd expect . to match newlines by default. For a . that
: didn't match newlines, I'd expect to need to use [^\n].

But . has never matched newlines by default, not even in grep.  Possibly
some editors do it that way, but if so, it's non-standard.
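The sketch below illustrates the default, again using Python's re as a stand-in for Perl 5. By default "." does not match a newline. Perl 5's /s corresponds to re.DOTALL. The [^\n] class suggested above behaves like the default ".".

```python
import re

# Stand-in for Perl 5 behaviour (assumption: Python's re mirrors it here).

# By default "." does not match a newline.
print(re.fullmatch(r"a.b", "a\nb"))                     # None

# Perl 5's /s corresponds to re.DOTALL, which lets "." match "\n" too.
print(bool(re.fullmatch(r"a.b", "a\nb", re.DOTALL)))    # True

# An explicit [^\n] class behaves like the default ".".
print(bool(re.fullmatch(r"a[^\n]b", "a-b")))            # True
print(re.fullmatch(r"a[^\n]b", "a\nb"))                 # None
```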

: >     space <sp> (or \h for "horizontal"?)
: 
: Can one quote a substring of a regex? In a later part you
: say that \Q...\E is going away, so it seems not. It would be
: nice to say something like:
: 
:     /foo bar baz 'qux waldo' emerson/
: 
: and have the space between qux and waldo be literal.
: Similar arguments apply more broadly so that one
: could escape the usual meaning of metacharacters etc.

Well, <"qux waldo"> could be made to mean that, I suppose.  For that
matter, so might \q{qux waldo}.  Er, \q<qux waldo>?
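The <"qux waldo"> and \q forms above are only proposals. For comparison, the sketch below uses Python's re.VERBOSE as a stand-in for Perl 5's /x, and re.escape in the role of Perl 5's \Q...\E. re.escape backslashes the space, so the space survives verbose mode as a literal.

```python
import re

# Stand-ins: re.VERBOSE for Perl 5's /x, re.escape for \Q...\E (assumed analogues).

# In verbose mode, unescaped spaces in the pattern are ignored.
loose = re.compile(r"foo bar baz qux waldo emerson", re.VERBOSE)
print(bool(loose.fullmatch("foobarbazquxwaldoemerson")))     # True

# re.escape turns "qux waldo" into "qux\ waldo", so that space stays literal.
quoted = re.compile(r"foo bar baz " + re.escape("qux waldo") + r" emerson",
                    re.VERBOSE)
print(bool(quoted.fullmatch("foobarbazqux waldoemerson")))   # True
print(bool(quoted.fullmatch("foobarbazquxwaldoemerson")))    # False
```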

: >     \Lstring\E \L<string>
: >     \Ustring\E \U<string>
: 
: Maybe, if I wasn't too far off with the quote mark
: suggestion above, then  \L'string' would be more
: natural.

Maybe \L and \q are in the same class, in which case that would work.

: >     (?#...) {"..."} :-)
: 
: Will plain # comments work in p6  regexen?

Yes, just as in /x.  And there's no ambiguity in the end delimiter
any more because we parse in one pass.
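In Perl 5 the pattern delimiter is located before /x is applied. A delimiter character inside a # comment therefore still ends the pattern early. Python's re.VERBOSE has no delimiter at all, so the sketch below, used as a stand-in, only shows the comment syntax itself: whitespace is ignored and # starts a comment that runs to the end of the line.

```python
import re

# Stand-in for a Perl 5 /x pattern with comments (assumed analogue).
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

print(date.fullmatch("2002-04-23").groups())   # ('2002', '04', '23')
```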

: >     (?:...) <:...>
: >     (?=...) <before: ...>
: >     (?!...) <!before: ...>
: >     (?<=...) <after: ...>
: >     (?<!...) <!after: ...>
: >     (?>...) <grab: ...>
: 
: Hmm. So <> are clustering just like ().

Yes, and you can quantify them where it makes sense.

: One difference is that () always capture whereas <>
: only do so sometimes. Oh, and {} can too.

Eh?  <> never capture.  None of those constructs above capture.
Nothing inside a {} can capture anything that influences the paren
count outside the {}, because any inner regex has its own paren count.
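The Perl 5 forms on the left of the table quoted above can be checked in Python's re, used here as a stand-in. None of them creates a capture group, and the grouping-only form can be quantified. The atomic (?>...) form is left out because Python accepts it only from 3.11 on.

```python
import re

# Perl 5 constructs from the table, via Python's re (assumed analogue).
pattern = re.compile(
    r"(?:ab)+"     # grouping only, and quantified
    r"(?=c)"       # lookahead
    r"(?!cd)"      # negative lookahead
    r"c"
    r"(?<=bc)"     # lookbehind
    r"(?<!xc)"     # negative lookbehind
)

m = pattern.search("xxababce")
print(m.group())         # ababc
print(m.groups())        # () : no capture groups were created
print(pattern.groups)    # 0
```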

: () are no longer used for clever stuff, <> are instead.
: And {}.

Basically, yes.

: Hmm. Time for bed.

Why?  I just got up.  :-)

Larry
