On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman <a...@ajs.com> wrote:
> While that's a nifty special case (I'm sure it will surprise me someday, and
> I'll spend a half hour debugging before I remember this mail), it doesn't
> help in the general case (see my example grammar, below).

In the general case, no. In the case of your grammar, and all
grammars, it does help.

All regex routines, when called standalone, are anchored to the
beginning and end of the string. So, having "^" and "$" at the
beginning and end of your TOP is a no-op unless some other rule calls
it as a subrule.

S05 says: "In general, the anchoring of any subrule call is controlled
by its calling context. When a regex, token, or rule method is called
as a subrule, the front is anchored to the current position (as with
:p), while the end is not anchored, since the calling context will
likely wish to continue parsing. However, when such a method is
smartmatched directly, it is automatically anchored on both ends to
the beginning and end of the string." and that "The basic rule of
thumb is that the keyword-defined methods never do implicit .*?-like
scanning, while the m// and s// quotelike forms do such scanning in
the absence of explicit anchoring."
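
As a rough analogy only (this is Python's re module, which your mail already brings up, not Perl 6), the three behaviors S05 describes correspond to three separate functions there: re.search scans, re.match anchors only the front, and re.fullmatch anchors both ends. The pattern and the input strings below are made up for illustration.

```python
import re

# Hypothetical pattern, chosen only to show the three anchoring modes.
pattern = re.compile(r"foo")

def anchoring_modes(text):
    """Report which of the three matching modes succeed on `text`."""
    return {
        # Like m//: implicit scanning for a match anywhere in the string.
        "scan": pattern.search(text) is not None,
        # Like a subrule call: front anchored, end left open.
        "front": pattern.match(text) is not None,
        # Like a direct smartmatch of a named regex: both ends anchored.
        "both": pattern.fullmatch(text) is not None,
    }

for sample in ("foo", "foobar", "xfoo"):
    print(sample, anchoring_modes(sample))
```

Only "foo" passes all three; "foobar" fails once the end is anchored, and "xfoo" succeeds only when scanning is allowed.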

Given that Grammar.parse is specified to create a new Grammar
object and directly match its TOP method (or the method named by the
:rule adverb), with nothing saying that it does implicit .*?-like
scanning, I think that Grammar.parse should always anchor. This
doesn't appear to work quite properly in Rakudo currently: it anchors
to the beginning but not to the end. I'm about to check whether there's
a rakudobug for this already, and submit one if not.

> After doing some more thinking and comparing this to other languages
> (python, for example has "match" which matches only at the start of a
> string), it seems to me that there is a sort of out-of-band need to have a
> more general solution at match time. Here's my second pass suggestion:
>
>  m:r / m:rooted -- Match is rooted on both ends ("^...$")
>  m:rs / m:rootedstart - Match is rooted at the start of string ("^", ala
> Python re.match)
>  m:re / m:rootedend - Match is rooted at the end of string ("$")
>  m:rn / m:rootednone - Match is not rooted (default)
>  m:o / m:oneline - Modify :r and friends to use ^^/$$
>
> Here's one way I can see that being routinely used:
>
>  # Simplistic shell scripts
>  rule TOP :r {<stmt>*} # Match the whole script
>  rule stmt :r :o { <cmd> <arg>* } # One statement per line

:oneline or similar might be useful. I'm not sure about :rootedend and
:rootedstart. :rooted is useful only in one situation: when implicitly
matching against the topic. You could do "m:r/ foo /;" to match
against the topic, but "regex { foo };" would not do what you want (I
think). I don't know if doing an anchored match against the topic is
really important enough to justify an adverb just so you don't have to
do "$_ ~~ regex { foo }".
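
To make the :oneline idea concrete, again by analogy with Python rather than Perl 6: re.MULTILINE makes ^ and $ anchor at line boundaries, which is roughly what ^^ and $$ do. The statement pattern below is a made-up stand-in for your "<cmd> <arg>*", not a translation of it.

```python
import re

# Hypothetical stand-in for "rule stmt { <cmd> <arg>* }": one command word
# followed by zero or more space-separated argument words, filling a whole line.
stmt = re.compile(r"^(\w+)((?: \w+)*)$", re.MULTILINE)

def statements(script):
    """Return (cmd, [args]) for every line that is exactly one statement."""
    return [(m.group(1), m.group(2).split()) for m in stmt.finditer(script)]

print(statements("ls\necho hello world\n"))
```

Lines that do not fit the statement shape from start to end are skipped rather than partially matched, which is the effect of rooting each statement to its own line.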

>
> The other way to go about that would be with parameterized adverbs. I'm not
> sure how comfy people are with those, but they're in the spec. So this:
>
>  m:r / m:rooted -- Match is rooted (default is ^...$)
>    Parameters:
>    :s / :start -- Match is rooted only at start ("^")
>    :e / :end -- Match is rooted only at end ("$")
>    [note: :s :e should produce a warning]
>    :n / :none -- Match is not rooted (null modifier)
>    [note: combining :n with :s or :e should warn]
>    :o / :oneline -- Use ^^ and $$ instead of ^ and $
>    [note: combining :o with :n should warn?]
>
> So our statement matching grammar becomes:
>
>  rule TOP :r {<stmt>*}
>  rule stmt :r(:o) { <cmd> <arg>* }
>
> The clown nose is just a side benefit ;-)
>
> Seriously, though, I prefer :r(:o) because :r:o looks like it should be the
> opposite of :rw (there is no :ro, as far as I know).
>
> PS: I see no reason that any of this is needed for 6.0.0
>
> --
> Aaron Sherman
> Email or GTalk: a...@ajs.com
> http://www.ajs.com/~ajs
>



-- 
Tyler Curtis
