Trey Harris wrote:

> I guess this is as good an opportunity as any to be sure I've got what's
> going on.  So, here's a first, simple, admittedly broken hack at a simple
> parser for xml-ish start tags and empty entities:
> 
> rule lt { \< }
> rule gt { \> }
> rule identifier {
>     # I don't know the XML identifier rules, but let's pretend:
>     <alpha> [ <alpha> | \d | _ ]*
> }

Or just:

      <alpha> \w*
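If you want to check that the two identifier forms agree, here is a rough analogy using Python's `re` module rather than Perl 6 rules. It makes two assumptions. First, <alpha> is taken to mean ASCII letters. Second, \w is taken to mean letters, digits and underscore, which is Python's re.ASCII behaviour. The names LONG_FORM, SHORT_FORM and same_identifier_rules are made up for illustration.

```python
import re

# Trey's original shape: a letter, then letters, digits or underscores.
# Assumption: <alpha> means ASCII letters (XML's real name rules are broader).
LONG_FORM = re.compile(r"[A-Za-z](?:[A-Za-z]|\d|_)*", re.ASCII)

# The shorter form: a letter, then any word characters ([A-Za-z0-9_] under re.ASCII).
SHORT_FORM = re.compile(r"[A-Za-z]\w*", re.ASCII)


def same_identifier_rules(candidates):
    """Return True if both patterns accept exactly the same whole strings."""
    return all(
        bool(LONG_FORM.fullmatch(s)) == bool(SHORT_FORM.fullmatch(s))
        for s in candidates
    )


samples = ["img", "h1", "data_id", "_private", "1st", "x-y", ""]
print(same_identifier_rules(samples))
```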


> rule val {
>     [   # quoted
>        $b := <['"]>
>        ( [ \\. | . ]*? )
>        $b
>     ] | # or not
>        (\H+)
> }

Not quite. Assigning to $b is a capture, so the first branch now has more
than one capture and the paren-captured value won't come back on its own.
You also don't really want to allow vertical whitespace (which \H permits) in
the unquoted value, so \S is more appropriate. The precedence of | is still
low, so the [...] are unnecessary (though not wrong in themselves). And \\.|.
is just \\?. (but again, not wrong in itself).
 
So you want:

    rule val {
        $delim := <['"]>
        $data  := ( [\\?.]*? )
        $delim
      | $data  := (\S+) 
    }  
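To see why even the corrected rule yields more than one captured value, here is a rough analogue using Python's `re` module. This is not Perl 6 syntax. Python won't let two alternatives share a group name, so the quoted data and the unquoted data get separate hypothetical group names, qdata and bare. A successful quoted match still carries the delimiter alongside the data.

```python
import re

# Rough Python analogue of the corrected <val> rule; group names are
# hypothetical, since Python's re module forbids reusing one name across branches.
VAL = re.compile(
    r"""
      (?P<delim>['"])          # opening quote, remembered for the closing match
      (?P<qdata>(?:\\?.)*?)    # shortest run of chars, each optionally backslash-escaped
      (?P=delim)               # the same quote character closes the value
    | (?P<bare>\S+)            # otherwise, a run of non-whitespace
    """,
    re.VERBOSE | re.DOTALL,
)

quoted = VAL.match(r'"say \"hi\"" rest')
# Both the delimiter and the data are captured, so there is no single "value".
print(quoted.group("delim"), quoted.group("qdata"))

bare = VAL.match("plain\nnext line")
# \S+ stops at the newline; a pattern excluding only horizontal space would not.
print(bare.group("bare"))
```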

> rule parsetag :w {
>    <lt> $tagname :=    <identifier>
>         %attrs   := [ (<identifier>) =
>                       (<val>)
>                     ]*
>    /?
>    <gt>
> }
> 
> for <$fh> {
>     while m:e/<parsetag>/ {
>        print "Found tag $0{tagname}\n";
>        print "  $a = '$v'\n" for $0{attrs}.kv -> $a, $b;
>     }
> }
> 
> My questions are:
> 
> 1. Does the match in my <val> rule get passed correctly?  I.e., I have
>    parens in alternations, will whichever one gets matched become the
>    return value of the whole rule?

I believe not, since you're also capturing the delimiter. I've discussed with
Larry the need to do complex capturing internally but still return a simple
value. My suggested solutions were either a special assertion:

    rule val {
      [ $delim := <['"]>
        $data  := ( [\\?.]*? )
        $delim
      | $data  := (\S+) 
      ]
      <return $data>
    }  

or a special hypothetical variable:

    rule val {
        $delim  := <['"]>
        $RETURN := ( [\\?.]*? )
        $delim
      | $RETURN := (\S+)    
    }  

I don't know which, if either, he will approve. Maybe both!
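Both proposals amount to "capture whatever you need internally, but hand back only $data". Outside a rule, the same idea can be sketched as a Python wrapper function. This is only an analogy. The helper val() and its group names are hypothetical, and nothing here corresponds to a real <return ...> assertion or $RETURN variable.

```python
import re

# Hypothetical helper: the pattern captures the quote *and* the data, but
# val() returns only the data, mimicking what <return $data> or $RETURN
# would do inside a rule.
_VAL = re.compile(
    r"""
      (?P<delim>['"]) (?P<qdata>(?:\\?.)*?) (?P=delim)   # quoted value
    | (?P<bare>\S+)                                     # unquoted value
    """,
    re.VERBOSE | re.DOTALL,
)


def val(text):
    """Match a value at the start of text and return just its data (or None)."""
    m = _VAL.match(text)
    if m is None:
        return None
    # Test the delimiter rather than the data, so an empty "" still counts as quoted.
    if m.group("delim") is not None:
        return m.group("qdata")
    return m.group("bare")


print(val('"a b" c'))   # the quote characters are not part of the result
print(val("bare rest"))
```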


> 2. Did I do the binding to the hash correctly?

Yes, indeed.
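You can approximate the effect of binding the repeated name/value captures to %attrs in Python, with one difference worth knowing. Python's re keeps only the last repetition of a repeated group, so this sketch re-scans the attribute text with finditer and builds the dict itself. parse_tag() is a simplified, hypothetical stand-in. Quoted values don't handle backslash escapes, and unquoted values simply stop at whitespace or >.

```python
import re

# Simplified stand-in for <parsetag>; names and value handling are hypothetical.
TAG = re.compile(
    r"""
    < \s* (?P<tagname>[A-Za-z]\w*)                 # tag name
    (?P<attrs>(?:\s+[A-Za-z]\w*\s*=\s*
        (?:"[^"]*"|'[^']*'|[^\s>]+))*)             # zero or more name=value pairs
    \s* /? \s* >                                   # optional empty-element slash, then >
    """,
    re.VERBOSE,
)
# One name=value pair; exactly one of the three value groups will be set.
ATTR = re.compile(r"""([A-Za-z]\w*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")


def parse_tag(text):
    """Return (tagname, attrs_dict) for the first tag in text, or None."""
    m = TAG.search(text)
    if m is None:
        return None
    attrs = {}  # plays the role of %attrs
    for pair in ATTR.finditer(m.group("attrs")):
        name, dquoted, squoted, bare = pair.groups()
        attrs[name] = next(v for v in (dquoted, squoted, bare) if v is not None)
    return m.group("tagname"), attrs


print(parse_tag("""<img src="a.png" alt='logo'/>"""))
```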


> 3. Will my I/O loop work like I expect?

Depends what you expect! ;-)

But you probably want this:

    for <$fh> {
        while m:c/<parsetag>/ {
            print "Found tag $0{tagname}\n";
            for $0{attrs}.kv -> $a, $b { print "  $a = '$b'\n" }
        }
    }

That is, use the :c modifier rather than :e, since you're looping
one match at a time via a C<while>. If you want to grab every match
at once and then iterate over them, you do want :e, but you also want
a C<for>:

    for <$fh> {
        for m:e/<parsetag>/ -> $match {
            print "Found tag $match{tagname}\n";
            for $match{attrs}.kv -> $a, $b { print "  $a = '$b'\n" }
        }
    }
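You can mimic the difference between the two loops with Python's re module, using two approaches. The first is a while loop that restarts each search at the end of the previous match, which is roughly what :c gives you. The second collects every match up front with finditer and then iterates over the list, which is roughly :e plus a C<for>. The TAG pattern and both function names are hypothetical simplifications that capture only the tag name.

```python
import re

# Hypothetical stand-in for <parsetag>: captures only the tag name.
TAG = re.compile(r"<\s*([A-Za-z]\w*)[^>]*>")


def tags_one_at_a_time(line):
    """Like the while/:c loop: each search resumes where the last match ended."""
    found = []
    pos = 0
    while (m := TAG.search(line, pos)) is not None:
        found.append(m.group(1))
        pos = m.end()  # continue from the end of this match
    return found


def tags_all_at_once(line):
    """Like for m:e/.../: gather every match into a list, then iterate over it."""
    matches = list(TAG.finditer(line))
    return [m.group(1) for m in matches]


line = '<a href="x"> text <br/> <img src=y>'
print(tags_one_at_a_time(line))
print(tags_all_at_once(line))
```

Both functions return the same names here. They differ only in whether the matches are produced one at a time or gathered all at once before the loop runs.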

Oh, and you can't use -> on a postfix C<for> modifier, only on a prefix
C<for> statement, which is why the attribute loops above use the block form.
    
Great to see someone so keen to start programming in Perl 6!

Damian
