Trey Harris wrote:

> I guess this is as good an opportunity as any to be sure I've got what's
> going on. So, here's a first, simple, admittedly broken hack at a simple
> parser for xml-ish start tags and empty entities:
>
>     rule lt { \< }
>     rule gt { \> }
>     rule identifier {
>         # I don't know the XML identifier rules, but let's pretend:
>         <alpha> [ <alpha> | \d | _ ]*
>     }
Or just:

    <alpha> \w*

>     rule val {
>         [ # quoted
>             $b := <['"]>
>             ( [ \\. | . ]*? )
>             $b
>         ] | # or not
>         (\H+)
>     }

Not quite. Assigning to $b is a capture, so you now have more than one
capture in the first branch, and the paren-captured value won't come back
on its own.

You also don't really want to allow vertical whitespace in the unquoted
value, so \S is more appropriate than \H.

And the precedence of | is still low, so the [...] are unnecessary (though
not wrong in themselves).

And \\.|. is just \\?. (but again, not wrong in itself).

So you want:

    rule val {
        $delim := <['"]>
        $data  := ( [\\?.]*? )
        $delim
      |
        $data  := (\S+)
    }

>     rule parsetag :w {
>         <lt> $tagname := <identifier>
>         %attrs := [ (<identifier>) =
>                     (<val>)
>                   ]*
>         /?
>         <gt>
>     }
>
>     for <$fh> {
>         while m:e/<parsetag>/ {
>             print "Found tag $0{tagname}\n";
>             print " $a = '$v'\n" for $0{attrs}.kv -> $a, $b;
>         }
>     }
>
> My questions are:
>
> 1. Does the match in my <val> rule get passed correctly? I.e., I have
>    parens in alternations, will whichever one gets matched become the
>    return value of the whole rule?

I believe not, since you're also capturing the delimiter.

I've discussed with Larry the need to do complex capturing internally but
still return a simple value. My suggested solutions were either a special
assertion:

    rule val {
        [
            $delim := <['"]>
            $data  := ( [\\?.]*? )
            $delim
          |
            $data  := (\S+)
        ]
        <return $data>
    }

or a special hypothetical variable:

    rule val {
        $delim  := <['"]>
        $RETURN := ( [\\?.]*? )
        $delim
      |
        $RETURN := (\S+)
    }

I don't know which, if either, he will approve. Maybe both!

> 2. Did I do the binding to the hash correctly?

Yes, indeed.

> 3. Will my I/O loop work like I expect?

Depends what you expect! ;-)

But you probably want this:

    for <$fh> {
        while m:c/<parsetag>/ {
            print "Found tag $0{tagname}\n";
            for $0{attrs}.kv -> $a, $b {
                print " $a = '$b'\n"
            }
        }
    }

That is, the :c modifier rather than :e, since you're looping one match at
a time via a C<while>.
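For anyone who wants to experiment before a Perl 6 implementation exists, here is a rough Python analogue built on the standard `re` module. This is a swapped-in technique, not Perl 6 semantics. It mirrors the corrected val rule by matching a quoted or bare value and handing back only the data. It mirrors the :c loop by resuming each search where the previous match ended. The names `TAG`, `ATTR` and `parse_tags` are hypothetical. Like the original, it is a deliberately rough hack: identifiers are ASCII-only, and quoted values may not contain '>'.

```python
import re

# Hypothetical analogue of the parsetag rule; not a real XML parser.
# Identifiers are approximated as an ASCII letter followed by \w*, and the
# attribute region stops at the first '>', so quoted values may not contain '>'.
TAG = re.compile(r"<(?P<tag>[A-Za-z]\w*)(?P<attrs>[^>]*?)\s*/?>")

# Hypothetical analogue of the corrected val rule: the value is either
# quoted (same delimiter at both ends, backslash escapes allowed) or a bare
# run of non-whitespace, like (\S+).
ATTR = re.compile(
    r"""(?P<name>[A-Za-z]\w*)\s*=\s*
        (?: (?P<delim>['"]) (?P<quoted>(?:\\?.)*?) (?P=delim)
          | (?P<bare>\S+) )""",
    re.VERBOSE,
)


def parse_tags(line):
    """Yield (tagname, attrs) one tag at a time, resuming each search where
    the previous match ended, much as the m:c loop does."""
    pos = 0
    while (m := TAG.search(line, pos)) is not None:
        attrs = {}
        for a in ATTR.finditer(m.group("attrs")):
            # Keep only the data, never the delimiter (cf. $RETURN / <return $data>).
            if a.group("delim") is not None:
                attrs[a.group("name")] = a.group("quoted")
            else:
                attrs[a.group("name")] = a.group("bare")
        yield m.group("tag"), attrs
        pos = m.end()  # every match is non-empty, so the loop always advances


line = r'''<a href="x.html" title='Home page'> text <br/> <img src=logo.png alt="a \"quoted\" word"/>'''
for tagname, attrs in parse_tags(line):
    print(f"Found tag {tagname}")
    for a, b in attrs.items():
        print(f" {a} = '{b}'")
```

Collecting every match up front, e.g. with `list(TAG.finditer(line))`, and then iterating over that list is closer in spirit to the :e version.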
If you want to grab every match at once and then iterate over them, you do
want :e, but you also want a C<for>:

    for <$fh> {
        for m:e/<parsetag>/ -> $match {
            print "Found tag $match{tagname}\n";
            for $match{attrs}.kv -> $a, $b {
                print " $a = '$b'\n"
            }
        }
    }

Oh, and you can't use the -> on a postfix C<for> modifier, only on a prefix
C<for> statement.

Great to see someone so keen to start programming in Perl 6!

Damian