Trey Harris wrote:

> On second reading, it occurs to me that this wouldn't work quite right,
> because the :w would imply a \s+ between <lt> and <identifier>, between
> the equals, and before the <gt>.

No. Under :w you get \s+ between literal sequences that are potential identifiers, and 
\s* between anything else. So your:

> rule parsetag :w {
>    <lt> $tagname :=    <identifier>
>         %attrs   := [ (<identifier>) =
>                       (<val>)
>                     ]*
>    /?
>    <gt>
> }

is really:

  rule parsetag :w {
     \s* <lt> \s* $tagname :=    <identifier>
                  %attrs   := [ \s* (<identifier>) \s* =
                                \s* (<val>)
                              ]*
     \s* /?
     \s* <gt>
  }

Which matches valid tags (and some invalid ones too).
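
A rough way to try that expansion is to transcribe it into a Python regular expression. This is only a sketch: the literal "<" and ">" for <lt> and <gt>, and the definitions of <identifier> and <val>, are assumptions, since the thread never spells those subrules out. The names IDENT, VAL, PARSETAG and parse_tag are made up for the illustration.

```python
import re

# Assumed stand-ins for the subrules; the thread does not define them.
IDENT = r"[A-Za-z_]\w*"   # hypothetical <identifier>
VAL = r'"[^"]*"'          # hypothetical <val>: a double-quoted string

# Transcription of the expanded rule: every gap that :w fills becomes \s*.
PARSETAG = re.compile(
    rf"\s*<\s*(?P<tagname>{IDENT})"
    rf"(?P<attrs>(?:\s*{IDENT}\s*=\s*{VAL})*)"
    rf"\s*/?\s*>"
)

def parse_tag(text):
    """Return the tag name if the whole string matches, else None."""
    m = PARSETAG.fullmatch(text)
    return m.group("tagname") if m else None
```

With these assumed subrules, '< img src = "a.png" / >' is accepted, and '<ab="y">' matches as tag "a" with attribute "b", which is one way the all-optional whitespace lets questionable input through.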



> Does an explicit space assertion in :w
> automatically suppress the implicit ones on either side?

Yes.


> I.e., would
> 
> rule parsetag :w {
>    <lt> \s* $tagname := <identifier>
>             %attrs := [ (<identifier>) = <val> ]*
>    \s* /?
>    <gt>
> }
> 
> Work?  Or would I have to be explicit about everything:

To get the (lack-of-)spacing rules you probably desire, you'd have 
to be explicit only where the default rules are inappropriate:

  rule parsetag {
     <lt>[$tagname:=<identifier>] \s+
          %attrs := [ (<identifier>)=(<val>) ]*
     /?<gt>
  }


> It strikes me that this is a problem crying out for a DWIMmy
> solution--something that could deal with whitespace in a common way, i.e.,
> required between tokens that can't otherwise be differentiated....  am I
> missing something?

Yes. You're missing:

    Another new modifier is :w, which causes an implicit match of
    whitespace wherever there's literal whitespace in a pattern. In
--> other words, it replaces every sequence of actual whitespace in
--> the pattern with a \s+ (between two identifiers) or a \s*
--> (between anything else). So 
    
      m:w/ foo bar \: ( baz )*/ 
              ^   
    really means (expressed in Perl 5 form):
     
      m:p5/\s*foo\s+bar\s*:(\s*baz\s*)*/ 
                 ^^^      
    You can still control the handling of whitespace under :w,
    since we extend the rule to say that any explicit
    whitespace-matching token can't match whitespace implicitly on
    either side. So: 
    
      m:w/ foo\ bar \h* \: (baz)*/ 
      
    really means (expressed in Perl 5 form): 
    
      m:p5/\s*foo bar[\040\t\p{Zs}]*:\s*(baz)*/ 
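
The two Perl 5 forms quoted above can be checked roughly with Python's re module. This is a sketch under one stated assumption: re has no \p{Zs}, so the character class in the second pattern is narrowed to space and tab. The names LOOSE, TIGHT and accepts are hypothetical.

```python
import re

# First quoted example: m:w/ foo bar \: ( baz )*/
LOOSE = re.compile(r"\s*foo\s+bar\s*:(\s*baz\s*)*")

# Second quoted example: m:w/ foo\ bar \h* \: (baz)*/
# Assumption: [ \t] stands in for [\040\t\p{Zs}], since re lacks \p{Zs}.
TIGHT = re.compile(r"\s*foo bar[ \t]*:\s*(baz)*")

def accepts(pattern, text):
    """True if the whole string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

Here "foobar:" is rejected by the first form because \s+ needs at least one whitespace character, and "foo  bar:" (two spaces) is rejected by the second because the escaped space matches exactly one space.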


Damian
