Dave Storrs wrote:
> why didn't you have to write:
>
>      rule ugly_c_comment {
>         /
>                 \/ \*  [ .*? <ugly_c_comment>? ]*?  \* \/
>                 { let $0 := " " }
>         /
>      }

Think of the curly braces as the regex quotes. If "{" is the quote
then there's nothing special about "/" and it doesn't need to be
escaped. Also, I don't think you want spaces between "/" and "*"
because "/ *" isn't a comment delimiter.

> 2) As written, I believe that the ugly_c_comment rule would permit nested
> comments (that is, /* /**/ */), but would break if the comments were
> improperly nested (e.g., /* /* */).  Is that correct?

It wouldn't fail, but it would scan to EOF and then backtrack.
Basically the inner <ugly_c_comment> succeeds and then the rest
of the file is scanned for <'*/'>. When that fails, the regex
backtracks to the inner <ugly_c_comment>, fails that, and then
skips the unbalanced "/*" with .*?. I'd like to add ::: to fail
the entire comment if the inner comment fails, but I'm not sure
how to do it. Does this work?

   /\* [ .*? | <ugly_c_comment> ::: ]*? \*/
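
To make the behavior I'm after concrete, here is a rough sketch in
Python rather than Perl 6 rules. It is not a translation of the rule
above and says nothing about how ::: really behaves; the function name
match_nested_comment is made up for illustration. It tracks nesting
depth in a single forward pass, so an unclosed inner comment fails the
whole outer comment instead of being skipped by a retry.

```python
def match_nested_comment(text, start=0):
    """Return the index just past a balanced /* ... */ comment that begins
    at `start`, or None if it (or a comment nested inside it) is unclosed.

    Hypothetical helper: illustrates the intended semantics only.
    """
    if not text.startswith("/*", start):
        return None  # no comment opens here

    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # opening delimiter, go one level deeper
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # closing delimiter, come back up
            i += 2
            if depth == 0:
                return i        # the outermost comment is closed
        else:
            i += 1              # ordinary comment text
    return None                 # hit EOF with a comment still open
```

With this, "/* /**/ */" matches in full, while "/* /* */" is rejected
outright rather than being accepted with the inner "/*" treated as
plain text.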

> 3) The rule will replace the comment with a single, literal space.  Why is
> this replacement necessary...isn't it sufficient to simply define it as
> whitespace, as was done above?

Probably. I think it's a hold-over from thinking of parser vs lexer,
but that may not be true depending on how the rest of the grammar
uses white space. IMHO the value bound to the white space production
should be the actual text (the comment in this case).

- Ken
