Dave Storrs wrote:
> why didn't you have to write:
>
> rule ugly_c_comment {
>     /
>     \/ \* [ .*? <ugly_c_comment>? ]*? \* \/
>     { let $0 := " " }
>     /
> }

Think of the curly braces as the regex quotes. If "{" is the quote, then
there's nothing special about "/" and it doesn't need to be escaped. Also,
I don't think you want spaces between "/" and "*", because "/ *" isn't a
comment delimiter.

> 2) As written, I believe that the ugly_c_comment rule would permit nested
> comments (that is, /* /**/ */), but would break if the comments were
> improperly nested (e.g., /* /* */). Is that correct?

It wouldn't fail, but it would scan to EOF and then backtrack. Basically,
the inner <ugly_c_comment> succeeds and then the rest of the file is
scanned for <'*/'>. When that fails, the regex backtracks to the inner
<ugly_c_comment>, fails that, and then skips the unbalanced "/*" with .*?.

I'd like to add ::: to fail the entire comment if the inner comment fails,
but I'm not sure how to do it. Does this work?

    /\* [ .*? | <ugly_c_comment> ::: ]*? \*/

> 3) The rule will replace the comment with a single, literal space. Why is
> this replacement necessary...isn't it sufficient to simply define it as
> whitespace, as was done above?

Probably. I think it's a holdover from thinking in terms of parser vs.
lexer, but that may not be true, depending on how the rest of the grammar
uses whitespace. IMHO the value bound to the whitespace production should
be the actual text (the comment, in this case).

- Ken
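
For readers who want to experiment with the nesting question, here is a minimal Python sketch of the stricter behaviour asked about above: a comment with an unbalanced inner "/*" is rejected outright instead of being rescued by backtracking. It is not Perl 6 and does not reproduce the rule engine. The function name `match_nested_comment` is hypothetical, and the depth counter is a swapped-in technique standing in for the recursive <ugly_c_comment> call.

```python
def match_nested_comment(text, start=0):
    """Return the index just past a balanced /* ... */ comment that begins
    at `start`, or None if no comment starts there or it never balances.

    Assumption: comments nest, so every inner "/*" needs its own "*/".
    """
    if not text.startswith("/*", start):
        return None

    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # entering a (possibly nested) comment
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # leaving the innermost open comment
            i += 2
            if depth == 0:
                return i        # outermost comment is closed
        else:
            i += 1              # ordinary comment text

    # Ran off the end with a comment still open: fail the whole match
    # in one pass, with no second scan of the input.
    return None


if __name__ == "__main__":
    for sample in ["/* /**/ */", "/* /* */"]:
        print(repr(sample), "->", match_nested_comment(sample))
```

With this approach the properly nested example from the question is accepted in full, while the improperly nested one is reported as a failure after a single scan to the end of the input.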