Duncan Murdoch wrote:
On 3/20/2009 2:56 PM, romain.franc...@dbmail.com wrote:
It happens in the token function in gram.c:
    c = SkipSpace();
    if (c == '#') c = SkipComment();

and then SkipComment goes like this:
static int SkipComment(void)
{
    int c;
    while ((c = xxgetc()) != '\n' && c != R_EOF) ;
    if (c == R_EOF) EndOfFile = 2;
    return c;
}

which effectively drops comments.

Would it be possible to keep the information somewhere?
The source code says this:
 *  The function yylex() scans the input, breaking it into
 *  tokens which are then passed to the parser.  The lexical
 *  analyser maintains a symbol table (in a very messy fashion).

so my question is: could we use this symbol table to keep track of, say, COMMENT tokens?
Why would I even care about that? I'm writing a package that will
perform syntax highlighting of R source code based on the output of the
parser, and it seems a waste to drop the comments.
And also, when you print a function to the R console you don't get the comments, and some of them might be useful to the user.
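
For example, with source references switched off, the comment disappears when the function is printed, since what gets printed is the deparsed parse tree:

options(keep.source = FALSE)
f <- function(x) {
    # double the input
    x * 2
}
f
## function (x)
## {
##     x * 2
## }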

Am I mad if I contemplate looking into this?

Comments are syntactically the same as whitespace. You don't want them to affect the parsing.

Well, you might, but there is quite some madness lying that way.

Back in the bronze age, we did actually try to keep comments attached to (AFAIR) the preceding token. One problem is that the elements of the parse tree typically involve multiple tokens, and if comments after different tokens get stored in the same place, something is not going back where it came from when deparsing. So we had problems with comments moving from one end of a loop to the other, and the like.

You could try extending the scheme by encoding which part of a syntactic structure the comment belongs to, but consider, for instance, how many places in a function call you can stick in a comment:

f #here
( #here
a #here (possibly)
= #here
1 #this one belongs to the argument, though
) #but here as well


If you're doing syntax highlighting, you can determine the whitespace by looking at the srcref records, and then scan that text to see what isn't being counted as tokens. (I think you'll find a few things there besides whitespace, but it is a fairly limited set, so it shouldn't be too hard to recognize.)
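
Something along these lines (a rough, untested sketch; find_comments is just a made-up name) would pick out the comments that sit in the gaps between the top-level srcrefs:

find_comments <- function(file) {
    exprs <- parse(file, keep.source = TRUE)
    lines <- readLines(file)
    refs  <- attr(exprs, "srcref")       # one srcref per top-level expression
    covered <- rep(FALSE, length(lines))
    for (r in refs)
        covered[r[1]:r[3]] <- TRUE       # lines spanned by parsed code
    ## uncovered lines hold only whitespace, semicolons and comments;
    ## trailing comments on covered lines would also need the column
    ## positions stored in each srcref
    grep("#", lines[!covered], value = TRUE)
}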

The Rd parser is different, because in an Rd file, whitespace is significant, so it gets kept.

Duncan Murdoch


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907
