Alex and Akim, you might also want to review Paul Eggert's changes as they rewrite a significant portion of Alex's work, and Paul has already pushed his changes to master. My review is below.
On Sun, 9 Jan 2011, Paul Eggert wrote: > On 01/09/2011 05:39 AM, Joel E. Denny wrote: > > capturing the full sequence of characters that the user might have > > accidentally intended as part of the symbol name enables Bison to give > > more helpful errors and warnings. > > That is an advantage, but it is a relatively minor one compared to the > confusion engendered by overly complex rules. Even now I don't > fully understand them, and I expect users are likely to be in the > same boat. For example, Bison currently complains if you jam a > number next to an identifier, like this: > > 123FOO > > because that's confusing. Even programmers who don't use Bison would likely agree that, without whitespace, that should be one token not two. The only question is whether it's a valid token. Unfortunately, at least as far back as 1.875 and as recent as 2.4.3, Bison has treated that as two tokens. Thus, Bison accepted %token BAR 123FOO as %token BAR 123 FOO Akim fixed this in branch-2.5 and master so that 123FOO is an error. Your patch does not change this behavior, so I assume you're fine with Akim's fix. > Presumably it should also complain about this: > > -123 > > because that's an identifier (-) jammed next to a number (123). What programmer wouldn't read that as a negative integer? Before your patch, branch-2.5 and master unfortunately accepted %token FOO -123 as %token FOO - 123 Clearly that's a bug. Instead, Bison should read -123 as a negative integer and report an error as it has previously. Fixing that would be easy. However, after your patch, Bison actually accepts -123 as an identifier. That isn't intuitive at all. > But then, why not also complain about this? > > -.234FOO > > as that also looks like a number (-.234) jammed next to an identifier? Bison understands integers. It doesn't understand "." as a decimal point. It should be fairly easy for users to grasp that concept. Even Yacc treats "." as a dot not a decimal point. Our documentation simply needs to be clear about what is an identifier: An identifier can be any sequence of letters, underscores, periods, dashes, and digits that does not start with an integer (unsigned or negative). Thus, it's easy to understand that -.234FOO is an identifier, -123 is an integer not an identifier, and 123FOO is an error. I see nothing unintuitive about those cases. Even so, I could also live with either of the following, but I think they're unnecessarily restrictive: An identifier can be any sequence of letters, underscores, periods, dashes, and digits that does not start with an integer or float (unsigned or negative). An identifier can be any sequence of letters, underscores, periods, dashes, and digits that does not start with a digit or dash. In summary, you apparently want to say that negative integers can be identifiers while unsigned integers cannot. I think you're heading in the wrong direction. > > Feel free to propose a patch. > > How about the following patch? It tries to simplify things, at some > cost in niceties in diagnostics. I pushed it into the trunk. Further > comments are welcome. You started your email by arguing that you want to make improvements to help the user. However, when I look through your test suite changes (where you should be testing all such improvements), all I see is a loss of diagnostic information provided to the user. Specifically, many "possibly meant" messages are now missing. Sorry, my opinion is that your patch does more harm than good and should be backed out. However, I would very much like for Alex and Akim to give their opinions.