On Dec 26, 2007, at 10:44 AM, Nico Weber wrote:
the crash I reported and fixed earlier ( http://lists.cs.uiuc.edu/pipermail/cfe-dev/2007-December/000745.html
) happened because `Tok.getIdentifierInfo()` sometimes returns 0.
Right. Tokens that are not "pp-identifiers" in the lexer do not have
an identifier pointer. This includes tokens like numbers (1), strings
("foo"), etc.
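To make the null case concrete, here is a minimal sketch. The `Token`, `IdentifierInfo`, and `TokenKind` types below are simplified, hypothetical stand-ins rather than the real declarations from Token.h, and `identifierNameOrEmpty` is a made-up helper. It only shows that a non-identifier token carries a null pointer and that callers must check before dereferencing.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins for clang's classes.
struct IdentifierInfo {
  std::string Name;
};

enum class TokenKind { identifier, numeric_constant, string_literal };

class Token {
  TokenKind Kind;
  const IdentifierInfo *II;  // null for anything that is not a pp-identifier

public:
  explicit Token(TokenKind K, const IdentifierInfo *Info = nullptr)
      : Kind(K), II(Info) {}

  TokenKind getKind() const { return Kind; }

  // May return null, e.g. for numbers and string literals.
  const IdentifierInfo *getIdentifierInfo() const { return II; }
};

// Made-up helper: returns the identifier's name, or "" when the token
// has no identifier info at all.
std::string identifierNameOrEmpty(const Token &Tok) {
  if (const IdentifierInfo *II = Tok.getIdentifierInfo())
    return II->Name;
  return "";
}
```

The null check inside the helper is the part that the crashing call sites were missing.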
These conditions are not clearly documented in Token.h, and even if
they were documented, functions that may or may not return 0 are
generally error-prone. So I grepped clang for calls to
`getIdentifierInfo()`. I found two places where this function was
not handled correctly. Tests to reproduce the crashes and makeshift
patches are attached (Someone familiar with the code needs to look
at the FIXMEs in the patch. Problems were related to ObjC's @try/
@catch and ObjC2 @interface prefixes).
Nice! Your patch looks exactly right, I applied it here (after
tweaking the expected-error stuff):
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20071224/003579.html
(Why is it a good idea to treat stuff like @try as two tokens
instead of one?)
The answer is that things like @ /*comment*/ try are legal, sadly
enough. However, it seems that we could probably do something in the
lexer (when it sees the "@") to handle this. I'll see what I can do
about this when I have time.
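As a rough illustration of what "seeing the @" would have to cope with, here is a sketch of skipping whitespace and block comments between the "@" and the word after it. `identifierAfterAt` is a hypothetical helper working on a plain string. It is not how clang's lexer is structured, it ignores `//` comments, and it does not check that the word starts with a letter.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of clang. Pos is the index just past an '@'.
// Skips whitespace and /* ... */ comments, then returns the run of
// identifier characters that follows, or "" if there is none.
std::string identifierAfterAt(const std::string &Src, std::size_t Pos) {
  for (;;) {
    while (Pos < Src.size() &&
           std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
    if (Src.compare(Pos, 2, "/*") != 0)
      break;  // not at a block comment, stop skipping
    std::size_t End = Src.find("*/", Pos + 2);
    if (End == std::string::npos)
      return "";  // unterminated comment
    Pos = End + 2;
  }

  std::size_t Start = Pos;
  while (Pos < Src.size() &&
         (std::isalnum(static_cast<unsigned char>(Src[Pos])) ||
          Src[Pos] == '_'))
    ++Pos;
  return Src.substr(Start, Pos - Start);
}
```

With something along these lines, `@try` and `@ /*comment*/ try` would both yield `try` for the same starting position after the "@".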
Furthermore, I'd suggest to at least use an assert if you know that
`getIdentifierInfo()` can't return 0 and rely on it. Doing an
`assert(Tok.getIdentifierInfo() && "foo always has ident info")`
serves as good documentation.
Well, in theory the code should only call and dereference
getIdentifierInfo if it already knows the pointer is non-null. If it
isn't clear from the context of the call in the code, adding an assert
makes sense.
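A small sketch of the assert-as-documentation pattern, modeled loosely on the typeof case discussed further down. The `Token` and `IdentifierInfo` types here are simplified, hypothetical stand-ins, and `spellingOfTypeofKeyword` is a made-up helper rather than a function in the parser.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for clang's classes.
struct IdentifierInfo {
  std::string Name;
};

enum class TokenKind { identifier, kw_typeof, numeric_constant };

class Token {
  TokenKind Kind;
  const IdentifierInfo *II;

public:
  explicit Token(TokenKind K, const IdentifierInfo *Info = nullptr)
      : Kind(K), II(Info) {}

  bool is(TokenKind K) const { return Kind == K; }
  const IdentifierInfo *getIdentifierInfo() const { return II; }
};

// Made-up helper: the caller promises Tok is a typeof keyword. The asserts
// state that precondition at the point where the pointer is dereferenced.
std::string spellingOfTypeofKeyword(const Token &Tok) {
  assert(Tok.is(TokenKind::kw_typeof) && "Not a typeof specifier");
  const IdentifierInfo *BuiltinII = Tok.getIdentifierInfo();
  assert(BuiltinII && "keyword tokens always have ident info");
  return BuiltinII->Name;
}
```

The second assert is redundant when the first one holds, which is the argument for only adding it where the context does not already make the invariant obvious.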
In the following places it was not immediately clear to me why the
code is valid and `getIdentifierInfo` can't possibly return 0 (line
numbers relative to rev 45360):
Lex/MacroExpander.cpp:
line 324
This is safe because previous code verified that the macro arguments
are identifiers.
#define A(1)
should be rejected earlier. Adding an assert would make sense.
Lex/Preprocessor.cpp:
2222
2253
2329
The calls to ReadMacroName verify that the name is an identifier.
Parse/ParseDecl.cpp:
101
I'm not sure about this. That call is only reachable if
"Tok.is(tok::identifier) || isDeclarationSpecifier()". It is unclear
to me that all declspecs have identifiers. Steve?
1467
assert(Tok.is(tok::kw_typeof) && "Not a typeof specifier");
const IdentifierInfo *BuiltinII = Tok.getIdentifierInfo();
The assertion verifies that the token is a keyword, which has an
identifier ptr. This code is trying to preserve __typeof__ vs typeof
in a diagnostic.
Parse/ParseExpr.cpp:
216
ParseExpressionWithLeadingIdentifier is only called with an identifier
as IdTok.
247
likewise for ParseAssignmentExprWithLeadingIdentifier.
(785)
This is only called with these 4 keywords as the current token:
case tok::kw___builtin_va_arg:
case tok::kw___builtin_offsetof:
case tok::kw___builtin_choose_expr:
case tok::kw___builtin_types_compatible_p:
Parse/Parser.cpp:
377 (one of the bugs, fixed with the patch)
Ok.
Parse/ParseObjc.cpp:
304
325 (but only because of strange indentation because of tabs instead
of spaces -- fixed in the attached patch as well)
(476)
1130 (one of the bugs, fixed with the patch)
1164 (one of the bugs, fixed with the patch)
1235
Even better than adding asserts in these lines is to catch this
problem with the compiler (for example, by putting
`getIdentifierInfo()` in a subclass and never letting it return 0. Then
you _have_ to check for the right token type to call the method),
but that's a bit of work :-P
This would also require the Token class to be polymorphic, which is a
non-starter. Another potential solution would be to make
getIdentifierInfo() always assert that the pointer is non-null. This
would require callers to call Tok.hasIdentifierInfo() if they don't
know it is valid or to add a getIdentifierInfoOrNull() method.
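A sketch of that second option, to show what callers would look like. The method names come from the paragraph above, but none of this exists in this form: the `Token` and `IdentifierInfo` types are simplified stand-ins and `describe` is a made-up caller.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for clang's classes.
struct IdentifierInfo {
  std::string Name;
};

class Token {
  const IdentifierInfo *II;

public:
  explicit Token(const IdentifierInfo *Info = nullptr) : II(Info) {}

  bool hasIdentifierInfo() const { return II != nullptr; }

  // Asserting accessor: only legal when the token has identifier info.
  const IdentifierInfo *getIdentifierInfo() const {
    assert(II && "token has no identifier info");
    return II;
  }

  // Escape hatch for callers that want to test for null themselves.
  const IdentifierInfo *getIdentifierInfoOrNull() const { return II; }
};

// Made-up caller that does not know up front what kind of token it has.
std::string describe(const Token &Tok) {
  if (!Tok.hasIdentifierInfo())
    return "<no identifier>";
  return Tok.getIdentifierInfo()->Name;
}
```

This keeps Token non-polymorphic. The cost is that a wrong call is only caught at run time in builds with assertions enabled, not by the compiler.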
An unrelated crash that I found on the way is:
int main()
{
id a;
[a bla:0 6:7];
}
(crashes somewhere in sema, something like this should be put in
test/Parse/objc-messaging-1.m)
And here's an inconsistency with gcc:
int @interface bla ; // ?? this is valid objc?
@end
I have no idea what this code is supposed to do, but clang accepts it
without a warning while gcc doesn't even compile it.
I'll let Steve and Fariborz chime in on these.
ps: I also converted a few tabs to spaces
Thanks! It would make it easier to review the patch if you kept the
mechanical pieces separate from the changes that require review, but I
appreciate the patch.
As an aside, things will probably pick up in early January; many
people are out for the holidays.
-Chris