Author: rsmith
Date: Mon Apr 17 18:44:51 2017
New Revision: 300515

URL: http://llvm.org/viewvc/llvm-project?rev=300515&view=rev
Log:
Fix mishandling of escaped newlines followed by newlines or nuls.

Previously, if an escaped newline was followed by a newline or a nul, we'd lex
the escaped newline as a bogus space character. This led to a bunch of
different broken corner cases:

For the pattern "\\\n\0#", we would then have a (horizontal) space whose
spelling ends in a newline, and would decide that the '#' is at the start of a
line, and incorrectly start preprocessing a directive in the middle of a
logical source line. If we were already in the middle of a directive, this
would result in our attempting to process multiple directives at the same time!
This resulted in crashes, asserts, and hangs on invalid input, as discovered by
fuzz-testing.

For the pattern "\\\n" at EOF (with an implicit following nul byte), we would
produce a bogus trailing space character with spelling "\\\n". This was mostly
harmless, but would lead to clang-format getting confused and misformatting in
rare cases. We now produce a trailing EOF token with spelling "\\\n",
consistent with our handling for other similar cases -- an escaped newline is
always part of the token containing the next character, if any.

For the pattern "\\\n\n", this was somewhat more benign, but would produce an
extraneous whitespace token to clients who care about preserving whitespace.
However, it turns out that our lexing for line comments was relying on this bug
due to an off-by-one error in its computation of the end of the comment, on the
slow path where the comment might contain escaped newlines.

Added:
    cfe/trunk/test/Lexer/newline-nul.c   (with props)
Modified:
    cfe/trunk/lib/Format/FormatTokenLexer.cpp
    cfe/trunk/lib/Lex/Lexer.cpp
    cfe/trunk/unittests/Format/FormatTestComments.cpp

Modified: cfe/trunk/lib/Format/FormatTokenLexer.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Format/FormatTokenLexer.cpp?rev=300515&r1=300514&r2=300515&view=diff
==============================================================================
--- cfe/trunk/lib/Format/FormatTokenLexer.cpp (original)
+++ cfe/trunk/lib/Format/FormatTokenLexer.cpp Mon Apr 17 18:44:51 2017
@@ -467,6 +467,9 @@ FormatToken *FormatTokenLexer::getNextTo
       if (pos >= 0 && Text[pos] == '\r')
         --pos;
       // See whether there is an odd number of '\' before this.
+      // FIXME: This is wrong. A '\' followed by a newline is always removed,
+      // regardless of whether there is another '\' before it.
+      // FIXME: Newlines can also be escaped by a '?' '?' '/' trigraph.
       unsigned count = 0;
       for (; pos >= 0; --pos, ++count)
         if (Text[pos] != '\\')

Modified: cfe/trunk/lib/Lex/Lexer.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/Lexer.cpp?rev=300515&r1=300514&r2=300515&view=diff
==============================================================================
--- cfe/trunk/lib/Lex/Lexer.cpp (original)
+++ cfe/trunk/lib/Lex/Lexer.cpp Mon Apr 17 18:44:51 2017
@@ -1282,12 +1282,6 @@ Slash:
       Size += EscapedNewLineSize;
       Ptr  += EscapedNewLineSize;
 
-      // If the char that we finally got was a \n, then we must have had
-      // something like \<newline><newline>.  We don't want to consume the
-      // second newline.
-      if (*Ptr == '\n' || *Ptr == '\r' || *Ptr == '\0')
-        return ' ';
-
       // Use slow version to accumulate a correct size field.
       return getCharAndSizeSlow(Ptr, Size, Tok);
     }
@@ -1338,12 +1332,6 @@ Slash:
       Size += EscapedNewLineSize;
       Ptr  += EscapedNewLineSize;
 
-      // If the char that we finally got was a \n, then we must have had
-      // something like \<newline><newline>.  We don't want to consume the
-      // second newline.
-      if (*Ptr == '\n' || *Ptr == '\r' || *Ptr == '\0')
-        return ' ';
-
       // Use slow version to accumulate a correct size field.
       return getCharAndSizeSlowNoWarn(Ptr, Size, LangOpts);
     }
@@ -2070,8 +2058,11 @@ bool Lexer::SkipLineComment(Token &Resul
   // Scan over the body of the comment.  The common case, when scanning, is 
that
   // the comment contains normal ascii characters with nothing interesting in
   // them.  As such, optimize for this case with the inner loop.
+  //
+  // This loop terminates with CurPtr pointing at the newline (or end of 
buffer)
+  // character that ends the line comment.
   char C;
-  do {
+  while (true) {
     C = *CurPtr;
     // Skip over characters in the fast loop.
     while (C != 0 &&                // Potentially EOF.
@@ -2097,6 +2088,8 @@ bool Lexer::SkipLineComment(Token &Resul
         break; // This is a newline, we're done.
 
       // If there was space between the backslash and newline, warn about it.
+      // FIXME: This warning is bogus if trigraphs are disabled and the line
+      // ended with '?' '?' '\\' '\n'.
       if (HasSpace && !isLexingRawMode())
         Diag(EscapePtr, diag::backslash_newline_space);
     }
@@ -2140,9 +2133,9 @@ bool Lexer::SkipLineComment(Token &Resul
         }
     }
 
-    if (CurPtr == BufferEnd+1) { 
-      --CurPtr; 
-      break; 
+    if (C == '\r' || C == '\n' || CurPtr == BufferEnd + 1) {
+      --CurPtr;
+      break;
     }
 
     if (C == '\0' && isCodeCompletionPoint(CurPtr-1)) {
@@ -2150,8 +2143,7 @@ bool Lexer::SkipLineComment(Token &Resul
       cutOffLexing();
       return false;
     }
-
-  } while (C != '\n' && C != '\r');
+  }
 
   // Found but did not consume the newline.  Notify comment handlers about the
   // comment unless we're in a #if 0 block.

Added: cfe/trunk/test/Lexer/newline-nul.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Lexer/newline-nul.c?rev=300515&view=auto
==============================================================================
Binary file - no diff available.

Propchange: cfe/trunk/test/Lexer/newline-nul.c
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: cfe/trunk/unittests/Format/FormatTestComments.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/Format/FormatTestComments.cpp?rev=300515&r1=300514&r2=300515&view=diff
==============================================================================
--- cfe/trunk/unittests/Format/FormatTestComments.cpp (original)
+++ cfe/trunk/unittests/Format/FormatTestComments.cpp Mon Apr 17 18:44:51 2017
@@ -2018,7 +2018,7 @@ TEST_F(FormatTestComments, AlignTrailing
             format("#define MACRO(V)\\\n"
                    "V(Rt2)  /* one more char */ \\\n"
                    "V(Rs) /* than here  */    \\\n"
-                   "/* comment 3 */         \\\n",
+                   "/* comment 3 */\n",
                    getLLVMStyleWithColumns(40)));
   EXPECT_EQ("int i = f(abc, // line 1\n"
             "          d,   // line 2\n"


_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to