[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-21 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea abandoned this revision.
aganea added a comment.

Please see the other reviews I've sent.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65906/new/

https://reviews.llvm.org/D65906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-09 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

In D65906#1622209 , @arphaman wrote:

> @aganea These are not just any invisible characters that you have, this is 
> the UTF8 BOM. Clang's Lexer skips over them if they're in the beginning of 
> the file (`Lexer::InitLexer`). The minimizer should do the same thing, so 
> ideally you would factor out the BOM detection and teach the minimizer to 
> skip past it.


Thanks for the heads up. I'll do that.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65906/new/

https://reviews.llvm.org/D65906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

@aganea These are not just any invisible characters that you have, this is the 
UTF8 BOM. Clang's Lexer skips over them if they're in the beginning of the file 
(`Lexer::InitLexer`). The minimizer should do the same thing, so ideally you 
would factor out the BOM detection and teach the minimizer to skip past it.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65906/new/

https://reviews.llvm.org/D65906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-08 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment.

In D65906#1621542 , @aganea wrote:

> In D65906#1620034 , @arphaman wrote:
>
> > Regarding the invisible characters before `#ifdef`, are you sure that's the 
> > right behavior?
>
>
> Should we then copy these invisible characters to the minimized output? 
> Believe it or not, these characters are there in our codebase since 2013, and 
> never clang has complained about it :)


Do you have an example command-line where clang doesn't error with your test 
file?  Alex gave you an example where it does error.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65906/new/

https://reviews.llvm.org/D65906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-08 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

In D65906#1620034 , @arphaman wrote:

> Regarding the invisible characters before `#ifdef`, are you sure that's the 
> right behavior?


Should we then copy these invisible characters to the minimized output? Believe 
it or not, these characters are there in our codebase since 2013, and never 
clang has complained about it :)


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65906/new/

https://reviews.llvm.org/D65906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-07 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

Thanks for figuring out these issues! It looks like you're fixing several edge 
cases, would you mind splitting the patch up for easier review?

Regarding the invisible characters before `#ifdef`, are you sure that's the 
right behavior? If you run the preprocessor over your test case, it doesn't 
accept it as a valid input, so why should the minimizer be different:

  $ clang -cc1 -DTEST  %t.c -Eonly -o -
  %t.c:3:2: error: #endif without #if
  #endif
   ^
  1 error generated.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65906/new/

https://reviews.llvm.org/D65906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65906: [clang-scan-deps] Fix edge cases in the minimizer

2019-08-07 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea created this revision.
aganea added reviewers: arphaman, dexonsmith, Bigcheese.
aganea added a project: clang.
Herald added a subscriber: tschuett.

This patch fixes:

1. Invisible characters that come just before #include, such as #ifndef. 
( were hidden, depending on the display locale). I choose to simply skip 
over non-ASCII characters.
2. Double slashes in #include directive with angle brackets not handled 
correctly: #include 
3. #error directive with quoted, multi-line content, along with CR+LF line 
endings wasn't handled correctly.


Repository:
  rC Clang

https://reviews.llvm.org/D65906

Files:
  lib/Lex/DependencyDirectivesSourceMinimizer.cpp
  test/Lexer/minimize_source_to_dependency_directives_include.c
  test/Lexer/minimize_source_to_dependency_directives_invalid_chars.c
  test/Lexer/minimize_source_to_dependency_directives_invalid_error.c

Index: test/Lexer/minimize_source_to_dependency_directives_invalid_error.c
===
--- test/Lexer/minimize_source_to_dependency_directives_invalid_error.c
+++ test/Lexer/minimize_source_to_dependency_directives_invalid_error.c
@@ -0,0 +1,16 @@
+// Test CF+LF are properly handled along with quoted, multi-line #error
+// RUN: cat %s | unix2dos | %clang_cc1 -DOTHER -print-dependency-directives-minimized-source 2>&1 | FileCheck %s
+
+#ifndef TEST
+#error "message \
+   more message \
+   even more"
+#endif
+
+#ifdef OTHER
+#include 
+#endif
+
+// CHECK:  #ifdef OTHER
+// CHECK-NEXT: #include 
+// CHECK-NEXT: #endif
Index: test/Lexer/minimize_source_to_dependency_directives_invalid_chars.c
===
--- test/Lexer/minimize_source_to_dependency_directives_invalid_chars.c
+++ test/Lexer/minimize_source_to_dependency_directives_invalid_chars.c
@@ -0,0 +1,9 @@
+// Test invisible, bad characters just before #ifdef
+// RUN: echo -n -e '\xef\xbb\xbf#ifdef TEST\n' > %t.c
+// RUN: echo '#include ' >> %t.c
+// RUN: echo '#endif' >> %t.c
+// RUN: %clang_cc1 -DTEST -print-dependency-directives-minimized-source %t.c 2>&1 | FileCheck %s
+
+// CHECK:  #ifdef TEST
+// CHECK-NEXT: #include 
+// CHECK-NEXT: #endif
Index: test/Lexer/minimize_source_to_dependency_directives_include.c
===
--- test/Lexer/minimize_source_to_dependency_directives_include.c
+++ test/Lexer/minimize_source_to_dependency_directives_include.c
@@ -0,0 +1,8 @@
+// Test double slashes in #include directive along with angle brackets. Previously, this was interpreted as comments.
+// RUN: %clang_cc1 -DTEST -print-dependency-directives-minimized-source %s 2>&1 | FileCheck %s
+
+#include "a//b.h"
+#include 
+
+// CHECK: #include "a//b.h"
+// CHECK: #include 
Index: lib/Lex/DependencyDirectivesSourceMinimizer.cpp
===
--- lib/Lex/DependencyDirectivesSourceMinimizer.cpp
+++ lib/Lex/DependencyDirectivesSourceMinimizer.cpp
@@ -113,7 +113,8 @@
 }
 
 static void skipOverSpaces(const char *, const char *const End) {
-  while (First != End && isHorizontalWhitespace(*First))
+  while (First != End &&
+ (isHorizontalWhitespace(*First) || !clang::isASCII(*First)))
 ++First;
 }
 
@@ -185,8 +186,8 @@
 }
 
 static void skipString(const char *, const char *const End) {
-  assert(*First == '\'' || *First == '"');
-  const char Terminator = *First;
+  assert(*First == '\'' || *First == '"' || *First == '<');
+  const char Terminator = *First == '<' ? '>' : *First;
   for (++First; First != End && *First != Terminator; ++First)
 if (*First == '\\')
   if (++First == End)
@@ -195,15 +196,27 @@
 ++First; // Finish off the string.
 }
 
-static void skipNewline(const char *, const char *End) {
-  assert(isVerticalWhitespace(*First));
-  ++First;
+// Returns the length of EOL, either 0 (no end-of-line), 1 (\n) or 2 (\r\n)
+static unsigned isEOL(const char *First, const char *const End) {
   if (First == End)
-return;
+return 0;
+  if (End - First > 1) {
+if (isVerticalWhitespace(First[0]) && isVerticalWhitespace(First[1]) &&
+First[0] != First[1])
+  return 2;
+  }
+  return !!isVerticalWhitespace(First[0]);
+}
 
-  // Check for "\n\r" and "\r\n".
-  if (LLVM_UNLIKELY(isVerticalWhitespace(*First) && First[-1] != First[0]))
-++First;
+static unsigned skipNewline(const char *, const char *End) {
+  unsigned Len = isEOL(First, End);
+  assert(Len);
+  First += Len;
+  return Len;
+}
+
+static bool wasLineContinuation(const char *First, unsigned Len) {
+  return First[-(int)Len - 1] == '\\';
 }
 
 static void skipToNewlineRaw(const char *, const char *const End) {
@@ -211,17 +224,22 @@
 if (First == End)
   return;
 
-if (isVerticalWhitespace(*First))
+unsigned Len = isEOL(First, End);
+if (Len)
   return;
 
-while (!isVerticalWhitespace(*First))
+do {
   if (++First == End)