[PATCH] D66511: [clang-scan-deps] Skip UTF-8 BOM in source minimizer
This revision was automatically updated to reflect the committed changes.
Closed by commit rL369993: [clang-scan-deps] Skip UTF-8 BOM in source minimizer (authored by aganea, committed by ).
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D66511?vs=216311=217275#toc

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66511/new/

https://reviews.llvm.org/D66511

Files:
  cfe/trunk/lib/Lex/DependencyDirectivesSourceMinimizer.cpp
  cfe/trunk/test/Lexer/minimize_source_to_dependency_directives_utf8bom.c

Index: cfe/trunk/lib/Lex/DependencyDirectivesSourceMinimizer.cpp
===
--- cfe/trunk/lib/Lex/DependencyDirectivesSourceMinimizer.cpp
+++ cfe/trunk/lib/Lex/DependencyDirectivesSourceMinimizer.cpp
@@ -834,7 +834,14 @@
   return lexDefault(Kind, Id.Name, First, End);
 }
 
+static void skipUTF8ByteOrderMark(const char *&First, const char *const End) {
+  if ((End - First) >= 3 && First[0] == '\xef' && First[1] == '\xbb' &&
+      First[2] == '\xbf')
+    First += 3;
+}
+
 bool Minimizer::minimizeImpl(const char *First, const char *const End) {
+  skipUTF8ByteOrderMark(First, End);
   while (First != End)
     if (lexPPLine(First, End))
       return true;
Index: cfe/trunk/test/Lexer/minimize_source_to_dependency_directives_utf8bom.c
===
--- cfe/trunk/test/Lexer/minimize_source_to_dependency_directives_utf8bom.c
+++ cfe/trunk/test/Lexer/minimize_source_to_dependency_directives_utf8bom.c
@@ -0,0 +1,10 @@
+// Test UTF8 BOM at start of file
+// RUN: printf '\xef\xbb\xbf' > %t.c
+// RUN: echo '#ifdef TEST\n' >> %t.c
+// RUN: echo '#include ' >> %t.c
+// RUN: echo '#endif' >> %t.c
+// RUN: %clang_cc1 -DTEST -print-dependency-directives-minimized-source %t.c 2>&1 | FileCheck %s
+
+// CHECK: #ifdef TEST
+// CHECK-NEXT: #include
+// CHECK-NEXT: #endif

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D66511: [clang-scan-deps] Skip UTF-8 BOM in source minimizer
aganea marked an inline comment as done.
aganea added inline comments.

Comment at: lib/Lex/DependencyDirectivesSourceMinimizer.cpp:822
 bool Minimizer::minimizeImpl(const char *First, const char *const End) {
+  skipUTF8ByteOrderMark(First, End);
   while (First != End)

dexonsmith wrote:
> Is skipping this the right thing, or should it also be copied to the output?

The code in `Lexer::InitLexer()` assumes that files are always encoded as UTF-8; it simply skips over the BOM, as we do here.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66511/new/

https://reviews.llvm.org/D66511
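To make the skipping behaviour discussed above concrete, here is a minimal standalone sketch. The helper mirrors the one in the patch; `stripBOM` is a hypothetical wrapper added purely for illustration and is not part of Clang. It is not a copy of the minimizer, only a demonstration that the BOM is dropped rather than copied to the output.

```cpp
#include <cassert>
#include <string>

// Mirrors the helper from the patch: if the buffer starts with the UTF-8
// byte order mark (EF BB BF), advance First past it. First is taken by
// reference so that the caller's cursor is actually moved.
static void skipUTF8ByteOrderMark(const char *&First, const char *const End) {
  assert(First <= End && "invalid buffer range");
  if ((End - First) >= 3 && First[0] == '\xef' && First[1] == '\xbb' &&
      First[2] == '\xbf')
    First += 3;
}

// Hypothetical wrapper (an assumption for this example, not a Clang API):
// returns the contents of Source with any leading BOM removed.
static std::string stripBOM(const std::string &Source) {
  const char *First = Source.data();
  const char *const End = First + Source.size();
  skipUTF8ByteOrderMark(First, End);
  return std::string(First, End);
}
```

For example, `stripBOM("\xef\xbb\xbf#ifdef TEST\n")` yields `"#ifdef TEST\n"`, while input without a BOM, or with only a partial one, is returned unchanged. The length check comes first so that buffers shorter than three bytes are never read out of bounds.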
[PATCH] D66511: [clang-scan-deps] Skip UTF-8 BOM in source minimizer
dexonsmith added inline comments.

Comment at: lib/Lex/DependencyDirectivesSourceMinimizer.cpp:822
 bool Minimizer::minimizeImpl(const char *First, const char *const End) {
+  skipUTF8ByteOrderMark(First, End);
   while (First != End)

Is skipping this the right thing, or should it also be copied to the output?

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66511/new/

https://reviews.llvm.org/D66511
[PATCH] D66511: [clang-scan-deps] Skip UTF-8 BOM in source minimizer
arphaman accepted this revision.
arphaman added a comment.
This revision is now accepted and ready to land.

LGTM

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66511/new/

https://reviews.llvm.org/D66511
[PATCH] D66511: [clang-scan-deps] Skip UTF-8 BOM in source minimizer
aganea created this revision.
aganea added reviewers: arphaman, dexonsmith, Bigcheese.
aganea added a project: clang.
Herald added a subscriber: tschuett.

As per title.

Repository:
  rC Clang

https://reviews.llvm.org/D66511

Files:
  lib/Lex/DependencyDirectivesSourceMinimizer.cpp
  test/Lexer/minimize_source_to_dependency_directives_utf8bom.c

Index: test/Lexer/minimize_source_to_dependency_directives_utf8bom.c
===
--- test/Lexer/minimize_source_to_dependency_directives_utf8bom.c
+++ test/Lexer/minimize_source_to_dependency_directives_utf8bom.c
@@ -0,0 +1,10 @@
+// Test UTF8 BOM at start of file
+// RUN: printf '\xef\xbb\xbf' > %t.c
+// RUN: echo '#ifdef TEST\n' >> %t.c
+// RUN: echo '#include ' >> %t.c
+// RUN: echo '#endif' >> %t.c
+// RUN: %clang_cc1 -DTEST -print-dependency-directives-minimized-source %t.c 2>&1 | FileCheck %s
+
+// CHECK: #ifdef TEST
+// CHECK-NEXT: #include
+// CHECK-NEXT: #endif
Index: lib/Lex/DependencyDirectivesSourceMinimizer.cpp
===
--- lib/Lex/DependencyDirectivesSourceMinimizer.cpp
+++ lib/Lex/DependencyDirectivesSourceMinimizer.cpp
@@ -812,7 +812,14 @@
   return lexDefault(Kind, Id.Name, First, End);
 }
 
+static void skipUTF8ByteOrderMark(const char *&First, const char *const End) {
+  if ((End - First) >= 3 && First[0] == '\xef' && First[1] == '\xbb' &&
+      First[2] == '\xbf')
+    First += 3;
+}
+
 bool Minimizer::minimizeImpl(const char *First, const char *const End) {
+  skipUTF8ByteOrderMark(First, End);
   while (First != End)
     if (lexPPLine(First, End))
       return true;