salman-javed-nz marked an inline comment as not done.
salman-javed-nz added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp:28
+/// Replace invalid characters in the identifier with '_'.
+static std::string replaceInvalidChars(StringRef Identifier) {
+  std::string Fixed{Identifier};
----------------
salman-javed-nz wrote:
> aaron.ballman wrote:
> > salman-javed-nz wrote:
> > > aaron.ballman wrote:
> > > > This fixes some issues but I think it introduces new issues too. The 
> > > > old code would accept Unicode characters and pass them along as (valid) 
> > > > identifier characters. The new code will replace all the Unicode 
> > > > characters with `_` which means some people's header guards may go from 
> > > > useful to `_______________`. We should definitely add test cases for 
> > > > Unicode file names.
> > > > 
> > > > Ultimately, I think this functionality will want to use 
> > > > `UnicodeCharSets.h` to implement something along the lines of what's 
> > > > done in `isAllowedInitiallyIDChar()` and `isAllowedIDChar()` from 
> > > > Lexer.cpp.
> > > `isAllowed*IDChar()`does exactly what I want, but is a static function in 
> > > Lexer.cpp.
> > > Should I be changing its declaration so that it's exposed via Lexer.h?
> > > I don't know what the bar is to warrant changing core public Clang API. 
> > > 
> > > If I can just call the function directly from llvm-header-guard, I can 
> > > avoid code duplication.
> > > 
> > > 
> > We could expose the interface from `Lexer` if we feel that's the right 
> > approach. I was going back and forth on whether it was the right approach 
> > and eventually thought that we don't want to expose it directly because I 
> > don't think we want clang-tidy to care about things like 
> > `LangOpts.AsmPreprocessor` or even `LangOpts.DollarIdents`.
> > 
> > So my recommendation is to put this into LexerUtils.h/.cpp in clang-tidy 
> > rather than changing the Clang Lexer interface. WDYT?
> //(For brevity, when I say `isAllowedIDChar`, I mean both `isAllowedIDChar` 
> and `isAllowedInitiallyIDChar`)//
> 
> I see two possible directions:
> 
> If I expose the interface from Lexer.h, then in Clang-Tidy, it's only a 
> matter of calling 
> `isAllowedIDChar(C, Context->getLangOpts())`. Clang-Tidy doesn't need to care 
> about the contents of `LangOpts` - it just needs to pass them on as-is to the 
> `isAllowedIDChar()` function.
> `getLangOpts()` is already used all over the Clang-Tidy code base.
> 
> The other way is to create a copy of `isAllowedIDChar()` in the Clang-Tidy 
> project and take out the parts to do with `LangOpts.AsmPreprocessor` and 
> `LangOpts.DollarIdents`.  Now we have an implementation tailored to just the 
> things Clang-Tidy cares about. I will need to add `clangLex` to 
> CMakeLists.txt to gain visibility of clang/lib/Lex/UnicodeCharSets.h's 
> character tables.
I need to make a correction to my previous comment (I swear Phabricator had an 
"edit comment" button but I just don't see it at the moment).

> The other way is to create a copy of `isAllowedIDChar()` in the Clang-Tidy 
> project and take out the parts to do with `LangOpts.AsmPreprocessor` and 
> `LangOpts.DollarIdents`.  Now we have an implementation tailored to just the 
> things Clang-Tidy cares about. I will need to add `clangLex` to 
> CMakeLists.txt to gain visibility of clang/lib/Lex/UnicodeCharSets.h's 
> character tables.

Of course clangTidyUtils already depends on clangLex, so we don't need another 
`add_clang_library` line in CMakeLists.txt, contrary to what I said earlier. 
However, UnicodeCharSets.h is still doesn't come up in Clang-Tidy's header 
search path because it doesn't sit in the include/clang folder. The file will 
need moving.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114149/new/

https://reviews.llvm.org/D114149

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to