salman-javed-nz marked an inline comment as not done.
salman-javed-nz added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp:28
+/// Replace invalid characters in the identifier with '_'.
+static std::string replaceInvalidChars(StringRef Identifier) {
+  std::string Fixed{Identifier};
----------------
aaron.ballman wrote:
> salman-javed-nz wrote:
> > aaron.ballman wrote:
> > > This fixes some issues but I think it introduces new issues too. The old 
> > > code would accept Unicode characters and pass them along as (valid) 
> > > identifier characters. The new code will replace all the Unicode 
> > > characters with `_` which means some people's header guards may go from 
> > > useful to `_______________`. We should definitely add test cases for 
> > > Unicode file names.
> > > 
> > > Ultimately, I think this functionality will want to use 
> > > `UnicodeCharSets.h` to implement something along the lines of what's done 
> > > in `isAllowedInitiallyIDChar()` and `isAllowedIDChar()` from Lexer.cpp.
> > `isAllowed*IDChar()`does exactly what I want, but is a static function in 
> > Lexer.cpp.
> > Should I be changing its declaration so that it's exposed via Lexer.h?
> > I don't know what the bar is to warrant changing core public Clang API. 
> > 
> > If I can just call the function directly from llvm-header-guard, I can 
> > avoid code duplication.
> > 
> > 
> We could expose the interface from `Lexer` if we feel that's the right 
> approach. I was going back and forth on whether it was the right approach and 
> eventually thought that we don't want to expose it directly because I don't 
> think we want clang-tidy to care about things like `LangOpts.AsmPreprocessor` 
> or even `LangOpts.DollarIdents`.
> 
> So my recommendation is to put this into LexerUtils.h/.cpp in clang-tidy 
> rather than changing the Clang Lexer interface. WDYT?
//(For brevity, when I say `isAllowedIDChar`, I mean both `isAllowedIDChar` and 
`isAllowedInitiallyIDChar`)//

I see two possible directions:

If I expose the interface from Lexer.h, then in Clang-Tidy, it's only a matter 
of calling 
`isAllowedIDChar(C, Context->getLangOpts())`. Clang-Tidy doesn't need to care 
about the contents of `LangOpts` - it just needs to pass them on as-is to the 
`isAllowedIDChar()` function.
`getLangOpts()` is already used all over the Clang-Tidy code base.

The other way is to create a copy of `isAllowedIDChar()` in the Clang-Tidy 
project and take out the parts to do with `LangOpts.AsmPreprocessor` and 
`LangOpts.DollarIdents`.  Now we have an implementation tailored to just the 
things Clang-Tidy cares about. I will need to add `clangLex` to CMakeLists.txt 
to gain visibility of clang/lib/Lex/UnicodeCharSets.h's character tables.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114149/new/

https://reviews.llvm.org/D114149

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to