whisperity added inline comments.

================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:32-35
+/**
+ * Build a skeleton out of the Original identifier, following the algorithm
+ * described in http://www.unicode.org/reports/tr39/#def-skeleton
+ */
----------------
`/*` -> `//` or `///` (latter is for documentation comments)


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:41-42
+
+  char const *Curr = SName.c_str();
+  char const *End = Curr + SName.size();
+  while (Curr < End) {
----------------
(Nit: in LLVM we use //"const west"//)


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:48
+    llvm::ConversionResult Result = llvm::convertUTF8Sequence(
+        (const llvm::UTF8 **)&Curr, (const llvm::UTF8 *)End, &CodePoint,
+        llvm::strictConversion);
----------------
These casts are weird when reading the code. The problem with C-style casts is 
that the compiler will try to resolve it to //some// kind of cast, which makes 
them less "explicit" than would be good for type safety. Could you replace 
these with `static_cast`s? Or are these, in fact, `reinterpret_cast`s?


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:64-68
+      llvm::UTF8 Buffer[32];
+      llvm::UTF8 *BufferStart = std::begin(Buffer);
+      llvm::UTF8 *IBuffer = BufferStart;
+      const llvm::UTF32 *ValuesStart = std::begin(Where->values);
+      const llvm::UTF32 *ValuesEnd =
----------------
(Nit: consider adding a `using namespace llvm;` **inside** the function and 
then you could reduce the length of these types.)


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:84
+  if (const auto *ND = Result.Nodes.getNodeAs<NamedDecl>("nameddecl")) {
+    if(  IdentifierInfo *   II = ND->getIdentifier()) {
+      StringRef NDName = II->getName();
----------------
(Nit: formatting is broken here.)


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:86-88
+      auto &Mapped = Mapper[skeleton(NDName)];
+      auto *NDDecl = ND->getDeclContext();
+      for (auto *OND : Mapped) {
----------------
(Nit: we tend to use `auto` if and only if the type of the variable is either 
reasonably impossible to spell out (e.g. an iterator into a container with 
3784723 parameters) or immediately obvious from the initialiser expression 
(e.g. the init is a cast).)


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:93
+        if (OND->getName() != NDName) {
+          diag(OND->getLocation(), "%0 is confusable with %1")
+              << OND->getName() << NDName;
----------------
Perhaps you could elaborate on "confusable" here a bit for the user?


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:94
+          diag(OND->getLocation(), "%0 is confusable with %1")
+              << OND->getName() << NDName;
+          diag(ND->getLocation(), "other definition found here",
----------------
Careful: `getName()` might assert away if the identifier isn't "trivially 
nameable", e.g. `operator==`. Is this prevented somewhere? (Otherwise, could we 
add a negative test case just to ensure no crashes start happening?) Operators 
are also `NamedDecl`s.


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.cpp:95
+              << OND->getName() << NDName;
+          diag(ND->getLocation(), "other definition found here",
+               DiagnosticIDs::Note);
----------------
Definition or declaration? "Entity"?


================
Comment at: clang-tools-extra/clang-tidy/misc/Homoglyph.h:18
+
+class Homoglyph : public ClangTidyCheck {
+public:
----------------
This is the "top-level" check that is the addition in this patch, right?

In that case, this class is breaking naming conventions (usually the symbol 
name ends with `Check`) and the "usual" documentation comment from the class is 
also missing.

-----

Also, maybe a full rename to `ConfusableIdentifierCheck` would be worth it. If 
you'd like to stick to the current summary of the patch.


================
Comment at: clang-tools-extra/clang-tidy/misc/MiscTidyModule.cpp:41
         "misc-misleading-bidirectional");
+    CheckFactories.registerCheck<Homoglyph>("misc-homoglyph");
     CheckFactories.registerCheck<MisleadingIdentifierCheck>(
----------------
The comment about coming up with a better name for the check I think applies 
here too. I would be comfortable with `misc-confusable-identifiers`. The 
problem with `homoglyph` is that it requires a specific understanding of 
English and (natural) language studies that we should not hard-require from our 
users. If I have to look up on Wikipedia what the name of the rule I'm using 
means then something is wrong.


================
Comment at: clang-tools-extra/docs/ReleaseNotes.rst:141
+
+  Detects confusable unicode identifiers.
+
----------------
**U**nicode? Isn't it a trademark?


================
Comment at: clang-tools-extra/docs/clang-tidy/checks/misc-homoglyph.rst:8
+each other, but use different Unicode characters. This detects a potential
+attack described in `Trojan Source <https://www.trojansource.codes>`_.
+
----------------
Is the associated research paper **published** anywhere? I can only come up 
with arXiv preprints. However, adding a link to arXiv would be more "future 
proof" than the (agreeably fancy) website itself. The site mentions one and the 
research paper mentions two CVE numbers for the attack vector... perhaps we 
should just link the CVE database.


================
Comment at: clang-tools-extra/test/clang-tidy/checkers/misc-homoglyph.cpp:16-19
+int 𝐟i;
+// CHECK-MESSAGES: :[[#@LINE-1]]:5: warning: 𝐟i is confusable with fi 
[misc-homoglyph]
+int fi;
+// CHECK-MESSAGES: :[[#@LINE-1]]:5: note: other definition found here
----------------
(Nit: If we keep the message blocks together they will be more resilient to 
accidental modifications. You can have any kind of offset in the test comment, 
anyway.)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112916/new/

https://reviews.llvm.org/D112916

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to