[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith added inline comments.



Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h:2
+//===--- MisleadingIdentifierCheck.h - clang-tidy *- 
C++
+//-*-===//
+//

This needs fixing too.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-03 Thread serge via Phabricator via cfe-commits
serge-sans-paille updated this revision to Diff 384381.
serge-sans-paille added a comment.

Fix comments


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/docs/ReleaseNotes.rst
  clang-tools-extra/docs/clang-tidy/checks/list.rst
  clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
===
--- /dev/null
+++ clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
@@ -0,0 +1,23 @@
+.. title:: clang-tidy - misc-misleading-identifier
+
+misc-misleading-identifier
+==
+
+Finds identifiers that contain Unicode characters with right-to-left direction,
+which can be confusing as they may change the understanding of a whole statement
+line, as described in `Trojan Source `_.
+
+An example of such misleading code follows:
+
+.. code-block:: c
+
+#include 
+
+short int א = (short int)0;
+short int ג = (short int)12345;
+
+int main() {
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/list.rst
===
--- clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -211,6 +211,7 @@
`llvmlibc-implementation-in-namespace `_,
`llvmlibc-restrict-system-libc-headers `_, "Yes"
`misc-definitions-in-headers `_, "Yes"
+   `misc-misleading-identifier `_,
`misc-misplaced-const `_,
`misc-new-delete-overloads `_,
`misc-no-recursion `_,
Index: clang-tools-extra/docs/ReleaseNotes.rst
===
--- clang-tools-extra/docs/ReleaseNotes.rst
+++ clang-tools-extra/docs/ReleaseNotes.rst
@@ -94,12 +94,15 @@
   Reports identifiers whose names are too short. Currently checks local
   variables and function parameters only.
 
-
 - New :doc:`readability-data-pointer ` check.
 
   Finds cases where code could use ``data()`` rather than the address of the
   element at index 0 in a container.
 
+- New :doc:`misc-misleading-identifier ` check.
+
+  Reports identifier with unicode right-to-left characters.
+
 New check aliases
 ^
 
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,31 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyContext *Context);
+  ~MisleadingIdentifierCheck();
+
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace misc
+} // namespace tidy
+} // namespace clang
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
===
--- /dev/null
+++ clang-tools-extra/clan

[PATCH] D112914: Misleading identifier detection

2021-11-04 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith accepted this revision.
rsmith added a comment.

In D112914#3102728 , @carlosgalvezp 
wrote:

> I don't know how to remove the "Requested changes" from here so I'll just 
> remove myself from reviewer.

Marking the change as accepted should overwrite the "requested changes" state.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread serge via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG299aa4dfa1d8: Misleading unicode identifier detection pass 
(authored by serge-sans-paille).

Changed prior to commit:
  https://reviews.llvm.org/D112914?vs=384381&id=385805#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/docs/ReleaseNotes.rst
  clang-tools-extra/docs/clang-tidy/checks/list.rst
  clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
===
--- /dev/null
+++ clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
@@ -0,0 +1,23 @@
+.. title:: clang-tidy - misc-misleading-identifier
+
+misc-misleading-identifier
+==
+
+Finds identifiers that contain Unicode characters with right-to-left direction,
+which can be confusing as they may change the understanding of a whole statement
+line, as described in `Trojan Source `_.
+
+An example of such misleading code follows:
+
+.. code-block:: c
+
+#include 
+
+short int א = (short int)0;
+short int ג = (short int)12345;
+
+int main() {
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/list.rst
===
--- clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -212,6 +212,7 @@
`llvmlibc-implementation-in-namespace `_,
`llvmlibc-restrict-system-libc-headers `_, "Yes"
`misc-definitions-in-headers `_, "Yes"
+   `misc-misleading-identifier `_,
`misc-misplaced-const `_,
`misc-new-delete-overloads `_,
`misc-no-recursion `_,
@@ -449,4 +450,4 @@
`hicpp-vararg `_, `cppcoreguidelines-pro-type-vararg `_,
`llvm-else-after-return `_, `readability-else-after-return `_, "Yes"
`llvm-qualified-auto `_, `readability-qualified-auto `_, "Yes"
-   
\ No newline at end of file
+   
Index: clang-tools-extra/docs/ReleaseNotes.rst
===
--- clang-tools-extra/docs/ReleaseNotes.rst
+++ clang-tools-extra/docs/ReleaseNotes.rst
@@ -101,12 +101,15 @@
   Reports identifiers whose names are too short. Currently checks local
   variables and function parameters only.
 
-
 - New :doc:`readability-data-pointer ` check.
 
   Finds cases where code could use ``data()`` rather than the address of the
   element at index 0 in a container.
 
+- New :doc:`misc-misleading-identifier ` check.
+
+  Reports identifier with unicode right-to-left characters.
+
 New check aliases
 ^
 
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,31 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyConte

[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added a comment.

@serge-sans-paille Any luck with a fix for all the buildbot failures or should 
we revert ?  https://lab.llvm.org/buildbot/#/builders/109/builds/25932


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread Aaron Puchert via Phabricator via cfe-commits
aaronpuchert added inline comments.



Comment at: 
clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:12
+
+.. code-block:: c
+

Sphinx can't parse that (perhaps unsurprisingly), could you declare that as 
having no language?

```
Warning, treated as error:
/home/buildbot/llvm-build-dir/clang-tools-sphinx-docs/llvm/src/clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:12:Could
 not lex literal_block as "c". Highlighting skipped.
```

Should be reproducible by building `docs-clang-tools-html` (might need `-j1`).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread serge via Phabricator via cfe-commits
serge-sans-paille added a comment.

Yep, I'm currently validating locally, I somehow messed up a rebase :-/


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added a comment.

In D112914#3119103 , 
@serge-sans-paille wrote:

> Yep, I'm currently validating locally, I somehow messed up a rebase :-/

I'm sorry @serge-sans-paille I missed this reply before pushing the revert :(


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread Nico Weber via Phabricator via cfe-commits
thakis added a comment.

The test seems to trigger an assert: http://45.33.8.238/linux/60293/step_9.txt

Please take a look!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-09 Thread serge via Phabricator via cfe-commits
serge-sans-paille added a comment.

In D112914#3119704 , @thakis wrote:

> The test seems to trigger an assert: http://45.33.8.238/linux/60293/step_9.txt
>
> Please take a look!

It's getting late here: Reverted, I'll have a look tomorrow.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-10 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added a comment.

@serge-sans-paille Still failing on 
https://lab.llvm.org/buildbot/#/builders/139/builds/12974


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-10 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added inline comments.



Comment at: 
clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp:3
+
+#include 
+

@serge-sans-paille Any chance you can remove the include and just forward 
declare printf()?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-10 Thread serge via Phabricator via cfe-commits
serge-sans-paille added inline comments.



Comment at: 
clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp:3
+
+#include 
+

RKSimon wrote:
> @serge-sans-paille Any chance you can remove the include and just forward 
> declare printf()?
Sure, I'll prepare a patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-11 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added inline comments.



Comment at: 
clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:12
+
+.. code-block:: c
+

aaronpuchert wrote:
> Sphinx can't parse that (perhaps unsurprisingly), could you declare that as 
> having no language?
> 
> ```
> Warning, treated as error:
> /home/buildbot/llvm-build-dir/clang-tools-sphinx-docs/llvm/src/clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:12:Could
>  not lex literal_block as "c". Highlighting skipped.
> ```
> 
> Should be reproducible by building `docs-clang-tools-html` (might need `-j1`).
I fixed this in ac8c813b89f62efcaff4d0c92fdf1a5dd29d795d as the bot was red.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-01 Thread serge via Phabricator via cfe-commits
serge-sans-paille created this revision.
Herald added subscribers: carlosgalvezp, mgorny.
serge-sans-paille requested review of this revision.
Herald added a project: clang-tools-extra.
Herald added a subscriber: cfe-commits.

Detect identifiers with right-to-left ordering through clang-tidy.
It detects a class of attack similar to the ones described in 
https://www.trojansource.codes/trojan-source.pdf


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,32 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy *- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyContext *Context);
+  ~MisleadingIdentifierCheck();
+
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace misc
+} // namespace tidy
+} // namespace clang
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
@@ -0,0 +1,164 @@
+//===--- MisleadingIdentifier.cpp - clang-tidy
+//===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "MisleadingIdentifier.h"
+
+#include "clang/Frontend/CompilerInstance.h"
+#include "llvm/Support/ConvertUTF.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedAL(llvm::UTF32 CP) {
+  return (0x0600 <= CP && CP <= 0x07BF) || (0x0860 <= CP && CP <= 0x08FF) ||
+ (0xFB50 <= CP && CP <= 0xFDCF) || (0xFDF0 <= CP && CP <= 0xFDFF) ||
+ (0xFE70 <= CP && CP <= 0xFEFF) ||
+ (0x00010D00 <= CP && CP <= 0x00010D3F) ||
+ (0x00010F30 <= CP && CP <= 0x00010F6F) ||
+ (0x0001EC70 <= CP && CP <= 0x0001ECBF) ||
+ (0x0001ED00 <= CP && CP <= 0x0001ED4F) ||
+ (0x0001EE00 <= CP && CP <= 0x0001EEFF);
+}
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedR(llvm::UTF32 CP) {
+  return (0x0590 <= CP && CP <= 0x05FF) || (0x07C0 <= CP && CP <= 0x085F) ||
+ (0xFB1D <= CP && CP <= 0xFB4F) ||
+ (0x00010800 <= CP && CP <= 0x00010CFF) ||
+ (0x00010D40 <= CP && CP <= 0x00010F2F) ||
+ (0x00010F70 <= CP && CP <= 0x00010FFF) ||
+ (0x0001E800 <= CP && CP <= 0x0001EC6F) ||
+ (0x0001ECC0 <= CP && CP <= 0x0001ECFF) ||
+ (0x0001ED50 <= CP && CP <= 0x0001EDFF) ||
+ (0x0001EF00 <= CP && CP <= 0x0001EFFF);
+}
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isR(llvm::UTF32 CP) {
+  return (CP == 0x0590) || (CP ==

[PATCH] D112914: Misleading identifier detection

2021-11-01 Thread Richard Smith - zygoloid via Phabricator via cfe-commits
rsmith accepted this revision.
rsmith added a comment.
This revision is now accepted and ready to land.

Looks good as far as it goes (though I've not checked your classification 
functions are correct).

We should have some detection for RTL characters in string literals and 
comments too, at least when there is no subsequent character with strong LTR 
directionality. Neither `"` nor `*/` has strong directionality, so both `א"` 
and `א*/` can potentially leave us in an RTL state. For example:

  // This is "א" then + then "ג".
  "א" + "ג"
  
  // This is `a` then `/*א*/` then `*` then `-` then `/*ג*/` then `b`, aka "a * 
- b" not "a - * b".
  a /*א*/ * - /*ג*/ b




Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:2
+//===--- MisleadingIdentifier.cpp - clang-tidy
+//===//
+//

Please fix



Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:19
+
+// see https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedAL(llvm::UTF32 CP) {

(throughout).



Comment at: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp:134-135
+llvm::strictConversion);
+if (Result != llvm::conversionOK)
+  break;
+if (isUnassignedAL(CodePoint) || isUnassignedR(CodePoint) || 
isR(CodePoint))

As in the other patch, we could recover here by skipping until we find a valid 
UTF-8 lead byte. It's not important here since invalid UTF-8 won't be accepted 
in an identifier anyway, but might be important if we start checking strings 
and comments for RTL characters too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread serge via Phabricator via cfe-commits
serge-sans-paille updated this revision to Diff 384006.
serge-sans-paille added a comment.

- fix formatting
- added documentation
- *not* doing the extra work wrt. unicode error recovery


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
===
--- /dev/null
+++ clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
@@ -0,0 +1,23 @@
+.. title:: clang-tidy - misc-misleading-identifier
+
+misc-misleading-identifier
+==
+
+Finds identifiers that contain Unicode characetrs with right-to-left direction,
+which can be confusing as they may change the understanding of a whole statement
+line, as described in [Trojan Source](https://trojansource.codes/).
+
+An exemple of such misleading code follows:
+
+.. code-block:: c
+
+#include 
+
+short int א = (short int)0;
+short int ג = (short int)12345;
+
+int main() {
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,32 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy *- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyContext *Context);
+  ~MisleadingIdentifierCheck();
+
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace misc
+} // namespace tidy
+} // namespace clang
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
@@ -0,0 +1,163 @@
+//===--- MisleadingIdentifier.cpp - clang-tidy-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "MisleadingIdentifier.h"
+
+#include "clang/Frontend/CompilerInstance.h"
+#include "llvm/Support/ConvertUTF.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+// See https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedBidiClass.txt
+static bool isUnassignedAL(llvm::UTF32 CP) {
+  return (0x0600 <= CP && CP <= 0x07BF) || (0x0860 <= CP && CP <= 0x08FF) ||
+ (0xFB50 <= CP && CP <= 0xFDCF) || (0xFDF0 <= CP && CP <= 0xFDFF) ||
+ (0xFE70 <= CP && CP <= 0xFEFF) ||
+ (0x00010D00 <= CP && CP <= 0x00010D3F) ||
+ (0x00010F30 <= CP && CP <= 0x00010F6F) ||
+ (0x0001EC70 <= CP && CP <= 0x0001ECBF) ||
+ (0x0001ED00 <= CP && CP <= 0x0001ED4F) ||
+ (0x0001EE00 <= CP && CP <= 0x

[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread Carlos Galvez via Phabricator via cfe-commits
carlosgalvezp requested changes to this revision.
carlosgalvezp added a comment.
This revision now requires changes to proceed.

- Add check to list of checks: 
https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/docs/clang-tidy/checks/list.rst
- Mention check in the Release Notes: 
https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/docs/ReleaseNotes.rst


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread serge via Phabricator via cfe-commits
serge-sans-paille updated this revision to Diff 384017.
serge-sans-paille added a comment.

Also update clang-tidy doc index


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/docs/clang-tidy/checks/list.rst
  clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
===
--- /dev/null
+++ clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
@@ -0,0 +1,23 @@
+.. title:: clang-tidy - misc-misleading-identifier
+
+misc-misleading-identifier
+==
+
+Finds identifiers that contain Unicode characetrs with right-to-left direction,
+which can be confusing as they may change the understanding of a whole statement
+line, as described in [Trojan Source](https://trojansource.codes/).
+
+An exemple of such misleading code follows:
+
+.. code-block:: c
+
+#include 
+
+short int א = (short int)0;
+short int ג = (short int)12345;
+
+int main() {
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/list.rst
===
--- clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -211,6 +211,7 @@
`llvmlibc-implementation-in-namespace `_,
`llvmlibc-restrict-system-libc-headers `_, "Yes"
`misc-definitions-in-headers `_, "Yes"
+   `misc-misleading-identifier `_,
`misc-misplaced-const `_,
`misc-new-delete-overloads `_,
`misc-no-recursion `_,
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,32 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy *- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyContext *Context);
+  ~MisleadingIdentifierCheck();
+
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace misc
+} // namespace tidy
+} // namespace clang
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
@@ -0,0 +1,163 @@
+//===--- MisleadingIdentifier.cpp - clang-tidy-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "MisleadingIdentifier.h"
+
+#include "clang/Frontend/CompilerInstance.h"
+#include "llvm/Support/ConvertUTF.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+// See https://www.unicode.org/Public/14.0.0/ucd/extracted/De

[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread serge via Phabricator via cfe-commits
serge-sans-paille updated this revision to Diff 384020.
serge-sans-paille added a comment.

Update release note too.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/docs/ReleaseNotes.rst
  clang-tools-extra/docs/clang-tidy/checks/list.rst
  clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
===
--- /dev/null
+++ clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
@@ -0,0 +1,23 @@
+.. title:: clang-tidy - misc-misleading-identifier
+
+misc-misleading-identifier
+==
+
+Finds identifiers that contain Unicode characetrs with right-to-left direction,
+which can be confusing as they may change the understanding of a whole statement
+line, as described in [Trojan Source](https://trojansource.codes/).
+
+An exemple of such misleading code follows:
+
+.. code-block:: c
+
+#include 
+
+short int א = (short int)0;
+short int ג = (short int)12345;
+
+int main() {
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/list.rst
===
--- clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -211,6 +211,7 @@
`llvmlibc-implementation-in-namespace `_,
`llvmlibc-restrict-system-libc-headers `_, "Yes"
`misc-definitions-in-headers `_, "Yes"
+   `misc-misleading-identifier `_,
`misc-misplaced-const `_,
`misc-new-delete-overloads `_,
`misc-no-recursion `_,
Index: clang-tools-extra/docs/ReleaseNotes.rst
===
--- clang-tools-extra/docs/ReleaseNotes.rst
+++ clang-tools-extra/docs/ReleaseNotes.rst
@@ -94,12 +94,15 @@
   Reports identifiers whose names are too short. Currently checks local
   variables and function parameters only.
 
-
 - New :doc:`readability-data-pointer ` check.
 
   Finds cases where code could use ``data()`` rather than the address of the
   element at index 0 in a container.
 
+- New :doc:`misc-misleading-identifier ` check.
+
+  Reports identifier with unicode right-to-left characters.
+
 New check aliases
 ^
 
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,32 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy *- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyContext *Context);
+  ~MisleadingIdentifierCheck();
+
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace misc
+} // namespace tidy
+} // namespace clang
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
===
--- /dev/null
+++

[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread Carlos Galvez via Phabricator via cfe-commits
carlosgalvezp added a comment.

Looks good, I guess the license issue still needs to be sorted out?




Comment at: 
clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:6
+
+Finds identifiers that contain Unicode characetrs with right-to-left direction,
+which can be confusing as they may change the understanding of a whole 
statement

characters



Comment at: 
clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:10
+
+An exemple of such misleading code follows:
+

example


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread serge via Phabricator via cfe-commits
serge-sans-paille added a comment.

AKAIU, the licensing issue doesn't impact that particular review, only the one 
on confusable identifiers.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread Carlos Galvez via Phabricator via cfe-commits
carlosgalvezp resigned from this revision.
carlosgalvezp added a comment.
This revision is now accepted and ready to land.

Ok! I don't really know what applies when you take //part// of a file, so I'll 
leave that up to people who know. I don't know how to remove the "Requested 
changes" from here so I'll just remove myself from reviewer.

PS: I don't know what AKAIU means, nor can I find the answer in Google :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread serge via Phabricator via cfe-commits
serge-sans-paille added a comment.

In D112914#3102728 , @carlosgalvezp 
wrote:

> Ok! I don't really know what applies when you take //part// of a file, so 
> I'll leave that up to people who know. I don't know how to remove the 
> "Requested changes" from here so I'll just remove myself from reviewer.

That's a valid question, but I guess that using spec content is ok... not 100% 
sure though.

> PS: I don't know what AKAIU means, nor can I find the answer in Google :)

I meant AFAIU, I'm so skilled I can make typo in acronyms :-)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112914: Misleading identifier detection

2021-11-02 Thread serge via Phabricator via cfe-commits
serge-sans-paille updated this revision to Diff 384118.
serge-sans-paille added a comment.

Minor typos


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112914/new/

https://reviews.llvm.org/D112914

Files:
  clang-tools-extra/clang-tidy/misc/CMakeLists.txt
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
  clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
  clang-tools-extra/docs/ReleaseNotes.rst
  clang-tools-extra/docs/clang-tidy/checks/list.rst
  clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
  clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp

Index: clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
===
--- /dev/null
+++ clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp
@@ -0,0 +1,15 @@
+// RUN: %check_clang_tidy %s misc-misleading-identifier %t
+
+#include 
+
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int א = (short int)0;
+// CHECK-MESSAGES: :[[@LINE+1]]:1: warning: identifier has right-to-left codepoints
+short int ג = (short int)12345;
+
+int main() {
+  // CHECK-MESSAGES: :[[@LINE+1]]:5: warning: identifier has right-to-left codepoints
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
===
--- /dev/null
+++ clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst
@@ -0,0 +1,23 @@
+.. title:: clang-tidy - misc-misleading-identifier
+
+misc-misleading-identifier
+==
+
+Finds identifiers that contain Unicode characters with right-to-left direction,
+which can be confusing as they may change the understanding of a whole statement
+line, as described in `Trojan Source `_.
+
+An example of such misleading code follows:
+
+.. code-block:: c
+
+#include 
+
+short int א = (short int)0;
+short int ג = (short int)12345;
+
+int main() {
+  int א = ג; // a local variable, set to zero?
+  printf("ג is %d\n", ג);
+  printf("א is %d\n", א);
+}
Index: clang-tools-extra/docs/clang-tidy/checks/list.rst
===
--- clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -211,6 +211,7 @@
`llvmlibc-implementation-in-namespace `_,
`llvmlibc-restrict-system-libc-headers `_, "Yes"
`misc-definitions-in-headers `_, "Yes"
+   `misc-misleading-identifier `_,
`misc-misplaced-const `_,
`misc-new-delete-overloads `_,
`misc-no-recursion `_,
Index: clang-tools-extra/docs/ReleaseNotes.rst
===
--- clang-tools-extra/docs/ReleaseNotes.rst
+++ clang-tools-extra/docs/ReleaseNotes.rst
@@ -94,12 +94,15 @@
   Reports identifiers whose names are too short. Currently checks local
   variables and function parameters only.
 
-
 - New :doc:`readability-data-pointer ` check.
 
   Finds cases where code could use ``data()`` rather than the address of the
   element at index 0 in a container.
 
+- New :doc:`misc-misleading-identifier ` check.
+
+  Reports identifier with unicode right-to-left characters.
+
 New check aliases
 ^
 
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
===
--- /dev/null
+++ clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h
@@ -0,0 +1,32 @@
+//===--- MisleadingIdentifierCheck.h - clang-tidy *- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang {
+namespace tidy {
+namespace misc {
+
+class MisleadingIdentifierCheck : public ClangTidyCheck {
+public:
+  MisleadingIdentifierCheck(StringRef Name, ClangTidyContext *Context);
+  ~MisleadingIdentifierCheck();
+
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace misc
+} // namespace tidy
+} // namespace clang
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MISLEADINGIDENTIFIERCHECK_H
Index: clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp
===
--- /dev/null
+++ clang-tools