[PATCH] D112913: Misleading bidirectional detection

serge via Phabricator via cfe-commits Wed, 03 Nov 2021 13:39:17 -0700

serge-sans-paille added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingBidirectional.cpp:59
+    // If conversion fails, utf-8 is designed so that we can just try next 
char.
+    if (Result != llvm::conversionOK) {
+      ++CurPtr;
----------------
rsmith wrote:
> Is there a guarantee that `convertUTF8Sequence` doesn't update `CurPtr` on 
> error? I'm concerned we might increment *past* the end in the case where 
> `CurPtr` points to the end, below, which would at least formally be UB.
According to the doc "If the conversion succeeds, this pointer will be updated 
to point to the byte just past the end of the converted sequence". My 
understanding of the implementation confirms that statements, and it's used in 
a similar manner in clang Lexer.


================
Comment at: clang-tools-extra/clang-tidy/misc/MisleadingBidirectional.cpp:60-62
+    // If conversion fails, we stop the analysis because we don't know how many
+    // characters should be skipped otherwise. That's an obvious way to bypass
+    // the check.
----------------
rsmith wrote:
> Can we scan forward looking for the next non-continuation byte? (Skip while 
> `c & 0b1100_0000 == 0b1000_0000`)
I'm not quite sure. There's a risk we end up starting over from the second part 
of a unicode character, and that could mess up with the decoding. I'll 
investigate how it's done in other libraries.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112913/new/

https://reviews.llvm.org/D112913

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D112913: Misleading bidirectional detection

Reply via email to