Re: [PATCH] UTF-8 support for clang-format.

Alexander Kornienko Wed, 05 Jun 2013 06:01:51 -0700


================
Comment at: unittests/Format/FormatTest.cpp:4890-4891
@@ +4889,4 @@
+               getLLVMStyleWithColumns(35));
+  verifyFormat("\"一 二 三 四 五 六 七 八 九 十\"",
+               getLLVMStyleWithColumns(21));
+  verifyFormat("// Однажды в студёную зимнюю пору...",
----------------
Daniel Jasper wrote:
> Dmitri Gribenko wrote:
> > OK, if we decide to count codepoints for now, please adjust the string 
> > length then -- it is currently computed as if a double-width character 
> > actually occupies 2 columns (but the code being tested counts it as one).
> > 
> Well, I think this is not at all what the test was meant to be testing. The 
> point is that the string contains 22 bytes .. And IMO, it would also be 22 
> columns with 2-column characters. So it does in fact test the right thing.
> 
> But I also don't see harm in using getLLVMStyleWithColumns(12). If anything 
> it makes the test stricter..
These characters are separated by spaces (to ensure clang-format can break the 
literal if it wants to), so the string contains 21 code points. The test is 
correct.


================
Comment at: unittests/Format/FormatTest.cpp:4931
@@ +4930,3 @@
+
+TEST_F(FormatTest, SplitsUTF8BlockComments) {
+  EXPECT_EQ("/* Гляжу,\n"
----------------
Daniel Jasper wrote:
> If I am correct, the chinese letters are just numbers, I hope the russian 
> characters don't mean anything offensive ;-)...
Test cases in Russian aren't offensive in any way. I guess, Dmitry Gribenko 
would notice, if they were.

================
Comment at: lib/Format/Utils.h:1
@@ +1,2 @@
+//===--- Utils.h - Format C++ code 
----------------------------------------===//
+//
----------------
Manuel Klimek wrote:
> Daniel Jasper wrote:
> > Please don't call this "Utils", this is far too generic. How about 
> > "Encodings"? I think hex/octal escape sequences are also a kind of encoding 
> > ..
> +1 to not have utils. Alternatively, create two headers, one for encodings, 
> and one for string literal related functions, and name them appropriately.
Utils.h -> Encoding.h with proper changes in all related names.

================
Comment at: lib/Format/FormatToken.h:96
@@ -94,3 +95,3 @@
   /// with the token.
   unsigned TokenLength;
 
----------------
Manuel Klimek wrote:
> Daniel Jasper wrote:
> > How about we make these slightly easier to understand and shorter?
> > 
> > What are the remaining usages of TokenLength? Would it make sense to rename 
> > that to "ByteCount"? And would it then make sense to rename CodePointCount 
> > to "TokenLength"? Or even just "Length" as we are in a class ..Token?
> I think I'd like to keep CodePointCount as it makes it really obvious we're 
> not counting bytes. I'm also in favor of renaming TokenLength to ByteCount 
> though, so the duality is more clear.
Good idea. Renaming TokenLength -> ByteCount discover a few more wrong usages, 
and led to cleaning up some code in TokenAnnotator.


http://llvm-reviews.chandlerc.com/D918
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] UTF-8 support for clang-format.

Reply via email to