================
Comment at: unittests/Format/FormatTest.cpp:4890-4891
@@ +4889,4 @@
+ getLLVMStyleWithColumns(35));
+ verifyFormat("\"一 二 三 四 五 六 七 八 九 十\"",
+ getLLVMStyleWithColumns(21));
+ verifyFormat("// Однажды в студёную зимнюю пору...",
----------------
Daniel Jasper wrote:
> Dmitri Gribenko wrote:
> > OK, if we decide to count codepoints for now, please adjust the string
> > length then -- it is currently computed as if a double-width character
> > actually occupies 2 columns (but the code being tested counts it as one).
> >
> Well, I think this is not at all what the test was meant to be testing. The
> point is that the string contains 22 bytes .. And IMO, it would also be 22
> columns with 2-column characters. So it does in fact test the right thing.
>
> But I also don't see harm in using getLLVMStyleWithColumns(12). If anything
> it makes the test stricter..
These characters are separated by spaces (to ensure clang-format can break the
literal if it wants to), so the string contains 21 code points. The test is
correct.
================
Comment at: unittests/Format/FormatTest.cpp:4931
@@ +4930,3 @@
+
+TEST_F(FormatTest, SplitsUTF8BlockComments) {
+ EXPECT_EQ("/* Гляжу,\n"
----------------
Daniel Jasper wrote:
> If I am correct, the chinese letters are just numbers, I hope the russian
> characters don't mean anything offensive ;-)...
Test cases in Russian aren't offensive in any way. I guess, Dmitry Gribenko
would notice, if they were.
================
Comment at: lib/Format/Utils.h:1
@@ +1,2 @@
+//===--- Utils.h - Format C++ code
----------------------------------------===//
+//
----------------
Manuel Klimek wrote:
> Daniel Jasper wrote:
> > Please don't call this "Utils", this is far too generic. How about
> > "Encodings"? I think hex/octal escape sequences are also a kind of encoding
> > ..
> +1 to not have utils. Alternatively, create two headers, one for encodings,
> and one for string literal related functions, and name them appropriately.
Utils.h -> Encoding.h with proper changes in all related names.
================
Comment at: lib/Format/FormatToken.h:96
@@ -94,3 +95,3 @@
/// with the token.
unsigned TokenLength;
----------------
Manuel Klimek wrote:
> Daniel Jasper wrote:
> > How about we make these slightly easier to understand and shorter?
> >
> > What are the remaining usages of TokenLength? Would it make sense to rename
> > that to "ByteCount"? And would it then make sense to rename CodePointCount
> > to "TokenLength"? Or even just "Length" as we are in a class ..Token?
> I think I'd like to keep CodePointCount as it makes it really obvious we're
> not counting bytes. I'm also in favor of renaming TokenLength to ByteCount
> though, so the duality is more clear.
Good idea. Renaming TokenLength -> ByteCount discover a few more wrong usages,
and led to cleaning up some code in TokenAnnotator.
http://llvm-reviews.chandlerc.com/D918
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits