> Do we actually need to detect UTF-8 here, or can we just always assume the 
input is in UTF-8 as Clang does?

  There are plenty of source code files in various 8-bit encodings out there, so 
we'd better be sure that we are dealing with valid UTF-8. And the "detection" is 
trivial - basically just a check that the whole file is valid UTF-8. The only 
overhead is passing the encoding everywhere, but that lets us avoid checking 
the validity of the encoding of each token's text separately.
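
  To illustrate what such a whole-file check amounts to, here is a minimal 
sketch. It is not the code from the patch: the name isValidUTF8 and the choice 
of std::string as the buffer type are assumptions made for this example only. 
It walks the buffer once and rejects stray or missing continuation bytes, 
overlong forms, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration (not the function used in the patch):
// returns true if the whole buffer is well-formed UTF-8.
bool isValidUTF8(const std::string &Text) {
  // Smallest code point that legitimately needs a sequence of each length;
  // anything below it is an overlong encoding.
  static const unsigned MinValue[] = {0, 0, 0x80, 0x800, 0x10000};

  const std::size_t Size = Text.size();
  std::size_t I = 0;
  while (I < Size) {
    const unsigned char Lead = static_cast<unsigned char>(Text[I]);
    if (Lead < 0x80) { // plain ASCII byte
      ++I;
      continue;
    }

    std::size_t Length = 0; // total bytes in this sequence
    unsigned CodePoint = 0; // payload bits collected so far
    if ((Lead & 0xE0) == 0xC0) {
      Length = 2;
      CodePoint = Lead & 0x1F;
    } else if ((Lead & 0xF0) == 0xE0) {
      Length = 3;
      CodePoint = Lead & 0x0F;
    } else if ((Lead & 0xF8) == 0xF0) {
      Length = 4;
      CodePoint = Lead & 0x07;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }

    if (Size - I < Length)
      return false; // sequence is cut off by the end of the buffer

    for (std::size_t K = 1; K < Length; ++K) {
      const unsigned char Byte = static_cast<unsigned char>(Text[I + K]);
      if ((Byte & 0xC0) != 0x80)
        return false; // expected a continuation byte (10xxxxxx)
      CodePoint = (CodePoint << 6) | (Byte & 0x3F);
    }

    if (CodePoint < MinValue[Length])
      return false; // overlong encoding
    if (CodePoint >= 0xD800 && CodePoint <= 0xDFFF)
      return false; // UTF-16 surrogate, not a valid scalar value
    if (CodePoint > 0x10FFFF)
      return false; // beyond the Unicode range

    I += Length;
  }
  return true;
}
```

  A file in an 8-bit encoding such as Latin-1 will usually fail this check at 
its first non-ASCII byte, so the result can be computed once per file and 
passed along instead of re-validating each token.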

http://llvm-reviews.chandlerc.com/D918
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
