barannikov88 added a comment.

In D154290#4483055 <https://reviews.llvm.org/D154290#4483055>, @cor3ntin wrote:

> In D154290#4482975 <https://reviews.llvm.org/D154290#4482975>, @barannikov88 wrote:
>
>> According to the current wording, the static_assert-message is either an
>> unevaluated string or an expression evaluated at compile time.
>> Unevaluated strings don't allow certain escape sequences, but if I wrap the
>> string in a string_view-like class, I'm allowed to use any escape sequences,
>> including '\x'.
>> Moreover, wrapping a string in a class would change its encoding.
>> Unevaluated strings are displayed as written in the source (that is, UTF-8),
>> while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC)
>> and are then printed in the system locale, leading to mojibake.
>
> Not quite.
> Unevaluated strings are always UTF-8 (regardless of source file encoding).
> Evaluated strings are in the literal encoding, which is always UTF-8 for
> clang.
> This will change whenever we allow for different kinds of literal encodings
> per this RFC
> https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1
>
> If and when that is the case we will have to convert back to UTF-8 before
> displaying - and then maybe convert back to the system locale depending on
> host.
> Numeric escape sequences can then occur in evaluated strings and produce
> mojibake if the evaluated string is not valid in the string literal encoding.
> I don't believe that we would want to output static messages without
> conversion on any system as the diagnostics framework is very much geared
> towards UTF-8 and we want to keep supporting cross compilation.
>
> So the process will be
> source -> utf8 -> literal encoding -> utf8 -> terminal encoding.

Thanks for your reply, I think I see the idea.

> By the same account, casting 0-extended utf-8 to char is fine until such time
> as clang supports more than UTF-8.
> (which is one of the reasons we need to make
> sure clang conversion utilities can convert from and to utf-8)
>
> Unevaluated strings were introduced in part to help identify what gets
> converted and what does not.

It is a bit strange that the string in `static_assert(false, "й")` is not
converted, while it is converted in
`static_assert(false, std::string_view("й"))`.
It might be possible to achieve identical diagnostic output even with
-fexec-charset supported (which would only affect the second form), but right
now I'm confused by the distinction… Why not always evaluate the message?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154290/new/

https://reviews.llvm.org/D154290

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
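For readers following the thread, here is a rough sketch of the two static_assert message forms compared in the comment above. The `Message` class, `kEscaped`, and `byte_at` are hypothetical names made up for this illustration and are not part of D154290. The sketch assumes the literal encoding is UTF-8 (as it currently always is for clang), and the second form is only compiled when the compiler advertises user-generated static_assert messages through `__cpp_static_assert`. The conditions are true so that the file compiles; the messages would only be shown if an assertion failed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical string_view-like wrapper (an assumption for illustration).
// It exposes constexpr data()/size(), which is what a user-generated
// static_assert message is expected to provide.
class Message {
public:
    template <std::size_t N>
    constexpr Message(const char (&s)[N]) : ptr_(s), len_(N - 1) {}

    constexpr const char* data() const { return ptr_; }
    constexpr std::size_t size() const { return len_; }

private:
    const char* ptr_;
    std::size_t len_;
};

// An ordinary (evaluated) string literal, so numeric escapes are allowed.
// The two bytes are the UTF-8 code units of U+0439 (й).
constexpr Message kEscaped("\xD0\xB9");

// Form 1: unevaluated string message, taken as written in the source.
static_assert(sizeof(char) == 1, "й");

#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
// Form 2: message produced by a constant expression; its bytes are in the
// literal encoding and would need converting before being displayed.
static_assert(sizeof(char) == 1, kEscaped);
#endif

// Returns the i-th byte of an evaluated message as an unsigned value.
inline unsigned byte_at(const Message& m, std::size_t i) {
    assert(i < m.size());
    return static_cast<unsigned char>(m.data()[i]);
}
```

If the literal encoding were something other than UTF-8, the bytes written with `\x` would stay the same while the bytes of a plain "й" literal would change, which is where the two forms can start to diverge.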