cor3ntin added a comment.

In D106577#2898967 <https://reviews.llvm.org/D106577#2898967>, 
@hubert.reinterpretcast wrote:

> In D106577#2897588 <https://reviews.llvm.org/D106577#2897588>, @aaron.ballman 
> wrote:
>
>> In D106577#2897522 <https://reviews.llvm.org/D106577#2897522>, @jyknight 
>> wrote:
>>
>>> I'm not sure we should be populating this.
>>>
>>> The _value_ is determined by what libc supports, so it probably needs to be 
>>> left up to libc to define it.
>>
>> Why is the value determined by what libc supports? The definition from the 
>> standard is:
>>
>>   If this symbol is defined, then every character in the Unicode required 
>> set, when stored in an
>>   object of type wchar_t, has the same value as the short identifier of that 
>> character.
>>
>> That doesn't seem to imply anything about the library, just the size of 
>> `wchar_t`.
>
> Every character in the Unicode required set encoded in what way? To say that 
> such a character is stored in an object of type `wchar_t` means that 
> interpreting the `wchar_t` yields that stored character. Methods to determine 
> the interpretation of the stored `wchar_t` value include locale-sensitive 
> functions such as `wcstombs` (and thus is tied to libc).

"has the same value as the short identifier of that character." implies UTF-32.
There is no mention of interpretation here; the *value* is the same. That is, 
casting the `wchar_t` to an integer type yields the code point value.
*Storing* that value might involve either assigning from a wide-character 
literal or calling `mbrtowc`.
Both methods imply some transcoding, and the latter could be affected by the 
locale such that it would store a different character, but again, is that 
related to this wording?
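
To make the "same value" reading concrete, here is a minimal sketch in C. The function name `wide_literal_is_code_point` is hypothetical and chosen only for illustration; U+20AC is used because it fits even in a 16-bit `wchar_t`. It checks only the wide-literal path, not the `mbrtowc` path.

```c
#include <wchar.h>

/* Hypothetical helper, for illustration only.
 * Returns  1 if __STDC_ISO_10646__ is defined and the wide literal for
 *            U+20AC has the integer value 0x20AC,
 *          0 if the macro is defined but the values differ,
 *         -1 if the macro is not defined at all. */
int wide_literal_is_code_point(void) {
#ifdef __STDC_ISO_10646__
    wchar_t euro = L'\u20AC';               /* EURO SIGN, short identifier 20AC */
    return (unsigned long)euro == 0x20ACUL; /* compare the stored value */
#else
    return -1;
#endif
}
```

A result of 0 would mean the macro is defined even though the compiler's wide literal encoding does not store code point values.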

Note that, by virtue of being a macro, this cannot possibly be affected by the 
locale.

A few scenarios:

- The encoding of wide literals as determined by clang is not UTF-32: the macro 
should be defined by neither the compiler nor the library.
- The encoding of wide literals as determined by the compiler is UTF-32, libc 
agrees... this works as intended.
- The encoding of wide literals as determined by the compiler is UTF-32, libc 
disagrees... nothing good can come of that.

The compiler and the libc have to agree here.
The library cannot (should not) define this macro without knowing the wide 
literal encoding.
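
As a rough way to observe whether the compiler and libc agree on a given system, the following C sketch compares the `wchar_t` that `mbrtowc` produces from UTF-8 input with the compiler's wide literal for the same character. The function name and the list of locale names are assumptions made for illustration; the locale names are not guaranteed to exist everywhere, and the check only covers UTF-8 locales.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.
 * Returns  1 if libc's conversion matches the compiler's wide literal,
 *          0 if they differ or the conversion fails,
 *         -1 if none of the assumed UTF-8 locales could be selected. */
int libc_agrees_with_wide_literal(void) {
    /* Assumed locale names; availability depends on the system. */
    static const char *const candidates[] = { "C.UTF-8", "en_US.UTF-8" };
    const char *chosen = NULL;
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        chosen = setlocale(LC_CTYPE, candidates[i]);
        if (chosen)
            break;
    }
    if (!chosen)
        return -1;

    const char utf8_euro[] = "\xE2\x82\xAC"; /* U+20AC encoded as UTF-8 */
    mbstate_t state;
    memset(&state, 0, sizeof state);         /* initial conversion state */
    wchar_t from_libc = 0;
    size_t n = mbrtowc(&from_libc, utf8_euro, sizeof utf8_euro - 1, &state);

    int result;
    if (n == (size_t)-1 || n == (size_t)-2)
        result = 0;                          /* invalid or incomplete sequence */
    else
        result = (from_libc == L'\u20AC');   /* libc value vs. compiler value */

    setlocale(LC_CTYPE, "C");                /* restore the default C locale */
    return result;
}
```

A return value of 0 in a UTF-8 locale would correspond to the third scenario above, where the two sides disagree.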

Note that both standards imply that these macros should be defined, when 
relevant, independently of the environment, which includes hosted and 
non-Linux+glibc platforms. So relying on a specific glibc implementation 
seems dubious, especially as glibc will *always* define that macro.

Now, I agree that the compiler and the library should ideally expose the same 
*value* for this macro (although I struggle to find code that actually relies 
on the value).

When D34158 <https://reviews.llvm.org/D34158>, as mentioned by @jyknight, lands, 
the value will be set to that of the library version, thereby overriding the 
compiler default.
On other systems, the value will be set to the library version whenever the 
library is included.

When we add support for non-UTF wide execution encodings, we can use that 
information to not define this macro.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits