Initially this was a simple patch to make `btowc` and `wctob` match UCRT
behavior. If we are going to make serious changes to `btowc` and `wctob`, I
think we should also take a look at the `mb*towc*` and `wc*tomb*` functions
provided by mingw-w64.
I am not saying, and I do not think, that we should replace the `mb*towc*` and
`wc*tomb*` functions for UCRT. What we can do is make sure that the
replacements we provide match the CRT's behavior (e.g. use lossy conversion
and follow this strange "C" locale behavior).
At this point it would be easier to implement both `btowc` and `wctob` in terms
of `mbrtowc` and `wcrtomb` respectively.
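To make that idea concrete, here is a minimal sketch using only standard C. The names `sketch_btowc`, `sketch_wctob` and `sketch_roundtrip_check` are hypothetical, and this is not the mingw-w64 code; it only shows the general shape such wrappers could take.

```c
#include <assert.h>
#include <limits.h>  /* MB_LEN_MAX */
#include <stdio.h>   /* EOF */
#include <string.h>  /* memset */
#include <wchar.h>

/* Hypothetical wrapper: btowc expressed through mbrtowc. */
wint_t sketch_btowc(int c)
{
    if (c == EOF)
        return WEOF;

    char byte = (char) c;             /* keep only the low byte */
    wchar_t wc;
    mbstate_t state;
    memset(&state, 0, sizeof state);  /* start in the initial shift state */

    size_t ret = mbrtowc(&wc, &byte, 1, &state);
    if (ret == (size_t) -1 || ret == (size_t) -2)
        return WEOF;  /* invalid byte, or not a complete one-byte character */

    return (wint_t) wc;
}

/* Hypothetical wrapper: wctob expressed through wcrtomb. */
int sketch_wctob(wint_t wc)
{
    if (wc == WEOF)
        return EOF;

    char buf[MB_LEN_MAX];
    mbstate_t state;
    memset(&state, 0, sizeof state);

    size_t ret = wcrtomb(buf, (wchar_t) wc, &state);
    /* Only characters that convert to exactly one byte qualify. */
    return ret == 1 ? (unsigned char) buf[0] : EOF;
}

/* ASCII round trip; assumes the locale maps ASCII to itself. */
void sketch_roundtrip_check(void)
{
    assert(sketch_btowc('A') == L'A');
    assert(sketch_wctob(L'A') == 'A');
    assert(sketch_btowc(EOF) == WEOF);
}
```

With this shape, any locale-specific behavior (lossy conversion, the "C" locale special case) would live in `mbrtowc` and `wcrtomb` only. Note that where `char` is signed and `EOF` is -1, a sign-extended byte 0xFF cannot be distinguished from `EOF` by the first check.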
I suggest we start a new discussion in a new thread. I have some other details
regarding CRT's locale support since I am currently working on code which
implements POSIX locale functions on top of Win32 and CRT.
- Kirill Makurin
________________________________
From: LIU Hao
Sent: Saturday, June 14, 2025 8:55 PM
To: Kirill Makurin; mingw-w64-public
Subject: Re: [Mingw-w64-public] Inconsistent behavior of btowc with "C" locale
On 2025-6-8 00:21, Kirill Makurin wrote:
> I guess sticking to range [0,255] is our best choice.
>
> I attached patches.
>
Mostly these look good to me. However I get errors from libc++ testsuite:
https://github.com/lhmouse/mingw-w64/actions/runs/15650737822/job/44095645474#step:7:13365
The failure can be reproduced by installing the mingw-w64 CRT with the first
patch and compiling the following testcase with `clang++ -static`:
    std::locale l;
    typedef std::ctype_byname<wchar_t> F;
    std::locale ll(l, new F("C"));
    const F& f = std::use_facet<F>(ll);
    assert(f.widen(char(-5)) == L'\u00fb');
And here's backtrace:
#0 0x00007ff657205139 in btowc (c=-5) at misc/btowc.c:16
#1 0x00007ff6571fcd61 in std::__1::__locale::__btowc(int, std::__1::__locale::__locale_t) ()
#2 0x00007ff6571dda9a in std::__1::ctype_byname<wchar_t>::do_widen(char) const ()
#3 0x00007ff6571b19ac in std::__1::ctype<wchar_t>::widen[abi:ne200100](char) const (this=0x5b9c40, __c=-5 '\373') at C:/MSYS64/clang64/include/c++/v1/__locale:490
#4 0x00007ff6571b1884 in main () at test.cc:37
Here we can see that the parameter `c`, of type `int`, is a sign-extended copy
of the `char` argument (-5 rather than 0xFB), so I think this check

    if (cp == 0)
        return (unsigned) c <= 0xFF ? c : WEOF;

is too strict: it returns WEOF for the negative value. What if we blindly
truncate `c`, just like the code beneath it:

    if (cp == 0)
        return (unsigned char) c;
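For illustration, the difference between the two variants can be checked in isolation. This is a small sketch in plain C, with hypothetical helper names (`widen_checked`, `widen_truncated`, `demo_sign_extension`) standing in for the `cp == 0` branch only; it is not the code from the patch.

```c
#include <assert.h>
#include <wchar.h>

/* Variant from the patch: range check on the int value. */
wint_t widen_checked(int c)
{
    return (unsigned) c <= 0xFF ? (wint_t) c : WEOF;
}

/* Proposed variant: truncate to the low byte unconditionally. */
wint_t widen_truncated(int c)
{
    return (unsigned char) c;
}

/* Mirrors the failing libc++ call, f.widen(char(-5)). */
void demo_sign_extension(void)
{
    char ch = (char) -5;
    int c = ch;  /* -5 where char is signed, 251 where it is unsigned */

    /* Truncation recovers the byte 0xFB in both cases. */
    assert(widen_truncated(c) == 0xFB);

    /* The range check rejects the sign-extended value. */
    if (c < 0)
        assert(widen_checked(c) == WEOF);
}
```

The cast to `unsigned` turns -5 into a very large value, so the range check fails, while the cast to `unsigned char` reduces it modulo 256 to 0xFB, which is what the libc++ test expects.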
--
Best regards,
LIU Hao