On Sunday 26 January 2025 21:26:49 Lasse Collin wrote:
> On 2025-01-26 Pali Rohár wrote:
> > Maybe it could be a good idea to look into last released version of
> > source code for UCRT. Such ___lc_codepage_func() / CP_UTF8 /
> > AreFileApisANSI() / CP_ACP / CP_OEMCP should be there too (if it was
> > correctly guessed). Maybe there could be some other corner cases?
>
> I cannot do that, sorry. Perhaps someone else can.
>
> > Slightly off-topic, not related to readdir, but could be interesting
> > to check, what would happen if you call setlocale(LC_ALL, ".UTF-8")
> > before __getmainargs() call (which is in mingw-w64 startup code
> > crtexe.c)? Would this force UCRT to pass argv[] in UTF-8 encoding
> > into main() even without having UTF-8 manifest?
>
> I didn't test but even if it worked, I suspect that ACP wouldn't become
> UTF-8 and thus argv[] and CRT wouldn't be in sync with Win32 *A() APIs.
> I *assume* that ACP is set fairly early, even before the first
> instruction is run from the executable.

From your tests it looks like setlocale(UTF-8) does not change ACP but
rather somehow instructs UCRT to use UTF-8 encoding for the UCRT narrow
functions... and argv[] could also be treated as a UCRT thing, hence my
question was whether it is also affected by setlocale or not.

> It's good to avoid the situation where CRT file system APIs use
> different encoding than the *A() functions. There is code around that
> uses both in parallel with the assumption that the encodings are the
> same.

That is a good argument to avoid using setlocale(UTF-8) at all.

> One would think that setlocale(LC_ALL, ".UTF-8") is rare but I think
> it's more common than it seems at first. <libintl.h> from
> gettext-runtime overrides setlocale() with its intl_setlocale()
> wrapper. The wrapper reads environment variables like LC_CTYPE (the
> native setlocale() doesn't do that).
>
> Cygwin and MSYS2 default to UTF-8 locale and they export these POSIX
> environment variables even when running native Windows programs. When
> setlocale(LC_ALL, "") becomes intl_setlocale(LC_ALL, "") and there is
> LC_CTYPE=en_US.UTF-8, one ends up with UTF-8 locale in UCRT but the
> *A() APIs and argv[] are still in 1252.
>
> In MSYS2, you can try with /ucrt64/bin/size.exe by passing it a
> filename that contains non-ASCII characters. It cannot open the file
> because it tries to use ANSI encoded filename from argv[] with UCRT's
> file system APIs that expect UTF-8 due to the locale. If you set
> LC_CTYPE=C then it works.

Oh, and that sounds like a very bad thing. Both UCRT fopen() and WinAPI
CreateFileA() take a char* argument, but in reality those are different
types because the expected encodings differ. And setlocale(UTF-8) can
very easily break code that uses both APIs.

> On the other hand, the <libintl.h> setlocale() override is there only
> when translations have been enabled. If a package is configured with
> --disable-nls, then <libintl.h> isn't #included either and the LC_*
> environment variables aren't obeyed on native Windows.
> (Packages that use Gnulib might have a setlocale() override still
> though.)
>
> To keep things simpler, UTF-8 locales ideally wouldn't be used unless
> ACP is UTF-8 (set in application manifest or globally in Windows
> settings). It's not that simple though because, for some apps,
> filenames don't matter but stdin/stdout encoding does.
>
> It's a curious mess.
>
> -- 
> Lasse Collin

I wonder what else we will find if we continue this discussion. And how
many security issues have we discovered that are now just waiting for
targeted exploits? :D

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
