The thing is, the byte sequence you have here is not valid UTF-8 at
all. If the code were indeed converting from UTF-8, you would most
likely get incorrect results or crashes.
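To make the UTF-8 point concrete, here is a minimal sketch of a well-formedness check. It is only an illustration, not code from mingw-w64 or from the patch, and `is_valid_utf8` is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only).
   Returns 1 if the first n bytes of s are well-formed UTF-8, else 0.
   Rejects stray continuation bytes, overlong forms, UTF-16 surrogates
   and values above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char b = s[i];
        unsigned long cp;
        size_t len, k;

        if (b < 0x80) {                 /* ASCII: one byte */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            len = 2; cp = b & 0x1F;     /* 0xC0 and 0xC1 would be overlong */
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3; cp = b & 0x0F;
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4; cp = b & 0x07;
        } else {
            return 0;                   /* stray continuation or invalid lead */
        }

        if (n - i < len)
            return 0;                   /* sequence is truncated */

        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (len == 3 && cp < 0x800)
            return 0;                   /* overlong 3-byte form */
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;                   /* overlong or out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogate code point */

        i += len;
    }
    return 1;
}
```

Called on the 16 bytes of the test input quoted below, this returns 0: the byte 0xcc announces a two-byte sequence, but the next byte, 0xec, is not a continuation byte. The UTF-8 encoding of the same path (`/sdcard/` followed by e5 a4 a9 e5 a4 a9 e5 90 91 e4 b8 8a) returns 1.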
As you realized and reported in another reply, you were actually
testing with msvcrt. It is likely that msvcrt simply ignored the
unsupported locale and did something unspecified.
On 20/3/2023 19:07, 傅继晗 wrote:
However, I use GBK as the default code page on my Windows system, and
I tried testing with GBK-encoded content. The trick still seems to
work. Here is the test case.
----------------------------------------------------------------------------------------------------
#include <stdio.h>

extern char * __cdecl basename (char *path);

/* Print every byte of s as a \xNN escape. */
void xprint(const char *s)
{
    while (*s)
        printf("\\x%02x", (int)(unsigned char)(*s++));
}

int main(int argc, char **argv)
{
    /* GBK encoding of "/sdcard/天天向上" */
    char input[] =
        {0x2f,0x73,0x64,0x63,0x61,0x72,0x64,0x2f,0xcc,0xec,0xcc,0xec,0xcf,0xf2,0xc9,0xcf,0x00};
    char *output;

    printf("basename(\"");
    xprint(input);
    printf("\") = \"");
    output = basename(input);
    xprint(output);
    printf("\"\n");
    return 0;
}
----------------------------------------------------------------------------------------------------
On 20/3/2023 18:52, Alvin Wong <al...@alvinhc.com> wrote:
Hi,
Thanks for sending the patches. However, my comment on them is
that they only work when the process ANSI code page (ACP) is
UTF-8, which requires either embedding a manifest with
activeCodePage set to UTF-8 or setting the system ACP to UTF-8.
If the process is using CP936 (GBK), for example, it will still
be broken much as before.
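As an illustration of why a double-byte code page such as CP936 needs encoding-aware handling, here is a small sketch. It is not the mingw-w64 implementation and is deliberately simplified (no trailing-separator stripping, no drive letters). The assumption it relies on is that CP936 lead bytes are 0x81..0xFE and that a trail byte may be 0x5c, the same value as a backslash. All function names are hypothetical; on Windows, `IsDBCSLeadByteEx` would play the role of `gbk_is_lead_byte`.

```c
#include <stddef.h>

/* Hypothetical helper: lead-byte test for CP936 (GBK).
   Assumption: lead bytes are 0x81..0xFE. */
static int gbk_is_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

static int is_sep(unsigned char b)
{
    return b == '/' || b == '\\';
}

/* Naive scan: treats every 0x2f or 0x5c byte as a path separator,
   even when it is the second byte of a double-byte character. */
const char *last_component_bytewise(const char *path)
{
    const char *base = path;
    const char *p;

    for (p = path; *p; p++)
        if (is_sep((unsigned char)*p) && p[1] != '\0')
            base = p + 1;
    return base;
}

/* GBK-aware scan: skips the trail byte after each lead byte, so a
   0x5c trail byte is never mistaken for a separator. */
const char *last_component_gbk(const char *path)
{
    const char *base = path;
    const char *p = path;

    while (*p) {
        if (gbk_is_lead_byte((unsigned char)*p) && p[1] != '\0') {
            p += 2;                     /* consume lead and trail byte */
            continue;
        }
        if (is_sep((unsigned char)*p) && p[1] != '\0')
            base = p + 1;
        p++;
    }
    return base;
}
```

For a path whose last component starts with the double-byte sequence 0x81 0x5c, the byte-wise scan splits the character in half, while the GBK-aware scan returns the whole component. An input with no 0x2f or 0x5c trail bytes, such as the test case in this thread, gives the same result with both functions, so it cannot distinguish them.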
Just my two cents: I would prefer to remove any code that changes
the locale and then attempts to restore it (which is not
thread-safe), and to replace `mbstowcs` and `wcstombs` with direct
calls to `MultiByteToWideChar` and `WideCharToMultiByte`, which
can convert from/to CP_ACP directly.
Best Regards,
Alvin
On 20/3/2023 18:36, 傅继晗 wrote:
OK, it has a .txt extension now.
On 20/3/2023 18:10, Alvin Wong <al...@alvinhc.com> wrote:
Hi, if you attached a patch in your mail, it has been stripped by
the mailing list software. Please try renaming it to `.txt` and
resend.
On 20/3/2023 16:55, 傅继晗 wrote:
> Hello maintainers:
>
> According to the Microsoft Learn page "setlocale, _wsetlocale"
> <https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170>:
>
> *Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C
> Runtime supports using a UTF-8 code page. The change means that char
> strings passed to C runtime functions can expect strings in the UTF-8
> encoding.*
>
> But the libmingwex.a in the mingw-w64 toolchain does not support
> non-ASCII file names, which causes bugs in some projects. See:
> MinGW-w64 bug #227, "basename() truncates filenames with
> variable-width encoding"
> <https://sourceforge.net/p/mingw-w64/bugs/227/>
> and the AOSP "adb pull push error" report on the Google Issue Tracker
> <https://issuetracker.google.com/issues/143232373>
>
> So patches for dirname.c and basename.c are needed to support UTF-8
> encoding.
>
> Greetings
>
> fjh1997
>
> _______________________________________________
> Mingw-w64-public mailing list
> Mingw-w64-public@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mingw-w64-public