The thing is, the byte sequence in your test case is not valid UTF-8 at all. If basename were indeed converting from UTF-8, you would most likely get incorrect results or a crash.
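
To make that concrete, here is a minimal sketch of a UTF-8 well-formedness check. The helper name is_valid_utf8 is hypothetical (it is not part of mingw-w64 or the CRT); it only illustrates why the GBK bytes in the quoted test case cannot be read as UTF-8: 0xcc announces a two-byte sequence, but the following byte 0xec is not a continuation byte (10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p) {
        unsigned char c = *p++;
        int trail;          /* number of continuation bytes expected */
        unsigned long cp;   /* code point decoded so far */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80)
            continue;       /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { trail = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { trail = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { trail = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;       /* stray continuation byte or invalid lead byte */

        while (trail-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return 0;   /* not 10xxxxxx; this also catches an early NUL */
            cp = (cp << 6) | (unsigned long)(*p++ & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

Called on the GBK bytes from the test case, the check fails at the first 0xcc 0xec pair, whereas the UTF-8 encoding of the same file name (where 天 is \xe5\xa4\xa9) passes.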

As you realized and reported in another reply, you were actually testing with msvcrt. It is likely that msvcrt just ignored the unsupported locale and did something unspecified.

On 20/3/2023 19:07, 傅继晗 wrote:
However, I use GBK as the default code page on my Windows system, and I tested it with GBK-encoded content. The trick still seems to work. Here is the test case.
----------------------------------------------------------------------------------------------------
#include <stdio.h>

/* Declared by hand here rather than through a header. */
extern char * __cdecl basename (char *path);

/* Print every byte of s as a \xNN escape. */
void xprint(const char *s)
{
    while (*s)
        printf("\\x%02x", (int)(unsigned char)(*s++));
}

int main(int argc, char **argv)
{
    /* GBK encoding of "/sdcard/天天向上" */
    char input[] ={0x2f,0x73,0x64,0x63,0x61,0x72,0x64,0x2f,0xcc,0xec,0xcc,0xec,0xcf,0xf2,0xc9,0xcf,0x00};
    char *output;
    printf("basename(\"");
    xprint(input);
    printf("\") = \"");
    output = basename(input);
    xprint(output);
    printf("\"\n");
    return 0;
}
----------------------------------------------------------------------------------------------------


Alvin Wong <al...@alvinhc.com> wrote on Mon, 20 Mar 2023 at 18:52:

    Hi,

    Thanks for sending the patches. However, my comment on these
    patches is that they only work when the process ANSI codepage
    (ACP) is UTF-8, which requires either embedding a manifest with
    activeCodePage set to UTF-8 or setting the system ACP to UTF-8.
    If the process is using CP936 (GBK), for example, it will still
    be broken in much the same way as before.

    Just my two cents: I would prefer to remove any code that changes
    the locale and then attempts to restore it (which is not
    thread-safe), and to replace `mbstowcs` and `wcstombs` with direct
    use of `MultiByteToWideChar` and `WideCharToMultiByte`, which can
    convert from/to CP_ACP directly.

    Best Regards,
    Alvin
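
The suggestion above (converting through CP_ACP directly instead of switching the locale and restoring it) could look roughly like the sketch below. The helper names acp_to_wide and wide_to_acp are hypothetical and are not taken from the patches or from mingw-w64; only MultiByteToWideChar and WideCharToMultiByte are real Win32 calls. Neither helper touches global locale state, which is what avoids the thread-safety problem.

```c
#ifdef _WIN32   /* Win32-only API; the guard just keeps other builds compiling */
#include <assert.h>
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated string in the process ANSI
   code page to a malloc'ed wide string. Returns NULL on failure. */
wchar_t *acp_to_wide(const char *src)
{
    wchar_t *dst;
    int len;

    assert(src != NULL);
    /* With cbMultiByte == -1 the returned length includes the NUL. */
    len = MultiByteToWideChar(CP_ACP, MB_ERR_INVALID_CHARS, src, -1, NULL, 0);
    if (len == 0)
        return NULL;            /* e.g. invalid bytes for the current ACP */
    dst = malloc((size_t)len * sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_ACP, MB_ERR_INVALID_CHARS, src, -1, dst, len) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
}

/* Hypothetical helper: the reverse direction, wide string back to the ACP. */
char *wide_to_acp(const wchar_t *src)
{
    char *dst;
    int len;

    assert(src != NULL);
    len = WideCharToMultiByte(CP_ACP, 0, src, -1, NULL, 0, NULL, NULL);
    if (len == 0)
        return NULL;
    dst = malloc((size_t)len);
    if (dst == NULL)
        return NULL;
    if (WideCharToMultiByte(CP_ACP, 0, src, -1, dst, len, NULL, NULL) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
}
#endif /* _WIN32 */
```

With helpers like these, a basename implementation would convert the path once, search for separators in the wide string, and convert the result back, whatever the process ACP happens to be (CP936, UTF-8 via manifest, and so on). The caller is responsible for freeing the returned buffers.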

    On 20/3/2023 18:36, 傅继晗 wrote:
    OK, it has a .txt extension now.

    Alvin Wong <al...@alvinhc.com> wrote on Mon, 20 Mar 2023 at 18:10:

        Hi, if you attached a patch in your mail, it has been stripped
        by the mailing list software. Please try renaming it to `.txt`
        and resend.

        On 20/3/2023 16:55, 傅继晗 wrote:
        > Hello maintainers:
        >
        > According to the Microsoft page "setlocale, _wsetlocale |
        > Microsoft Learn"
        > <https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170>:
        >
        > *Starting in Windows 10 version 1803 (10.0.17134.0), the
        > Universal C Runtime supports using a UTF-8 code page. The
        > change means that char strings passed to C runtime functions
        > can expect strings in the UTF-8 encoding.*
        >
        > But libmingwex.a in the Mingw-w64 toolchain doesn't support
        > non-ASCII file names, which causes bugs in some projects; see
        > "MinGW-w64 - for 32 and 64 bit Windows / Bugs / #227 basename()
        > truncates filenames with variable-width encoding"
        > <https://sourceforge.net/p/mingw-w64/bugs/227/>
        > and "AOSP adb pull push error" on the Google Issue Tracker
        > <https://issuetracker.google.com/issues/143232373>.
        >
        > So patches for dirname.c and basename.c are needed to support
        > UTF-8 encoding.
        >
        > Greetings
        >
        > fjh1997
        >
        > _______________________________________________
        > Mingw-w64-public mailing list
        > Mingw-w64-public@lists.sourceforge.net
        > https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
