Re: [Mingw-w64-public] dirent changes

Lasse Collin Mon, 27 Jan 2025 06:50:11 -0800

Another behavior difference shows up with invalid multibyte strings.
I tested with UTF-8 set as the active code page in the application
manifest and with an existing file named L"_\uFFFD_".


The UCRT functions fail if given invalid UTF-8:

    fopen("_\x80_", "r");
    _open("_\x80_", O_RDONLY);
    // _findfirst fails too

GetLastError() returns ERROR_NO_UNICODE_TRANSLATION.

Win32 API functions convert the invalid bytes to U+FFFD and then access
the resulting filename, so these succeed:

    GetFileAttributesA("_\x80_");

    WIN32_FIND_DATAA wfd;
    FindFirstFileA("_\x80_", &wfd);
    // wfd.cFileName contains "_\uFFFD_" in UTF-8.

Listing files in a directory works too:
FindFirstFileA("_\x80_directory\\*", &wfd) lists the files in
"_\uFFFD_directory".
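
To make the two behaviors concrete, here is a small portable model of
them. It is not the UCRT or Win32 implementation: the function names are
made up, and replacing each offending byte with one U+FFFD is a
simplifying assumption that is only known to match the "_\x80_" case
described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s[0..n), n >= 1. Returns its length in
 * bytes and stores the code point, or returns 0 if it is invalid. */
static size_t decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length (rejects
     * overlong encodings). */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;              /* stray continuation or bad lead byte */

    if (len > n) return 0;      /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}

/* strict != 0: fail on invalid input, like the UCRT functions.
 * strict == 0: substitute U+FFFD, like the Win32 *A functions.
 * Returns the number of code points written, or -1 on failure
 * (invalid input in strict mode, or out[] too small). */
static long utf8_to_code_points(const char *in, size_t n,
                                uint32_t *out, size_t cap, int strict)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, count = 0;

    while (i < n) {
        uint32_t cp;
        size_t len = decode_one(s + i, n - i, &cp);
        if (len == 0) {
            if (strict) return -1;
            cp = 0xFFFD;        /* assumption: one U+FFFD per bad byte */
            len = 1;
        }
        if (count == cap) return -1;
        out[count++] = cp;
        i += len;
    }
    return (long)count;
}
```

With this model, "_\x80_" is rejected in strict mode, and becomes
'_', U+FFFD, '_' in replacing mode, which is the name of the file that
exists in my test.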

I suppose dirent should follow the UCRT behavior. This means using
MB_ERR_INVALID_CHARS with MultiByteToWideChar().
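
A minimal sketch of such a conversion follows. The helper name
to_wide_strict() is hypothetical and not an existing mingw-w64 function,
and which code page dirent would pass is left to the caller here, since
that choice is an assumption on my part.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a null-terminated multibyte name to
 * UTF-16, rejecting invalid input instead of substituting U+FFFD.
 * Returns the number of wchar_t written (including the terminator),
 * or 0 on failure. For invalid input, GetLastError() then returns
 * ERROR_NO_UNICODE_TRANSLATION. */
static int to_wide_strict(UINT code_page, const char *name,
                          wchar_t *out, int out_len)
{
    /* -1 means "name is null-terminated"; the terminator is converted
     * too, so a successful result is never 0. */
    return MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                               name, -1, out, out_len);
}
#endif
```

With CP_UTF8 as the code page, to_wide_strict(CP_UTF8, "_\x80_", buf, n)
should fail the same way fopen() and _open() do above, while the valid
name "_\xEF\xBF\xBD_" converts to L"_\uFFFD_".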

* * *

It was pointed out that using FindFirstFileExW() can improve speed if
one tells it not to list 8.3 names. I didn't see a difference on an SSD
(or rather, with the data already cached in RAM). But 8.3 names would be
needed if there were a _readdir_8dot3() that fell back to the 8.3 name
when conversion of the long name fails. I suppose that's a more sensible
fallback for some apps than imaginary names from best-fit mapping.
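
For reference, skipping the 8.3 names is done by passing FindExInfoBasic
as the info level. A rough sketch, where count_entries_basic() is a
made-up helper that only counts the matching entries:

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: count the entries matching a pattern such as
 * L"C:\\some\\dir\\*" without querying 8.3 names.
 * Returns -1 if the search cannot be started. */
static long count_entries_basic(const wchar_t *pattern)
{
    WIN32_FIND_DATAW wfd;
    long count = 0;
    HANDLE h = FindFirstFileExW(pattern, FindExInfoBasic, &wfd,
                                FindExSearchNameMatch, NULL, 0);

    if (h == INVALID_HANDLE_VALUE)
        return -1;

    do {
        /* With FindExInfoBasic, wfd.cAlternateFileName is left empty,
         * so an 8.3 fallback is not possible at this info level. */
        count++;
    } while (FindNextFileW(h, &wfd));

    FindClose(h);
    return count;
}
#endif
```

A _readdir_8dot3() style fallback would have to use FindExInfoStandard
instead, because it needs wfd.cAlternateFileName to be filled in.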

-- 
Lasse Collin


