Another behavior difference shows up with invalid multibyte strings.
I tested with UTF-8 set as the active code page in the application
manifest. A file named L"_\uFFFD_" exists.
The UCRT functions fail if given invalid UTF-8:

    fopen("_\x80_", "r");
    _open("_\x80_", O_RDONLY);
    // _findfirst fails too

GetLastError() returns ERROR_NO_UNICODE_TRANSLATION.
Win32 API functions convert the invalid bytes to U+FFFD and then access
the resulting filename, so these succeed:

    GetFileAttributesA("_\x80_");

    WIN32_FIND_DATAA wfd;
    FindFirstFileA("_\x80_", &wfd);
    // wfd.cFileName contains "_\uFFFD_" in UTF-8.

Listing files in a directory works too:
FindFirstFileA("_\x80_directory\\*", &wfd) lists the files in
"_\uFFFD_directory".
I suppose dirent should follow the UCRT behavior. This means using
MB_ERR_INVALID_CHARS with MultiByteToWideChar(), so that invalid input
makes the conversion fail instead of being mapped to U+FFFD.
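
To make the two policies concrete, here is a small portable model in C.
It is only a sketch, not the Windows or UCRT code: utf8_decode() and
decode_one() are made-up names, the output is UTF-32 rather than the
UTF-16 that MultiByteToWideChar() produces, and emitting one U+FFFD per
offending byte is an assumption that matches the "_\x80_" test above but
not necessarily every invalid sequence on Windows. Strict mode stands
for the MB_ERR_INVALID_CHARS policy, lenient mode for the replacement
seen with the A functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point starting at s (NUL-terminated input).
 * Returns the number of bytes consumed, or 0 if the sequence is invalid.
 * Hypothetical helper for this sketch only. */
static size_t decode_one(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0; /* stray continuation byte or invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        /* A terminating NUL fails this test too, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}

/* Convert UTF-8 to a NUL-terminated UTF-32 array.
 * strict != 0: fail on the first invalid sequence.
 * strict == 0: replace each offending byte with U+FFFD and continue.
 * Returns the number of code points written (excluding the terminator),
 * or -1 on invalid input in strict mode or if dst is too small. */
static int utf8_decode(const char *src, uint32_t *dst, size_t dst_len,
                       int strict)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dst_len == 0)
        return -1;

    while (*s != '\0') {
        uint32_t cp;
        size_t used = decode_one(s, &cp);

        if (used == 0) {
            if (strict)
                return -1;
            cp = 0xFFFD;
            used = 1;
        }
        if (n + 1 >= dst_len)
            return -1; /* no room for this code point plus terminator */
        dst[n++] = cp;
        s += used;
    }

    dst[n] = 0;
    return (int)n;
}
```

With this model, utf8_decode("_\x80_", buf, 8, 1) returns -1, while
passing 0 as the last argument yields the three code points '_', U+FFFD,
'_'.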
* * *
It was pointed out that using FindFirstFileExW() can improve speed if
one tells it not to list 8.3 names. I didn't see a difference on an SSD
(or, more accurately, on data that was already cached in RAM). But 8.3
names would be needed if there were a _readdir_8dot3() that falls back
to the 8.3 name when conversion of the long name fails. I suppose it's
a more sensible fallback for some apps than imaginary names from
best-fit mapping.
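
The fallback idea behind _readdir_8dot3() can be sketched as follows.
This is a hypothetical model, not an existing dirent or Windows API:
pick_name() and narrow_strict() are invented names, and plain 7-bit
ASCII stands in for whatever code page the application uses. The short
name is only available when the search asks for it; with FindExInfoBasic
the cAlternateFileName member is left empty, which is the case the
short_name[0] check covers.

```c
#include <stddef.h>
#include <stdint.h>

/* Strictly narrow a UTF-16 name to 7-bit ASCII, which stands in here
 * for a legacy code page. Returns 0 on success, -1 if a character
 * cannot be represented or dst is too small. No best-fit mapping. */
static int narrow_strict(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t i;

    for (i = 0; src[i] != 0; i++) {
        if (src[i] > 0x7F || i + 1 >= dst_size)
            return -1;
        dst[i] = (char)src[i];
    }
    if (dst_size == 0)
        return -1;
    dst[i] = '\0';
    return 0;
}

/* Choose the name to report for a directory entry.
 * Returns 0 if the long name was used, 1 if the 8.3 name was used as a
 * fallback, and -1 if neither could be converted. An empty short_name
 * means that no 8.3 name is available. */
static int pick_name(const uint16_t *long_name, const uint16_t *short_name,
                     char *dst, size_t dst_size)
{
    if (narrow_strict(long_name, dst, dst_size) == 0)
        return 0;
    if (short_name[0] != 0
            && narrow_strict(short_name, dst, dst_size) == 0)
        return 1;
    return -1;
}
```

The caller can tell from the return value whether it received the long
name or the fallback, and it gets an error rather than a made-up name
when neither converts.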
--
Lasse Collin
_______________________________________________
Mingw-w64-public mailing list
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public