On Monday 27 January 2025 16:49:26 Lasse Collin wrote:
> Another behavior difference happens with invalid multibyte strings.
> I tested with UTF-8 in application manifest. A file named L"_\uFFFD_"
> exists.
>
> The UCRT functions fail if given invalid UTF-8:
>
> fopen("_\x80_", "r");
> _open("_\x80_", O_RDONLY);
> // _findfirst fails too
>
> GetLastError() returns ERROR_NO_UNICODE_TRANSLATION.
>
> Win32 API functions convert the invalid bytes to U+FFFD and then access
> the resulting filename, so these succeed:
>
> GetFileAttributesA("_\x80_");
>
> WIN32_FIND_DATAA wfd;
> FindFirstFileA("_\x80_", &wfd);
> // wfd.cFileName contains "_\uFFFD_" in UTF-8.
>
> Listing files in a directory works too, that is,
> FindFirstFileA("_\x80_directory\\*", &wfd) lists files in
> "_\ufffd_directory".
>
> I suppose dirent should follow the UCRT behavior.
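
I agree with you. Silently converting an invalid byte such as 0x80 to
U+FFFD is a bad idea, because the call then accesses a different file
name than the one the caller passed.
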
> This means using MB_ERR_INVALID_CHARS with MultiByteToWideChar().
>
> * * *
>
> It was pointed out that using FindFirstFileExW() can improve speed if
> one tells it to not list 8.3 names. I didn't see a difference on SSD
> (or well, actually cached data in RAM). But 8.3 names are needed if
> there was _readdir_8dot3() which would fall back to the 8.3 name if
> conversion of the long name fails. I suppose it's a more sensible
> fallback for some apps than imaginary names from best-fit mapping.
>
> --
> Lasse Collin

I think that to exclude 8.3 names you mean passing the FindExInfoBasic
level instead of FindExInfoStandard to FindFirstFileExW(). However,
FindExInfoBasic is supported only since Windows 7, and I think that
readdir() could still be useful on Windows XP as well.
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public