On 25/3/2023 17:28, LIU Hao wrote:
On 2023-03-25 12:35, Alvin Wong wrote:
Can we just avoid converting to wide char at all and operate only in MBCS? IsDBCSLeadByte should be enough to allow these functions to skip any false matches on the second byte of double-byte chars. And it does not matter that IsDBCSLeadByte doesn't work with UTF-8, because the UTF-8 encoding already ensures that there will be no false matches with 7-bit ASCII chars (all bytes forming multi-byte chars have the MSB set, unlike some DBCS).
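To make the MBCS-only idea concrete, here is a minimal sketch of a separator scan that skips trail bytes. It is not the mingw-w64 implementation: the function names are hypothetical, and the CP932 lead-byte ranges (0x81-0x9F, 0xE0-0xFC) are hard-coded only so that the sketch is portable. On Windows the lead-byte test would be a call to `IsDBCSLeadByteEx()` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: lead-byte ranges for CP932 (Shift-JIS), hard-coded for
   portability. On Windows, IsDBCSLeadByteEx(932, b) would be used. */
static int is_cp932_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: returns a pointer to the last real path separator
   ('\\' or '/') in a CP932 string, or NULL if there is none. A 0x5C that
   is the trail byte of a double-byte character is not a separator. */
static const char *find_last_separator_cp932(const char *path)
{
    const char *last = NULL;
    assert(path != NULL);
    for (const char *p = path; *p != '\0'; ++p) {
        if (is_cp932_lead_byte((unsigned char)*p)) {
            if (p[1] == '\0')
                break;  /* truncated double-byte character */
            ++p;        /* skip the trail byte, whatever its value */
            continue;
        }
        if (*p == '\\' || *p == '/')
            last = p;
    }
    return last;
}
```

For example, with the bytes `"C:\\" "\x95\x5C"` (the second part is a single double-byte character in CP932 whose trail byte is 0x5C), a plain `strrchr(path, '\\')` reports the trail byte at offset 4, while the scan above reports the real separator at offset 2.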

While this argument is almost correct on its own (except that `IsDBCSLeadByteEx()` is preferred to `IsDBCSLeadByte()`), we should not declare these functions as working with UTF-8. As explained in a previous message, the Yen symbol (`¥`, two bytes in UTF-8: C2 A5) is a path separator in Japanese locales, and the Won symbol (`₩`, three bytes in UTF-8: E2 82 A9) is also a path separator in Korean locales;

This claim needs to be verified. The native path separator on Windows should be only U+005C (with APIs also accepting U+002F). While both U+005C and U+00A5 translate to 0x5C in CP932, Windows uses Unicode to handle files, and NTFS stores file names in Unicode. If you give Windows the path `L"C:\134new\245folder"`, I can't really imagine it referring to `C:\new\folder` rather than `C:\new¥folder` when the system code page is Japanese. Of course, if you first translate the path to CP932, or if you are using a program that does not use the Unicode Windows APIs, then you will not be able to refer to `new¥folder`.
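For anyone unsure what that literal contains, here is a small sketch that spells it out. `\134` and `\245` are octal escapes, so the wide string holds U+005C at index 2 and U+00A5 at index 6. The names `test_path` and `test_path_unit` are made up for this illustration.

```c
#include <wchar.h>

/* The path from the paragraph above: \134 is octal for 0x5C (backslash)
   and \245 is octal for 0xA5 (yen sign). */
static const wchar_t test_path[] = L"C:\134new\245folder";

/* Hypothetical helper: returns the code unit at index i of test_path,
   so the values can be printed or compared without font ambiguity. */
static unsigned long test_path_unit(size_t i)
{
    return (unsigned long)test_path[i];
}
```

Printing `test_path_unit(2)` and `test_path_unit(6)` in hex is a simple way to confirm which code point is which when the two glyphs look the same on screen.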

I think the following things need to be checked:

1. From Windows Explorer, can you create a file or folder containing
   U+00A5 in its name on Japanese Windows? (Don't try from cmd.exe.)
2. If you create a file or folder containing U+00A5 on an NTFS volume
   from another non-Japanese system, can you access it from Windows
   Explorer on Japanese Windows?
3. Create the path `C:\new\folder` and try to access it using the
   Unicode Windows API with the path `L"C:\134new\245folder"`.
4. Create the path `C:\new¥folder` (with U+00A5) and try the same.
5. Check the above two points, but with an embedded manifest setting the
   active code page to UTF-8, and using the "-A" APIs with a UTF-8
   string instead.
6. Check whether MultiByteToWideChar converts 0x5C from CP932 to U+005C
   or U+00A5.
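Checks 3, 4 and 6 could be scripted roughly as follows. This is only a sketch under stated assumptions: the two helper names are hypothetical, the non-Windows branches exist only so that the file compiles elsewhere, and the results have to be read on an actual Japanese Windows installation. `MultiByteToWideChar()` and `GetFileAttributesW()` are the real Windows API calls.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper for check 6: converts the single byte 0x5C from
   CP932 and returns the resulting UTF-16 code unit. Returns 0 if the
   conversion fails or if this is not a Windows build. */
static unsigned cp932_backslash_to_wide(void)
{
#ifdef _WIN32
    const char in[1] = { 0x5C };
    wchar_t out[2] = { 0 };
    int n = MultiByteToWideChar(932, MB_ERR_INVALID_CHARS, in, 1, out, 2);
    return n == 1 ? (unsigned)out[0] : 0;
#else
    return 0;
#endif
}

/* Hypothetical helper for checks 3 and 4: returns 1 if the path exists
   according to the Unicode ("-W") API, 0 if it does not, and -1 if this
   is not a Windows build. */
static int wide_path_exists(const wchar_t *path)
{
#ifdef _WIN32
    return GetFileAttributesW(path) != INVALID_FILE_ATTRIBUTES;
#else
    (void)path;
    return -1;
#endif
}
```

After creating `C:\new\folder`, calling `wide_path_exists(L"C:\134new\245folder")` covers check 3, and repeating the call after creating the U+00A5 variant covers check 4.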

Remember that U+005C and U+00A5 can look exactly the same in the Japanese font on Windows, so you should verify you have the correct code point when testing.

those are not something we can handle, because we can't know the encoding of the argument string.
We can check the value of `GetACP()`, although I am not convinced we need to.