On Friday 10 January 2025 19:35:30 Lasse Collin wrote: > On 2025-01-04 Pali Rohár wrote: > > Interesting... What is not clear is if the research team reported this > > bug to Microsoft and if Microsoft is going to address this issue at > > some level, or at least document best practices what should affected > > applications do. > > There is a blog post with more details: > > > https://devco.re/blog/2025/01/09/worstfit-unveiling-hidden-transformers-in-windows-ansi/
Thanks for the link! I was trying to find something but nothing seems to be available on search engines yet. > According to that, Microsoft knows this. The addition to the > GetCommandLineA docs is mentioned. Only time will tell if more will > come but I don't know how long it makes sense to wait. So in case the Windows system libraries are not going fixed (which applies e.g. for Windows 7 or any other older Windows versions) then applications compiled and linked with mingw-w64 are affected by these issues: * GetCommandLineA() - interpreting this string may lead to issues users of GetCommandLineA(): * _acmdln * __p__acmdln() (pointer to _acmdln) users of _acmdln: * __getmainargs() * mingw-w64 entry point of -mwindows builds, which calls WinMain() users of __getmainargs(): * __argc * __argv * __p___argc() (pointer to __argc) * __p___argv() (pointer to __argv) * mingw-w64 entry point of -mconsole builds, which calls main() We know about issues: 1. wrong interpretation of _acmdln 2. wrong split of command line string to argv-array 3. wrong expanding of wildcards on argv-array 4. wrong interpretation of argv[i] or __argv[i] by application itself 5. wrong interpretation of GetCommandLineA() by application itself Then there is an issue that argv[i] is suppose to be filename argument and even after the correct splitting of cmd-to-argv[], due to the bestfit, the argv[i] describes other file than the original one in wargv[i]. I was thinking more about this issue... but now I'm thinking that this is not issue at all. The reason is that very similar behavior has also Apple Mac OS X systems, then they are doing modified NFD UTF-8 normalization of filenames. So if somebody pass string which is not in NFD form in argv[i] and application do file operation for argv[i], in reality the system would not do anything with filename argv[i] but on filename returned by normalized(argv[i]). Something similar has also Linux with ext4 filesystem if for the directory is enabled unicode semantics and/or case-insensitive semantic. So application would not do operation on file which was passed in argv[i] but rather on some modified filename. And AFAIK nobody treat this as a security issue. Windows "besfit" for this case is just some kind of modification. For fixing 1. the only way is to modify _acmdln and replace all problematic characters which are result of bestfit by some non-problematic replacement. As application may use _acmdln we have to fill it with something which is as much as possible the original string, but is not vulnerable anymore. Problematic characters include those which are used for cmdline-to-argv[] splitting and also characters used for wildcard expansion. Maybe we can provide global const variable with extra problematic characters which can application define on its own? For fixing 2. I think we can reuse your idea which you proposed. Do not use msvcrt/ucrt __getmainargs() function, but rather msvcrt/ucrt __wgetmainargs() function and do wchar_t* to char* conversion per wargv[i] element. Just this needs to be done in very smart way to always do that bestfit mapping because this function must not fail, application main() has to be called and bestfit is the most sane scenario. Ideally mingw-w64 should provide its own __getmainargs() wrapper with all these fixes in import library, so if application calls __getmainargs() manually, it will receive this mingw-w64 fix. (Yes, there are applications which are doing it, we recently received bug report that application cannot be linked because of my failure when I forgot to define __imp symbol for this function during the last one ABI fixup). I have already something work-in-progress as I was inspecting what is __getmainargs() doing. For fixing 3. I think that solution from 2. is enough. Issue 4. is unfixable by the mingw-w64. Only application knows how is going to use argv[] or __argv[]. mingw-w64 could help application by providing: * global bool variable which can disable bestfit mapping and/or call application callback function when it happens. For example application would want to print some error message and exit. Or mingw-w64 can provide generic error message. * global bitmap array variable for each argv[i], to contains j-bit set if the bestfit was applied for argv[i][j]. For example application can use it to figure out if the part of the argv[i][a..b] range is safe to use or not. Application can use bestfit for any user visible test, but cannot use it for example when interpreting script language (for example: perl -E "...perl command..." must not use bestfit for argv[2]). Issue 5. is hard. Maybe mingw-w64 can provide wrapper around GetCommandLineA() function which will do similar thing as for issue 1.? > MinGW-w64's <dirent.h> is affected too. One fix could be to skip over > problematic filenames but remember that such skipping has occurred. > Then, after the last filename in the directory, return an error. This > way one would list all filenames that can be listed. On the other hand, > fast error detection is better in some use cases. > > According to POSIX, the correct errno value would be EOVERFLOW: "One of > the values in the structure to be returned cannot be represented > correctly." > > > https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/readdir_r.html > > If a 8.3 name is available, it could be returned instead but I fear it > creates more issues than solves. For example, copying files with a > wrong tool would result in loss of the long names like in 1990s. As > always, the least bad choice depends on what exactly the application is > doing with the filenames. > > Can the structs in MinGW-w64's <dirent.h> be changed or are they all > part of the ABI that shouldn't be broken? In struct dirent, extending > "char d_name[260];" would allow long UTF-8 filenames to fit. > > -- > Lasse Collin This dirent.h and readdir functionality is not in msvcrt/ucrt libraries. So it is always statically linked into the EXE application or DLL library. Also it is not Windows function, so we have to just follow POSIX or GNU APIs. POSIX for struct dirent says: https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/basedefs/dirent.h.html ... define the structure dirent which shall include the following members: ino_t d_ino File serial number. char d_name[] Filename string of entry. ... The array d_name in each of these structures is of unspecified size, but shall contain a filename of at most {NAME_MAX} bytes followed by a terminating null byte. In limits.h we have: #define NAME_MAX 255 So I would say that changing sizeof(struct dirent) in mingw-w64 is less problematic than changing something related to msvcrt/ucrt ABI because it statically linked into DLL or EXE. Also readdir() function would be used by the POSIX application, and because different systems (Linux, BSD, Solaris, Hurd?) may have different size of these structures, POSIX application should be written or prepared of unspecified size. So in my opinion changing increasing size of the d_name[] member of struct dirent could be possible. Also we should reflect it by the NAME_MAX constant which is already defined incorrectly (it is 255, but d_name[] may return 260 byte long name). What other people think about it? _______________________________________________ Mingw-w64-public mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
