On Friday 10 January 2025 19:35:30 Lasse Collin wrote:
> On 2025-01-04 Pali Rohár wrote:
> > Interesting... What is not clear is if the research team reported this
> > bug to Microsoft and if Microsoft is going to address this issue at
> > some level, or at least document best practices what should affected
> > applications do.
> 
> There is a blog post with more details:
> 
>     
> https://devco.re/blog/2025/01/09/worstfit-unveiling-hidden-transformers-in-windows-ansi/

Thanks for the link! I was trying to find something but nothing seems to
be available on search engines yet.

> According to that, Microsoft knows this. The addition to the
> GetCommandLineA docs is mentioned. Only time will tell if more will
> come but I don't know how long it makes sense to wait.

So in case the Windows system libraries are not going fixed (which
applies e.g. for Windows 7 or any other older Windows versions) then
applications compiled and linked with mingw-w64 are affected by these
issues:

* GetCommandLineA() - interpreting this string may lead to issues

users of GetCommandLineA():
* _acmdln
* __p__acmdln() (pointer to _acmdln)

users of _acmdln:
* __getmainargs()
* mingw-w64 entry point of -mwindows builds, which calls WinMain()

users of __getmainargs():
* __argc
* __argv
* __p___argc() (pointer to __argc)
* __p___argv() (pointer to __argv)
* mingw-w64 entry point of -mconsole builds, which calls main()

We know about issues:
1. wrong interpretation of _acmdln
2. wrong split of command line string to argv-array
3. wrong expanding of wildcards on argv-array
4. wrong interpretation of argv[i] or __argv[i] by application itself
5. wrong interpretation of GetCommandLineA() by application itself

Then there is an issue that argv[i] is suppose to be filename argument
and even after the correct splitting of cmd-to-argv[], due to the
bestfit, the argv[i] describes other file than the original one in
wargv[i]. I was thinking more about this issue... but now I'm thinking
that this is not issue at all. The reason is that very similar behavior
has also Apple Mac OS X systems, then they are doing modified NFD UTF-8
normalization of filenames. So if somebody pass string which is not in
NFD form in argv[i] and application do file operation for argv[i], in
reality the system would not do anything with filename argv[i] but on
filename returned by normalized(argv[i]). Something similar has also
Linux with ext4 filesystem if for the directory is enabled unicode
semantics and/or case-insensitive semantic. So application would not do
operation on file which was passed in argv[i] but rather on some
modified filename. And AFAIK nobody treat this as a security issue.
Windows "besfit" for this case is just some kind of modification.


For fixing 1. the only way is to modify _acmdln and replace all
problematic characters which are result of bestfit by some
non-problematic replacement. As application may use _acmdln we have to
fill it with something which is as much as possible the original string,
but is not vulnerable anymore. Problematic characters include those
which are used for cmdline-to-argv[] splitting and also characters used
for wildcard expansion. Maybe we can provide global const variable with
extra problematic characters which can application define on its own?

For fixing 2. I think we can reuse your idea which you proposed. Do not
use msvcrt/ucrt __getmainargs() function, but rather msvcrt/ucrt
__wgetmainargs() function and do wchar_t* to char* conversion per
wargv[i] element. Just this needs to be done in very smart way to always
do that bestfit mapping because this function must not fail,
application main() has to be called and bestfit is the most sane
scenario. Ideally mingw-w64 should provide its own __getmainargs()
wrapper with all these fixes in import library, so if application calls
__getmainargs() manually, it will receive this mingw-w64 fix.
(Yes, there are applications which are doing it, we recently received
bug report that application cannot be linked because of my failure when
I forgot to define __imp symbol for this function during the last one
ABI fixup). I have already something work-in-progress as I was
inspecting what is __getmainargs() doing.

For fixing 3. I think that solution from 2. is enough.

Issue 4. is unfixable by the mingw-w64. Only application knows how is
going to use argv[] or __argv[]. mingw-w64 could help application by
providing:
* global bool variable which can disable bestfit mapping and/or call
  application callback function when it happens. For example application
  would want to print some error message and exit. Or mingw-w64 can
  provide generic error message.
* global bitmap array variable for each argv[i], to contains j-bit set
  if the bestfit was applied for argv[i][j]. For example application can
  use it to figure out if the part of the argv[i][a..b] range is safe to
  use or not. Application can use bestfit for any user visible test, but
  cannot use it for example when interpreting script language (for
  example: perl -E "...perl command..." must not use bestfit for argv[2]).

Issue 5. is hard. Maybe mingw-w64 can provide wrapper around
GetCommandLineA() function which will do similar thing as for issue 1.?

> MinGW-w64's <dirent.h> is affected too. One fix could be to skip over
> problematic filenames but remember that such skipping has occurred.
> Then, after the last filename in the directory, return an error. This
> way one would list all filenames that can be listed. On the other hand,
> fast error detection is better in some use cases.
> 
> According to POSIX, the correct errno value would be EOVERFLOW: "One of
> the values in the structure to be returned cannot be represented
> correctly."
> 
>     
> https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/readdir_r.html
> 
> If a 8.3 name is available, it could be returned instead but I fear it
> creates more issues than solves. For example, copying files with a
> wrong tool would result in loss of the long names like in 1990s. As
> always, the least bad choice depends on what exactly the application is
> doing with the filenames.
> 
> Can the structs in MinGW-w64's <dirent.h> be changed or are they all
> part of the ABI that shouldn't be broken? In struct dirent, extending
> "char d_name[260];" would allow long UTF-8 filenames to fit.
> 
> -- 
> Lasse Collin

This dirent.h and readdir functionality is not in msvcrt/ucrt libraries.
So it is always statically linked into the EXE application or DLL
library. Also it is not Windows function, so we have to just follow
POSIX or GNU APIs.

POSIX for struct dirent says:
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/basedefs/dirent.h.html

... define the structure dirent which shall include the following members:
ino_t  d_ino       File serial number.
char   d_name[]    Filename string of entry.
...
The array d_name in each of these structures is of unspecified size, but
shall contain a filename of at most {NAME_MAX} bytes followed by a
terminating null byte.

In limits.h we have: #define NAME_MAX 255

So I would say that changing sizeof(struct dirent) in mingw-w64 is less
problematic than changing something related to msvcrt/ucrt ABI because
it statically linked into DLL or EXE. Also readdir() function would be
used by the POSIX application, and because different systems (Linux,
BSD, Solaris, Hurd?) may have different size of these structures, POSIX
application should be written or prepared of unspecified size. So in my
opinion changing increasing size of the d_name[] member of struct dirent
could be possible. Also we should reflect it by the NAME_MAX constant
which is already defined incorrectly (it is 255, but d_name[] may return
260 byte long name).

What other people think about it?


_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to