On 2025-01-12 Pali Rohár wrote:
> GetCommandLineA(), __p__acmdln(), _acmdln and WinMain()'s lpCmdLine
> are all affected by this problem. In any case, I think we should mark
> all these functions and variables as security vulnerable. Is there
> some attribute for gcc for marking them, so usage would throw security
> compile warning? lpCmdLine the argument for WinMain() so such thing
> would require more smart things from compiler as the WinMain() is
> required (it is entry point), so function usage should not throw
> warning, but rather usage of its 3rd argument should.

The "deprecated" attribute can be used to mark functions and variables.
I think it doesn't help with a specific function argument. I'm not able
to comment if such marking is good here vs. how noisy or annoying the
warnings would be.

> > A global bool to detect if best-fit happened in the wargv-to-argv
> > step could cover a lot use cases already. This way it's possible to
> > provide a translated error message or do other customizations when
> > displaying the error.  
> 
> So just providing (from application point of view) readonly bool
> variable which would be set by mingw-w64 runtime code, if the bestfit
> happened or not?

Yes. I would be set to true if at least one of argv[] strings wasn't
converted losslessly. Apps can then decide what to do with the
information. It's not possible to provide a full solution inside
MinGW-w64 alone because different apps have different needs. This
method keeps the MinGW-w64 portion fairly minimal.

One could combine it with the the argv[i][-1] flag. It's not much extra
code. I don't know if it is worth even that though.

> When introducing such new variable, we would need some macro in header
> file, so application would know if it can access that new variable or
> not. To ensure that application can be compiled also with older
> mingw-w64 version (or with msvc).

Yes.

> argv[i][-1] is strange; it would be better argv[i][strlen(argv[i])+1].

It might look strange but in some situations it's a convenient method.
argv[i][strlen(argv[i])+1] is a more complex expression to type and
also a little less efficient.

> In most cases at &argv[i][-1] is same as &argv[i-1][strlen(argv[i-1])]
> and argv[] allocation algorithm can be harder to write as it is mostly
> one linear array.

It can be but it doesn't need to be. For example, duplicate_ppstrings()
in crtexe.c doesn't create a flat array.

If something about the command line handling is fixed in MinGW-w64, I
think it's good to start with the warg-to-argv method. Fixing the
quoting issue is a significant improvement and highly unlikely to cause
any breakage (unless someone has used fullwidth double quotes as ASCII
double quote replacements just for fun). So:

  - Do it with best-fit to minimize compatiblity risks.
  - Add a global boolean.
  - Perhaps add argv[i][-1].

I have attached a draft patch (header bits are missing) and a demo
program. It has the above features so it's possible to think if the
extra code is worth it.

_acmdln and WinMain() aren't so important at least when considering
apps ported from POSIX. I don't mean that nothing should be done about
them, I only mean that the warg-to-argv sounds like good starting
point.

> > > In limits.h we have: #define NAME_MAX 255  
> > 
> > I think it should be at least (255 * 3 + 1) which is needed for
> > UTF-8.  
> 
> IIRC 255 * 4 + 1. As one UNICODE code point can be encoded by 4 bytes.

wchar_t is 16 bits on Windows. The highest code point that can be
encoded using one wchar_t is U+FFFF. A three-byte UTF-8 character holds
16 bits, so the last three-byte character is U+FFFF. So one wchar_t
produces three bytes.

A four-byte UTF-8 character consumes two wide chars on Windows due
to surrogate pairs. Using two wide chars to produce four bytes results
in a smaller byte count than above.

With GB18030, U+FFFD consumes four bytes. So with that charset, the
maximum possible byte count is larger.

> > Perhaps (255 * MB_LEN_MAX + 1) is best. Then any file name in any  
> 
> In limits.h we have already #define MB_LEN_MAX 5
> So this should be safe.
> 
> Question is if the 255 is enough. IMHO it needs to be maximal length
> of the filename (in u16 unit) which can be used by the NT kernel. And
> I'm not sure what it is right now.

I'm not sure. 255 (without \0) tends to be a common limit in practice.
Multipying by MB_LEN_MAX produces a very excessively-sized buffer for
UTF-8 use at least.

I don't know in which charset a code point may encode to five bytes. If
it requires a surrogate pair then using 5 as a multiplier is silly.

> Now I understood what you mean. I guess that there can be DLL library
> which takes DIR* pointer parameter. And then we have a problem.

You understood it correctly. :-)

> Maybe we would need type versioning? Like it was with time_t or fpos_t
> which based on the compile time macro expands either to old (32-bit)
> or new (64-bit type).

If a third party header has

    include <dirent.h>
    int foo(DIR *d);

it's not possible to know which version of the symbols were used when
the library was compiled. To do versioning with only header macros, all
participants have to co-operate. Ideally one doesn't use this kind of
data types in API/ABI at all.

> > I (hopefully) fixed dirent issues in Gnulib:
> > 
> >     https://lists.gnu.org/archive/html/bug-gnulib/2025-01/msg00087.html

This didn't work out but I learned a few things which are useful *if*
MinGW-w64's dirent code is changed. I try to send something for
discussion in a few days.

-- 
Lasse Collin
#include <locale.h>
#include <stdio.h>

extern int _argv_is_unsafe;

int _dowildcard = 1;

int main(int argc, char **argv)
{
        setlocale(LC_ALL, "");

        printf("_argv_is_unsafe = %d\n", _argv_is_unsafe);
        printf("argc = %d\n", argc);

        for (int i = 0; i < argc; ++i)
                printf("%d  '%s'\n", argv[i][-1], argv[i]);

        return 0;
}
diff --git a/mingw-w64-crt/crt/crtexe.c b/mingw-w64-crt/crt/crtexe.c
index 328cc305b..392d27efe 100644
--- a/mingw-w64-crt/crt/crtexe.c
+++ b/mingw-w64-crt/crt/crtexe.c
@@ -65,7 +65,12 @@ extern LPTOP_LEVEL_EXCEPTION_FILTER __mingw_oldexcpt_handler;
 
 extern void _pei386_runtime_relocator (void);
 long CALLBACK _gnu_exception_handler (EXCEPTION_POINTERS * exception_data);
+
+#ifdef _UNICODE
 static void duplicate_ppstrings (int ac, _TCHAR ***av);
+#else
+int _argv_is_unsafe;
+#endif
 
 static int __cdecl pre_c_init (void);
 static void __cdecl pre_cpp_init (void);
@@ -134,7 +139,77 @@ pre_cpp_init (void)
 #ifdef _UNICODE
   argret = __wgetmainargs(&argc,&argv,&envp,_dowildcard,&startinfo);
 #else
-  argret = __getmainargs(&argc,&argv,&envp,_dowildcard,&startinfo);
+  // __getmainargs() parses the command line from _acmdln which has been
+  // converted to active code page using best-fit mapping. This can result
+  // in dangerous quoting breakage.
+  //
+  // Avoid the quoting breakage by doing the argument splitting and
+  // possible wildcard expansion as wide chars. Then convert wargv[]
+  // to active code page one argument at the time. It is done using
+  // best-fit mapping to minimize backward compatibility issues but
+  // at the cost of not preventing some dangerous situations.
+  //
+  // Environment will be taken directly as narrow chars later.
+  // That too uses best-fit mapping.
+  wchar_t **wargv;
+  wchar_t **wenvp_dummy;
+  argret = __wgetmainargs(&argc,&wargv,&wenvp_dummy,_dowildcard,&startinfo);
+  if (argret < 0)
+    _amsg_exit(8); /* _RT_SPACEARG */
+
+  // Calculate how much memory is needed for the multibyte argv[] elements.
+  size_t buf_size = 0;
+
+  for (int i = 0; i < argc; ++i) {
+    int size = WideCharToMultiByte(CP_ACP, 0,
+                                   wargv[i], -1,
+                                   NULL, 0,
+                                   NULL, NULL);
+    if (size <= 0) {
+      // This shouldn't happen.
+      _amsg_exit(8); /* _RT_SPACEARG */
+    }
+
+    // Reserve an extra byte for a lossy conversion indicator.
+    buf_size += 1 + size;
+  }
+
+  argv = malloc((argc + 1) * sizeof(char *));
+  char *buf = malloc(buf_size);
+  if (argv == NULL || buf == NULL)
+    _amsg_exit(8); /* _RT_SPACEARG */
+
+  for (int i = 0; i < argc; ++i) {
+    // In main(), argv[i][-1] is 1 if conversion of argv[i] was lossy.
+    BOOL conv_was_lossy = TRUE;
+    WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
+                        wargv[i], -1,
+                        NULL, 0,
+                        NULL, &conv_was_lossy);
+    _argv_is_unsafe |= conv_was_lossy;
+    *buf++ = conv_was_lossy;
+    --buf_size;
+
+    int size = WideCharToMultiByte(CP_ACP, 0,
+                                   wargv[i], -1,
+                                   buf, buf_size,
+                                   NULL, NULL);
+    argv[i] = buf;
+    buf += size;
+    buf_size -= size;
+  }
+
+  argv[argc] = NULL;
+
+  // Initialize narrow environment.
+  //
+  // IMPORTANT: Don't expand wildcards as that alone can be a security issue
+  // with __getmainargs! Best-fit mapping can result in arguments that point
+  // to network shares and wildcard expansion of such arguments could then
+  // result in unexpected network access.
+  int argc_dummy;
+  char **argv_dummy;
+  argret = __getmainargs(&argc_dummy,&argv_dummy,&envp,0,&startinfo);
 #endif
   if (argret < 0)
     _amsg_exit(8); /* _RT_SPACEARG */
@@ -256,7 +331,10 @@ __tmainCRTStartup (void)
 
     _fpreset ();
 
+#ifdef _UNICODE
+    // Narrow argv is already a local copy so only duplicate the wide version.
     duplicate_ppstrings (argc, &argv);
+#endif
     __main (); /* C++ initialization. */
 #ifdef _UNICODE
     __winitenv = envp;
@@ -314,6 +392,7 @@ check_managed_app (void)
   return 0;
 }
 
+#ifdef _UNICODE
 static void duplicate_ppstrings (int ac, _TCHAR ***av)
 {
        _TCHAR **avl;
@@ -330,6 +409,7 @@ static void duplicate_ppstrings (int ac, _TCHAR ***av)
        n[i] = NULL;
        *av = n;
 }
+#endif
 
 int __cdecl atexit (_PVFV func)
 {
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to