On 2025-01-12 Pali Rohár wrote:
> GetCommandLineA(), __p__acmdln(), _acmdln and WinMain()'s lpCmdLine
> are all affected by this problem. In any case, I think we should mark
> all these functions and variables as security vulnerable. Is there
> some attribute for gcc for marking them, so usage would throw security
> compile warning? lpCmdLine the argument for WinMain() so such thing
> would require more smart things from compiler as the WinMain() is
> required (it is entry point), so function usage should not throw
> warning, but rather usage of its 3rd argument should.
The "deprecated" attribute can be used to mark functions and variables.
I think it doesn't help with a specific function argument. I'm not able
to comment if such marking is good here vs. how noisy or annoying the
warnings would be.
> > A global bool to detect if best-fit happened in the wargv-to-argv
> > step could cover a lot use cases already. This way it's possible to
> > provide a translated error message or do other customizations when
> > displaying the error.
>
> So just providing (from application point of view) readonly bool
> variable which would be set by mingw-w64 runtime code, if the bestfit
> happened or not?
Yes. I would be set to true if at least one of argv[] strings wasn't
converted losslessly. Apps can then decide what to do with the
information. It's not possible to provide a full solution inside
MinGW-w64 alone because different apps have different needs. This
method keeps the MinGW-w64 portion fairly minimal.
One could combine it with the the argv[i][-1] flag. It's not much extra
code. I don't know if it is worth even that though.
> When introducing such new variable, we would need some macro in header
> file, so application would know if it can access that new variable or
> not. To ensure that application can be compiled also with older
> mingw-w64 version (or with msvc).
Yes.
> argv[i][-1] is strange; it would be better argv[i][strlen(argv[i])+1].
It might look strange but in some situations it's a convenient method.
argv[i][strlen(argv[i])+1] is a more complex expression to type and
also a little less efficient.
> In most cases at &argv[i][-1] is same as &argv[i-1][strlen(argv[i-1])]
> and argv[] allocation algorithm can be harder to write as it is mostly
> one linear array.
It can be but it doesn't need to be. For example, duplicate_ppstrings()
in crtexe.c doesn't create a flat array.
If something about the command line handling is fixed in MinGW-w64, I
think it's good to start with the warg-to-argv method. Fixing the
quoting issue is a significant improvement and highly unlikely to cause
any breakage (unless someone has used fullwidth double quotes as ASCII
double quote replacements just for fun). So:
- Do it with best-fit to minimize compatiblity risks.
- Add a global boolean.
- Perhaps add argv[i][-1].
I have attached a draft patch (header bits are missing) and a demo
program. It has the above features so it's possible to think if the
extra code is worth it.
_acmdln and WinMain() aren't so important at least when considering
apps ported from POSIX. I don't mean that nothing should be done about
them, I only mean that the warg-to-argv sounds like good starting
point.
> > > In limits.h we have: #define NAME_MAX 255
> >
> > I think it should be at least (255 * 3 + 1) which is needed for
> > UTF-8.
>
> IIRC 255 * 4 + 1. As one UNICODE code point can be encoded by 4 bytes.
wchar_t is 16 bits on Windows. The highest code point that can be
encoded using one wchar_t is U+FFFF. A three-byte UTF-8 character holds
16 bits, so the last three-byte character is U+FFFF. So one wchar_t
produces three bytes.
A four-byte UTF-8 character consumes two wide chars on Windows due
to surrogate pairs. Using two wide chars to produce four bytes results
in a smaller byte count than above.
With GB18030, U+FFFD consumes four bytes. So with that charset, the
maximum possible byte count is larger.
> > Perhaps (255 * MB_LEN_MAX + 1) is best. Then any file name in any
>
> In limits.h we have already #define MB_LEN_MAX 5
> So this should be safe.
>
> Question is if the 255 is enough. IMHO it needs to be maximal length
> of the filename (in u16 unit) which can be used by the NT kernel. And
> I'm not sure what it is right now.
I'm not sure. 255 (without \0) tends to be a common limit in practice.
Multipying by MB_LEN_MAX produces a very excessively-sized buffer for
UTF-8 use at least.
I don't know in which charset a code point may encode to five bytes. If
it requires a surrogate pair then using 5 as a multiplier is silly.
> Now I understood what you mean. I guess that there can be DLL library
> which takes DIR* pointer parameter. And then we have a problem.
You understood it correctly. :-)
> Maybe we would need type versioning? Like it was with time_t or fpos_t
> which based on the compile time macro expands either to old (32-bit)
> or new (64-bit type).
If a third party header has
include <dirent.h>
int foo(DIR *d);
it's not possible to know which version of the symbols were used when
the library was compiled. To do versioning with only header macros, all
participants have to co-operate. Ideally one doesn't use this kind of
data types in API/ABI at all.
> > I (hopefully) fixed dirent issues in Gnulib:
> >
> > https://lists.gnu.org/archive/html/bug-gnulib/2025-01/msg00087.html
This didn't work out but I learned a few things which are useful *if*
MinGW-w64's dirent code is changed. I try to send something for
discussion in a few days.
--
Lasse Collin
#include <locale.h>
#include <stdio.h>
extern int _argv_is_unsafe;
int _dowildcard = 1;
int main(int argc, char **argv)
{
setlocale(LC_ALL, "");
printf("_argv_is_unsafe = %d\n", _argv_is_unsafe);
printf("argc = %d\n", argc);
for (int i = 0; i < argc; ++i)
printf("%d '%s'\n", argv[i][-1], argv[i]);
return 0;
}
diff --git a/mingw-w64-crt/crt/crtexe.c b/mingw-w64-crt/crt/crtexe.c
index 328cc305b..392d27efe 100644
--- a/mingw-w64-crt/crt/crtexe.c
+++ b/mingw-w64-crt/crt/crtexe.c
@@ -65,7 +65,12 @@ extern LPTOP_LEVEL_EXCEPTION_FILTER __mingw_oldexcpt_handler;
extern void _pei386_runtime_relocator (void);
long CALLBACK _gnu_exception_handler (EXCEPTION_POINTERS * exception_data);
+
+#ifdef _UNICODE
static void duplicate_ppstrings (int ac, _TCHAR ***av);
+#else
+int _argv_is_unsafe;
+#endif
static int __cdecl pre_c_init (void);
static void __cdecl pre_cpp_init (void);
@@ -134,7 +139,77 @@ pre_cpp_init (void)
#ifdef _UNICODE
argret = __wgetmainargs(&argc,&argv,&envp,_dowildcard,&startinfo);
#else
- argret = __getmainargs(&argc,&argv,&envp,_dowildcard,&startinfo);
+ // __getmainargs() parses the command line from _acmdln which has been
+ // converted to active code page using best-fit mapping. This can result
+ // in dangerous quoting breakage.
+ //
+ // Avoid the quoting breakage by doing the argument splitting and
+ // possible wildcard expansion as wide chars. Then convert wargv[]
+ // to active code page one argument at the time. It is done using
+ // best-fit mapping to minimize backward compatibility issues but
+ // at the cost of not preventing some dangerous situations.
+ //
+ // Environment will be taken directly as narrow chars later.
+ // That too uses best-fit mapping.
+ wchar_t **wargv;
+ wchar_t **wenvp_dummy;
+ argret = __wgetmainargs(&argc,&wargv,&wenvp_dummy,_dowildcard,&startinfo);
+ if (argret < 0)
+ _amsg_exit(8); /* _RT_SPACEARG */
+
+ // Calculate how much memory is needed for the multibyte argv[] elements.
+ size_t buf_size = 0;
+
+ for (int i = 0; i < argc; ++i) {
+ int size = WideCharToMultiByte(CP_ACP, 0,
+ wargv[i], -1,
+ NULL, 0,
+ NULL, NULL);
+ if (size <= 0) {
+ // This shouldn't happen.
+ _amsg_exit(8); /* _RT_SPACEARG */
+ }
+
+ // Reserve an extra byte for a lossy conversion indicator.
+ buf_size += 1 + size;
+ }
+
+ argv = malloc((argc + 1) * sizeof(char *));
+ char *buf = malloc(buf_size);
+ if (argv == NULL || buf == NULL)
+ _amsg_exit(8); /* _RT_SPACEARG */
+
+ for (int i = 0; i < argc; ++i) {
+ // In main(), argv[i][-1] is 1 if conversion of argv[i] was lossy.
+ BOOL conv_was_lossy = TRUE;
+ WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
+ wargv[i], -1,
+ NULL, 0,
+ NULL, &conv_was_lossy);
+ _argv_is_unsafe |= conv_was_lossy;
+ *buf++ = conv_was_lossy;
+ --buf_size;
+
+ int size = WideCharToMultiByte(CP_ACP, 0,
+ wargv[i], -1,
+ buf, buf_size,
+ NULL, NULL);
+ argv[i] = buf;
+ buf += size;
+ buf_size -= size;
+ }
+
+ argv[argc] = NULL;
+
+ // Initialize narrow environment.
+ //
+ // IMPORTANT: Don't expand wildcards as that alone can be a security issue
+ // with __getmainargs! Best-fit mapping can result in arguments that point
+ // to network shares and wildcard expansion of such arguments could then
+ // result in unexpected network access.
+ int argc_dummy;
+ char **argv_dummy;
+ argret = __getmainargs(&argc_dummy,&argv_dummy,&envp,0,&startinfo);
#endif
if (argret < 0)
_amsg_exit(8); /* _RT_SPACEARG */
@@ -256,7 +331,10 @@ __tmainCRTStartup (void)
_fpreset ();
+#ifdef _UNICODE
+ // Narrow argv is already a local copy so only duplicate the wide version.
duplicate_ppstrings (argc, &argv);
+#endif
__main (); /* C++ initialization. */
#ifdef _UNICODE
__winitenv = envp;
@@ -314,6 +392,7 @@ check_managed_app (void)
return 0;
}
+#ifdef _UNICODE
static void duplicate_ppstrings (int ac, _TCHAR ***av)
{
_TCHAR **avl;
@@ -330,6 +409,7 @@ static void duplicate_ppstrings (int ac, _TCHAR ***av)
n[i] = NULL;
*av = n;
}
+#endif
int __cdecl atexit (_PVFV func)
{
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public