On 2024-11-12 Pali Rohár wrote:
> On Monday 11 November 2024 22:10:01 Lasse Collin wrote:
> > If permissive mode can be enabled the same way as _dowildcard works,
> > then argv[] could be constructed with the same code as in strict
> > mode. The current crtexe.c in master has
> > 
> >   for (int i = 0; i < argc; ++i) {
> >     BOOL conv_was_lossy = TRUE;
> >     int size = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
> >         wargv[i], -1,
> >         NULL, 0,
> >         NULL, &conv_was_lossy);
> > 
> > but it could become something like this:
> > 
> >   BOOL conv_was_lossy = FALSE;
> >   BOOL *conv_was_lossy_ptr = _strict_argv ? &conv_was_lossy : NULL;
> > 
> >   for (int i = 0; i < argc; ++i) {
> >     int size = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
> >         wargv[i], -1,
> >         NULL, 0,
> >         NULL, conv_was_lossy_ptr);  
> 
> This does not fix _acmdln, which as I pointed is used in other places
> of mingw-w64. That code would be needed to move out its current place
> to first fix _acmdln.

Correct. I thought that the above still had some value:

(1) It can catch lossy conversion that comes from wildcard
    expansion. _acmdln + narrow arg parser cannot do that.

(2) If wildcard expansion is enabled, _acmdln + narrow parser could
    match wrong files. For example, if x is a problematic character,
    foox* might be sanitized to foo?* or foo_* which would then match
    files that foox* wouldn't have.

(3) If best-fit mapping is wanted, fullwidth double quotes in wargv[]
    become to ASCII double quotes in argv[]. It happens after argument
    splitting, thus it doesn't cause quoting breakage. I don't see how
    a similar thing could be achieved via _acmdln + narrow arg parser.
    This is a very minor detail so it might not matter if one doesn't
    get ASCII double quotes in argv[] in this case. :-)

Sanitizing _acmdln is needed still for code that reads it, including
crtexewin.c as you pointed out. I had missed that in the original fix
attempt.

If wildcard issues are ignored, then your idea of sanitizing _acmdln
and then using narrow argument parser should be OK. Ignoring the
best-fit mapping in wildcard expansion feels like a half-done solution
though.

> > It would allow apps to set _strict_argv = 0 to disable the
> > _exit(255) usage.
> > 
> > I don't know if permissive mode should use WC_NO_BEST_FIT_CHARS.
> > Keeping it can avoid some security risks. But maybe it could create
> > new issues if best-fit conversion happened to be preferred (for
> > example, command line argument is a message to show to a user).
> > 
> > If WC_NO_BEST_FIT_CHARS has compatibility concerns with very old
> > Windows versions but one doesn't want best-fit mapping, I wonder if
> > the lpDefaultChar argument is still usable. One could set it to, for
> > example, "?" or "_".  
> 
> At least every character which is by best-fit mapped to double quote
> has to be replaced.

Yes, that seems mandatory. Should something else be done?

(1) Disabling best-fit mapping completely is slightly simpler than
    sanitizing only double quotes. One just needs to create _acmdln with
    WC_NO_BEST_FIT_CHARS. This may add ASCII question marks which could
    have special meaning in some apps. In some code pages the
    replacement character might be different.

(2) Using WC_NO_BEST_FIT_CHARS but forcing the replacement character to
    something else than the default. While no character is safe in all
    cases, an underscore *might* be safer than a question mark. I mean,
    an underscore is less likely to have a special meaning. Like:

        WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
                            _wcmdln, -1,
                            _acmdln, acmdln_size, "_", NULL);

    On the other hand, underscore might look weirder as a replacement
    character in some situations.

I don't have a clear opinion now, I'm just writing down ideas.

> And before doing any future change I would rather think about it, to
> ensure that it handles all cases, and also all source files in
> mingw-w64. Are those only crtexe.c and crtexewin.c? Or are there also
> other files which parses cmdline?

Grepping MinGW-w64 for "cmdln" and "GetCommandLine" lists only a few
files if *.def.in and *.def are ignored. *If* searching for those
strings is enough, there aren't other files which parse the command
line.

> The only idea for FindFirstFileA/FindNextFindA which I have is to
> return "problematic" file names only in 8.3 notation. 8.3 notation is
> supported in all WinAPI calls, it is always in ASCII and has 1:1
> mapping with the original UNICODE file name. The only downside is
> that it may have totally different name as expected and can be
> confusing. But I think it is still better than returning the "wrong"
> name which would alias with other file.

The biggest downside is that on some file systems there are no 8.3
names. 8.3 names can be disabled and stripped with "fsutil". Thus using
8.3 names would workaround one issue but then be broken in a new way.

> Anyway, gcc / GNU LD still does not have support for marking PE binary
> as UTF-8 capable. So using UTF-8 code page for application linked by
> GNU LD is not an option for now. The reason is that GNU LD is not able
> to work with manifests (where are put all new PE flags / options).
> 
> In past I reported this missing UTF-8 support into GNU LD bugzilla:
> https://sourceware.org/bugzilla/show_bug.cgi?id=30382

It's easy to include a manifest file using a short resource file:

    #include <winresrc.h>
    CREATEPROCESS_MANIFEST_RESOURCE_ID RT_MANIFEST "app.manifest"

The problem is that one has to provide a full manifest. As far as I
know, currently one cannot specify "add UTF-8 to the default manifest"
when using MinGW-w64.

-- 
Lasse Collin


_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to