On Thursday 16 January 2025 18:32:08 Lasse Collin wrote:
> On 2025-01-14 Pali Rohár wrote:
> > Well, warnings are warnings. They are always being added by new
> > compiler versions, so I would not be afraid of adding also in new
> > mingw-w64 version. And security "warning" for me sounds like a good
> > idea.
> 
> OK, I agree. :-)
> 
> Remember that I'm not a MinGW-w64 developer. Don't put too much weight
> on my opinions.
> 
> > > I have attached a draft patch (header bits are missing) and a demo
> > > program. It has the above features so it's possible to think if the
> > > extra code is worth it.  
> > 
> > I already have a WIP of something better that handles all the other
> > parts, like not leaking wide global variables, proper initialization
> > of narrow variables, and also fixing direct usage of the __getmainargs
> > function by applications. None of these parts are handled in your
> > draft. I need more time to finish it and do more tests.
> 
> Great! :-)
> 
> > > With GB18030, U+FFFD consumes four bytes. So with that charset, the
> > > maximum possible byte count is larger.  
> > 
> > Is it possible to change the local process UCRT encoding to GB18030?
> 
> I suspect that it's not. UTF-8 might be the only locale code page that
> isn't single or double byte. So reserving space for UTF-8 should be
> enough. Even if some locale code page supports longer encodings
> someday, the name has to be very long to hit the limit and result in
> ENAMETOOLONG.
> 
> With UTF-8, even the current 255 bytes should almost never be a
> problem. Increasing NAME_MAX ensures that apps can list names in
> an unusual case too (but they still cannot list unpaired surrogates). On
> the other hand, the larger NAME_MAX may cause new problems if an app
> assumes that a filename always fits in MAX_PATH (260) bytes. The dirent
> API is from POSIX, so one would hope that apps ported from POSIX handle
> it well. I don't know if that is too optimistic.
> 
> > > > Maybe we would need type versioning? Like it was with time_t or
> > > > fpos_t which based on the compile time macro expands either to
> > > > old (32-bit) or new (64-bit type).  
> > > 
> > > If a third party header has
> > > 
> > >     #include <dirent.h>
> > >     int foo(DIR *d);
> > > 
> > > it's not possible to know which version of the symbols were used
> > > when the library was compiled. To do versioning with only header
> > > macros, all participants have to co-operate. Ideally one doesn't
> > > use this kind of data types in API/ABI at all.  
> > 
> > Yes, that is true. But the same thing is already being done for the
> > time_t types in both the Visual Studio and mingw-w64 header files.
> > There are some defaults, and via #define you change the behavior.
> > 
> > The same approach was also used for a long time by UNIX LFS, which
> > changed the fpos_t and off_t types this way (and redefined open,
> > lseek and other functions).
> 
> Those are good points, thanks. I still fear it could be messy.
> 
> I see that sizeof(DIR) depends on _USE_32BIT_TIME_T because DIR
> contains _finddata_t or _wfinddata_t. Luckily no one is supposed to
> access that structure directly.

That is really bad. It means that if the mingw-w64 runtime is compiled
with _USE_32BIT_TIME_T, then opendir() provided by mingw-w64 would be
ABI-incompatible with an application that uses 64-bit time_t (as that
would change the layout of struct DIR).

So struct DIR could already be broken for libraries that export
functions like "int foo(DIR *d);".

I think that we need a new ABI for opendir/readdir that does not have
these problems. At the same time, it could fix the problems related to
encoding.

> I will send a few dirent patches.

Feel free to put me in Cc.

> I played around with flags that
> re-enable best-fit mapping or disallow filenames over 255 bytes (if
> someone needs those for compatibility reasons). Those won't be in the
> first version.
> 
>   - Having more than one flag might make the API too fancy in the same
>     sense as I commented about command line handling features. The
>     main problem isn't the few lines of extra code in dirent.c, it's
>     that few would use the extra features and that the risk of
>     incorrect use increases.
> 
>   - A global variable works for _dowildcard but it's problematic for
>     dirent because a library might want to set it too. A DLL and
>     application would have their own flags which would work, but if a
>     library is built statically then the same variable could be defined
>     twice and cause a linker error. Or if the flag variable is only
>     defined in the static library, it would affect the unsuspecting
>     application too.
> 
>   - <dirent.h> could have _opendir_lossy(const char *). Then one could
>     have something like:
> 
>         #ifdef _DIRENT_LOSSY
>         #   define opendir _opendir_lossy
>         #endif
> 
>     Apps could then #define _DIRENT_LOSSY and the code would be *source
>     compatible* with both old and new MinGW-w64. If an application
>     itself does "#define opendir _opendir_lossy" then the code would
>     only compile with new MinGW-w64.
> 
>   - It's easy to add d_lossy flag to struct dirent to mark which names
>     weren't properly converted. But again, it could be too fancy.
> 
> -- 
> Lasse Collin


_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
