On 2025-02-17 Pali Rohár wrote:
> On Monday 17 February 2025 17:37:17 Lasse Collin wrote:
> > I suppose lstat() shouldn't follow any reparse points.
>
> That is a question for which I do not have answer.
One would need to figure out what is the least surprising behavior.
> Why should follow:
> - ensure that lstat() and stat() returns same data for file type which
> is not symbolic link (this is somehow what POSIX applications may
> except)
> - ensure that lstat() for mount points returns inode where mount point
> points and not the inode of the underlying mount point itself (this
> is again POSIX behavior)
These are very good reasons.
> Why should NOT follow:
> - some reparse point, specially those of name surrogate type,
> represents different file (possible remote) and lstat could return
> the real local information
> - reparse points for which driver is not installed cannot be opened in
> "follow mode"
If lstat() did follow and a driver is not installed, lstat() could fail
with a custom errno value, that is, something else than ENOENT. Perhaps
ENXIO could be OK. Maybe stat() could do the same.
> > Maybe there could
> > be non-POSIX macros like S_IFRPP and S_ISRPP(m) to indicate any
> > other reparse point than symlink or socket. Some OSes have extra
> > macros but I don't know if the idea makes sense here.
>
> Unless there is a real needs from applications, I would rather not add
> any new non-POSIX/non-GNU macros.
Right, better to not add anything special until there's a clear need
for it.
If something new was introduced, one variation is to make lstat()
follow reparse points except symlinks and sockets, and add _rppstat()
that returns S_IFRPP for other reparse points. The idea would be that
POSIXish apps that care about reparse points could be adapted with
smaller changes. I definitely am not suggesting to do this, I'm just
thinking out aloud. :-)
> > If UCRT added S_IFLNK and S_IFSOCK but used different constants than
> > mingw-w64, that would be a mess. I'm not proposing any S_ macro
> > additions in this email, I'm just thinking out aloud.
>
> IIRC UCRT has not added support yet. But sure it is not a good idea to
> add something incompatible.
Microsoft already uses the same values for S_IFxxx as everyone else. It
would be a big surprise if they did differently, for example, with
S_IFSOCK.
If they added S_IFBLK, there's a small chance that they could look at
mingw-w64, and then adopt the weird 0x3000. I wonder how much breakage
it would cause if S_IFBLK was changed in mingw-w64, and if that risk is
worth it to get rid of this small portability trap. I *guess* that both
negative and postive effects cannot affect many packages.
I quickly tried to find examples, and didn't find many. The tcl package
has ucrt64/include/tcl8.6/tcl-private/win/tclWinPort.h which defines
S_IFBLK to 0x6000 if S_IFBLK isn't already defined. It was added in
2016. The comment refers to "Linux and MingW" but the author might have
missed the odd value. Thus MSVC and mingw-w64 built tcl use different
S_IFBLK value.
https://packages.msys2.org/packages/mingw-w64-ucrt-x86_64-tcl
https://github.com/tcltk/tcl/commit/48e3873f199817b52d54a4d802966b508917d200
Python's stat.S_IFBLK and stat.S_ISBLK(mode) use the values from
<sys/stat.h>. Thus, MSYS2-Python uses 0x6000 and UCRT64-Python
uses 0x3000. I suppose MSVC-Python uses 0x6000 because it's the
fallback value when S_IFBLK isn't already defined. The comment about
the constants suggests that no one noticed that mingw-w64 has an
unusual value:
https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Modules/_stat.c#L55
0x6000 is used in stat.py too, but due to the import at the bottom,
the values from _stat.c are used instead.
https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Lib/stat.py#L33
You can try the following Python code:
import os
import stat
mode = os.stat(".").st_mode
print("0x%x" % mode)
print(stat.S_ISDIR(mode))
print(stat.S_ISBLK(mode))
mode = stat.S_IFBLK
print("0x%x" % mode)
print(stat.S_ISBLK(mode))
I'm not brave enough to actually propose that mingw-w64 should change
S_IFBLK to 0x6000, but I still wonder if keeping the current value does
more bad than good.
> I would propose: Do not add DT_BLK for now. And once we figure out
> that it is really useful, we can add it. So we do not need to think
> which value DT_BLK should have (unless there is no doubt that it
> should have numeric value XXXX).
I agree. It shouldn't be a disaster if S_IFBLK and DT_BLK values aren't
in sync, but since DT_BLK isn't actually required, it's better to not
define it for now.
> > Are you certain that IO_REPARSE_TAG_AF_UNIX is the right tag for
> > Win32 AF_UNIX?
>
> Yes, I'm sure.
Thanks! :-)
> > - If adding DT_SOCK, is this OK:
> >
> > static unsigned char
> > get_d_type (DWORD attrs, DWORD reparse_tag)
> > {
> > if (attrs & FILE_ATTRIBUTE_REPARSE_POINT)
> > {
> > switch (reparse_tag)
> > {
> > case IO_REPARSE_TAG_SYMLINK:
> > return DT_LNK;
> >
> > case IO_REPARSE_TAG_AF_UNIX:
> > return DT_SOCK;
> >
> > default:
> > return DT_UNKNOWN;
> > }
> > }
> >
> > return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
> > }
>
> For me this looks good.
Thanks! I attached another fixup patch. I shortened seekdir doc in
dirent.h a bit too. It doesn't need to be quite so specific about the
implementation details.
> > I think NAME_MAX should be increased at least if modern MSVC doesn't
> > define it. Making NAME_MAX visible with _POSIX_C_SOURCE etc. should
> > be simple, one just needs to be careful that all relevant macros
> > are listed correctly.
>
> I do not know about this one point.
I have changed my mind. Let's not touch NAME_MAX. It might even be good
to keep it undefined as it is now (I suppose very few apps define the
non-standard _POSIX_ macro).
(1) Apps can misuse NAME_MAX. For example, in an earlier email I linked
to binutils bug about NAME_MAX and mingw-w64. Seems that binutils
uses NAME_MAX in exactly one place, and there it assumes that any
filename up to NAME_MAX ASCII chars can be *created*. If other apps
have similar assumptions, increasing NAME_MAX can create problems.
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ef21423b06d6b30956dd44adee9df8325bb48aa0;hb=HEAD#l4002
(2) Even if NAME_MAX was defined on a system, there might not be any
specific limit in reality. It's one reason why readdir_r() is
deprecated. For example, NAME_MAX is 255 on Linux, but on FAT32
mounted with iocharset=utf8, one can have 765 bytes long filenames
because the limit is 255 UTF-16 code units. Most apps don't care
about NAME_MAX, thus such names work normally.
Linux actually claims 2 * 765 for NAME_MAX on FAT32:
$ getconf NAME_MAX /mnt
1530
I don't think it's possible on FAT32. It's just a big enough value
calculated as FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.
(3) In POSIX.1-2024, the second paragraph under "Path Variable Names"
says that macros in <limits.h> shall be omitted in some cases. That
doesn't really apply to the case in mingw-w64, and there's no
pathconf() in mingw-w64, but the point is that apps shouldn't
assume that NAME_MAX exists.
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html#tag_13_24_03_02
> Some more information. Visual C++ CRT file stdio.h defines following
> constant:
>
> #define FILENAME_MAX 260
>
> And it defines it since the first Visual C++ 1.0 version for 32-bit NT
> up to the last version with UCRT support.
>
> So I guess that value 260 comes from this FILENAME_MAX macro.
>
> But what is point of FILENAME_MAX I have no idea, it is not referenced
> or used in CRT header files.
FILENAME_MAX is at least in C99 and newer standards. Its definition
is kind of relaxed. See C99 7.19.1 or C11/C17/C23 7.21.1. Defining
FILENAME_MAX to the same value as MAX_PATH sounds sensible because
MAX_PATH or _MAX_PATH is a limit in a few CRT APIs.
Size of d_name is derived from WIN32_FIND_DATAW.cFileName[MAX_PATH] in
the new dirent code (_finddata_t.filename[_MAX_PATH] in the old code).
I suppose it's more logical to refer to MAX_PATH in the dirent.h
comments.
--
Lasse Collin
From 0bc73b9ae3dc02efa733ec45230315dd1a0c06a2 Mon Sep 17 00:00:00 2001
From: Lasse Collin <[email protected]>
Date: Tue, 18 Feb 2025 17:38:11 +0200
Subject: [PATCH 9/9] ... Reparse point tweaks: DT_LNK, DT_SOCK, DT_UNKNOWN
---
mingw-w64-crt/misc/dirent.c | 34 +++++++++-------
mingw-w64-headers/crt/dirent.h | 71 ++++++++++++++--------------------
2 files changed, 49 insertions(+), 56 deletions(-)
diff --git a/mingw-w64-crt/misc/dirent.c b/mingw-w64-crt/misc/dirent.c
index f11e2db83..90b5d2b08 100644
--- a/mingw-w64-crt/misc/dirent.c
+++ b/mingw-w64-crt/misc/dirent.c
@@ -382,21 +382,25 @@ opendir (const char *path)
static unsigned char
get_d_type (DWORD attrs, DWORD reparse_tag)
{
- /* Check for a reparse point before checking if it is a directory.
- * A reparse point can also have the directory attribute. This
- * includes "directory symlinks" too. */
- if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT) &&
- reparse_tag == IO_REPARSE_TAG_SYMLINK)
- {
- return DT_LNK;
- }
-
- if (attrs & FILE_ATTRIBUTE_DIRECTORY)
- {
- return DT_DIR;
- }
-
- return DT_REG;
+ /* Check for a reparse point before checking if it is a directory.
+ * A reparse point can also have the directory attribute. This
+ * includes "directory symlinks" too. */
+ if (attrs & FILE_ATTRIBUTE_REPARSE_POINT)
+ {
+ switch (reparse_tag)
+ {
+ case IO_REPARSE_TAG_SYMLINK:
+ return DT_LNK;
+
+ case IO_REPARSE_TAG_AF_UNIX:
+ return DT_SOCK;
+
+ default:
+ return DT_UNKNOWN;
+ }
+ }
+
+ return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
}
diff --git a/mingw-w64-headers/crt/dirent.h b/mingw-w64-headers/crt/dirent.h
index 06d95650a..7a27f725c 100644
--- a/mingw-w64-headers/crt/dirent.h
+++ b/mingw-w64-headers/crt/dirent.h
@@ -22,36 +22,32 @@ extern "C" {
#endif
-/* In addition to the d_ino and d_name members required by POSIX,
+/*
+ * In addition to the d_ino and d_name members required by POSIX,
* also d_type, d_8dot3, d_reclen, and d_namlen are present in
- * struct dirent and struct _wdirent. */
+ * struct dirent and struct _wdirent.
+ */
#define _DIRENT_HAVE_D_TYPE 1
#define _DIRENT_HAVE_D_8DOT3 1
#define _DIRENT_HAVE_D_RECLEN 1
#define _DIRENT_HAVE_D_NAMLEN 1
-/* Values for d_type:
- *
- * - DT_DIR and DT_REG are used when the entry is a directory or
- * a regular file.
- *
- * - DT_LNK is only used for reparse points that have the tag
- * IO_REPARSE_TAG_SYMLINK. Other reparse points are DT_DIR or DT_REG.
- *
- * - Other values aren't possible in a directory listing. They are only
- * defined for compatibility.
- *
- * The constants match glibc and *BSDs:
- *
- * - DT_FIFO == (S_IFIFO >> 12)
- * DT_CHR == (S_IFCHR >> 12)
- * DT_DIR == (S_IFDIR >> 12)
- * DT_REG == (S_IFREG >> 12)
- *
- * - There is no S_IFLNK on Windows. However, on other systems,
- * S_IFLNK == (S_IFREG | S_IFCHR), and S_IFCHR has the same
- * value in glibc, *BSDs, and Windows. Thus the same DT_LNK
- * constant is used here as on the other systems.
+/*
+ * Values for d_type:
+ * DT_DIR Not a reparse point and has FILE_ATTRIBUTE_DIRECTORY
+ * DT_REG Not a reparse point and doesn't have FILE_ATTRIBUTE_DIRECTORY
+ * DT_LNK A reparse point with the tag IO_REPARSE_TAG_SYMLINK
+ * DT_SOCK A reparse point with the tag IO_REPARSE_TAG_AF_UNIX
+ * DT_UNKNOWN All other reparse points
+ * DT_FIFO Not used (cannot appear in a directory listing)
+ * DT_CHR Not used (cannot appear in a directory listing)
+ *
+ * The constants are the same as on many POSIX systems and match the
+ * pattern DT_xxx == (S_IFxxx >> 12). However, as of Febrary 2025:
+ * - S_IFLNK and S_IFSOCK don't exist in mingw-w64.
+ * - S_IFBLK is a mingw-w64 extension with an unusual value 0x3000 instead
+ * of the normal 0x6000, thus DT_BLK was omitted for now. Like DT_FIFO
+ * and DT_CHR, DT_BLK wouldn't appear in a directory listing.
*/
#define DT_UNKNOWN 0
#define DT_FIFO 1
@@ -59,6 +55,7 @@ extern "C" {
#define DT_DIR 4
#define DT_REG 8
#define DT_LNK 10
+#define DT_SOCK 12
struct dirent
{
@@ -67,7 +64,7 @@ struct dirent
unsigned char d_8dot3; /* 0 or 1, see _readdir_8dot3. */
unsigned short d_reclen; /* Size of this struct. */
unsigned short d_namlen; /* Length of name in d_name. */
- char d_name[778]; /* [NAME_MAX + 1] */ /* File name. */
+ char d_name[778]; /* [(MAX_PATH-1)*3+1] */ /* File name. */
};
typedef struct __dirent_DIR DIR;
@@ -156,7 +153,7 @@ int __cdecl __MINGW_NOTHROW closedir (DIR*);
* or _readdir_8dot3.
*
* After rewinding, the next call to readdir, _readdir_8dot3, or _wreaddir
- * has to reopen the directory. It can fail, for example, if the directory
+ * will reopen the directory. It can fail, for example, if the directory
* has been renamed, removed, or access permissions have changed.
*/
void __cdecl __MINGW_NOTHROW rewinddir (DIR*);
@@ -173,20 +170,12 @@ long __cdecl __MINGW_NOTHROW telldir (DIR*);
* _wreaddir call. A possible delayed EILSEQ set by an earlier call to
* readdir or _readdir_8dot3 is preserved (this differs from rewinddir).
*
- * If seeking back by more than one entry, the next call to readdir,
- * _readdir_8dot3, or _wreaddir has to reopen the directory. It can fail,
- * for example, if the directory has been renamed, removed, or access
- * permissions have changed. If reopening is successful, the readdir call
- * will try to reach the requested position. This produces expected results
- * only if the file system always returns the entries in the same order.
- *
- * If seeking back by exactly one entry, the next readdir, _readdir_8dot3,
- * or _wreaddir will try to return the previous entry again which is fast.
- * However, this is guaranteed to succeed only when using _wreaddir in the
- * sequence telldir ; _wreaddir ; seekdir ; _wreaddir. If readdir or
- * _readdir_8dot3 is used instead, a possible character set conversion
- * problem may result in entries to be skipped, and then the seekdir
- * step needs to seek backwards more than one entry.
+ * If seeking backwards, the next call to readdir, _readdir_8dot3, or
+ * _wreaddir might need to reopen the directory. It can fail, for example,
+ * if the directory has been renamed, removed, or access permissions
+ * have changed. If reopening is successful, readdir will try to reach
+ * the requested position. This produces expected results only if the
+ * file system always returns the entries in the same order.
*/
void __cdecl __MINGW_NOTHROW seekdir (DIR*, long);
@@ -204,7 +193,7 @@ struct _wdirent
unsigned char d_8dot3; /* Always zero. */
unsigned short d_reclen; /* Size of this struct. */
unsigned short d_namlen; /* Length of name in d_name. */
- wchar_t d_name[260]; /* [FILENAME_MAX] */ /* File name. */
+ wchar_t d_name[260]; /* [MAX_PATH] */ /* File name. */
};
/*
--
2.48.1
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public