On Monday 17 February 2025 17:37:17 Lasse Collin wrote:
> On 2025-02-16 Pali Rohár wrote:
> > On Sunday 16 February 2025 19:32:24 Lasse Collin wrote:
> > > (There's no lstat() to detect symlinks or readlink() to read them.)
> > 
> > I know. Maybe for future it would be nice to have lstat() call.
> > Implementation can be straightforward, open path as reparse point,
> > > check if it is reparse point + retrieve reparse tag and then either
> > call fstat() with custom type (if it is reparse point) or call stat()
> > (if it is not reparse point).
> 
> lstat() might be nice because symlinks are a thing on Windows nowadays.
> <sys/stat.h> would need new macros:
> 
>   - S_IFLNK and S_ISLNK(m) for IO_REPARSE_TAG_SYMLINK
> 
>   - S_IFSOCK and S_ISSOCK(m) for IO_REPARSE_TAG_AF_UNIX

Sure.

> I suppose lstat() shouldn't follow any reparse points.

That is a question for which I do not have an answer.

Why it should follow:
- ensure that lstat() and stat() return the same data for file types
  other than symbolic links (this is roughly what POSIX applications
  may expect)
- ensure that lstat() for mount points returns the inode which the mount
  point points to and not the inode of the underlying mount point itself
  (this is again POSIX behavior)

Why it should NOT follow:
- some reparse points, especially those of the name surrogate type,
  represent a different file (possibly remote), and lstat() could return
  the real local information
- reparse points for which no driver is installed cannot be opened in
  "follow mode"

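To make the open question concrete, here is a minimal sketch of the
decision an lstat() wrapper would have to make once it has the file
attributes and the reparse tag. It is only an illustration, not a
proposed patch: the enum, the function names and the follow_other
parameter are hypothetical, and the numeric constants are repeated from
the Windows SDK headers only so that the snippet builds anywhere.

```c
#include <stdint.h>

/* Numeric values copied from the Windows SDK so that the sketch builds
   on any platform; the short names are local to this example. */
#define ATTR_REPARSE_POINT  0x00000400u  /* FILE_ATTRIBUTE_REPARSE_POINT */
#define TAG_SYMLINK         0xA000000Cu  /* IO_REPARSE_TAG_SYMLINK */
#define TAG_MOUNT_POINT     0xA0000003u  /* IO_REPARSE_TAG_MOUNT_POINT */
#define TAG_AF_UNIX         0x80000023u  /* IO_REPARSE_TAG_AF_UNIX */

/* Hypothetical: what an lstat() wrapper would do after reading the tag. */
enum lstat_plan
{
  PLAN_CALL_STAT,    /* no reparse point: plain stat() is enough */
  PLAN_REPORT_LNK,   /* fstat() on the reparse point, type S_IFLNK */
  PLAN_REPORT_SOCK,  /* fstat() on the reparse point, type S_IFSOCK */
  PLAN_FOLLOW,       /* behave like stat() */
  PLAN_NO_FOLLOW     /* describe the local reparse point itself */
};

/* Same bit test as the SDK macro IsReparseTagNameSurrogate(). */
static int
is_name_surrogate (uint32_t tag)
{
  return (tag & 0x20000000u) != 0;
}

/* follow_other selects the policy for every tag that is neither a
   symlink nor an AF_UNIX socket (junctions, cloud files, ...).  Which
   value is right is exactly the undecided part of this thread. */
static enum lstat_plan
lstat_plan (uint32_t attrs, uint32_t tag, int follow_other)
{
  if (!(attrs & ATTR_REPARSE_POINT))
    return PLAN_CALL_STAT;
  if (tag == TAG_SYMLINK)
    return PLAN_REPORT_LNK;
  if (tag == TAG_AF_UNIX)
    return PLAN_REPORT_SOCK;
  return follow_other ? PLAN_FOLLOW : PLAN_NO_FOLLOW;
}
```

Note that a junction (IO_REPARSE_TAG_MOUNT_POINT) is itself a name
surrogate, so the "mount point" argument for following and the "name
surrogate" argument against following apply to the same tag.
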
> Maybe there could
> be non-POSIX macros like S_IFRPP and S_ISRPP(m) to indicate any other
> reparse point than symlink or socket. Some OSes have extra macros but I
> don't know if the idea makes sense here.

Unless there is a real need from applications, I would rather not add
any new non-POSIX/non-GNU macros.

> stat() would only need S_IFSOCK. Most reparse points likely should be
> transparent to stat() (to the extent it is possible).

For stat() probably yes, nothing more than S_IFSOCK.

But fstat() would need an extension for S_IFLNK and maybe also for
S_IFBLK. I can imagine that it should be possible to use an NT syscall
or a regular WinAPI function to open either a symlink or an NT path
like \Device\HarddiskVolume1, then use the CRT function
_open_osfhandle() to associate a C file descriptor with the returned
handle, and use that C file descriptor for fstat().

Here, for the unresolved symlink, fstat() should return S_IFLNK; for the
device representing the hard disk, S_IFBLK; and for a WinSock socket
(no matter whether it is AF_UNIX, AF_INET or AF_INET6), S_IFSOCK.
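
A rough, untested sketch of what I mean. The Windows part uses only
CreateFileW() and _open_osfhandle(); the helper names, the handle_kind
enum and the X_ constants are made up for this example. The X_IFLNK and
X_IFSOCK values are simply the glibc ones and are an assumption, since
mingw-w64 defines neither macro today.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#include <fcntl.h>

/* Open PATH without following a reparse point and wrap the handle in a
   C file descriptor usable with fstat().  Returns -1 on failure. */
static int
open_nofollow_fd (const wchar_t *path)
{
  HANDLE h = CreateFileW (path, FILE_READ_ATTRIBUTES,
                          FILE_SHARE_READ | FILE_SHARE_WRITE
                          | FILE_SHARE_DELETE,
                          NULL, OPEN_EXISTING,
                          FILE_FLAG_OPEN_REPARSE_POINT
                          | FILE_FLAG_BACKUP_SEMANTICS,
                          NULL);
  if (h == INVALID_HANDLE_VALUE)
    return -1;

  int fd = _open_osfhandle ((intptr_t) h, _O_RDONLY);
  if (fd == -1)
    CloseHandle (h);  /* on success the descriptor owns the handle */
  return fd;
}
#endif

/* Hypothetical classification of what the descriptor refers to. */
enum handle_kind
{
  KIND_SYMLINK,      /* unresolved symlink reparse point */
  KIND_DISK_DEVICE,  /* e.g. \Device\HarddiskVolume1 */
  KIND_SOCKET,       /* WinSock socket of any address family */
  KIND_OTHER
};

#define X_IFBLK   0x3000u  /* current mingw-w64 S_IFBLK */
#define X_IFLNK   0xA000u  /* assumption: glibc value */
#define X_IFSOCK  0xC000u  /* assumption: glibc value */

/* File type bits an extended fstat() would report; 0 means "keep
   whatever the CRT fstat() already reports". */
static unsigned
fstat_type_bits (enum handle_kind kind)
{
  switch (kind)
    {
      case KIND_SYMLINK:     return X_IFLNK;
      case KIND_DISK_DEVICE: return X_IFBLK;
      case KIND_SOCKET:      return X_IFSOCK;
      default:               return 0;
    }
}
```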

> If UCRT added S_IFLNK and S_IFSOCK but used different constants than
> mingw-w64, that would be a mess. I'm not proposing any S_ macro
> additions in this email, I'm just thinking out aloud.

IIRC UCRT has not added support yet. But sure, it is not a good idea to
add something incompatible.

> > > My point with the long list of attributes in get_d_type was to
> > > return DT_UNKNOWN if Microsoft added a new not-regular-file
> > > attribute some day, or if some application wants to handle reparse
> > > points specially (apps might be ported from POSIX with some extra
> > > code added on top to support Windows, so the end result can be a
> > > mix of both worlds). But I might have been over-thinking (wouldn't
> > > be the first time) or over-cautious.  
> > 
> > I highly doubt that some new attribute in future would change regular
> > file to something totally different. That would break lot of things.
> > 
> > The way how new file types could be added is via reparse points. As
> > this is existing way and can do basically anything.
> > 
> > What could probably makes sense for DT_UNKNOWN is to return it for
> > files and dirs with reparse point attribute and reparse tag is not
> > handled in the function. This can address the idea about applications
> > which wants to handle reparse point specially, and also handles the
> > AF_UNIX sockets (mentioned below).
> > 
> > It is important to know that if you do not have installed NT kernel
> > driver for particular reparse point tag, then it is not to open file
> > or dir to which is attached reparse point with that tag. Hence
> > without the installed driver that file or dir with reparse point is
> > not regular file or dir. But rather something unknown for the system.
> 
> Alright. :-) So my 0008.patch.txt in the previous email did too much. It
> only should have removed the supported_attrs check.
> 
> > > About DT_ macros that cannot appear in a directory listing: I didn't
> > > define DT_BLK in dirent.h because S_IFBLK seems to be a MinGW
> > > invention (to make it easier to port apps). Its value doesn't match
> > > glibc or *BSDs, so DT_BLK == S_IFBLK >> 12 wouldn't match glibc or
> > > *BSDs.  
> > 
> > I think that "block device" is not available in neither msvcrt/ucrt
> > nor in WinAPI. So that is why there is no DT_BLK / S_IFBLK macro in
> > ucrt header files. I guess in mingw it is just for compile purposes
> > of posix applications.
> 
> Right. The following MinGW bug says that it is or was needed to build
> GCC. It has discussion about the atypical value of S_IFBLK too.
> 
>     https://sourceforge.net/p/mingw/bugs/1146/
> 
> In dirent.h, defining DT_BLK for similar compatibility reasons might
> make sense if there are apps that assume that DT_BLK is defined if
> _DIRENT_HAVE_D_TYPE is defined. However, people can add #ifdef DT_BLK
> when porting such apps (which also forces them to notice that block
> devices don't exist on Windows in this form). *If* DT_BLK is added, I
> wonder if the value should be 3 instead of 6 due to mingw-w64's unusual
> S_IFBLK value.

I would propose: do not add DT_BLK for now. Once we figure out that it
is really useful, we can add it. That way we do not need to decide now
which value DT_BLK should have (unless there is no doubt that it should
have numeric value XXXX).
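
For reference, the 3 versus 6 question is just the usual "type bits
shifted down by 12" derivation (the same one glibc's IFTODT() uses)
applied to the two S_IFBLK values. A small sketch; the macro and
function names are local to this example:

```c
#define MINGW_S_IFBLK  0x3000u  /* mingw-w64 <sys/stat.h> */
#define GLIBC_S_IFBLK  0x6000u  /* glibc and *BSD (octal 0060000) */

/* Keep only the file type bits and shift them down by 12. */
static unsigned
mode_to_dt (unsigned mode)
{
  return (mode & 0xF000u) >> 12;
}
```

So mode_to_dt (MINGW_S_IFBLK) is 3 while mode_to_dt (GLIBC_S_IFBLK) is
6, which is the DT_BLK value on glibc.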

> libarchive has a comment about S_IFBLK and MinGW which refers to the
> above bug:
> 
>     https://github.com/libarchive/libarchive/blob/65196fdd1a385f22114f245a9002ee8dc899f2c4/libarchive/test/test_entry.c#L89
> 
> The change was made in 2009, note the commit message:
> 
>     https://github.com/libarchive/libarchive/commit/56965e7a9b1d8b0d70e55d952bd16172e7738746
> 
> There's a longer generic comment about S_IFxxx values in another file:
> 
>     https://github.com/libarchive/libarchive/blob/65196fdd1a385f22114f245a9002ee8dc899f2c4/libarchive/archive_entry.h#L179
> 
> S_IFBLK could be changed to 0x6000, but it would be an ABI break. :-(
> (But so is NAME_MAX change.)
> 
> > > There is no DT_SOCK either (mingw-w64 doesn't have S_IFSOCK).  
> > 
> > Ou, I forgot about this. Native AF_UNIX support is now available for
> > WinAPI. This was added to WinAPI just recently and probably it is not
> > supported in UCRT at all. So AF_UNIX files are detected as regular
> > files.
> > 
> > But for future it would be nice to extend mingw stat and readdir code
> > to detect AF_UNIX socket files and report them as DT_SOCK / S_IFSOCK.
> > 
> > WinAPI's AF_UNIX socket is stored as empty regular file with attached
> > reparse point with tag IO_REPARSE_TAG_AF_UNIX and empty reparse point
> > buffer.
> 
> Perhaps DT_SOCK could be added for IO_REPARSE_TAG_AF_UNIX already even
> when S_IFSOCK and stat() support isn't there.

I agree, it can be added.

CRT stat() is probably just calling the regular WinAPI CreateFile() or
CRT _open(). Both calls would fail on IO_REPARSE_TAG_AF_UNIX (unless the
file is opened as a reparse point) as this tag is not handled by any
driver. It is just a tag which the Winsock layer uses for tagging
AF_UNIX sockets.

Also on Linux it is not possible to open AF_UNIX files. The connection
has to be made via the socket()/bind()/connect() syscalls.

> (DT_LNK is already being
> added even though there is no S_IFLNK, but maybe it's different as long
> as lstat() doesn't exist.)

Sure.

> Are you certain that IO_REPARSE_TAG_AF_UNIX is the right tag for Win32
> AF_UNIX?

Yes, I'm sure.

But you can verify it. Just write a simple application which creates an
AF_UNIX file, compile it for Windows, run it on Windows and check what
your readdir() code returns.
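
Something along these lines should do, as an untested sketch. It
assumes that your headers ship <afunix.h> (otherwise struct sockaddr_un
has to be declared by hand) and that struct dirent has the d_type
member from your patch. The helper names are made up. The same code
also builds on Linux, which is handy for comparing the results.

```c
#ifdef _WIN32
# include <winsock2.h>
# include <afunix.h>
typedef SOCKET sock_t;
# define BAD_SOCK INVALID_SOCKET
# define close_sock closesocket
#else
# include <sys/socket.h>
# include <sys/un.h>
# include <unistd.h>
typedef int sock_t;
# define BAD_SOCK (-1)
# define close_sock close
#endif
#include <stdio.h>
#include <string.h>
#include <dirent.h>

/* Create an AF_UNIX socket file NAME in the current directory.
   Returns 0 on success, -1 on failure. */
static int
make_unix_socket_file (const char *name)
{
#ifdef _WIN32
  WSADATA wsa;
  if (WSAStartup (MAKEWORD (2, 2), &wsa) != 0)
    return -1;
#endif
  struct sockaddr_un addr;
  memset (&addr, 0, sizeof addr);
  addr.sun_family = AF_UNIX;
  if (strlen (name) >= sizeof addr.sun_path)
    return -1;
  strcpy (addr.sun_path, name);

  sock_t s = socket (AF_UNIX, SOCK_STREAM, 0);
  if (s == BAD_SOCK)
    return -1;
  int ret = bind (s, (struct sockaddr *) &addr, sizeof addr) == 0 ? 0 : -1;
  close_sock (s);  /* the file stays on disk after closing the socket */
  return ret;
}

/* Return the d_type which readdir() reports for NAME in the current
   directory, or -1 if NAME is not found. */
static int
d_type_of (const char *name)
{
  DIR *dir = opendir (".");
  if (dir == NULL)
    return -1;

  int type = -1;
  struct dirent *ent;
  while ((ent = readdir (dir)) != NULL)
    if (strcmp (ent->d_name, name) == 0)
      {
        type = ent->d_type;
        break;
      }
  closedir (dir);
  return type;
}

static void
report (const char *name)
{
  printf ("%s: d_type = %d\n", name, d_type_of (name));
}
```

Call make_unix_socket_file ("test.sock") followed by report
("test.sock") from main(), and link with -lws2_32 on Windows. With
DT_SOCK defined as 12 the expected result would be 12.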

> The following lists it as a WSL thing but it might be due to
> the document being older than the Win32 AF_UNIX feature. The second link
> says that WSL and Win32 are interoperable to some extent with AF_UNIX.
> So quite likely it is the right tag, but it's better to be sure. :-)
> 
>     https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c8e77b37-3909-4fe6-a4ea-2b9d423b1ee4
> 
>     https://devblogs.microsoft.com/commandline/windowswsl-interop-with-af_unix/

Reparse points with the IO_REPARSE_TAG_AF_UNIX tag were primarily
introduced for native Win32 AF_UNIX sockets and were later re-used by WSL:
https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/
https://devblogs.microsoft.com/commandline/windowswsl-interop-with-af_unix/

Support was added in Windows 10 April 2018 Update (version 1803) and in
Windows Server 2019 (version 1809).

You have already found the second (WSL) blog post, but the first one was
published before the Win32/WSL interop was added.

Unfortunately the [MS-FSCC] 2.1.2.1 Reparse Tags documentation is not
accurate. It does not mention the Win32/Winsock usage.

> Summary of d_type questions:
> 
>   - Can DT_SOCK be added now?

I do not see any reason why not.

>   - Should DT_BLK be added? If yes, should the value be
>     (S_IFBLK >> 12) == 3 (not 6) based on mingw-w64's <sys/stat.h>?
>     (Or should changing of S_IFBLK to 0x6000 be considered?)

As there is no clear answer as to which value it should have, let's
postpone this addition.

>   - If adding DT_SOCK, is this OK:
> 
>     static unsigned char
>     get_d_type (DWORD attrs, DWORD reparse_tag)
>     {
>       if (attrs & FILE_ATTRIBUTE_REPARSE_POINT)
>         {
>           switch (reparse_tag)
>             {
>               case IO_REPARSE_TAG_SYMLINK:
>                 return DT_LNK;
> 
>               case IO_REPARSE_TAG_AF_UNIX:
>                 return DT_SOCK;
> 
>               default:
>                 return DT_UNKNOWN;
>             }
>         }
> 
>       return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
>     }

This looks good to me.

> What to do with NAME_MAX?
> 
>   - The old d_name[260] is already wrong in sense that size of d_name
>     should be at most NAME_MAX + 1, and currently NAME_MAX is 255.
> 
>   - NAME_MAX isn't visible with standard feature test macros like
>     _POSIX_C_SOURCE, so NAME_MAX is broken in this sense too.
> 
>   - I don't know if MSVC defines NAME_MAX in any situation. If
>     it does, then a different value in mingw-w64 might be a tiny
>     compatibility issue.
> 
> I think NAME_MAX should be increased at least if modern MSVC doesn't
> define it. Making NAME_MAX visible with _POSIX_C_SOURCE etc. should be
> simple, one just needs to be careful that all relevant macros are listed
> correctly.

I do not know about this point.

Some more information: the Visual C++ CRT header stdio.h defines the
following constant:

#define FILENAME_MAX 260

It has defined it since the first Visual C++ 1.0 version for 32-bit NT,
up to the latest version with UCRT support.

So I guess that value 260 comes from this FILENAME_MAX macro.

But what the point of FILENAME_MAX is, I have no idea; it is not
referenced or used in the CRT header files.

> > > One can access "C:\Documents and Settings\SomeUserName" just fine if
> > > one has permission to access C:\Users\SomeUserName. It's just the
> > > root of the junction that doesn't allow its contents listed. So it
> > > is a permission issue as the error message says.
> > 
> > Ok, so it is just a normal EACCES scenario.
> 
> It's just that this example makes it look like that junctions aren't
> transparent in all common situations like FindFirstFileW.
> 
> -- 
> Lasse Collin


_______________________________________________
Mingw-w64-public mailing list
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
