On 2025-02-28 Lasse Collin wrote:
> On 2025-02-25 Pali Rohár wrote:
> > On Tuesday 25 February 2025 20:11:12 Lasse Collin wrote:  
> > > While \\?\...\Partition3 without trailing \ isn't a directory, I
> > > more and more think that it doesn't matter too much if opendir
> > > works on it still. It's so tiny bug that *it alone* isn't worth
> > > the the extra code or the runtime overhead of GetFileAttributes.
> > > (With stat(), this kind of cheating wouldn't be acceptable.)    
> > 
> > I'm really not sure. What I think is that readdir() and stat()
> > should be in-sync and consistent. And ideally follows the Windows
> > meaning.  
> 
> I agree that ideally it should be so.
> 
> At the same time, it doesn't feel reasonable to add more code and
> runtime overhead of GetFileAttributesW solely to reject this kind of
> unusual paths. The error handling of FindFirstFileW won't be
> meaningfully shorter, and there's a time-of-check to time-of-use race
> condition between GetFileAttributesW and FindFirstFileW anyway.
> 
> It's not an easy decision, but I think the balance between pros and
> cons is such that it's OK to make readdir too permissive with uncommon
> paths like these. If you think this isn't OK, I will reconsider again,
> but I would like to hear other people's opinions too.

I attached a patch to use GetFileAttributesW. Hopefully it's easier to
compare both versions this way.

ERROR_INVALID_FUNCTION doesn't seem to be needed anymore. New error
codes are ERROR_INVALID_PARAMETER and ERROR_INVALID_HANDLE.

(1)
GetFileAttributesW method is an improvement with an unusual path which
could be diagnosed with ENOTDIR:

    GetFileAttributesW: \\?\GLOBALROOT\Device    ERROR_INVALID_PARAMETER
    FindFirstFileW:     \\?\GLOBALROOT\Device\*  ERROR_INVALID_NAME

It clearly should be ENOENT here though since ? isn't a valid filename:

    GetFileAttributesW: C:\?\foo    ERROR_INVALID_NAME
    FindFirstFileW:     C:\?\foo\*  ERROR_INVALID_NAME

But there is a much more common pathname that definitely should be
ENOTDIR:

    GetFileAttributesW: file.txt\    ERROR_INVALID_NAME
    FindFirstFileW:     file.txt\*   ERROR_DIRECTORY

I don't like this. It's not perfect with FindFirstFileW-only method
either, but with that the problem occurs with \\?\GLOBALROOT\Device
which becomes ENOENT instead of ENOTDIR. It doesn't matter as much with
such unusual paths that few users will see in practice.

The attached patch uses ENOTDIR for ERROR_INVALID_NAME.

(2)
As I mentioned in an earlier email, symlink loop detection doesn't work
with leading pathname components in GetFileAttributesW:

    GetFileAttributesW: loop\foo    ERROR_FILE_NOT_FOUND
    FindFirstFileW:     loop\foo\*  ERROR_CANT_RESOLVE_FILENAME

Oddly it's ERROR_FILE_NOT_FOUND instead of ERROR_PATH_NOT_FOUND. In any
case, it becomes ENOENT. This downgrade in error detection is a very
minor thing though.

(3)
On my test system, printing filenames from C:\msys64 to a file takes
20-25 % longer with GetFileAttributesW variant. There are 5500 dirs and
119,000 regular files. However, with the old dirent that lacks d_type,
the same job takes ten times as long.

If one doesn't take advantage of d_type and calls stat() on every
entry, the GetFileAttributesW version is about 1 % slower, that is, it
doesn't matter at all.

I still think that using only FindFileFirstW is the way go. I prefer
the way it handles error detection in the examples in (1), and it's
also slightly simpler code. Both versions should be an improvement over
the old code. We just need to choose something. :-)

-- 
Lasse Collin
From 68a7ca92fe3ca932ead81b88820737173567dbd4 Mon Sep 17 00:00:00 2001
From: Lasse Collin <[email protected]>
Date: Sun, 2 Mar 2025 16:30:17 +0200
Subject: [PATCH] ... Use GetFileAttributesW first

---
 mingw-w64-crt/misc/dirent.c | 186 +++++++++++++++++++++++-------------
 1 file changed, 122 insertions(+), 64 deletions(-)

diff --git a/mingw-w64-crt/misc/dirent.c b/mingw-w64-crt/misc/dirent.c
index 06d28160c..b1f40dd57 100644
--- a/mingw-w64-crt/misc/dirent.c
+++ b/mingw-w64-crt/misc/dirent.c
@@ -123,6 +123,7 @@ get_code_page (void)
 DIR *
 _wopendir (const wchar_t *path)
 {
+  DWORD attrs;
   BOOL normalize_path;
   DWORD full_path_alloc_size;
   DWORD full_path_len;
@@ -142,6 +143,112 @@ _wopendir (const wchar_t *path)
       return NULL;
     }
 
+  /* Attempt to determine if the given path really is a directory.
+   * This won't catch dangling directory symlinks or other non-working
+   * reparse points that are directories. Lack of access permissions to
+   * the directory won't be detected either. These are caught when
+   * FindFirstFileW is called.
+   *
+   * We could rely on FindFirstFileW catching almost all problems and thus
+   * avoid the overhead of GetFileAttributesW. However, because we need to
+   * append \* to the path for FindFirstFileW, we would incorrectly succeed
+   * on paths that are directories only when there is a trailing directory
+   * separator. For example, \\?\C:\ is a directory but \\?\C: is not (don't
+   * confuse \\?\C: with the drive-relative DOS path C: which is a directory).
+   */
+  attrs = GetFileAttributesW (path);
+  if (attrs == INVALID_FILE_ATTRIBUTES)
+    {
+      switch (GetLastError ())
+       {
+         case ERROR_FILE_NOT_FOUND:
+         case ERROR_PATH_NOT_FOUND:
+         case ERROR_BAD_PATHNAME:
+         case ERROR_BAD_NETPATH:
+         case ERROR_BAD_NET_NAME:
+         case ERROR_CANT_ACCESS_FILE:
+           /* ERROR_FILE_NOT_FOUND occurs when a file doesn't exist but
+            * the containing directory does (C:\notfound.txt).
+            *
+            * ERROR_PATH_NOT_FOUND:
+            *
+            *   - A leading pathname component doesn't exist
+            *     (C:\notfound\foo).
+            *
+            *   - A leading pathname component isn't a directory
+            *     (C:\file.txt\foo). This should actually be ENOTDIR but
+            *     GetFileAttributesW's error codes aren't specific enough.
+            *
+            *   - The pathname is too long: 32767 wide chars including
+            *     the \0 for a long path aware app, or 260 (MAX_PATH)
+            *     including the \0 if the app isn't long path aware.
+            *     This should actually be ENAMETOOLONG.
+            *
+            * ERROR_BAD_PATHNAME occurs at least with this: \\server\
+            *
+            * ERROR_BAD_NETPATH occurs if the server name cannot be resolved
+            * or if the server doesn't support file sharing.
+            *
+            * ERROR_BAD_NET_NAME occurs if the server can be contacted but
+            * the share doesn't exist.
+            *
+            * ERROR_CANT_ACCESS_FILE occurs with directories that have
+            * an unhandled reparse point tag. Treat them the same way as
+            * directory symlinks and junctions whose targets don't exist. */
+           errno = ENOENT;
+           return NULL;
+
+         case ERROR_INVALID_NAME:
+         case ERROR_INVALID_PARAMETER:
+         case ERROR_INVALID_HANDLE:
+         case ERROR_NOT_FOUND:
+           /* ERROR_INVALID_NAME occurs with non-directories when the
+            * pathname has a trailing directory separator: file.txt\
+            * However, it also occurs with bad pathnames like C:\?\foo
+            * which clearly should be ENOENT because ? doesn't exist.
+            *
+            * ERROR_INVALID_PARAMETER occurs with some device paths:
+            * \\?\nul, \\?\C:
+            *
+            * ERROR_INVALID_HANDLE example: \\?\GLOBALROOT\Device
+            *
+            * ERROR_NOT_FOUND occurs with the "con" device: \\?\con\foo */
+           errno = ENOTDIR;
+           return NULL;
+
+         case ERROR_ACCESS_DENIED:
+           /* A leading pathname component denies access. If the last
+            * component isn't accessible, it will be detected when
+            * FindFirstFileW is called. */
+           errno = EACCES;
+           return NULL;
+
+         case ERROR_CALL_NOT_IMPLEMENTED:
+           /* Windows 95/98/ME is not supported by this dirent code.
+            * GetFullPathNameW and FindFirstFileW would fail too. */
+           errno = ENOSYS;
+           return NULL;
+
+         default:
+           /* Unknown error. */
+           errno = EINVAL;
+           return NULL;
+       }
+    }
+
+  if (!(attrs & FILE_ATTRIBUTE_DIRECTORY))
+    {
+      /* The entry exists but it's not a directory.
+       *
+       * Detecting non-directories only works with the last pathname
+       * component. For example, if there is a file "foo", _wopendir(L"foo")
+       * will result in ENOTDIR here. _wopendir(L"foo/bar") should be ENOTDIR
+       * too, but will result in ENOENT because GetFileAttributesW fails with
+       * ERROR_PATH_NOT_FOUND. */
+      errno = ENOTDIR;
+      return NULL;
+    }
+
   /*
    * We need an absolute pathname so that things keep working even if
    * the current directory is changed.
@@ -171,24 +278,10 @@ _wopendir (const wchar_t *path)
       full_path_alloc_size = GetFullPathNameW (path, 0, NULL, NULL);
       if (full_path_alloc_size == 0)
        {
-         switch (GetLastError ())
-           {
-             case ERROR_CALL_NOT_IMPLEMENTED:
-               /* Windows 95/98/ME is not supported by this dirent code. */
-               errno = ENOSYS;
-               return NULL;
-
-             case ERROR_FILENAME_EXCED_RANGE:
-               /* See the comment of DIRENT_WPATH_MAX in this file. */
-               errno = ENAMETOOLONG;
-               return NULL;
-
-             default:
-               /* Unknown error. GetFullPathNameW accepts various invalid
-                * inputs so this error should be uncommon. */
-               errno = EINVAL;
-               return NULL;
-           }
+         /* Unknown error. GetFileAttributesW already succeeded, so this
+          * error should be uncommon. */
+         errno = EINVAL;
+         return NULL;
        }
     }
 
@@ -253,13 +346,7 @@ _wopendir (const wchar_t *path)
   /* Initialize dd_handle and dd_wfd. The FindFirstFileW call cannot be
    * delayed until readdir because that would result in worse error detection.
    * Specifically, it's not enough to call GetFileAttributesW here because it
-   * cannot detect EACCES or dangling directory symlinks.
-   *
-   * NOTE: \\?\C:\ is a directory but \\?\C: isn't. We incorrectly accept the
-   * latter as a directory still because appending \* produces \\?\C:\*.
-   * GetFileAttributesW would detect paths that aren't directories when the
-   * trailing \ is missing, but the overhead of GetFileAttributesW isn't worth
-   * it if the only benefit is being pedantic about this kind of paths. */
+   * cannot detect EACCES or dangling directory symlinks. */
   err = 0;
   dirp->dd_handle = FindFirstFileW (dirp->dd_name, &dirp->dd_wfd);
   if (dirp->dd_handle == INVALID_HANDLE_VALUE)
@@ -285,40 +372,20 @@ _wopendir (const wchar_t *path)
             * or 260 (MAX_PATH) including the \0 if the app isn't
             * long path aware.
             *
-            * ERROR_INVALID_NAME occurs with C:\:\* and such invalid inputs.
-            * GetFullPathNameW does accept invalid paths so this error is
-            * indeed possible.
-            *
-            * ERROR_BAD_PATHNAME occurs if \\* is passed to FindFirstFileW.
-            *
-            * ERROR_BAD_NETPATH occurs if the server name cannot be resolved
-            * or if the server doesn't support file sharing.
-            *
-            * ERROR_BAD_NET_NAME occurs if the server can be contacted but
-            * the share doesn't exist.
-            *
             * ERROR_CANT_ACCESS_FILE occurs with directories that have
             * an unhandled reparse point tag. Treat them the same way as
-            * directory symlinks and junctions whose targets don't exist. */
+            * directory symlinks and junctions whose targets don't exist.
+            *
+            * The other errors above and ERROR_DIRECTORY below shouldn't
+            * easily occur because GetFileAttributesW succeeded. However,
+            * the file system might have changed since the GetFileAttributesW
+            * call, so it's good to handle the most common extra errors here
+            * still. */
            err = ENOENT;
            break;
 
          case ERROR_DIRECTORY:
-         case ERROR_INVALID_FUNCTION:
-         case ERROR_NOT_FOUND:
-           /* Detecting non-directories only works with the last pathname
-            * component. For example, if there is a file foo, passing
-            * foo\* to FindFirstFileW will fail with ERROR_DIRECTORY.
-            * foo\bar\* should be ENOTDIR too but it becomes ENOENT
-            * because FindFirstFileW fails with ERROR_PATH_NOT_FOUND.
-            *
-            * ERROR_INVALID_FUNCTION happens at least with the device
-            * namespace. _wopendir(L"nul") makes GetFullPathNameW produce
-            * \\.\nul, and then FindFirstFileW is called with \\.\nul\*
-            * which results in ERROR_INVALID_FUNCTION.
-            *
-            * _wopendir(L"con") behaves like nul except that \\.\con\*
-            * results in ERROR_NOT_FOUND */
+           /* See the comment above. */
            err = ENOTDIR;
            break;
 
@@ -330,15 +397,11 @@ _wopendir (const wchar_t *path)
            break;
 
          case ERROR_ACCESS_DENIED:
+           /* GetFileAttributesW cannot detect lack of access permissions
+            * to the last pathname component. That error is caught here. */
            err = EACCES;
            break;
 
-         case ERROR_FILENAME_EXCED_RANGE:
-           /* Unsure if this error code can happen from FindFirstFileW
-            * but handle it just in case to avoid the default case. */
-           err = ENAMETOOLONG;
-           break;
-
          case ERROR_NOT_ENOUGH_MEMORY:
            err = ENOMEM;
            break;
@@ -349,11 +412,6 @@ _wopendir (const wchar_t *path)
           * That error shouldn't be possible with FindFirstFileW.
           */
 
-         case ERROR_CALL_NOT_IMPLEMENTED:
-           /* We might get here if GetFullPathNameW wasn't called. */
-           err = ENOSYS;
-           break;
-
          default:
            /* Unknown error. */
            err = EINVAL;
-- 
2.48.1

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to