Hi,
> There's also the problem of handling NFS shares. However, I just had an
> idea how to speed up symlink_info::check without neglecting NFS shares.
> This will take some time, though since it turns a lot of code upside
> down. Stay tuned.
This sounds great! Cygwin filesystem performance is a very important
issue, and any improvement is more than welcome!
> I don't understand how you think this should work. The filter expression
> given to NtQueryDirectoryFile is either a constant string and has to match
> the filename exactly, or it contains wildcards. This is documented
> behaviour: http://msdn.microsoft.com/en-us/library/ff567047%28VS.85%29.aspx
> So, "foo" works, "foo*" works, but a list like "foo foo.exe foo.lnk"
> does not.
There are two options for stat() and other places that need file info
(such as check_symlink):
1) CreateFile(the_dir), then NtQueryDirectoryFile("foo*") and retrieve
all the info (including the hardlink), filter out the results in
user-mode ("foo", "foo.exe", "foo.lnk"), and then call CloseHandle().
2) CreateFile(the_dir), NtQueryDirectoryFile("foo"),
NtQueryDirectoryFile("foo.exe"), NtQueryDirectoryFile("foo.lnk"),
CloseHandle(). The calls to NtQueryDirectoryFile() should be with
RestartScan=1, so that the the_dir handle can be reused. Also
ReturnSingleEntry=1 can be set to improve performance.
This is instead what is done today in cygwin:
3) CreateFile("foo"), NtQueryInformationFile(), CloseHandle() (and
repeat this for "foo.exe" and "foo.lnk")
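
For option 1, the user-mode filtering step might look roughly like the
sketch below. It is only an illustration under some assumptions: the
function names (name_eq_nocase, is_candidate, filter_candidates) are
made up for this mail, and the names are plain char strings compared
ASCII case-insensitively, whereas the real entries returned by
NtQueryDirectoryFile carry counted UTF-16 names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII case-insensitive equality of two strings. */
static int name_eq_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both ended together */
}

/* Return 1 if `entry` is exactly `base`, `base.exe` or `base.lnk`
   (ignoring ASCII case), 0 otherwise. */
int is_candidate(const char *base, const char *entry)
{
    static const char *const suffixes[] = { "", ".exe", ".lnk" };
    size_t base_len, i;

    assert(base != NULL && entry != NULL);
    base_len = strlen(base);
    if (strlen(entry) < base_len)
        return 0;
    /* The entry must start with the base name... */
    for (i = 0; i < base_len; i++)
        if (tolower((unsigned char)base[i]) !=
            tolower((unsigned char)entry[i]))
            return 0;
    /* ...and the remainder must be one of the accepted suffixes. */
    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++)
        if (name_eq_nocase(entry + base_len, suffixes[i]))
            return 1;
    return 0;
}

/* Compact `entries` in place so that only candidates remain, keeping
   their original order. Returns the number of entries kept. */
size_t filter_candidates(const char *base, const char **entries, size_t n)
{
    size_t kept = 0, i;

    for (i = 0; i < n; i++)
        if (is_candidate(base, entries[i]))
            entries[kept++] = entries[i];
    return kept;
}
```

The idea is that a "foo*" query may also return names like "foobar" or
"foo.txt", so the caller would pass the returned names through
filter_candidates("foo", ...) and look only at what is left.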
I did some performance tests comparing #1, #2 and #3.
I found out that #1 and #2 are both around 10x to 100x (!!!) faster
than #3.
I checked out why, and found out that #1 and #2 don't modify the access
time of the file, whereas #3 does. This already immediately causes a
huge performance penalty (and it is also not according to the POSIX
standard: stat("foo") should not update atime of "foo").
Another reason is that the kernel NTFS driver automatically performs
read-ahead of the file, thus just stat("foo") (which calls
CreateFile("foo") in #3) causes the first 64k of "foo" to be read from
the disk - slowing down performance tremendously. Think of "ls /bin"
with 3500 files: NTFS reads the first 64K of all the 3500 files! No
wonder it takes so long...
And yet another reason why #3 is way slower than #1 and #2 is
antivirus software: Nearly all Windows users install an AV (or use Win7
MS AV). These trap and monitor all CreateFile() calls on regular files
(not on directory files). Therefore CreateFile() on a regular file can
take a lot longer than CreateFile() on a directory.
I would suggest using #2 over #1, since it's simpler code-wise, and I did
not see any serious performance difference between the two.
Yoni
On 14/9/2010 12:05 PM, Corinna Vinschen wrote:
On Sep 13 13:28, Yoni Londner wrote:
Hi,
However, isn't that kind of a chicken/egg situation? If you want to
reuse the content of the FILE_BOTH{_ID}_DIRECTORY_INFORMATION structure
from a previous call to readdir, you would have to call the
I am not talking about reusing info from a previous readdir.
Every single file cygwin tries to access, it does it in a loop,
trying afterwards to check for *.lnk file.
Using the directory query operations, it is possible to get this
info faster:
instead of getting file info for FOO and then for "FOO.lnk",
Cygwin can query the directory info for "FOO FOO.LNK" (for the file
requested, plus its possible symlink file).
I don't understand how you think this should work. The filter expression
given to NtQueryDirectoryFile is either a constant string and has to match
the filename exactly, or it contains wildcards. This is documented
behaviour: http://msdn.microsoft.com/en-us/library/ff567047%28VS.85%29.aspx
So, "foo" works, "foo*" works, but a list like "foo foo.exe foo.lnk"
does not.
There's also the problem of handling NFS shares. However, I just had an
idea how to speed up symlink_info::check without neglecting NFS shares.
This will take some time, though since it turns a lot of code upside
down. Stay tuned.
Corinna