On Wed, 30 Jan 2008, Alan Bateman wrote:
Michael Allman wrote:
Hello,
Can someone with knowledge of such matters explain what
FileDispatcher.preClose() is supposed to do on Solaris/Linux. I mean, I
see the code, but I don't understand why it exists or what problem it's
supposed to avoid or something.
I ask because I'm trying to fix a file-locking problem on soylatte and it
seems the solution to that problem is to remove this code (on that
platform). But before I charge ahead, I need a better understanding of why
this code exists.
In particular, I'm really interested in the stuff that happens in
FileDispatcher.c, functions Java_sun_nio_ch_FileDispatcher_init and
Java_sun_nio_ch_FileDispatcher_preClose0. They're setting something up
that looks important, but I just don't get it.
In a multi-threaded application it is always difficult to know when you can
safely close and release a file descriptor (or other resource). If one thread
is using a file descriptor to read or write and another thread releases
(closes) it, then it is possible for the first thread to read or write to the
wrong file or socket in the event that the file descriptor is recycled
quickly. The approach that we use in both classic networking and NIO is to
use a two-step process. In the first step we duplicate (dup2) the file
descriptor to another that is one end of a half shutdown socket pair. Other
threads that are reading or writing but haven't called the read or write
system calls yet will get an immediate EOF or pipe error when they do so. As
each thread completes its read or write method, it examines its state.
If there is a close pending then the last one releases the file descriptor.
Hopefully this brief overview gives you some idea what this code is about.
The FileDispatcher#init method is where the socketpair is created, and that
preClose0 method does the dup2. I haven't been following the Soylatte port
very closely so I'm curious what problem you are seeing - when you say "file
locking" do you mean FileChannel#lock? If so then the issue may be that the
asynchronous close mechanism isn't completely extended to FileChannel yet.
I think I get it. So let me explain the problem I'm seeing here.
If I close a file channel on which I have acquired (but not released) a
file lock, I get an IOException: Bad file descriptor. For example, the
Lock regression test does this and fails (on soylatte).
I think the problem here is that FileChannelImpl.implCloseChannel() calls
nd.preClose(fd) before the block that releases its file locks. On
non-Windows, nd.preClose(fd) doesn't just "pre close" fd, it closes it.
Then implCloseChannel() tries to release its file locks. fd now points to
a socket descriptor and on Solaris/Linux, such an attempt seems to be
harmless. On Mac OS X, it complains with the EBADF error code.
It seems that the preClose semantics are not correctly handled by the
FileChannelImpl.implCloseChannel() method. On non-Windows, it attempts to
release file locks that no longer exist (because preClose() releases
them). It seems that the file lock release block should be moved into
NativeDispatcher.preClose(). That block would then run on Windows, but not
on non-Windows. That seems correct to me, given that on non-Windows,
preClose0 releases the file locks.
Obviously, this kind of change is much more than a soylatte patch. It
changes code that already works on Windows, Solaris, and Linux. But if my
analysis is correct, it looks like it's just a silent bug.
Thoughts?
Michael