On Wed, 30 Jan 2008, Alan Bateman wrote:

Michael Allman wrote:
Hello,

Can someone with knowledge of such matters explain what FileDispatcher.preClose() is supposed to do on Solaris/Linux? I mean, I see the code, but I don't understand why it exists or what problem it's supposed to avoid.

I ask because I'm trying to fix a file-locking problem on soylatte and it seems the solution to that problem is to remove this code (on that platform). But before I charge ahead, I need a better understanding of why this code exists.

In particular, I'm really interested in the stuff that happens in FileDispatcher.c, functions Java_sun_nio_ch_FileDispatcher_init and Java_sun_nio_ch_FileDispatcher_preClose0. They're setting something up that looks important, but I just don't get it.
In a multi-threaded application it is always difficult to know when you can safely close and release a file descriptor (or other resource). If one thread is using a file descriptor to read or write and another thread releases (closes) it, then it is possible for the first thread to read or write to the wrong file or socket in the event that the file descriptor is recycled quickly.

The approach that we use in both classic networking and NIO is a two-step process. In the first step we use dup2 to replace the file descriptor with a duplicate of another descriptor that is one end of a half-shutdown socket pair. Other threads that are reading or writing but haven't called the read or write system calls yet will get an immediate EOF or pipe error when they do so. As the threads complete the read or write method they examine their state. If there is a close pending then the last one releases the file descriptor.

Hopefully this brief overview gives you some idea what this code is about. The FileDispatcher#init method is where the socketpair is created, and the preClose0 method does the dup2.

I haven't been following the Soylatte port very closely so I'm curious what problem you are seeing. When you say "file locking" do you mean FileChannel#lock? If so then the issue may be that the asynchronous close mechanism isn't completely extended to FileChannel yet.
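
For readers following along, here is a minimal C sketch of the first step described above. It is not the JDK source: the helper names (make_preclose_fd, pre_close, demo_read_after_preclose) are made up for illustration, and a pipe stands in for the descriptor being closed. It only tries to show that after the dup2 the descriptor number stays reserved while a read on it reports EOF.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the descriptor set up once at init time:
 * one end of a socket pair whose peer has been closed. */
static int make_preclose_fd(void) {
    int sp[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0)
        return -1;
    close(sp[1]);               /* peer gone: reads on sp[0] return EOF */
    return sp[0];
}

/* Step one of the two-step close. fd keeps its number, so the kernel
 * cannot hand it out again, but it now refers to the dead socket. */
static int pre_close(int preclose_fd, int fd) {
    return dup2(preclose_fd, fd) < 0 ? -1 : 0;
}

/* Demo: a pipe with unread data plays the role of the open file.
 * Returns the result of read() issued after pre_close(). */
static long demo_read_after_preclose(void) {
    int p[2];
    char buf[8];
    long n;
    int pre = make_preclose_fd();
    int rc = pipe(p);
    assert(pre >= 0);
    assert(rc == 0);
    if (write(p[1], "data", 4) != 4)
        return -2;
    if (pre_close(pre, p[0]) < 0)
        return -2;
    n = (long)read(p[0], buf, sizeof buf);  /* sees the socket, not the pipe */
    close(p[0]);                /* step two: the real release */
    close(p[1]);
    close(pre);
    return n;
}
```

Without the pre_close() call the read would return the 4 bytes waiting in the pipe. With it, the reader gets EOF and can then notice that a close is pending.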

I think I get it.  So let me explain the problem I'm seeing here.

If I close a file channel on which I have acquired (but not released) a file lock, I get an IOException: Bad file descriptor. For example, the Lock regression test does this and fails (on soylatte).

I think the problem here is that FileChannelImpl.implCloseChannel() calls nd.preClose(fd) before the block that releases its file locks. On non-Windows, nd.preClose(fd) doesn't just "pre close" fd, it closes it. Then implCloseChannel() tries to release its file locks. fd now points to a socket descriptor, and on Solaris/Linux such an attempt seems to be harmless. On Mac OS X, it fails with the EBADF error code.

It seems that the preClose semantics are not correctly handled by the FileChannelImpl.implCloseChannel() method. On non-windows, it attempts to release file locks that no longer exist (because preClose() releases them). It seems that the file lock release block should be moved into NativeDispatcher.preClose(). It will be run on Windows, but will not be run on non-Windows. That seems correct to me, given that on non-Windows, preClose0 releases the file locks.
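
To make the claim above concrete, here is a small C sketch, assuming the locks in question are POSIX fcntl record locks (which a process loses as soon as it closes any descriptor for the file, including the implicit close done by dup2). All function names are hypothetical and this is not the JDK code. A child process is used as the observer because a process cannot see its own fcntl locks as conflicts.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Apply F_WRLCK or F_UNLCK to the whole file with fcntl record locking. */
static int set_lock(int fd, short type) {
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: entire file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 1 if a separate process can write-lock path, 0 if it cannot,
 * -1 if the observer process could not be run. */
static int other_process_can_lock(const char *path) {
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        _exit(fd >= 0 && set_lock(fd, F_WRLCK) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Returns 1 if the lock was held before the dup2 and gone after it,
 * 0 otherwise, -1 on setup failure. */
static int lock_released_by_preclose(void) {
    char path[] = "/tmp/preclose_lock_XXXXXX";
    int sp[2], before, after;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(sp[1]);
    if (set_lock(fd, F_WRLCK) < 0) {
        close(fd);
        close(sp[0]);
        unlink(path);
        return -1;
    }
    before = other_process_can_lock(path);  /* lock held: expect 0 */
    dup2(sp[0], fd);                        /* the "pre close" */
    after = other_process_can_lock(path);   /* lock gone: expect 1 */
    /* The late unlock now targets a socket. Its result is deliberately
     * ignored here because it differs between platforms. */
    (void)set_lock(fd, F_UNLCK);
    close(fd);
    close(sp[0]);
    unlink(path);
    return before == 0 && after == 1;
}
```

If this behaves as I expect, the lock is already gone by the time the explicit unlock runs, so the unlock has nothing left to release and the only open question is whether the platform complains about it.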

Obviously, this kind of change is much more than a soylatte patch. It changes code that already works on Windows, Solaris, and Linux. But if my analysis is correct, it looks like it's just a silent bug.

Thoughts?

Michael
