On Mon, Aug 17, 2009 at 01:24:30PM -0700, Udo Grabowski wrote:
> We see this problem during fast write access to small files (a couple
> of jobs on different files on the same filesystem) through the
> automounter (layout and options as described above). Files sometimes
> do exist before and sometimes not, so no clue there.

It's a race condition. If the mirror mount hasn't been done yet, or if it has been unmounted (due to not being in use for a while), and two or more threads step into the mount, then you have a race, and only one thread will win; the losers get EBUSY. This is trivial to reproduce by running "dmake" in an ONNV source directory in an NFS-mounted workspace with a separate "proto" area filesystem. Yes, that's ONNV engineer speak. There are other ways to trigger it and, IIRC, I provided enough details about this a long time ago.

If any process holds a reference to a file or directory in that mirror-mounted filesystem, then that filesystem will not be unmounted, and then the EBUSY won't happen (since no threads can be racing to mount something that's already mounted). So that's your workaround: keep a reference open to every affected mirror mount so that it doesn't get unmounted. (But I've not tested this workaround, or at least not recently.)

This is a general bug in domount() in the kernel. The automounter works around it by retrying calls to domount() on failure (there's a bit more to it, but you get the picture). The fix here is to do more or less what the automounter does. Tom had a fix, but that fix introduced some other bug (I forget the details), and since it was a low-priority bug (in my case it was easy enough to live with) it has languished.

Nico
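
For illustration only, here is a minimal user-space sketch of the two ideas above: pinning mirror mounts by holding an open reference, and retrying an operation that fails with EBUSY. It is not the kernel fix and has not been tested against this bug. The function names `hold_references` and `retry_on_ebusy`, the attempt count, and the delay are all hypothetical, and retrying in the application is only an analogy to what the automounter does around domount().

```python
import errno
import os
import time


def hold_references(mount_points):
    """Open each directory and return the descriptors.

    Assumption: an open descriptor counts as a reference that keeps the
    mirror mount from being unmounted. Keep the descriptors open for as
    long as the jobs run, then close them.
    """
    fds = []
    try:
        for path in mount_points:
            fds.append(os.open(path, os.O_RDONLY))
    except OSError:
        # Do not leak descriptors if one of the paths cannot be opened.
        for fd in fds:
            os.close(fd)
        raise
    return fds


def retry_on_ebusy(operation, attempts=5, delay=0.1):
    """Call operation() and retry only when it fails with EBUSY.

    Any other error, or EBUSY on the last attempt, is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A job would call `hold_references()` once at startup with the affected mirror-mount directories, and could wrap individual writes as `retry_on_ebusy(lambda: open(path, "w"))` if it prefers to retry rather than pin the mounts.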