Re: Making forced unmounts work
In article <31490263-5a8e-411a-bb57-f7fc5cffc...@eis.cs.tu-bs.de>, J. Hannken-Illjes wrote: >The more I think the more I just want to remove forced unmounts. I think that any operation that cannot be undone (and requires reboot to be undone) makes the OS less resilient to failure. >To take some examples: > >- A hard,nointr NFS mount hanging because the server stops responding. > >Even if it were possible to use fstrans_ here (and it would become ugly) >it would not help. The root node of the mount will likely be locked by the >first thread trying to lookup so unmount won't be able to even lookup >the mount point. If it were possible to run `mount -u' or `unmount' it >should be possible to update the mount as `soft,intr' and proceed as usual, >kill threads blocking an unmount and unmount. Store the normalized mount path with the mountpoint, look it up in the mount list, make all blocked threads give an I/O error on the current operation, etc. christos
Re: Making forced unmounts work
The more I think the more I just want to remove forced unmounts. To take some examples: - A hard,nointr NFS mount hanging because the server stops responding. Even if it were possible to use fstrans_ here (and it would become ugly) it would not help. The root node of the mount will likely be locked by the first thread trying to lookup so unmount won't be able to even lookup the mount point. If it were possible to run `mount -u' or `unmount' it should be possible to update the mount as `soft,intr' and proceed as usual, kill threads blocking an unmount and unmount. - Ejecting a removable device currently mounted. In this case the disk driver should error out and all threads should come back with an I/O-error. If this is not the case the driver is buggy. - Refcount and other bugs ... should be fixed ... -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes: >On Dec 6, 2012, at 8:32 AM, Michael van Elst wrote: >> hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes: >> >>> David wants forced unmounts to work even if a thread gets stuck >>> permanently in a vnode operation. >> >> How can it get stuck (short of bugs) ? >Here we are talking about bugs only. I am interested in the case where a mounted device is detached and the umount blocks forever. Wether that is considered a bug or a design decision, I don't care. -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: Making forced unmounts work
hi, > Forced unmounts will most likely panic the kernel. The main problem is > other threads running a vnode operation when we come to clean and > reclaim an active vnode and therefore change its operation vector and > destroy the file system private data without locking or synchronisation. > > One solution is to to suspend the file system while it gets unmounted. > This way all other threads running a vnode operation will wait for > fstrans_start() to succeed. Changing fstrans_start() to detect a now > dead vnode it is possible to restart the vnode operation with the new > operations vector. > > In short the attached diff: > > - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to > restart a vnode operation once it returns ERESTARTVOP. > > - Changes fstrans_start() to take an optional `hint vnode' and return > ERESTARTVOP if the vnode becomes dead. > > - Rearranges all file system operations to call fstrans_start() before > accessing file system private data and check (and return) an error. > > - Changes vfs_suspend() to optionally suspend a file system without > syncing it to disk. This feature is used when revoking a vnode. > > - Changes dounmount() to suspend a file system during the unmount. > > - Keeps the `struct mp' until all (dead) vnodes disappear. > > - Adds some missing operations to deadfs. > > Comments or objections? thanks for working on this. if possible, it's better to avoid taking a rwlock for every VOPs. IMO umount and revoke, instead every VOPs, ought to do the hard work. have you considered to do it at upper levels like file, descriptors, mmaps, etc? eg. 1. iterate all processes to mark their files revoked 2. make long-sleepers (fifo, nfs rpcs, etc) abort 3. wait for reference counts settle you may need to send IPIs as some operations are barrier-less. (fd_getfile/putfile) iirc dragonflybsd implements revoke at the file or descriptor level. YAMAMOTO Takashi > > -- > J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
Date: Thu, 6 Dec 2012 08:23:07 +0100 From: "J. Hannken-Illjes" Looks like this thread is dead. No one beside David Holland is interested and David objects. I take back my proposal. I'm interested, but I haven't fit enough of the vnode life cycle or the fstrans mechanism into my head to assess how well it is likely to work, and I don't think it's a good strategy in general to put more boilerplate into every vop implementation -- there is far too much boilerplate already that lots of vop implementations get wrong. As a step toward fixing the first problem, could you perhaps write down the rules about how fstrans currently composes with vnode locks and operations and life cycle? I seem to recall there was some issue a year or so ago about the ordering of vput and fstrans_done in ufs_rmdir, and the nature of the issue wasn't clear to me from the man pages or a cursory examination of the source, which suggested to me that I don't understand fstrans well enough to reason about patches like this.
re: Making forced unmounts work
> Looks like this thread is dead. No one beside David Holland is > interested and David objects. I take back my proposal. i'm very interested in this idea. > David wants to track information about threads running a vnode > operation from vnode_if.c. I have no idea how this could be done > without knowing file system implementation. > > David wants forced unmounts to work even if a thread gets stuck > permanently in a vnode operation. I don't see a way to safely > reclaim a vnode from a file system when this vnode is in use by > another thread. i think we should strive for this goal, even if we're not sure how to get there yet. but more importantly, your change seems to fix at least a large part of the problem and i think we should continue to consider it. .mrg.
Re: Making forced unmounts work
On Tue, Dec 04, 2012 at 05:31:59PM +0100, J. Hannken-Illjes wrote: > >> A suspended fs has the guarantee that no other thread will be inside > >> fstrans_suspend / fstrans_done of any vnode operation. > >> > >> Threads stuck permanently as in (c) are impossible to catch. > > > > ...doesn't that mean the (c) threads are going to be holding read > > locks, so the suspend will hang forever? > > Yes. > > > We have to assume there will be at least one such thread, as people > > don't generally attempt umount -f unless something's wedged. > > If forced unmounts have to work with vnode operations that will not > leave a fs the only solution will be to disable forced unmounts and > leave reboot as the last option. How could we safely determine if > a vnode operation just takes some time or is stuck forever? What should > we do with threads that are stuck forever? You can't. That's why umount -f is currently unsafe. We can probably make it more safe (and as just pointed out elsewhere, linux-style lazy unmounts are safe and can often be used as a substitute) but making it fully safe will be difficult. The basic use case for umount -f is when you have processes hanging uninterruptibly waiting for a hard-mounted NFS server that's gone away and won't be coming back anytime soon. The other common case is when you accidentally ejected a removable device without unmounting it, and didn't notice until reinserting it isn't an option. Usually what causes you to notice is accessing the device and failing, and that often results in a stuck process. If there isn't a process hanging, generally you can clear away any processes holding the volume busy and do an ordinary umount. The third case I guess is when vnode refcount bugs have left you with a "busy" fs that nothing's actually using; in this case there won't be processes hanging; but by the time you've confirmed this, there also won't be processes using the fs, so the current scheme is more or less safe enough... > Looks like this thread is dead. No one beside David Holland is > interested and David objects. I take back my proposal. I think fstrans may not be the mechanism we want, but I don't think we should give up. > >> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here > >> as they run part or all of the operation with vnode interlock held. > > > > Ugh. But this doesn't, for example, preclude atomically incrementing a > > per-cpu counter or setting a flag in the lwp structure. > > Sure, but we need a point in time where it is safe to mark a vnode dead > and that means to reclaim its inode or other fs-private state. This is > not just accounting -- we must be sure no thread is using fs-private data > and I don't see a way without suspending vnode operations. The way to do this is to divide the "dead" operation into two parts: first change the ops vector (or set a dead flag) so no new threads can enter, then when the threads already in the fs are gone or otherwise neutralized, delete the private data. Or we could just leave the private data. One of the advantages of setting a dead flag (and testing it in vnode_if.c) instead of changing the ops vector is that the vnode doesn't lose its attachment to the fs and can be cleaned up properly later if it ever gets released. As for "neutralized", one possible scheme is to mark the vnodes dead, wait for a few seconds, and then do something evil to the blocked threads still using the fs; that requires knowing which threads they are, which is the hard part. Any simple scheme will end up being a hotspot for cache contention. See the "kicking everybody out of the softc" thread from a couple years ago for one possible approach to that. > >> - Accessing devices and fifos through a file system cannot be tracked > >> at the vnode_if.c level. Take ufs_spec_read() for example: > >> > >> fstrans_start(...); > >> VTOI(vp)->i_flag |= IN_ACCESS; > >> fstrans_done(...); > >> return VOCALL(spec_vnodeop_p, ...) > >> > >> Here the VOCALL may sleep forever if the device has no data. > > > > This I don't understand. To the extent it's "in" the fs while it's > > doing this, vnode_if.c can know that, because it called ufs_spec_read > > and ufs_spec_read hasn't returned yet. To the extent that it's in > > specfs, specfs won't itself ever be umount -f'd, since it isn't > > mounted, so there's no danger. > > Ok, we had to use VOCALL rather than vnode_if.c and add the accounting > here. No, I don't see why. vnode_if.c calls ufs_spec_read(); until ufs_spec_read() returns, as far as tracking done in vnode_if.c is concerned, the thread doing it is still in the fs. > > If umount -f currently blats over the data that specfs uses, then it's > > currently not safe, but I don't see how it's different from any other > > similar case with ufs data. > > > > I expect I'm missing something. > > So another approach is to serialize vclean(), at least when cl
Re: Making forced unmounts work
On Thu, Dec 06, 2012 at 10:32:01AM +, Julian Yon wrote: > I think you could take some inspiration from Linux here: it has a very > handy umount -l which detaches the filesystem from the tree, but defers > the rest of the unmount/cleanup until the fs is no longer busy. This > can help in situations where you're not trying to physically remove > media but just need to reclaim the mount point, and therefore have no > immediate need for a umount -f. Because no process can open anything > new on the detached fs, if it eventually unwedges itself somehow it > won't get rewedged. Yes, I've talked about supporting that elsewhere (maybe not on tech-kern, or at least not recently) -- allowing it has basically no downside, but it requires infrastructure we don't currently have. (Most notably, if you umount -l a hanging nfs volume and then later want to umount -f it, you need to be able to address the leftovers somehow; currently because it isn't in the fs namespace any more and it isn't associated with a device, there's no way to refer to it. I'm not sure how, if at all, Linux copes with this case.) Also it requires splitting the current struct mount into two parts, the filesystem and the mounting of that filesystem; but that's desirable in the long run for other reasons as well, such as supporting plan9-type rebind mounts. -- David A. Holland dholl...@netbsd.org
Re: Making forced unmounts work
On Thu, 6 Dec 2012 08:23:07 +0100 "J. Hannken-Illjes" wrote: > Looks like this thread is dead. No one beside David Holland is > interested and David objects. I take back my proposal. Have been reading the discussion. Don't assume that no contribution means no interest! > David wants forced unmounts to work even if a thread gets stuck > permanently in a vnode operation. I don't see a way to safely > reclaim a vnode from a file system when this vnode is in use by > another thread. I think you could take some inspiration from Linux here: it has a very handy umount -l which detaches the filesystem from the tree, but defers the rest of the unmount/cleanup until the fs is no longer busy. This can help in situations where you're not trying to physically remove media but just need to reclaim the mount point, and therefore have no immediate need for a umount -f. Because no process can open anything new on the detached fs, if it eventually unwedges itself somehow it won't get rewedged. Julian -- 3072D/F3A66B3A Julian Yon (2012 General Use) signature.asc Description: PGP signature
Re: Making forced unmounts work
On Dec 6, 2012, at 10:14 AM, Martin Husemann wrote: > I am interested, but I lack significant vnode clue. So, sorry if answers > are obvious - they are not to me. > > About the only situation I ever (and it is almost reproducable at will), in > daily life, wanted to use forced unmounts instead of rebooting a machine (or > before the machine rebooted itself in direct consequence of the bug that > caused the inital issue) is a NFS mount combined with what probably are > network driver bugs (or maybe hardware). > > I have multiple NFS servers where I better force all clients to use TCP > or smaller write windows, otherwise the clients will lock up on bigger > writes. AFAICT the nfs level creates the necessary packets (repeataly, > after appropriate timeouts etc.) and passes them to the driver, but the > driver (or the hardware, and most likely it is the receiving side on the > server) reproducably drops them, so no progress overall happens. > > Would this situation be fixed by your patch or would the nfs layer still > hold the reader lock? For now my patch only covers ffs and msdosfs file systems. In general it should be possible to add fstrans to the nfs file system making it possible to suspend it. I did not dig deeper but as the nfs client already has timeout mechanisms it should be possible to cover this kind of problem. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
I am interested, but I lack significant vnode clue. So, sorry if answers are obvious - they are not to me. About the only situation I ever (and it is almost reproducable at will), in daily life, wanted to use forced unmounts instead of rebooting a machine (or before the machine rebooted itself in direct consequence of the bug that caused the inital issue) is a NFS mount combined with what probably are network driver bugs (or maybe hardware). I have multiple NFS servers where I better force all clients to use TCP or smaller write windows, otherwise the clients will lock up on bigger writes. AFAICT the nfs level creates the necessary packets (repeataly, after appropriate timeouts etc.) and passes them to the driver, but the driver (or the hardware, and most likely it is the receiving side on the server) reproducably drops them, so no progress overall happens. Would this situation be fixed by your patch or would the nfs layer still hold the reader lock? Martin
Re: Making forced unmounts work
On Dec 6, 2012, at 8:32 AM, Michael van Elst wrote: > hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes: > >> David wants forced unmounts to work even if a thread gets stuck >> permanently in a vnode operation. > > How can it get stuck (short of bugs) ? Here we are talking about bugs only. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes: >David wants forced unmounts to work even if a thread gets stuck >permanently in a vnode operation. How can it get stuck (short of bugs) ? -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: Making forced unmounts work
Looks like this thread is dead. No one beside David Holland is interested and David objects. I take back my proposal. David wants to track information about threads running a vnode operation from vnode_if.c. I have no idea how this could be done without knowing file system implementation. David wants forced unmounts to work even if a thread gets stuck permanently in a vnode operation. I don't see a way to safely reclaim a vnode from a file system when this vnode is in use by another thread. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
On Dec 4, 2012, at 4:44 PM, David Holland wrote: > On Sun, Dec 02, 2012 at 05:29:01PM +0100, J. Hannken-Illjes wrote: > Also I wonder if there's any way to accomplish this that doesn't > require adding fstrans calls to every operation in every fs. Not in a clean way. We would need some kind of reference counting for vnode operations and that is quite impossible as vnode operations on devices or fifos sometimes wait forever and are called from other fs like ufsspec_read() for example. How could we protect UFS updating access times here? >>> >>> I'm not entirely convinced of that. There are basically three >>> problems: (a) new incoming threads, (b) threads that are already in >>> the fs and running, and (c) threads that are already in the fs and >>> that are stuck more or less permanently because something broke. >>> >>> Admittedly I don't really understand how fstrans suspending works. >>> Does it keep track of all the threads that are in the fs, so the (b) >>> ones can be interrupted somehow, or so we at least can wait until all >>> of them either leave the fs or enter fstrans somewhere and stall? >> >> Fstrans is a recursive rw-lock. Vnode operations take the reader lock >> on entry. To suspend a fs we take the writer lock and therefore >> it will catch both (a) and (b). New threads will block on first entry, >> threads already inside the fs will continue until they leave the fs >> and block on next entry. > > Ok, fair enough, but... > >> A suspended fs has the guarantee that no other thread will be inside >> fstrans_suspend / fstrans_done of any vnode operation. >> >> Threads stuck permanently as in (c) are impossible to catch. > > ...doesn't that mean the (c) threads are going to be holding read > locks, so the suspend will hang forever? Yes. > We have to assume there will be at least one such thread, as people > don't generally attempt umount -f unless something's wedged. If forced unmounts have to work with vnode operations that will not leave a fs the only solution will be to disable forced unmounts and leave reboot as the last option. How could we safely determine if a vnode operation just takes some time or is stuck forever? What should we do with threads that are stuck forever? >>> If we're going to track that information we should really do it from >>> vnode_if.c, both to avoid having to modify every fs and to make sure >>> all fses support it correctly. (We also need to be careful about how >>> it's done to avoid causing massive lock contention; that's why such >>> logic doesn't already exist.) >> >> Some cases are nearly impossible to track at the vnode_if.c level: >> >> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here >> as they run part or all of the operation with vnode interlock held. > > Ugh. But this doesn't, for example, preclude atomically incrementing a > per-cpu counter or setting a flag in the lwp structure. Sure, but we need a point in time where it is safe to mark a vnode dead and that means to reclaim its inode or other fs-private state. This is not just accounting -- we must be sure no thread is using fs-private data and I don't see a way without suspending vnode operations. >> - Accessing devices and fifos through a file system cannot be tracked >> at the vnode_if.c level. Take ufs_spec_read() for example: >> >> fstrans_start(...); >> VTOI(vp)->i_flag |= IN_ACCESS; >> fstrans_done(...); >> return VOCALL(spec_vnodeop_p, ...) >> >> Here the VOCALL may sleep forever if the device has no data. > > This I don't understand. To the extent it's "in" the fs while it's > doing this, vnode_if.c can know that, because it called ufs_spec_read > and ufs_spec_read hasn't returned yet. To the extent that it's in > specfs, specfs won't itself ever be umount -f'd, since it isn't > mounted, so there's no danger. Ok, we had to use VOCALL rather than vnode_if.c and add the accounting here. > > If umount -f currently blats over the data that specfs uses, then it's > currently not safe, but I don't see how it's different from any other > similar case with ufs data. > > I expect I'm missing something. So another approach is to serialize vclean(), at least when cleaning active vnodes, and have VOCALL() take care of the vnode currently being cleaned. We would need just per-thread state then. This would still not work for vnode operations stuck in the fs. Is this the direction you have in mind? -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
On Sun, Dec 02, 2012 at 05:29:01PM +0100, J. Hannken-Illjes wrote: > I'm convinced -- having fstrans_start() return ERESTART is the way to go. Ok then :-) > >>> Also I wonder if there's any way to accomplish this that doesn't > >>> require adding fstrans calls to every operation in every fs. > >> > >> Not in a clean way. We would need some kind of reference counting for > >> vnode operations and that is quite impossible as vnode operations on > >> devices or fifos sometimes wait forever and are called from other fs > >> like ufsspec_read() for example. How could we protect UFS updating > >> access times here? > > > > I'm not entirely convinced of that. There are basically three > > problems: (a) new incoming threads, (b) threads that are already in > > the fs and running, and (c) threads that are already in the fs and > > that are stuck more or less permanently because something broke. > > > > Admittedly I don't really understand how fstrans suspending works. > > Does it keep track of all the threads that are in the fs, so the (b) > > ones can be interrupted somehow, or so we at least can wait until all > > of them either leave the fs or enter fstrans somewhere and stall? > > Fstrans is a recursive rw-lock. Vnode operations take the reader lock > on entry. To suspend a fs we take the writer lock and therefore > it will catch both (a) and (b). New threads will block on first entry, > threads already inside the fs will continue until they leave the fs > and block on next entry. Ok, fair enough, but... > A suspended fs has the guarantee that no other thread will be inside > fstrans_suspend / fstrans_done of any vnode operation. > > Threads stuck permanently as in (c) are impossible to catch. ...doesn't that mean the (c) threads are going to be holding read locks, so the suspend will hang forever? We have to assume there will be at least one such thread, as people don't generally attempt umount -f unless something's wedged. > > If we're going to track that information we should really do it from > > vnode_if.c, both to avoid having to modify every fs and to make sure > > all fses support it correctly. (We also need to be careful about how > > it's done to avoid causing massive lock contention; that's why such > > logic doesn't already exist.) > > Some cases are nearly impossible to track at the vnode_if.c level: > > - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here > as they run part or all of the operation with vnode interlock held. Ugh. But this doesn't, for example, preclude atomically incrementing a per-cpu counter or setting a flag in the lwp structure. > - Accessing devices and fifos through a file system cannot be tracked > at the vnode_if.c level. Take ufs_spec_read() for example: > > fstrans_start(...); > VTOI(vp)->i_flag |= IN_ACCESS; > fstrans_done(...); > return VOCALL(spec_vnodeop_p, ...) > > Here the VOCALL may sleep forever if the device has no data. This I don't understand. To the extent it's "in" the fs while it's doing this, vnode_if.c can know that, because it called ufs_spec_read and ufs_spec_read hasn't returned yet. To the extent that it's in specfs, specfs won't itself ever be umount -f'd, since it isn't mounted, so there's no danger. If umount -f currently blats over the data that specfs uses, then it's currently not safe, but I don't see how it's different from any other similar case with ufs data. I expect I'm missing something. -- David A. Holland dholl...@netbsd.org
Re: Making forced unmounts work
On Dec 2, 2012, at 5:00 AM, David Holland wrote: > On Thu, Nov 29, 2012 at 06:19:37PM +0100, J. Hannken-Illjes wrote: In short the attached diff: - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to restart a vnode operation once it returns ERESTARTVOP. - Changes fstrans_start() to take an optional `hint vnode' and return ERESTARTVOP if the vnode becomes dead. >>> >>> Is there any major reason we can't just use ERESTART and rerun the >>> whole syscall? >> >> Not all vnode operations come from a syscall and to me it looks cleaner >> to use one private errno for exactly this purpose. > > Could be. All those places are supposed to be capable of coping with > ERESTART though (otherwise, they break if it happens) so it shouldn't > make much difference. And if it does make a difference somewhere, that > should be fixed... regardless of ERESTART for signals, we want FS > operations to be able to bail and restart for transaction abort. I'm convinced -- having fstrans_start() return ERESTART is the way to go. >>> Also I wonder if there's any way to accomplish this that doesn't >>> require adding fstrans calls to every operation in every fs. >> >> Not in a clean way. We would need some kind of reference counting for >> vnode operations and that is quite impossible as vnode operations on >> devices or fifos sometimes wait forever and are called from other fs >> like ufsspec_read() for example. How could we protect UFS updating >> access times here? > > I'm not entirely convinced of that. There are basically three > problems: (a) new incoming threads, (b) threads that are already in > the fs and running, and (c) threads that are already in the fs and > that are stuck more or less permanently because something broke. > > Admittedly I don't really understand how fstrans suspending works. > Does it keep track of all the threads that are in the fs, so the (b) > ones can be interrupted somehow, or so we at least can wait until all > of them either leave the fs or enter fstrans somewhere and stall? Fstrans is a recursive rw-lock. Vnode operations take the reader lock on entry. To suspend a fs we take the writer lock and therefore it will catch both (a) and (b). New threads will block on first entry, threads already inside the fs will continue until they leave the fs and block on next entry. A suspended fs has the guarantee that no other thread will be inside fstrans_suspend / fstrans_done of any vnode operation. Threads stuck permanently as in (c) are impossible to catch. > If we're going to track that information we should really do it from > vnode_if.c, both to avoid having to modify every fs and to make sure > all fses support it correctly. (We also need to be careful about how > it's done to avoid causing massive lock contention; that's why such > logic doesn't already exist.) Some cases are nearly impossible to track at the vnode_if.c level: - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here as they run part or all of the operation with vnode interlock held. - Accessing devices and fifos through a file system cannot be tracked at the vnode_if.c level. Take ufs_spec_read() for example: fstrans_start(...); VTOI(vp)->i_flag |= IN_ACCESS; fstrans_done(...); return VOCALL(spec_vnodeop_p, ...) Here the VOCALL may sleep forever if the device has no data. > If, however, fstrans isn't tracking that information, I don't see how > suspending the fs helps deal with the (b) threads, because if they're > currently running they can continue to chew on fs-specific data for > arbitrarily long before they run into anything that stalls them, and > there's no way to know when they're done or how many of them there > are. > > I don't really see what the issue with ufsspec_read() is, however, so > we may be talking past each other. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
On Thu, Nov 29, 2012 at 06:19:37PM +0100, J. Hannken-Illjes wrote: > >> In short the attached diff: > >> > >> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to > >> restart a vnode operation once it returns ERESTARTVOP. > >> > >> - Changes fstrans_start() to take an optional `hint vnode' and return > >> ERESTARTVOP if the vnode becomes dead. > > > > Is there any major reason we can't just use ERESTART and rerun the > > whole syscall? > > Not all vnode operations come from a syscall and to me it looks cleaner > to use one private errno for exactly this purpose. Could be. All those places are supposed to be capable of coping with ERESTART though (otherwise, they break if it happens) so it shouldn't make much difference. And if it does make a difference somewhere, that should be fixed... regardless of ERESTART for signals, we want FS operations to be able to bail and restart for transaction abort. > > I see there are two references to ERESTARTVOP in genfs_io.c, and I > > don't see what they're for without digging deeper, but given that they > > appear to make locking behavior depend on the error condition maybe it > > would be better not to do that too. :-/ > > This is the wonderful world of VOP_GETPAGES() and VOP_PUTPAGES(). Both > are called with vnode interlock held and when it is needed and possible > to check the vnode the interlock has been released. When these operations > return ERESTARTVOP we have to lock the interlock because dead_getpages() > and dead_putpages need it on entry (just to release it). > > It is possible to directly return the error from genfs_XXXpages() though. > To me it looks clearer to always go the ERESTARTVOP route. Ugh. I don't like having the locking behavior be different for different error cases; it's asking for trouble in the long run. I think this ends up being a reason to use ERESTART instead. > > Also I wonder if there's any way to accomplish this that doesn't > > require adding fstrans calls to every operation in every fs. > > Not in a clean way. We would need some kind of reference counting for > vnode operations and that is quite impossible as vnode operations on > devices or fifos sometimes wait forever and are called from other fs > like ufsspec_read() for example. How could we protect UFS updating > access times here? I'm not entirely convinced of that. There are basically three problems: (a) new incoming threads, (b) threads that are already in the fs and running, and (c) threads that are already in the fs and that are stuck more or less permanently because something broke. Admittedly I don't really understand how fstrans suspending works. Does it keep track of all the threads that are in the fs, so the (b) ones can be interrupted somehow, or so we at least can wait until all of them either leave the fs or enter fstrans somewhere and stall? If we're going to track that information we should really do it from vnode_if.c, both to avoid having to modify every fs and to make sure all fses support it correctly. (We also need to be careful about how it's done to avoid causing massive lock contention; that's why such logic doesn't already exist.) If, however, fstrans isn't tracking that information, I don't see how suspending the fs helps deal with the (b) threads, because if they're currently running they can continue to chew on fs-specific data for arbitrarily long before they run into anything that stalls them, and there's no way to know when they're done or how many of them there are. I don't really see what the issue with ufsspec_read() is, however, so we may be talking past each other. -- David A. Holland dholl...@netbsd.org
Re: Making forced unmounts work
On Nov 29, 2012, at 5:17 PM, David Holland wrote: > On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote: >> In short the attached diff: >> >> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to >> restart a vnode operation once it returns ERESTARTVOP. >> >> - Changes fstrans_start() to take an optional `hint vnode' and return >> ERESTARTVOP if the vnode becomes dead. > > Is there any major reason we can't just use ERESTART and rerun the > whole syscall? Not all vnode operations come from a syscall and to me it looks cleaner to use one private errno for exactly this purpose. > I see there are two references to ERESTARTVOP in genfs_io.c, and I > don't see what they're for without digging deeper, but given that they > appear to make locking behavior depend on the error condition maybe it > would be better not to do that too. :-/ This is the wonderful world of VOP_GETPAGES() and VOP_PUTPAGES(). Both are called with vnode interlock held and when it is needed and possible to check the vnode the interlock has been released. When these operations return ERESTARTVOP we have to lock the interlock because dead_getpages() and dead_putpages need it on entry (just to release it). It is possible to directly return the error from genfs_XXXpages() though. To me it looks clearer to always go the ERESTARTVOP route. > Also I wonder if there's any way to accomplish this that doesn't > require adding fstrans calls to every operation in every fs. Not in a clean way. We would need some kind of reference counting for vnode operations and that is quite impossible as vnode operations on devices or fifos sometimes wait forever and are called from other fs like ufsspec_read() for example. How could we protect UFS updating access times here? -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote: > In short the attached diff: > > - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to > restart a vnode operation once it returns ERESTARTVOP. > > - Changes fstrans_start() to take an optional `hint vnode' and return > ERESTARTVOP if the vnode becomes dead. Is there any major reason we can't just use ERESTART and rerun the whole syscall? I see there are two references to ERESTARTVOP in genfs_io.c, and I don't see what they're for without digging deeper, but given that they appear to make locking behavior depend on the error condition maybe it would be better not to do that too. :-/ Also I wonder if there's any way to accomplish this that doesn't require adding fstrans calls to every operation in every fs. More later on hopefully when I have more time to look... -- David A. Holland dholl...@netbsd.org
Re: Making forced unmounts work
On Nov 28, 2012, at 3:59 AM, David Young wrote: > On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote: >> Comments or objections? > > I'm wondering if this will fix the bugs in 'mount -u -r /xyz' where a > FFS is mounted read-write at /xyz? Sorry, I don't remember any longer > what the bugs were. This one will not fix the rw->ro mount problem where data or meta data will be written to a file system mounted read only. Stay tuned... -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: Making forced unmounts work
On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote: > Comments or objections? I'm wondering if this will fix the bugs in 'mount -u -r /xyz' where a FFS is mounted read-write at /xyz? Sorry, I don't remember any longer what the bugs were. Dave -- David Young dyo...@pobox.comUrbana, IL(217) 721-9981