Re: Making forced unmounts work

2012-11-27 Thread David Young
On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote:
> Comments or objections?

I'm wondering if this will fix the bugs in 'mount -u -r /xyz' where a
FFS is mounted read-write at /xyz?  Sorry, I don't remember any longer
what the bugs were.

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Making forced unmounts work

2012-11-27 Thread J. Hannken-Illjes

On Nov 28, 2012, at 3:59 AM, David Young  wrote:

> On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote:
>> Comments or objections?
> 
> I'm wondering if this will fix the bugs in 'mount -u -r /xyz' where a
> FFS is mounted read-write at /xyz?  Sorry, I don't remember any longer
> what the bugs were.

This one will not fix the rw->ro mount problem where data or metadata
get written to a file system mounted read-only.  Stay tuned...

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-11-29 Thread David Holland
On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote:
 > In short the attached diff:
 > 
 > - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
 >   restart a vnode operation once it returns ERESTARTVOP.
 > 
 > - Changes fstrans_start() to take an optional `hint vnode' and return
 >   ERESTARTVOP if the vnode becomes dead.

Is there any major reason we can't just use ERESTART and rerun the
whole syscall?
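The retry mechanism under discussion can be sketched as a small user-space model. Everything here is a simplified stand-in, not the actual patch: in the real proposal fstrans_start() detects the dead vnode, while here the live operation itself plays that role, and ERESTARTVOP's value is invented.

```c
/*
 * Toy model of the proposed VCALL()/ERESTARTVOP retry.  All names
 * and values are illustrative stand-ins for the real NetBSD code.
 */
#define ERESTARTVOP 1024        /* hypothetical kernel-private errno */
#define EIO_STANDIN 5           /* what the dead vector fails with here */

struct vnode;
typedef int (*vop_t)(struct vnode *);

struct vnode {
    vop_t *v_op;                /* active operation vector */
    int    v_gone;              /* set once the fs revoked this vnode */
};

static int live_read(struct vnode *);
static int dead_read(struct vnode *);

static vop_t live_vector[] = { live_read };
static vop_t dead_vector[] = { dead_read };

/* Live op: if the vnode died underneath it, swap in the dead vector
 * and ask the dispatcher to retry. */
static int live_read(struct vnode *vp)
{
    if (vp->v_gone) {
        vp->v_op = dead_vector; /* models vclean() having finished */
        return ERESTARTVOP;
    }
    return 0;
}

/* Dead op: always fails. */
static int dead_read(struct vnode *vp)
{
    (void)vp;
    return EIO_STANDIN;
}

/* VCALL() as proposed: re-dispatch through the (possibly swapped)
 * vector whenever an operation returns ERESTARTVOP. */
static int vcall(struct vnode *vp, int opidx)
{
    int error;

    while ((error = vp->v_op[opidx](vp)) == ERESTARTVOP)
        continue;
    return error;
}
```

The point of the loop is that callers never see ERESTARTVOP themselves; the retried dispatch simply lands in the deadfs vector.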

I see there are two references to ERESTARTVOP in genfs_io.c, and I
don't see what they're for without digging deeper, but given that they
appear to make locking behavior depend on the error condition maybe it
would be better not to do that too. :-/

Also I wonder if there's any way to accomplish this that doesn't
require adding fstrans calls to every operation in every fs.

More later on hopefully when I have more time to look...

-- 
David A. Holland
dholl...@netbsd.org


Re: Making forced unmounts work

2012-11-29 Thread J. Hannken-Illjes
On Nov 29, 2012, at 5:17 PM, David Holland  wrote:

> On Mon, Nov 26, 2012 at 03:06:34PM +0100, J. Hannken-Illjes wrote:
>> In short the attached diff:
>> 
>> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
>>  restart a vnode operation once it returns ERESTARTVOP.
>> 
>> - Changes fstrans_start() to take an optional `hint vnode' and return
>>  ERESTARTVOP if the vnode becomes dead.
> 
> Is there any major reason we can't just use ERESTART and rerun the
> whole syscall?

Not all vnode operations come from a syscall and to me it looks cleaner
to use one private errno for exactly this purpose.

> I see there are two references to ERESTARTVOP in genfs_io.c, and I
> don't see what they're for without digging deeper, but given that they
> appear to make locking behavior depend on the error condition maybe it
> would be better not to do that too. :-/

This is the wonderful world of VOP_GETPAGES() and VOP_PUTPAGES().  Both
are called with the vnode interlock held, and by the time it is needed
and possible to check the vnode, the interlock has been released.  When
these operations return ERESTARTVOP we have to take the interlock again
because dead_getpages() and dead_putpages() need it on entry (just to
release it).

It is possible to directly return the error from genfs_XXXpages() though.
To me it looks clearer to always go the ERESTARTVOP route.

> Also I wonder if there's any way to accomplish this that doesn't
> require adding fstrans calls to every operation in every fs.

Not in a clean way.  We would need some kind of reference counting for
vnode operations and that is quite impossible as vnode operations on
devices or fifos sometimes wait forever and are called from other fs
like ufsspec_read() for example.  How could we protect UFS updating
access times here?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-01 Thread David Holland
On Thu, Nov 29, 2012 at 06:19:37PM +0100, J. Hannken-Illjes wrote:
 > >> In short the attached diff:
 > >> 
 > >> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
 > >>  restart a vnode operation once it returns ERESTARTVOP.
 > >> 
 > >> - Changes fstrans_start() to take an optional `hint vnode' and return
 > >>  ERESTARTVOP if the vnode becomes dead.
 > > 
 > > Is there any major reason we can't just use ERESTART and rerun the
 > > whole syscall?
 > 
 > Not all vnode operations come from a syscall and to me it looks cleaner
 > to use one private errno for exactly this purpose.

Could be. All those places are supposed to be capable of coping with
ERESTART though (otherwise, they break if it happens) so it shouldn't
make much difference. And if it does make a difference somewhere, that
should be fixed... regardless of ERESTART for signals, we want FS
operations to be able to bail and restart for transaction abort.
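The "rerun the whole syscall" semantics being argued for look roughly like this at the dispatcher level (a toy model only; the stand-in ERESTART value and function names are invented, and the real rerun happens in the trap-return path, not a loop like this):

```c
/* Toy model of syscall restart on ERESTART. */
#define ERESTART (-3)   /* stand-in value; the real one is kernel-internal */

static int attempts;

/* A file-system operation that is interrupted once (say, the fs was
 * being suspended under it) and succeeds when rerun. */
static int fs_op(void)
{
    return attempts++ == 0 ? ERESTART : 0;
}

/* Toy syscall dispatcher: ERESTART means "rerun the whole syscall",
 * which is what already happens today for interrupted syscalls. */
static int do_syscall(int (*op)(void))
{
    int error;

    while ((error = op()) == ERESTART)
        continue;
    return error;
}
```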

 > > I see there are two references to ERESTARTVOP in genfs_io.c, and I
 > > don't see what they're for without digging deeper, but given that they
 > > appear to make locking behavior depend on the error condition maybe it
 > > would be better not to do that too. :-/
 > 
 > This is the wonderful world of VOP_GETPAGES() and VOP_PUTPAGES().  Both
 > are called with vnode interlock held and when it is needed and possible
 > to check the vnode the interlock has been released.  When these operations
 > return ERESTARTVOP we have to lock the interlock because dead_getpages()
 > and dead_putpages() need it on entry (just to release it).
 > 
 > It is possible to directly return the error from genfs_XXXpages() though.
 > To me it looks clearer to always go the ERESTARTVOP route.

Ugh.

I don't like having the locking behavior be different for different
error cases; it's asking for trouble in the long run. I think this
ends up being a reason to use ERESTART instead.

 > > Also I wonder if there's any way to accomplish this that doesn't
 > > require adding fstrans calls to every operation in every fs.
 > 
 > Not in a clean way. We would need some kind of reference counting for
 > vnode operations and that is quite impossible as vnode operations on
 > devices or fifos sometimes wait forever and are called from other fs
 > like ufsspec_read() for example.  How could we protect UFS updating
 > access times here?

I'm not entirely convinced of that. There are basically three
problems: (a) new incoming threads, (b) threads that are already in
the fs and running, and (c) threads that are already in the fs and
that are stuck more or less permanently because something broke.

Admittedly I don't really understand how fstrans suspending works.
Does it keep track of all the threads that are in the fs, so the (b)
ones can be interrupted somehow, or so we at least can wait until all
of them either leave the fs or enter fstrans somewhere and stall?

If we're going to track that information we should really do it from
vnode_if.c, both to avoid having to modify every fs and to make sure
all fses support it correctly. (We also need to be careful about how
it's done to avoid causing massive lock contention; that's why such
logic doesn't already exist.)

If, however, fstrans isn't tracking that information, I don't see how
suspending the fs helps deal with the (b) threads, because if they're
currently running they can continue to chew on fs-specific data for
arbitrarily long before they run into anything that stalls them, and
there's no way to know when they're done or how many of them there
are.

I don't really see what the issue with ufsspec_read() is, however, so
we may be talking past each other.

-- 
David A. Holland
dholl...@netbsd.org


Re: Making forced unmounts work

2012-12-02 Thread J. Hannken-Illjes
On Dec 2, 2012, at 5:00 AM, David Holland  wrote:

> On Thu, Nov 29, 2012 at 06:19:37PM +0100, J. Hannken-Illjes wrote:
>>>> In short the attached diff:
>>>> 
>>>> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
>>>>   restart a vnode operation once it returns ERESTARTVOP.
>>>> 
>>>> - Changes fstrans_start() to take an optional `hint vnode' and return
>>>>   ERESTARTVOP if the vnode becomes dead.
>>> 
>>> Is there any major reason we can't just use ERESTART and rerun the
>>> whole syscall?
>> 
>> Not all vnode operations come from a syscall and to me it looks cleaner
>> to use one private errno for exactly this purpose.
> 
> Could be. All those places are supposed to be capable of coping with
> ERESTART though (otherwise, they break if it happens) so it shouldn't
> make much difference. And if it does make a difference somewhere, that
> should be fixed... regardless of ERESTART for signals, we want FS
> operations to be able to bail and restart for transaction abort.

I'm convinced -- having fstrans_start() return ERESTART is the way to go.



>>> Also I wonder if there's any way to accomplish this that doesn't
>>> require adding fstrans calls to every operation in every fs.
>> 
>> Not in a clean way. We would need some kind of reference counting for
>> vnode operations and that is quite impossible as vnode operations on
>> devices or fifos sometimes wait forever and are called from other fs
>> like ufsspec_read() for example.  How could we protect UFS updating
>> access times here?
> 
> I'm not entirely convinced of that. There are basically three
> problems: (a) new incoming threads, (b) threads that are already in
> the fs and running, and (c) threads that are already in the fs and
> that are stuck more or less permanently because something broke.
> 
> Admittedly I don't really understand how fstrans suspending works.
> Does it keep track of all the threads that are in the fs, so the (b)
> ones can be interrupted somehow, or so we at least can wait until all
> of them either leave the fs or enter fstrans somewhere and stall?

Fstrans is a recursive rw-lock. Vnode operations take the reader lock
on entry.  To suspend a fs we take the writer lock and therefore
it will catch both (a) and (b).  New threads will block on first entry,
threads already inside the fs will continue until they leave the fs
and block on next entry.

A suspended fs has the guarantee that no other thread will be inside
fstrans_suspend / fstrans_done of any vnode operation.

Threads stuck permanently as in (c) are impossible to catch.
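The suspend/drain behaviour just described can be modeled in miniature. This is a single-threaded toy where "blocking" is represented by a -1 return; the real fstrans sleeps instead, is per-cpu for scalability, and is recursive (a thread already holding the reader side may re-enter), none of which is modeled here, and these signatures are not the real API.

```c
/* Toy model of the fstrans gate: readers are vnode operations,
 * the writer is fs suspension.  Blocking is modeled as -1. */
struct fstrans_mount {
    int readers;     /* threads currently inside vnode operations */
    int suspended;   /* fs suspended (writer held) */
};

/* Taken on entry to every vnode operation: case (a), new incoming
 * threads, block here once the fs is suspended. */
static int fstrans_start(struct fstrans_mount *mp)
{
    if (mp->suspended)
        return -1;          /* would sleep until fs is resumed */
    mp->readers++;
    return 0;
}

static void fstrans_done(struct fstrans_mount *mp)
{
    mp->readers--;
}

/* Suspension takes the writer side: case (b), threads already inside,
 * must drain before it succeeds. */
static int fstrans_suspend(struct fstrans_mount *mp)
{
    if (mp->readers > 0)
        return -1;          /* would wait for in-flight ops to leave */
    mp->suspended = 1;
    return 0;
}
```

Case (c), a thread stuck inside forever, is exactly a reader that never calls fstrans_done(), so the suspend above never succeeds.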
 
> If we're going to track that information we should really do it from
> vnode_if.c, both to avoid having to modify every fs and to make sure
> all fses support it correctly. (We also need to be careful about how
> it's done to avoid causing massive lock contention; that's why such
> logic doesn't already exist.)

Some cases are nearly impossible to track at the vnode_if.c level:

- Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
  as they run part or all of the operation with vnode interlock held.

- Accessing devices and fifos through a file system cannot be tracked
  at the vnode_if.c level.  Take ufs_spec_read() for example:

fstrans_start(...);
VTOI(vp)->i_flag |= IN_ACCESS;
fstrans_done(...);
return VOCALL(spec_vnodeop_p, ...)

  Here the VOCALL may sleep forever if the device has no data.

> If, however, fstrans isn't tracking that information, I don't see how
> suspending the fs helps deal with the (b) threads, because if they're
> currently running they can continue to chew on fs-specific data for
> arbitrarily long before they run into anything that stalls them, and
> there's no way to know when they're done or how many of them there
> are.
> 
> I don't really see what the issue with ufsspec_read() is, however, so
> we may be talking past each other.


--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-04 Thread David Holland
On Sun, Dec 02, 2012 at 05:29:01PM +0100, J. Hannken-Illjes wrote:
 > I'm convinced -- having fstrans_start() return ERESTART is the way to go.

Ok then :-)

 > >>> Also I wonder if there's any way to accomplish this that doesn't
 > >>> require adding fstrans calls to every operation in every fs.
 > >> 
 > >> Not in a clean way. We would need some kind of reference counting for
 > >> vnode operations and that is quite impossible as vnode operations on
 > >> devices or fifos sometimes wait forever and are called from other fs
 > >> like ufsspec_read() for example.  How could we protect UFS updating
 > >> access times here?
 > > 
 > > I'm not entirely convinced of that. There are basically three
 > > problems: (a) new incoming threads, (b) threads that are already in
 > > the fs and running, and (c) threads that are already in the fs and
 > > that are stuck more or less permanently because something broke.
 > > 
 > > Admittedly I don't really understand how fstrans suspending works.
 > > Does it keep track of all the threads that are in the fs, so the (b)
 > > ones can be interrupted somehow, or so we at least can wait until all
 > > of them either leave the fs or enter fstrans somewhere and stall?
 > 
 > Fstrans is a recursive rw-lock. Vnode operations take the reader lock
 > on entry.  To suspend a fs we take the writer lock and therefore
 > it will catch both (a) and (b).  New threads will block on first entry,
 > threads already inside the fs will continue until they leave the fs
 > and block on next entry.

Ok, fair enough, but...

 > A suspended fs has the guarantee that no other thread will be inside
 > fstrans_suspend / fstrans_done of any vnode operation.
 > 
 > Threads stuck permanently as in (c) are impossible to catch.

...doesn't that mean the (c) threads are going to be holding read
locks, so the suspend will hang forever?

We have to assume there will be at least one such thread, as people
don't generally attempt umount -f unless something's wedged.

 > > If we're going to track that information we should really do it from
 > > vnode_if.c, both to avoid having to modify every fs and to make sure
 > > all fses support it correctly. (We also need to be careful about how
 > > it's done to avoid causing massive lock contention; that's why such
 > > logic doesn't already exist.)
 > 
 > Some cases are nearly impossible to track at the vnode_if.c level:
 > 
 > - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
 >   as they run part or all of the operation with vnode interlock held.

Ugh. But this doesn't, for example, preclude atomically incrementing a
per-cpu counter or setting a flag in the lwp structure.

 > - Accessing devices and fifos through a file system cannot be tracked
 >   at the vnode_if.c level.  Take ufs_spec_read() for example:
 > 
 >  fstrans_start(...);
 >  VTOI(vp)->i_flag |= IN_ACCESS;
 >  fstrans_done(...);
 >  return VOCALL(spec_vnodeop_p, ...)
 > 
 >   Here the VOCALL may sleep forever if the device has no data.

This I don't understand. To the extent it's "in" the fs while it's
doing this, vnode_if.c can know that, because it called ufs_spec_read
and ufs_spec_read hasn't returned yet. To the extent that it's in
specfs, specfs won't itself ever be umount -f'd, since it isn't
mounted, so there's no danger.

If umount -f currently blats over the data that specfs uses, then it's
currently not safe, but I don't see how it's different from any other
similar case with ufs data.

I expect I'm missing something.

-- 
David A. Holland
dholl...@netbsd.org


Re: Making forced unmounts work

2012-12-04 Thread J. Hannken-Illjes
On Dec 4, 2012, at 4:44 PM, David Holland  wrote:

> On Sun, Dec 02, 2012 at 05:29:01PM +0100, J. Hannken-Illjes wrote:
>>>>> Also I wonder if there's any way to accomplish this that doesn't
>>>>> require adding fstrans calls to every operation in every fs.
>>>> 
>>>> Not in a clean way. We would need some kind of reference counting for
>>>> vnode operations and that is quite impossible as vnode operations on
>>>> devices or fifos sometimes wait forever and are called from other fs
>>>> like ufsspec_read() for example.  How could we protect UFS updating
>>>> access times here?
>>> 
>>> I'm not entirely convinced of that. There are basically three
>>> problems: (a) new incoming threads, (b) threads that are already in
>>> the fs and running, and (c) threads that are already in the fs and
>>> that are stuck more or less permanently because something broke.
>>> 
>>> Admittedly I don't really understand how fstrans suspending works.
>>> Does it keep track of all the threads that are in the fs, so the (b)
>>> ones can be interrupted somehow, or so we at least can wait until all
>>> of them either leave the fs or enter fstrans somewhere and stall?
>> 
>> Fstrans is a recursive rw-lock. Vnode operations take the reader lock
>> on entry.  To suspend a fs we take the writer lock and therefore
>> it will catch both (a) and (b).  New threads will block on first entry,
>> threads already inside the fs will continue until they leave the fs
>> and block on next entry.
> 
> Ok, fair enough, but...
> 
>> A suspended fs has the guarantee that no other thread will be inside
>> fstrans_suspend / fstrans_done of any vnode operation.
>> 
>> Threads stuck permanently as in (c) are impossible to catch.
> 
> ...doesn't that mean the (c) threads are going to be holding read
> locks, so the suspend will hang forever?

Yes.

> We have to assume there will be at least one such thread, as people
> don't generally attempt umount -f unless something's wedged.

If forced unmounts have to work even with vnode operations that never
leave the fs, the only solution will be to disable forced unmounts and
leave reboot as the last option.  How could we safely determine whether
a vnode operation is just taking some time or is stuck forever?  What
should we do with threads that are stuck forever?

>>> If we're going to track that information we should really do it from
>>> vnode_if.c, both to avoid having to modify every fs and to make sure
>>> all fses support it correctly. (We also need to be careful about how
>>> it's done to avoid causing massive lock contention; that's why such
>>> logic doesn't already exist.)
>> 
>> Some cases are nearly impossible to track at the vnode_if.c level:
>> 
>> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
>>  as they run part or all of the operation with vnode interlock held.
> 
> Ugh. But this doesn't, for example, preclude atomically incrementing a
> per-cpu counter or setting a flag in the lwp structure.

Sure, but we need a point in time where it is safe to mark a vnode dead
and that means to reclaim its inode or other fs-private state.  This is
not just accounting -- we must be sure no thread is using fs-private data
and I don't see a way without suspending vnode operations.

>> - Accessing devices and fifos through a file system cannot be tracked
>>  at the vnode_if.c level.  Take ufs_spec_read() for example:
>> 
>>  fstrans_start(...);
>>  VTOI(vp)->i_flag |= IN_ACCESS;
>>  fstrans_done(...);
>>  return VOCALL(spec_vnodeop_p, ...)
>> 
>>  Here the VOCALL may sleep forever if the device has no data.
> 
> This I don't understand. To the extent it's "in" the fs while it's
> doing this, vnode_if.c can know that, because it called ufs_spec_read
> and ufs_spec_read hasn't returned yet. To the extent that it's in
> specfs, specfs won't itself ever be umount -f'd, since it isn't
> mounted, so there's no danger.

Ok, we had to use VOCALL rather than vnode_if.c and add the accounting
here. 

> 
> If umount -f currently blats over the data that specfs uses, then it's
> currently not safe, but I don't see how it's different from any other
> similar case with ufs data.
> 
> I expect I'm missing something.

So another approach is to serialize vclean(), at least when cleaning
active vnodes, and have VOCALL() take care of the vnode currently being
cleaned.  We would need just per-thread state then.  This would still
not work for vnode operations stuck in the fs.

Is this the direction you have in mind?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-05 Thread J. Hannken-Illjes
Looks like this thread is dead.  No one beside David Holland is
interested and David objects.  I take back my proposal.


David wants to track information about threads running a vnode
operation from vnode_if.c. I have no idea how this could be done
without knowing file system implementation.

David wants forced unmounts to work even if a thread gets stuck
permanently in a vnode operation.  I don't see a way to safely
reclaim a vnode from a file system when this vnode is in use by
another thread.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-05 Thread Michael van Elst
hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes:

>David wants forced unmounts to work even if a thread gets stuck
>permanently in a vnode operation.

How can it get stuck (short of bugs) ?

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Making forced unmounts work

2012-12-06 Thread J. Hannken-Illjes
On Dec 6, 2012, at 8:32 AM, Michael van Elst  wrote:

> hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes:
> 
>> David wants forced unmounts to work even if a thread gets stuck
>> permanently in a vnode operation.
> 
> How can it get stuck (short of bugs) ?

Here we are talking about bugs only.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-06 Thread Martin Husemann
I am interested, but I lack significant vnode clue. So, sorry if answers
are obvious - they are not to me.

About the only situation in daily life where I ever wanted to use forced
unmounts instead of rebooting a machine (or before the machine rebooted
itself as a direct consequence of the bug that caused the initial issue)
is an NFS mount combined with what are probably network driver bugs (or
maybe hardware).  It is almost reproducible at will.

I have multiple NFS servers where I have to force all clients to use TCP
or smaller write windows, otherwise the clients will lock up on bigger
writes.  AFAICT the nfs level creates the necessary packets (repeatedly,
after appropriate timeouts etc.) and passes them to the driver, but the
driver (or the hardware, and most likely the receiving side on the
server) reproducibly drops them, so no overall progress happens.

Would this situation be fixed by your patch or would the nfs layer still
hold the reader lock?

Martin


Re: Making forced unmounts work

2012-12-06 Thread J. Hannken-Illjes
On Dec 6, 2012, at 10:14 AM, Martin Husemann  wrote:

> I am interested, but I lack significant vnode clue. So, sorry if answers
> are obvious - they are not to me.
> 
> About the only situation in daily life where I ever wanted to use forced
> unmounts instead of rebooting a machine (or before the machine rebooted
> itself as a direct consequence of the bug that caused the initial issue)
> is an NFS mount combined with what are probably network driver bugs (or
> maybe hardware).  It is almost reproducible at will.
> 
> I have multiple NFS servers where I have to force all clients to use TCP
> or smaller write windows, otherwise the clients will lock up on bigger
> writes.  AFAICT the nfs level creates the necessary packets (repeatedly,
> after appropriate timeouts etc.) and passes them to the driver, but the
> driver (or the hardware, and most likely the receiving side on the
> server) reproducibly drops them, so no overall progress happens.
> 
> Would this situation be fixed by your patch or would the nfs layer still
> hold the reader lock?

For now my patch only covers ffs and msdosfs file systems.

In general it should be possible to add fstrans to the nfs file system
making it possible to suspend it.  I did not dig deeper but as the nfs
client already has timeout mechanisms it should be possible to cover
this kind of problem.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-06 Thread Julian Yon
On Thu, 6 Dec 2012 08:23:07 +0100
"J. Hannken-Illjes"  wrote:

> Looks like this thread is dead.  No one beside David Holland is
> interested and David objects.  I take back my proposal.

Have been reading the discussion. Don't assume that no contribution
means no interest!

> David wants forced unmounts to work even if a thread gets stuck
> permanently in a vnode operation.  I don't see a way to safely
> reclaim a vnode from a file system when this vnode is in use by
> another thread.

I think you could take some inspiration from Linux here: it has a very
handy umount -l which detaches the filesystem from the tree, but defers
the rest of the unmount/cleanup until the fs is no longer busy. This
can help in situations where you're not trying to physically remove
media but just need to reclaim the mount point, and therefore have no
immediate need for a umount -f. Because no process can open anything
new on the detached fs, if it eventually unwedges itself somehow it
won't get rewedged.


Julian

-- 
3072D/F3A66B3A Julian Yon (2012 General Use) 




Re: Making forced unmounts work

2012-12-06 Thread David Holland
On Thu, Dec 06, 2012 at 10:32:01AM +, Julian Yon wrote:
 > I think you could take some inspiration from Linux here: it has a very
 > handy umount -l which detaches the filesystem from the tree, but defers
 > the rest of the unmount/cleanup until the fs is no longer busy. This
 > can help in situations where you're not trying to physically remove
 > media but just need to reclaim the mount point, and therefore have no
 > immediate need for a umount -f. Because no process can open anything
 > new on the detached fs, if it eventually unwedges itself somehow it
 > won't get rewedged.

Yes, I've talked about supporting that elsewhere (maybe not on
tech-kern, or at least not recently) -- allowing it has basically no
downside, but it requires infrastructure we don't currently have.

(Most notably, if you umount -l a hanging nfs volume and then later
want to umount -f it, you need to be able to address the leftovers
somehow; currently because it isn't in the fs namespace any more and
it isn't associated with a device, there's no way to refer to it. I'm
not sure how, if at all, Linux copes with this case.)

Also it requires splitting the current struct mount into two parts,
the filesystem and the mounting of that filesystem; but that's
desirable in the long run for other reasons as well, such as
supporting plan9-type rebind mounts.

-- 
David A. Holland
dholl...@netbsd.org


Re: Making forced unmounts work

2012-12-06 Thread David Holland
On Tue, Dec 04, 2012 at 05:31:59PM +0100, J. Hannken-Illjes wrote:
 > >> A suspended fs has the guarantee that no other thread will be inside
 > >> fstrans_suspend / fstrans_done of any vnode operation.
 > >> 
 > >> Threads stuck permanently as in (c) are impossible to catch.
 > > 
 > > ...doesn't that mean the (c) threads are going to be holding read
 > > locks, so the suspend will hang forever?
 > 
 > Yes.
 > 
 > > We have to assume there will be at least one such thread, as people
 > > don't generally attempt umount -f unless something's wedged.
 > 
 > If forced unmounts have to work with vnode operations that will not
 > leave a fs the only solution will be to disable forced unmounts and
 > leave reboot as the last option.  How could we safely determine if
 > a vnode operation just takes some time or is stuck forever?  What should
 > we do with threads that are stuck forever?

You can't. That's why umount -f is currently unsafe. We can probably
make it more safe (and as just pointed out elsewhere, linux-style lazy
unmounts are safe and can often be used as a substitute) but making it
fully safe will be difficult.

The basic use case for umount -f is when you have processes hanging
uninterruptibly waiting for a hard-mounted NFS server that's gone away
and won't be coming back anytime soon.

The other common case is when you accidentally ejected a removable
device without unmounting it, and didn't notice until reinserting it
isn't an option. Usually what causes you to notice is accessing the
device and failing, and that often results in a stuck process.

If there isn't a process hanging, generally you can clear away any
processes holding the volume busy and do an ordinary umount.

The third case I guess is when vnode refcount bugs have left you with
a "busy" fs that nothing's actually using; in this case there won't be
processes hanging; but by the time you've confirmed this, there also
won't be processes using the fs, so the current scheme is more or less
safe enough...

 > Looks like this thread is dead.  No one beside David Holland is
 > interested and David objects.  I take back my proposal.

I think fstrans may not be the mechanism we want, but I don't think we
should give up.

 > >> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
 > >>  as they run part or all of the operation with vnode interlock held.
 > > 
 > > Ugh. But this doesn't, for example, preclude atomically incrementing a
 > > per-cpu counter or setting a flag in the lwp structure.
 > 
 > Sure, but we need a point in time where it is safe to mark a vnode dead
 > and that means to reclaim its inode or other fs-private state.  This is
 > not just accounting -- we must be sure no thread is using fs-private data
 > and I don't see a way without suspending vnode operations.

The way to do this is to divide the "dead" operation into two parts:
first change the ops vector (or set a dead flag) so no new threads can
enter, then when the threads already in the fs are gone or otherwise
neutralized, delete the private data.

Or we could just leave the private data. One of the advantages of
setting a dead flag (and testing it in vnode_if.c) instead of changing
the ops vector is that the vnode doesn't lose its attachment to the fs
and can be cleaned up properly later if it ever gets released.
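That two-phase split might be modeled like this (a stripped-down sketch under invented names; blocking and error returns are modeled as -1, and the dead-flag test stands in for what vnode_if.c would do on every dispatch):

```c
#include <stdlib.h>

/* Toy model of two-phase vnode teardown: phase 1 marks the vnode
 * dead so no new callers enter; phase 2 frees fs-private data only
 * after the in-flight callers have drained. */
struct vnode {
    int   v_dead;      /* phase 1: no new entries once set */
    int   v_inflight;  /* callers currently inside the fs */
    void *v_data;      /* fs-private data, destroyed in phase 2 */
};

/* Dispatcher-side check, as vnode_if.c might do it: refuse entry
 * to a dead vnode, otherwise count the caller in. */
static int vop_enter(struct vnode *vp)
{
    if (vp->v_dead)
        return -1;            /* caller gets a dead-vnode error */
    vp->v_inflight++;
    return 0;
}

static void vop_exit(struct vnode *vp)
{
    vp->v_inflight--;
}

/* Phase 1: instantaneous, safe to do anytime. */
static void vnode_kill(struct vnode *vp)
{
    vp->v_dead = 1;
}

/* Phase 2: only when the last in-flight caller has left may the
 * private data be destroyed.  A permanently stuck thread keeps
 * v_inflight high, so reaping waits forever -- which is exactly
 * the (c) problem from earlier in the thread. */
static int vnode_reap(struct vnode *vp)
{
    if (vp->v_inflight > 0)
        return -1;            /* must wait: someone is still inside */
    free(vp->v_data);
    vp->v_data = NULL;
    return 0;
}
```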

As for "neutralized", one possible scheme is to mark the vnodes dead,
wait for a few seconds, and then do something evil to the blocked
threads still using the fs; that requires knowing which threads they
are, which is the hard part. Any simple scheme will end up being a
hotspot for cache contention.

See the "kicking everybody out of the softc" thread from a couple
years ago for one possible approach to that.

 > >> - Accessing devices and fifos through a file system cannot be tracked
 > >>  at the vnode_if.c level.  Take ufs_spec_read() for example:
 > >> 
 > >>   fstrans_start(...);
 > >>   VTOI(vp)->i_flag |= IN_ACCESS;
 > >>   fstrans_done(...);
 > >>   return VOCALL(spec_vnodeop_p, ...)
 > >> 
 > >>  Here the VOCALL may sleep forever if the device has no data.
 > > 
 > > This I don't understand. To the extent it's "in" the fs while it's
 > > doing this, vnode_if.c can know that, because it called ufs_spec_read
 > > and ufs_spec_read hasn't returned yet. To the extent that it's in
 > > specfs, specfs won't itself ever be umount -f'd, since it isn't
 > > mounted, so there's no danger.
 > 
 > Ok, we had to use VOCALL rather than vnode_if.c and add the accounting
 > here.

No, I don't see why. vnode_if.c calls ufs_spec_read(); until
ufs_spec_read() returns, as far as tracking done in vnode_if.c is
concerned, the thread doing it is still in the fs.

 > > If umount -f currently blats over the data that specfs uses, then it's
 > > currently not safe, but I don't see how it's different from any other
 > > similar case with ufs data.
 > > 
 > > I expect I'm missing something.
 > 
 > So another approach is to serialize vclean(), at least when cl

re: Making forced unmounts work

2012-12-06 Thread matthew green

> Looks like this thread is dead.  No one beside David Holland is
> interested and David objects.  I take back my proposal.

i'm very interested in this idea.

> David wants to track information about threads running a vnode
> operation from vnode_if.c. I have no idea how this could be done
> without knowing file system implementation.
> 
> David wants forced unmounts to work even if a thread gets stuck
> permanently in a vnode operation.  I don't see a way to safely
> reclaim a vnode from a file system when this vnode is in use by
> another thread.

i think we should strive for this goal, even if we're not sure how
to get there yet.

but more importantly, your change seems to fix at least a large
part of the problem and i think we should continue to consider it.


.mrg.


Re: Making forced unmounts work

2012-12-06 Thread Taylor R Campbell
   Date: Thu, 6 Dec 2012 08:23:07 +0100
   From: "J. Hannken-Illjes" 

   Looks like this thread is dead.  No one beside David Holland is
   interested and David objects.  I take back my proposal.

I'm interested, but I haven't fit enough of the vnode life cycle or
the fstrans mechanism into my head to assess how well it is likely to
work, and I don't think it's a good strategy in general to put more
boilerplate into every vop implementation -- there is far too much
boilerplate already that lots of vop implementations get wrong.

As a step toward fixing the first problem, could you perhaps write
down the rules about how fstrans currently composes with vnode locks
and operations and life cycle?  I seem to recall there was some issue
a year or so ago about the ordering of vput and fstrans_done in
ufs_rmdir, and the nature of the issue wasn't clear to me from the man
pages or a cursory examination of the source, which suggested to me
that I don't understand fstrans well enough to reason about patches
like this.


Re: Making forced unmounts work

2012-12-06 Thread YAMAMOTO Takashi
hi,

> Forced unmounts will most likely panic the kernel.  The main problem is
> other threads running a vnode operation when we come to clean and
> reclaim an active vnode and therefore change its operation vector and
> destroy the file system private data without locking or synchronisation.
> 
> One solution is to suspend the file system while it gets unmounted.
> This way all other threads running a vnode operation will wait for
> fstrans_start() to succeed.  By changing fstrans_start() to detect a
> now-dead vnode it becomes possible to restart the vnode operation
> with the new operations vector.
> 
> In short the attached diff:
> 
> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
>   restart a vnode operation once it returns ERESTARTVOP.
> 
> - Changes fstrans_start() to take an optional `hint vnode' and return
>   ERESTARTVOP if the vnode becomes dead.
> 
> - Rearranges all file system operations to call fstrans_start() before
>   accessing file system private data and check (and return) an error.
> 
> - Changes vfs_suspend() to optionally suspend a file system without
>   syncing it to disk.  This feature is used when revoking a vnode.
> 
> - Changes dounmount() to suspend a file system during the unmount.
> 
> - Keeps the `struct mp' until all (dead) vnodes disappear.
> 
> - Adds some missing operations to deadfs.
> 
> Comments or objections?

thanks for working on this.

if possible, it's better to avoid taking an rwlock for every VOP.
IMO umount and revoke, instead of every VOP, ought to do the hard work.
have you considered doing it at the upper levels, like files,
descriptors, mmaps, etc?

eg.
1. iterate all processes to mark their files revoked
2. make long-sleepers (fifo, nfs rpcs, etc) abort
3. wait for reference counts to settle

you may need to send IPIs as some operations are barrier-less.
(fd_getfile/putfile)

iirc dragonflybsd implements revoke at the file or descriptor level.

YAMAMOTO Takashi

> 
> --
> J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)


Re: Making forced unmounts work

2012-12-07 Thread Michael van Elst
hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes:

>On Dec 6, 2012, at 8:32 AM, Michael van Elst  wrote:

>> hann...@eis.cs.tu-bs.de ("J. Hannken-Illjes") writes:
>> 
>>> David wants forced unmounts to work even if a thread gets stuck
>>> permanently in a vnode operation.
>> 
>> How can it get stuck (short of bugs) ?

>Here we are talking about bugs only.


I am interested in the case where a mounted device is detached and the
umount blocks forever. Whether that is considered a bug or a design
decision, I don't care.


-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Making forced unmounts work

2012-12-08 Thread J. Hannken-Illjes
The more I think about it, the more I just want to remove forced unmounts.

To take some examples:

- A hard,nointr NFS mount hanging because the server stops responding.

Even if it were possible to use the fstrans_*() functions here (and it
would become ugly) it would not help.  The root node of the mount will
likely be locked by the first thread trying a lookup, so unmount won't
even be able to look up the mount point.  If it were possible to run
`mount -u' or `umount' at all, it should be possible to update the
mount to `soft,intr', proceed as usual, kill the threads blocking the
unmount, and unmount.

- Ejecting a removable device currently mounted.

In this case the disk driver should error out and all threads should
come back with an I/O error.  If this is not the case, the driver is buggy.

- Refcount and other bugs

... should be fixed ...

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: Making forced unmounts work

2012-12-08 Thread Christos Zoulas
In article <31490263-5a8e-411a-bb57-f7fc5cffc...@eis.cs.tu-bs.de>,
J. Hannken-Illjes  wrote:
>The more I think the more I just want to remove forced unmounts.

I think that any operation whose effects cannot be undone (short of a
reboot) makes the OS less resilient to failure.

>To take some examples:
>
>- A hard,nointr NFS mount hanging because the server stops responding.
>
>Even if it were possible to use fstrans_ here (and it would become ugly)
>it would not help.  The root node of the mount will likely be locked by the
>first thread trying to lookup so unmount won't be able to even lookup
>the mount point.  If it were possible to run `mount -u' or `unmount' it
>should be possible to update the mount as `soft,intr' and proceed as usual,
>kill threads blocking an unmount and unmount.

Store the normalized mount path with the mountpoint, look it up in the
mount list, make all blocked threads return an I/O error from the
current operation, etc.

christos