On Sun, 2016-10-30 at 19:22 +0000, Bart Van Assche wrote:
> On 10/28/16 19:08, James Bottomley wrote:
> > This is a deadlock caused by an inversion issue in kernfs (suicide vs
> > non-suicide removes); so fixing it in SCSI alone really isn't
> > appropriate.  I count at least five other subsystems all using this
> > mechanism, so they'll all be similarly affected.  It looks to be fairly
> > simply fixable inside kernfs, so please fix it that way.
> 
> Hello James,
> 
> Can you clarify this further? To me this looks like the result of how
> the SCSI core works rather than an issue in the kernfs layer.

I'm at a bit of a loss: the problem looks clear from the original
trace, so I'm not really sure which part is unclear to you.

The inversion is between the scan mutex and s_active, which is the
rather fanciful name Tejun gave to the hand-rolled mutex in
kernfs_node.

The reason for the inversion is that s_active is taken when you open a
sysfs file, including the delete one.  There's a special suicide path
to allow that file to be deleted while something else holds the lock.
However, if the delete path also takes any lock, and there's a way to
get into delete other than by writing to sysfs (which is pretty much
universally true), then you get an inversion, because the kernfs_node
mutex is also taken when the file is removed.  That is why it's not
specific to SCSI.
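
To make the ordering concrete, here is a toy sketch in Python, not
kernel code.  It only records the order in which each path takes its
locks and reports any pair taken in both orders, loosely in the spirit
of what lockdep does.  The path descriptions and the helper names
(order_edges, find_inversions) are made up for illustration, and
s_active is modelled as a plain lock, which is a simplification.

```python
# Toy lock-order checker; every name here is illustrative, not a kernel API.

def order_edges(locks):
    """Return (held, wanted) pairs for a path that nests locks in this order."""
    edges = set()
    for i, held in enumerate(locks):
        for wanted in locks[i + 1:]:
            edges.add((held, wanted))
    return edges


def find_inversions(paths):
    """Return lock pairs that some path takes as A->B and another as B->A."""
    edges = set()
    for locks in paths.values():
        edges |= order_edges(locks)
    # Report each conflicting pair once, in sorted order.
    return sorted((a, b) for (a, b) in edges if (b, a) in edges and a < b)


# Assumed model of the two ways into the delete path described above.
paths = {
    # Writing to the sysfs delete file: s_active is held, then the scan mutex.
    "delete via sysfs write": ["s_active", "scan_mutex"],
    # Removal from elsewhere: the scan mutex is held, then the file is removed.
    "delete via another path": ["scan_mutex", "s_active"],
}

inversions = find_inversions(paths)
print(inversions)
```

With the two paths modelled this way the checker reports the
s_active/scan_mutex pair, and any other subsystem whose delete path
takes its own lock would show up the same way.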

Since you press the issue, I've got to say I'm not a huge fan of trying
to escape from a lock inversion by making some path asynchronous,
because it usually leads to even more problems down the road.  If
there's some problem with the generic fix, there is a way of fixing
this in SCSI without introducing asynchronicity.

James
