Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
On 03/21/2017 04:33 PM, James Bottomley wrote:
> On Tue, 2017-03-21 at 16:25 +0100, Hannes Reinecke wrote:
>> On 03/21/2017 02:05 PM, James Bottomley wrote:
>>> On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
>>>> With the current design we're waiting for all async probes to
>>>> finish when removing any sd device.
>>>> This might lead to a livelock where the 'remove' call is blocking
>>>> for any probe calls to finish, and the probe calls are waiting for
>>>> a response, which will never be processed as the thread handling
>>>> the responses is waiting for the remove call to finish.
>>>> Which is completely pointless as we only _really_ care for the
>>>> probe on _this_ device to be completed; any other probing can
>>>> happily continue for all we care.
>>>> So save the async probing cookie in the structure and only wait
>>>> if this specific probe is still active.
>>>
>>> How does this preserve ordering?  It looks like you have one cookie 
>>> per sdkp ... is there some sort of ordering guarantee I'm not
>>> seeing?
>>>
>> Do we need one?
>> The only thing we care here is that probing for _this_ device has 
>> finished.
> 
> OK, so currently we guarantee the linear ordering of luns for individual
> hbas.  We also guarantee no interleaving of sdX letters for individual
> hbas.  We don't guarantee the scan order of the hbas themselves.
> Preserve those guarantees and I'm happy with the patch.  If you can't
> preserve them I think we need further discussion.
> 
Which is actually not true.
If just some devices are removed from the hba (eg if they belong to the
same remote port) and we're rescanning devices once the port comes back
there is no guarantee that the devices will be getting the same device
letters. Nor that the device letters will be consecutive; just starting
'scsi_debug' with just one device before rescanning will mess up the
ordering. Even now.
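
To make the rescan case concrete, here is a minimal userspace sketch in C. It assumes (my simplification, not something taken from sd.c) that disk indices come from an allocator that always hands out the lowest free index. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_DISKS 26

/* One slot per possible disk index; true means the index is in use. */
static bool toy_index_used[TOY_MAX_DISKS];

/* Hand out the lowest free index, or -1 if none is left (assumed policy). */
static int toy_alloc_index(void)
{
	for (int i = 0; i < TOY_MAX_DISKS; i++) {
		if (!toy_index_used[i]) {
			toy_index_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Give an index back, as happens when a disk is removed. */
static void toy_free_index(int index)
{
	assert(index >= 0 && index < TOY_MAX_DISKS);
	toy_index_used[index] = false;
}

/* Format index 0..25 as "sda".."sdz". */
static void toy_disk_name(int index, char *buf, size_t len)
{
	assert(index >= 0 && index < TOY_MAX_DISKS);
	snprintf(buf, len, "sd%c", 'a' + index);
}
```

In this model, with three disks allocated, freeing the second and third, letting one unrelated device probe, and then rescanning, the two returning disks come back one letter later than before and the unrelated device sits in between.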

So I don't see how we can be worse off than we are today.

Plus we (what with me now speaking for SUSE) never promised our
customers anything regarding sdX stability :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
On 03/21/2017 02:05 PM, James Bottomley wrote:
> On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
>> With the current design we're waiting for all async probes to
>> finish when removing any sd device.
>> This might lead to a livelock where the 'remove' call is blocking
>> for any probe calls to finish, and the probe calls are waiting for
>> a response, which will never be processed as the thread handling
>> the responses is waiting for the remove call to finish.
>> Which is completely pointless as we only _really_ care for the
>> probe on _this_ device to be completed; any other probing can
>> happily continue for all we care.
>> So save the async probing cookie in the structure and only wait
>> if this specific probe is still active.
> 
> How does this preserve ordering?  It looks like you have one cookie per
> sdkp ... is there some sort of ordering guarantee I'm not seeing?
> 
Do we need one?
The only thing we care here is that probing for _this_ device has finished.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread James Bottomley
On Tue, 2017-03-21 at 16:25 +0100, Hannes Reinecke wrote:
> On 03/21/2017 02:05 PM, James Bottomley wrote:
> > On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
> > > With the current design we're waiting for all async probes to
> > > finish when removing any sd device.
> > > This might lead to a livelock where the 'remove' call is blocking
> > > > for any probe calls to finish, and the probe calls are waiting for
> > > > a response, which will never be processed as the thread handling
> > > the responses is waiting for the remove call to finish.
> > > Which is completely pointless as we only _really_ care for the
> > > probe on _this_ device to be completed; any other probing can
> > > happily continue for all we care.
> > > So save the async probing cookie in the structure and only wait
> > > if this specific probe is still active.
> > 
> > How does this preserve ordering?  It looks like you have one cookie 
> > per sdkp ... is there some sort of ordering guarantee I'm not
> > seeing?
> > 
> Do we need one?
> The only thing we care here is that probing for _this_ device has 
> finished.

OK, so currently we guarantee the linear ordering of luns for individual
hbas.  We also guarantee no interleaving of sdX letters for individual
hbas.  We don't guarantee the scan order of the hbas themselves.
Preserve those guarantees and I'm happy with the patch.  If you can't
preserve them I think we need further discussion.

James




Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
On 03/21/2017 02:33 PM, James Bottomley wrote:
> On Tue, 2017-03-21 at 13:30 +0000, Bart Van Assche wrote:
>> On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
>>> How does this preserve ordering?  It looks like you have one cookie 
>>> per sdkp ... is there some sort of ordering guarantee I'm not
>>> seeing?
>>
>> Hello James,
>>
>> Since the probe order depends on the order in which __async_probe() 
>> adds entries to the "pending" list, and since the order of the
>> __async_probe() calls is not changed by this patch, shouldn't the 
>> probe order be preserved by this patch?
> 
> I don't know: that's what I'm asking.  I believe they complete in order
> for a single domain.  I thought ordering isn't preserved between
> domains?  So moving to multiple domains loses us ordering of disk
> appearance.
> 
Ah.
But we don't move to multiple domains, now do we?
We're just terminating the wait until _our_ probe is completed.
It's not that we're having an individual probe domain per device...

Unless I'm misunderstanding something...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Bart Van Assche
On Tue, 2017-03-21 at 09:33 -0400, James Bottomley wrote:
> On Tue, 2017-03-21 at 13:30 +0000, Bart Van Assche wrote:
> > On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
> > > How does this preserve ordering?  It looks like you have one cookie 
> > > per sdkp ... is there some sort of ordering guarantee I'm not
> > > seeing?
> > 
> > Hello James,
> > 
> > Since the probe order depends on the order in which __async_probe() 
> > adds entries to the "pending" list, and since the order of the
> > __async_probe() calls is not changed by this patch, shouldn't the 
> > probe order be preserved by this patch?
> 
> I don't know: that's what I'm asking.  I believe they complete in order
> for a single domain.  I thought ordering isn't preserved between
> domains?  So moving to multiple domains loses us ordering of disk
> appearance.

Right, since sd_remove() doesn't wait any longer for completion of probes
from other domains the multi-domain probing behavior may change due to this
patch. However, the multi-domain probing order was already dependent on the
duration of individual probes so I don't think that it is guaranteed today
that multi-domain probing happens in the same order during every boot. I
hope that the change introduced by this patch will be considered acceptable.

Bart.

Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread James Bottomley
On Tue, 2017-03-21 at 13:30 +0000, Bart Van Assche wrote:
> On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
> > How does this preserve ordering?  It looks like you have one cookie 
> > per sdkp ... is there some sort of ordering guarantee I'm not
> > seeing?
> 
> Hello James,
> 
> Since the probe order depends on the order in which __async_probe() 
> adds entries to the "pending" list, and since the order of the
> __async_probe() calls is not changed by this patch, shouldn't the 
> probe order be preserved by this patch?

I don't know: that's what I'm asking.  I believe they complete in order
for a single domain.  I thought ordering isn't preserved between
domains?  So moving to multiple domains loses us ordering of disk
appearance.

James




Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Bart Van Assche
On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
> How does this preserve ordering?  It looks like you have one cookie per
> sdkp ... is there some sort of ordering guarantee I'm not seeing?

Hello James,

Since the probe order depends on the order in which __async_probe() adds
entries to the "pending" list, and since the order of the __async_probe()
calls is not changed by this patch, shouldn't the probe order be preserved
by this patch?

Thanks,

Bart.
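
The argument above can be sketched in userspace C. This is only a toy model under stated assumptions: cookies are handed out in increasing order as entries are queued, and the only difference between the old and new probe paths is whether the returned cookie is stored. The toy_* names and the struct layout are hypothetical and not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

typedef unsigned long long toy_cookie_t;

/* Toy "pending" list: cookies in the order they were queued. */
struct toy_pending {
	toy_cookie_t cookies[TOY_MAX_ENTRIES];
	size_t count;
	toy_cookie_t next;
};

/* Toy per-disk state; async_probe is only written by the new variant. */
struct toy_disk {
	toy_cookie_t async_probe;
};

/* Append one entry and return its cookie. */
static toy_cookie_t toy_queue(struct toy_pending *p)
{
	assert(p->count < TOY_MAX_ENTRIES);
	p->cookies[p->count++] = p->next;
	return p->next++;
}

/* Old behaviour: queue the probe and drop the cookie. */
static void toy_probe_old(struct toy_pending *p, struct toy_disk *d)
{
	(void)d;
	(void)toy_queue(p);
}

/* New behaviour: queue the probe and remember the cookie. */
static void toy_probe_new(struct toy_pending *p, struct toy_disk *d)
{
	d->async_probe = toy_queue(p);
}

/* True if both lists hold the same cookies in the same order. */
static bool toy_same_order(const struct toy_pending *a,
			   const struct toy_pending *b)
{
	if (a->count != b->count)
		return false;
	for (size_t i = 0; i < a->count; i++)
		if (a->cookies[i] != b->cookies[i])
			return false;
	return true;
}
```

Queuing the same disks through either variant leaves identical pending lists in this model; storing the cookie does not touch the list itself.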

Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread James Bottomley
On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
> With the current design we're waiting for all async probes to
> finish when removing any sd device.
> This might lead to a livelock where the 'remove' call is blocking
> for any probe calls to finish, and the probe calls are waiting for
> a response, which will never be processed as the thread handling
> the responses is waiting for the remove call to finish.
> Which is completely pointless as we only _really_ care for the
> probe on _this_ device to be completed; any other probing can
> happily continue for all we care.
> So save the async probing cookie in the structure and only wait
> if this specific probe is still active.

How does this preserve ordering?  It looks like you have one cookie per
sdkp ... is there some sort of ordering guarantee I'm not seeing?

James




Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Bart Van Assche
On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
> With the current design we're waiting for all async probes to
> finish when removing any sd device.
> This might lead to a livelock where the 'remove' call is blocking
> for any probe calls to finish, and the probe calls are waiting for
> a response, which will never be processed as the thread handling
> the responses is waiting for the remove call to finish.
> Which is completely pointless as we only _really_ care for the
> probe on _this_ device to be completed; any other probing can
> happily continue for all we care.
> So save the async probing cookie in the structure and only wait
> if this specific probe is still active.

Nice work! This may even help to reduce system boot time.

Reviewed-by: Bart Van Assche 

[PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
With the current design we're waiting for all async probes to
finish when removing any sd device.
This might lead to a livelock where the 'remove' call is blocking
for any probe calls to finish, and the probe calls are waiting for
a response, which will never be processed as the thread handling
the responses is waiting for the remove call to finish.
Which is completely pointless as we only _really_ care for the
probe on _this_ device to be completed; any other probing can
happily continue for all we care.
So save the async probing cookie in the structure and only wait
if this specific probe is still active.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/sd.c | 7 ++++---
 drivers/scsi/sd.h | 3 +++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fb9b4d2..9f932e4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -48,7 +48,6 @@
 #include 
 #include 
 #include 
-#include <linux/async.h>
 #include 
 #include 
 #include 
@@ -3217,7 +3216,8 @@ static int sd_probe(struct device *dev)
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+	sdkp->async_probe = async_schedule_domain(sd_probe_async, sdkp,
+						  &scsi_sd_probe_domain);
 
 	return 0;
 
@@ -3256,7 +3256,8 @@ static int sd_remove(struct device *dev)
 	scsi_autopm_get_device(sdkp->device);
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
-	async_synchronize_full_domain(&scsi_sd_probe_domain);
+	async_synchronize_cookie_domain(sdkp->async_probe,
+					&scsi_sd_probe_domain);
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..d4b5826 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -1,6 +1,8 @@
 #ifndef _SCSI_DISK_H
 #define _SCSI_DISK_H
 
+#include <linux/async.h>
+
 /*
  * More than enough for everybody ;)  The huge number of majors
  * is a leftover from 16bit dev_t days, we don't really need that
@@ -73,6 +75,7 @@ struct scsi_disk {
 	unsigned int	zones_optimal_nonseq;
 	unsigned int	zones_max_open;
 #endif
+	async_cookie_t	async_probe;
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
-- 
1.8.5.6
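
The difference between the two waits in the sd_remove() hunk can be sketched in userspace C. This is a toy model, not the kernel implementation: nothing sleeps, the functions only report whether a wait would block. It assumes that a cookie wait covers pending entries with a cookie lower than the one passed in, which is my reading of the "submitted prior to the cookie" semantics and should be treated as an assumption. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 16

typedef unsigned long long toy_cookie_t;

/* Toy stand-in for an async domain: the cookies of unfinished entries. */
struct toy_domain {
	toy_cookie_t pending[TOY_MAX_PENDING];
	size_t count;
	toy_cookie_t next_cookie;
};

/* Queue one entry; cookies grow in submission order. */
static toy_cookie_t toy_schedule(struct toy_domain *d)
{
	assert(d->count < TOY_MAX_PENDING);
	d->pending[d->count++] = d->next_cookie;
	return d->next_cookie++;
}

/* Mark one entry as finished by dropping it from the pending list. */
static void toy_complete(struct toy_domain *d, toy_cookie_t cookie)
{
	for (size_t i = 0; i < d->count; i++) {
		if (d->pending[i] == cookie) {
			/* Close the gap, keeping the remaining entries in order. */
			for (size_t j = i + 1; j < d->count; j++)
				d->pending[j - 1] = d->pending[j];
			d->count--;
			return;
		}
	}
}

/* A full-domain wait blocks while anything at all is pending. */
static bool toy_full_wait_blocks(const struct toy_domain *d)
{
	return d->count > 0;
}

/* A cookie wait blocks only on entries queued before the cookie (assumed). */
static bool toy_cookie_wait_blocks(const struct toy_domain *d,
				   toy_cookie_t cookie)
{
	for (size_t i = 0; i < d->count; i++)
		if (d->pending[i] < cookie)
			return true;
	return false;
}
```

In this model, if disk A's probe has finished and a later probe for disk B is stuck, the full-domain wait on removal of A blocks while the cookie wait does not. An entry queued before the cookie still holds up the cookie wait.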