Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Bart Van Assche
On Mon, 2018-03-19 at 10:02 -0700, t...@kernel.org wrote:
> On Mon, Mar 19, 2018 at 04:57:45PM +, Bart Van Assche wrote:
> > For synchronization primitives that wait having a stronger synchronization
> > primitive nested inside a more relaxed one can lead to a deadlock. But since
> > the rcu read lock primitives do not wait it could be safe to use that kind
> > of nesting with RCU. Do you perhaps know whether any documentation is
> > available about that kind of nesting or whether it is already used elsewhere
> > in the kernel?
> 
> Oh, we nest them all the time.  They're like (and sometimes literally
> are) preempt_disable() and don't care about nest ordering.

Hello Martin,

This was probably already clear to you, but anyway: please drop the patch at the
start of this thread.

Thanks,

Bart.




Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread t...@kernel.org
Hey,

On Mon, Mar 19, 2018 at 04:57:45PM +, Bart Van Assche wrote:
> For synchronization primitives that wait having a stronger synchronization
> primitive nested inside a more relaxed one can lead to a deadlock. But since
> the rcu read lock primitives do not wait it could be safe to use that kind
> of nesting with RCU. Do you perhaps know whether any documentation is
> available about that kind of nesting or whether it is already used elsewhere
> in the kernel?

Oh, we nest them all the time.  They're like (and sometimes literally
are) preempt_disable() and don't care about nest ordering.

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Bart Van Assche
On Mon, 2018-03-19 at 09:29 -0700, t...@kernel.org wrote:
> This could be me being slow but can you explain how what
> percpu_ref_tryget_live() uses internally affects whether we can use
> regular RCU around it?

Hello Tejun,

For synchronization primitives that wait having a stronger synchronization
primitive nested inside a more relaxed one can lead to a deadlock. But since
the rcu read lock primitives do not wait it could be safe to use that kind
of nesting with RCU. Do you perhaps know whether any documentation is
available about that kind of nesting or whether it is already used elsewhere
in the kernel?

Thanks,

Bart.



Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread t...@kernel.org
Hello,

On Mon, Mar 19, 2018 at 04:18:54PM +, Bart Van Assche wrote:
> The algorithm explained above does not depend on sched-rcu. But because
> percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock
> around that call we are forced to use sched-rcu. I hope this makes it clear.

This could be me being slow but can you explain how what
percpu_ref_tryget_live() uses internally affects whether we can use
regular RCU around it?

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Bart Van Assche
On Mon, 2018-03-19 at 08:21 -0700, t...@kernel.org wrote:
> On Mon, Mar 19, 2018 at 03:16:07PM +, Bart Van Assche wrote:
> > As explained in the comment in scsi_device_quiesce(), the effect of
> > blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that
> > occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the
> > RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls
> > surround a function call that uses rcu_read_lock_sched(), namely
> > percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()
> > and scsi_device_quiesce().
> 
> Let's please not depend on rcu_read_lock_sched() in
> percpu_ref_tryget_live().  That's an implementation detail.  If the
> code needs RCU based synchronization, it should perform them
> explicitly.

Hello Tejun,

The algorithm explained above does not depend on sched-rcu. But because
percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock
around that call we are forced to use sched-rcu. I hope this makes it clear.

Bart.




Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread t...@kernel.org
Hello,

On Mon, Mar 19, 2018 at 03:16:07PM +, Bart Van Assche wrote:
> As explained in the comment in scsi_device_quiesce(), the effect of
> blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that
> occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the
> RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls
> surround a function call that uses rcu_read_lock_sched(), namely
> percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()
> and scsi_device_quiesce().

Let's please not depend on rcu_read_lock_sched() in
percpu_ref_tryget_live().  That's an implementation detail.  If the
code needs RCU based synchronization, it should perform them
explicitly.

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Bart Van Assche
On Mon, 2018-03-19 at 07:31 -0700, Tejun Heo wrote:
> On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:
> > Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
> > must use synchronize_sched().
> 
> Is there a reason to use sched-RCU here instead of the regular one?
> If not, it'd be better to switch to regular RCU than the other way
> around.

Hello Tejun,

As explained in the comment in scsi_device_quiesce(), the effect of
blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that
occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the
RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls
surround a function call that uses rcu_read_lock_sched(), namely
percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter()
and scsi_device_quiesce().

Bart.





Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Tejun Heo
On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:
> Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
> must use synchronize_sched().

Is there a reason to use sched-RCU here instead of the regular one?
If not, it'd be better to switch to regular RCU than the other way
around.

Thanks.

-- 
tejun


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-19 Thread Martin Steigerwald
Hi Bart.

Bart Van Assche - 16.03.18, 22:51:
> On 03/16/18 14:42, Martin Steigerwald wrote:
> > What is this one about?
> > 
> > Fix for the regression I (and others?) reported?¹
> > 
> > [1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data
> > during runtime and boot failures with blk_mq_terminate_expired in
> > backtrace
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=199077
[…]
> This patch is a fix for the fix for the bug that you and others had
> reported. See also "I/O hangs after resuming from suspend-to-ram"
> (https://marc.info/?l=linux-block=150340235201348).
> 
> This patch fixes an API violation. For those users for which suspend and
> resume works fine with commit 3a0a529971ec applied it should still work
> fine with this patch applied since this patch may cause
> scsi_device_quiesce() to wait longer.

Okay, so if I understand you correctly, this is not related to the regression 
I mentioned above.

Testing anyway.

Thanks,
-- 
Martin


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-16 Thread Bart Van Assche

On 03/16/18 14:42, Martin Steigerwald wrote:

What is this one about?

Fix for the regression I (and others?) reported?¹

[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data
during runtime and boot failures with blk_mq_terminate_expired in backtrace

https://bugzilla.kernel.org/show_bug.cgi?id=199077


Hello Martin,

This patch is a fix for the fix for the bug that you and others had 
reported. See also "I/O hangs after resuming from suspend-to-ram" 
(https://marc.info/?l=linux-block=150340235201348).


This patch fixes an API violation. For those users for which suspend and 
resume works fine with commit 3a0a529971ec applied it should still work 
fine with this patch applied since this patch may cause 
scsi_device_quiesce() to wait longer.


Bart.


Re: [PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-16 Thread Martin Steigerwald
Hello Bart.

What is this one about?

Fix for the regression I (and others?) reported?¹

[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data 
during runtime and boot failures with blk_mq_terminate_expired in backtrace

https://bugzilla.kernel.org/show_bug.cgi?id=199077

Thanks,
Martin

Bart Van Assche - 16.03.18, 18:35:
> Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
> must use synchronize_sched().
> 
> Reported-by: Tejun Heo 
> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work
> reliably") Signed-off-by: Bart Van Assche 
> Cc: Hannes Reinecke 
> Cc: Ming Lei 
> Cc: Christoph Hellwig 
> Cc: Johannes Thumshirn 
> Cc: Tejun Heo 
> Cc: Oleksandr Natalenko 
> Cc: Martin Steigerwald 
> Cc: sta...@vger.kernel.org # v4.15
> ---
>  drivers/scsi/scsi_lib.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 1d83f29aee74..0b99ee2fbbb5 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3014,7 +3014,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
>* unfreeze even if the queue was already frozen before this function
>* was called. See also https://lwn.net/Articles/573497/.
>*/
> - synchronize_rcu();
> + synchronize_sched();
>   blk_mq_unfreeze_queue(q);
> 
>   mutex_lock(>state_mutex);


-- 
Martin


[PATCH] Change synchronize_rcu() in scsi_device_quiesce() into synchronize_sched()

2018-03-16 Thread Bart Van Assche
Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce()
must use synchronize_sched().

Reported-by: Tejun Heo 
Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably")
Signed-off-by: Bart Van Assche 
Cc: Hannes Reinecke 
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Johannes Thumshirn 
Cc: Tejun Heo 
Cc: Oleksandr Natalenko 
Cc: Martin Steigerwald 
Cc: sta...@vger.kernel.org # v4.15
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1d83f29aee74..0b99ee2fbbb5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3014,7 +3014,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
 * unfreeze even if the queue was already frozen before this function
 * was called. See also https://lwn.net/Articles/573497/.
 */
-   synchronize_rcu();
+   synchronize_sched();
blk_mq_unfreeze_queue(q);
 
mutex_lock(>state_mutex);
-- 
2.16.2