Re: [GIT PULL] SCSI fixes for 4.13-rc6

2017-08-23 Thread Damien Le Moal

On 8/24/17 00:27, Martin K. Petersen wrote:
> 
> Hi Bart,
> 
>> Had you noticed that Damien had asked not to send the "sd_zbc: Write
>> unlock zone from sd_uninit_cmnd()" patch to Linus without my "scsi-mq:
>> Always unprepare before requeuing a request" patch?
> 
> He did change his mind later in that thread, though.
> 
> However, what's more important is that we still need a good version of
> your patch for 4.13. I took Brian's workaround for ipr but I still think
> Christoph's concerns need to be addressed for me to put your change back
> in.

Yes, I did ask for the patch to be kept since it does not make things
worse for the SMR/mq case (still have a potential dispatch deadlock) but
does cleanup the code in the legacy/sq case.

Since it looks like the jiffies_at_alloc and retry counter
initialization fixes are progressing, we hopefully should be able to get
Bart's pacth back in soon (this week ?) and remove the mq dispatch deadlock.

Best regards.

-- 
Damien Le Moal,
Western Digital Research


Re: [GIT PULL] SCSI fixes for 4.13-rc6

2017-08-23 Thread Brian King
On 08/23/2017 10:44 AM, Bart Van Assche wrote:
> On Wed, 2017-08-23 at 11:27 -0400, Martin K. Petersen wrote:
>> However, what's more important is that we still need a good version of
>> your patch for 4.13. I took Brian's workaround for ipr but I still think
>> Christoph's concerns need to be addressed for me to put your change back
>> in.
> 
> Hello Martin,
> 
> I am not aware of any requests to modify the patch "scsi-mq: Always unprepare
> before requeuing a request". See also
> https://www.spinics.net/lists/linux-scsi/msg111541.html. Are you perhaps
> referring to another patch?
> 
> What I remember is that my patch uncovered a bug in the ipr driver. As you
> mentioned, a workaround for that bug has already been queued for kernel v4.14
> (https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.13/scsi-fixes=723cd772fde2344a9810eeaf5106787d535ec4a4).

I don't completely agree with that statement. The patch "scsi-mq: Always 
unprepare
before requeueing a request" introduces a regression in the scsi stack.
It alters the behavior of retries such that the retry counter no longer works
and the jiffies_at_alloc use to ensure we don't spend a tremendous amount of
time retrying ops gets broken as well.

As far as ipr is concerned, we do have the workaround in place now and I'll also
queue up a further improvement to ipr to return a better failure response. 
However,
until we fix the retry counter in scsi, any driver that returns an error 
response that
scsi wants to retry could get us stuck in an eternal retry loop, like we were
seeing with ipr.

> Further improvements for the SCSI core that are the result of the analysis of
> the behavior of the SCSI subsystem on PowerPC systems are under discussion. 
> See
> also "[PATCHv2 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to
> allocation time" (https://marc.info/?l=linux-next=150335524812989) and
> "[PATCH 2/2] scsi: Preserve retry counter through scsi_prep_fn"
> (https://marc.info/?l=linux-scsi=150335371112485).

While my patches highlight the problem, I don't think they are the right fix
and need to be reworked. It looks like we go through scsi_init_rq at
hctx setup time rather than for each new i/o submission. Adding a simple
kprobe to scsi_init_rq never triggers when issuing i/o to already configured
devices. Therefore, we cannot simply move the initialization of 
jiffies_at_alloc to
scsi_init_rq. I've been working on moving the retry counter and jiffies_at_alloc
into struct scsi_request, as that doesn't get reinitialized.

Thanks,

Brian 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [GIT PULL] SCSI fixes for 4.13-rc6

2017-08-23 Thread Bart Van Assche
On Wed, 2017-08-23 at 11:27 -0400, Martin K. Petersen wrote:
> However, what's more important is that we still need a good version of
> your patch for 4.13. I took Brian's workaround for ipr but I still think
> Christoph's concerns need to be addressed for me to put your change back
> in.

Hello Martin,

I am not aware of any requests to modify the patch "scsi-mq: Always unprepare
before requeuing a request". See also
https://www.spinics.net/lists/linux-scsi/msg111541.html. Are you perhaps
referring to another patch?

What I remember is that my patch uncovered a bug in the ipr driver. As you
mentioned, a workaround for that bug has already been queued for kernel v4.14
(https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.13/scsi-fixes=723cd772fde2344a9810eeaf5106787d535ec4a4).

Further improvements for the SCSI core that are the result of the analysis of
the behavior of the SCSI subsystem on PowerPC systems are under discussion. See
also "[PATCHv2 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to
allocation time" (https://marc.info/?l=linux-next=150335524812989) and
"[PATCH 2/2] scsi: Preserve retry counter through scsi_prep_fn"
(https://marc.info/?l=linux-scsi=150335371112485).

Please correct me if I got anything wrong.

Bart.

Re: [GIT PULL] SCSI fixes for 4.13-rc6

2017-08-23 Thread Martin K. Petersen

Hi Bart,

> Had you noticed that Damien had asked not to send the "sd_zbc: Write
> unlock zone from sd_uninit_cmnd()" patch to Linus without my "scsi-mq:
> Always unprepare before requeuing a request" patch?

He did change his mind later in that thread, though.

However, what's more important is that we still need a good version of
your patch for 4.13. I took Brian's workaround for ipr but I still think
Christoph's concerns need to be addressed for me to put your change back
in.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [GIT PULL] SCSI fixes for 4.13-rc6

2017-08-23 Thread Bart Van Assche
On Wed, 2017-08-23 at 07:42 +0100, James Bottomley wrote:
> Six minor and error leg fixes, plus one major change: the reversion of
> scsi-mq as the default.  We're doing the latter temporarily (with a
> backport to stable) to give us time to fix all the issues that turned
> up with this default before trying again.
> [ ... ]
> Damien Le Moal (1):
>   scsi: sd_zbc: Write unlock zone from sd_uninit_cmnd()
> [ ... ]

Hello James,

Had you noticed that Damien had asked not to send the "sd_zbc: Write unlock
zone from sd_uninit_cmnd()" patch to Linus without my "scsi-mq: Always
unprepare before requeuing a request" patch? A quote from an e-mail from
Damien (https://www.spinics.net/lists/stable/msg185436.html):

---
Re: [PATCH] Revert "scsi-mq: Always unprepare before requeuing a request"
>> For an unknown reason this patch causes the boot process to hang on
>> PowerPC systems:
> 
> OK, dropped it from fixes for now.

It means that commit 70e42fd02c46e2aa9ab07b766d418637e3a51de7 "scsi:
sd_zbc: Write unlock zone from sd_uninit_cmnd()" will need to be
reverted too as it will not solve the potential deadlock anymore. Bart's
patch was needed for it to work.
---

Bart.