Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-07-05 Thread Martin K. Petersen
> "Linus" == Linus Torvalds  writes:

Linus> It's not in my tree, at least.

Not in scsi-fixes either. I have been waiting for a "real" patch
submission with one or more Tested-by: tags. I generally don't queue
something that comes with a "try this untested workaround" patch
description.

Quinn, please submit a real patch.

Linus> And I don't think I've seen a "yes, that fixes it". Although
Linus> Johannes was right that in addition to that, the ordering of the
Linus> irq setup should probably _also_ be fixed, but that's a separate
Linus> patch.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-07-05 Thread Thorsten Leemhuis
On 05.07.2016 19:32, Linus Torvalds wrote:
> On Tue, Jul 5, 2016 at 9:30 AM, Josh Boyer  wrote:
>> On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran  wrote:
>>>
>>> -   if (rsp->msix->cpuid != smp_processor_id()) {
>>> +   if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>>
>> Did this wind up going into an official commit somewhere?
> It's not in my tree, at least.
> And I don't think I've seen a "yes, that fixes it".

Quinn Tran ACKed a nearly identical patch from Bruno Prémont in a
different thread:
http://thread.gmane.org/gmane.linux.kernel/2257008/focus=2257139

>From what I can see in the initial mail in that thread it seems Bruno
successfully tested the patch he submitted. But I have no idea if the
patch is in someones queue to mainline right now. That's why I had it on
my "if nothing happens soon, poke someone" list...

HTH, CU, Thorsten


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-07-05 Thread Linus Torvalds
On Tue, Jul 5, 2016 at 9:30 AM, Josh Boyer  wrote:
> On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran  wrote:
>>
>> -   if (rsp->msix->cpuid != smp_processor_id()) {
>> +   if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>
> Did this wind up going into an official commit somewhere?

It's not in my tree, at least.

And I don't think I've seen a "yes, that fixes it". Although Johannes
was right that in addition to that, the ordering of the irq setup
should probably _also_ be fixed, but that's a separate patch.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-06-23 Thread Eric W. Biederman
Linus Torvalds  writes:

> On Thu, Jun 23, 2016 at 9:13 AM, Quinn Tran  wrote:
>>
>>
>> QT: setting up the interrupt vector does not mean the interrupt starts 
>> firing immediately.
>
> Actually, it very much can mean that. If the interrupt can possibly be
> shared, there is a very real possibility of it fiding immediately.
>
> Now, with MSI(-X) I guess that isn't a worry, so I suspect your patch
> that handles just the legacy INTx case anyway is sufficient, but in
> general I would like people to always act as if interrupts can happen
> immediately after request_irq().
>
> We have had *tons* of situations where the firmware left a device
> active, for example. Or where some random interrupt controller ended
> up having stale interrupts pending, even.
>
> So in general, it's just good practice to say "spurious interrupts can
> and do happen" - the shared irq case is the most obvious case, but
> there have been other sources of unexpected spurious interrupts
> firing.

One case that occassionally bytes even for MSI-X is the case of kexec on
panic where the hardware was not shut down before the kernel starts, and
the start of the kernel masks the irq.  Then when the driver initializes
and calls request_irq it is possible for an irq to be pending as soon as
the MSI-X irq is actually enabled to the hardware.

And there is always CONFIG_IRQ_DEBUG which always acts like an interrupt
happens right when after request_irq finishes.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-06-23 Thread Quinn Tran

-Original Message-
From: Johannes Thumshirn <jthumsh...@suse.de>
Date: Thursday, June 23, 2016 at 12:22 AM
To: Quinn Tran <quinn.t...@qlogic.com>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>, Linus Torvalds 
<torva...@linux-foundation.org>, Josh Boyer <jwbo...@fedoraproject.org>, 
Thorsten Leemhuis <regressi...@leemhuis.info>, linux-kernel 
<linux-ker...@vger.kernel.org>, linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

>[+ Cc linux-scsi@vger.kernel.org ]
>
>On Wed, Jun 22, 2016 at 03:57:35PM +, Quinn Tran wrote:
>> Johannes,  Martin,
>> 
>> Based on the screen shot/call trace,  it looks like this adapter is not 
>> using MSIX.  It defaulted back to MSI or INTx interrupt.  The code made an 
>> assumption  of MSIX is available.  There is no point in go through that code 
>> segment.
>> 
>> Can you try this work around?  It’s untested.  Thanks.
>> 
>> 
>> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
>> index 5649c20..e033ecb 100644
>> --- a/drivers/scsi/qla2xxx/qla_isr.c
>> +++ b/drivers/scsi/qla2xxx/qla_isr.c
>> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct 
>> scsi_qla_host *vha,
>> if (!vha->flags.online)
>> return;
>>  
>> -   if (rsp->msix->cpuid != smp_processor_id()) {
>> +   if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
>> /* if kernel does not notify qla of IRQ's CPU change,
>>  * then set it here.
>>  */
>> 
>
>But this still does not fix the race which would be possible if the HBA is
>using MSI-X but triggering IRQs early enough.
>
>Have a look at this (I admit theoretical) path:
>qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
>{
>   [...]
>   /* Enable MSI-X vectors for the base queue */
>   for (i = 0; i < 2; i++) {
>   qentry = >msix_entries[i];
>   if (IS_P3P_TYPE(ha))
>   ret = request_irq(qentry->vector,
>   qla82xx_msix_entries[i].handler,
>   0, qla82xx_msix_entries[i].name, rsp);
>   else
>   ret = request_irq(qentry->vector,
>   msix_entries[i].handler,
>   0, msix_entries[i].name, rsp);
>   if (ret)
>   goto msix_register_fail;
>   <--- IRQ arrives here

QT: setting up the interrupt vector does not mean the interrupt starts firing 
immediately.  Interrupt starting firing when the driver is ready to accept the 
interrupt by enabling the interrupt (ha->isp_ops->enable_intrs(ha)) later on in 
time.  In addition, that particular code path/qla24xx_process_response_queue  
is not executed until driver feeds commands to the hardware work queue.  

IF there is a left over interrupt that happens to trigger the call immediately, 
there is another check that prevent the code from getting to the point of the 
“theoretical" race.


>   qentry->have_irq = 1;
>   qentry->rsp = rsp;
>   rsp->msix = qentry;
>
>   [...]
>
>
>void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>struct rsp_que *rsp)
>{
--->8--
if (!vha->flags.online)
return;

---8<--
>   if (rsp->msix->cpuid != smp_processor_id()) {
>  ^
>  \--- rsp->msix == NULL
>
>   /* if kernel does not notify qla of IRQ's CPU change,
>* then set it here.
>*/
>   rsp->msix->cpuid = smp_processor_id();
>   ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;
>
>-- 
>Johannes Thumshirn  Storage
>jthumsh...@suse.de+49 911 74053 689
>SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>GF: Felix Imendörffer, Jane Smithard, Graham Norton
>HRB 21284 (AG Nürnberg)
>Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-06-23 Thread Linus Torvalds
On Thu, Jun 23, 2016 at 9:13 AM, Quinn Tran  wrote:
>
>
> QT: setting up the interrupt vector does not mean the interrupt starts firing 
> immediately.

Actually, it very much can mean that. If the interrupt can possibly be
shared, there is a very real possibility of it fiding immediately.

Now, with MSI(-X) I guess that isn't a worry, so I suspect your patch
that handles just the legacy INTx case anyway is sufficient, but in
general I would like people to always act as if interrupts can happen
immediately after request_irq().

We have had *tons* of situations where the firmware left a device
active, for example. Or where some random interrupt controller ended
up having stale interrupts pending, even.

So in general, it's just good practice to say "spurious interrupts can
and do happen" - the shared irq case is the most obvious case, but
there have been other sources of unexpected spurious interrupts
firing.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reported regressions for 4.7 as of Sunday, 2016-06-19

2016-06-23 Thread Johannes Thumshirn
[+ Cc linux-scsi@vger.kernel.org ]

On Wed, Jun 22, 2016 at 03:57:35PM +, Quinn Tran wrote:
> Johannes,  Martin,
> 
> Based on the screen shot/call trace,  it looks like this adapter is not using 
> MSIX.  It defaulted back to MSI or INTx interrupt.  The code made an 
> assumption  of MSIX is available.  There is no point in go through that code 
> segment.
> 
> Can you try this work around?  It’s untested.  Thanks.
> 
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index 5649c20..e033ecb 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct 
> scsi_qla_host *vha,
> if (!vha->flags.online)
> return;
>  
> -   if (rsp->msix->cpuid != smp_processor_id()) {
> +   if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) {
> /* if kernel does not notify qla of IRQ's CPU change,
>  * then set it here.
>  */
> 

But this still does not fix the race which would be possible if the HBA is
using MSI-X but triggering IRQs early enough.

Have a look at this (I admit theoretical) path:
qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
{
[...]
/* Enable MSI-X vectors for the base queue */
for (i = 0; i < 2; i++) {
qentry = >msix_entries[i];
if (IS_P3P_TYPE(ha))
ret = request_irq(qentry->vector,
qla82xx_msix_entries[i].handler,
0, qla82xx_msix_entries[i].name, rsp);
else
ret = request_irq(qentry->vector,
msix_entries[i].handler,
0, msix_entries[i].name, rsp);
if (ret)
goto msix_register_fail;
<--- IRQ arrives here
qentry->have_irq = 1;
qentry->rsp = rsp;
rsp->msix = qentry;

[...]


void qla24xx_process_response_queue(struct scsi_qla_host *vha,
struct rsp_que *rsp)
{
[...]
if (rsp->msix->cpuid != smp_processor_id()) {
  ^
  \--- rsp->msix == NULL

/* if kernel does not notify qla of IRQ's CPU change,
 * then set it here.
 */
rsp->msix->cpuid = smp_processor_id();
ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid;

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html