Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
> "Linus" == Linus Torvalds writes: Linus> It's not in my tree, at least. Not in scsi-fixes either. I have been waiting for a "real" patch submission with one or more Tested-by: tags. I generally don't queue something that comes with a "try this untested workaround" patch description. Quinn, please submit a real patch. Linus> And I don't think I've seen a "yes, that fixes it". Although Linus> Johannes was right that in addition to that, the ordering of the Linus> irq setup should probably _also_ be fixed, but that's a separate Linus> patch. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
On 05.07.2016 19:32, Linus Torvalds wrote: > On Tue, Jul 5, 2016 at 9:30 AM, Josh Boyer wrote: >> On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran wrote: >>> >>> - if (rsp->msix->cpuid != smp_processor_id()) { >>> + if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) { >> >> Did this wind up going into an official commit somewhere? > It's not in my tree, at least. > And I don't think I've seen a "yes, that fixes it". Quinn Tran ACKed a nearly identical patch from Bruno Prémont in a different thread: http://thread.gmane.org/gmane.linux.kernel/2257008/focus=2257139 >From what I can see in the initial mail in that thread it seems Bruno successfully tested the patch he submitted. But I have no idea if the patch is in someones queue to mainline right now. That's why I had it on my "if nothing happens soon, poke someone" list... HTH, CU, Thorsten -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
On Tue, Jul 5, 2016 at 9:30 AM, Josh Boyer wrote: > On Wed, Jun 22, 2016 at 11:57 AM, Quinn Tran wrote: >> >> - if (rsp->msix->cpuid != smp_processor_id()) { >> + if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) { > > Did this wind up going into an official commit somewhere? It's not in my tree, at least. And I don't think I've seen a "yes, that fixes it". Although Johannes was right that in addition to that, the ordering of the irq setup should probably _also_ be fixed, but that's a separate patch. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
Linus Torvalds writes: > On Thu, Jun 23, 2016 at 9:13 AM, Quinn Tran wrote: >> >> >> QT: setting up the interrupt vector does not mean the interrupt starts >> firing immediately. > > Actually, it very much can mean that. If the interrupt can possibly be > shared, there is a very real possibility of it fiding immediately. > > Now, with MSI(-X) I guess that isn't a worry, so I suspect your patch > that handles just the legacy INTx case anyway is sufficient, but in > general I would like people to always act as if interrupts can happen > immediately after request_irq(). > > We have had *tons* of situations where the firmware left a device > active, for example. Or where some random interrupt controller ended > up having stale interrupts pending, even. > > So in general, it's just good practice to say "spurious interrupts can > and do happen" - the shared irq case is the most obvious case, but > there have been other sources of unexpected spurious interrupts > firing. One case that occassionally bytes even for MSI-X is the case of kexec on panic where the hardware was not shut down before the kernel starts, and the start of the kernel masks the irq. Then when the driver initializes and calls request_irq it is possible for an irq to be pending as soon as the MSI-X irq is actually enabled to the hardware. And there is always CONFIG_IRQ_DEBUG which always acts like an interrupt happens right when after request_irq finishes. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
-Original Message- From: Johannes Thumshirn Date: Thursday, June 23, 2016 at 12:22 AM To: Quinn Tran Cc: "Martin K. Petersen" , Linus Torvalds , Josh Boyer , Thorsten Leemhuis , linux-kernel , linux-scsi Subject: Re: Reported regressions for 4.7 as of Sunday, 2016-06-19 >[+ Cc linux-scsi@vger.kernel.org ] > >On Wed, Jun 22, 2016 at 03:57:35PM +, Quinn Tran wrote: >> Johannes, Martin, >> >> Based on the screen shot/call trace, it looks like this adapter is not >> using MSIX. It defaulted back to MSI or INTx interrupt. The code made an >> assumption of MSIX is available. There is no point in go through that code >> segment. >> >> Can you try this work around? It’s untested. Thanks. >> >> >> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c >> index 5649c20..e033ecb 100644 >> --- a/drivers/scsi/qla2xxx/qla_isr.c >> +++ b/drivers/scsi/qla2xxx/qla_isr.c >> @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct >> scsi_qla_host *vha, >> if (!vha->flags.online) >> return; >> >> - if (rsp->msix->cpuid != smp_processor_id()) { >> + if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) { >> /* if kernel does not notify qla of IRQ's CPU change, >> * then set it here. >> */ >> > >But this still does not fix the race which would be possible if the HBA is >using MSI-X but triggering IRQs early enough. > >Have a look at this (I admit theoretical) path: >qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp) >{ > [...] > /* Enable MSI-X vectors for the base queue */ > for (i = 0; i < 2; i++) { > qentry = &ha->msix_entries[i]; > if (IS_P3P_TYPE(ha)) > ret = request_irq(qentry->vector, > qla82xx_msix_entries[i].handler, > 0, qla82xx_msix_entries[i].name, rsp); > else > ret = request_irq(qentry->vector, > msix_entries[i].handler, > 0, msix_entries[i].name, rsp); > if (ret) > goto msix_register_fail; > <--- IRQ arrives here QT: setting up the interrupt vector does not mean the interrupt starts firing immediately. Interrupt starting firing when the driver is ready to accept the interrupt by enabling the interrupt (ha->isp_ops->enable_intrs(ha)) later on in time. In addition, that particular code path/qla24xx_process_response_queue is not executed until driver feeds commands to the hardware work queue. IF there is a left over interrupt that happens to trigger the call immediately, there is another check that prevent the code from getting to the point of the “theoretical" race. > qentry->have_irq = 1; > qentry->rsp = rsp; > rsp->msix = qentry; > > [...] > > >void qla24xx_process_response_queue(struct scsi_qla_host *vha, >struct rsp_que *rsp) >{ --->8-- if (!vha->flags.online) return; ---8<-- > if (rsp->msix->cpuid != smp_processor_id()) { > ^ > \--- rsp->msix == NULL > > /* if kernel does not notify qla of IRQ's CPU change, >* then set it here. >*/ > rsp->msix->cpuid = smp_processor_id(); > ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid; > >-- >Johannes Thumshirn Storage >jthumsh...@suse.de+49 911 74053 689 >SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg >GF: Felix Imendörffer, Jane Smithard, Graham Norton >HRB 21284 (AG Nürnberg) >Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
On Thu, Jun 23, 2016 at 9:13 AM, Quinn Tran wrote: > > > QT: setting up the interrupt vector does not mean the interrupt starts firing > immediately. Actually, it very much can mean that. If the interrupt can possibly be shared, there is a very real possibility of it fiding immediately. Now, with MSI(-X) I guess that isn't a worry, so I suspect your patch that handles just the legacy INTx case anyway is sufficient, but in general I would like people to always act as if interrupts can happen immediately after request_irq(). We have had *tons* of situations where the firmware left a device active, for example. Or where some random interrupt controller ended up having stale interrupts pending, even. So in general, it's just good practice to say "spurious interrupts can and do happen" - the shared irq case is the most obvious case, but there have been other sources of unexpected spurious interrupts firing. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Reported regressions for 4.7 as of Sunday, 2016-06-19
[+ Cc linux-scsi@vger.kernel.org ] On Wed, Jun 22, 2016 at 03:57:35PM +, Quinn Tran wrote: > Johannes, Martin, > > Based on the screen shot/call trace, it looks like this adapter is not using > MSIX. It defaulted back to MSI or INTx interrupt. The code made an > assumption of MSIX is available. There is no point in go through that code > segment. > > Can you try this work around? It’s untested. Thanks. > > > diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c > index 5649c20..e033ecb 100644 > --- a/drivers/scsi/qla2xxx/qla_isr.c > +++ b/drivers/scsi/qla2xxx/qla_isr.c > @@ -2548,7 +2548,7 @@ void qla24xx_process_response_queue(struct > scsi_qla_host *vha, > if (!vha->flags.online) > return; > > - if (rsp->msix->cpuid != smp_processor_id()) { > + if (rsp->msix && (rsp->msix->cpuid != smp_processor_id())) { > /* if kernel does not notify qla of IRQ's CPU change, > * then set it here. > */ > But this still does not fix the race which would be possible if the HBA is using MSI-X but triggering IRQs early enough. Have a look at this (I admit theoretical) path: qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp) { [...] /* Enable MSI-X vectors for the base queue */ for (i = 0; i < 2; i++) { qentry = &ha->msix_entries[i]; if (IS_P3P_TYPE(ha)) ret = request_irq(qentry->vector, qla82xx_msix_entries[i].handler, 0, qla82xx_msix_entries[i].name, rsp); else ret = request_irq(qentry->vector, msix_entries[i].handler, 0, msix_entries[i].name, rsp); if (ret) goto msix_register_fail; <--- IRQ arrives here qentry->have_irq = 1; qentry->rsp = rsp; rsp->msix = qentry; [...] void qla24xx_process_response_queue(struct scsi_qla_host *vha, struct rsp_que *rsp) { [...] if (rsp->msix->cpuid != smp_processor_id()) { ^ \--- rsp->msix == NULL /* if kernel does not notify qla of IRQ's CPU change, * then set it here. */ rsp->msix->cpuid = smp_processor_id(); ha->tgt.rspq_vector_cpuid = rsp->msix->cpuid; -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html