Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown
On 5/24/2018 2:35 PM, Bjorn Helgaas wrote: > That sounds like a reasonable idea, and it is definitely another can > of worms. I looked briefly at some of the .shutdown() cases: should we throw it into 4.18 and see what happens? -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown
On 5/23/2018 6:57 PM, Sinan Kaya wrote: >> The crash seems to indicate that the hpsa device attempted a DMA after >> we cleared the Root Port's PCI_COMMAND_MASTER, which means >> hpsa_shutdown() didn't stop DMA from the device (it looks like *most* >> shutdown methods don't disable device DMA, so it's in good company). > All drivers are expected to shutdown DMA and interrupts in their shutdown() > routines. They can skip removing threads, data structures etc. but DMA and > interrupt disabling are required. This is the difference between shutdown() > and remove() callbacks. I found this note yesterday to see why we are not disabling the devices in the PCI core itself. pci_device_remove() /* * We would love to complain here if pci_dev->is_enabled is set, that * the driver should have called pci_disable_device(), but the * unfortunate fact is there are too many odd BIOS and bridge setups * that don't like drivers doing that all of the time. * Oh well, we can dream of sane hardware when we sleep, no matter how * horrible the crap we have to deal with is when we are awake... */ Ryan, can you discard the previous patch and test this one instead? remove() path in hpsa driver seems to call pci_disable_device() via hpsa_remove_one() hpsa_free_pci_init() but nothing on the shutdown path. diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index 4ed3d26..3823f04 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -8651,6 +8651,7 @@ static void hpsa_shutdown(struct pci_dev *pdev) h->access.set_intr_mask(h, HPSA_INTR_OFF); hpsa_free_irqs(h); /* init_one 4 */ hpsa_disable_interrupt_mode(h); /* pci_init 2 */ + pci_disable_device(h->pdev); } -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown
On 5/23/2018 5:32 PM, Bjorn Helgaas wrote: > > The crash seems to indicate that the hpsa device attempted a DMA after > we cleared the Root Port's PCI_COMMAND_MASTER, which means > hpsa_shutdown() didn't stop DMA from the device (it looks like *most* > shutdown methods don't disable device DMA, so it's in good company). All drivers are expected to shutdown DMA and interrupts in their shutdown() routines. They can skip removing threads, data structures etc. but DMA and interrupt disabling are required. This is the difference between shutdown() and remove() callbacks. If you see that this is not being done in HPSA, then that is where the bugfix should be. Counter argument is that if shutdown() is not implemented, at least remove() should be called. Expecting all drivers to implement shutdown() callbacks is just bad by design in my opinion. Code should have fallen back to remove() if shutdown() doesn't exist. I can propose a patch for this but this is yet another story to chase. > >> This has been found to cause crashes on HP DL360 Gen9 machines during >> reboot. Besides, kexec is already clearing the bus master bit in >> pci_device_shutdown() after all PCI drivers are removed. > > The original path was: > > pci_device_shutdown(hpsa) > drv->shutdown > hpsa_shutdown # hpsa_pci_driver.shutdown > ... > pci_device_shutdown(RP) # root port > drv->shutdown > pcie_portdrv_remove # pcie_portdriver.shutdown > pcie_port_device_remove > pci_disable_device > do_pci_disable_device > # clear RP PCI_COMMAND_MASTER > if (kexec) > pci_clear_master(RP) > # clear RP PCI_COMMAND_MASTER > > If I understand correctly, the new path after this patch is: > > pci_device_shutdown(hpsa) > drv->shutdown > hpsa_shutdown # hpsa_pci_driver.shutdown > ... > pci_device_shutdown(RP) # root port > drv->shutdown > pcie_portdrv_shutdown # pcie_portdriver.shutdown > __pcie_portdrv_remove(RP, false) > pcie_port_device_remove(RP, false) > # do NOT clear RP PCI_COMMAND_MASTER yup > if (kexec) > pci_clear_master(RP) > # clear RP PCI_COMMAND_MASTER > > I guess this patch avoids the panic during reboot because we're not in > the kexec path, so we never clear PCI_COMMAND_MASTER for the Root > Port, so the hpsa device can DMA happily until the lights go out. > > But DMA continuing for some random amount of time before the reboot or > shutdown happens makes me a little queasy. That doesn't sound safe. > The more I think about this, the more confused I get. What am I > missing? see above. > >> Just remove the extra clear in shutdown path by seperating the remove and >> shutdown APIs in the PORTDRV. >> >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = { >> >> .probe = pcie_portdrv_probe, >> .remove = pcie_portdrv_remove, >> -.shutdown = pcie_portdrv_remove, >> +.shutdown = pcie_portdrv_shutdown, > > What are the circumstances when we call .remove() vs .shutdown()? > > I guess the main (maybe only) way to call .remove() is to hot-remove > the port? And .shutdown() is basically used in the reboot and kexec > paths? Correct. shutdown() is only called during reboot/shutdown calls. If you echo 1 into the remove file, remove() gets called. Handy for hotplug use cases. It needs to be the exact opposite of the probe. It needs to clean up resources etc. and have the HW in a state where it can be reinitialized via probe again. > >> .err_handler= &pcie_portdrv_err_handler, >> >> -- >> 2.7.4 >> > -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH v4 3/7] scsi: dpt_i2o: eliminate duplicate barriers on weakly-ordered archs
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/dpt_i2o.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c index fd172b0..3c1e64a 100644 --- a/drivers/scsi/dpt_i2o.c +++ b/drivers/scsi/dpt_i2o.c @@ -1300,7 +1300,7 @@ static s32 adpt_i2o_post_this(adpt_hba* pHba, u32* data, int len) wmb(); //post message - writel(m, pHba->post_port); + writel_relaxed(m, pHba->post_port); wmb(); return 0; @@ -1390,7 +1390,7 @@ static s32 adpt_i2o_reset_hba(adpt_hba* pHba) memcpy_toio(pHba->msg_addr_virt+m, msg, sizeof(msg)); wmb(); - writel(m, pHba->post_port); + writel_relaxed(m, pHba->post_port); wmb(); while(*status == 0){ @@ -2797,7 +2797,7 @@ static s32 adpt_send_nop(adpt_hba*pHba,u32 m) writel( 0,&msg[2]); wmb(); - writel(m, pHba->post_port); + writel_relaxed(m, pHba->post_port); wmb(); return 0; } -- 2.7.4
[PATCH v4 2/7] scsi: megaraid: eliminate duplicate barriers on weakly-ordered archs
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Create a new wrapper function with relaxed write operator. Use the new wrapper when a write is following a barrier(). Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/megaraid/megaraid_mbox.c | 8 drivers/scsi/megaraid/megaraid_mbox.h | 2 ++ drivers/scsi/megaraid/megaraid_sas_fusion.c | 4 ++-- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/megaraid/megaraid_mbox.c b/drivers/scsi/megaraid/megaraid_mbox.c index 530358c..8c4fc6e 100644 --- a/drivers/scsi/megaraid/megaraid_mbox.c +++ b/drivers/scsi/megaraid/megaraid_mbox.c @@ -1439,7 +1439,7 @@ mbox_post_cmd(adapter_t *adapter, scb_t *scb) mbox->ack = 0; wmb(); - WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1); + WRINDOOR_RELAXED(raid_dev, raid_dev->mbox_dma | 0x1); spin_unlock_irqrestore(MAILBOX_LOCK(raid_dev), flags); @@ -2752,7 +2752,7 @@ mbox_post_sync_cmd(adapter_t *adapter, uint8_t raw_mbox[]) mbox->status= 0xFF; wmb(); - WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1); + WRINDOOR_RELAXED(raid_dev, raid_dev->mbox_dma | 0x1); // wait for maximum 1 second for status to post. If the status is not // available within 1 second, assume FW is initializing and wait @@ -2877,7 +2877,7 @@ mbox_post_sync_cmd_fast(adapter_t *adapter, uint8_t raw_mbox[]) mbox->status= 0xFF; wmb(); - WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1); + WRINDOOR_RELAXED(raid_dev, raid_dev->mbox_dma | 0x1); for (i = 0; i < MBOX_SYNC_WAIT_CNT; i++) { if (mbox->numstatus != 0xFF) break; @@ -3329,7 +3329,7 @@ megaraid_mbox_fire_sync_cmd(adapter_t *adapter) mbox->status= 0; wmb(); - WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1); + WRINDOOR_RELAXED(raid_dev, raid_dev->mbox_dma | 0x1); /* Wait for maximum 1 min for status to post. * If the Firmware SUPPORTS the ABOVE COMMAND, diff --git a/drivers/scsi/megaraid/megaraid_mbox.h b/drivers/scsi/megaraid/megaraid_mbox.h index c1d86d9..641cbd4 100644 --- a/drivers/scsi/megaraid/megaraid_mbox.h +++ b/drivers/scsi/megaraid/megaraid_mbox.h @@ -230,6 +230,8 @@ typedef struct { #define RDINDOOR(rdev) readl((rdev)->baseaddr + 0x20) #define RDOUTDOOR(rdev)readl((rdev)->baseaddr + 0x2C) +#define WRINDOOR_RELAXED(rdev, value) \ + writel_relaxed(value, (rdev)->baseaddr + 0x20) #define WRINDOOR(rdev, value) writel(value, (rdev)->baseaddr + 0x20) #define WROUTDOOR(rdev, value) writel(value, (rdev)->baseaddr + 0x2C) diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c index 073ced0..f560496 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c @@ -3479,11 +3479,11 @@ complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex) wmb(); if (instance->msix_combined) - writel(((MSIxIndex & 0x7) << 24) | + writel_relaxed(((MSIxIndex & 0x7) << 24) | fusion->last_reply_idx[MSIxIndex], instance->reply_post_host_index_addr[MSIxIndex/8]); else - writel((MSIxIndex << 24) | + writel_relaxed((MSIxIndex << 24) | fusion->last_reply_idx[MSIxIndex], instance->reply_post_host_index_addr[0]); megasas_check_and_restore_queue_depth(instance); -- 2.7.4
[PATCH v4 6/7] scsi: bnx2i: Eliminate duplicate barriers on weakly-ordered archs
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/bnx2i/bnx2i_hwi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c index 8f03a86..075735b 100644 --- a/drivers/scsi/bnx2i/bnx2i_hwi.c +++ b/drivers/scsi/bnx2i/bnx2i_hwi.c @@ -278,7 +278,7 @@ static void bnx2i_ring_sq_dbell(struct bnx2i_conn *bnx2i_conn, int count) sq_db->prod_idx = ep->qp.sq_prod_idx; bnx2i_ring_577xx_doorbell(bnx2i_conn); } else - writew(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL); + writew_relaxed(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL); mmiowb(); /* flush posted PCI writes */ } -- 2.7.4
[PATCH v4 7/7] scsi: csiostor: Eliminate duplicate barriers on weakly-ordered archs
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Create a new wrapper function with relaxed write operator. Use the new wrapper when a write is following a barrier(). Signed-off-by: Sinan Kaya --- drivers/scsi/csiostor/csio_hw.h | 4 drivers/scsi/csiostor/csio_wr.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/csiostor/csio_hw.h b/drivers/scsi/csiostor/csio_hw.h index 30f5f52..9fd8b00 100644 --- a/drivers/scsi/csiostor/csio_hw.h +++ b/drivers/scsi/csiostor/csio_hw.h @@ -512,6 +512,10 @@ struct csio_hw { csio_reg((_h)->regstart, (_r))) #definecsio_wr_reg16(_h, _v, _r) writew((_v), \ csio_reg((_h)->regstart, (_r))) + +#definecsio_wr_reg32_relaxed(_h, _v, _r) \ + writel_relaxed((_v), csio_reg((_h)->regstart, (_r))) + #definecsio_wr_reg32(_h, _v, _r) writel((_v), \ csio_reg((_h)->regstart, (_r))) #definecsio_wr_reg64(_h, _v, _r) writeq((_v), \ diff --git a/drivers/scsi/csiostor/csio_wr.c b/drivers/scsi/csiostor/csio_wr.c index c0a1778..db26222 100644 --- a/drivers/scsi/csiostor/csio_wr.c +++ b/drivers/scsi/csiostor/csio_wr.c @@ -984,7 +984,7 @@ csio_wr_issue(struct csio_hw *hw, int qidx, bool prio) wmb(); /* Ring SGE Doorbell writing q->pidx into it */ - csio_wr_reg32(hw, DBPRIO_V(prio) | QID_V(q->un.eq.physeqid) | + csio_wr_reg32_relaxed(hw, DBPRIO_V(prio) | QID_V(q->un.eq.physeqid) | PIDX_T5_V(q->inc_idx) | DBTYPE_F, MYPF_REG(SGE_PF_KDOORBELL_A)); q->inc_idx = 0; -- 2.7.4
[PATCH v4 5/7] scsi: ipr: Eliminate duplicate barriers on weakly-ordered archs
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writeX() to writeX_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/ipr.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index e07dd99..209adac 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -762,9 +762,9 @@ static void ipr_mask_and_clear_interrupts(struct ipr_ioa_cfg *ioa_cfg, /* Set interrupt mask to stop all new interrupts */ if (ioa_cfg->sis64) - writeq(~0, ioa_cfg->regs.set_interrupt_mask_reg); + writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg); else - writel(~0, ioa_cfg->regs.set_interrupt_mask_reg); + writel_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg); /* Clear any pending interrupts */ if (ioa_cfg->sis64) @@ -8435,7 +8435,8 @@ static int ipr_reset_enable_ioa(struct ipr_cmnd *ipr_cmd) wmb(); if (ioa_cfg->sis64) { /* Set the adapter to the correct endian mode. */ - writel(IPR_ENDIAN_SWAP_KEY, ioa_cfg->regs.endian_swap_reg); + writel_relaxed(IPR_ENDIAN_SWAP_KEY, + ioa_cfg->regs.endian_swap_reg); int_reg = readl(ioa_cfg->regs.endian_swap_reg); } -- 2.7.4
[PATCH v4 4/7] scsi: lpfc: Eliminate duplicate barriers on weakly-ordered archs
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/lpfc/lpfc_sli.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 5f5528a..7dae7d3 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -1647,7 +1647,7 @@ lpfc_sli_update_full_ring(struct lpfc_hba *phba, struct lpfc_sli_ring *pring) * Set ring 'ringno' to SET R0CE_REQ in Chip Att register. * The HBA will tell us when an IOCB entry is available. */ - writel((CA_R0ATT|CA_R0CE_REQ) << (ringno*4), phba->CAregaddr); + writel_relaxed((CA_R0ATT|CA_R0CE_REQ) << (ringno*4), phba->CAregaddr); readl(phba->CAregaddr); /* flush */ pring->stats.iocb_cmd_full++; @@ -1672,7 +1672,7 @@ lpfc_sli_update_ring(struct lpfc_hba *phba, struct lpfc_sli_ring *pring) */ if (!(phba->sli3_options & LPFC_SLI3_CRP_ENABLED)) { wmb(); - writel(CA_R0ATT << (ringno * 4), phba->CAregaddr); + writel_relaxed(CA_R0ATT << (ringno * 4), phba->CAregaddr); readl(phba->CAregaddr); /* flush */ } } -- 2.7.4
[PATCH v4 1/7] scsi: hpsa: Eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/hpsa.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h index 018f980..c7d7e6a 100644 --- a/drivers/scsi/hpsa.h +++ b/drivers/scsi/hpsa.h @@ -599,7 +599,7 @@ static unsigned long SA5_ioaccel_mode1_completed(struct ctlr_info *h, u8 q) * but with current driver design this is easiest. */ wmb(); - writel((q << 24) | rq->current_entry, h->vaddr + + writel_relaxed((q << 24) | rq->current_entry, h->vaddr + IOACCEL_MODE1_CONSUMER_INDEX); atomic_dec(&h->commands_outstanding); } -- 2.7.4
[PATCH v4 0/7] scsi: Eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel() in multiple places. writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). I did a regex search for wmb() followed by writel() in each drivers directory. I scrubbed the ones I care about in this series. I considered "ease of change", "popular usage" and "performance critical path" as the determining criteria for my filtering. We used relaxed API heavily on ARM for a long time but it did not exist on other architectures. For this reason, relaxed architectures have been paying double penalty in order to use the common drivers. Now that relaxed API is present on all architectures, we can go and scrub all drivers to see what needs to change and what can remain. We start with mostly used ones and hope to increase the coverage over time. It will take a while to cover all drivers. Feel free to apply patches individually. Changes since v3: - https://www.spinics.net/lists/arm-kernel/msg641851.html - group patches together into subsystems scsi:... - collect reviewed and tested bys - scrub barrier() Sinan Kaya (7): scsi: hpsa: Eliminate duplicate barriers on weakly-ordered archs scsi: megaraid: eliminate duplicate barriers on weakly-ordered archs scsi: dpt_i2o: eliminate duplicate barriers on weakly-ordered archs scsi: lpfc: Eliminate duplicate barriers on weakly-ordered archs scsi: ipr: Eliminate duplicate barriers on weakly-ordered archs scsi: bnx2i: Eliminate duplicate barriers on weakly-ordered archs scsi: csiostor: Eliminate duplicate barriers on weakly-ordered archs drivers/scsi/bnx2i/bnx2i_hwi.c | 2 +- drivers/scsi/csiostor/csio_hw.h | 4 drivers/scsi/csiostor/csio_wr.c | 2 +- drivers/scsi/dpt_i2o.c | 6 +++--- drivers/scsi/hpsa.h | 2 +- drivers/scsi/ipr.c | 7 --- drivers/scsi/lpfc/lpfc_sli.c| 4 ++-- drivers/scsi/megaraid/megaraid_mbox.c | 8 drivers/scsi/megaraid/megaraid_mbox.h | 2 ++ drivers/scsi/megaraid/megaraid_sas_fusion.c | 4 ++-- 10 files changed, 24 insertions(+), 17 deletions(-) -- 2.7.4
[PATCH v3 08/18] scsi: hpsa: Eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya --- drivers/scsi/hpsa.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h index 018f980..c7d7e6a 100644 --- a/drivers/scsi/hpsa.h +++ b/drivers/scsi/hpsa.h @@ -599,7 +599,7 @@ static unsigned long SA5_ioaccel_mode1_completed(struct ctlr_info *h, u8 q) * but with current driver design this is easiest. */ wmb(); - writel((q << 24) | rq->current_entry, h->vaddr + + writel_relaxed((q << 24) | rq->current_entry, h->vaddr + IOACCEL_MODE1_CONSUMER_INDEX); atomic_dec(&h->commands_outstanding); } -- 2.7.4
Re: [PATCH V2] scsi: mpt3sas: remove redundant wmb
On 4/21/2017 3:56 AM, Sreekanth Reddy wrote: > [Sreekanth] Whether same thing applicable for SPARC & POWER > architectures. If yes then we are fine with this patch changes. This behavior is common for all architectures according to this document. Who would be the best person to comment on SPARC and POWER architectures in specific? James and I exchanged some comments on the first version. James? can you comment on POWER behavior. https://www.kernel.org/doc/Documentation/memory-barriers.txt Inside of the Linux kernel, I/O should be done through the appropriate accessor routines - such as inb() or writel() - which know how to make such accesses appropriately sequential. "Whilst this, for the most part, renders the explicit use of memory barriers unnecessary", there are a couple of situations where they might be needed: (1) On some systems, I/O stores are not strongly ordered across all CPUs, and so for _all_ general drivers locks should be used and mmiowb() must be issued prior to unlocking the critical section. (2) If the accessor functions are used to refer to an I/O memory window with relaxed memory access properties, then _mandatory_ memory barriers are required to enforce ordering. -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH V2] scsi: mpt3sas: remove redundant wmb
Due to relaxed ordering requirements on multiple architectures, drivers are required to use wmb/rmb/mb combinations when they need to guarantee observability between the memory and the HW. The mpt3sas driver is already using wmb() for this purpose. However, it issues a writel following wmb(). writel() function on arm/arm64 arhictectures have an embedded wmb() call inside. This results in unnecessary performance loss and code duplication. writel already guarantees ordering for both cpu and bus. we don't need additional wmb() Signed-off-by: Sinan Kaya --- drivers/scsi/mpt3sas/mpt3sas_base.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 5b7aec5..18039bb 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1025,7 +1025,6 @@ static int mpt3sas_remove_dead_ioc_func(void *arg) 0 : ioc->reply_free_host_index + 1; ioc->reply_free[ioc->reply_free_host_index] = cpu_to_le32(reply); - wmb(); writel(ioc->reply_free_host_index, &ioc->chip->ReplyFreeHostIndex); } @@ -1074,7 +1073,6 @@ static int mpt3sas_remove_dead_ioc_func(void *arg) return IRQ_NONE; } - wmb(); if (ioc->is_warpdrive) { writel(reply_q->reply_post_host_index, ioc->reply_post_host_index[msix_index]); -- 1.9.1
Re: [PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64
On 4/7/2017 1:25 PM, James Bottomley wrote: >> The right thing was to either call __raw_writel/__raw_readl or >> write_relaxed/read_relaxed for multi-arch compatibility. > writeX_relaxed and thus your patch is definitely wrong. The reason is > that we have two ordering domains: the CPU and the Bus. wmb forces > ordering in the CPU domain but not the bus domain. writeX originally > forced ordering in the bus domain but not the CPU domain, but since the > raw primitives I think it now orders in both and writeX_relaxed orders > in neither domain, so your patch would currently eliminate the bus > ordering. Yeah, that's why I recommended to remove the wmb() with a follow up instead of using the relaxed with a follow up. writel already guarantees ordering for both cpu and bus. we don't need additional wmb() -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64
On 4/7/2017 12:41 PM, Sinan Kaya wrote: > The right thing was to either call __raw_writel/__raw_readl or > write_relaxed/read_relaxed for multi-arch compatibility. One can also argue to get rid of wmb(). I can go either way based on the recommendation. -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH] scsi: mpt3sas: remove redundant wmb on arm/arm64
Due to relaxed ordering requirements on multiple architectures, drivers are required to use wmb/rmb/mb combinations when they need to guarantee observability between the memory and the HW. The mpt3sas driver is already using wmb() for this purpose. However, it issues a writel following wmb(). writel() function on arm/arm64 arhictectures have an embedded wmb() call inside. This results in unnecessary performance loss and code duplication. The kernel has been updated to support relaxed read/write API to be supported across all architectures now. The right thing was to either call __raw_writel/__raw_readl or write_relaxed/read_relaxed for multi-arch compatibility. Signed-off-by: Sinan Kaya --- drivers/scsi/mpt3sas/mpt3sas_base.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 5b7aec5..6e42036 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1026,8 +1026,8 @@ static int mpt3sas_remove_dead_ioc_func(void *arg) ioc->reply_free[ioc->reply_free_host_index] = cpu_to_le32(reply); wmb(); - writel(ioc->reply_free_host_index, - &ioc->chip->ReplyFreeHostIndex); + writel_relaxed(ioc->reply_free_host_index, + &ioc->chip->ReplyFreeHostIndex); } } @@ -1076,8 +1076,8 @@ static int mpt3sas_remove_dead_ioc_func(void *arg) wmb(); if (ioc->is_warpdrive) { - writel(reply_q->reply_post_host_index, - ioc->reply_post_host_index[msix_index]); + writel_relaxed(reply_q->reply_post_host_index, + ioc->reply_post_host_index[msix_index]); atomic_dec(&reply_q->busy); return IRQ_HANDLED; } @@ -1098,13 +1098,14 @@ static int mpt3sas_remove_dead_ioc_func(void *arg) * value in MSIxIndex field. */ if (ioc->combined_reply_queue) - writel(reply_q->reply_post_host_index | ((msix_index & 7) << - MPI2_RPHI_MSIX_INDEX_SHIFT), - ioc->replyPostRegisterIndex[msix_index/8]); + writel_relaxed(reply_q->reply_post_host_index | + ((msix_index & 7) << + MPI2_RPHI_MSIX_INDEX_SHIFT), + ioc->replyPostRegisterIndex[msix_index/8]); else - writel(reply_q->reply_post_host_index | (msix_index << - MPI2_RPHI_MSIX_INDEX_SHIFT), - &ioc->chip->ReplyPostHostIndex); + writel_relaxed(reply_q->reply_post_host_index | + (msix_index << MPI2_RPHI_MSIX_INDEX_SHIFT), + &ioc->chip->ReplyPostHostIndex); atomic_dec(&reply_q->busy); return IRQ_HANDLED; } -- 1.9.1
Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device
On 4/2/2017 1:21 PM, Logan Gunthorpe wrote: >> This is when the BIOS date helps so that you don't break existing systems. > I'm not that worried about this code breaking existing systems. There > are significant trade-offs with using p2pmem (ie. you are quite likely > sacrificing performance for memory QOS or upstream PCI bandwidth), and > therefore the user _has_ to specifically say to use it. This is why > we've put a flag in the nvme target code that defaults to off. Thus we > are not going to have a situation where people upgrade their kernels and > see broken or slow systems. People _have_ to make the decision to turn > it on and decide based on their use case whether it's appropriate. > OK. I didn't know the feature was not enabled by default. This is even easier now. Push the decision all the way to the user. Let them decide whether they want this feature to work on a root port connected port or under the switch. >> We can't guarentee all switches will work either. See above for instructions >> on when this feature should be enabled. > It's a lot easier to say that all switches will work than it is for root > ports. This is essentially what switches are designed for, so I'd be > surprised to find one that doesn't work. Root ports are the trouble here > seeing it's a lot more likely for them to be designed without > considering that traffic needs to move between ports efficiently. If we > do find extremely broken switches that don't support this then we'd > probably want to create a black list for that. Also, there's > significantly fewer PCI switch products on the market than there are > root port instances, so a black list would be much easier to manage there. > I thought the issue was feature didn't work at all with some root ports or there was some kind of memory corruption issue that you were trying to avoid with the existing systems. If you are just worried about performance, the switch recommendation belongs to your particular product tuning guide or a howto document not into the actual code itself. I think you should get rid of all pci searching business in your code. -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device
Hi Logan, I added Alex and Bjorn above. On 4/1/2017 6:16 PM, Logan Gunthorpe wrote: > Hey, > > On 31/03/17 08:17 PM, ok...@codeaurora.org wrote: >> See drivers/pci and drivers/acpi directory. > > The best I could find was the date of the firmware/bios. I really don't > think that makes sense to tie the two together. And really the more that > I think about it trying to do a date cutoff for this seems crazy without > very comprehensive hardware testing done. I have no idea which AMD chips > have decent root ports for this and then if we include all of ARM and > POWERPC, etc there's a huge amount of unknown hardware. Saying that the > system's firmware has to be written after 2016 seems like an arbitrary > restriction that isn't likely to correlate to any working systems. I recommended a combination of blacklist + p2p capability + BIOS date. Not just BIOS date. BIOS date by itself is useless. As you may or may not be aware, PCI defines capability registers for discovering features. Unfortunately, there is no direct p2p capability register. However, Access Control Services (ACS) capability register has flags indicating p2p functionality. p2p feature needs to be discovered from ACS. https://pdos.csail.mit.edu/~sbw/links/ECN_access_control_061011.pdf This is just one of the many P2P capability flags. "ACS P2P Request Redirect: must be implemented by Root Ports that support peer-to-peer traffic with other Root Ports5; must be implemented by Switch Downstream Ports." If the root port or a switch does not have ACS capability, p2p is not allowed. If these p2p flags are not set, don't allow p2p feature. The normal expectation from any system (root port/switch) is not to set these bits unless p2p feature is present/working. However, there could be systems in the field with ACS capability but broken HW or broken FW. This is when the BIOS date helps so that you don't break existing systems. The right thing in my opinion is 1. blacklist by pci vendor/device id like any other pci quirk in quirks.c. 2. Require this feature for recent HW/BIOS by checking the BIOS date. 3. Check the p2p capability from ACS. > > I still say the only sane thing to do is allow all switches and then add > a whitelist of root ports that are known to work well. If we care about > preventing broken systems in a comprehensive way then that's the only > thing that is going to work. We can't guarentee all switches will work either. See above for instructions on when this feature should be enabled. Let's step back for a moment. If we think about logical blocks here, p2pmem is a pci user. It should not walk the bus and search for possible good things by itself. We don't usually put code into the kernel's driver directory for specific arch/ specific devices. There are hundreds of device drivers in the kernel. None of them are guarenteed to work in any architecture but they don't prohibit use either. System integrators like me test these drivers against their own systems, find bugs to remove arch specific assumptions and post patches. p2pmem is potentially just one of the many users of p2p capability in the system. This p2p detection needs to be done by some p2p driver inside the drivers/pci directory or inside drivers/pci/probe.c. This p2p driver needs to verify ACS permissions similar to what pci_device_group() does. If the system is p2p capable, this p2p driver sets p2p_capable bit in struct pci_dev. p2pmem driver then uses this bit to decide when it should enable its feature. Bjorn and Alex needs to device about the final solution as they maintain both PCI and virtualization (ACS) respectively. Sinan -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device
On 3/31/2017 6:42 PM, Logan Gunthorpe wrote: > > > On 31/03/17 03:38 PM, Sinan Kaya wrote: >> On 3/31/2017 5:23 PM, Logan Gunthorpe wrote: >>> What exactly would you white/black list? It can't be the NIC or the >>> disk. If it's going to be a white/black list on the switch or root port >>> then you'd need essentially the same code to ensure they are all behind >>> the same switch or root port. >> >> What is so special about being connected to the same switch? >> >> Why don't we allow the feature by default and blacklist by the root ports >> that don't work with a quirk. > > Well root ports have the same issue here. There may be more than one > root port or other buses (ie QPI) between the devices in question. So > you can't just say "this system has root port X therefore we can always > use p2pmem". We only care about devices on the data path between two devices. > In the end, if you want to do any kind of restrictions > you're going to have to walk the tree, as the code currently does, and > figure out what's between the devices being used and black or white list > accordingly. Then seeing there's just such a vast number of devices out > there you'd almost certainly have to use some kind of white list and not > a black list. Then the question becomes which devices will be white > listed? How about a combination of blacklist + time bomb + peer-to-peer feature? You can put a restriction with DMI/SMBIOS such that all devices from 2016 work else they belong to blacklist. > The first to be listed would be switches seeing they will always > work. This is pretty much what we have (though it doesn't currently > cover multiple levels of switches). The next step, if someone wanted to > test with specific hardware, might be to allow the case where all the > devices are behind the same root port which Intel Ivy Bridge or newer. Sorry, I'm not familiar with Intel architecture. Based on what you just wrote, I think I see your point. I'm trying to generalize what you are doing to a little bigger context so that I can use it on another architecture like arm64 where I may or may not have a switch. This text below is sort of repeating what you are writing above. How about this: The goal is to find a common parent between any two devices that need to use your code. - all bridges/switches on the data need to support peer-to-peer, otherwise stop. - Make sure that all devices on the data path are not blacklisted via your code. - If there is at least somebody blacklisted, we stop and the feature is not allowed. - If we find a common parent and no errors, you are good to go. - We don't care about devices above the common parent whether they have some feature X, Y, Z or not. Maybe, a little bit less code than what you have but it is flexible and not that too hard to implement. Well, the code is in RFC. I don't see why we can't remove some restrictions and still have your code move forward. > However, I don't think a comprehensive white list should be a > requirement for this work to go forward and I don't think anything > you've suggested will remove any of the "ugliness". I don't think the ask above is a very big deal. If you feel like addressing this on another patchset like you suggested in your cover letter, I'm fine with that too. > > What we discussed at LSF was that only allowing cases with a switch was > the simplest way to be sure any given setup would actually work. > >> I'm looking at this from portability perspective to be honest. > > I'm looking at this from the fact that there's a vast number of > topologies and devices involved, and figuring out which will work is > very complicated and could require a lot of hardware testing. The LSF > folks were primarily concerned with not having users enable the feature > and see breakage or terrible performance. > >> I'd rather see the feature enabled by default without any assumptions. >> Using it with a switch is just a use case that you happened to test. >> It can allow new architectures to use your code tomorrow. > > That's why I was advocating for letting userspace decide such that if > you're setting up a system with this you say to use a specific p2pmem > device and then you are responsible to test and benchmark it and decide > to use it in going forward. However, this has received a lot of push back. Yeah, we shouldn't trust the userspace for such things. > > Logan > -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device
On 3/31/2017 5:23 PM, Logan Gunthorpe wrote: > What exactly would you white/black list? It can't be the NIC or the > disk. If it's going to be a white/black list on the switch or root port > then you'd need essentially the same code to ensure they are all behind > the same switch or root port. What is so special about being connected to the same switch? Why don't we allow the feature by default and blacklist by the root ports that don't work with a quirk. I'm looking at this from portability perspective to be honest. I'd rather see the feature enabled by default without any assumptions. Using it with a switch is just a use case that you happened to test. It can allow new architectures to use your code tomorrow. Sinan -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device
Hi Logan, > +/** > + * p2pmem_unregister() - unregister a p2pmem device > + * @p: the device to unregister > + * > + * The device will remain until all users are done with it > + */ > +void p2pmem_unregister(struct p2pmem_dev *p) > +{ > + if (!p) > + return; > + > + dev_info(&p->dev, "unregistered"); > + device_del(&p->dev); > + ida_simple_remove(&p2pmem_ida, p->id); Don't you need to clean up the p->pool here. > + put_device(&p->dev); > +} > +EXPORT_SYMBOL(p2pmem_unregister); > + I don't like the ugliness around the switch port to be honest. Going to whitelist/blacklist looks simpler in my opinion. Sinan -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/10/2015 2:56 PM, Arnd Bergmann wrote: The ACPI IORT table declares whether you enable IOMMU for a particular >device or not. The placement of IOMMU HW is system specific. The IORT >table gives the IOMMU HW topology to the operating system. This sounds odd. Clearly you need to specify the IOMMU settings for each possible PCI device independent of whether the OS actually uses the IOMMU or not. There are provisions to have DMA mask in the PCIe host bridge not at the PCIe device level inside IORT table. This setting is specific for each PCIe bus. It is not per PCIe device. It is assumed that the endpoint device driver knows the hardware for PCIe devices. The driver can also query the supported DMA bits by this platform via DMA APIs and will request the correct DMA mask from the DMA subsystem (surprise!). In a lot of cases, we want to turn it off to get better performance when the driver has set a DMA mask that covers all of RAM, but you also want to enable the IOMMU for debugging purposes or for device assignment if you run virtual machines. The bootloader doesn't know how the device is going to be used, so it cannot define the policy here. I think we'll end up adding a virtualization option to the UEFI BIOS similar to how Intel platforms work. Based on this switch, we'll end up patching the ACPI table. If I remove the IORT entry, then the device is in coherent mode with device accessing the full RAM range. If I have the IORT table, the device is in IOMMU translation mode. Details are in the IORT spec. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/10/2015 3:05 PM, James Bottomley wrote: OK, you don't seem to be understanding the problem: the Altix isn't a LSI card, it was a SGI platform. Got it. It was the platform where we first discovered the issue that a lot of storage cards didn't work because it by default had no memory below 4GB. The reason coherent masks were introduced was initially so the Altix could manufacture and manage a region of memory in the lower 4GB region and we would guarantee to make allocations from it so the storage cards would then work on that platform. I can't fix the issue if the card cannot do 64 bit DMA when IOMMU is not there. I need IOMMU enabled all the time for this card. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/10/2015 2:43 PM, James Bottomley wrote: The Issue, as stated by LSI is Initially set the consistent DMA mask to 32 bit and then change it to 64 bit mask after allocating RDPQ pools by calling the function _base_change_consistent_dma_mask. This is to ensure that all the upper 32 bits of RDPQ entries's base address to be same. Need somebody from mpt to confirm that this behavior is still valid for the recent cards besides altix. If you set a 64 bit coherent mask before this point, you're benefiting from being lucky that all the upper 32 bits of the allocations are the same ... we can't code a driver to rely on luck. Particularly not when the failure mode looks like it would be silent and deadly. Of course nobody wants unreliable code. I'm wondering if I was just lucky during my testing or the 92xx and 93xx hardware supports full 64 bit range. I don't have any insight into what the endpoint does or what it is capable of. >Another comment here from you. >https://lkml.org/lkml/2015/4/2/28 > >"Well, it was originally a hack for altix, because they had no regions >below 4GB and had to specifically manufacture them. As you know, in >Linux, if Intel doesn't need it, no-one cares and the implementation >bitrots." > >Maybe, it is time to fix the code for more recent (even decent) hardware? What do you mean "fix the code"? The code isn't broken, it's parametrising issues with particular hardware. There's no software work around (except allocating memory with the correct characteristics). Need confirmation. I'm questioning if we are stuck with this behavior because of altix or something else. If the latter case, the code could have used PCI ID to do something special for it. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/10/2015 1:27 PM, James Bottomley wrote: On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote: On 11/10/2015 11:47 AM, Arnd Bergmann wrote: On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: On 11/10/2015 3:38 AM, Arnd Bergmann wrote: From the email thread, it looks like this was introduced to support some legacy card that has 64 bit addressing limitations and is being carried around ("rotted") since then. I'm the second guy after the powerpc architecture complaining about the very same issue. Any red flags? What BenH was worried about here is that the driver sets different masks for streaming and coherent mappings, which is indeed a worry that could hit us on ARM as well, but I suppose we'll have to deal with that in platform code. Setting both masks to 32-bit is something that a lot of drivers do, and without IOMMU enabled, you'd hit the same bug on all of them. Maybe, maybe not. This is the only card that I had problems with. Your characterisation of "some legacy card" isn't entirely correct. Just to clarify how this happens, most I/O cards today are intelligent offload engines which means they have some type of embedded CPU (it can even be a specially designed asic). This CPU is driven by firmware which is mostly (but not always) in the machine language of the CPU. DMA transfers are sometimes run by this CPU, but mostly handed off to a separate offload engine. When the board gets revised, it's often easier to update the offload engine to 64 bits and keep the CPU at 32 (or even 16) bits. This means that all the internal addresses in the firmware are 32 bit only. As I read the comments in the original thread, it looks like the mpt people tried to mitigate this by using segment registers for external addresses firmware uses ... that's why they say that they don't have to have all the addresses in DMA32 ... they just need the upper 32 bits to be constant so they can correctly program the segment register. Unfortunately, we have no way to parametrise this to the DMA allocation code. You'll find the same thing with Adaptec SPI cards. Their route to 64 bits was via an initial 39 bit extension that had them layering the additional 7 bits into the unused lower region of the page descriptors for the firmware (keeping the actual pointers to DMA at 32 bits because they're always parametrised as address, offset, length and the address is always a 4k page). Eventually, everything will rev to 64 bits and this problem will go away, but, as I suspect you know, it takes time for the embedded world to get to where everyone else already is. As Arnd said, if you failed to allow for this in your platform, then oops, just don't use the card. I think this solution would be better than trying to get the driver to work out which cards can support 64 bit firmware descriptors and only failing on your platform for those that can't. James James, I was referring to this conversation here. https://lkml.org/lkml/2015/2/20/31 "The aic79xx hardware problem was that the DMA engine could address the whole of memory (it had two address modes, a 39 bit one and a 64 bit one) but the script engine that runs the mailboxes only had a 32 bit activation register (the activating write points at the physical address of the script to begin executing)." The fact that LSI SAS 92118i is working with 64 bit addresses suggests me that this problem is already solved. I have not hit any kind of regressions with 93xx and 92xx families under load in a true 64 bit environment. I am only mentioning this based on my testing exposure. Another comment here from you. https://lkml.org/lkml/2015/4/2/28 "Well, it was originally a hack for altix, because they had no regions below 4GB and had to specifically manufacture them. As you know, in Linux, if Intel doesn't need it, no-one cares and the implementation bitrots." Maybe, it is time to fix the code for more recent (even decent) hardware? Sinan -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/10/2015 11:47 AM, Arnd Bergmann wrote: On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > No, as Timur found, the driver is correct and it intentionally sets the 32-bit mask, and that is guaranteed to work on all sane hardware. Don't change the driver but find a better platform for your workload, or talk to the people that are responsible for the platform and get them to fix it. Platform does have an IOMMU. No issues there. I am trying to clean out the patch pipe I have in order to get this card working with and without IOMMU. On PowerPC, I think we automatically enable the IOMMU whenever a DMA mask is set that doesn't cover all of the RAM. We could think about doing the same thing on ARM64 to make all devices work out of the box. The ACPI IORT table declares whether you enable IOMMU for a particular device or not. The placement of IOMMU HW is system specific. The IORT table gives the IOMMU HW topology to the operating system. If the platform also doesn't have an IOMMU, you can probably work around it by setting up the dma-ranges property of the PCI host to map the low PCI addresses to the start of RAM. This will also require changes in the bootloader to set up the PCI outbound translation, and it will require implementing the DMA offset on ARM64, which I was hoping to avoid. From the email thread, it looks like this was introduced to support some legacy card that has 64 bit addressing limitations and is being carried around ("rotted") since then. I'm the second guy after the powerpc architecture complaining about the very same issue. Any red flags? What BenH was worried about here is that the driver sets different masks for streaming and coherent mappings, which is indeed a worry that could hit us on ARM as well, but I suppose we'll have to deal with that in platform code. Setting both masks to 32-bit is something that a lot of drivers do, and without IOMMU enabled, you'd hit the same bug on all of them. Maybe, maybe not. This is the only card that I had problems with. I can't change the address map for PCIe. SBSA requires all inbound PCIe addresses to be non-translated. What about changing the memory map? I suspect there will be more problems for you in the future when all of your RAM is at high addresses. Is this something you could fix in the bootloader by moving the first 2GB to a different CPU physical address? I'm thinking about this. I'll just have to stick with IOMMU for this card. Ok. But how do you currently decide whether to use the IOMMU or not? ACPI table. I wanted to get this fix in so that all operating systems whether they have IOMMU driver enabled or not would work. Arnd -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > No, as Timur found, the driver is correct and it intentionally sets the 32-bit mask, and that is guaranteed to work on all sane hardware. Don't change the driver but find a better platform for your workload, or talk to the people that are responsible for the platform and get them to fix it. Platform does have an IOMMU. No issues there. I am trying to clean out the patch pipe I have in order to get this card working with and without IOMMU. If the platform also doesn't have an IOMMU, you can probably work around it by setting up the dma-ranges property of the PCI host to map the low PCI addresses to the start of RAM. This will also require changes in the bootloader to set up the PCI outbound translation, and it will require implementing the DMA offset on ARM64, which I was hoping to avoid. From the email thread, it looks like this was introduced to support some legacy card that has 64 bit addressing limitations and is being carried around ("rotted") since then. I'm the second guy after the powerpc architecture complaining about the very same issue. Any red flags? I can't change the address map for PCIe. SBSA requires all inbound PCIe addresses to be non-translated. I'll just have to stick with IOMMU for this card. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 3/3] scsi: mptxsas: offload IRQ execution
On 11/9/2015 2:15 AM, Hannes Reinecke wrote: On 11/09/2015 02:57 AM, Sinan Kaya wrote: The mpt2sas and mpt3sas drivers are spinning forever in their IRQ handlers if there are a lot of jobs queued up by the PCIe card. This handler is causing spikes for the rest of the system and sluggish behavior. Marking all MSI interrupts as non-shared and moving the MSI interrupts to thread context. This relexes the rest of the system execution. NACK. If there is a scalability issue when handling interrupts it should be fixed in the driver directly. Looking at the driver is should be possible to implement a worker thread handling the reply descriptor, and having the interrupt only to fetch the reply descriptor. Can you go into the detail about which part of the _base_interrupt function needs to be executed in ISR context and which part can be queued up to worker thread? I'm not familiar with the hardware or the code. That's why, I moved the entire ISR into the thread context. Cheers, Hannes -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 2/3] scsi: fix compiler warning for sg
On 11/9/2015 10:26 PM, Timur Tabi wrote: Sinan Kaya wrote: I created this patch back in March with an older version of the compiler and older kernel (3.19). I'm no longer able to reproduce this with this compiler and linux-next. Thread model: posix gcc version 4.8.3 20140401 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.04 - Linaro GCC 4.8-2014.04) I'll drop this patch. Are you sure the compiler handles the old macro correctly? Maybe it's just quiescing the error message, but it's still broken? The code says it is using these macros for small integers only which can't overflow. I was trying to get rid of compiler warning and it seems to have disappeared. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 2/3] scsi: fix compiler warning for sg
On 11/9/2015 9:14 AM, Andy Shevchenko wrote: Parens are useless, noticed later, sorry. Isn't mult_frac() enough here? Btw, can you mention explicitly what is the warning you get (copy'n'paste of the line would be okay)? I created this patch back in March with an older version of the compiler and older kernel (3.19). I'm no longer able to reproduce this with this compiler and linux-next. Thread model: posix gcc version 4.8.3 20140401 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.04 - Linaro GCC 4.8-2014.04) I'll drop this patch. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/9/2015 9:33 AM, Arnd Bergmann wrote: On Monday 09 November 2015 09:07:36 Sinan Kaya wrote: On 11/9/2015 3:59 AM, Arnd Bergmann wrote: On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote: On 11/09/2015 02:57 AM, Sinan Kaya wrote: Current code gives up when 32 bit DMA is not supported. This problem has been observed on systems without any memory below 4 gig. This patch tests 64 bit support before bailing out to find a working combination. That feels decidedly odd. Why do you probe for 64bit if 32bit fails? 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts of places. If the machine has all of its RAM visible above 4GB PCI bus addresses, it must use an IOMMU. Can you be specific? PCIe does not have this limitation. It supports 32 bit and 64 bit TLPs. I have not seen any limitation so far in the OS either. See Documentation/DMA-API-HOWTO.txt All PCI devices start out with a 32-bit DMA mask, and Linux assumes it can always fall back to a 32-bit mask if a smaller mask (needed for some devices that only support DMA on a subset of the 4GB) or larger mask (needed to address high memory but can fail when the PCI host does not support it) cannot be set. Using IOMMU is fine but not required if the endpoint is a true 64 bit supporting endpoint. This endpoint supports 64bit too. There are two aspects here: a) setting a 32-bit mask should not have failed. Any device that actually needs 32-bit DMA will make the same call and the platform must guarantee that this works. If you have a broken platform that can't do this, complain to your board vendor so they wire up the RAM properly, with at least the first 2GB visible to low PCI bus addresses. b) If the platform has any memory above 4GB and the supports high DMA, it should never have asked for the 32-bit mask before trying the 64-bit mask first. This is only a performance optimization, but the driver seems to do the right thing, so I assume there is something wrong with the platform code. Typically it's the other way round, on the grounds that 64bit DMA should be preferred over 32bit. Can you explain why it needs to be done the other way round here? Something else is odd here, the driver already checks for dma_get_required_mask(), which will return the smallest mask that fits all of RAM. If the machine has any memory above 4GB, it already uses the 64-bit mask, and only falls back to the 32-bit mask if that fails or if all memory fits within the first 4GB. I'll add some prints in the code to get to the bottom of it. I think the code is checking more than just the required mask and failing in one of the other conditions. At least that DMA comparison code was more than what I have ever seen. Ok. That should take care of b) above, but we still have a bug with a) that will come back to bite you if you don't address it right. Arnd B should have worked and it doesn't. B works for other drivers but not for this particular one. As you said, "it should never have asked for the 32-bit mask before trying the 64-bit mask first." this is the problem. This HW doesn't have support for a. If I need a, I am required to use IOMMU. Here is what I found. mpt2sas version 20.100.00.00 loaded sizeof(dma_addr_t):8 ioc->dma_mask :0 dma_get_required_mask(&pdev->dev) :7f mpt2sas0: no suitable DMA mask for 0002:01:00.0 mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:8498/_scsih_probe()! Here is the code. if (ioc->dma_mask) consistent_dma_mask = DMA_BIT_MASK(64); else consistent_dma_mask = DMA_BIT_MASK(32); <-- why here? if (sizeof(dma_addr_t) > 4) {<-- true const uint64_t required_mask = dma_get_required_mask(&pdev->dev); if ((required_mask > DMA_BIT_MASK(32)) &&<-- true !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && <-- true !pci_set_consistent_dma_mask(pdev, consistent_dma_mask)) { <-- false ioc->base_add_sg_single = &_base_add_sg_single_64; ioc->sge_size = sizeof(Mpi2SGESimple64_t); ioc->dma_mask = 64; goto out; } } ioc->dma_mask is 0 and the driver is trying to use 32 bit even though 64 bit supported by the platform. I think the proper fix is to pass the required_mask back to consistent_dma_mask rather than using ioc->dma_mask and guessing. pci_set_consistent_dma_mask(pdev, required_mask) My code was just a band aid for broken code. The driver worked after that changing the above line only. mpt2sas_version_20.100.00.00_loaded sizeof(dma_addr_t) :8 ioc->dma_mask :0 dma_get_required_mask(&pdev->dev) :7f mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (1642 kB) mpt2sas0: MSI-X vectors supported: 1, no of cores: 24, max_msix_vecto
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/9/2015 3:59 AM, Arnd Bergmann wrote: On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote: On 11/09/2015 02:57 AM, Sinan Kaya wrote: Current code gives up when 32 bit DMA is not supported. This problem has been observed on systems without any memory below 4 gig. This patch tests 64 bit support before bailing out to find a working combination. That feels decidedly odd. Why do you probe for 64bit if 32bit fails? 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts of places. If the machine has all of its RAM visible above 4GB PCI bus addresses, it must use an IOMMU. Can you be specific? PCIe does not have this limitation. It supports 32 bit and 64 bit TLPs. I have not seen any limitation so far in the OS either. Using IOMMU is fine but not required if the endpoint is a true 64 bit supporting endpoint. This endpoint supports 64bit too. Typically it's the other way round, on the grounds that 64bit DMA should be preferred over 32bit. Can you explain why it needs to be done the other way round here? Something else is odd here, the driver already checks for dma_get_required_mask(), which will return the smallest mask that fits all of RAM. If the machine has any memory above 4GB, it already uses the 64-bit mask, and only falls back to the 32-bit mask if that fails or if all memory fits within the first 4GB. I'll add some prints in the code to get to the bottom of it. I think the code is checking more than just the required mask and failing in one of the other conditions. At least that DMA comparison code was more than what I have ever seen. So both the description and the patch are wrong. :( Arnd -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 3/3] scsi: mptxsas: offload IRQ execution
On 11/9/2015 2:15 AM, Hannes Reinecke wrote: On 11/09/2015 02:57 AM, Sinan Kaya wrote: The mpt2sas and mpt3sas drivers are spinning forever in their IRQ handlers if there are a lot of jobs queued up by the PCIe card. This handler is causing spikes for the rest of the system and sluggish behavior. Marking all MSI interrupts as non-shared and moving the MSI interrupts to thread context. This relexes the rest of the system execution. NACK. If there is a scalability issue when handling interrupts it should be fixed in the driver directly. Looking at the driver is should be possible to implement a worker thread handling the reply descriptor, and having the interrupt only to fetch the reply descriptor. I'll take a look. Cheers, Hannes -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
On 11/9/2015 2:09 AM, Hannes Reinecke wrote: On 11/09/2015 02:57 AM, Sinan Kaya wrote: Current code gives up when 32 bit DMA is not supported. This problem has been observed on systems without any memory below 4 gig. This patch tests 64 bit support before bailing out to find a working combination. That feels decidedly odd. Why do you probe for 64bit if 32bit fails? Typically it's the other way round, on the grounds that 64bit DMA should be preferred over 32bit. Can you explain why it needs to be done the other way round here? Cheers, Hannes The platform does not have any memory below 4G. So, 32 bit DMA is not possible. I'm trying to use 64 bit DMA instead since both the platform and hardware supports it. Current code will not try 64 bit DMA if 32 bit DMA is not working. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 3/3] scsi: mptxsas: offload IRQ execution
The mpt2sas and mpt3sas drivers are spinning forever in their IRQ handlers if there are a lot of jobs queued up by the PCIe card. This handler is causing spikes for the rest of the system and sluggish behavior. Marking all MSI interrupts as non-shared and moving the MSI interrupts to thread context. This relexes the rest of the system execution. Signed-off-by: Sinan Kaya --- drivers/scsi/mpt2sas/mpt2sas_base.c | 12 drivers/scsi/mpt3sas/mpt3sas_base.c | 13 + 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index c61c82a..b619a0e 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -1359,14 +1359,18 @@ _base_request_irq(struct MPT2SAS_ADAPTER *ioc, u8 index, u32 vector) cpumask_clear(reply_q->affinity_hint); atomic_set(&reply_q->busy, 0); - if (ioc->msix_enable) + if (ioc->msix_enable) { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d-msix%d", MPT2SAS_DRIVER_NAME, ioc->id, index); - else + r = request_threaded_irq(vector, NULL, _base_interrupt, +IRQF_TRIGGER_RISING | IRQF_ONESHOT, +reply_q->name, reply_q); + } else { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d", MPT2SAS_DRIVER_NAME, ioc->id); - r = request_irq(vector, _base_interrupt, IRQF_SHARED, reply_q->name, - reply_q); + r = request_irq(vector, _base_interrupt, IRQF_SHARED, + reply_q->name, reply_q); + } if (r) { printk(MPT2SAS_ERR_FMT "unable to allocate interrupt %d!\n", reply_q->name, vector); diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 6dc391c..62bee43 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1661,14 +1661,19 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index, u32 vector) cpumask_clear(reply_q->affinity_hint); atomic_set(&reply_q->busy, 0); - if (ioc->msix_enable) + if (ioc->msix_enable) { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d-msix%d", MPT3SAS_DRIVER_NAME, ioc->id, index); - else + + r = request_threaded_irq(vector, NULL, _base_interrupt, +IRQF_TRIGGER_RISING | IRQF_ONESHOT, +reply_q->name, reply_q); + } else { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d", MPT3SAS_DRIVER_NAME, ioc->id); - r = request_irq(vector, _base_interrupt, IRQF_SHARED, reply_q->name, - reply_q); + r = request_irq(vector, _base_interrupt, IRQF_SHARED, + reply_q->name, reply_q); + } if (r) { pr_err(MPT3SAS_FMT "unable to allocate interrupt %d!\n", reply_q->name, vector); -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails
Current code gives up when 32 bit DMA is not supported. This problem has been observed on systems without any memory below 4 gig. This patch tests 64 bit support before bailing out to find a working combination. Signed-off-by: Sinan Kaya --- drivers/scsi/mpt2sas/mpt2sas_base.c | 21 - drivers/scsi/mpt3sas/mpt3sas_base.c | 22 +- 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index c167911..c61c82a 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -1217,8 +1217,27 @@ _base_config_dma_addressing(struct MPT2SAS_ADAPTER *ioc, struct pci_dev *pdev) ioc->base_add_sg_single = &_base_add_sg_single_32; ioc->sge_size = sizeof(Mpi2SGESimple32_t); ioc->dma_mask = 32; - } else + } else { + /* Try 64 bit, 32 bit failed */ + consistent_dma_mask = DMA_BIT_MASK(64); + + if (sizeof(dma_addr_t) > 4) { + const uint64_t required_mask = + dma_get_required_mask(&pdev->dev); + if ((required_mask > DMA_BIT_MASK(32)) && + !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && + !pci_set_consistent_dma_mask(pdev, + consistent_dma_mask)) { + ioc->base_add_sg_single = + &_base_add_sg_single_64; + ioc->sge_size = sizeof(Mpi2SGESimple64_t); + ioc->dma_mask = 64; + goto out; + } + } + return -ENODEV; + } out: si_meminfo(&s); diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index d4f1dcd..6dc391c 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1535,9 +1535,29 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev) ioc->base_add_sg_single = &_base_add_sg_single_32; ioc->sge_size = sizeof(Mpi2SGESimple32_t); ioc->dma_mask = 32; - } else + } else { + /* Try 64 bit, 32 bit failed */ + consistent_dma_mask = DMA_BIT_MASK(64); + if (sizeof(dma_addr_t) > 4) { + const uint64_t required_mask = + dma_get_required_mask(&pdev->dev); + int consistent_mask = + pci_set_consistent_dma_mask(pdev, + consistent_dma_mask); + + if ((required_mask > DMA_BIT_MASK(32)) && + !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && + !consistent_mask) { + ioc->base_add_sg_single = + &_base_add_sg_single_64; + ioc->sge_size = sizeof(Mpi2SGESimple64_t); + ioc->dma_mask = 64; + goto out; + } + } return -ENODEV; + } out: si_meminfo(&s); pr_info(MPT3SAS_FMT -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 2/3] scsi: fix compiler warning for sg
The MULDIV macro has been designed for small numbers. Compiler emits an overflow warning on 64 bit systems. This patch uses 64 bit numbers in order to suppress warning. Signed-off-by: Sinan Kaya --- drivers/scsi/sg.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 9d7b7db..112d8974 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -51,6 +51,7 @@ static int sg_version_num = 30536;/* 2 digits for each component */ #include #include #include +#include #include "scsi.h" #include @@ -85,12 +86,17 @@ static void sg_proc_cleanup(void); * Replacing muldiv(x) by muldiv(x)=((x % d) * m) / d + int(x / d) * m * calculates the same, but prevents the overflow when both m and d * are "small" numbers (like HZ and USER_HZ). - * Of course an overflow is inavoidable if the result of muldiv doesn't fit - * in 32 bits. */ -#define MULDIV(X,MUL,DIV) X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)) +static inline u64 mult_frac64(u64 x, u32 numer, u32 denom) +{ + u64 r1 = do_div(x, denom); + u64 r2 = r1 * numer; + + do_div(r2, denom); + return (x * numer) + r2; +} -#define SG_DEFAULT_TIMEOUT MULDIV(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) +#define SG_DEFAULT_TIMEOUT mult_frac64(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) int sg_big_buff = SG_DEF_RESERVED_SIZE; /* N.B. This variable is readable and writeable via @@ -877,10 +883,10 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return result; if (val < 0) return -EIO; - if (val >= MULDIV (INT_MAX, USER_HZ, HZ)) - val = MULDIV (INT_MAX, USER_HZ, HZ); + if (val >= mult_frac64(INT_MAX, USER_HZ, HZ)) + val = mult_frac64(INT_MAX, USER_HZ, HZ); sfp->timeout_user = val; - sfp->timeout = MULDIV (val, HZ, USER_HZ); + sfp->timeout = mult_frac64(val, HZ, USER_HZ); return 0; case SG_GET_TIMEOUT:/* N.B. User receives timeout as return value */ -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 0/3] scsi: mptxsas: updates for ARM64
Changes from V1: (https://lkml.org/lkml/2015/11/4/856) Changes from V1: (https://lkml.org/lkml/2015/11/4/859) * merge two patches together Changes from V1: (https://lkml.org/lkml/2015/11/4/857) * Use do_div to handle the linker errors on i386 Changes from V1: https://lkml.org/lkml/2015/11/4/858 * None Sinan Kaya (3): scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails scsi: fix compiler warning for sg scsi: mptxsas: offload IRQ execution drivers/scsi/mpt2sas/mpt2sas_base.c | 33 - drivers/scsi/mpt3sas/mpt3sas_base.c | 35 ++- drivers/scsi/sg.c | 20 +--- 3 files changed, 71 insertions(+), 17 deletions(-) -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: fix compiler warning for sg
On 11/5/2015 2:56 PM, Andy Shevchenko wrote: On Thu, Nov 5, 2015 at 9:31 PM, Andy Shevchenko wrote: On Thu, Nov 5, 2015 at 8:32 PM, Sinan Kaya wrote: On 11/5/2015 1:07 PM, Andy Shevchenko wrote: Let's try again. static inline u64 mult_frac64(u64 x, u32 numer, u32 denom) { u64 rem = x % denom; u64 quot = do_div(x, denom); u64 mul = rem * numer; return (quot * numer) + do_div(mul, denom); } First of all why not to put this to generic header? We have math64.h and kernel.h. Might be a good idea (needs to check current users) to move mult_frac to math64.h. Then, x % y is already a problem. After all, you seems messed quot and remainder. What about something like #if BITS_PER_LONG == 64 #define mult_frac64(x,n,d) mult_frac(x,n,d) #else static inline u64 mult_frac64(u64 x, u32 numer, u32 denom) { u64 r1 = do_div(x, denom); u64 r2 = r1 * numer; do_div(r2, denom); return (x * numer) + r2; } I'll use this instead. This is cleaner, scalable and functionally correct to the original code. I'll post a patch with this soon. #endif ? One more look to the users of MULDIV. They all seems 32 bit for x. It means you don't need two do_div()s at all. Just do something like: u64 d = x * numer; do_div(d, denom); return d; -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: fix compiler warning for sg
On 11/5/2015 2:56 PM, Andy Shevchenko wrote: One more look to the users of MULDIV. They all seems 32 bit for x. It means you don't need two do_div()s at all. Just do something like: u64 d = x * numer; do_div(d, denom); return d; OK. I assume you want a change only in this file. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: fix compiler warning for sg
On 11/5/2015 1:07 PM, Andy Shevchenko wrote: OK, I didn't know that we had such a macro. To make this look like the other >macro, I can do this. > >static inline u64 mult_frac64(u64 x, u32 numer, u32 denom) >{ > u64 quot; > u64 rem = x % denom; > u64 rem2; > > quot = x; > do_div(quot, denom); > > rem2 = rem * numer; > do_div(rem2, denom); > > return (quot * numer) + rem2; >} Might be I did a wrong smaple, but do_div() returns two values actually. You perhaps overlooked it and thus wrote something redundant above. OK, I was looking at example usages in the kernel. The ones I looked always used the first argument as an input & output parameter. I got nervous about overwriting something. void __ndelay(unsigned long long nsecs) { u64 end; nsecs <<= 9; do_div(nsecs, 125); ... } Let's try again. static inline u64 mult_frac64(u64 x, u32 numer, u32 denom) { u64 rem = x % denom; u64 quot = do_div(x, denom); u64 mul = rem * numer; return (quot * numer) + do_div(mul, denom); } I'll do a s/MULDIV/mult_frac64/g to address Timur's concern. -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] scsi: fix compiler warning for sg
On 11/5/2015 3:48 AM, Andy Shevchenko wrote: On Thu, Nov 5, 2015 at 6:46 AM, Sinan Kaya wrote: The MULDIV macro has been designed for small numbers. It emits an overflow warning on 64 bit systems. This patch places type casts in the parameters to fix the compiler warning. Signed-off-by: Sinan Kaya --- drivers/scsi/sg.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 9d7b7db..eb2739d 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -88,7 +88,10 @@ static void sg_proc_cleanup(void); * Of course an overflow is inavoidable if the result of muldiv doesn't fit * in 32 bits. */ -#define MULDIV(X,MUL,DIV) X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)) +static inline u64 MULDIV(u64 X, u32 MUL, u32 DIV) +{ + return X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)); +} Like kbuild bot already told you it would be nice to think of 32-bit architectures. Moreover we have mult_frac() macro already for 32-bit numbers. For 64 bit numbers you need to do do_div(). Like: static inline u64 mult_frac64(u64 x, u32 m, u32 n) { u64 ret; ret = do_div(x, n); return ret * m; } OK, I didn't know that we had such a macro. To make this look like the other macro, I can do this. static inline u64 mult_frac64(u64 x, u32 numer, u32 denom) { u64 quot; u64 rem = x % denom; u64 rem2; quot = x; do_div(quot, denom); rem2 = rem * numer; do_div(rem2, denom); return (quot * numer) + rem2; } #define MULDIV(X,MUL,DIV) mult_frac64(X, MUL, DIV) #define SG_DEFAULT_TIMEOUT MULDIV(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Sinan Kaya Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] scsi: mpt3sas: try 64 bit DMA when 32 bit DMA fails
Current code gives up when 32 bit DMA is not supported. This patch tests 64 bit support before bailing out in such conditions. Signed-off-by: Sinan Kaya --- drivers/scsi/mpt3sas/mpt3sas_base.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index d4f1dcd..6dc391c 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1535,9 +1535,29 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev) ioc->base_add_sg_single = &_base_add_sg_single_32; ioc->sge_size = sizeof(Mpi2SGESimple32_t); ioc->dma_mask = 32; - } else + } else { + /* Try 64 bit, 32 bit failed */ + consistent_dma_mask = DMA_BIT_MASK(64); + if (sizeof(dma_addr_t) > 4) { + const uint64_t required_mask = + dma_get_required_mask(&pdev->dev); + int consistent_mask = + pci_set_consistent_dma_mask(pdev, + consistent_dma_mask); + + if ((required_mask > DMA_BIT_MASK(32)) && + !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && + !consistent_mask) { + ioc->base_add_sg_single = + &_base_add_sg_single_64; + ioc->sge_size = sizeof(Mpi2SGESimple64_t); + ioc->dma_mask = 64; + goto out; + } + } return -ENODEV; + } out: si_meminfo(&s); pr_info(MPT3SAS_FMT -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] scsi: fix compiler warning for sg
The MULDIV macro has been designed for small numbers. It emits an overflow warning on 64 bit systems. This patch places type casts in the parameters to fix the compiler warning. Signed-off-by: Sinan Kaya --- drivers/scsi/sg.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 9d7b7db..eb2739d 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -88,7 +88,10 @@ static void sg_proc_cleanup(void); * Of course an overflow is inavoidable if the result of muldiv doesn't fit * in 32 bits. */ -#define MULDIV(X,MUL,DIV) X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)) +static inline u64 MULDIV(u64 X, u32 MUL, u32 DIV) +{ + return X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)); +} #define SG_DEFAULT_TIMEOUT MULDIV(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] scsi: mptxsas: offload IRQ execution
The mpt2sas and mpt3sas drivers are spinning forever in their IRQ handlers if there is a lot of job queued up by the PCIe card. This handler is causing spikes for the rest of the system and sluggish behavior. Marking all MSI interrupts as non-shared and moving the MSI interrupts to thread context. This relexes the rest of the system execution. Signed-off-by: Sinan Kaya --- drivers/scsi/mpt2sas/mpt2sas_base.c | 13 + drivers/scsi/mpt3sas/mpt3sas_base.c | 14 ++ 2 files changed, 19 insertions(+), 8 deletions(-) mode change 100644 => 100755 drivers/scsi/mpt3sas/mpt3sas_base.c diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index c61c82a..ee2aead 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -1359,14 +1359,19 @@ _base_request_irq(struct MPT2SAS_ADAPTER *ioc, u8 index, u32 vector) cpumask_clear(reply_q->affinity_hint); atomic_set(&reply_q->busy, 0); - if (ioc->msix_enable) + if (ioc->msix_enable) { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d-msix%d", MPT2SAS_DRIVER_NAME, ioc->id, index); - else + r = request_threaded_irq(vector, NULL, _base_interrupt, +IRQF_TRIGGER_RISING | IRQF_ONESHOT, +reply_q->name, reply_q); + } + else { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d", MPT2SAS_DRIVER_NAME, ioc->id); - r = request_irq(vector, _base_interrupt, IRQF_SHARED, reply_q->name, - reply_q); + r = request_irq(vector, _base_interrupt, IRQF_SHARED, + reply_q->name, reply_q); + } if (r) { printk(MPT2SAS_ERR_FMT "unable to allocate interrupt %d!\n", reply_q->name, vector); diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c old mode 100644 new mode 100755 index 6dc391c..c29f359 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -1661,14 +1661,20 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index, u32 vector) cpumask_clear(reply_q->affinity_hint); atomic_set(&reply_q->busy, 0); - if (ioc->msix_enable) + if (ioc->msix_enable) { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d-msix%d", MPT3SAS_DRIVER_NAME, ioc->id, index); - else + + r = request_threaded_irq(vector, NULL, _base_interrupt, +IRQF_TRIGGER_RISING | IRQF_ONESHOT, +reply_q->name, reply_q); + } + else { snprintf(reply_q->name, MPT_NAME_LENGTH, "%s%d", MPT3SAS_DRIVER_NAME, ioc->id); - r = request_irq(vector, _base_interrupt, IRQF_SHARED, reply_q->name, - reply_q); + r = request_irq(vector, _base_interrupt, IRQF_SHARED, + reply_q->name, reply_q); + } if (r) { pr_err(MPT3SAS_FMT "unable to allocate interrupt %d!\n", reply_q->name, vector); -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] scsi: mpt2sas: try 64 bit DMA when 32 bit DMA fails
Current code gives up when 32 bit DMA is not supported. This patch tests 64 bit support before bailing out in such conditions. Signed-off-by: Sinan Kaya --- drivers/scsi/mpt2sas/mpt2sas_base.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c index c167911..c61c82a 100644 --- a/drivers/scsi/mpt2sas/mpt2sas_base.c +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c @@ -1217,8 +1217,27 @@ _base_config_dma_addressing(struct MPT2SAS_ADAPTER *ioc, struct pci_dev *pdev) ioc->base_add_sg_single = &_base_add_sg_single_32; ioc->sge_size = sizeof(Mpi2SGESimple32_t); ioc->dma_mask = 32; - } else + } else { + /* Try 64 bit, 32 bit failed */ + consistent_dma_mask = DMA_BIT_MASK(64); + + if (sizeof(dma_addr_t) > 4) { + const uint64_t required_mask = + dma_get_required_mask(&pdev->dev); + if ((required_mask > DMA_BIT_MASK(32)) && + !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && + !pci_set_consistent_dma_mask(pdev, + consistent_dma_mask)) { + ioc->base_add_sg_single = + &_base_add_sg_single_64; + ioc->sge_size = sizeof(Mpi2SGESimple64_t); + ioc->dma_mask = 64; + goto out; + } + } + return -ENODEV; + } out: si_meminfo(&s); -- Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html