Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
On 2/20/2018 10:56 PM, Johannes Thumshirn wrote:
> > Yes - I know. On the WC issue though, given how tightly bound the behavior
> > is with the platform as well as whether it provides a real benefit vs a
> > simple "it works", I don't believe this is one that I want to be "generic"
> > on.
>
> Hmmm OK, but this won't make our ARM and PPC teams very happy.

They are free to contact me and suggest something different; we'll benchmark
and will certainly change if there's a benefit. That's what we did on PPC at
the start of this thread.

-- james
Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
On Tue, Feb 20, 2018 at 10:25:56AM -0800, James Smart wrote:
> > Wouldn't it be better to improve the 32Bit writeq() code?
>
> Well, now that I'm asking for specific details internally, I'm finding that
> no one can find the failing machine any more.
>
> I'm going to keep looking (and testing) for another day or two, and if
> nothing pops up, will repost removing the 64bit define.

Thanks :-)

> Yes - I know. On the WC issue though, given how tightly bound the behavior
> is with the platform as well as whether it provides a real benefit vs a
> simple "it works", I don't believe this is one that I want to be "generic"
> on.

Hmmm OK, but this won't make our ARM and PPC teams very happy.

Well, let's see,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
On 2/19/2018 12:14 AM, Johannes Thumshirn wrote:
> On Fri, Feb 16, 2018 at 08:53:44AM -0800, James Smart wrote:
> > > Any reason you can't use writeq() on 32 Bit as well? There's a compat
> > > version in linux/io-64-nonatomic-hi-lo.h.
> >
> > We actually ran into issues on the existence of writeq() on a 32bit
> > platform. Thus this code block.
>
> Oh can you elaborate more on the issue? I bet if we merge it that way, someone
> comes around with a patch changing it to writeq() on 32Bit as well.
>
> Wouldn't it be better to improve the 32Bit writeq() code?

Well, now that I'm asking for specific details internally, I'm finding that
no one can find the failing machine any more.

I'm going to keep looking (and testing) for another day or two, and if
nothing pops up, will repost removing the 64bit define.

> Generally speaking (same for the WC issue), ifdefs (especially architecture
> specific ones) in driver code should be avoided.

Yes - I know. On the WC issue though, given how tightly bound the behavior
is with the platform as well as whether it provides a real benefit vs a
simple "it works", I don't believe this is one that I want to be "generic"
on.

-- james
Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
On Fri, Feb 16, 2018 at 08:53:44AM -0800, James Smart wrote:
> > Any reason you can't use writeq() on 32 Bit as well? There's a compat
> > version in linux/io-64-nonatomic-hi-lo.h.
>
> We actually ran into issues on the existence of writeq() on a 32bit
> platform. Thus this code block.

Oh can you elaborate more on the issue? I bet if we merge it that way, someone
comes around with a patch changing it to writeq() on 32Bit as well.

Wouldn't it be better to improve the 32Bit writeq() code?

Generally speaking (same for the WC issue), ifdefs (especially architecture
specific ones) in driver code should be avoided.

Thanks,
	Johannes
Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
On 2/14/2018 1:30 AM, Johannes Thumshirn wrote:
> On Tue, Feb 13, 2018 at 11:34:48AM -0800, James Smart wrote:
> [...]
> > diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> > index 3bff1f9c5df7..5e03b2c969e5 100644
> > --- a/drivers/scsi/lpfc/lpfc_sli.c
> > +++ b/drivers/scsi/lpfc/lpfc_sli.c
> > @@ -35,6 +35,9 @@
> >  #include
> >  #include
> >  #include
> > +#ifdef CONFIG_X86
> > +#include
> > +#endif
>
> Not needed anymore now you've killed set_memory_wc(), isn't it?

Agree... but we've done more timing, and it turns out ioremap_wc() on X86
isn't behaving quite the same as set_memory_wc(). It works, but it's actually
slower. I think ioremap_wc() is additionally making it cacheable, which seems
to be delaying the postings to the io bus (even if wc) until the memory
barrier, while set_memory_wc() seems to flush as soon as the cacheline is
filled.

Given everything we've seen so far, I'm going back to using set_memory_wc()
as it's the lowest-latency option we've measured.

> [...]
> > +	if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> > +		/* write to DPP aperture taking advantage of Combined Writes */
> > +		tmp = (uint8_t *)wqe;
> > +#ifdef CONFIG_64BIT
> > +		for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> > +			writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> > +#else
> > +		for (i = 0; i < q->entry_size; i += sizeof(uint32_t))
> > +			writel(*((uint32_t *)(tmp + i)), q->dpp_regaddr + i);
> > +#endif
> > +	}
> > +	/* ensure WQE bcopy and DPP flushed before doorbell write */
>
> Any reason you can't use writeq() on 32 Bit as well? There's a compat version
> in linux/io-64-nonatomic-hi-lo.h.

We actually ran into issues on the existence of writeq() on a 32bit platform.
Thus this code block.

-- james
Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
On Tue, Feb 13, 2018 at 11:34:48AM -0800, James Smart wrote:
[...]
> diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> index 3bff1f9c5df7..5e03b2c969e5 100644
> --- a/drivers/scsi/lpfc/lpfc_sli.c
> +++ b/drivers/scsi/lpfc/lpfc_sli.c
> @@ -35,6 +35,9 @@
>  #include
>  #include
>  #include
> +#ifdef CONFIG_X86
> +#include
> +#endif

Not needed anymore now you've killed set_memory_wc(), isn't it?

[...]
> +	if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> +		/* write to DPP aperture taking advantage of Combined Writes */
> +		tmp = (uint8_t *)wqe;
> +#ifdef CONFIG_64BIT
> +		for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> +			writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> +#else
> +		for (i = 0; i < q->entry_size; i += sizeof(uint32_t))
> +			writel(*((uint32_t *)(tmp + i)), q->dpp_regaddr + i);
> +#endif
> +	}
> +	/* ensure WQE bcopy and DPP flushed before doorbell write */

Any reason you can't use writeq() on 32 Bit as well? There's a compat version
in linux/io-64-nonatomic-hi-lo.h.

Thanks,
	Johannes
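
For readers unfamiliar with the compat header mentioned above: the hi-lo variant of writeq() splits one 64-bit store into two 32-bit stores, high half first. The following is only a userspace model of that idea, not kernel code. `struct mmio_model`, `model_writel()`, `model_hi_lo_writeq()` and `model_push_wqe()` are hypothetical names invented for this sketch, and the register array stands in for a real MMIO mapping. It shows how a single loop could serve both word sizes without a CONFIG_64BIT branch at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit-only MMIO window: an array of
 * 32-bit registers plus a log of the order in which they were written. */
#define MODEL_REGS 32

struct mmio_model {
	uint32_t regs[MODEL_REGS];
	size_t order[MODEL_REGS * 2];	/* byte offsets, in write order */
	size_t nwrites;
};

/* Model of a single 32-bit register write. */
static void model_writel(struct mmio_model *m, uint32_t val, size_t off)
{
	assert(off % 4 == 0 && off / 4 < MODEL_REGS);
	assert(m->nwrites < MODEL_REGS * 2);
	m->regs[off / 4] = val;
	m->order[m->nwrites++] = off;
}

/* Same shape as the hi-lo compat writeq(): high half first, then low. */
static void model_hi_lo_writeq(struct mmio_model *m, uint64_t val, size_t off)
{
	model_writel(m, (uint32_t)(val >> 32), off + 4);
	model_writel(m, (uint32_t)val, off);
}

/* One loop for every word size; no #ifdef at the call site. */
static void model_push_wqe(struct mmio_model *m, const void *wqe,
			   size_t entry_size)
{
	assert(entry_size % sizeof(uint64_t) == 0);
	for (size_t i = 0; i < entry_size; i += sizeof(uint64_t)) {
		uint64_t v;

		/* memcpy avoids the aliasing cast used in the patch */
		memcpy(&v, (const uint8_t *)wqe + i, sizeof(v));
		model_hi_lo_writeq(m, v, i);
	}
}
```

Note that the two 32-bit halves are separate bus writes and are not atomic as a pair. Whether that matters for a write-combined DPP aperture is not settled anywhere in this thread.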
[PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4
New if_type=6 adapters support an additional BAR that provides apertures to
allow direct WQE to adapter push support - termed Direct Packet Push (DPP).

WQ creation differs slightly to ask for a WQ to be DPP-ized. When submitting
a WQE to a DPP WQ, it is submitted to the host memory for the WQ normally,
but is also written by the host cpu directly to a BAR aperture. Write buffer
coalescing in hardware is (hopefully) turned on, enabling single pci write
operation support. The doorbell is then rung to indicate the WQE is available
and was pushed to the aperture.

This patch:
- Updates the WQ Create commands for the DPP options
- Adds the bar mapping for if_type=6 DPP bar
- Adds the WQE pushing to the DPP aperture received from WQ create
- Adds a new module parameter to disable DPP operation if desired.
  Default is enabled.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart

---
v3:
  remove unnecessary parens
  use ioremap_wc() instead of set_memory_wc(). the wc property is now set
    by default on the BAR. if direct push is disabled, the BAR won't be used
    so it won't matter what is set on it.
  Track cases where the ioremap_wc() may not succeed, leaving bar pointer
    NULL. In this case, disable direct push.
  As some platforms will honor ioremap_wc() but not truly enable wc, change
    default for direct push so enabled only on X86.
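
The submit sequence described in the commit message can be modelled in plain userspace C. This is a rough sketch under stated assumptions, not the lpfc implementation: `struct wq_model`, `wq_submit()`, the 64-byte entry size and the 4-entry ring are all made up for illustration, and ordinary memory stands in for both the host ring and the BAR aperture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WQE_SIZE   64	/* assumed entry size, for illustration only */
#define WQ_ENTRIES 4	/* assumed ring depth, for illustration only */

/* Hypothetical model of a work queue; none of these names come from lpfc. */
struct wq_model {
	uint8_t  host_ring[WQ_ENTRIES][WQE_SIZE]; /* WQ in host memory */
	uint8_t *dpp_aperture;	/* NULL if the BAR mapping failed */
	bool     dpp_enabled;	/* module-parameter style on/off switch */
	uint32_t host_index;	/* next free slot in host_ring */
	uint32_t doorbell;	/* slot number last "rung" */
	bool     doorbell_dpp;	/* doorbell said the WQE was pushed */
};

/* Returns true if the WQE was also pushed to the aperture. */
static bool wq_submit(struct wq_model *q, const uint8_t *wqe)
{
	bool pushed = false;
	uint32_t slot;

	assert(q != NULL && wqe != NULL);
	slot = q->host_index;

	/* 1. Normal path: the WQE always lands in the host-memory ring. */
	memcpy(q->host_ring[slot], wqe, WQE_SIZE);

	/* 2. Direct push: only if enabled and the aperture was mapped. */
	if (q->dpp_enabled && q->dpp_aperture) {
		for (size_t i = 0; i < WQE_SIZE; i += sizeof(uint64_t))
			memcpy(q->dpp_aperture + i, wqe + i, sizeof(uint64_t));
		pushed = true;
	}

	/* 3. A real driver needs a write barrier here so both copies are
	 *    visible before the doorbell; this plain-memory model does not. */

	/* 4. Ring the doorbell, flagging whether the aperture holds the WQE. */
	q->host_index = (slot + 1) % WQ_ENTRIES;
	q->doorbell = slot;
	q->doorbell_dpp = pushed;
	return pushed;
}
```

With `dpp_aperture` left NULL, the same call falls back to the host-memory path alone, which mirrors the v3 note about disabling direct push when the mapping does not succeed.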
---
 drivers/scsi/lpfc/lpfc.h      |   3 +-
 drivers/scsi/lpfc/lpfc_attr.c |  14 +++
 drivers/scsi/lpfc/lpfc_hw4.h  |  31 ++
 drivers/scsi/lpfc/lpfc_init.c |  17
 drivers/scsi/lpfc/lpfc_sli.c  | 218 ++
 drivers/scsi/lpfc/lpfc_sli4.h |  16 +++-
 6 files changed, 212 insertions(+), 87 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 9698b9635058..86ffb9756e65 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -840,7 +840,8 @@ struct lpfc_hba {
 	uint32_t cfg_enable_SmartSAN;
 	uint32_t cfg_enable_mds_diags;
 	uint32_t cfg_enable_fc4_type;
-	uint32_t cfg_enable_bbcr;	/*Enable BB Credit Recovery*/
+	uint32_t cfg_enable_bbcr;	/* Enable BB Credit Recovery */
+	uint32_t cfg_enable_dpp;	/* Enable Direct Packet Push */
 	uint32_t cfg_xri_split;
 #define LPFC_ENABLE_FCP  1
 #define LPFC_ENABLE_NVME 2
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 7be4bdef4d42..e90d5066f66b 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -5186,6 +5186,18 @@ LPFC_ATTR_R(enable_mds_diags, 0, 0, 1, "Enable MDS Diagnostics");
  */
 LPFC_BBCR_ATTR_RW(enable_bbcr, 1, 0, 1, "Enable BBC Recovery");
 
+/*
+ * lpfc_enable_dpp: Enable DPP on G7
+ *       0  = DPP on G7 disabled
+ *       1  = DPP on G7 enabled (default)
+ * Value range is [0,1]. Default value is 1 on X86, 0 on other architectures.
+ */
+#ifdef CONFIG_X86
+LPFC_ATTR_RW(enable_dpp, 1, 0, 1, "Enable Direct Packet Push");
+#else
+LPFC_ATTR_RW(enable_dpp, 0, 0, 1, "Enable Direct Packet Push");
+#endif
+
 struct device_attribute *lpfc_hba_attrs[] = {
 	&dev_attr_nvme_info,
 	&dev_attr_bg_info,
@@ -5294,6 +5306,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
 	&dev_attr_lpfc_xlane_supported,
 	&dev_attr_lpfc_enable_mds_diags,
 	&dev_attr_lpfc_enable_bbcr,
+	&dev_attr_lpfc_enable_dpp,
 	NULL,
 };
 
@@ -6306,6 +6319,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
 	lpfc_fcp_io_channel_init(phba, lpfc_fcp_io_channel);
 	lpfc_nvme_io_channel_init(phba, lpfc_nvme_io_channel);
 	lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr);
+	lpfc_enable_dpp_init(phba, lpfc_enable_dpp);
 
 	if (phba->sli_rev != LPFC_SLI_REV4) {
 		/* NVME only supported on SLI4 */
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 93fd9fd10a0f..60ccff6fa8b0 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -1372,6 +1372,15 @@ struct lpfc_mbx_wq_create {
 #define lpfc_mbx_wq_create_page_size_MASK	0x00FF
 #define lpfc_mbx_wq_create_page_size_WORD	word1
 #define LPFC_WQ_PAGE_SIZE_4096	0x1
+#define lpfc_mbx_wq_create_dpp_req_SHIFT	15
+#define lpfc_mbx_wq_create_dpp_req_MASK		0x0001
+#define lpfc_mbx_wq_create_dpp_req_WORD		word1
+#define lpfc_mbx_wq_create_doe_SHIFT		14
+#define lpfc_mbx_wq_create_doe_MASK		0x0001
+#define lpfc_mbx_wq_create_doe_WORD		word1
+#define lpfc_mbx_wq_create_toe_SHIFT		13
+#define lpfc_mbx_wq_create_toe_MASK		0x0001
+#define lpfc_mbx_wq_create_toe_WORD		word1
 #define lpfc_mbx_wq_create_wqe_size_SHIFT	8
 #define lpfc_mbx_wq_create_wqe_size_MASK	0x000F
 #define lpfc_mbx_wq_create_wqe_size_WORD	word1
@@ -1400,6 +1409,28 @@ struct lpfc_mbx_wq_create {
 #define lpfc_mbx_wq_create_db_format_MASK	0x
 #define