Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-21 Thread James Smart

On 2/20/2018 10:56 PM, Johannes Thumshirn wrote:

> > Yes - I know. On the WC issue though, given how tightly bound the behavior
> > is with the platform as well as whether it provides a real benefit vs a
> > simple "it works", I don't believe this is one that I want to be "generic"
> > on.
>
> Hmmm OK, but this won't make our ARM and PPC teams very happy.

They are free to contact me and suggest something different; we'll
benchmark it and will certainly change if there's a benefit. That's
what we did on PPC at the start of this thread.

-- james




Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-20 Thread Johannes Thumshirn
On Tue, Feb 20, 2018 at 10:25:56AM -0800, James Smart wrote:
> > Wouldn't it be better to improve the 32Bit writeq() code?
> 
> Well, now that I'm asking for specific details internally, I'm finding that
> no one can find the failing machine any more.
> 
> I'm going to keep looking (and testing) for another day or two, and if
> nothing pops up, will repost removing the 64bit define.

Thanks :-)

> Yes - I know. On the WC issue though, given how tightly bound the behavior
> is with the platform as well as whether it provides a real benefit vs a
> simple "it works", I don't believe this is one that I want to be "generic"
> on.

Hmmm OK, but this won't make our ARM and PPC teams very happy.

Well let's see,
 Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-20 Thread James Smart

On 2/19/2018 12:14 AM, Johannes Thumshirn wrote:

> On Fri, Feb 16, 2018 at 08:53:44AM -0800, James Smart wrote:
> > > Any reason you can't use writeq() on 32 Bit as well? There's a compat version
> > > in linux/io-64-nonatomic-hi-lo.h.
> >
> > We actually ran into issues on the existence of writeq() on a 32bit
> > platform. Thus this code block.
>
> Oh can you elaborate more on the issue? I bet if we merge it that way, someone
> comes around with a patch changing it to writeq() on 32Bit as well.
>
> Wouldn't it be better to improve the 32Bit writeq() code?

Well, now that I'm asking for specific details internally, I'm finding
that no one can find the failing machine any more.

I'm going to keep looking (and testing) for another day or two, and if
nothing pops up, will repost removing the 64bit define.

> Generally speaking (same for the WC issue), ifdefs (especially architecture
> specific ones) in driver code should be avoided.

Yes - I know. On the WC issue though, given how tightly bound the
behavior is with the platform as well as whether it provides a real
benefit vs a simple "it works", I don't believe this is one that I want
to be "generic" on.


-- james



Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-19 Thread Johannes Thumshirn
On Fri, Feb 16, 2018 at 08:53:44AM -0800, James Smart wrote:
> > Any reason you can't use writeq() on 32 Bit as well? There's a compat version
> > in linux/io-64-nonatomic-hi-lo.h.
> 
> We actually ran into issues on the existence of writeq() on a 32bit
> platform. Thus this code block.
 
Oh can you elaborate more on the issue? I bet if we merge it that way, someone
comes around with a patch changing it to writeq() on 32Bit as well.

Wouldn't it be better to improve the 32Bit writeq() code?

Generally speaking (same for the WC issue), ifdefs (especially architecture
specific ones) in driver code should be avoided.

Thanks,
Johannes


Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-16 Thread James Smart

On 2/14/2018 1:30 AM, Johannes Thumshirn wrote:

> On Tue, Feb 13, 2018 at 11:34:48AM -0800, James Smart wrote:
> [...]
> > diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> > index 3bff1f9c5df7..5e03b2c969e5 100644
> > --- a/drivers/scsi/lpfc/lpfc_sli.c
> > +++ b/drivers/scsi/lpfc/lpfc_sli.c
> > @@ -35,6 +35,9 @@
> >  #include
> >  #include
> >  #include
> > +#ifdef CONFIG_X86
> > +#include
> > +#endif
>
> Not needed anymore now you've killed set_memory_wc(), isn't it?

Agree... but, we've done more timing and it turns out the ioremap_wc()
on X86 isn't behaving quite the same as set_memory_wc(). It works, but
it's actually slower. I think ioremap_wc() is additionally making it
cacheable, which seems to be delaying the postings to the io bus (even
if wc) until the memory barrier, while set_memory_wc() seems to
flush as soon as the cacheline is filled.

Given everything we've seen so far - I'm going back to using
set_memory_wc() as it's the fastest latency option we've measured.





> [...]
>
> > +	if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> > +		/* write to DPP aperture taking advantage of Combined Writes */
> > +		tmp = (uint8_t *)wqe;
> > +#ifdef CONFIG_64BIT
> > +		for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> > +			writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> > +#else
> > +		for (i = 0; i < q->entry_size; i += sizeof(uint32_t))
> > +			writel(*((uint32_t *)(tmp + i)), q->dpp_regaddr + i);
> > +#endif
> > +	}
> > +	/* ensure WQE bcopy and DPP flushed before doorbell write */
>
> Any reason you can't use writeq() on 32 Bit as well? There's a compat version
> in linux/io-64-nonatomic-hi-lo.h.

We actually ran into issues on the existence of writeq() on a 32bit
platform. Thus this code block.
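
For context on the compat helper mentioned above: to the best of my knowledge, the hi_lo_writeq() fallback in linux/io-64-nonatomic-hi-lo.h splits one 64-bit store into two writel() calls, upper half first, so the device sees two separate 32-bit writes rather than one atomic 64-bit write. Below is a minimal userspace sketch of that ordering. The names fake_bar, sim_writel and sim_hi_lo_writeq are hypothetical stand-ins for illustration only; this is not the kernel code and it does not touch real MMIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register window (not a real BAR). */
struct fake_bar {
	uint8_t bytes[64];
};

/* Stand-in for writel(): store a 32-bit value little-endian at an offset. */
static void sim_writel(uint32_t val, struct fake_bar *bar, size_t off)
{
	assert(off + sizeof(val) <= sizeof(bar->bytes));
	for (size_t i = 0; i < sizeof(val); i++)
		bar->bytes[off + i] = (uint8_t)(val >> (8 * i));
}

/*
 * Non-atomic 64-bit write in "hi-lo" order: the upper 32 bits go to
 * off + 4 first, then the lower 32 bits go to off.
 */
static void sim_hi_lo_writeq(uint64_t val, struct fake_bar *bar, size_t off)
{
	sim_writel((uint32_t)(val >> 32), bar, off + 4);
	sim_writel((uint32_t)val, bar, off);
}
```

Whether two 32-bit writes are acceptable for the DPP aperture depends on how the adapter and the write-combining buffer treat them, which is the platform-specific question the thread is debating.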


-- james




Re: [PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-14 Thread Johannes Thumshirn
On Tue, Feb 13, 2018 at 11:34:48AM -0800, James Smart wrote:
[...]
> diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
> index 3bff1f9c5df7..5e03b2c969e5 100644
> --- a/drivers/scsi/lpfc/lpfc_sli.c
> +++ b/drivers/scsi/lpfc/lpfc_sli.c
> @@ -35,6 +35,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_X86
> +#include 
> +#endif

Not needed anymore now you've killed set_memory_wc(), isn't it?

[...]

> +	if (q->dpp_enable && q->phba->cfg_enable_dpp) {
> +		/* write to DPP aperture taking advantage of Combined Writes */
> +		tmp = (uint8_t *)wqe;
> +#ifdef CONFIG_64BIT
> +		for (i = 0; i < q->entry_size; i += sizeof(uint64_t))
> +			writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i);
> +#else
> +		for (i = 0; i < q->entry_size; i += sizeof(uint32_t))
> +			writel(*((uint32_t *)(tmp + i)), q->dpp_regaddr + i);
> +#endif
> +	}
> +	/* ensure WQE bcopy and DPP flushed before doorbell write */

Any reason you can't use writeq() on 32 Bit as well? There's a compat version
in linux/io-64-nonatomic-hi-lo.h.

Thanks,
Johannes


[PATCH v3 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-13 Thread James Smart
New if_type=6 adapters support an additional BAR that provides
apertures to allow direct WQE to adapter push support - termed
Direct Packet Push (DPP). WQ creation differs slightly to ask for
a WQ to be DPP-ized. When submitting a WQE to a DPP WQ, it is
submitted to the host memory for the WQ normally, but is also
written by the host cpu directly to a BAR aperture.  Write buffer
coalescing in hardware is (hopefully) turned on, enabling single
pci write operation support. The doorbell is then rung to indicate
the WQE is available and was pushed to the aperture.

This patch:
- Updates the WQ Create commands for the DPP options
- Adds the bar mapping for if_type=6 DPP bar
- Adds the WQE pushing to the DPP aperture received from WQ create
- Adds a new module parameter to disable DPP operation if desired.
  Default is enabled.
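
The submission order described above (host memory copy, aperture push, then doorbell) can be sketched in userspace roughly as follows. This is an illustrative model under stated assumptions, not the driver's implementation: struct sim_wq, its fields, and the sim_* functions are hypothetical, the "aperture" is ordinary memory rather than a write-combined BAR mapping, and the doorbell value is simplified to the slot index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a DPP work queue; field names are illustrative. */
struct sim_wq {
	uint8_t host_ring[4][64];	/* WQ entries in host memory */
	uint8_t aperture[64];		/* stand-in for the DPP BAR aperture */
	uint32_t doorbell;		/* stand-in for the doorbell register */
	uint32_t host_index;		/* next free slot in host_ring */
};

/* Copy one entry to the aperture in 64-bit chunks, mirroring the patch's loop. */
static void sim_push_to_aperture(struct sim_wq *wq, const uint8_t *wqe,
				 size_t entry_size)
{
	assert(entry_size <= sizeof(wq->aperture));
	assert(entry_size % sizeof(uint64_t) == 0);
	for (size_t i = 0; i < entry_size; i += sizeof(uint64_t)) {
		uint64_t chunk;

		memcpy(&chunk, wqe + i, sizeof(chunk));	/* no alignment assumption */
		memcpy(wq->aperture + i, &chunk, sizeof(chunk));
	}
}

/* Submit: 1) host memory copy, 2) aperture push, 3) doorbell. Returns the slot. */
static uint32_t sim_submit_wqe(struct sim_wq *wq, const uint8_t *wqe,
			       size_t entry_size)
{
	uint32_t slot = wq->host_index;

	assert(entry_size <= sizeof(wq->host_ring[0]));
	memcpy(wq->host_ring[slot], wqe, entry_size);
	sim_push_to_aperture(wq, wqe, entry_size);
	/* A real driver would issue a write barrier here, before the doorbell. */
	wq->doorbell = slot;
	wq->host_index = (slot + 1) % 4;
	return slot;
}
```

The point of the ordering is that the doorbell is the last write: by the time the adapter is told about the entry, both the host-memory copy and the pushed copy are meant to be visible.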

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 

---
v3:
  remove unnecessary parens
  use ioremap_wc() instead of set_memory_wc(). the wc property
is now set by default on the BAR. if direct push is disabled,
the BAR won't be used so it won't matter what is set on it.
Track cases where the ioremap_wc() may not succeed, leaving
bar pointer NULL. In this case, disable direct push.
  As some platforms will honor ioremap_wc() but not truly enable
wc, change default for direct push so enabled only on X86.
---
 drivers/scsi/lpfc/lpfc.h  |   3 +-
 drivers/scsi/lpfc/lpfc_attr.c |  14 +++
 drivers/scsi/lpfc/lpfc_hw4.h  |  31 ++
 drivers/scsi/lpfc/lpfc_init.c |  17 
 drivers/scsi/lpfc/lpfc_sli.c  | 218 ++
 drivers/scsi/lpfc/lpfc_sli4.h |  16 +++-
 6 files changed, 212 insertions(+), 87 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 9698b9635058..86ffb9756e65 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -840,7 +840,8 @@ struct lpfc_hba {
 	uint32_t cfg_enable_SmartSAN;
 	uint32_t cfg_enable_mds_diags;
 	uint32_t cfg_enable_fc4_type;
-	uint32_t cfg_enable_bbcr;	/*Enable BB Credit Recovery*/
+	uint32_t cfg_enable_bbcr;	/* Enable BB Credit Recovery */
+	uint32_t cfg_enable_dpp;	/* Enable Direct Packet Push */
 	uint32_t cfg_xri_split;
 #define LPFC_ENABLE_FCP  1
 #define LPFC_ENABLE_NVME 2
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 7be4bdef4d42..e90d5066f66b 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -5186,6 +5186,18 @@ LPFC_ATTR_R(enable_mds_diags, 0, 0, 1, "Enable MDS Diagnostics");
  */
 LPFC_BBCR_ATTR_RW(enable_bbcr, 1, 0, 1, "Enable BBC Recovery");
 
+/*
+ * lpfc_enable_dpp: Enable DPP on G7
+ *   0  = DPP on G7 disabled
+ *   1  = DPP on G7 enabled (default)
+ * Value range is [0,1]. Default value is 1 on X86, 0 on other architectures.
+ */
+#ifdef CONFIG_X86
+LPFC_ATTR_RW(enable_dpp, 1, 0, 1, "Enable Direct Packet Push");
+#else
+LPFC_ATTR_RW(enable_dpp, 0, 0, 1, "Enable Direct Packet Push");
+#endif
+
 struct device_attribute *lpfc_hba_attrs[] = {
 	&dev_attr_nvme_info,
 	&dev_attr_bg_info,
@@ -5294,6 +5306,7 @@ struct device_attribute *lpfc_hba_attrs[] = {
 	&dev_attr_lpfc_xlane_supported,
 	&dev_attr_lpfc_enable_mds_diags,
 	&dev_attr_lpfc_enable_bbcr,
+	&dev_attr_lpfc_enable_dpp,
 	NULL,
 };
 
@@ -6306,6 +6319,7 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
 	lpfc_fcp_io_channel_init(phba, lpfc_fcp_io_channel);
 	lpfc_nvme_io_channel_init(phba, lpfc_nvme_io_channel);
 	lpfc_enable_bbcr_init(phba, lpfc_enable_bbcr);
+	lpfc_enable_dpp_init(phba, lpfc_enable_dpp);
 
 	if (phba->sli_rev != LPFC_SLI_REV4) {
 		/* NVME only supported on SLI4 */
diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 93fd9fd10a0f..60ccff6fa8b0 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -1372,6 +1372,15 @@ struct lpfc_mbx_wq_create {
 #define lpfc_mbx_wq_create_page_size_MASK  0x00FF
 #define lpfc_mbx_wq_create_page_size_WORD  word1
 #define LPFC_WQ_PAGE_SIZE_4096 0x1
+#define lpfc_mbx_wq_create_dpp_req_SHIFT	15
+#define lpfc_mbx_wq_create_dpp_req_MASK	0x0001
+#define lpfc_mbx_wq_create_dpp_req_WORD	word1
+#define lpfc_mbx_wq_create_doe_SHIFT	14
+#define lpfc_mbx_wq_create_doe_MASK	0x0001
+#define lpfc_mbx_wq_create_doe_WORD	word1
+#define lpfc_mbx_wq_create_toe_SHIFT	13
+#define lpfc_mbx_wq_create_toe_MASK	0x0001
+#define lpfc_mbx_wq_create_toe_WORD	word1
 #define lpfc_mbx_wq_create_wqe_size_SHIFT  8
 #define lpfc_mbx_wq_create_wqe_size_MASK   0x000F
 #define lpfc_mbx_wq_create_wqe_size_WORD   word1
@@ -1400,6 +1409,28 @@ struct lpfc_mbx_wq_create {
 #define lpfc_mbx_wq_create_db_format_MASK  0x
 #define