Re: [PATCH RE-RESEND V2 for-next 5/5] IB/core: Fix deadlock on uverbs modify_qp error flow

2015-02-19 Thread Or Gerlitz


On 2/18/2015 8:07 AM, Roland Dreier wrote:

On Tue, Feb 17, 2015 at 9:03 PM, Or Gerlitz ogerl...@mellanox.com wrote:

Roland, can you please pick this one too, it's a bug fix and no comments
were raised against it.
I just failed to remove the Gerrit Issue: XXX line,  so if you can do that,
will be great


OK.


I noted that these commits which are targeted to 3.20 are @ your master 
branch but not on for-next, just

dropping you a note to make sure they are not left behind.


IB/core: Add on demand paging caps to ib_uverbs_ex_query_device
IB/core: Add support for extended query device caps
IB/core: Fix deadlock on uverbs modify_qp error flow
IB/core: Properly handle registration of on-demand paging MRs after dereg
IB/mlx4: Bug fixes in mlx4_ib_resize_cq
IB/mlx4: Fix memory leak in __mlx4_ib_modify_qp
IB/mlx4: In mlx4_ib_demux_cm, print out GUID in host-endian order
IB/mlx5: Enable the ODP capability query verb
IB/mlx5: Update the dev in reg_create

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-02-19 Thread Somnath Kotur
Shachar, 
Yes, it happened by mistake which I realized and immediately sent out 
the patch with the correct patch number 
 
Thanks
Som

 -Original Message-
 From: Shachar Raindel [mailto:rain...@mellanox.com]
 Sent: Thursday, February 19, 2015 2:31 PM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Devesh Sharma
 Subject: RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the
 moving of GID Table mgmt to IB/Core.
 
 
 
  -Original Message-
  From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
  ow...@vger.kernel.org] On Behalf Of Somnath Kotur
  Sent: Friday, February 20, 2015 12:02 AM
  To: rol...@kernel.org
  Cc: linux-rdma@vger.kernel.org; Somnath Kotur; Devesh Sharma
  Subject: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the
  moving of GID Table mgmt to IB/Core.
 
 
 Som, the patch number seems to be missing here.
 When sending next iteration, please make sure:
 - That all patches include the proper numbers
 - That the version of the patchset is cleanly indicated in the header. You can
 use --subject-prefix=PATCH V2 when using format-patch to make this
 happen.
 
 Thanks,
 --Shachar
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v1 00/30] IB/Core: Adding support for RoCEV2 Specification

2015-02-19 Thread Somnath Kotur
Hi Bart,
Here's the link to the git tree with the patches

https://github.com/matanb10/linux.git 
branch name: rocev2_rc4

Thanks
Som

 -Original Message-
 From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
 Sent: Thursday, February 19, 2015 1:47 PM
 To: Somnath Kotur; rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org
 Subject: Re: [PATCH v1 00/30] IB/Core: Adding support for RoCEV2
 Specification
 
 On 02/19/15 23:02, Somnath Kotur wrote:
  This series depends on RoCE LAG series (already accepted in net-next
  tree)
 
 Hello Somnath,
 
 Can you make a git tree available with these patches ? These patches do not
 apply cleanly on Dave Miller's latest net-next branch (git commit ID
 fece13ca005a5f559147e9424321f4b5e01272b4; Feb 17, 2015).
 
 Thanks,
 
 Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RE-RESEND V2 for-next 3/5] IB/core: Call ib_unregister_mad_agent() only for valid agents

2015-02-19 Thread Majd Dibbiny

On 17/2/2015 3:59 PM, Yann Droneaud wrote:
 Hi,

 Le lundi 16 février 2015 à 22:28 +0100, Yann Droneaud a écrit :
 Le jeudi 05 février 2015 à 13:53 +0200, Or Gerlitz a écrit :
 From: Majd Dibbiny m...@mellanox.com

 In some error flows, ib_mad_unregister_agent is being invoked also in cases 
 where
   ^^^
   ib_unregister_mad_agent()

 the ib_mad_register_agent call failed (resulting in an illegal pointer in 
 the
  ^
  ib_register_mad_agent()

 agent field).  This causes a kernel crash in the error flow.

 Fix this by calling ib_unregister_mad_agent only for cases where
 ib_register_mad_agent succeeded.

 The code was checking for struct ib_mad_agent *agent not being NULL, 
 while ib_register_mad_agent() returns a ERR_PTR() in case of error
 ... bad :(

 After reading Jason's comments [1], I don't understand the purpose of
 this patch. How can an ERR_PTR() value be present in the arrays ?

 A quick review makes me think that mthca and mlx4 error paths lacks NULL
 assignment after calling ib_unregister_mad_agent(). In other words, qib
 might be correct while the others should be fixed.
You are right. Seems like I missed things here.
I'll fix it and send a new version soon.

 [1] http://mid.gmane.org/20150205174337.gb31...@obsidianresearch.com

 Signed-off-by: Majd Dibbiny m...@mellanox.com
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/hw/mlx4/mad.c|2 +-
  drivers/infiniband/hw/mthca/mthca_mad.c |2 +-
  drivers/infiniband/hw/qib/qib_mad.c |2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)

 diff --git a/drivers/infiniband/hw/mlx4/mad.c 
 b/drivers/infiniband/hw/mlx4/mad.c
 index 82a7dd8..6be0d2c 100644
 --- a/drivers/infiniband/hw/mlx4/mad.c
 +++ b/drivers/infiniband/hw/mlx4/mad.c
 @@ -907,7 +907,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev)
  err:
 for (p = 0; p  dev-num_ports; ++p)
 for (q = 0; q = 1; ++q)
 -   if (dev-send_agent[p][q])
 +   if (!IS_ERR(dev-send_agent[p][q]))
 ib_unregister_mad_agent(dev-send_agent[p][q]);
  
 return ret;
 diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c 
 b/drivers/infiniband/hw/mthca/mthca_mad.c
 index 8881fa3..5f1a7ce 100644
 --- a/drivers/infiniband/hw/mthca/mthca_mad.c
 +++ b/drivers/infiniband/hw/mthca/mthca_mad.c
 @@ -317,7 +317,7 @@ int mthca_create_agents(struct mthca_dev *dev)
  err:
 for (p = 0; p  dev-limits.num_ports; ++p)
 for (q = 0; q = 1; ++q)
 -   if (dev-send_agent[p][q])
 +   if (!IS_ERR(dev-send_agent[p][q]))
 ib_unregister_mad_agent(dev-send_agent[p][q]);
  
 return ret;
 diff --git a/drivers/infiniband/hw/qib/qib_mad.c 
 b/drivers/infiniband/hw/qib/qib_mad.c
 index 636be11..6c7cc80 100644
 --- a/drivers/infiniband/hw/qib/qib_mad.c
 +++ b/drivers/infiniband/hw/qib/qib_mad.c
 @@ -2499,7 +2499,7 @@ int qib_create_agents(struct qib_ibdev *dev)
  err:
 for (p = 0; p  dd-num_pports; p++) {
 ibp = dd-pport[p].ibport_data;
 -   if (ibp-send_agent) {
 +   if (!IS_ERR(ibp-send_agent)) {
 agent = ibp-send_agent;
 ibp-send_agent = NULL;
 ib_unregister_mad_agent(agent);
 Regards.


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the moving of GID Table mgmt to IB/Core.

2015-02-19 Thread Shachar Raindel


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Somnath Kotur
 Sent: Friday, February 20, 2015 12:02 AM
 To: rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Somnath Kotur; Devesh Sharma
 Subject: [PATCH] RDMA/ocrdma: Changes in driver to incorporate the
 moving of GID Table mgmt to IB/Core.
 

Som, the patch number seems to be missing here.
When sending next iteration, please make sure:
- That all patches include the proper numbers
- That the version of the patchset is cleanly indicated in the header. You can 
use --subject-prefix=PATCH V2 when using format-patch to make this happen.

Thanks,
--Shachar
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/1] IB/mthca: remove deprecated use of pci api

2015-02-19 Thread Quentin Lambert
Replace occurences of the pci api by appropriate call to the dma api.

A simplified version of the semantic patch that finds this problem is as
follows: (http://coccinelle.lip6.fr)

@deprecated@
idexpression id;
position p;
@@

(
  pci_dma_supported@p ( id, ...)
|
  pci_alloc_consistent@p ( id, ...)
)

@bad1@
idexpression id;
position deprecated.p;
@@
...when != id-dev
   when != pci_get_drvdata ( id )
   when != pci_enable_device ( id )
(
  pci_dma_supported@p ( id, ...)
|
  pci_alloc_consistent@p ( id, ...)
)

@depends on !bad1@
idexpression id;
expression direction;
position deprecated.p;
@@

(
- pci_dma_supported@p ( id,
+ dma_supported ( id-dev,
...
+ , GFP_ATOMIC
  )
|
- pci_alloc_consistent@p ( id,
+ dma_alloc_coherent ( id-dev,
...
+ , GFP_ATOMIC
  )
)

Signed-off-by: Quentin Lambert lambert.quen...@gmail.com
---
 Changes since v1:
   - replace cast from pci enum to dma enum with approriate
 dma enum value as suggested by Roland Dreier

 drivers/infiniband/hw/mthca/mthca_eq.c  | 18 ++
 drivers/infiniband/hw/mthca/mthca_main.c|  8 
 drivers/infiniband/hw/mthca/mthca_memfree.c | 23 ++-
 3 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c 
b/drivers/infiniband/hw/mthca/mthca_eq.c
index 6902017..fd0c437 100644
--- a/drivers/infiniband/hw/mthca/mthca_eq.c
+++ b/drivers/infiniband/hw/mthca/mthca_eq.c
@@ -617,7 +617,7 @@ static void mthca_free_eq(struct mthca_dev *dev,
 
mthca_free_mr(dev, eq-mr);
for (i = 0; i  npages; ++i)
-   pci_free_consistent(dev-pdev, PAGE_SIZE,
+   dma_free_coherent(dev-pdev-dev, PAGE_SIZE,
eq-page_list[i].buf,
dma_unmap_addr(eq-page_list[i], mapping));
 
@@ -739,17 +739,19 @@ int mthca_map_eq_icm(struct mthca_dev *dev, u64 icm_virt)
dev-eq_table.icm_page = alloc_page(GFP_HIGHUSER);
if (!dev-eq_table.icm_page)
return -ENOMEM;
-   dev-eq_table.icm_dma  = pci_map_page(dev-pdev, 
dev-eq_table.icm_page, 0,
- PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(dev-pdev, dev-eq_table.icm_dma)) {
+   dev-eq_table.icm_dma  = dma_map_page(dev-pdev-dev,
+ dev-eq_table.icm_page, 0,
+ PAGE_SIZE, DMA_BIDIRECTIONAL);
+
+   if (dma_mapping_error(dev-pdev-dev, dev-eq_table.icm_dma)) {
__free_page(dev-eq_table.icm_page);
return -ENOMEM;
}
 
ret = mthca_MAP_ICM_page(dev, dev-eq_table.icm_dma, icm_virt);
if (ret) {
-   pci_unmap_page(dev-pdev, dev-eq_table.icm_dma, PAGE_SIZE,
-  PCI_DMA_BIDIRECTIONAL);
+   dma_unmap_page(dev-pdev-dev, dev-eq_table.icm_dma,
+  PAGE_SIZE, DMA_BIDIRECTIONAL);
__free_page(dev-eq_table.icm_page);
}
 
@@ -759,8 +761,8 @@ int mthca_map_eq_icm(struct mthca_dev *dev, u64 icm_virt)
 void mthca_unmap_eq_icm(struct mthca_dev *dev)
 {
mthca_UNMAP_ICM(dev, dev-eq_table.icm_virt, 1);
-   pci_unmap_page(dev-pdev, dev-eq_table.icm_dma, PAGE_SIZE,
-  PCI_DMA_BIDIRECTIONAL);
+   dma_unmap_page(dev-pdev-dev, dev-eq_table.icm_dma, PAGE_SIZE,
+  DMA_BIDIRECTIONAL);
__free_page(dev-eq_table.icm_page);
 }
 
diff --git a/drivers/infiniband/hw/mthca/mthca_main.c 
b/drivers/infiniband/hw/mthca/mthca_main.c
index ded76c1..2b52416 100644
--- a/drivers/infiniband/hw/mthca/mthca_main.c
+++ b/drivers/infiniband/hw/mthca/mthca_main.c
@@ -940,20 +940,20 @@ static int __mthca_init_one(struct pci_dev *pdev, int 
hca_type)
 
pci_set_master(pdev);
 
-   err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   err = dma_set_mask(pdev-dev, DMA_BIT_MASK(64));
if (err) {
dev_warn(pdev-dev, Warning: couldn't set 64-bit PCI DMA 
mask.\n);
-   err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   err = dma_set_mask(pdev-dev, DMA_BIT_MASK(32));
if (err) {
dev_err(pdev-dev, Can't set PCI DMA mask, 
aborting.\n);
goto err_free_res;
}
}
-   err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+   err = dma_set_coherent_mask(pdev-dev, DMA_BIT_MASK(64));
if (err) {
dev_warn(pdev-dev, Warning: couldn't set 64-bit 
 consistent PCI DMA mask.\n);
-   err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+   err = dma_set_coherent_mask(pdev-dev, DMA_BIT_MASK(32));
if (err) {
dev_err(pdev-dev, Can't set consistent PCI DMA mask, 

aborting.\n);
diff --git 

Re: [PATCH 02/30] IB/core: Add kref to IB devices

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 From: Matan Barak mat...@mellanox.com
 
 Previously. we used device_mutex lock in order to protect
 the device's list. That means that in order to guarantee a
 device isn't freed while we use it, we had to lock all
 devices.
 
 Adding a kref per IB device. Before an IB device
 is unregistered, we wait before its not held anymore.

Maybe I'm missing something, but IB device already has a struct device
embedded in it, which has a kobject with a kref. Wouldn't it be better
to simply move the operations that need to wait to the ib_device_release
function?

Regards,
Haggai
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/30] IB/core: Add kref to IB devices

2015-02-19 Thread Matan Barak
On Thu, Feb 19, 2015 at 12:57 PM, Haggai Eran hagg...@mellanox.com wrote:
 On 20/02/2015 00:02, Somnath Kotur wrote:
 From: Matan Barak mat...@mellanox.com

 Previously. we used device_mutex lock in order to protect
 the device's list. That means that in order to guarantee a
 device isn't freed while we use it, we had to lock all
 devices.

 Adding a kref per IB device. Before an IB device
 is unregistered, we wait before its not held anymore.

 Maybe I'm missing something, but IB device already has a struct device
 embedded in it, which has a kobject with a kref. Wouldn't it be better
 to simply move the operations that need to wait to the ib_device_release
 function?

You're correct, but in ib_unregister_device we delete all client's related data.
We would like to ensure that all references were released before this
data is being deleted and the device is still functioning rather than
when the IB device's
memory is freed.
ib_device_get and ib_device_hold are APIs for the clients, similar to
dev_hold and dev_put.

Regards,
Matan


 Regards,
 Haggai
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/30] IB/core: Support find sgid index using a filter function

2015-02-19 Thread Matan Barak



On 2/19/2015 2:41 PM, Haggai Eran wrote:

On 20/02/2015 00:02, Somnath Kotur wrote:

@@ -111,6 +111,33 @@ int ib_find_cached_gid_by_port(struct ib_device *device,
   struct net   *net,
   int  if_index,
   u16  *index);
+
+/**
+ * ib_find_cached_gid_by_port - Returns the GID table index where a specified

You probably meant ib_find_gid_by_filter.


Correct, we'll fix for next version.




+ * GID value occurs
+ * @device: The device to query.
+ * @gid: The GID value to search for.
+ * @port_num: The port number of the device where the GID value could be
+ *   searched.
+ * @filter: The filter function is executed on any matching GID in the table.
+ *   If the filter function returns true, the corresponding index is returned,
+ *   otherwise, we continue searching the GID table. It's guaranteed that
+ *   while filter is executed, ndev field is valid and the structure won't
+ *   change. filter is executed in an atomic context. filter must be NULL
+ *   when RoCE GID cache isn't supported on the respective device's port.
+ * @index: The index into the cached GID table where the GID was found.  This
+ *   parameter may be NULL.
+ *
+ * ib_find_cached_gid_by_port() searches for the specified GID value in

Here too.


Thanks,
Matan




+ * the local software cache.
+ */
+int ib_find_gid_by_filter(struct ib_device *device,
+ union ib_gid *gid,
+ u8 port_num,
+ bool (*filter)(const union ib_gid *gid,
+const struct ib_gid_attr *,
+void *),
+ void *context, u16 *index);
  /**
   * ib_get_cached_pkey - Returns a cached PKey table entry
   * @device: The device to query.


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/3] IB/core: Add support for extended query device caps

2015-02-19 Thread Matan Barak
On Wed, Feb 18, 2015 at 9:26 AM, Haggai Eran hagg...@mellanox.com wrote:
 From: Eli Cohen e...@mellanox.com

 Add extensible query device capabilities verb to allow adding new features.
 ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy
 capability fields to be used by both ib_uverbs_query_device and
 ib_uverbs_ex_query_device.

 Following the discussion about this patch [1], the code now validates the
 command's comp_mask is zero, returning -EINVAL for unknown values, in order to
 allow extending the verb in the future.

 The verb also checks the user-space provided response buffer size and only
 fills in capabilities that will fit in the buffer. In attempt to follow the
 spirit of presentation [2] by Tzahi Oved that was presented during OpenFabrics
 Alliance International Developer Workshop 2013, the comp_mask bits will only
 describe which fields are valid. Furthermore, fields that can simply be
 cleared when they are not supported, do not require a comp_mask bit at all.
 The verb returns a response_length field containing the actual number of bytes
 written by the kernel, so that a newer version running on an older kernel can
 tell which fields were actually returned.

 [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19
 http://thread.gmane.org/gmane.linux.kernel.api/7889/

 [2] 
 https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf

 Cc: Yann Droneaud ydrone...@opteya.com
 Cc: Ira Weiny ira.we...@intel.com
 Cc: Jason Gunthorpe jguntho...@obsidianresearch.com
 Signed-off-by: Eli Cohen e...@mellanox.com
 Signed-off-by: Haggai Eran hagg...@mellanox.com
 ---
  drivers/infiniband/core/uverbs.h  |   1 +
  drivers/infiniband/core/uverbs_cmd.c  | 135 
 +++---
  drivers/infiniband/core/uverbs_main.c |   1 +
  include/uapi/rdma/ib_user_verbs.h |  12 +++
  4 files changed, 108 insertions(+), 41 deletions(-)

 diff --git a/drivers/infiniband/core/uverbs.h 
 b/drivers/infiniband/core/uverbs.h
 index 643c08a025a5..b716b0815644 100644
 --- a/drivers/infiniband/core/uverbs.h
 +++ b/drivers/infiniband/core/uverbs.h
 @@ -258,5 +258,6 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);

  IB_UVERBS_DECLARE_EX_CMD(create_flow);
  IB_UVERBS_DECLARE_EX_CMD(destroy_flow);
 +IB_UVERBS_DECLARE_EX_CMD(query_device);

  #endif /* UVERBS_H */
 diff --git a/drivers/infiniband/core/uverbs_cmd.c 
 b/drivers/infiniband/core/uverbs_cmd.c
 index b7943ff16ed3..75ab6fef6de5 100644
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -400,6 +400,52 @@ err:
 return ret;
  }

 +static void copy_query_dev_fields(struct ib_uverbs_file *file,
 + struct ib_uverbs_query_device_resp *resp,
 + struct ib_device_attr *attr)
 +{
 +   resp-fw_ver= attr-fw_ver;
 +   resp-node_guid = file-device-ib_dev-node_guid;
 +   resp-sys_image_guid= attr-sys_image_guid;
 +   resp-max_mr_size   = attr-max_mr_size;
 +   resp-page_size_cap = attr-page_size_cap;
 +   resp-vendor_id = attr-vendor_id;
 +   resp-vendor_part_id= attr-vendor_part_id;
 +   resp-hw_ver= attr-hw_ver;
 +   resp-max_qp= attr-max_qp;
 +   resp-max_qp_wr = attr-max_qp_wr;
 +   resp-device_cap_flags  = attr-device_cap_flags;
 +   resp-max_sge   = attr-max_sge;
 +   resp-max_sge_rd= attr-max_sge_rd;
 +   resp-max_cq= attr-max_cq;
 +   resp-max_cqe   = attr-max_cqe;
 +   resp-max_mr= attr-max_mr;
 +   resp-max_pd= attr-max_pd;
 +   resp-max_qp_rd_atom= attr-max_qp_rd_atom;
 +   resp-max_ee_rd_atom= attr-max_ee_rd_atom;
 +   resp-max_res_rd_atom   = attr-max_res_rd_atom;
 +   resp-max_qp_init_rd_atom   = attr-max_qp_init_rd_atom;
 +   resp-max_ee_init_rd_atom   = attr-max_ee_init_rd_atom;
 +   resp-atomic_cap= attr-atomic_cap;
 +   resp-max_ee= attr-max_ee;
 +   resp-max_rdd   = attr-max_rdd;
 +   resp-max_mw= attr-max_mw;
 +   resp-max_raw_ipv6_qp   = attr-max_raw_ipv6_qp;
 +   resp-max_raw_ethy_qp   = attr-max_raw_ethy_qp;
 +   resp-max_mcast_grp = attr-max_mcast_grp;
 +   resp-max_mcast_qp_attach   = attr-max_mcast_qp_attach;
 +   resp-max_total_mcast_qp_attach = attr-max_total_mcast_qp_attach;
 +   resp-max_ah= attr-max_ah;
 +   resp-max_fmr   = attr-max_fmr;
 +   resp-max_map_per_fmr   = attr-max_map_per_fmr;
 +   resp-max_srq   = attr-max_srq;
 +   resp-max_srq_wr= attr-max_srq_wr;
 +   resp-max_srq_sge   = attr-max_srq_sge;
 +   

Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Matan Barak



On 2/19/2015 2:52 PM, Haggai Eran wrote:

On 20/02/2015 00:02, Somnath Kotur wrote:

--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2089,15 +2089,16 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
attr-alt_ah_attr.port_num   = cmd.alt_dest.port_num;

if (qp-real_qp == qp) {
-   ret = ib_resolve_eth_l2_attrs(qp, attr, cmd.attr_mask);
+   ret = ib_resolve_eth_dmac(qp, attr, cmd.attr_mask);
if (ret)
-   goto out;
+   goto out_put;
ret = qp-device-modify_qp(qp, attr,
modify_qp_mask(qp-qp_type, cmd.attr_mask), udata);
} else {
ret = ib_modify_qp(qp, attr, modify_qp_mask(qp-qp_type, 
cmd.attr_mask));
}

+out_put:
put_qp_read(qp);


This seem to solve a leak of QP references when dmac resolution fails.
I'm not sure it belongs with the rest of the patch.


I'll remove the fix from this patch.

Thanks!


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Or Gerlitz
On Thu, Feb 19, 2015 at 3:22 PM, Matan Barak mat...@mellanox.com wrote:
 On 2/19/2015 2:52 PM, Haggai Eran wrote:
 On 20/02/2015 00:02, Somnath Kotur wrote:
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -2089,15 +2089,16 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file
 *file,
 attr-alt_ah_attr.port_num  = cmd.alt_dest.port_num;

 if (qp-real_qp == qp) {
 -   ret = ib_resolve_eth_l2_attrs(qp, attr, cmd.attr_mask);
 +   ret = ib_resolve_eth_dmac(qp, attr, cmd.attr_mask);
 if (ret)
 -   goto out;
 +   goto out_put;
 ret = qp-device-modify_qp(qp, attr,
 modify_qp_mask(qp-qp_type, cmd.attr_mask),
 udata);
 } else {
 ret = ib_modify_qp(qp, attr, modify_qp_mask(qp-qp_type,
 cmd.attr_mask));
 }

 +out_put:
 put_qp_read(qp);


 This seem to solve a leak of QP references when dmac resolution fails.
 I'm not sure it belongs with the rest of the patch.


 I'll remove the fix from this patch.

we just pushed a fix to this leak

http://git.kernel.org/cgit/linux/kernel/git/roland/infiniband.git/commit/?id=0fb8bcf022f19a375d7c4bd79ac513da8ae6d78b
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/30] IB/core: Support find sgid index using a filter function

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 @@ -111,6 +111,33 @@ int ib_find_cached_gid_by_port(struct ib_device *device,
  struct net   *net,
  int  if_index,
  u16  *index);
 +
 +/**
 + * ib_find_cached_gid_by_port - Returns the GID table index where a specified
You probably meant ib_find_gid_by_filter.

 + * GID value occurs
 + * @device: The device to query.
 + * @gid: The GID value to search for.
 + * @port_num: The port number of the device where the GID value could be
 + *   searched.
 + * @filter: The filter function is executed on any matching GID in the table.
 + *   If the filter function returns true, the corresponding index is 
 returned,
 + *   otherwise, we continue searching the GID table. It's guaranteed that
 + *   while filter is executed, ndev field is valid and the structure won't
 + *   change. filter is executed in an atomic context. filter must be NULL
 + *   when RoCE GID cache isn't supported on the respective device's port.
 + * @index: The index into the cached GID table where the GID was found.  This
 + *   parameter may be NULL.
 + *
 + * ib_find_cached_gid_by_port() searches for the specified GID value in
Here too.

 + * the local software cache.
 + */
 +int ib_find_gid_by_filter(struct ib_device *device,
 +   union ib_gid *gid,
 +   u8 port_num,
 +   bool (*filter)(const union ib_gid *gid,
 +  const struct ib_gid_attr *,
 +  void *),
 +   void *context, u16 *index);
  /**
   * ib_get_cached_pkey - Returns a cached PKey table entry
   * @device: The device to query.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 --- a/drivers/infiniband/core/uverbs_cmd.c
 +++ b/drivers/infiniband/core/uverbs_cmd.c
 @@ -2089,15 +2089,16 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file 
 *file,
   attr-alt_ah_attr.port_num  = cmd.alt_dest.port_num;
  
   if (qp-real_qp == qp) {
 - ret = ib_resolve_eth_l2_attrs(qp, attr, cmd.attr_mask);
 + ret = ib_resolve_eth_dmac(qp, attr, cmd.attr_mask);
   if (ret)
 - goto out;
 + goto out_put;
   ret = qp-device-modify_qp(qp, attr,
   modify_qp_mask(qp-qp_type, cmd.attr_mask), udata);
   } else {
   ret = ib_modify_qp(qp, attr, modify_qp_mask(qp-qp_type, 
 cmd.attr_mask));
   }
  
 +out_put:
   put_qp_read(qp);

This seem to solve a leak of QP references when dmac resolution fails.
I'm not sure it belongs with the rest of the patch.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 30/30] IB/cma: Join and leave multicast groups with IGMP

2015-02-19 Thread Shachar Raindel


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Somnath Kotur
 Sent: Friday, February 20, 2015 12:03 AM
 To: rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Moni Shoua; Somnath Kotur
 Subject: [PATCH 30/30] IB/cma: Join and leave multicast groups with IGMP
 
 From: Moni Shoua mo...@mellanox.com
 
 Since RoCEv2 is a protocol over IP header it is required to send IGMP
 join and leave requests to the network when joining and leaving
 multicast groups.
 

Are you planning to support also IPv6 multicast? The patch below only support 
IPv4 multicast.

 Signed-off-by: Moni Shoua mo...@mellanox.com
 Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
 ---
  drivers/infiniband/core/cma.c |   78
 ++--
  1 files changed, 74 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/infiniband/core/cma.c
 b/drivers/infiniband/core/cma.c
 index 50635fe..6e658e8 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -38,6 +38,7 @@
  #include linux/in6.h
  #include linux/mutex.h
  #include linux/random.h
 +#include linux/igmp.h
  #include linux/idr.h
  #include linux/inetdevice.h
  #include linux/slab.h
 @@ -185,6 +186,7 @@ struct rdma_id_private {
   u8  reuseaddr;
   u8  afonly;
   enum ib_gid_typegid_type;
 + booligmp_joined;
  };
 
  struct cma_multicast {
 @@ -283,6 +285,26 @@ static inline void cma_set_ip_ver(struct cma_hdr
 *hdr, u8 ip_ver)
   hdr-ip_version = (ip_ver  4) | (hdr-ip_version  0xF);
  }
 
 +static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid,
 bool join)
 +{
 + struct in_device *in_dev = NULL;
 +
 + if (ndev) {
 + rtnl_lock();
 + in_dev = __in_dev_get_rtnl(ndev);
 + if (in_dev) {
 + if (join)
You should be iterating here over all ifa entries of the netdev, and update the 
one which is associated with the same IP address as the sgid joining the listen 
state. The current code will break on devices with multiple IP addresses.

 + ip_mc_inc_group(in_dev,
 + *(__be32 *)(mgid-raw+12));

The +12 magic number here seems wrong. Consider moving the GID-IP multicast 
to a small helper function/macro.

 + else
 + ip_mc_dec_group(in_dev,
 + *(__be32 *)(mgid-raw+12));

If the in_dev underwent a restart, wouldn't you end up here decreasing the 
group reference count to -1?

 + }
 + rtnl_unlock();
 + }
 + return (in_dev) ? 0 : -ENODEV;
 +}
 +
  static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 struct cma_device *cma_dev)
  {
 @@ -585,6 +607,7 @@ struct rdma_cm_id
 *rdma_create_id(rdma_cm_event_handler event_handler,
   INIT_LIST_HEAD(id_priv-listen_list);
   INIT_LIST_HEAD(id_priv-mc_list);
   get_random_bytes(id_priv-seq_num, sizeof id_priv-seq_num);
 + id_priv-igmp_joined = false;
 
   return id_priv-id;
  }
 @@ -1076,6 +1099,20 @@ static void cma_leave_mc_groups(struct
 rdma_id_private *id_priv)
   kfree(mc);
   break;
   case IB_LINK_LAYER_ETHERNET:
 + if (id_priv-igmp_joined) {

Anything preventing a racing join on the same ID from increasing the reference 
count and setting igmp_joined after we read it?

 + struct rdma_dev_addr *dev_addr = id_priv-
 id.route.addr.dev_addr;
 + struct net_device *ndev = NULL;
 +
 + if (dev_addr-bound_dev_if)
 + ndev = dev_get_by_index(init_net,
 + 
 dev_addr-bound_dev_if);

Why are you getting the interface again here? Also, why are you not using the 
GID table for this lookup, but instead using the direct netdev API functions?

 + if (ndev) {
 + cma_igmp_send(ndev,
 +   
 mc-multicast.ib-rec.mgid,
 +   false);
 + dev_put(ndev);
 + }
 + }
   kref_put(mc-mcref, release_mc);
   break;
   default:
 @@ -3356,7 +3393,7 @@ static int cma_iboe_join_multicast(struct
 rdma_id_private *id_priv,
  {
   struct iboe_mcast_work *work;
   struct rdma_dev_addr *dev_addr = id_priv-id.route.addr.dev_addr;
 - int err;
 + int err = 0;
   struct sockaddr *addr = (struct sockaddr *)mc-addr;
   struct net_device *ndev = NULL;
 
 @@ -3388,13 +3425,31 @@ static int 

Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Matan Barak



On 2/19/2015 4:37 PM, Haggai Eran wrote:

On 20/02/2015 00:02, Somnath Kotur wrote:

@@ -203,21 +235,30 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 
port_num, struct ib_wc *wc,

memset(ah_attr, 0, sizeof *ah_attr);
if (is_eth) {
+   u16 vlan_id = wc-wc_flags  IB_WC_WITH_VLAN ?
+   wc-vlan_id : 0x;
+
if (!(wc-wc_flags  IB_WC_GRH))
return -EPROTOTYPE;

-   if (wc-wc_flags  IB_WC_WITH_SMAC 
-   wc-wc_flags  IB_WC_WITH_VLAN) {
-   memcpy(ah_attr-dmac, wc-smac, ETH_ALEN);
-   ah_attr-vlan_id = wc-vlan_id;
-   } else {
+   if (!(wc-wc_flags  IB_WC_WITH_SMAC) ||
+   !(wc-wc_flags  IB_WC_WITH_VLAN)) {
ret = rdma_addr_find_dmac_by_grh(grh-dgid, grh-sgid,
-   ah_attr-dmac, ah_attr-vlan_id);
+ah_attr-dmac,
+wc-wc_flags  
IB_WC_WITH_VLAN ?
+NULL : vlan_id,
+0);
if (ret)
return ret;
}
-   } else {
-   ah_attr-vlan_id = 0x;


Previously vlan_id would get set to 0x on non-Ethernet link-layer,
and now it is left as zero. Wouldn't that break things for non-Ethernet
protocols?


On non-Ethernet link-later, vlan_id was ignored. This field was deleted 
in this patchset.





+
+   ret = get_sgid_index_from_eth(device, port_num, vlan_id,
+ grh-dgid, gid_index);
+   if (ret)
+   return ret;
+
+   if (wc-wc_flags  IB_WC_WITH_SMAC)
+   memcpy(ah_attr-dmac, wc-smac, ETH_ALEN);
}

ah_attr-dlid = wc-slid;


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Matan Barak



On 2/19/2015 5:03 PM, Haggai Eran wrote:

On 20/02/2015 00:02, Somnath Kotur wrote:

@@ -502,9 +547,7 @@ EXPORT_SYMBOL(ib_create_qp);
  static const struct {
int valid;
enum ib_qp_attr_maskreq_param[IB_QPT_MAX];
-   enum ib_qp_attr_maskreq_param_add_eth[IB_QPT_MAX];
enum ib_qp_attr_maskopt_param[IB_QPT_MAX];
-   enum ib_qp_attr_maskopt_param_add_eth[IB_QPT_MAX];
  } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
[IB_QPS_RESET] = {
[IB_QPS_RESET] = { .valid = 1 },
@@ -585,12 +628,6 @@ static const struct {
IB_QP_MAX_DEST_RD_ATOMIC
|
IB_QP_MIN_RNR_TIMER),
},
-   .req_param_add_eth = {
-   [IB_QPT_RC]  = (IB_QP_SMAC),
-   [IB_QPT_UC]  = (IB_QP_SMAC),
-   [IB_QPT_XRC_INI]  = (IB_QP_SMAC),
-   [IB_QPT_XRC_TGT]  = (IB_QP_SMAC)
-   },
.opt_param = {
 [IB_QPT_UD]  = (IB_QP_PKEY_INDEX   
|
 IB_QP_QKEY),
@@ -611,21 +648,7 @@ static const struct {
 [IB_QPT_GSI] = (IB_QP_PKEY_INDEX   
|
 IB_QP_QKEY),
 },
-   .opt_param_add_eth = {
-   [IB_QPT_RC]  = (IB_QP_ALT_SMAC  
|
-   IB_QP_VID   
|
-   IB_QP_ALT_VID),
-   [IB_QPT_UC]  = (IB_QP_ALT_SMAC  
|
-   IB_QP_VID   
|
-   IB_QP_ALT_VID),
-   [IB_QPT_XRC_INI]  = (IB_QP_ALT_SMAC 
|
-   IB_QP_VID   
|
-   IB_QP_ALT_VID),
-   [IB_QPT_XRC_TGT]  = (IB_QP_ALT_SMAC 
|
-   IB_QP_VID   
|
-   IB_QP_ALT_VID)
-   }
-   }
+   },
},
[IB_QPS_RTR]   = {
[IB_QPS_RESET] = { .valid = 1 },
@@ -847,13 +870,6 @@ int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum 
ib_qp_state next_state,
req_param = qp_state_table[cur_state][next_state].req_param[type];
opt_param = qp_state_table[cur_state][next_state].opt_param[type];

-   if (ll == IB_LINK_LAYER_ETHERNET) {
-   req_param |= qp_state_table[cur_state][next_state].
-   req_param_add_eth[type];
-   opt_param |= qp_state_table[cur_state][next_state].
-   opt_param_add_eth[type];
-   }
-
if ((mask  req_param) != req_param)
return 0;



I understand this patch will remove any kernel reference to these
modify_qp attributes. However, what about user-space? Was it previously
allowed to pass in these parameters?



There was no libibverbs that declared those flags. It was filled by 
ib_resolve_eth_l2_attrs. If someone wrote a custom libibverbs that 
passed those flags, they would have just been ignored. We could replace 
them as reserved flags. What do you think?


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Matan Barak



On 2/19/2015 5:18 PM, Haggai Eran wrote:

On 20/02/2015 00:02, Somnath Kotur wrote:

--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr 
*src_addr,
  }

  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 
*dmac,
-  u16 *vlan_id)
+  u16 *vlan_id, int if_index)
  {
int ret = 0;
struct rdma_dev_addr dev_addr;
@@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union 
ib_gid *dgid, u8 *dmac,
return ret;

memset(dev_addr, 0, sizeof(dev_addr));
+   dev_addr.bound_dev_if = if_index;


There's a call to rdma_resolve_ip later in this function. I think it
overrides the if_index you store here.

I would expect this function to somehow limit the address resolution
done by rdma_resolve_ip so that the result has to use if_index as a
source interface.



I'm not sure that's correct:
static int addr4_resolve(struct sockaddr_in *src_in,
 struct sockaddr_in *dst_in,
 struct rdma_dev_addr *addr)
{
__be32 src_ip = src_in-sin_addr.s_addr;
__be32 dst_ip = dst_in-sin_addr.s_addr;
struct rtable *rt;
struct flowi4 fl4;
int ret;

memset(fl4, 0, sizeof(fl4));
fl4.daddr = dst_ip;
fl4.saddr = src_ip;
fl4.flowi4_oif = addr-bound_dev_if;

static int addr6_resolve(struct sockaddr_in6 *src_in,
 struct sockaddr_in6 *dst_in,
 struct rdma_dev_addr *addr)
{
struct flowi6 fl6;
struct dst_entry *dst;
struct rt6_info *rt;
int ret;

memset(fl6, 0, sizeof fl6);
fl6.daddr = dst_in-sin6_addr;
fl6.saddr = src_in-sin6_addr;
fl6.flowi6_oif = addr-bound_dev_if;


bound_dev_if is the if_index we would like to use.




ctx.addr = dev_addr;
init_completion(ctx.comp);



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 --- a/drivers/infiniband/core/addr.c
 +++ b/drivers/infiniband/core/addr.c
 @@ -458,7 +458,7 @@ static void resolve_cb(int status, struct sockaddr 
 *src_addr,
  }
  
  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 
 *dmac,
 -u16 *vlan_id)
 +u16 *vlan_id, int if_index)
  {
   int ret = 0;
   struct rdma_dev_addr dev_addr;
 @@ -481,6 +481,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union 
 ib_gid *dgid, u8 *dmac,
   return ret;
  
   memset(dev_addr, 0, sizeof(dev_addr));
 + dev_addr.bound_dev_if = if_index;

There's a call to rdma_resolve_ip later in this function. I think it
overrides the if_index you store here.

I would expect this function to somehow limit the address resolution
done by rdma_resolve_ip so that the result has to use if_index as a
source interface.

  
   ctx.addr = dev_addr;
   init_completion(ctx.comp);

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/3] IB/core: Add support for extended query device caps

2015-02-19 Thread Jason Gunthorpe
On Thu, Feb 19, 2015 at 02:30:29PM +0200, Matan Barak wrote:

 I think we should all agree about the extension verbs mechanism before
 taking it further.
 IMHO, using comp_mask only for fields that their valid value != 0 is 
 confusing.
 For example, an old kernel should support a command if:
 (a) it knows all its valid bits
 (b) all fields after the size it knows is 0
 
 In the old schema, the kernel only needs to look at the comp_mask
 bits and execute the command or reject it.

It doesn't really change anything, the kernel still has to go field by
field to check, all that happens is

if ((comp_mask  BIT)  offsetof(field..)  size)
  // field is valid

becomes

if (offsetof(field..)  size  field != 0)
  // field is valid

 Furthermore, ibv_create_flow and ibv_destroy_flow were already
 accepted using the old schema.

IIRC they didn't have values that could meaningfully be 0?

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/30] IB/core: Add gid_type to path and rdma_id_private

2015-02-19 Thread Matan Barak



On 2/19/2015 5:51 PM, Haggai Eran wrote:

On 20/02/2015 00:02, Somnath Kotur wrote:

@@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work)
struct ib_cm_id *cm_id;
struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
struct cm_req_msg *req_msg;
+   union ib_gid gid;
+   struct ib_gid_attr gid_attr;
int ret;

req_msg = (struct cm_req_msg *)work-mad_recv_wc-recv_buf.mad;
@@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work)
cm_format_paths_from_req(req_msg, work-path[0], work-path[1]);

memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN);
-   ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
+   ret = ib_get_cached_gid(work-port-cm_dev-ib_device,
+   work-port-port_num,
+   cm_id_priv-av.ah_attr.grh.sgid_index,
+   gid, gid_attr);
+   if (!ret) {
+   work-path[0].gid_type = gid_attr.gid_type;
+   ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
+   }
if (ret) {
ib_get_cached_gid(work-port-cm_dev-ib_device,
  work-port-port_num, 0, work-path[0].sgid,
- NULL);
+ gid_attr);
+   work-path[0].gid_type = gid_attr.gid_type;
ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
   work-path[0].sgid, sizeof work-path[0].sgid,
   NULL, 0);


Is there support for alternative path? (Because I don't see a similar
change for path[1]).



No, support was never implemented for path[1]. We filled some attributes 
of path[1] before, but no vendor implemented any support for it (in RoCE 
obviously).

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 29/30] IB/mlx4: Create and use another QP1 for RoCEv2

2015-02-19 Thread Shachar Raindel


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Somnath Kotur
 Sent: Friday, February 20, 2015 12:03 AM
 To: rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Moni Shoua; Somnath Kotur
 Subject: [PATCH 29/30] IB/mlx4: Create and use another QP1 for RoCEv2
 
 From: Moni Shoua mo...@mellanox.com
 
 The mlx4 driver uses a special QP to implement the GSI QP. This kind of
 QP allows to build the InfiniBand headers in SW to be put before the
 payload that comes in with the WR. The mlx4 HW builds the packet,
 calculates the ICRC and puts it at the end of the payload. This ICRC
 calculation however depends on the QP configuration which is determined
 when QP is modified (roce_mode during INIT-RTR). On the other hand,
 ICRC
 verification when packet is received does to depend on this

The grammar in this sentence is broken. Generally speaking, it is hard to 
understand what the patch does and aim to do from this commit message.
Please rephrase.

 configuration.
 Therefore, using 2 GSI QPs for send (one for each RoCE version) and 1
 GSI QP for receive are required.
 
 Signed-off-by: Moni Shoua mo...@mellanox.com
 Signed-off-by: Somnath Kotur somnath.ko...@emulex.com
 ---
  drivers/infiniband/hw/mlx4/mlx4_ib.h |7 ++
  drivers/infiniband/hw/mlx4/qp.c  |  154
 ++
  2 files changed, 143 insertions(+), 18 deletions(-)
 
 diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h
 b/drivers/infiniband/hw/mlx4/mlx4_ib.h
 index 018bda6..a853330 100644
 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
 +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
 @@ -159,11 +159,18 @@ struct mlx4_ib_wq {
   unsignedtail;
  };
 
 +enum {
 + MLX4_IB_QP_CREATE_ROCE_V2_GSI = IB_QP_CREATE_RESERVED_START
 +};
 +
  enum mlx4_ib_qp_flags {
   MLX4_IB_QP_LSO = IB_QP_CREATE_IPOIB_UD_LSO,
   MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK =
 IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK,
   MLX4_IB_QP_NETIF = IB_QP_CREATE_NETIF_QP,
   MLX4_IB_QP_CREATE_USE_GFP_NOIO = IB_QP_CREATE_USE_GFP_NOIO,
 +
 + /* Mellanox specific flags start from IB_QP_CREATE_RESERVED_START
 */
 + MLX4_IB_ROCE_V2_GSI_QP = MLX4_IB_QP_CREATE_ROCE_V2_GSI,

Why do you need 2 enums for this flag definition?
Also, the comment refers to IB_QP_CREATE_RESERVED_START, which is not appearing 
here.

   MLX4_IB_SRIOV_TUNNEL_QP = 1  30,
   MLX4_IB_SRIOV_SQP = 1  31,
  };
 diff --git a/drivers/infiniband/hw/mlx4/qp.c
 b/drivers/infiniband/hw/mlx4/qp.c
 index 9996527..161b933 100644
 --- a/drivers/infiniband/hw/mlx4/qp.c
 +++ b/drivers/infiniband/hw/mlx4/qp.c
 @@ -81,6 +81,7 @@ struct mlx4_ib_sqp {
   u32 send_psn;
   struct ib_ud_header ud_header;
   u8  header_buf[MLX4_IB_UD_HEADER_SIZE];
 + struct ib_qp*roce_v2_gsi;
  };
 
  enum {
 @@ -150,7 +151,10 @@ static int is_sqp(struct mlx4_ib_dev *dev, struct
 mlx4_ib_qp *qp)
   }
   }
   }
 - return proxy_sqp;
 + if (proxy_sqp)
 + return 1;
 +
 + return !!(qp-flags  MLX4_IB_ROCE_V2_GSI_QP);

What are the implications of this double-QP scheme in virtualization scenario?

  }
 
  /* used for INIT/CLOSE port logic */
 @@ -672,6 +676,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev,
 struct ib_pd *pd,
   qp = sqp-qp;
   qp-pri.vid = 0x;
   qp-alt.vid = 0x;
 + sqp-roce_v2_gsi = NULL;
   } else {
   qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp);
   if (!qp)
 @@ -1029,9 +1034,17 @@ static void destroy_qp_common(struct mlx4_ib_dev
 *dev, struct mlx4_ib_qp *qp,
   del_gid_entries(qp);
  }
 
 -static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr
 *attr)
 +static int get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr
 *attr)
  {
   /* Native or PPF */
 + if ((!mlx4_is_mfunc(dev-dev) || mlx4_is_master(dev-dev)) 
 + attr-create_flags  MLX4_IB_QP_CREATE_ROCE_V2_GSI) {
 + int sqpn;
 + int res = mlx4_qp_reserve_range(dev-dev, 1, 1, sqpn, 0);
 +
 + return res ? -abs(res) : sqpn;

This seems wrong. Mlx4_qp_reserve_range returns either 0 (success) or negative 
errno value on error. The check here should therefore say res ? res : sqpn;.

If you want, you can add a WARN_ON(res  0); above it.

 + }
 +
   if (!mlx4_is_mfunc(dev-dev) ||
   (mlx4_is_master(dev-dev) 
attr-create_flags  MLX4_IB_SRIOV_SQP)) {
 @@ -1039,6 +1052,7 @@ static u32 get_sqp_num(struct mlx4_ib_dev *dev,
 struct ib_qp_init_attr *attr)
   (attr-qp_type == IB_QPT_SMI ? 0 : 2) +
   attr-port_num - 1;
   }
 +
   /* PF or VF -- creating proxies */
   if (attr-qp_type == IB_QPT_SMI)
   return 

Re: [PATCH 10/30] IB/core: Add gid_type to path and rdma_id_private

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 @@ -1521,6 +1520,8 @@ static int cm_req_handler(struct cm_work *work)
   struct ib_cm_id *cm_id;
   struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
   struct cm_req_msg *req_msg;
 + union ib_gid gid;
 + struct ib_gid_attr gid_attr;
   int ret;
  
   req_msg = (struct cm_req_msg *)work-mad_recv_wc-recv_buf.mad;
 @@ -1560,11 +1561,19 @@ static int cm_req_handler(struct cm_work *work)
   cm_format_paths_from_req(req_msg, work-path[0], work-path[1]);
  
   memcpy(work-path[0].dmac, cm_id_priv-av.ah_attr.dmac, ETH_ALEN);
 - ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
 + ret = ib_get_cached_gid(work-port-cm_dev-ib_device,
 + work-port-port_num,
 + cm_id_priv-av.ah_attr.grh.sgid_index,
 + gid, gid_attr);
 + if (!ret) {
 + work-path[0].gid_type = gid_attr.gid_type;
 + ret = cm_init_av_by_path(work-path[0], cm_id_priv-av);
 + }
   if (ret) {
   ib_get_cached_gid(work-port-cm_dev-ib_device,
 work-port-port_num, 0, work-path[0].sgid,
 -   NULL);
 +   gid_attr);
 + work-path[0].gid_type = gid_attr.gid_type;
   ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
  work-path[0].sgid, sizeof work-path[0].sgid,
  NULL, 0);

Is there support for alternative path? (Because I don't see a similar
change for path[1]).
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 @@ -203,21 +235,30 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 
 port_num, struct ib_wc *wc,
  
   memset(ah_attr, 0, sizeof *ah_attr);
   if (is_eth) {
 + u16 vlan_id = wc-wc_flags  IB_WC_WITH_VLAN ?
 + wc-vlan_id : 0x;
 +
   if (!(wc-wc_flags  IB_WC_GRH))
   return -EPROTOTYPE;
  
 - if (wc-wc_flags  IB_WC_WITH_SMAC 
 - wc-wc_flags  IB_WC_WITH_VLAN) {
 - memcpy(ah_attr-dmac, wc-smac, ETH_ALEN);
 - ah_attr-vlan_id = wc-vlan_id;
 - } else {
 + if (!(wc-wc_flags  IB_WC_WITH_SMAC) ||
 + !(wc-wc_flags  IB_WC_WITH_VLAN)) {
   ret = rdma_addr_find_dmac_by_grh(grh-dgid, grh-sgid,
 - ah_attr-dmac, ah_attr-vlan_id);
 +  ah_attr-dmac,
 +  wc-wc_flags  
 IB_WC_WITH_VLAN ?
 +  NULL : vlan_id,
 +  0);
   if (ret)
   return ret;
   }
 - } else {
 - ah_attr-vlan_id = 0x;

Previously vlan_id would get set to 0x on non-Ethernet link-layer,
and now it is left as zero. Wouldn't that break things for non-Ethernet
protocols?

 +
 + ret = get_sgid_index_from_eth(device, port_num, vlan_id,
 +   grh-dgid, gid_index);
 + if (ret)
 + return ret;
 +
 + if (wc-wc_flags  IB_WC_WITH_SMAC)
 + memcpy(ah_attr-dmac, wc-smac, ETH_ALEN);
   }
  
   ah_attr-dlid = wc-slid;

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/30] IB/core: Modify ib_verbs and cma in order to use roce_gid_cache

2015-02-19 Thread Haggai Eran
On 20/02/2015 00:02, Somnath Kotur wrote:
 @@ -502,9 +547,7 @@ EXPORT_SYMBOL(ib_create_qp);
  static const struct {
   int valid;
   enum ib_qp_attr_maskreq_param[IB_QPT_MAX];
 - enum ib_qp_attr_maskreq_param_add_eth[IB_QPT_MAX];
   enum ib_qp_attr_maskopt_param[IB_QPT_MAX];
 - enum ib_qp_attr_maskopt_param_add_eth[IB_QPT_MAX];
  } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
   [IB_QPS_RESET] = {
   [IB_QPS_RESET] = { .valid = 1 },
 @@ -585,12 +628,6 @@ static const struct {
   IB_QP_MAX_DEST_RD_ATOMIC
 |
   IB_QP_MIN_RNR_TIMER),
   },
 - .req_param_add_eth = {
 - [IB_QPT_RC]  = (IB_QP_SMAC),
 - [IB_QPT_UC]  = (IB_QP_SMAC),
 - [IB_QPT_XRC_INI]  = (IB_QP_SMAC),
 - [IB_QPT_XRC_TGT]  = (IB_QP_SMAC)
 - },
   .opt_param = {
[IB_QPT_UD]  = (IB_QP_PKEY_INDEX   
 |
IB_QP_QKEY),
 @@ -611,21 +648,7 @@ static const struct {
[IB_QPT_GSI] = (IB_QP_PKEY_INDEX   
 |
IB_QP_QKEY),
},
 - .opt_param_add_eth = {
 - [IB_QPT_RC]  = (IB_QP_ALT_SMAC  
 |
 - IB_QP_VID   
 |
 - IB_QP_ALT_VID),
 - [IB_QPT_UC]  = (IB_QP_ALT_SMAC  
 |
 - IB_QP_VID   
 |
 - IB_QP_ALT_VID),
 - [IB_QPT_XRC_INI]  = (IB_QP_ALT_SMAC 
 |
 - IB_QP_VID   
 |
 - IB_QP_ALT_VID),
 - [IB_QPT_XRC_TGT]  = (IB_QP_ALT_SMAC 
 |
 - IB_QP_VID   
 |
 - IB_QP_ALT_VID)
 - }
 - }
 + },
   },
   [IB_QPS_RTR]   = {
   [IB_QPS_RESET] = { .valid = 1 },
 @@ -847,13 +870,6 @@ int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum 
 ib_qp_state next_state,
   req_param = qp_state_table[cur_state][next_state].req_param[type];
   opt_param = qp_state_table[cur_state][next_state].opt_param[type];
  
 - if (ll == IB_LINK_LAYER_ETHERNET) {
 - req_param |= qp_state_table[cur_state][next_state].
 - req_param_add_eth[type];
 - opt_param |= qp_state_table[cur_state][next_state].
 - opt_param_add_eth[type];
 - }
 -
   if ((mask  req_param) != req_param)
   return 0;
  

I understand this patch will remove any kernel reference to these
modify_qp attributes. However, what about user-space? Was it previously
allowed to pass in these parameters?
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html