[PATCH for-3.10 0/3] 2nd batch of iSER patches

2013-05-08 Thread Or Gerlitz
Hi Roland,

Here's a 2nd batch of iser patches for 3.10, with the highlight being
a fix to the device removal flow from Roi Dayan. For some reason the race
this patch fixes doesn't hit on the IB link layer, because of different timings
(e.g. more modules that register with the IB core, such as IPoIB), but
it was there; thanks to Sean we nailed down the problem and came up
with a proper fix.

Also, with the kernel now having iser target support through LIO and the
increased use cases for iser, I added a MAINTAINERS entry to help people 
figure out who's involved (and send bugs and flames...) hope you're OK 
with that.

Or.

Or Gerlitz (2):
  IB/iser: Add Mellanox copyright
  MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator

Roi Dayan (1):
  IB/iser: Fix device removal flow

 MAINTAINERS                                  |   13 +++++++++++++
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     |   16 +++++++++-------
 6 files changed, 26 insertions(+), 7 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND for-3.10 3/3] MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator

2013-05-08 Thread Or Gerlitz
Add an entry for the iSER initiator driver, which is maintained by
Or Gerlitz and Roi Dayan, below the kernel InfiniBand subsystem.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 MAINTAINERS |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8bdd7a7..cc5861c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4378,6 +4378,16 @@ S:   Maintained
 F: drivers/scsi/*iscsi*
 F: include/scsi/*iscsi*
 
+ISCSI EXTENSIONS FOR RDMA (ISER) INITIATOR
+M: Or Gerlitz ogerl...@mellanox.com
+M: Roi Dayan r...@mellanox.com
+L: linux-rdma@vger.kernel.org
+S: Supported
+W: http://www.openfabrics.org
+W: www.open-iscsi.org
+Q: http://patchwork.kernel.org/project/linux-rdma/list/
+F: drivers/infiniband/ulp/iser
+
 ISDN SUBSYSTEM
 M: Karsten Keil i...@linux-pingi.de
 L: isdn4li...@listserv.isdn4linux.de (subscribers-only)
-- 
1.7.1



[PATCH RESEND for-3.10 2/3] IB/iser: Add Mellanox copyright

2013-05-08 Thread Or Gerlitz
Add Mellanox copyright to the iser initiator source code which I maintain.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     |    1 +
 5 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index f19b099..2e84ef8 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2004 Alex Aizman
  * Copyright (C) 2005 Mike Christie
  * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  * maintained by openib-gene...@openib.org
  *
  * This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index cae6084..e0afab4 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -8,6 +8,7 @@
  *
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index a00ccd1..b6d81a8 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 68ebb7f..7827baf 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index f13cc22..2c4941d 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2013 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
-- 
1.7.1



[PATCH RESEND for-3.10 1/3] IB/iser: Fix device removal flow

2013-05-08 Thread Or Gerlitz
From: Roi Dayan r...@mellanox.com

Change the code to destroy the last opened rdma_cm id after making sure
we released all other objects (QP,CQs,PD,etc) associated with the IB device.

Since iser accesses the IB device using the rdma_cm id, we need to free any
objects that are related to the device which is associated with the rdma_cm
id prior to destroying that id. When this isn't ensured, the low level driver
that created this device can be unloaded before iser has a chance to free
all the objects, and such a call may then invoke a code segment which isn't
valid any more and crash.

Cc: Sean Hefty sean.he...@intel.com
Signed-off-by: Roi Dayan r...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/iser/iser_verbs.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 5278916..f13cc22 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -292,10 +292,10 @@ out_err:
 }
 
 /**
- * releases the FMR pool, QP and CMA ID objects, returns 0 on success,
+ * releases the FMR pool and QP objects, returns 0 on success,
  * -1 on failure
  */
-static int iser_free_ib_conn_res(struct iser_conn *ib_conn, int can_destroy_id)
+static int iser_free_ib_conn_res(struct iser_conn *ib_conn)
 {
 	int cq_index;
 	BUG_ON(ib_conn == NULL);
@@ -314,13 +314,9 @@ static int iser_free_ib_conn_res(struct iser_conn *ib_conn, int can_destroy_id)
 
 		rdma_destroy_qp(ib_conn->cma_id);
 	}
-	/* if cma handler context, the caller acts s.t the cma destroy the id */
-	if (ib_conn->cma_id != NULL && can_destroy_id)
-		rdma_destroy_id(ib_conn->cma_id);
 
 	ib_conn->fmr_pool = NULL;
 	ib_conn->qp	  = NULL;
-	ib_conn->cma_id   = NULL;
 	kfree(ib_conn->page_vec);
 
 	if (ib_conn->login_buf) {
@@ -415,11 +411,16 @@ static void iser_conn_release(struct iser_conn *ib_conn, int can_destroy_id)
 	list_del(&ib_conn->conn_list);
 	mutex_unlock(&ig.connlist_mutex);
 	iser_free_rx_descriptors(ib_conn);
-	iser_free_ib_conn_res(ib_conn, can_destroy_id);
+	iser_free_ib_conn_res(ib_conn);
 	ib_conn->device = NULL;
 	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
 	if (device != NULL)
 		iser_device_try_release(device);
+	/* if cma handler context, the caller actually destroy the id */
+	if (ib_conn->cma_id != NULL && can_destroy_id) {
+		rdma_destroy_id(ib_conn->cma_id);
+		ib_conn->cma_id = NULL;
+	}
 	iscsi_destroy_endpoint(ib_conn->ep);
 }
 
 
-- 
1.7.1
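
The ordering this patch establishes can be modelled outside the kernel. The following is only a user-space sketch under stated assumptions: struct conn, struct trace, the step names and conn_release() are hypothetical stand-ins, not iser code. It records the order in which a release routine tears things down: connection resources first, then the device reference, and the rdma_cm id last, and only when the caller is allowed to destroy it.

```c
#include <assert.h>

/* Hypothetical teardown steps, named after what the patch describes. */
enum step {
	STEP_FREE_CONN_RES = 1,	/* FMR pool, QP and friends */
	STEP_RELEASE_DEVICE,	/* drop the per-device objects (CQs, PD) */
	STEP_DESTROY_ID		/* destroy the rdma_cm id, always last */
};

struct trace {
	enum step steps[4];
	int n;
};

/* Hypothetical connection state; flags stand in for real objects. */
struct conn {
	int has_qp;
	int has_device;
	int has_cma_id;
};

static void record(struct trace *t, enum step s)
{
	if (t->n < 4)
		t->steps[t->n++] = s;
}

/*
 * Release everything that depends on the device before the id that
 * gives access to the device. When can_destroy_id is 0, the id is
 * left for the caller to destroy.
 */
static void conn_release(struct conn *c, int can_destroy_id, struct trace *t)
{
	if (c->has_qp) {
		c->has_qp = 0;
		record(t, STEP_FREE_CONN_RES);
	}
	if (c->has_device) {
		c->has_device = 0;
		record(t, STEP_RELEASE_DEVICE);
	}
	if (c->has_cma_id && can_destroy_id) {
		c->has_cma_id = 0;
		record(t, STEP_DESTROY_ID);
	}
}
```

The point of the model is only the order of the three steps; it does not reproduce the module unload race itself.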



[PATCH RESEND for-3.10 0/3] 2nd batch of iSER patches

2013-05-08 Thread Or Gerlitz

Resending the whole series because a wrong chunk got into patch #3, sorry
for that.

Hi Roland,

Here's a 2nd batch of iser patches for 3.10, with the highlight being
a fix to the device removal flow from Roi Dayan. For some reason the race
this patch fixes doesn't hit on the IB link layer, because of different timings
(e.g. more modules that register with the IB core, such as IPoIB), but
it was there; thanks to Sean we nailed down the problem and came up
with a proper fix.

Also, with the kernel now having iser target support through LIO and the
increased use cases for iser, I added a MAINTAINERS entry to help people 
figure out who's involved (and send bugs and flames...) hope you're OK 
with that.

Or.

Or Gerlitz (2):
  IB/iser: Add Mellanox copyright
  MAINTAINERS: Add entry for iSCSI Extensions for RDMA (iSER) initiator

Roi Dayan (1):
  IB/iser: Fix device removal flow

 MAINTAINERS                                  |   10 ++++++++++
 drivers/infiniband/ulp/iser/iscsi_iser.c     |    1 +
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    1 +
 drivers/infiniband/ulp/iser/iser_initiator.c |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c    |    1 +
 drivers/infiniband/ulp/iser/iser_verbs.c     |   16 +++++++++-------
 6 files changed, 23 insertions(+), 7 deletions(-)



Re: [GIT PULL] please pull infiniband.git

2013-05-09 Thread Or Gerlitz

On 09/05/2013 00:20, Roland Dreier wrote:

Or Gerlitz (2):
   IB/iser: Return error to upper layers on EAGAIN registration failures
   IB/iser: Add support for iser CM REQ additional info

Roi Dayan (2):
   IB/iser: Add module version
   IB/iser: Move informational messages from error to info level


Hi Roland, so Linus pulled these patches, but I can't find them on the
for-next branch or any other branch of your tree; I assume this will be
fixed through some rebase. Note that I sent three more iser patches for 3.10.


Or.


Re: list corruption in IPOIB

2013-05-18 Thread Or Gerlitz
On Fri, May 17, 2013 at 10:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote:
> We've seen below neigh->list list corruption warning during testing,

So how about a little heads up on which kernel you are using? And what's the
way to trigger this warning?

> From Dongsu's and my opinion, several places also need
> netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around neigh->list. I
> tried to add netif_tx_lock/netif_tx_unlock into ipoib_cm_destroy_tx, it
> improved the situation, there're some other places in ipoib_main.c and
> ipoib_mcast.c, but I don't know which lock should be added, if you can
> take some time to look into it, that will be great.


What do you mean by improved the situation? Is the warning gone? And if
yes, what remains?

Or.


Re: list corruption in IPOIB

2013-05-19 Thread Or Gerlitz

On 19/05/2013 00:36, Jack Wang wrote:

I tried 3.4.23, and mainline kernel from Roland's rdma-for-linus, we
added bug injection interface,  run multithread iperf, and switched ib
mode between connected and datagram in sync on each side as Shlomo
suggested.


Can you be more specific re the bug injection interface: is that an
existing kernel mechanism or something you added? So the bug triggers
when you run iperf in multi-threaded mode AND in parallel inject errors
AND in parallel switch between datagram and connected mode? I
assume this isn't something you do just for the fun of it... so some
problem X hits you in production and this problem Y you get with the
above juggling; is there any known or empirical relation between the two?


Or.


Re: MLX4 Cq Question

2013-05-19 Thread Or Gerlitz

On 18/05/2013 00:37, Roland Dreier wrote:

you see that when freeing a CQ, we first do the HW2SW_CQ firmware
command; once this command completes, no more events will be generated
for that CQ.  Then we do synchronize_irq for the CQ's interrupt
vector.  Once that completes, no more completion handlers will be
running for the CQ, so we can safely delete the CQ from the radix tree
(relying on the radix tree's safety of deleting one entry while
possibly looking up other entries, so no lock is needed).  We also use
the lock to synchronize against the CQ event function, which as you
noted does take the lock too.

Basic idea is that we're tricky and careful so we can make the fast
path (completion interrupt handling) lock-free, but then use locks and
whatever else needed in the slow path (CQ async event handling, CQ
destroy).


Jack, so do we finally agree with this analysis? Last time this was
on the list, I was under the impression that there was no consensus, and
I also see that in the stack we provide to customers there's a patch of
yours in that area. Or does it fix another bug?


Or.


Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz

On 20/05/2013 12:10, Jinpu Wang wrote:

which list_del do you mean? in ipoib_cm_tx_start?

Yes, but not only there. You can start with a 5 kg hammer and convert all
these hits to list_del_init:


linux-2.6]# grep list_del drivers/infiniband/ulp/ipoib/*.c | grep neigh
drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
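
For readers unfamiliar with the difference between the two helpers, here is a minimal user-space sketch. It is an assumption-laden re-implementation of a circular doubly linked list, not the kernel's list.h: the POISON1/POISON2 values and all function bodies below are written for illustration only. It shows why a second removal of the same entry is harmless after list_del_init but not after list_del, which is what makes the hammer approach silence the warning without necessarily fixing whatever removes the entry twice.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Illustrative poison values marking an entry as removed. */
#define POISON1 ((struct list_head *)0x100)
#define POISON2 ((struct list_head *)0x200)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Removal that poisons the entry: touching it again is a bug. */
static void list_del(struct list_head *e)
{
	list_unlink(e);
	e->next = POISON1;
	e->prev = POISON2;
}

/* Removal that leaves the entry self-linked: removing it again is a no-op. */
static void list_del_init(struct list_head *e)
{
	list_unlink(e);
	init_list_head(e);
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}
```

After list_del_init the entry points at itself, so a repeated removal only rewrites its own pointers and leaves the rest of the list untouched.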



libibverbs / libmlx4 release

2013-05-20 Thread Or Gerlitz

Hi Roland,

Following what we discussed last week during the Linux Foundation EU
summit, I think it would be good to follow what you said and have a
point release for libibverbs and libmlx4 before we pull in the verbs
extensions framework and the features that use it (XRC, Flow-Steering and
more fun).


I mentioned to you that we have some more libmlx4 patches, but it's
totally OK for us to submit them after that release. Makes sense?


Or.


Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz

On 20/05/2013 15:46, Jinpu Wang wrote:

A quick test shows the list corruption warning is gone after I convert
all list_del(&neigh->list) to list_del_init(&neigh->list).


Yes, but this wasn't your original problem, or was it?



Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote:
> Sorry for the confusion. The current list corruption is gone in my preliminary
> test, after I changed list_del to list_del_init as Or suggested.
> As Or asked for the original bug, I just want to show him the whole story.

It is still not clear to me whether the bug you saw in your production
environment is gone with the list_del_init patch applied, please
clarify.

Or.


rsockets feedback?

2013-05-20 Thread Or Gerlitz
Hi Sean,

Do we have some public quoted usages/feedback for rsockets? I think
you've mentioned something during the panel at the Linux EU summit
last week but I am not sure...

Or.


Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote:

> The bug in our production environment was introduced by our backport
> of ipoib fixes from mainline. When we hit that bug we reverted
> back to the old kernel without the backport patch, and the bug hasn't
> happened since.
>
> This list_del_init patch does fix the list corruption warning, but that's not
> the one we hit in production; the list corruption is reproduced in our test
> setup with the bug injection patch, iperf -P 50 and mode switching.
>
> Is this clear for you now?


No. You say that the list_del_init patch eliminates the list
corruption warning; is the list corruption still reproduced in
your test setup even when the patch is applied? What's the trace?
And what is the trace you see in production when using kernel X
(which?) patched with commit Y (which?)?

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rsockets feedback?

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 10:52 PM, Hefty, Sean sean.he...@intel.com wrote:
>> Do we have some public quoted usages/feedback for rsockets? I think
>> you've mentioned something during the panel at the Linux EU summit
>> last week but I am not sure...

> Most feedback I can think of has come via private emails or personal
> interactions, especially specific details of various usage models.

So if these private conversations had been pushed to linux-rdma, more
would have been known about rsockets, for the benefit of all... oh well. I
think you mentioned something re the Intel HPC group, or am I wrong?

Or.


Re: MLX4 Cq Question

2013-05-21 Thread Or Gerlitz

On 20/05/2013 17:53, Jack Morgenstein wrote:

===
net/mlx4_core: Fix racy flow in the driver CQ completion handler

The mlx4 CQ completion handler, mlx4_cq_completion, doesn't bother to lock
the radix tree which is used to manage the table of CQs, nor does it increase
the reference count of the CQ before invoking the user provided callback
(and decrease it afterwards).

This is racy and can cause use-after-free, null pointer dereference, etc, which
result in kernel crashes.

To fix this, we must do the following in mlx4_cq_completion:
- increase the ref count on the cq before invoking the user callback, and
   decrement it after the callback.
- Place a lock around the radix tree lookup/ref-count-increase

Using an irq spinlock will not fix this issue. The problem is that under VPI,
the ETH interface uses multiple msix irq's, which can result in one cq completion
event interrupting another in-progress cq completion event. A deadlock results
when the handler for the first cq completion grabs the spinlock, and is
interrupted by the second completion before it has a chance to release the spinlock.
The handler for the second completion will deadlock waiting for the spinlock
to be released.
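
The two steps listed in the quoted description (take a reference on the CQ around the user callback, and do the lookup plus reference increment under a lock) can be sketched in user space. This is only an illustration under stated assumptions: struct cq, struct cq_table, the fixed-size slot array standing in for the radix tree, and the atomic_flag spin loop standing in for the table lock are all hypothetical, and the sketch does not model interrupt context or the locking primitive the final patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CQS 8

struct cq {
	atomic_int refcount;		/* starts at 1: the table's reference */
	void (*comp)(struct cq *cq);	/* user completion callback */
	int events;			/* completions delivered, for the demo */
};

struct cq_table {
	atomic_flag lock;		/* stand-in for the table lock */
	struct cq *slots[MAX_CQS];	/* stand-in for the radix tree */
};

static void table_lock(struct cq_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;
}

static void table_unlock(struct cq_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Look the CQ up and take a reference before dropping the lock. */
static struct cq *cq_get(struct cq_table *t, unsigned int cqn)
{
	struct cq *cq = NULL;

	table_lock(t);
	if (cqn < MAX_CQS)
		cq = t->slots[cqn];
	if (cq)
		atomic_fetch_add(&cq->refcount, 1);
	table_unlock(t);
	return cq;
}

/* Drop a reference; returns 1 when this was the last one. */
static int cq_put(struct cq *cq)
{
	return atomic_fetch_sub(&cq->refcount, 1) == 1;
}

/* Completion path: 0 if the callback ran, -1 if the CQ is gone. */
static int cq_completion(struct cq_table *t, unsigned int cqn)
{
	struct cq *cq = cq_get(t, cqn);

	if (!cq)
		return -1;
	cq->comp(cq);
	cq_put(cq);
	return 0;
}

/* Destroy path: unpublish the CQ, then drop the table's reference. */
static int cq_remove(struct cq_table *t, unsigned int cqn)
{
	struct cq *cq = NULL;

	table_lock(t);
	if (cqn < MAX_CQS) {
		cq = t->slots[cqn];
		t->slots[cqn] = NULL;
	}
	table_unlock(t);
	return cq ? cq_put(cq) : 0;
}

/* Example callback used only to observe that a completion was delivered. */
static void count_event(struct cq *cq)
{
	cq->events++;
}
```

Because the reference is taken while the lookup lock is still held, a concurrent cq_remove() cannot drop the last reference between the lookup and the callback.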


I am not sure I follow two pieces here:

1. Why do we say that only mlx4_en uses multiple msix irq's? mlx4_ib also
exposes multiple vectors (--> EQs --> MSI-X --> IRQ),
and the iser driver uses that, e.g. it creates multiple CQs, each on a different EQ.

2. Is it possible in the Linux kernel for one hard irq callback to fire on
CPU X while another hard irq callback is running on the same CPU?


Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MLX4 Cq Question

2013-05-21 Thread Or Gerlitz

On 21/05/2013 13:42, Bart Van Assche wrote:

On 05/21/13 11:40, Or Gerlitz wrote:

2. is possible in the Linux kernel for one hard irq callback to flash on
CPU X while another hard irq callback is running on the same CPU?


I think that from kernel 2.6.35 on MSI IRQs are no longer nested. See 
also 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=753649dbc49345a73a2454c770a3f2d54d11aec6 
or http://lwn.net/Articles/380931/


Thanks. So suppose we agree on that: the patch still makes sense as the
race is there, but does the patch have to change?


Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme

2013-05-21 Thread Or Gerlitz

Hi Sean,

We have a user space application which is made of an M (clients) x N
(servers) RC connectivity pattern using librdmacm. Basically, there are
N nodes, each running M client processes, and each client connects to all N
servers.


So under some unknown conditions, many of the clients' connection
attempts fail with an RDMA_CM_EVENT_UNREACHABLE event and the status is
-ETIMEDOUT. Looking at the rdma-cm kernel code, I see that the only
location which generates this event is in cma_ib_handler when getting
IB_CM_REQ_ERROR (or IB_CM_REP_ERROR).


Digging down into the CM, I see that the only place where
IB_CM_REQ_ERROR is delivered is cm_process_send_error, which is called
when the status of the MAD send completion is neither success nor flush.


Digging down into the MAD code and the CM usage of it, I see that
the MAD code will invoke the send completion handler with the
IB_WC_RESP_TIMEOUT_ERR status, and that the CM code programs the number
of retries set by its consumer (rdma-cm in this case) into the MAD send
buffer.


Running this over an M=8 and N=4 setup, i.e. four nodes, each running one
server process and eight client processes, sampling the IB CM
counters before and after the job and adding up the numbers from the four
nodes, we see the following:


cm_tx_msgs.req    = 395
cm_tx_retries.req = 270
cm_rx_msgs.req    = 390

cm_tx_msgs.rep    = 375
cm_tx_retries.rep = 255
cm_rx_msgs.rep    = 380

cm_tx_msgs.rtu    = 108
cm_rx_msgs.rtu    = 103

cm_tx_msgs.mra    = 540
cm_rx_msgs.mra    = 270
cm_tx_retries.mra = 270

In cm_send_handler we see that the CM TX retry counter is incremented
by the number of retries reported
by the MAD layer. I also see that the RDMA-CM programs the CM to do 15
retries and the CM further programs this into the MAD send buffers.


From the RTU counters it's clear that at most ~100 connections got
established out of 128.


One thing seen in the nodes' dmesg is a message from an old patch of
yours which exists in ofed1.5.3 but didn't hit (or wasn't accepted?)
upstream, saying "ib_cm: calculated mra timeout 67584 > 8192, decreasing
used timeout_ms". Does this provide any insight into the problem?


One more piece of info is that these apps don't call rdma_disconnect
at all. When they are done, or if something goes wrong (e.g. that
unreachable event), they simply issue rdma_destroy_id, which, looking
at the rdma-cm/cm code, gets to a CM function which sends a DREQ (if the
ID is in the established state) and puts the ID in the timewait zone.


So it seems we're not losing MADs. Also, on the stack they use (that
1.5.3) the ucma backlog size is 128,
but each server process gets only 32 requests (8x4), so we don't think
ucma is dropping REQs for lack of backlog budget.
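
The numbers quoted in this message (128 connections overall, 32 requests per server, backlog of 128) follow directly from M=8 and N=4. A small sketch of that arithmetic, with hypothetical helper names that do not exist in librdmacm or the kernel:

```c
#include <assert.h>

/*
 * N nodes, each with one server and M clients; every client connects
 * to all N servers, so the job opens N * M * N connections in total.
 */
static unsigned int expected_connections(unsigned int m_clients,
					 unsigned int n_nodes)
{
	return n_nodes * m_clients * n_nodes;
}

/* Each server is the target of all M clients on each of the N nodes. */
static unsigned int requests_per_server(unsigned int m_clients,
					unsigned int n_nodes)
{
	return m_clients * n_nodes;
}

/* A listen backlog can only overflow if more requests arrive than it holds. */
static int backlog_can_overflow(unsigned int requests, unsigned int backlog)
{
	return requests > backlog;
}
```

With M=8 and N=4 this gives 128 connections and 32 requests per server, well under a backlog of 128, which matches the reasoning that backlog exhaustion is unlikely to be the cause.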


Or.



Re: MLX4 Cq Question

2013-05-21 Thread Or Gerlitz

On 21/05/2013 17:13, Jack Morgenstein wrote:

I just need to verify that the patch can be applied correctly on the upstream 
kernel.
The use of RCU (and not spinlock) makes sense from a performance standpoint
in any case. We do NOT want to force mlx4_cq_completion to have a spinlock
which is device-global, resulting in having completion event processing be
single-threaded in effect).

Cool, let's do that and re-submit.


Re: better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme

2013-05-21 Thread Or Gerlitz

On 21/05/2013 18:24, Hefty, Sean wrote:

I don't remember this patch at all.


Alex, can you please send Sean this patch?


Re: better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme

2013-05-21 Thread Or Gerlitz
On Tue, May 21, 2013 at 6:24 PM, Hefty, Sean sean.he...@intel.com wrote:

>> One thing seen in the nodes dmesg is a message from an old patch of
>> yours which exists in ofed1.5.3 but didn't hit (or wasn't accepted?)
>> upstream saying ib_cm: calculated mra timeout 67584 > 8192, decreasing
>> used timeout_ms does this provides any insight into the problem?

> I don't remember this patch at all.

Alex sent it to you. Is that something which is missing upstream, or
alternatively something that could create trouble on that ofed stack where
it's applied?


> My first guess is that the server isn't responding to new requests.

Yep, smells like this could be the root cause here. Dina and Alex will
do some tweaking of the server code to make sure there's no starvation
in servicing new connection requests.

Or.


Re: [PATCH 0/4] add RAW Packet QP type

2013-05-21 Thread Or Gerlitz
On Tue, May 21, 2013 at 1:43 AM, Shawn Bohrer shawn.boh...@gmail.com wrote:
> I apologize if I missed it, but did any support for L3/L4 CSUM
> generation get added?  Doesn't look like the upstream libibverbs has
> it, and I don't seem to see any patches floating around.

Roland commented that he will make a point release this week for
libibverbs and libmlx4; I will post the patches for CSUM offload after
that release.

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] Add receive Flow Steering support

2013-05-21 Thread Or Gerlitz
On Tue, May 21, 2013 at 1:54 AM, Shawn Bohrer shawn.boh...@gmail.com wrote:

> Are there any patches for libibverbs to add
> ibv_create_flow/ibv_destroy_flow?  And are there any needed patches
> for libmlx4?  I'm building up a stack so we can begin testing this series.

YES, there are patches. NO, I didn't post them here yet, as I went the
bottom-up way of first posting the kernel patches and, once they are accepted
(which hasn't happened yet), posting the user space patches. Last week at
the Linux EU summit, people commented that the flow-steering
patches look OK, and I understand Roland is fine with accepting them
for the 3.11 merge window. I do want you or anyone else to start
testing them right away. Please start with getting a system with 3.10
+ the flow-steering patches to run, and in the coming days I will post here
a pointer to a user space implementation you can use for testing the
kernel patches.

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libibverbs / libmlx4 release

2013-05-22 Thread Or Gerlitz
On Mon, May 20, 2013 at 7:49 PM, Roland Dreier rol...@kernel.org wrote:
 That's fine, I'll do the releases this week.

cool.


Re: ibv_reg_mr call failed

2013-05-26 Thread Or Gerlitz

On 24/05/2013 20:43, Liu Ginhann wrote:

Here is version information of our test chassis,

What driver / card are you using?


Re: ibv_reg_mr call failed

2013-05-26 Thread Or Gerlitz

On 24/05/2013 20:43, Liu Ginhann wrote:

Ibv_reg_mr() call return EFAULT - bad address


Basically, AFAIK, the IB stack should support what you are trying to do.
If you can tell which code (e.g. in libibverbs/libmlx4 or the kernel)
returns this value, that would help in assisting you further.


Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ibv_reg_mr call failed

2013-05-27 Thread Or Gerlitz
On Sun, May 26, 2013 at 4:52 PM, Liu Ginhann
ginhann@grassvalley.com wrote:
> Will dig into it. Thanks.

Does this work if you use get_free_pages in the kernel instead of kmem_cache?

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] libibverbs 1.1.7 is released

2013-05-29 Thread Or Gerlitz

On 29/05/2013 02:10, Roland Dreier wrote:

libibverbs is a library that allows programs to use RDMA verbs for
direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace.


Hey, so there's RoCE out there too...



The new stable release, 1.1.7, is available from


Is a libmlx4 release coming too?

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-05-29 Thread Or Gerlitz
On Tue, May 28, 2013 at 8:51 PM, Jeff Squyres (jsquyres)
jsquy...@cisco.com wrote:

> I ask because, as an MPI guy, I would *love* to see this stuff integrated
> into the kernel and libibverbs.

Hi Jeff,

Have you looked at ODP? See
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html

Or.


Re: Status of ummunot branch?

2013-05-29 Thread Or Gerlitz

On 30/05/2013 01:56, Jeff Squyres (jsquyres) wrote:

> On May 29, 2013, at 4:53 AM, Or Gerlitz or.gerl...@gmail.com wrote:
>
>> Have you looked at ODP? See
>> https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html
>
> Is this upstream?

No.

> Has this been run by the MPI implementor community?

The team working on this here isn't ready to submit it yet, so it
hasn't been run by the community.

> The limitation of a max of 2 concurrent page faults seems fairly significant.

Let me check.


Re: ibv_reg_mr call failed

2013-05-31 Thread Or Gerlitz
On Fri, May 31, 2013 at 2:05 AM, Liu Ginhann
ginhann@grassvalley.com wrote:
> Or,
>
> Tried __get_free_pages, kmalloc, all behave the same - fail in ibv_reg_mr
> call but the va mapped back is fine for peek and poke.
>
> Any other suggestion?

Yes, tell us where the EFAULT error originates.


 Hank





Re: Status of ummunot branch?

2013-06-04 Thread Or Gerlitz

On 04/06/2013 04:24, Jeff Squyres (jsquyres) wrote:

> On May 29, 2013, at 1:53 AM, Or Gerlitz or.gerl...@gmail.com wrote:
>
>> Have you looked at ODP? See
>> https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html
>
> Is the idea behind ODP that, at the beginning of time, you register the entire
> memory space (i.e., NULL to 2^64) and then never worry about registered memory?

Adding Haggai from the team that works on ODP. Haggai, Jeff also commented
in this thread http://marc.info/?t=13697634766r=1w=2
that the limit of at most 2 concurrent page faults seems fairly
significant, which you might want to address too.


Or.


Re: ibv_reg_mr call failed

2013-06-09 Thread Or Gerlitz

On 08/06/2013 19:42, Liu Ginhann wrote:

>> Does this work if you use get_free_pages in the kernel instead of
>> kmem_cache?
>
> I tried get_free_pages, kmalloc, kmem_cache_alloc. None of them work; each
> failed with the same error - EFAULT bad pointer.
>
> After a code walk-through, I believe it failed in the get_user_pages call in
> the ib_umem_get routine. Below is the code snippet, and it seems the address
> passed in is expected to be user space memory. If the comment is true, that
> may explain why ibv_reg_mr is not happy with a remapped address from
> kernel-allocated memory but perfectly fine with malloc from user space.

Guys, do you agree? Will ib_umem_get always fail when given memory
that wasn't allocated in user space? Why?


Or.

> If this is true, do you think there is another way to accomplish this? There
> has to be some way to do this.
>
> Your help is appreciated.
>
> Hank
>
> /**
>   * ib_umem_get - Pin and DMA map userspace memory.
>   * @context: userspace context to pin memory for
>   * @addr: userspace virtual address to start at
>   * @size: length of region to pin
>   * @access: IB_ACCESS_xxx flags for memory being pinned
>   * @dmasync: flush in-flight DMA when the memory region is written
>   */
> struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>                             size_t size, int access, int dmasync)
> {
>         ret = 0;
>         while (npages) {
>                 ret = get_user_pages(current, current->mm, cur_base,
>                                      min_t(unsigned long, npages,
>                                            PAGE_SIZE / sizeof (struct page *)),
>                                      1, !umem->writable, page_list, vma_list);
>
>                 if (ret < 0)
>                         goto out;






[PATCH V1 for-next 4/4] IB/mlx4: Add receive Flow Steering support

2013-06-11 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement the ib_create_flow and ib_destroy_flow verbs.

Translate the verbs structures provided by the user to HW structures
and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On ATTACH command completion, the firmware provides a 64-bit registration
ID, which is returned to the caller within struct ib_flow and used
later for detaching that flow.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c |  246 +
 1 files changed, 246 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..0ac5023 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION "1.0"
 #define DRV_RELDATE "April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+   int ib_num_ports = 0;
+   int i;
+
+   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+   ib_num_ports++;
+
+   if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+   if (ib_num_ports || mlx4_is_mfunc(dev)) {
+   pr_warn("Device managed flow steering is unavailable "
+   "for IB ports or in multifunction env.\n");
+   return 0;
+   }
+   return 1;
+   }
+   return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
else
props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+   if (check_flow_steering_support(dev->dev))
+   props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
}
 
props->vendor_id   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,220 @@ struct mlx4_ib_steering {
union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+  struct _ib_flow_spec *ib_spec,
+  struct _rule_hw *mlx4_spec)
+{
+   enum mlx4_net_trans_rule_id type;
+
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   type = MLX4_NET_TRANS_RULE_ID_ETH;
+   memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+  ETH_ALEN);
+   memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+  ETH_ALEN);
+   mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+   mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+   break;
+
+   case IB_FLOW_SPEC_IB:
+   type = MLX4_NET_TRANS_RULE_ID_IB;
+   mlx4_spec->ib.l3_qpn = ib_spec->ib.val.l3_type_qpn;
+   mlx4_spec->ib.qpn_mask = ib_spec->ib.mask.l3_type_qpn;
+   memcpy(mlx4_spec->ib.dst_gid, ib_spec->ib.val.dst_gid, 16);
+   memcpy(mlx4_spec->ib.dst_gid_msk,
+  ib_spec->ib.mask.dst_gid, 16);
+   break;
+
+   case IB_FLOW_SPEC_IPV4:
+   type = MLX4_NET_TRANS_RULE_ID_IPV4;
+   mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+   mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+   mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+   mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+   break;
+
+   case IB_FLOW_SPEC_TCP:
+   case IB_FLOW_SPEC_UDP:
+   type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+   MLX4_NET_TRANS_RULE_ID_TCP :
+   MLX4_NET_TRANS_RULE_ID_UDP;
+   mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+   mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+   mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+   mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+   break;
+
+   default:
+   return -EINVAL;
+   }
+   if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+   mlx4_hw_rule_sz(dev, type) < 0)
+   return -EINVAL;
+   mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type));
+   mlx4_spec->size = mlx4_hw_rule_sz(dev, type) >> 2;
+   return mlx4_hw_rule_sz(dev, type);
+}
+
+static int __mlx4_ib_create_flow(struct ib_qp *qp, struct

[PATCH V1 for-next 1/4] IB/core: Add receive Flow Steering support

2013-06-11 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

The RDMA stack allows applications to create IB_QPT_RAW_PACKET QPs,
which carry plain Ethernet packets, specifically packets which
don't carry any QPN to be matched by the receiving side.

Applications using these QPs must be provided with a method to
program steering rules into the HW so that packets arriving at
the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow specification
for a QP, such that when there's a match between the specification and a
received packet, the packet can be forwarded to that QP, similar to how
one needs to use ib_attach_multicast for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy,
which describe L2, L3 and L4 headers. Currently, specs for Ethernet, IPv4,
TCP, UDP and IB are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr, which
contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
enum ib_flow_attr_type type;
u16  size;
u16  priority;
u8   num_of_specs;
u8   port;
u32  flags;
/* Following are the optional layers according to user request
 * struct ib_flow_spec_yyy
 * struct ib_flow_spec_zzz
 */
};

As these specs eventually come from user space, they are defined and
used in a way which allows adding new spec types without a kernel/user ABI
change, requiring only a small API enhancement that defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner,
which allows calling ib_create_flow with a variable-length list of
optional specs.

For the actual processing of ib_flow_attr, the driver uses the mandatory
number-of-specs and size fields along with the TLV nature of the specs.
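
As a rough illustration of the TLV processing described above, here is a minimal user-space sketch of walking a buffer of variable-length specs. The struct spec_hdr layout and the walk_specs helper are hypothetical and exist only for this example; the real struct ib_flow_spec_yyy definitions live in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header assumed to start every spec. */
struct spec_hdr {
    uint16_t type;  /* which kind of spec follows */
    uint16_t size;  /* total size of this spec in bytes, header included */
};

/*
 * Walk num_of_specs TLV records stored back to back in buf.
 * Returns the number of bytes consumed, or -1 if a record is
 * truncated or declares an impossible size.
 */
static long walk_specs(const unsigned char *buf, size_t len,
                       unsigned num_of_specs)
{
    size_t off = 0;

    assert(buf != NULL);
    for (unsigned i = 0; i < num_of_specs; i++) {
        struct spec_hdr hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        off += hdr.size;  /* the length field tells us where the next spec starts */
    }
    return (long)off;
}
```

The point of the sketch is that a consumer only needs the spec count and each record's own size to step through the list, so unknown future spec types can be skipped or rejected without changing the container layout.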

Steering rules are processed in order of rule priority. The user
sets the 12 low-order bits of the priority field and the remaining
4 high-order bits are set by the kernel according to the domain that the
application or the layer that created the rule belongs to. A lower
numerical priority value means higher priority.
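
The priority split can be sketched as follows, assuming a 16-bit priority field as in struct ib_flow_attr. The macro and function names are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the 12/4 bit split is the one described above. */
#define FLOW_USER_PRIO_BITS 12
#define FLOW_USER_PRIO_MASK ((1u << FLOW_USER_PRIO_BITS) - 1)  /* 0xFFF */

/* Combine a kernel-chosen 4-bit domain with a user-chosen 12-bit priority. */
static uint16_t flow_full_priority(unsigned domain, unsigned user_prio)
{
    assert(domain < 16);
    return (uint16_t)((domain << FLOW_USER_PRIO_BITS) |
                      (user_prio & FLOW_USER_PRIO_MASK));
}
```

Because the domain occupies the high-order bits, any rule from a numerically lower domain sorts ahead of every rule from a higher one, regardless of the user-supplied bits.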

The value returned from ib_create_flow is an instance of struct ib_flow,
which contains a database pointer (handle) provided by the HW driver
to be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could also be written over IB UD QPs.
As such, the ib_create_flow / ib_destroy_flow API is designed to support UD QPs
too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote support
of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for promiscuous
and sniffer purposes:

IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule specification

IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP

IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet link type.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   30 +
 include/rdma/ib_verbs.h |  136 ++-
 2 files changed, 164 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..932f4a7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,33 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+  struct ib_flow_attr *flow_attr,
+  int domain)
+{
+   struct ib_flow *flow_id;
+   if (!qp->device->create_flow)
+   return ERR_PTR(-ENOSYS);
+
+   flow_id = qp->device->create_flow(qp, flow_attr, domain);
+   if (!IS_ERR(flow_id))
+   atomic_inc(&qp->usecnt);
+   return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+   int err;
+   struct ib_qp *qp = flow_id->qp;
+
+   if (!flow_id->qp->device->destroy_flow)
+   return -ENOSYS;
+
+   err = qp->device->destroy_flow(flow_id);
+   if (!err)
+   atomic_dec(&qp->usecnt);
+   return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..6f76d62 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
IB_DEVICE_MEM_MGT_EXTENSIONS= (1<<21),
IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
IB_DEVICE_MEM_WINDOW_TYPE_2A= (1<<23

[PATCH V1 for-next 0/4] Add receive Flow Steering support

2013-06-11 Thread Or Gerlitz
Hi Roland, all

These patches add Flow Steering support to the kernel IB core, to uverbs and
to the mlx4 IB (verbs) driver, along with one uverbs patch which adds
some code to support extensions.

  IB/core: Add receive Flow Steering support
  IB/core: Infra-structure to support verbs extensions through uverbs
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

The main patch which introduces the Flow Steering API is "IB/core: Add receive
Flow Steering support"; see its change log. The "Network Adapter Flow Steering"
slides which Tzahi Oved presented at the annual OFA 2012 meeting could
also be helpful:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

V0 has been acknowledged by Steve and Christoph, and also got positive
feedback from Sean and Jason in f2f talks we had during the Linux Foundation
EU summit last month.

V1 changes:

 - dropped the five pre-patches which were accepted into 3.10
 - rebased the patches against Roland's for-next / 3.10-rc4
 - in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver
   returned failure for ib_destroy_flow; it needs to free some uverbs resources first.
 - in patch #4, check the index before accessing the array in
   mlx4_ib_create/destroy_flow


Or.

Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  206 +++
 drivers/infiniband/core/uverbs_main.c |   42 +-
 drivers/infiniband/core/verbs.c   |   30 
 drivers/infiniband/hw/mlx4/main.c |  246 +
 include/rdma/ib_verbs.h   |  137 ++-
 include/uapi/rdma/ib_user_verbs.h |  118 -
 7 files changed, 773 insertions(+), 9 deletions(-)



[PATCH V1 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs

2013-06-11 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
support flow steering for user space applications.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  206 +
 drivers/infiniband/core/uverbs_main.c |   13 ++-
 include/rdma/ib_verbs.h   |1 +
 include/uapi/rdma/ib_user_verbs.h |  108 +-
 5 files changed, 329 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..ad9d102 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..956782b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class  = { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)  \
do {\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
INIT_LIST_HEAD(&ucontext->srq_list);
INIT_LIST_HEAD(&ucontext->ah_list);
INIT_LIST_HEAD(&ucontext->xrcd_list);
+   INIT_LIST_HEAD(&ucontext->rule_list);
ucontext->closing = 0;
 
resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,210 @@ out_put:
return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+   struct _ib_flow_spec *ib_spec)
+{
+   ib_spec->type = kern_spec->type;
+
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+   memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+  sizeof(struct ib_flow_eth_filter));
+   memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+  sizeof(struct ib_flow_eth_filter));
+   break;
+   case IB_FLOW_SPEC_IB:
+   ib_spec->ib.size = sizeof(struct ib_flow_spec_ib);
+   memcpy(&ib_spec->ib.val, &kern_spec->ib.val,
+  sizeof(struct ib_flow_ib_filter));
+   memcpy(&ib_spec->ib.mask, &kern_spec->ib.mask,
+  sizeof(struct ib_flow_ib_filter));
+   break;
+   case IB_FLOW_SPEC_IPV4:
+   ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+   memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+  sizeof(struct ib_flow_ipv4_filter));
+   memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+  sizeof(struct ib_flow_ipv4_filter));
+   break;
+   case IB_FLOW_SPEC_TCP:
+   case IB_FLOW_SPEC_UDP:
+   ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+   memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+  sizeof(struct ib_flow_tcp_udp_filter));
+   memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+  sizeof(struct ib_flow_tcp_udp_filter));
+   break;
+   default:
+   return -EINVAL;
+   }
+   return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+ const char __user *buf, int in_len,
+ int out_len)
+{
+   struct ib_uverbs_create_flow  cmd;
+   struct ib_uverbs_create_flow_resp resp;
+   struct ib_uobject *uobj;
+   struct ib_flow*flow_id;
+   struct ib_kern_flow_attr  *kern_flow_attr;
+   struct ib_flow_attr   *flow_attr;
+   struct ib_qp  *qp;
+   int err = 0;
+   void *kern_spec;
+   void *ib_spec;
+   int i;
+
+   if (out_len < sizeof(resp))
+   return -ENOSPC;
+
+   if (copy_from_user(&cmd, buf

[PATCH V1 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-06-11 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a
forward/backward-compatible manner. Uverbs command opcodes which are based
on the verbs extensions approach should be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.
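
The length check implied by the two header formats can be sketched in user-space C as below. The threshold value and the header fields mirror the diff in this patch; the check_write_len helper and the local struct name are hypothetical, for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_THRESHOLD 50  /* mirrors IB_USER_VERBS_CMD_THRESHOLD in this patch */

/* Field layout copied from struct ib_uverbs_cmd_hdr_ex in this patch. */
struct cmd_hdr_ex {
    uint32_t command;
    uint16_t in_words;
    uint16_t out_words;
    uint16_t provider_in_words;
    uint16_t provider_out_words;
    uint32_t cmd_hdr_reserved;
};

/*
 * Return 0 if count (bytes written by user space) matches the length the
 * header declares, else -EINVAL. Legacy commands count only in_words;
 * extended commands also count provider_in_words. Lengths are in
 * 4-byte words.
 */
static int check_write_len(const struct cmd_hdr_ex *h, size_t count)
{
    size_t words;

    assert(h != NULL);
    words = h->in_words;
    if (h->command >= CMD_THRESHOLD)
        words += h->provider_in_words;
    return words * 4 == count ? 0 : -EINVAL;
}
```

For simplicity the sketch reads both kinds of command through the extended struct; in the patch itself the legacy path uses the shorter struct ib_uverbs_cmd_hdr.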

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
if (copy_from_user(&hdr, buf, sizeof hdr))
return -EFAULT;
 
-   if (hdr.in_words * 4 != count)
-   return -EINVAL;
-
if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
!uverbs_cmd_table[hdr.command])
return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
return -ENOSYS;
 
-   return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-hdr.in_words * 4, hdr.out_words * 4);
+   if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+   struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+   if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+   return -EFAULT;
+
+   if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+   return -EINVAL;
+
+   return uverbs_cmd_table[hdr.command](file,
+buf + sizeof(hdr_ex),
+(hdr_ex.in_words +
+ hdr_ex.provider_in_words) * 4,
+(hdr_ex.out_words +
+ hdr_ex.provider_out_words) * 4);
+   } else {
+   if (hdr.in_words * 4 != count)
+   return -EINVAL;
+
+   return uverbs_cmd_table[hdr.command](file,
+buf + sizeof(hdr),
+hdr.in_words * 4,
+hdr.out_words * 4);
+   }
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h 
b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION  6
+#define IB_USER_VERBS_CMD_THRESHOLD 50
 
 enum {
IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u16 provider_in_words;
+   __u16 provider_out_words;
+   __u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
__u64 response;
__u64 driver_data[0];
-- 
1.7.1



Re: [PATCH 04/14] IB/srp: Skip host settle delay

2013-06-13 Thread Or Gerlitz

On 13/06/2013 12:53, Sebastian Riemer wrote:

On 12.06.2013 15:24, Bart Van Assche wrote:

The SRP initiator implements host reset by reconnecting to the SRP
target. That means that communication with the target is possible
as soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
  drivers/infiniband/ulp/srp/ib_srp.c |1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index fb37b47..be12780 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1949,6 +1949,7 @@ static struct scsi_host_template srp_template = {
.eh_abort_handler   = srp_abort,
.eh_device_reset_handler= srp_reset_device,
.eh_host_reset_handler  = srp_reset_host,
+   .skip_settle_delay  = true,
.sg_tablesize   = SRP_DEF_SG_TABLESIZE,
.can_queue  = SRP_CMD_SQ_SIZE,
.this_id= -1,


Signed-off-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Acked-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Tested-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Reviewed-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Reviewed-by: Christoph Hellwig h...@infradead.org

Choose something,


Yes, but this is too many... otherwise we will end up with a one-line patch
that has ten yyy-by: credit lines...


Or.



[PATCH for-next 2/4] IB/mlx4: RoCE IP based GID addressing

2013-06-13 Thread Or Gerlitz
From: Moni Shoua mo...@mellanox.co.il

Currently, the mlx4 driver sets RoCE (IBoE) GIDs to encode the related
Ethernet netdevice interface MAC address and possibly VLAN id.

Change this scheme such that GIDs encode interface IP addresses
(both IPv4 and IPv6).
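
To make the idea concrete, here is a small user-space sketch of placing an IPv4 address into a 16-byte GID. It assumes the IPv4-mapped IPv6 layout (::ffff:a.b.c.d); this message does not spell out the exact encoding, so treat the layout and both helper names as assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a 16-byte GID from 4 raw IPv4 address bytes (network order).
 * Assumption: IPv4-mapped IPv6 form, i.e. 80 zero bits, 16 one bits,
 * then the 32-bit IPv4 address.
 */
static void ipv4_to_gid(const uint8_t ip[4], uint8_t gid[16])
{
    assert(ip != NULL && gid != NULL);
    memset(gid, 0, 16);
    gid[10] = 0xff;
    gid[11] = 0xff;
    memcpy(gid + 12, ip, 4);
}

/* Return 1 if gid carries an IPv4-mapped address under the same assumption. */
static int gid_is_ipv4_mapped(const uint8_t gid[16])
{
    static const uint8_t prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

    return memcmp(gid, prefix, sizeof(prefix)) == 0;
}
```

An IPv6 interface address needs no conversion, since it already fills the 16 GID bytes.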

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/ah.c  |   21 +-
 drivers/infiniband/hw/mlx4/cq.c  |5 +
 drivers/infiniband/hw/mlx4/main.c|  461 +++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |3 +
 drivers/infiniband/hw/mlx4/qp.c  |   19 +-
 include/linux/mlx4/cq.h  |   14 +-
 6 files changed, 354 insertions(+), 169 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index a251bec..3941700 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -92,21 +92,18 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 {
struct mlx4_ib_dev *ibdev = to_mdev(pd->device);
struct mlx4_dev *dev = ibdev->dev;
-   union ib_gid sgid;
-   u8 mac[6];
-   int err;
int is_mcast;
+   struct in6_addr in6;
u16 vlan_tag;
 
-   err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, &is_mcast, ah_attr->port_num);
-   if (err)
-   return ERR_PTR(err);
-
-   memcpy(ah->av.eth.mac, mac, 6);
-   err = ib_get_cached_gid(pd->device, ah_attr->port_num, ah_attr->grh.sgid_index, &sgid);
-   if (err)
-   return ERR_PTR(err);
-   vlan_tag = rdma_get_vlan_id(&sgid);
+   memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(in6));
+   if (rdma_is_multicast_addr(&in6)) {
+   is_mcast = 1;
+   rdma_get_mcast_mac(&in6, ah->av.eth.mac);
+   } else {
+   memcpy(ah->av.eth.mac, ah_attr->dmac, 6);
+   }
+   vlan_tag = ah_attr->vlan;
if (vlan_tag < 0x1000)
vlan_tag |= (ah_attr->sl & 7) << 13;
ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d5e60f4..ba3f85b 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -793,6 +793,11 @@ repoll:
wc->sl  = be16_to_cpu(cqe->sl_vid) >> 13;
else
wc->sl  = be16_to_cpu(cqe->sl_vid) >> 12;
+   if (be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_VLAN_PRESENT_MASK)
+   wc->vlan = be16_to_cpu(cqe->sl_vid) & MLX4_CQE_VID_MASK;
+   else
+   wc->vlan = 0x;
+   memcpy(wc->smac, cqe->smac, 6);
}
 
return 0;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..8879b41 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -39,6 +39,8 @@
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/addrconf.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -767,7 +769,6 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
   union ib_gid *gid)
 {
-   u8 mac[6];
struct net_device *ndev;
int ret = 0;
 
@@ -781,11 +782,7 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
spin_unlock(&mdev->iboe.lock);
 
if (ndev) {
-   rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-   rtnl_lock();
-   dev_mc_add(mdev->iboe.netdevs[mqp->port - 1], mac);
ret = 1;
-   rtnl_unlock();
dev_put(ndev);
}
 
@@ -805,6 +802,8 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
struct mlx4_ib_qp *mqp = to_mqp(ibqp);
u64 reg_id;
struct mlx4_ib_steering *ib_steering = NULL;
+   enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+   MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
if (mdev->dev->caps.steering_mode ==
MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -816,7 +815,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
!!(mqp->flags &
   MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-   MLX4_PROT_IB_IPV6, &reg_id);
+   prot, &reg_id);
if (err)
goto err_malloc;
 
@@ -835,7 +834,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 err_add:
mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
- MLX4_PROT_IB_IPV6, reg_id);
+ prot, reg_id);

[PATCH for-next 0/4] IP based RoCE GID Addressing

2013-06-13 Thread Or Gerlitz
Currently, the IB stack (core + drivers) handles RoCE (IBoE) GIDs as
encoding the related Ethernet net-device interface MAC address and
possibly VLAN id.

This series changes RoCE GIDs to encode the IP addresses (IPv4 + IPv6)
of that Ethernet interface, for the following reasons:

1. There are environments where the compute entity that runs the RoCE
stack is not aware that its traffic is vlan-tagged. This results in that
node creating/assuming wrong GIDs from the viewpoint of a peer node which
is aware of vlans.

Note that "node" here can be a physical node connected to an Ethernet switch
acting in access mode, talking to another node which does vlan
insertion/stripping by itself.

Another example is an SRIOV Virtual Function configured to work in VST
mode (Virtual-Switch-Tagging), such that the hypervisor configures the HW
eSwitch to do vlan insertion for the vPORT representing that function.

2. RoCE traffic is inspected (mirrored/trapped) in Ethernet switches for
monitoring and security purposes. It is much more natural for both humans and
automated utilities (...) to observe IP addresses at a certain offset into the
RoCE frame's L3 header than MAC/VLANs (which are there anyway in the L2 header
of that frame, so they are not lost with this change).

3. Some advanced Bonding/Teaming modes such as balance-alb and balance-tlb
use multiple underlying devices in parallel, and hence packets always
carry the bond IP address but different streams have different source MACs.
The approach brought by this series is part of what would allow
supporting that for RoCE traffic too.

The 1st patch modifies the IB core to cope with the new scheme, and the 2nd
does that for the mlx4_ib driver. The 3rd patch sets the foundation for
extending uverbs to the new scheme which was introduced lately, and the fourth
patch adds two extended uCMA commands and two extended uVERBS commands which
are now exported to user space.

These extended verbs will allow enhancing user space libraries such that they
work OK over the modified scheme. RC applications using librdmacm will not need
to be modified at all, since the change will be encapsulated in that library.

The ocrdma driver needs to go through a similar patch to the mlx4_ib one; we can
surely do that patch, we just need to dig there a little further.

Or.

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

Matan Barak (1):
  IB/core: Add RoCE IP based addressing extensions towards user space

Moni Shoua (2):
  IB/core: RoCE IP based GID addressing
  IB/mlx4: RoCE IP based GID addressing

 drivers/infiniband/core/cm.c  |3 +
 drivers/infiniband/core/cma.c |   39 ++-
 drivers/infiniband/core/sa_query.c|5 +
 drivers/infiniband/core/ucma.c|  190 +++--
 drivers/infiniband/core/uverbs.h  |2 +
 drivers/infiniband/core/uverbs_cmd.c  |  330 -
 drivers/infiniband/core/uverbs_main.c |   33 ++-
 drivers/infiniband/core/uverbs_marshall.c |   94 ++-
 drivers/infiniband/core/verbs.c   |7 +
 drivers/infiniband/hw/mlx4/ah.c   |   21 +-
 drivers/infiniband/hw/mlx4/cq.c   |5 +
 drivers/infiniband/hw/mlx4/main.c |  461 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |3 +
 drivers/infiniband/hw/mlx4/qp.c   |   19 +-
 include/linux/mlx4/cq.h   |   14 +-
 include/rdma/ib_addr.h|   45 ++--
 include/rdma/ib_marshall.h|   12 +
 include/rdma/ib_sa.h  |3 +
 include/rdma/ib_verbs.h   |4 +
 include/uapi/rdma/ib_user_sa.h|   34 ++-
 include/uapi/rdma/ib_user_verbs.h |  130 -
 include/uapi/rdma/rdma_user_cm.h  |   21 ++-
 22 files changed, 1157 insertions(+), 318 deletions(-)
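
To make the new GID scheme concrete, here is a minimal userspace sketch
of mapping an interface IP address to a 16-byte GID. It is not the code
from the series: sketch_gid and sketch_ip2gid are hypothetical names, and
storing an IPv4 address in IPv4-mapped IPv6 form (::ffff:a.b.c.d) is an
assumption about how a helper such as rdma_ip2gid could lay out the bytes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel's union ib_gid: 16 raw bytes. */
struct sketch_gid {
	uint8_t raw[16];
};

/*
 * Map a socket address to a GID. Assumption: IPv4 is stored as an
 * IPv4-mapped IPv6 address, IPv6 is copied unchanged.
 * Returns 0 on success, -1 for any other address family.
 */
static int sketch_ip2gid(const struct sockaddr *addr, struct sketch_gid *gid)
{
	memset(gid, 0, sizeof(*gid));

	if (addr->sa_family == AF_INET) {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)addr;

		gid->raw[10] = 0xff;
		gid->raw[11] = 0xff;
		/* sin_addr is already in network byte order. */
		memcpy(&gid->raw[12], &in4->sin_addr, 4);
		return 0;
	}
	if (addr->sa_family == AF_INET6) {
		const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)addr;

		memcpy(gid->raw, &in6->sin6_addr, 16);
		return 0;
	}
	return -1;
}
```

With 192.168.1.10 as input, the sketch yields a GID whose last four bytes
are c0 a8 01 0a, preceded by ff ff. Neither the MAC nor the VLAN id takes
part in the result, which is the point of the series.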



[PATCH for-next 3/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-06-13 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are
based on the verbs extensions approach should be greater than or equal
to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1
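
For readers following the length check in ib_uverbs_write() above, here
is a small userspace sketch of the same arithmetic. The struct mirrors
ib_uverbs_cmd_hdr_ex from the patch; the sketch_* names are hypothetical
and this is an illustration under those assumptions, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors IB_USER_VERBS_CMD_THRESHOLD from the patch. */
#define SKETCH_CMD_THRESHOLD 50

/* Userspace mirror of struct ib_uverbs_cmd_hdr_ex from the patch. */
struct sketch_cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/*
 * Number of bytes a write() must carry for this command. The first
 * three fields are shared with the legacy header, so they are valid for
 * both formats; provider_in_words only counts for extended opcodes.
 */
static size_t sketch_expected_len(const struct sketch_cmd_hdr_ex *hdr)
{
	size_t words = hdr->in_words;

	if (hdr->command >= SKETCH_CMD_THRESHOLD)
		words += hdr->provider_in_words;
	return words * 4;
}

/* Returns 1 when count matches the length announced in the header. */
static int sketch_len_ok(const struct sketch_cmd_hdr_ex *hdr, size_t count)
{
	return sketch_expected_len(hdr) == count;
}
```

For example, an extended command (opcode 50) announcing in_words = 4 and
provider_in_words = 2 must be written with count = 24 bytes, while a
legacy opcode with in_words = 4 must be written with count = 16.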



[PATCH for-next 1/4] IB/core: RoCE IP based GID addressing

2013-06-13 Thread Or Gerlitz
From: Moni Shoua mo...@mellanox.co.il

Currently, the IB core assumes RoCE (IBoE) GIDs encode the MAC address,
and possibly the VLAN id, of the related Ethernet netdevice interface.

Change GIDs to be treated as encoding the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded within
GIDs, the InfiniBand address structures (e.g. ib_ah_attr) had to be
extended with layer 2 address parameters, namely MAC and VLAN.

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/cm.c   |3 ++
 drivers/infiniband/core/cma.c  |   39 ++
 drivers/infiniband/core/sa_query.c |5 
 drivers/infiniband/core/ucma.c |   18 +++---
 drivers/infiniband/core/verbs.c|7 +
 include/rdma/ib_addr.h |   45 
 include/rdma/ib_sa.h   |3 ++
 include/rdma/ib_verbs.h|4 +++
 8 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 784b97c..7af618f 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1557,6 +1557,9 @@ static int cm_req_handler(struct cm_work *work)
 
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
+
+	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, 6);
+	work->path[0].vlan = cm_id_priv->av.ah_attr.vlan;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 71c2c71..ba217c9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -373,7 +373,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 		return -EINVAL;
 
 	mutex_lock(&lock);
-	iboe_addr_get_sgid(dev_addr, &iboe_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &iboe_gid);
+
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
 	list_for_each_entry(cma_dev, &dev_list, list) {
@@ -1803,7 +1805,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr;
 	struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr;
 	struct net_device *ndev = NULL;
-	u16 vid;
+
 
 	if (src_addr->sin_family != dst_addr->sin_family)
 		return -EINVAL;
@@ -1830,10 +1832,13 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
-	vid = rdma_vlan_dev_vlan_id(ndev);
+	route->path_rec->vlan = rdma_vlan_dev_vlan_id(ndev);
+	memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, 6);
 
-	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr, vid);
-	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr, vid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &route->path_rec->sgid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
+		    &route->path_rec->dgid);
 
 	route->path_rec->hop_limit = 1;
 	route->path_rec->reversible = 1;
@@ -1970,6 +1975,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			   RDMA_CM_ADDR_RESOLVED))
 		goto out;
 
+	memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+	       ip_addr_size(src_addr));
 	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv);
 
@@ -1979,11 +1986,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			goto out;
 		event.event = RDMA_CM_EVENT_ADDR_ERROR;
 		event.status = status;
-	} else {
-		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
-		       ip_addr_size(src_addr));
+	} else
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
-	}
 
 	if (id_priv->id.event_handler(&id_priv->id, &event)) {
 		cma_exch(id_priv, RDMA_CM_DESTROYING);
@@ -2381,6 +2385,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
+	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
@@ -2391,7 +2396,6 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 			goto err1;
 	}
 
-	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) {
 		if (addr

[PATCH for-next 4/4] IB/core: Add RoCE IP based addressing extensions towards user space

2013-06-13 Thread Or Gerlitz
From: Matan Barak mat...@mellanox.com

Add support for RoCE (IBoE) IP based addressing extensions towards user space.

Extend INIT_QP_ATTR and QUERY_ROUTE ucma commands.

Extend MODIFY_QP and CREATE_AH uverbs commands.

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/ucma.c|  172 +++-
 drivers/infiniband/core/uverbs.h  |2 +
 drivers/infiniband/core/uverbs_cmd.c  |  330 ++---
 drivers/infiniband/core/uverbs_main.c |4 +-
 drivers/infiniband/core/uverbs_marshall.c |   94 -
 include/rdma/ib_marshall.h|   12 +
 include/uapi/rdma/ib_user_sa.h|   34 +++-
 include/uapi/rdma/ib_user_verbs.h |  120 +++-
 include/uapi/rdma/rdma_user_cm.h  |   21 ++-
 9 files changed, 690 insertions(+), 99 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index bc2cb5d..c7dfd99 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -599,6 +599,35 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
+static void ucma_copy_ib_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				  struct rdma_route *route)
+{
+	struct rdma_dev_addr *dev_addr;
+
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		dev_addr = &route->addr.dev_addr;
+		rdma_addr_get_dgid(dev_addr,
+				   (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_addr_get_sgid(dev_addr,
+				   (union ib_gid *)&resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey =
+			cpu_to_be16(ib_addr_get_pkey(dev_addr));
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
 static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
@@ -625,14 +654,39 @@ static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
-static void ucma_copy_iw_route(struct rdma_ucm_query_route_resp *resp,
+static void ucma_copy_iboe_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				    struct rdma_route *route)
+{
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		rdma_ip2gid((struct sockaddr *)&route->addr.dst_addr,
+			    (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.src_addr,
+			    (union ib_gid *)&resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey = cpu_to_be16(0xffff);
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ucma_copy_iw_route(struct ib_user_path_rec *resp_path,
 			       struct rdma_route *route)
 {
 	struct rdma_dev_addr *dev_addr;
 
 	dev_addr = &route->addr.dev_addr;
-	rdma_addr_get_dgid(dev_addr, (union ib_gid *) &resp->ib_route[0].dgid);
-	rdma_addr_get_sgid(dev_addr, (union ib_gid *) &resp->ib_route[0].sgid);
+	rdma_addr_get_dgid(dev_addr, (union ib_gid *)&resp_path->dgid);
+	rdma_addr_get_sgid(dev_addr, (union ib_gid *)&resp_path->sgid);
 }
 
 static ssize_t ucma_query_route(struct ucma_file *file,
@@ -684,7 +738,74 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 		}
 		break;
 	case RDMA_TRANSPORT_IWARP:
-		ucma_copy_iw_route(&resp, &ctx->cm_id->route);
+		ucma_copy_iw_route(&resp.ib_route[0], &ctx->cm_id->route);
+		break;
+	default:
+		break;
+	}
+
+out:
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_query_route_ex(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len)
+{
+	struct rdma_ucm_query_route_ex cmd;
+	struct

Re: [PATCH for-next 0/4] IP based RoCE GID Addressing

2013-06-14 Thread Or Gerlitz
Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

> Can you talk a bit about compatibility please? What happens when nodes
> with this patch are on the same network as nodes without it?

The CM on the passive side would send a reject with the reason being
invalid GID, so this will not go unnoticed.

> Does this patch remove the encoding of the VLAN from the GID?

YES, and I explained in argument #1 why the VLAN being there doesn't
work in many environments. In other words, it's something that needs to
be fixed, and this series addresses that.

> How is the destination MAC derived now?

As it was before, using address resolution, e.g. ARPs sent by the RDMA-CM.

> There is a RoCE standard, it doesn't say much, but how the MAC and GRH
> GID are related/derived really should be specified...
>
> Not sure about copying the IP/IPv6 address from the interface into the
> HW, there has always been pressure to keep verbs separate from the net
> stack.. At the very least patch #2 should have its change log updated
> to actually reflect what is in the patch.

Sure, I'll see what needs to be better explained in the change log.
Note that the inbox RoCE implementation is tightly coupled to
net-devices, e.g. the GID table population is based on netevents of
the related netdevices.
Or.


Re: [PATCH for-next 4/4] IB/core: Add RoCE IP based addressing extensions towards user space

2013-06-14 Thread Or Gerlitz
On Thu, Jun 13, 2013 at 8:09 PM, Jason Gunthorpe
jguntho...@obsidianresearch.com wrote:
> On Thu, Jun 13, 2013 at 06:01:44PM +0300, Or Gerlitz wrote:
>> From: Matan Barak mat...@mellanox.com
>>
>> Add support for RoCE (IBoE) IP based addressing extensions towards
>> user space.
>>
>> Extend INIT_QP_ATTR and QUERY_ROUTE ucma commands.
>>
>> Extend MODIFY_QP and CREATE_AH uverbs commands.
>
> This is a really big patch Or, there is lots going on here, hard to
> review :(
>
> The rdma cm stuff should probably be split out of this, and Sean
> should look at it of course.

Sure, will do that: one patch for uverbs and one patch for rdma_ucm.

> In fact, since the user ABI is so important, every ABI change should
> be a distinct patch, with a good change log, stating the intended
> goals of the change and ABI visible changes it makes.

Point taken, will do that, thanks for bringing this up.

> The changelog above is terrible for a huge patch that makes changes to
> the userspace API.
>
>> diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
>> index cfc7c9b..367d66a 100644
>> +++ b/include/uapi/rdma/ib_user_sa.h
>> @@ -48,7 +48,13 @@ enum {
>>  struct ib_path_rec_data {
>>  	__u32	flags;
>>  	__u32	reserved;
>> -	__u32	path_rec[16];
>> +	__u32	path_rec[20];
>> +};
>> +
>> +enum ibv_kern_path_rec_attr_mask {
>> +	IB_USER_PATH_REC_ATTR_DMAC = 1ULL << 0,
>> +	IB_USER_PATH_REC_ATTR_SMAC = 1ULL << 1,
>> +	IB_USER_PATH_REC_ATTR_VID  = 1ULL << 2
>>  };
>
> So, how is userspace supposed to know what these values are?

It's part of the verbs extensions deal.

> The current system where the MAC address is in the GID seemed
> understandable, assuming you discover the MAC out of band some how...

MAC is an Ethernet layer 2 address; I don't see why putting a MAC in the
L3 header (GRH) is more understandable than putting an L3 address (IP)
there.

>> +struct ib_uverbs_modify_qp_ex {
>> +	__u32 comp_mask;
>> +	struct ib_uverbs_qp_dest dest;
>> +	struct ib_uverbs_qp_dest alt_dest;
>> [...]
>> +	struct ib_uverbs_qp_dest_ex dest_ex;
>> +	struct ib_uverbs_qp_dest_ex alt_dest_ex;
>
> Yuk.. The 'ex' structures don't have to be byte compatible, they just
> have to have a known transform, dest should be the full extended dest,
> not split into two..
>
>> +struct rdma_ucm_query_route_resp_ex {
>> +	__u64 node_guid;
>> +	struct ib_user_path_rec_ex ib_route[2];
>> +	struct sockaddr_in6 src_addr;
>> +	struct sockaddr_in6 dst_addr;
>> +	__u32 num_paths;
>> +	__u8 port_num;
>> +	__u8 reserved[3];
>> +};
>
> Should these be sockaddr_storage? How does this intersect with Sean's
> AF_GID work?

sockaddr_in6 is OK for extending rdma_ucm_query_route_resp as it's OK
for the non-extended version of that command. I don't see any
intersection with the AF_IB work.
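
On the attribute-mask question raised above, the usual pattern is that
the producer sets one bit per optional field it filled in, and the
consumer tests the bit before trusting the field. A minimal sketch under
stated assumptions: the bit positions follow the quoted enum, while the
struct layout and all sketch_* names are hypothetical and are not taken
from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Bit positions follow the enum quoted from the patch. */
enum sketch_path_rec_attr_mask {
	SKETCH_ATTR_DMAC = 1u << 0,
	SKETCH_ATTR_SMAC = 1u << 1,
	SKETCH_ATTR_VID  = 1u << 2,
};

/* Hypothetical record: comp_mask says which optional fields are valid. */
struct sketch_path_rec_ex {
	uint32_t comp_mask;
	uint8_t  dmac[6];
	uint8_t  smac[6];
	uint16_t vid;
};

/*
 * Copy the destination MAC only if the producer marked it valid.
 * Returns 1 when out[] was filled, 0 when the field is absent.
 */
static int sketch_get_dmac(const struct sketch_path_rec_ex *rec,
			   uint8_t out[6])
{
	if (!(rec->comp_mask & SKETCH_ATTR_DMAC))
		return 0;
	memcpy(out, rec->dmac, 6);
	return 1;
}
```

A consumer built this way keeps working when a producer leaves a field
unset: it sees the bit cleared and falls back to its previous behaviour.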


Re: [PATCH V1 for-next 0/4] Add receive Flow Steering support

2013-06-17 Thread Or Gerlitz
On Tue, Jun 11, 2013 at 2:42 PM, Or Gerlitz ogerl...@mellanox.com wrote:
[...]
> V0 has been acknowledged by Steve and Christoph, and also got positive
> feedback from Sean and Jason over f2f talks we had during the Linux
> Foundation EU summit last month.

Hi Roland,

So we're @ -rc6 and there are also other goodies on the plate for the
coming merge window ... any comment here? Is it safe to take this for
3.11? I am asking here b/c it doesn't seem you update your patchwork,
so there's no other choice.

Or.


> V1 changes:
>
>  - dropped the five pre-patches which were accepted into 3.10
>  - rebased the patches against Roland's for-next / 3.10-rc4
>  - in patch #3, ib_uverbs_destroy_flow was returning too quickly when
>    the driver returned failure for ib_destroy_flow, need to free some
>    uverbs resources 1st.
>  - in patch #4, check index before accessing the array at
>    mlx4_ib_create/destroy_flow
>
> Or.
>
> Hadar Hen Zion (3):
>   IB/core: Add receive Flow Steering support
>   IB/core: Export ib_create/destroy_flow through uverbs
>   IB/mlx4: Add receive Flow Steering support
>
> Igor Ivanov (1):
>   IB/core: Infra-structure to support verbs extensions through uverbs
>
>  drivers/infiniband/core/uverbs.h  |3 +
>  drivers/infiniband/core/uverbs_cmd.c  |  206 +++
>  drivers/infiniband/core/uverbs_main.c |   42 +-
>  drivers/infiniband/core/verbs.c   |   30 
>  drivers/infiniband/hw/mlx4/main.c |  246 +
>  include/rdma/ib_verbs.h   |  137 ++-
>  include/uapi/rdma/ib_user_verbs.h |  118 -
>  7 files changed, 773 insertions(+), 9 deletions(-)



Re: config file lost

2013-06-18 Thread Or Gerlitz

On 17/06/2013 21:44, Hal Rosenstock wrote:

> I'm not 100% sure about the origin of those RPMs but I think the 3.3.15
> one is RedHat packaged and the 3.3.16 appears to be PLD packaged and the
> processes are a little different. I suspect the 3.3.16 one is packaged
> with the spec file in the tree whereas RedHat uses their own spec file.
>
> FWIW it's simple to generate an up to date config file:
>
> opensm -c opensm.conf

Hal,

YES to your observations: the 3.3.15 one was the RHEL package and the
3.3.16 one was built from the upstream spec. I know that I can generate
the config file using the method you suggested. However, do the upstream
service scripts use the location to which this is generated, so things
are plug-and-play, or do I need to hack that somehow?

Or.



Re: config file lost

2013-06-18 Thread Or Gerlitz

On 18/06/2013 14:19, Hal Rosenstock wrote:

> Is /etc/rdma a standard location in Linux? Is it used by other RDMA
> upstream components?

It's used by RHEL packages, not upstream.

> Also, opensm doesn't by default use this location for the config file.
> I expect that's dealt with by other scripts RedHat supplies.

Yes, this is part of their specs, I think.

>> so things are plug-and-play, or I need to hack that somehow?
>
> You would currently need to hack that.

How exactly? I'd like to build an rpm from upstream opensm, generate a
config file and have the opensm service script read this config and
apply it on successive restarts or SIGHUPs I send. Maybe you can come
up with some patch, it will help.

Or.



[PATCH V1 for-next 2/6] IB/mlx4: Use RoCE IP based GIDs in the port GID table

2013-06-19 Thread Or Gerlitz
From: Moni Shoua mo...@mellanox.co.il

Currently, the mlx4 driver sets RoCE (IBoE) GIDs to encode the MAC
address, and possibly the VLAN id, of the related Ethernet netdevice
interface.

Change this scheme such that GIDs encode interface IP addresses
(both IPv4 and IPv6).

This requires learning which IP addresses are in use by a netdevice
associated with the HCA port, formatting them as GIDs and adding them
to the port GID table. Further, address add and delete events are
caught to maintain the GID table accordingly.

Associated IP addresses may belong to a master of an Ethernet netdevice
on top of that port, so this should be considered when building and
maintaining the GID table.

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c|  461 +++---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |3 +
 2 files changed, 320 insertions(+), 144 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..8879b41 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -39,6 +39,8 @@
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/addrconf.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -767,7 +769,6 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
-	u8 mac[6];
 	struct net_device *ndev;
 	int ret = 0;
 
@@ -781,11 +782,7 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	spin_unlock(&mdev->iboe.lock);
 
 	if (ndev) {
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		rtnl_lock();
-		dev_mc_add(mdev->iboe.netdevs[mqp->port - 1], mac);
 		ret = 1;
-		rtnl_unlock();
 		dev_put(ndev);
 	}
 
@@ -805,6 +802,8 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 	u64 reg_id;
 	struct mlx4_ib_steering *ib_steering = NULL;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -816,7 +815,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
 				    !!(mqp->flags &
 				       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-				    MLX4_PROT_IB_IPV6, &reg_id);
+				    prot, &reg_id);
 	if (err)
 		goto err_malloc;
 
@@ -835,7 +834,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 err_add:
 	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-			      MLX4_PROT_IB_IPV6, reg_id);
+			      prot, reg_id);
 err_malloc:
 	kfree(ib_steering);
 
@@ -863,10 +862,11 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	int err;
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	u8 mac[6];
 	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	u64 reg_id = 0;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -889,7 +889,7 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	}
 
 	err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-				    MLX4_PROT_IB_IPV6, reg_id);
+				    prot, reg_id);
 	if (err)
 		return err;
 
@@ -901,13 +901,8 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 		if (ndev)
 			dev_hold(ndev);
 		spin_unlock(&mdev->iboe.lock);
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		if (ndev) {
-			rtnl_lock();
-			dev_mc_del(mdev->iboe.netdevs[ge->port - 1], mac);
-			rtnl_unlock();
+		if (ndev)
 			dev_put(ndev);
-		}
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1003,20 +998,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
 	&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id, struct net_device *dev)
-{
-	memcpy(eui, dev->dev_addr, 3
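
The change log above describes a port GID table that follows address add
and delete events. As a rough illustration only, here is a userspace
sketch of such a table. The table size, the all-zero "free slot"
convention and every sketch_* name are assumptions made for this
example; the driver's real data structures and locking are not shown in
the excerpt above.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_GID_TABLE_LEN 8	/* hypothetical table size */

struct sketch_gid {
	uint8_t raw[16];
};

struct sketch_gid_table {
	struct sketch_gid gids[SKETCH_GID_TABLE_LEN];
};

/* Assumption for this sketch: an all-zero GID marks a free slot. */
static const struct sketch_gid sketch_zgid;

static int sketch_gid_find(const struct sketch_gid_table *t,
			   const struct sketch_gid *gid)
{
	for (int i = 0; i < SKETCH_GID_TABLE_LEN; i++)
		if (!memcmp(&t->gids[i], gid, sizeof(*gid)))
			return i;
	return -1;
}

/*
 * Handle an "address added" event. Returns the slot index, or -1 when
 * the GID is all-zero or the table is full. Adding a GID twice returns
 * the existing slot.
 */
static int sketch_gid_add(struct sketch_gid_table *t,
			  const struct sketch_gid *gid)
{
	int i;

	if (!memcmp(gid, &sketch_zgid, sizeof(*gid)))
		return -1;
	i = sketch_gid_find(t, gid);
	if (i >= 0)
		return i;
	i = sketch_gid_find(t, &sketch_zgid);
	if (i < 0)
		return -1;
	t->gids[i] = *gid;
	return i;
}

/* Handle an "address deleted" event. Returns 0, or -1 if not present. */
static int sketch_gid_del(struct sketch_gid_table *t,
			  const struct sketch_gid *gid)
{
	int i = sketch_gid_find(t, gid);

	if (i < 0 || !memcmp(gid, &sketch_zgid, sizeof(*gid)))
		return -1;
	t->gids[i] = sketch_zgid;
	return 0;
}
```

In this sketch a deleted entry leaves a hole that the next add reuses,
so the indexes of the remaining entries stay stable across events.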

[PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs

2013-06-19 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are
based on the verbs extensions approach should be greater than or equal
to IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1



[PATCH V1 for-next 0/6] IP based RoCE GID Addressing

2013-06-19 Thread Or Gerlitz
Currently, the IB stack (core + drivers) handles RoCE (IBoE) GIDs as if
they encode the MAC address, and possibly the VLAN id, of the related
Ethernet net-device interface.

This series changes RoCE GIDs to encode the IP addresses (IPv4 + IPv6)
of that Ethernet interface, for the following reasons:

1. There are environments where the compute entity that runs the RoCE
stack is not aware that its traffic is VLAN-tagged. As a result, that
node creates/assumes GIDs which are wrong from the viewpoint of a peer
node which is aware of VLANs.

Note that the node here can be a physical node connected to an Ethernet
switch acting in access mode, talking to another node which does VLAN
insertion/stripping by itself.

Another example is an SRIOV Virtual Function configured to work in VST
mode (Virtual-Switch-Tagging), such that the hypervisor configures the
HW eSwitch to do VLAN insertion for the vPORT representing that function.

2. When RoCE traffic is inspected (mirrored/trapped) in Ethernet switches
for monitoring and security purposes, it is much more natural for both
humans and automated utilities (...) to observe IP addresses at a certain
offset into the RoCE frame's L3 header vs. MAC/VLANs (which are there
anyway in the L2 header of that frame, so they are not gone by this
change).

3. Some advanced Bonding/Teaming modes, such as balance-alb and
balance-tlb, use multiple underlying devices in parallel, and hence
packets always carry the bond IP address but different streams have
different source MACs. The approach brought by this series is part of
what would allow supporting that for RoCE traffic too.

The 1st patch modifies the IB core to cope with the new scheme, and the
2nd/3rd ones do that for the mlx4_ib driver. The 4th patch sets the
foundation for extending uverbs using the recently introduced verbs
extensions scheme, and the 5th/6th patches add two extended uCMA commands
and two extended uVERBS commands which are now exported to user space.

These extended verbs will allow enhancing user space libraries such that
they work OK over the modified scheme. RC applications using librdmacm
will not need to be modified at all, since the change will be
encapsulated in that library.

The ocrdma driver needs to go through a similar patch as the mlx4_ib one;
we can surely do that patch, just need to dig there a little further.

Or.

changes from V0:

 - enhanced documentation of the mlx4_ib, uverbs and ucma patches
 - split the mlx4_ib patch into two
 - split the extended user space commands patch into two


Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

Matan Barak (2):
  IB/core: Add RoCE IP based addressing extensions for uverbs
  IB/core: Add RoCE IP based addressing extensions for rdma_ucm

Moni Shoua (3):
  IB/core: RoCE IP based GID addressing
  IB/mlx4: Use RoCE IP based GIDs in the port GID table
  IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing

 drivers/infiniband/core/cm.c  |3 +
 drivers/infiniband/core/cma.c |   39 ++-
 drivers/infiniband/core/sa_query.c|5 +
 drivers/infiniband/core/ucma.c|  190 +++--
 drivers/infiniband/core/uverbs.h  |2 +
 drivers/infiniband/core/uverbs_cmd.c  |  330 -
 drivers/infiniband/core/uverbs_main.c |   33 ++-
 drivers/infiniband/core/uverbs_marshall.c |   94 ++-
 drivers/infiniband/core/verbs.c   |7 +
 drivers/infiniband/hw/mlx4/ah.c   |   21 +-
 drivers/infiniband/hw/mlx4/cq.c   |5 +
 drivers/infiniband/hw/mlx4/main.c |  461 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |3 +
 drivers/infiniband/hw/mlx4/qp.c   |   19 +-
 include/linux/mlx4/cq.h   |   14 +-
 include/rdma/ib_addr.h|   45 ++--
 include/rdma/ib_marshall.h|   12 +
 include/rdma/ib_sa.h  |3 +
 include/rdma/ib_verbs.h   |4 +
 include/uapi/rdma/ib_user_sa.h|   34 ++-
 include/uapi/rdma/ib_user_verbs.h |  130 -
 include/uapi/rdma/rdma_user_cm.h  |   21 ++-
 22 files changed, 1157 insertions(+), 318 deletions(-)



[PATCH V1 for-next 1/6] IB/core: RoCE IP based GID addressing

2013-06-19 Thread Or Gerlitz
From: Moni Shoua mo...@mellanox.co.il

Currently, the IB core assumes that RoCE (IBoE) gids encode the MAC address of
the related Ethernet netdevice interface and possibly a VLAN id.

Change gids to be treated as encoding the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded within gids,
we had to extend the Infiniband address structures (e.g. ib_ah_attr) with layer 2
address parameters, namely mac and vlan.
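
To make the gid change concrete, here is a minimal user-space sketch of turning
a socket address into a 16-byte gid. It assumes that IPv4 addresses are stored
in IPv4-mapped IPv6 form and that IPv6 addresses are copied as-is; the commit
message does not spell out the encoding. ip_to_gid is a hypothetical name used
only for illustration, it is not the rdma_ip2gid helper that appears in the diff.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical user-space illustration, not the kernel helper.
 * Assumption: IPv4 is encoded as an IPv4-mapped IPv6 address.
 * Returns 0 on success, -1 for an unsupported address family.
 */
static int ip_to_gid(const struct sockaddr *addr, uint8_t gid[16])
{
	switch (addr->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *in4 =
			(const struct sockaddr_in *)addr;

		/* 80 zero bits, 16 one bits, then the 32-bit IPv4 address */
		memset(gid, 0, 10);
		gid[10] = 0xff;
		gid[11] = 0xff;
		memcpy(&gid[12], &in4->sin_addr.s_addr, 4);
		return 0;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *in6 =
			(const struct sockaddr_in6 *)addr;

		/* an IPv6 address already has the size of a gid */
		memcpy(gid, in6->sin6_addr.s6_addr, 16);
		return 0;
	}
	default:
		return -1;
	}
}
```

Under that assumption, 192.168.1.1 becomes the gid ::ffff:192.168.1.1, so the
MAC and VLAN have to travel separately, which is what the new dmac and vlan
fields are for.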

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/cm.c   |3 ++
 drivers/infiniband/core/cma.c  |   39 ++
 drivers/infiniband/core/sa_query.c |5 
 drivers/infiniband/core/ucma.c |   18 +++---
 drivers/infiniband/core/verbs.c|7 +
 include/rdma/ib_addr.h |   45 
 include/rdma/ib_sa.h   |3 ++
 include/rdma/ib_verbs.h|4 +++
 8 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 784b97c..7af618f 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1557,6 +1557,9 @@ static int cm_req_handler(struct cm_work *work)
 
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
+
+	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, 6);
+	work->path[0].vlan = cm_id_priv->av.ah_attr.vlan;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 71c2c71..ba217c9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -373,7 +373,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 		return -EINVAL;
 
 	mutex_lock(&lock);
-	iboe_addr_get_sgid(dev_addr, &iboe_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &iboe_gid);
+
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
 	list_for_each_entry(cma_dev, &dev_list, list) {
@@ -1803,7 +1805,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr;
 	struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr;
 	struct net_device *ndev = NULL;
-	u16 vid;
+
 
 	if (src_addr->sin_family != dst_addr->sin_family)
 		return -EINVAL;
@@ -1830,10 +1832,13 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
-	vid = rdma_vlan_dev_vlan_id(ndev);
+	route->path_rec->vlan = rdma_vlan_dev_vlan_id(ndev);
+	memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, 6);
 
-	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr, vid);
-	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr, vid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &route->path_rec->sgid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
+		    &route->path_rec->dgid);
 
 	route->path_rec->hop_limit = 1;
 	route->path_rec->reversible = 1;
@@ -1970,6 +1975,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			   RDMA_CM_ADDR_RESOLVED))
 		goto out;
 
+	memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+	       ip_addr_size(src_addr));
 	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv);
 
@@ -1979,11 +1986,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			goto out;
 		event.event = RDMA_CM_EVENT_ADDR_ERROR;
 		event.status = status;
-	} else {
-		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
-		       ip_addr_size(src_addr));
+	} else
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
-	}
 
 	if (id_priv->id.event_handler(&id_priv->id, &event)) {
 		cma_exch(id_priv, RDMA_CM_DESTROYING);
@@ -2381,6 +2385,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
+	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
@@ -2391,7 +2396,6 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 			goto err1;
 	}
 
-	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) {
 		if (addr

[PATCH V1 for-next 5/6] IB/core: Add RoCE IP based addressing extensions for uverbs

2013-06-19 Thread Or Gerlitz
From: Matan Barak mat...@mellanox.com

Add uverbs support for RoCE (IBoE) IP based addressing extensions
towards user space libraries.

Under IP based gid addressing, for RC QPs, QP attributes should contain the
Ethernet L2 destination. Until now, indicating the GID was sufficient. When
using an IP address encoded in gids, the QP attributes should contain an extended
destination, indicating vlan and dmac as well. This is done via a new struct
ib_uverbs_qp_dest_ex. This new structure is contained in a new struct
ib_uverbs_modify_qp_ex that is used by the MODIFY_QP_EX command. In order to make
those changes seamlessly, those extended structures were added at the bottom of
the current structures.

Also, when the gid encodes an IP address, the AH attributes should also contain
vlan and dmac. Therefore, ib_uverbs_create_ah was extended to contain those
fields. When creating an AH, the user indicates the exact L2 Ethernet destination
parameters. This is done by a new CREATE_AH_EX command that uses a new struct
ib_uverbs_create_ah_ex.

struct ib_user_path_rec was extended too, to contain source and destination
MAC and VLAN ID. This structure is used by the rdma_ucm driver.
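
One way to picture the idea of placing new L2 fields after the existing ones is
the sketch below. Everything in it is an assumption made for illustration:
dest_v0, dest_ex, dest_from_buf and the NO_VLAN marker are hypothetical names
with an abbreviated field set, not the layouts of ib_uverbs_qp_dest or
ib_uverbs_qp_dest_ex, and the patch itself uses separate _EX commands rather
than a length check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NO_VLAN 0xffff /* assumed "no VLAN" marker, illustration only */

/* Hypothetical, abbreviated stand-in for an existing destination layout. */
struct dest_v0 {
	uint8_t  dgid[16];
	uint32_t flow_label;
	uint16_t dlid;
	uint8_t  sgid_index;
	uint8_t  port_num;
};

/* Extended layout: the old fields come first, the new L2 fields last. */
struct dest_ex {
	struct dest_v0 base;
	uint8_t  dmac[6];
	uint16_t vlan_id;
};

/*
 * Accept a buffer in either layout and normalise it to the extended one.
 * Returns 0 on success, -1 if the length matches neither layout.
 */
static int dest_from_buf(const void *buf, size_t len, struct dest_ex *out)
{
	if (len == sizeof(struct dest_ex)) {
		memcpy(out, buf, sizeof(*out));
		return 0;
	}
	if (len == sizeof(struct dest_v0)) {
		/* old layout: no L2 information was supplied */
		memcpy(&out->base, buf, sizeof(out->base));
		memset(out->dmac, 0, sizeof(out->dmac));
		out->vlan_id = NO_VLAN;
		return 0;
	}
	return -1;
}
```

Because the old fields keep their offsets, code that only knows the old layout
can still read the leading part of the extended structure unchanged.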

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |2 +
 drivers/infiniband/core/uverbs_cmd.c  |  330 ++---
 drivers/infiniband/core/uverbs_main.c |4 +-
 drivers/infiniband/core/uverbs_marshall.c |   94 -
 include/rdma/ib_marshall.h|   12 +
 include/uapi/rdma/ib_user_sa.h|   34 +++-
 include/uapi/rdma/ib_user_verbs.h |  120 +++-
 7 files changed, 503 insertions(+), 93 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..1ec4850 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -200,11 +200,13 @@ IB_UVERBS_DECLARE_CMD(create_qp);
 IB_UVERBS_DECLARE_CMD(open_qp);
 IB_UVERBS_DECLARE_CMD(query_qp);
 IB_UVERBS_DECLARE_CMD(modify_qp);
+IB_UVERBS_DECLARE_CMD(modify_qp_ex);
 IB_UVERBS_DECLARE_CMD(destroy_qp);
 IB_UVERBS_DECLARE_CMD(post_send);
 IB_UVERBS_DECLARE_CMD(post_recv);
 IB_UVERBS_DECLARE_CMD(post_srq_recv);
 IB_UVERBS_DECLARE_CMD(create_ah);
+IB_UVERBS_DECLARE_CMD(create_ah_ex);
 IB_UVERBS_DECLARE_CMD(destroy_ah);
 IB_UVERBS_DECLARE_CMD(attach_mcast);
 IB_UVERBS_DECLARE_CMD(detach_mcast);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..eb3e7e6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1891,6 +1891,58 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
 	}
 }
 
+static void ib_uverbs_modify_qp_assign(struct ib_uverbs_modify_qp *cmd,
+				       struct ib_qp_attr *attr) {
+	attr->qp_state            = cmd->qp_state;
+	attr->cur_qp_state        = cmd->cur_qp_state;
+	attr->path_mtu            = cmd->path_mtu;
+	attr->path_mig_state      = cmd->path_mig_state;
+	attr->qkey                = cmd->qkey;
+	attr->rq_psn              = cmd->rq_psn;
+	attr->sq_psn              = cmd->sq_psn;
+	attr->dest_qp_num         = cmd->dest_qp_num;
+	attr->qp_access_flags     = cmd->qp_access_flags;
+	attr->pkey_index          = cmd->pkey_index;
+	attr->alt_pkey_index      = cmd->alt_pkey_index;
+	attr->en_sqd_async_notify = cmd->en_sqd_async_notify;
+	attr->max_rd_atomic       = cmd->max_rd_atomic;
+	attr->max_dest_rd_atomic  = cmd->max_dest_rd_atomic;
+	attr->min_rnr_timer       = cmd->min_rnr_timer;
+	attr->port_num            = cmd->port_num;
+	attr->timeout             = cmd->timeout;
+	attr->retry_cnt           = cmd->retry_cnt;
+	attr->rnr_retry           = cmd->rnr_retry;
+	attr->alt_port_num        = cmd->alt_port_num;
+	attr->alt_timeout         = cmd->alt_timeout;
+
+	memcpy(attr->ah_attr.grh.dgid.raw, cmd->dest.dgid, 16);
+	attr->ah_attr.grh.flow_label    = cmd->dest.flow_label;
+	attr->ah_attr.grh.sgid_index    = cmd->dest.sgid_index;
+	attr->ah_attr.grh.hop_limit     = cmd->dest.hop_limit;
+	attr->ah_attr.grh.traffic_class = cmd->dest.traffic_class;
+	attr->ah_attr.dlid              = cmd->dest.dlid;
+	attr->ah_attr.sl                = cmd->dest.sl;
+	attr->ah_attr.src_path_bits     = cmd->dest.src_path_bits;
+	attr->ah_attr.static_rate       = cmd->dest.static_rate;
+	attr->ah_attr.ah_flags          = cmd->dest.is_global ?
+					  IB_AH_GRH : 0;
+	attr->ah_attr.port_num          = cmd->dest.port_num;
+
+	memcpy(attr->alt_ah_attr.grh.dgid.raw, cmd->alt_dest.dgid, 16);
+	attr->alt_ah_attr.grh.flow_label    = cmd->alt_dest.flow_label;
+	attr->alt_ah_attr.grh.sgid_index    = cmd->alt_dest.sgid_index;
+	attr

[PATCH V1 for-next 6/6] IB/core: Add RoCE IP based addressing extensions for rdma_ucm

2013-06-19 Thread Or Gerlitz
From: Matan Barak mat...@mellanox.com

Add rdma_ucm support for RoCE (IBoE) IP based addressing extensions
towards librdmacm

Extend INIT_QP_ATTR and QUERY_ROUTE ucma commands.

INIT_QP_ATTR_EX uses struct ib_uverbs_qp_attr_ex

QUERY_ROUTE_EX uses struct rdma_ucm_query_route_resp_ex which in turn
uses ib_user_path_rec_ex

Signed-off-by: Matan Barak mat...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/ucma.c   |  172 -
 include/uapi/rdma/rdma_user_cm.h |   21 +-
 2 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index bc2cb5d..c7dfd99 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -599,6 +599,35 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
+static void ucma_copy_ib_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				  struct rdma_route *route)
+{
+	struct rdma_dev_addr *dev_addr;
+
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		dev_addr = &route->addr.dev_addr;
+		rdma_addr_get_dgid(dev_addr,
+				   (union ib_gid *)resp->ib_route[0].dgid);
+		rdma_addr_get_sgid(dev_addr,
+				   (union ib_gid *)resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey =
+			cpu_to_be16(ib_addr_get_pkey(dev_addr));
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
 static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
@@ -625,14 +654,39 @@ static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
-static void ucma_copy_iw_route(struct rdma_ucm_query_route_resp *resp,
+static void ucma_copy_iboe_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				    struct rdma_route *route)
+{
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		rdma_ip2gid((struct sockaddr *)&route->addr.dst_addr,
+			    (union ib_gid *)resp->ib_route[0].dgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.src_addr,
+			    (union ib_gid *)resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey = cpu_to_be16(0xffff);
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ucma_copy_iw_route(struct ib_user_path_rec *resp_path,
 			       struct rdma_route *route)
 {
 	struct rdma_dev_addr *dev_addr;
 
 	dev_addr = &route->addr.dev_addr;
-	rdma_addr_get_dgid(dev_addr, (union ib_gid *) &resp->ib_route[0].dgid);
-	rdma_addr_get_sgid(dev_addr, (union ib_gid *) &resp->ib_route[0].sgid);
+	rdma_addr_get_dgid(dev_addr, (union ib_gid *)resp_path->dgid);
+	rdma_addr_get_sgid(dev_addr, (union ib_gid *)resp_path->sgid);
 }
 
 static ssize_t ucma_query_route(struct ucma_file *file,
@@ -684,7 +738,74 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 		}
 		break;
 	case RDMA_TRANSPORT_IWARP:
-		ucma_copy_iw_route(&resp, &ctx->cm_id->route);
+		ucma_copy_iw_route(&resp.ib_route[0], &ctx->cm_id->route);
+		break;
+	default:
+		break;
+	}
+
+out:
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_query_route_ex(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len)
+{
+	struct rdma_ucm_query_route_ex cmd;
+	struct rdma_ucm_query_route_resp_ex resp;
+	struct ucma_context *ctx;
+	struct sockaddr *addr;
+	int ret = 0;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id

[PATCH V1 for-next 3/6] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing

2013-06-19 Thread Or Gerlitz
From: Moni Shoua mo...@mellanox.co.il

IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.

Hence, we need to extract them now from the CQE and place them in struct
ib_wc (to be used for cases where they were taken from the gid).

Also, when modifying a QP or building address handle, instead of
parsing the dgid to get the MAC and VLAN, take them from the
address handle attributes.
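
The address handle side of this can be sketched in isolation. The helper below
mirrors the vlan_tag computation in the ah.c hunk of this patch: the VLAN now
comes straight from the address handle attributes, and for a valid 12-bit VLAN
ID the low three bits of the service level are placed in bits 13-15 of the tag.
build_vlan_tag is a hypothetical name, and treating any value of 0x1000 or above
as "no VLAN" is an assumption drawn from that comparison.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the vlan_tag logic in the ah.c hunk.
 * vlan: VLAN ID taken from the address handle attributes
 * sl:   service level; only its low 3 bits are used as the priority
 */
static uint16_t build_vlan_tag(uint16_t vlan, uint8_t sl)
{
	uint16_t tag = vlan;

	/* a VLAN ID is 12 bits wide; larger values are left untouched */
	if (tag < 0x1000)
		tag |= (uint16_t)((sl & 7) << 13);
	return tag;
}
```

For VLAN 100 and service level 5 this gives 0xA064, while an out-of-range value
such as 0xffff is passed through without priority bits.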

Signed-off-by: Moni Shoua mo...@mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/ah.c |   21 +
 drivers/infiniband/hw/mlx4/cq.c |5 +
 drivers/infiniband/hw/mlx4/qp.c |   19 ++-
 include/linux/mlx4/cq.h |   14 ++
 4 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index a251bec..3941700 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -92,21 +92,18 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 {
 	struct mlx4_ib_dev *ibdev = to_mdev(pd->device);
 	struct mlx4_dev *dev = ibdev->dev;
-	union ib_gid sgid;
-	u8 mac[6];
-	int err;
 	int is_mcast;
+	struct in6_addr in6;
 	u16 vlan_tag;
 
-	err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, &is_mcast, ah_attr->port_num);
-	if (err)
-		return ERR_PTR(err);
-
-	memcpy(ah->av.eth.mac, mac, 6);
-	err = ib_get_cached_gid(pd->device, ah_attr->port_num, ah_attr->grh.sgid_index, &sgid);
-	if (err)
-		return ERR_PTR(err);
-	vlan_tag = rdma_get_vlan_id(&sgid);
+	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(in6));
+	if (rdma_is_multicast_addr(&in6)) {
+		is_mcast = 1;
+		rdma_get_mcast_mac(&in6, ah->av.eth.mac);
+	} else {
+		memcpy(ah->av.eth.mac, ah_attr->dmac, 6);
+	}
+	vlan_tag = ah_attr->vlan;
 	if (vlan_tag < 0x1000)
 		vlan_tag |= (ah_attr->sl & 7) << 13;
 	ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d5e60f4..ba3f85b 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -793,6 +793,11 @@ repoll:
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 13;
 		else
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 12;
+		if (be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_VLAN_PRESENT_MASK)
+			wc->vlan = be16_to_cpu(cqe->sl_vid) & MLX4_CQE_VID_MASK;
+		else
+			wc->vlan = 0xffff;
+		memcpy(wc->smac, cqe->smac, 6);
 	}
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 4f10af2..ddf5a1a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1147,11 +1147,8 @@ static void mlx4_set_sched(struct mlx4_qp_path *path, u8 port)
 static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 			 struct mlx4_qp_path *path, u8 port)
 {
-	int err;
 	int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port) ==
 		IB_LINK_LAYER_ETHERNET;
-	u8 mac[6];
-	int is_mcast;
 	u16 vlan_tag;
 	int vidx;
 
@@ -1188,16 +1185,12 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 		if (!(ah->ah_flags & IB_AH_GRH))
 			return -1;
 
-		err = mlx4_ib_resolve_grh(dev, ah, mac, &is_mcast, port);
-		if (err)
-			return err;
-
-		memcpy(path->dmac, mac, 6);
+		memcpy(path->dmac, ah->dmac, 6);
 		path->ackto = MLX4_IB_LINK_TYPE_ETH;
 		/* use index 0 into MAC table for IBoE */
 		path->grh_mylmc &= 0x80;
 
-		vlan_tag = rdma_get_vlan_id(&dev->iboe.gid_table[port - 1][ah->grh.sgid_index]);
+		vlan_tag = ah->vlan;
 		if (vlan_tag < 0x1000) {
 			if (mlx4_find_cached_vlan(dev->dev, port, vlan_tag, &vidx))
 				return -ENOENT;
@@ -1236,6 +1229,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	enum mlx4_qp_optpar optpar = 0;
 	int sqd_event;
 	int err = -EINVAL;
+	int is_eth;
 
 	context = kzalloc(sizeof *context, GFP_KERNEL);
 	if (!context)
@@ -1464,6 +1458,13 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->pri_path.ackto = (context->pri_path.ackto & 0xf8) |
 					MLX4_IB_LINK_TYPE_ETH;
 
+	if (ibqp->qp_type == IB_QPT_UD)
+		if (is_eth && (new_state == IB_QPS_RTR)) {
+			context->pri_path.ackto = MLX4_IB_LINK_TYPE_ETH;
+			optpar |= MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-06-20 Thread Or Gerlitz

On 16/06/2013 15:02, Eli Cohen wrote:

From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation
of HCAs named Connect-IB.

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
partitioning resembles what we have for mlx4 with the substantial difference
that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this sense,
it can be perceived as a library. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.


Hi Dave,

So we skipped netdev in V0, in an attempt to reduce cross postings...
anyway, the mlx5_core driver is a similar story to mlx4_core. So, looking
forward, and to keep the initial merge simpler, are you OK with both the
core and IB drivers going through Roland's tree?


Or.



The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. Only the last patch adds the Makefiles and Kconfigs, to make
things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

Eli

Eli Cohen (8):
   mlx5: Mellanox Connect-IB driver part 1/8
   mlx5: Mellanox Connect-IB driver part 2/8
   mlx5: Mellanox Connect-IB driver part 3/8
   mlx5: Mellanox Connect-IB driver part 4/8
   mlx5: Mellanox Connect-IB driver part 5/8
   mlx5: Mellanox Connect-IB driver part 6/8
   mlx5: Mellanox Connect-IB driver part 7/8
   mlx5: Mellanox Connect-IB driver part 8/8

  MAINTAINERS|   22 +
  drivers/infiniband/Kconfig |1 +
  drivers/infiniband/Makefile|1 +
  drivers/infiniband/hw/mlx5/Kconfig |   10 +
  drivers/infiniband/hw/mlx5/Makefile|4 +
  drivers/infiniband/hw/mlx5/ah.c|   95 +
  drivers/infiniband/hw/mlx5/cq.c|  851 +++
  drivers/infiniband/hw/mlx5/doorbell.c  |  100 +
  drivers/infiniband/hw/mlx5/mad.c   |  143 ++
  drivers/infiniband/hw/mlx5/main.c  | 1512 
  drivers/infiniband/hw/mlx5/mem.c   |  194 ++
  drivers/infiniband/hw/mlx5/mlx5_ib.h   |  593 +
  drivers/infiniband/hw/mlx5/mr.c| 1025 
  drivers/infiniband/hw/mlx5/qp.c| 2549 
  drivers/infiniband/hw/mlx5/srq.c   |  481 
  drivers/infiniband/hw/mlx5/user.h  |  123 +
  drivers/net/ethernet/mellanox/Kconfig  |1 +
  drivers/net/ethernet/mellanox/Makefile |1 +
  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   18 +
  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |6 +
  drivers/net/ethernet/mellanox/mlx5/core/alloc.c|  244 ++
  drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 1497 
  drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  226 ++
  drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |  600 +
  drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  523 
  drivers/net/ethernet/mellanox/mlx5/core/fw.c   |  187 ++
  drivers/net/ethernet/mellanox/mlx5/core/health.c   |  216 ++
  drivers/net/ethernet/mellanox/mlx5/core/mad.c  |   80 +
  drivers/net/ethernet/mellanox/mlx5/core/main.c |  483 
  drivers/net/ethernet/mellanox/mlx5/core/mcg.c  |  108 +
  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   96 +
  drivers/net/ethernet/mellanox/mlx5/core/mr.c   |  138 ++
  .../net/ethernet/mellanox/mlx5/core/pagealloc.c|  438 
  drivers/net/ethernet/mellanox/mlx5/core/pd.c   |  103 +
  drivers/net/ethernet/mellanox/mlx5/core/port.c |  106 +
  drivers/net/ethernet/mellanox/mlx5/core/qp.c   |  303 +++
  drivers/net/ethernet/mellanox/mlx5/core/srq.c  |  225 ++
  drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  225 ++
  include/linux/mlx5/cmd.h   |   51 +
  include/linux/mlx5/cq.h|  166 ++
  include/linux/mlx5/device.h|  886 +++
  include/linux/mlx5/doorbell.h  |   81 +
  include/linux/mlx5/driver.h|  763 ++
  include/linux/mlx5/qp.h|  467 
  include/linux/mlx5/srq.h   |   41 +
  45 files changed, 15983 insertions(+)
  create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
  create mode 100644 drivers/infiniband/hw/mlx5/Makefile
  create mode 100644 drivers/infiniband/hw/mlx5/ah.c
  create mode 100644 drivers/infiniband/hw/mlx5/cq.c
  create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
  create mode 100644 drivers/infiniband/hw/mlx5/mad.c
  create mode 100644 drivers/infiniband/hw/mlx5/main.c
  create mode 100644 drivers/infiniband/hw/mlx5/mem.c
  create mode 100644 

Re: NFS over RDMA benchmark

2013-06-20 Thread Or Gerlitz

On 19/06/2013 18:47, Wendy Cheng wrote:

what kind of HW I would need to run it ?


The mlx4 driver supports memory windows as of kernel 3.9

Or.


Re: [PATCH] IB/qib: add optional numa affinity

2013-06-22 Thread Or Gerlitz
Mike Marciniszyn mike.marcinis...@intel.com wrote:

 From: Ramkrishna Vepa ramkrishna.v...@intel.com

 This patch adds context relative numa affinity conditioned on the
 module parameter numa_aware. The qib_ctxtdata has an additional
 node_id member and qib_create_ctxtdata() has an additional node_id parameter.

Could you elaborate on why NUMA awareness is conditioned on a module
parameter?

Or.


Re: [PATCH V1 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs

2013-06-24 Thread Or Gerlitz

On 25/06/2013 00:10, Roland Dreier wrote:

On Tue, Jun 11, 2013 at 4:42 AM, Or Gerlitz ogerl...@mellanox.com wrote:

+struct ib_kern_flow {
+   struct ib_device  *device;
+   struct ib_uobject *uobject;
+   void  *flow_context;
+};

I don't think it makes sense to put a structure with kernel pointers
in it into an include file under include/uapi.  For one thing the size
of pointers depends on whether userspace is 32-bit or 64-bit (but of
course there are many other reasons why this will break).


good catch, will look & fix up

Or.


[PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support

2013-06-26 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs,
for which plain Ethernet packets are used, specifically packets which
don't carry any QPN to be matched by the receiving side.

Applications using these QPs must be provided with a method to
program some steering rule with the HW so packets arriving at
the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow specification
for a QP, such that when there's a match between the specification and the
received packet, the packet can be forwarded to that QP, in a similar manner
to how ib_attach_multicast is used for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy
which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4,
TCP, UDP and IB are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr which
contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
	enum ib_flow_attr_type type;
	u16      size;
	u16      priority;
	u8       num_of_specs;
	u8       port;
	u32      flags;
	/* Following are the optional layers according to user request
	 * struct ib_flow_spec_yyy
	 * struct ib_flow_spec_zzz
	 */
};

As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, and with a little API enhancement which defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner,
which allows calling ib_create_flow with a variable-length list of
optional specs.

For the actual processing of ib_flow_attr, the driver uses the mandatory
num_of_specs and size fields along with the TLV nature of the specs.
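
A driver-side walk over such a TLV list could look roughly like the sketch
below. It is an illustration under stated assumptions: spec_hdr and walk_specs
are hypothetical names, and the header layout (a type followed by a size that
covers the whole spec) is assumed, since the actual ib_flow_spec_yyy
definitions are not shown in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header assumed to start every spec. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;     /* size of the whole spec, header included */
	uint16_t reserved;
};

/*
 * Walk num_specs variable-length specs packed back to back in buf.
 * total is the number of bytes available, i.e. what the size field of
 * the attribute would cover after the fixed part.
 * Stores each spec type in types_out; returns 0 on success and -1 if
 * the list is truncated, malformed, or does not fill the buffer exactly.
 */
static int walk_specs(const uint8_t *buf, size_t total,
		      unsigned int num_specs, uint32_t *types_out)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_specs; i++) {
		struct spec_hdr hdr;

		if (total - off < sizeof(hdr))
			return -1;
		/* copy out the header to avoid unaligned access */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > total - off)
			return -1;
		types_out[i] = hdr.type;
		off += hdr.size;
	}
	return off == total ? 0 : -1;
}
```

Checking both the spec count and the total size is what lets a new spec type be
skipped or rejected cleanly without changing the fixed part of the attribute.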

Steering rules are processed in order of rule priority. The user
sets the 12 low-order bits of the priority field and the remaining
4 high-order bits are set by the kernel according to the domain that the
application or the layer that created the rule belongs to. A lower
numerical priority value means higher priority.
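
The 12/4 bit split of the priority field can be sketched as follows. The names
compose_priority, FLOW_USER_PRIO_BITS and FLOW_USER_PRIO_MAX are hypothetical,
and how a domain maps to a 4-bit number is an assumption; only the bit layout
comes from the description above.

```c
#include <stdint.h>

#define FLOW_USER_PRIO_BITS 12    /* low-order bits owned by the user */
#define FLOW_USER_PRIO_MAX  0xFFF /* largest value that fits in them */

/*
 * Combine a kernel-chosen domain (4 high-order bits) with a user
 * priority (12 low-order bits) into one 16-bit priority.
 * Returns 0 on success, -1 if either part is out of range.
 */
static int compose_priority(unsigned int domain, unsigned int user_prio,
			    uint16_t *out)
{
	if (domain > 0xF || user_prio > FLOW_USER_PRIO_MAX)
		return -1;
	*out = (uint16_t)((domain << FLOW_USER_PRIO_BITS) | user_prio);
	return 0;
}
```

With this layout, every rule in domain 0 sorts ahead of every rule in domain 1
no matter which user priority was requested, since a lower value wins.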

The returned value from ib_create_flow is an instance of struct ib_flow
which contains a database pointer (handle) provided by the HW driver
to be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could also be written over IB UD QPs.
As such, the ib_create_flow / ib_destroy_flow API is designed to support UD QPs
too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote support
of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for promiscuous
and sniffer purposes:

IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule specification

IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP

IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet link type.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   30 +
 include/rdma/ib_verbs.h |  135 ++-
 2 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..932f4a7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,33 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr,
+			       int domain)
+{
+	struct ib_flow *flow_id;
+	if (!qp->device->create_flow)
+		return ERR_PTR(-ENOSYS);
+
+	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	if (!IS_ERR(flow_id))
+		atomic_inc(&qp->usecnt);
+	return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err;
+	struct ib_qp *qp = flow_id->qp;
+
+	if (!flow_id->qp->device->destroy_flow)
+		return -ENOSYS;
+
+	err = qp->device->destroy_flow(flow_id);
+	if (!err)
+		atomic_dec(&qp->usecnt);
+	return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..8e18d17 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23

[PATCH V2 for-next 4/4] IB/mlx4: Add receive Flow Steering support

2013-06-26 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_create_flow and ib_destroy_flow.

Translate the verbs structures provided by the user to HW structures
and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On the ATTACH command completion, the firmware provides a 64-bit registration
ID which is placed into struct mlx4_ib_flow, which wraps the instance of
struct ib_flow that is returned to the caller. Later, this reg ID is used
for detaching that flow from the firmware.
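
The wrapping described here follows the usual embed-and-recover pattern. The
sketch below shows it with stand-in types: struct flow, struct drv_flow,
to_drv_flow and flow_reg_id are hypothetical names, and the real struct
mlx4_ib_flow may carry more than the registration ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generic flow object handed back to the caller. */
struct flow {
	int placeholder;
};

/* Driver-private wrapper: generic object plus the firmware's 64-bit ID. */
struct drv_flow {
	struct flow base;
	uint64_t reg_id;
};

/* Recover the wrapper from a pointer to its embedded generic object. */
#define to_drv_flow(ptr) \
	((struct drv_flow *)((char *)(ptr) - offsetof(struct drv_flow, base)))

/* What a destroy path would look up before asking firmware to detach. */
static uint64_t flow_reg_id(struct flow *f)
{
	return to_drv_flow(f)->reg_id;
}
```

The caller only ever sees the generic pointer, so the registration ID stays
private to the driver until the flow is destroyed.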

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c|  244 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   12 ++
 include/linux/mlx4/device.h  |5 -
 3 files changed, 256 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a188d31..752c958 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION	"1.0"
 #define DRV_RELDATE	"April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+	int ib_num_ports = 0;
+	int i;
+
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		ib_num_ports++;
+
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		if (ib_num_ports || mlx4_is_mfunc(dev)) {
+			pr_warn("Device managed flow steering is unavailable "
+				"for IB ports or in multifunction env.\n");
+			return 0;
+		}
+		return 1;
+	}
+	return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
 		else
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+		if (check_flow_steering_support(dev->dev))
+			props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
 	}
 
 	props->vendor_id	   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,218 @@ struct mlx4_ib_steering {
 	union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+			   struct _ib_flow_spec *ib_spec,
+			   struct _rule_hw *mlx4_spec)
+{
+	enum mlx4_net_trans_rule_id type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		type = MLX4_NET_TRANS_RULE_ID_ETH;
+		memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+		       ETH_ALEN);
+		memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+		       ETH_ALEN);
+		mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+		mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+		break;
+
+	case IB_FLOW_SPEC_IB:
+		type = MLX4_NET_TRANS_RULE_ID_IB;
+		mlx4_spec->ib.l3_qpn = ib_spec->ib.val.l3_type_qpn;
+		mlx4_spec->ib.qpn_mask = ib_spec->ib.mask.l3_type_qpn;
+		memcpy(mlx4_spec->ib.dst_gid, ib_spec->ib.val.dst_gid, 16);
+		memcpy(mlx4_spec->ib.dst_gid_msk,
+		       ib_spec->ib.mask.dst_gid, 16);
+		break;
+
+	case IB_FLOW_SPEC_IPV4:
+		type = MLX4_NET_TRANS_RULE_ID_IPV4;
+		mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+		mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+		mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+		mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+		break;
+
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+					MLX4_NET_TRANS_RULE_ID_TCP :
+					MLX4_NET_TRANS_RULE_ID_UDP;
+		mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+		mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+		mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+		mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+	    mlx4_hw_rule_sz(dev, type) < 0)
+		return -EINVAL;
+	mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type

[PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-06-26 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are based
on the verbs extensions approach should be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They use a new header format and are
processed slightly differently.

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c 
b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h 
b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION  6
+#define IB_USER_VERBS_CMD_THRESHOLD 50
 
 enum {
IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u16 provider_in_words;
+   __u16 provider_out_words;
+   __u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
__u64 response;
__u64 driver_data[0];
-- 
1.7.1


[PATCH V2 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs

2013-06-26 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
support flow steering for user space applications.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  206 +
 drivers/infiniband/core/uverbs_main.c |   13 ++-
 include/rdma/ib_verbs.h   |1 +
 include/uapi/rdma/ib_user_verbs.h |  102 -
 5 files changed, 323 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..ad9d102 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c 
b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..956782b 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class  = { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)			\
 	do {								\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->srq_list);
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
+	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,210 @@ out_put:
return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+				struct _ib_flow_spec *ib_spec)
+{
+	ib_spec->type = kern_spec->type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+		       sizeof(struct ib_flow_eth_filter));
+		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+		       sizeof(struct ib_flow_eth_filter));
+		break;
+	case IB_FLOW_SPEC_IB:
+		ib_spec->ib.size = sizeof(struct ib_flow_spec_ib);
+		memcpy(&ib_spec->ib.val, &kern_spec->ib.val,
+		       sizeof(struct ib_flow_ib_filter));
+		memcpy(&ib_spec->ib.mask, &kern_spec->ib.mask,
+		       sizeof(struct ib_flow_ib_filter));
+		break;
+	case IB_FLOW_SPEC_IPV4:
+		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+		       sizeof(struct ib_flow_ipv4_filter));
+		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+		       sizeof(struct ib_flow_ipv4_filter));
+		break;
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+			      const char __user *buf, int in_len,
+			      int out_len)
+{
+	struct ib_uverbs_create_flow	  cmd;
+	struct ib_uverbs_create_flow_resp resp;
+	struct ib_uobject		  *uobj;
+	struct ib_flow			  *flow_id;
+	struct ib_kern_flow_attr	  *kern_flow_attr;
+	struct ib_flow_attr		  *flow_attr;
+	struct ib_qp			  *qp;
+	int err = 0;
+	void *kern_spec;
+	void *ib_spec;
+	int i;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf

[PATCH for/net-next 8/8] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

2013-06-26 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 MAINTAINERS |   10 ++
 drivers/infiniband/Kconfig  |1 +
 drivers/infiniband/Makefile |1 +
 drivers/infiniband/hw/mlx5/Kconfig  |   10 ++
 drivers/infiniband/hw/mlx5/Makefile |4 
 5 files changed, 26 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
 create mode 100644 drivers/infiniband/hw/mlx5/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 6e82fb5..b426536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5377,6 +5377,16 @@ S:   Supported
 F: drivers/net/ethernet/mellanox/mlx5/core/
 F: include/linux/mlx5/
 
+Mellanox MLX5 IB driver
+M:  Eli Cohen e...@mellanox.com
+L:  linux-rdma@vger.kernel.org
+W:  http://www.mellanox.com
+Q:  http://patchwork.kernel.org/project/linux-rdma/list/
+T:  git://openfabrics.org/~eli/connect-ib.git
+S:  Supported
+F:  include/linux/mlx5/
+F:  drivers/infiniband/hw/mlx5/
+
 MODULE SUPPORT
 M: Rusty Russell ru...@rustcorp.com.au
 S: Maintained
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index c85b56c..5ceda71 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -50,6 +50,7 @@ source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
+source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index b126fef..1fe6988 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100)   += hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)  += hw/mlx4/
+obj-$(CONFIG_MLX5_INFINIBAND)  += hw/mlx5/
 obj-$(CONFIG_INFINIBAND_NES)   += hw/nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
diff --git a/drivers/infiniband/hw/mlx5/Kconfig 
b/drivers/infiniband/hw/mlx5/Kconfig
new file mode 100644
index 000..8e6aebf
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -0,0 +1,10 @@
+config MLX5_INFINIBAND
+	tristate "Mellanox Connect-IB HCA support"
+	depends on NETDEVICES && ETHERNET && PCI && X86
+   select NET_VENDOR_MELLANOX
+   select MLX5_CORE
+   ---help---
+ This driver provides low-level InfiniBand support for
+ Mellanox Connect-IB PCI Express host channel adapters (HCAs).
+ This is required to use InfiniBand protocols such as
+ IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx5/Makefile 
b/drivers/infiniband/hw/mlx5/Makefile
new file mode 100644
index 000..0f492da
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_MLX5_INFINIBAND)  += mlx5_ib.o
+ccflags-y += -Wall -Werror -DDEBUG
+
+mlx5_ib-y :=   main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o
-- 
1.7.1


[PATCH for/net-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-06-26 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation
of HCAs named Connect-IB.

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
partitioning resembles what we have for mlx4 with the substantial difference
that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this sense,
it can be perceived as a library. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. They are divided such that the first three ones have the code
of the mlx5_core driver, and the last five the code of the mlx5_ib driver.

Only the last patch per driver adds the Makefiles and Kconfigs, to make
things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

changes from V0:
 - Per Dave's request, cross posting to both netdev and linux-rdma, to see 
   if there are comments from netdev on the core driver.

Eli Cohen (8):
  net/mlx5: Mellanox Connect-IB, core driver part 1/3
  net/mlx5: Mellanox Connect-IB, core driver part 2/3
  net/mlx5: Mellanox Connect-IB, core driver part 3/3
  IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 2/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 4/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

 MAINTAINERS|   22 +
 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/Makefile|1 +
 drivers/infiniband/hw/mlx5/Kconfig |   10 +
 drivers/infiniband/hw/mlx5/Makefile|4 +
 drivers/infiniband/hw/mlx5/ah.c|   95 +
 drivers/infiniband/hw/mlx5/cq.c|  851 +++
 drivers/infiniband/hw/mlx5/doorbell.c  |  100 +
 drivers/infiniband/hw/mlx5/mad.c   |  143 ++
 drivers/infiniband/hw/mlx5/main.c  | 1512 
 drivers/infiniband/hw/mlx5/mem.c   |  194 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h   |  593 +
 drivers/infiniband/hw/mlx5/mr.c| 1025 
 drivers/infiniband/hw/mlx5/qp.c| 2549 
 drivers/infiniband/hw/mlx5/srq.c   |  481 
 drivers/infiniband/hw/mlx5/user.h  |  123 +
 drivers/net/ethernet/mellanox/Kconfig  |1 +
 drivers/net/ethernet/mellanox/Makefile |1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   18 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |6 +
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c|  244 ++
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 1497 
 drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  226 ++
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |  600 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  523 
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   |  187 ++
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  216 ++
 drivers/net/ethernet/mellanox/mlx5/core/mad.c  |   80 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  483 
 drivers/net/ethernet/mellanox/mlx5/core/mcg.c  |  108 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   96 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |  138 ++
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c|  438 
 drivers/net/ethernet/mellanox/mlx5/core/pd.c   |  103 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  106 +
 drivers/net/ethernet/mellanox/mlx5/core/qp.c   |  303 +++
 drivers/net/ethernet/mellanox/mlx5/core/srq.c  |  225 ++
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  225 ++
 include/linux/mlx5/cmd.h   |   51 +
 include/linux/mlx5/cq.h|  166 ++
 include/linux/mlx5/device.h|  886 +++
 include/linux/mlx5/doorbell.h  |   81 +
 include/linux/mlx5/driver.h|  763 ++
 include/linux/mlx5/qp.h|  467 
 include/linux/mlx5/srq.h   |   41 +
 45 files changed, 15983 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
 create mode 100644 drivers/infiniband/hw/mlx5/Makefile
 create mode 100644 drivers/infiniband/hw/mlx5/ah.c
 create mode 100644 drivers/infiniband/hw/mlx5/cq.c
 create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
 create mode 100644 drivers/infiniband/hw/mlx5/mad.c
 create mode 100644 drivers/infiniband/hw/mlx5/main.c
 create mode 100644 drivers/infiniband/hw/mlx5/mem.c
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h
 

[PATCH for/net-next 4/8] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-06-26 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/ah.c   |   95 
 drivers/infiniband/hw/mlx5/cq.c   |  851 +
 drivers/infiniband/hw/mlx5/doorbell.c |  100 
 drivers/infiniband/hw/mlx5/mad.c  |  143 ++
 4 files changed, 1189 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/ah.c
 create mode 100644 drivers/infiniband/hw/mlx5/cq.c
 create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
 create mode 100644 drivers/infiniband/hw/mlx5/mad.c

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
new file mode 100644
index 000..ff8f1cb
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx5_ib.h"
+
+struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
+			   struct mlx5_ib_ah *ah)
+{
+	u32 sgi;
+
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		sgi = ah_attr->grh.sgid_index << 20;
+
+		memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
+		ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
+						(1 << 30) | sgi);
+		ah->av.hop_limit = ah_attr->grh.hop_limit;
+		ah->av.tclass = ah_attr->grh.traffic_class;
+	}
+
+	ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+	ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
+	ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
+
+	return &ah->ibah;
+}
+
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah;
+
+	ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+	if (!ah)
+		return ERR_PTR(-ENOMEM);
+
+	return create_ib_ah(ah_attr, ah); /* never fails */
+}
+
+int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah = to_mah(ibah);
+	u32 tmp;
+
+	memset(ah_attr, 0, sizeof(*ah_attr));
+
+	tmp = be32_to_cpu(ah->av.grh_gid_fl);
+	if (tmp & (1 << 30)) {
+		ah_attr->ah_flags = IB_AH_GRH;
+		ah_attr->grh.sgid_index = (tmp >> 20) & 0xff;
+		ah_attr->grh.flow_label = tmp & 0xf;
+		memcpy(&ah_attr->grh.dgid, ah->av.rgid, 16);
+		ah_attr->grh.hop_limit = ah->av.hop_limit;
+		ah_attr->grh.traffic_class = ah->av.tclass;
+	}
+	ah_attr->dlid = be16_to_cpu(ah->av.rlid);
+	ah_attr->static_rate = ah->av.stat_rate_sl >> 4;
+	ah_attr->sl = ah->av.stat_rate_sl & 0xf;
+
+	return 0;
+}
+
+int mlx5_ib_destroy_ah(struct ib_ah *ah)
+{
+	kfree(to_mah(ah));
+	return 0;
+}
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
new file mode 100644
index 000..001e182
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -0,0 +1,851 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *  

[PATCH for/net-next 6/8] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-06-26 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  593 
 drivers/infiniband/hw/mlx5/mr.c  | 1025 ++
 2 files changed, 1618 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h
 create mode 100644 drivers/infiniband/hw/mlx5/mr.c

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
new file mode 100644
index 000..f197972
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -0,0 +1,593 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_IB_H
+#define MLX5_IB_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/srq.h>
+#include <linux/types.h>
+
+extern int mlx5_ib_debug_mask;
+
+#define mlx5_ib_dbg(dev, format, arg...)				\
+do {									\
+	if (debug_mask & mlx5_ib_debug_mask)				\
+		pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, \
+			 __func__, __LINE__, current->pid, ##arg);	\
+} while (0)
+
+#define mlx5_ib_err(dev, format, arg...)				\
+pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+#define mlx5_ib_warn(dev, format, arg...)				\
+pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+#define MLX5_IB_MOD_DBG_MASK(mod_id)					\
+static const u32 debug_mask = 1 << (mod_id)
+
+enum {
+   MLX5_IB_MMAP_CMD_SHIFT  = 8,
+   MLX5_IB_MMAP_CMD_MASK   = 0xff,
+};
+
+enum mlx5_ib_mmap_cmd {
+   MLX5_IB_MMAP_REGULAR_PAGE   = 0,
+   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1, /* always last */
+};
+
+enum {
+   MLX5_RES_SCAT_DATA32_CQE= 0x1,
+   MLX5_RES_SCAT_DATA64_CQE= 0x2,
+   MLX5_REQ_SCAT_DATA32_CQE= 0x11,
+   MLX5_REQ_SCAT_DATA64_CQE= 0x22,
+};
+
+enum {
+   MLX5_IB_MOD_MR,
+   MLX5_IB_MOD_CQ,
+   MLX5_IB_MOD_QP,
+   MLX5_IB_MOD_MEM,
+   MLX5_IB_MOD_MAIN,
+   MLX5_IB_MOD_MAD,
+   MLX5_IB_MOD_SRQ,
+};
+
+/*
+ * we do not expose this yet so we use a value out of range */
+enum {
+   IB_QPT_REG_UMR = IB_QPT_MAX + 0x1234,
+};
+
+/* === this should be passed to the verbs layer */
+enum {
+	IB_WR_SET_PSV = IB_WR_BIND_MW + 10,
+	IB_WR_GET_PSV,
+	IB_WR_CHECK_PSV,
+	IB_WR_RGET_PSV,
+	IB_WR_RCHECK_PSV,
+	IB_WR_UMR,
+};
+
+enum {
+	IB_SEND_UMR_UNREG	= IB_SEND_IP_CSUM << 1,
+};
+
+enum ib_latency_class {
+	IB_LATENCY_CLASS_LOW,
+	IB_LATENCY_CLASS_MEDIUM,
+	IB_LATENCY_CLASS_HIGH,
+	IB_LATENCY_CLASS_FAST_PATH
+};
+/* === this should be passed to the verbs layer */
+
+
+enum mlx5_ib_mad_ifc_flags {
+   MLX5_MAD_IFC_IGNORE_MKEY= 1,
+   MLX5_MAD_IFC_IGNORE_BKEY= 2,
+   MLX5_MAD_IFC_NET_VIEW   = 4,
+};
+
+struct mlx5_ib_ucontext {
+   struct ib_ucontext  ibucontext;
+   struct list_headdb_page_list;
+
+   /*
+* protect doorbell record alloc/free
+*/
+   struct mutexdb_page_mutex;
+   struct mlx5_uuar_info   uuari;
+};
+
+static inline struct mlx5_ib_ucontext *to_mucontext(struct 

Re: [PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-06-26 Thread Or Gerlitz

On 26/06/2013 16:05, Roland Dreier wrote:

On Wed, Jun 26, 2013 at 5:57 AM, Or Gerlitz ogerl...@mellanox.com wrote:

Add Infra-structure to support extended uverbs capabilities in a 
forward/backward manner. Uverbs command opcodes which are based on the verbs 
extensions approach should be greater or equal to IB_USER_VERBS_CMD_THRESHOLD. 
They have new header format and processed a bit differently.

I think you missed the feedback I gave to the previous version of this patch:

  This patch at least doesn't have a sufficient changelog.  I don't
  understand what extended capabilities are or why we need to change
  the header format.

  What is the verbs extensions approach?  Why does the kernel need to
  know about it?  What is different about the processing?  The only
  difference I see is that userspace now has a more complicated way to
  pass the size in, which the kernel seems to nearly ignore -- it just
  adds the sizes together and proceeds as before.


Roland, you did comment on this patch, but that was in another series
where the patch was also posted, the RoCE IP based addressing one. I
posted it twice since it's an infrastructure (...) patch used by both
series. I wanted to post V2 of the flow steering patches to make sure I
addressed your comment on the void pointer OK, and take things from
there. Never mind.


To the point, the uverbs extensions construct is basically made of two
building blocks:


1. an extended header which explicitly specifies the in/out verbs data
size and the in/out provider data size


2. a bit mask (comp mask) which specifies which fields in the uverbs
command structure are used.


The combination of 1 + 2 allows commands built on these blocks to be
extended without bumping the uverbs ABI.


Today, the kernel uverbs layer assumes a given size for each command, so 
for example, the provider udata IN size is in_words - size_of_cmd.
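
A minimal sketch of how the extended header removes that assumption, written as plain user-space C rather than kernel code. The struct mirrors ib_uverbs_cmd_hdr_ex from the patch; the helper name split_ex_input and the stdint types are assumptions made for illustration only. The length check follows the one this patch adds to ib_uverbs_write().

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct ib_uverbs_cmd_hdr_ex from the patch, using stdint types. */
struct cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/*
 * Hypothetical helper: check that the sizes declared in the header add up
 * to the number of bytes written, then report the verbs and provider input
 * sizes separately. Returns 0 on success, -1 on a length mismatch.
 */
static int split_ex_input(const struct cmd_hdr_ex *hdr, size_t count,
			  size_t *verbs_bytes, size_t *provider_bytes)
{
	size_t verbs = (size_t)hdr->in_words * 4;
	size_t provider = (size_t)hdr->provider_in_words * 4;

	if (verbs + provider != count)
		return -1;

	*verbs_bytes = verbs;
	*provider_bytes = provider;
	return 0;
}
```

With both sizes explicit, the provider data length no longer has to be derived from a fixed per-command size.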


For commands added along this framework, the kernel can support all
the previous versions towards user space in parallel. Say we added a new
command cmdX to both user space and kernel, where v0 is the initial
version, later added a few fields to it to get cmdX_v1, and later on
added more fields to get cmdX_v2.




+struct ib_uverbs_cmd_hdr_ex {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u16 provider_in_words;
+   __u16 provider_out_words;
+   __u32 cmd_hdr_reserved;
+};
+



Based on the bits set in the comp mask and the in_words field value, a
kernel which has cmdX_v2 can work with older user space
libraries/applications, e.g. cmdX_v1 and cmdX_v0.


The comp mask is not part of the header, but rather the 1st field of
every uverbs command and response. In this series it was added in
patch 3/4 for the uverbs flow-steering structures, which are cmdX_v0 in
this context.


If we only used (in_words - size_of_cmd) we couldn't achieve that support.
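
As a rough illustration of the comp mask idea, here is a sketch in plain user-space C. Everything in it is hypothetical: the command cmdX, its fields a/b/c, the CMDX_* bits and the cmdx_parse helper are invented for the example and are not part of this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command "cmdX"; every name here is illustrative only. */
enum {
	CMDX_HAS_B = 1u << 0,	/* field added in cmdX_v1 */
	CMDX_HAS_C = 1u << 1,	/* field added in cmdX_v2 */
	CMDX_KNOWN = CMDX_HAS_B | CMDX_HAS_C,
};

struct cmdx {			/* layout as of cmdX_v2 */
	uint32_t comp_mask;	/* first field of the command */
	uint32_t a;		/* present since cmdX_v0 */
	uint32_t b;		/* valid only if CMDX_HAS_B is set */
	uint32_t c;		/* valid only if CMDX_HAS_C is set */
};

#define CMDX_V0_SIZE offsetof(struct cmdx, b)
#define CMDX_V1_SIZE offsetof(struct cmdx, c)

/*
 * Accept a v0, v1 or v2 sized command. Fields the caller did not send are
 * left zeroed. Returns 0 on success, -1 on malformed input.
 */
static int cmdx_parse(const void *buf, size_t in_bytes, struct cmdx *out)
{
	if (in_bytes < CMDX_V0_SIZE || in_bytes > sizeof(*out))
		return -1;

	memset(out, 0, sizeof(*out));
	memcpy(out, buf, in_bytes);

	/* Reject bits this "kernel" does not know about. */
	if (out->comp_mask & ~CMDX_KNOWN)
		return -1;
	/* A bit may only be set if the field it names was actually sent. */
	if ((out->comp_mask & CMDX_HAS_B) && in_bytes < CMDX_V1_SIZE)
		return -1;
	if ((out->comp_mask & CMDX_HAS_C) && in_bytes < sizeof(*out))
		return -1;
	return 0;
}
```

The same parser serves callers built against any of the three versions, which is the property a fixed per-command size cannot give.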

Or.



Re: [PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-06-26 Thread Or Gerlitz

On 26/06/2013 18:17, Or Gerlitz wrote:


Based on the bits set in the comp mask and the in_words field value, 
the kernel which has cmdX_v2 can work towards older user space 
libraries/applications e.g cmdX_v1 and cmdX_v0


The comp mask is not part of the header, but rather the 1st field of 
every uverbs command and response, here, in this series, it was added 
in patch 3/4 for the uverbs flow-steering structures which are cmdX_v0 
in this context.


The comp mask logic is also explained in Tzahi's OFA 2013 talk on
verbs extensions. There he refers to extending the libibverbs API in
user space towards applications, but the concept is the same. Slides here:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/549-extending-verbs-api.html




Re: [PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support

2013-06-26 Thread Or Gerlitz
On Wed, Jun 26, 2013 at 10:56 PM, Hefty, Sean sean.he...@intel.com wrote:
 The input to ib_create_flow is instance of struct ib_flow_attr which
 contain few mandatory control elements and optional flow specs.

 struct ib_flow_attr {
   enum ib_flow_attr_type type;
   u16  size;
   u16  priority;
   u8   num_of_specs;
   u8   port;
   u32  flags;

 This structure could be aligned better.

OK, I assume you mean arranging the fields by decreasing size, correct? So
here we need to put the flags field before the size field.


   /* Following are the optional layers according to user request
* struct ib_flow_spec_yyy
* struct ib_flow_spec_zzz
*/
 };

 As these specs are eventually coming from user space, they are defined and
 used in a way which allows adding new spec types without kernel/user ABI
 change, and with a little API enhancement which defines the newly added spec.

 The flow spec structures are defined in a TLV (Type-Length-Value) manner,
 which allows to call ib_create_flow with a list of variable length of
 optional specs.

 For the actual processing of ib_flow_attr the driver uses the number of
 specs and the size mandatory fields along with the TLV nature of the specs.

 Steering rules processing order is according to rules priority. The user
 sets the 12 low-order bits from the priority field and the remaining
 4 high-order bits are set by the kernel according to a domain the
 application or the layer that created the rule belongs to. Lower
 priority numerical value means higher priority.

 Why are bit fields being exposed to the user in this way?

Yes, this is probably not general enough. What would you suggest, a more
integral division? E.g. 16 bits for priority and 16 bits for location?


 +struct ib_flow *ib_create_flow(struct ib_qp *qp,
 +			       struct ib_flow_attr *flow_attr,
 +			       int domain)
 +{
 +	struct ib_flow *flow_id;
 +	if (!qp->device->create_flow)
 +		return ERR_PTR(-ENOSYS);
 +
 +	flow_id = qp->device->create_flow(qp, flow_attr, domain);
 +	if (!IS_ERR(flow_id))
 +		atomic_inc(&qp->usecnt);
 +	return flow_id;
 +}
 +EXPORT_SYMBOL(ib_create_flow);
 +
 +int ib_destroy_flow(struct ib_flow *flow_id)
 +{
 +	int err;
 +	struct ib_qp *qp = flow_id->qp;
 +
 +	if (!flow_id->qp->device->destroy_flow)
 +		return -ENOSYS;

 We can assume destroy_flow exists if create_flow does.

OK, will fix.


 +struct ib_flow_ib_filter {
 + __be32  l3_type_qpn;
 + u8  dst_gid[16];
 +};


 Maybe this is just a naming issue, but why wouldn't an IB filter have 
 SLID/DLID instead  of just DGID?  What does l3_type_qpn mean?  Is this just 
 the QPN?

Yes, it's just the QPN; I will rename the field to match.

 The TCP/IP filters are broken into separate filters based in L4/L3.  It would
 seem to make sense if the IB filters were similarly divided into L2/L3/L4
 filters.  IB and IPv6 could probably share the same filter definition.

IPv6 filters weren't defined in this submission, but as I wrote,
the scheme provided allows adding more filters and flow specs.




 +struct ib_flow_spec_ib {
 + enum ib_flow_spec_type   type;
 + u16  size;
 + struct ib_flow_ib_filter val;
 + struct ib_flow_ib_filter mask;
 +};
 +
 +struct ib_flow_ipv4_filter {
 + __be32  src_ip;
 + __be32  dst_ip;
 +};
 +
 +struct ib_flow_spec_ipv4 {
 + enum ib_flow_spec_type type;
 + u16 size;
 + struct ib_flow_ipv4_filter val;
 + struct ib_flow_ipv4_filter mask;
 +};
 +
 +struct ib_flow_tcp_udp_filter {
 + __be16  dst_port;
 + __be16  src_port;
 +};
 +
 +struct ib_flow_spec_tcp_udp {
 + enum ib_flow_spec_type type;
 + u16   size;
 + struct ib_flow_tcp_udp_filter val;
 + struct ib_flow_tcp_udp_filter mask;
 +};

 - Sean
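
The TLV processing described in the quoted text can be sketched in plain userspace C. This is an illustration under assumptions, not the driver code: `struct spec_hdr` and `walk_specs` are hypothetical names, and the only thing taken from the patch is that every spec starts with a type and a size and that the attribute carries the number of specs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header shared by every spec (type + length). */
struct spec_hdr {
	uint32_t type;
	uint16_t size;   /* total size of this spec, header included */
};

/*
 * Walk num_of_specs TLV entries packed back to back in buf.
 * Returns the number of bytes consumed, or -1 if an entry is
 * malformed or runs past the end of the buffer.
 */
static long walk_specs(const unsigned char *buf, size_t len,
		       unsigned int num_of_specs)
{
	size_t off = 0;

	for (unsigned int i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned reads */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		/* A real consumer would switch on hdr.type here. */
		off += hdr.size;
	}
	return (long)off;
}
```

Because each entry announces its own length, a consumer can skip spec types it does not recognize, which is what lets new spec types be added without changing the layout of the existing ones.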


Re: [PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support

2013-06-27 Thread Or Gerlitz
On Thu, Jun 27, 2013 at 12:33 AM, Steve Wise
sw...@opengridcomputing.com wrote:
 On 6/26/2013 4:13 PM, Or Gerlitz wrote:
 On Wed, Jun 26, 2013 at 10:56 PM, Hefty, Sean sean.he...@intel.com

 Steering rules processing order is according to rules priority. The user
 sets the 12 low-order bits from the priority field and the remaining
 4 high-order bits are set by the kernel according to a domain the
 application or the layer that created the rule belongs to. Lower
 priority numerical value means higher priority.

 Why are bit fields being exposed to the user in this way?

 Yes, this is probably not general enough. So what would you suggest,
 use a more integral division? e.g 16 bits for priority and 16 bits for 
 location?

 If the kernel driver is setting the location, whatever that is, why would
 the application need access to it?  IE isn't a priority field enough to
 allow the application provide an ordering/prioritization to the rules?

I wasn't accurate. The idea is that within each domain we allow the app to
set the rule priority, but the actual priority programmed into the HW is
made of the provided priority X domain, where different domains have
different priorities, in the order set by the verbs header file; see enum
ib_flow_domain.
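
One way to read "priority X domain", following the 12/4 bit split from the patch description quoted above, is sketched below. The macro and function names are hypothetical, and the real driver may combine the two values differently.

```c
#include <stdint.h>

#define FLOW_PRIO_USER_BITS 12
#define FLOW_PRIO_USER_MASK ((1u << FLOW_PRIO_USER_BITS) - 1)

/*
 * Hypothetical helper: the kernel-chosen domain fills the 4 high-order
 * bits, the user priority fills the 12 low-order bits. A lower numeric
 * result means a higher priority.
 */
static uint16_t effective_priority(unsigned int domain, uint16_t user_prio)
{
	return (uint16_t)(((domain & 0xfu) << FLOW_PRIO_USER_BITS) |
			  (user_prio & FLOW_PRIO_USER_MASK));
}
```

With this encoding every rule in a lower-numbered domain is ordered ahead of every rule in a higher-numbered one, regardless of the priority the application asked for.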


Re: [PATCH V2 for-next 1/4] IB/core: Add receive Flow Steering support

2013-06-27 Thread Or Gerlitz
On Thu, Jun 27, 2013 at 11:55 PM, Hefty, Sean sean.he...@intel.com wrote:

 My point was that the IPv6 filter should be defined and used here.  The
 following basic filters were defined:
 ethernet -  src/dst mac ...
 ip -        src/dst ip
 tcp/udp -   src/dst port
 These are at least somewhat intuitive to me.  The IB filter is
 ib -        (src/dst?) qpn, dgid
 This is equivalent to creating a filter that's:
 tcpip -     port, dst ip
 IMO, it would be better to define IB filters using the same structure that
 you used for tcp/ip/ethernet.  For example
 ibqp -      src/dst qpn (pkey?)
 ipv6 -      src/dst ipv6/gids (flowlabel?)
 iblink -    src/dst lids, (sl?)

 If the hardware can only support matching on the qpn and dgid, then it can
 simply fail any requests which specify a non-zero mask on the unsupported
 components.

Sean, I agree that the provided filter on dest qpn / dgid doesn't make
sense and will fix that.

Still, for the initial set of patches that goes in, I tend to just
remove the IB filter structure and define the different IB filters
along the lines of your proposal in follow-up patch(es), OK?
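
The "fail any request with a non-zero mask on an unsupported component" rule suggested above could look roughly like this. It is a sketch only: `struct link_filter`, the choice of which field is unsupported, and the plain -1 return are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative filter with one supported and one unsupported component. */
struct link_filter {
	uint16_t dst_lid;   /* assume the hardware can match on this */
	uint16_t src_lid;   /* assume the hardware cannot */
};

static bool all_zero(const void *p, size_t n)
{
	const unsigned char *b = p;

	for (size_t i = 0; i < n; i++)
		if (b[i])
			return false;
	return true;
}

/* Returns 0 if the mask only touches supported fields, -1 otherwise. */
static int check_link_mask(const struct link_filter *mask)
{
	if (!all_zero(&mask->src_lid, sizeof(mask->src_lid)))
		return -1;   /* a driver would pick a suitable errno here */
	return 0;
}
```

A zero mask means "don't care", so a generic filter definition stays usable on hardware that can only match a subset of the fields.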


Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-06-28 Thread Or Gerlitz
 On Sun, Jun 16, 2013 at 3:02 PM, Eli Cohen e...@dev.mellanox.co.il wrote:

 From: Eli Cohen e...@mellanox.com

 The patches that follow constitute the driver for Mellanox's 5th generation
 of HCAs named Connect-IB.

 The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
 partitioning resembles what we have for mlx4 with the substantial difference
 that mlx5_ib is the pci device driver and not mlx5_core.


Hi Roland,

We're on 3.10-rc7, soon on -rc8, so things are warming up for 3.11. Today
marks two working weeks since the mlx5 driver was posted here with no
comments; it's marked as new in your patchwork. Is this safe for
3.11? Are there any comments or fixes we have to apply? As you probably
saw, I posted V1, which is essentially the same as V0 but with netdev
copied, to see if there are rejections/comments from there; so far
nothing. Dave said he wants to see this posted to netdev in order to
decide if he's OK with the core driver being pushed through your tree
too.

Or.

 mlx5_core provides general functionality that is intended to be used by
 other Mellanox devices that will be introduced in the future. In this sense,
 it can be perceived as a library. mlx5_ib has a similar role as any hardware
 device under drivers/infiniband/hw.

 The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
 limitation. Only the last patch adds the Makefiles and Kconfigs, to make
 things robust for future bisections.

 PPC is not yet supported but support will be included in the near future.

 Eli

 Eli Cohen (8):
   mlx5: Mellanox Connect-IB driver part 1/8
   mlx5: Mellanox Connect-IB driver part 2/8
   mlx5: Mellanox Connect-IB driver part 3/8
   mlx5: Mellanox Connect-IB driver part 4/8
   mlx5: Mellanox Connect-IB driver part 5/8
   mlx5: Mellanox Connect-IB driver part 6/8
   mlx5: Mellanox Connect-IB driver part 7/8
   mlx5: Mellanox Connect-IB driver part 8/8

  MAINTAINERS|   22 +
  drivers/infiniband/Kconfig |1 +
  drivers/infiniband/Makefile|1 +
  drivers/infiniband/hw/mlx5/Kconfig |   10 +
  drivers/infiniband/hw/mlx5/Makefile|4 +
  drivers/infiniband/hw/mlx5/ah.c|   95 +
  drivers/infiniband/hw/mlx5/cq.c|  851 +++
  drivers/infiniband/hw/mlx5/doorbell.c  |  100 +
  drivers/infiniband/hw/mlx5/mad.c   |  143 ++
  drivers/infiniband/hw/mlx5/main.c  | 1512 
  drivers/infiniband/hw/mlx5/mem.c   |  194 ++
  drivers/infiniband/hw/mlx5/mlx5_ib.h   |  593 +
  drivers/infiniband/hw/mlx5/mr.c| 1025 
  drivers/infiniband/hw/mlx5/qp.c| 2549 
 
  drivers/infiniband/hw/mlx5/srq.c   |  481 
  drivers/infiniband/hw/mlx5/user.h  |  123 +
  drivers/net/ethernet/mellanox/Kconfig  |1 +
  drivers/net/ethernet/mellanox/Makefile |1 +
  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   18 +
  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |6 +
  drivers/net/ethernet/mellanox/mlx5/core/alloc.c|  244 ++
  drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 1497 
  drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  226 ++
  drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |  600 +
  drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  523 
  drivers/net/ethernet/mellanox/mlx5/core/fw.c   |  187 ++
  drivers/net/ethernet/mellanox/mlx5/core/health.c   |  216 ++
  drivers/net/ethernet/mellanox/mlx5/core/mad.c  |   80 +
  drivers/net/ethernet/mellanox/mlx5/core/main.c |  483 
  drivers/net/ethernet/mellanox/mlx5/core/mcg.c  |  108 +
  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   96 +
  drivers/net/ethernet/mellanox/mlx5/core/mr.c   |  138 ++
  .../net/ethernet/mellanox/mlx5/core/pagealloc.c|  438 
  drivers/net/ethernet/mellanox/mlx5/core/pd.c   |  103 +
  drivers/net/ethernet/mellanox/mlx5/core/port.c |  106 +
  drivers/net/ethernet/mellanox/mlx5/core/qp.c   |  303 +++
  drivers/net/ethernet/mellanox/mlx5/core/srq.c  |  225 ++
  drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  225 ++
  include/linux/mlx5/cmd.h   |   51 +
  include/linux/mlx5/cq.h|  166 ++
  include/linux/mlx5/device.h|  886 +++
  include/linux/mlx5/doorbell.h  |   81 +
  include/linux/mlx5/driver.h|  763 ++
  include/linux/mlx5/qp.h|  467 
  include/linux/mlx5/srq.h   |   41 +
  45 files changed, 15983 insertions(+)
  create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
  create mode 100644 drivers/infiniband/hw/mlx5/Makefile
  create mode 100644 

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-06-28 Thread Or Gerlitz
On Fri, Jun 28, 2013 at 4:20 PM, Or Gerlitz or.gerl...@gmail.com wrote:
 On Sun, Jun 16, 2013 at 3:02 PM, Eli Cohen e...@dev.mellanox.co.il wrote:

 From: Eli Cohen e...@mellanox.com

 The patches that follow constitute the driver for Mellanox's 5th
 generation
 of HCAs named Connect-IB.

 The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
 partitioning resembles what we have for mlx4 with the substantial
 difference
 that mlx5_ib is the pci device driver and not mlx5_core.


 Hi Roland,

 We're on 3.10-rc7  soon on -rc8, so things warm up for 3.11... today will
 mark two working weeks since the mlx5 driver was posted here, and no
 comment, its marked as new in your patchwork. Is this safe for 3.11?  any
 comments or fixes we have to apply? As you probably saw, I posted V1 which
 is essentially almost the same as V0 but with netdev copied, to see if there
 are rejections/comments from there, so far nothing. Dave said he wants to
 see this posted to netdev inorder to decide if he's OK for the core driver
 to be pushed through your tree too.

If this helps, the patches are here
git://beany.openfabrics.org/~eli/connect-ib.git branch mlx5-v1-int


Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-02 Thread Or Gerlitz
On Tue, Jul 2, 2013 at 12:22 AM, Roland Dreier rol...@kernel.org wrote:
 Also, sparse warns about [...] in mlx5_ib.h.  Nor does it have any callers,
 so it's a bit hard to tell if it's really and truly a bug.

removing this function for V2


Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-02 Thread Or Gerlitz
On Mon, Jul 1, 2013 at 9:03 PM, Roland Dreier rol...@kernel.org wrote:
 In general I don't think overriding the CFLAGS (as you do in the mlx5
 Makefiles) is a good idea, and in particular here your -Wall -Werror
 break the build, at least for my gcc 4.7.3:

   CC  drivers/infiniband/hw/mlx5/qp.o
 /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c: In
 function ‘sq_overhead’:
 /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:234:2:
 error: case value ‘4671’ not in enumerated type ‘enum ib_qp_type’
 [-Werror=switch]


Will do both: (A) remove the flags added in the driver makefile and (B)
fix the issues pointed out by these flags...

[...]

 What is this IB_QPT_REG_UMR stuff anyway?  Shouldn't we strip out all
 that from the mlx5 driver until it's available in the core code?

IB_QPT_REG_UMR is the type of QP used internally by the driver to do
plain memory registration on behalf of verbs consumers. We will follow
the practice the mlx4 driver uses to create the proxy and tunnel QP
types for SRIOV, i.e. define MLX5_IB_QPT_REG_UMR and use it under
driver-specific QP creation flags, for which the IB verbs header file
already has the foundations.

[...]

 /* === this should be passed to the vergbs layer */
 enum {
 IB_WR_SET_PSV = IB_WR_BIND_MW + 10,
 IB_WR_GET_PSV,
 IB_WR_CHECK_PSV,
 IB_WR_RGET_PSV,
 IB_WR_RCHECK_PSV,
 IB_WR_UMR,
 };

 enum {
 IB_SEND_UMR_UNREG   = IB_SEND_IP_CSUM << 1,
 };

 enum ib_latency_class {
 IB_LATENCY_CLASS_LOW,
 IB_LATENCY_CLASS_MEDIUM,
 IB_LATENCY_CLASS_HIGH,
 IB_LATENCY_CLASS_FAST_PATH
 };
 /* === this should be passed to the vergbs layer */

 looks like it shouldn't be in your submission.  (What are vergbs anyway? :)

Will fix that. Basically, we will remove the things we can get along
without for now (e.g. definitions that are unused even internally, such
as IB_WR_YYY_PSV) and internalize what we do need, e.g. use MLX5_IB_XXX
where IB_XXX was used.

And vergbs is a typo whose fix missed the submitted version...


Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-02 Thread Or Gerlitz
On Mon, Jul 1, 2013 at 8:49 PM, Roland Dreier rol...@kernel.org wrote:
 So I'm inclined to apply the mlx5 driver for 3.11, since it's a
 completely new driver.  However, reading through it so far I had the
 following comments, and I'd like these cleanups addressed along with Dave 
 Miller's:

Roland,

We are working to address all of Dave Miller's comments along with yours
and to post V2 later this week, so we will still be on track for a 3.11
merge of the core and IB drivers through your tree.


 - The debug mask complexity seems unnecessary now that pr_debug() is
 controllable at runtime with the DYNAMIC_DEBUG stuff.  We should get
 rid of the extra level of indirection.

OK

 - I think the active flag for the health check timer is unnecessary.
 It can just be stopped with del_timer_sync().

Jack was looking at this today and we're not sure; he will send his
reading of the matter tomorrow.

 - Many places use foo_spl as a name, and in the Linux kernel
 foo_lock would be much more idiomatic and easier to read.

sure, done.


 - In:

 +struct mlx5_cmd {
 ...
 +struct mlx5_cmd_stats stats[0x80a];

 the 0x80a magic number really needs to have a name.

done.


Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11

2013-07-03 Thread Or Gerlitz

On 03/07/2013 15:41, Bart Van Assche wrote:


[...]

Bart,


The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch


Some of these patches were already picked by Roland (SB), I would suggest
that you post V4 and drop the ones which were accepted.

e8ca413 IB/srp: Bump driver version and release date
4b5e5f4 IB/srp: Make HCA completion vector configurable
96fc248 IB/srp: Maintain a single connection per I_T nexus
99e1c13 IB/srp: Fail I/O fast if target offline
2742c1d IB/srp: Skip host settle delay
086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error
1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion

Also, it would help if you used the --cover-letter option of git
format-patch and the resulting cover letter (patch 0/N), as it has
standard content which you can extend with your additions.



Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz

On 01/07/2013 20:49, Roland Dreier wrote:

- I think the active flag for the health check timer is unnecessary.
It can just be stopped with del_timer_sync().


Hi Roland

Jack looked at this comment and the code, and he says that the active flag
is used to prevent rescheduling the timer from inside the timer handling
routine.

In the kernel, the comment header in the source file for del_timer_sync
explicitly states that rescheduling the timer must be prevented, or the
sync is useless: "Callers must prevent restarting of the timer, otherwise
this function is meaningless."

So we believe that code should remain.

Or.
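
The ordering argument can be modelled in a few lines of userspace C. This is only a single-threaded model of the idea, not kernel code: `struct health_model` and both functions are hypothetical, and the `armed` flag merely stands in for a pending timer.

```c
#include <stdbool.h>

/* Userspace model only: "armed" stands in for a pending kernel timer. */
struct health_model {
	bool active;   /* cleared before the timer is stopped */
	bool armed;
};

/* Models the timer callback: do the check, then re-arm unless stopping. */
static void health_poll(struct health_model *h)
{
	h->armed = false;          /* the timer has fired */
	/* ... the health check itself would run here ... */
	if (h->active)
		h->armed = true;   /* models mod_timer() from the handler */
}

/* Models teardown: forbid re-arming first, then cancel what is pending. */
static void health_stop(struct health_model *h)
{
	h->active = false;
	h->armed = false;          /* models del_timer_sync() */
}
```

Without the `active` check, a handler that runs around the time of the stop could arm the timer again after it was cancelled, which is the situation the del_timer_sync comment warns about.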


[PATCH V2 4/9] IB/core: Add reserved values to enums for low-level drivers use

2013-07-03 Thread Or Gerlitz
From: Jack Morgenstein ja...@dev.mellanox.co.il

Continue the approach taken by commit d2b57063e4a ("IB/core: Reserve bits
in enum ib_qp_create_flags for low-level driver use") and add reserved
entries to the ib_qp_type and ib_wr_opcode enums. The low-level drivers
will then define macros to use these reserved values, giving proper names
to the macros for readability. Also add a range of reserved flags to enum
ib_send_flags.

The mlx5 IB driver uses the new additions.

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
---
 include/rdma/ib_verbs.h |   35 +--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..645c3ce 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -610,7 +610,21 @@ enum ib_qp_type {
IB_QPT_RAW_PACKET = 8,
IB_QPT_XRC_INI = 9,
IB_QPT_XRC_TGT,
-   IB_QPT_MAX
+   IB_QPT_MAX,
+   /* Reserve a range for qp types internal to the low level driver.
+* These qp types will not be visible at the IB core layer, so the
+* IB_QPT_MAX usages should not be affected in the core layer
+*/
+   IB_QPT_RESERVED1 = 0x1000,
+   IB_QPT_RESERVED2,
+   IB_QPT_RESERVED3,
+   IB_QPT_RESERVED4,
+   IB_QPT_RESERVED5,
+   IB_QPT_RESERVED6,
+   IB_QPT_RESERVED7,
+   IB_QPT_RESERVED8,
+   IB_QPT_RESERVED9,
+   IB_QPT_RESERVED10,
 };
 
 enum ib_qp_create_flags {
@@ -766,6 +780,19 @@ enum ib_wr_opcode {
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
IB_WR_BIND_MW,
+   /* reserve values for low level drivers' internal use.
+* These values will not be used at all in the ib core layer.
+*/
+   IB_WR_RESERVED1 = 0xf0,
+   IB_WR_RESERVED2,
+   IB_WR_RESERVED3,
+   IB_WR_RESERVED4,
+   IB_WR_RESERVED5,
+   IB_WR_RESERVED6,
+   IB_WR_RESERVED7,
+   IB_WR_RESERVED8,
+   IB_WR_RESERVED9,
+   IB_WR_RESERVED10,
 };
 
 enum ib_send_flags {
@@ -773,7 +800,11 @@ enum ib_send_flags {
 	IB_SEND_SIGNALED	= (1<<1),
 	IB_SEND_SOLICITED	= (1<<2),
 	IB_SEND_INLINE		= (1<<3),
-	IB_SEND_IP_CSUM		= (1<<4)
+	IB_SEND_IP_CSUM		= (1<<4),
+
+	/* reserve bits 26-31 for low level drivers' internal use */
+	IB_SEND_RESERVED_START	= (1 << 26),
+	IB_SEND_RESERVED_END	= (1 << 31),
 };
 
 struct ib_sge {
-- 
1.7.1
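
A sketch of how a low-level driver might consume the reserved values. The enum below is a trimmed copy so the fragment compiles on its own; MLX5_IB_QPT_REG_UMR is the macro name used elsewhere in this thread, but which reserved slot it maps to is an assumption here, and `is_driver_private` is a hypothetical helper.

```c
/* Trimmed copy of the enum from the patch, enough to compile on its own. */
enum ib_qp_type {
	IB_QPT_XRC_INI = 9,
	IB_QPT_XRC_TGT,
	IB_QPT_MAX,
	IB_QPT_RESERVED1 = 0x1000,
	IB_QPT_RESERVED2,
};

/* Assumed mapping: which reserved slot the driver picks is a guess. */
#define MLX5_IB_QPT_REG_UMR IB_QPT_RESERVED1

/* The driver-private type is a real enumerator, so -Wswitch stays quiet. */
static int is_driver_private(enum ib_qp_type type)
{
	switch (type) {
	case MLX5_IB_QPT_REG_UMR:
		return 1;
	default:
		return 0;
	}
}
```

Since the macro expands to a genuine member of the enum, a `case` label using it no longer triggers the "case value not in enumerated type" diagnostic, and IB_QPT_MAX keeps its old value for the core layer.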



[PATCH V2 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/ah.c   |   95 
 drivers/infiniband/hw/mlx5/cq.c   |  844 +
 drivers/infiniband/hw/mlx5/doorbell.c |  100 
 drivers/infiniband/hw/mlx5/mad.c  |  139 ++
 4 files changed, 1178 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/ah.c
 create mode 100644 drivers/infiniband/hw/mlx5/cq.c
 create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
 create mode 100644 drivers/infiniband/hw/mlx5/mad.c

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
new file mode 100644
index 000..ff8f1cb
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx5_ib.h"
+
+struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
+			   struct mlx5_ib_ah *ah)
+{
+	u32 sgi;
+
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		sgi = ah_attr->grh.sgid_index << 20;
+
+		memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
+		ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
+						(1 << 30) | sgi);
+		ah->av.hop_limit = ah_attr->grh.hop_limit;
+		ah->av.tclass = ah_attr->grh.traffic_class;
+	}
+
+	ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+	ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
+	ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
+
+	return &ah->ibah;
+}
+
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah;
+
+	ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+	if (!ah)
+		return ERR_PTR(-ENOMEM);
+
+	return create_ib_ah(ah_attr, ah); /* never fails */
+}
+
+int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+	struct mlx5_ib_ah *ah = to_mah(ibah);
+	u32 tmp;
+
+	memset(ah_attr, 0, sizeof(*ah_attr));
+
+	tmp = be32_to_cpu(ah->av.grh_gid_fl);
+	if (tmp & (1 << 30)) {
+		ah_attr->ah_flags = IB_AH_GRH;
+		ah_attr->grh.sgid_index = (tmp >> 20) & 0xff;
+		ah_attr->grh.flow_label = tmp & 0xf;
+		memcpy(&ah_attr->grh.dgid, ah->av.rgid, 16);
+		ah_attr->grh.hop_limit = ah->av.hop_limit;
+		ah_attr->grh.traffic_class = ah->av.tclass;
+	}
+	ah_attr->dlid = be16_to_cpu(ah->av.rlid);
+	ah_attr->static_rate = ah->av.stat_rate_sl >> 4;
+	ah_attr->sl = ah->av.stat_rate_sl & 0xf;
+
+	return 0;
+}
+
+int mlx5_ib_destroy_ah(struct ib_ah *ah)
+{
+	kfree(to_mah(ah));
+	return 0;
+}
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
new file mode 100644
index 000..c05868e
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -0,0 +1,844 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *  
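
The address-handle code in ah.c packs several fields into one 32-bit word and one byte. The round trip can be sketched in userspace C as follows. All names are illustrative, byte-order conversion is left out, and the 20-bit flow-label mask is an assumption based on the width of the flow label in the IB GRH rather than something taken from the patch text.

```c
#include <stdint.h>

#define AV_GRH_PRESENT   (1u << 30)
#define AV_SGID_SHIFT    20
#define AV_SGID_MASK     0xffu
#define AV_FLOW_MASK     0xfffffu   /* assumes a 20-bit flow label */

/* Pack flow label, GRH-present flag and SGID index (host byte order). */
static uint32_t av_pack(uint32_t flow_label, uint8_t sgid_index)
{
	return (flow_label & AV_FLOW_MASK) | AV_GRH_PRESENT |
	       ((uint32_t)sgid_index << AV_SGID_SHIFT);
}

static uint8_t av_sgid_index(uint32_t word)
{
	return (word >> AV_SGID_SHIFT) & AV_SGID_MASK;
}

static uint32_t av_flow_label(uint32_t word)
{
	return word & AV_FLOW_MASK;
}

/* static_rate in the high nibble, service level in the low nibble. */
static uint8_t av_pack_rate_sl(uint8_t static_rate, uint8_t sl)
{
	return (uint8_t)((static_rate << 4) | (sl & 0xf));
}
```

Keeping the pack and unpack helpers next to each other makes it easy to check that create and query agree on every shift and mask.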

[PATCH V2 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 MAINTAINERS |   10 ++
 drivers/infiniband/Kconfig  |1 +
 drivers/infiniband/Makefile |1 +
 drivers/infiniband/hw/mlx5/Kconfig  |   10 ++
 drivers/infiniband/hw/mlx5/Makefile |3 +++
 5 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
 create mode 100644 drivers/infiniband/hw/mlx5/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 6e82fb5..b426536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5377,6 +5377,16 @@ S:   Supported
 F: drivers/net/ethernet/mellanox/mlx5/core/
 F: include/linux/mlx5/
 
+Mellanox MLX5 IB driver
+M:  Eli Cohen e...@mellanox.com
+L:  linux-rdma@vger.kernel.org
+W:  http://www.mellanox.com
+Q:  http://patchwork.kernel.org/project/linux-rdma/list/
+T:  git://openfabrics.org/~eli/connect-ib.git
+S:  Supported
+F:  include/linux/mlx5/
+F:  drivers/infiniband/hw/mlx5/
+
 MODULE SUPPORT
 M: Rusty Russell ru...@rustcorp.com.au
 S: Maintained
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index c85b56c..5ceda71 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -50,6 +50,7 @@ source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
+source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index b126fef..1fe6988 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100)   += hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)  += hw/mlx4/
+obj-$(CONFIG_MLX5_INFINIBAND)  += hw/mlx5/
 obj-$(CONFIG_INFINIBAND_NES)   += hw/nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
diff --git a/drivers/infiniband/hw/mlx5/Kconfig b/drivers/infiniband/hw/mlx5/Kconfig
new file mode 100644
index 000..8e6aebf
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -0,0 +1,10 @@
+config MLX5_INFINIBAND
+   tristate "Mellanox Connect-IB HCA support"
+   depends on NETDEVICES && ETHERNET && PCI && X86
+   select NET_VENDOR_MELLANOX
+   select MLX5_CORE
+   ---help---
+ This driver provides low-level InfiniBand support for
+ Mellanox Connect-IB PCI Express host channel adapters (HCAs).
+ This is required to use InfiniBand protocols such as
+ IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
new file mode 100644
index 000..4ea0135
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MLX5_INFINIBAND)  += mlx5_ib.o
+
+mlx5_ib-y :=   main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o
-- 
1.7.1



[PATCH V2 0/9] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
Hi Roland, all 

Here's V2 of the driver, with Dave's and Roland's comments addressed. We
look forward to hearing whether Roland is OK with merging it into 3.11.

Jack, Moshe and Or.

changes from V1:

- Addressed Dave Miller's comments:
   * Local variables in functions listed from longest to shortest
   * --i/++i changed to i--/i++ in all for-loops
   * Removed leading /* empty line from all comments
   * magic constants given names
   * endianness code moved to driver.h, and defined an endianness-dependent
     macro for use in assignment.
   * destroy_msg_cache() duplicated code removed

- Addressed Roland's comments:
   * Renamed foo_spl to foo_lock for spinlocks.
   * Eliminated magic number from mlx5_cmd_stats field declaration in struct
     mlx5_cmd.
   * Eliminated unused procedure mlx5_ib_umem_populate_pas()
   * Cleaned up mlx5_ib.h:
   * Added new patch for ib_verbs.h, adding reserved values to several enums
   * For several ib-core enums, added reserved values for use by low-level
     drivers. By defining macros at the low level (i.e., renaming the reserved
     values, in effect), the ll drivers may use these enums without needing to
     duplicate the ib-core enums while adding extra values. This fixes
     compilation problems such as:
       /home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:975:2:
       error: case value 4671 not in enumerated type enum ib_qp_type
   * Changed ib_latency_class to mlx5_ib_latency_class, visible only in
     low-level driver
   * Eliminated the unused IB_WR_xxx_PSV enums
   * Defined macros MLX5_IB_SEND_UMR_UNREG, MLX5_IB_QPT_REG_UMR, and
     MLX5_IB_WR_UMR, taking advantage of the reserved values added to the
     ib_core enums.
   * debug-mask removed from mlx5_ib
   * Regarding mlx5_core, still have a debug mask to enable printouts of
     command data and command execution times, but all file-name-based mask
     bits removed.
   * Removed forced -Wall -Werror -DDEBUG settings in the mlx5 core/ib makefiles

changes from V0:
 - Per Dave's request, cross posting to both netdev and linux-rdma, to see 
   if there are comments from netdev on the core driver.

From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation
of HCAs named Connect-IB.

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
partitioning resembles what we have for mlx4 with the substantial difference
that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this sense,
it can be perceived as a library. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. They are divided such that the first three ones have the code
of the mlx5_core driver, and the last five the code of the mlx5_ib driver.

Only the last patch per driver adds the Makefiles and Kconfigs, to make
things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

Eli Cohen (8):
  net/mlx5: Mellanox Connect-IB, core driver part 1/3
  net/mlx5: Mellanox Connect-IB, core driver part 2/3
  net/mlx5: Mellanox Connect-IB, core driver part 3/3
  IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 2/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 4/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

Jack Morgenstein (1):
  IB/core: Add reserved values to enums for low-level drivers use

 MAINTAINERS                                        |   22 +
 drivers/infiniband/Kconfig                         |    1 +
 drivers/infiniband/Makefile                        |    1 +
 drivers/infiniband/hw/mlx5/Kconfig                 |   10 +
 drivers/infiniband/hw/mlx5/Makefile                |    3 +
 drivers/infiniband/hw/mlx5/ah.c                    |   95 +
 drivers/infiniband/hw/mlx5/cq.c                    |  844 +++
 drivers/infiniband/hw/mlx5/doorbell.c              |  100 +
 drivers/infiniband/hw/mlx5/mad.c                   |  139 ++
 drivers/infiniband/hw/mlx5/main.c                  | 1504
 drivers/infiniband/hw/mlx5/mem.c                   |  162 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  547 +
 drivers/infiniband/hw/mlx5/mr.c                    | 1021
 drivers/infiniband/hw/mlx5/qp.c                    | 2537
 drivers/infiniband/hw/mlx5/srq.c                   |  478
 drivers/infiniband/hw/mlx5/user.h                  |  121 +
 drivers/net/ethernet/mellanox/Kconfig              |    1 +
 drivers/net/ethernet/mellanox/Makefile             |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |   18 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |    5 +
 

[PATCH V2 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-07-03 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  547 ++
 drivers/infiniband/hw/mlx5/mr.c  | 1021 ++
 2 files changed, 1568 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h
 create mode 100644 drivers/infiniband/hw/mlx5/mr.c

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
new file mode 100644
index 000..d2067c3
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -0,0 +1,547 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_IB_H
+#define MLX5_IB_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/srq.h>
+#include <linux/types.h>
+
+#define mlx5_ib_dbg(dev, format, arg...)				\
+do {									\
+	pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name,	\
+		 __func__, __LINE__, current->pid, ##arg);		\
+} while (0)
+
+#define mlx5_ib_err(dev, format, arg...)				\
+pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+#define mlx5_ib_warn(dev, format, arg...)				\
+pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,	\
+	__LINE__, current->pid, ##arg)
+
+enum {
+   MLX5_IB_MMAP_CMD_SHIFT  = 8,
+   MLX5_IB_MMAP_CMD_MASK   = 0xff,
+};
+
+enum mlx5_ib_mmap_cmd {
+   MLX5_IB_MMAP_REGULAR_PAGE   = 0,
+   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1, /* always last */
+};
+
+enum {
+	MLX5_RES_SCAT_DATA32_CQE	= 0x1,
+	MLX5_RES_SCAT_DATA64_CQE	= 0x2,
+	MLX5_REQ_SCAT_DATA32_CQE	= 0x11,
+	MLX5_REQ_SCAT_DATA64_CQE	= 0x22,
+};
+
+enum mlx5_ib_latency_class {
+   MLX5_IB_LATENCY_CLASS_LOW,
+   MLX5_IB_LATENCY_CLASS_MEDIUM,
+   MLX5_IB_LATENCY_CLASS_HIGH,
+   MLX5_IB_LATENCY_CLASS_FAST_PATH
+};
+
+enum mlx5_ib_mad_ifc_flags {
+	MLX5_MAD_IFC_IGNORE_MKEY	= 1,
+	MLX5_MAD_IFC_IGNORE_BKEY	= 2,
+	MLX5_MAD_IFC_NET_VIEW		= 4,
+};
+
+struct mlx5_ib_ucontext {
+	struct ib_ucontext	ibucontext;
+	struct list_head	db_page_list;
+
+	/* protect doorbell record alloc/free
+	 */
+	struct mutex		db_page_mutex;
+	struct mlx5_uuar_info	uuari;
+};
+
+static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibucontext)
+{
+	return container_of(ibucontext, struct mlx5_ib_ucontext, ibucontext);
+}
+
+struct mlx5_ib_pd {
+	struct ib_pd		ibpd;
+	u32			pdn;
+	u32			pa_lkey;
+};
+
+/* Use macros here so that don't have to duplicate
+ * enum ib_send_flags and enum ib_qp_type for low-level driver
+ */
+
+#define MLX5_IB_SEND_UMR_UNREG	IB_SEND_RESERVED_START
+#define MLX5_IB_QPT_REG_UMR	IB_QPT_RESERVED1
+#define MLX5_IB_WR_UMR		IB_WR_RESERVED1
+
+struct wr_list {
+	u16	opcode;
+	u16	next;
+};
+
+struct mlx5_ib_wq {
+	u64		       *wrid;
+	u32		       *wr_data;
+	struct wr_list	       *w_list;
+	unsigned	       *wqe_head;
+	u16			unsig_count;
+
+   /* 

Re: rtnl_lock deadlock on 3.10

2013-07-03 Thread Or Gerlitz

On 03/07/2013 20:22, Shawn Bohrer wrote:

On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote:

On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote:

On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote:

On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa 
han...@stressinduktion.org wrote:

On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:

I've managed to hit a deadlock at boot a couple times while testing
the 3.10 rc kernels.  It seems to always happen when my network
devices are initializing.  This morning I updated to v3.10 and made a
few config tweaks and so far I've hit it 4 out of 5 reboots.  It looks
like most processes are getting stuck on rtnl_lock.  Below is a boot
log with the soft lockup prints.  Please let me know if there is any
other information I can provide:

Could you try a build with CONFIG_LOCKDEP enabled?


The problem is clear: ib_register_device() is called with rtnl_lock,
but itself needs device_mutex, however, ib_register_client() first
acquires device_mutex, then indirectly calls register_netdev() which
takes rtnl_lock. Deadlock!
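
The inversion described above (one path takes rtnl_lock then device_mutex, the other takes device_mutex then rtnl_lock) can be illustrated with a minimal userspace sketch. This is only a toy version of the kind of ordering check that lockdep performs; the enum names and record_lock_order() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers standing in for the two kernel locks. */
enum lock_id { LOCK_RTNL, LOCK_DEVICE_MUTEX, LOCK_COUNT };

/* seen[a][b] is true once some code path took lock b while holding lock a. */
static bool seen[LOCK_COUNT][LOCK_COUNT];

/*
 * Record that `inner` was acquired while `outer` was held.
 * Returns true when the opposite order was recorded earlier, i.e. the two
 * paths can deadlock against each other (an ABBA inversion).
 */
static bool record_lock_order(enum lock_id outer, enum lock_id inner)
{
	assert(outer < LOCK_COUNT && inner < LOCK_COUNT);
	seen[outer][inner] = true;
	return seen[inner][outer];
}
```

Recording the ib_register_device() order first and the ib_register_client() order second makes the second call report the conflict, which is why the proposed fix makes both paths take the locks in the same order.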

One possible fix is always taking rtnl_lock before taking
device_mutex, something like below:

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..890870b 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client)
 {
 	struct ib_device *device;
 
+	rtnl_lock();
 	mutex_lock(&device_mutex);
 
 	list_add_tail(&client->list, &client_list);
@@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client)
 			client->add(device);
 
 	mutex_unlock(&device_mutex);
+	rtnl_unlock();
 
 	return 0;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b6e049a..5a7a048 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char *format,
 		goto event_failed;
 	}
 
-	result = register_netdev(priv->dev);
+	result = register_netdevice(priv->dev);
 	if (result) {
 		printk(KERN_WARNING "%s: couldn't register ipoib port %d; error %d\n",
 		       hca->name, port, result);

Looks good to me. Shawn, could you test this patch?

ib_unregister_device/ib_unregister_client would need the same change,
too. I have not checked the other ->add() and ->remove() functions. Also
cc'ed linux-rdma@vger.kernel.org, Roland Dreier.

Cong's patch is missing the #include <linux/rtnetlink.h> but otherwise
I've had 34 successful reboots with no deadlocks which is a good sign.
It sounds like there are more paths that need to be audited and a
proper patch submitted.  I can do more testing later if needed.

Thanks,
Shawn



Guys, I was a bit busy today looking into that, but I don't think we
want the IB core layer (core/device.c) to use rtnl locking, which is
something that belongs to the network stack.

Or.


[PATCH V3 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_uverbs_create_flow and ib_uverbs_destroy_flow to
support flow steering for user space applications.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  199 +
 drivers/infiniband/core/uverbs_main.c |   13 ++-
 include/rdma/ib_verbs.h   |1 +
 include/uapi/rdma/ib_user_verbs.h |   88 ++-
 5 files changed, 302 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..ad9d102 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -155,6 +155,7 @@ extern struct idr ib_uverbs_cq_idr;
 extern struct idr ib_uverbs_qp_idr;
 extern struct idr ib_uverbs_srq_idr;
 extern struct idr ib_uverbs_xrcd_idr;
+extern struct idr ib_uverbs_rule_idr;
 
 void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
 
@@ -215,5 +216,7 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
 IB_UVERBS_DECLARE_CMD(create_xsrq);
 IB_UVERBS_DECLARE_CMD(open_xrcd);
 IB_UVERBS_DECLARE_CMD(close_xrcd);
+IB_UVERBS_DECLARE_CMD(create_flow);
+IB_UVERBS_DECLARE_CMD(destroy_flow);
 
 #endif /* UVERBS_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..bfc53f7 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -54,6 +54,7 @@ static struct uverbs_lock_class qp_lock_class = { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class  = { .name = "AH-uobj" };
 static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
+static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
 #define INIT_UDATA(udata, ibuf, obuf, ilen, olen)			\
 	do {								\
@@ -330,6 +331,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->srq_list);
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
+	INIT_LIST_HEAD(&ucontext->rule_list);
 	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
@@ -2587,6 +2589,203 @@ out_put:
 	return ret ? ret : in_len;
 }
 
+static int kern_spec_to_ib_spec(struct ib_kern_spec *kern_spec,
+				struct _ib_flow_spec *ib_spec)
+{
+	ib_spec->type = kern_spec->type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		ib_spec->eth.size = sizeof(struct ib_flow_spec_eth);
+		memcpy(&ib_spec->eth.val, &kern_spec->eth.val,
+		       sizeof(struct ib_flow_eth_filter));
+		memcpy(&ib_spec->eth.mask, &kern_spec->eth.mask,
+		       sizeof(struct ib_flow_eth_filter));
+		break;
+	case IB_FLOW_SPEC_IPV4:
+		ib_spec->ipv4.size = sizeof(struct ib_flow_spec_ipv4);
+		memcpy(&ib_spec->ipv4.val, &kern_spec->ipv4.val,
+		       sizeof(struct ib_flow_ipv4_filter));
+		memcpy(&ib_spec->ipv4.mask, &kern_spec->ipv4.mask,
+		       sizeof(struct ib_flow_ipv4_filter));
+		break;
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		ib_spec->tcp_udp.size = sizeof(struct ib_flow_spec_tcp_udp);
+		memcpy(&ib_spec->tcp_udp.val, &kern_spec->tcp_udp.val,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		memcpy(&ib_spec->tcp_udp.mask, &kern_spec->tcp_udp.mask,
+		       sizeof(struct ib_flow_tcp_udp_filter));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
+			      const char __user *buf, int in_len,
+			      int out_len)
+{
+	struct ib_uverbs_create_flow	  cmd;
+	struct ib_uverbs_create_flow_resp resp;
+	struct ib_uobject		  *uobj;
+	struct ib_flow			  *flow_id;
+	struct ib_kern_flow_attr	  *kern_flow_attr;
+	struct ib_flow_attr		  *flow_attr;
+	struct ib_qp			  *qp;
+	int err = 0;
+	void *kern_spec;
+	void *ib_spec;
+	int i;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
+	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
+		return -EPERM;
+
+	if (cmd.flow_attr.num_of_specs) {
+		kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
+		if (!kern_flow_attr

[PATCH V3 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-07-03 Thread Or Gerlitz
From: Igor Ivanov igor.iva...@itseez.com

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes which are based
on the verbs extensions approach should be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.
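
As a rough illustration of the dispatch rule described above, the sketch below computes how many bytes a write() is expected to carry for a given header. The field layout is copied from struct ib_uverbs_cmd_hdr_ex in this patch; the names cmd_hdr_ex_sketch, CMD_THRESHOLD and expected_write_len() are hypothetical and exist only for this userspace example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_THRESHOLD 50 /* mirrors IB_USER_VERBS_CMD_THRESHOLD in the patch */

/*
 * Extended header layout as added by the patch. The first three fields
 * match the legacy struct ib_uverbs_cmd_hdr.
 */
struct cmd_hdr_ex_sketch {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/*
 * Return the number of bytes the write() must carry for this header,
 * choosing the legacy or the extended rule based on the opcode.
 */
static size_t expected_write_len(const struct cmd_hdr_ex_sketch *hdr)
{
	assert(hdr != NULL);
	if (hdr->command >= CMD_THRESHOLD)
		return ((size_t)hdr->in_words + hdr->provider_in_words) * 4;
	return (size_t)hdr->in_words * 4;
}
```

For opcodes below the threshold the provider fields are ignored, which matches the unchanged legacy check of hdr.in_words * 4 against count.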

Signed-off-by: Igor Ivanov igor.iva...@itseez.com
Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/uverbs_main.c |   29 -
 include/uapi/rdma/ib_user_verbs.h |   10 ++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD	50
 
 enum {
IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u16 provider_in_words;
+   __u16 provider_out_words;
+   __u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
__u64 response;
__u64 driver_data[0];
-- 
1.7.1



[PATCH V3 for-next 0/4] Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
Hi Roland, all

V3 addresses the comments made by Sean. There are still some concerns/questions
posed by Roland on the uverbs extensions element of the series. I have posted
replies to them, but so far no further comments were made.

V3 changes:
  - Addressed comments from Sean:
  - modified the change-log of patch #1 to be clearer on the priority and domain
semantics and usage
  - re-arranged the fields of struct ib_flow_attr
  - removed check from ib_flow_destroy
  - removed the IB flow spec which wasn't in line with the L2/L3/L4 approach
done for Ethernet/IP/TCP|UDP, will use proper IB flow specs when adding
the support for IPoIB flow steering

 
V2 changes:
  - dropped struct ib_kern_flow from patch #3, this structure wasn't
    used and was left there by mistake (bug, thanks Roland)
  - removed the void *flow_context field from struct ib_flow, this was
    pointing to driver private data for that flow, but doesn't belong here,
    i.e. need not be seen by the verbs consumer but rather hidden.
  - renamed struct mlx4_flow_handle to mlx4_ib_flow, a structure that contains
    the verbs level struct ib_flow and the mlx4 registration ID for that flow

V1 changes:

 - dropped the five pre-patches which were accepted into 3.10
 - rebased the patches against Roland's for-next / 3.10-rc4
 - in patch #3, ib_uverbs_destroy_flow was returning too quickly when the driver
   returned failure for ib_destroy_flow, need to free some uverbs resources first.
 - in patch #4, check index before accessing the array at
   mlx4_ib_create/destroy_flow

These patches add Flow Steering support to the kernel IB core, to uverbs and 
to the mlx4 IB (verbs) driver along with one patch to uverbs which adds 
some code to support extensions.

  IB/core: Add receive Flow Steering support
  IB/core: Infra-structure to support verbs extensions through uverbs
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

The main patch which introduces the Flow-Steering API is "IB/core: Add receive
Flow Steering support", see its change log. Looking at the "Network Adapter Flow
Steering" slides which Tzahi Oved presented at the annual OFA 2012 meeting could
be helpful:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html

Or.

Hadar Hen Zion (3):
  IB/core: Add receive Flow Steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  IB/mlx4: Add receive Flow Steering support

Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

 drivers/infiniband/core/uverbs.h  |3 +
 drivers/infiniband/core/uverbs_cmd.c  |  199 
 drivers/infiniband/core/uverbs_main.c |   42 +-
 drivers/infiniband/core/verbs.c   |   27 
 drivers/infiniband/hw/mlx4/main.c |  235 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |   12 ++
 include/linux/mlx4/device.h   |5 -
 include/rdma/ib_verbs.h   |  122 +-
 include/uapi/rdma/ib_user_verbs.h |   98 ++-
 9 files changed, 729 insertions(+), 14 deletions(-)



[PATCH V3 for-next 4/4] IB/mlx4: Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

Implement ib_create_flow and ib_destroy_flow.

Translate the verbs structures provided by the user to HW structures
and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.

On the ATTACH command completion, the firmware provides a 64-bit registration
ID which is placed into struct mlx4_ib_flow that wraps the instance of
struct ib_flow which is returned to the caller. Later, this reg ID is used
for detaching that flow from the firmware.
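
The wrapping described above is the usual embed-and-recover pattern. The following is a minimal userspace sketch of it, assuming simplified stand-in structures: flow_sketch, drv_flow_sketch and to_drv_flow() are hypothetical names, not the definitions used by the mlx4 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic handle the verbs consumer sees (stand-in for struct ib_flow). */
struct flow_sketch {
	void *qp;
};

/* Driver-private wrapper: embeds the generic handle and adds the firmware ID. */
struct drv_flow_sketch {
	struct flow_sketch ibflow;
	uint64_t reg_id;
};

/* Userspace equivalent of the kernel's container_of() for this one case. */
static struct drv_flow_sketch *to_drv_flow(struct flow_sketch *flow)
{
	assert(flow != NULL);
	return (struct drv_flow_sketch *)((char *)flow -
					  offsetof(struct drv_flow_sketch, ibflow));
}
```

The caller only ever holds the generic handle, so the registration ID stays hidden from the verbs consumer and is recovered by the driver at destroy time.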

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/main.c|  235 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h |   12 ++
 include/linux/mlx4/device.h  |5 -
 3 files changed, 247 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a188d31..5b5518f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -54,6 +54,8 @@
 #define DRV_VERSION	"1.0"
 #define DRV_RELDATE	"April 4, 2008"
 
+#define MLX4_IB_FLOW_MAX_PRIO 0xFFF
+
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -88,6 +90,25 @@ static void init_query_mad(struct ib_smp *mad)
 
 static union ib_gid zgid;
 
+static int check_flow_steering_support(struct mlx4_dev *dev)
+{
+	int ib_num_ports = 0;
+	int i;
+
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		ib_num_ports++;
+
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		if (ib_num_ports || mlx4_is_mfunc(dev)) {
+			pr_warn("Device managed flow steering is unavailable "
+				"for IB ports or in multifunction env.\n");
+			return 0;
+		}
+		return 1;
+	}
+	return 0;
+}
+
 static int mlx4_ib_query_device(struct ib_device *ibdev,
 				struct ib_device_attr *props)
 {
@@ -144,6 +165,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
 		else
 			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+	if (check_flow_steering_support(dev->dev))
+		props->device_cap_flags |= IB_DEVICE_MANAGED_FLOW_STEERING;
 	}
 
 	props->vendor_id	   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
@@ -798,6 +821,209 @@ struct mlx4_ib_steering {
union ib_gid gid;
 };
 
+static int parse_flow_attr(struct mlx4_dev *dev,
+			   struct _ib_flow_spec *ib_spec,
+			   struct _rule_hw *mlx4_spec)
+{
+	enum mlx4_net_trans_rule_id type;
+
+	switch (ib_spec->type) {
+	case IB_FLOW_SPEC_ETH:
+		type = MLX4_NET_TRANS_RULE_ID_ETH;
+		memcpy(mlx4_spec->eth.dst_mac, ib_spec->eth.val.dst_mac,
+		       ETH_ALEN);
+		memcpy(mlx4_spec->eth.dst_mac_msk, ib_spec->eth.mask.dst_mac,
+		       ETH_ALEN);
+		mlx4_spec->eth.vlan_tag = ib_spec->eth.val.vlan_tag;
+		mlx4_spec->eth.vlan_tag_msk = ib_spec->eth.mask.vlan_tag;
+		break;
+
+	case IB_FLOW_SPEC_IPV4:
+		type = MLX4_NET_TRANS_RULE_ID_IPV4;
+		mlx4_spec->ipv4.src_ip = ib_spec->ipv4.val.src_ip;
+		mlx4_spec->ipv4.src_ip_msk = ib_spec->ipv4.mask.src_ip;
+		mlx4_spec->ipv4.dst_ip = ib_spec->ipv4.val.dst_ip;
+		mlx4_spec->ipv4.dst_ip_msk = ib_spec->ipv4.mask.dst_ip;
+		break;
+
+	case IB_FLOW_SPEC_TCP:
+	case IB_FLOW_SPEC_UDP:
+		type = ib_spec->type == IB_FLOW_SPEC_TCP ?
+					MLX4_NET_TRANS_RULE_ID_TCP :
+					MLX4_NET_TRANS_RULE_ID_UDP;
+		mlx4_spec->tcp_udp.dst_port = ib_spec->tcp_udp.val.dst_port;
+		mlx4_spec->tcp_udp.dst_port_msk = ib_spec->tcp_udp.mask.dst_port;
+		mlx4_spec->tcp_udp.src_port = ib_spec->tcp_udp.val.src_port;
+		mlx4_spec->tcp_udp.src_port_msk = ib_spec->tcp_udp.mask.src_port;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+	if (mlx4_map_sw_to_hw_steering_id(dev, type) < 0 ||
+	    mlx4_hw_rule_sz(dev, type) < 0)
+		return -EINVAL;
+	mlx4_spec->id = cpu_to_be16(mlx4_map_sw_to_hw_steering_id(dev, type));
+	mlx4_spec->size = mlx4_hw_rule_sz(dev, type) >> 2;
+	return mlx4_hw_rule_sz(dev, type);
+}
+
+static int __mlx4_ib_create_flow(struct ib_qp *qp, struct ib_flow_attr *flow_attr,
+				 int domain,
+				 enum mlx4_net_trans_promisc_mode flow_type,
+				 u64 *reg_id)
+{
+	int ret, i;
+	int size = 0;
+	void *ib_flow;
+   struct

[PATCH V3 for-next 1/4] IB/core: Add receive Flow Steering support

2013-07-03 Thread Or Gerlitz
From: Hadar Hen Zion had...@mellanox.com

The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs,
for which plain Ethernet packets are used, specifically packets which
don't carry any QPN to be matched by the receiving side.

Applications using these QPs must be provided with a method to
program some steering rule with the HW so packets arriving at
the local port can be routed to them.

This patch adds ib_create_flow, which allows providing a flow specification
for a QP, such that when there's a match between the specification and the
received packet, it can be forwarded to that QP, in a similar manner to how
one needs to use ib_attach_multicast for IB UD multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy
which describe L2, L3 and L4 headers; currently specs for Ethernet, IPv4,
TCP and UDP are defined. Flow specs are made of values and masks.

The input to ib_create_flow is an instance of struct ib_flow_attr which
contains a few mandatory control elements and optional flow specs.

struct ib_flow_attr {
	enum ib_flow_attr_type type;
	u16      size;
	u16      priority;
	u32      flags;
	u8       num_of_specs;
	u8       port;
	/* Following are the optional layers according to user request
	 * struct ib_flow_spec_yyy
	 * struct ib_flow_spec_zzz
	 */
};

As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, and with a little API enhancement which defines the newly added spec.

The flow spec structures are defined in a TLV (Type-Length-Value) manner,
which allows calling ib_create_flow with a variable-length list of
optional specs.

For the actual processing of ib_flow_attr the driver uses the number of
specs and the size mandatory fields along with the TLV nature of the specs.
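
To make the TLV processing concrete, here is a minimal userspace sketch of how a consumer could walk a packed list of specs using only the spec count and each spec's own size field. The header layout spec_hdr_sketch and the function walk_specs() are assumptions made for illustration; they are not the structures defined by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common prefix shared by every spec: a type and its total size. */
struct spec_hdr_sketch {
	uint32_t type;
	uint16_t size;      /* size of the whole spec in bytes, header included */
	uint16_t reserved;
};

/*
 * Walk num_of_specs TLV entries packed back to back in buf[0..len).
 * Returns the number of bytes consumed, or -1 if an entry is malformed
 * or would run past the end of the buffer.
 */
static long walk_specs(const unsigned char *buf, size_t len, unsigned num_of_specs)
{
	size_t off = 0;

	assert(buf != NULL || len == 0);
	for (unsigned i = 0; i < num_of_specs; i++) {
		struct spec_hdr_sketch hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		off += hdr.size; /* the length field locates the next spec */
	}
	return (long)off;
}
```

Because each entry carries its own length, a walker like this can skip spec types it does not understand, which is what allows new spec types to be added later without changing the layout of existing ones.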

Steering rules processing order is according to the domain over which
the rule is set and the rule priority. All rules set by user space
applications fall into the IB_FLOW_DOMAIN_USER domain; other domains
could be used by future IPoIB RFS and Ethtool flow-steering interface
implementation. Lower priority numerical value means higher priority.
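
The ordering rule can be sketched as a comparator. This is a userspace illustration only: rule_sketch and rule_cmp() are hypothetical names, and the assumption that a numerically lower domain is evaluated first is mine, since the text above only states that the domain takes part in the ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rule record: only the two fields that decide ordering. */
struct rule_sketch {
	int      domain;    /* assumption: lower domain value is evaluated first */
	uint16_t priority;  /* lower numerical value means higher priority */
};

/* qsort() comparator: order by domain, then by ascending priority value. */
static int rule_cmp(const void *a, const void *b)
{
	const struct rule_sketch *x = a, *y = b;

	assert(x != NULL && y != NULL);
	if (x->domain != y->domain)
		return (x->domain > y->domain) - (x->domain < y->domain);
	return (x->priority > y->priority) - (x->priority < y->priority);
}
```

Sorting an array of such records with qsort() and rule_cmp puts the rule that would be matched first at index 0.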

The returned value from ib_create_flow is an instance of struct ib_flow
which contains a database pointer (handle) provided by the HW driver
to be used when calling ib_destroy_flow.

Applications that offload TCP/IP traffic could be written also over IB UD QPs.
As such, the ib_create_flow / ib_destroy_flow API is designed to support UD QPs
too, the HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote support
of flow steering.

The ib_flow_attr enum type relates to usage of flow steering for promiscuous
and sniffer purposes:

IB_FLOW_ATTR_NORMAL - regular rule, steering according to rule specification

IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP

IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/verbs.c |   27 +
 include/rdma/ib_verbs.h |  121 ++-
 2 files changed, 146 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..87a8102 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1254,3 +1254,30 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+struct ib_flow *ib_create_flow(struct ib_qp *qp,
+			       struct ib_flow_attr *flow_attr,
+			       int domain)
+{
+	struct ib_flow *flow_id;
+	if (!qp->device->create_flow)
+		return ERR_PTR(-ENOSYS);
+
+	flow_id = qp->device->create_flow(qp, flow_attr, domain);
+	if (!IS_ERR(flow_id))
+		atomic_inc(&qp->usecnt);
+	return flow_id;
+}
+EXPORT_SYMBOL(ib_create_flow);
+
+int ib_destroy_flow(struct ib_flow *flow_id)
+{
+	int err;
+	struct ib_qp *qp = flow_id->qp;
+
+	err = qp->device->destroy_flow(flow_id);
+	if (!err)
+		atomic_dec(&qp->usecnt);
+	return err;
+}
+EXPORT_SYMBOL(ib_destroy_flow);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..1390a0f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,8 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS	= (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A	= (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B	= (1<<24

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-03 Thread Or Gerlitz
On Wed, Jul 3, 2013 at 10:26 PM, Roland Dreier rol...@kernel.org wrote:
> On Wed, Jul 3, 2013 at 9:41 AM, Or Gerlitz ogerl...@mellanox.com wrote:
>> Jack looked on this comment/code and he says that the active flag is used
>> to prevent re-scheduling the timer from inside the timer handling routine.
>>
>> In the kernel, the comment header in the source file for del_timer_sync
>> explicitly states that re-scheduling the timer must be prevented,
>> or the sync is useless: "Callers must prevent restarting of the timer,
>> otherwise this function is meaningless"
>>
>> So we believe that code should remain.
>
> Look at the actual timer code.  del_timer_sync() won't work if
> something unrelated re-adds the timer, but it will work if the timer
> itself is what re-adds itself.

[...]

OK, we will re-look into that tomorrow. So how does V2 look?

Or.


[PATCH V3 4/9] IB/core: Add reserved values to enums for low-level drivers use

2013-07-07 Thread Or Gerlitz
From: Jack Morgenstein ja...@dev.mellanox.co.il

Continue the approach taken by commit d2b57063e4a "IB/core: Reserve bits in
enum ib_qp_create_flags for low-level driver use" and add reserved entries to
the ib_qp_type and ib_wr_opcode enums. The low-level drivers will then define
macros to use these reserved values, giving proper names to the macros for
readability. Also add a range of reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.
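
As an illustration of the approach, the userspace sketch below shows a driver-private macro naming one reserved send-flag bit, plus a helper that tests whether a flags word touches the reserved range. The bit positions (26 and 31) are taken from this patch; every SKETCH_-prefixed name and both helper functions are hypothetical and not part of the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch; the SKETCH_ prefix marks them as stand-ins. */
#define SKETCH_SEND_RESERVED_START (UINT32_C(1) << 26)
#define SKETCH_SEND_RESERVED_END   (UINT32_C(1) << 31)

/* A low-level driver gives one reserved bit a meaningful private name. */
#define SKETCH_DRV_SEND_UMR_UNREG  SKETCH_SEND_RESERVED_START

/* Mask covering bits 26..31, from START up to and including END. */
static uint32_t reserved_send_mask(void)
{
	assert(SKETCH_SEND_RESERVED_START < SKETCH_SEND_RESERVED_END);
	return (SKETCH_SEND_RESERVED_END - SKETCH_SEND_RESERVED_START) |
	       SKETCH_SEND_RESERVED_END;
}

/* True when flags contain any bit that belongs to the driver-private range. */
static bool uses_reserved_send_bits(uint32_t flags)
{
	return (flags & reserved_send_mask()) != 0;
}
```

Unsigned constants are used here so that shifting into bit 31 is well defined in a standalone program.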

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
---
 include/rdma/ib_verbs.h |   35 +--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..645c3ce 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -610,7 +610,21 @@ enum ib_qp_type {
IB_QPT_RAW_PACKET = 8,
IB_QPT_XRC_INI = 9,
IB_QPT_XRC_TGT,
-   IB_QPT_MAX
+   IB_QPT_MAX,
+   /* Reserve a range for qp types internal to the low level driver.
+* These qp types will not be visible at the IB core layer, so the
+* IB_QPT_MAX usages should not be affected in the core layer
+*/
+   IB_QPT_RESERVED1 = 0x1000,
+   IB_QPT_RESERVED2,
+   IB_QPT_RESERVED3,
+   IB_QPT_RESERVED4,
+   IB_QPT_RESERVED5,
+   IB_QPT_RESERVED6,
+   IB_QPT_RESERVED7,
+   IB_QPT_RESERVED8,
+   IB_QPT_RESERVED9,
+   IB_QPT_RESERVED10,
 };
 
 enum ib_qp_create_flags {
@@ -766,6 +780,19 @@ enum ib_wr_opcode {
IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
IB_WR_BIND_MW,
+   /* reserve values for low level drivers' internal use.
+* These values will not be used at all in the ib core layer.
+*/
+   IB_WR_RESERVED1 = 0xf0,
+   IB_WR_RESERVED2,
+   IB_WR_RESERVED3,
+   IB_WR_RESERVED4,
+   IB_WR_RESERVED5,
+   IB_WR_RESERVED6,
+   IB_WR_RESERVED7,
+   IB_WR_RESERVED8,
+   IB_WR_RESERVED9,
+   IB_WR_RESERVED10,
 };
 
 enum ib_send_flags {
@@ -773,7 +800,11 @@ enum ib_send_flags {
 	IB_SEND_SIGNALED	= (1<<1),
 	IB_SEND_SOLICITED	= (1<<2),
 	IB_SEND_INLINE		= (1<<3),
-	IB_SEND_IP_CSUM		= (1<<4)
+	IB_SEND_IP_CSUM		= (1<<4),
+
+	/* reserve bits 26-31 for low level drivers' internal use */
+	IB_SEND_RESERVED_START	= (1 << 26),
+	IB_SEND_RESERVED_END	= (1 << 31),
 };
 
 struct ib_sge {
-- 
1.7.1



[PATCH V3 0/9] Add Mellanox mlx5 driver for Connect-IB devices

2013-07-07 Thread Or Gerlitz
Hi Roland,

Here's V3 of the mlx5 driver, with Dave's, Joe's and your comments addressed.

Hoping that would be all for getting this into 3.11

Jack, Moshe and Or.

changes from V2:
 
- Addressed feedback from Joe Perches:
 * Added parentheses around sizeof
 * Removed unnecessary do-while for driver pr_debug envelope (done for
   mlx5_core.h as well)
 * Removed unneeded log output on memory allocation failures
 * Fixed some typos
 * Used snprintf instead of strcpy/strcat (safer and shorter)
 * Removed unnecessary local variable sgi from ib_create_ah()
 * Reduced vzalloc usage by trying to do kzalloc first and vzalloc only 
   if kzalloc fails

- Addressed Roland's feedback:
 * Removed unneeded active flag from health polling -- no need for active 
   flag for re-scheduling from within timer handler when using del_timer_sync.

- Also removed some calls to mlx5_ib_dbg() which had newline char only, and
  therefore only did execution tracing.
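
As an illustration of the snprintf item above, here is a small user-space
sketch, assuming a simple "dir/name" string is being built. The helper
build_path() and its arguments are hypothetical and do not come from the
driver; the point is only that one bounded snprintf call replaces a
strcpy/strcat pair and reports truncation.

```c
#include <stdio.h>

/* Build "<dir>/<name>" into buf.
 * Returns 0 on success, -1 if the result would not fit in len bytes. */
static int build_path(char *buf, size_t len, const char *dir, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", dir, name);

	/* snprintf returns the length it wanted to write (excluding the
	 * terminating NUL), so n >= len means the output was truncated. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With strcpy/strcat the caller has to prove separately that the destination
is large enough; here the size is passed along with the buffer.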

changes from V1:

- Addressed Dave Miller's comments:
   * Local variables in functions listed from longest to shortest
   * --i/++i changed to i--/i++ in all for-loops
   * Removed leading /* empty line from all comments
   * magic constants given names
   * endianness code moved to driver.h, and defined an endianness-dependent
     macro for use in assignment.
   * destroy_msg_cache() duplicated code removed

- Addressed Roland's comments:

   * Renamed foo_spl to foo_lock for spinlocks.
   * Eliminated magic number from mlx5_cmd_stats field declaration in struct
     mlx5_cmd.
   * Eliminated unused procedure mlx5_ib_umem_populate_pas()

   * Cleaned up mlx5_ib.h:
   * Added new patch for ib_verbs.h, adding reserved values to several enums
   * For several ib-core enums, added reserved values for use by low-level
     drivers. By defining macros at the low level (i.e., renaming the reserved
     values, in effect), the ll drivers may use these enums without needing to
     duplicate the ib-core enums while adding extra values. This fixes
     compilation problems such as:
/home/roland/Src/linux-merge.git/drivers/infiniband/hw/mlx5/qp.c:975:2:
error: case value 4671 not in enumerated type enum ib_qp_type

   * Changed ib_latency_class to mlx5_ib_latency_class, visible only in
     low-level driver
   * Eliminated the unused IB_WR_xxx_PSV enums
   * Defined macros MLX5_IB_SEND_UMR_UNREG, MLX5_IB_QPT_REG_UMR, and
     MLX5_IB_WR_UMR, taking advantage of the reserved values added to the
     ib_core enums.

   * debug-mask removed from mlx5_ib
   * Regarding mlx5_core, still have a debug mask to enable printouts of
     command data and command execution times, but all file-name-based mask
     bits removed.
   * Removed forced -Wall -Werror -DDEBUG settings in the mlx5 core/ib makefiles

changes from V0:
 - Per Dave's request, cross posting to both netdev and linux-rdma, to see 
   if there are comments from netdev on the core driver.

The patches that follow constitute the driver for Mellanox's 5th generation
of HCAs named Connect-IB.

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
partitioning resembles what we have for mlx4 with the substantial difference
that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this sense,
it can be perceived as a library. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. They are divided such that the first three ones have the code
of the mlx5_core driver, and the last five the code of the mlx5_ib driver.

Only the last patch per driver adds the Makefiles and Kconfigs, to make
things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

Eli Cohen (8):
  net/mlx5: Mellanox Connect-IB, core driver part 1/3
  net/mlx5: Mellanox Connect-IB, core driver part 2/3
  net/mlx5: Mellanox Connect-IB, core driver part 3/3
  IB/mlx5: Mellanox Connect-IB, IB driver part 1/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 2/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 3/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 4/5
  IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

Jack Morgenstein (1):
  IB/core: Add reserved values to enums for low-level drivers use

 MAINTAINERS|   22 +
 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/Makefile|1 +
 drivers/infiniband/hw/mlx5/Kconfig |   10 +
 drivers/infiniband/hw/mlx5/Makefile|3 +
 drivers/infiniband/hw/mlx5/ah.c|   92 +
 drivers/infiniband/hw/mlx5/cq.c|  843 +++
 drivers/infiniband/hw/mlx5/doorbell.c  |  100 +
 drivers/infiniband/hw/mlx5/mad.c   |  139 ++
 

[PATCH V3 7/9] IB/mlx5: Mellanox Connect-IB, IB driver part 3/5

2013-07-07 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  545 ++
 drivers/infiniband/hw/mlx5/mr.c  | 1014 ++
 2 files changed, 1559 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ib.h
 create mode 100644 drivers/infiniband/hw/mlx5/mr.c

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
new file mode 100644
index 000..836be91
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -0,0 +1,545 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_IB_H
+#define MLX5_IB_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/srq.h>
+#include <linux/types.h>
+
+#define mlx5_ib_dbg(dev, format, arg...)   \
+pr_debug("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,   \
+__LINE__, current->pid, ##arg)
+
+#define mlx5_ib_err(dev, format, arg...)   \
+pr_err("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__, \
+   __LINE__, current->pid, ##arg)
+
+#define mlx5_ib_warn(dev, format, arg...)  \
+pr_warn("%s:%s:%d:(pid %d): " format, (dev)->ib_dev.name, __func__,\
+   __LINE__, current->pid, ##arg)
+
+enum {
+   MLX5_IB_MMAP_CMD_SHIFT  = 8,
+   MLX5_IB_MMAP_CMD_MASK   = 0xff,
+};
+
+enum mlx5_ib_mmap_cmd {
+   MLX5_IB_MMAP_REGULAR_PAGE   = 0,
+   MLX5_IB_MMAP_GET_CONTIGUOUS_PAGES   = 1, /* always last */
+};
+
+enum {
+   MLX5_RES_SCAT_DATA32_CQE= 0x1,
+   MLX5_RES_SCAT_DATA64_CQE= 0x2,
+   MLX5_REQ_SCAT_DATA32_CQE= 0x11,
+   MLX5_REQ_SCAT_DATA64_CQE= 0x22,
+};
+
+enum mlx5_ib_latency_class {
+   MLX5_IB_LATENCY_CLASS_LOW,
+   MLX5_IB_LATENCY_CLASS_MEDIUM,
+   MLX5_IB_LATENCY_CLASS_HIGH,
+   MLX5_IB_LATENCY_CLASS_FAST_PATH
+};
+
+enum mlx5_ib_mad_ifc_flags {
+   MLX5_MAD_IFC_IGNORE_MKEY= 1,
+   MLX5_MAD_IFC_IGNORE_BKEY= 2,
+   MLX5_MAD_IFC_NET_VIEW   = 4,
+};
+
+struct mlx5_ib_ucontext {
+   struct ib_ucontext  ibucontext;
+   struct list_head    db_page_list;
+
+   /* protect doorbell record alloc/free
+    */
+   struct mutex        db_page_mutex;
+   struct mlx5_uuar_info   uuari;
+};
+
+static inline struct mlx5_ib_ucontext *to_mucontext(struct ib_ucontext *ibucontext)
+{
+   return container_of(ibucontext, struct mlx5_ib_ucontext, ibucontext);
+}
+
+struct mlx5_ib_pd {
+   struct ib_pd    ibpd;
+   u32 pdn;
+   u32 pa_lkey;
+};
+
+/* Use macros here so that don't have to duplicate
+ * enum ib_send_flags and enum ib_qp_type for low-level driver
+ */
+
+#define MLX5_IB_SEND_UMR_UNREG IB_SEND_RESERVED_START
+#define MLX5_IB_QPT_REG_UMR    IB_QPT_RESERVED1
+#define MLX5_IB_WR_UMR         IB_WR_RESERVED1
+
+struct wr_list {
+   u16 opcode;
+   u16 next;
+};
+
+struct mlx5_ib_wq {
+   u64            *wrid;
+   u32            *wr_data;
+   struct wr_list *w_list;
+   unsigned       *wqe_head;
+   u16 unsig_count;
+
+   /* serialize post

[PATCH V3 5/9] IB/mlx5: Mellanox Connect-IB, IB driver part 1/5

2013-07-07 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx5/ah.c   |   92 
 drivers/infiniband/hw/mlx5/cq.c   |  843 +
 drivers/infiniband/hw/mlx5/doorbell.c |  100 
 drivers/infiniband/hw/mlx5/mad.c  |  139 ++
 4 files changed, 1174 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/ah.c
 create mode 100644 drivers/infiniband/hw/mlx5/cq.c
 create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
 create mode 100644 drivers/infiniband/hw/mlx5/mad.c

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
new file mode 100644
index 000..39ab0ca
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx5_ib.h"
+
+struct ib_ah *create_ib_ah(struct ib_ah_attr *ah_attr,
+                          struct mlx5_ib_ah *ah)
+{
+   if (ah_attr->ah_flags & IB_AH_GRH) {
+       memcpy(ah->av.rgid, &ah_attr->grh.dgid, 16);
+       ah->av.grh_gid_fl = cpu_to_be32(ah_attr->grh.flow_label |
+                                       (1 << 30) |
+                                       ah_attr->grh.sgid_index << 20);
+       ah->av.hop_limit = ah_attr->grh.hop_limit;
+       ah->av.tclass = ah_attr->grh.traffic_class;
+   }
+
+   ah->av.rlid = cpu_to_be16(ah_attr->dlid);
+   ah->av.fl_mlid = ah_attr->src_path_bits & 0x7f;
+   ah->av.stat_rate_sl = (ah_attr->static_rate << 4) | (ah_attr->sl & 0xf);
+
+   return &ah->ibah;
+}
+
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+{
+   struct mlx5_ib_ah *ah;
+
+   ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+   if (!ah)
+   return ERR_PTR(-ENOMEM);
+
+   return create_ib_ah(ah_attr, ah); /* never fails */
+}
+
+int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+   struct mlx5_ib_ah *ah = to_mah(ibah);
+   u32 tmp;
+
+   memset(ah_attr, 0, sizeof(*ah_attr));
+
+   tmp = be32_to_cpu(ah->av.grh_gid_fl);
+   if (tmp & (1 << 30)) {
+       ah_attr->ah_flags = IB_AH_GRH;
+       ah_attr->grh.sgid_index = (tmp >> 20) & 0xff;
+       ah_attr->grh.flow_label = tmp & 0xf;
+       memcpy(&ah_attr->grh.dgid, ah->av.rgid, 16);
+       ah_attr->grh.hop_limit = ah->av.hop_limit;
+       ah_attr->grh.traffic_class = ah->av.tclass;
+   }
+   ah_attr->dlid = be16_to_cpu(ah->av.rlid);
+   ah_attr->static_rate = ah->av.stat_rate_sl >> 4;
+   ah_attr->sl = ah->av.stat_rate_sl & 0xf;
+
+   return 0;
+}
+
+int mlx5_ib_destroy_ah(struct ib_ah *ah)
+{
+   kfree(to_mah(ah));
+   return 0;
+}
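
The bit packing done by create_ib_ah() and undone by mlx5_ib_query_ah()
can be tried out in user space with the sketch below. It is not the
driver's code: the AV_* names and helper functions are hypothetical, the
byte-order conversion (cpu_to_be32/be32_to_cpu) is left out, and the
20-bit flow label mask is an assumption based on the width of the IB flow
label field. The bit positions (flag at bit 30, sgid index at bit 20,
static rate in the high nibble, SL in the low nibble) follow ah.c above.

```c
#include <stdint.h>

#define AV_GRH_PRESENT	(1u << 30)	/* "GRH present" flag bit */
#define AV_SGID_SHIFT	20		/* sgid index starts at bit 20 */
#define AV_SGID_MASK	0xffu
#define AV_FLOW_MASK	0xfffffu	/* assumption: 20-bit flow label */

/* Pack flow label, GRH flag and sgid index into one 32-bit word. */
static uint32_t av_pack_grh(uint32_t flow_label, uint8_t sgid_index)
{
	return (flow_label & AV_FLOW_MASK) | AV_GRH_PRESENT |
	       ((uint32_t)sgid_index << AV_SGID_SHIFT);
}

static int av_has_grh(uint32_t word)
{
	return (word & AV_GRH_PRESENT) != 0;
}

static uint8_t av_sgid_index(uint32_t word)
{
	return (uint8_t)((word >> AV_SGID_SHIFT) & AV_SGID_MASK);
}

static uint32_t av_flow_label(uint32_t word)
{
	return word & AV_FLOW_MASK;
}

/* Static rate goes in the high nibble, service level in the low nibble. */
static uint8_t av_pack_rate_sl(uint8_t static_rate, uint8_t sl)
{
	return (uint8_t)((static_rate << 4) | (sl & 0xf));
}
```

Packing a value and reading each field back is a quick way to check that
the shifts and masks on the two sides agree.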
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
new file mode 100644
index 000..344ab03
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -0,0 +1,843 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following

[PATCH V3 9/9] IB/mlx5: Mellanox Connect-IB, IB driver part 5/5

2013-07-07 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@dev.melanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 MAINTAINERS |   10 ++
 drivers/infiniband/Kconfig  |1 +
 drivers/infiniband/Makefile |1 +
 drivers/infiniband/hw/mlx5/Kconfig  |   10 ++
 drivers/infiniband/hw/mlx5/Makefile |3 +++
 5 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
 create mode 100644 drivers/infiniband/hw/mlx5/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 6e82fb5..b426536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5377,6 +5377,16 @@ S:   Supported
 F: drivers/net/ethernet/mellanox/mlx5/core/
 F: include/linux/mlx5/
 
+Mellanox MLX5 IB driver
+M:  Eli Cohen e...@mellanox.com
+L:  linux-rdma@vger.kernel.org
+W:  http://www.mellanox.com
+Q:  http://patchwork.kernel.org/project/linux-rdma/list/
+T:  git://openfabrics.org/~eli/connect-ib.git
+S:  Supported
+F:  include/linux/mlx5/
+F:  drivers/infiniband/hw/mlx5/
+
 MODULE SUPPORT
 M: Rusty Russell ru...@rustcorp.com.au
 S: Maintained
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index c85b56c..5ceda71 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -50,6 +50,7 @@ source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 source "drivers/infiniband/hw/cxgb4/Kconfig"
 source "drivers/infiniband/hw/mlx4/Kconfig"
+source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index b126fef..1fe6988 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_INFINIBAND_AMSO1100)   += hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
 obj-$(CONFIG_INFINIBAND_CXGB4) += hw/cxgb4/
 obj-$(CONFIG_MLX4_INFINIBAND)  += hw/mlx4/
+obj-$(CONFIG_MLX5_INFINIBAND)  += hw/mlx5/
 obj-$(CONFIG_INFINIBAND_NES)   += hw/nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)+= hw/ocrdma/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
diff --git a/drivers/infiniband/hw/mlx5/Kconfig 
b/drivers/infiniband/hw/mlx5/Kconfig
new file mode 100644
index 000..8e6aebf
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Kconfig
@@ -0,0 +1,10 @@
+config MLX5_INFINIBAND
+   tristate "Mellanox Connect-IB HCA support"
+   depends on NETDEVICES && ETHERNET && PCI && X86
+   select NET_VENDOR_MELLANOX
+   select MLX5_CORE
+   ---help---
+ This driver provides low-level InfiniBand support for
+ Mellanox Connect-IB PCI Express host channel adapters (HCAs).
+ This is required to use InfiniBand protocols such as
+ IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx5/Makefile 
b/drivers/infiniband/hw/mlx5/Makefile
new file mode 100644
index 000..4ea0135
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MLX5_INFINIBAND)  += mlx5_ib.o
+
+mlx5_ib-y :=   main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o
-- 
1.7.1



[PATCH V3 6/9] IB/mlx5: Mellanox Connect-IB, IB driver part 2/5

2013-07-07 Thread Or Gerlitz
From: Eli Cohen e...@mellanox.com

Signed-off-by: Eli Cohen e...@mellanox.com
Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx5/main.c | 1504 +
 drivers/infiniband/hw/mlx5/mem.c  |  162 
 2 files changed, 1666 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/main.c
 create mode 100644 drivers/infiniband/hw/mlx5/mem.c

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
new file mode 100644
index 000..6b1007f
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -0,0 +1,1504 @@
+/*
+ * Copyright (c) 2013, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <asm-generic/kmap_types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/dma-mapping.h>
+#include <linux/slab.h>
+#include <linux/io-mapping.h>
+#include <linux/sched.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_umem.h>
+#include "user.h"
+#include "mlx5_ib.h"
+
+#define DRIVER_NAME "mlx5_ib"
+#define DRIVER_VERSION "1.0"
+#define DRIVER_RELDATE "June 2013"
+
+MODULE_AUTHOR("Eli Cohen e...@mellanox.com");
+MODULE_DESCRIPTION("Mellanox Connect-IB HCA IB driver");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRIVER_VERSION);
+
+static int prof_sel = 2;
+module_param_named(prof_sel, prof_sel, int, 0444);
+MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
+
+static char mlx5_version[] =
+   DRIVER_NAME ": Mellanox Connect-IB Infiniband driver v"
+   DRIVER_VERSION " (" DRIVER_RELDATE ")\n";
+
+struct mlx5_profile profile[] = {
+   [0] = {
+   .mask   = 0,
+   },
+   [1] = {
+   .mask   = MLX5_PROF_MASK_QP_SIZE,
+   .log_max_qp = 12,
+   },
+   [2] = {
+   .mask   = MLX5_PROF_MASK_QP_SIZE |
+ MLX5_PROF_MASK_MR_CACHE,
+   .log_max_qp = 17,
+   .mr_cache[0]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[1]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[2]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[3]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[4]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[5]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[6]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[7]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[8]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[9]= {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[10]   = {
+   .size   = 500,
+   .limit  = 250
+   },
+   .mr_cache[11]   = {
+   .size   = 500,
+   .limit

wrong email address in mlx5 patch signature

2013-07-08 Thread Or Gerlitz

Hi Roland,

There's a typo in Jack's email address in V3 9/9, which is our mistake. Please
fix it to be Jack Morgenstein ja...@dev.mellanox.co.il (the error is a
missing l in mellanox).


thanks,

Or.

