[PATCH] [TRIVIAL] opensm/osm_log.h: fix function documentation

2012-09-20 Thread Yevgeny Kliteynik

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 include/opensm/osm_log.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/opensm/osm_log.h b/include/opensm/osm_log.h
index 3247296..61ba750 100644
--- a/include/opensm/osm_log.h
+++ b/include/opensm/osm_log.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 2002-2012 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -427,7 +427,7 @@ static inline void osm_log_set_level(IN osm_log_t * p_log,
 *  [in] New level to set.
 *
 * RETURN VALUES
-*  Returns the current log level.
+*  This function does not return a value.
 *
 * NOTES
 *
-- 
1.7.11.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Correct option names in opensm man page

2012-09-20 Thread Bart Van Assche
Long options have to be preceded by a double dash instead of a single.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 man/opensm.8.in |   30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/man/opensm.8.in b/man/opensm.8.in
index 79ff6a5..4e61c2e 100644
--- a/man/opensm.8.in
+++ b/man/opensm.8.in
@@ -11,7 +11,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA)
 [\-g(uid) GUID in hex]
 [\-l(mc) LMC]
 [\-p(riority) PRIORITY]
-[\-smkey SM_Key]
+[\-\-smkey SM_Key]
 [\-\-sm_sl SL number]
 [\-r(eassign_lids)]
 [\-R engine name(s) | \-\-routing_engine engine name(s)]
@@ -34,9 +34,9 @@ opensm \- InfiniBand subnet manager and administration (SM/SA)
 [\-s(weep) interval]
 [\-t(imeout) milliseconds]
 [\-\-retries number]
-[\-maxsmps number]
-[\-console [off | local | socket | loopback]]
-[\-console-port port]
+[\-\-maxsmps number]
+[\-\-console [off | local | socket | loopback]]
+[\-\-console-port port]
 [\-i(gnore-guids) equalize-ignore-guids-file]
 [\-w | \-\-hop_weights_file path to file]
 [\-O | \-\-port_search_ordering_file path to file]
@@ -134,7 +134,7 @@ This will effect the handover cases, where master
 is chosen by priority and GUID.  Range goes from 0
 (default and lowest priority) to 15 (highest).
 .TP
-\fB\-smkey\fR SM_Key value
+\fB\-\-smkey\fR SM_Key value
 This option specifies the SM\'s SM_Key (64 bits).
 This will effect SM authentication.
 Note that OpenSM version 3.2.1 and below used the default value '1'
@@ -263,15 +263,15 @@ for transactions.
 Without --retries, OpenSM defaults to 3 retries
 for transactions.
 .TP
-\fB\-maxsmps\fR number
+\fB\-\-maxsmps\fR number
 This option specifies the number of VL15 SMP MADs
 allowed on the wire at any one time.
-Specifying -maxsmps 0 allows unlimited outstanding
+Specifying \-\-maxsmps 0 allows unlimited outstanding
 SMPs.
-Without -maxsmps, OpenSM defaults to a maximum of
+Without \-\-maxsmps, OpenSM defaults to a maximum of
 4 outstanding SMPs.
 .TP
-\fB\-console [off | local | loopback | socket]\fR
+\fB\-\-console [off | local | loopback | socket]\fR
 This option brings up the OpenSM console (default off).  Note, loopback and
 socket open a socket which can be connected to WITHOUT CREDENTIALS.  Loopback
 is safer if access to your SM host is controlled.  tcp_wrappers
@@ -279,12 +279,12 @@ is safer if access to your SM host is controlled.  tcp_wrappers
 will only be available if OpenSM was built with --enable-console-loopback
 (default yes) and --enable-console-socket (default no) respectively.
 .TP
-\fB\-console-port\fR port
+\fB\-\-console-port\fR port
 Specify an alternate telnet port for the socket console (default 1).
 Note that this option only appears if OpenSM was built with
 --enable-console-socket.
 .TP
-\fB\-i\fR, \fB\-ignore-guids\fR equalize-ignore-guids-file
+\fB\-i\fR, \fB\-\-ignore-guids\fR equalize-ignore-guids-file
 This option provides the means to define a set of ports
 (by node guid and port number) that will be ignored by the link load
 equalization algorithm.
@@ -320,7 +320,7 @@ listed on a line are assigned to the remaining dimensions, in port
 order.  Anything after a # is a comment.
 .TP
 \fB\-O\fR, \fB\-\-dimn_ports_file\fR path to file \fB(DEPRECATED)\fR
-This is a deprecated flag. Please use \fB-port_search_ordering_file\fR instead.
+This is a deprecated flag. Please use \fB\-\-port_search_ordering_file\fR instead.
 This option provides a mapping between hypercube dimensions and ports
 on a per switch basis for the DOR routing engine.  The file consists
 of lines containing a switch node GUID (specified as a 64 bit hex
@@ -409,12 +409,12 @@ option can be used in conjunction with the perfmgr so as to
 run a standalone performance manager without SM/SA.  However,
 this is NOT currently implemented in the performance manager.
 .TP
-\fB\-perfmgr\fR
+\fB\-\-perfmgr\fR
 Enable the perfmgr.  Only takes effect if --enable-perfmgr was specified at
 configure time.  See performance-manager-HOWTO.txt in opensm doc for
 more information on running perfmgr.
 .TP
-\fB\-perfmgr_sweep_time_s\fR seconds
+\fB\-\-perfmgr_sweep_time_s\fR seconds
 Specify the sweep time for the performance manager in seconds
 (default is 180 seconds).  Only takes
 effect if --enable-perfmgr was specified at configure time.
@@ -962,7 +962,7 @@ port GUID. The latter is supplied by:
 
 -i equalize-ignore-guids-file
 .br
--ignore-guids equalize-ignore-guids-file
+\-\-ignore-guids equalize-ignore-guids-file
   This option provides the means to define a set of ports
   (by guid) that will be ignored by the link load
   equalization algorithm. Note that only endports (CA,
-- 
1.7.10.4
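
As an illustration of the double-dash convention this patch documents, here
is a minimal C sketch built on getopt_long(). Only the option names --smkey
and --maxsmps come from the man page; the option table, the parse_maxsmps()
helper and the -1 "option absent" convention are assumptions made for this
example and are not OpenSM's actual parser.

```c
#include <getopt.h>
#include <stdlib.h>

/*
 * Hypothetical option table: only the option names come from the man page.
 * The short codes 'k' and 'm' are internal to this sketch.
 */
static const struct option long_opts[] = {
	{ "smkey",   required_argument, NULL, 'k' },
	{ "maxsmps", required_argument, NULL, 'm' },
	{ NULL, 0, NULL, 0 }
};

/* Return the value given to --maxsmps, or -1 if the option is absent. */
static long parse_maxsmps(int argc, char **argv)
{
	long maxsmps = -1;
	int c;

	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* keep getopt quiet about unknown options */
	while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (c == 'm')
			maxsmps = strtol(optarg, NULL, 0);
	}
	return maxsmps;
}
```

With this table, a command line such as "opensm --maxsmps 0" yields 0, and a
command line without the option yields -1.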



RE: [PATCH 25/25] userns: Convert ipathfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GID

2012-09-20 Thread Marciniszyn, Mike
> Subject: [PATCH 25/25] userns: Convert ipathfs to use GLOBAL_ROOT_UID and
> GLOBAL_ROOT_GID
>
> From: Eric W. Biederman ebied...@xmission.com
>
> Cc: Mike Marciniszyn infinip...@intel.com
> Acked-by: Serge Hallyn serge.hal...@canonical.com
> Signed-off-by: Eric W. Biederman ebied...@xmission.com

Thanks for the patch!

Acked-by: Mike Marciniszyn mike.marcinis...@intel.com


Re: how to preserve QP over HA events for librdmacm applications

2012-09-20 Thread Pradeep Satyanarayana

On 09/19/2012 11:14 AM, Atchley, Scott wrote:
> On Sep 19, 2012, at 1:05 PM, Hefty, Sean sean.he...@intel.com wrote:
>
> I too would be interested in bringing a QP from error back to a usable state. I
> have been debating whether to reconnect using the current RDMA calls versus
> trying to transition the existing RC QP.
>
> I assumed to transition the existing QP that I would need to open a socket to
> coordinate the two sides. Is that correct?
>
> If I were instead to use rdma_connect(), does it require a new CM id or just a
> new QP within the same id?


What if you say pre-created a second (fail over) QP for HA purposes all 
under the covers of a single socket? And both QPs were connected before 
the failure. Not sure if that would work with the same CM id though. If 
not, we will need to rdma_connect() the second QP after failure.


By having a second QP and bound to say a different port/device, one 
could survive not just link up/down events, but device failures too. 
Would that be more generic?


Thanks
Pradeep



Re: how to preserve QP over HA events for librdmacm applications

2012-09-20 Thread Atchley, Scott
On Sep 20, 2012, at 1:37 PM, Pradeep Satyanarayana prade...@linux.vnet.ibm.com wrote:

> On 09/19/2012 11:14 AM, Atchley, Scott wrote:
> On Sep 19, 2012, at 1:05 PM, Hefty, Sean sean.he...@intel.com wrote:
>
> I too would be interested in bringing a QP from error back to a usable state. I
> have been debating whether to reconnect using the current RDMA calls versus
> trying to transition the existing RC QP.
>
> I assumed to transition the existing QP that I would need to open a socket to
> coordinate the two sides. Is that correct?
>
> If I were instead to use rdma_connect(), does it require a new CM id or just a
> new QP within the same id?
>
> What if you say pre-created a second (fail over) QP for HA purposes all
> under the covers of a single socket? And both QPs were connected before
> the failure. Not sure if that would work with the same CM id though. If
> not, we will need to rdma_connect() the second QP after failure.
>
> By having a second QP and bound to say a different port/device, one
> could survive not just link up/down events, but device failures too.
> Would that be more generic?

Hi Pradeep,

What is the memory cost of a QP? I assume it will require a second CM id as well.

Involving a second device and/or port is not an option for my usage.

Scott


[PATCH 1/6] RDMA/nes: Fix for incorrect resolving the loopback MAC address

2012-09-20 Thread Tatyana Nikolova
Loopback code changes to fix incorrect resolving of loopback MAC address.

Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
---
 drivers/infiniband/hw/nes/nes_cm.c |   32 +++-
 1 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 020e95c..49a9383 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -1356,7 +1356,7 @@ static int nes_addr_resolve_neigh(struct nes_vnic *nesvnic, u32 dst_ip, int arpi
 	else
 		netdev = nesvnic->netdev;
 
-	neigh = dst_neigh_lookup(&rt->dst, &dst_ip);
+	neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, netdev);
 
 	rcu_read_lock();
 	if (neigh) {
@@ -1465,12 +1465,8 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core,
 	cm_node->loopbackpartner = NULL;
 
 	/* get the mac addr for the remote node */
-	if (ipv4_is_loopback(htonl(cm_node->rem_addr))) {
-		arpindex = nes_arp_table(nesdev, ntohl(nesvnic->local_ipaddr), NULL, NES_ARP_RESOLVE);
-	} else {
-		oldarpindex = nes_arp_table(nesdev, cm_node->rem_addr, NULL, NES_ARP_RESOLVE);
-		arpindex = nes_addr_resolve_neigh(nesvnic, cm_info->rem_addr, oldarpindex);
-	}
+	oldarpindex = nes_arp_table(nesdev, cm_node->rem_addr, NULL, NES_ARP_RESOLVE);
+	arpindex = nes_addr_resolve_neigh(nesvnic, cm_info->rem_addr, oldarpindex);
 	if (arpindex < 0) {
 		kfree(cm_node);
 		return NULL;
@@ -3153,11 +3149,7 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	nesqp->nesqp_context->tcpPorts[1] =
 		cpu_to_le16(ntohs(cm_id->remote_addr.sin_port));
 
-	if (ipv4_is_loopback(cm_id->remote_addr.sin_addr.s_addr))
-		nesqp->nesqp_context->ip0 =
-			cpu_to_le32(ntohl(nesvnic->local_ipaddr));
-	else
-		nesqp->nesqp_context->ip0 =
+	nesqp->nesqp_context->ip0 =
 			cpu_to_le32(ntohl(cm_id->remote_addr.sin_addr.s_addr));
 
 	nesqp->nesqp_context->misc2 |= cpu_to_le32(
@@ -3182,10 +3174,7 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	memset(&nes_quad, 0, sizeof(nes_quad));
 	nes_quad.DstIpAdrIndex =
 		cpu_to_le32((u32)PCI_FUNC(nesdev->pcidev->devfn) << 24);
-	if (ipv4_is_loopback(cm_id->remote_addr.sin_addr.s_addr))
-		nes_quad.SrcIpadr = nesvnic->local_ipaddr;
-	else
-		nes_quad.SrcIpadr = cm_id->remote_addr.sin_addr.s_addr;
+	nes_quad.SrcIpadr = cm_id->remote_addr.sin_addr.s_addr;
 	nes_quad.TcpPorts[0] = cm_id->remote_addr.sin_port;
 	nes_quad.TcpPorts[1] = cm_id->local_addr.sin_port;
 
@@ -3538,11 +3527,7 @@ static void cm_event_connected(struct nes_cm_event *event)
 		cpu_to_le16(ntohs(cm_id->local_addr.sin_port));
 	nesqp->nesqp_context->tcpPorts[1] =
 		cpu_to_le16(ntohs(cm_id->remote_addr.sin_port));
-	if (ipv4_is_loopback(cm_id->remote_addr.sin_addr.s_addr))
-		nesqp->nesqp_context->ip0 =
-			cpu_to_le32(ntohl(nesvnic->local_ipaddr));
-	else
-		nesqp->nesqp_context->ip0 =
+	nesqp->nesqp_context->ip0 =
 			cpu_to_le32(ntohl(cm_id->remote_addr.sin_addr.s_addr));
 
 	nesqp->nesqp_context->misc2 |= cpu_to_le32(
@@ -3571,10 +3556,7 @@ static void cm_event_connected(struct nes_cm_event *event)
 
 	nes_quad.DstIpAdrIndex =
 		cpu_to_le32((u32)PCI_FUNC(nesdev->pcidev->devfn) << 24);
-	if (ipv4_is_loopback(cm_id->remote_addr.sin_addr.s_addr))
-		nes_quad.SrcIpadr = nesvnic->local_ipaddr;
-	else
-		nes_quad.SrcIpadr = cm_id->remote_addr.sin_addr.s_addr;
+	nes_quad.SrcIpadr = cm_id->remote_addr.sin_addr.s_addr;
 	nes_quad.TcpPorts[0] = cm_id->remote_addr.sin_port;
 	nes_quad.TcpPorts[1] = cm_id->local_addr.sin_port;
 
-- 
1.7.4.2
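
The special cases removed by this patch all hinge on a loopback test. As a
rough user-space illustration of what such a test checks (membership in
127.0.0.0/8 for an address in network byte order), consider the sketch below.
The helper name is_loopback_be() is hypothetical and this is not the kernel
helper itself; it only shows why the byte order of the argument matters.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a network-byte-order IPv4 address lies in 127.0.0.0/8. */
static bool is_loopback_be(uint32_t addr_be)
{
	return (addr_be & htonl(0xff000000u)) == htonl(0x7f000000u);
}
```

An address kept in host byte order has to be converted with htonl() before
such a test, which is presumably why one of the removed calls wraps
cm_node->rem_addr in htonl().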



[PATCH 2/6] RDMA/nes: Fix for incorrect MSS when tso is on

2012-09-20 Thread Tatyana Nikolova
In the nes TSO handling code, read the MSS from skb_shared_info (gso_size)
instead of using skb_is_gso(), which is a bool function and returns 1.

Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
---
 drivers/infiniband/hw/nes/nes_nic.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index f3a3ecf..1c02ba7 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -390,10 +390,10 @@ static int nes_nic_send(struct sk_buff *skb, struct net_device *netdev)
 		tcph = tcp_hdr(skb);
 		if (1) {
 			if (skb_is_gso(skb)) {
-				/* nes_debug(NES_DBG_NIC_TX, "%s: TSO request... seg size = %u\n",
-						netdev->name, skb_is_gso(skb)); */
+				/* nes_debug(NES_DBG_NIC_TX, "%s: TSO request... is_gso = %u seg size = %u\n",
+						netdev->name, skb_is_gso(skb), skb_shinfo(skb)->gso_size); */
 				wqe_misc |= NES_NIC_SQ_WQE_LSO_ENABLE |
-						NES_NIC_SQ_WQE_COMPLETION | (u16)skb_is_gso(skb);
+						NES_NIC_SQ_WQE_COMPLETION | (u16)skb_shinfo(skb)->gso_size;
 				set_wqe_32bit_value(nic_sqe->wqe_words, NES_NIC_SQ_WQE_LSO_INFO_IDX,
 						((u32)tcph->doff) |
 						(((u32)(((unsigned char *)tcph) - skb->data)) << 4));
@@ -597,10 +597,10 @@ tso_sq_no_longer_full:
 					nes_debug(NES_DBG_NIC_TX, "ERROR: SKB header too big, headlen=%u, FIRST_FRAG_SIZE=%u\n",
 							original_first_length, NES_FIRST_FRAG_SIZE);
 					nes_debug(NES_DBG_NIC_TX, "%s Request to tx NIC packet length %u, headlen %u,"
-							" (%u frags), tso_size=%u\n",
+							" (%u frags), is_gso = %u tso_size=%u\n",
 							netdev->name,
 							skb->len, skb_headlen(skb),
-							skb_shinfo(skb)->nr_frags, skb_is_gso(skb));
+							skb_shinfo(skb)->nr_frags, skb_is_gso(skb), skb_shinfo(skb)->gso_size);
 				}
 
 				memcpy(&nesnic->first_frag_vbase[nesnic->sq_head].buffer,
 						skb->data, min(((unsigned int)NES_FIRST_FRAG_SIZE),
@@ -652,8 +652,8 @@ tso_sq_no_longer_full:
 				} else {
 					nesnic->tx_skb[nesnic->sq_head] = NULL;
 				}
-				wqe_misc |= NES_NIC_SQ_WQE_COMPLETION | (u16)skb_is_gso(skb);
-				if ((tso_wqe_length + original_first_length) > skb_is_gso(skb)) {
+				wqe_misc |= NES_NIC_SQ_WQE_COMPLETION | (u16)skb_shinfo(skb)->gso_size;
+				if ((tso_wqe_length + original_first_length) > skb_shinfo(skb)->gso_size) {
 					wqe_misc |= NES_NIC_SQ_WQE_LSO_ENABLE;
 				} else {
 					iph->tot_len = htons(tso_wqe_length + original_first_length - nhoffset);
-- 
1.7.4.2
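
To make the effect of this fix concrete, the following user-space sketch
contrasts packing a boolean "is GSO" result into the low 16 bits of a work
queue word with packing the actual segment size. Everything here is a
stand-in invented for illustration: struct fake_shinfo, fake_is_gso() and
the FAKE_WQE_LSO_ENABLE bit are hypothetical and do not reproduce the driver
or the hardware encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of skb_shared_info used here. */
struct fake_shinfo {
	uint16_t gso_size;	/* requested segment size, 0 if not a GSO packet */
};

/* Mirrors a boolean "is this a GSO packet?" helper. */
static bool fake_is_gso(const struct fake_shinfo *si)
{
	return si->gso_size != 0;
}

#define FAKE_WQE_LSO_ENABLE 0x00010000u	/* illustrative flag bit only */

/* Old behaviour: the low 16 bits end up holding 0 or 1. */
static uint32_t wqe_misc_from_bool(const struct fake_shinfo *si)
{
	return FAKE_WQE_LSO_ENABLE | (uint16_t)fake_is_gso(si);
}

/* Fixed behaviour: the low 16 bits hold the segment size. */
static uint32_t wqe_misc_from_size(const struct fake_shinfo *si)
{
	return FAKE_WQE_LSO_ENABLE | si->gso_size;
}
```

For a packet with gso_size 1448, the first helper leaves 1 in the low 16
bits while the second leaves 1448, which is the difference the patch is
after.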



RE: how to preserve QP over HA events for librdmacm applications

2012-09-20 Thread Hefty, Sean
> What if you say pre-created a second (fail over) QP for HA purposes all
> under the covers of a single socket? And both QPs were connected before
> the failure. Not sure if that would work with the same CM id though. If
> not, we will need to rdma_connect() the second QP after failure.

CM IDs are not shared across devices, and can't be reused for different QPs 
until the first connection has been torn down and gone through timewait.  For 
IB, you probably want path migration capabilities.

Anything more generic should really be handled by the application.  Migrating a 
connection between devices also requires using different CQs, PDs, MRs, etc.

- Sean
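
On the earlier question in this thread about bringing a QP from error back
to a usable state, the ordering constraint can be sketched as a small state
table. This is a simplified model under stated assumptions: the enum names
are hypothetical (not libibverbs identifiers), the SQD and SQE states are
left out, and the only claim modelled is that a QP in the error state has to
pass through reset and then walk the usual INIT, RTR, RTS ladder again.

```c
#include <stdbool.h>

/* Hypothetical state names, not libibverbs identifiers. */
enum qp_state { QP_RESET, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

/* True if a single modify step from 'from' to 'to' is allowed in this model. */
static bool qp_step_allowed(enum qp_state from, enum qp_state to)
{
	if (to == QP_RESET)
		return true;			/* any state may be reset */
	if (to == QP_ERR)
		return from != QP_RESET;	/* assumption: not directly from reset */
	switch (from) {
	case QP_RESET: return to == QP_INIT;
	case QP_INIT:  return to == QP_INIT || to == QP_RTR;
	case QP_RTR:   return to == QP_RTS;
	case QP_RTS:   return to == QP_RTS;
	default:       return false;		/* QP_ERR: only reset or error */
	}
}

/* Number of modify steps needed to reach RTS from 'from', or -1. */
static int qp_steps_to_rts(enum qp_state from)
{
	enum qp_state cur = from;
	int steps = 0;

	while (cur != QP_RTS) {
		enum qp_state next =
			(cur == QP_ERR) ? QP_RESET : (enum qp_state)(cur + 1);

		if (!qp_step_allowed(cur, next))
			return -1;
		cur = next;
		steps++;
	}
	return steps;
}
```

In this model a QP in the error state needs four steps (reset, init, RTR,
RTS), and both sides still have to agree on the new connection parameters,
which is the coordination cost raised in the question.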


[PATCH libibverbs 0/3] add raw packet QP, new helper and examples cleanups

2012-09-20 Thread Or Gerlitz
Hi Roland,

This batch of libibverbs patches includes the raw packet QP patch, which
was submitted long ago and now carries a new change-log taken from the kernel,
helper verbs from Dotan and Hal related to the new IB link speeds,
and a cleanup patch from Dotan for the examples.

Or.

Dotan Barak (2):
  Add helpers to deal with new InfiniBand link speeds
  Fix resource leaks in the pingpong examples present in the failure/error flows.

Or Gerlitz (1):
  Add raw packet QP type

 Makefile.am                |    6 +++-
 examples/rc_pingpong.c     |   43 ++---
 examples/srq_pingpong.c    |   51 ---
 examples/uc_pingpong.c     |   43 ++---
 examples/ud_pingpong.c     |   43 ++---
 include/infiniband/verbs.h |   26 -
 man/ibv_create_qp.3        |    2 +-
 man/ibv_modify_qp.3        |   10 
 man/ibv_rate_to_mbps.3     |   45 ++
 src/libibverbs.map         |    3 ++
 src/verbs.c                |   48 +
 11 files changed, 282 insertions(+), 38 deletions(-)
 create mode 100644 man/ibv_rate_to_mbps.3



[PATCH libibverbs 1/3] Add raw packet QP type

2012-09-20 Thread Or Gerlitz
IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.

This QP type is designed for userspace direct access to Ethernet; for
example by applications that do TCP/IP themselves.  Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name raw packet QP is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 include/infiniband/verbs.h |    3 ++-
 man/ibv_create_qp.3        |    2 +-
 man/ibv_modify_qp.3        |   10 ++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6acfc81..8ed8a66 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -399,7 +399,8 @@ struct ibv_srq_init_attr {
 enum ibv_qp_type {
IBV_QPT_RC = 2,
IBV_QPT_UC,
-   IBV_QPT_UD
+   IBV_QPT_UD,
+   IBV_QPT_RAW_PACKET = 8
 };
 
 struct ibv_qp_cap {
diff --git a/man/ibv_create_qp.3 b/man/ibv_create_qp.3
index 5301ad8..7feeab2 100644
--- a/man/ibv_create_qp.3
+++ b/man/ibv_create_qp.3
@@ -28,7 +28,7 @@ struct ibv_cq  *send_cq;        /* CQ to be associated with the Send Que
 struct ibv_cq  *recv_cq;        /* CQ to be associated with the Receive Queue (RQ) */
 struct ibv_srq *srq;            /* SRQ handle if QP is to be associated with an SRQ, otherwise NULL */
 struct ibv_qp_cap       cap;            /* QP capabilities */
-enum ibv_qp_type        qp_type;        /* QP Transport Service Type: IBV_QPT_RC, IBV_QPT_UC, or IBV_QPT_UD */
+enum ibv_qp_type        qp_type;        /* QP Transport Service Type: IBV_QPT_RC, IBV_QPT_UC, IBV_QPT_UD or IBV_QPT_RAW_PACKET */
 int             sq_sig_all;     /* If set, each Work Request (WR) submitted to the SQ generates a completion entry */
 .in -8
 };
diff --git a/man/ibv_modify_qp.3 b/man/ibv_modify_qp.3
index 9eabcdf..cb3faaa 100644
--- a/man/ibv_modify_qp.3
+++ b/man/ibv_modify_qp.3
@@ -159,6 +159,16 @@ RTR  \fB  IBV_QP_STATE, IBV_QP_AV, IBV_QP_PATH_MTU, \fR
 RTS  \fB  IBV_QP_STATE, IBV_QP_SQ_PSN, IBV_QP_MAX_QP_RD_ATOMIC, \fR
  \fB  IBV_QP_RETRY_CNT, IBV_QP_RNR_RETRY, IBV_QP_TIMEOUT \fR
 .fi
+.PP
+.nf
+For QP Transport Service Type \fB IBV_QPT_RAW_PACKET\fR:
+.sp
+Next state Required attributes
+\-\-\-\-\-\-\-\-\-\- 
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+Init \fB  IBV_QP_STATE, IBV_QP_PORT\fR
+RTR  \fB  IBV_QP_STATE\fR
+RTS  \fB  IBV_QP_STATE\fR
+.fi
 .SH SEE ALSO
 .BR ibv_create_qp (3),
 .BR ibv_destroy_qp (3),
-- 
1.7.1
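
Since a raw packet QP leaves the complete frame, L2 header included, to the
application, the sender has to lay out the Ethernet header itself. The
sketch below shows one way to do that in plain C. The helper build_frame()
and the constant L2_HDR_LEN are hypothetical names for this example; posting
the resulting buffer through the verbs API is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L2_HDR_LEN 14	/* 6-byte destination + 6-byte source + 2-byte EtherType */

/*
 * Write an Ethernet header followed by 'payload' into 'buf'.
 * Returns the total frame length, or 0 if 'buf' is too small.
 */
static size_t build_frame(uint8_t *buf, size_t buf_len,
			  const uint8_t dst[6], const uint8_t src[6],
			  uint16_t ethertype,
			  const void *payload, size_t payload_len)
{
	if (buf_len < L2_HDR_LEN || payload_len > buf_len - L2_HDR_LEN)
		return 0;
	memcpy(buf, dst, 6);
	memcpy(buf + 6, src, 6);
	buf[12] = (uint8_t)(ethertype >> 8);	/* EtherType is big-endian on the wire */
	buf[13] = (uint8_t)(ethertype & 0xff);
	memcpy(buf + L2_HDR_LEN, payload, payload_len);
	return L2_HDR_LEN + payload_len;
}
```

On the receive side the same layout applies in reverse: because the HW does
not strip any headers, the application would skip the first 14 bytes itself
before looking at the payload.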



[PATCH libibverbs 2/3] Add helpers to deal with new InfiniBand link speeds

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

Introduce support for the following extended speeds:

FDR:IBA extended speed 14.0625 Gbps.
EDR:IBA extended speed 25.78125 Gbps.

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Reviewed-by: Hal Rosenstock h...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 Makefile.am                |    6 +++-
 include/infiniband/verbs.h |   23 -
 man/ibv_rate_to_mbps.3     |   45 +
 src/libibverbs.map         |    3 ++
 src/verbs.c                |   48 
 5 files changed, 122 insertions(+), 3 deletions(-)
 create mode 100644 man/ibv_rate_to_mbps.3

diff --git a/Makefile.am b/Makefile.am
index cd00a65..40e83be 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,7 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1   \
     man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
     man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3       \
     man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3        \
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
@@ -88,6 +88,7 @@ install-data-hook:
$(RM) mult_to_ibv_rate.3  \
$(RM) ibv_node_type_str.3  \
$(RM) ibv_port_state_str.3  \
+   $(RM) mbps_to_ibv_rate.3  \
$(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3  \
$(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3  \
$(LN_S) ibv_open_device.3 ibv_close_device.3  \
@@ -103,4 +104,5 @@ install-data-hook:
$(LN_S) ibv_create_ah_from_wc.3 ibv_init_ah_from_wc.3  \
$(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3  \
$(LN_S) ibv_event_type_str.3 ibv_node_type_str.3  \
-   $(LN_S) ibv_event_type_str.3 ibv_port_state_str.3
+   $(LN_S) ibv_event_type_str.3 ibv_port_state_str.3  \
+   $(LN_S) ibv_rate_to_mbps.3 mbps_to_ibv_rate.3
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 8ed8a66..4b1ab57 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -353,7 +353,15 @@ enum ibv_rate {
IBV_RATE_40_GBPS  = 7,
IBV_RATE_60_GBPS  = 8,
IBV_RATE_80_GBPS  = 9,
-   IBV_RATE_120_GBPS = 10
+   IBV_RATE_120_GBPS = 10,
+   IBV_RATE_14_GBPS  = 11,
+   IBV_RATE_56_GBPS  = 12,
+   IBV_RATE_112_GBPS = 13,
+   IBV_RATE_168_GBPS = 14,
+   IBV_RATE_25_GBPS  = 15,
+   IBV_RATE_100_GBPS = 16,
+   IBV_RATE_200_GBPS = 17,
+   IBV_RATE_300_GBPS = 18
 };
 
 /**
@@ -370,6 +378,19 @@ int ibv_rate_to_mult(enum ibv_rate rate) __attribute_const;
  */
 enum ibv_rate mult_to_ibv_rate(int mult) __attribute_const;
 
+/**
+ * ibv_rate_to_mbps - Convert the IB rate enum to Mbit/sec.
+ * For example, IBV_RATE_5_GBPS will return the value 5000.
+ * @rate: rate to convert.
+ */
+int ibv_rate_to_mbps(enum ibv_rate rate) __attribute_const;
+
+/**
+ * mbps_to_ibv_rate - Convert a Mbit/sec value to an IB rate enum.
+ * @mbps: value to convert.
+ */
+enum ibv_rate mbps_to_ibv_rate(int mbps) __attribute_const;
+
 struct ibv_ah_attr {
struct ibv_global_route grh;
uint16_tdlid;
diff --git a/man/ibv_rate_to_mbps.3 b/man/ibv_rate_to_mbps.3
new file mode 100644
index 000..089db01
--- /dev/null
+++ b/man/ibv_rate_to_mbps.3
@@ -0,0 +1,45 @@
+.\" -*- nroff -*-
+.\"
+.TH IBV_RATE_TO_MBPS 3 2012-03-31 libibverbs "Libibverbs Programmer's Manual"
+.SH NAME
+.nf
+ibv_rate_to_mbps \- convert IB rate enumeration to Mbit/sec
+.sp
+mbps_to_ibv_rate \- convert Mbit/sec to an IB rate enumeration
+.SH SYNOPSIS
+.nf
+.B #include <infiniband/verbs.h>
+.sp
+.BI "int ibv_rate_to_mbps(enum ibv_rate " "rate" );
+.sp
+.BI "enum ibv_rate mbps_to_ibv_rate(int " "mbps" );
+.fi
+.SH DESCRIPTION
+.B ibv_rate_to_mbps()
+converts the IB transmission rate enumeration
+.I rate
+to a number of Mbit/sec. For example, if
+.I rate
+is
+.BR IBV_RATE_5_GBPS\fR,
+the value 5000 will be returned (5 Gbit/sec = 5000 Mbit/sec).
+.PP
+.B mbps_to_ibv_rate()
+converts the number of Mbit/sec
+.I mbps
+to an IB transmission rate enumeration. For example, if
+.I mbps
+is 5000, the rate enumeration
+.BR IBV_RATE_5_GBPS
+will be returned.
+.SH RETURN VALUE
+.B ibv_rate_to_mbps()
+returns the number of Mbit/sec.
+.PP
+.B mbps_to_ibv_rate()
+returns the enumeration representing the IB transmission rate.
+.SH SEE ALSO
+.BR ibv_query_port (3)
+.SH AUTHORS
+.TP
+Dotan Barak dot...@dev.mellanox.co.il
diff --git a/src/libibverbs.map b/src/libibverbs.map
index 1827da0..7e722f4 100644
--- a/src/libibverbs.map
+++ b/src/libibverbs.map
@@ -96,4 +96,7 @@ IBVERBS_1.1 {
ibv_port_state_str;
ibv_event_type_str;
ibv_wc_status_str;
+
+   
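
The rest of this patch is cut off in the archive, so the values the real
ibv_rate_to_mbps() returns are not visible here. As a hedged illustration of
the arithmetic behind the new enum names, the sketch below multiplies the
per-lane FDR and EDR rates quoted in the commit message by a lane count. The
macro names and the link_mbps() helper are hypothetical and are not part of
libibverbs; rounding down to whole Mbit/sec is also an assumption.

```c
/* Per-lane signalling rates in kbit/s, from the speeds quoted in the commit message. */
#define FDR_LANE_KBPS 14062500L	/* 14.0625 Gbps */
#define EDR_LANE_KBPS 25781250L	/* 25.78125 Gbps */

/* Aggregate rate in Mbit/s for a link of 'lanes' lanes, rounded down. */
static long link_mbps(long lane_kbps, int lanes)
{
	return lane_kbps * lanes / 1000;
}
```

Under these assumptions a 4-lane FDR link works out to 56250 Mbit/sec and a
4-lane EDR link to 103125 Mbit/sec, which appears to be what the rounded
names IBV_RATE_56_GBPS and IBV_RATE_100_GBPS stand for.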

[PATCH libibverbs 3/3] Fix resource leaks in the pingpong examples present in the failure/error flows.

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 examples/rc_pingpong.c  |   43 ---
 examples/srq_pingpong.c |   51 ++
 examples/uc_pingpong.c  |   43 ---
 examples/ud_pingpong.c  |   43 ---
 4 files changed, 147 insertions(+), 33 deletions(-)

diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 0b7f3e0..15494a1 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -325,7 +325,7 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	ctx->buf = memalign(page_size, size);
 	if (!ctx->buf) {
 		fprintf(stderr, "Couldn't allocate work buf.\n");
-		return NULL;
+		goto clean_ctx;
 	}
 
 	/* FIXME memset(ctx->buf, 0, size); */
@@ -335,14 +335,14 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	if (!ctx->context) {
 		fprintf(stderr, "Couldn't get context for %s\n",
 			ibv_get_device_name(ib_dev));
-		return NULL;
+		goto clean_buffer;
 	}
 
 	if (use_event) {
 		ctx->channel = ibv_create_comp_channel(ctx->context);
 		if (!ctx->channel) {
 			fprintf(stderr, "Couldn't create completion channel\n");
-			return NULL;
+			goto clean_device;
 		}
 	} else
 		ctx->channel = NULL;
@@ -350,20 +350,20 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	ctx->pd = ibv_alloc_pd(ctx->context);
 	if (!ctx->pd) {
 		fprintf(stderr, "Couldn't allocate PD\n");
-		return NULL;
+		goto clean_comp_channel;
 	}
 
 	ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size, IBV_ACCESS_LOCAL_WRITE);
 	if (!ctx->mr) {
 		fprintf(stderr, "Couldn't register MR\n");
-		return NULL;
+		goto clean_pd;
 	}
 
 	ctx->cq = ibv_create_cq(ctx->context, rx_depth + 1, NULL,
 				ctx->channel, 0);
 	if (!ctx->cq) {
 		fprintf(stderr, "Couldn't create CQ\n");
-		return NULL;
+		goto clean_mr;
 	}
 
 	{
@@ -382,7 +382,7 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 		ctx->qp = ibv_create_qp(ctx->pd, &attr);
 		if (!ctx->qp)  {
 			fprintf(stderr, "Couldn't create QP\n");
-			return NULL;
+			goto clean_cq;
 		}
 	}
 
@@ -400,11 +400,38 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 				  IBV_QP_PORT               |
 				  IBV_QP_ACCESS_FLAGS)) {
 			fprintf(stderr, "Failed to modify QP to INIT\n");
-			return NULL;
+			goto clean_qp;
 		}
 	}
 
 	return ctx;
+
+clean_qp:
+	ibv_destroy_qp(ctx->qp);
+
+clean_cq:
+	ibv_destroy_cq(ctx->cq);
+
+clean_mr:
+	ibv_dereg_mr(ctx->mr);
+
+clean_pd:
+	ibv_dealloc_pd(ctx->pd);
+
+clean_comp_channel:
+	if (ctx->channel)
+		ibv_destroy_comp_channel(ctx->channel);
+
+clean_device:
+	ibv_close_device(ctx->context);
+
+clean_buffer:
+	free(ctx->buf);
+
+clean_ctx:
+	free(ctx);
+
+	return NULL;
 }
 
 int pp_close_ctx(struct pingpong_context *ctx)
diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
index 298dca4..6e00f8c 100644
--- a/examples/srq_pingpong.c
+++ b/examples/srq_pingpong.c
@@ -357,7 +357,7 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	ctx->buf = memalign(page_size, size);
 	if (!ctx->buf) {
 		fprintf(stderr, "Couldn't allocate work buf.\n");
-		return NULL;
+		goto clean_ctx;
 	}
 
 	memset(ctx->buf, 0, size);
@@ -366,14 +366,14 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	if (!ctx->context) {
 		fprintf(stderr, "Couldn't get context for %s\n",
 			ibv_get_device_name(ib_dev));
-		return NULL;
+		goto clean_buffer;
 	}
 
 	if (use_event) {
 		ctx->channel = ibv_create_comp_channel(ctx->context);
 		if (!ctx->channel) {
 			fprintf(stderr, "Couldn't create completion channel\n");
-			return NULL;
+			goto clean_device;
 		}
 	} else
 		ctx->channel = NULL;
@@ -381,20 +381,20 @@ static struct pingpong_context 
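
The pattern this patch applies to the pingpong examples is the usual goto
unwind: every acquisition that fails jumps to a label that releases only
what was acquired before it, in reverse order. A self-contained sketch with
plain heap allocations in place of verbs resources is shown below. The names
res_alloc(), res_free(), ctx_init(), the live_allocs counter and the fail_at
parameter are all hypothetical and exist only to make the unwind observable.

```c
#include <stdlib.h>

static int live_allocs;	/* outstanding resources, tracked for illustration */

/* Allocate a dummy resource, or fail on request. */
static void *res_alloc(int fail)
{
	void *p = fail ? NULL : malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void res_free(void *p)
{
	free(p);
	live_allocs--;
}

struct ctx {
	void *buf;
	void *pd;
	void *cq;
};

/*
 * Acquire three resources in order. 'fail_at' (1..3) forces that step to
 * fail; 0 means every step succeeds.
 */
static struct ctx *ctx_init(int fail_at)
{
	struct ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	ctx->buf = res_alloc(fail_at == 1);
	if (!ctx->buf)
		goto clean_ctx;

	ctx->pd = res_alloc(fail_at == 2);
	if (!ctx->pd)
		goto clean_buffer;

	ctx->cq = res_alloc(fail_at == 3);
	if (!ctx->cq)
		goto clean_pd;

	return ctx;

clean_pd:
	res_free(ctx->pd);
clean_buffer:
	res_free(ctx->buf);
clean_ctx:
	free(ctx);
	return NULL;
}
```

Because the labels fall through one another, a failure at any step leaves no
resources outstanding, which is the property the bare "return NULL"
statements lacked.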

[PATCH libmlx4 0/8] add raw packet QP, resource limitations, fixes/cleanups

2012-09-20 Thread Or Gerlitz
Roland,

This batch of libmlx4 patches contains the patch to support raw
packet QPs, two patches from Sagi that relate to resource
limitations, and a few simple fixes/cleanups from Dotan.

The first three were submitted quite a while ago; for this
re-submission I made some changes in the change-logs and cleaned
up some checkpatch comments.

Or.

Dotan Barak (5):
  Replace sscanf() to strtol()
  Allow to use the whole BF buffer
  Use BlueFlame for RDMA_WRITE/WITH_IMM without data
  Change enumeration names for masked atomic opcodes
  When calling ibv_modify_qp() return right value

Or Gerlitz (1):
  Add raw packet QP support

Sagi Grimberg (2):
  Report correct QP/CQ related resource limits
  Limit qp resources accepted for ibv_create_qp()

 src/mlx4.c  |   17 +++--
 src/mlx4.h  |   21 +++--
 src/qp.c    |   14 +++---
 src/verbs.c |   24 +---
 4 files changed, 62 insertions(+), 14 deletions(-)



[PATCH libmlx4 1/8] Add raw packet QP support

2012-09-20 Thread Or Gerlitz
Implement raw packet QPs for Ethernet ports.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/qp.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/qp.c b/src/qp.c
index 40a6689..90c4e80 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -286,6 +286,10 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 			size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
 			break;
 
+		case IBV_QPT_RAW_PACKET:
+			/* For raw eth, the MLX4_WQE_CTRL_SOLICIT flag is used
+			 * to indicate that no icrc should be calculated */
+			ctrl->srcrb_flags |= htonl(MLX4_WQE_CTRL_SOLICIT);
 		default:
 			break;
 		}
-- 
1.7.1



[PATCH libmlx4 3/8] Limit qp resources accepted for ibv_create_qp()

2012-09-20 Thread Or Gerlitz
From: Sagi Grimberg sa...@mellanox.co.il

Use the limits reported in ib_query_device(). Make sure that the limits
returned to the caller following qp creation also lie within the
reported device limits.
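
The two halves of this change, rejecting requests above the device limits
and clamping what is reported back, can be sketched independently of the
driver as follows. The struct layouts and helper names are hypothetical
simplifications for this example; only the field names max_qp_wr, max_sge,
max_send_wr and max_send_sge are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subsets of the device limits and the QP capabilities. */
struct dev_limits {
	uint32_t max_qp_wr;
	uint32_t max_sge;
};

struct qp_caps {
	uint32_t max_send_wr;
	uint32_t max_send_sge;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Sanity check: accept a request only if it fits the device limits. */
static bool caps_within_limits(const struct qp_caps *req,
			       const struct dev_limits *lim)
{
	return req->max_send_wr <= lim->max_qp_wr &&
	       req->max_send_sge <= lim->max_sge;
}

/* Clamp the values reported back to the caller to the device limits. */
static void clamp_reported_caps(struct qp_caps *out,
				const struct dev_limits *lim)
{
	out->max_send_wr  = min_u32(out->max_send_wr, lim->max_qp_wr);
	out->max_send_sge = min_u32(out->max_send_sge, lim->max_sge);
}
```

The clamp matters because the sizes computed internally from the WQE layout
can come out larger than what the device reports, so the caller could
otherwise be told about capacity it is not allowed to use.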

Signed-off-by: Sagi Grimberg sa...@mellanox.co.il
Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/mlx4.h  |   14 ++
 src/qp.c|6 --
 src/verbs.c |   18 +-
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/src/mlx4.h b/src/mlx4.h
index efaa7e9..e1daaf7 100644
--- a/src/mlx4.h
+++ b/src/mlx4.h
@@ -83,6 +83,20 @@
 
 #define PFX "mlx4: "
 
+#ifndef max
+#define max(a, b) \
+   ({ typeof(a) _a = (a); \
+  typeof(b) _b = (b); \
+  _a > _b ? _a : _b; })
+#endif
+
+#ifndef min
+#define min(a, b) \
+   ({ typeof(a) _a = (a); \
+  typeof(b) _b = (b); \
+  _a < _b ? _a : _b; })
+#endif
+
 enum {
MLX4_CQ_ENTRY_SIZE  = 0x20
 };
diff --git a/src/qp.c b/src/qp.c
index 90c4e80..0dfe3a7 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -628,6 +628,7 @@ void mlx4_set_sq_sizes(struct mlx4_qp *qp, struct 
ibv_qp_cap *cap,
   enum ibv_qp_type type)
 {
int wqe_size;
+   struct mlx4_context *ctx = to_mctx(qp->ibv_qp.context);
 
wqe_size = (1 << qp->sq.wqe_shift) - sizeof (struct mlx4_wqe_ctrl_seg);
switch (type) {
@@ -645,8 +646,9 @@ void mlx4_set_sq_sizes(struct mlx4_qp *qp, struct 
ibv_qp_cap *cap,
}
 
qp->sq.max_gs = wqe_size / sizeof (struct mlx4_wqe_data_seg);
-   cap->max_send_sge = qp->sq.max_gs;
-   qp->sq.max_post  = qp->sq.wqe_cnt - qp->sq_spare_wqes;
+   cap->max_send_sge = min(ctx->max_sge, qp->sq.max_gs);
+   qp->sq.max_post  = min(ctx->max_qp_wr,
+  qp->sq.wqe_cnt - qp->sq_spare_wqes);
cap->max_send_wr = qp->sq.max_post;
 
/*
diff --git a/src/verbs.c b/src/verbs.c
index 408fc6d..f629275 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -390,12 +390,14 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct 
ibv_qp_init_attr *attr)
struct ibv_create_qp_resp resp;
struct mlx4_qp   *qp;
int   ret;
+   struct mlx4_context  *context = to_mctx(pd->context);
+
 
/* Sanity check QP size before proceeding */
-   if (attr->cap.max_send_wr  > 65536 ||
-   attr->cap.max_recv_wr  > 65536 ||
-   attr->cap.max_send_sge > 64    ||
-   attr->cap.max_recv_sge > 64    ||
+   if (attr->cap.max_send_wr  > context->max_qp_wr ||
+   attr->cap.max_recv_wr  > context->max_qp_wr ||
+   attr->cap.max_send_sge > context->max_sge   ||
+   attr->cap.max_recv_sge > context->max_sge   ||
attr->cap.max_inline_data > 1024)
return NULL;
 
@@ -464,8 +466,14 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct 
ibv_qp_init_attr *attr)
goto err_destroy;
pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex);
 
-   qp->rq.wqe_cnt = qp->rq.max_post = attr->cap.max_recv_wr;
+   qp->rq.wqe_cnt = attr->cap.max_recv_wr;
qp->rq.max_gs  = attr->cap.max_recv_sge;
+
+   /* adjust rq maxima to not exceed reported device maxima */
+   attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
+   attr->cap.max_recv_sge = min(context->max_sge, attr->cap.max_recv_sge);
+
+   qp->rq.max_post = attr->cap.max_recv_wr;
mlx4_set_sq_sizes(qp, &attr->cap, attr->qp_type);
 
qp->doorbell_qpn = htonl(qp->ibv_qp.qp_num << 8);
-- 
1.7.1
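
Editorial sketch (not part of the patch): the two steps described in the
commit message, rejecting a request that exceeds the device limits and
clamping the values reported back to the caller, can be tried out in
isolation. Every name below (struct demo_limits, struct demo_cap,
demo_check_caps(), demo_clamp_caps()) is hypothetical and only loosely
mirrors the fields touched by the diff; a plain helper function replaces the
min() macro so the example does not rely on GNU extensions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the device limits and the QP capabilities.
 * These are not the libmlx4 or libibverbs structures. */
struct demo_limits {
	uint32_t max_qp_wr;
	uint32_t max_sge;
};

struct demo_cap {
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t max_recv_sge;
};

static uint32_t demo_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Return 0 if the request fits within the device limits, -1 otherwise. */
int demo_check_caps(const struct demo_limits *lim, const struct demo_cap *req)
{
	assert(lim && req);
	if (req->max_send_wr  > lim->max_qp_wr ||
	    req->max_recv_wr  > lim->max_qp_wr ||
	    req->max_send_sge > lim->max_sge   ||
	    req->max_recv_sge > lim->max_sge)
		return -1;
	return 0;
}

/* Clamp the values handed back to the caller to the device limits. */
void demo_clamp_caps(const struct demo_limits *lim, struct demo_cap *cap)
{
	assert(lim && cap);
	cap->max_send_wr  = demo_min_u32(lim->max_qp_wr, cap->max_send_wr);
	cap->max_recv_wr  = demo_min_u32(lim->max_qp_wr, cap->max_recv_wr);
	cap->max_send_sge = demo_min_u32(lim->max_sge, cap->max_send_sge);
	cap->max_recv_sge = demo_min_u32(lim->max_sge, cap->max_recv_sge);
}
```

A caller would run demo_check_caps() on the requested values first, and
demo_clamp_caps() on whatever sizes the provider computed afterwards.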



[PATCH libmlx4 7/8] Change enumeration names for masked atomic opcodes

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

Change the enumeration names of the masked atomic opcodes to be
consistent with the ones used by the mlx4 kernel driver.

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/mlx4.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mlx4.h b/src/mlx4.h
index e1daaf7..139db24 100644
--- a/src/mlx4.h
+++ b/src/mlx4.h
@@ -128,8 +128,8 @@ enum {
MLX4_OPCODE_RDMA_READ   = 0x10,
MLX4_OPCODE_ATOMIC_CS   = 0x11,
MLX4_OPCODE_ATOMIC_FA   = 0x12,
-   MLX4_OPCODE_ATOMIC_MASK_CS  = 0x14,
-   MLX4_OPCODE_ATOMIC_MASK_FA  = 0x15,
+   MLX4_OPCODE_MASKED_ATOMIC_CS= 0x14,
+   MLX4_OPCODE_MASKED_ATOMIC_FA= 0x15,
MLX4_OPCODE_BIND_MW = 0x18,
MLX4_OPCODE_FMR = 0x19,
MLX4_OPCODE_LOCAL_INVAL = 0x1b,
-- 
1.7.1



[PATCH libmlx4 6/8] Use BlueFlame for RDMA_WRITE/WITH_IMM without data

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

Use Blue-Flame for RDMA Write and RDMA Write with immediate without any
data (no s/g). This improves latency for those messages.

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Reviewed-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/qp.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/qp.c b/src/qp.c
index 812e6ec..e770ec8 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -267,6 +267,8 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr 
*wr,
/* fall through */
case IBV_WR_RDMA_WRITE:
case IBV_WR_RDMA_WRITE_WITH_IMM:
+   if (!wr->num_sge)
+   inl = 1;
set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
  wr->wr.rdma.rkey);
wqe  += sizeof (struct mlx4_wqe_raddr_seg);
-- 
1.7.1



[PATCH libmlx4 5/8] Allow to use the whole BF buffer

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

Increase the maximum size of messages (from 192 to 208) that will use the blue
flame buffer.

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Reviewed-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/qp.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/qp.c b/src/qp.c
index 0dfe3a7..812e6ec 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -398,7 +398,7 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr 
*wr,
 out:
ctx = to_mctx(ibqp->context);
 
-   if (nreq == 1 && inl && size > 1 && size < ctx->bf_buf_size / 16) {
+   if (nreq == 1 && inl && size > 1 && size <= ctx->bf_buf_size / 16) {
ctrl->owner_opcode |= htonl((qp->sq.head & 0xffff) << 8);
*(uint32_t *) ctrl->reserved |= qp->doorbell_qpn;
/*
-- 
1.7.1
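
Editorial sketch (not part of the patch): the change is a boundary fix, from
a strict "less than" to "less than or equal". The two predicates below
compare a size expressed in 16-byte units against a buffer size in bytes.
The function names are hypothetical, and the nreq and inl conditions of the
real test are left out on purpose. A 208-byte buffer is assumed in the usage
note only because it reproduces the 192 and 208 figures quoted in the commit
message; the actual value of bf_buf_size is not stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* size16 is the request size in 16-byte units, bf_buf_size is in bytes.
 * Old rule: the request must be strictly smaller than the buffer. */
bool demo_bf_eligible_old(unsigned int size16, unsigned int bf_buf_size)
{
	assert(bf_buf_size >= 16);
	return size16 > 1 && size16 < bf_buf_size / 16;
}

/* New rule: a request that fills the buffer exactly is also accepted. */
bool demo_bf_eligible_new(unsigned int size16, unsigned int bf_buf_size)
{
	assert(bf_buf_size >= 16);
	return size16 > 1 && size16 <= bf_buf_size / 16;
}
```

With the assumed 208-byte buffer, 208 / 16 is 13, so the old rule stops at
12 units (192 bytes) while the new rule also admits 13 units (208 bytes).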



[PATCH libmlx4 4/8] Replace sscanf() to strtol()

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

When converting a string to a numeric value, strtol() is safer to use than
sscanf().

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/mlx4.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mlx4.c b/src/mlx4.c
index 4989c46..0a9139f 100644
--- a/src/mlx4.c
+++ b/src/mlx4.c
@@ -224,12 +224,12 @@ static struct ibv_device *mlx4_driver_init(const char 
*uverbs_sys_path,
if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
value, sizeof value) < 0)
return NULL;
-   sscanf(value, "%i", &vendor);
+   vendor = strtol(value, NULL, 16);
 
if (ibv_read_sysfs_file(uverbs_sys_path, "device/device",
value, sizeof value) < 0)
return NULL;
-   sscanf(value, "%i", &device);
+   device = strtol(value, NULL, 16);
 
for (i = 0; i < sizeof hca_table / sizeof hca_table[0]; ++i)
if (vendor == hca_table[i].vendor &&
-- 
1.7.1
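
Editorial sketch (not part of the patch): strtol() only becomes safer than
sscanf() when its end pointer and errno are actually checked, which the
patch itself does not do (it passes NULL). The helper below, demo_parse_hex(),
is a hypothetical name and shows one way such checking could look for a
sysfs-style value such as "0x15b3" followed by a newline (an example input,
not taken from the patch).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a hexadecimal string, with or without a "0x" prefix and with an
 * optional trailing newline. Returns 0 on success and stores the value in
 * *out; returns -1 if nothing was parsed, the value overflowed, or
 * unexpected characters follow the number. */
int demo_parse_hex(const char *s, long *out)
{
	char *end;
	long v;

	assert(s && out);

	errno = 0;
	v = strtol(s, &end, 16);
	if (end == s)                      /* no digits consumed */
		return -1;
	if (errno == ERANGE)               /* out of range for long */
		return -1;
	if (*end != '\0' && *end != '\n')  /* trailing garbage */
		return -1;

	*out = v;
	return 0;
}
```

Note one behavioural difference in the patch itself: the "%i" conversion
picks the base from the prefix, whereas strtol() with base 16 always reads
the digits as hexadecimal, prefix or not.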



[PATCH libmlx4 8/8] When calling ibv_modify_qp() return right value

2012-09-20 Thread Or Gerlitz
From: Dotan Barak dot...@dev.mellanox.co.il

When the ibv_query_port() call made by mlx4_modify_qp() fails, the return
value from the latter should indicate the error status of the former and
not simply -1.

Signed-off-by: Dotan Barak dot...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 src/verbs.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/verbs.c b/src/verbs.c
index f629275..fb73fdc 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -534,8 +534,10 @@ int mlx4_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr 
*attr,
int ret;
 
if (attr_mask & IBV_QP_PORT) {
-   if (ibv_query_port(qp->pd->context, attr->port_num, &port_attr))
-   return -1;
+   ret = ibv_query_port(qp->pd->context, attr->port_num,
+&port_attr);
+   if (ret)
+   return ret;
mqp->link_layer = port_attr.link_layer;
}
 
-- 
1.7.1
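
Editorial sketch (not part of the patch): the difference between flattening
every failure to -1 and passing the callee's status through can be shown
with a toy callback. All names are hypothetical; demo_query_fn merely stands
in for a query call that returns 0 on success or an errno-style value on
failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical query callback: 0 on success, errno-style value on error. */
typedef int (*demo_query_fn)(int port_num);

/* Before: the caller only learns that something failed. */
int demo_modify_old(demo_query_fn query, int port_num)
{
	assert(query);
	if (query(port_num))
		return -1;
	return 0;
}

/* After: the callee's error status reaches the caller unchanged. */
int demo_modify_new(demo_query_fn query, int port_num)
{
	int ret;

	assert(query);
	ret = query(port_num);
	if (ret)
		return ret;
	return 0;
}

/* Two toy callbacks used to exercise the functions above. */
int demo_query_ok(int port_num)
{
	(void)port_num;
	return 0;
}

int demo_query_einval(int port_num)
{
	(void)port_num;
	return EINVAL;
}
```

With demo_query_einval as the callback, the old variant yields -1 and the
new one yields EINVAL, so the caller can tell why the query failed.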



[PATCH 3/6] RDMA/nes: Cosmetic changes

2012-09-20 Thread Tatyana Nikolova
Removing the unnecessary statement if (1)
Refactoring a statement (wqe_misc |= NES_NIC_SQ_WQE_COMPLETION) out of the
if/else statement, because it is independent of the flow
Defining netdev->features in one line for clarity

Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
---
 drivers/infiniband/hw/nes/nes_nic.c |   32 +---
 1 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_nic.c 
b/drivers/infiniband/hw/nes/nes_nic.c
index 1c02ba7..d0391ac 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -385,24 +385,20 @@ static int nes_nic_send(struct sk_buff *skb, struct 
net_device *netdev)
/* bump past the vlan tag */
wqe_fragment_length++;
/*  wqe_fragment_address = (u64 
*)&nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX]; */
+   wqe_misc |= NES_NIC_SQ_WQE_COMPLETION;
 
if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   tcph = tcp_hdr(skb);
-   if (1) {
-   if (skb_is_gso(skb)) {
-   /* nes_debug(NES_DBG_NIC_TX, "%s: TSO 
request... is_gso = %u seg size = %u\n",
-   netdev->name, skb_is_gso(skb), 
skb_shinfo(skb)->gso_size); */
-   wqe_misc |= NES_NIC_SQ_WQE_LSO_ENABLE |
-   NES_NIC_SQ_WQE_COMPLETION | 
(u16)skb_shinfo(skb)->gso_size;
-   set_wqe_32bit_value(nic_sqe->wqe_words, 
NES_NIC_SQ_WQE_LSO_INFO_IDX,
-   ((u32)tcph->doff) |
-   (((u32)(((unsigned char *)tcph) 
- skb->data)) << 4));
-   } else {
-   wqe_misc |= NES_NIC_SQ_WQE_COMPLETION;
-   }
+   if (skb_is_gso(skb)) {
+   tcph = tcp_hdr(skb);
+   /* nes_debug(NES_DBG_NIC_TX, "%s: TSO request... is_gso 
= %u seg size = %u\n",
+   netdev->name, skb_is_gso(skb), 
skb_shinfo(skb)->gso_size); */
+   wqe_misc |= NES_NIC_SQ_WQE_LSO_ENABLE | 
(u16)skb_shinfo(skb)->gso_size;
+   set_wqe_32bit_value(nic_sqe->wqe_words, 
NES_NIC_SQ_WQE_LSO_INFO_IDX,
+   ((u32)tcph->doff) |
+   (((u32)(((unsigned char *)tcph) - 
skb->data)) << 4));
}
} else {/* CHECKSUM_HW */
-   wqe_misc |= NES_NIC_SQ_WQE_DISABLE_CHKSUM | 
NES_NIC_SQ_WQE_COMPLETION;
+   wqe_misc |= NES_NIC_SQ_WQE_DISABLE_CHKSUM;
}
 
set_wqe_32bit_value(nic_sqe->wqe_words, NES_NIC_SQ_WQE_TOTAL_LENGTH_IDX,
@@ -1679,12 +1675,10 @@ struct net_device *nes_netdev_init(struct nes_device 
*nesdev,
netdev->hard_header_len = ETH_HLEN;
netdev->addr_len = ETH_ALEN;
netdev->type = ARPHRD_ETHER;
-   netdev->features = NETIF_F_HIGHDMA;
netdev->netdev_ops = &nes_netdev_ops;
netdev->ethtool_ops = &nes_ethtool_ops;
netif_napi_add(netdev, &nesvnic->napi, nes_netdev_poll, 128);
nes_debug(NES_DBG_INIT, "Enabling VLAN Insert/Delete.\n");
-   netdev->features |= NETIF_F_HW_VLAN_TX;
 
/* Fill in the port structure */
nesvnic->netdev = netdev;
@@ -1711,11 +1705,11 @@ struct net_device *nes_netdev_init(struct nes_device 
*nesdev,
netdev->dev_addr[5] = (u8)u64temp;
memcpy(netdev->perm_addr, netdev->dev_addr, 6);
 
-   netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_IP_CSUM |
- NETIF_F_HW_VLAN_RX;
+   netdev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_RXCSUM | 
NETIF_F_HW_VLAN_RX;
if ((nesvnic->logical_port < 2) || (nesdev->nesadapter->hw_rev != 
NE020_REV))
netdev->hw_features |= NETIF_F_TSO;
-   netdev->features |= netdev->hw_features;
+
+   netdev->features = netdev->hw_features | NETIF_F_HIGHDMA | 
NETIF_F_HW_VLAN_TX;
netdev->hw_features |= NETIF_F_LRO;
 
nes_debug(NES_DBG_INIT, "nesvnic = %p, reported features = 0x%lX, QPid 
= %d,
-- 
1.7.4.2



[PATCH 4/6] RDMA/nes: Fix for crash when tx chksum offload is off

2012-09-20 Thread Tatyana Nikolova
When tx checksum offload is disabled for an iWARP connection, skb->ip_summed
needs to be set to CHECKSUM_NONE.

Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
---
 drivers/infiniband/hw/nes/nes_cm.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c 
b/drivers/infiniband/hw/nes/nes_cm.c
index 49a9383..f843bb0 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -430,6 +430,8 @@ static void form_cm_frame(struct sk_buff *skb,
buf += sizeof(*tcph);
 
skb->ip_summed = CHECKSUM_PARTIAL;
+   if (!(cm_node->netdev->features & NETIF_F_IP_CSUM))
+   skb->ip_summed = CHECKSUM_NONE;
skb->protocol = htons(0x800);
skb->data_len = 0;
skb->mac_len = ETH_HLEN;
-- 
1.7.4.2



Re: how to preserve QP over HA events for librdmacm applications

2012-09-20 Thread Pradeep Satyanarayana

On 09/20/2012 01:10 PM, Hefty, Sean wrote:

>> What if you say pre-created a second (fail over) QP for HA purposes all
>> under the covers of a single socket? And both QPs were connected before
>> the failure. Not sure if that would work with the same CM id though. If
>> not, we will need to rdma_connect() the second QP after failure.
>
> CM IDs are not shared across devices, and can't be reused for different QPs
> until the first connection has been torn down and gone through timewait.  For
> IB, you probably want path migration capabilities.

Fair enough, I understand one needs to use a different CM id. For the IB
case I was thinking of avoiding APM (since that is limited to a device -
isn't that so?).

> Anything more generic should really be handled by the application.  Migrating a
> connection between devices also requires using different CQs, PDs, MRs, etc.

Is PD device specific? Couldn't one reuse the same CQs and MRs, even
though the QP is different? Of course only one QP would be active at any
time.


Thanks
Pradeep



[PATCH 5/6] RDMA/nes: Print hardware resource type

2012-09-20 Thread Tatyana Nikolova
Hardware resource types are added and when a resource isn't available, its type 
is printed. 

Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
---
 drivers/infiniband/hw/nes/nes.h   |   15 ---
 drivers/infiniband/hw/nes/nes_utils.c |2 +-
 drivers/infiniband/hw/nes/nes_verbs.c |   14 +++---
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
index c438e46..48e4238 100644
--- a/drivers/infiniband/hw/nes/nes.h
+++ b/drivers/infiniband/hw/nes/nes.h
@@ -399,11 +399,20 @@ static inline void nes_write8(void __iomem *addr, u8 val)
writeb(val, addr);
 }
 
-
+typedef enum nes_resource_t {
+   NES_RESOURCE_MW = 1,
+   NES_RESOURCE_FAST_MR,
+   NES_RESOURCE_PHYS_MR,
+   NES_RESOURCE_USER_MR,
+   NES_RESOURCE_PD,
+   NES_RESOURCE_QP,
+   NES_RESOURCE_CQ,
+   NES_RESOURCE_ARP
+} nes_resource_t;
 
 static inline int nes_alloc_resource(struct nes_adapter *nesadapter,
unsigned long *resource_array, u32 max_resources,
-   u32 *req_resource_num, u32 *next)
+   u32 *req_resource_num, u32 *next, nes_resource_t resource_type)
 {
unsigned long flags;
u32 resource_num;
@@ -414,7 +423,7 @@ static inline int nes_alloc_resource(struct nes_adapter 
*nesadapter,
if (resource_num >= max_resources) {
resource_num = find_first_zero_bit(resource_array, 
max_resources);
if (resource_num >= max_resources) {
-   printk(KERN_ERR PFX "%s: No available resourcess.\n", 
__func__);
+   printk(KERN_ERR PFX "%s: No available resources 
[type=%u].\n", __func__, resource_type);
spin_unlock_irqrestore(&nesadapter->resource_lock, 
flags);
return -EMFILE;
}
diff --git a/drivers/infiniband/hw/nes/nes_utils.c 
b/drivers/infiniband/hw/nes/nes_utils.c
index e98f4fc..2042c0f 100644
--- a/drivers/infiniband/hw/nes/nes_utils.c
+++ b/drivers/infiniband/hw/nes/nes_utils.c
@@ -699,7 +699,7 @@ int nes_arp_table(struct nes_device *nesdev, u32 ip_addr, 
u8 *mac_addr, u32 acti
 
arp_index = 0;
err = nes_alloc_resource(nesadapter, nesadapter->allocated_arps,
-   nesadapter->arp_table_size, (u32 *)&arp_index, 
&nesadapter->next_arp_index);
+   nesadapter->arp_table_size, (u32 *)&arp_index, 
&nesadapter->next_arp_index, NES_RESOURCE_ARP);
if (err) {
nes_debug(NES_DBG_NETDEV, "nes_alloc_resource returned 
error = %u\n", err);
return err;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c 
b/drivers/infiniband/hw/nes/nes_verbs.c
index 8b8812d..1dadcf3 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -80,7 +80,7 @@ static struct ib_mw *nes_alloc_mw(struct ib_pd *ibpd) {
next_stag_index %= nesadapter->max_mr;
 
ret = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs,
-   nesadapter->max_mr, &stag_index, &next_stag_index);
+   nesadapter->max_mr, &stag_index, &next_stag_index, 
NES_RESOURCE_MW);
if (ret) {
return ERR_PTR(ret);
}
@@ -404,7 +404,7 @@ static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd 
*ibpd, int max_page_list
 
err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs,
 nesadapter->max_mr, &stag_index,
-&next_stag_index);
+&next_stag_index, NES_RESOURCE_FAST_MR);
if (err)
return ERR_PTR(err);
 
@@ -780,7 +780,7 @@ static struct ib_pd *nes_alloc_pd(struct ib_device *ibdev,
netdev_refcnt_read(nesvnic->netdev));
 
err = nes_alloc_resource(nesadapter, nesadapter->allocated_pds,
-   nesadapter->max_pd, &pd_num, &nesadapter->next_pd);
+   nesadapter->max_pd, &pd_num, &nesadapter->next_pd, 
NES_RESOURCE_PD);
if (err) {
return ERR_PTR(err);
}
@@ -1157,7 +1157,7 @@ static struct ib_qp *nes_create_qp(struct ib_pd *ibpd,
nes_debug(NES_DBG_QP, "RQ size=%u, SQ Size=%u\n", 
rq_size, sq_size);
 
ret = nes_alloc_resource(nesadapter, 
nesadapter->allocated_qps,
-   nesadapter->max_qp, &qp_num, 
&nesadapter->next_qp);
+   nesadapter->max_qp, &qp_num, 
&nesadapter->next_qp, NES_RESOURCE_QP);
if (ret) {
return ERR_PTR(ret);
}
@@ -1546,7 +1546,7 @@ static struct ib_cq *nes_create_cq(struct ib_device 
*ibdev, int entries,
return ERR_PTR(-EINVAL);
 
err = nes_alloc_resource(nesadapter, nesadapter->allocated_cqs,
- 

[PATCH 1/8] libibverbs: Infra-structure changes to support verbs extension

2012-09-20 Thread Hefty, Sean
From: Yishai Hadas yish...@mellanox.com

Infrastructure to support extended verbs capabilities in a forward/backward
compatible manner.

The general operation is shown in the following pseudo-code:

ibv_open_device()
{
    context = device->ops.alloc_context();
    if (context == -1) {
        context_ex = malloc(verbs_context + verbs_device->context_size);
        verbs_device->init_context(context_ex);
        context_ex->context.abi_compat = -1;
    }
}

If the underlying provider supports extensions, it returns -1 from its
alloc_context() call.  Ibverbs then allocates the ibv_context structure and
calls into the provider to finish initializing it.

When extensions are supported, the ibv_device structure is embedded in a
larger verbs_device structure.  Similarly, ibv_context is embedded inside
a larger verbs_context structure.
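
Editorial sketch (not part of the patch): the embedding described above
relies on two ideas, a sentinel value in abi_compat that marks an extended
context, and a container_of-style step from the inner structure back to the
outer one. The stand-alone example below uses hypothetical demo_ names
rather than the real libibverbs types, and builds the sentinel with
UINTPTR_MAX instead of the pointer arithmetic used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the legacy context structure. */
struct demo_context {
	int   cmd_fd;
	void *abi_compat;   /* holds the sentinel for extended contexts */
};

/* Stand-in for the extended structure: new members are added at the top,
 * so the offset of the embedded context grows while the context itself
 * keeps its layout. */
struct demo_verbs_context {
	int   (*new_op)(struct demo_context *ctx);   /* example extension */
	size_t sz;
	struct demo_context context;                 /* must stay last */
};

/* All-ones pointer value used as the "extensions present" marker. */
#define DEMO_EXT_SENTINEL ((void *)UINTPTR_MAX)

/* Portable container_of: no typeof or statement expressions needed. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Return the enclosing extended context, or NULL for a legacy context. */
struct demo_verbs_context *demo_get_ctx(struct demo_context *ctx)
{
	assert(ctx);
	if (ctx->abi_compat != DEMO_EXT_SENTINEL)
		return NULL;
	return demo_container_of(ctx, struct demo_verbs_context, context);
}
```

Existing callers keep passing a pointer to the inner context; only code that
knows about extensions steps outwards, and only after the sentinel check.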

Signed-off-by: Yishai Hadas yish...@mellanox.com
Signed-off-by: Tzahi Oved tza...@mellanox.com
---
 include/infiniband/driver.h |1 +
 include/infiniband/verbs.h  |   52 +++
 src/cmd.c   |   41 +-
 src/device.c|   52 ---
 src/init.c  |8 +++
 src/libibverbs.map  |1 +
 6 files changed, 107 insertions(+), 48 deletions(-)

diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 9a81416..5af0d7f 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -57,6 +57,7 @@ typedef struct ibv_device *(*ibv_driver_init_func)(const char 
*uverbs_sys_path,
   int abi_version);
 
 void ibv_register_driver(const char *name, ibv_driver_init_func init_func);
+void verbs_register_driver(const char *name, ibv_driver_init_func init_func);
 int ibv_cmd_get_context(struct ibv_context *context, struct ibv_get_context 
*cmd,
size_t cmd_size, struct ibv_get_context_resp *resp,
size_t resp_size);
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6acfc81..a2577d8 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -38,6 +38,7 @@
 
 #include <stdint.h>
 #include <pthread.h>
+#include <stddef.h>
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -63,6 +64,19 @@ union ibv_gid {
} global;
 };
 
+#ifndef container_of
+/**
+  * container_of - cast a member of a structure out to the containing structure
+  * @ptr:the pointer to the member.
+  * @type:   the type of the container struct this is embedded in.
+  * @member: the name of the member within the struct.
+  *
+ */
+#define container_of(ptr, type, member) ({\
+   const typeof(((type *)0)->member) * __mptr = (ptr);\
+   (type *)((char *)__mptr - offsetof(type, member)); })
+#endif
+
 enum ibv_node_type {
IBV_NODE_UNKNOWN= -1,
IBV_NODE_CA = 1,
@@ -634,6 +648,17 @@ struct ibv_device {
char ibdev_path[IBV_SYSFS_PATH_MAX];
 };
 
+struct verbs_device {
+   struct ibv_device device; /* Must be first */
+   size_t  sz;
+   size_t  size_of_context;
+   int (*init_context)(struct verbs_device *device,
+   struct ibv_context *ctx, int cmd_fd);
+   void(*uninit_context)(struct verbs_device *device,
+   struct ibv_context *ctx);
+   /* future fields added here */
+};
+
 struct ibv_context_ops {
int (*query_device)(struct ibv_context *context,
  struct ibv_device_attr 
*device_attr);
@@ -702,6 +727,33 @@ struct ibv_context {
void   *abi_compat;
 };
 
+struct verbs_context {
+
+   /*  grows up - new fields go here
+   int (*drv_new_func1) ();new corresponding provider call of func1
+   int (*lib_new_func1) ();New library call func1
+   */
+   size_t sz;  /* Set by library on struct allocation,must be
+   * located right before struct ibv_context
+   */
+   struct ibv_context context;/* Must be last field in the struct */
+};
+
+static inline struct verbs_context *verbs_get_ctx(
+   const struct ibv_context *ctx)
+{
+   if (ctx->abi_compat != ((uint8_t *)NULL)-1)
+   return NULL;
+
+   return container_of(ctx, struct verbs_context, context);
+}
+
+static inline struct verbs_device *verbs_get_device(
+   const struct ibv_device *dev)
+{
+   return container_of(dev, struct verbs_device, device);
+}
+
 /**
  * ibv_get_device_list - Get list of IB devices currently available
  * @num_devices: optional.  if non-NULL, set to the number of devices
diff --git a/src/cmd.c b/src/cmd.c
index 9789092..dab8930 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ 

[PATCH 4/8] libibverbs: Add support for XRC SRQs

2012-09-20 Thread Hefty, Sean
XRC support requires the use of a new type of SRQ.

XRC shared receive queues: xrc srq's are similar to normal
srq's, except that they are bound to an xrcd, rather
than to a protection domain.  Based on the current spec
and implementation, they are only usable with xrc qps.  To
support xrc srq's, we define a new srq_init_attr structure
to include an srq type and other needed information.

The kernel ABI is also updated to allow creating extended
SRQs.
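
Editorial sketch (not part of the patch): the extended attribute structure
introduced here carries a comp_mask that says which optional fields are
valid. The example below shows how a consumer of such a structure might
check it. All demo_ names are hypothetical, and the rules applied (unknown
bits are rejected, an XRC SRQ needs both an xrcd and a cq) are assumptions
made for illustration, not behaviour confirmed by this patch.

```c
#include <assert.h>
#include <stdint.h>

enum demo_srq_type {
	DEMO_SRQT_BASIC,
	DEMO_SRQT_XRC
};

enum demo_attr_mask {
	DEMO_ATTR_TYPE     = 1 << 0,
	DEMO_ATTR_XRCD     = 1 << 1,
	DEMO_ATTR_CQ       = 1 << 2,
	DEMO_ATTR_RESERVED = 1 << 3   /* first bit this code does not know */
};

struct demo_srq_init_attr_ex {
	uint64_t           comp_mask;
	enum demo_srq_type srq_type;
	void              *xrcd;
	void              *cq;
};

/* Return 0 if the attributes describe an SRQ this code could create,
 * -1 otherwise. */
int demo_validate(const struct demo_srq_init_attr_ex *attr)
{
	assert(attr);

	/* Bits at or above RESERVED come from a newer caller. */
	if (attr->comp_mask >= DEMO_ATTR_RESERVED)
		return -1;

	/* No type given, or a basic SRQ: nothing else is required. */
	if (!(attr->comp_mask & DEMO_ATTR_TYPE) ||
	    attr->srq_type == DEMO_SRQT_BASIC)
		return 0;

	/* Assumed rule: an XRC SRQ needs a valid xrcd and cq. */
	if (!(attr->comp_mask & DEMO_ATTR_XRCD) || !attr->xrcd)
		return -1;
	if (!(attr->comp_mask & DEMO_ATTR_CQ) || !attr->cq)
		return -1;
	return 0;
}
```

The point of the mask is that fields can be appended later without breaking
callers built against the older, shorter structure.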

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 include/infiniband/driver.h   |4 +++
 include/infiniband/kern-abi.h |   21 -
 include/infiniband/verbs.h|   37 ++
 src/cmd.c |   38 +++
 src/libibverbs.map|2 ++
 src/verbs.c   |   50 -
 6 files changed, 144 insertions(+), 8 deletions(-)

diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 193b0dd..cac48ab 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -105,6 +105,10 @@ int ibv_cmd_create_srq(struct ibv_pd *pd,
   struct ibv_srq *srq, struct ibv_srq_init_attr *attr,
   struct ibv_create_srq *cmd, size_t cmd_size,
   struct ibv_create_srq_resp *resp, size_t resp_size);
+int ibv_cmd_create_srq_ex(struct ibv_pd *pd,
+ struct ibv_srq *srq, struct ibv_srq_init_attr_ex 
*attr_ex,
+ struct ibv_create_xsrq *cmd, size_t cmd_size,
+ struct ibv_create_srq_resp *resp, size_t resp_size);
 int ibv_cmd_modify_srq(struct ibv_srq *srq,
   struct ibv_srq_attr *srq_attr,
   int srq_attr_mask,
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index d7c673f..3d72fa7 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -88,6 +88,7 @@ enum {
IB_USER_VERBS_CMD_POST_SRQ_RECV,
IB_USER_VERBS_CMD_OPEN_XRCD,
IB_USER_VERBS_CMD_CLOSE_XRCD,
+   IB_USER_VERBS_CMD_CREATE_XSRQ
 };
 
 /*
@@ -730,11 +731,28 @@ struct ibv_create_srq {
__u64 driver_data[0];
 };
 
+struct ibv_create_xsrq {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u64 response;
+   __u64 user_handle;
+   __u32 srq_type;
+   __u32 pd_handle;
+   __u32 max_wr;
+   __u32 max_sge;
+   __u32 srq_limit;
+   __u32 reserved;
+   __u32 xrcd_handle;
+   __u32 cq_handle;
+   __u64 driver_data[0];
+};
+
 struct ibv_create_srq_resp {
__u32 srq_handle;
__u32 max_wr;
__u32 max_sge;
-   __u32 reserved;
+   __u32 srqn;
 };
 
 struct ibv_modify_srq {
@@ -829,6 +847,7 @@ enum {
IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1,
IB_USER_VERBS_CMD_OPEN_XRCD_V2 = -1,
IB_USER_VERBS_CMD_CLOSE_XRCD_V2 = -1,
+   IB_USER_VERBS_CMD_CREATE_XSRQ_V2 = -1
 };
 
 struct ibv_destroy_cq_v1 {
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6e02c9a..8756fed 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -437,6 +437,28 @@ struct ibv_srq_init_attr {
struct ibv_srq_attr attr;
 };
 
+enum ibv_srq_type {
+   IBV_SRQT_BASIC,
+   IBV_SRQT_XRC
+};
+
+enum ibv_srq_init_attr_mask {
+   IBV_SRQ_INIT_ATTR_TYPE  = 1 << 0,
+   IBV_SRQ_INIT_ATTR_XRCD  = 1 << 1,
+   IBV_SRQ_INIT_ATTR_CQ    = 1 << 2,
+   IBV_SRQ_INIT_ATTR_RESERVED  = 1 << 3
+};
+
+struct ibv_srq_init_attr_ex {
+   void   *srq_context;
+   struct ibv_srq_attr attr;
+
+   uint64_t comp_mask;
+   enum ibv_srq_type   srq_type;
+   struct ibv_xrcd *xrcd;
+   struct ibv_cq  *cq;
+};
+
 enum ibv_qp_type {
IBV_QPT_RC = 2,
IBV_QPT_UC,
@@ -596,7 +618,11 @@ struct ibv_mw_bind {
 };
 
 enum ibv_srq_mask {
-   IBV_SRQ_RESERVED = 1 << 0
+   IBV_SRQ_TYPE     = 1 << 0,
+   IBV_SRQ_XRCD     = 1 << 1,
+   IBV_SRQ_CQ       = 1 << 2,
+   IBV_SRQ_NUM      = 1 << 3,
+   IBV_SRQ_RESERVED = 1 << 4
 };
 
 struct ibv_srq {
@@ -610,6 +636,10 @@ struct ibv_srq {
uint32_t events_completed;
 
uint32_t comp_mask;
+   enum ibv_srq_type   srq_type;
+   struct ibv_xrcd *xrcd;
+   struct ibv_cq  *cq;
+   uint32_t srq_num;
 };
 
 enum ibv_qp_mask {
@@ -790,6 +820,8 @@ struct verbs_context {
int (*drv_new_func1) ();new corresponding provider call of func1
int (*lib_new_func1) ();New library call func1
*/
+   struct ibv_srq *(*create_srq_ex)(struct ibv_pd *pd,
+struct ibv_srq_init_attr_ex 
*srq_init_attr_ex);
struct ibv_xrcd * 

[PATCH 3/8] libibverbs: Introduce XRC domains

2012-09-20 Thread Hefty, Sean
XRC introduces several new concepts and structures, one of
which is the XRC domain.

XRC domains: xrcd's are a type of protection domain used to
associate shared receive queues with xrc queue pairs.  Since
xrcd are meant to be shared among multiple processes, we
introduce new APIs to open/close xrcd's.

The user to kernel ABI is extended to account for opening/
closing the xrcd.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 include/infiniband/driver.h   |5 +
 include/infiniband/kern-abi.h |   27 ++-
 include/infiniband/verbs.h|   30 +++---
 src/cmd.c |   34 ++
 src/libibverbs.map|6 ++
 src/verbs.c   |   26 ++
 6 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 5af0d7f..193b0dd 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -76,6 +76,11 @@ int ibv_cmd_alloc_pd(struct ibv_context *context, struct 
ibv_pd *pd,
 struct ibv_alloc_pd *cmd, size_t cmd_size,
 struct ibv_alloc_pd_resp *resp, size_t resp_size);
 int ibv_cmd_dealloc_pd(struct ibv_pd *pd);
+int ibv_cmd_open_xrcd(struct ibv_context *context, struct ibv_xrcd *xrcd,
+ int fd, int oflags,
+ struct ibv_open_xrcd *cmd, size_t cmd_size,
+ struct ibv_open_xrcd_resp *resp, size_t resp_size);
+int ibv_cmd_close_xrcd(struct ibv_xrcd *xrcd);
 #define IBV_CMD_REG_MR_HAS_RESP_PARAMS
 int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
   uint64_t hca_va, int access,
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 619ea7e..d7c673f 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -85,7 +85,9 @@ enum {
IB_USER_VERBS_CMD_MODIFY_SRQ,
IB_USER_VERBS_CMD_QUERY_SRQ,
IB_USER_VERBS_CMD_DESTROY_SRQ,
-   IB_USER_VERBS_CMD_POST_SRQ_RECV
+   IB_USER_VERBS_CMD_POST_SRQ_RECV,
+   IB_USER_VERBS_CMD_OPEN_XRCD,
+   IB_USER_VERBS_CMD_CLOSE_XRCD,
 };
 
 /*
@@ -246,6 +248,27 @@ struct ibv_dealloc_pd {
__u32 pd_handle;
 };
 
+struct ibv_open_xrcd {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u64 response;
+   __u32 fd;
+   __u32 oflags;
+   __u64 driver_data[0];
+};
+
+struct ibv_open_xrcd_resp {
+   __u32 xrcd_handle;
+};
+
+struct ibv_close_xrcd {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u32 xrcd_handle;
+};
+
 struct ibv_reg_mr {
__u32 command;
__u16 in_words;
@@ -804,6 +827,8 @@ enum {
 * trick opcodes in IBV_INIT_CMD() doesn't break.
 */
IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1,
+   IB_USER_VERBS_CMD_OPEN_XRCD_V2 = -1,
+   IB_USER_VERBS_CMD_CLOSE_XRCD_V2 = -1,
 };
 
 struct ibv_destroy_cq_v1 {
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index f3cb2fc..6e02c9a 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2004, 2005 Topspin Communications.  All rights reserved.
- * Copyright (c) 2004 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2011-2012 Intel Corporation.  All rights reserved.
  * Copyright (c) 2005, 2006, 2007 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  *
@@ -317,6 +317,18 @@ struct ibv_pd {
uint32_t comp_mask;
 };
 
+enum ibv_xrcd_mask {
+   IBV_XRCD_CONTEXT    = 1 << 0,
+   IBV_XRCD_HANDLE     = 1 << 1,
+   IBV_XRCD_RESERVED   = 1 << 2
+};
+
+struct ibv_xrcd {
+   uint64_t comp_mask;
+   struct ibv_context *context;
+   uint32_t handle;
+};
+
 enum ibv_rereg_mr_flags {
IBV_REREG_MR_CHANGE_TRANSLATION = (1 << 0),
IBV_REREG_MR_CHANGE_PD  = (1 << 1),
@@ -774,11 +786,13 @@ struct ibv_context {
 };
 
 struct verbs_context {
-
/*  grows up - new fields go here
int (*drv_new_func1) ();new corresponding provider call of func1
int (*lib_new_func1) ();New library call func1
*/
+   struct ibv_xrcd *   (*open_xrcd)(struct ibv_context *context,
+int fd, int oflags);
+   int (*close_xrcd)(struct ibv_xrcd *xrcd);
size_t sz;  /* Set by library on struct allocation,must be
* located right before struct ibv_context
*/
@@ -873,7 +887,7 @@ static inline int ___ibv_query_port(struct ibv_context 
*context,
uint8_t port_num,
struct ibv_port_attr *port_attr)
 {
-   /* For compatability when running with old 

[PATCH 6/8] libibverbs: Add ibv_open_qp

2012-09-20 Thread Hefty, Sean
XRC receive QPs are shareable across multiple processes.  Allow
any process with access to the xrc domain to open an existing
QP.  After opening the QP, the process will receive events
related to the QP and be able to modify the QP.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 include/infiniband/driver.h   |4 
 include/infiniband/kern-abi.h |   20 ++--
 include/infiniband/verbs.h|   25 -
 src/cmd.c |   22 ++
 src/libibverbs.map|2 ++
 src/verbs.c   |   36 
 6 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index c86109d..96f 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -126,6 +126,10 @@ int ibv_cmd_create_qp_ex(struct ibv_pd *pd,
 struct ibv_qp *qp, struct ibv_qp_init_attr_ex *attr_ex,
 struct ibv_create_qp *cmd, size_t cmd_size,
 struct ibv_create_qp_resp *resp, size_t resp_size);
+int ibv_cmd_open_qp(struct ibv_xrcd *xrcd,
+   struct ibv_qp *qp, struct ibv_qp_open_attr *attr,
+   struct ibv_open_qp *cmd, size_t cmd_size,
+   struct ibv_create_qp_resp *resp, size_t resp_size);
 int ibv_cmd_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *qp_attr,
 int attr_mask,
 struct ibv_qp_init_attr *qp_init_attr,
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index b6d5ce9..e24fa4f 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -88,7 +88,8 @@ enum {
IB_USER_VERBS_CMD_POST_SRQ_RECV,
IB_USER_VERBS_CMD_OPEN_XRCD,
IB_USER_VERBS_CMD_CLOSE_XRCD,
-   IB_USER_VERBS_CMD_CREATE_XSRQ
+   IB_USER_VERBS_CMD_CREATE_XSRQ,
+   IB_USER_VERBS_CMD_OPEN_QP
 };
 
 /*
@@ -476,6 +477,20 @@ struct ibv_create_qp {
__u64 driver_data[0];
 };
 
+struct ibv_open_qp {
+   __u32 command;
+   __u16 in_words;
+   __u16 out_words;
+   __u64 response;
+   __u64 user_handle;
+   __u32 pd_handle;
+   __u32 qpn;
+   __u8  qp_type;
+   __u8  reserved[7];
+   __u64 driver_data[0];
+};
+
+/* also used for open response */
 struct ibv_create_qp_resp {
__u32 qp_handle;
__u32 qpn;
@@ -852,7 +867,8 @@ enum {
IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1,
IB_USER_VERBS_CMD_OPEN_XRCD_V2 = -1,
IB_USER_VERBS_CMD_CLOSE_XRCD_V2 = -1,
-   IB_USER_VERBS_CMD_CREATE_XSRQ_V2 = -1
+   IB_USER_VERBS_CMD_CREATE_XSRQ_V2 = -1,
+   IB_USER_VERBS_CMD_OPEN_QP_V2 = -1
 };
 
 struct ibv_destroy_cq_v1 {
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 9524095..a3bf14c 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -504,6 +504,20 @@ struct ibv_qp_init_attr_ex {
 	struct ibv_xrcd	       *xrcd;
 };
 
+enum ibv_qp_open_attr_mask {
+	IBV_QP_OPEN_ATTR_CONTEXT	= 1 << 0,
+	IBV_QP_OPEN_ATTR_NUM		= 1 << 1,
+	IBV_QP_OPEN_ATTR_TYPE		= 1 << 2,
+	IBV_QP_OPEN_ATTR_RESERVED	= 1 << 3
+ };
+
+struct ibv_qp_open_attr {
+	uint32_t		comp_mask;
+	uint32_t		qp_num;
+	void		       *qp_context;
+	enum ibv_qp_type	qp_type;
+};
+
 enum ibv_qp_attr_mask {
 	IBV_QP_STATE			= 1 << 	0,
 	IBV_QP_CUR_STATE		= 1 << 	1,
@@ -535,7 +549,8 @@ enum ibv_qp_state {
IBV_QPS_RTS,
IBV_QPS_SQD,
IBV_QPS_SQE,
-   IBV_QPS_ERR
+   IBV_QPS_ERR,
+   IBV_QPS_UNKNOWN
 };
 
 enum ibv_mig_state {
@@ -848,6 +863,8 @@ struct verbs_context {
 	int (*drv_new_func1) ();	new corresponding provider call of func1
 	int (*lib_new_func1) ();	New library call func1
 	*/
+	struct ibv_qp *		(*open_qp)(struct ibv_xrcd *xrcd,
+					   struct ibv_qp_open_attr *attr);
 	struct ibv_qp *		(*create_qp_ex)(struct ibv_pd *pd,
 						struct ibv_qp_init_attr_ex *qp_init_attr_ex);
struct ibv_srq *(*create_srq_ex)(struct ibv_pd *pd,
@@ -1171,6 +1188,12 @@ struct ibv_qp *ibv_create_qp_ex(struct ibv_pd *pd,
struct ibv_qp_init_attr_ex *qp_init_attr_ex);
 
 /**
+ * ibv_open_qp - Open a shareable queue pair.
+ */
+struct ibv_qp *ibv_open_qp(struct ibv_xrcd *xrcd,
+  struct ibv_qp_open_attr *qp_open_attr);
+
+/**
  * ibv_modify_qp - Modify a queue pair.
  */
 int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
diff --git a/src/cmd.c b/src/cmd.c
index 005bf94..be4fac7 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -716,6 +716,28 @@ int ibv_cmd_create_qp(struct ibv_pd *pd,
return 

[PATCH 5/8] libibverbs: Add support for XRC QPs

2012-09-20 Thread Hefty, Sean
XRC queue pairs: xrc defines two new types of QPs.  The
initiator, or send-side, xrc qp behaves similarly to a send-only
RC qp.  xrc send qp's are managed through the existing
QP functions.  The send_wr structure is extended in a backwards
compatible way to support posting sends on a send xrc
qp, which requires specifying the remote xrc srq.

The target, or receive-side, xrc qp behaves differently
than other implemented qp's.  A recv xrc qp can be created,
modified, and destroyed like other qp's through the existing
calls.  The qp_init_attr structure is extended for xrc qp's.

Because xrc recv qp's are bound to an xrcd, rather than a pd,
they are intended to be shared among multiple processes.  Any process
with access to an xrcd may allocate and connect an xrc recv qp.
The actual xrc recv qp is allocated and managed by the kernel.
If the owning process explicitly destroys the xrc recv qp, it is
destroyed.  However, if the xrc recv qp is left open when the
user process exits or closes its device, then the lifetime of
the xrc recv qp is bound to the lifetime of the xrcd.
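
To illustrate the "backwards compatible" send_wr extension, here is a
small sketch of the wr union with the new xrc member.  The rdma and atomic
members are written from memory of verbs.h and are an assumption here,
since the hunk below only shows ud and xrc.  The union and function names
are hypothetical.  The reserved fields appear to exist so that remote_srqn
lands past every field an existing caller might already set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the wr union inside struct ibv_send_wr (assumed layout). */
union send_wr_standin {
	struct {
		uint64_t	remote_addr;
		uint32_t	rkey;
	} rdma;
	struct {
		uint64_t	remote_addr;
		uint64_t	compare_add;
		uint64_t	swap;
		uint32_t	rkey;
	} atomic;
	struct {
		void	       *ah;
		uint32_t	remote_qpn;
		uint32_t	remote_qkey;
	} ud;
	struct {			/* member added by this patch */
		uint64_t	reserved[3];
		uint32_t	reserved2;
		uint32_t	remote_srqn;
	} xrc;
};

/* Return the byte offset of xrc.remote_srqn within the union. */
static size_t xrc_srqn_offset(void)
{
	size_t atomic_end = offsetof(union send_wr_standin, atomic.rkey) +
			    sizeof(uint32_t);
	size_t srqn_off = offsetof(union send_wr_standin, xrc.remote_srqn);

	/* remote_srqn must not overlap the largest pre-existing member. */
	assert(srqn_off >= atomic_end);
	return srqn_off;
}
```

Under these assumptions an RDMA or atomic work request posted on an xrc
send qp can carry remote_addr/rkey and remote_srqn at the same time.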

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 include/infiniband/driver.h   |4 ++
 include/infiniband/kern-abi.h |5 +++
 include/infiniband/verbs.h|   39 +++--
 src/cmd.c |   76 +
 src/libibverbs.map|2 +
 src/verbs.c   |   60 
 6 files changed, 151 insertions(+), 35 deletions(-)

diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index cac48ab..c86109d 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -122,6 +122,10 @@ int ibv_cmd_create_qp(struct ibv_pd *pd,
  struct ibv_qp *qp, struct ibv_qp_init_attr *attr,
  struct ibv_create_qp *cmd, size_t cmd_size,
  struct ibv_create_qp_resp *resp, size_t resp_size);
+int ibv_cmd_create_qp_ex(struct ibv_pd *pd,
+struct ibv_qp *qp, struct ibv_qp_init_attr_ex *attr_ex,
+struct ibv_create_qp *cmd, size_t cmd_size,
+struct ibv_create_qp_resp *resp, size_t resp_size);
 int ibv_cmd_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *qp_attr,
 int attr_mask,
 struct ibv_qp_init_attr *qp_init_attr,
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 3d72fa7..b6d5ce9 100644
--- a/include/infiniband/kern-abi.h
+++ b/include/infiniband/kern-abi.h
@@ -617,6 +617,11 @@ struct ibv_kern_send_wr {
__u32 remote_qkey;
__u32 reserved;
} ud;
+   struct {
+   __u64 reserved[3];
+   __u32 reserved2;
+   __u32 remote_srqn;
+   } xrc;
} wr;
 };
 
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 8756fed..9524095 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -106,7 +106,8 @@ enum ibv_device_cap_flags {
 	IBV_DEVICE_SYS_IMAGE_GUID	= 1 << 11,
 	IBV_DEVICE_RC_RNR_NAK_GEN	= 1 << 12,
 	IBV_DEVICE_SRQ_RESIZE		= 1 << 13,
-	IBV_DEVICE_N_NOTIFY_CQ		= 1 << 14
+	IBV_DEVICE_N_NOTIFY_CQ		= 1 << 14,
+	IBV_DEVICE_XRC			= 1 << 20
 };
 
 enum ibv_atomic_cap {
@@ -462,7 +463,9 @@ struct ibv_srq_init_attr_ex {
 enum ibv_qp_type {
IBV_QPT_RC = 2,
IBV_QPT_UC,
-   IBV_QPT_UD
+   IBV_QPT_UD,
+   IBV_QPT_XRC_SEND = 9,
+   IBV_QPT_XRC_RECV
 };
 
 struct ibv_qp_cap {
@@ -483,6 +486,24 @@ struct ibv_qp_init_attr {
int sq_sig_all;
 };
 
+enum ibv_qp_init_attr_mask {
+	IBV_QP_INIT_ATTR_XRCD		= 1 << 0,
+	IBV_QP_INIT_ATTR_RESERVED	= 1 << 1
+};
+
+struct ibv_qp_init_attr_ex {
+	void		       *qp_context;
+	struct ibv_cq	       *send_cq;
+	struct ibv_cq	       *recv_cq;
+	struct ibv_srq	       *srq;
+	struct ibv_qp_cap	cap;
+	enum ibv_qp_type	qp_type;
+	int			sq_sig_all;
+
+	uint64_t		comp_mask;
+	struct ibv_xrcd	       *xrcd;
+};
+
 enum ibv_qp_attr_mask {
 	IBV_QP_STATE			= 1 << 	0,
 	IBV_QP_CUR_STATE		= 1 << 	1,
@@ -598,6 +619,11 @@ struct ibv_send_wr {
 			uint32_t	remote_qpn;
 			uint32_t	remote_qkey;
 		} ud;
+		struct {
+			uint64_t	reserved[3];
+			uint32_t	reserved2;
+			uint32_t	remote_srqn;
+		} xrc;
 	} wr;
 };
 
@@ -643,7 +669,8 @@ struct ibv_srq {
 };
 
 enum ibv_qp_mask {
-   IBV_QP_RESERVED = 1  0
+   IBV_QP_XRCD = 1  

[PATCH 2/8] libibverbs: Support older providers that do not support extensions

2012-09-20 Thread Hefty, Sean
In order to support providers that do not handle extensions, including
providers built against an older version of ibverbs, add a compatibility
layer.  This allows most of the core ibverbs code to assume that
extensions are always available.  The compatibility layer is responsible
for converting between the extended verbs and the current structure
definitions.

The compatibility layer allows applications to make use of extended
structures, independent of the provider supporting them, and allows us
to extend existing structures which are normally allocated by the
provider: ibv_qp, ibv_srq, ibv_ah, ibv_mr, ibv_cq, ibv_pd, ibv_mw,
and ibv_comp_channel.  Existing applications are unaffected.

The compatibility layer is similar to the 1.0 compat code.  It allocates
the extended structures, which then point to the ones allocated by the
provider.  For the most part, the compatibility code is invoked by
directing the ibv_context ops into a compat function, which then redirects
the call to the provider.  This results in one extra level of indirection
when running with a provider that does not support extensions.

The exceptions to the indirection are opening and closing a device and
handling asynchronous events.
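
The wrapping described above can be sketched roughly as follows.  This is
an illustration only: src/compat-ex.c is not shown in this excerpt, and
every name below (old_pd, ext_pd, provider_ops, compat_alloc_pd, and the
toy provider) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Object as an older provider allocates it: no comp_mask field. */
struct old_pd {
	uint32_t	handle;
};

/* Extended object handed to the application by the compat layer. */
struct ext_pd {
	uint32_t	handle;
	uint32_t	comp_mask;
	struct old_pd  *real_pd;	/* the provider's own object */
};

struct provider_ops {
	struct old_pd *(*alloc_pd)(void);
	int	       (*dealloc_pd)(struct old_pd *pd);
};

/* Allocate the extended struct, then point it at the provider's one. */
static struct ext_pd *compat_alloc_pd(const struct provider_ops *ops)
{
	struct ext_pd *pd;

	assert(ops != NULL && ops->alloc_pd != NULL);
	pd = calloc(1, sizeof *pd);
	if (!pd)
		return NULL;

	pd->real_pd = ops->alloc_pd();	/* the one extra indirection */
	if (!pd->real_pd) {
		free(pd);
		return NULL;
	}
	pd->handle = pd->real_pd->handle;
	pd->comp_mask = 0;		/* no extended fields are valid */
	return pd;
}

/* Redirect the call to the provider, then release the wrapper. */
static int compat_dealloc_pd(const struct provider_ops *ops,
			     struct ext_pd *pd)
{
	int ret = ops->dealloc_pd(pd->real_pd);

	if (ret)
		return ret;
	free(pd);
	return 0;
}

/* Toy legacy provider, used only to exercise the sketch. */
static struct old_pd *toy_alloc_pd(void)
{
	struct old_pd *pd = malloc(sizeof *pd);

	if (pd)
		pd->handle = 42;
	return pd;
}

static int toy_dealloc_pd(struct old_pd *pd)
{
	free(pd);
	return 0;
}
```

The application only ever sees the extended object, so the core library
can grow it without the provider being rebuilt.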

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 Makefile.am|2 
 include/infiniband/verbs.h |   57 -
 src/compat-ex.c|  511 
 src/device.c   |  112 +-
 src/ibverbs.h  |   29 ++
 src/verbs.c|9 +
 6 files changed, 655 insertions(+), 65 deletions(-)
 create mode 100644 src/compat-ex.c

diff --git a/Makefile.am b/Makefile.am
index cd00a65..1cfb008 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -11,7 +11,7 @@ libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@
 
 src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \
 			    src/marshall.c src/memory.c src/sysfs.c src/verbs.c \
-   src/enum_strs.c
+   src/enum_strs.c src/compat-ex.c
 src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \
 $(libibverbs_version_script)
 src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index a2577d8..f3cb2fc 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -307,9 +307,14 @@ enum ibv_access_flags {
 	IBV_ACCESS_MW_BIND		= (1<<4)
 };
 
+enum ibv_pd_mask {
+	IBV_PD_RESERVED			= 1 << 0
+};
+
 struct ibv_pd {
 	struct ibv_context     *context;
 	uint32_t		handle;
+	uint32_t		comp_mask;
 };
 
 enum ibv_rereg_mr_flags {
@@ -319,6 +324,10 @@ enum ibv_rereg_mr_flags {
 	IBV_REREG_MR_KEEP_VALID		= (1 << 3)
 };
 
+enum ibv_mr_mask {
+	IBV_MR_RESERVED			= 1 << 0
+};
+
 struct ibv_mr {
 	struct ibv_context     *context;
 	struct ibv_pd	       *pd;
@@ -327,6 +336,7 @@ struct ibv_mr {
 	uint32_t		handle;
 	uint32_t		lkey;
 	uint32_t		rkey;
+	uint32_t		comp_mask;
 };
 
 enum ibv_mw_type {
@@ -334,10 +344,15 @@ enum ibv_mw_type {
IBV_MW_TYPE_2   = 2
 };
 
+enum ibv_mw_mask {
+	IBV_MW_RESERVED			= 1 << 0
+};
+
 struct ibv_mw {
 	struct ibv_context     *context;
 	struct ibv_pd	       *pd;
 	uint32_t		rkey;
+	uint32_t		comp_mask;
 };
 
 struct ibv_global_route {
@@ -568,6 +583,10 @@ struct ibv_mw_bind {
int mw_access_flags;
 };
 
+enum ibv_srq_mask {
+	IBV_SRQ_RESERVED		= 1 << 0
+};
+
 struct ibv_srq {
 	struct ibv_context     *context;
 	void		       *srq_context;
@@ -577,6 +596,12 @@ struct ibv_srq {
 	pthread_mutex_t		mutex;
 	pthread_cond_t		cond;
 	uint32_t		events_completed;
+
+	uint32_t		comp_mask;
+};
+
+enum ibv_qp_mask {
+	IBV_QP_RESERVED			= 1 << 0
 };
 
 struct ibv_qp {
@@ -594,12 +619,24 @@ struct ibv_qp {
 	pthread_mutex_t		mutex;
 	pthread_cond_t		cond;
 	uint32_t		events_completed;
+
+	uint32_t		comp_mask;
+};
+
+enum ibv_comp_channel_mask {
+	IBV_COMP_CHANNEL_RESERVED	= 1 << 0
 };
 
 struct ibv_comp_channel {
 	struct ibv_context     *context;
 	int			fd;
 	int			refcnt;
+
+	uint32_t		comp_mask;
+};
+
+enum ibv_cq_mask {
+	IBV_CQ_RESERVED			= 1 << 0
 };
 
 struct ibv_cq {
@@ -613,16 +650,25 @@ struct ibv_cq {
 	pthread_cond_t		cond;
 	uint32_t		comp_events_completed;
 	uint32_t		async_events_completed;
+
+   uint32_t   

[PATCH 8/8] libibverbs: Add XRC sample application

2012-09-20 Thread Hefty, Sean
From: Jay Sternberg jay.e.sternb...@intel.com

Signed-off-by: Jay Sternberg jay.e.sternb...@intel.com
Signed-off-by: Sean Hefty sean.he...@intel.com
---
 Makefile.am  |4 
 examples/xsrq_pingpong.c |  877 ++
 2 files changed, 880 insertions(+), 1 deletions(-)
 create mode 100644 examples/xsrq_pingpong.c

diff --git a/Makefile.am b/Makefile.am
index 1cfb008..939b9d2 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -18,7 +18,7 @@ src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map
 
 bin_PROGRAMS = examples/ibv_devices examples/ibv_devinfo \
 examples/ibv_asyncwatch examples/ibv_rc_pingpong examples/ibv_uc_pingpong \
-examples/ibv_ud_pingpong examples/ibv_srq_pingpong
+examples/ibv_ud_pingpong examples/ibv_srq_pingpong examples/ibv_xsrq_pingpong
 examples_ibv_devices_SOURCES = examples/device_list.c
 examples_ibv_devices_LDADD = $(top_builddir)/src/libibverbs.la
 examples_ibv_devinfo_SOURCES = examples/devinfo.c
@@ -31,6 +31,8 @@ examples_ibv_ud_pingpong_SOURCES = examples/ud_pingpong.c examples/pingpong.c
 examples_ibv_ud_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
 examples_ibv_srq_pingpong_SOURCES = examples/srq_pingpong.c examples/pingpong.c
 examples_ibv_srq_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
+examples_ibv_xsrq_pingpong_SOURCES = examples/xsrq_pingpong.c examples/pingpong.c
+examples_ibv_xsrq_pingpong_LDADD = $(top_builddir)/src/libibverbs.la
 examples_ibv_asyncwatch_SOURCES = examples/asyncwatch.c
 examples_ibv_asyncwatch_LDADD = $(top_builddir)/src/libibverbs.la
 
diff --git a/examples/xsrq_pingpong.c b/examples/xsrq_pingpong.c
new file mode 100644
index 000..a1763d3
--- /dev/null
+++ b/examples/xsrq_pingpong.c
@@ -0,0 +1,877 @@
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2011 Intel Corporation, Inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if HAVE_CONFIG_H
+#  include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdio.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <netdb.h>
+#include <malloc.h>
+#include <getopt.h>
+#include <arpa/inet.h>
+#include <time.h>
+
+#include "pingpong.h"
+
+#define MSG_FORMAT "%04x:%06x:%06x:%06x:%04x"
+#define MSG_SIZE   30
+#define MSG_SSCAN  "%x:%x:%x:%x:%x"
+#define ADDR_FORMAT \
+	"%8s: LID %04x, QPN RECV %06x SEND %06x, PSN %06x, SRQN %04x\n"
+static int page_size;
+
+struct pingpong_dest {
+   int lid;
+   int recv_qpn;
+   int send_qpn;
+   int recv_psn;
+   int send_psn;
+   int srqn;
+   int pp_cnt;
+};
+
+struct pingpong_context {
+	struct ibv_context	*context;
+	struct ibv_comp_channel	*channel;
+	struct ibv_pd		*pd;
+	struct ibv_mr		*mr;
+	struct ibv_cq		*send_cq;
+	struct ibv_cq		*recv_cq;
+	struct ibv_srq		*srq;
+	struct ibv_xrcd		*xrcd;
+	struct ibv_qp		**recv_qp;
+	struct ibv_qp		**send_qp;
+	struct pingpong_dest	*rem_dest;
+	void			*buf;
+	int			 lid;
+	int			 sl;
+	enum ibv_mtu		 mtu;
+	int			 ib_port;
+	int			 fd;
+	int			 size;
+	int			 num_clients;
+	int			 num_tests;
+	int			 use_event;
+};
+
+struct pingpong_context ctx;
+
+
+static int 

[PATCH 1/2] libmlx4: Infra-structure changes to support verbs extensions

2012-09-20 Thread Hefty, Sean
From: Yishai Hadas yish...@mellanox.com

Signed-off-by: Yishai Hadas yish...@mellanox.com
Signed-off-by: Tzahi Oved tza...@mellanox.com
---
 src/mlx4.c |   83 +++-
 src/mlx4.h |   16 
 2 files changed, 70 insertions(+), 29 deletions(-)

diff --git a/src/mlx4.c b/src/mlx4.c
index 8cf249a..1a4e8b0 100644
--- a/src/mlx4.c
+++ b/src/mlx4.c
@@ -120,22 +120,34 @@ static struct ibv_context_ops mlx4_ctx_ops = {
.detach_mcast  = ibv_cmd_detach_mcast
 };
 
+/*  function is used to indicate to the library of verbs extensions
+  *   provider capable
+  */
 static struct ibv_context *mlx4_alloc_context(struct ibv_device *ibdev, int cmd_fd)
 {
-	struct mlx4_context	       *context;
+	return (struct ibv_context *)(((uint8_t *)NULL)-1);
+}
+
+static int mlx4_init_context(struct verbs_device *device,
+				struct ibv_context *ibv_ctx, int cmd_fd)
+{
+	struct mlx4_context	       *context;
 	struct ibv_get_context		cmd;
 	struct mlx4_alloc_ucontext_resp resp;
 	int				i;
+	/* verbs_context should be used for new verbs
+	  *struct verbs_context *verbs_ctx = verbs_get_ctx(ibv_ctx);
+	 */
 
-	context = calloc(1, sizeof *context);
-	if (!context)
-		return NULL;
-
-	context->ibv_ctx.cmd_fd = cmd_fd;
+	/* memory footprint of mlx4_context and verbs_context share
+	  * struct ibv_context.
+	*/
+	context = to_mctx(ibv_ctx);
+	ibv_ctx->cmd_fd = cmd_fd;
 
-	if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof cmd,
+	if (ibv_cmd_get_context(ibv_ctx, &cmd, sizeof(cmd),
 				&resp.ibv_resp, sizeof resp))
-		goto err_free;
+		return errno;
 
 	context->num_qps	= resp.qp_tab_size;
 	context->qp_table_shift = ffs(context->num_qps) - 1 - MLX4_QP_TABLE_BITS;
@@ -150,15 +162,15 @@ static struct ibv_context *mlx4_alloc_context(struct ibv_device *ibdev, int cmd_
 
 	pthread_mutex_init(&context->db_list_mutex, NULL);
 
-	context->uar = mmap(NULL, to_mdev(ibdev)->page_size, PROT_WRITE,
+	context->uar = mmap(NULL, to_mdev_ex(device)->page_size, PROT_WRITE,
 			    MAP_SHARED, cmd_fd, 0);
 	if (context->uar == MAP_FAILED)
-		goto err_free;
+		return errno;
 
 	if (resp.bf_reg_size) {
-		context->bf_page = mmap(NULL, to_mdev(ibdev)->page_size,
+		context->bf_page = mmap(NULL, to_mdev_ex(device)->page_size,
 					PROT_WRITE, MAP_SHARED, cmd_fd,
-					to_mdev(ibdev)->page_size);
+					to_mdev_ex(device)->page_size);
 		if (context->bf_page == MAP_FAILED) {
 			fprintf(stderr, PFX "Warning: BlueFlame available, "
 				"but failed to mmap() BlueFlame page.\n");
@@ -176,23 +188,28 @@ static struct ibv_context *mlx4_alloc_context(struct ibv_device *ibdev, int cmd_
 
 	pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
 
-	context->ibv_ctx.ops = mlx4_ctx_ops;
-
-	return &context->ibv_ctx;
+	ibv_ctx->ops = mlx4_ctx_ops;
+   /* New verbs should be added as below
+ * verbs_ctx-drv_new_func1 = mlx4_new_func1;
+ */
+   return 0;
 
-err_free:
-   free(context);
-   return NULL;
 }
 
-static void mlx4_free_context(struct ibv_context *ibctx)
+
+static void mlx4_uninit_context(struct verbs_device *device,
+   struct ibv_context *ibv_ctx)
 {
-   struct mlx4_context *context = to_mctx(ibctx);
+   struct mlx4_context *context = to_mctx(ibv_ctx);
 
-	munmap(context->uar, to_mdev(ibctx->device)->page_size);
+	munmap(context->uar, to_mdev_ex(device)->page_size);
 	if (context->bf_page)
-		munmap(context->bf_page, to_mdev(ibctx->device)->page_size);
-	free(context);
+		munmap(context->bf_page, to_mdev_ex(device)->page_size);
+}
+
+static void mlx4_free_context(struct ibv_context *ibctx)
+{
+   return;
 }
 
 static struct ibv_device_ops mlx4_dev_ops = {
@@ -204,7 +221,7 @@ static struct ibv_device *mlx4_driver_init(const char *uverbs_sys_path,
 						   int abi_version)
 {
 	char			value[8];
-	struct mlx4_device	       *dev;
+	struct mlx4_device_ex	       *dev;
 	unsigned		vendor, device;
 	int			i;
 
@@ -226,7 +243,7 @@ static struct ibv_device *mlx4_driver_init(const char *uverbs_sys_path,
 		return NULL;
 
 found:
-	if (abi_version < MLX4_UVERBS_MIN_ABI_VERSION ||
+	if (abi_version <= MLX4_UVERBS_MIN_ABI_VERSION ||
 	    abi_version > MLX4_UVERBS_MAX_ABI_VERSION) {
fprintf(stderr, PFX Fatal: ABI version %d of %s is 

[PATCH 2/2] libmlx4: Add support for XRC QPs

2012-09-20 Thread Hefty, Sean

Signed-off-by: Sean Hefty sean.he...@intel.com
---
Note that I have a hack in cq.c.  Someone more familiar with the mlx4
HW needs to look at the change.

 src/buf.c  |6 +-
 src/cq.c   |   40 ---
 src/mlx4-abi.h |6 ++
 src/mlx4.c |   19 +++--
 src/mlx4.h |   59 
 src/qp.c   |   35 ++
 src/srq.c  |  151 +
 src/verbs.c|  205 ++--
 8 files changed, 436 insertions(+), 85 deletions(-)

diff --git a/src/buf.c b/src/buf.c
index a80bcb1..50957bb 100644
--- a/src/buf.c
+++ b/src/buf.c
@@ -78,6 +78,8 @@ int mlx4_alloc_buf(struct mlx4_buf *buf, size_t size, int page_size)
 
 void mlx4_free_buf(struct mlx4_buf *buf)
 {
-	ibv_dofork_range(buf->buf, buf->length);
-	munmap(buf->buf, buf->length);
+	if (buf->length) {
+		ibv_dofork_range(buf->buf, buf->length);
+		munmap(buf->buf, buf->length);
+	}
 }
diff --git a/src/cq.c b/src/cq.c
index 8f7a8cc..5945270 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -220,33 +220,43 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
rmb();
 
 	qpn = ntohl(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK;
+	wc->qp_num = qpn;
 
 	is_send  = cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK;
 	is_error = (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
 		MLX4_CQE_OPCODE_ERROR;
 
-	if (!*cur_qp ||
-	    (qpn != (*cur_qp)->ibv_qp.qp_num)) {
+	if ((qpn & MLX4_XRC_QPN_BIT) && !is_send) {
 		/*
-		 * We do not have to take the QP table lock here,
-		 * because CQs will be locked while QPs are removed
+		 * We do not have to take the XSRQ table lock here,
+		 * because CQs will be locked while SRQs are removed
 		 * from the table.
 		 */
-		*cur_qp = mlx4_find_qp(to_mctx(cq->ibv_cq.context), qpn);
-		if (!*cur_qp)
+		srq = mlx4_find_xsrq(&to_mctx(cq->ibv_cq.context)->xsrq_table,
+				     ntohl(cqe->g_mlpath_rqpn) & MLX4_CQE_QPN_MASK);
+		if (!srq)
 			return CQ_POLL_ERR;
+	} else {
+		if (!*cur_qp || (qpn != (*cur_qp)->ibv_qp.qp_num)) {
+			/*
+			 * We do not have to take the QP table lock here,
+			 * because CQs will be locked while QPs are removed
+			 * from the table.
+			 */
+			*cur_qp = mlx4_find_qp(to_mctx(cq->ibv_cq.context), qpn);
+			if (!*cur_qp)
+				return CQ_POLL_ERR;
+		}
+		srq = ((*cur_qp)->ibv_qp.srq) ? to_msrq((*cur_qp)->ibv_qp.srq) : NULL;
 	}
 
-	wc->qp_num = (*cur_qp)->ibv_qp.qp_num;
-
 	if (is_send) {
 		wq = &(*cur_qp)->sq;
 		wqe_index = ntohs(cqe->wqe_index);
 		wq->tail += (uint16_t) (wqe_index - (uint16_t) wq->tail);
 		wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
 		++wq->tail;
-	} else if ((*cur_qp)->ibv_qp.srq) {
-		srq = to_msrq((*cur_qp)->ibv_qp.srq);
+	} else if (srq) {
 		wqe_index = htons(cqe->wqe_index);
 		wc->wr_id = srq->wrid[wqe_index];
 		mlx4_free_srq_wqe(srq, wqe_index);
@@ -322,7 +332,8 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		wc->dlid_path_bits = (g_mlpath_rqpn >> 24) & 0x7f;
 		wc->wc_flags	  |= g_mlpath_rqpn & 0x8000 ? IBV_WC_GRH : 0;
 		wc->pkey_index     = ntohl(cqe->immed_rss_invalid) & 0x7f;
-		if ((*cur_qp)->link_layer == IBV_LINK_LAYER_ETHERNET)
+		/* HACK */
+		if ((*cur_qp) && (*cur_qp)->link_layer == IBV_LINK_LAYER_ETHERNET)
 			wc->sl	   = ntohs(cqe->sl_vid) >> 13;
 		else
 			wc->sl	   = ntohs(cqe->sl_vid) >> 12;
@@ -411,7 +422,12 @@ void __mlx4_cq_clean(struct mlx4_cq *cq, uint32_t qpn, struct mlx4_srq *srq)
 	 */
 	while ((int) --prod_index - (int) cq->cons_index >= 0) {
 		cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
-		if ((ntohl(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK) == qpn) {
+		if (srq && srq->ext_srq &&
+		    ntohl(cqe->g_mlpath_rqpn & MLX4_CQE_QPN_MASK) == MLX4_GET_SRQN(srq) &&
+		    !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK)) {
+			mlx4_free_srq_wqe(srq, ntohs(cqe->wqe_index));
+			++nfreed;
+		} else if ((ntohl(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK) == qpn) {
 			if (srq && !(cqe->owner_sr_opcode & MLX4_CQE_IS_SEND_MASK))
 				mlx4_free_srq_wqe(srq, ntohs(cqe->wqe_index));
 			++nfreed;
diff --git a/src/mlx4-abi.h b/src/mlx4-abi.h
index 20a40c9..40d0d9a 100644
--- 

[PATCH 7/8] libibverbs: Add man page for ibv_open_qp

2012-09-20 Thread Hefty, Sean
Signed-off-by: Sean Hefty sean.he...@intel.com
---
 man/ibv_open_qp.3 |   50 ++
 1 files changed, 50 insertions(+), 0 deletions(-)
 create mode 100644 man/ibv_open_qp.3

diff --git a/man/ibv_open_qp.3 b/man/ibv_open_qp.3
new file mode 100644
index 000..0bc5647
--- /dev/null
+++ b/man/ibv_open_qp.3
@@ -0,0 +1,50 @@
+.\" -*- nroff -*-
+.\"
+.TH IBV_OPEN_QP 3 2011-08-12 libibverbs "Libibverbs Programmer's Manual"
+.SH NAME
+ibv_open_qp \- open a shareable queue pair (QP)
+.SH SYNOPSIS
+.nf
+.B #include <infiniband/verbs.h>
+.sp
+.BI "struct ibv_qp *ibv_open_qp(struct ibv_xrcd " "*xrcd" ,
+.BI "                           struct ibv_qp_open_attr " "*qp_open_attr" );
+.fi
+.SH DESCRIPTION
+.B ibv_open_qp()
+opens an existing queue pair (QP) associated with the extended protection domain
+.I xrcd\fR.
+The argument
+.I qp_open_attr
+is an ibv_qp_open_attr struct, as defined in <infiniband/verbs.h>.
+.PP
+.nf
+struct ibv_qp_open_attr {
+.in +8
+uint32_t  comp_mask;  /* Identifies valid fields */
+uint32_t  qp_num; /* QP number */
+void *qp_context; /* Associated context of the QP */
+enum ibv_qp_type  qp_type;/* QP transport service type */
+.fi
+.PP
+.B ibv_destroy_qp()
+closes the opened QP and destroys the underlying QP if it has no
+other references.
+.I qp\fR.
+.SH RETURN VALUE
+.B ibv_open_qp()
+returns a pointer to the opened QP, or NULL if the request fails.
+Check the QP number (\fBqp_num\fR) in the returned QP.
+.SH NOTES
+.B ibv_open_qp()
+will fail if it is asked to open a QP that does not exist within
+the xrcd with the specified qp_num and qp_type.
+.SH SEE ALSO
+.BR ibv_alloc_pd (3),
+.BR ibv_create_qp (3),
+.BR ibv_create_qp_ex (3),
+.BR ibv_modify_qp (3),
+.BR ibv_query_qp (3)
+.SH AUTHORS
+.TP
+Sean Hefty sean.he...@intel.com




RE: how to preserve QP over HA events for librdmacm applications

2012-09-20 Thread Hefty, Sean
> Fair enough, I understand one needs to use a different CM id. For the IB
> case I was thinking of avoiding APM (since that is limited to a device
> -isn't that so?).

APM is limited to a single device, as are memory registrations, CQs, PDs,
SRQs, etc.  Migration between devices requires entirely new memory
registrations, the use of different lkeys/rkeys, and new CQs.  There's no
guarantee that the HW devices support the same features - registration size,
QP size, CQ size, etc.

> Is PD device specific? Couldn't one reuse the same CQs and MRs, even
> though the QP is different? Of course only one QP would be active at any
> time.

You can only reuse the resources if you limit yourself to the same device.

Supporting migration between devices requires a higher level abstraction
which hides the internal RDMA device details.

HA itself likely requires more than simply establishing a new connection.
You may need to resolve the addresses again, to determine where to migrate
to, plus obtain new path records.  Any app that wants full HA capability
really needs to be able to handle a connection failing completely and
establishing a new one.

- Sean


Re: [PATCH for-next V2 04/22] IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support

2012-09-20 Thread Or Gerlitz
On Tue, Sep 11, 2012 at 8:10 PM, Doug Ledford dledf...@redhat.com wrote:
> On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
> > struct mlx4_ib_sriov{} is created by the master only.
> > It is a container for the following:
> > 1. All the info required by the PPF to multiplex and de-multiplex MADs
> >    (including those from the PF). (struct mlx4_ib_demux_ctx demux)
>
> OK, so can we have at least a single reference to the various
> abbreviations before using them exclusively?  I know PF and PPF may be
> common, but it might be nice that they were used once in full form
> before abbreviated in commit messages.

PF is physical function, and PPF is primary physical function.


> > 2. All the info required to manage alias GUIDs (i.e., the GUID at
> >    index 0 that each guest perceives.  In fact, this is not the
> >    GUID which is actually at index 0, but is, in fact, the GUID
> >    which is at index[VF number] in the physical table.
>
> OK, this has been one of the things that has made reviewing this
> difficult.  I freely admit that I've steadfastly ignored SRIOV for as
> long as I can, so maybe this is just me.  But, in the context of this
> driver, how am I supposed to know which code paths will be on the host
> and which on the guest?

For the mlx4 driver, the approach taken was to para-virtualize mlx4_core
such that both the PPF and VFs run the same driver, but within that
driver two flows are operative. In mlx4_core it should be pretty clear
which flow you are in; in mlx4_ib it is sometimes less easy. I'll leave
that to Jack to address in more detail.

> Also, I note that you do math every time you want to know if you are on
> a parent device or a virtual device.  Do you really want to do math all
> the time, or would it be better to save off your status on device init
> and just refer to that when you would do math in this patch?

This is addressed in downstream patch #21 of this series.

Or.


Re: [PATCH] infiniband-diags: Allow user to specify multiple ports

2012-09-20 Thread Ira Weiny
On Thu, 13 Sep 2012 16:13:05 -0700
Albert Chu ch...@llnl.gov wrote:

 Allow user to select multiple ports via comma or range input.  May
 be particularly useful for gathering aggregate performance counters
 for groups of ports, such as all the uplinks or downlinks from a
 switch.
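
 A rough sketch of how such a port argument ("1-10" or "1,4,8") could be
 expanded into a list.  This is not the code from the patch, which is
 only partly visible below; parse_ports() is a hypothetical name, and
 accepting ports 1..MAX_PORTS is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PORTS 255

/*
 * Expand a comma separated list of ports and/or port ranges into ports[].
 * Returns the number of ports stored, or -1 on malformed input.
 */
static int parse_ports(const char *str, int ports[], int max)
{
	int count = 0;

	assert(str != NULL && ports != NULL);
	while (*str) {
		char *end;
		long lo = strtol(str, &end, 10);
		long hi = lo;

		if (end == str)			/* no digits where expected */
			return -1;
		if (*end == '-') {		/* range: "lo-hi" */
			const char *p = end + 1;

			hi = strtol(p, &end, 10);
			if (end == p)
				return -1;
		}
		if (lo < 1 || hi > MAX_PORTS || lo > hi)
			return -1;
		for (long port = lo; port <= hi; port++) {
			if (count == max)	/* caller's array is full */
				return -1;
			ports[count++] = (int)port;
		}
		if (*end == ',') {
			end++;
			if (*end == '\0')	/* trailing comma */
				return -1;
		} else if (*end != '\0') {
			return -1;
		}
		str = end;
	}
	return count;
}
```

 The resulting array can then be walked once per port (as with -l) or
 summed into a single set of counters (as with -a).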
 
 Signed-off-by: Albert Chu ch...@llnl.gov

Thanks, applied.
Ira

 ---
  doc/rst/perfquery.8.in.rst |   19 ++
  src/perfquery.c|   57 +--
  2 files changed, 68 insertions(+), 8 deletions(-)
 
 diff --git a/doc/rst/perfquery.8.in.rst b/doc/rst/perfquery.8.in.rst
 index 359c94e..5e3d709 100644
 --- a/doc/rst/perfquery.8.in.rst
 +++ b/doc/rst/perfquery.8.in.rst
 @@ -14,7 +14,7 @@ query InfiniBand port counters on a single port
  SYNOPSIS
  
  
 -perfquery [options] [lid|guid [[port] [reset_mask]]]
 +perfquery [options] [lid|guid [[port(s)] [reset_mask]]]
  
  DESCRIPTION
  ===
 @@ -32,6 +32,9 @@ octets divided by 4 rather than just octets.
  
  Note: Inputting a port of 255 indicates an operation be performed on all 
 ports.
  
 +Note: For PortCounters, ExtendedCounters, and resets, multiple ports can be
 +specified by either a comma separated list or a port range.  See examples 
 below.
 +
  
  OPTIONS
  ===
 @@ -98,15 +101,16 @@ OPTIONS
   show port samples control.
  
  **-a, --all_ports**
 - show aggregated counters for all ports of the destination lid or reset
 - all counters for all ports.  If the destination lid does not support
 + show aggregated counters for all ports of the destination lid, reset
 + all counters for all ports, or if multiple ports are specified, 
 aggregate
 + the counters of the specified ports.  If the destination lid does not 
 support
   the AllPortSelect flag, all ports will be iterated through to emulate
   AllPortSelect behavior.
  
  **-l, --loop_ports**
   If all ports are selected by the user (either through the **-a** option
 - or port 255) iterate through each port rather than doing than aggregate
 - operation.
 + or port 255) or multiple ports are specified iterate through each port 
 rather
 + than doing than aggregate operation.
  
  **-r, --reset_after_read**
   reset counters after read
 @@ -158,6 +162,7 @@ EXAMPLES
  
  
  ::
 +
   perfquery # read local port performance counters
   perfquery 32 1   # read performance counters from lid 32, port 1
   perfquery -x 32 1 # read extended performance counters from lid 32, port 1
 @@ -169,6 +174,10 @@ EXAMPLES
   perfquery -R -a 32   # reset performance counters of all ports
   perfquery -R 32 2 0x0fff # reset only error counters of port 2
   perfquery -R 32 2 0xf000 # reset only non-error counters of port 2
 + perfquery -a 32 1-10 # read performance counters from lid 32, port 1-10, aggregate output
 + perfquery -l 32 1-10 # read performance counters from lid 32, port 1-10, output each port
 + perfquery -a 32 1,4,8 # read performance counters from lid 32, port 1, 4, and 8, aggregate output
 + perfquery -l 32 1,4,8 # read performance counters from lid 32, port 1, 4, and 8, output each port
  
  AUTHOR
  ==
 diff --git a/src/perfquery.c b/src/perfquery.c
 index 32dd98f..27ec4f7 100644
 --- a/src/perfquery.c
 +++ b/src/perfquery.c
 @@ -94,6 +94,7 @@ struct perf_count perf_count =
  struct perf_count_ext perf_count_ext = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  
  #define ALL_PORTS 0xFF
 +#define MAX_PORTS 255
  
  /* Notes: IB semantics is to cap counters if count has exceeded limits.
   * Therefore we must check for overflows and cap the counters if necessary.
 @@ -371,6 +372,8 @@ static int reset, reset_only, all_ports, loop_ports, port, extended, xmt_sl,
  rcv_sl, xmt_disc, rcv_err, extended_speeds, smpl_ctl, oprcvcounters, flowctlcounters,
  vloppackets, vlopdata, vlxmitflowctlerrors, vlxmitcounters, swportvlcong,
  rcvcc, slrcvfecn, slrcvbecn, xmitcc, vlxmittimecc;
 +static int ports[MAX_PORTS];
 +static int ports_count;
  
  static void common_func(ib_portid_t * portid, int port_num, int mask,
   unsigned query, unsigned reset,
 @@ -666,6 +669,7 @@ int main(int argc, char **argv)
   uint8_t data[IB_SMP_DATA_SIZE] = { 0 };
   int start_port = 1;
   int enhancedport0;
 + char *tmpstr;
   int i;
  
   const struct ibdiag_opt opts[] = {
 @@ -694,7 +698,7 @@ int main(int argc, char **argv)
   {"Reset_only", 'R', 0, NULL, "only reset counters"},
   {0}
   };
 - char usage_args[] = "[lid|guid [[port] [reset_mask]]]";
 + char usage_args[] = "[lid|guid [[port(s)] [reset_mask]]]";
   const char *usage_examples[] = {
   "\t\t# read local port's performance counters",
   "32 1\t\t# read performance counters from lid 32, port 1",
 @@ -707,6 +711,10 @@ int main(int argc, char **argv)
   -R -a 
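
For readers following along: the quoted diff is cut off before the parsing code, so here is a minimal sketch of how a comma-or-range port argument such as "1-10" or "1,4,8" could be turned into a list of ports. The helper name parse_port_spec and its return convention are hypothetical and are not taken from the patch; only MAX_PORTS and ALL_PORTS come from the quoted hunks. Rejecting 255 inside a list is an assumption, made because the man page reserves that value for "all ports".

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define ALL_PORTS 0xFF
#define MAX_PORTS 255

/* Hypothetical helper, not the code from the patch: parse "1,4,8",
 * "1-10", or a mix such as "1-3,7" into ports[].  Returns the number of
 * ports stored, or -1 on malformed input or if more than max ports
 * would be stored. */
static int parse_port_spec(const char *spec, int *ports, int max)
{
	int count = 0;

	assert(spec != NULL && ports != NULL);

	while (*spec) {
		char *end;
		long first, last, p;

		errno = 0;
		first = strtol(spec, &end, 10);
		/* Assumption: 255 means "all ports", so it is not a list member. */
		if (end == spec || errno || first < 0 || first >= ALL_PORTS)
			return -1;
		last = first;

		if (*end == '-') {		/* range: first-last */
			spec = end + 1;
			last = strtol(spec, &end, 10);
			if (end == spec || errno || last < first || last >= ALL_PORTS)
				return -1;
		}

		for (p = first; p <= last; p++) {
			if (count >= max)
				return -1;
			ports[count++] = (int)p;
		}

		if (*end == ',') {
			if (end[1] == '\0')
				return -1;	/* trailing comma */
			spec = end + 1;
		} else if (*end == '\0') {
			spec = end;
		} else {
			return -1;		/* unexpected character */
		}
	}
	return count;
}
```

With a helper along these lines, -a would sum the counters over the returned ports and -l would print each one in turn, which is the behaviour the man page changes describe.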

Re: [PATCH] infiniband-diags/src/perfquery.c: Fix all_ports corner case

2012-09-20 Thread Ira Weiny
On Thu, 13 Sep 2012 23:16:14 -0700
Al Chu ch...@llnl.gov wrote:

 If AllPortSelect is not supported and --all_ports is specified, it should
 only emulate AllPortSelect if no port is specified or the all ports
 port number (255) is specified.
 
 Signed-off-by: Albert Chu ch...@llnl.gov

Thanks applied,
Ira

 ---
  src/perfquery.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/src/perfquery.c b/src/perfquery.c
 index 27ec4f7..605ece9 100644
 --- a/src/perfquery.c
 +++ b/src/perfquery.c
 @@ -783,7 +783,7 @@ int main(int argc, char **argv)
   if (!(cap_mask & IB_PM_ALL_PORT_SELECT)) {  /* bit 8 is AllPortSelect */
   	if (!all_ports && port == ALL_PORTS)
   		IBERROR("AllPortSelect not supported");
 - 	if (all_ports)
 + 	if (all_ports && port == ALL_PORTS)
   		all_ports_loop = 1;
   }
  
 -- 
 1.7.1
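
The condition this patch tightens can be restated as a small predicate. The sketch below is a condensation for illustration only: the function name should_emulate_all_ports is hypothetical, and it assumes that port already holds ALL_PORTS when --all_ports is given without an explicit port, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

#define ALL_PORTS 0xFF

/* Hypothetical restatement of the patched check: returns true when the
 * tool should loop over every port to emulate AllPortSelect.
 * Assumption: port == ALL_PORTS also covers "no port given with -a". */
static bool should_emulate_all_ports(bool all_port_select_supported,
				     bool all_ports, int port)
{
	assert(port >= 0 && port <= ALL_PORTS);

	if (all_port_select_supported)
		return false;	/* the device aggregates on its own */
	return all_ports && port == ALL_PORTS;
}
```

Before the patch the last line was effectively "return all_ports;", so an explicit port combined with --all_ports would still trigger the emulation loop.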
 
 
 


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


RE: [PATCH libmlx4 1/8] Add raw packet QP support

2012-09-20 Thread Luick, Dean


 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Or Gerlitz
 Sent: Thursday, September 20, 2012 3:31 PM
 To: rol...@kernel.org
 Cc: linux-rdma@vger.kernel.org; Or Gerlitz
 Subject: [PATCH libmlx4 1/8] Add raw packet QP support
 
 Implement raw packet QPs for Ethernet ports.
 
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  src/qp.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/src/qp.c b/src/qp.c
 index 40a6689..90c4e80 100644
 --- a/src/qp.c
 +++ b/src/qp.c
 @@ -286,6 +286,10 @@ int mlx4_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
   size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
   break;
 
 + case IBV_QPT_RAW_PACKET:
 + /* For raw eth, the MLX4_WQE_CTRL_SOLICIT flag is used
 +  * to indicate that no icrc should be calculated */
 + ctrl->srcrb_flags |= htonl(MLX4_WQE_CTRL_SOLICIT);
   default:
   break;
   }

Add a break for the new case?  While the above code works as expected, it is 
inconsistent with the other cases in this switch, and it could lead to problems 
if the default case ever starts doing something.


Dean
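
To make the concern concrete, here is a small self-contained sketch of the hazard. All names in it (qp_type, FLAG_SOLICIT, FLAG_FUTURE, and the two functions) are hypothetical stand-ins and not the libmlx4 or libibverbs definitions; FLAG_FUTURE represents whatever the default case might be changed to do later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the libmlx4 or libibverbs definitions. */
enum qp_type { QPT_UD, QPT_RAW_PACKET };
#define FLAG_SOLICIT (1u << 1)
#define FLAG_FUTURE  (1u << 7)	/* something "default" might set one day */

/* Without a break, the raw packet case falls through into default. */
static uint32_t flags_fallthrough(enum qp_type type)
{
	uint32_t flags = 0;

	switch (type) {
	case QPT_RAW_PACKET:
		flags |= FLAG_SOLICIT;
		/* falls through */
	default:
		flags |= FLAG_FUTURE;
		break;
	}
	return flags;
}

/* With the break, the raw packet case is isolated from default. */
static uint32_t flags_with_break(enum qp_type type)
{
	uint32_t flags = 0;

	switch (type) {
	case QPT_RAW_PACKET:
		flags |= FLAG_SOLICIT;
		break;
	default:
		flags |= FLAG_FUTURE;
		break;
	}
	return flags;
}

/* The two variants differ only once default does real work. */
static void check_fallthrough(void)
{
	assert(flags_fallthrough(QPT_RAW_PACKET) == (FLAG_SOLICIT | FLAG_FUTURE));
	assert(flags_with_break(QPT_RAW_PACKET) == FLAG_SOLICIT);
}
```

As long as default is an empty break, both variants behave the same, which is why the posted patch works today.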