[PATCH v2] opensm/osmeventplugin: added couple of events to monitor SM

2010-04-07 Thread Yevgeny Kliteynik
Hi Sasha,

I've added a couple of new events that allow an event
plug-in to see what the SM is doing (when it is sweeping
and when it updates dump files):

  OSM_EVENT_ID_L_SWEEP_STARTED,
  OSM_EVENT_ID_L_SWEEP_DONE,
  OSM_EVENT_ID_H_SWEEP_STARTED,
  OSM_EVENT_ID_H_SWEEP_DONE,
  OSM_EVENT_ID_REROUTE_DONE,
  OSM_EVENT_ID_ENTERING_STANDBY,
  OSM_EVENT_ID_SM_PORT_DOWN,
  OSM_EVENT_ID_SA_DB_DUMPED

The last event is reported only when the SA DB was actually dumped.
I'm considering a similar optimization for the guid2lid file: it
doesn't have to be dumped at the end of every heavy sweep, since
many heavy sweeps are not caused by nodes appearing or
disappearing.
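For illustration, here is a minimal sketch of how a plug-in's report callback might consume these events. It is self-contained, so it re-declares a local enum that mirrors the new names (an assumption: the numeric values do not match osm_event_plugin.h, and a real plug-in would include that header instead). struct sweep_monitor and monitor_report() are hypothetical. The sketch keeps the OSM_EVENT_ID_MAX sentinel as the last enumerator, so a bounds check of the form id < OSM_EVENT_ID_MAX accepts every event it knows about.

```c
#include <stddef.h>

/* Hypothetical local mirror of some event IDs, for illustration only;
 * the values do NOT match the real header. The sentinel stays last. */
typedef enum {
	OSM_EVENT_ID_SUBNET_UP,
	OSM_EVENT_ID_L_SWEEP_STARTED,
	OSM_EVENT_ID_L_SWEEP_DONE,
	OSM_EVENT_ID_H_SWEEP_STARTED,
	OSM_EVENT_ID_H_SWEEP_DONE,
	OSM_EVENT_ID_REROUTE_DONE,
	OSM_EVENT_ID_ENTERING_STANDBY,
	OSM_EVENT_ID_SM_PORT_DOWN,
	OSM_EVENT_ID_SA_DB_DUMPED,
	OSM_EVENT_ID_MAX
} osm_epi_event_id_t;

/* Hypothetical per-plug-in state. */
struct sweep_monitor {
	unsigned light_sweeps;
	unsigned heavy_sweeps;
	int heavy_in_progress;
	unsigned sa_db_dumps;
};

/* Callback shape: (plug-in data, event ID, event data). */
static void monitor_report(void *plugin_data, osm_epi_event_id_t event_id,
			   void *event_data)
{
	struct sweep_monitor *m = plugin_data;

	(void) event_data;	/* the new events are reported with NULL data */
	if ((unsigned) event_id >= OSM_EVENT_ID_MAX)
		return;		/* ignore IDs this plug-in does not know */

	switch (event_id) {
	case OSM_EVENT_ID_L_SWEEP_DONE:
		m->light_sweeps++;
		break;
	case OSM_EVENT_ID_H_SWEEP_STARTED:
		m->heavy_in_progress = 1;
		break;
	case OSM_EVENT_ID_H_SWEEP_DONE:
		m->heavy_in_progress = 0;
		m->heavy_sweeps++;
		break;
	case OSM_EVENT_ID_ENTERING_STANDBY:
	case OSM_EVENT_ID_SM_PORT_DOWN:
		/* in the patch these paths return before H_SWEEP_DONE */
		m->heavy_in_progress = 0;
		break;
	case OSM_EVENT_ID_SA_DB_DUMPED:
		m->sa_db_dumps++;
		break;
	default:
		break;
	}
}
```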

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---

Changes from V1:
  - added reporting OSM_EVENT_ID_H_SWEEP_DONE event
  - rebased to latest master

 opensm/include/opensm/osm_event_plugin.h   |   10 +-
 opensm/opensm/osm_state_mgr.c  |   22 +-
 opensm/osmeventplugin/src/osmeventplugin.c |   24 
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h
index 33d1920..f5a57d7 100644
--- a/opensm/include/opensm/osm_event_plugin.h
+++ b/opensm/include/opensm/osm_event_plugin.h
@@ -72,7 +72,15 @@ typedef enum {
OSM_EVENT_ID_PORT_SELECT,
OSM_EVENT_ID_TRAP,
OSM_EVENT_ID_SUBNET_UP,
-   OSM_EVENT_ID_MAX
+   OSM_EVENT_ID_MAX,
+   OSM_EVENT_ID_L_SWEEP_STARTED,
+   OSM_EVENT_ID_L_SWEEP_DONE,
+   OSM_EVENT_ID_H_SWEEP_STARTED,
+   OSM_EVENT_ID_H_SWEEP_DONE,
+   OSM_EVENT_ID_REROUTE_DONE,
+   OSM_EVENT_ID_ENTERING_STANDBY,
+   OSM_EVENT_ID_SM_PORT_DOWN,
+   OSM_EVENT_ID_SA_DB_DUMPED
 } osm_epi_event_id_t;

 typedef struct osm_epi_port_id {
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..d5dff14 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1076,6 +1076,9 @@ static void do_sweep(osm_sm_t * sm)
sm->p_subn->sm_state != IB_SMINFO_STATE_DISCOVERING)
return;

+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_L_SWEEP_STARTED, NULL);
+
if (sm->p_subn->coming_out_of_standby)
/*
 * Need to force re-write of sm_base_lid to all ports
@@ -,6 +1114,8 @@ static void do_sweep(osm_sm_t * sm)
osm_sa_db_file_dump(sm->p_subn->p_osm);
OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
"LIGHT SWEEP COMPLETE");
+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_L_SWEEP_DONE, NULL);
return;
}
}
@@ -1151,6 +1156,8 @@ static void do_sweep(osm_sm_t * sm)
if (!sm->p_subn->subnet_initialization_error) {
OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
"REROUTE COMPLETE");
+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_REROUTE_DONE, NULL);
return;
}
}
@@ -1158,6 +1165,9 @@ static void do_sweep(osm_sm_t * sm)
/* go to heavy sweep */
 repeat_discovery:

+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_H_SWEEP_STARTED, NULL);
+
/* First of all - unset all flags */
sm->p_subn->force_heavy_sweep = FALSE;
sm->p_subn->force_reroute = FALSE;
@@ -1185,6 +1195,8 @@ repeat_discovery:

/* Move to DISCOVERING state */
osm_sm_state_mgr_process(sm, OSM_SM_SIGNAL_DISCOVER);
+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_SM_PORT_DOWN, NULL);
return;
}

@@ -1205,6 +1217,8 @@ repeat_discovery:
"ENTERING STANDBY STATE");
/* notify master SM about us */
osm_send_trap144(sm, 0);
+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_ENTERING_STANDBY, NULL);
return;
}

@@ -1212,6 +1226,9 @@ repeat_discovery:
if (sm->p_subn->force_heavy_sweep)
goto repeat_discovery;

+   osm_opensm_report_event(sm->p_subn->p_osm,
+   OSM_EVENT_ID_H_SWEEP_DONE, NULL);
+
OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE, "HEAVY SWEEP COMPLETE");

/* If we are MASTER - get the highest remote_sm, and
@@ -1375,7 +1392,10 @@ repeat_discovery:

if (osm_log_is_active(sm->p_log, OSM_LOG_VERBOSE) ||
sm->p_subn->opt.sa_db_dump)
-   osm_sa_db_file_dump(sm->p_subn->p_osm);
+   if (!osm_sa_db_file_dump(sm->p_subn->p_osm))
+ 

[patch] infiniband: checking the wrong variable

2010-04-07 Thread Dan Carpenter
The intent here was to check the mfrpl->mapped_page_list allocation.
We checked mfrpl->ibfrpl.page_list earlier.
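The same bug pattern in a userspace sketch: when a function makes two allocations, the error check after the second one has to test the pointer that allocation just returned. struct page_list, alloc_page_list() and free_page_list() are hypothetical stand-ins, and calloc() replaces the driver's kzalloc()/dma_alloc_coherent().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's two-buffer page-list object. */
struct page_list {
	unsigned long *page_list;		/* cf. mfrpl->ibfrpl.page_list */
	unsigned long *mapped_page_list;	/* cf. mfrpl->mapped_page_list */
};

/* Each result is checked right after its own allocation, and the error
 * path frees only what was already allocated. */
static struct page_list *alloc_page_list(size_t n)
{
	struct page_list *pl = malloc(sizeof *pl);

	if (!pl)
		return NULL;

	pl->page_list = calloc(n, sizeof *pl->page_list);
	if (!pl->page_list)
		goto err_free;

	pl->mapped_page_list = calloc(n, sizeof *pl->mapped_page_list);
	if (!pl->mapped_page_list)	/* test the pointer just returned */
		goto err_free_list;

	return pl;

err_free_list:
	free(pl->page_list);
err_free:
	free(pl);
	return NULL;
}

static void free_page_list(struct page_list *pl)
{
	free(pl->mapped_page_list);
	free(pl->page_list);
	free(pl);
}
```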

Signed-off-by: Dan Carpenter erro...@gmail.com

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 56147b2..1d27b9a 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -240,7 +240,7 @@ struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device
mfrpl->mapped_page_list = dma_alloc_coherent(&dev->dev->pdev->dev,
 size, &mfrpl->map,
 GFP_KERNEL);
-   if (!mfrpl->ibfrpl.page_list)
+   if (!mfrpl->mapped_page_list)
goto err_free;
 
WARN_ON(mfrpl->map & 0x3f);
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [RFC] ummunotify: Userspace support for MMU notifications

2010-04-07 Thread Eric B Munson
I am resubmitting this to try to restart the discussion about how
this should be implemented properly.  Where the discussion was left
off, I see two possible solutions.  One is a new class of perf event
that would monitor MMU events filtered by registered address ranges,
set up to prioritize catching every event over resource consumption.
The other option is the one presented below: a character device that
uses ioctl() and read() to register address ranges and return MMU
events.  I want to pick the best solution so I can move forward
with it.

From: Roland Dreier rolandd at cisco.com

As discussed in http://article.gmane.org/gmane.linux.drivers.openib/61925
and follow-up messages, libraries using RDMA would like to track
precisely when application code changes memory mapping via free(),
munmap(), etc.  Current pure-userspace solutions using malloc hooks
and other tricks are not robust, and the feeling among experts is that
the issue is unfixable without kernel help.

We solve this not by implementing the full API proposed in the email
linked above but rather with a simpler and more generic interface,
which may be useful in other contexts.  Specifically, we implement a
new character device driver, ummunotify, that creates a /dev/ummunotify
node.  A userspace process can open this node read-only and use the fd
as follows:

 1. ioctl() to register/unregister an address range to watch in the
kernel (cf struct ummunotify_register_ioctl in <linux/ummunotify.h>).

 2. read() to retrieve events generated when a mapping in a watched
address range is invalidated (cf struct ummunotify_event in
<linux/ummunotify.h>).  select()/poll()/epoll() and SIGIO are
handled for this IO.

 3. mmap() one page at offset 0 to map a kernel page that contains a
generation counter that is incremented each time an event is
generated.  This allows userspace to have a fast path that checks
that no events have occurred without a system call.
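Step 3 allows a cheap check before any system call. A minimal sketch of that fast path, under stated assumptions: the counter is modeled as a 64-bit value reached through a plain pointer (with the real device it would point into the page mapped at offset 0), and struct reg_cache, cache_needs_sync(), cache_begin_sync() and cache_end_sync() are hypothetical names, not part of the ummunotify interface. Sampling the counter before draining events with read() is one reasonable ordering, chosen so that an event arriving during the drain still forces another sync.

```c
#include <stdint.h>

/* Hypothetical registration-cache state. */
struct reg_cache {
	const volatile uint64_t *gen;	/* counter page (mmap() at offset 0 on the device) */
	uint64_t seen;			/* counter value at the last full sync */
};

/* Fast path: an unchanged counter means no invalidation events were
 * generated since the last sync, so no read() is needed.
 * (Real code would also need to consider memory ordering.) */
static int cache_needs_sync(const struct reg_cache *c)
{
	return *c->gen != c->seen;
}

/* Sample the counter before draining events with read()... */
static uint64_t cache_begin_sync(const struct reg_cache *c)
{
	return *c->gen;
}

/* ...and record that sample once the drain is done, so anything that
 * arrived meanwhile makes cache_needs_sync() true again. */
static void cache_end_sync(struct reg_cache *c, uint64_t sampled)
{
	c->seen = sampled;
}
```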

Thanks to Jason Gunthorpe jgunthorpe at obsidianresearch.com for
suggestions on the interface design.  Also thanks to Jeff Squyres
jsquyres at cisco.com for prototyping support for this in Open MPI, which
helped find several bugs during development.

Signed-off-by: Roland Dreier rolandd at cisco.com
Signed-off-by: Eric B Munson ebmun...@us.ibm.com

---

Changes since v3:
 - Replaced [get|put]_user with copy_[from|to]_user to fix x86
   builds
---
 Documentation/Makefile|3 +-
 drivers/char/Kconfig  |   12 +
 drivers/char/Makefile |1 +
 drivers/char/ummunotify.c |  567 +
 4 files changed, 582 insertions(+), 1 deletions(-)
 create mode 100644 drivers/char/ummunotify.c

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fc7ea1..27ba76a 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -1,3 +1,4 @@
 obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \
-   pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/
+   pcmcia/ spi/ timers/ video4linux/ vm/ ummunotify/ \
+   watchdog/src/
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3141dd3..cf26019 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -,6 +,18 @@ config DEVPORT
depends on ISA || PCI
default y
 
+config UMMUNOTIFY
+   tristate "Userspace MMU notifications"
+   select MMU_NOTIFIER
+   help
+ The ummunotify (userspace MMU notification) driver creates a
+ character device that can be used by userspace libraries to
+ get notifications when an application's memory mapping
+ changes.  This is used, for example, by RDMA libraries to
+ improve the reliability of memory registration caching, since
+ the kernel's MMU notifications can be used to know precisely
+ when to shoot down a cached registration.
+
 source drivers/s390/char/Kconfig
 
 endmenu
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index f957edf..521e5de 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_NSC_GPIO)+= nsc_gpio.o
 obj-$(CONFIG_CS5535_GPIO)  += cs5535_gpio.o
 obj-$(CONFIG_GPIO_TB0219)  += tb0219.o
 obj-$(CONFIG_TELCLOCK) += tlclk.o
+obj-$(CONFIG_UMMUNOTIFY)   += ummunotify.o
 
 obj-$(CONFIG_MWAVE)+= mwave/
 obj-$(CONFIG_AGP)  += agp/
diff --git a/drivers/char/ummunotify.c b/drivers/char/ummunotify.c
new file mode 100644
index 000..c14df3f
--- /dev/null
+++ b/drivers/char/ummunotify.c
@@ -0,0 +1,567 @@
+/*
+ * Copyright (c) 2009 Cisco Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * 

Re: [PATCH v2 13/51] IB/qib: Add qib_driver.c

2010-04-07 Thread Roland Dreier
> +DEFINE_MUTEX(qib_mutex); /* general driver use */

Rather than having this ill-defined mutex that I think is going to make
it hard to understand the locking and get the lock ordering right, would
it be better to have well-defined locking rules?  AFAICT this mutex is
used in only two places, qib_diag.c and qib_file_op.c.  Are those two
uses protecting the same thing?  Or could we have two static mutexes,
one in each file, that protects what each file needs protected?
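A minimal sketch of the alternative described above, one lock per file that protects only that file's data, written in userspace C with POSIX mutexes for illustration (the kernel equivalent would be a static DEFINE_MUTEX() in each file). The diag/fops names and counters are hypothetical placeholders for whatever qib_diag.c and qib_file_op.c actually protect.

```c
#include <pthread.h>

/* "diag" side: its own lock, protecting only diag state. */
static pthread_mutex_t diag_mutex = PTHREAD_MUTEX_INITIALIZER;
static int diag_open_count;

/* "file ops" side: a separate lock, protecting only file-op state. */
static pthread_mutex_t fops_mutex = PTHREAD_MUTEX_INITIALIZER;
static int fops_open_count;

static int diag_open(void)
{
	int n;

	pthread_mutex_lock(&diag_mutex);
	n = ++diag_open_count;
	pthread_mutex_unlock(&diag_mutex);
	return n;
}

static int fops_open(void)
{
	int n;

	pthread_mutex_lock(&fops_mutex);
	n = ++fops_open_count;
	pthread_mutex_unlock(&fops_mutex);
	return n;
}
```

Because neither path ever holds both locks, there is no ordering between them to get wrong.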
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: Fork safe clarification

2010-04-07 Thread Matthew Small
I know many of the people here are very busy and these might be
questions that are not about your work at hand, but these intricacies
of libibverbs are hard to figure out without an intimate knowledge of
the drivers (from the user side).  I would really appreciate anyone
who would take the time to respond.

On Thu, Apr 1, 2010 at 10:09 AM, Matthew Small matthewtsm...@gmail.com wrote:
 I am trying to understand the behavior of libibverbs after it has been
 set into fork-safe mode, either by a successful call to ibv_fork_init() or
 by setting the environment variable IBV_FORK_SAFE.  For my purposes I
 would like to know the following:

 Are PDs, QPs and CQs created before a fork shared by the parent and child
 after fork() has returned (i.e., can both submit WRs, poll CQs, etc.)?


 What about MRs registered before the fork?  Even though the child doesn't
 have access to the parent's memory, can it still submit WRs on a QP with an
 MR created before the fork?


 What if the MR pages in the above scenario are accessible in both parent and
 child (shared memory)?  Are there complications with registering shared
 memory?


 In general, do pointers returned by libibverbs point into user/process
 address space (as ibv_mr pointers must) or into kernel space?  For example,
 if an unrelated process had another process's QP pointer, lkey, and a
 virtual address, could it post (almost certainly unsafely) a WR to the
 other process's QP?


 Sorry if the questions seem progressively more goofy and thanks in advance
 for any clarification.

 -Matt



[PATCH 35/37] librdmacm/mckey: use AF_IB for unmapped multicast addresses

2010-04-07 Thread Sean Hefty
If the user joins an unmapped multicast address, use AF_IB,
rather than AF_INET6, to communicate that information with the
kernel.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
This requires AF_IB support in the kernel.
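get_dst_addr() in the patch below relies on IB GIDs being written in the same 128-bit notation as IPv6 addresses, so inet_pton(AF_INET6, ...) can parse them. inet_pton() returns 1 on success, 0 if the string is not a valid address, and -1 if the address family is unsupported; the patch does not check that value. A minimal sketch of a checked parse, using a plain 16-byte buffer in place of struct sockaddr_ib (parse_gid() is a hypothetical helper):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a GID written in IPv6 notation (e.g. "ff0e::1") into 16 raw bytes.
 * Returns 0 on success, -1 if the string is not a valid address. */
static int parse_gid(const char *str, uint8_t gid[16])
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, str, &a) != 1)
		return -1;
	memcpy(gid, a.s6_addr, 16);
	return 0;
}
```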

 examples/mckey.c |   21 ++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/examples/mckey.c b/examples/mckey.c
index ddc3495..a6b5c4d 100644
--- a/examples/mckey.c
+++ b/examples/mckey.c
@@ -46,6 +46,7 @@
 #include <getopt.h>
 
 #include <rdma/rdma_cma.h>
+#include <infiniband/ib.h>
 
 struct cmatest_node {
int id;
@@ -67,9 +68,9 @@ struct cmatest {
int conn_index;
int connects_left;
 
-   struct sockaddr_in6 dst_in;
+   struct sockaddr_storage dst_in;
struct sockaddr *dst_addr;
-   struct sockaddr_in6 src_in;
+   struct sockaddr_storage src_in;
struct sockaddr *src_addr;
 };
 
@@ -460,6 +461,20 @@ static int get_addr(char *dst, struct sockaddr *addr)
return ret;
 }
 
+static int get_dst_addr(char *dst, struct sockaddr *addr)
+{
+   struct sockaddr_ib *sib;
+
+   if (!unmapped_addr)
+   return get_addr(dst, addr);
+
+   sib = (struct sockaddr_ib *) addr;
+   memset(sib, 0, sizeof *sib);
+   sib->sib_family = AF_IB;
+   inet_pton(AF_INET6, dst, &sib->sib_addr);
+   return 0;
+}
+
 static int run(void)
 {
int i, ret;
@@ -471,7 +486,7 @@ static int run(void)
return ret;
}
 
-   ret = get_addr(dst_addr, (struct sockaddr *) &test.dst_in);
+   ret = get_dst_addr(dst_addr, (struct sockaddr *) &test.dst_in);
if (ret)
return ret;
 





[PATCH 34/37] librdmacm: update man pages

2010-04-07 Thread Sean Hefty
Update man pages to reflect changes to the APIs.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 man/rdma_cm.7  |   56 +--
 man/rdma_create_ep.3   |   57 
 man/rdma_create_id.3   |   14 
 man/rdma_create_qp.3   |   15 +++--
 man/rdma_get_request.3 |   31 ++
 man/rdma_migrate_id.3  |6 -
 6 files changed, 165 insertions(+), 14 deletions(-)

diff --git a/man/rdma_cm.7 b/man/rdma_cm.7
index fd04959..ff5d489 100644
--- a/man/rdma_cm.7
+++ b/man/rdma_cm.7
@@ -8,17 +8,59 @@ Used to establish communication over RDMA transports.
 .SH NOTES
 The RDMA CM is a communication manager used to setup reliable, connected
 and unreliable datagram data transfers.  It provides an RDMA transport
-neutral interface for establishing connections.  The API is based on sockets,
-but adapted for queue pair (QP) based semantics: communication must be
-over a specific RDMA device, and data transfers are message based.
+neutral interface for establishing connections.  The API concepts are
+based on sockets, but adapted for queue pair (QP) based semantics:
+communication must be over a specific RDMA device, and data transfers
+are message based.
 .P
-The RDMA CM only provides the communication management (connection setup /
-teardown) portion of an RDMA API.  It works in conjunction with the verbs
+The RDMA CM can control both the QP and communication management (connection setup /
+teardown) portions of an RDMA API, or only the communication management
+piece.  It works in conjunction with the verbs
 API defined by the libibverbs library.  The libibverbs library provides the
-interfaces needed to send and receive data.
+underlying interfaces needed to send and receive data.
+.P
+The RDMA CM can operate asynchronously or synchronously.  The mode of
+operation is controlled by the user through the use of the rdma_cm event channel
+parameter in specific calls.  If an event channel is provided, an rdma_cm identifier
+will report its event data (results of connecting, for example) on that channel.
+If a channel is not provided, then all rdma_cm operations for the selected
+rdma_cm identifier will block until they complete.
+.SH RDMA VERBS
+The rdma_cm supports the full range of verbs available through the libibverbs
+library and interfaces.  However, it also provides wrapper functions for some
+of the more commonly used verbs functionality.  The full set of abstracted
+verb calls is:
+.P rdma_reg_msgs  - register an array of buffers for sending and receiving
+.P rdma_reg_read  - registers a buffer for RDMA read operations
+.P rdma_reg_write - registers a buffer for RDMA write operations
+.P rdma_dereg_mr  - deregisters a memory region
+.P
+.P rdma_post_recv  - post a buffer to receive a message
+.P rdma_post_send  - post a buffer to send a message
+.P rdma_post_read  - post an RDMA to read data into a buffer
+.P rdma_post_write - post an RDMA to send data from a buffer
+.P
+.P rdma_post_recvv  - post a vector of buffers to receive a message
+.P rdma_post_sendv  - post a vector of buffers to send a message
+.P rdma_post_readv  - post a vector of buffers to receive an RDMA read
+.P rdma_post_writev - post a vector of buffers to send an RDMA write
+.P
+.P rdma_post_ud_send - post a buffer to send a message on a UD QP
+.P
+.P rdma_get_send_comp - get completion status for a send or RDMA operation
+.P rdma_get_recv_comp - get information about a completed receive
 .SH CLIENT OPERATION
 This section provides a general overview of the basic operation for the active,
-or client, side of communication.  A general connection flow would be:
+or client, side of communication.  This flow assumes asynchronous operation with
+low level call details shown.  For
+synchronous operation, calls to rdma_create_event_channel, rdma_get_cm_event,
+rdma_ack_cm_event, and rdma_destroy_event_channel
+would be eliminated.  Abstracted calls, such as rdma_create_ep, encapsulate
+several of these calls under a single API.
+Users may also refer to the example applications for
+code samples.  A general connection flow would be:
+.IP rdma_getaddrinfo
+retrieve address information of the destination
 .IP rdma_create_event_channel
 create channel to receive events
 .IP rdma_create_id
diff --git a/man/rdma_create_ep.3 b/man/rdma_create_ep.3
new file mode 100644
index 000..ae07113
--- /dev/null
+++ b/man/rdma_create_ep.3
@@ -0,0 +1,57 @@
+.TH "RDMA_CREATE_EP" 3 "2007-08-06" "librdmacm" "Librdmacm Programmer's Manual" librdmacm
+.SH NAME
+rdma_create_ep \- Allocate a communication identifier and optional QP.
+.SH SYNOPSIS
+.B "#include <rdma/rdma_cma.h>"
+.P
+.B int rdma_create_ep
+.BI "(struct rdma_cm_id **" id ","
+.BI "struct rdma_addrinfo *" res ","
+.BI "struct ibv_pd *" pd ","
+.BI "struct ibv_qp_init_attr *" qp_init_attr ");"
+.SH ARGUMENTS
+.IP id 12
+A reference where the allocated communication identifier will be

[infiniband-diags] [0/3] support --diff and --diffcheck in ibnetdiscover

2010-04-07 Thread Al Chu
Hey Sasha,

The following set of patches implements --diff and --diffcheck options
in ibnetdiscover to let users diff the current ibnetdiscover state
against a previous one.  The goal of these options is to help system
administrators isolate/determine changes in the network quickly compared
to a previous state.  Here's an example:

#  ./ibnetdiscover --diff=orig.cache 

vendid=0x8f1
devid=0x5a30
sysimgguid=0x8f10400411f57
switchguid=0x8f10400411f56(8f10400411f56)
Switch  24 S-0008f10400411f56 # ISR9024D Voltaire base port 0 lid 11 lmc 0
 [14]  H-0002c90200219ef0[1](2c90200219ef1)  # wopr0 lid 64 4xDDR
 [19]  H-0002c903ff7c[1](2c903ff7d)  # wopr9 lid 48 4xDDR
 [20]  H-0002c903ff7c[1](2c903ff7d)  # wopr9 lid 4 4xDDR

 vendid=0x2c9
 devid=0x6282
 sysimgguid=0x2c90200219ef3
 caguid=0x2c90200219ef0
 Ca2 H-0002c90200219ef0  # wopr0
 [1](2c90200219ef1)S-0008f10400411f56[14]# lid 64 lmc 2 ISR9024D Voltaire lid 11 4xDDR

In this particular example, port 14 on the switch (which is connected to
node 'wopr0') was up before but is now down (and the associated CA is
noted too).  In addition, 'wopr9' is connected to port 20 instead of
port 19 on the switch.

By default --diff checks switches, CAs, routers, and port connections.
The --diffcheck option allows the user to specify which diff checks
to perform, and also adds checks for LIDs and/or node
descriptions.  More diff checks could be added later as needed.  For
example, the following checks only for LID differences on switches.

#  ./ibnetdiscover --diff=orig.cache --diffcheck=sw,lid

vendid=0x8f1
devid=0x5a30
sysimgguid=0x8f10400411f57
switchguid=0x8f10400411f56(8f10400411f56)
 Switch24 S-0008f10400411f56 # ISR9024D Voltaire base port 0 lid 11 lmc 0
 Switch24 S-0008f10400411f56 # ISR9024D Voltaire base port 0 lid 3 lmc 0
 [13]  H-0002c90200219e64[1](2c90200219e65)  # wopri lid 4 4xDDR
 [13]  H-0002c90200219e64[1](2c90200219e65)  # wopri lid 1 4xDDR

Others on the list may wonder how this is different than just using the
normal 'diff' tool.  The differences I can think of are:

1) This checks differences in the network, not in text.  This is
particularly important when LIDs, LMC, etc. are changed.  Otherwise
a normal diff reports many differences that aren't relevant.

2) This provides the relevant context in the diff output, showing
the system IDs a system administrator needs to identify which
ports on which switch have changed.  A normal diff may not include
that context.  The system administrator can of course use options
like --context in diff, but the goal is to make the diff output clear
and concise, without unnecessary junk.

3) As parallelization has been added to ibnetdiscover/libibnetdiscover,
this becomes more critical, because their output can be reordered.
A normal diff then stops being useful.
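Points 1 and 3 come down to comparing the fabric as a set of links rather than as lines of text. A minimal sketch of that idea, not the ibnetdiscover implementation: each switch link is keyed by (switch GUID, port), both snapshots are sorted by that key, and a merge walk counts ports that appeared, disappeared, or now lead to a different peer, regardless of the order in which discovery produced them. struct link and diff_links() are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical link record: switch port -> remote node/port. */
struct link {
	uint64_t sw_guid;
	int port;
	uint64_t remote_guid;
	int remote_port;
};

/* Order links by (switch GUID, port). */
static int link_cmp(const void *a, const void *b)
{
	const struct link *x = a, *y = b;

	if (x->sw_guid != y->sw_guid)
		return x->sw_guid < y->sw_guid ? -1 : 1;
	return (x->port > y->port) - (x->port < y->port);
}

/* Count differences between two snapshots. Both arrays are sorted in
 * place first, so discovery order does not affect the result. */
static int diff_links(struct link *old, size_t n_old,
		      struct link *cur, size_t n_cur)
{
	size_t i = 0, j = 0;
	int diffs = 0;

	qsort(old, n_old, sizeof *old, link_cmp);
	qsort(cur, n_cur, sizeof *cur, link_cmp);
	while (i < n_old || j < n_cur) {
		int c = (i == n_old) ? 1 :
			(j == n_cur) ? -1 : link_cmp(&old[i], &cur[j]);

		if (c < 0) {		/* port was connected, now is not */
			diffs++;
			i++;
		} else if (c > 0) {	/* port is newly connected */
			diffs++;
			j++;
		} else {		/* same port: did the peer change? */
			if (old[i].remote_guid != cur[j].remote_guid ||
			    old[i].remote_port != cur[j].remote_port)
				diffs++;
			i++;
			j++;
		}
	}
	return diffs;
}
```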

There are probably other minor advantages.  Even if minor output tweaks
happen to ibnetdiscover in the future, this can still work against old
cache files.

Al


-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



[PATCH 31/37] librdmacm: provide abstracted verb calls

2010-04-07 Thread Sean Hefty
Provide abstractions to the verb calls to simplify the user
interface for more casual verbs consumers.  Users still have
access to the full range of verbs functionality by calling
verbs directly.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 Makefile.am   |5 -
 include/rdma/rdma_verbs.h |  287 +
 2 files changed, 290 insertions(+), 2 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 8d86045..8aef24a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -31,7 +31,8 @@ librdmacmincludedir = $(includedir)/rdma $(includedir)/infiniband
 
 librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \
   include/rdma/rdma_cma.h \
-  include/infiniband/ib.h
+  include/infiniband/ib.h \
+  include/rdma/rdma_verbs.h
 
 man_MANS = \
man/rdma_accept.3 \
@@ -69,7 +70,7 @@ man_MANS = \
man/rdma_cm.7
 
 EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
-include/infiniband/ib.h \
+include/infiniband/ib.h include/rdma/rdma_verbs.h \
 src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
 
 dist-hook: librdmacm.spec
diff --git a/include/rdma/rdma_verbs.h b/include/rdma/rdma_verbs.h
new file mode 100644
index 000..05964c1
--- /dev/null
+++ b/include/rdma/rdma_verbs.h
@@ -0,0 +1,287 @@
+/*
+ * Copyright (c) 2010 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(RDMA_VERBS_H)
+#define RDMA_VERBS_H
+
+#include <assert.h>
+#include <infiniband/verbs.h>
+#include <rdma/rdma_cma.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Memory registration helpers.
+ */
+static inline struct ibv_mr *
+rdma_reg_msgs(struct rdma_cm_id *id, void *addr, size_t length)
+{
+   return ibv_reg_mr(id->qp->pd, addr, length, IBV_ACCESS_LOCAL_WRITE);
+}
+
+static inline struct ibv_mr *
+rdma_reg_read(struct rdma_cm_id *id, void *addr, size_t length)
+{
+   return ibv_reg_mr(id->qp->pd, addr, length, IBV_ACCESS_LOCAL_WRITE|
+   IBV_ACCESS_REMOTE_READ);
+}
+
+static inline struct ibv_mr *
+rdma_reg_write(struct rdma_cm_id *id, void *addr, size_t length)
+{
+   return ibv_reg_mr(id->qp->pd, addr, length, IBV_ACCESS_LOCAL_WRITE |
+   IBV_ACCESS_REMOTE_WRITE);
+}
+
+static inline int
+rdma_dereg_mr(struct ibv_mr *mr)
+{
+   return ibv_dereg_mr(mr);
+}
+
+
+/*
+ * Vectored send, receive, and RDMA operations.
+ * Support multiple scatter-gather entries.
+ */
+static inline int
+rdma_post_recvv(struct rdma_cm_id *id, void *context, struct ibv_sge *sgl,
+   int nsge)
+{
+   struct ibv_recv_wr wr, *bad;
+
+   wr.wr_id = (uintptr_t) context;
+   wr.next = NULL;
+   wr.sg_list = sgl;
+   wr.num_sge = nsge;
+
+   return ibv_post_recv(id->qp, &wr, &bad);
+}
+
+static inline int
+rdma_post_sendv(struct rdma_cm_id *id, void *context, struct ibv_sge *sgl,
+   int nsge, int flags)
+{
+   struct ibv_send_wr wr, *bad;
+
+   wr.wr_id = (uintptr_t) context;
+   wr.next = NULL;
+   wr.sg_list = sgl;
+   wr.num_sge = nsge;
+   wr.opcode = IBV_WR_SEND;
+   wr.send_flags = flags;
+
+   return ibv_post_send(id->qp, &wr, &bad);
+}
+
+static inline int
+rdma_post_readv(struct rdma_cm_id *id, void *context, struct ibv_sge *sgl,
+   int nsge, int flags, uint64_t remote_addr, uint32_t rkey)
+{
+   struct ibv_send_wr wr, *bad;
+
+   wr.wr_id = (uintptr_t) context;
+   wr.next = 

[PATCH 27/37] librdmacm: add support for IB ACM service

2010-04-07 Thread Sean Hefty
Allow the librdmacm to contact a service via sockets to obtain
address mapping and path record data.  The use of the service
is controlled through a build option (--with-ib_acm).  If the
library fails to contact the service, it falls back to using
the kernel services to resolve address and routing data.
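The try-the-service-then-fall-back flow can be sketched with plain sockets. This is an illustration under assumptions, not librdmacm code: service_connect() and pick_resolver() are hypothetical names, and the port is a parameter rather than the patch's fixed 6125. The sketch reports failure as -1 rather than 0, since 0 is a valid file descriptor.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to reach a local resolution service on 127.0.0.1:port.
 * Returns a connected fd, or -1 so the caller can fall back. */
static int service_connect(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);
	if (connect(fd, (struct sockaddr *) &addr, sizeof addr)) {
		close(fd);
		return -1;
	}
	return fd;
}

enum resolver { RESOLVE_VIA_SERVICE, RESOLVE_VIA_KERNEL };

/* Prefer the service when it was reachable, else use the kernel path. */
static enum resolver pick_resolver(int service_fd)
{
	return service_fd >= 0 ? RESOLVE_VIA_SERVICE : RESOLVE_VIA_KERNEL;
}
```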

Signed-off-by: Sean Hefty sean.he...@intel.com
---
Once IB ACM is proven, the build option can be removed.

 Makefile.am|2 -
 configure.in   |   14 +
 src/acm.c  |  160 
 src/addrinfo.c |3 +
 src/cma.c  |9 ++-
 src/cma.h  |   13 -
 6 files changed, 197 insertions(+), 4 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index be53c78..8d86045 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -12,7 +12,7 @@ else
 librdmacm_version_script =
 endif
 
-src_librdmacm_la_SOURCES = src/cma.c src/addrinfo.c
+src_librdmacm_la_SOURCES = src/cma.c src/addrinfo.c src/acm.c
 src_librdmacm_la_LDFLAGS = -version-info 1 -export-dynamic \
   $(librdmacm_version_script)
 src_librdmacm_la_DEPENDENCIES =  $(srcdir)/src/librdmacm.map
diff --git a/configure.in b/configure.in
index 1122966..3db4247 100644
--- a/configure.in
+++ b/configure.in
@@ -21,6 +21,15 @@ if test "$with_valgrind" != "" && test "$with_valgrind" != "no"; then
fi
 fi
 
+AC_ARG_WITH([ib_acm],
+AC_HELP_STRING([--with-ib_acm],
+  [Use IB ACM for route resolution - default NO]))
+
+if test "$with_ib_acm" != "" && test "$with_ib_acm" != "no"; then
+   AC_DEFINE([USE_IB_ACM], 1,
+ [Define to 1 to use IB ACM for endpoint resolution])
+fi
+
 AC_ARG_ENABLE(libcheck, [  --disable-libcheck  do not test for presence of ib libraries],
 [   if test $enableval = no; then
 disable_libcheck=yes
@@ -51,6 +60,11 @@ AC_CHECK_HEADER(valgrind/memcheck.h, [],
 AC_MSG_ERROR([valgrind requested but valgrind/memcheck.h not found.]))
 fi
 
+if test "$with_ib_acm" != "" && test "$with_ib_acm" != "no"; then
+AC_CHECK_HEADER(infiniband/acm.h, [],
+AC_MSG_ERROR([IB ACM requested but infiniband/acm.h not found.]))
+fi
+
 fi
 
 AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
diff --git a/src/acm.c b/src/acm.c
new file mode 100644
index 000..34fdf3c
--- /dev/null
+++ b/src/acm.c
@@ -0,0 +1,160 @@
+/*
+ * Copyright (c) 2010 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if HAVE_CONFIG_H
+#  include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netdb.h>
+#include <unistd.h>
+
+#include "cma.h"
+#include <rdma/rdma_cma.h>
+#include <infiniband/ib.h>
+#include <infiniband/sa.h>
+
+#ifdef USE_IB_ACM
+#include <infiniband/acm.h>
+
+static pthread_mutex_t acm_lock = PTHREAD_MUTEX_INITIALIZER;
+static int sock;
+static short server_port = 6125;
+
+void ucma_ib_init(void)
+{
+   struct sockaddr_in addr;
+   int ret;
+
+   sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+   if (sock < 0)
+   return;
+
+   memset(&addr, 0, sizeof addr);
+   addr.sin_family = AF_INET;
+   addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+   addr.sin_port = htons(server_port);
+   ret = connect(sock, (struct sockaddr *) &addr, sizeof(addr));
+   if (ret)
+   goto err;
+
+   return;
+
+err:
+   close(sock);
+   sock = 0;
+}
+
+void ucma_ib_cleanup(void)
+{
+   if (sock > 0) {
+   shutdown(sock, SHUT_RDWR);
+   close(sock);
+   }
+}
+
+static void ucma_ib_save_resp(struct rdma_addrinfo *rai, struct acm_resolve_msg *msg)
+{
+   

[PATCH 26/37] librdmacm: set src_addr in rdma_getaddrinfo

2010-04-07 Thread Sean Hefty
RDMA requires the user to allocate hardware resources before
establishing a connection.  To support this, the user must know
the source address that the connection will use to reach the
remote endpoint.  Modify rdma_getaddrinfo to determine an
appropriate source address based on the specified destination,
when a source address is not given.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 src/addrinfo.c |   60 
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/src/addrinfo.c b/src/addrinfo.c
index 15ae071..dfaf9d5 100644
--- a/src/addrinfo.c
+++ b/src/addrinfo.c
@@ -39,6 +39,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netdb.h>
+#include <unistd.h>
 
 #include "cma.h"
 #include <rdma/rdma_cma.h>
@@ -129,6 +130,48 @@ static int ucma_convert_to_rai(struct rdma_addrinfo *rai, struct addrinfo *ai)
    return 0;
 }
 
+static int ucma_resolve_src(struct rdma_addrinfo *rai)
+{
+   struct sockaddr *addr;
+   socklen_t len;
+   int ret, s;
+
+   s = socket(rai->ai_family, SOCK_DGRAM, IPPROTO_UDP);
+   if (s < 0)
+       return s;
+
+   ret = connect(s, rai->ai_dst_addr, rai->ai_dst_len);
+   if (ret)
+       goto err1;
+
+   addr = zalloc(rai->ai_dst_len);
+   if (!addr) {
+       ret = ERR(ENOMEM);
+       goto err1;
+   }
+
+   len = rai->ai_dst_len;
+   ret = getsockname(s, addr, &len);
+   if (ret)
+       goto err2;
+
+   if (addr->sa_family == AF_INET)
+       ((struct sockaddr_in *) addr)->sin_port = 0;
+   else
+       ((struct sockaddr_in6 *) addr)->sin6_port = 0;
+   rai->ai_src_addr = addr;
+   rai->ai_src_len = len;
+
+   close(s);
+   return 0;
+
+err2:
+   free(addr);
+err1:
+   close(s);
+   return ret;
+}
+
 int rdma_getaddrinfo(char *node, char *service,
 struct rdma_addrinfo *hints,
 struct rdma_addrinfo **res)
@@ -159,6 +202,23 @@ int rdma_getaddrinfo(char *node, char *service,
    if (ret)
        goto err2;
 
+   if (!rai->ai_src_len) {
+       if (hints && hints->ai_src_len) {
+           rai->ai_src_addr = zalloc(hints->ai_src_len);
+           if (!rai->ai_src_addr) {
+               ret = ERR(ENOMEM);
+               goto err2;
+           }
+           memcpy(rai->ai_src_addr, hints->ai_src_addr,
+                  hints->ai_src_len);
+           rai->ai_src_len = hints->ai_src_len;
+       } else {
+           ret = ucma_resolve_src(rai);
+           if (ret)
+               goto err2;
+       }
+   }
+
    freeaddrinfo(ai);
    *res = rai;
    return 0;



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
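The source-address lookup in `ucma_resolve_src()` relies on a standard sockets idiom. Calling `connect()` on a UDP socket performs route selection without sending any packet, and `getsockname()` then reports the local address the kernel picked for that route. Below is a minimal standalone sketch of the idiom. It assumes plain POSIX sockets and an IPv4 or IPv6 destination. The helper name `resolve_src` is hypothetical and is not part of librdmacm.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the local address the kernel would use to
 * reach dst. connect() on a UDP socket only selects a route (no packet
 * is sent); getsockname() then returns the bound local address. The
 * ephemeral port is cleared so that only the address is reported.
 * Returns 0 on success, -1 on error.
 */
static int resolve_src(const struct sockaddr *dst, socklen_t dst_len,
                       struct sockaddr_storage *src, socklen_t *src_len)
{
    int s = socket(dst->sa_family, SOCK_DGRAM, IPPROTO_UDP);
    if (s < 0)
        return -1;

    if (connect(s, dst, dst_len) != 0) {
        close(s);
        return -1;
    }

    memset(src, 0, sizeof(*src));
    *src_len = sizeof(*src);
    if (getsockname(s, (struct sockaddr *) src, src_len) != 0) {
        close(s);
        return -1;
    }
    close(s);

    /* Keep only the address; drop the port the kernel assigned. */
    if (src->ss_family == AF_INET)
        ((struct sockaddr_in *) src)->sin_port = 0;
    else if (src->ss_family == AF_INET6)
        ((struct sockaddr_in6 *) src)->sin6_port = 0;
    return 0;
}
```

For a destination of 127.0.0.1, this helper should report 127.0.0.1 as the source, with the port set to zero.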


[PATCH 19/37] librdmacm: add rdma_get_request

2010-04-07 Thread Sean Hefty
To simplify passive side operation and better support synchronous
operations, add rdma_get_request().  This function is called on the
listening side to retrieve a connection request event.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
Ideally, this call would have been rdma_accept, to match with the socket
accept call, but it was already taken.

 include/rdma/rdma_cma.h |5 +
 src/cma.c   |   38 ++
 src/librdmacm.map   |1 +
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index 1db559e..89013a0 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -381,6 +381,11 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
 int rdma_listen(struct rdma_cm_id *id, int backlog);
 
 /**
+ * rdma_get_request
+ */
+int rdma_get_request(struct rdma_cm_id *listen, struct rdma_cm_id **id);
+
+/**
  * rdma_accept - Called to accept a connection request.
  * @id: Connection identifier associated with the request.
  * @conn_param: Optional information needed to establish the connection.
diff --git a/src/cma.c b/src/cma.c
index 8aa7b05..9de33d4 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -1242,6 +1242,44 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
    return ucma_query_route(id);
 }
 
+int rdma_get_request(struct rdma_cm_id *listen, struct rdma_cm_id **id)
+{
+   struct cma_id_private *id_priv;
+   struct rdma_cm_event *event;
+   int ret;
+
+   id_priv = container_of(listen, struct cma_id_private, id);
+   if (!id_priv->sync)
+       return ERR(EINVAL);
+
+   if (listen->event) {
+       rdma_ack_cm_event(listen->event);
+       listen->event = NULL;
+   }
+
+   ret = rdma_get_cm_event(listen->channel, &event);
+   if (ret)
+       return ret;
+
+   if (event->status) {
+       ret = event->status;
+       goto err;
+   }
+
+   if (event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
+       ret = ERR(EINVAL);
+       goto err;
+   }
+
+   *id = event->id;
+   (*id)->event = event;
+   return 0;
+
+err:
+   listen->event = event;
+   return ret;
+}
+
 int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 {
    struct ucma_abi_accept *cmd;
diff --git a/src/librdmacm.map b/src/librdmacm.map
index 1f07102..f6af452 100644
--- a/src/librdmacm.map
+++ b/src/librdmacm.map
@@ -30,5 +30,6 @@ RDMACM_1.0 {
rdma_migrate_id;
rdma_getaddrinfo;
rdma_freeaddrinfo;
+   rdma_get_request;
local: *;
 };
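`rdma_get_request()` enforces the rule that each `rdma_cm_id` holds at most one unacknowledged event:

- The event the listener still holds is acked before the next one is fetched.
- A successful connect request is handed to the new id.
- Anything else stays parked on the listener until the following call.

The model below illustrates that hand-off using hypothetical `model_*` types that stand in for the real librdmacm structures. It makes no RDMA calls and is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rdma_cm_event and rdma_cm_id; they model
 * only the event-ownership rule, not the real librdmacm types. */
struct model_id;

enum model_ev_type { MODEL_CONNECT_REQUEST, MODEL_OTHER_EVENT };

struct model_ev {
    enum model_ev_type type;
    int status;                 /* nonzero means the event reports an error */
    struct model_id *new_id;    /* id created for a connect request */
    int acked;                  /* set once the event has been released */
};

struct model_id {
    struct model_ev *event;     /* at most one unacknowledged event per id */
};

static void model_ack(struct model_ev *ev)
{
    ev->acked = 1;
}

/* Follows the control flow of rdma_get_request(): release whatever the
 * listener still holds, take the next event, then either hand it to the
 * new id (success) or park it on the listener (failure). */
static int model_get_request(struct model_id *listen, struct model_ev *next,
                             struct model_id **out)
{
    if (listen->event) {
        model_ack(listen->event);
        listen->event = NULL;
    }

    if (next->status || next->type != MODEL_CONNECT_REQUEST) {
        listen->event = next;   /* released on the next call */
        return -1;
    }

    *out = next->new_id;
    (*out)->event = next;       /* the new id now owns the request event */
    return 0;
}
```

A failed call does not lose the event. It remains attached to the listener, so it is still acknowledged exactly once.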





[PATCH 17/37] librdmacm: expose ucma_init to other internal modules

2010-04-07 Thread Sean Hefty
Remove the static qualifier from ucma_init and declare it in
cma.h.  The address resolution module will need access to
this function.
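This patch also gives `ucma_init()` a "quick check without lock" of `cma_dev_cnt` and then re-checks under the mutex. This is the double-checked initialization pattern. Under the C11 memory model, an unlocked read of a plain `int` that another thread writes is formally a data race. The sketch below therefore shows the same pattern with C11 atomics, so the fast path is well-defined. The names `lib_init`, `dev_cnt` and `init_runs` are hypothetical. `pthread_once()` is another standard way to get run-once semantics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int dev_cnt;          /* nonzero once initialization is done */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_runs;               /* how many times the slow path ran */

/* Double-checked initialization: a lock-free fast path for the common
 * "already initialized" case, then a re-check under the mutex so only
 * one thread performs the setup. */
static int lib_init(void)
{
    if (atomic_load_explicit(&dev_cnt, memory_order_acquire))
        return 0;

    pthread_mutex_lock(&init_lock);
    if (!atomic_load_explicit(&dev_cnt, memory_order_relaxed)) {
        init_runs++;                /* stand-in for device discovery */
        atomic_store_explicit(&dev_cnt, 2, memory_order_release);
    }
    pthread_mutex_unlock(&init_lock);
    return 0;
}

static void *init_worker(void *arg)
{
    (void) arg;
    lib_init();
    return NULL;
}
```

No matter how many threads call `lib_init()` concurrently, the setup block runs once. Every later call returns from the atomic fast path without touching the mutex.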

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 src/cma.c |   14 +-
 src/cma.h |2 ++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/cma.c b/src/cma.c
index 6ef4b96..8aa7b05 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -188,13 +188,17 @@ static int check_abi_version(void)
    return 0;
 }
 
-static int ucma_init(void)
+int ucma_init(void)
 {
    struct ibv_device **dev_list = NULL;
    struct cma_device *cma_dev;
    struct ibv_device_attr attr;
    int i, ret, dev_cnt;
 
+   /* Quick check without lock to see if we're already initialized */
+   if (cma_dev_cnt)
+       return 0;
+
    pthread_mutex_lock(&mut);
    if (cma_dev_cnt) {
        pthread_mutex_unlock(&mut);
@@ -271,7 +275,7 @@ struct ibv_context **rdma_get_devices(int *num_devices)
    struct ibv_context **devs = NULL;
    int i;
 
-   if (!cma_dev_cnt && ucma_init())
+   if (ucma_init())
        goto out;
 
    devs = malloc(sizeof *devs * (cma_dev_cnt + 1));
@@ -301,7 +305,7 @@ struct rdma_event_channel *rdma_create_event_channel(void)
 {
    struct rdma_event_channel *channel;
 
-   if (!cma_dev_cnt && ucma_init())
+   if (ucma_init())
        return NULL;
 
    channel = malloc(sizeof *channel);
@@ -396,7 +400,7 @@ int rdma_create_id(struct rdma_event_channel *channel,
    void *msg;
    int ret, size;
 
-   ret = cma_dev_cnt ? 0 : ucma_init();
+   ret = ucma_init();
    if (ret)
        return ret;
 
@@ -1712,7 +1716,7 @@ int rdma_get_cm_event(struct rdma_event_channel *channel,
    void *msg;
    int ret, size;
 
-   ret = cma_dev_cnt ? 0 : ucma_init();
+   ret = ucma_init();
    if (ret)
        return ret;
 
diff --git a/src/cma.h b/src/cma.h
index 92e771e..06ca38c 100644
--- a/src/cma.h
+++ b/src/cma.h
@@ -82,5 +82,7 @@ static inline void *zalloc(size_t size)
    return buf;
 }
 
+int ucma_init();
+
 #endif /* CMA_H */
 





[PATCH 15/37] librdmacm: allow user to specify max RDMA resources

2010-04-07 Thread Sean Hefty
Allow the user to request that the library select the maximum
RDMA read values available when establishing a connection.  The
library selects the maximum based on local hardware limitations
and connection request data.
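The patch reserves 0xFF as a sentinel meaning "use the maximum":

- `ucma_valid_param()` skips the device-limit check for it.
- `rdma_connect()` substitutes the device's limit.

The hypothetical helpers below only restate those two rules. The test exercises the initiator-depth branch shown in the diff; the responder-resources branch is cut off in this archive copy.

```c
#include <assert.h>
#include <stdint.h>

/* Same sentinel values as the RDMA_MAX_RESP_RES / RDMA_MAX_INIT_DEPTH
 * enum added by this patch. */
enum { MAX_RESP_RES = 0xFF, MAX_INIT_DEPTH = 0xFF };

/* Validation rule from ucma_valid_param(): the sentinel is always
 * accepted; any other value must not exceed the device limit. */
static int depth_is_valid(uint8_t requested, uint8_t sentinel, uint8_t dev_max)
{
    return requested == sentinel || requested <= dev_max;
}

/* Resolution rule from rdma_connect(): the sentinel selects the device
 * limit; any other value is used as given. */
static uint8_t resolve_depth(uint8_t requested, uint8_t sentinel, uint8_t dev_max)
{
    return requested == sentinel ? dev_max : requested;
}
```

Because the fields are `uint8_t`, an explicit request for 255 cannot be distinguished from the sentinel. That value always means "use the device maximum".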

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma.h |5 +++
 src/cma.c   |   83 +++
 src/cma.h   |2 +
 3 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index d8cbb91..f50b4dd 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -121,6 +121,11 @@ struct rdma_cm_id {
    struct ibv_cq       *recv_cq;
 };
 
+enum {
+   RDMA_MAX_RESP_RES = 0xFF,
+   RDMA_MAX_INIT_DEPTH = 0xFF
+};
+
 struct rdma_conn_param {
    const void *private_data;
    uint8_t private_data_len;
diff --git a/src/cma.c b/src/cma.c
index 805aca3..b8d57a5 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -112,6 +112,8 @@ struct cma_id_private {
    pthread_mutex_t   mut;
    uint32_t          handle;
    struct cma_multicast *mc_list;
+   uint8_t           initiator_depth;
+   uint8_t           responder_resources;
 };
 
 struct cma_multicast {
@@ -850,8 +852,7 @@ static int rdma_init_qp_attr(struct rdma_cm_id *id, struct ibv_qp_attr *qp_attr,
    return 0;
 }
 
-static int ucma_modify_qp_rtr(struct rdma_cm_id *id,
-                 struct rdma_conn_param *conn_param)
+static int ucma_modify_qp_rtr(struct rdma_cm_id *id, uint8_t resp_res)
 {
    struct ibv_qp_attr qp_attr;
    int qp_attr_mask, ret;
@@ -874,13 +875,12 @@ static int ucma_modify_qp_rtr(struct rdma_cm_id *id,
    if (ret)
        return ret;
 
-   if (conn_param)
-       qp_attr.max_dest_rd_atomic = conn_param->responder_resources;
+   if (resp_res != RDMA_MAX_RESP_RES)
+       qp_attr.max_dest_rd_atomic = resp_res;
    return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);
 }
 
-static int ucma_modify_qp_rts(struct rdma_cm_id *id,
-                 struct rdma_conn_param *conn_param)
+static int ucma_modify_qp_rts(struct rdma_cm_id *id, uint8_t init_depth)
 {
    struct ibv_qp_attr qp_attr;
    int qp_attr_mask, ret;
@@ -890,8 +890,8 @@ static int ucma_modify_qp_rts(struct rdma_cm_id *id,
    if (ret)
        return ret;
 
-   if (conn_param)
-       qp_attr.max_rd_atomic = conn_param->initiator_depth;
+   if (init_depth != RDMA_MAX_INIT_DEPTH)
+       qp_attr.max_rd_atomic = init_depth;
    return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);
 }
 
@@ -1128,28 +1128,31 @@ void rdma_destroy_qp(struct rdma_cm_id *id)
 }
 
 static int ucma_valid_param(struct cma_id_private *id_priv,
-               struct rdma_conn_param *conn_param)
+               struct rdma_conn_param *param)
 {
    if (id_priv->id.ps != RDMA_PS_TCP)
        return 0;
 
-   if ((conn_param->responder_resources >
-        id_priv->cma_dev->max_responder_resources) ||
-       (conn_param->initiator_depth >
-        id_priv->cma_dev->max_initiator_depth))
+   if ((param->responder_resources != RDMA_MAX_RESP_RES) &&
+       (param->responder_resources > id_priv->cma_dev->max_responder_resources))
+       return ERR(EINVAL);
+
+   if ((param->initiator_depth != RDMA_MAX_INIT_DEPTH) &&
+       (param->initiator_depth > id_priv->cma_dev->max_initiator_depth))
        return ERR(EINVAL);
 
    return 0;
 }
 
-static void ucma_copy_conn_param_to_kern(struct ucma_abi_conn_param *dst,
+static void ucma_copy_conn_param_to_kern(struct cma_id_private *id_priv,
+                    struct ucma_abi_conn_param *dst,
                     struct rdma_conn_param *src,
                     uint32_t qp_num, uint8_t srq)
 {
    dst->qp_num = qp_num;
    dst->srq = srq;
-   dst->responder_resources = src->responder_resources;
-   dst->initiator_depth = src->initiator_depth;
+   dst->responder_resources = id_priv->responder_resources;
+   dst->initiator_depth = id_priv->initiator_depth;
    dst->flow_control = src->flow_control;
    dst->retry_count = src->retry_count;
    dst->rnr_retry_count = src->rnr_retry_count;
@@ -1174,15 +1177,24 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
    if (ret)
        return ret;
 
+   if (conn_param->initiator_depth != RDMA_MAX_INIT_DEPTH)
+       id_priv->initiator_depth = conn_param->initiator_depth;
+   else
+       id_priv->initiator_depth = id_priv->cma_dev->max_initiator_depth;
+   if (conn_param->responder_resources != RDMA_MAX_RESP_RES)
+       id_priv->responder_resources = conn_param->responder_resources;
+   else
+       id_priv->responder_resources =

[PATCH 14/37] librdmacm: make CQs optional for rdma_create_qp

2010-04-07 Thread Sean Hefty
Allow the user to specify NULL for the send and receive CQs when
creating a QP through rdma_create_qp.  In that case, the librdmacm
automatically creates the CQs for the user, along with their
completion channels.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma.h |4 +++
 src/cma.c   |   74 ---
 2 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index ccf6cd4..d8cbb91 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -115,6 +115,10 @@ struct rdma_cm_id {
    enum rdma_port_space ps;
    uint8_t              port_num;
    struct rdma_cm_event *event;
+   struct ibv_comp_channel *send_cq_channel;
+   struct ibv_cq   *send_cq;
+   struct ibv_comp_channel *recv_cq_channel;
+   struct ibv_cq   *recv_cq;
 };
 
 struct rdma_conn_param {
diff --git a/src/cma.c b/src/cma.c
index 0587ab3..805aca3 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -1025,6 +1025,63 @@ static int ucma_init_ud_qp(struct cma_id_private *id_priv, struct ibv_qp *qp)
    return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_SQ_PSN);
 }
 
+static void ucma_destroy_cqs(struct rdma_cm_id *id)
+{
+   if (id->recv_cq)
+       ibv_destroy_cq(id->recv_cq);
+
+   if (id->recv_cq_channel)
+       ibv_destroy_comp_channel(id->recv_cq_channel);
+
+   if (id->send_cq)
+       ibv_destroy_cq(id->send_cq);
+
+   if (id->send_cq_channel)
+       ibv_destroy_comp_channel(id->send_cq_channel);
+}
+
+static int ucma_create_cqs(struct rdma_cm_id *id, struct ibv_qp_init_attr *attr)
+{
+   int ret;
+
+   if (!attr->recv_cq) {
+       id->recv_cq_channel = ibv_create_comp_channel(id->verbs);
+       if (!id->recv_cq_channel) {
+           ret = ERR(ENOMEM);
+           goto err;
+       }
+
+       id->recv_cq = ibv_create_cq(id->verbs, attr->cap.max_recv_wr,
+                       id, id->recv_cq_channel, 0);
+       if (!id->recv_cq) {
+           ret = ERR(ENOMEM);
+           goto err;
+       }
+       attr->recv_cq = id->recv_cq;
+   }
+
+   if (!attr->send_cq) {
+       id->send_cq_channel = ibv_create_comp_channel(id->verbs);
+       if (!id->send_cq_channel) {
+           ret = ERR(ENOMEM);
+           goto err;
+       }
+
+       id->send_cq = ibv_create_cq(id->verbs, attr->cap.max_send_wr,
+                       id, id->send_cq_channel, 0);
+       if (!id->send_cq) {
+           ret = ERR(ENOMEM);
+           goto err;
+       }
+       attr->send_cq = id->send_cq;
+   }
+
+   return 0;
+err:
+   ucma_destroy_cqs(id);
+   return ret;
+}
+
 int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd,
   struct ibv_qp_init_attr *qp_init_attr)
 {
@@ -1038,27 +1095,36 @@ int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd,
    else if (id->verbs != pd->context)
        return ERR(EINVAL);
 
+   ret = ucma_create_cqs(id, qp_init_attr);
+   if (ret)
+       return ret;
+
    qp = ibv_create_qp(pd, qp_init_attr);
-   if (!qp)
-       return ERR(ENOMEM);
+   if (!qp) {
+       ret = ERR(ENOMEM);
+       goto err1;
+   }
 
    if (ucma_is_ud_ps(id->ps))
        ret = ucma_init_ud_qp(id_priv, qp);
    else
        ret = ucma_init_conn_qp(id_priv, qp);
    if (ret)
-       goto err;
+       goto err2;
 
    id->qp = qp;
    return 0;
-err:
+err2:
    ibv_destroy_qp(qp);
+err1:
+   ucma_destroy_cqs(id);
    return ret;
 }
 
 void rdma_destroy_qp(struct rdma_cm_id *id)
 {
    ibv_destroy_qp(id->qp);
+   ucma_destroy_cqs(id);
 }
 
 static int ucma_valid_param(struct cma_id_private *id_priv,
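The key property of `ucma_create_cqs()` and `ucma_destroy_cqs()` is ownership: the library creates, and later destroys, only the CQs the caller left NULL. Caller-supplied CQs are never touched. Below is a minimal model of that ownership rule, with these simplifications:

- `malloc` stands in for `ibv_create_cq()`.
- Completion channels are omitted.
- All `model_*` types and helper names are hypothetical.
- Unlike the patch, the sketch clears the pointers after freeing them, so a repeated cleanup is harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a "CQ" is just a heap object here, and
 * model_attr mirrors the send_cq/recv_cq/cap fields the patch reads
 * from struct ibv_qp_init_attr. */
struct model_cq { int depth; };

struct model_attr {
    struct model_cq *send_cq, *recv_cq;
    int max_send_wr, max_recv_wr;
};

/* CQs the library created itself and therefore must destroy. */
struct model_owned { struct model_cq *send_cq, *recv_cq; };

static void destroy_owned(struct model_owned *o)
{
    free(o->recv_cq);
    free(o->send_cq);
    o->recv_cq = o->send_cq = NULL;  /* make a second cleanup harmless */
}

/* Create only the CQs the caller left NULL and record them in *o; on any
 * failure, release whatever was created so far. */
static int create_missing_cqs(struct model_owned *o, struct model_attr *attr)
{
    if (!attr->recv_cq) {
        o->recv_cq = malloc(sizeof *o->recv_cq);
        if (!o->recv_cq)
            goto err;
        o->recv_cq->depth = attr->max_recv_wr;
        attr->recv_cq = o->recv_cq;
    }
    if (!attr->send_cq) {
        o->send_cq = malloc(sizeof *o->send_cq);
        if (!o->send_cq)
            goto err;
        o->send_cq->depth = attr->max_send_wr;
        attr->send_cq = o->send_cq;
    }
    return 0;
err:
    destroy_owned(o);
    return -1;
}
```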





[PATCH 12/37] librdmacm: support synchronous rdma_cm_id's

2010-04-07 Thread Sean Hefty
Allow the user to specify NULL as the rdma_event_channel in
order to indicate that the rdma_cm_id should process all requests
synchronously.
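With a NULL channel, the id gets a private event channel. Each call (`rdma_resolve_addr()`, `rdma_resolve_route()`, `rdma_connect()`, `rdma_accept()`) then waits in `ucma_complete()` for the resulting event and returns that event's status. The model below shows this blocking behavior, with a pipe standing in for the channel. The names `model_id`, `post_event` and `model_complete` are hypothetical. The model also omits the event bookkeeping: the real code keeps the last event in `id->event` and acks it on the next call.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical model: a pipe stands in for the rdma_event_channel fd and
 * each event is reduced to an int status. */
struct model_id {
    int sync;        /* 1 if the id owns a private channel */
    int chan[2];     /* chan[0]: read end, chan[1]: write end */
};

/* Stand-in for the kernel queueing a completion event. */
static int post_event(struct model_id *id, int status)
{
    ssize_t n = write(id->chan[1], &status, sizeof status);
    return n == (ssize_t) sizeof status ? 0 : -1;
}

/* Mirrors ucma_complete(): asynchronous ids return at once and the
 * caller collects the event later; synchronous ids block until the
 * next event arrives and report its status. */
static int model_complete(struct model_id *id)
{
    int status;

    if (!id->sync)
        return 0;
    if (read(id->chan[0], &status, sizeof status) != (ssize_t) sizeof status)
        return -1;
    return status;
}
```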

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma.h |1 +
 src/cma.c   |   93 +++
 2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index a071a9b..83418c3 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -114,6 +114,7 @@ struct rdma_cm_id {
    struct rdma_route    route;
    enum rdma_port_space ps;
    uint8_t              port_num;
+   struct rdma_cm_event *event;
 };
 
 struct rdma_conn_param {
diff --git a/src/cma.c b/src/cma.c
index 4025aeb..c7a3a7b 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -106,6 +106,7 @@ struct cma_id_private {
    struct cma_device *cma_dev;
    int               events_completed;
    int               connect_error;
+   int               sync;
    pthread_cond_t    cond;
    pthread_mutex_t   mut;
    uint32_t          handle;
@@ -333,6 +334,9 @@ static void ucma_free_id(struct cma_id_private *id_priv)
    pthread_mutex_destroy(&id_priv->mut);
    if (id_priv->id.route.path_rec)
        free(id_priv->id.route.path_rec);
+
+   if (id_priv->sync)
+       rdma_destroy_event_channel(id_priv->id.channel);
    free(id_priv);
 }
 
@@ -348,7 +352,16 @@ static struct cma_id_private *ucma_alloc_id(struct rdma_event_channel *channel,
 
    id_priv->id.context = context;
    id_priv->id.ps = ps;
-   id_priv->id.channel = channel;
+
+   if (!channel) {
+       id_priv->id.channel = rdma_create_event_channel();
+       if (!id_priv->id.channel)
+           goto err;
+       id_priv->sync = 1;
+   } else {
+       id_priv->id.channel = channel;
+   }
+
    pthread_mutex_init(&id_priv->mut, NULL);
    if (pthread_cond_init(&id_priv->cond, NULL))
        goto err;
@@ -381,7 +394,7 @@ int rdma_create_id(struct rdma_event_channel *channel,
    cmd->uid = (uintptr_t) id_priv;
    cmd->ps = ps;
 
-   ret = write(channel->fd, msg, size);
+   ret = write(id_priv->id.channel->fd, msg, size);
    if (ret != size)
        goto err;
 
@@ -424,6 +437,9 @@ int rdma_destroy_id(struct rdma_cm_id *id)
    if (ret < 0)
        return ret;
 
+   if (id_priv->id.event)
+       rdma_ack_cm_event(id_priv->id.event);
+
    pthread_mutex_lock(&id_priv->mut);
    while (id_priv->events_completed < ret)
        pthread_cond_wait(&id_priv->cond, &id_priv->mut);
@@ -694,6 +710,25 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
    return ucma_query_route(id);
 }
 
+static int ucma_complete(struct cma_id_private *id_priv)
+{
+   int ret;
+
+   if (!id_priv->sync)
+       return 0;
+
+   if (id_priv->id.event) {
+       rdma_ack_cm_event(id_priv->id.event);
+       id_priv->id.event = NULL;
+   }
+
+   ret = rdma_get_cm_event(id_priv->id.channel, &id_priv->id.event);
+   if (ret)
+       return ret;
+
+   return id_priv->id.event->status;
+}
+
 static int rdma_resolve_addr2(struct rdma_cm_id *id, struct sockaddr *src_addr,
  socklen_t src_len, struct sockaddr *dst_addr,
  socklen_t dst_len, int timeout_ms)
@@ -718,7 +753,7 @@ static int rdma_resolve_addr2(struct rdma_cm_id *id, struct sockaddr *src_addr,
        return (ret >= 0) ? ERR(ENODATA) : -1;
 
    memcpy(&id->route.addr.dst_addr, dst_addr, dst_len);
-   return 0;
+   return ucma_complete(id_priv);
 }
 
 int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
@@ -751,7 +786,7 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
        return (ret >= 0) ? ERR(ENODATA) : -1;
 
    memcpy(&id->route.addr.dst_addr, dst_addr, dst_len);
-   return 0;
+   return ucma_complete(id_priv);
 }
 
 int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
@@ -770,7 +805,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
    if (ret != size)
        return (ret >= 0) ? ERR(ENODATA) : -1;
 
-   return 0;
+   return ucma_complete(id_priv);
 }
 
 static int ucma_is_ud_ps(enum rdma_port_space ps)
@@ -1074,7 +1109,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
    if (ret != size)
        return (ret >= 0) ? ERR(ENODATA) : -1;
 
-   return 0;
+   return ucma_complete(id_priv);
 }
 
 int rdma_listen(struct rdma_cm_id *id, int backlog)
@@ -1139,7 +1174,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
        return (ret >= 0) ? ERR(ENODATA) : -1;
    }
 
-   return 0;
+   return ucma_complete(id_priv);
 }
 
 int 

[PATCH 11/37] librdmacm: add zalloc call

2010-04-07 Thread Sean Hefty
Signed-off-by: Sean Hefty sean.he...@intel.com
---

 src/cma.c |6 ++
 src/cma.h |   10 ++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/cma.c b/src/cma.c
index a85448b..4025aeb 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -342,11 +342,10 @@ static struct cma_id_private *ucma_alloc_id(struct rdma_event_channel *channel,
 {
    struct cma_id_private *id_priv;
 
-   id_priv = malloc(sizeof *id_priv);
+   id_priv = zalloc(sizeof *id_priv);
    if (!id_priv)
        return NULL;
 
-   memset(id_priv, 0, sizeof *id_priv);
    id_priv->id.context = context;
    id_priv->id.ps = ps;
    id_priv->id.channel = channel;
@@ -1228,11 +1227,10 @@ static int rdma_join_multicast2(struct rdma_cm_id *id, struct sockaddr *addr,
    int ret, size;
 
    id_priv = container_of(id, struct cma_id_private, id);
-   mc = malloc(sizeof *mc);
+   mc = zalloc(sizeof *mc);
    if (!mc)
        return ERR(ENOMEM);
 
-   memset(mc, 0, sizeof *mc);
    mc->context = context;
    mc->id_priv = id_priv;
    memcpy(&mc->addr, addr, addrlen);
diff --git a/src/cma.h b/src/cma.h
index 1c0ab8b..fcfb1f7 100644
--- a/src/cma.h
+++ b/src/cma.h
@@ -42,6 +42,7 @@
 #include <errno.h>
 #include <endian.h>
 #include <byteswap.h>
+#include <string.h>
 
 #ifdef INCLUDE_VALGRIND
 #   include <valgrind/memcheck.h>
@@ -70,5 +71,14 @@ static inline int ERR(int err)
    return -1;
 }
 
+static inline void *zalloc(size_t size)
+{
+   void *buf;
+
+   if ((buf = malloc(size)))
+       memset(buf, 0, size);
+   return buf;
+}
+
 #endif /* CMA_H */
 





[PATCH 10/37] librdmacm: move common definitions to internal header file

2010-04-07 Thread Sean Hefty
Signed-off-by: Sean Hefty sean.he...@intel.com
---

 Makefile.am |2 +-
 src/cma.c   |   28 +-
 src/cma.h   |   74 +++
 3 files changed, 76 insertions(+), 28 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 2898ad9..c9be437 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -70,7 +70,7 @@ man_MANS = \
 
 EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
 include/infiniband/ib.h \
-src/librdmacm.map librdmacm.spec.in $(man_MANS)
+src/cma.h src/librdmacm.map librdmacm.spec.in $(man_MANS)
 
 dist-hook: librdmacm.spec
cp librdmacm.spec $(distdir)
diff --git a/src/cma.c b/src/cma.c
index c83d9d2..a85448b 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -50,39 +50,13 @@
 #include <byteswap.h>
 #include <stddef.h>
 
+#include "cma.h"
 #include <infiniband/driver.h>
 #include <infiniband/marshall.h>
 #include <rdma/rdma_cma.h>
 #include <rdma/rdma_cma_abi.h>
 #include <infiniband/ib.h>
 
-#ifdef INCLUDE_VALGRIND
-#   include <valgrind/memcheck.h>
-#   ifndef VALGRIND_MAKE_MEM_DEFINED
-#   warning Valgrind requested, but VALGRIND_MAKE_MEM_DEFINED undefined
-#   endif
-#endif
-
-#ifndef VALGRIND_MAKE_MEM_DEFINED
-#   define VALGRIND_MAKE_MEM_DEFINED(addr,len)
-#endif
-
-#define PFX "librdmacm: "
-
-#if __BYTE_ORDER == __LITTLE_ENDIAN
-static inline uint64_t htonll(uint64_t x) { return bswap_64(x); }
-static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); }
-#else
-static inline uint64_t htonll(uint64_t x) { return x; }
-static inline uint64_t ntohll(uint64_t x) { return x; }
-#endif
-
-static inline int ERR(int err)
-{
-   errno = err;
-   return -1;
-}
-
 #define CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, type, size) \
 do {\
struct ucma_abi_cmd_hdr *hdr; \
diff --git a/src/cma.h b/src/cma.h
new file mode 100644
index 000..1c0ab8b
--- /dev/null
+++ b/src/cma.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2005-2010 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#if !defined(CMA_H)
+#define CMA_H
+
+#if HAVE_CONFIG_H
+#  include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdlib.h>
+#include <errno.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#ifdef INCLUDE_VALGRIND
+#   include <valgrind/memcheck.h>
+#   ifndef VALGRIND_MAKE_MEM_DEFINED
+#   warning Valgrind requested, but VALGRIND_MAKE_MEM_DEFINED undefined
+#   endif
+#endif
+
+#ifndef VALGRIND_MAKE_MEM_DEFINED
+#   define VALGRIND_MAKE_MEM_DEFINED(addr,len)
+#endif
+
+#define PFX "librdmacm: "
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+static inline uint64_t htonll(uint64_t x) { return bswap_64(x); }
+static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); }
+#else
+static inline uint64_t htonll(uint64_t x) { return x; }
+static inline uint64_t ntohll(uint64_t x) { return x; }
+#endif
+
+static inline int ERR(int err)
+{
+   errno = err;
+   return -1;
+}
+
+#endif /* CMA_H */
+
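The `htonll()`/`ntohll()` helpers moved into cma.h swap bytes only on little-endian hosts. One way to sanity-check them is to compare their output against a byte-by-byte encoding that is endian-independent by construction. In the sketch below, `put_be64` and `htonll_matches` are hypothetical test helpers, not part of librdmacm. The `htonll()`/`ntohll()` definitions are copied from the header and rely on glibc's `<endian.h>` and `<byteswap.h>`.

```c
#include <assert.h>
#include <byteswap.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Same definitions as in cma.h. */
#if __BYTE_ORDER == __LITTLE_ENDIAN
static inline uint64_t htonll(uint64_t x) { return bswap_64(x); }
static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); }
#else
static inline uint64_t htonll(uint64_t x) { return x; }
static inline uint64_t ntohll(uint64_t x) { return x; }
#endif

/* Portable reference: write x most-significant byte first, which is
 * network byte order regardless of the host's endianness. */
static void put_be64(uint8_t out[8], uint64_t x)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t) x;
        x >>= 8;
    }
}

/* Returns 1 if the in-memory bytes of htonll(x) match put_be64(x). */
static int htonll_matches(uint64_t x)
{
    uint8_t expect[8];
    uint64_t net = htonll(x);

    put_be64(expect, x);
    return memcmp(&net, expect, sizeof expect) == 0;
}
```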





[PATCH 5/37] librdmacm: replace query_route call with separate queries

2010-04-07 Thread Sean Hefty
To support other address families and multiple path records,
replace the query_route call with specific query calls to obtain
only the desired information.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 src/cma.c |   69 +
 1 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/src/cma.c b/src/cma.c
index 2a70d20..c57d166 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -161,6 +161,7 @@ static struct cma_device *cma_dev_array;
 static int cma_dev_cnt;
 static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
 static int abi_ver = RDMA_USER_CM_MAX_ABI_VERSION;
+int af_ib_support;
 
 #define container_of(ptr, type, field) \
((type *) ((void *)ptr - offsetof(type, field)))
@@ -627,7 +628,7 @@ static int ucma_query_route(struct rdma_cm_id *id)
    struct cma_id_private *id_priv;
    void *msg;
    int ret, size, i;
-   
+
    CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_QUERY_ROUTE, size);
    id_priv = container_of(id, struct cma_id_private, id);
    cmd->id = id_priv->handle;
@@ -1060,7 +1061,10 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
    if (ret != size)
        return (ret >= 0) ? ERR(ENODATA) : -1;
 
-   return ucma_query_route(id);
+   if (af_ib_support)
+       return ucma_query_addr(id);
+   else
+       return ucma_query_route(id);
 }
 
 int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
@@ -1326,6 +1330,57 @@ int rdma_ack_cm_event(struct rdma_cm_event *event)
    return 0;
 }
 
+static void ucma_process_addr_resolved(struct cma_event *evt)
+{
+   if (af_ib_support) {
+       evt->event.status = ucma_query_addr(&evt->id_priv->id);
+       if (!evt->event.status &&
+           evt->id_priv->id.verbs->device->transport_type == IBV_TRANSPORT_IB)
+           evt->event.status = ucma_query_gid(&evt->id_priv->id);
+   } else {
+       evt->event.status = ucma_query_route(&evt->id_priv->id);
+   }
+
+   if (evt->event.status)
+       evt->event.event = RDMA_CM_EVENT_ADDR_ERROR;
+}
+
+static void ucma_process_route_resolved(struct cma_event *evt)
+{
+   if (evt->id_priv->id.verbs->device->transport_type != IBV_TRANSPORT_IB)
+       return;
+
+   if (af_ib_support)
+       evt->event.status = ucma_query_path(&evt->id_priv->id);
+   else
+       evt->event.status = ucma_query_route(&evt->id_priv->id);
+
+   if (evt->event.status)
+       evt->event.event = RDMA_CM_EVENT_ROUTE_ERROR;
+}
+
+static int ucma_query_req_info(struct rdma_cm_id *id)
+{
+   int ret;
+
+   if (!af_ib_support)
+       return ucma_query_route(id);
+
+   ret = ucma_query_addr(id);
+   if (ret)
+       return ret;
+
+   ret = ucma_query_gid(id);
+   if (ret)
+       return ret;
+
+   ret = ucma_query_path(id);
+   if (ret)
+       return ret;
+
+   return 0;
+}
+
 static int ucma_process_conn_req(struct cma_event *evt,
                  uint32_t handle)
 {
@@ -1344,7 +1399,7 @@ static int ucma_process_conn_req(struct cma_event *evt,
    evt->event.id = &id_priv->id;
    id_priv->handle = handle;
 
-   ret = ucma_query_route(&id_priv->id);
+   ret = ucma_query_req_info(&id_priv->id);
    if (ret) {
        rdma_destroy_id(&id_priv->id);
        goto err;
@@ -1473,14 +1528,10 @@ retry:
 
    switch (resp->event) {
    case RDMA_CM_EVENT_ADDR_RESOLVED:
-       evt->event.status = ucma_query_route(&evt->id_priv->id);
-       if (evt->event.status)
-           evt->event.event = RDMA_CM_EVENT_ADDR_ERROR;
+       ucma_process_addr_resolved(evt);
        break;
    case RDMA_CM_EVENT_ROUTE_RESOLVED:
-       evt->event.status = ucma_query_route(&evt->id_priv->id);
-       if (evt->event.status)
-           evt->event.event = RDMA_CM_EVENT_ROUTE_ERROR;
+       ucma_process_route_resolved(evt);
        break;
    case RDMA_CM_EVENT_CONNECT_REQUEST:
        evt->id_priv = (void *) (uintptr_t) resp->uid;





[PATCH 3/37] librdmacm: add ability to query IB path records

2010-04-07 Thread Sean Hefty
The current query_route command only supports 2 path records.
Add support for query_path, which is capable of supporting
multiple paths.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma_abi.h |   10 +
 src/cma.c   |   82 +++
 2 files changed, 91 insertions(+), 1 deletions(-)

diff --git a/include/rdma/rdma_cma_abi.h b/include/rdma/rdma_cma_abi.h
index 5c736fb..6c83fe8 100644
--- a/include/rdma/rdma_cma_abi.h
+++ b/include/rdma/rdma_cma_abi.h
@@ -35,6 +35,7 @@
 
 #include <infiniband/kern-abi.h>
 #include <infiniband/sa-kern-abi.h>
+#include <infiniband/sa.h>
 
 /*
  * This file must be kept in sync with the kernel's version of rdma_user_cm.h
@@ -114,7 +115,8 @@ struct ucma_abi_resolve_route {
 };
 
 enum {
-   UCMA_QUERY_ADDR
+   UCMA_QUERY_ADDR,
+   UCMA_QUERY_PATH
 };
 
 struct ucma_abi_query {
@@ -144,6 +146,12 @@ struct ucma_abi_query_addr_resp {
    struct sockaddr_storage dst_addr;
 };
 
+struct ucma_abi_query_path_resp {
+   __u32 num_paths;
+   __u32 reserved;
+   struct ib_path_data path_data[0];
+};
+
 struct ucma_abi_conn_param {
    __u32 qp_num;
    __u32 reserved;
diff --git a/src/cma.c b/src/cma.c
index 2aef594..c3c6b73 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -506,6 +506,88 @@ static int ucma_query_addr(struct rdma_cm_id *id)
    return 0;
 }
 
+static void ucma_convert_path(struct ib_path_data *path_data,
+                 struct ibv_sa_path_rec *sa_path)
+{
+   uint32_t fl_hop;
+
+   sa_path->dgid = path_data->path.dgid;
+   sa_path->sgid = path_data->path.sgid;
+   sa_path->dlid = path_data->path.dlid;
+   sa_path->slid = path_data->path.slid;
+   sa_path->raw_traffic = 0;
+
+   fl_hop = ntohl(path_data->path.flowlabel_hoplimit);
+   sa_path->flow_label = htonl(fl_hop >> 8);
+   sa_path->hop_limit = (uint8_t) fl_hop;
+
+   sa_path->traffic_class = path_data->path.tclass;
+   sa_path->reversible = path_data->path.reversible_numpath >> 7;
+   sa_path->numb_path = 1;
+   sa_path->pkey = path_data->path.pkey;
+   sa_path->sl = ntohs(path_data->path.qosclass_sl) & 0xF;
+   sa_path->mtu_selector = 1;
+   sa_path->mtu = path_data->path.mtu & 0x1F;
+   sa_path->rate_selector = 1;
+   sa_path->rate = path_data->path.rate & 0x1F;
+   sa_path->packet_life_time_selector = 1;
+   sa_path->packet_life_time = path_data->path.packetlifetime & 0x1F;
+
+   sa_path->preference = (uint8_t) path_data->flags;
+}
+
+static int ucma_query_path(struct rdma_cm_id *id)
+{
+   struct ucma_abi_query_path_resp *resp;
+   struct ucma_abi_query *cmd;
+   struct ucma_abi_cmd_hdr *hdr;
+   struct cma_id_private *id_priv;
+   void *msg;
+   int ret, size, i;
+
+   size = sizeof(*hdr) + sizeof(*cmd);
+   msg = alloca(size);
+   if (!msg)
+       return ERR(ENOMEM);
+
+   hdr = msg;
+   cmd = msg + sizeof(*hdr);
+
+   hdr->cmd = UCMA_CMD_QUERY;
+   hdr->in  = sizeof(*cmd);
+   hdr->out = sizeof(*resp) + sizeof(struct ib_path_data) * 6;
+
+   memset(cmd, 0, sizeof(*cmd));
+
+   resp = alloca(hdr->out);
+   if (!resp)
+       return ERR(ENOMEM);
+
+   id_priv = container_of(id, struct cma_id_private, id);
+   cmd->response = (uintptr_t) resp;
+   cmd->id = id_priv->handle;
+   cmd->option = UCMA_QUERY_PATH;
+
+   ret = write(id->channel->fd, msg, size);
+   if (ret != size)
+       return (ret >= 0) ? ERR(ENODATA) : -1;
+
+   VALGRIND_MAKE_MEM_DEFINED(resp, hdr->out);
+
+   if (resp->num_paths) {
+       id->route.path_rec = malloc(sizeof(*id->route.path_rec) *
+                       resp->num_paths);
+       if (!id->route.path_rec)
+           return ERR(ENOMEM);
+
+       id->route.num_paths = resp->num_paths;
+       for (i = 0; i < resp->num_paths; i++)
+           ucma_convert_path(&resp->path_data[i], &id->route.path_rec[i]);
+   }
+
+   return 0;
+}
+
 static int ucma_query_route(struct rdma_cm_id *id)
 {
struct ucma_abi_query_route_resp *resp;



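As a worked example of the bit manipulation in ucma_convert_path() above, here is a small standalone sketch. It unpacks the host-order flowlabel_hoplimit word and the reversible_numpath byte. The helper names are hypothetical and are not part of librdmacm. The field layout is an assumption taken from the SA PathRecord format: bits 31..28 reserved, 27..8 flow label, 7..0 hop limit.

```c
#include <stdint.h>

/* Hypothetical helper (not part of librdmacm).  It unpacks the host-order
 * value of flowlabel_hoplimit the way ucma_convert_path() does.
 * Assumed layout: bits 31..28 reserved, 27..8 flow label, 7..0 hop limit. */
struct fl_hop_fields {
	uint32_t flow_label;	/* host order; the patch stores htonl() of this */
	uint8_t hop_limit;
};

static struct fl_hop_fields unpack_fl_hop(uint32_t fl_hop_host)
{
	struct fl_hop_fields f;

	/* The patch only shifts.  This sketch also masks to 20 bits so that
	 * reserved high bits cannot leak into the flow label. */
	f.flow_label = (fl_hop_host >> 8) & 0xFFFFF;
	f.hop_limit = (uint8_t) (fl_hop_host & 0xFF);
	return f;
}

/* reversible_numpath: bit 7 = reversible flag, bits 6..0 = number of paths.
 * The patch keeps only the flag and always reports numb_path = 1. */
static int unpack_reversible(uint8_t reversible_numpath)
{
	return reversible_numpath >> 7;
}
```

For example, a packed value of 0x0ABCDE40 yields flow label 0xABCDE and hop limit 0x40.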
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/37] librdmacm: name changes to indicate only IP addresses supported

2010-04-07 Thread Sean Hefty
Several commands to the kernel RDMA CM only support IP addresses
because of limitations in the structure definition.  Update
the library to match the name changes in the kernel and indicate
that only IP addresses can be used with the current commands.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma_abi.h |   12 ++--
 src/cma.c   |   12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/rdma/rdma_cma_abi.h b/include/rdma/rdma_cma_abi.h
index 1a3a9c2..e51a372 100644
--- a/include/rdma/rdma_cma_abi.h
+++ b/include/rdma/rdma_cma_abi.h
@@ -48,8 +48,8 @@
 enum {
UCMA_CMD_CREATE_ID,
UCMA_CMD_DESTROY_ID,
-   UCMA_CMD_BIND_ADDR,
-   UCMA_CMD_RESOLVE_ADDR,
+   UCMA_CMD_BIND_IP,
+   UCMA_CMD_RESOLVE_IP,
UCMA_CMD_RESOLVE_ROUTE,
UCMA_CMD_QUERY_ROUTE,
UCMA_CMD_CONNECT,
@@ -62,7 +62,7 @@ enum {
UCMA_CMD_GET_OPTION,
UCMA_CMD_SET_OPTION,
UCMA_CMD_NOTIFY,
-   UCMA_CMD_JOIN_MCAST,
+   UCMA_CMD_JOIN_IP_MCAST,
UCMA_CMD_LEAVE_MCAST,
UCMA_CMD_MIGRATE_ID
 };
@@ -94,13 +94,13 @@ struct ucma_abi_destroy_id_resp {
__u32 events_reported;
 };
 
-struct ucma_abi_bind_addr {
+struct ucma_abi_bind_ip {
__u64 response;
struct sockaddr_in6 addr;
__u32 id;
 };
 
-struct ucma_abi_resolve_addr {
+struct ucma_abi_resolve_ip {
struct sockaddr_in6 src_addr;
struct sockaddr_in6 dst_addr;
__u32 id;
@@ -192,7 +192,7 @@ struct ucma_abi_notify {
__u32 event;
 };
 
-struct ucma_abi_join_mcast {
+struct ucma_abi_join_ip_mcast {
__u64 response; /* ucma_abi_create_id_resp */
__u64 uid;
struct sockaddr_in6 addr;
diff --git a/src/cma.c b/src/cma.c
index 59e89dd..b5f71d0 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -525,7 +525,7 @@ static int ucma_query_route(struct rdma_cm_id *id)
 
 int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 {
-   struct ucma_abi_bind_addr *cmd;
+   struct ucma_abi_bind_ip *cmd;
struct cma_id_private *id_priv;
void *msg;
int ret, size, addrlen;
@@ -534,7 +534,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
if (!addrlen)
return ERR(EINVAL);
 
-   CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_BIND_ADDR, size);
+   CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_BIND_IP, size);
id_priv = container_of(id, struct cma_id_private, id);
cmd->id = id_priv->handle;
memcpy(&cmd->addr, addr, addrlen);
@@ -549,7 +549,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
  struct sockaddr *dst_addr, int timeout_ms)
 {
-   struct ucma_abi_resolve_addr *cmd;
+   struct ucma_abi_resolve_ip *cmd;
struct cma_id_private *id_priv;
void *msg;
int ret, size, daddrlen;
@@ -558,7 +558,7 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
if (!daddrlen)
return ERR(EINVAL);
 
-   CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_RESOLVE_ADDR, size);
+   CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_RESOLVE_IP, size);
id_priv = container_of(id, struct cma_id_private, id);
cmd->id = id_priv->handle;
if (src_addr)
@@ -1037,7 +1037,7 @@ int rdma_disconnect(struct rdma_cm_id *id)
 int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
void *context)
 {
-   struct ucma_abi_join_mcast *cmd;
+   struct ucma_abi_join_ip_mcast *cmd;
struct ucma_abi_create_id_resp *resp;
struct cma_id_private *id_priv;
struct cma_multicast *mc, **pos;
@@ -1067,7 +1067,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
id_priv->mc_list = mc;
pthread_mutex_unlock(&id_priv->mut);
 
-   CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_JOIN_MCAST, size);
+   CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_JOIN_IP_MCAST, size);
cmd->id = id_priv->handle;
memcpy(&cmd->addr, addr, addrlen);
cmd->uid = (uintptr_t) mc;





[PATCH 2/37] librdmacm: support querying AF_IB addresses

2010-04-07 Thread Sean Hefty
The current query route command returns path record data and address
information.  The address information is restricted to
sizeof(struct sockaddr_in6).  To support AF_IB, modify the library to
use the new query addr command, which supports larger address sizes and
avoids querying for path record data when none is available.
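The size argument can be checked directly. The sketch below uses a hypothetical mirror of struct sockaddr_ib, which a later patch in this series defines in include/infiniband/ib.h, with its __be* types swapped for <stdint.h> types. It checks whether that address fits the old sockaddr_in6-sized slot and the new sockaddr_storage slot used by ucma_abi_query_addr_resp.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical mirror of struct sockaddr_ib (added later in this series)
 * using <stdint.h> types; field sizes follow that definition. */
struct sockaddr_ib_mirror {
	unsigned short int sib_family;
	uint16_t sib_pkey;
	uint32_t sib_flowinfo;
	uint8_t sib_addr[16];	/* 128-bit GID */
	uint64_t sib_sid;
	uint64_t sib_sid_mask;
	uint64_t sib_scope_id;
};

/* Old query-route response: each address lives in a sockaddr_in6 slot. */
static int fits_query_route(size_t addrlen)
{
	return addrlen <= sizeof(struct sockaddr_in6);
}

/* New query-addr response: sockaddr_storage slots plus explicit sizes. */
static int fits_query_addr(size_t addrlen)
{
	return addrlen <= sizeof(struct sockaddr_storage);
}
```

On a typical Linux build, sizeof(struct sockaddr_in6) is 28 bytes and the mirrored sockaddr_ib is 48, so only the new command can carry it.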

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma_abi.h |   22 +++---
 src/cma.c   |   35 ++-
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/rdma/rdma_cma_abi.h b/include/rdma/rdma_cma_abi.h
index e51a372..5c736fb 100644
--- a/include/rdma/rdma_cma_abi.h
+++ b/include/rdma/rdma_cma_abi.h
@@ -64,7 +64,8 @@ enum {
UCMA_CMD_NOTIFY,
UCMA_CMD_JOIN_IP_MCAST,
UCMA_CMD_LEAVE_MCAST,
-   UCMA_CMD_MIGRATE_ID
+   UCMA_CMD_MIGRATE_ID,
+   UCMA_CMD_QUERY
 };
 
 struct ucma_abi_cmd_hdr {
@@ -112,10 +113,14 @@ struct ucma_abi_resolve_route {
__u32 timeout_ms;
 };
 
-struct ucma_abi_query_route {
+enum {
+   UCMA_QUERY_ADDR
+};
+
+struct ucma_abi_query {
__u64 response;
__u32 id;
-   __u32 reserved;
+   __u32 option;
 };
 
 struct ucma_abi_query_route_resp {
@@ -128,6 +133,17 @@ struct ucma_abi_query_route_resp {
__u8 reserved[3];
 };
 
+struct ucma_abi_query_addr_resp {
+   __u64 node_guid;
+   __u8  port_num;
+   __u8  reserved;
+   __u16 pkey;
+   __u16 src_size;
+   __u16 dst_size;
+   struct sockaddr_storage src_addr;
+   struct sockaddr_storage dst_addr;
+};
+
 struct ucma_abi_conn_param {
__u32 qp_num;
__u32 reserved;
diff --git a/src/cma.c b/src/cma.c
index b5f71d0..2aef594 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -473,10 +473,43 @@ static int ucma_addrlen(struct sockaddr *addr)
}
 }
 
+static int ucma_query_addr(struct rdma_cm_id *id)
+{
+   struct ucma_abi_query_addr_resp *resp;
+   struct ucma_abi_query *cmd;
+   struct cma_id_private *id_priv;
+   void *msg;
+   int ret, size;
+   
+   CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_QUERY, size);
+   id_priv = container_of(id, struct cma_id_private, id);
+   cmd->id = id_priv->handle;
+   cmd->option = UCMA_QUERY_ADDR;
+
+   ret = write(id->channel->fd, msg, size);
+   if (ret != size)
+       return (ret >= 0) ? ERR(ENODATA) : -1;
+
+   VALGRIND_MAKE_MEM_DEFINED(resp, sizeof *resp);
+
+   memcpy(&id->route.addr.src_addr, &resp->src_addr, resp->src_size);
+   memcpy(&id->route.addr.dst_addr, &resp->dst_addr, resp->dst_size);
+
+   if (!id_priv->cma_dev && resp->node_guid) {
+       ret = ucma_get_device(id_priv, resp->node_guid);
+       if (ret)
+           return ret;
+       id->port_num = resp->port_num;
+       id->route.addr.addr.ibaddr.pkey = resp->pkey;
+   }
+
+   return 0;
+}
+
 static int ucma_query_route(struct rdma_cm_id *id)
 {
struct ucma_abi_query_route_resp *resp;
-   struct ucma_abi_query_route *cmd;
+   struct ucma_abi_query *cmd;
struct cma_id_private *id_priv;
void *msg;
int ret, size, i;





[PATCH 4/37] librdmacm: add support to query GIDs

2010-04-07 Thread Sean Hefty
Support the query GID ABI, which obtains GID information separately
from path record data and sa_family addressing.

This patch also adds the definition for sockaddr_ib for userspace.
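To show what the new ib.h helpers test, here is a simplified standalone sketch. It uses a hypothetical stand-in for struct ib_addr that keeps the four 32-bit words (sib_addr32[]) in network byte order, and reproduces the any, loopback and compare checks. It is not the header itself, which uses a union over 8/16/32/64-bit views.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ib_addr: four 32-bit words in network
 * byte order, corresponding to sib_addr32[]. */
struct ib_addr_sketch {
	uint32_t w[4];
};

static void ib_addr_sketch_set(struct ib_addr_sketch *a, uint32_t w1,
			       uint32_t w2, uint32_t w3, uint32_t w4)
{
	a->w[0] = w1;
	a->w[1] = w2;
	a->w[2] = w3;
	a->w[3] = w4;
}

/* Same test as ib_addr_any(): all 128 bits are zero. */
static int ib_addr_sketch_any(const struct ib_addr_sketch *a)
{
	return (a->w[0] | a->w[1] | a->w[2] | a->w[3]) == 0;
}

/* Same test as ib_addr_loopback(): the address is ::1, i.e. only the
 * last word equals 1 in network byte order. */
static int ib_addr_sketch_loopback(const struct ib_addr_sketch *a)
{
	return (a->w[0] | a->w[1] | a->w[2] | (a->w[3] ^ htonl(1))) == 0;
}

/* Same idea as ib_addr_cmp(): a bytewise comparison of the whole address. */
static int ib_addr_sketch_cmp(const struct ib_addr_sketch *a1,
			      const struct ib_addr_sketch *a2)
{
	return memcmp(a1, a2, sizeof(*a1));
}
```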

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 Makefile.am |6 ++-
 include/infiniband/ib.h |   97 +++
 include/rdma/rdma_cma.h |4 +-
 include/rdma/rdma_cma_abi.h |3 +
 src/cma.c   |   32 ++
 5 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 290cbc3..2898ad9 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -27,10 +27,11 @@ examples_udaddy_LDADD = $(top_builddir)/src/librdmacm.la
 examples_mckey_SOURCES = examples/mckey.c
 examples_mckey_LDADD = $(top_builddir)/src/librdmacm.la
 
-librdmacmincludedir = $(includedir)/rdma
+librdmacmincludedir = $(includedir)/rdma $(includedir)/infiniband
 
 librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \
-  include/rdma/rdma_cma.h
+  include/rdma/rdma_cma.h \
+  include/infiniband/ib.h
 
 man_MANS = \
man/rdma_accept.3 \
@@ -68,6 +69,7 @@ man_MANS = \
man/rdma_cm.7
 
 EXTRA_DIST = include/rdma/rdma_cma_abi.h include/rdma/rdma_cma.h \
+include/infiniband/ib.h \
 src/librdmacm.map librdmacm.spec.in $(man_MANS)
 
 dist-hook: librdmacm.spec
diff --git a/include/infiniband/ib.h b/include/infiniband/ib.h
new file mode 100644
index 000..3a97322
--- /dev/null
+++ b/include/infiniband/ib.h
@@ -0,0 +1,97 @@
+/*
+ * Copyright (c) 2010 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(_RDMA_IB_H)
+#define _RDMA_IB_H
+
+#include <linux/types.h>
+#include <string.h>
+
+#ifndef AF_IB
+#define AF_IB 27
+#endif
+#ifndef PF_IB
+#define PF_IB AF_IB
+#endif
+
+struct ib_addr {
+   union {
+   __u8    uib_addr8[16];
+   __be16  uib_addr16[8];
+   __be32  uib_addr32[4];
+   __be64  uib_addr64[2];
+   } ib_u;
+#define sib_addr8  ib_u.uib_addr8
+#define sib_addr16 ib_u.uib_addr16
+#define sib_addr32 ib_u.uib_addr32
+#define sib_addr64 ib_u.uib_addr64
+#define sib_raw            ib_u.uib_addr8
+#define sib_subnet_prefix  ib_u.uib_addr64[0]
+#define sib_interface_id   ib_u.uib_addr64[1]
+};
+
+static inline int ib_addr_any(const struct ib_addr *a)
+{
+   return ((a->sib_addr64[0] | a->sib_addr64[1]) == 0);
+}
+
+static inline int ib_addr_loopback(const struct ib_addr *a)
+{
+   return ((a->sib_addr32[0] | a->sib_addr32[1] |
+            a->sib_addr32[2] | (a->sib_addr32[3] ^ htonl(1))) == 0);
+}
+
+static inline void ib_addr_set(struct ib_addr *addr,
+  __be32 w1, __be32 w2, __be32 w3, __be32 w4)
+{
+   addr->sib_addr32[0] = w1;
+   addr->sib_addr32[1] = w2;
+   addr->sib_addr32[2] = w3;
+   addr->sib_addr32[3] = w4;
+}
+
+static inline int ib_addr_cmp(const struct ib_addr *a1, const struct ib_addr *a2)
+{
+   return memcmp(a1, a2, sizeof(struct ib_addr));
+}
+
+struct sockaddr_ib {
+   unsigned short int  sib_family; /* AF_IB */
+   __be16  sib_pkey;
+   __be32  sib_flowinfo;
+   struct ib_addr  sib_addr;
+   __be64  sib_sid;
+   __be64  sib_sid_mask;
+   __u64   sib_scope_id;
+};
+
+#endif /* _RDMA_IB_H */
diff --git a/include/rdma/rdma_cma.h 

[PATCH 8/37] librdmacm: add support for PF_IB to resolve_addr

2010-04-07 Thread Sean Hefty
Allow the user to pass PF_IB addresses to rdma_resolve_addr.
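The new command carries each address together with its size instead of assuming a sockaddr_in6. The sketch below fills a hypothetical mirror of struct ucma_abi_resolve_addr, roughly the way rdma_resolve_addr2() does. The bounds checks are an addition of this sketch and do not appear in the patch. A NULL source is sent as src_size == 0.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical mirror of struct ucma_abi_resolve_addr from this patch. */
struct resolve_addr_cmd {
	uint32_t id;
	uint32_t timeout_ms;
	uint16_t src_size;
	uint16_t dst_size;
	uint32_t reserved;
	struct sockaddr_storage src_addr;
	struct sockaddr_storage dst_addr;
};

/* Fill the command; sizes travel with the addresses, so any sa_family
 * that fits in a sockaddr_storage (including AF_IB) can be passed. */
static int fill_resolve_addr_cmd(struct resolve_addr_cmd *cmd, uint32_t handle,
				 const struct sockaddr *src, socklen_t src_len,
				 const struct sockaddr *dst, socklen_t dst_len,
				 int timeout_ms)
{
	if (!src)
		src_len = 0;	/* no source address requested */
	if (!dst || !dst_len || dst_len > sizeof(cmd->dst_addr) ||
	    src_len > sizeof(cmd->src_addr))
		return -EINVAL;

	memset(cmd, 0, sizeof(*cmd));	/* also zeroes reserved */
	cmd->id = handle;
	cmd->timeout_ms = (uint32_t) timeout_ms;
	cmd->src_size = (uint16_t) src_len;
	if (src_len)
		memcpy(&cmd->src_addr, src, src_len);
	cmd->dst_size = (uint16_t) dst_len;
	memcpy(&cmd->dst_addr, dst, dst_len);
	return 0;
}
```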

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma_abi.h |   13 -
 src/cma.c   |   44 +--
 2 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/include/rdma/rdma_cma_abi.h b/include/rdma/rdma_cma_abi.h
index 8add397..4a7a55d 100644
--- a/include/rdma/rdma_cma_abi.h
+++ b/include/rdma/rdma_cma_abi.h
@@ -67,7 +67,8 @@ enum {
UCMA_CMD_LEAVE_MCAST,
UCMA_CMD_MIGRATE_ID,
UCMA_CMD_QUERY,
-   UCMA_CMD_BIND
+   UCMA_CMD_BIND,
+   UCMA_CMD_RESOLVE_ADDR
 };
 
 struct ucma_abi_cmd_hdr {
@@ -117,6 +118,16 @@ struct ucma_abi_resolve_ip {
__u32 timeout_ms;
 };
 
+struct ucma_abi_resolve_addr {
+   __u32 id;
+   __u32 timeout_ms;
+   __u16 src_size;
+   __u16 dst_size;
+   __u32 reserved;
+   struct sockaddr_storage src_addr;
+   struct sockaddr_storage dst_addr;
+};
+
 struct ucma_abi_resolve_route {
__u32 id;
__u32 timeout_ms;
diff --git a/src/cma.c b/src/cma.c
index be61333..e22e1b4 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -721,31 +721,63 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
return ucma_query_route(id);
 }
 
+static int rdma_resolve_addr2(struct rdma_cm_id *id, struct sockaddr *src_addr,
+ socklen_t src_len, struct sockaddr *dst_addr,
+ socklen_t dst_len, int timeout_ms)
+{
+   struct ucma_abi_resolve_addr *cmd;
+   struct cma_id_private *id_priv;
+   void *msg;
+   int ret, size;
+   
+   CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_RESOLVE_ADDR, size);
+   id_priv = container_of(id, struct cma_id_private, id);
+   cmd->id = id_priv->handle;
+   if ((cmd->src_size = src_len))
+       memcpy(&cmd->src_addr, src_addr, src_len);
+   memcpy(&cmd->dst_addr, dst_addr, dst_len);
+   cmd->dst_size = dst_len;
+   cmd->timeout_ms = timeout_ms;
+   cmd->reserved = 0;
+
+   ret = write(id->channel->fd, msg, size);
+   if (ret != size)
+       return (ret >= 0) ? ERR(ENODATA) : -1;
+
+   memcpy(&id->route.addr.dst_addr, dst_addr, dst_len);
+   return 0;
+}
+
 int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
  struct sockaddr *dst_addr, int timeout_ms)
 {
struct ucma_abi_resolve_ip *cmd;
struct cma_id_private *id_priv;
void *msg;
-   int ret, size, daddrlen;
+   int ret, size, dst_len, src_len;

-   daddrlen = ucma_addrlen(dst_addr);
-   if (!daddrlen)
+   dst_len = ucma_addrlen(dst_addr);
+   if (!dst_len)
return ERR(EINVAL);
 
+   src_len = ucma_addrlen(src_addr);
+   if (af_ib_support)
+   return rdma_resolve_addr2(id, src_addr, src_len, dst_addr,
+ dst_len, timeout_ms);
+
CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_RESOLVE_IP, size);
id_priv = container_of(id, struct cma_id_private, id);
cmd->id = id_priv->handle;
if (src_addr)
-   memcpy(&cmd->src_addr, src_addr, ucma_addrlen(src_addr));
-   memcpy(&cmd->dst_addr, dst_addr, daddrlen);
+   memcpy(&cmd->src_addr, src_addr, src_len);
+   memcpy(&cmd->dst_addr, dst_addr, dst_len);
cmd-timeout_ms = timeout_ms;
 
ret = write(id->channel->fd, msg, size);
if (ret != size)
return (ret >= 0) ? ERR(ENODATA) : -1;
 
-   memcpy(&id->route.addr.dst_addr, dst_addr, daddrlen);
+   memcpy(&id->route.addr.dst_addr, dst_addr, dst_len);
return 0;
 }
 





[PATCH 13/37] librdmacm: allow pd parameter to be optional

2010-04-07 Thread Sean Hefty
Allow the user to create a QP using rdma_create_qp without
specifying a PD.  If a PD is not given, a default PD will be
used instead.  This simplifies the user interface.
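The PD selection that rdma_create_qp() performs after this patch can be modeled in isolation. The types and the select_pd() helper below are hypothetical stand-ins, not librdmacm API. A NULL pd picks the device's default PD, and a caller-supplied PD must belong to the same verbs context as the rdma_cm_id.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_pd: only the field the check uses. */
struct pd_sketch {
	void *context;
};

/* Mirrors the patched check: NULL -> default PD; otherwise the PD's
 * context must match the id's verbs context, else -EINVAL. */
static int select_pd(void *id_verbs, struct pd_sketch *user_pd,
		     struct pd_sketch *default_pd, struct pd_sketch **out)
{
	if (!user_pd) {
		*out = default_pd;
		return 0;
	}
	if (user_pd->context != id_verbs)
		return -EINVAL;
	*out = user_pd;
	return 0;
}
```

In application code this corresponds to calling rdma_create_qp(id, NULL, &qp_init_attr), as described in the updated rdma_create_qp documentation.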

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma.h |4 +++-
 src/cma.c   |   24 +++-
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index 83418c3..ccf6cd4 100644
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -279,7 +279,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms);
 /**
  * rdma_create_qp - Allocate a QP.
  * @id: RDMA identifier.
- * @pd: protection domain for the QP.
+ * @pd: Optional protection domain for the QP.
  * @qp_init_attr: initial QP attributes.
  * Description:
  *  Allocate a QP associated with the specified rdma_cm_id and transition it
@@ -291,6 +291,8 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms);
  *   librdmacm through their states.  After being allocated, the QP will be
  *   ready to handle posting of receives.  If the QP is unconnected, it will
  *   be ready to post sends.
+ *   If pd is NULL, then the QP will be allocated using a default protection
+ *   domain associated with the underlying RDMA device.
  * See also:
  *   rdma_bind_addr, rdma_resolve_addr, rdma_destroy_qp, ibv_create_qp,
  *   ibv_modify_qp
diff --git a/src/cma.c b/src/cma.c
index c7a3a7b..0587ab3 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -95,6 +95,7 @@ do {\
 
 struct cma_device {
struct ibv_context *verbs;
+   struct ibv_pd  *pd;
uint64_tguid;
int port_cnt;
uint8_t max_initiator_depth;
@@ -144,9 +145,11 @@ int af_ib_support;
 static void ucma_cleanup(void)
 {
if (cma_dev_cnt) {
-   while (cma_dev_cnt)
-   ibv_close_device(cma_dev_array[--cma_dev_cnt].verbs);
-   
+   while (cma_dev_cnt--) {
+   ibv_dealloc_pd(cma_dev_array[cma_dev_cnt].pd);
+   ibv_close_device(cma_dev_array[cma_dev_cnt].verbs);
+   }
+
free(cma_dev_array);
cma_dev_cnt = 0;
}
@@ -224,6 +227,13 @@ static int ucma_init(void)
goto err3;
}
 
+   cma_dev->pd = ibv_alloc_pd(cma_dev->verbs);
+   if (!cma_dev->pd) {
+       ibv_close_device(cma_dev->verbs);
+   ret = ERR(ENOMEM);
+   goto err3;
+   }
+
i++;
ret = ibv_query_device(cma_dev->verbs, &attr);
if (ret) {
@@ -242,8 +252,10 @@ static int ucma_init(void)
return 0;
 
 err3:
-   while (i--)
+   while (i--) {
+   ibv_dealloc_pd(cma_dev_array[i].pd);
ibv_close_device(cma_dev_array[i].verbs);
+   }
free(cma_dev_array);
 err2:
ibv_free_device_list(dev_list);
@@ -1021,7 +1033,9 @@ int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd,
int ret;
 
id_priv = container_of(id, struct cma_id_private, id);
-   if (id->verbs != pd->context)
+   if (!pd)
+       pd = id_priv->cma_dev->pd;
+   else if (id->verbs != pd->context)
return ERR(EINVAL);
 
qp = ibv_create_qp(pd, qp_init_attr);





[PATCH 21/37] librdmacm: specify qp_type when creating id

2010-04-07 Thread Sean Hefty
To support AF_IB / PS_IB, we need to specify the qp type when
creating the rdma_cm_id.  The kernel requires this in order
to select the correct type of operation to perform (e.g. SIDR
versus REQ).
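The default chosen by the new rdma_create_id() wrapper is a small mapping from port space to QP type. The enums below are hypothetical stand-ins for RDMA_PS_* and IBV_QPT_*. The cm_exchange_for() helper is illustrative only. It records the IB CM exchange each QP type implies, as the commit message notes (SIDR for UD, REQ for connected RC).

```c
#include <string.h>

/* Hypothetical stand-ins for enum rdma_port_space and enum ibv_qp_type. */
enum ps_sketch { PS_IPOIB, PS_TCP, PS_UDP };
enum qpt_sketch { QPT_RC, QPT_UD };

/* Same rule as the patched rdma_create_id(): datagram port spaces get a
 * UD QP type, everything else RC. */
static enum qpt_sketch default_qp_type(enum ps_sketch ps)
{
	return (ps == PS_IPOIB || ps == PS_UDP) ? QPT_UD : QPT_RC;
}

/* Illustrative only: the CM exchange the kernel would select. */
static const char *cm_exchange_for(enum qpt_sketch qpt)
{
	return qpt == QPT_UD ? "SIDR" : "REQ";
}
```

Callers that need a different pairing would use rdma_create_id2() directly. In this patch that function is still static.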

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 include/rdma/rdma_cma_abi.h |3 ++-
 src/cma.c   |   18 +++---
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/rdma/rdma_cma_abi.h b/include/rdma/rdma_cma_abi.h
index c3981e6..bd4ca0f 100644
--- a/include/rdma/rdma_cma_abi.h
+++ b/include/rdma/rdma_cma_abi.h
@@ -82,7 +82,8 @@ struct ucma_abi_create_id {
__u64 uid;
__u64 response;
__u16 ps;
-   __u8  reserved[6];
+   __u8  qp_type;
+   __u8  reserved[5];
 };
 
 struct ucma_abi_create_id_resp {
diff --git a/src/cma.c b/src/cma.c
index 9de33d4..e31fb8a 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -390,9 +390,9 @@ err:ucma_free_id(id_priv);
return NULL;
 }
 
-int rdma_create_id(struct rdma_event_channel *channel,
-  struct rdma_cm_id **id, void *context,
-  enum rdma_port_space ps)
+static int rdma_create_id2(struct rdma_event_channel *channel,
+  struct rdma_cm_id **id, void *context,
+  enum rdma_port_space ps, enum ibv_qp_type qp_type)
 {
struct ucma_abi_create_id_resp *resp;
struct ucma_abi_create_id *cmd;
@@ -411,6 +411,7 @@ int rdma_create_id(struct rdma_event_channel *channel,
CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_CREATE_ID, size);
cmd->uid = (uintptr_t) id_priv;
cmd->ps = ps;
+   cmd->qp_type = qp_type;
 
ret = write(id_priv->id.channel->fd, msg, size);
if (ret != size)
@@ -426,6 +427,17 @@ err:   ucma_free_id(id_priv);
return ret;
 }
 
+int rdma_create_id(struct rdma_event_channel *channel,
+  struct rdma_cm_id **id, void *context,
+  enum rdma_port_space ps)
+{
+   enum ibv_qp_type qp_type;
+
+   qp_type = (ps == RDMA_PS_IPOIB || ps == RDMA_PS_UDP) ?
+ IBV_QPT_UD : IBV_QPT_RC;
+   return rdma_create_id2(channel, id, context, ps, qp_type);
+}
+
 static int ucma_destroy_kern_id(int fd, uint32_t handle)
 {
struct ucma_abi_destroy_id_resp *resp;





[infiniband-diags] [1/3] support --diff in ibnetdiscover

2010-04-07 Thread Al Chu
Hi Sasha,

This patch adds the default --diff support in ibnetdiscover.
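At its core, a diff between a cached fabric and the current one reports which nodes exist on only one side. The sketch below is a hypothetical illustration of that idea, not ibnetdiscover's implementation. It merge-walks two ascending GUID lists. The "<" (old only) and ">" (new only) line markers are an assumption, chosen to match the out_prefix parameter this patch threads through the output functions.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Compare two ascending GUID lists (old cache vs. current fabric).
 * Print "<" for GUIDs only in the old list and ">" for GUIDs only in the
 * new list; return the number of differences.  out == NULL counts only. */
static size_t diff_guid_lists(const uint64_t *old_g, size_t n_old,
			      const uint64_t *new_g, size_t n_new, FILE *out)
{
	size_t i = 0, j = 0, ndiff = 0;

	while (i < n_old || j < n_new) {
		if (j == n_new || (i < n_old && old_g[i] < new_g[j])) {
			if (out)
				fprintf(out, "< 0x%016" PRIx64 "\n", old_g[i]);
			i++;
			ndiff++;
		} else if (i == n_old || new_g[j] < old_g[i]) {
			if (out)
				fprintf(out, "> 0x%016" PRIx64 "\n", new_g[j]);
			j++;
			ndiff++;
		} else {
			i++;	/* same GUID on both sides: unchanged */
			j++;
		}
	}
	return ndiff;
}
```

With old = {1, 2, 4} and new = {2, 3, 4, 5}, passing stdout prints `< 0x0000000000000001`, `> 0x0000000000000003` and `> 0x0000000000000005`, and returns 3.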

Al

-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
---BeginMessage---

Signed-off-by: Albert Chu ch...@llnl.gov
---
 infiniband-diags/man/ibnetdiscover.8 |7 +
 infiniband-diags/src/ibnetdiscover.c |  246 +
 2 files changed, 223 insertions(+), 30 deletions(-)

diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8
index 082a8e4..975b999 100644
--- a/infiniband-diags/man/ibnetdiscover.8
+++ b/infiniband-diags/man/ibnetdiscover.8
@@ -57,6 +57,13 @@ Load and use the cached ibnetdiscover data stored in the specified
 filename.  May be useful for outputting and learning about other
 fabrics or a previous state of a fabric.
 .TP
+\fB\-\-diff\fR filename
+Load cached ibnetdiscover data and do a diff comparison to the current
+network or another cache.  A special diff output for ibnetdiscover
+output will be displayed showing differences between the old and current
+fabric.  By default, the following are compared for differences: switches,
+channel adapters, routers, and port connections.
+.TP
 \fB\-p\fR, \fB\-\-ports\fR
 Obtain a ports report which is a
 list of connected ports with relevant information (like LID, portnum,
diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 651bafd..4da09ce 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -57,6 +57,16 @@
 #define LIST_SWITCH_NODE (1 << IB_NODE_SWITCH)
 #define LIST_ROUTER_NODE (1 << IB_NODE_ROUTER)
 
+#define DIFF_FLAG_SWITCH  0x0001
+#define DIFF_FLAG_CA  0x0002
+#define DIFF_FLAG_ROUTER  0x0004
+#define DIFF_FLAG_PORT_CONNECTION 0x0008
+
+#define DIFF_FLAG_DEFAULT  (DIFF_FLAG_SWITCH \
+   | DIFF_FLAG_CA \
+   | DIFF_FLAG_ROUTER \
+   | DIFF_FLAG_PORT_CONNECTION)
+
 struct ibmad_port *srcport;
 
 static FILE *f;
@@ -65,6 +75,7 @@ static char *node_name_map_file = NULL;
 static nn_map_t *node_name_map = NULL;
 static char *cache_file = NULL;
 static char *load_cache_file = NULL;
+static char *diff_cache_file = NULL;
 
 static int report_max_hops = 0;
 
@@ -183,16 +194,20 @@ void list_nodes(ibnd_fabric_t * fabric, int list)
ibnd_iter_nodes_type(fabric, list_node, IB_NODE_ROUTER, NULL);
 }
 
-void out_ids(ibnd_node_t * node, int group, char *chname)
+void out_ids(ibnd_node_t * node, int group, char *chname, char *out_prefix)
 {
uint64_t sysimgguid =
mad_get_field64(node->info, 0, IB_NODE_SYSTEM_GUID_F);
 
-   fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n",
-   mad_get_field(node->info, 0, IB_NODE_VENDORID_F),
+   fprintf(f, "\n%svendid=0x%x\n",
+   out_prefix ? out_prefix : "",
+   mad_get_field(node->info, 0, IB_NODE_VENDORID_F));
+   fprintf(f, "%sdevid=0x%x\n",
+   out_prefix ? out_prefix : "",
mad_get_field(node->info, 0, IB_NODE_DEVID_F));
if (sysimgguid)
-   fprintf(f, "sysimgguid=0x%" PRIx64, sysimgguid);
+   fprintf(f, "%ssysimgguid=0x%" PRIx64,
+   out_prefix ? out_prefix : "", sysimgguid);
if (group && node->chassis && node->chassis->chassisnum) {
fprintf(f, "\t\t# Chassis %d", node->chassis->chassisnum);
if (chname)
@@ -217,14 +232,15 @@ uint64_t out_chassis(ibnd_fabric_t * fabric, unsigned char chassisnum)
return guid;
 }
 
-void out_switch(ibnd_node_t * node, int group, char *chname)
+void out_switch(ibnd_node_t * node, int group, char *chname, char *out_prefix)
 {
char *str;
char str2[256];
char *nodename = NULL;
 
-   out_ids(node, group, chname);
-   fprintf(f, "switchguid=0x%" PRIx64, node->guid);
+   out_ids(node, group, chname, out_prefix);
+   fprintf(f, "%sswitchguid=0x%" PRIx64,
+   out_prefix ? out_prefix : "", node->guid);
fprintf(f, "(%" PRIx64 ")",
mad_get_field64(node->info, 0, IB_NODE_PORT_GUID_F));
if (group) {
@@ -239,7 +255,8 @@ void out_switch(ibnd_node_t * node, int group, char *chname)
 
nodename = remap_node_name(node_name_map, node->guid, node->nodedesc);
 
-   fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n",
+   fprintf(f, "\n%sSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n",
+   out_prefix ? out_prefix : "",
node->numports, node_name(node), nodename,
node->smaenhsp0 ? "enhanced" : "base",
node->smalid, node->smalmc);
@@ -247,12 +264,12 @@ void out_switch(ibnd_node_t * node, int group, char *chname)
free(nodename);
 }
 
-void out_ca(ibnd_node_t * node, int group, char *chname)
+void out_ca(ibnd_node_t * node, int group, char *chname, char *out_prefix)
 {
char 

Re: Fork safe clarification

2010-04-07 Thread Roland Dreier
> Are PDs, QPs and CQs created before a fork shared by the parent and child
> after fork() has returned (i.e. both can submit WRs, poll CQ, etc.)?

no, QPs and CQs are accessible only in the parent.  The child can still
use the uverbs file descriptor to do things, but libibverbs will
probably get very confused in this case.  More userspace development
would probably be required to make this really work.  Since the PD is
attached to the FD, it could be shared.

> What about MRs registered before the fork?  Even though the child doesn't
> have access to the parent's memory, can he still submit WRs on a QP with an
> MR created before the fork?

yes.

> What if the MR pages in the above scenario are accessible in both parent and
> child (shared memory)?  Are there complications with registering shared
> memory?

shouldn't make a difference.

> In general, are pointers returned by libibverbs pointers to user/process
> address space (as ibv_mr pointers must be) or to kernel space?  E.g. if an
> unrelated process had another process's QP pointer, lkey, and a virtual
> address, could it post (almost certainly unsafely) a WR to the other
> process's QP?

Not sure I understand this.  All the pointers from libibverbs are of
course userspace pointers.  What could a userspace process do with a
kernel pointer?  Processes own all their resources and can't access
other resources.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


[infiniband-diags] [3/3] support lid and nodedesc diffchecks in ibnetdiscover

2010-04-07 Thread Al Chu
Hi Sasha,

This patch adds lid and node description diff options for --diffcheck in
ibnetdiscover.
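Once a node has been matched by GUID in both the cached and current fabric, the new checks compare individual fields. The sketch below is a hypothetical illustration of that comparison, not the patch's code. It reuses the DIFF_FLAG_LID and DIFF_FLAG_NODE_DESCRIPTION values defined in this patch with a simplified per-node snapshot. It does not model the man page's rule that lid and nodedesc are checked only for the selected node types.

```c
#include <string.h>

#define DIFF_FLAG_LID              0x0010
#define DIFF_FLAG_NODE_DESCRIPTION 0x0020

/* Hypothetical per-node snapshot; ibnetdiscover works on ibnd_node_t. */
struct node_snapshot {
	unsigned int lid;
	char nodedesc[65];	/* 64-byte IB NodeDescription + NUL */
};

/* Return the enabled lid/nodedesc checks that differ between the old and
 * new snapshot of one node (already matched by GUID). */
static int node_detail_diffs(const struct node_snapshot *old_n,
			     const struct node_snapshot *new_n, int enabled)
{
	int diffs = 0;

	if ((enabled & DIFF_FLAG_LID) && old_n->lid != new_n->lid)
		diffs |= DIFF_FLAG_LID;
	if ((enabled & DIFF_FLAG_NODE_DESCRIPTION) &&
	    strcmp(old_n->nodedesc, new_n->nodedesc) != 0)
		diffs |= DIFF_FLAG_NODE_DESCRIPTION;
	return diffs;
}
```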

Al

-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
---BeginMessage---

Signed-off-by: Albert Chu ch...@llnl.gov
---
 infiniband-diags/man/ibnetdiscover.8 |3 +-
 infiniband-diags/src/ibnetdiscover.c |  211 --
 2 files changed, 154 insertions(+), 60 deletions(-)

diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8
index e122736..76cfbc8 100644
--- a/infiniband-diags/man/ibnetdiscover.8
+++ b/infiniband-diags/man/ibnetdiscover.8
@@ -68,7 +68,8 @@ channel adapters, routers, and port connections.
 Specify what diff checks should be done in the \fB\-\-diff\fR option above.
 Comma separate multiple diff check key(s).  The available diff checks
 are: \fIsw\fR = switches, \fIca\fR = channel adapters, \fIrouter\fR = routers,
-\fIport\fR = port connections descriptions.  Note that \fIport\fR is
+\fIport\fR = port connections, \fIlid\fR = lids, \fInodedesc\fR = node
+descriptions.  Note that \fIport\fR, \fIlid\fR, and \fInodedesc\fR are
 checked only for the node types that are specified (e.g. \fIsw\fR,
 \fIca\fR, \fIrouter\fR).
 .TP
diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 4435ade..770c589 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -61,6 +61,8 @@
 #define DIFF_FLAG_CA   0x0002
 #define DIFF_FLAG_ROUTER   0x0004
 #define DIFF_FLAG_PORT_CONNECTION  0x0008
+#define DIFF_FLAG_LID  0x0010
+#define DIFF_FLAG_NODE_DESCRIPTION 0x0020
 
 #define DIFF_FLAG_DEFAULT  (DIFF_FLAG_SWITCH \
| DIFF_FLAG_CA \
@@ -233,15 +235,29 @@ uint64_t out_chassis(ibnd_fabric_t * fabric, unsigned char chassisnum)
return guid;
 }
 
-void out_switch(ibnd_node_t * node, int group, char *chname, char *out_prefix)
+void out_switch_detail(ibnd_node_t * node, char *sw_prefix)
+{
+   char *nodename = NULL;
+
+   nodename = remap_node_name(node_name_map, node->guid, node->nodedesc);
+
+   fprintf(f, "%sSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d",
+   sw_prefix ? sw_prefix : "",
+   node->numports, node_name(node), nodename,
+   node->smaenhsp0 ? "enhanced" : "base",
+   node->smalid, node->smalmc);
+
+   free(nodename);
+}
+
+void out_switch(ibnd_node_t * node, int group, char *chname, char *id_prefix, char *sw_prefix)
 {
char *str;
char str2[256];
-   char *nodename = NULL;
 
-   out_ids(node, group, chname, out_prefix);
+   out_ids(node, group, chname, id_prefix);
fprintf(f, "%sswitchguid=0x%" PRIx64,
-   out_prefix ? out_prefix : "", node->guid);
+   id_prefix ? id_prefix : "", node->guid);
fprintf(f, "(%" PRIx64 ")",
mad_get_field64(node->info, 0, IB_NODE_PORT_GUID_F));
if (group) {
@@ -253,45 +269,54 @@ void out_switch(ibnd_node_t * node, int group, char *chname, char *out_prefix)
if (str)
fprintf(f, "%s", str);
}
+   fprintf(f, "\n");
 
-   nodename = remap_node_name(node_name_map, node->guid, node->nodedesc);
+   out_switch_detail(node, sw_prefix);
+   fprintf(f, "\n");
+}
 
-   fprintf(f, "\n%sSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n",
-   out_prefix ? out_prefix : "",
-   node->numports, node_name(node), nodename,
-   node->smaenhsp0 ? "enhanced" : "base",
-   node->smalid, node->smalmc);
+void out_ca_detail(ibnd_node_t * node, char *ca_prefix)
+{
+   char *node_type;
 
-   free(nodename);
+   switch (node->type) {
+   case IB_NODE_CA:
+       node_type = "Ca";
+       break;
+   case IB_NODE_ROUTER:
+       node_type = "Rt";
+       break;
+   default:
+       node_type = "???";
+       break;
+   }
+
+   fprintf(f, "%s%s\t%d %s\t\t# \"%s\"",
+   ca_prefix ? ca_prefix : "",
+   node_type, node->numports, node_name(node),
+   clean_nodedesc(node->nodedesc));
 }
 
-void out_ca(ibnd_node_t * node, int group, char *chname, char *out_prefix)
+void out_ca(ibnd_node_t * node, int group, char *chname, char *id_prefix, char *ca_prefix)
 {
char *node_type;
-   char *node_type2;
 
-   out_ids(node, group, chname, out_prefix);
+   out_ids(node, group, chname, id_prefix);
switch (node->type) {
case IB_NODE_CA:
node_type = "ca";
-   node_type2 = "Ca";
break;
case IB_NODE_ROUTER:
node_type = "rt";
-   node_type2 = "Rt";
break;
default:
node_type = "???";
-   node_type2 = "???";
break;
}
 

[infiniband-diags] [2/3] support --diffcheck in ibnetdiscover

2010-04-07 Thread Al Chu
Hi Sasha,

This patch adds basic --diffcheck support in ibnetdiscover, allowing
configuration of the diff checks done in the default --diff option.

Al

-- 
Albert Chu
ch...@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

Signed-off-by: Albert Chu ch...@llnl.gov
---
 infiniband-diags/man/ibnetdiscover.8 |8 +
 infiniband-diags/src/ibnetdiscover.c |   50 +-
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8
index 975b999..e122736 100644
--- a/infiniband-diags/man/ibnetdiscover.8
+++ b/infiniband-diags/man/ibnetdiscover.8
@@ -64,6 +64,14 @@ output will be displayed showing differences between the old and current
 fabric.  By default, the following are compared for differences: switches,
 channel adapters, routers, and port connections.
 .TP
+\fB\-\-diffcheck\fR key(s)
+Specify what diff checks should be done in the \fB\-\-diff\fR option above.
+Separate multiple diff check keys with commas.  The available diff checks
+are: \fIsw\fR = switches, \fIca\fR = channel adapters, \fIrouter\fR = routers,
+\fIport\fR = port connections descriptions.  Note that \fIport\fR is
+checked only for the node types that are specified (e.g. \fIsw\fR,
+\fIca\fR, \fIrouter\fR).
+.TP
 \fB\-p\fR, \fB\-\-ports\fR
 Obtain a ports report which is a
 list of connected ports with relevant information (like LID, portnum,
diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c
index 4da09ce..4435ade 100644
--- a/infiniband-diags/src/ibnetdiscover.c
+++ b/infiniband-diags/src/ibnetdiscover.c
@@ -57,10 +57,10 @@
 #define LIST_SWITCH_NODE (1 << IB_NODE_SWITCH)
 #define LIST_ROUTER_NODE (1 << IB_NODE_ROUTER)
 
-#define DIFF_FLAG_SWITCH  0x0001
-#define DIFF_FLAG_CA  0x0002
-#define DIFF_FLAG_ROUTER  0x0004
-#define DIFF_FLAG_PORT_CONNECTION 0x0008
+#define DIFF_FLAG_SWITCH   0x0001
+#define DIFF_FLAG_CA   0x0002
+#define DIFF_FLAG_ROUTER   0x0004
+#define DIFF_FLAG_PORT_CONNECTION  0x0008
 
 #define DIFF_FLAG_DEFAULT  (DIFF_FLAG_SWITCH \
| DIFF_FLAG_CA \
@@ -76,6 +76,7 @@ static nn_map_t *node_name_map = NULL;
 static char *cache_file = NULL;
 static char *load_cache_file = NULL;
 static char *diff_cache_file = NULL;
+static uint32_t diffcheck_flags = DIFF_FLAG_DEFAULT;
 
 static int report_max_hops = 0;
 
@@ -735,7 +736,9 @@ static int diff_common(ibnd_fabric_t * orig_fabric,
 * in new_fabric but not in orig_fabric.
 *
 * In this diff, we don't need to check port connections,
-* since it has already been done before.
+* lids, or node descriptions since it has already been
+ * done (i.e. checks are only done when guid exists on both
+* orig and new).
 */
iter_diff_data.diff_flags = diff_flags & ~DIFF_FLAG_PORT_CONNECTION;
iter_diff_data.fabric1 = new_fabric;
@@ -752,29 +755,27 @@ static int diff_common(ibnd_fabric_t * orig_fabric,
 
 int diff(ibnd_fabric_t * orig_fabric, ibnd_fabric_t * new_fabric)
 {
-   uint32_t diff_flags = DIFF_FLAG_DEFAULT;
-
-   if (diff_flags & DIFF_FLAG_SWITCH)
+   if (diffcheck_flags & DIFF_FLAG_SWITCH)
diff_common(orig_fabric,
new_fabric,
IB_NODE_SWITCH,
-   diff_flags,
+   diffcheck_flags,
out_switch,
out_switch_port);
 
-   if (diff_flags & DIFF_FLAG_CA)
+   if (diffcheck_flags & DIFF_FLAG_CA)
diff_common(orig_fabric,
new_fabric,
IB_NODE_CA,
-   diff_flags,
+   diffcheck_flags,
out_ca,
out_ca_port);
 
-   if (diff_flags & DIFF_FLAG_ROUTER)
+   if (diffcheck_flags & DIFF_FLAG_ROUTER)
diff_common(orig_fabric,
new_fabric,
IB_NODE_ROUTER,
-   diff_flags,
+   diffcheck_flags,
out_ca,
out_ca_port);
 
@@ -786,6 +787,8 @@ static int list, group, ports_report;
 
 static int process_opt(void *context, int ch, char *optarg)
 {
+   char *p;
+
switch (ch) {
case 1:
node_name_map_file = strdup(optarg);
@@ -799,6 +802,25 @@ static int process_opt(void *context, int ch, char *optarg)
case 4:
diff_cache_file = strdup(optarg);
break;
+   case 5:
+   diffcheck_flags = 0;
+   p = strtok(optarg, ",");
+   while (p) {
+   if 

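A minimal sketch of the key-to-flag mapping that `--diffcheck` describes, assuming the key names from the man page hunk (`sw`, `ca`, `router`, `port`) and the `DIFF_FLAG_*` values from the `ibnetdiscover.c` hunk. The helper name `parse_diffcheck` is hypothetical and this is not the code from the `case 5` hunk; it tokenizes a private copy with `strtok_r` so the caller's string is left intact.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() and strtok_r() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIFF_FLAG_SWITCH          0x0001
#define DIFF_FLAG_CA              0x0002
#define DIFF_FLAG_ROUTER          0x0004
#define DIFF_FLAG_PORT_CONNECTION 0x0008

/* Hypothetical helper: turn "sw,ca,router,port" into a DIFF_FLAG_* mask.
 * Returns 0 and stores the mask on success; returns -1 on an unknown key
 * or allocation failure and leaves *flags untouched. */
static int parse_diffcheck(const char *keys, uint32_t *flags)
{
    char *copy, *p, *save = NULL;
    uint32_t f = 0;
    int ret = 0;

    copy = strdup(keys); /* strtok_r() writes into its input */
    if (!copy)
        return -1;

    for (p = strtok_r(copy, ",", &save); p; p = strtok_r(NULL, ",", &save)) {
        if (!strcmp(p, "sw"))
            f |= DIFF_FLAG_SWITCH;
        else if (!strcmp(p, "ca"))
            f |= DIFF_FLAG_CA;
        else if (!strcmp(p, "router"))
            f |= DIFF_FLAG_ROUTER;
        else if (!strcmp(p, "port"))
            f |= DIFF_FLAG_PORT_CONNECTION;
        else {
            ret = -1; /* unknown key */
            break;
        }
    }

    free(copy);
    if (ret == 0)
        *flags = f;
    return ret;
}
```

Leaving `*flags` untouched on an unknown key lets a caller keep `DIFF_FLAG_DEFAULT` and report the error; the patch instead clears `diffcheck_flags` before tokenizing `optarg` in place.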
Re: [PATCH] [RFC] ummunotify: Userspace support for MMU notifications

2010-04-07 Thread Randy Dunlap
On Wed,  7 Apr 2010 13:30:29 +0100 Eric B Munson wrote:

 Signed-off-by: Roland Dreier rolandd at cisco.com

Use unobfuscated @.

 Signed-off-by: Eric B Munson ebmun...@us.ibm.com
 
 ---
 
 Changes since v3:
  - Replaced [get|put]_user with copy_[from|to]_user to fix x86 builds
 ---
  Documentation/Makefile|3 +-
  drivers/char/Kconfig  |   12 +
  drivers/char/Makefile |1 +
  drivers/char/ummunotify.c |  567 +
  4 files changed, 582 insertions(+), 1 deletions(-)
  create mode 100644 drivers/char/ummunotify.c
 
 diff --git a/Documentation/Makefile b/Documentation/Makefile
 index 6fc7ea1..27ba76a 100644
 --- a/Documentation/Makefile
 +++ b/Documentation/Makefile
 @@ -1,3 +1,4 @@
  obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
   filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \
 - pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/
 + pcmcia/ spi/ timers/ video4linux/ vm/ ummunotify/ \
 + watchdog/src/

What is this change to Documentation/Makefile for?
Is there some file that should be added in Documentation/ummunotify/ ?


 diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
 index 3141dd3..cf26019 100644
 --- a/drivers/char/Kconfig
 +++ b/drivers/char/Kconfig
 @@ -,6 +,18 @@ config DEVPORT
   depends on ISA || PCI
   default y
  
 +config UMMUNOTIFY
 +   tristate "Userspace MMU notifications"
 +   select MMU_NOTIFIER
 +   help
 + The ummunotify (userspace MMU notification) driver creates a
 + character device that can be used by userspace libraries to
 + get notifications when an application's memory mapping
 + changes.  This is used, for example, by RDMA libraries to
 + improve the reliability of memory registration caching, since
 + the kernel's MMU notifications can be used to know precisely
 + when to shoot down a cached registration.
 +
  source drivers/s390/char/Kconfig
  
  endmenu
 diff --git a/drivers/char/Makefile b/drivers/char/Makefile
 index f957edf..521e5de 100644
 --- a/drivers/char/Makefile
 +++ b/drivers/char/Makefile
 @@ -97,6 +97,7 @@ obj-$(CONFIG_NSC_GPIO)  += nsc_gpio.o
  obj-$(CONFIG_CS5535_GPIO)+= cs5535_gpio.o
  obj-$(CONFIG_GPIO_TB0219)+= tb0219.o
  obj-$(CONFIG_TELCLOCK)   += tlclk.o
 +obj-$(CONFIG_UMMUNOTIFY) += ummunotify.o
  
  obj-$(CONFIG_MWAVE)  += mwave/
  obj-$(CONFIG_AGP)+= agp/


---
~Randy
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/37] librdmacm: add support for AF_IB

2010-04-07 Thread Sean Hefty
The following patch series adds several enhancements to the librdmacm
intended to simplify using RDMA devices and address scalability issues.
Major changes include:

* Adding support for AF_IB.

* The addition of a new API: rdma_getaddrinfo.  This call provides
  functionality similar to getaddrinfo for RDMA devices.  In addition
  to resolving names to addresses, it can also resolve route and
  connection data.  rdma_getaddrinfo can return addresses using
  AF_INET, AF_INET6, and AF_IB.

* Add support for IB ACM.  IB ACM defines a socket based protocol to
  an IB address and route resolution service.  One implementation of that
  service is provided separately, but anyone can implement the service
  provided that they adhere to the IB ACM communication protocol.
  Use of IB ACM is not required.

* Support synchronous operation for library calls.  Users can control
  whether an rdma_cm_id operates asynchronously or synchronously based on
  the rdma_event_channel parameter.  Use of synchronous operations
  reduces the amount of application code required to use the librdmacm.

* Allow the library to abstract RDMA resource creation for simpler RDMA
  applications.  The library can now allocate PDs, CQs, and QPs for the
  user, if not provided.

* Provide a set of helper verbs calls for posting work requests and
  checking for completions.  These are simple wrappers around libibverbs
  calls.

This patch series is also available through my git tree at:

git://git.openfabrics.org/~shefty/librdmacm.git af_ib

Signed-off-by: Sean Hefty sean.he...@intel.com




Re: [PATCH v2 13/51] IB/qib: Add qib_driver.c

2010-04-07 Thread Steve Wise

Roland Dreier wrote:

  +unsigned qib_debug;
  +module_param_named(debug, qib_debug, uint, S_IWUSR | S_IRUGO);
  +MODULE_PARM_DESC(debug, "mask for debug prints");

Did you look at using trace events for this stuff?  That gives you
extremely low overhead when tracing is turned off (dynamic patching to
NOP out the tracing when it's disabled) and also very fine-grained (per
trace site) control over what gets printed; plus you get dumping of the
trace buffer on crash, etc.

 - R.
  


Where can I find information on trace events?  Something in Documentation/*?

Thanks,


Steve.




Re: [PATCH] Dimension port order file support

2010-04-07 Thread Dale Purdy

On Wed, 24 Mar 2010, Sasha Khapyorsky wrote:


Hi Dale,

On 18:06 Wed 03 Mar , Dale Purdy wrote:


Provide a means to specify on a per switch basis the mapping (order)
between switch ports and dimensions for Dimension Order Routing.  This
allows the DOR routing engine to be used when the cabling is not
properly aligned for DOR, either initially, or for an upgrade.


Nice stuff.

Is this something useful with ! '-R dor'?



I'm not using the dimn_ports array in anything but DOR, but I do think
it could be useful for some of the other routing engines.



Signed-off-by: Dale Purdy pu...@sgi.com


The patch itself is broken somehow: it has a double space at the start of
unchanged lines (this is fixable with sed -e 's/^  / /', so don't resend the
patch only for this).



Yes I see - odd.  The original patch file didn't have this - must have
happened when loading it into mail.  Hopefully my updated patch will
be ok.


Some more minor comments are below.

...



+static int set_dimn_ports(void *ctx, uint64_t guid, char *p)
+{
+   osm_ucast_mgr_t *m = ctx;
+   osm_node_t *node = osm_get_node_by_guid(m->p_subn, cl_hton64(guid));
+   osm_switch_t *sw;
+   uint8_t *dimn_ports = NULL;
+   uint8_t port;
+   uint *ports = NULL;


'uint' is not something standard (we had some build compatibility issues
with 'uint' in infiniband-diags in the past), so what about 'unsigned
int'?



ok, fixed.


+   const int bpw = sizeof(*ports)*8;
+   int words;
+   int i = 1; /* port 0 maps to port 0 */
+
+   if (!node || !(sw = node->sw)) {
+   OSM_LOG(m->p_log, OSM_LOG_DEBUG,
+   "switch with guid 0x%016" PRIx64 " is not found\n",
+   guid);
+   return 0;
+   }
+
+   if (sw->dimn_ports) {
+   OSM_LOG(m->p_log, OSM_LOG_DEBUG,
+   "switch with guid 0x%016" PRIx64 " already listed\n",
+   guid);


This is the case of a GUID listed twice, right? Wouldn't OSM_LOG_VERBOSE be more
appropriate?



fixed.


+   while ((*p != '\0') && (*p != '#')) {
+   char *e;
+
+   port = strtoul(p, &e, 0);
+   if ((p == e) || (port == 0) || (port >= sw->num_ports) ||
+   !osm_node_get_physp_ptr(node, port)) {
+   OSM_LOG(m->p_log, OSM_LOG_DEBUG,
+   "bad port %d specified for guid 0x%016" PRIx64 "\n",
+   port, guid);
+   free(dimn_ports);
+   free(ports);


Ditto.



fixed.


+   return 0;
+   }
+
+   if (ports[port/bpw] & (1u << (port%bpw))) {
+   OSM_LOG(m->p_log, OSM_LOG_DEBUG,
+   "port %d already specified for guid 0x%016" PRIx64 "\n",
+   port, guid);


Ditto.



fixed.
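For reference, the port-list loop reviewed above (strtoul until `'\0'` or `'#'`, plus a one-bit-per-port bitmap to catch duplicates) can be sketched in isolation. This is a hypothetical standalone helper, not the OpenSM code: it assumes whitespace-separated port numbers, takes the switch port count as `num_ports` (port 0 rejected, as in the patch), and omits the `osm_node_get_physp_ptr()` check.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not OpenSM code): parse whitespace-separated switch
 * port numbers such as "3 1 2 # comment" into out[].  Port 0, ports >=
 * num_ports, ports that do not fit in uint8_t and duplicates are rejected.
 * Returns the number of ports stored, or -1 on any error. */
static int parse_port_list(const char *p, unsigned num_ports,
                           uint8_t *out, int max_out)
{
    const unsigned bpw = sizeof(unsigned) * 8; /* bits per bitmap word */
    unsigned words = (num_ports + bpw - 1) / bpw;
    unsigned *seen = calloc(words ? words : 1, sizeof(*seen));
    int n = 0;

    if (!seen)
        return -1;

    for (;;) {
        char *e;
        unsigned long port;

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0' || *p == '#') /* end of line or start of comment */
            break;

        port = strtoul(p, &e, 0);
        /* short-circuit ||: seen[] is indexed only once port < num_ports */
        if (e == p || port == 0 || port >= num_ports || port > UINT8_MAX ||
            n >= max_out || (seen[port / bpw] & (1u << (port % bpw)))) {
            free(seen);
            return -1;
        }
        seen[port / bpw] |= 1u << (port % bpw); /* mark port as used */
        out[n++] = (uint8_t)port;
        p = e;
    }

    free(seen);
    return n;
}
```

Parsing into an `unsigned long` and range-checking before narrowing to `uint8_t` avoids silently wrapping an out-of-range value such as 257 onto a valid port number.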


+   cl_qmap_apply_func(p_sw_guid_tbl, free_dimn_ports, NULL);
+   if (p_mgr->p_subn->opt.dimn_ports_file) {
+   OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
+   "Fetching dimension ports file \'%s\'\n",
+   p_mgr->p_subn->opt.dimn_ports_file);
+   if (parse_node_map(p_mgr->p_subn->opt.dimn_ports_file,
+  set_dimn_ports, p_mgr)) {
+   OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A05: "
+   "cannot parse dimn_ports_file \'%s\'\n",
+   p_mgr->p_subn->opt.dimn_ports_file);
+   }
+   }
+


Hmm, if it is DOR only it can be done under 'if (is_dor)' (to save
cycles of other REs). Otherwise (generic usability)
ucast_mgr_setup_all_switches() seems as more appropriate place to have
such setup, no?



moved to ucast_mgr_setup_all_switches() as you suggested.


And what about adding:

if (sw->dimn_ports)
free(sw->dimn_ports);

in osm_switch_delete()?



fixed.

New patch attached.

Dale

Dimension port order file support (V2)

Provide a means to specify on a per switch basis the mapping (order)
between switch ports and dimensions for Dimension Order Routing.  This
allows the DOR routing engine to be used when the cabling is not
properly aligned for DOR, either initially, or for an upgrade.

Signed-off-by: Dale Purdy pu...@sgi.com
---
 opensm/include/opensm/osm_subnet.h |1 +
 opensm/include/opensm/osm_switch.h |   30 +
 opensm/man/opensm.8.in |   31 --
 opensm/opensm/main.c   |   13 -
 opensm/opensm/osm_subnet.c |7 ++
 opensm/opensm/osm_switch.c |4 +-
 opensm/opensm/osm_ucast_mgr.c  |  116 +++-
 7 files changed, 192 insertions(+), 10 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 3970e98..e4e298e 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -186,6 +186,7 @@ typedef struct osm_subn_opt {
 	uint16_t 

Re: [PATCH 26/37] librdmacm: set src_addr in rdma_getaddrinfo

2010-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2010 at 10:12:43AM -0700, Sean Hefty wrote:
 RDMA requires the user to allocate hardware resources before
 establishing a connection.  To support this, the user must know
 the source address that the connection will use to reach the
 remote endpoint.  Modify rdma_getaddrinfo to determine an
 appropriate source address based on the specified destination,
 when a source address is not given.

I haven't looked through everything you posted to make a suggestion
here, but this bothers me..

The resources should be allocated after the rdma_bind syscall, prior to
listen/accept or connect, IMHO.

How does the rai->ai_src_addr get used to allocate resources anyhow?

Jason


Re: [PATCH 22/37] librdmacm: add new call to create id

2010-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2010 at 10:12:44AM -0700, Sean Hefty wrote:

 + *   The rdma_cm_id will be set to use synchronous operations (connect,
 + *   listen, and get_request).  To convert to synchronous operation, the
   ^

asynchronous?

Jason


RE: [PATCH 22/37] librdmacm: add new call to create id

2010-04-07 Thread Hefty, Sean
 + *   The rdma_cm_id will be set to use synchronous operations (connect,
 + *   listen, and get_request).  To convert to synchronous operation, the
   ^
asynchronous?

yes - thanks


RE: [PATCH 26/37] librdmacm: set src_addr in rdma_getaddrinfo

2010-04-07 Thread Sean Hefty
I haven't looked through everything you posted to make a suggestion
here, but this bothers me..

The resources should be allocated after the rdma_bind syscall, prior to
listen/accept or connect, IMHO.

How does the rai->ai_src_addr get used to allocate resources anyhow?

Maybe the patch description is off.

All this does (in a very non-sexy way) is set ai_src_addr.  It does not allocate
any hardware resources.  A user can provide ai_src_addr as input into rdma_bind
or rdma_resolve_addr.

The motivation is twofold.  First, the user can select the rdma_addrinfo for a
connection by examining the src/dst address pair.  This may be desired for
failover or performance reasons.  Second, route resolution may require knowing
both the source and destination addresses.  For example, IB ACM requires both
addresses as input.

- Sean



Re: Ummunotify: progress at last!

2010-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2010 at 12:37:03PM -0700, Roland Dreier wrote:
   No, there is no mmap. Like this:
   
   u64 my_counter = 0;
   
   ibv_set_mmu_counter(verbs, &my_counter);
   [..]
   while (my_counter != last_my_counter) {
   last_my_counter = my_counter;
   ibv_get_mmu_notifications(verbs, ...);   // <- I am a memory barrier as well
   }
   
   The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to
   the 8 byte counter you specified without having to the mmap thing. As
   I understand it this is what perfevents does.
 
 I was trying to look at how perf events handles this, and AFAICT it
 looks like kernel/perf_event.c just supports mmap().  Can you expand on
 what you meant here?
 
 (I was trying to figure out how one would handle the case where
 userspace gives us a counter in highmem -- doing kmap_atomic() seems to
 be the only option but then I'm not sure if I want to deal with that...)

I think I was mistaken here, disregard.

Jason


Re: [PATCH 26/37] librdmacm: set src_addr in rdma_getaddrinfo

2010-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2010 at 12:54:56PM -0700, Sean Hefty wrote:
 I haven't looked through everything you posted to make a suggestion
 here, but this bothers me..
 
 The resources should be allocated after the rdma_bind syscall, prior to
 listen/accept or connect, IMHO.
 
 How does the rai->ai_src_addr get used to allocate resources anyhow?
 
 Maybe the patch description is off.
 
 All this does (in a very non-sexy way) is set ai_src_addr.  It does
 not allocate any hardware resources.  A user can provide ai_src_addr
 as input into rdma_bind or rdma_resolve_addr.
 
 The motivation is twofold.  First, the user can select the
 rdma_addrinfo for a connection by examining the src/dst address
 pair.  This may be desired for failover or performance reasons.
 Second, route resolution may require knowing both the source and
 destination addresses.  For example, IB ACM requires both addresses
 as input.

Huumm

I don't have a problem with ai_src_addr being set, when necessary, but
setting it unconditionally seems wrong to me. In most cases the kernel
should select the source during route resolution, not be forced to
something in userspace.

Certainly for AF_INET/6 I don't think this should be done..

Apps doing complex things for failover should supply a source address
in the hints and call rdma_getaddrinfo for each adaptor.

AF_IB has the scope ID in the destination to specify the adaptor for
link-local GIDs, so the source should not often be needed.

Not sure what you mean that ACM requires it? Doesn't ACM plug in at
the rdma_getaddrinfo stage? If so it can get the source on its own
like you did in this patch. I agree that ACM should always return
results with the source set, because it is providing path records
relative to a specific adaptor.

Jason


RE: [PATCH 31/37] librdmacm: provide abstracted verb calls

2010-04-07 Thread Sean Hefty
 +static inline int
 +rdma_get_send_comp(struct rdma_cm_id *id, struct ibv_wc *wc)
 +{
 +struct ibv_cq *cq;
 +void *context;
 +int ret;
 +
 +ret = ibv_poll_cq(id->send_cq, 1, wc);
 +if (ret)
 +return ret;
 +
 +ret = ibv_req_notify_cq(id->send_cq, 0);
 +if (ret)
 +return ret;
 +
 +ret = ibv_poll_cq(id->send_cq, 1, wc);
 +if (ret)
 +return ret;
 +
 +ret = ibv_get_cq_event(id->send_cq_channel, &cq, &context);
 +if (ret)
 +return ret;


This doesn't look correct.  If the send isn't complete by the time the
2nd ibv_poll_cq() completes, then this function will return without
having filled in the wc.  Or am I missing something?  Shouldn't the
ibv_get_cq_event() be the first thing this function does?  The same
issue/question exists for rdma_get_recv_comp().

I think it's possible for the function to return without having filled in a wc.
If the 2nd poll removes a completion, it can leave a cq event on the channel,
which a subsequent call could retrieve, but then find the cq empty.

The idea for this call is to abstract poll, notify_cq, and get_cq_event, but
still provide decent performance.  (Scalability is a separate matter.  I
couldn't find a decent way to abstract a CQ shared across QPs or between the
receive and send queues.)

To avoid returning from the call without a completion, I think the following
structure works:

poll()
notify_cq()
poll()
while (no completion) {
get_cq_event()
poll()
}

The only drawback I see is that it's theoretically possible to build up a queue
of cq events in the kernel.  Not sure how to fix that.  Any ideas?



Re: [PATCH 31/37] librdmacm: provide abstracted verb calls

2010-04-07 Thread Steve Wise

Sean Hefty wrote:

+static inline int
+rdma_get_send_comp(struct rdma_cm_id *id, struct ibv_wc *wc)
+{
+   struct ibv_cq *cq;
+   void *context;
+   int ret;
+
+   ret = ibv_poll_cq(id->send_cq, 1, wc);
+   if (ret)
+   return ret;
+
+   ret = ibv_req_notify_cq(id->send_cq, 0);
+   if (ret)
+   return ret;
+
+   ret = ibv_poll_cq(id->send_cq, 1, wc);
+   if (ret)
+   return ret;
+
+   ret = ibv_get_cq_event(id->send_cq_channel, &cq, &context);
+   if (ret)
+   return ret;

  

This doesn't look correct.  If the send isn't complete by the time the
2nd ibv_poll_cq() completes, then this function will return without
having filled in the wc.  Or am I missing something?  Shouldn't the
ibv_get_cq_event() be the first thing this function does?  The same
issue/question exists for rdma_get_recv_comp().



I think it's possible for the function to return without having filled in a wc.
  



So it's busted?  Or is this intended behavior?



If the 2nd poll removes a completion, it can leave a cq event on the channel,
which a subsequent call could retrieve, but then find the cq empty.

The idea for this call is to abstract poll, notify_cq, and get_cq_event, but
still provide decent performance.  (Scalability is a separate matter.  I
couldn't find a decent way to abstract a CQ shared across QPs or between the
receive and send queues.)

To avoid returning from the call without a completion, I think the following
structure works:

poll()
notify_cq()
poll()
while (no completion) {
get_cq_event()
poll()
}
  


Is rdma_get_send_completion() supposed to return exactly one wc?  If so 
then the 2 polls can cause a wc to get silently discarded.   I must 
still not be understanding the intended use?


I would think this should just be:

get_cq_event()
notify_cq()
poll()



The only drawback I see is that it's theoretically possible to build up a queue
of cq events in the kernel.  Not sure how to fix that.  Any ideas?

  


That can always happen, yes?


Steve.


Re: [PATCH] RDMA/nes: correct cap.max_inline_data assignment in nes_query_qp

2010-04-07 Thread Roland Dreier
thanks, applied
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [patch] infiniband: checking the wrong variable

2010-04-07 Thread Roland Dreier
thanks, applied.


Re: [PATCH 05/10] iw_cxgb4: Add connection management functions.

2010-04-07 Thread Roland Dreier
  +void _c4iw_free_ep(struct kref *kref)
 ...
  +ep = container_of(container_of(kref, struct c4iw_ep_common, kref),
  +  struct c4iw_ep, com);

sparse warns of some internal container_of variable shadowing itself
here.  You can avoid that and write this more simply as:

ep = container_of(kref, struct c4iw_ep, com.kref);

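Roland's single-call form works because `offsetof()` accepts a nested member designator such as `com.kref`. The kernel's `container_of()` declares an internal `__mptr` variable, so nesting one call inside another shadows it, which is what sparse reports. A userspace sketch with a simplified macro (no type check) and hypothetical struct layouts that keep only the `com` and `kref` members named in the patch:

```c
#include <stddef.h>

/* Simplified userspace container_of(); the kernel version also
 * type-checks ptr through a temporary named __mptr. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kref { int refcount; };                           /* simplified */
struct c4iw_ep_common { int state; struct kref kref; };  /* hypothetical layout */
struct c4iw_ep { int flags; struct c4iw_ep_common com; };/* hypothetical layout */

/* One container_of() with a nested member designator replaces two calls. */
static struct c4iw_ep *ep_from_kref(struct kref *kref)
{
    return container_of(kref, struct c4iw_ep, com.kref);
}
```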

Re: [PATCH 05/10] iw_cxgb4: Add connection management functions.

2010-04-07 Thread Roland Dreier
  +wr_waitp = (struct c4iw_wr_wait *)rpl->data[1];

Sparse complains about this cast from __be64 to a pointer.  I assume
this is OK but you probably want to stick a __force in there to annotate it.


Re: [PATCH 06/10] iw_cxgb4: Add memory management functions.

2010-04-07 Thread Roland Dreier
  +req->wr.wr_lo = (u64)wr_wait;

wr_lo is __be64.  The cast should be to __force __be64 here I think.

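The two sparse comments above concern a pointer that is stored in a `__be64` field (`wr_lo`) and read back from the reply (`rpl->data[1]`). Assuming the value is only an opaque cookie that the hardware hands back unchanged, no byte swapping is wanted, and `__force` simply tells sparse the type change is intentional. A userspace sketch, with `__force`/`__bitwise` definitions modelled on the kernel's and hypothetical type names (`be64_cookie`, `struct wr_wait`):

```c
#include <stdint.h>

/* Modelled on the kernel: the attributes only mean something to sparse. */
#ifdef __CHECKER__
#define __force   __attribute__((force))
#define __bitwise __attribute__((bitwise))
#else
#define __force
#define __bitwise
#endif

typedef uint64_t __bitwise be64_cookie; /* hypothetical stand-in for __be64 */

struct wr_wait { int done; };           /* hypothetical */

/* Store a pointer as an opaque cookie: no cpu_to_be64(), just a cast. */
static be64_cookie cookie_from_ptr(struct wr_wait *w)
{
    return (__force be64_cookie)(uintptr_t)w;
}

/* Recover the pointer from the cookie echoed back in a reply. */
static struct wr_wait *ptr_from_cookie(be64_cookie c)
{
    return (struct wr_wait *)(uintptr_t)(__force uint64_t)c;
}
```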

Re: [PATCH 05/10] iw_cxgb4: Add connection management functions.

2010-04-07 Thread Steve Wise

Roland Dreier wrote:

  +int peer2peer = 0;
  +module_param(peer2peer, int, 0644);
  +MODULE_PARM_DESC(peer2peer, "Support peer2peer ULPs (default=0)");

If you build iw_cxgb3 and iw_cxgb4 into the kernel, the peer2peer symbol
names clash.  (Same problem occurs if you try to load cxgb3 and cxgb4
modules at the same time, I think).  



Both iw_cxgb3 and iw_cxgb4 load ok concurrently when compiled as modules.


The option was originally intended to be used in more than just cm.c.  
So there's a piece of code missing in qp.c.  I'll clean this up.   I 
might make an attribute in c4iw_endpoint that indicates this mode.  Then 
the qp code won't need the global option and can key off the endpoint 
attribute.   So I can make this a static as you suggest.




We can fix it here in cxgb4 by just
making peer2peer static (and deleting the extern declaration).

However peer2peer is not that great of a name for a global symbol; might
be good to add a patch to cxgb3 to rename peer2peer to something like
iwch_peer2peer and use module_param_named()... 
  



I'll do this for cxgb3.


Steve.


Re: [PATCH 02/10] iw_cxgb4: Add driver, fw, and hw headers.

2010-04-07 Thread Steve Wise

Roland Dreier wrote:

You have:

  +struct fw_ri_send_wr {
 ...
  + __be16 wrid;

  +struct fw_ri_recv_wr {
 ...
  + __be16 wrid;

But also:

  +static inline void init_wr_hdr(union t4_wr *wqe, u16 wrid,
  +enum fw_wr_opcodes opcode, u8 flags, u8 len16)
 ...
  + wqe->send.wrid = wrid;

and similar for recv.wrid in qp.c.  sparse correctly warns about this
endianness clash.

The intention is that the device just treats wrid as opaque I assume so
I think the correct fix is to go from __be16 to u16 in the structure
declarations.

 - R.
  



Yes, it should be a u16 in the wr structs.







Re: [PATCH 05/10] iw_cxgb4: Add connection management functions.

2010-04-07 Thread Roland Dreier
  Both iw_cxgb3 and iw_cxgb4 load ok concurrently when compiled as modules.

Oh, right.  The peer2peer symbol isn't exported so the clash is only if
you try to build them both into the kernel (as I often do as part of my
quick build tests).


Re: [PATCH 31/37] librdmacm: provide abstracted verb calls

2010-04-07 Thread Steve Wise

Sean Hefty wrote:

I think it's possible for the function to return without having filled in a wc.

So it's busted?  Or is this intended behavior?



Depends on the point of view, I guess.  :)  It would be nice to avoid that
situation.

  

Is rdma_get_send_completion() supposed to return exactly one wc?  If so
then the 2 polls can cause a wc to get silently discarded.   I must
still not be understanding the intended use?



How can a wc get discarded?  Maybe the return code from ibv_poll_cq is confusing
you?  If the first poll finds a wc, ibv_poll_cq returns 1, and we exit the
function.  Otherwise, we rearm the cq, then poll again to make sure that nothing
got missed.

  



Right.  I missed that.  poll will return 1 if there's a completion 
returned.  Nevermind :)




I would think this should just be:

get_cq_event()
notify_cq()
poll()



This requires arming the CQ up front.  I was also trying to avoid the overhead
of always calling get_cq_event and notify_cq to just pull a completed request
off of the work queue.

  



I was confused on the poll_cq return code (and I've been working in this 
code for umpteen years :) ).




The only drawback I see is that it's theoretically possible to build up a queue
of cq events in the kernel.  Not sure how to fix that.  Any ideas?

  

That can always happen, yes?



It seems like it should be avoidable.  Maybe 1 event can queue up, but I think
we can prevent more by not rearming until that event gets pulled.

If nothing else, I think this discussion shows why we need this sort of wrapper.
:)
  


Indeed!  I like the wrappers.

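Sean's poll / notify_cq / poll / `while (no completion)` structure can be exercised without hardware using a small in-memory model. All names below are hypothetical: `fake_poll()` stands in for `ibv_poll_cq(cq, 1, wc)` (which returns the number of completions taken, or a negative value on error), `fake_arm()` for `ibv_req_notify_cq(cq, 0)`, and `fake_get_event()` for `ibv_get_cq_event()`. Real code must also acknowledge every retrieved event with `ibv_ack_cq_events()`, because `ibv_destroy_cq()` waits until all events have been acknowledged.

```c
/* Hypothetical in-memory model of a CQ plus its completion channel.
 * None of these names are libibverbs API. */
struct fake_cq {
    int pending;  /* completions waiting in the CQ */
    int armed;    /* notification requested and not yet fired */
    int events;   /* events queued on the channel, not yet read */
    int arrivals; /* completions that will "arrive" while we wait */
};

/* Like ibv_poll_cq(cq, 1, wc): 1 if a completion was taken, else 0. */
static int fake_poll(struct fake_cq *cq, int *wc)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    *wc = 42; /* pretend work completion */
    return 1;
}

/* Like ibv_req_notify_cq(cq, 0): the next completion raises one event. */
static int fake_arm(struct fake_cq *cq)
{
    cq->armed = 1;
    return 0;
}

/* Like ibv_get_cq_event(): returns a queued event if there is one,
 * otherwise simulates completions arriving; an arrival raises an event
 * only if the CQ is armed.  Returns -1 where a real call would block
 * forever. */
static int fake_get_event(struct fake_cq *cq)
{
    while (cq->events == 0) {
        if (cq->arrivals == 0)
            return -1;
        cq->arrivals--;
        cq->pending++;
        if (cq->armed) {
            cq->armed = 0;
            cq->events++;
        }
    }
    cq->events--;
    return 0;
}

static int get_comp(struct fake_cq *cq, int *wc)
{
    int ret;

    ret = fake_poll(cq, wc); /* fast path: no arm, no wait */
    if (ret)
        return ret;

    ret = fake_arm(cq);
    if (ret)
        return ret;

    /* Re-poll: a completion may have landed before the CQ was armed. */
    while (!(ret = fake_poll(cq, wc))) {
        ret = fake_get_event(cq); /* may be a stale event from an earlier call */
        if (ret)
            return ret;
        /* real code: ibv_ack_cq_events(cq, 1) here */
    }
    return ret;
}
```

A call that returns from the second poll leaves the CQ armed, so a later completion raises an event that only a subsequent call will pull; that is the event build-up Sean mentions, and this model makes no attempt to prevent it. A stale event costs one extra trip around the loop, because the arm done earlier in the same call is still in effect.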



RE: [PATCH 26/37] librdmacm: set src_addr in rdma_getaddrinfo

2010-04-07 Thread Sean Hefty
I don't have a problem with ai_src_addr being set, when necessary, but
setting it unconditionally seems wrong to me. In most cases the kernel
should select the source during route resolution, not be forced to
something in userspace.

Just to be precise, the source is selected during address resolution, and the
existing APIs allow the user to indicate that a specific source should be used.

This is a requirement of some applications.

Not sure what you mean that ACM requires it? Doesn't ACM plug in at
the rdma_getaddrinfo stage? If so it can get the source on its own
like you did in this patch. I agree that ACM should always return
results with the source set, because it is providing path records
relative to a specific adaptor.

Yes - the code to set the source could move from librdmacm into ACM.

I can change rdma_getaddrinfo to only set the source address if either the user
provides one through a hint, or if resolved through ACM.
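A sketch of this narrower policy, using a hypothetical trimmed-down result struct instead of the real `struct rdma_addrinfo` (which carries the source as `ai_src_addr`/`ai_src_len` among other fields). The source is filled in only when the caller supplied one in the hints or an ACM-style resolver returned one; preferring the hint over the resolver is an assumption. Otherwise the length stays 0, leaving the choice to address resolution, much like calling `connect()` on a socket without a prior `bind()`.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical, trimmed-down result; not the real struct rdma_addrinfo. */
struct addr_result {
	struct sockaddr_storage src_addr;
	socklen_t src_len;	/* 0 means "let address resolution pick" */
};

/*
 * Set a source address only if the user asked for one in the hints or an
 * ACM-style resolver supplied one; otherwise leave it empty.
 */
void set_src_if_needed(struct addr_result *res,
		       const struct sockaddr *hint_src, socklen_t hint_len,
		       const struct sockaddr *acm_src, socklen_t acm_len)
{
	const struct sockaddr *src = NULL;
	socklen_t len = 0;

	if (hint_src && hint_len) {		/* explicit user request wins */
		src = hint_src;
		len = hint_len;
	} else if (acm_src && acm_len) {	/* resolver-provided source */
		src = acm_src;
		len = acm_len;
	}

	if (src && len <= sizeof(res->src_addr)) {
		memcpy(&res->src_addr, src, len);
		res->src_len = len;
	} else {				/* common case: no source */
		memset(&res->src_addr, 0, sizeof(res->src_addr));
		res->src_len = 0;
	}
}
```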

- Sean



Re: [PATCH 26/37] librdmacm: set src_addr in rdma_getaddrinfo

2010-04-07 Thread Jason Gunthorpe
On Wed, Apr 07, 2010 at 03:10:36PM -0700, Sean Hefty wrote:

 Not sure what you mean that ACM requires it? Doesn't ACM plug in at
 the rdma_getaddrinfo stage? If so it can get the source on its own
 like you did in this patch. I agree that ACM should always return
 results with the source set, because it is providing path records
 relative to a specific adaptor.
 
 Yes - the code to set the source could move from librdmacm into ACM.
 
 I can change rdma_getaddrinfo to only set the source address if
 either the user provides one through a hint, or if resolved through
 ACM.

That would be my preference. I think the kernel calls should use a
null source address in the common case and a set source should be an
exceptional case. This matches sockets very well.

I'd see two cases for setting a source address. One is an app that wants to
control the bind port; this is similar to socket cases, and is
generally an exceptional case.

The other is that an app wants the connection to be usable with a
certain PD. This is more like the DAPL case, as far as I understand it
(i.e., resources have been allocated against a PD prior to the addresses
being known). This would be best served by having the hints include a
PD and have rdma_getaddrinfo generate a source address that works with
that PD. A PD is more general than a source address - in single HCA
cases a PD will be usable with all ports.

Jason


RE: librdmacm meets libiwarp

2010-04-07 Thread Sean Hefty
It's nice to see that you seem to have liked most of
my ideas and intentions of libiwarp :) Thanks for
making that work a bit more sustainable!

Thanks for the input.  I think there could still be a little more work done to
handle completions across shared CQs, plus add in SRQ support.

I was wondering where I could get a copy of your
latest code to look at it as a whole and (maybe)
comment on it.

The code is available from my git tree, in the af_ib branch:

git://git.openfabrics.org/~shefty/librdmacm.git af_ib

- Sean



Re: [PATCH 02/10] iw_cxgb4: Add driver, fw, and hw headers.

2010-04-07 Thread Roland Dreier
  Shouldn't it be __u16? These structs are part of the firmware-to-host
  driver/lib API.

Yes, if this header is used by userspace too then you want __u16.
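To illustrate the point: a header shared by a kernel driver and a userspace library can take its fixed-width types from `<linux/types.h>`, which is exported to userspace and provides `__u8`, `__u16`, `__u32` and friends, whereas the plain `u16`/`u32` typedefs exist only inside the kernel. The struct below is hypothetical and not taken from the iw_cxgb4 headers.

```c
/* Hypothetical firmware/host message header shared by driver and library. */
#include <linux/types.h>	/* userspace-visible __u8/__u16/__u32 */

struct fw_example_hdr {
	__u8  opcode;
	__u8  flags;
	__u16 len;	/* __u16, not the kernel-only u16 */
	__u32 cookie;
};
```

The same definition then compiles unchanged in both the driver and the library, and on common ABIs its layout (8 bytes here, with no padding) is identical on both sides.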
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH v2 13/51] IB/qib: Add qib_driver.c

2010-04-07 Thread John A. Gregor
Roland Dreier rdre...@cisco.com wrote:
   Where can I find information on trace events? Something in Documentation/*?

 Yep, Documentation/trace/events.txt.

LWN just did a really good writeup on using the TRACE_EVENT macro:

http://lwn.net/Articles/379903/

Part 2 is still behind the paywall.

-John Gregor