Re: [PATCH] infiniband-diags: Ignore PortInfo data on down port.

2010-03-23 Thread Sasha Khapyorsky
On 11:19 Fri 19 Mar , Ira Weiny wrote:
 
 From: Ira Weiny wei...@llnl.gov
 Date: Thu, 18 Mar 2010 18:35:34 -0700
 Subject: [PATCH] infiniband-diags: Ignore PortInfo data on down port.
 
 According to C14-24.2.1:
 If PortInfo:PortState == Down then only PortInfo:PortState and
 PortInfo:PortPhysicalState _must_ be valid.  Other fields may be invalid
 depending on the vendor.  Therefore ignore all PortInfo data other than those
 fields when reporting PortInfo on a down port.
 
 Signed-off-by: Ira Weiny wei...@llnl.gov
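A minimal C sketch of the reporting rule above. The `port_info` struct and `format_port_info()` are hypothetical illustrations, not infiniband-diags code (the real tools decode PortInfo fields from a MAD through libibmad). The only spec detail assumed here is that a PortState value of 1 means Down.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* PortInfo:PortState encoding for Down (1 = Down). */
enum { PORT_STATE_DOWN = 1 };

/* Hypothetical, simplified view of a decoded PortInfo attribute. */
struct port_info {
    int port_state;   /* PortInfo:PortState, always valid */
    int phys_state;   /* PortInfo:PortPhysicalState, always valid */
    int lid;          /* may be garbage when the port is Down */
    int link_width;   /* may be garbage when the port is Down */
};

/*
 * Format a PortInfo summary into buf. On a Down port only the two fields
 * that C14-24.2.1 guarantees are printed. Returns snprintf()'s result.
 */
static int format_port_info(const struct port_info *pi, char *buf, size_t len)
{
    assert(pi != NULL && buf != NULL && len > 0);

    if (pi->port_state == PORT_STATE_DOWN)
        return snprintf(buf, len, "State:%d PhysState:%d",
                        pi->port_state, pi->phys_state);

    return snprintf(buf, len, "State:%d PhysState:%d LID:%d Width:%d",
                    pi->port_state, pi->phys_state, pi->lid, pi->link_width);
}
```

On a Down port the LID and width are never read, so vendor-specific garbage in those fields cannot leak into the output.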

Applied. Thanks.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses

2010-03-23 Thread Eli Cohen
On Mon, Mar 22, 2010 at 02:21:39PM +0100, Jiri Pirko wrote:
 Finally this bit can be removed. Currently, after the bonding driver is
 changed/fixed (32a806c194ea112cfab00f558482dd97bee5e44e net-next-2.6),

Could you send a link to the git tree where I can find this commit and
the related fixes?


Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses

2010-03-23 Thread Or Gerlitz
Eli Cohen wrote:
 Could you send a link to the git tree where I can find this commit and
 the related fixes?

basically, as the subject line suggests, it should be in Dave's net-next tree

Or.


Re: opensm/main.c: force stdout to be line-buffered

2010-03-23 Thread Sasha Khapyorsky
Hi Yevgeny,

On 10:26 Wed 03 Mar , Yevgeny Kliteynik wrote:
 When stdout is assigned to a terminal, it is line-buffered.
 But when opensm's stdout is redirected to a file, stdout
 becomes block-buffered, which means that '\n' won't cause
 the buffer to be flushed.

Such redirection happens in daemon mode. Another case would be
'opensm > somefile'. Where do you see the problem?

Would '-d2' option be related to the issue?

 
 Forcing stdout to always be line-buffered.
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
 ---
  opensm/opensm/main.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
 index f9a33af..5ea65dd 100644
 --- a/opensm/opensm/main.c
 +++ b/opensm/opensm/main.c
 @@ -613,6 +613,9 @@ int main(int argc, char *argv[])
   {NULL, 0, NULL, 0}  /* Required at the end of the array */
   };
 
 + /* force stdout to be line-buffered */
 + setlinebuf(stdout);

What about stderr? IOW, describe your problem in more detail (see above).

Sasha

 +
   /* Make sure that the opensm and complib were compiled using
  same modes (debug/free) */
   if (osm_is_debug() != cl_is_debug()) {
 -- 
 1.5.1.4
 
 


Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses

2010-03-23 Thread Moni Shoua

 Could you send a link to the git tree where I can find this commit and
 the related fixes?
 
See commit 5e47596bee12597824a3b5b21e20f80b61e58a35 for the fix prior to this 
one.


Re: [PATCH] Perfquery can be too noisy.

2010-03-23 Thread Sasha Khapyorsky
Hi Mike,

On 15:01 Thu 04 Mar , Mike Heinz wrote:
 When perfquery is run against fabrics that do not support PortXmitWait, it 
 emits this warning for every port:
 
 ibwarn: [23225] dump_perfcounters: PortXmitWait not indicated so ignore this counter

The patch is malformed (whitespace mangled).

 
 When running ibcheckerrors on a large fabric, this leads to a flood of 
 warnings.
 
 The proposed patch reduces the warning to a verbose message

So I'm applying this verbosity-reducing part by hand.

 and, on fabrics that do not support PortXmitWait, it suppresses the output of 
 the XmitWait attribute.

This could be done, but see the comment below.
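For illustration, the capability check in the patch boils down to choosing the last counter field to print. The sketch below assumes a three-entry, hypothetical field list in place of libibmad's `IB_PC_*_F` enumeration; only the 0x1000 (PortXmitWait supported) bit is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Capability bit tested by the patch: PortXmitWait is supported. */
#define CAP_PORT_XMIT_WAIT 0x1000

/* Hypothetical, abbreviated field list; libibmad uses IB_PC_*_F values. */
enum pc_field { PC_XMT_PKTS, PC_RCV_PKTS, PC_XMT_WAIT, PC_FIELD_COUNT };

static const char *const pc_names[PC_FIELD_COUNT] = {
    "XmtPkts", "RcvPkts", "XmtWait"
};

/* Last field worth printing: XmtWait only if the agent advertises it. */
static enum pc_field last_pc_field(uint16_t cap_mask)
{
    return (cap_mask & CAP_PORT_XMIT_WAIT) ? PC_XMT_WAIT : PC_RCV_PKTS;
}

/* Print counters up to the last supported field; returns how many. */
static int print_counters(uint16_t cap_mask, const uint64_t vals[PC_FIELD_COUNT])
{
    enum pc_field last = last_pc_field(cap_mask);
    int n = 0;

    assert(vals != NULL);
    for (int f = 0; f <= (int)last; f++) {
        printf("%s: %llu\n", pc_names[f], (unsigned long long)vals[f]);
        n++;
    }
    return n;
}
```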

 
 Signed-off-by: Michael Heinz michaelhe...@qlogic.com
 
 diff --git a/infiniband-diags/src/perfquery.c b/infiniband-diags/src/perfquery.c
 index 3ae692c..812b4c2 100644
 --- a/infiniband-diags/src/perfquery.c
 +++ b/infiniband-diags/src/perfquery.c
 @@ -282,6 +282,25 @@ static void output_aggregate_perfcounters_ext(ib_portid_t * portid)
  ALL_PORTS, buf);
  }
 
 +static int dump_fields(char *buf, int bufsz, void *data, int start, int end)
 +{
 +   char val[64];
 +   char *s = buf;
 +   int n, field;
 +   for (field = start; field <= end && bufsz > 0; field++) {
 +   mad_decode_field(data, field, val);
 +   if (!mad_dump_field(field, s, bufsz, val))
 +   return -1;
 +   n = strlen(s);
 +   s += n;
 +   *s++ = '\n';
 +   *s = 0;
 +   n++;
 +   bufsz -= n;
 +   }
 +   return (int)(s - buf);
 +}

It seems that this low-level stuff is copied from libibmad. Wouldn't it be
better just to add an appropriate function there (if needed)?

Sasha
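As a point of comparison, here is a hedged sketch of the same append-one-line-per-field loop written with `snprintf()`, which reports truncation explicitly instead of relying on remaining-size arithmetic. The `names`/`vals` arrays are hypothetical stand-ins for `mad_decode_field()`/`mad_dump_field()`; this is not the libibmad implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "name:value\n" for fields start..end (inclusive) into buf,
 * never writing more than bufsz bytes including the terminating NUL.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int dump_fields_bounded(char *buf, size_t bufsz,
                               const char *const names[],
                               const unsigned long vals[],
                               int start, int end)
{
    size_t used = 0;

    assert(buf != NULL && bufsz > 0);
    buf[0] = '\0';
    for (int f = start; f <= end; f++) {
        int n = snprintf(buf + used, bufsz - used, "%s:%lu\n",
                         names[f], vals[f]);
        if (n < 0 || (size_t)n >= bufsz - used)
            return -1;          /* output would be truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```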

 +
  static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
   ib_portid_t * portid, int port, int aggregate)
  {
 @@ -293,8 +312,7 @@ static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
 IBERROR("perfquery");
 if (!(cap_mask & 0x1000)) {
 /* if PortCounters:PortXmitWait not supported clear this counter */
 -   IBWARN
 -   ("PortXmitWait not indicated so ignore this counter");
 +   VERBOSE("PortXmitWait not indicated so ignore this counter");
 perf_count.xmtwait = 0;
 mad_encode_field(pc, IB_PC_XMT_WAIT_F,
  &perf_count.xmtwait);
 @@ -302,7 +320,8 @@ static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
 if (aggregate)
 aggregate_perfcounters();
 else
 -   mad_dump_perfcounters(buf, sizeof buf, pc, sizeof pc);
 +   dump_fields(buf, sizeof buf, pc, IB_PC_FIRST_F,
 +   (cap_mask & 0x1000)?IB_PC_LAST_F:IB_PC_RCV_PKTS_F);
 } else {
 if (!(cap_mask & 0x200))/* 1.2 errata: bit 9 is extended counter support */
 IBWARN
 


Re: infiniband-diags/ibstatus: add link_layer for RoCEE support

2010-03-23 Thread Sasha Khapyorsky
On 12:37 Tue 09 Mar , Yevgeny Kliteynik wrote:
 RoCEE introduces a new file in sysfs: link_layer.
 Assume IB if the file doesn't exist (driver w/o RoCEE support).
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il

Applied. Thanks.

Sasha


Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses

2010-03-23 Thread Eli Cohen
On Tue, Mar 23, 2010 at 12:34:13PM +0200, Or Gerlitz wrote:
 
 basically, as the subject line suggests, it should be in Dave's net-next tree
 
I just need to clone this tree and need the url. Can you give it to me
from .git/config?
Thanks.


Re: libibumad: added link_layer for RoCEE support and updated umad version

2010-03-23 Thread Sasha Khapyorsky
On 12:39 Tue 09 Mar , Yevgeny Kliteynik wrote:
 Added link_layer field to umad_port_t.
 The field is implemented as char[].
 If the relevant file doesn't exist in sysfs, the link layer
 is IB. Otherwise, the content of link_layer file is used.
 
 The libibumad version is promoted.
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
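The fallback described above can be sketched in plain C as follows. The function name is hypothetical, and the sysfs path in the comment is an assumption about where the per-port `link_layer` attribute lives; libibumad itself uses its internal `sys_read_string()` helper, as the diff in this message shows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line sysfs attribute such as (assumed path)
 * /sys/class/infiniband/<ca>/ports/<n>/link_layer into out.
 * If the file is missing or unreadable (e.g. a driver without RoCEE
 * support), fall back to "IB", as the patch description says.
 */
static void read_link_layer(const char *path, char *out, size_t outsz)
{
    FILE *f;

    assert(path != NULL && out != NULL && outsz > 0);
    f = fopen(path, "r");
    if (!f || !fgets(out, (int)outsz, f))
        snprintf(out, outsz, "%s", "IB");   /* assume IB by default */
    else
        out[strcspn(out, "\n")] = '\0';     /* strip trailing newline */
    if (f)
        fclose(f);
}
```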

Applied but see below. Thanks.

 ---
  libibumad/include/infiniband/umad.h |2 ++
  libibumad/libibumad.ver |2 +-
  libibumad/src/umad.c|5 +
  3 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
 index 1f82183..f9da204 100644
 --- a/libibumad/include/infiniband/umad.h
 +++ b/libibumad/include/infiniband/umad.h
 @@ -116,6 +116,7 @@ typedef struct ib_user_mad {
  #define SYS_PORT_RATE        "rate"
  #define SYS_PORT_GUID        "port_guid"
  #define SYS_PORT_GID         "gids/0"
 +#define SYS_PORT_LINK_LAYER  "link_layer"
 
  typedef struct umad_port {
   char ca_name[UMAD_CA_NAME_LEN];
 @@ -132,6 +133,7 @@ typedef struct umad_port {
   uint64_t port_guid;
   unsigned pkeys_size;
   uint16_t *pkeys;
 + char link_layer[UMAD_CA_NAME_LEN];
  } umad_port_t;
 
  typedef struct umad_ca {
 diff --git a/libibumad/libibumad.ver b/libibumad/libibumad.ver
 index 57cddbd..225738c 100644
 --- a/libibumad/libibumad.ver
 +++ b/libibumad/libibumad.ver
 @@ -6,4 +6,4 @@
  # API_REV - advance on any added API
  # RUNNING_REV - advance any change to the vendor files
  # AGE - number of backward versions the API still supports
 -LIBVERSION=2:1:0
 +LIBVERSION=2:2:0

This patch actually breaks ABI (mostly for umad_port_t users - see for
example update_umad_port() in opensm/libvendor/osm_vendor_ibumad_sa.c).
So it looks like API_REV should be advanced as well.

I will do this before next release.

Sasha
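To make the versioning point concrete, the sketch below applies libtool's documented `-version-info current:revision:age` update rules. It assumes, without confirming the build wiring, that `LIBVERSION` in libibumad.ver is passed to libtool in that form (the API_REV:RUNNING_REV:AGE comment suggests so). The Linux file-name formula `lib<name>.so.(current-age).age.revision` is libtool's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* libtool-style version triple, e.g. LIBVERSION=2:1:0 (assumed mapping). */
struct lt_version { int current, revision, age; };

enum change_kind {
    CHANGE_IMPL_ONLY,   /* source changed, interfaces unchanged */
    CHANGE_ADDED_API,   /* interfaces added, none removed or changed */
    CHANGE_BROKE_ABI    /* interfaces removed or changed (e.g. struct layout) */
};

/* Apply libtool's update rules for -version-info. */
static struct lt_version bump(struct lt_version v, enum change_kind k)
{
    switch (k) {
    case CHANGE_IMPL_ONLY:
        v.revision++;
        break;
    case CHANGE_ADDED_API:
        v.current++; v.revision = 0; v.age++;
        break;
    case CHANGE_BROKE_ABI:
        v.current++; v.revision = 0; v.age = 0;
        break;
    }
    return v;
}

/* On Linux, libtool names the file lib<name>.so.(current-age).age.revision. */
static void so_name(char *buf, size_t len, const char *lib, struct lt_version v)
{
    assert(v.age <= v.current);
    snprintf(buf, len, "lib%s.so.%d.%d.%d",
             lib, v.current - v.age, v.age, v.revision);
}
```

Under these assumptions the patch's 2:1:0 to 2:2:0 change is a revision-only bump (still `.so.2`), while treating the `umad_port_t` layout change as an ABI break gives 3:0:0 and a new `.so.3` major number.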

 diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
 index 277ae6b..d16e750 100644
 --- a/libibumad/src/umad.c
 +++ b/libibumad/src/umad.c
 @@ -159,6 +159,11 @@ static int get_port(char *ca_name, char *dir, int portnum, umad_port_t * port)
   if (sys_read_uint(port_dir, SYS_PORT_CAPMASK, &port->capmask) < 0)
   goto clean;
 
 + if (sys_read_string(port_dir, SYS_PORT_LINK_LAYER,
 + port->link_layer, UMAD_CA_NAME_LEN) < 0)
 + /* assume IB by default */
 + sprintf(port->link_layer, "IB");
 +
   port->capmask = htonl(port->capmask);
 
   if (sys_read_gid(port_dir, SYS_PORT_GID, gid) < 0)
 -- 
 1.5.1.4
 
 


Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses

2010-03-23 Thread Moni Shoua
Eli Cohen wrote:
 On Tue, Mar 23, 2010 at 12:34:13PM +0200, Or Gerlitz wrote:
 basically, as the subject line suggests, it should be in Dave's net-next tree

 I just need to clone this tree and need the url. Can you give it to me
 from .git/config?
 Thanks.
 
Maybe you're looking for this one:
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commit;h=32a806c194ea112cfab00f558482dd97bee5e44e



Re: infiniband-diags/ibstat.c: print link layer for RoCEE support

2010-03-23 Thread Sasha Khapyorsky
On 12:41 Tue 09 Mar , Yevgeny Kliteynik wrote:
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il

Applied. Thanks.

Sasha


Re: opensm/main.c: force stdout to be line-buffered

2010-03-23 Thread Yevgeny Kliteynik

Hi Sasha,

On 23/Mar/10 12:37, Sasha Khapyorsky wrote:

Hi Yevgeny,

On 10:26 Wed 03 Mar , Yevgeny Kliteynik wrote:

When stdout is assigned to a terminal, it is line-buffered.
But when opensm's stdout is redirected to a file, stdout
becomes block-buffered, which means that '\n' won't cause
the buffer to be flushed.


Such redirection happens in daemon mode. Another case would be
'opensm > somefile'. Where do you see the problem?


The problematic case is 'opensm > somefile'.
 

Would '-d2' option be related to the issue?


'-d2' refers to the log file; I'm talking about
printf to stdout.



Forcing stdout to always be line-buffered.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
  opensm/opensm/main.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index f9a33af..5ea65dd 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -613,6 +613,9 @@ int main(int argc, char *argv[])
{NULL, 0, NULL, 0}  /* Required at the end of the array */
};

+   /* force stdout to be line-buffered */
+   setlinebuf(stdout);


What about stderr?


stderr is always unbuffered, no matter where
stderr is actually directed.


IOW, describe your problem in more detail (see above).


I'm running 'opensm > somefile', and I don't see SM's stdout output
(such as the SUBNET UP message, or the new cached options after SIGHUP),
because when stdout is assigned to a file rather than a terminal, it is
handled differently. Instead of being flushed on every '\n',
it becomes block-buffered, which means that you don't control when
this buffer is flushed.
My fix forces stdout to be flushed whenever '\n' is printed.
It has no effect when stdout is assigned to a terminal, and it only
changes buffering when SM's stdout is redirected.

More details about stdout/stderr buffering:

http://www.pixelbeat.org/programming/stdio_buffering/
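For reference, glibc documents `setlinebuf(stream)` as equivalent to `setvbuf(stream, NULL, _IOLBF, 0)`, and `setvbuf()` is ISO C. A minimal sketch of the same fix in that form, with an optional `isatty()` check since a terminal is normally line-buffered already (the helper name is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Make a stream line-buffered unless it is a terminal (which is
 * line-buffered by default). Must run before any output is written
 * to the stream. Returns 0 on success, nonzero if setvbuf() fails.
 */
static int force_line_buffering(FILE *stream)
{
    assert(stream != NULL);
    if (isatty(fileno(stream)))
        return 0;   /* already line-buffered; nothing to do */
    return setvbuf(stream, NULL, _IOLBF, 0);
}
```

Calling `force_line_buffering(stdout)` at the top of `main()` has the same effect as the patch's `setlinebuf(stdout)` on redirected output.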

-- Yevgeny


Sasha


+
/* Make sure that the opensm and complib were compiled using
   same modes (debug/free) */
if (osm_is_debug() != cl_is_debug()) {
--
1.5.1.4








Re: opensm/main.c: force stdout to be line-buffered

2010-03-23 Thread Sasha Khapyorsky
On 14:25 Tue 23 Mar , Yevgeny Kliteynik wrote:
 
 I'm running 'opensm > somefile', and I don't see SM's stdout output
 (such as the SUBNET UP message, or the new cached options after SIGHUP),
 because when stdout is assigned to a file rather than a terminal, it is
 handled differently. Instead of being flushed on every '\n',
 it becomes block-buffered, which means that you don't control when
 this buffer is flushed.
 My fix forces stdout to be flushed whenever '\n' is printed.
 It has no effect when stdout is assigned to a terminal, and it only
 changes buffering when SM's stdout is redirected.
 
 More details about stdout/stderr buffering:
 
 http://www.pixelbeat.org/programming/stdio_buffering/

There you can find a couple of ways to work around this issue, for example:

  stdbuf -o L opensm > somefile

I would prefer not to override external settings, so that the program
works as expected.

Sasha


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 12:06:50PM -0400, Jeff Squyres wrote:
 IBM has found a resource that they think will be able to progress Roland's 
 ummunotify work.
 
 After a few discussions in Sonoma last week and some off-list emails, here's 
 what we decided:
 
 1. Take Roland's last code drop (Roland: can you re-send the last copy of 
 your code?).
 
 2. Do not convert it to the perf events kernel framework as the Linux kernel 
 community requested.  Instead, migrate the functionality into the ibv code 
 base.  Roland thinks that most of the code should be adaptable without too 
 many changes.  Here are the highlights of the new functionality:
 
a. Add a new flag to ibv_reg_mr() that does the same function as 
 UMMUNOTIFY_REGISTER_REGION
b. ibv_dereg_mr() always performs the equivalent of
UMMUNOTIFY_UNREGISTER_REGION (if necessary)

c. Make a new device somewhere (under /dev/infiniband?) that
performs the same functions as /dev/ummunotify (open it to mmap
the counter into user space, and read events when something
interesting happens)

I would prefer to do this by adding a new verbs call that returns an fd
directly, i.e. use ib_uverbs_alloc_event_file and act like
ibv_create_comp_channel.

The main reason for the new FD is so it can be polled on..

You can also avoid the mmap scheme by doing what perf events does:
pass in a pointer from userspace and have the kernel pin the page it
is on.

So, I'd suggest

fd = ibv_create_mmu_monitor(verbs, counter);
[..]
poll(fd);
[..]
if (counter != last_counter)
  [..]
close(fd);

Refuse to create more than one mmu_monitor for each verbs for now.

I looked at this for a little while at Sonoma and I think it is quite
straightforward; I'm happy to look over anything.

Jason
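A rough sketch of the consumer side of the proposed fd-plus-counter scheme. Everything here is hypothetical: `ibv_create_mmu_monitor()` does not exist, so the fd is any pollable descriptor (a pipe in the usage check) and the counter is a plain shared `uint64_t`. A real implementation would also `read()` the fd to drain pending events.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the (hypothetical) MMU-monitor fd to become
 * readable, then compare the shared counter with the last value seen.
 * Returns 1 if new invalidations were reported, 0 if not, -1 on error.
 */
static int wait_for_mmu_events(int fd, const volatile uint64_t *counter,
                               uint64_t *last_seen, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(counter != NULL && last_seen != NULL);
    if (poll(&pfd, 1, timeout_ms) < 0)
        return -1;
    /* Check the counter even on timeout: in this sketch the counter,
     * not fd readiness, is what reports new invalidations. */
    if (*counter != *last_seen) {
        *last_seen = *counter;
        return 1;
    }
    return 0;
}
```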


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:

 The main reason for the new FD is so it can be polled on..

What do you poll on the fd for?

With ummunotify, you only read() from the fd when (counter != last_counter).  
Were you thinking that the poll() would be for something else?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 01:17:40PM -0400, Jeff Squyres wrote:
 On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:
 
  The main reason for the new FD is so it can be polled on..
 
 What do you poll on the fd for?
 
 With ummunotify, you only read() from the fd when (counter !=
 last_counter).  Were you thinking that the poll() would be for
 something else?

poll() is for apps that want to get the notifications without
spinning on the counter. If you don't think that is worth doing, it
does simplify things a lot: just add two new verbs calls:

ibv_set_mmu_counter(verbs, my_counter);
ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
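Without the fd, the application only compares the counter on its fast path and calls into the library when it has moved. A hedged sketch, with a callback standing in for the proposed (not existing) `ibv_get_mmu_notifications()`; `fake_get()` is an illustrative stub:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ibv_get_mmu_notifications(): fills up to
 * max entries and returns how many were written. */
typedef int (*get_notifications_fn)(uint64_t *list, size_t max, void *ctx);

/* Illustrative stub that reports a single invalidated address. */
static int fake_get(uint64_t *list, size_t max, void *ctx)
{
    (void)ctx;
    if (max == 0)
        return 0;
    list[0] = 0x1000;
    return 1;
}

/*
 * Fast path: only call into the library when the counter (registered via
 * the proposed ibv_set_mmu_counter()) has moved since the last check.
 * Returns the number of notifications fetched, 0 if nothing changed.
 */
static int check_mmu_counter(_Atomic uint64_t *counter, uint64_t *last_seen,
                             get_notifications_fn get, void *ctx,
                             uint64_t *list, size_t max)
{
    uint64_t now = atomic_load_explicit(counter, memory_order_acquire);

    assert(get != NULL && list != NULL && last_seen != NULL);
    if (now == *last_seen)
        return 0;   /* nothing changed: no library call needed */
    *last_seen = now;
    return get(list, max, ctx);
}
```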

Jason


[RFC] [PATCH 0/22] [for 2.6.36] rdma/cm: add support for native IB addressing

2010-03-23 Thread Sean Hefty
The following patch series extends the rdma_cm to support native
Infiniband addressing through the use of a new AF_IB address family.
It defines a new struct sockaddr_ib that may be used to specify an
IB GID, along with other IB address attributes, such as the pkey and
service ID.
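To make the new address family concrete, here is a user-space sketch of filling an AF_IB address. The struct layout is modeled on the `struct sockaddr_ib` that was later merged into the kernel's `<rdma/ib.h>` and may differ from this RFC; the `sketch_` names and the value 27 for AF_IB are assumptions for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value later assigned to AF_IB in Linux; an assumption for this RFC. */
#define SKETCH_AF_IB 27

/* 128-bit IB GID, stored in network byte order. */
struct sketch_ib_addr { uint8_t raw[16]; };

/* Simplified layout modeled on the later upstream struct sockaddr_ib. */
struct sketch_sockaddr_ib {
    unsigned short        sib_family;    /* SKETCH_AF_IB */
    uint16_t              sib_pkey;      /* partition key, network order */
    uint32_t              sib_flowinfo;
    struct sketch_ib_addr sib_addr;      /* GID */
    uint64_t              sib_sid;       /* service ID, network order */
    uint64_t              sib_sid_mask;
    uint64_t              sib_scope_id;
};

/* Fill an AF_IB address from a GID, a pkey and a 64-bit service ID. */
static void make_sockaddr_ib(struct sketch_sockaddr_ib *sib,
                             const uint8_t gid[16], uint16_t pkey,
                             uint64_t sid)
{
    assert(sib != NULL && gid != NULL);
    memset(sib, 0, sizeof *sib);
    sib->sib_family = SKETCH_AF_IB;
    sib->sib_pkey = htons(pkey);
    memcpy(sib->sib_addr.raw, gid, sizeof sib->sib_addr.raw);
    /* Store the SID big-endian byte by byte (no standard htonll()). */
    for (int i = 0; i < 8; i++)
        ((uint8_t *)&sib->sib_sid)[i] = (uint8_t)(sid >> (56 - 8 * i));
    sib->sib_sid_mask = ~UINT64_C(0);    /* all SID bits significant (assumed) */
}
```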

The higher level intent is to support a user space call, rdma_getaddrinfo,
which can return AF_IB addresses to an application.  This allows the
rdma_cm to support transport specific features, such as failover and
non-reversible paths.  (An implementation of rdma_getaddrinfo is included
in a separate patch set to the librdmacm.)

This set is very lightly tested, but I wanted to start soliciting feedback
with more extensive testing ongoing in parallel.  For more casual reviewers
of patches, the first patch in the series is probably the most important,
as it defines struct sockaddr_ib.

I think a reasonable kernel target for these patches would be 2.6.36.

Signed-off-by: Sean Hefty sean.he...@intel.com



[RFC] [PATCH 2/22] [for 2.6.36] rdma/cm: fix handling of ipv6 addressing in cma_use_port

2010-03-23 Thread Sean Hefty
cma_use_port is coded assuming that the sockaddr is an ipv4 address.
Since ipv6 addressing is supported, and also to support other address
families, make the code more generic in its address handling.
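The generic comparison that `cma_addr_cmp()` introduces can be mirrored in user space with the standard socket structures. A minimal sketch with a hypothetical helper name; unlike the patch, which falls through to the IPv6 comparison for any non-IPv4 family, it treats unknown families as never equal.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Returns 0 when both addresses have the same family and the same
 * IP address (ports are ignored), nonzero otherwise.
 */
static int addr_cmp(const struct sockaddr *src, const struct sockaddr *dst)
{
    assert(src != NULL && dst != NULL);
    if (src->sa_family != dst->sa_family)
        return -1;

    switch (src->sa_family) {
    case AF_INET:
        return ((const struct sockaddr_in *)src)->sin_addr.s_addr !=
               ((const struct sockaddr_in *)dst)->sin_addr.s_addr;
    case AF_INET6:
        return memcmp(&((const struct sockaddr_in6 *)src)->sin6_addr,
                      &((const struct sockaddr_in6 *)dst)->sin6_addr,
                      sizeof(struct in6_addr));
    default:
        return -1;   /* unknown family: never treat as equal */
    }
}
```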

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   29 ++---
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 875e34e..9041a2b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -643,6 +643,21 @@ static inline int cma_any_addr(struct sockaddr *addr)
return cma_zero_addr(addr) || cma_loopback_addr(addr);
 }
 
+static int cma_addr_cmp(struct sockaddr *src, struct sockaddr *dst)
+{
+   if (src->sa_family != dst->sa_family)
+   return -1;
+
+   switch (src->sa_family) {
+   case AF_INET:
+   return ((struct sockaddr_in *) src)->sin_addr.s_addr !=
+  ((struct sockaddr_in *) dst)->sin_addr.s_addr;
+   default:
+   return ipv6_addr_cmp(&((struct sockaddr_in6 *) src)->sin6_addr,
+&((struct sockaddr_in6 *) dst)->sin6_addr);
+   }
+}
+
 static inline __be16 cma_port(struct sockaddr *addr)
 {
if (addr->sa_family == AF_INET)
@@ -2014,13 +2029,13 @@ err1:
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
struct rdma_id_private *cur_id;
-   struct sockaddr_in *sin, *cur_sin;
+   struct sockaddr *addr, *cur_addr;
struct rdma_bind_list *bind_list;
struct hlist_node *node;
unsigned short snum;
 
-   sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
-   snum = ntohs(sin->sin_port);
+   addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
+   snum = ntohs(cma_port(addr));
if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
return -EACCES;
 
@@ -2032,15 +2047,15 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 * We don't support binding to any address if anyone is bound to
 * a specific address on the same port.
 */
-   if (cma_any_addr((struct sockaddr *) &id_priv->id.route.addr.src_addr))
+   if (cma_any_addr(addr))
return -EADDRNOTAVAIL;
 
hlist_for_each_entry(cur_id, node, &bind_list->owners, node) {
-   if (cma_any_addr((struct sockaddr *) &cur_id->id.route.addr.src_addr))
+   cur_addr = (struct sockaddr *) &cur_id->id.route.addr.src_addr;
+   if (cma_any_addr(cur_addr))
return -EADDRNOTAVAIL;
 
-   cur_sin = (struct sockaddr_in *) &cur_id->id.route.addr.src_addr;
-   if (sin->sin_addr.s_addr == cur_sin->sin_addr.s_addr)
+   if (!cma_addr_cmp(addr, cur_addr))
return -EADDRINUSE;
}
 





[RFC] [PATCH 3/22] [for 2.6.36] rdma/cm: include AF_IB in loopback and any address checks

2010-03-23 Thread Sean Hefty
Enhance checks for loopback and any address to support AF_IB
in addition to AF_INET and AF_INET6.  This will allow future
patches to use AF_IB when binding and resolving addresses.
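A user-space sketch of the per-family "any" and loopback checks. IPv4 mirrors the kernel's `ipv4_is_zeronet()` (0.0.0.0/8) and `ipv4_is_loopback()` (127.0.0.0/8); IPv6 uses the standard `IN6_IS_ADDR_*` macros. The AF_IB part is an assumption: a minimal GID-only struct, with loopback taken to be ::1 as set by patch 7/22 in this series.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_AF_IB 27   /* assumed AF_IB value */

/* Hypothetical GID-only AF_IB address for this sketch. */
struct sketch_sockaddr_ib {
    unsigned short sib_family;
    uint8_t        sib_gid[16];
};

static int is_zero_addr(const struct sockaddr *a)
{
    static const uint8_t zero[16];

    switch (a->sa_family) {
    case AF_INET:   /* like ipv4_is_zeronet(): 0.0.0.0/8 */
        return (ntohl(((const struct sockaddr_in *)a)->sin_addr.s_addr)
                & 0xff000000u) == 0;
    case AF_INET6:
        return IN6_IS_ADDR_UNSPECIFIED(&((const struct sockaddr_in6 *)a)->sin6_addr);
    case SKETCH_AF_IB:
        return memcmp(((const struct sketch_sockaddr_ib *)a)->sib_gid, zero, 16) == 0;
    default:
        return 0;
    }
}

static int is_loopback_addr(const struct sockaddr *a)
{
    static const uint8_t lo[16] = { [15] = 1 };   /* ::1 */

    switch (a->sa_family) {
    case AF_INET:   /* like ipv4_is_loopback(): 127.0.0.0/8 */
        return (ntohl(((const struct sockaddr_in *)a)->sin_addr.s_addr) >> 24) == 127;
    case AF_INET6:
        return IN6_IS_ADDR_LOOPBACK(&((const struct sockaddr_in6 *)a)->sin6_addr);
    case SKETCH_AF_IB:
        return memcmp(((const struct sketch_sockaddr_ib *)a)->sib_gid, lo, 16) == 0;
    default:
        return 0;
    }
}

/* Same shape as cma_any_addr(): zero or loopback counts as "any". */
static int is_any_addr(const struct sockaddr *a)
{
    assert(a != NULL);
    return is_zero_addr(a) || is_loopback_addr(a);
}
```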

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9041a2b..6460fbf 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -46,6 +46,7 @@
 
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
+#include <rdma/ib.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
@@ -616,26 +617,30 @@ EXPORT_SYMBOL(rdma_init_qp_attr);
 
 static inline int cma_zero_addr(struct sockaddr *addr)
 {
-   struct in6_addr *ip6;
-
-   if (addr->sa_family == AF_INET)
-   return ipv4_is_zeronet(
-   ((struct sockaddr_in *)addr)->sin_addr.s_addr);
-   else {
-   ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr;
-   return (ip6->s6_addr32[0] | ip6->s6_addr32[1] |
-   ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0;
+   switch (addr->sa_family) {
+   case AF_INET:
+   return ipv4_is_zeronet(((struct sockaddr_in *)addr)->sin_addr.s_addr);
+   case AF_INET6:
+   return ipv6_addr_any(&((struct sockaddr_in6 *) addr)->sin6_addr);
+   case AF_IB:
+   return ib_addr_any(&((struct sockaddr_ib *) addr)->sib_addr);
+   default:
+   return 0;
}
 }
 
 static inline int cma_loopback_addr(struct sockaddr *addr)
 {
-   if (addr->sa_family == AF_INET)
-   return ipv4_is_loopback(
-   ((struct sockaddr_in *) addr)->sin_addr.s_addr);
-   else
-   return ipv6_addr_loopback(
-   &((struct sockaddr_in6 *) addr)->sin6_addr);
+   switch (addr->sa_family) {
+   case AF_INET:
+   return ipv4_is_loopback(((struct sockaddr_in *) addr)->sin_addr.s_addr);
+   case AF_INET6:
+   return ipv6_addr_loopback(&((struct sockaddr_in6 *) addr)->sin6_addr);
+   case AF_IB:
+   return ib_addr_loopback(&((struct sockaddr_ib *) addr)->sib_addr);
+   default:
+   return 0;
+   }
 }
 
 static inline int cma_any_addr(struct sockaddr *addr)





[RFC] [PATCH 6/22] [for 2.6.36] rdma/cm: Allow user to specify AF_IB when binding

2010-03-23 Thread Sean Hefty
Modify rdma_bind_addr to allow the user to specify AF_IB when
binding to a device.  AF_IB indicates that the user is not
mapping an IP address to the native IB addressing.  (The mapping
may have already been done, or is not needed.)

The format of the SID is still controlled by the rdma cm,
but is now exported in its entirety, rather than
just the 16 bit port value based on the RDMA CM IP annex.
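The bind-time dispatch can be sketched as follows: IP families still need an IP-to-device lookup, while AF_IB supplies the GID and pkey directly. The structs are hypothetical, loosely modeled on `rdma_dev_addr`, and 27 is assumed for AF_IB.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_AF_IB 27   /* assumed AF_IB value */

struct sketch_sockaddr_ib {
    unsigned short sib_family;
    uint16_t       sib_pkey;      /* network byte order */
    uint8_t        sib_gid[16];
};

/* Hypothetical device address record, loosely modeled on rdma_dev_addr. */
struct sketch_dev_addr {
    uint8_t  sgid[16];
    uint16_t pkey;                /* host byte order */
    int      needs_ip_resolution; /* 1: an IP-to-GID lookup is still required */
};

static int bind_translate(const struct sockaddr *addr, struct sketch_dev_addr *dev)
{
    assert(addr != NULL && dev != NULL);
    switch (addr->sa_family) {
    case AF_INET:
    case AF_INET6:
        dev->needs_ip_resolution = 1;   /* rdma_translate_ip() in the kernel */
        return 0;
    case SKETCH_AF_IB: {
        const struct sketch_sockaddr_ib *sib = (const void *)addr;

        memcpy(dev->sgid, sib->sib_gid, sizeof dev->sgid);
        dev->pkey = ntohs(sib->sib_pkey);
        dev->needs_ip_resolution = 0;   /* GID and pkey used as given */
        return 0;
    }
    default:
        return -EAFNOSUPPORT;
    }
}
```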

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   21 +
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1546236..0a3bbf9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -324,6 +324,13 @@ static int cma_set_qkey(struct rdma_id_private *id_priv)
return ret;
 }
 
+static void cma_translate_ib(struct sockaddr_ib *addr, struct rdma_dev_addr *dev_addr)
+{
+   dev_addr->dev_type = ARPHRD_INFINIBAND;
+   rdma_addr_set_sgid(dev_addr, (union ib_gid *) &addr->sib_addr);
+   ib_addr_set_pkey(dev_addr, ntohs(addr->sib_pkey));
+}
+
 static int cma_acquire_dev(struct rdma_id_private *id_priv)
 {
struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
@@ -2148,7 +2155,8 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
struct rdma_id_private *id_priv;
int ret;
 
-   if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6)
+   if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6 &&
+   addr->sa_family != AF_IB)
return -EAFNOSUPPORT;
 
id_priv = container_of(id, struct rdma_id_private, id);
@@ -2160,9 +2168,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
goto err1;
 
if (!cma_any_addr(addr)) {
-   ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
-   if (ret)
-   goto err1;
+   if (addr->sa_family == AF_IB) {
+   cma_translate_ib((struct sockaddr_ib *) addr,
+&id->route.addr.dev_addr);
+   } else {
+   ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
+   if (ret)
+   goto err1;
+   }
 
mutex_lock(&lock);
ret = cma_acquire_dev(id_priv);





[RFC] [PATCH 7/22] [for 2.6.36] rdma/cm: do not modify sa_family when setting loopback address

2010-03-23 Thread Sean Hefty
cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port.  Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.

As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback.  cma_bind_loopback is only invoked when
the source address is the loopback address.

Finally, add loopback support for AF_IB as part of the change.
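A user-space sketch of a loopback assignment that keeps the existing family and port intact. As in the patch, any family other than IPv4/IPv6 is treated as AF_IB; the GID-only struct and the value 27 are assumptions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SKETCH_AF_IB 27   /* assumed AF_IB value */

/* Hypothetical GID-only AF_IB address for this sketch. */
struct sketch_sockaddr_ib { unsigned short sib_family; uint8_t sib_gid[16]; };

/*
 * Overwrite only the address bytes with the loopback address for the
 * socket's existing family; sa_family and any port are left untouched.
 */
static void set_loopback(struct sockaddr *addr)
{
    assert(addr != NULL);
    switch (addr->sa_family) {
    case AF_INET:
        ((struct sockaddr_in *)addr)->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        break;
    case AF_INET6:
        ((struct sockaddr_in6 *)addr)->sin6_addr = in6addr_loopback;
        break;
    default:    /* treated as AF_IB, as in the patch: GID ::1 */
        memset(((struct sketch_sockaddr_ib *)addr)->sib_gid, 0, 16);
        ((struct sketch_sockaddr_ib *)addr)->sib_gid[15] = 1;
        break;
    }
}
```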

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   31 ++-
 1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0a3bbf9..7981a85 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1776,6 +1776,23 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_route);
 
+static void cma_set_loopback(struct sockaddr *addr)
+{
+   switch (addr->sa_family) {
+   case AF_INET:
+   ((struct sockaddr_in *) addr)->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+   break;
+   case AF_INET6:
+   ipv6_addr_set(&((struct sockaddr_in6 *) addr)->sin6_addr,
+ 0, 0, 0, htonl(1));
+   break;
+   default:
+   ib_addr_set(&((struct sockaddr_ib *) addr)->sib_addr,
+   0, 0, 0, htonl(1));
+   break;
+   }
+}
+
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
struct cma_device *cma_dev;
@@ -1816,6 +1833,7 @@ port_found:
ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
id_priv->id.port_num = p;
cma_attach_to_dev(id_priv, cma_dev);
+   cma_set_loopback((struct sockaddr *) &id_priv->id.route.addr.src_addr);
 out:
mutex_unlock(&lock);
return ret;
@@ -1870,7 +1888,6 @@ out:
 static int cma_resolve_loopback(struct rdma_id_private *id_priv)
 {
struct cma_work *work;
-   struct sockaddr *src, *dst;
union ib_gid gid;
int ret;
 
@@ -1887,18 +1904,6 @@ static int cma_resolve_loopback(struct rdma_id_private *id_priv)
rdma_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid);
 
-   src = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
-   if (cma_zero_addr(src)) {
-   dst = (struct sockaddr *) &id_priv->id.route.addr.dst_addr;
-   if ((src->sa_family = dst->sa_family) == AF_INET) {
-   ((struct sockaddr_in *) src)->sin_addr.s_addr =
-   ((struct sockaddr_in *) dst)->sin_addr.s_addr;
-   } else {
-   ipv6_addr_copy(&((struct sockaddr_in6 *) src)->sin6_addr,
-  &((struct sockaddr_in6 *) dst)->sin6_addr);
-   }
-   }
-
work->id = id_priv;
INIT_WORK(&work->work, cma_work_handler);
work->old_state = CMA_ADDR_QUERY;





[RFC] [PATCH 8/22] [for 2.6.36] rdma/cm: restrict AF_IB loopback to binding to IB devices only

2010-03-23 Thread Sean Hefty
If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.
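The device-selection logic after this patch can be sketched with a flat array of hypothetical device records: skip non-IB devices when the source is AF_IB, prefer the first active port, and otherwise fall back to port 1 of the first eligible device. The types and names are illustrative, not the rdma_cm data structures.

```c
#include <assert.h>
#include <stddef.h>

enum transport { TRANSPORT_IB, TRANSPORT_IWARP };

/* Hypothetical, flattened view of a registered RDMA device. */
struct sketch_dev {
    enum transport transport;
    int            port_cnt;      /* ports are numbered 1..port_cnt */
    const int     *port_active;   /* port_active[p - 1] != 0 if port p is ACTIVE */
};

/*
 * Returns the chosen device index and sets *port, or returns -1 if no
 * eligible device exists (the kernel code returns -ENODEV).
 */
static int pick_loopback_dev(const struct sketch_dev *devs, int ndevs,
                             int ib_only, int *port)
{
    int first = -1;

    assert(port != NULL);
    for (int i = 0; i < ndevs; i++) {
        if (ib_only && devs[i].transport != TRANSPORT_IB)
            continue;
        if (first < 0)
            first = i;            /* fallback candidate */
        for (int p = 1; p <= devs[i].port_cnt; p++) {
            if (devs[i].port_active[p - 1]) {
                *port = p;
                return i;
            }
        }
    }
    if (first >= 0)
        *port = 1;                /* no active port: use port 1 */
    return first;
}
```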

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   30 ++
 1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7981a85..0bc33cc 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1795,26 +1795,40 @@ static void cma_set_loopback(struct sockaddr *addr)
 
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
-   struct cma_device *cma_dev;
+   struct cma_device *cma_dev, *cur_dev;
+   struct sockaddr *addr;
struct ib_port_attr port_attr;
union ib_gid gid;
u16 pkey;
int ret;
u8 p;
 
+   cma_dev = NULL;
+   addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
mutex_lock(&lock);
-   if (list_empty(&dev_list)) {
+   list_for_each_entry(cur_dev, &dev_list, list) {
+   if (addr->sa_family == AF_IB &&
+   rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+   continue;
+
+   if (!cma_dev)
+   cma_dev = cur_dev;
+
+   for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+   if (!ib_query_port(cur_dev->device, p, &port_attr) &&
+   port_attr.state == IB_PORT_ACTIVE) {
+   cma_dev = cur_dev;
+   goto port_found;
+   }
+   }
+   }
+
+   if (!cma_dev) {
ret = -ENODEV;
goto out;
}
-   list_for_each_entry(cma_dev, &dev_list, list)
-   for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
-   if (!ib_query_port(cma_dev->device, p, &port_attr) &&
-   port_attr.state == IB_PORT_ACTIVE)
-   goto port_found;
 
p = 1;
-   cma_dev = list_entry(dev_list.next, struct cma_device, list);
 
 port_found:
ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid);





[RFC] [PATCH 9/22] [for 2.6.36] rdma/cm: add support for AF_IB to rdma_resolve_addr

2010-03-23 Thread Sean Hefty
Allow the user to specify the remote address using AF_IB format.
When AF_IB is used, the remote address simply needs to be recorded,
and no resolution using ARP is done.  The local address may still
need to be resolved however.
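
The following is a simplified sketch of the source GID choice made by cma_resolve_ib_dev(). It assumes a flat array of local GIDs on ports that have already passed the P_Key check. The hypothetical `struct gid` below splits a GID into its 64-bit subnet prefix and interface ID. An exact match with the destination GID is taken immediately; otherwise the first GID on the destination's subnet is kept as the candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified GID: 64-bit subnet prefix plus 64-bit interface ID. */
struct gid {
	uint64_t subnet_prefix;
	uint64_t interface_id;
};

/*
 * Return the index of the local GID to use as the source for dgid:
 * an exact match wins at once, otherwise the first GID on the same
 * subnet is used. Returns -1 if nothing matches.
 */
static int pick_source_gid(const struct gid *local, size_t n, const struct gid *dgid)
{
	int candidate = -1;

	for (size_t i = 0; i < n; i++) {
		if (!memcmp(&local[i], dgid, sizeof(*dgid)))
			return (int) i;          /* destination is a local GID */
		if (candidate < 0 && local[i].subnet_prefix == dgid->subnet_prefix)
			candidate = (int) i;     /* same subnet: first match is kept */
	}
	return candidate;
}
```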

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |  108 +++--
 1 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0bc33cc..c7b6912 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -350,6 +350,62 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
return ret;
 }
 
+/*
+ * Select the source IB device and address to reach the destination IB address.
+ */
+static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
+{
+   struct cma_device *cma_dev, *cur_dev;
+   struct sockaddr_ib *addr;
+   union ib_gid gid, sgid, *dgid;
+   u16 pkey, index;
+   u8 port, p;
+   int i;
+
+   cma_dev = NULL;
+   addr = (struct sockaddr_ib *) &id_priv->id.route.addr.dst_addr;
+   dgid = (union ib_gid *) &addr->sib_addr;
+   pkey = ntohs(addr->sib_pkey);
+
+   list_for_each_entry(cur_dev, &dev_list, list) {
+   if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+   continue;
+
+   for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+   if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
+   continue;
+
+   for (i = 0; !ib_get_cached_gid(cur_dev->device, p, i, &gid); i++) {
+   if (!memcmp(&gid, dgid, sizeof(gid))) {
+   cma_dev = cur_dev;
+   sgid = gid;
+   port = p;
+   goto found;
+   }
+
+   if (!cma_dev && (gid.global.subnet_prefix ==
+dgid->global.subnet_prefix)) {
+   cma_dev = cur_dev;
+   sgid = gid;
+   port = p;
+   }
+   }
+   }
+   }
+
+   if (!cma_dev) {
+   return -ENODEV;
+   }
+
+found:
+   cma_attach_to_dev(id_priv, cma_dev);
+   id_priv->id.port_num = port;
+   addr = (struct sockaddr_ib *) &id_priv->id.route.addr.src_addr;
+   memcpy(&addr->sib_addr, &sgid, sizeof sgid);
+   cma_translate_ib(addr, &id_priv->id.route.addr.dev_addr);
+   return 0;
+}
+
 static void cma_deref_id(struct rdma_id_private *id_priv)
 {
if (atomic_dec_and_test(&id_priv->refcount))
@@ -1930,14 +1986,48 @@ err:
return ret;
 }
 
+static int cma_resolve_ib_addr(struct rdma_id_private *id_priv)
+{
+   struct cma_work *work;
+   int ret;
+
+   work = kzalloc(sizeof *work, GFP_KERNEL);
+   if (!work)
+   return -ENOMEM;
+
+   if (!id_priv->cma_dev) {
+   ret = cma_resolve_ib_dev(id_priv);
+   if (ret)
+   goto err;
+   }
+
+   rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, (union ib_gid *)
+   &(((struct sockaddr_ib *) &id_priv->id.route.addr.dst_addr)->sib_addr));
+
+   work->id = id_priv;
+   INIT_WORK(&work->work, cma_work_handler);
+   work->old_state = CMA_ADDR_QUERY;
+   work->new_state = CMA_ADDR_RESOLVED;
+   work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
+   queue_work(cma_wq, &work->work);
+   return 0;
+err:
+   kfree(work);
+   return ret;
+}
+
 static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 struct sockaddr *dst_addr)
 {
if (!src_addr || !src_addr->sa_family) {
src_addr = (struct sockaddr *) &id->route.addr.src_addr;
-   if ((src_addr->sa_family = dst_addr->sa_family) == AF_INET6) {
+   src_addr->sa_family = dst_addr->sa_family;
+   if (dst_addr->sa_family == AF_INET6) {
((struct sockaddr_in6 *) src_addr)->sin6_scope_id =
((struct sockaddr_in6 *) dst_addr)->sin6_scope_id;
+   } else if (dst_addr->sa_family == AF_IB) {
+   ((struct sockaddr_ib *) src_addr)->sib_pkey =
+   ((struct sockaddr_ib *) dst_addr)->sib_pkey;
}
}
return rdma_bind_addr(id, src_addr);
@@ -1961,12 +2051,18 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 
atomic_inc(&id_priv->refcount);
memcpy(&id->route.addr.dst_addr, dst_addr, rdma_addr_size(dst_addr));
-   if (cma_any_addr(dst_addr))
+   if (cma_any_addr(dst_addr)) {
ret = 

[RFC] [PATCH 11/22] [for 2.6.36] rdma/cm: fixup white space

2010-03-23 Thread Sean Hefty
Fix white space issue that bugs me.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 326bda6..b4adcc6 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1657,7 +1657,7 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
path_rec.numb_path = 1;
path_rec.reversible = 1;
path_rec.service_id = cma_get_service_id(id_priv->id.ps,
-   (struct sockaddr *) &addr->dst_addr);
+(struct sockaddr *) &addr->dst_addr);
 
comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |





[RFC] [PATCH 12/22] [for 2.6.36] rdma/cm: expose private data when using AF_IB

2010-03-23 Thread Sean Hefty
If the source or destination address is AF_IB, then do not
reserve a portion of the private data in the IB CM REQ or SIDR
REQ messages for the cma header.  Instead, all private data
should be exported to the user.  When AF_IB is used, the
rdma cm does not have sufficient information to fill in the
cma header.  Additionally, this will be necessary to support
any IB connection through the rdma cm interface.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
rdma_getaddrinfo will end up formatting the private data for the user
if necessary.
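
The effect on applications can be shown with a small private data budget calculation. This sketch rests on three assumptions:

- the IB CM carries 92 bytes of private data in a REQ and 216 bytes in a SIDR REQ (defined locally below);
- `struct cma_hdr` is a simplified 36-byte version (the kernel's uses a union for the addresses);
- `user_data_offset()` is a hypothetical mirror of the new cma_user_data_offset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Private data sizes of the IB CM REQ and SIDR REQ messages, defined locally. */
#define IB_CM_REQ_PRIVATE_DATA_SIZE      92
#define IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE 216

/* Simplified copy of the rdma cm header that precedes user private data. */
struct cma_hdr {
	uint8_t  cma_version;
	uint8_t  ip_version;   /* IP version in the upper 4 bits */
	uint16_t port;
	uint8_t  src_addr[16]; /* IPv4 or IPv6 address */
	uint8_t  dst_addr[16];
};

enum port_space { PS_SDP, PS_TCP, PS_UDP };

/* Mirrors the new cma_user_data_offset(): no cma header for SDP or AF_IB. */
static int user_data_offset(enum port_space ps, bool src_is_ib, bool dst_is_ib)
{
	if (ps == PS_SDP || src_is_ib || dst_is_ib)
		return 0;
	return (int) sizeof(struct cma_hdr);
}
```

Under these assumptions, an IP-addressed connection leaves 92 - 36 = 56 bytes of REQ private data for the user. An AF_IB connection exposes all 92 bytes.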

 drivers/infiniband/core/cma.c |   46 +
 1 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index b4adcc6..0d3c4ef 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -818,14 +818,13 @@ static void cma_save_net_info(struct rdma_addr *addr,
}
 }
 
-static inline int cma_user_data_offset(enum rdma_port_space ps)
+static inline int cma_user_data_offset(struct rdma_cm_id *id)
 {
-   switch (ps) {
-   case RDMA_PS_SDP:
+   if (id->ps == RDMA_PS_SDP || id->route.addr.src_addr.ss_family == AF_IB ||
+   id->route.addr.dst_addr.ss_family == AF_IB)
return 0;
-   default:
+   else
return sizeof(struct cma_hdr);
-   }
 }
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
@@ -1201,7 +1200,7 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
return -ECONNABORTED;
 
memset(&event, 0, sizeof event);
-   offset = cma_user_data_offset(listen_id->id.ps);
+   offset = cma_user_data_offset(&listen_id->id);
event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
if (cma_is_ud_ps(listen_id->id.ps)) {
conn_id = cma_new_udp_id(&listen_id->id, ib_event);
@@ -2320,19 +2319,19 @@ err1:
 }
 EXPORT_SYMBOL(rdma_bind_addr);
 
-static int cma_format_hdr(void *hdr, enum rdma_port_space ps,
- struct rdma_route *route)
+static int cma_format_hdr(void *hdr, struct rdma_cm_id *id)
 {
struct cma_hdr *cma_hdr;
struct sdp_hh *sdp_hdr;
 
-   if (route->addr.src_addr.ss_family == AF_INET) {
+   if (id->route.addr.src_addr.ss_family == AF_INET &&
+   id->route.addr.dst_addr.ss_family == AF_INET) {
struct sockaddr_in *src4, *dst4;
 
-   src4 = (struct sockaddr_in *) &route->addr.src_addr;
-   dst4 = (struct sockaddr_in *) &route->addr.dst_addr;
+   src4 = (struct sockaddr_in *) &id->route.addr.src_addr;
+   dst4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
 
-   switch (ps) {
+   switch (id->ps) {
case RDMA_PS_SDP:
sdp_hdr = hdr;
if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
@@ -2351,13 +2350,14 @@ static int cma_format_hdr(void *hdr, enum rdma_port_space ps,
cma_hdr->port = src4->sin_port;
break;
}
-   } else {
+   } else if (id->route.addr.src_addr.ss_family == AF_INET6 &&
+  id->route.addr.dst_addr.ss_family == AF_INET6) {
struct sockaddr_in6 *src6, *dst6;
 
-   src6 = (struct sockaddr_in6 *) &route->addr.src_addr;
-   dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr;
+   src6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
+   dst6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
 
-   switch (ps) {
+   switch (id->ps) {
case RDMA_PS_SDP:
sdp_hdr = hdr;
if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
@@ -2449,20 +2449,20 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 {
struct ib_cm_sidr_req_param req;
struct rdma_route *route;
-   int ret;
+   int offset, ret;
 
-   req.private_data_len = sizeof(struct cma_hdr) +
-  conn_param->private_data_len;
+   offset = cma_user_data_offset(&id_priv->id);
+   req.private_data_len = offset + conn_param->private_data_len;
req.private_data = kzalloc(req.private_data_len, GFP_ATOMIC);
if (!req.private_data)
return -ENOMEM;
 
if (conn_param->private_data && conn_param->private_data_len)
-   memcpy((void *) req.private_data + sizeof(struct cma_hdr),
+   memcpy((void *) req.private_data + offset,
   conn_param->private_data, conn_param->private_data_len);
 
route = &id_priv->id.route;
-   ret = cma_format_hdr((void *) req.private_data, id_priv->id.ps, route);
+   ret = cma_format_hdr((void *) req.private_data, &id_priv->id);
if (ret)
goto out;
 
@@ -2498,7 +2498,7 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
   

[RFC] [PATCH 13/22] [for 2.6.36] rdma/cm: only listen on IB devices when using AF_IB

2010-03-23 Thread Sean Hefty
If an rdma_cm_id is bound to AF_IB, with a wild card address,
only listen on IB devices.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0d3c4ef..c5bb70c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1535,6 +1535,10 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
struct rdma_cm_id *id;
int ret;
 
+   if (id_priv->id.route.addr.src_addr.ss_family == AF_IB &&
+   rdma_node_get_transport(cma_dev->device->node_type) != RDMA_TRANSPORT_IB)
+   return;
+
id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps);
if (IS_ERR(id))
return;





[RFC] [PATCH 14/22] [for 2.6.36] rdma/ucm: support querying for AF_IB addresses

2010-03-23 Thread Sean Hefty
The sockaddr structure for AF_IB is larger than sockaddr_in6.
The rdma cm user space ABI uses the latter to exchange address
information between user space and the kernel.

To support querying for larger addresses, define a new query
command that exchanges data using sockaddr_storage, rather
than sockaddr_in6.  Unlike the existing query_route command,
the new command only returns address information.  Route
(i.e. path record) data is reported separately.
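
The need for a new structure is easiest to see from the sizes involved. The sketch below assumes a hypothetical AF_IB address layout, `struct sockaddr_ib_sketch`. Only sib_addr, sib_pkey and sib_sid appear in this series; the other fields are assumptions. The sketch shows that such an address outgrows sockaddr_in6 but still fits in sockaddr_storage. It then copies the address with an explicit size, as ucma_query_addr() does.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical AF_IB address layout, for illustration only. */
struct sockaddr_ib_sketch {
	uint16_t sib_family;
	uint16_t sib_pkey;
	uint32_t sib_flowinfo;
	uint8_t  sib_addr[16];   /* GID */
	uint64_t sib_sid;        /* service ID */
	uint64_t sib_sid_mask;
	uint64_t sib_scope_id;
};

/* Hypothetical response carrying one address, sized like the new query. */
struct query_addr_resp_sketch {
	uint16_t src_size;
	struct sockaddr_storage src_addr;
};

/* Copy an address of the given size into the response, as ucma_query_addr() does. */
static int fill_resp(struct query_addr_resp_sketch *resp, const void *addr, size_t size)
{
	if (size > sizeof(resp->src_addr))
		return -1;
	memset(resp, 0, sizeof(*resp));
	resp->src_size = (uint16_t) size;
	memcpy(&resp->src_addr, addr, size);
	return 0;
}
```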

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/ucma.c |   76 +++-
 include/rdma/rdma_user_cm.h|   22 ++--
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index b2e16c3..da5a3ec 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -44,6 +44,7 @@
 #include <rdma/ib_marshall.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
+#include <rdma/ib_addr.h>
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access");
@@ -586,7 +587,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
const char __user *inbuf,
int in_len, int out_len)
 {
-   struct rdma_ucm_query_route cmd;
+   struct rdma_ucm_query cmd;
struct rdma_ucm_query_route_resp resp;
struct ucma_context *ctx;
struct sockaddr *addr;
@@ -633,6 +634,76 @@ out:
return ret;
 }
 
+static void ucma_query_device_addr(struct rdma_cm_id *cm_id,
+  struct rdma_ucm_query_addr_resp *resp)
+{
+   if (!cm_id->device)
+   return;
+
+   resp->node_guid = (__force __u64) cm_id->device->node_guid;
+   resp->port_num = cm_id->port_num;
+   resp->pkey = (__force __u16) cpu_to_be16(
+ib_addr_get_pkey(&cm_id->route.addr.dev_addr));
+}
+
+static ssize_t ucma_query_addr(struct ucma_context *ctx,
+  void __user *response, int out_len)
+{
+   struct rdma_ucm_query_addr_resp resp;
+   struct sockaddr *addr;
+   int ret = 0;
+
+   if (out_len < sizeof(resp))
+   return -ENOSPC;
+
+   memset(&resp, 0, sizeof resp);
+
+   addr = (struct sockaddr *) &ctx->cm_id->route.addr.src_addr;
+   resp.src_size = rdma_addr_size(addr);
+   memcpy(&resp.src_addr, addr, resp.src_size);
+
+   addr = (struct sockaddr *) &ctx->cm_id->route.addr.dst_addr;
+   resp.dst_size = rdma_addr_size(addr);
+   memcpy(&resp.dst_addr, addr, resp.dst_size);
+
+   ucma_query_device_addr(ctx->cm_id, &resp);
+
+   if (copy_to_user(response, &resp, sizeof(resp)))
+   ret = -EFAULT;
+
+   return ret;
+}
+
+static ssize_t ucma_query(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+   struct rdma_ucm_query cmd;
+   struct ucma_context *ctx;
+   void __user *response;
+   int ret;
+
+   if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+   return -EFAULT;
+
+   response = (void __user *)(unsigned long) cmd.response;
+   ctx = ucma_get_ctx(file, cmd.id);
+   if (IS_ERR(ctx))
+   return PTR_ERR(ctx);
+
+   switch (cmd.option) {
+   case RDMA_USER_CM_QUERY_ADDR:
+   ret = ucma_query_addr(ctx, response, out_len);
+   break;
+   default:
+   ret = -ENOSYS;
+   break;
+   }
+
+   ucma_put_ctx(ctx);
+   return ret;
+}
+
 static void ucma_copy_conn_param(struct rdma_conn_param *dst,
 struct rdma_ucm_conn_param *src)
 {
@@ -1151,7 +1222,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
[RDMA_USER_CM_CMD_NOTIFY]   = ucma_notify,
[RDMA_USER_CM_CMD_JOIN_MCAST]   = ucma_join_multicast,
[RDMA_USER_CM_CMD_LEAVE_MCAST]  = ucma_leave_multicast,
-   [RDMA_USER_CM_CMD_MIGRATE_ID]   = ucma_migrate_id
+   [RDMA_USER_CM_CMD_MIGRATE_ID]   = ucma_migrate_id,
+   [RDMA_USER_CM_CMD_QUERY]        = ucma_query
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index 1d16502..0d9e984 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -61,7 +61,8 @@ enum {
RDMA_USER_CM_CMD_NOTIFY,
RDMA_USER_CM_CMD_JOIN_MCAST,
RDMA_USER_CM_CMD_LEAVE_MCAST,
-   RDMA_USER_CM_CMD_MIGRATE_ID
+   RDMA_USER_CM_CMD_MIGRATE_ID,
+   RDMA_USER_CM_CMD_QUERY
 };
 
 /*
@@ -112,10 +113,14 @@ struct rdma_ucm_resolve_route {
__u32 timeout_ms;
 };
 
-struct rdma_ucm_query_route {
+enum {
+   RDMA_USER_CM_QUERY_ADDR
+};
+
+struct rdma_ucm_query {
__u64 response;
__u32 id;
-   __u32 reserved;
+   __u32 option;
 };
 
 struct rdma_ucm_query_route_resp {
@@ -128,6 +133,17 @@ struct 

[RFC] [PATCH 15/22] [for 2.6.36] ib/sa: export function to pack a path record into wire format

2010-03-23 Thread Sean Hefty
Allow converting from struct ib_sa_path_rec to the IB defined
SA path record wire format.  This will be used to report path
data from the rdma cm into user space.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/sa_query.c |    6 ++++++
 include/rdma/ib_sa.h               |    6 ++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 7e1ffd8..54ec971 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -610,6 +610,12 @@ void ib_sa_unpack_path(void *attribute, struct ib_sa_path_rec *rec)
 }
 EXPORT_SYMBOL(ib_sa_unpack_path);
 
+void ib_sa_pack_path(struct ib_sa_path_rec *rec, void *attribute)
+{
+   ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), rec, attribute);
+}
+EXPORT_SYMBOL(ib_sa_pack_path);
+
 static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
int status,
struct ib_sa_mad *mad)
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 1082afa..86aa772 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -385,4 +385,10 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
  */
 void ib_sa_unpack_path(void *attribute, struct ib_sa_path_rec *rec);
 
+/**
+ * ib_sa_pack_path - Convert a path record from struct ib_sa_path_rec
+ * to IB MAD wire format.
+ */
+void ib_sa_pack_path(struct ib_sa_path_rec *rec, void *attribute);
+
 #endif /* IB_SA_H */





[RFC] [PATCH 16/22] [for 2.6.36] rdma/ucm: support querying when IB paths are not reversible

2010-03-23 Thread Sean Hefty
The current query_route call can return up to two path records.
The assumption is that one is the primary path, with optional
support for an alternate path.  In both cases, the paths are
assumed to be reversible and are used to send CM MADs.

With the ability to manually set IB path data, the rdma cm
can eventually be capable of using up to 6 paths per
connection:

forward primary, reverse primary,
forward alternate, reverse alternate,
reversible primary path for CM MADs
reversible alternate path for CM MADs.

(It is unclear at this time if IB routing will complicate this.)
In order to handle more flexible routing topologies, add a new 
command to report any number of paths.
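
The loop in ucma_query_path() copies as many path records as fit in the caller's buffer. Below is a minimal sketch of that sizing. It assumes 72-byte records (flags, reserved and a 64-byte packed path record) following an 8-byte header, and the structure names are hypothetical. The sketch compares with `>=`. The loop as it appears in the posted patch uses a strict `>`, which stops one record early when the remaining buffer is an exact multiple of the record size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layouts assumed for illustration, modeled on rdma_ucm_query_path_resp. */
struct path_rec_data_sketch {
	uint32_t flags;
	uint32_t reserved;
	uint32_t path_rec[16];   /* 64-byte packed SA path record */
};

struct query_path_resp_sketch {
	uint32_t num_paths;
	uint32_t reserved;
	struct path_rec_data_sketch path_data[];
};

/* Number of path records that fit in a user buffer of out_len bytes. */
static size_t paths_that_fit(size_t num_paths, size_t out_len)
{
	size_t i;

	if (out_len < sizeof(struct query_path_resp_sketch))
		return 0;
	out_len -= sizeof(struct query_path_resp_sketch);
	for (i = 0; i < num_paths && out_len >= sizeof(struct path_rec_data_sketch); i++)
		out_len -= sizeof(struct path_rec_data_sketch);
	return i;
}
```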

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/ucma.c |   35 +++
 include/rdma/rdma_user_cm.h|9 -
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index da5a3ec..86115ec 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -674,6 +674,38 @@ static ssize_t ucma_query_addr(struct ucma_context *ctx,
return ret;
 }
 
+static ssize_t ucma_query_path(struct ucma_context *ctx,
+  void __user *response, int out_len)
+{
+   struct rdma_ucm_query_path_resp *resp;
+   int i, ret = 0;
+
+   if (out_len < sizeof(*resp))
+   return -ENOSPC;
+
+   resp = kzalloc(out_len, GFP_KERNEL);
+   if (!resp)
+   return -ENOMEM;
+
+   resp->num_paths = ctx->cm_id->route.num_paths;
+   for (i = 0, out_len -= sizeof(*resp);
+i < resp->num_paths && out_len > sizeof(struct ib_path_rec_data);
+i++, out_len -= sizeof(struct ib_path_rec_data)) {
+
+   resp->path_data[i].flags = IB_PATH_GMP | IB_PATH_PRIMARY |
+  IB_PATH_BIDIRECTIONAL;
+   ib_sa_pack_path(&ctx->cm_id->route.path_rec[i],
+   &resp->path_data[i].path_rec);
+   }
+
+   if (copy_to_user(response, resp,
+sizeof(*resp) + (i * sizeof(struct ib_path_rec_data))))
+   ret = -EFAULT;
+
+   kfree(resp);
+   return ret;
+}
+
 static ssize_t ucma_query(struct ucma_file *file,
  const char __user *inbuf,
  int in_len, int out_len)
@@ -695,6 +727,9 @@ static ssize_t ucma_query(struct ucma_file *file,
case RDMA_USER_CM_QUERY_ADDR:
ret = ucma_query_addr(ctx, response, out_len);
break;
+   case RDMA_USER_CM_QUERY_PATH:
+   ret = ucma_query_path(ctx, response, out_len);
+   break;
default:
ret = -ENOSYS;
break;
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index 0d9e984..ee48dde 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -114,7 +114,8 @@ struct rdma_ucm_resolve_route {
 };
 
 enum {
-   RDMA_USER_CM_QUERY_ADDR
+   RDMA_USER_CM_QUERY_ADDR,
+   RDMA_USER_CM_QUERY_PATH
 };
 
 struct rdma_ucm_query {
@@ -144,6 +145,12 @@ struct rdma_ucm_query_addr_resp {
struct sockaddr_storage dst_addr;
 };
 
+struct rdma_ucm_query_path_resp {
+   __u32 num_paths;
+   __u32 reserved;
+   struct ib_path_rec_data path_data[0];
+};
+
 struct rdma_ucm_conn_param {
__u32 qp_num;
__u32 reserved;





[RFC] [PATCH 17/22] [for 2.6.36] rdma/cm: export cma_get_service_id

2010-03-23 Thread Sean Hefty
Allow the rdma_ucm to query the IB service ID formed or
allocated by the rdma_cm by exporting the cma_get_service_id
functionality.
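
Here is a worked example of the IP-based service ID that rdma_get_service_id() returns: the port space sits in bits 16 and up, and the port number occupies the low 16 bits. The sketch assumes RDMA_PS_TCP is 0x0106, as in the kernel's rdma_cm.h, so port 18515 (0x4853) maps to service ID 0x01064853. It works in host byte order; the kernel converts the result with cpu_to_be64().

```c
#include <assert.h>
#include <stdint.h>

/* Port space value assumed from the kernel's rdma_cm.h (RDMA_PS_TCP). */
#define PS_TCP 0x0106

/* Host-order version of the IP-based service ID built by rdma_get_service_id(). */
static uint64_t ip_service_id(uint16_t ps, uint16_t port)
{
	return ((uint64_t) ps << 16) + port;
}
```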

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   19 ++-
 include/rdma/rdma_cm.h|7 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c5bb70c..1d64342 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1256,13 +1256,14 @@ out:
return ret;
 }
 
-static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr)
+__be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
 {
if (addr->sa_family == AF_IB)
return ((struct sockaddr_ib *) addr)->sib_sid;
 
-   return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr)));
+   return cpu_to_be64(((u64)id->ps << 16) + be16_to_cpu(cma_port(addr)));
 }
+EXPORT_SYMBOL(rdma_get_service_id);
 
 static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 struct ib_cm_compare_data *compare)
@@ -1478,7 +1479,7 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
return PTR_ERR(id_priv->cm_id.ib);
 
addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
-   svc_id = cma_get_service_id(id_priv->id.ps, addr);
+   svc_id = rdma_get_service_id(&id_priv->id, addr);
if (cma_any_addr(addr))
ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL);
else {
@@ -1659,8 +1660,8 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(&addr->dev_addr));
path_rec.numb_path = 1;
path_rec.reversible = 1;
-   path_rec.service_id = cma_get_service_id(id_priv->id.ps,
-(struct sockaddr *) &addr->dst_addr);
+   path_rec.service_id = rdma_get_service_id(&id_priv->id,
+ (struct sockaddr *) &addr->dst_addr);
 
comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
@@ -2478,8 +2479,8 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
}
 
req.path = route->path_rec;
-   req.service_id = cma_get_service_id(id_priv->id.ps,
-   (struct sockaddr *) &route->addr.dst_addr);
+   req.service_id = rdma_get_service_id(&id_priv->id,
+(struct sockaddr *) &route->addr.dst_addr);
req.timeout_ms = 1 << (CMA_CM_RESPONSE_TIMEOUT - 8);
req.max_cm_retries = CMA_MAX_CM_RETRIES;
 
@@ -2529,8 +2530,8 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
if (route->num_paths == 2)
req.alternate_path = &route->path_rec[1];
 
-   req.service_id = cma_get_service_id(id_priv->id.ps,
-   (struct sockaddr *) &route->addr.dst_addr);
+   req.service_id = rdma_get_service_id(&id_priv->id,
+(struct sockaddr *) &route->addr.dst_addr);
req.qp_num = id_priv->qp_num;
req.qp_type = IB_QPT_RC;
req.starting_psn = id_priv->seq_num;
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index c6b2962..ffc544f 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -330,4 +330,11 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr);
  */
 void rdma_set_service_type(struct rdma_cm_id *id, int tos);
 
+/**
+ * rdma_get_service_id - Return the IB service ID for a specified address.
+ * @id: Communication identifier associated with the address.
+ * @addr: Address for the service ID.
+ */
+__be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr);
+
 #endif /* RDMA_CM_H */





[RFC] [PATCH 20/22] [for 2.6.36] rdma/ucm: allow user space to bind to AF_IB

2010-03-23 Thread Sean Hefty
Support user space binding to addresses using AF_IB.  Since
sockaddr_ib is larger than sockaddr_in6, we need to define
a larger structure when binding using AF_IB.  This time we
use sockaddr_storage to cover future cases.
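
The validation in ucma_bind() can be sketched in user space as follows. The sketch assumes a simplified copy of struct rdma_ucm_bind. It also uses a hypothetical `addr_size_sketch()` in place of rdma_addr_size(), which only knows AF_INET and AF_INET6. The command is rejected unless reserved is zero and addr_size matches the size implied by the address family.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified copy of the proposed struct rdma_ucm_bind. */
struct bind_cmd_sketch {
	uint32_t id;
	uint16_t addr_size;
	uint16_t reserved;
	struct sockaddr_storage addr;
};

/* Size of the address for its family; 0 for families this sketch does not know. */
static size_t addr_size_sketch(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;   /* AF_IB omitted: its layout is not shown here */
	}
}

/* Same checks as ucma_bind(): reserved must be 0 and addr_size must match the family. */
static int validate_bind(const struct bind_cmd_sketch *cmd)
{
	const struct sockaddr *addr = (const struct sockaddr *) &cmd->addr;

	if (cmd->reserved || !cmd->addr_size || cmd->addr_size != addr_size_sketch(addr))
		return -1;   /* the kernel returns -EINVAL here */
	return 0;
}
```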

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/ucma.c |   27 ++-
 include/rdma/rdma_user_cm.h|   10 +-
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 950bd35..e2d8dcf 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -514,6 +514,30 @@ static ssize_t ucma_bind_ip(struct ucma_file *file, const char __user *inbuf,
return ret;
 }
 
+static ssize_t ucma_bind(struct ucma_file *file, const char __user *inbuf,
+int in_len, int out_len)
+{
+   struct rdma_ucm_bind cmd;
+   struct sockaddr *addr;
+   struct ucma_context *ctx;
+   int ret;
+
+   if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+   return -EFAULT;
+
+   addr = (struct sockaddr *) &cmd.addr;
+   if (cmd.reserved || !cmd.addr_size || (cmd.addr_size != rdma_addr_size(addr)))
+   return -EINVAL;
+
+   ctx = ucma_get_ctx(file, cmd.id);
+   if (IS_ERR(ctx))
+   return PTR_ERR(ctx);
+
+   ret = rdma_bind_addr(ctx->cm_id, addr);
+   ucma_put_ctx(ctx);
+   return ret;
+}
+
 static ssize_t ucma_resolve_ip(struct ucma_file *file,
   const char __user *inbuf,
   int in_len, int out_len)
@@ -1308,7 +1332,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
[RDMA_USER_CM_CMD_JOIN_IP_MCAST] = ucma_join_ip_multicast,
[RDMA_USER_CM_CMD_LEAVE_MCAST]  = ucma_leave_multicast,
[RDMA_USER_CM_CMD_MIGRATE_ID]   = ucma_migrate_id,
-   [RDMA_USER_CM_CMD_QUERY]        = ucma_query
+   [RDMA_USER_CM_CMD_QUERY]        = ucma_query,
+   [RDMA_USER_CM_CMD_BIND]         = ucma_bind
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index bbb724b..009b8da 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -62,7 +62,8 @@ enum {
RDMA_USER_CM_CMD_JOIN_IP_MCAST,
RDMA_USER_CM_CMD_LEAVE_MCAST,
RDMA_USER_CM_CMD_MIGRATE_ID,
-   RDMA_USER_CM_CMD_QUERY
+   RDMA_USER_CM_CMD_QUERY,
+   RDMA_USER_CM_CMD_BIND
 };
 
 /*
@@ -101,6 +102,13 @@ struct rdma_ucm_bind_ip {
__u32 id;
 };
 
+struct rdma_ucm_bind {
+   __u32 id;
+   __u16 addr_size;
+   __u16 reserved;
+   struct sockaddr_storage addr;
+};
+
 struct rdma_ucm_resolve_ip {
struct sockaddr_in6 src_addr;
struct sockaddr_in6 dst_addr;





[RFC] [PATCH 22/22] [for 2.6.36] rdma/ucm: allow user space to specify AF_IB when joining multicast

2010-03-23 Thread Sean Hefty
Allow user space applications to join multicast groups using MGIDs
directly.  MGIDs may be passed using AF_IB addresses.  Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.
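
The legacy command is handled by translating it into the new, size-tagged form. Below is a minimal sketch of that translation, modeled on ucma_join_ip_multicast(). It assumes simplified copies of the two command structures and a hypothetical `addr_size_sketch()` standing in for rdma_addr_size().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified copies of the legacy and new join commands. */
struct join_ip_mcast_sketch {
	uint64_t response;
	uint64_t uid;
	struct sockaddr_in6 addr;   /* holds either a sockaddr_in or a sockaddr_in6 */
	uint32_t id;
};

struct join_mcast_sketch {
	uint64_t response;
	uint64_t uid;
	uint32_t id;
	uint16_t addr_size;
	uint16_t reserved;
	struct sockaddr_storage addr;
};

/* Size of the address for its family; 0 for families this sketch does not know. */
static size_t addr_size_sketch(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Convert a legacy IP join into the generic form. */
static void convert_join(const struct join_ip_mcast_sketch *in, struct join_mcast_sketch *out)
{
	memset(out, 0, sizeof(*out));
	out->response = in->response;
	out->uid = in->uid;
	out->id = in->id;
	out->addr_size = (uint16_t) addr_size_sketch((const struct sockaddr *) &in->addr);
	out->reserved = 0;
	memcpy(&out->addr, &in->addr, out->addr_size);
}
```

Sharing one processing routine between both entry points, as ucma_process_join() does, keeps the address validation in a single place.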

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/ucma.c |   55 
 include/rdma/rdma_user_cm.h|   12 -
 2 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 2224a05..a9b917a 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1139,23 +1139,23 @@ static ssize_t ucma_notify(struct ucma_file *file, const char __user *inbuf,
return ret;
 }
 
-static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
- const char __user *inbuf,
- int in_len, int out_len)
+static ssize_t ucma_process_join(struct ucma_file *file,
+struct rdma_ucm_join_mcast *cmd,  int out_len)
 {
-   struct rdma_ucm_join_ip_mcast cmd;
struct rdma_ucm_create_id_resp resp;
struct ucma_context *ctx;
struct ucma_multicast *mc;
+   struct sockaddr *addr;
int ret;
 
if (out_len < sizeof(resp))
return -ENOSPC;
 
-   if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
-   return -EFAULT;
+   addr = (struct sockaddr *) &cmd->addr;
+   if (cmd->reserved || !cmd->addr_size || (cmd->addr_size != rdma_addr_size(addr)))
+   return -EINVAL;
 
-   ctx = ucma_get_ctx(file, cmd.id);
+   ctx = ucma_get_ctx(file, cmd->id);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
 
@@ -1166,14 +1166,14 @@ static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
goto err1;
}
 
-   mc->uid = cmd.uid;
-   memcpy(&mc->addr, &cmd.addr, sizeof cmd.addr);
+   mc->uid = cmd->uid;
+   memcpy(&mc->addr, addr, cmd->addr_size);
ret = rdma_join_multicast(ctx->cm_id, (struct sockaddr *) &mc->addr, mc);
if (ret)
goto err2;
 
resp.id = mc->id;
-   if (copy_to_user((void __user *)(unsigned long)cmd.response,
+   if (copy_to_user((void __user *)(unsigned long) cmd->response,
 &resp, sizeof(resp))) {
ret = -EFAULT;
goto err3;
@@ -1198,6 +1198,38 @@ err1:
return ret;
 }
 
+static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+   struct rdma_ucm_join_ip_mcast cmd;
+   struct rdma_ucm_join_mcast join_cmd;
+
+   if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+   return -EFAULT;
+
+   join_cmd.response = cmd.response;
+   join_cmd.uid = cmd.uid;
+   join_cmd.id = cmd.id;
+   join_cmd.addr_size = rdma_addr_size((struct sockaddr *) &cmd.addr);
+   join_cmd.reserved = 0;
+   memcpy(&join_cmd.addr, &cmd.addr, join_cmd.addr_size);
+
+   return ucma_process_join(file, &join_cmd, out_len);
+}
+
+static ssize_t ucma_join_multicast(struct ucma_file *file,
+  const char __user *inbuf,
+  int in_len, int out_len)
+{
+   struct rdma_ucm_join_mcast cmd;
+
+   if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+   return -EFAULT;
+
+   return ucma_process_join(file, &cmd, out_len);
+}
+
 static ssize_t ucma_leave_multicast(struct ucma_file *file,
const char __user *inbuf,
int in_len, int out_len)
@@ -1361,7 +1393,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
[RDMA_USER_CM_CMD_MIGRATE_ID]   = ucma_migrate_id,
[RDMA_USER_CM_CMD_QUERY]        = ucma_query,
[RDMA_USER_CM_CMD_BIND] = ucma_bind,
-   [RDMA_USER_CM_CMD_RESOLVE_ADDR] = ucma_resolve_addr
+   [RDMA_USER_CM_CMD_RESOLVE_ADDR] = ucma_resolve_addr,
+   [RDMA_USER_CM_CMD_JOIN_MCAST]   = ucma_join_multicast
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index c546e18..9f14ca8 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -64,7 +64,8 @@ enum {
RDMA_USER_CM_CMD_MIGRATE_ID,
RDMA_USER_CM_CMD_QUERY,
RDMA_USER_CM_CMD_BIND,
-   RDMA_USER_CM_CMD_RESOLVE_ADDR
+   RDMA_USER_CM_CMD_RESOLVE_ADDR,
+   RDMA_USER_CM_CMD_JOIN_MCAST
 };
 
 /*
@@ -241,6 +242,15 @@ struct rdma_ucm_join_ip_mcast {
__u32 id;
 };
 
+struct rdma_ucm_join_mcast {
+   __u64 response; /* rdma_ucma_create_id_resp */
+   __u64 uid;
+   __u32 id;
+   __u16 addr_size;
+   __u16 

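The legacy `ucma_join_ip_multicast()` path in the patch converts the old IP-only command into the generic `rdma_ucm_join_mcast` command by asking `rdma_addr_size()` how many bytes the address family needs and copying only that many. A minimal user-space sketch of the same idea, assuming only AF_INET and AF_INET6 (other families, such as the AF_IB family proposed on this list, are not modelled); `join_addr_size()` and `copy_join_addr()` are hypothetical helpers, not kernel functions:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical mirror of the family-based sizing done for the join
 * command: the length comes from sa_family, not from the buffer size.
 * Returns 0 for families this sketch does not know about. */
static size_t join_addr_size(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0; /* e.g. AF_IB is not modelled here */
	}
}

/* Copy only the meaningful part of src into a zero-filled dst and
 * return the number of bytes copied (the would-be addr_size). */
static size_t copy_join_addr(struct sockaddr_storage *dst,
			     const struct sockaddr_storage *src)
{
	size_t len = join_addr_size((const struct sockaddr *)src);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
	return len;
}
```

Zeroing the destination first is a choice made in this sketch; the patch itself only copies `addr_size` bytes into `join_cmd.addr`.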
[PATCH] IB core: Fix locking on device numbers allocation

2010-03-23 Thread Eli Cohen
When the driver needs to dynamically allocate char device numbers on systems
with more than IB_UVERBS_MAX_DEVICES devices, it releases map_lock, allocates a
new range and a new device number from that range, and only then re-acquires
the lock. This window must be protected for the same reason that the map_lock
spinlock is used in the first place. Without this protection we could also end
up calling alloc_chrdev_region() a number of times and cause a leak. Fix this
by replacing map_lock with a mutex and holding it across all of the allocation
code.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 drivers/infiniband/core/uverbs_main.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index d805cf3..9589c71 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -72,7 +72,7 @@ DEFINE_IDR(ib_uverbs_cq_idr);
 DEFINE_IDR(ib_uverbs_qp_idr);
 DEFINE_IDR(ib_uverbs_srq_idr);
 
-static DEFINE_SPINLOCK(map_lock);
+static DEFINE_MUTEX(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
 
 static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
@@ -738,15 +738,15 @@ static void ib_uverbs_add_one(struct ib_device *device)
kref_init(&uverbs_dev->ref);
init_completion(&uverbs_dev->comp);
 
-   spin_lock(&map_lock);
+   mutex_lock(&map_lock);
devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
if (devnum >= IB_UVERBS_MAX_DEVICES) {
-   spin_unlock(&map_lock);
devnum = find_overflow_devnum();
-   if (devnum < 0)
+   if (devnum < 0) {
+   mutex_unlock(&map_lock);
goto err;
+   }
 
-   spin_lock(&map_lock);
uverbs_dev->devnum = devnum + IB_UVERBS_MAX_DEVICES;
base = devnum + overflow_maj;
set_bit(devnum, overflow_map);
@@ -755,7 +755,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
base = devnum + IB_UVERBS_BASE_DEV;
set_bit(devnum, dev_map);
}
-   spin_unlock(&map_lock);
+   mutex_unlock(&map_lock);
 
uverbs_dev->ib_dev   = device;
uverbs_dev->num_comp_vectors = device->num_comp_vectors;
-- 
1.7.0.3
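The commit message boils down to one rule: the free-slot lookup, the one-time allocation of the overflow range and the reservation of the slot must all happen under the same lock, and because `alloc_chrdev_region()` can sleep, that lock has to be a mutex rather than a spinlock. A user-space sketch of the pattern with a pthread mutex; `MAX_DEVICES`, `alloc_overflow_region()` and the boolean maps are hypothetical stand-ins for `IB_UVERBS_MAX_DEVICES`, `alloc_chrdev_region()` and the kernel bitmaps, and `region_allocations` only exists to show that the range is allocated exactly once:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVICES 4 /* hypothetical stand-in for IB_UVERBS_MAX_DEVICES */

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static bool dev_map[MAX_DEVICES];      /* static device-number range */
static bool overflow_map[MAX_DEVICES]; /* dynamically allocated range */
static bool overflow_region_ready;     /* overflow range allocated yet? */
static int region_allocations;         /* how often the "expensive" call ran */

/* Hypothetical stand-in for alloc_chrdev_region(). The real call may
 * sleep, so it cannot run under a spinlock but can run under a mutex. */
static void alloc_overflow_region(void)
{
	region_allocations++;
	overflow_region_ready = true;
}

/* Index of the first free slot in map, or -1 if the map is full. */
static int first_zero(const bool *map)
{
	for (int i = 0; i < MAX_DEVICES; i++)
		if (!map[i])
			return i;
	return -1;
}

/* Returns a device number, or -1 when both ranges are exhausted.
 * Lookup, one-time region allocation and slot reservation all happen
 * under map_lock, so no two callers can allocate the region twice or
 * hand out the same number. */
static int add_one_device(void)
{
	int devnum;

	pthread_mutex_lock(&map_lock);
	devnum = first_zero(dev_map);
	if (devnum >= 0) {
		dev_map[devnum] = true;
	} else {
		if (!overflow_region_ready)
			alloc_overflow_region();
		devnum = first_zero(overflow_map);
		if (devnum >= 0) {
			overflow_map[devnum] = true;
			devnum += MAX_DEVICES;
		}
	}
	pthread_mutex_unlock(&map_lock);
	return devnum;
}

/* Thread body for the concurrency check. */
static void *add_worker(void *unused)
{
	(void)unused;
	add_one_device();
	return NULL;
}
```

Dropping the lock around `alloc_overflow_region()`, as the old spinlock code did around `find_overflow_devnum()`, would let two threads both see `overflow_region_ready == false` and allocate twice, which is the leak the commit message describes.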



Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 1:29 PM, Jason Gunthorpe wrote:

> > What do you poll on the fd for?
>
> poll() is for apps that want to get the notifications without
> spinning on the counter.

Ah, ok.

I think even with the ummunotify interface, that would work, too.  Meaning: 
since you have to read() from the fd to get event details, poll() would *also* 
tell you if there was something to read (in addition to checking if 
(last_counter != counter)).  The counter is a fast way of checking -- e.g., if 
you need to check in your fast path (which MPIs likely will).  poll() could be 
used if you don't care if the check is slow.

> If you don't think that is worth doing it
> does simplify things a lot, just add two new verbs calls:
>
> ibv_set_mmu_counter(verbs, &my_counter);
> ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));

I have no real opinion on whether the mmap/read should be hidden by the above 
ibv calls or not.  Either is fine with me.  I would *assume* that 
ibv_get_mmu_notifications() is non-blocking, right?  E.g., if you ask for N and 
only M are available (where M < N), then the call returns with only M items 
filled (and M could be 0).  Perhaps you need another parameter to indicate how 
many items in my_list were actually filled?  Or is that the return value?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



RE: [RFC] [PATCH 0/22] [for 2.6.36] rdma/cm: add support for native IB addressing

2010-03-23 Thread Sean Hefty
The following patch series extends the rdma_cm to support native
InfiniBand addressing through the use of a new AF_IB address family.
It defines a new struct sockaddr_ib that may be used to specify an
IB GID, along with other IB address attributes, such as the pkey and
service ID.

The kernel patches are also available from:

git://git.openfabrics.org/~shefty/rdma-dev.git af_ib

The higher level intent is to support a user space call, rdma_getaddrinfo,
which can return AF_IB addresses to an application.  This allows the
rdma_cm to support transport-specific features, such as failover and
non-reversible paths.  (An implementation of rdma_getaddrinfo is included
in a separate patch set to the librdmacm.)

User space patches are available from:

git://git.openfabrics.org/~shefty/librdmacm.git af_ib

Since there are 33 user space patches to the librdmacm, including new
functionality to simplify establishing connections, I will wait a day or
so before posting them.

Updates to ib_acm are at:

git://git.openfabrics.org/~shefty/ibacm.git

- Sean



Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 03:17:12PM -0400, Jeff Squyres wrote:

> > If you don't think that is worth doing it
> > does simplify things a lot, just add two new verbs calls:
> >
> > ibv_set_mmu_counter(verbs, &my_counter);
> > ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
>
> I have no real opinion on whether the mmap/read should be hidden by
> the above ibv calls or not.  Either is fine with me.  I would
> *assume* that ibv_get_mmu_notifications() is non-blocking, right?
> E.g., if you ask for N and only M are available (where M < N), then
> the call returns with only M items filled (and M could be 0).
> Perhaps you need another parameter to indicate how many items in
> my_list were actually filled?  Or is that the return value?
>
Right, non-blocking, return value indicates length, like read().

These are not hiding mmap/read, they are new uverbs 'syscalls' that
get the kernel to perform that operation.
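The semantics agreed on here (non-blocking, return value gives the number of entries filled, like `read()`) can be pinned down with a small sketch. The event layout, the queue and the `get_mmu_notifications()` signature are all hypothetical; in particular this version takes a count of entries rather than the `sizeof(my_list)` byte size used in the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; the real layout was never settled here. */
struct mmu_event {
	uint64_t start;
	uint64_t end;
};

#define QUEUE_CAP 16

/* Hypothetical kernel-side ring of pending invalidations. */
struct event_queue {
	struct mmu_event ev[QUEUE_CAP];
	size_t head;  /* next event to hand out */
	size_t count; /* events currently queued */
};

/* Kernel side: queue one invalidated range; -1 if the ring is full. */
static int queue_push(struct event_queue *q, uint64_t start, uint64_t end)
{
	if (q->count == QUEUE_CAP)
		return -1;
	q->ev[(q->head + q->count) % QUEUE_CAP] =
		(struct mmu_event){ .start = start, .end = end };
	q->count++;
	return 0;
}

/* read()-like semantics: never blocks, copies at most max events and
 * returns how many were copied, which may be 0. */
static size_t get_mmu_notifications(struct event_queue *q,
				    struct mmu_event *out, size_t max)
{
	size_t n = q->count < max ? q->count : max;

	for (size_t i = 0; i < n; i++)
		out[i] = q->ev[(q->head + i) % QUEUE_CAP];
	q->head = (q->head + n) % QUEUE_CAP;
	q->count -= n;
	return n;
}
```

Asking for 5 entries with 3 queued returns 3; asking again immediately returns 0 instead of blocking.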

Jason


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote:

> > > ibv_set_mmu_counter(verbs, &my_counter);
> > > ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
>
> These are not hiding mmap/read, they are new uverbs 'syscalls' that
> get the kernel to perform that operation.

Oh -- so there's 2 mechanisms to get the counter info (for example):

1. the above uverb
2. mmap

Right?

I don't really have an opinion here -- I'm not really an owner of the ibv 
API.  As long as there is a fast/mmap way for me to get the counter without an 
extra function call, I'm happy.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com



Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 04:01:21PM -0400, Jeff Squyres wrote:
> On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote:
>
> > > > ibv_set_mmu_counter(verbs, &my_counter);
> > > > ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
> >
> > These are not hiding mmap/read, they are new uverbs 'syscalls' that
> > get the kernel to perform that operation.
>
> Oh -- so there's 2 mechanisms to get the counter info (for example):
>
> 1. the above uverb
> 2. mmap
>
> Right?

No, there is no mmap. Like this:

u64 my_counter = 0;

ibv_set_mmu_counter(verbs, &my_counter);
[..]
while (my_counter != last_my_counter) {
    last_my_counter = my_counter;
    ibv_get_mmu_notifications(verbs, ...);   // <- I am a memory barrier as well
}

The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to
the 8-byte counter you specified without having to do the mmap thing. As
I understand it this is what perfevents does.

Integrating with the verbs api avoids the need for another device
file. That is good. Eliminating poll() from the API can remove the
dedicated fd entirely. Within verbs I guess we could replace poll()
with something like a completion channel (??) if anyone cares.

So the user space visible functionality boils down to 2 new 'syscalls'
and the new flag to ibv_reg_mr.
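The counter loop re-reads the counter after every drain, so an invalidation that lands while notifications are being fetched is picked up on the next iteration instead of being lost. A single-threaded simulation of that protocol, assuming a hypothetical kernel side (`kernel_invalidate()`) and a hypothetical `drain_notifications()` in place of `ibv_get_mmu_notifications()`; C11 acquire/release atomics stand in for the memory barrier mentioned in the loop comment:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Counter the (simulated) kernel bumps on every invalidation. In the
 * proposal its address would be registered with ibv_set_mmu_counter(). */
static _Atomic uint64_t my_counter;
static uint64_t pending;         /* simulated events waiting in the kernel */
static uint64_t inject_on_drain; /* events that "arrive" during a drain */

/* Simulated kernel side of one invalidation. */
static void kernel_invalidate(void)
{
	pending++;
	atomic_fetch_add_explicit(&my_counter, 1, memory_order_release);
}

/* Hypothetical stand-in for ibv_get_mmu_notifications(): hands back all
 * pending events, then models new invalidations racing with the drain. */
static uint64_t drain_notifications(void)
{
	uint64_t n = pending;

	pending = 0;
	while (inject_on_drain) {
		inject_on_drain--;
		kernel_invalidate();
	}
	return n;
}

/* Fast path is one load and one compare when nothing changed. Loops
 * until the counter is stable so racing events are not missed. */
static uint64_t check_mmu_events(uint64_t *last_my_counter)
{
	uint64_t handled = 0;
	uint64_t now;

	while ((now = atomic_load_explicit(&my_counter, memory_order_acquire))
	       != *last_my_counter) {
		*last_my_counter = now;
		handled += drain_notifications();
	}
	return handled;
}
```

With two queued invalidations and one more arriving mid-drain, a single call to `check_mmu_events()` handles all three and leaves `last_my_counter` equal to the counter.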

Jason


Re: opensm/main.c: force stdout to be line-buffered

2010-03-23 Thread Yevgeny Kliteynik

On 23/Mar/10 18:37, Sasha Khapyorsky wrote:

On 14:25 Tue 23 Mar , Yevgeny Kliteynik wrote:


I'm running opensm > somefile, and I don't see SM's stdout
(such as SUBNET UP message, or new cached options after SIGHUP),
because when stdout is assigned to file and not terminal, it is
handled differently. Instead of flushing on printing '\n',
it becomes fully buffered, which means that you don't control when
this buffer is flushed.
My fix forces stdout to always be flushed when printing '\n'.
It has no effect when stdout is assigned to terminal, and it
changes buffering when SM's stdout is redirected.

More details about stdout/stderr buffering:

http://www.pixelbeat.org/programming/stdio_buffering/


There you can find a couple of ways to work around this issue, for example:

   stdbuf -o L opensm > somefile


This is not a usual shell command, nor some common
tool that is used in Linux distros. It's just a tool
that is provided in some package called coreutils 7.5.
 

I would prefer not to change external settings so that the program would
work as expected.


IMHO, on the contrary - the expected behavior
would be achieved with the patch.

But in any case, what exactly seems problematic here?

I can't see any impact on any aspect whatsoever.
The only thing it could affect is performance (more flushes
when the stream is line-buffered instead of fully
buffered), but we're talking about stdout here, not
the log file, so performance is not affected either.

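The opensm patch body is not quoted in this thread, so the exact call it uses is not shown here. A standard way to get the described behavior, with stdout line-buffered even when redirected to a file, is `setvbuf()` with `_IOLBF`, called before anything is written to the stream. A sketch of the effect on a temporary file; `make_line_buffered()` and `bytes_written_out()` are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Request line buffering: stdio then flushes the stream whenever a
 * '\n' is written, even if it is a file or pipe rather than a tty.
 * Must be called before any other I/O on the stream.
 * Returns 0 on success, like setvbuf(). */
static int make_line_buffered(FILE *stream)
{
	return setvbuf(stream, NULL, _IOLBF, BUFSIZ);
}

/* Bytes that have actually reached the underlying file, i.e. what
 * another process such as `tail -f somefile` could already see. */
static long bytes_written_out(FILE *stream)
{
	struct stat st;

	if (fstat(fileno(stream), &st) != 0)
		return -1;
	return (long)st.st_size;
}
```

In opensm's `main()` this would amount to calling `make_line_buffered(stdout)` before the first message; `stdbuf -o L` gets the same result from outside the program.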
-- Yevgeny
 

Sasha





Re: Ummunotify: progress at last!

2010-03-23 Thread Roland Dreier
> What do you poll on the fd for?
>
> With ummunotify, you only read() from the fd when (counter !=
> last_counter).  Were you thinking that the poll() would be for
> something else?

The ummunotify fd supported poll() (and SIGIO etc).  You don't have to
use it if you don't want to, but I made it as much of a normal fd as
possible.
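Because the notification fd is pollable, an application can fold it into an existing `poll()`-driven event loop instead of spinning on the counter. A minimal sketch that assumes nothing about the fd except that it becomes readable when events are queued; a pipe stands in for the real ummunotify fd in the test, and `wait_for_mmu_events()` is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for notify_fd to become readable.
 * Returns 1 if events can be read(), 0 on timeout, -1 on error.
 * A timeout of 0 makes this a non-blocking check. */
static int wait_for_mmu_events(int notify_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = notify_fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A real event loop would put this fd in the same `pollfd` array as its sockets and completion channels rather than polling it alone.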

 - R.
-- 
Roland Dreier  rola...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jason Gunthorpe
On Tue, Mar 23, 2010 at 10:55:08PM -0700, Roland Dreier wrote:
> > I would prefer to do this by adding a new verbs call that returns a fd
> > directly. Ie use ib_uverbs_alloc_event_file and act like
> > ibv_create_comp_channel.
> >
> > The main reason for the new FD is so it can be polled on..
>
> Agree, we don't want a new device node I don't think -- too hard to
> associate an fd you get from a separate open() with a uverbs context.

Right, that would suck..
>
> > You can also avoid the mmap scheme by doing what perf events does,
> > pass in a pointer from userspace and have the kernel pin that page it
> > is on.
>
> I wonder, is that a win?  I guess you don't even have to pin it, just do
> copy_to_user() to update the counter, but mmap doesn't seem so bad.

Not sure you can call copy_to_user from a mmu_notifier callback? What
if it faults?

I think the idea would be that by letting user space select the
counter location it could place it in a sensible cache line, etc., and
probably avoid a pointer indirection.
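Letting user space choose where the counter lives means it can, for example, give the counter a cache line of its own inside its own state and read it as a plain member. A sketch of such a layout, assuming a 64-byte cache line (common on x86-64, not universal); `struct app_state` and its fields are hypothetical:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; check the target CPU before relying on it. */
#define CACHELINE 64

/* Hypothetical application state. The counter the kernel would update
 * sits alone in its cache line, so kernel writes do not keep stealing
 * the line that holds the application's own hot fields, and the fast
 * path reads it directly instead of through a pointer to an mmap()ed
 * page. volatile keeps the compiler from caching the value; a real
 * implementation would more likely use an atomic load. */
struct app_state {
	alignas(CACHELINE) volatile uint64_t mmu_counter;
	alignas(CACHELINE) uint64_t hot_stats[4];
};

static struct app_state state;

/* Fast-path check against the last value the application handled. */
static int mmu_counter_changed(uint64_t last)
{
	return state.mmu_counter != last;
}
```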

Jason


Re: Ummunotify: progress at last!

2010-03-23 Thread Roland Dreier
> No, there is no mmap. Like this:
>
> u64 my_counter = 0;
>
> ibv_set_mmu_counter(verbs, &my_counter);
> [..]
> while (my_counter != last_my_counter) {
>     last_my_counter = my_counter;
>     ibv_get_mmu_notifications(verbs, ...);   // <- I am a memory barrier as well
> }
>
> The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to
> the 8-byte counter you specified without having to do the mmap thing. As
> I understand it this is what perfevents does.
>
> Integrating with the verbs api avoids the need for another device
> file. That is good. Eliminating poll() from the API can remove the
> dedicated fd entirely. Within verbs I guess we could replace poll()
> with something like a completion channel (??) if anyone cares.

That is all definitely doable.  I wonder if it's better to get rid of
the dedicated fd though.  After all, having the fd means a fancy app can
do poll() or sigio or whatever internally.  Being able to integrate into
an fd-driven event loop seems like a pretty big thing to give up.

Also, having a uverbs syscall that is exactly like read() seems a bit
of a stretch, even within the ugliness that is the uverbs interface.
(We love all our children, but sometimes we have to be realistic)

 - R.
-- 
Roland Dreier  rola...@cisco.com