Re: [PATCH] infiniband-diags: Ignore PortInfo data on down port.
On 11:19 Fri 19 Mar, Ira Weiny wrote:
> From: Ira Weiny <wei...@llnl.gov>
> Date: Thu, 18 Mar 2010 18:35:34 -0700
> Subject: [PATCH] infiniband-diags: Ignore PortInfo data on down port.
>
> According to C14-24.2.1:
>
> If PortInfo:PortState == Down then only PortInfo:PortState and
> PortInfo:PortPhysicalState _must_ be valid. Other fields may be invalid
> depending on the vendor.
>
> Therefore ignore all PortInfo data other than those fields when reporting
> PortInfo on a down port.
>
> Signed-off-by: Ira Weiny <wei...@llnl.gov>

Applied. Thanks.

Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
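The rule quoted from C14-24.2.1 can be sketched in a few lines of C. This is only an illustration, not the infiniband-diags code: `struct port_info`, its fields and `report_port_info()` are hypothetical names made up for the example. The state encoding (1 = Down, 4 = Active) follows the PortInfo:PortState values in the IBA specification.

```c
#include <assert.h>
#include <stdio.h>

/* PortInfo:PortState values used here (1 = Down, 4 = Active). */
enum { PORT_STATE_DOWN = 1, PORT_STATE_ACTIVE = 4 };

/* Hypothetical, heavily trimmed stand-in for a decoded PortInfo. */
struct port_info {
	int state;		/* PortInfo:PortState */
	int phys_state;		/* PortInfo:PortPhysicalState */
	unsigned lid;		/* may be garbage on a down port */
	unsigned link_width;	/* may be garbage on a down port */
};

/*
 * Formats a report into buf and returns the number of fields reported.
 * On a down port only the two fields that must be valid are printed.
 */
int report_port_info(const struct port_info *pi, char *buf, size_t len)
{
	assert(pi != NULL && buf != NULL && len > 0);

	if (pi->state == PORT_STATE_DOWN) {
		snprintf(buf, len, "state=%d phys_state=%d",
			 pi->state, pi->phys_state);
		return 2;
	}

	snprintf(buf, len, "state=%d phys_state=%d lid=%u width=%u",
		 pi->state, pi->phys_state, pi->lid, pi->link_width);
	return 4;
}
```

The point of the design is that the decision is made once, on PortState, before any other field is looked at.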
Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses
On Mon, Mar 22, 2010 at 02:21:39PM +0100, Jiri Pirko wrote:
> Finally this bit can be removed. Currently, after the bonding driver is
> changed/fixed (32a806c194ea112cfab00f558482dd97bee5e44e net-next-2.6),

Could you send a link to the git tree where I can find this commit and the
related fixes?
Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses
Eli Cohen wrote:
> Could you send a link to the git tree where I can find this commit and the
> related fixes?

Basically, as the subject line suggests, it should be in Dave's net-next tree.

Or.
Re: opensm/main.c: force stdout to be line-buffered
Hi Yevgeny,

On 10:26 Wed 03 Mar, Yevgeny Kliteynik wrote:
> When stdout is assigned to a terminal, it is line-buffered.
> But when opensm's stdout is redirected to a file, stdout
> becomes block-buffered, which means that '\n' won't cause
> the buffer to be flushed.
> Such redirection happens in daemon mode.
> Another case would be 'opensm > somefile'.

Where do you see the problem? Would the '-d2' option be related to the issue?

> Forcing stdout to always be line-buffered.
>
> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
> ---
>  opensm/opensm/main.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
> index f9a33af..5ea65dd 100644
> --- a/opensm/opensm/main.c
> +++ b/opensm/opensm/main.c
> @@ -613,6 +613,9 @@ int main(int argc, char *argv[])
>  		{NULL, 0, NULL, 0}	/* Required at the end of the array */
>  	};
>
> +	/* force stdout to be line-buffered */
> +	setlinebuf(stdout);

What about stderr? IOW, describe your problem in more detail (see above).

Sasha

> +
>  	/* Make sure that the opensm and complib were compiled using
>  	   same modes (debug/free) */
>  	if (osm_is_debug() != cl_is_debug()) {
> --
> 1.5.1.4
Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses
> Could you send a link to the git tree where I can find this commit and the
> related fixes?

See commit 5e47596bee12597824a3b5b21e20f80b61e58a35 for the fix prior to this one.
Re: [PATCH] Perfquery can be too noisy.
Hi Mike,

On 15:01 Thu 04 Mar, Mike Heinz wrote:
> When perfquery is run against fabrics that do not support PortXmitWait,
> it emits this warning for every port:
>
> ibwarn: [23225] dump_perfcounters: PortXmitWait not indicated so ignore this counter

The patch is malformed (white space mangled).

> When running ibcheckerrors on a large fabric, this leads to a flood of
> warnings. The proposed patch reduces the warning to a verbose message

So I'm applying this verbosity-reducing part by hand.

> and, on fabrics that do not support PortXmitWait, it suppresses the
> output of the XmitWait attribute.

This could be done, but see the comment below.

> Signed-off-by: Michael Heinz <michaelhe...@qlogic.com>
>
> diff --git a/infiniband-diags/src/perfquery.c b/infiniband-diags/src/perfquery.c
> index 3ae692c..812b4c2 100644
> --- a/infiniband-diags/src/perfquery.c
> +++ b/infiniband-diags/src/perfquery.c
> @@ -282,6 +282,25 @@ static void output_aggregate_perfcounters_ext(ib_portid_t * portid)
>  	       ALL_PORTS, buf);
>  }
>
> +static int dump_fields(char *buf, int bufsz, void *data, int start, int end)
> +{
> +	char val[64];
> +	char *s = buf;
> +	int n, field;
> +	for (field = start; field <= end && bufsz > 0; field++) {
> +		mad_decode_field(data, field, val);
> +		if (!mad_dump_field(field, s, bufsz, val))
> +			return -1;
> +		n = strlen(s);
> +		s += n;
> +		*s++ = '\n';
> +		*s = 0;
> +		n++;
> +		bufsz -= n;
> +	}
> +	return (int)(s - buf);
> +}

It seems that this low-level stuff is copied from libibmad. Wouldn't it be
better just to add an appropriate function there (if needed)?

Sasha

> +
>  static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
>  			      ib_portid_t * portid, int port, int aggregate)
>  {
> @@ -293,8 +312,7 @@ static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
>  			IBERROR("perfquery");
>  		if (!(cap_mask & 0x1000)) {
>  			/* if PortCounters:PortXmitWait not supported clear this counter */
> -			IBWARN
> -			    ("PortXmitWait not indicated so ignore this counter");
> +			VERBOSE("PortXmitWait not indicated so ignore this counter");
>  			perf_count.xmtwait = 0;
>  			mad_encode_field(pc, IB_PC_XMT_WAIT_F,
>  					 &perf_count.xmtwait);
> @@ -302,7 +320,8 @@ static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask,
>  		if (aggregate)
>  			aggregate_perfcounters();
>  		else
> -			mad_dump_perfcounters(buf, sizeof buf, pc, sizeof pc);
> +			dump_fields(buf, sizeof buf, pc, IB_PC_FIRST_F,
> +				(cap_mask & 0x1000)?IB_PC_LAST_F:IB_PC_RCV_PKTS_F);
>  	} else {
>  		if (!(cap_mask & 0x200))	/* 1.2 errata: bit 9 is extended counter support */
>  			IBWARN
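The buffer handling in the proposed dump_fields() (append one line per field while tracking the remaining space) can be illustrated on its own. The following is a minimal sketch and not libibmad code: `dump_values()` is a hypothetical helper that prints plain integers instead of decoded MAD fields, and it relies on snprintf() to detect truncation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Appends each value as "<value>\n" to buf without ever writing past
 * bufsz bytes. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the output would not fit.
 */
int dump_values(char *buf, size_t bufsz, const unsigned *vals, size_t count)
{
	size_t used = 0;

	assert(buf != NULL && bufsz > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < count; i++) {
		int n = snprintf(buf + used, bufsz - used, "%u\n", vals[i]);

		/* snprintf reports the length it wanted; >= means truncated */
		if (n < 0 || (size_t) n >= bufsz - used)
			return -1;
		used += (size_t) n;
	}
	return (int) used;
}
```

Letting snprintf() do the bounds check keeps the "how much room is left" arithmetic in one place, which is one argument for having such a helper live in a single library rather than being copied into each tool.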
Re: infiniband-diags/ibstatus: add link_layer for RoCEE support
On 12:37 Tue 09 Mar, Yevgeny Kliteynik wrote:
> RoCEE introduces new file in sysfs: link_layer.
> Assume IB if the file doesn't exist (driver w/o RoCEE support).
>
> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>

Applied. Thanks.

Sasha
Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses
On Tue, Mar 23, 2010 at 12:34:13PM +0200, Or Gerlitz wrote:
> basically, as the subject line suggests, it should be in Dave's net-next tree

I just need to clone this tree and need the URL. Can you give it to me
from .git/config?

Thanks.
Re: libibumad: added link_layer for RoCEE support and updated umad version
On 12:39 Tue 09 Mar, Yevgeny Kliteynik wrote:
> Added link_layer field to umad_port_t. The field is implemented as char[].
> If the relevant file doesn't exist in sysfs, the link layer is IB.
> Otherwise, the content of link_layer file is used.
> The libibumad version is promoted.
>
> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>

Applied but see below. Thanks.

> ---
>  libibumad/include/infiniband/umad.h |    2 ++
>  libibumad/libibumad.ver             |    2 +-
>  libibumad/src/umad.c                |    5 +++++
>  3 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h
> index 1f82183..f9da204 100644
> --- a/libibumad/include/infiniband/umad.h
> +++ b/libibumad/include/infiniband/umad.h
> @@ -116,6 +116,7 @@ typedef struct ib_user_mad {
>  #define SYS_PORT_RATE		"rate"
>  #define SYS_PORT_GUID		"port_guid"
>  #define SYS_PORT_GID		"gids/0"
> +#define SYS_PORT_LINK_LAYER	"link_layer"
>
>  typedef struct umad_port {
>  	char ca_name[UMAD_CA_NAME_LEN];
> @@ -132,6 +133,7 @@ typedef struct umad_port {
>  	uint64_t port_guid;
>  	unsigned pkeys_size;
>  	uint16_t *pkeys;
> +	char link_layer[UMAD_CA_NAME_LEN];
>  } umad_port_t;
>
>  typedef struct umad_ca {
> diff --git a/libibumad/libibumad.ver b/libibumad/libibumad.ver
> index 57cddbd..225738c 100644
> --- a/libibumad/libibumad.ver
> +++ b/libibumad/libibumad.ver
> @@ -6,4 +6,4 @@
>  # API_REV - advance on any added API
>  # RUNNING_REV - advance any change to the vendor files
>  # AGE - number of backward versions the API still supports
> -LIBVERSION=2:1:0
> +LIBVERSION=2:2:0

This patch actually breaks the ABI (mostly for umad_port_t users; see for
example update_umad_port() in opensm/libvendor/osm_vendor_ibumad_sa.c). So
it looks like API_REV should be advanced as well. I will do this before the
next release.

Sasha

> diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
> index 277ae6b..d16e750 100644
> --- a/libibumad/src/umad.c
> +++ b/libibumad/src/umad.c
> @@ -159,6 +159,11 @@ static int get_port(char *ca_name, char *dir, int portnum, umad_port_t * port)
>  	if (sys_read_uint(port_dir, SYS_PORT_CAPMASK, &port->capmask) < 0)
>  		goto clean;
>
> +	if (sys_read_string(port_dir, SYS_PORT_LINK_LAYER,
> +	    port->link_layer, UMAD_CA_NAME_LEN) < 0)
> +		/* assume IB by default */
> +		sprintf(port->link_layer, "IB");
> +
>  	port->capmask = htonl(port->capmask);
>
>  	if (sys_read_gid(port_dir, SYS_PORT_GID, gid) < 0)
> --
> 1.5.1.4
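Why appending a field to a public struct is an ABI break can be shown with two trimmed-down struct versions. This is a sketch under stated assumptions: `struct port_v1` and `struct port_v2` only mimic the tail of umad_port_t, `NAME_LEN` is an illustrative constant, and `caller_buffer_fits()` is a hypothetical helper, none of which come from libibumad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_LEN 20	/* assumption for illustration only */

/* Layout an application was compiled against (before the patch). */
struct port_v1 {
	char ca_name[NAME_LEN];
	uint64_t port_guid;
	unsigned pkeys_size;
	uint16_t *pkeys;
};

/* Layout the library uses after the patch: one field appended. */
struct port_v2 {
	char ca_name[NAME_LEN];
	uint64_t port_guid;
	unsigned pkeys_size;
	uint16_t *pkeys;
	char link_layer[NAME_LEN];
};

/* Appending at the end keeps the offsets of the existing fields... */
static_assert(offsetof(struct port_v2, pkeys) ==
	      offsetof(struct port_v1, pkeys),
	      "existing fields must not move");

/*
 * ...but a caller that allocated the old, smaller struct no longer
 * provides enough room for a library that fills in the new one.
 */
int caller_buffer_fits(size_t caller_size, size_t library_size)
{
	return caller_size >= library_size;
}
```

An old binary that embeds or stack-allocates the v1 struct and passes it to the new library would have link_layer written past the end of its object, which is why the interface revision needs to advance along with the running revision.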
Re: [net-next-2.6 PATCH] ipoib: remove addrlen check for mc addresses
Eli Cohen wrote:
> On Tue, Mar 23, 2010 at 12:34:13PM +0200, Or Gerlitz wrote:
> > basically, as the subject line suggests, it should be in Dave's net-next tree
>
> I just need to clone this tree and need the URL. Can you give it to me
> from .git/config? Thanks.

Maybe you're looking for this one:
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commit;h=32a806c194ea112cfab00f558482dd97bee5e44e
Re: infiniband-diags/ibstat.c: print link layer for RoCEE support
On 12:41 Tue 09 Mar, Yevgeny Kliteynik wrote:
> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>

Applied. Thanks.

Sasha
Re: opensm/main.c: force stdout to be line-buffered
Hi Sasha,

On 23/Mar/10 12:37, Sasha Khapyorsky wrote:
> Hi Yevgeny,
>
> On 10:26 Wed 03 Mar, Yevgeny Kliteynik wrote:
> > When stdout is assigned to a terminal, it is line-buffered.
> > But when opensm's stdout is redirected to a file, stdout
> > becomes block-buffered, which means that '\n' won't cause
> > the buffer to be flushed.
> > Such redirection happens in daemon mode.
> > Another case would be 'opensm > somefile'.
>
> Where do you see the problem?

The problematic case is 'opensm > somefile'.

> Would '-d2' option be related to the issue?

'-d2' refers to the log file; I'm talking about printf to stdout.

> > Forcing stdout to always be line-buffered.
> >
> > Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
> > ---
> >  opensm/opensm/main.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
> > index f9a33af..5ea65dd 100644
> > --- a/opensm/opensm/main.c
> > +++ b/opensm/opensm/main.c
> > @@ -613,6 +613,9 @@ int main(int argc, char *argv[])
> >  		{NULL, 0, NULL, 0}	/* Required at the end of the array */
> >  	};
> >
> > +	/* force stdout to be line-buffered */
> > +	setlinebuf(stdout);
>
> What about stderr?

stderr is always unbuffered, no matter where stderr is actually assigned to.

> IOW describe your problem in more details (see above).

I'm running 'opensm > somefile', and I don't see SM's stdout (such as the
SUBNET UP message, or new cached options after SIGHUP), because when stdout
is assigned to a file and not a terminal, it is handled differently. Instead
of flushing on printing '\n', it becomes block-buffered, which means that you
don't control when this buffer is flushed.

My fix forces stdout to always flush when printing '\n'. It has no effect
when stdout is assigned to a terminal, and it changes buffering when SM's
stdout is redirected.

More details about stdout/stderr buffering:
http://www.pixelbeat.org/programming/stdio_buffering/

-- Yevgeny

> Sasha
>
> > +
> >  	/* Make sure that the opensm and complib were compiled using
> >  	   same modes (debug/free) */
> >  	if (osm_is_debug() != cl_is_debug()) {
> > --
> > 1.5.1.4
Re: opensm/main.c: force stdout to be line-buffered
On 14:25 Tue 23 Mar, Yevgeny Kliteynik wrote:
> I'm running 'opensm > somefile', and I don't see SM's stdout (such as
> SUBNET UP message, or new cached options after SIGHUP), because when
> stdout is assigned to file and not terminal, it is handled differently.
> Instead of flushing on printing '\n', it becomes buffered, which means
> that you don't control when is this buffer flushed.
>
> My fix forces stdout to always flush stdout when printing '\n'.
> It has no effect when stdout is assigned to terminal, and it changes
> buffering when SM's stdout is redirected.
>
> More details about stdout/stderr buffering:
> http://www.pixelbeat.org/programming/stdio_buffering/

There you can find a couple of ways to work around this issue, for example:

  stdbuf -o L opensm > somefile

I would prefer not to change external settings so the program would work
as expected.

Sasha
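The behaviour discussed in this thread can be reproduced outside opensm. The sketch below is not the opensm patch; it uses the standard setvbuf() call with _IOLBF, which is the portable spelling of what setlinebuf() does. The helper names and the idea of reading the file back through a second stream are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Switch a stream to line buffering. Per the C standard this has to
 * happen before any other operation on the stream. Returns 0 on success.
 */
int make_line_buffered(FILE *stream)
{
	assert(stream != NULL);
	return setvbuf(stream, NULL, _IOLBF, 0);
}

/*
 * Writes one line to `path` through a line-buffered stream and, without
 * calling fflush(), checks through a second stream whether the line has
 * already reached the file. Returns 1 if visible, 0 if not, -1 on error.
 */
int line_is_visible_without_flush(const char *path)
{
	char line[64] = "";
	int visible;
	FILE *out = fopen(path, "w");
	FILE *in;

	if (!out)
		return -1;
	if (make_line_buffered(out) != 0) {
		fclose(out);
		return -1;
	}
	fputs("SUBNET UP\n", out);	/* the newline should trigger the flush */

	in = fopen(path, "r");
	if (!in) {
		fclose(out);
		return -1;
	}
	visible = fgets(line, sizeof line, in) != NULL &&
		  strcmp(line, "SUBNET UP\n") == 0;
	fclose(in);
	fclose(out);
	return visible;
}
```

Dropping the make_line_buffered() call leaves the stream fully buffered, since it refers to a regular file, and the line would normally stay in the stdio buffer until the stream is flushed or closed. That is the symptom described for 'opensm > somefile'.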
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 12:06:50PM -0400, Jeff Squyres wrote:
> IBM has found a resource that they think will be able to progress
> Roland's ummunotify work. After a few discussions in Sonoma last week
> and some off-list emails, here's what we decided:
>
> 1. Take Roland's last code drop (Roland: can you re-send the last copy
>    of your code?).
>
> 2. Do not convert it to the perf events kernel framework as the Linux
>    kernel community requested. Instead, migrate the functionality into
>    the ibv code base. Roland thinks that most of the code should be
>    adaptable without too many changes. Here's the highlights of the new
>    functionality:
>
>    a. Add a new flag to ibv_reg_mr() that does the same function as
>       UMMUNOTIFY_REGISTER_REGION
>    b. ibv_dereg_mr() always performs the equivalent of
>       UMMUNOTIFY_UNREGISTER_REGION (if necessary)
>    c. Make a new device somewhere (under /dev/infiniband?) that performs
>       the same functions as /dev/ummunotify (open it to mmap the counter
>       into user space, and read events when something interesting
>       happens)

I would prefer to do this by adding a new verbs call that returns a fd
directly. I.e. use ib_uverbs_alloc_event_file and act like
ibv_create_comp_channel. The main reason for the new FD is so it can be
polled on.

You can also avoid the mmap scheme by doing what perf events does: pass in
a pointer from userspace and have the kernel pin the page it is on.

So, I'd suggest

  fd = ibv_create_mmu_monitor(verbs, &counter);
  [..]
  poll(fd);
  [..]
  if (counter != last_counter)
  [..]
  close(fd);

Refuse to create more than one mmu_monitor for each verbs for now.

I looked at this for a little while at Sonoma and I think it is quite
straightforward; I'm happy to look over anything.

Jason
Re: Ummunotify: progress at last!
On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:
> The main reason for the new FD is so it can be polled on..

What do you poll on the fd for?

With ummunotify, you only read() from the fd when (counter != last_counter).
Were you thinking that the poll() would be for something else?

--
Jeff Squyres
<jsquy...@cisco.com>
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 01:17:40PM -0400, Jeff Squyres wrote:
> On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:
> > The main reason for the new FD is so it can be polled on..
>
> What do you poll on the fd for?
>
> With ummunotify, you only read() from the fd when (counter !=
> last_counter). Were you thinking that the poll() would be for something
> else?

poll() is for apps that want to get the notifications without spinning on
the counter. If you don't think that is worth doing, it does simplify
things a lot; just add two new verbs calls:

  ibv_set_mmu_counter(verbs, &my_counter);
  ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));

Jason
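The two usage styles discussed in this thread (check a counter cheaply, or block in poll() on an fd) can be mocked up in user space. Nothing below is a real verbs API: `struct mmu_monitor` and the `monitor_*` functions are hypothetical, a pipe stands in for the proposed event fd, and an ordinary variable stands in for the counter the kernel would update.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for the proposed monitor object. */
struct mmu_monitor {
	int read_fd;			/* what the application would poll() on */
	int write_fd;			/* stand-in for the kernel side */
	volatile uint64_t *counter;	/* bumped on every event */
};

int monitor_open(struct mmu_monitor *m, volatile uint64_t *counter)
{
	int fds[2];

	assert(m != NULL && counter != NULL);
	if (pipe(fds) < 0)
		return -1;
	m->read_fd = fds[0];
	m->write_fd = fds[1];
	m->counter = counter;
	return 0;
}

/* Stand-in for the kernel noticing an invalidation. */
int monitor_signal(struct mmu_monitor *m)
{
	char byte = 1;

	(*m->counter)++;
	return write(m->write_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Returns 1 if an event is pending, 0 on timeout, -1 on error. */
int monitor_wait(struct mmu_monitor *m, int timeout_ms)
{
	struct pollfd pfd = { .fd = m->read_fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

void monitor_close(struct mmu_monitor *m)
{
	close(m->read_fd);
	close(m->write_fd);
}
```

An application on a fast path would only compare the counter against its last seen value, while an event-loop style application could hand read_fd to poll() and never spin, which is the trade-off the two replies above are weighing.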
[RFC] [PATCH 0/22] [for 2.6.36] rdma/cm: add support for native IB addressing
The following patch series extends the rdma_cm to support native InfiniBand
addressing through the use of a new AF_IB address family. It defines a new
struct sockaddr_ib that may be used to specify an IB GID, along with other
IB address attributes, such as the pkey and service ID.

The higher level intent is to support a user space call, rdma_getaddrinfo,
which can return AF_IB addresses to an application. This allows the rdma_cm
to support transport specific features, such as failover and non-reversible
paths. (An implementation of rdma_getaddrinfo is included in a separate
patch set to the librdmacm.)

This set is very lightly tested, but I wanted to start soliciting feedback
with more extensive testing ongoing in parallel. For more casual reviewers
of patches, the first patch in the series is probably the most important,
as it defines struct sockaddr_ib.

I think a reasonable kernel target for these patches would be 2.6.36.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
[RFC] [PATCH 2/22] [for 2.6.36] rdma/cm: fix handling of ipv6 addressing in cma_use_port
cma_use_port is coded assuming that the sockaddr is an ipv4 address.
Since ipv6 addressing is supported, and also to support other address
families, make the code more generic in its address handling.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---

 drivers/infiniband/core/cma.c |   29 ++---
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 875e34e..9041a2b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -643,6 +643,21 @@ static inline int cma_any_addr(struct sockaddr *addr)
 	return cma_zero_addr(addr) || cma_loopback_addr(addr);
 }

+static int cma_addr_cmp(struct sockaddr *src, struct sockaddr *dst)
+{
+	if (src->sa_family != dst->sa_family)
+		return -1;
+
+	switch (src->sa_family) {
+	case AF_INET:
+		return ((struct sockaddr_in *) src)->sin_addr.s_addr !=
+		       ((struct sockaddr_in *) dst)->sin_addr.s_addr;
+	default:
+		return ipv6_addr_cmp(&((struct sockaddr_in6 *) src)->sin6_addr,
+				     &((struct sockaddr_in6 *) dst)->sin6_addr);
+	}
+}
+
 static inline __be16 cma_port(struct sockaddr *addr)
 {
 	if (addr->sa_family == AF_INET)
@@ -2014,13 +2029,13 @@ err1:
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
 	struct rdma_id_private *cur_id;
-	struct sockaddr_in *sin, *cur_sin;
+	struct sockaddr *addr, *cur_addr;
 	struct rdma_bind_list *bind_list;
 	struct hlist_node *node;
 	unsigned short snum;

-	sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
-	snum = ntohs(sin->sin_port);
+	addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
+	snum = ntohs(cma_port(addr));
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;

@@ -2032,15 +2047,15 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 	 * We don't support binding to any address if anyone is bound to
 	 * a specific address on the same port.
 	 */
-	if (cma_any_addr((struct sockaddr *) &id_priv->id.route.addr.src_addr))
+	if (cma_any_addr(addr))
 		return -EADDRNOTAVAIL;

 	hlist_for_each_entry(cur_id, node, &bind_list->owners, node) {
-		if (cma_any_addr((struct sockaddr *) &cur_id->id.route.addr.src_addr))
+		cur_addr = (struct sockaddr *) &cur_id->id.route.addr.src_addr;
+		if (cma_any_addr(cur_addr))
 			return -EADDRNOTAVAIL;

-		cur_sin = (struct sockaddr_in *) &cur_id->id.route.addr.src_addr;
-		if (sin->sin_addr.s_addr == cur_sin->sin_addr.s_addr)
+		if (!cma_addr_cmp(addr, cur_addr))
 			return -EADDRINUSE;
 	}
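For readers who want to try the family-aware comparison outside the kernel, here is a user-space sketch of the same idea. It is not the kernel code: `addr_cmp()` is a hypothetical name, memcmp() replaces ipv6_addr_cmp(), and unlike the patch it names AF_INET6 explicitly and treats unknown families as "different".

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Returns 0 when both addresses have the same family and the same
 * address bytes, nonzero otherwise. Ports are deliberately ignored,
 * since the caller has already matched on the port.
 */
int addr_cmp(const struct sockaddr *a, const struct sockaddr *b)
{
	assert(a != NULL && b != NULL);

	if (a->sa_family != b->sa_family)
		return -1;

	switch (a->sa_family) {
	case AF_INET:
		return ((const struct sockaddr_in *) a)->sin_addr.s_addr !=
		       ((const struct sockaddr_in *) b)->sin_addr.s_addr;
	case AF_INET6:
		return memcmp(&((const struct sockaddr_in6 *) a)->sin6_addr,
			      &((const struct sockaddr_in6 *) b)->sin6_addr,
			      sizeof(struct in6_addr)) != 0;
	default:
		return -1;	/* unknown family: treat as different */
	}
}
```

In the patch the `default:` branch assumes IPv6, which is fine while only two families exist; once a third family such as AF_IB is added, an explicit case per family is the safer shape.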
[RFC] [PATCH 3/22] [for 2.6.36] rdma/cm: include AF_IB in loopback and any address checks
Enhance checks for loopback and any address to support AF_IB in addition
to AF_INET and AF_INET6. This will allow future patches to use AF_IB
when binding and resolving addresses.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---

 drivers/infiniband/core/cma.c |   35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9041a2b..6460fbf 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -46,6 +46,7 @@

 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
+#include <rdma/ib.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
@@ -616,26 +617,30 @@ EXPORT_SYMBOL(rdma_init_qp_attr);

 static inline int cma_zero_addr(struct sockaddr *addr)
 {
-	struct in6_addr *ip6;
-
-	if (addr->sa_family == AF_INET)
-		return ipv4_is_zeronet(
-			((struct sockaddr_in *)addr)->sin_addr.s_addr);
-	else {
-		ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr;
-		return (ip6->s6_addr32[0] | ip6->s6_addr32[1] |
-			ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0;
+	switch (addr->sa_family) {
+	case AF_INET:
+		return ipv4_is_zeronet(((struct sockaddr_in *)addr)->sin_addr.s_addr);
+	case AF_INET6:
+		return ipv6_addr_any(&((struct sockaddr_in6 *) addr)->sin6_addr);
+	case AF_IB:
+		return ib_addr_any(&((struct sockaddr_ib *) addr)->sib_addr);
+	default:
+		return 0;
 	}
 }

 static inline int cma_loopback_addr(struct sockaddr *addr)
 {
-	if (addr->sa_family == AF_INET)
-		return ipv4_is_loopback(
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
-	else
-		return ipv6_addr_loopback(
-			&((struct sockaddr_in6 *) addr)->sin6_addr);
+	switch (addr->sa_family) {
+	case AF_INET:
+		return ipv4_is_loopback(((struct sockaddr_in *) addr)->sin_addr.s_addr);
+	case AF_INET6:
+		return ipv6_addr_loopback(&((struct sockaddr_in6 *) addr)->sin6_addr);
+	case AF_IB:
+		return ib_addr_loopback(&((struct sockaddr_ib *) addr)->sib_addr);
+	default:
+		return 0;
+	}
 }

 static inline int cma_any_addr(struct sockaddr *addr)
[RFC] [PATCH 6/22] [for 2.6.36] rdma/cm: Allow user to specify AF_IB when binding
Modify rdma_bind_addr to allow the user to specify AF_IB when
binding to a device. AF_IB indicates that the user is not
mapping an IP address to the native IB addressing. (The mapping
may have already been done, or is not needed.)

The format of the SID is still controlled by the rdma cm, but is now
exported in its entirety, rather than just the 16 bit port value based
on the RDMA CM IP annex.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---

 drivers/infiniband/core/cma.c |   21 +
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1546236..0a3bbf9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -324,6 +324,13 @@ static int cma_set_qkey(struct rdma_id_private *id_priv)
 	return ret;
 }

+static void cma_translate_ib(struct sockaddr_ib *addr, struct rdma_dev_addr *dev_addr)
+{
+	dev_addr->dev_type = ARPHRD_INFINIBAND;
+	rdma_addr_set_sgid(dev_addr, (union ib_gid *) &addr->sib_addr);
+	ib_addr_set_pkey(dev_addr, ntohs(addr->sib_pkey));
+}
+
 static int cma_acquire_dev(struct rdma_id_private *id_priv)
 {
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
@@ -2148,7 +2155,8 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	struct rdma_id_private *id_priv;
 	int ret;

-	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6)
+	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6 &&
+	    addr->sa_family != AF_IB)
 		return -EAFNOSUPPORT;

 	id_priv = container_of(id, struct rdma_id_private, id);
@@ -2160,9 +2168,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		goto err1;

 	if (!cma_any_addr(addr)) {
-		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
-		if (ret)
-			goto err1;
+		if (addr->sa_family == AF_IB) {
+			cma_translate_ib((struct sockaddr_ib *) addr,
+					 &id->route.addr.dev_addr);
+		} else {
+			ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
+			if (ret)
+				goto err1;
+		}

 		mutex_lock(&lock);
 		ret = cma_acquire_dev(id_priv);
[RFC] [PATCH 7/22] [for 2.6.36] rdma/cm: do not modify sa_family when setting loopback address
cma_resolve_loopback is called after an rdma_cm_id has been bound
to a specific sa_family and port. Once the source sa_family for
the id has been set, do not modify it. Only the actual IP address
portion of the source address needs to be set.

As part of this fix, we can simplify setting the source address by
moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback. cma_bind_loopback is only invoked when
the source address is the loopback address.

Finally, add loopback support for AF_IB as part of the change.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---

 drivers/infiniband/core/cma.c |   31 ++-
 1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0a3bbf9..7981a85 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1776,6 +1776,23 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_route);

+static void cma_set_loopback(struct sockaddr *addr)
+{
+	switch (addr->sa_family) {
+	case AF_INET:
+		((struct sockaddr_in *) addr)->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+		break;
+	case AF_INET6:
+		ipv6_addr_set(&((struct sockaddr_in6 *) addr)->sin6_addr,
+			      0, 0, 0, htonl(1));
+		break;
+	default:
+		ib_addr_set(&((struct sockaddr_ib *) addr)->sib_addr,
+			    0, 0, 0, htonl(1));
+		break;
+	}
+}
+
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
@@ -1816,6 +1833,7 @@ port_found:
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
 	id_priv->id.port_num = p;
 	cma_attach_to_dev(id_priv, cma_dev);
+	cma_set_loopback((struct sockaddr *) &id_priv->id.route.addr.src_addr);
 out:
 	mutex_unlock(&lock);
 	return ret;
@@ -1870,7 +1888,6 @@ out:
 static int cma_resolve_loopback(struct rdma_id_private *id_priv)
 {
 	struct cma_work *work;
-	struct sockaddr *src, *dst;
 	union ib_gid gid;
 	int ret;

@@ -1887,18 +1904,6 @@ static int cma_resolve_loopback(struct rdma_id_private *id_priv)
 	rdma_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid);

-	src = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
-	if (cma_zero_addr(src)) {
-		dst = (struct sockaddr *) &id_priv->id.route.addr.dst_addr;
-		if ((src->sa_family = dst->sa_family) == AF_INET) {
-			((struct sockaddr_in *) src)->sin_addr.s_addr =
-				((struct sockaddr_in *) dst)->sin_addr.s_addr;
-		} else {
-			ipv6_addr_copy(&((struct sockaddr_in6 *) src)->sin6_addr,
-				       &((struct sockaddr_in6 *) dst)->sin6_addr);
-		}
-	}
-
 	work->id = id_priv;
 	INIT_WORK(&work->work, cma_work_handler);
 	work->old_state = CMA_ADDR_QUERY;
[RFC] [PATCH 8/22] [for 2.6.36] rdma/cm: restrict AF_IB loopback to binding to IB devices only
If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---

 drivers/infiniband/core/cma.c |   30 ++
 1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7981a85..0bc33cc 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1795,26 +1795,40 @@ static void cma_set_loopback(struct sockaddr *addr)

 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
-	struct cma_device *cma_dev;
+	struct cma_device *cma_dev, *cur_dev;
+	struct sockaddr *addr;
 	struct ib_port_attr port_attr;
 	union ib_gid gid;
 	u16 pkey;
 	int ret;
 	u8 p;

+	cma_dev = NULL;
+	addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
 	mutex_lock(&lock);
-	if (list_empty(&dev_list)) {
+	list_for_each_entry(cur_dev, &dev_list, list) {
+		if (addr->sa_family == AF_IB &&
+		    rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+			continue;
+
+		if (!cma_dev)
+			cma_dev = cur_dev;
+
+		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+			if (!ib_query_port(cur_dev->device, p, &port_attr) &&
+			    port_attr.state == IB_PORT_ACTIVE) {
+				cma_dev = cur_dev;
+				goto port_found;
+			}
+		}
+	}
+
+	if (!cma_dev) {
 		ret = -ENODEV;
 		goto out;
 	}
-	list_for_each_entry(cma_dev, &dev_list, list)
-		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
-			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
-			    port_attr.state == IB_PORT_ACTIVE)
-				goto port_found;

 	p = 1;
-	cma_dev = list_entry(dev_list.next, struct cma_device, list);

 port_found:
 	ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid);
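The selection policy in this patch (skip ineligible devices, prefer the first active port, otherwise fall back to port 1 of the first eligible device) can be tried on plain arrays. The sketch below is an illustration only: `struct device`, `enum transport` and `pick_loopback_device()` are hypothetical and stand in for the cma device list and ib_query_port().

```c
#include <assert.h>
#include <stddef.h>

enum transport { TRANSPORT_IB, TRANSPORT_OTHER };	/* hypothetical */

struct device {
	enum transport transport;
	int port_count;
	const int *port_active;	/* port_active[i] != 0: port i + 1 is active */
};

/*
 * Returns the index of the chosen device, or -1 if none is eligible.
 * On success *port_out holds the chosen 1-based port number.
 */
int pick_loopback_device(const struct device *devs, size_t ndevs,
			 int ib_only, int *port_out)
{
	int chosen = -1;

	assert(port_out != NULL);

	for (size_t i = 0; i < ndevs; i++) {
		if (ib_only && devs[i].transport != TRANSPORT_IB)
			continue;

		if (chosen < 0)
			chosen = (int) i;	/* first eligible device is the fallback */

		for (int p = 1; p <= devs[i].port_count; p++) {
			if (devs[i].port_active[p - 1]) {
				*port_out = p;
				return (int) i;
			}
		}
	}

	if (chosen >= 0)
		*port_out = 1;	/* no active port anywhere: use port 1 */
	return chosen;
}
```

Keeping the fallback inside the loop is what lets the filter apply to it as well; the old code took the head of the device list unconditionally, which could pick a non-IB device for an AF_IB request.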
[RFC] [PATCH 9/22] [for 2.6.36] rdma/cm: add support for AF_IB to rdma_resolve_addr
Allow the user to specify the remote address using AF_IB format.
When AF_IB is used, the remote address simply needs to be recorded,
and no resolution using ARP is done. The local address may still
need to be resolved however.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---

 drivers/infiniband/core/cma.c |  108 +++--
 1 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0bc33cc..c7b6912 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -350,6 +350,62 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 	return ret;
 }

+/*
+ * Select the source IB device and address to reach the destination IB address.
+ */
+static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
+{
+	struct cma_device *cma_dev, *cur_dev;
+	struct sockaddr_ib *addr;
+	union ib_gid gid, sgid, *dgid;
+	u16 pkey, index;
+	u8 port, p;
+	int i;
+
+	cma_dev = NULL;
+	addr = (struct sockaddr_ib *) &id_priv->id.route.addr.dst_addr;
+	dgid = (union ib_gid *) &addr->sib_addr;
+	pkey = ntohs(addr->sib_pkey);
+
+	list_for_each_entry(cur_dev, &dev_list, list) {
+		if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+			continue;
+
+		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+			if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
+				continue;
+
+			for (i = 0; !ib_get_cached_gid(cur_dev->device, p, i, &gid); i++) {
+				if (!memcmp(&gid, dgid, sizeof(gid))) {
+					cma_dev = cur_dev;
+					sgid = gid;
+					port = p;
+					goto found;
+				}
+
+				if (!cma_dev && (gid.global.subnet_prefix ==
+						 dgid->global.subnet_prefix)) {
+					cma_dev = cur_dev;
+					sgid = gid;
+					port = p;
+				}
+			}
+		}
+	}
+
+	if (!cma_dev) {
+		return -ENODEV;
+	}
+
+found:
+	cma_attach_to_dev(id_priv, cma_dev);
+	id_priv->id.port_num = port;
+	addr = (struct sockaddr_ib *) &id_priv->id.route.addr.src_addr;
+	memcpy(&addr->sib_addr, &sgid, sizeof sgid);
+	cma_translate_ib(addr, &id_priv->id.route.addr.dev_addr);
+	return 0;
+}
+
 static void cma_deref_id(struct rdma_id_private *id_priv)
 {
 	if (atomic_dec_and_test(&id_priv->refcount))
@@ -1930,14 +1986,48 @@ err:
 	return ret;
 }

+static int cma_resolve_ib_addr(struct rdma_id_private *id_priv)
+{
+	struct cma_work *work;
+	int ret;
+
+	work = kzalloc(sizeof *work, GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	if (!id_priv->cma_dev) {
+		ret = cma_resolve_ib_dev(id_priv);
+		if (ret)
+			goto err;
+	}
+
+	rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, (union ib_gid *)
+		&(((struct sockaddr_ib *) &id_priv->id.route.addr.dst_addr)->sib_addr));
+
+	work->id = id_priv;
+	INIT_WORK(&work->work, cma_work_handler);
+	work->old_state = CMA_ADDR_QUERY;
+	work->new_state = CMA_ADDR_RESOLVED;
+	work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
+	queue_work(cma_wq, &work->work);
+	return 0;
+err:
+	kfree(work);
+	return ret;
+}
+
 static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 			 struct sockaddr *dst_addr)
 {
 	if (!src_addr || !src_addr->sa_family) {
 		src_addr = (struct sockaddr *) &id->route.addr.src_addr;
-		if ((src_addr->sa_family = dst_addr->sa_family) == AF_INET6) {
+		src_addr->sa_family = dst_addr->sa_family;
+		if (dst_addr->sa_family == AF_INET6) {
 			((struct sockaddr_in6 *) src_addr)->sin6_scope_id =
 				((struct sockaddr_in6 *) dst_addr)->sin6_scope_id;
+		} else if (dst_addr->sa_family == AF_IB) {
+			((struct sockaddr_ib *) src_addr)->sib_pkey =
+				((struct sockaddr_ib *) dst_addr)->sib_pkey;
 		}
 	}
 	return rdma_bind_addr(id, src_addr);
@@ -1961,12 +2051,18 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	atomic_inc(&id_priv->refcount);
 	memcpy(&id->route.addr.dst_addr, dst_addr, rdma_addr_size(dst_addr));
-	if (cma_any_addr(dst_addr))
+	if (cma_any_addr(dst_addr)) {
 		ret =
[RFC] [PATCH 11/22] [for 2.6.36] rdma/cm: fixup white space
Fix white space issue that bugs me.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/cma.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 326bda6..b4adcc6 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1657,7 +1657,7 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	path_rec.numb_path = 1;
 	path_rec.reversible = 1;
 	path_rec.service_id = cma_get_service_id(id_priv->id.ps,
-				(struct sockaddr *) &addr->dst_addr);
+							(struct sockaddr *) &addr->dst_addr);
 
 	comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
 		    IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
[RFC] [PATCH 12/22] [for 2.6.36] rdma/cm: expose private data when using AF_IB
If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header. Instead, all private data should be exported to the
user. When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header. Additionally, this will be
necessary to support any IB connection through the rdma cm interface.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
rdma_getaddrinfo will end up formatting the private data for the user
if necessary.

 drivers/infiniband/core/cma.c |   46 +
 1 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index b4adcc6..0d3c4ef 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -818,14 +818,13 @@ static void cma_save_net_info(struct rdma_addr *addr,
 	}
 }
 
-static inline int cma_user_data_offset(enum rdma_port_space ps)
+static inline int cma_user_data_offset(struct rdma_cm_id *id)
 {
-	switch (ps) {
-	case RDMA_PS_SDP:
+	if (id->ps == RDMA_PS_SDP || id->route.addr.src_addr.ss_family == AF_IB ||
+	    id->route.addr.dst_addr.ss_family == AF_IB)
 		return 0;
-	default:
+	else
 		return sizeof(struct cma_hdr);
-	}
 }
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
@@ -1201,7 +1200,7 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 		return -ECONNABORTED;
 
 	memset(&event, 0, sizeof event);
-	offset = cma_user_data_offset(listen_id->id.ps);
+	offset = cma_user_data_offset(&listen_id->id);
 	event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
 	if (cma_is_ud_ps(listen_id->id.ps)) {
 		conn_id = cma_new_udp_id(&listen_id->id, ib_event);
@@ -2320,19 +2319,19 @@ err1:
 }
 EXPORT_SYMBOL(rdma_bind_addr);
 
-static int cma_format_hdr(void *hdr, enum rdma_port_space ps,
-			  struct rdma_route *route)
+static int cma_format_hdr(void *hdr, struct rdma_cm_id *id)
 {
 	struct cma_hdr *cma_hdr;
 	struct sdp_hh *sdp_hdr;
 
-	if (route->addr.src_addr.ss_family == AF_INET) {
+	if (id->route.addr.src_addr.ss_family == AF_INET &&
+	    id->route.addr.dst_addr.ss_family == AF_INET) {
 		struct sockaddr_in *src4, *dst4;
 
-		src4 = (struct sockaddr_in *) &route->addr.src_addr;
-		dst4 = (struct sockaddr_in *) &route->addr.dst_addr;
+		src4 = (struct sockaddr_in *) &id->route.addr.src_addr;
+		dst4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
 
-		switch (ps) {
+		switch (id->ps) {
 		case RDMA_PS_SDP:
 			sdp_hdr = hdr;
 			if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
@@ -2351,13 +2350,14 @@ static int cma_format_hdr(void *hdr, enum rdma_port_space ps,
 			cma_hdr->port = src4->sin_port;
 			break;
 		}
-	} else {
+	} else if (id->route.addr.src_addr.ss_family == AF_INET6 &&
+		   id->route.addr.dst_addr.ss_family == AF_INET6) {
 		struct sockaddr_in6 *src6, *dst6;
 
-		src6 = (struct sockaddr_in6 *) &route->addr.src_addr;
-		dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr;
+		src6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
+		dst6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
 
-		switch (ps) {
+		switch (id->ps) {
 		case RDMA_PS_SDP:
 			sdp_hdr = hdr;
 			if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
@@ -2449,20 +2449,20 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 {
 	struct ib_cm_sidr_req_param req;
 	struct rdma_route *route;
-	int ret;
+	int offset, ret;
 
-	req.private_data_len = sizeof(struct cma_hdr) +
-			       conn_param->private_data_len;
+	offset = cma_user_data_offset(&id_priv->id);
+	req.private_data_len = offset + conn_param->private_data_len;
 	req.private_data = kzalloc(req.private_data_len, GFP_ATOMIC);
 	if (!req.private_data)
 		return -ENOMEM;
 
 	if (conn_param->private_data && conn_param->private_data_len)
-		memcpy((void *) req.private_data + sizeof(struct cma_hdr),
+		memcpy((void *) req.private_data + offset,
 		       conn_param->private_data, conn_param->private_data_len);
 
 	route = &id_priv->id.route;
-	ret = cma_format_hdr((void *) req.private_data, id_priv->id.ps, route);
+	ret = cma_format_hdr((void *) req.private_data, &id_priv->id);
 	if (ret)
 		goto out;
 
@@ -2498,7 +2498,7 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
[RFC] [PATCH 13/22] [for 2.6.36] rdma/cm: only listen on IB devices when using AF_IB
If an rdma_cm_id is bound to AF_IB, with a wild card address, only
listen on IB devices.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/cma.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0d3c4ef..c5bb70c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1535,6 +1535,10 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	struct rdma_cm_id *id;
 	int ret;
 
+	if (id_priv->id.route.addr.src_addr.ss_family == AF_IB &&
+	    rdma_node_get_transport(cma_dev->device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
 	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps);
 	if (IS_ERR(id))
 		return;
[RFC] [PATCH 14/22] [for 2.6.36] rdma/ucm: support querying for AF_IB addresses
The sockaddr structure for AF_IB is larger than sockaddr_in6. The rdma
cm user space ABI uses the latter to exchange address information
between user space and the kernel.

To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information. Route (i.e. path record) data is separated.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/ucma.c |   76 +++-
 include/rdma/rdma_user_cm.h    |   22 ++--
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index b2e16c3..da5a3ec 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -44,6 +44,7 @@
 #include <rdma/ib_marshall.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
+#include <rdma/ib_addr.h>
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access");
@@ -586,7 +587,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 				const char __user *inbuf,
 				int in_len, int out_len)
 {
-	struct rdma_ucm_query_route cmd;
+	struct rdma_ucm_query cmd;
 	struct rdma_ucm_query_route_resp resp;
 	struct ucma_context *ctx;
 	struct sockaddr *addr;
@@ -633,6 +634,76 @@ out:
 	return ret;
 }
 
+static void ucma_query_device_addr(struct rdma_cm_id *cm_id,
+				   struct rdma_ucm_query_addr_resp *resp)
+{
+	if (!cm_id->device)
+		return;
+
+	resp->node_guid = (__force __u64) cm_id->device->node_guid;
+	resp->port_num = cm_id->port_num;
+	resp->pkey = (__force __u16) cpu_to_be16(
+		     ib_addr_get_pkey(&cm_id->route.addr.dev_addr));
+}
+
+static ssize_t ucma_query_addr(struct ucma_context *ctx,
+			       void __user *response, int out_len)
+{
+	struct rdma_ucm_query_addr_resp resp;
+	struct sockaddr *addr;
+	int ret = 0;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	memset(&resp, 0, sizeof resp);
+
+	addr = (struct sockaddr *) &ctx->cm_id->route.addr.src_addr;
+	resp.src_size = rdma_addr_size(addr);
+	memcpy(&resp.src_addr, addr, resp.src_size);
+
+	addr = (struct sockaddr *) &ctx->cm_id->route.addr.dst_addr;
+	resp.dst_size = rdma_addr_size(addr);
+	memcpy(&resp.dst_addr, addr, resp.dst_size);
+
+	ucma_query_device_addr(ctx->cm_id, &resp);
+
+	if (copy_to_user(response, &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
+static ssize_t ucma_query(struct ucma_file *file,
+			  const char __user *inbuf,
+			  int in_len, int out_len)
+{
+	struct rdma_ucm_query cmd;
+	struct ucma_context *ctx;
+	void __user *response;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	response = (void __user *)(unsigned long) cmd.response;
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	switch (cmd.option) {
+	case RDMA_USER_CM_QUERY_ADDR:
+		ret = ucma_query_addr(ctx, response, out_len);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
 static void ucma_copy_conn_param(struct rdma_conn_param *dst,
 				 struct rdma_ucm_conn_param *src)
 {
@@ -1151,7 +1222,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
 	[RDMA_USER_CM_CMD_NOTIFY]	= ucma_notify,
 	[RDMA_USER_CM_CMD_JOIN_MCAST]	= ucma_join_multicast,
 	[RDMA_USER_CM_CMD_LEAVE_MCAST]	= ucma_leave_multicast,
-	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id
+	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id,
+	[RDMA_USER_CM_CMD_QUERY]	= ucma_query
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index 1d16502..0d9e984 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -61,7 +61,8 @@ enum {
 	RDMA_USER_CM_CMD_NOTIFY,
 	RDMA_USER_CM_CMD_JOIN_MCAST,
 	RDMA_USER_CM_CMD_LEAVE_MCAST,
-	RDMA_USER_CM_CMD_MIGRATE_ID
+	RDMA_USER_CM_CMD_MIGRATE_ID,
+	RDMA_USER_CM_CMD_QUERY
 };
 
 /*
@@ -112,10 +113,14 @@ struct rdma_ucm_resolve_route {
 	__u32 timeout_ms;
 };
 
-struct rdma_ucm_query_route {
+enum {
+	RDMA_USER_CM_QUERY_ADDR
+};
+
+struct rdma_ucm_query {
 	__u64 response;
 	__u32 id;
-	__u32 reserved;
+	__u32 option;
 };
 
 struct rdma_ucm_query_route_resp {
@@ -128,6 +133,17 @@ struct
[RFC] [PATCH 15/22] [for 2.6.36] ib/sa: export function to pack a path record into wire format
Allow converting from struct ib_sa_path_rec to the IB defined SA path
record wire format. This will be used to report path data from the
rdma cm into user space.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/sa_query.c |    6 ++
 include/rdma/ib_sa.h               |    6 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 7e1ffd8..54ec971 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -610,6 +610,12 @@ void ib_sa_unpack_path(void *attribute, struct ib_sa_path_rec *rec)
 }
 EXPORT_SYMBOL(ib_sa_unpack_path);
 
+void ib_sa_pack_path(struct ib_sa_path_rec *rec, void *attribute)
+{
+	ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), rec, attribute);
+}
+EXPORT_SYMBOL(ib_sa_pack_path);
+
 static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
 				    int status,
 				    struct ib_sa_mad *mad)
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 1082afa..86aa772 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -385,4 +385,10 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
  */
 void ib_sa_unpack_path(void *attribute, struct ib_sa_path_rec *rec);
 
+/**
+ * ib_sa_pack_path - Convert a path record from struct ib_sa_path_rec
+ * to IB MAD wire format.
+ */
+void ib_sa_pack_path(struct ib_sa_path_rec *rec, void *attribute);
+
 #endif /* IB_SA_H */
[RFC] [PATCH 16/22] [for 2.6.36] rdma/ucm: support querying when IB paths are not reversible
The current query_route call can return up to two path records. The
assumption being that one is the primary path, with optional support
for an alternate path. In both cases, the paths are assumed to be
reversible and are used to send CM MADs.

With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:

	forward primary, reverse primary,
	forward alternate, reverse alternate,
	reversible primary path for CM MADs,
	reversible alternate path for CM MADs.

(It is unclear at this time if IB routing will complicate this.) In
order to handle more flexible routing topologies, add a new command to
report any number of paths.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/ucma.c |   35 +++
 include/rdma/rdma_user_cm.h    |    9 -
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index da5a3ec..86115ec 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -674,6 +674,38 @@ static ssize_t ucma_query_addr(struct ucma_context *ctx,
 	return ret;
 }
 
+static ssize_t ucma_query_path(struct ucma_context *ctx,
+			       void __user *response, int out_len)
+{
+	struct rdma_ucm_query_path_resp *resp;
+	int i, ret = 0;
+
+	if (out_len < sizeof(*resp))
+		return -ENOSPC;
+
+	resp = kzalloc(out_len, GFP_KERNEL);
+	if (!resp)
+		return -ENOMEM;
+
+	resp->num_paths = ctx->cm_id->route.num_paths;
+	for (i = 0, out_len -= sizeof(*resp);
+	     i < resp->num_paths && out_len > sizeof(struct ib_path_rec_data);
+	     i++, out_len -= sizeof(struct ib_path_rec_data)) {
+
+		resp->path_data[i].flags = IB_PATH_GMP | IB_PATH_PRIMARY |
+					   IB_PATH_BIDIRECTIONAL;
+		ib_sa_pack_path(&ctx->cm_id->route.path_rec[i],
+				&resp->path_data[i].path_rec);
+	}
+
+	if (copy_to_user(response, resp,
+			 sizeof(*resp) + (i * sizeof(struct ib_path_rec_data))))
+		ret = -EFAULT;
+
+	kfree(resp);
+	return ret;
+}
+
 static ssize_t ucma_query(struct ucma_file *file,
 			  const char __user *inbuf,
 			  int in_len, int out_len)
@@ -695,6 +727,9 @@ static ssize_t ucma_query(struct ucma_file *file,
 	case RDMA_USER_CM_QUERY_ADDR:
 		ret = ucma_query_addr(ctx, response, out_len);
 		break;
+	case RDMA_USER_CM_QUERY_PATH:
+		ret = ucma_query_path(ctx, response, out_len);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index 0d9e984..ee48dde 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -114,7 +114,8 @@ struct rdma_ucm_resolve_route {
 };
 
 enum {
-	RDMA_USER_CM_QUERY_ADDR
+	RDMA_USER_CM_QUERY_ADDR,
+	RDMA_USER_CM_QUERY_PATH
 };
 
 struct rdma_ucm_query {
@@ -144,6 +145,12 @@ struct rdma_ucm_query_addr_resp {
 	struct sockaddr_storage dst_addr;
 };
 
+struct rdma_ucm_query_path_resp {
+	__u32 num_paths;
+	__u32 reserved;
+	struct ib_path_rec_data path_data[0];
+};
+
 struct rdma_ucm_conn_param {
 	__u32 qp_num;
 	__u32 reserved;
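For anyone sizing the response buffer for the new path query, here is a minimal user-space sketch of the counting loop in ucma_query_path(). The struct names and the 16-word path record are stand-ins chosen for illustration, not the real ABI headers, and the comparisons are assumed to be `<` and `>` as the loop reads in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ib_path_rec_data (layout assumed). */
struct path_rec_data {
	uint32_t flags;
	uint32_t reserved;
	uint32_t path_rec[16];
};

/* Hypothetical stand-in for struct rdma_ucm_query_path_resp. */
struct query_path_resp {
	uint32_t num_paths;
	uint32_t reserved;
	struct path_rec_data path_data[];
};

/*
 * Return how many path records would be copied out for a caller that
 * supplied out_len bytes, following the same loop shape as the patch:
 * subtract the header first, then consume one record per iteration
 * while strictly more than one record's worth of space remains.
 */
size_t paths_reported(size_t num_paths, size_t out_len)
{
	size_t i;

	if (out_len < sizeof(struct query_path_resp))
		return 0;

	out_len -= sizeof(struct query_path_resp);
	for (i = 0; i < num_paths && out_len > sizeof(struct path_rec_data); i++)
		out_len -= sizeof(struct path_rec_data);

	return i;
}
```

Because the space check in this sketch is a strict greater-than, a buffer with room for exactly N records yields N-1 of them; a caller would need at least one spare byte to receive all N.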
[RFC] [PATCH 17/22] [for 2.6.36] rdma/cm: export cma_get_service_id
Allow the rdma_ucm to query the IB service ID formed or allocated by
the rdma_cm by exporting the cma_get_service_id functionality.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/cma.c |   19 ++-
 include/rdma/rdma_cm.h        |    7 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c5bb70c..1d64342 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1256,13 +1256,14 @@ out:
 	return ret;
 }
 
-static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr)
+__be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
 {
 	if (addr->sa_family == AF_IB)
 		return ((struct sockaddr_ib *) addr)->sib_sid;
 
-	return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr)));
+	return cpu_to_be64(((u64)id->ps << 16) + be16_to_cpu(cma_port(addr)));
 }
+EXPORT_SYMBOL(rdma_get_service_id);
 
 static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 				 struct ib_cm_compare_data *compare)
@@ -1478,7 +1479,7 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
 		return PTR_ERR(id_priv->cm_id.ib);
 
 	addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
-	svc_id = cma_get_service_id(id_priv->id.ps, addr);
+	svc_id = rdma_get_service_id(&id_priv->id, addr);
 	if (cma_any_addr(addr))
 		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL);
 	else {
@@ -1659,8 +1660,8 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(&addr->dev_addr));
 	path_rec.numb_path = 1;
 	path_rec.reversible = 1;
-	path_rec.service_id = cma_get_service_id(id_priv->id.ps,
-							(struct sockaddr *) &addr->dst_addr);
+	path_rec.service_id = rdma_get_service_id(&id_priv->id,
+						  (struct sockaddr *) &addr->dst_addr);
 
 	comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
 		    IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
@@ -2478,8 +2479,8 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 	}
 
 	req.path = route->path_rec;
-	req.service_id = cma_get_service_id(id_priv->id.ps,
-					    (struct sockaddr *) &route->addr.dst_addr);
+	req.service_id = rdma_get_service_id(&id_priv->id,
+					     (struct sockaddr *) &route->addr.dst_addr);
 	req.timeout_ms = 1 << (CMA_CM_RESPONSE_TIMEOUT - 8);
 	req.max_cm_retries = CMA_MAX_CM_RETRIES;
 
@@ -2529,8 +2530,8 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 	if (route->num_paths == 2)
 		req.alternate_path = &route->path_rec[1];
 
-	req.service_id = cma_get_service_id(id_priv->id.ps,
-					    (struct sockaddr *) &route->addr.dst_addr);
+	req.service_id = rdma_get_service_id(&id_priv->id,
+					     (struct sockaddr *) &route->addr.dst_addr);
 	req.qp_num = id_priv->qp_num;
 	req.qp_type = IB_QPT_RC;
 	req.starting_psn = id_priv->seq_num;
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index c6b2962..ffc544f 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -330,4 +330,11 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr);
  */
 void rdma_set_service_type(struct rdma_cm_id *id, int tos);
 
+/**
+ * rdma_get_service_id - Return the IB service ID for a specified address.
+ * @id: Communication identifier associated with the address.
+ * @addr: Address for the service ID.
+ */
+__be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr);
+
 #endif /* RDMA_CM_H */
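As a rough illustration of what rdma_get_service_id() computes for non-AF_IB addresses, the sketch below builds the 64-bit service ID from a port space and a port number, then lays it out in big-endian byte order as cpu_to_be64() would. The function names are hypothetical, and the example values in the comment (port space 0x0106, port 7471) are picked only for illustration.

```c
#include <stdint.h>

/*
 * Host-order service ID: the port space sits above the low 16 bits,
 * and the port number (already converted to host order) fills them.
 * Example: ps = 0x0106, port = 7471 (0x1d2f) gives 0x01061d2f.
 */
uint64_t service_id_host(uint16_t ps, uint16_t port)
{
	return ((uint64_t)ps << 16) + port;
}

/* Write the service ID in big-endian (wire) order, most significant byte first. */
void service_id_wire(uint64_t sid, unsigned char out[8])
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(sid >> (56 - 8 * i));
}
```

With AF_IB the computation is skipped entirely: the patch returns sib_sid from the sockaddr_ib as-is.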
[RFC] [PATCH 20/22] [for 2.6.36] rdma/ucm: allow user space to bind to AF_IB
Support user space binding to addresses using AF_IB. Since sockaddr_ib
is larger than sockaddr_in6, we need to define a larger structure when
binding using AF_IB. This time we use sockaddr_storage to cover future
cases.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/ucma.c |   27 ++-
 include/rdma/rdma_user_cm.h    |   10 +-
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 950bd35..e2d8dcf 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -514,6 +514,30 @@ static ssize_t ucma_bind_ip(struct ucma_file *file, const char __user *inbuf,
 	return ret;
 }
 
+static ssize_t ucma_bind(struct ucma_file *file, const char __user *inbuf,
+			 int in_len, int out_len)
+{
+	struct rdma_ucm_bind cmd;
+	struct sockaddr *addr;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	addr = (struct sockaddr *) &cmd.addr;
+	if (cmd.reserved || !cmd.addr_size || (cmd.addr_size != rdma_addr_size(addr)))
+		return -EINVAL;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = rdma_bind_addr(ctx->cm_id, addr);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
 static ssize_t ucma_resolve_ip(struct ucma_file *file,
 			       const char __user *inbuf,
 			       int in_len, int out_len)
@@ -1308,7 +1332,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
 	[RDMA_USER_CM_CMD_JOIN_IP_MCAST]= ucma_join_ip_multicast,
 	[RDMA_USER_CM_CMD_LEAVE_MCAST]	= ucma_leave_multicast,
 	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id,
-	[RDMA_USER_CM_CMD_QUERY]	= ucma_query
+	[RDMA_USER_CM_CMD_QUERY]	= ucma_query,
+	[RDMA_USER_CM_CMD_BIND]		= ucma_bind
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index bbb724b..009b8da 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -62,7 +62,8 @@ enum {
 	RDMA_USER_CM_CMD_JOIN_IP_MCAST,
 	RDMA_USER_CM_CMD_LEAVE_MCAST,
 	RDMA_USER_CM_CMD_MIGRATE_ID,
-	RDMA_USER_CM_CMD_QUERY
+	RDMA_USER_CM_CMD_QUERY,
+	RDMA_USER_CM_CMD_BIND
 };
 
 /*
@@ -101,6 +102,13 @@ struct rdma_ucm_bind_ip {
 	__u32 id;
 };
 
+struct rdma_ucm_bind {
+	__u32 id;
+	__u16 addr_size;
+	__u16 reserved;
+	struct sockaddr_storage addr;
+};
+
 struct rdma_ucm_resolve_ip {
 	struct sockaddr_in6 src_addr;
 	struct sockaddr_in6 dst_addr;
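The sanity check in ucma_bind() (reserved must be zero, and addr_size must be non-zero and match the size implied by the address family) can be sketched in user space roughly as follows. addr_size_for() and struct bind_cmd are hypothetical stand-ins for rdma_addr_size() and struct rdma_ucm_bind, and only AF_INET and AF_INET6 are handled here because sockaddr_ib is not available outside this patch series.

```c
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical analog of rdma_addr_size(): size implied by the family, 0 if unknown. */
size_t addr_size_for(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Hypothetical mirror of struct rdma_ucm_bind from the patch. */
struct bind_cmd {
	uint32_t id;
	uint16_t addr_size;
	uint16_t reserved;
	struct sockaddr_storage addr;
};

/* Return 1 if the command passes the same three checks as ucma_bind(), else 0. */
int bind_cmd_valid(const struct bind_cmd *cmd)
{
	const struct sockaddr *addr = (const struct sockaddr *) &cmd->addr;

	if (cmd->reserved || !cmd->addr_size)
		return 0;
	return cmd->addr_size == addr_size_for(addr);
}
```

Rejecting a zero addr_size before the comparison matters: an unknown family maps to size 0, so without that check an all-zero command would compare equal and slip through.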
[RFC] [PATCH 22/22] [for 2.6.36] rdma/ucm: allow user space to specify AF_IB when joining multicast
Allow user space applications to join multicast groups using MGIDs
directly. MGIDs may be passed using AF_IB addresses. Since the current
multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/ucma.c |   55
 include/rdma/rdma_user_cm.h    |   12 -
 2 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 2224a05..a9b917a 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1139,23 +1139,23 @@ static ssize_t ucma_notify(struct ucma_file *file, const char __user *inbuf,
 	return ret;
 }
 
-static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
-				      const char __user *inbuf,
-				      int in_len, int out_len)
+static ssize_t ucma_process_join(struct ucma_file *file,
+				 struct rdma_ucm_join_mcast *cmd, int out_len)
 {
-	struct rdma_ucm_join_ip_mcast cmd;
 	struct rdma_ucm_create_id_resp resp;
 	struct ucma_context *ctx;
 	struct ucma_multicast *mc;
+	struct sockaddr *addr;
 	int ret;
 
 	if (out_len < sizeof(resp))
 		return -ENOSPC;
 
-	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
-		return -EFAULT;
+	addr = (struct sockaddr *) &cmd->addr;
+	if (cmd->reserved || !cmd->addr_size || (cmd->addr_size != rdma_addr_size(addr)))
+		return -EINVAL;
 
-	ctx = ucma_get_ctx(file, cmd.id);
+	ctx = ucma_get_ctx(file, cmd->id);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
@@ -1166,14 +1166,14 @@ static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
 		goto err1;
 	}
 
-	mc->uid = cmd.uid;
-	memcpy(&mc->addr, &cmd.addr, sizeof cmd.addr);
+	mc->uid = cmd->uid;
+	memcpy(&mc->addr, addr, cmd->addr_size);
 	ret = rdma_join_multicast(ctx->cm_id, (struct sockaddr *) &mc->addr, mc);
 	if (ret)
 		goto err2;
 
 	resp.id = mc->id;
-	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+	if (copy_to_user((void __user *)(unsigned long) cmd->response,
 			 &resp, sizeof(resp))) {
 		ret = -EFAULT;
 		goto err3;
@@ -1198,6 +1198,38 @@ err1:
 	return ret;
 }
 
+static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
+				      const char __user *inbuf,
+				      int in_len, int out_len)
+{
+	struct rdma_ucm_join_ip_mcast cmd;
+	struct rdma_ucm_join_mcast join_cmd;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	join_cmd.response = cmd.response;
+	join_cmd.uid = cmd.uid;
+	join_cmd.id = cmd.id;
+	join_cmd.addr_size = rdma_addr_size((struct sockaddr *) &cmd.addr);
+	join_cmd.reserved = 0;
+	memcpy(&join_cmd.addr, &cmd.addr, join_cmd.addr_size);
+
+	return ucma_process_join(file, &join_cmd, out_len);
+}
+
+static ssize_t ucma_join_multicast(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len)
+{
+	struct rdma_ucm_join_mcast cmd;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	return ucma_process_join(file, &cmd, out_len);
+}
+
 static ssize_t ucma_leave_multicast(struct ucma_file *file,
 				    const char __user *inbuf,
 				    int in_len, int out_len)
@@ -1361,7 +1393,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
 	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id,
 	[RDMA_USER_CM_CMD_QUERY]	= ucma_query,
 	[RDMA_USER_CM_CMD_BIND]		= ucma_bind,
-	[RDMA_USER_CM_CMD_RESOLVE_ADDR]	= ucma_resolve_addr
+	[RDMA_USER_CM_CMD_RESOLVE_ADDR]	= ucma_resolve_addr,
+	[RDMA_USER_CM_CMD_JOIN_MCAST]	= ucma_join_multicast
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h
index c546e18..9f14ca8 100644
--- a/include/rdma/rdma_user_cm.h
+++ b/include/rdma/rdma_user_cm.h
@@ -64,7 +64,8 @@ enum {
 	RDMA_USER_CM_CMD_MIGRATE_ID,
 	RDMA_USER_CM_CMD_QUERY,
 	RDMA_USER_CM_CMD_BIND,
-	RDMA_USER_CM_CMD_RESOLVE_ADDR
+	RDMA_USER_CM_CMD_RESOLVE_ADDR,
+	RDMA_USER_CM_CMD_JOIN_MCAST
 };
 
 /*
@@ -241,6 +242,15 @@ struct rdma_ucm_join_ip_mcast {
 	__u32 id;
 };
 
+struct rdma_ucm_join_mcast {
+	__u64 response;		/* rdma_ucma_create_id_resp */
+	__u64 uid;
+	__u32 id;
+	__u16 addr_size;
+	__u16
[PATCH] IB core: Fix locking on device numbers allocation
When the driver needs to dynamically allocate char device numbers on
systems with more than IB_UVERBS_MAX_DEVICES devices, it releases
map_lock, allocates a new range and a new device number from that
range, and only then re-acquires the lock. This allocation must be
protected for the same reason that the map_lock spinlock is used in the
first place. Without protection we could also end up calling
alloc_chrdev_region() a number of times and leak the extra regions.

Fix this by replacing the map_lock spinlock with a mutex and holding it
across all of the allocation code.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 drivers/infiniband/core/uverbs_main.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index d805cf3..9589c71 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -72,7 +72,7 @@ DEFINE_IDR(ib_uverbs_cq_idr);
 DEFINE_IDR(ib_uverbs_qp_idr);
 DEFINE_IDR(ib_uverbs_srq_idr);
 
-static DEFINE_SPINLOCK(map_lock);
+static DEFINE_MUTEX(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
 
 static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
@@ -738,15 +738,15 @@ static void ib_uverbs_add_one(struct ib_device *device)
 	kref_init(&uverbs_dev->ref);
 	init_completion(&uverbs_dev->comp);
 
-	spin_lock(&map_lock);
+	mutex_lock(&map_lock);
 	devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
 	if (devnum >= IB_UVERBS_MAX_DEVICES) {
-		spin_unlock(&map_lock);
 		devnum = find_overflow_devnum();
-		if (devnum < 0)
+		if (devnum < 0) {
+			mutex_unlock(&map_lock);
 			goto err;
+		}
 
-		spin_lock(&map_lock);
 		uverbs_dev->devnum = devnum + IB_UVERBS_MAX_DEVICES;
 		base = devnum + overflow_maj;
 		set_bit(devnum, overflow_map);
@@ -755,7 +755,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
 		base = devnum + IB_UVERBS_BASE_DEV;
 		set_bit(devnum, dev_map);
 	}
-	spin_unlock(&map_lock);
+	mutex_unlock(&map_lock);
 
 	uverbs_dev->ib_dev = device;
 	uverbs_dev->num_comp_vectors = device->num_comp_vectors;
-- 
1.7.0.3
Re: Ummunotify: progress at last!
On Mar 23, 2010, at 1:29 PM, Jason Gunthorpe wrote:

> > What do you poll on the fd for?
>
> poll() is for apps that want to get the notifications without spinning
> on the counter.

Ah, ok. I think even with the ummunotify interface, that would work,
too. Meaning: since you have to read() from the fd to get event
details, poll() would *also* tell you if there was something to read
(in addition to checking if (last_counter != counter)).

The counter is a fast way of checking -- e.g., if you need to check in
your fast path (which MPIs likely will). poll() could be used if you
don't care if the check is slow.

> If you don't think that is worth doing it does simplify things a lot,
> just add two new verbs calls:
>
>   ibv_set_mmu_counter(verbs, my_counter);
>   ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));

I have no real opinion on whether the mmap/read should be hidden by the
above ibv calls or not. Either is fine with me.

I would *assume* that ibv_get_mmu_notifications() is non-blocking,
right? E.g., if you ask for N and only M are available (where M < N),
then the call returns with only M items filled (and M could be 0).
Perhaps you need another parameter to indicate how many items in
my_list were actually filled? Or is that the return value?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
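To make the two checks discussed above concrete, here is a small sketch of a fast-path counter comparison next to a slower poll()-based check. The struct and function names are invented for illustration and are not part of ummunotify or libibverbs; the counter is assumed to live in memory shared with the kernel (e.g. an mmap'ed page), and poll() is plain POSIX.

```c
#include <poll.h>
#include <stdint.h>

/* Hypothetical per-process state: a shared counter plus the last value seen. */
struct notify_state {
	const volatile uint64_t *counter;	/* assumed to be mapped from the kernel */
	uint64_t last_seen;
};

/*
 * Fast path: one memory load and a compare, with no system call.
 * Returns 1 if the counter moved since the previous call, else 0.
 */
int events_pending(struct notify_state *st)
{
	uint64_t now = *st->counter;

	if (now == st->last_seen)
		return 0;
	st->last_seen = now;
	return 1;
}

/*
 * Slow path: ask the kernel whether the fd is readable, waiting up to
 * timeout_ms milliseconds. Returns 1 if there is something to read.
 */
int fd_has_events(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	return n > 0 && (pfd.revents & POLLIN) != 0;
}
```

A library would presumably call events_pending() in its hot path and only read() from the fd when it returns 1, leaving fd_has_events() to callers that prefer to sleep rather than spin.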
RE: [RFC] [PATCH 0/22] [for 2.6.36] rdma/cm: add support for native IB addressing
The following patch series extends the rdma_cm to support native
InfiniBand addressing through the use of a new AF_IB address family. It
defines a new struct sockaddr_ib that may be used to specify an IB GID,
along with other IB address attributes, such as the pkey and service
ID.

The kernel patches are also available from:

git://git.openfabrics.org/~shefty/rdma-dev.git af_ib

The higher-level intent is to support a user space call,
rdma_getaddrinfo, which can return AF_IB addresses to an application.
This allows the rdma_cm to support transport specific features, such as
failover and non-reversible paths. (An implementation of
rdma_getaddrinfo is included in a separate patch set to the librdmacm.)

User space patches are available from:

git://git.openfabrics.org/~shefty/librdmacm.git af_ib

Since there are 33 user space patches to the librdmacm, including new
functionality to simplify establishing connections, I will wait a day
or so before posting them.

Updates to ib_acm are at:

git://git.openfabrics.org/~shefty/ibacm.git

- Sean
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 03:17:12PM -0400, Jeff Squyres wrote: If you don't think that is worth doing it does simplify things alot, just add two new verbs calls: ibv_set_mmu_counter(verbs, my_counter); ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list)); I have no real opinion on whether the mmap/read should be hidden by the above ibv calls or not. Either is fine with me. I would *assume* that ibv_get_mmu_notifications() is non-blocking, right? E.g., if you ask for N and only M are available (where M < N), then the call returns with only M items filled (and M could be 0). Perhaps you need another parameter to indicate how many items in my_list were actually filled? Or is that the return value? Right, non-blocking, return value indicates length, like read(). These are not hiding mmap/read, they are new uverbs 'syscalls' that get the kernel to perform that operation. Jason
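The "non-blocking, return value indicates length" behaviour agreed on above can be sketched in plain C. This is only an illustration under assumptions: struct mmu_event, struct event_queue and get_notifications() are hypothetical names, the queue is an in-process stand-in for whatever the kernel would keep, and returning a count of items (rather than bytes) is a guess, since the thread does not say which unit the proposed call would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event record; the real layout is not given in the thread. */
struct mmu_event {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical in-process stand-in for the kernel's list of pending events. */
struct event_queue {
    struct mmu_event pending[16];
    size_t count;
};

/*
 * Copy as many pending events as fit into `list` (whose size is given in
 * bytes, as in the sizeof(my_list) call quoted above) and return how many
 * were copied. Never blocks: returns 0 when nothing is pending.
 */
size_t get_notifications(struct event_queue *q, struct mmu_event *list,
                         size_t list_bytes)
{
    assert(q != NULL && list != NULL);

    size_t capacity = list_bytes / sizeof(list[0]);
    size_t n = q->count < capacity ? q->count : capacity;

    memcpy(list, q->pending, n * sizeof(list[0]));
    /* Move the events that did not fit to the front of the queue. */
    memmove(q->pending, q->pending + n,
            (q->count - n) * sizeof(q->pending[0]));
    q->count -= n;
    return n;
}
```

With three events pending and room for two, a caller would see 2, then 1, then 0 on successive calls, which is the "M could be 0" case from the question.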
Re: Ummunotify: progress at last!
On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote: ibv_set_mmu_counter(verbs, my_counter); ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list)); These are not hiding mmap/read, they are new uverbs 'syscalls' that get the kernel to perform that operation. Oh -- so there's 2 mechanisms to get the counter info (for example): 1. the above uverb 2. mmap Right? I don't really have an opinion here -- I'm not really an owner of the ibv API. As long as there is a fast/mmap way for me to get the counter without an extra function call, I'm happy. :-) -- Jeff Squyres jsquy...@cisco.com
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 04:01:21PM -0400, Jeff Squyres wrote: On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote: ibv_set_mmu_counter(verbs, my_counter); ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list)); These are not hiding mmap/read, they are new uverbs 'syscalls' that get the kernel to perform that operation. Oh -- so there's 2 mechanisms to get the counter info (for example): 1. the above uverb 2. mmap Right? No, there is no mmap. Like this: u64 my_counter = 0; ibv_set_mmu_counter(verbs, my_counter); [..] while (my_counter != last_my_counter) { last_my_counter = my_counter; ibv_get_mmu_notifications(verbs, ...); // - I am a memory barrier as well } The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to the 8 byte counter you specified without having to do the mmap thing. As I understand it this is what perfevents does. Integrating with the verbs api avoids the need for another device file. That is good. Eliminating poll() from the API can remove the dedicated fd entirely. Within verbs I guess we could replace poll() with something like a completion channel (??) if anyone cares. So the user space visible functionality boils down to 2 new 'syscalls' and the new flag to ibv_reg_mr. Jason
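The counter loop in the message above can be tried out in user space with a mock. The sketch below is not the proposed verbs API: struct mock_kernel and the mock_* functions are hypothetical stand-ins, passing the counter by address is an assumption (the message talks about binding "the 8 byte counter you specified"), and C11 atomics take the place of the memory barrier the message attributes to the notification call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel side of the proposal. */
struct mock_kernel {
    _Atomic uint64_t *counter;  /* user-supplied counter location */
    unsigned pending;           /* events waiting to be fetched */
};

/* Mock of the proposed ibv_set_mmu_counter(): bind the counter location. */
void mock_set_mmu_counter(struct mock_kernel *k, _Atomic uint64_t *counter)
{
    k->counter = counter;
}

/* Mock of an invalidation: queue one event, then bump the counter. */
void mock_invalidate(struct mock_kernel *k)
{
    assert(k->counter != NULL);
    k->pending++;
    atomic_fetch_add_explicit(k->counter, 1, memory_order_release);
}

/* Mock of the proposed ibv_get_mmu_notifications(): drain all events. */
unsigned mock_get_mmu_notifications(struct mock_kernel *k)
{
    unsigned n = k->pending;
    k->pending = 0;
    return n;
}

/*
 * Fast-path check: one load when nothing changed; otherwise record the
 * new counter value first, then fetch events, and loop in case the
 * counter moved again in the meantime. Returns the events handled.
 */
unsigned check_invalidations(struct mock_kernel *k,
                             _Atomic uint64_t *counter,
                             uint64_t *last_seen)
{
    unsigned handled = 0;
    uint64_t now;

    while ((now = atomic_load_explicit(counter, memory_order_acquire))
           != *last_seen) {
        *last_seen = now;
        handled += mock_get_mmu_notifications(k);
    }
    return handled;
}
```

Updating last_seen before draining, as in the original loop, means an invalidation that lands during the drain changes the counter again and triggers another pass rather than being missed.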
Re: opensm/main.c: force stdout to be line-buffered
On 23/Mar/10 18:37, Sasha Khapyorsky wrote: On 14:25 Tue 23 Mar , Yevgeny Kliteynik wrote: I'm running opensm > somefile, and I don't see SM's stdout (such as SUBNET UP message, or new cached options after SIGHUP), because when stdout is assigned to a file and not a terminal, it is handled differently. Instead of flushing on printing '\n', it becomes buffered, which means that you don't control when this buffer is flushed. My fix forces stdout to always flush when printing '\n'. It has no effect when stdout is assigned to a terminal, and it changes buffering when SM's stdout is redirected. More details about stdout/stderr buffering: http://www.pixelbeat.org/programming/stdio_buffering/ There you can find a couple of ways to work around this issue, for example: stdbuf -o L opensm > somefile This is not a usual shell command, nor some common tool that is used in Linux distros. It's just a tool that is provided in some package called coreutils 7.5. I would prefer to not change external settings so the program would work as expected. IMHO, on the contrary - the expected behavior would be achieved with the patch. But in any case, what exactly seems problematic here? I can't see any impact on any aspect whatsoever. It is somehow related to performance (more flushes when the stream is line-buffered instead of just buffered), but we're talking about stdout here, not the log file, so the performance is not affected either. -- Yevgeny
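For reference, the setlinebuf(stdout) call in the patch has a standard C equivalent, setvbuf() with _IOLBF. A minimal sketch follows; the helper name force_line_buffered is made up for illustration and is not part of opensm.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Switch a stdio stream to line buffering, so output is flushed whenever
 * a '\n' is written, even when the stream is redirected to a file.
 * Returns 0 on success and nonzero on failure, as setvbuf() does.
 * Should be called before any other operation on the stream.
 */
int force_line_buffered(FILE *stream)
{
    assert(stream != NULL);
    /* NULL buffer: let the C library allocate the buffer itself. */
    return setvbuf(stream, NULL, _IOLBF, 0);
}
```

Calling force_line_buffered(stdout) at the top of main() has the same effect as the patch without relying on the BSD/glibc setlinebuf() extension.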
Re: Ummunotify: progress at last!
What do you poll on the fd for? With ummunotify, you only read() from the fd when (counter != last_counter). Were you thinking that the poll() would be for something else? The ummunotify fd supported poll() (and SIGIO etc). You don't have to use it if you don't want to, but I made it as much of a normal fd as possible. - R. -- Roland Dreier rola...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
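Because the ummunotify fd behaves like a normal fd, an application that does not need the fast counter check could test it with an ordinary zero-timeout poll(). The sketch below shows only that generic pattern; fd_has_events is a hypothetical helper and works with any readable fd, so a pipe can stand in for the ummunotify fd when trying it out.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Non-blocking readiness check on a file descriptor.
 * Returns 1 if data can be read now, 0 if not, -1 if poll() fails.
 */
int fd_has_events(int fd)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, 0);  /* timeout 0: return immediately */

    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

In an fd-driven event loop the same struct pollfd entry would simply be added to the array the loop already passes to poll(), with read() called only when POLLIN is reported.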
Re: Ummunotify: progress at last!
On Tue, Mar 23, 2010 at 10:55:08PM -0700, Roland Dreier wrote: I would prefer to do this by adding a new verbs call that returns a fd directly. Ie use ib_uverbs_alloc_event_file and act like ibv_create_comp_channel. The main reason for the new FD is so it can be polled on.. Agree, we don't want a new device node I don't think -- too hard to associate an fd you get from a separate open() with a uverbs context. Right, that would suck.. You can also avoid the mmap scheme by doing what perf events does, pass in a pointer from userspace and have the kernel pin that page it is on. I wonder, is that a win? I guess you don't even have to pin it, just do copy_to_user() to update the counter, but mmap doesn't seem so bad. Not sure you can call copy_to_user from a mmu_notifier callback? What if it faults? I think the idea would be that by letting user space select the counter location it could place it in a sensible cacheline/etc and probably avoid a pointer indirection. Jason
Re: Ummunotify: progress at last!
No, there is no mmap. Like this: u64 my_counter = 0; ibv_set_mmu_counter(verbs, my_counter); [..] while (my_counter != last_my_counter) { last_my_counter = my_counter; ibv_get_mmu_notifications(verbs, ...); // - I am a memory barrier as well } The kernel 'syscall' ibv_set_mmu_counter would bind the given verbs to the 8 byte counter you specified without having to do the mmap thing. As I understand it this is what perfevents does. Integrating with the verbs api avoids the need for another device file. That is good. Eliminating poll() from the API can remove the dedicated fd entirely. Within verbs I guess we could replace poll() with something like a completion channel (??) if anyone cares. That is all definitely doable. I wonder if it's better to get rid of the dedicated fd though. After all, having the fd means a fancy app can do poll() or sigio or whatever internally. Being able to integrate into an fd-driven event loop seems like a pretty big thing to give up. Also, having a uverbs syscall that is exactly like read() seems a bit of a stretch, even within the ugliness that is the uverbs interface. (We love all our children, but sometimes we have to be realistic) - R. -- Roland Dreier rola...@cisco.com