Re: XRC and ib_poll_cq
On 4/3/2013 3:35 AM, Hefty, Sean wrote:
>> Thanks, it seems that I asked the question in the wrong way. The context is
>> what I need eventually, but in order to find it I first need to recover the
>> QP or the XRC SRQ. Per core there may be several QPs and SRQs using the same
>> CQ. When doing ib_poll_cq on the CQ, the WQE in hand has only an ib_qp
>> pointer.
>
> Correct - dereference the qp pointer to get the context, srq, and srq_context.
>
>	struct ib_wc wc;
>
>	ib_poll_cq(..., &wc);
>	my_qp = wc.qp;
>	my_qp_context = wc.qp->qp_context;
>	my_srq = wc.qp->srq;
>	my_srq_context = wc.qp->srq->srq_context;
>
> this tells you what QP/SRQ the completion is related to.

Hi Sean,

Your answer actually raises another question. According to the annex, the XRC SRQ is used directly (it is like an RD QP but without a requester side), and in the user space example xrc_pingpong.c no QP is created on top of the XRC SRQ. So the question is: what is this QP that you dereferenced in the statement

	my_srq = wc.qp->srq;

Is it just an artifact of the implementation?

S.P.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
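For readers following the thread, the lookup described in the quoted reply can be sketched outside the kernel. This is only a minimal sketch under stated assumptions: the `mock_*` structs and `resolve_owner()` are hypothetical stand-ins that mirror just the fields named in the thread (`qp_context`, `srq`, `srq_context`), not the real `struct ib_wc`, `struct ib_qp`, or `struct ib_srq`. Whether `wc.qp` is populated at all for an XRC SRQ completion is exactly the open question above, so the sketch treats a missing QP or SRQ as a valid case.

```c
#include <stddef.h>

/* Hypothetical stand-ins mirroring only the fields discussed in the thread. */
struct mock_srq { void *srq_context; };
struct mock_qp  { void *qp_context; struct mock_srq *srq; };
struct mock_wc  { struct mock_qp *qp; };

/* Owner information recovered from one completion. */
struct completion_owner {
    void *qp_context;   /* NULL if the completion carries no QP */
    void *srq_context;  /* NULL if the QP has no SRQ attached */
};

/* Walk wc -> qp -> srq, tolerating missing links at each step. */
static struct completion_owner resolve_owner(const struct mock_wc *wc)
{
    struct completion_owner owner = { NULL, NULL };

    if (wc == NULL || wc->qp == NULL)
        return owner;

    owner.qp_context = wc->qp->qp_context;
    if (wc->qp->srq != NULL)
        owner.srq_context = wc->qp->srq->srq_context;

    return owner;
}
```

The NULL checks are a design choice of the sketch, not something the thread establishes about the verbs API.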
Re: [RFC-v2 10/12] iser-target: Add logic for verbs
On 02/04/2013 09:18, Or Gerlitz wrote:
> On 23/03/2013 01:55, Nicholas A. Bellinger wrote:
>> +++ b/drivers/infiniband/ulp/isert/isert_verbs.h
>> @@ -0,0 +1,5 @@
>> +extern void isert_connect_release(struct isert_conn *);
>> +extern void isert_put_conn(struct isert_conn *);
>> +extern int isert_cma_handler(struct rdma_cm_id *, struct rdma_cm_event *);
>> +extern int isert_post_recv(struct isert_conn *, u32);
>> +extern int isert_post_send(struct isert_conn *, struct iser_tx_desc *);
>
> why use extern here? maybe a left over from V1?

Nic, are you picking up this comment and its sister comment asking to remove the externs and use fewer header files?

Or.
[PATCH 5/5] IPoIB: add support for TIPC protocol
Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own
neighbour entries and doesn't require the packet to have a dst_entry
anymore, the only necessary changes are to:

- not drop multicast TIPC packets because of the unknown ethernet type
- handle unicast TIPC packets similar to IPv4/IPv6 unicast packets

in ipoib_start_xmit().

An alternative would be to remove all ethertype limitations since they're
not necessary anymore, all TIPC needs to know about is ARP and RARP since
it wants to always perform path find, even if a path is already known.

Signed-off-by: Patrick McHardy <ka...@trash.net>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8534afd..554b906 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -730,7 +730,8 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if ((header->proto != htons(ETH_P_IP)) &&
 		    (header->proto != htons(ETH_P_IPV6)) &&
 		    (header->proto != htons(ETH_P_ARP)) &&
-		    (header->proto != htons(ETH_P_RARP))) {
+		    (header->proto != htons(ETH_P_RARP)) &&
+		    (header->proto != htons(ETH_P_TIPC))) {
 			/* ethertype not supported by IPoIB */
 			++dev->stats.tx_dropped;
 			dev_kfree_skb_any(skb);
@@ -751,6 +752,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	switch (header->proto) {
 	case htons(ETH_P_IP):
 	case htons(ETH_P_IPV6):
+	case htons(ETH_P_TIPC):
 		neigh = ipoib_neigh_get(dev, cb->hwaddr);
 		if (unlikely(!neigh)) {
 			neigh_add_path(skb, cb->hwaddr, dev);
-- 
1.8.1.4
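The ethertype handling this patch changes can be illustrated with a small userspace sketch. It is not the driver code: `classify_unicast()`, the `xmit_path` enum and the `SKETCH_` constants are hypothetical names, and the treatment of ARP/RARP as a separate "path find" case is taken from the patch description rather than from code shown here. The numeric ethertype values are the standard ones from `<linux/if_ether.h>`.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Standard ethertype values (host byte order). */
#define SKETCH_ETH_P_IP    0x0800
#define SKETCH_ETH_P_ARP   0x0806
#define SKETCH_ETH_P_RARP  0x8035
#define SKETCH_ETH_P_IPV6  0x86DD
#define SKETCH_ETH_P_TIPC  0x88CA

/* Hypothetical outcome of classifying a unicast frame by ethertype. */
enum xmit_path {
    XMIT_DROP,          /* ethertype not supported */
    XMIT_NEIGH_LOOKUP,  /* IPv4, IPv6 and (with this patch) TIPC */
    XMIT_PATH_FIND      /* ARP and RARP */
};

/* proto_be is the ethertype in network byte order, as stored in the header. */
static enum xmit_path classify_unicast(uint16_t proto_be)
{
    if (proto_be == htons(SKETCH_ETH_P_IP) ||
        proto_be == htons(SKETCH_ETH_P_IPV6) ||
        proto_be == htons(SKETCH_ETH_P_TIPC))
        return XMIT_NEIGH_LOOKUP;

    if (proto_be == htons(SKETCH_ETH_P_ARP) ||
        proto_be == htons(SKETCH_ETH_P_RARP))
        return XMIT_PATH_FIND;

    return XMIT_DROP;
}
```

The sketch uses an if-chain because `htons()` is not guaranteed to be a constant expression in userspace, so it cannot be used in `case` labels the way the kernel code does.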
[PATCH 3/5] tipc: set skb->protocol in eth_media packet transmission
The skb->protocol field is used by packet classifiers and for AF_PACKET
cooked format, TIPC needs to set it properly.

Fixes packet classification and ethertype of 0x in cooked captures:

Out 20:c9:d0:43:12:d9 ethertype Unknown (0x), length 56:
	0x: 5b50 0028 30d4 0100 1000 0100 1001 [P.(..0.
	0x0010: 03e8 0001 20c9 d043 12d9 ...C
	0x0020:

Signed-off-by: Patrick McHardy <ka...@trash.net>
---
 net/tipc/eth_media.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 0648819..120a676 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -111,6 +111,7 @@ static int send_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr,
 	skb_reset_network_header(clone);
 	clone->dev = dev;
+	clone->protocol = htons(ETH_P_TIPC);
 	dev_hard_header(clone, dev, ETH_P_TIPC, dest->value,
 			dev->dev_addr, clone->len);
 	dev_queue_xmit(clone);
-- 
1.8.1.4
[PATCH 2/5] tipc: move bcast_addr from struct tipc_media to struct tipc_bearer
Some network protocols, like InfiniBand, don't have a fixed broadcast address but one that depends on the configuration. Move the bcast_addr to struct tipc_bearer and initialize it with the broadcast address of the network device when the bearer is enabled. Signed-off-by: Patrick McHardy ka...@trash.net --- net/tipc/bcast.c | 4 ++-- net/tipc/bearer.c| 5 + net/tipc/bearer.h| 5 +++-- net/tipc/discover.c | 2 +- net/tipc/eth_media.c | 18 +++--- 5 files changed, 18 insertions(+), 16 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 2655c9f..25e159c 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -620,10 +620,10 @@ static int tipc_bcbearer_send(struct sk_buff *buf, continue; /* bearer pair doesn't add anything */ if (!tipc_bearer_blocked(p)) - tipc_bearer_send(p, buf, p-media-bcast_addr); + tipc_bearer_send(p, buf, p-bcast_addr); else if (s !tipc_bearer_blocked(s)) /* unable to send on primary bearer */ - tipc_bearer_send(s, buf, s-media-bcast_addr); + tipc_bearer_send(s, buf, s-bcast_addr); else /* unable to send on either bearer */ continue; diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index aa62f93..45d5398 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -89,9 +89,6 @@ int tipc_register_media(struct tipc_media *m_ptr) if ((strlen(m_ptr-name) + 1) TIPC_MAX_MEDIA_NAME) goto exit; - if ((m_ptr-bcast_addr.media_id != m_ptr-type_id) || - !m_ptr-bcast_addr.broadcast) - goto exit; if (m_ptr-priority TIPC_MAX_LINK_PRI) goto exit; if ((m_ptr-tolerance TIPC_MIN_LINK_TOL) || @@ -407,7 +404,7 @@ restart: INIT_LIST_HEAD(b_ptr-links); spin_lock_init(b_ptr-lock); - res = tipc_disc_create(b_ptr, m_ptr-bcast_addr, disc_domain); + res = tipc_disc_create(b_ptr, b_ptr-bcast_addr, disc_domain); if (res) { bearer_disable(b_ptr); pr_warn(Bearer %s rejected, discovery object creation failed\n, diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index cc2d74e..3b3fa26 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -94,8 +94,8 @@ struct 
tipc_media { void (*disable_bearer)(struct tipc_bearer *b_ptr); int (*addr2str)(struct tipc_media_addr *a, char *str_buf, int str_size); int (*addr2msg)(struct tipc_media_addr *a, char *msg_area); - int (*msg2addr)(struct tipc_media_addr *a, char *msg_area); - struct tipc_media_addr bcast_addr; + int (*msg2addr)(const struct tipc_bearer *b_ptr, + struct tipc_media_addr *a, char *msg_area); u32 priority; u32 tolerance; u32 window; @@ -134,6 +134,7 @@ struct tipc_bearer { char name[TIPC_MAX_BEARER_NAME]; spinlock_t lock; struct tipc_media *media; + struct tipc_media_addr bcast_addr; u32 priority; u32 window; u32 tolerance; diff --git a/net/tipc/discover.c b/net/tipc/discover.c index 1074b95..eedff58 100644 --- a/net/tipc/discover.c +++ b/net/tipc/discover.c @@ -129,7 +129,7 @@ void tipc_disc_recv_msg(struct sk_buff *buf, struct tipc_bearer *b_ptr) int link_fully_up; media_addr.broadcast = 1; - b_ptr-media-msg2addr(media_addr, msg_media_addr(msg)); + b_ptr-media-msg2addr(b_ptr, media_addr, msg_media_addr(msg)); kfree_skb(buf); /* Ensure message from node is valid and communication is permitted */ diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c index 1bdc6df..0648819 100644 --- a/net/tipc/eth_media.c +++ b/net/tipc/eth_media.c @@ -77,12 +77,13 @@ static struct notifier_block notifier = { * Media-dependent value field stores MAC address in first 6 bytes * and zeroes out the remaining bytes. 
*/ -static void eth_media_addr_set(struct tipc_media_addr *a, char *mac) +static void eth_media_addr_set(const struct tipc_bearer *tb_ptr, + struct tipc_media_addr *a, char *mac) { memcpy(a-value, mac, ETH_ALEN); memset(a-value + ETH_ALEN, 0, sizeof(a-value) - ETH_ALEN); a-media_id = TIPC_MEDIA_TYPE_ETH; - a-broadcast = !memcmp(mac, eth_media_info.bcast_addr.value, ETH_ALEN); + a-broadcast = !memcmp(mac, tb_ptr-bcast_addr.value, ETH_ALEN); } /** @@ -201,9 +202,13 @@ static int enable_bearer(struct tipc_bearer *tb_ptr) /* Associate TIPC bearer with Ethernet bearer */ eb_ptr-bearer = tb_ptr; tb_ptr-usr_handle = (void *)eb_ptr; + memset(tb_ptr-bcast_addr.value, 0, sizeof(tb_ptr-bcast_addr.value)); + memcpy(tb_ptr-bcast_addr.value, dev-broadcast, ETH_ALEN); + tb_ptr-bcast_addr.media_id = TIPC_MEDIA_TYPE_ETH; +
[PATCH RFC 0/5] tipc: add support for TIPC over InfiniBand
The following patchset adds support for running TIPC over InfiniBand.

The patchset consists of three parts (+ a minor fix for the ethernet media type):

- Preparation: removal of the unused str2addr callback and move of the
  bcast_addr from struct tipc_media to struct tipc_bearer. This is necessary
  because InfiniBand doesn't have a fixed broadcast address like ethernet,
  so it needs to be initialized with the device's broadcast address when
  the bearer is enabled

- Introduction of a TIPC InfiniBand media type. A new media type is needed
  to deal with the different address sizes

- Support for ETH_P_TIPC in IPoIB

The last patch is something I'd like to discuss. I realize that this diverges from the IPoIB specification; however, the alternative would be to implement something which would be pretty much identical to IPoIB, with the only difference being the handling of a different ethertype in the xmit function.

In fact I'd like to propose to remove all higher layer protocol knowledge from IPoIB except for ARP and RARP, which need special treatment. With the recent patch to manage neighbour entries in IPoIB itself, no further knowledge of higher layer protocols is required.

The patchset is based on net-next. Comments welcome.
[PATCH 4/5] tipc: add InfiniBand media type
Add InfiniBand media type based on the ethernet media type. The only real difference is that in case of InfiniBand, we need the entire 20 bytes of space reserved for media addresses, so the TIPC media type ID is not explicitly stored in the packet payload. Sample output of tipc-config: # tipc-config -v -addr -netid -nt=all -p -m -b -n -ls node address: 10.1.4 current network id: 4711 Type Lower Upper Port Identity Publication Scope 0 167776257 167776257 10.1.1:18555125771855512578 cluster 167776260 167776260 10.1.4:12164546571216454658 zone 1 1 1 10.1.4:12164792351216479236 node Ports: 1216479235: bound to {1,1} 1216454657: bound to {0,167776260} Media: eth ib Bearers: ib:ib0 Nodes known: 10.1.1: up Link broadcast-link Window:20 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:0 fragments:0/0 bundles:0/0 RX naks:0 defs:0 dups:0 TX naks:0 acks:0 dups:0 Congestion bearer:0 link:0 Send queue max:0 avg:0 Link 10.1.4:ib0-10.1.1:ib0 ACTIVE MTU:2044 Priority:10 Tolerance:1500 ms Window:50 packets RX packets:80 fragments:0/0 bundles:0/0 TX packets:40 fragments:0/0 bundles:0/0 TX profile sample:22 packets average:54 octets 0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0% RX states:410 probes:213 naks:0 defs:0 dups:0 TX states:410 probes:197 naks:0 acks:0 dups:0 Congestion bearer:0 link:0 Send queue max:1 avg:0 Signed-off-by: Patrick McHardy ka...@trash.net --- net/tipc/Kconfig| 7 + net/tipc/Makefile | 2 + net/tipc/bearer.c | 2 +- net/tipc/bearer.h | 9 ++ net/tipc/core.c | 14 +- net/tipc/ib_media.c | 387 6 files changed, 418 insertions(+), 3 deletions(-) create mode 100644 net/tipc/ib_media.c diff --git a/net/tipc/Kconfig b/net/tipc/Kconfig index 4f99600..900ee66 100644 --- a/net/tipc/Kconfig +++ b/net/tipc/Kconfig @@ -31,3 +31,10 @@ config TIPC_PORTS Setting this to a smaller value saves some memory, setting it to higher allows for more ports. 
+ +config TIPC_MEDIA_IB + bool InfiniBand media type support + depends on INFINIBAND_IPOIB + help + Saying Y here will enable support for running TIPC on + IP-over-InfiniBand devices. diff --git a/net/tipc/Makefile b/net/tipc/Makefile index 6cd55d6..4df8e02 100644 --- a/net/tipc/Makefile +++ b/net/tipc/Makefile @@ -9,3 +9,5 @@ tipc-y += addr.o bcast.o bearer.o config.o \ name_distr.o subscr.o name_table.o net.o \ netlink.o node.o node_subscr.o port.o ref.o \ socket.o log.o eth_media.o + +tipc-$(CONFIG_TIPC_MEDIA_IB) += ib_media.o diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 45d5398..cb29ef7 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -39,7 +39,7 @@ #include bearer.h #include discover.h -#define MAX_ADDR_STR 32 +#define MAX_ADDR_STR 60 static struct tipc_media *media_list[MAX_MEDIA]; static u32 media_count; diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 3b3fa26..be68105 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -56,6 +56,7 @@ * Identifiers of supported TIPC media types */ #define TIPC_MEDIA_TYPE_ETH1 +#define TIPC_MEDIA_TYPE_IB 2 /** * struct tipc_media_addr - destination address used by TIPC bearers @@ -174,6 +175,14 @@ int tipc_disable_bearer(const char *name); int tipc_eth_media_start(void); void tipc_eth_media_stop(void); +#ifdef CONFIG_TIPC_MEDIA_IB +int tipc_ib_media_start(void); +void tipc_ib_media_stop(void); +#else +int tipc_ib_media_start(void) { return 0; } +void tipc_ib_media_stop(void) { return; } +#endif + int tipc_media_set_priority(const char *name, u32 new_value); int tipc_media_set_window(const char *name, u32 new_value); void tipc_media_addr_printf(char *buf, int len, struct tipc_media_addr *a); diff --git a/net/tipc/core.c b/net/tipc/core.c index fc05cec..133aa4a 100644 --- a/net/tipc/core.c +++ b/net/tipc/core.c @@ -82,6 +82,7 @@ static void tipc_core_stop_net(void) { tipc_net_stop(); tipc_eth_media_stop(); + tipc_ib_media_stop(); } /** @@ -93,8 +94,17 @@ int tipc_core_start_net(unsigned long 
addr) tipc_net_start(addr); res = tipc_eth_media_start(); - if (res) - tipc_core_stop_net(); + if (res 0) + goto err1; + res = tipc_ib_media_start(); + if (res 0) + goto err2; + return res; + +err2: + tipc_eth_media_stop(); +err1: + tipc_core_stop_net(); return res; } diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c new file mode 100644 index 000..2a2864c --- /dev/null +++ b/net/tipc/ib_media.c @@ -0,0 +1,387 @@ +/* + * net/tipc/ib_media.c: Infiniband bearer support for TIPC + * + * Copyright (c) 2013 Patrick McHardy ka...@trash.net + * + * Based on eth_media.c, which carries the
[PATCH 1/5] tipc: remove unused str2addr media callback
Signed-off-by: Patrick McHardy <ka...@trash.net>
---
 net/tipc/bearer.h    |  2 --
 net/tipc/eth_media.c | 20 --------------------
 2 files changed, 22 deletions(-)

diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 39f1192..cc2d74e 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -77,7 +77,6 @@ struct tipc_bearer;
  * @enable_bearer: routine which enables a bearer
  * @disable_bearer: routine which disables a bearer
  * @addr2str: routine which converts media address to string
- * @str2addr: routine which converts media address from string
  * @addr2msg: routine which converts media address to protocol message area
  * @msg2addr: routine which converts media address from protocol message area
  * @bcast_addr: media address used in broadcasting
@@ -94,7 +93,6 @@ struct tipc_media {
 	int (*enable_bearer)(struct tipc_bearer *b_ptr);
 	void (*disable_bearer)(struct tipc_bearer *b_ptr);
 	int (*addr2str)(struct tipc_media_addr *a, char *str_buf, int str_size);
-	int (*str2addr)(struct tipc_media_addr *a, char *str_buf);
 	int (*addr2msg)(struct tipc_media_addr *a, char *msg_area);
 	int (*msg2addr)(struct tipc_media_addr *a, char *msg_area);
 	struct tipc_media_addr bcast_addr;
diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 2132c1e..1bdc6df 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -302,25 +302,6 @@ static int eth_addr2str(struct tipc_media_addr *a, char *str_buf, int str_size)
 }
 
 /**
- * eth_str2addr - convert string to Ethernet address
- */
-static int eth_str2addr(struct tipc_media_addr *a, char *str_buf)
-{
-	char mac[ETH_ALEN];
-	int r;
-
-	r = sscanf(str_buf, "%02x:%02x:%02x:%02x:%02x:%02x",
-		   (u32 *)&mac[0], (u32 *)&mac[1], (u32 *)&mac[2],
-		   (u32 *)&mac[3], (u32 *)&mac[4], (u32 *)&mac[5]);
-
-	if (r != ETH_ALEN)
-		return 1;
-
-	eth_media_addr_set(a, mac);
-	return 0;
-}
-
-/**
  * eth_str2addr - convert Ethernet address format to message header format
  */
 static int eth_addr2msg(struct tipc_media_addr *a, char *msg_area)
@@ -351,7 +332,6 @@ static struct tipc_media eth_media_info = {
 	.enable_bearer = enable_bearer,
 	.disable_bearer = disable_bearer,
 	.addr2str = eth_addr2str,
-	.str2addr = eth_str2addr,
 	.addr2msg = eth_addr2msg,
 	.msg2addr = eth_msg2addr,
 	.bcast_addr = { { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
-- 
1.8.1.4
New patches
I'm about to send some patches for libibverbs and Roland's infiniband kernel git tree. The patches fit into two general categories:

1. Add enums for Cisco's Ethernet Virtual NIC (it's not an RNIC and therefore doesn't fit the RNIC/IWARP enums). Also add enums for 1500 and 9000 MTUs.

2. Minor modernization of the GNU Autotools usage in libibverbs.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[PATCH 3/4] Use autoreconf in autogen.sh
The old sequence of Autotools commands listed in autogen.sh is no longer correct. Instead, just use the single autoreconf command, which will invoke all the Right Autotools commands in the correct order.
---
 autogen.sh | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index fd47839..6c9233e 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
-aclocal -I config
-libtoolize --force --copy
-autoheader
-automake --foreign --add-missing --copy
-autoconf
+autoreconf -ifv -I config
-- 
1.8.1.1
[PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.
Per off-list conversation with Roland, add some new enums for the Cisco Ethernet Virtual NIC (it's not an RNIC/iWARP device, so it doesn't fit in the same category as RDMA_NODE_RNIC / RDMA_TRANSPORT_IWARP).

USNIC = Userspace NIC.
---
 examples/devinfo.c         | 1 +
 include/infiniband/verbs.h | 6 ++++--
 src/enum_strs.c            | 5 +++--
 src/init.c                 | 5 ++++-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 7dc0463..98a6b4b 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -72,6 +72,7 @@ static const char *transport_str(enum ibv_transport_type transport)
 	switch (transport) {
 	case IBV_TRANSPORT_IB:    return "InfiniBand";
 	case IBV_TRANSPORT_IWARP: return "iWARP";
+	case IBV_TRANSPORT_USNIC: return "USNIC";
 	default:		  return "invalid transport";
 	}
 }
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6acfc81..6a6944c 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -68,13 +68,15 @@ enum ibv_node_type {
 	IBV_NODE_CA 		= 1,
 	IBV_NODE_SWITCH,
 	IBV_NODE_ROUTER,
-	IBV_NODE_RNIC
+	IBV_NODE_RNIC,
+	IBV_NODE_USNIC
 };
 
 enum ibv_transport_type {
 	IBV_TRANSPORT_UNKNOWN	= -1,
 	IBV_TRANSPORT_IB	= 0,
-	IBV_TRANSPORT_IWARP
+	IBV_TRANSPORT_IWARP,
+	IBV_TRANSPORT_USNIC
 };
 
 enum ibv_device_cap_flags {
diff --git a/src/enum_strs.c b/src/enum_strs.c
index 54d71a6..0d68c75 100644
--- a/src/enum_strs.c
+++ b/src/enum_strs.c
@@ -38,10 +38,11 @@ const char *ibv_node_type_str(enum ibv_node_type node_type)
 		[IBV_NODE_CA]		= "InfiniBand channel adapter",
 		[IBV_NODE_SWITCH]	= "InfiniBand switch",
 		[IBV_NODE_ROUTER]	= "InfiniBand router",
-		[IBV_NODE_RNIC]		= "iWARP NIC"
+		[IBV_NODE_RNIC]		= "iWARP NIC",
+		[IBV_NODE_USNIC]	= "Ethernet USNIC"
 	};
 
-	if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC)
+	if (node_type < IBV_NODE_CA || node_type > IBV_NODE_USNIC)
 		return "unknown";
 
 	return node_type_str[node_type];
diff --git a/src/init.c b/src/init.c
index 8d6786e..e4ef001 100644
--- a/src/init.c
+++ b/src/init.c
@@ -346,7 +346,7 @@ static struct ibv_device *try_driver(struct ibv_driver *driver,
 		dev->node_type = IBV_NODE_UNKNOWN;
 	} else {
 		dev->node_type = strtol(value, NULL, 10);
-		if (dev->node_type < IBV_NODE_CA || dev->node_type > IBV_NODE_RNIC)
+		if (dev->node_type < IBV_NODE_CA || dev->node_type > IBV_NODE_USNIC)
 			dev->node_type = IBV_NODE_UNKNOWN;
 	}
 
@@ -359,6 +359,9 @@ static struct ibv_device *try_driver(struct ibv_driver *driver,
 	case IBV_NODE_RNIC:
 		dev->transport_type = IBV_TRANSPORT_IWARP;
 		break;
+	case IBV_NODE_USNIC:
+		dev->transport_type = IBV_TRANSPORT_USNIC;
+		break;
 	default:
 		dev->transport_type = IBV_TRANSPORT_UNKNOWN;
 		break;
-- 
1.8.1.1
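The range check and node-type-to-transport mapping touched by this patch can be sketched in isolation. This is a hedged illustration, not the libibverbs source: the `SK_` enums, `parse_node_type()` and `transport_for()` are hypothetical names, the `-1` "unknown" node value is an assumption, and mapping CA/switch/router to the InfiniBand transport is assumed from context rather than shown in the hunks above.

```c
#include <stdlib.h>

/* Hypothetical mirrors of the enums after the patch is applied. */
enum sk_node_type {
    SK_NODE_UNKNOWN = -1,  /* assumed sentinel value */
    SK_NODE_CA = 1,
    SK_NODE_SWITCH,
    SK_NODE_ROUTER,
    SK_NODE_RNIC,
    SK_NODE_USNIC
};

enum sk_transport {
    SK_TRANSPORT_UNKNOWN = -1,
    SK_TRANSPORT_IB = 0,
    SK_TRANSPORT_IWARP,
    SK_TRANSPORT_USNIC
};

/* Parse a decimal node type string, rejecting values outside the enum. */
static enum sk_node_type parse_node_type(const char *value)
{
    long n = strtol(value, NULL, 10);

    if (n < SK_NODE_CA || n > SK_NODE_USNIC)
        return SK_NODE_UNKNOWN;
    return (enum sk_node_type)n;
}

/* Map a node type to its transport; unknown node types stay unknown. */
static enum sk_transport transport_for(enum sk_node_type type)
{
    switch (type) {
    case SK_NODE_CA:
    case SK_NODE_SWITCH:
    case SK_NODE_ROUTER:
        return SK_TRANSPORT_IB;
    case SK_NODE_RNIC:
        return SK_TRANSPORT_IWARP;
    case SK_NODE_USNIC:
        return SK_TRANSPORT_USNIC;
    default:
        return SK_TRANSPORT_UNKNOWN;
    }
}
```

The point the sketch tries to make visible is that the upper bound of the range check has to move together with the last enumerator, which is why the patch touches both `enum_strs.c` and `init.c`.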
[PATCH 2/4] Add IBV_MTU_1500|9000 enums.
Allow specification of common Ethernet MTUs.
---
 examples/devinfo.c         | 2 ++
 examples/pingpong.c        | 2 ++
 include/infiniband/verbs.h | 6 ++++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 98a6b4b..6700882 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -118,8 +118,10 @@ static const char *mtu_str(enum ibv_mtu max_mtu)
 	case IBV_MTU_256:  return "256";
 	case IBV_MTU_512:  return "512";
 	case IBV_MTU_1024: return "1024";
+	case IBV_MTU_1500: return "1500";
 	case IBV_MTU_2048: return "2048";
 	case IBV_MTU_4096: return "4096";
+	case IBV_MTU_9000: return "9000";
 	default:           return "invalid MTU";
 	}
 }
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d7443a8 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -42,8 +42,10 @@ enum ibv_mtu pp_mtu_to_enum(int mtu)
 	case 256:  return IBV_MTU_256;
 	case 512:  return IBV_MTU_512;
 	case 1024: return IBV_MTU_1024;
+	case 1500: return IBV_MTU_1500;
 	case 2048: return IBV_MTU_2048;
 	case 4096: return IBV_MTU_4096;
+	case 9000: return IBV_MTU_9000;
 	default:   return -1;
 	}
 }
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6a6944c..1583c34 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -150,8 +150,10 @@ enum ibv_mtu {
 	IBV_MTU_256  = 1,
 	IBV_MTU_512  = 2,
 	IBV_MTU_1024 = 3,
-	IBV_MTU_2048 = 4,
-	IBV_MTU_4096 = 5
+	IBV_MTU_1500 = 4,
+	IBV_MTU_2048 = 5,
+	IBV_MTU_4096 = 6,
+	IBV_MTU_9000 = 7
 };
 
 enum ibv_port_state {
-- 
1.8.1.1
[PATCH 4/4] .gitignore updates and rename configure.in -> configure.ac
Added some entries to config/.gitignore for newer versions of the GNU Autotools. Also renamed configure.in - configure.ac to accomodate newer GNU Autotools (http://lists.gnu.org/archive/html/autotools-announce/2012-11/msg0.html announced the intent to drop support for configure.in in future versions of Autoconf). --- .gitignore | 6 + configure.ac | 74 configure.in | 74 3 files changed, 80 insertions(+), 74 deletions(-) create mode 100644 configure.ac delete mode 100644 configure.in diff --git a/.gitignore b/.gitignore index 78effef..d198dd1 100644 --- a/.gitignore +++ b/.gitignore @@ -6,6 +6,7 @@ autom4te.cache aclocal.m4 stamp-h.in config.h.in +config.h.in~ config.log config.h .libs @@ -15,3 +16,8 @@ Makefile config.status stamp-h1 libtool +config/libtool.m4 +config/ltoptions.m4 +config/ltsugar.m4 +config/ltversion.m4 +config/lt~obsolete.m4 diff --git a/configure.ac b/configure.ac new file mode 100644 index 000..efdc5ac --- /dev/null +++ b/configure.ac @@ -0,0 +1,74 @@ +dnl Process this file with autoconf to produce a configure script. + +AC_PREREQ(2.57) +AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org) +AC_CONFIG_SRCDIR([src/ibverbs.h]) +AC_CONFIG_AUX_DIR(config) +AC_CONFIG_MACRO_DIR(config) +AC_CONFIG_HEADER(config.h) +AM_INIT_AUTOMAKE([foreign]) +m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])]) + +dnl Checks for programs +AC_PROG_CC +AC_GNU_SOURCE +AC_PROG_LN_S +AC_PROG_LIBTOOL + +LT_INIT + +AC_ARG_WITH([valgrind], +AC_HELP_STRING([--with-valgrind], +[Enable Valgrind annotations (small runtime overhead, default NO)])) +if test x$with_valgrind = x || test x$with_valgrind = xno; then +want_valgrind=no +AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.]) +else +want_valgrind=yes +if test -d $with_valgrind; then +CPPFLAGS=$CPPFLAGS -I$with_valgrind/include +fi +fi + +dnl Checks for libraries +AC_CHECK_LIB(dl, dlsym, [], +AC_MSG_ERROR([dlsym() not found. 
libibverbs requires libdl.])) +AC_CHECK_LIB(pthread, pthread_mutex_init, [], +AC_MSG_ERROR([pthread_mutex_init() not found. libibverbs requires libpthread.])) + +dnl Checks for header files. +AC_HEADER_STDC +AC_CHECK_HEADER(valgrind/memcheck.h, +[AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1, +[Define to 1 if you have the valgrind/memcheck.h header file.])], +[if test $want_valgrind = yes; then +AC_MSG_ERROR([Valgrind memcheck support requested, but valgrind/memcheck.h not found.]) +fi]) + +dnl Checks for typedefs, structures, and compiler characteristics. +AC_C_CONST + +AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, +[if test -n `$LD --help /dev/null 2/dev/null | grep version-script`; then + ac_cv_version_script=yes +else + ac_cv_version_script=no +fi]) + +if test $ac_cv_version_script = yes; then + LIBIBVERBS_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/libibverbs.map' +else +LIBIBVERBS_VERSION_SCRIPT= +fi +AC_SUBST(LIBIBVERBS_VERSION_SCRIPT) + +AC_CACHE_CHECK(for .symver assembler support, ac_cv_asm_symver_support, +[AC_TRY_COMPILE(, [asm(symbol:\n.symver symbol, api@ABI\n);], +ac_cv_asm_symver_support=yes, +ac_cv_asm_symver_support=no)]) +if test $ac_cv_asm_symver_support = yes; then +AC_DEFINE([HAVE_SYMVER_SUPPORT], 1, [assembler has .symver support]) +fi + +AC_CONFIG_FILES([Makefile libibverbs.spec]) +AC_OUTPUT diff --git a/configure.in b/configure.in deleted file mode 100644 index efdc5ac..000 --- a/configure.in +++ /dev/null @@ -1,74 +0,0 @@ -dnl Process this file with autoconf to produce a configure script. 
- -AC_PREREQ(2.57) -AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org) -AC_CONFIG_SRCDIR([src/ibverbs.h]) -AC_CONFIG_AUX_DIR(config) -AC_CONFIG_MACRO_DIR(config) -AC_CONFIG_HEADER(config.h) -AM_INIT_AUTOMAKE([foreign]) -m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])]) - -dnl Checks for programs -AC_PROG_CC -AC_GNU_SOURCE -AC_PROG_LN_S -AC_PROG_LIBTOOL - -LT_INIT - -AC_ARG_WITH([valgrind], -AC_HELP_STRING([--with-valgrind], -[Enable Valgrind annotations (small runtime overhead, default NO)])) -if test x$with_valgrind = x || test x$with_valgrind = xno; then -want_valgrind=no -AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.]) -else -want_valgrind=yes -if test -d $with_valgrind; then -CPPFLAGS=$CPPFLAGS -I$with_valgrind/include -fi -fi - -dnl Checks for libraries -AC_CHECK_LIB(dl, dlsym, [], -AC_MSG_ERROR([dlsym() not found. libibverbs requires libdl.])) -AC_CHECK_LIB(pthread, pthread_mutex_init, [], -AC_MSG_ERROR([pthread_mutex_init() not found. libibverbs requires libpthread.])) - -dnl Checks for header files. -AC_HEADER_STDC -AC_CHECK_HEADER(valgrind/memcheck.h, -[AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1, -
[PATCH 2/2] Add IB_MTU_1500|9000 enums.
Allow specification of common Ethernet MTUs.
---
 include/rdma/ib_addr.h  | 6 +++++-
 include/rdma/ib_verbs.h | 8 ++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 9996539..1f6fbbc 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -200,10 +200,14 @@ static inline enum ib_mtu iboe_get_mtu(int mtu)
 	 */
 	mtu = mtu - IB_GRH_BYTES - IB_BTH_BYTES - 28;
 
-	if (mtu >= ib_mtu_enum_to_int(IB_MTU_4096))
+	if (mtu >= ib_mtu_enum_to_int(IB_MTU_9000))
+		return IB_MTU_9000;
+	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_4096))
 		return IB_MTU_4096;
 	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_2048))
 		return IB_MTU_2048;
+	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_1500))
+		return IB_MTU_1500;
 	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_1024))
 		return IB_MTU_1024;
 	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_512))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8a66758..4670f6f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -174,8 +174,10 @@ enum ib_mtu {
 	IB_MTU_256  = 1,
 	IB_MTU_512  = 2,
 	IB_MTU_1024 = 3,
-	IB_MTU_2048 = 4,
-	IB_MTU_4096 = 5
+	IB_MTU_1500 = 4,
+	IB_MTU_2048 = 5,
+	IB_MTU_4096 = 6,
+	IB_MTU_9000 = 7
 };
 
 static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
@@ -184,8 +186,10 @@ static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
 	case IB_MTU_256:  return  256;
 	case IB_MTU_512:  return  512;
 	case IB_MTU_1024: return 1024;
+	case IB_MTU_1500: return 1500;
 	case IB_MTU_2048: return 2048;
 	case IB_MTU_4096: return 4096;
+	case IB_MTU_9000: return 9000;
 	default: 	  return -1;
 	}
 }
-- 
1.8.1.1
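The "largest enum that fits" selection that this patch extends can be sketched as a table walk. The names below (`sk_mtu`, `sk_mtu_to_int()`, `sk_pick_mtu()`) are hypothetical, the enum values simply mirror the numbering proposed in the patch, and the return value 0 for "nothing fits" is an assumption of the sketch. The sketch also assumes the caller has already subtracted any header overhead from the payload size.

```c
#include <stddef.h>

/* Hypothetical mirror of the MTU enum with the values proposed in the patch. */
enum sk_mtu {
    SK_MTU_256  = 1,
    SK_MTU_512  = 2,
    SK_MTU_1024 = 3,
    SK_MTU_1500 = 4,
    SK_MTU_2048 = 5,
    SK_MTU_4096 = 6,
    SK_MTU_9000 = 7
};

/* Convert an enum value to its size in bytes, or -1 if unrecognized. */
static int sk_mtu_to_int(int mtu)
{
    switch (mtu) {
    case SK_MTU_256:  return 256;
    case SK_MTU_512:  return 512;
    case SK_MTU_1024: return 1024;
    case SK_MTU_1500: return 1500;
    case SK_MTU_2048: return 2048;
    case SK_MTU_4096: return 4096;
    case SK_MTU_9000: return 9000;
    default:          return -1;
    }
}

/*
 * Return the largest enum whose byte size fits in payload_bytes,
 * or 0 (an assumed "no fit" marker) if even the smallest does not fit.
 */
static int sk_pick_mtu(int payload_bytes)
{
    static const int largest_first[] = {
        SK_MTU_9000, SK_MTU_4096, SK_MTU_2048, SK_MTU_1500,
        SK_MTU_1024, SK_MTU_512, SK_MTU_256
    };
    size_t i;

    for (i = 0; i < sizeof(largest_first) / sizeof(largest_first[0]); i++) {
        if (payload_bytes >= sk_mtu_to_int(largest_first[i]))
            return largest_first[i];
    }
    return 0;
}
```

A table keeps the ordering in one place, whereas the if/else chain in the patch needs a new branch for each added size; both give the same result as long as entries are tested from largest to smallest.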
[PATCH 1/2] Add RDMA_*_USNIC enums for the Cisco Ethernet Virtual NIC.
Per off-list conversation with Roland, add some new enums for the Cisco Ethernet Virtual NIC (it's not an RNIC/iWARP device, so it doesn't fit in the same category as RDMA_NODE_RNIC / RDMA_TRANSPORT_IWARP).

USNIC = Userspace NIC.
---
 drivers/infiniband/core/verbs.c | 3 +++
 include/rdma/ib_verbs.h         | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a8fdd33..2a35518 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -114,6 +114,8 @@ rdma_node_get_transport(enum rdma_node_type node_type)
 		return RDMA_TRANSPORT_IB;
 	case RDMA_NODE_RNIC:
 		return RDMA_TRANSPORT_IWARP;
+	case RDMA_NODE_USNIC:
+		return RDMA_TRANSPORT_USNIC;
 	default:
 		BUG();
 		return 0;
@@ -130,6 +132,7 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
 	case RDMA_TRANSPORT_IB:
 		return IB_LINK_LAYER_INFINIBAND;
 	case RDMA_TRANSPORT_IWARP:
+	case RDMA_TRANSPORT_USNIC:
 		return IB_LINK_LAYER_ETHERNET;
 	default:
 		return IB_LINK_LAYER_UNSPECIFIED;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..8a66758 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -67,12 +67,14 @@ enum rdma_node_type {
 	RDMA_NODE_IB_CA 	= 1,
 	RDMA_NODE_IB_SWITCH,
 	RDMA_NODE_IB_ROUTER,
-	RDMA_NODE_RNIC
+	RDMA_NODE_RNIC,
+	RDMA_NODE_USNIC
 };
 
 enum rdma_transport_type {
 	RDMA_TRANSPORT_IB,
-	RDMA_TRANSPORT_IWARP
+	RDMA_TRANSPORT_IWARP,
+	RDMA_TRANSPORT_USNIC
 };
 
 enum rdma_transport_type
-- 
1.8.1.1
RE: [PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.
> Per off-list conversation with Roland, add some new enums for the Cisco Ethernet Virtual NIC (it's not an RNIC/iWARP device, so it doesn't fit in the same category as RDMA_NODE_RNIC / RDMA_TRANSPORT_IWARP).
>
> USNIC = Userspace NIC.

Can we get a better patch description? Maybe mention something about the NIC? Does it support all verbs? Is it for kernel users or just user space? Does this simply export a raw ethernet interface?

> ---
>  examples/devinfo.c         | 1 +
>  include/infiniband/verbs.h | 6 ++++--
>  src/enum_strs.c            | 5 +++--
>  src/init.c                 | 5 ++++-
>  4 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/examples/devinfo.c b/examples/devinfo.c
> index 7dc0463..98a6b4b 100644
> --- a/examples/devinfo.c
> +++ b/examples/devinfo.c
> @@ -72,6 +72,7 @@ static const char *transport_str(enum ibv_transport_type transport)
>  	switch (transport) {
>  	case IBV_TRANSPORT_IB:		return "InfiniBand";
>  	case IBV_TRANSPORT_IWARP:	return "iWARP";
> +	case IBV_TRANSPORT_USNIC:	return "USNIC";
>  	default:			return "invalid transport";
>  	}
>  }
> diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> index 6acfc81..6a6944c 100644
> --- a/include/infiniband/verbs.h
> +++ b/include/infiniband/verbs.h
> @@ -68,13 +68,15 @@ enum ibv_node_type {
>  	IBV_NODE_CA 		= 1,
>  	IBV_NODE_SWITCH,
>  	IBV_NODE_ROUTER,
> -	IBV_NODE_RNIC
> +	IBV_NODE_RNIC,
> +	IBV_NODE_USNIC
>  };
>
>  enum ibv_transport_type {
>  	IBV_TRANSPORT_UNKNOWN	= -1,
>  	IBV_TRANSPORT_IB	= 0,
> -	IBV_TRANSPORT_IWARP
> +	IBV_TRANSPORT_IWARP,
> +	IBV_TRANSPORT_USNIC
>  };
>
>  enum ibv_device_cap_flags {
> diff --git a/src/enum_strs.c b/src/enum_strs.c
> index 54d71a6..0d68c75 100644
> --- a/src/enum_strs.c
> +++ b/src/enum_strs.c
> @@ -38,10 +38,11 @@ const char *ibv_node_type_str(enum ibv_node_type node_type)
>  		[IBV_NODE_CA]		= "InfiniBand channel adapter",
>  		[IBV_NODE_SWITCH]	= "InfiniBand switch",
>  		[IBV_NODE_ROUTER]	= "InfiniBand router",
> -		[IBV_NODE_RNIC]		= "iWARP NIC"
> +		[IBV_NODE_RNIC]		= "iWARP NIC",
> +		[IBV_NODE_USNIC]	= "Ethernet USNIC"
>  	};
>
> -	if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC)
> +	if (node_type < IBV_NODE_CA || node_type > IBV_NODE_USNIC)
>  		return "unknown";
>
>  	return node_type_str[node_type];
> diff --git a/src/init.c b/src/init.c
> index 8d6786e..e4ef001 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -346,7 +346,7 @@ static struct ibv_device *try_driver(struct ibv_driver *driver,
>  		dev->node_type = IBV_NODE_UNKNOWN;
>  	} else {
>  		dev->node_type = strtol(value, NULL, 10);
> -		if (dev->node_type < IBV_NODE_CA || dev->node_type > IBV_NODE_RNIC)
> +		if (dev->node_type < IBV_NODE_CA || dev->node_type > IBV_NODE_USNIC)
>  			dev->node_type = IBV_NODE_UNKNOWN;
>  	}
>
> @@ -359,6 +359,9 @@ static struct ibv_device *try_driver(struct ibv_driver *driver,
>  	case IBV_NODE_RNIC:
>  		dev->transport_type = IBV_TRANSPORT_IWARP;
>  		break;
> +	case IBV_NODE_USNIC:
> +		dev->transport_type = IBV_TRANSPORT_USNIC;
> +		break;
>  	default:
>  		dev->transport_type = IBV_TRANSPORT_UNKNOWN;
>  		break;
> --
> 1.8.1.1
Re: [PATCH 4/5] tipc: add InfiniBand media type
On Wed, Apr 03, 2013 at 04:41:40PM +0200, Erik Hugne wrote:
> On Wed, Apr 03, 2013 at 02:43:29PM +0200, Patrick McHardy wrote:
> > diff --git a/net/tipc/Kconfig b/net/tipc/Kconfig
> > index 4f99600..900ee66 100644
> > --- a/net/tipc/Kconfig
> > +++ b/net/tipc/Kconfig
> > @@ -31,3 +31,10 @@ config TIPC_PORTS
> >  	  Setting this to a smaller value saves some memory,
> >  	  setting it to higher allows for more ports.
> > +
> > +config TIPC_MEDIA_IB
> > +	bool "InfiniBand media type support"
> > +	depends on INFINIBAND_IPOIB
> > +	help
> > +	  Saying Y here will enable support for running TIPC on
> > +	  IP-over-InfiniBand devices.
> > diff --git a/net/tipc/Makefile b/net/tipc/Makefile
> > index 6cd55d6..4df8e02 100644
> > --- a/net/tipc/Makefile
> > +++ b/net/tipc/Makefile
> > @@ -9,3 +9,5 @@ tipc-y += addr.o bcast.o bearer.o config.o \
> >  	   name_distr.o subscr.o name_table.o net.o \
> >  	   netlink.o node.o node_subscr.o port.o ref.o \
> >  	   socket.o log.o eth_media.o
> > +
> > +tipc-$(CONFIG_TIPC_MEDIA_IB) += ib_media.o
>
> The TIPC_MEDIA_IB option shows up directly under networking options, instead of under TIPC. I think "depends on TIPC" is missing?

Oops, I guess I messed that up during forward porting. I'll fix it up for the next submission, thanks.
Re: [PATCH 4/5] tipc: add InfiniBand media type
On Wed, Apr 03, 2013 at 02:43:29PM +0200, Patrick McHardy wrote:
> diff --git a/net/tipc/Kconfig b/net/tipc/Kconfig
> index 4f99600..900ee66 100644
> --- a/net/tipc/Kconfig
> +++ b/net/tipc/Kconfig
> @@ -31,3 +31,10 @@ config TIPC_PORTS
>  	  Setting this to a smaller value saves some memory,
>  	  setting it to higher allows for more ports.
> +
> +config TIPC_MEDIA_IB
> +	bool "InfiniBand media type support"
> +	depends on INFINIBAND_IPOIB
> +	help
> +	  Saying Y here will enable support for running TIPC on
> +	  IP-over-InfiniBand devices.
> diff --git a/net/tipc/Makefile b/net/tipc/Makefile
> index 6cd55d6..4df8e02 100644
> --- a/net/tipc/Makefile
> +++ b/net/tipc/Makefile
> @@ -9,3 +9,5 @@ tipc-y += addr.o bcast.o bearer.o config.o \
>  	   name_distr.o subscr.o name_table.o net.o \
>  	   netlink.o node.o node_subscr.o port.o ref.o \
>  	   socket.o log.o eth_media.o
> +
> +tipc-$(CONFIG_TIPC_MEDIA_IB) += ib_media.o

The TIPC_MEDIA_IB option shows up directly under networking options, instead of under TIPC. I think "depends on TIPC" is missing?

//E
Re: [PATCH 5/5] IPoIB: add support for TIPC protocol
On 03/04/2013 15:43, Patrick McHardy wrote:
> [...] all TIPC needs to know about is ARP and RARP since it wants to always perform path find, even if a path is already known. [...]

Not sure I follow this part... did you mean "all IPoIB needs to know about is ARP or RARP"? That makes sense indeed, since for ARP/RARP we want to call unicast_arp_send, which does path_find and also handles the case where the path isn't valid.

Or.
Re: [PATCH 5/5] IPoIB: add support for TIPC protocol
On Wed, Apr 03, 2013 at 06:31:49PM +0300, Or Gerlitz wrote:
> On 03/04/2013 15:43, Patrick McHardy wrote:
> > [...] all TIPC needs to know about is ARP and RARP since it wants to always perform path find, even if a path is already known. [...]
>
> Not sure to follow this part... did you mean all IPoIB needs to know about is ARP or RARP, this makes sense indeed, since for arp/rarp we want to call unicast_arp_send which does path_find and looks also for the case the path isn't valid

What I meant is that it doesn't require any knowledge about IPv4/IPv6 or other higher layer protocols anymore. At least almost none.

We have protocol knowledge in ipoib_start_xmit(). For broadcast packets, it drops unknown protocols. For unicast packets, it handles ARP/RARP separately because of the path find differences, IP/IPv6 are sent using the neigh, and all others are dropped. ipoib_cm also has knowledge about IPv4/IPv6 in order to send ICMP errors.

What we could do instead of adding TIPC to the broadcast-don't-drop list and to the send-using-neigh list in ipoib_start_xmit() is to only treat ARP/RARP specially and send every other protocol using the neigh or ipoib_mcast_send(). Right now the supported protocols are artificially limited without a technical reason.
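The two dispatch policies discussed in this message can be compared with a small standalone sketch. This is not the driver code: the enum, the function names and the exact set of protocols the current path treats as known for broadcast (IPv4, IPv6, ARP, RARP) are assumptions made for illustration; only the ethertype numbers are the standard assigned values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ethertype numbers, in host byte order. */
#define ETYPE_IPV4 0x0800
#define ETYPE_ARP  0x0806
#define ETYPE_RARP 0x8035
#define ETYPE_IPV6 0x86DD
#define ETYPE_TIPC 0x88CA

/* Hypothetical outcomes; the names are made up for this sketch. */
enum xmit_action {
	XMIT_DROP,       /* packet is discarded */
	XMIT_PATH_FIND,  /* ARP/RARP: always go through the path lookup */
	XMIT_NEIGH,      /* unicast sent via the neighbour entry */
	XMIT_MCAST       /* broadcast/multicast send */
};

static bool is_arp(uint16_t proto)
{
	return proto == ETYPE_ARP || proto == ETYPE_RARP;
}

static bool is_ip(uint16_t proto)
{
	return proto == ETYPE_IPV4 || proto == ETYPE_IPV6;
}

/* Behaviour as described in the message: a fixed list of known protocols
 * (the broadcast list used here is an assumption). */
enum xmit_action classify_current(uint16_t proto, bool is_mcast)
{
	if (is_mcast)
		return (is_ip(proto) || is_arp(proto)) ? XMIT_MCAST : XMIT_DROP;
	if (is_arp(proto))
		return XMIT_PATH_FIND;
	return is_ip(proto) ? XMIT_NEIGH : XMIT_DROP;
}

/* Proposed behaviour: only ARP/RARP is special, nothing is dropped by type. */
enum xmit_action classify_proposed(uint16_t proto, bool is_mcast)
{
	if (is_mcast)
		return XMIT_MCAST;
	return is_arp(proto) ? XMIT_PATH_FIND : XMIT_NEIGH;
}
```

Under these assumptions a TIPC frame is dropped by classify_current() in both the unicast and broadcast case, while classify_proposed() sends it without the transmit path needing to know the ethertype at all.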
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Tue, Apr 02, 2013 at 08:05:21PM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 02, 2013 at 09:57:38AM -0700, Roland Dreier wrote:
> > On Tue, Apr 2, 2013 at 8:51 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> > > At the moment registering an MR breaks COW. This breaks memory overcommit for users such as KVM: we have a lot of COW pages, e.g. instances of the zero page or pages shared using KSM.
> > >
> > > If the application does not care that adapter sees stale data (for example, it tracks writes, reregisters and resends), it can use a new IBV_ACCESS_GIFT flag to prevent registration from breaking COW.
> > >
> > > The semantics are similar to that of SPLICE_F_GIFT thus the name.
> > >
> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > >
> > > Roland, Michael is yet to test this but could you please confirm whether this looks acceptable to you?
> >
> > The patch itself is reasonable I guess, given the needs of this particular app.
> >
> > I'm not particularly happy with the name of the flag. The analogy with SPLICE_F_GIFT doesn't seem particularly strong and I'm not convinced even the splice flag name is very understandable. But in the RDMA case there's not really any sense in which we're gifting memory to the adapter -- we're just telling the library "please don't trigger copy-on-write" and it doesn't seem particularly easy for users to understand that from the flag name.
> >
> > - R.
>
> The point really is that any writes by application won't be seen until re-registration, right? OK, what's a better name? IBV_ACCESS_NON_COHERENT? Please tell me what is preferable and we'll go ahead with it.

Um. ping? We are at -rc5 and things need to fall into place if we are to have it in 3.10 ...

--
MST
Re: [PATCH 2/2] Add IB_MTU_1500|9000 enums.
On Wed, Apr 3, 2013 at 6:13 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 8a66758..4670f6f 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -174,8 +174,10 @@ enum ib_mtu {
>  	IB_MTU_256  = 1,
>  	IB_MTU_512  = 2,
>  	IB_MTU_1024 = 3,
> -	IB_MTU_2048 = 4,
> -	IB_MTU_4096 = 5
> +	IB_MTU_1500 = 4,
> +	IB_MTU_2048 = 5,
> +	IB_MTU_4096 = 6,
> +	IB_MTU_9000 = 7
>  };

I don't think we can blithely do this... I think the IB enum values are defined to match the values used in the IB spec (PathRecord etc).

Even if we change it so 1500 and 9000 are outside of the range used by the IB spec, I don't understand the motivation for this change. What does this buy us? How is iWARP working today without this change?

- R.
Re: [PATCH 2/2] Add IB_MTU_1500|9000 enums.
On 4/3/2013 11:52 AM, Roland Dreier wrote:
> On Wed, Apr 3, 2013 at 6:13 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 8a66758..4670f6f 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -174,8 +174,10 @@ enum ib_mtu {
> >  	IB_MTU_256  = 1,
> >  	IB_MTU_512  = 2,
> >  	IB_MTU_1024 = 3,
> > -	IB_MTU_2048 = 4,
> > -	IB_MTU_4096 = 5
> > +	IB_MTU_1500 = 4,
> > +	IB_MTU_2048 = 5,
> > +	IB_MTU_4096 = 6,
> > +	IB_MTU_9000 = 7
> >  };
>
> I don't think we can blithely do this... I think the IB enum values are defined to match the values used in the IB spec (PathRecord etc).
>
> Even if we change it so 1500 and 9000 are outside of the range used by the IB spec, I don't understand the motivation for this change. What does this buy us? How is iWARP working today without this change?

The IB_MTU stuff really doesn't apply to iwarp devices.
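The renumbering concern raised in this thread can be made concrete with a short sketch. It assumes, as the review comment suggests, that the enum values are also used as on-the-wire MTU codes, so a code keeps its number but would change its meaning. The function names are hypothetical; the two tables are taken from the enum before and after the quoted patch.

```c
/* Byte size for an MTU code using the numbering the enum has today
 * (1 = 256 ... 5 = 4096); returns -1 for anything else. */
int mtu_code_to_bytes_existing(int code)
{
	if (code < 1 || code > 5)
		return -1;
	return 256 << (code - 1);
}

/* Byte size for an MTU code using the numbering proposed in the patch. */
int mtu_code_to_bytes_proposed(int code)
{
	static const int bytes[] = { 256, 512, 1024, 1500, 2048, 4096, 9000 };

	if (code < 1 || code > 7)
		return -1;
	return bytes[code - 1];
}

/* Counts the codes in 1..5 whose meaning would change under the renumbering. */
int count_changed_codes(void)
{
	int code, changed = 0;

	for (code = 1; code <= 5; code++)
		if (mtu_code_to_bytes_existing(code) != mtu_code_to_bytes_proposed(code))
			changed++;
	return changed;
}
```

With these tables, code 4 would be read as 1500 instead of 2048 and code 5 as 2048 instead of 4096, which is the kind of mismatch a peer using the old numbering would not detect.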
Re: [PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.
On Wed, Apr 3, 2013 at 5:49 PM, Hefty, Sean <sean.he...@intel.com> wrote:
> > Per off-list conversation with Roland, add some new enums for the Cisco Ethernet Virtual NIC (it's not an RNIC/iWARP device, so it doesn't fit in the same category as RDMA_NODE_RNIC / RDMA_TRANSPORT_IWARP).
> >
> > USNIC = Userspace NIC.
>
> Can we get a better patch description? Maybe mention something about the NIC? Does it support all verbs? Is it for kernel users or just user space? Does this simply export a raw ethernet interface?

Jeff, I agree with Sean, there's not much point in reviewing/discussing these general/pre-step patches without seeing some actual device-specific kernel code (or user space code, if there is no kernel code). E.g. you can submit the two kernel pre-step patches as the first two pieces in a series that has the driver code.

Or.
Re: IPoIB - GRO forces memcpy inside __pskb_pull_tail
> > Hello,
> > ...
> > If I get it right round about 6% (7.38% * 84.56%) of the time the machine does a memcpy inside __pskb_pull_tail. The comments on this function reads
> >
> > ... it expands header moving its tail forward and copying necessary data from fragmented part.
> > ...
> > It is pretty complicated. Luckily, it is called only in exceptional cases
> >
> > That does not sound good at all. I repeated the test on a normal Intel gigabit network without jumbo frames and __pskb_pull_tail was not in the top consumer list.
> >
> > Does anyone have an idea if this is normal GRO behaviour for IPOIB. At the moment I have a full test environment and could implement and verify some kernel corrections if someone could give a helpful hint.
>
> As always, it would be good and helpful if you can re-run the test with the latest upstream kernel, e.g 3.9-rc, and anyway, I added Eric who might have some insight on the matter.
>
> Or.

Going through hard lessons to understand the SKBs, maybe I finally found the reason for the unnecessary memcpy commands. Even with the newest 3.9-rc5 kernel the problem persists.

IPoIB creates only fragmented SKBs without any single bit in the normal data part. Some debug messages during GRO handling showed

skb->len         = 1988 (total data)
skb->data_len    = 1988 (paged data)
skb_headlen(skb) = 0    (non paged data)

inet_gro_receive() requires the IP header inside the SKB. So it pulls missing data from fragments. This process requires extra memcpy operations.

It all comes from ipoib_ud_need_sg(), which determines if a receive block will fit into a single page. Whenever this function is called the one and only parameter is max_ib_mtu of the device. In my case with a ConnectX card this defaults to 4K no matter what MTU is really set. As a result IPoIB will always create a separate SKB fragment for the incoming data.

My old but nicely working switch only allows an MTU of 2044 bytes. So I assumed that I do not need to care about fragments and modified priv->max_ib_mtu to a hardcoded 3072. Pages are sufficiently large for this MTU.

A quick test afterwards, without claim of perfectionism, showed the expected effects.

1) no more additional memcpy operations
2) netperf throughput raised from ~ 5.3GBit to ~ 5.8GBit

I hope that I'm not totally wrong with this finding and my simple explanation is conclusive. Maybe someone with more knowledge about this all can assist me to get an official patch into the RDMA development tree?

Thanks in advance.

Markus
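The page-fit decision described in this message can be sketched in a few lines. This is an illustration under stated assumptions rather than the driver source: it assumes the check is "MTU plus the 40-byte GRH must fit in one page" and a 4 KiB page size, and all names below are made up for the sketch.

```c
/* Assumptions for this sketch: 4 KiB pages and a 40-byte GRH. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_GRH_BYTES 40u

/* Receive buffer size for a UD packet: payload up to the MTU plus the GRH. */
unsigned int ud_buf_size(unsigned int ib_mtu)
{
	return ib_mtu + SKETCH_GRH_BYTES;
}

/* Nonzero when the buffer does not fit in one page, so the payload
 * ends up in a page fragment instead of the linear area. */
int ud_need_sg(unsigned int ib_mtu)
{
	return ud_buf_size(ib_mtu) > SKETCH_PAGE_SIZE;
}
```

Under these assumptions a device maximum of 4096 needs 4136 bytes and therefore a fragment, while the hardcoded 3072 from the experiment needs 3112 bytes and stays within a single page, which matches the observation that the extra copy disappears.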
Re: [PATCH V3 for-next 0/5] IB/IPoIB: Add multi-queue TSS and RSS support
On Tue, Apr 2, 2013 at 12:50 AM, Or Gerlitz <or.gerl...@gmail.com> wrote:
> On Sun, Mar 24, 2013 at 2:44 PM, Or Gerlitz <ogerl...@mellanox.com> wrote:
> > On 19/03/2013 20:57, Hefty, Sean wrote:
> > > I have not had a chance to look at v3 yet.
> >
> > will love it if you do so... we've just posted V4 which is a respin of V3 over an ipoib change
>
> Hi Sean, we posted the TSS/RSS V0 patches in May 2012 and so far attempted to address all the feedback / questions you provided/had. Could you comment how you see things w.r.t. these patches? Specifically the QP groups concept, on which you had raised some concerns which we believe were addressed with V3.

Hi Sean,

Ping. You had concerns on the suggested concept, we want to know if we addressed them, can you comment?

Or.
Re: [PATCH RFC 0/5] tipc: add support for TIPC over InfiniBand
On 04/03/2013 08:43 AM, Patrick McHardy wrote:
> The following patchset adds support for running TIPC over InfiniBand.
>
> The patchset consists of three parts (+ a minor fix for the ethernet media type):
>
> - Preparation: removal of the unused str2addr callback and move of the bcast_addr from struct tipc_media to struct tipc_bearer. This is necessary because InfiniBand doesn't have a fixed broadcast address like ethernet, so it needs to be initialized with the device's broadcast address when the bearer is enabled
>
> - Introduction of a TIPC InfiniBand media type. A new media type is needed to deal with the different address sizes
>
> - Support for ETH_P_TIPC in IPoIB
>
> The last patch is something I'd like to discuss, I realize that this diverges from the IPoIB specification, however the alternative would be to implement something which would be pretty much identical to IPoIB with the only difference of handling a different ethertype in the xmit function.
>
> In fact I'd like to propose to remove all higher layer protocol knowledge from IPoIB except for ARP and RARP, which need special treatment. With the recent patch to manage neighbour entries in IPoIB itself, no further knowledge of higher layer protocols is required.
>
> The patchset is based on net-next. Comments welcome.

Happy to see this initiative being taken. It seems to me that you have grasped our intentions for how to add a new bearer, so I really don't have many comments, except the one already made by Erik. To me it looks good.

Regards
///jon
RE: [PATCH V3 for-next 0/5] IB/IPoIB: Add multi-queue TSS and RSS support
> Hi Sean,
>
> Ping. You had concerns on the suggested concept, we want to know if we addressed them, can you comment?

I'm in meetings this week until tomorrow. I'll try to take a look at the updated patches then or Friday.
Re: [PATCH V3 for-next 0/5] IB/IPoIB: Add multi-queue TSS and RSS support
On Wed, Apr 3, 2013 at 11:12 PM, Hefty, Sean <sean.he...@intel.com> wrote:
> > Hi Sean,
> >
> > Ping. You had concerns on the suggested concept, we want to know if we addressed them, can you comment?
>
> I'm in meetings this week until tomorrow. I'll try to take a look at the updated patches then or Friday.

OK, thanks, the 3.10 merge window is coming closer and I want to know where we are in that respect. Almost every Ethernet NIC you use has RSS, there's no reason for IPoIB not to support that too.

Or.
Re: [PATCH v1] infiniband-diags/saquery.c: switchinfo support added
Husam,

I had to fix a couple of line wraps in your mailer (see below). But even after that the patch does not apply.

13:10:37 > git am \[PATCH_v1\]__infiniband-diags_saquery.c__switchinfo_support_added.eml
Applying: infiniband-diags/saquery.c: switchinfo support added
error: patch failed: src/saquery.c:481
error: src/saquery.c: patch does not apply
Patch failed at 0001 infiniband-diags/saquery.c: switchinfo support added
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".

iweiny iqa-136 ~/dev/management/infiniband-diags
13:10:44 > patch -p 1 < \[PATCH_v1\]__infiniband-diags_saquery.c__switchinfo_support_added.eml
patching file src/saquery.c
Hunk #1 FAILED at 481.
Hunk #2 FAILED at 1150.
Hunk #3 FAILED at 1342.
3 out of 3 hunks FAILED -- saving rejects to file src/saquery.c.rej

Could you ensure this is based on the current master branch?

On Sun, 31 Mar 2013 16:29:17 +0300 Husam Kahalah <hkaha...@asaltech.com> wrote:
> Added support to filter SwitchInfoRecords by switch LID
>
> Signed-off-by: Husam kahalah <hkaha...@asaltech.com>
> ---
>  src/saquery.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 55 insertions(+), 0 deletions(-)
>
> diff --git a/src/saquery.c b/src/saquery.c
> index d31d77d..6390bcd 100644
> --- a/src/saquery.c
> +++ b/src/saquery.c
> @@ -481,6 +481,41 @@ static void dump_service_record(void *data)
>  	       cl_ntoh64(p_sr->service_data64[1]));
>  }
>
> +static void dump_switch_info_record(void *data)
> +{
> +	ib_switch_info_record_t *p_sir = data;
> +
> +	printf("SwitchInfoRecord dump:\n"
> +	       "\t\tRID\n"
> +	       "\t\tlid.%u\n"
> +	       "\t\tSwitchInfo dump:\n"
> +	       "\t\tlin_cap.0x%X\n"
> +	       "\t\trand_cap0x%X\n"
> +	       "\t\tmcast_cap...0x%X\n"
> +	       "\t\tlin_top.0x%X\n"
> +	       "\t\tdef_port%u\n"
> +	       "\t\tdef_mcast_pri_port..%u\n"
> +	       "\t\tdef_mcast_not_port..%u\n"
> +	       "\t\tlife_state..%u\n"
> +	       "\t\tlids_per_port...0x%X\n"
> +	       "\t\tenforce_cap.0x%X\n"
> +	       "\t\tflags...%u\n"
> +	       "\t\tmcast_top...0x%X\n",
> +	       cl_ntoh16(p_sir->lid),
> +	       cl_ntoh16(p_sir->switch_info.lin_cap),
> +	       cl_ntoh16(p_sir->switch_info.rand_cap),
> +	       cl_ntoh16(p_sir->switch_info.mcast_cap),
> +	       cl_ntoh16(p_sir->switch_info.lin_top),
> +	       p_sir->switch_info.def_port,
> +	       p_sir->switch_info.def_mcast_pri_port,
> +	       p_sir->switch_info.def_mcast_not_port,
> +	       p_sir->switch_info.life_state,
> +	       cl_ntoh16(p_sir->switch_info.lids_per_port),
> +	       cl_ntoh16(p_sir->switch_info.enforce_cap),
> +	       p_sir->switch_info.flags,
> +	       cl_ntoh16(p_sir->switch_info.mcast_top));
> +}
> +
>  static void dump_inform_info_record(void *data)
>  {
>  	char gid_str[INET6_ADDRSTRLEN];
> @@ -1150,6 +1185,24 @@ static int query_service_records(const struct query_cmd *q, struct sa_handle * h

Line wrap here failed to apply.

>  					dump_service_record);
>  }
>
> +static int query_switchinfo_records(const struct query_cmd *q,
> +				struct sa_handle * h, struct query_params *p,
> +				int argc, char *argv[])
> +{
> +	ib_switch_info_record_t swir;
> +	ib_net64_t comp_mask = 0;
> +	int lid = 0;
> +
> +	if (argc > 0)
> +		parse_lid_and_ports(h, argv[0], &lid, NULL, NULL);
> +
> +	memset(&swir, 0, sizeof(swir));
> +	CHECK_AND_SET_VAL(lid, 16, 0, swir.lid, SWIR, LID);
> +
> +	return get_and_dump_any_records(h, IB_SA_ATTR_SWITCHINFORECORD, 0, comp_mask,

Line wrap here failed to apply.

Thanks,
Ira

> +					&swir, sizeof(swir), dump_switch_info_record);
> +}
> +
>  static int query_inform_info_records(const struct query_cmd *q,
>  				    struct sa_handle * h, struct query_params *p,
>  				    int argc, char *argv[])
> @@ -1342,6 +1395,8 @@ static const struct query_cmd query_cmds[] = {
>  	 "[[mlid]/[position]/[block]]", query_mft_records},
>  	{"GUIDInfoRecord", "GIR", IB_SA_ATTR_GUIDINFORECORD,
>  	 "[[lid]/[block]]", query_guidinfo_records},
> +	{"SwitchInfoRecord", "SWIR", IB_SA_ATTR_SWITCHINFORECORD,
> +	 "[lid]", query_switchinfo_records},
>  	{0}
>  };
>
> --
> 1.7.1

--
Ira Weiny <ira.we...@intel.com>
Re: [RFC-v2 10/12] iser-target: Add logic for verbs
On Wed, 2013-04-03 at 10:04 +0300, Or Gerlitz wrote:
> On 02/04/2013 09:18, Or Gerlitz wrote:
> > On 23/03/2013 01:55, Nicholas A. Bellinger wrote:
> > > +++ b/drivers/infiniband/ulp/isert/isert_verbs.h
> > > @@ -0,0 +1,5 @@
> > > +extern void isert_connect_release(struct isert_conn *);
> > > +extern void isert_put_conn(struct isert_conn *);
> > > +extern int isert_cma_handler(struct rdma_cm_id *, struct rdma_cm_event *);
> > > +extern int isert_post_recv(struct isert_conn *, u32);
> > > +extern int isert_post_send(struct isert_conn *, struct iser_tx_desc *);
> >
> > why use extern here? maybe a left over from V1?
>
> Nic, are you picking this comment one and its sister comment asking to remove externs and use less header files?

So in yesterday's target-pending/iser-target-wip push, source/headers have been merged into a single ib_isert.[c,h], with the exception of the existing isert_proto.h definitions.

This will be included as a single commit for RFC-v3.

--nab
Re: [PATCH] libibmad: Add setting of errno on function failure
On Sat, 30 Mar 2013 09:53:18 -0700 Boris Chiu <boris.c...@oracle.com> wrote:
> From: Boris Chiu <boris.c...@oracle.com>
> To: Ira Weiny <ira.we...@intel.com>
>
> From d26b4c3cf435f2e89cb8c86c30043e7bfad7cec0 Mon Sep 17 00:00:00 2001
> From: Brendan Doyle <brendan.do...@oracle.com>
> Date: Tue, 12 Mar 2013 19:38:52 +
> Subject: [PATCH] libibmad: Add setting of errno on function failure
>
> Signed-off-by: Brendan Doyle <brendan.do...@oracle.com>

Thanks applied,
Ira
Re: [RFC-v2 10/12] iser-target: Add logic for verbs
Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> So in yesterday's target-pending/iser-target-wip push, source/headers have been merged into a single ib_isert.[c,h], with the exception of the existing isert_proto.h definitions.
>
> This will be included as a single commit for RFC-v3.

sounds good