RE: [PATCH librdmacm 3/8] autogen.sh: Use autoreconf in autogen.sh
Hi,

On 17.07.2013 06:22, Hefty, Sean wrote:
> Thanks - I pulled in these patches, but see below:

Thanks.

>> diff --git a/autogen.sh b/autogen.sh
>> index f433312..6c9233e 100755
>> --- a/autogen.sh
>> +++ b/autogen.sh
>> @@ -1,9 +1,4 @@
>>  #! /bin/sh
>>  set -x
>> -test -d ./config || mkdir ./config
>
> Without the above line, the build fails. I added it back in. If there's
> some other way of ensuring that this directory exists, please let me know.

Sorry for the inconvenience.

I've checked libibverbs: it has a .gitignore in ./config so that git keeps the otherwise empty directory. It's a different solution to the same problem; I can't say whether it's a better one.

Regards.

--
Yann Droneaud
OPTEYA

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH librdmacm 3/8] autogen.sh: Use autoreconf in autogen.sh
On 17/07/2013 07:22, Hefty, Sean wrote:
> Thanks - I pulled in these patches, but see below:

Hi Sean,

If you do this house cleanup, could you also address the build warnings below? I can see them when I build the rpm from the 1.0.17 tarball, but not when doing a plain make on the latest git, probably because the build through the spec uses some more build/warning flags.

Or.

configure: creating ./config.status
config.status: creating Makefile
config.status: creating librdmacm.spec
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands
+ make -j2
make all-am
make[1]: Entering directory `/usr/src/redhat/BUILD/librdmacm-1.0.17'
  CC src_librdmacm_la-cma.lo
  CC src_librdmacm_la-addrinfo.lo
src/addrinfo.c: In function 'ucma_convert_to_rai':
src/addrinfo.c:193: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/addrinfo.c:210: warning: dereferencing type-punned pointer will break strict-aliasing rules
  CC src_librdmacm_la-acm.lo
  CC src_librdmacm_la-rsocket.lo
src/rsocket.c: In function 'rs_modify_svcs':
src/rsocket.c:403: warning: ignoring return value of 'write', declared with attribute warn_unused_result
src/rsocket.c:404: warning: ignoring return value of 'read', declared with attribute warn_unused_result
src/rsocket.c: In function 'rs_configure':
src/rsocket.c:460: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c:465: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c:473: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c:478: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c:483: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c:491: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c:498: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result
src/rsocket.c: In function 'rs_svc_process_sock':
src/rsocket.c:3623: warning: ignoring return value of 'read', declared with attribute warn_unused_result
src/rsocket.c:3632: warning: ignoring return value of 'write', declared with attribute warn_unused_result
src/rsocket.c: In function 'rs_svc_run':
src/rsocket.c:3805: warning: ignoring return value of 'write', declared with attribute warn_unused_result
src/rsocket.c: In function 'ds_get_dest':
src/rsocket.c:1451: warning: 'qp' may be used uninitialized in this function
src/rsocket.c: In function 'rs_send_iomaps':
src/rsocket.c:2305: warning: 'ret' may be used uninitialized in this function
  CC src_librdmacm_la-indexer.lo
  CC src_librspreload_la-preload.lo
src/preload.c: In function 'dup2':
src/preload.c:1020: warning: value computed is not used
  CC src_librspreload_la-indexer.lo
  CC cmatose.o
  CC common.o
  CC rping.o
  CC udaddy.o
  CC mckey.o
  CC rdma_client.o
  CC rdma_server.o
  CC rdma_xclient.o
  CC rdma_xserver.o
  CC rstream.o
  CC rcopy.o
  CC riostream.o
  CC udpong.o
  CCLD src/librdmacm.la
  CCLD src/librspreload.la
  CCLD examples/ucmatose
  CCLD examples/rping
  CCLD examples/udaddy
  CCLD examples/mckey
  CCLD examples/rdma_client
  CCLD examples/rdma_server
  CCLD examples/rdma_xclient
  CCLD examples/rdma_xserver
  CCLD examples/rstream
  CCLD examples/rcopy
  CCLD examples/riostream
  CCLD examples/udpong
make[1]: Leaving directory `/usr/src/redhat/BUILD/librdmacm-1.0.17'
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.10929
[PATCH v2] RDMA/cma: silence GCC warning
Building cma.o triggers this GCC warning:

    drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’:
    drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here

This is a false positive, as port will always be initialized if we're at "found". But if we assign to id_priv->id.port_num directly, we can drop port. That will, obviously, silence GCC.

Signed-off-by: Paul Bolle pebo...@tiscali.nl
---
0) v2: assign to id_priv->id.port_num directly, instead of initializing port to 0, as discussed with Sean.

1) Still only compile tested.

 drivers/infiniband/core/cma.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f1c279f..84487a2 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -423,7 +423,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 	struct sockaddr_ib *addr;
 	union ib_gid gid, sgid, *dgid;
 	u16 pkey, index;
-	u8 port, p;
+	u8 p;
 	int i;

 	cma_dev = NULL;
@@ -443,7 +443,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 				if (!memcmp(&gid, dgid, sizeof(gid))) {
 					cma_dev = cur_dev;
 					sgid = gid;
-					port = p;
+					id_priv->id.port_num = p;
 					goto found;
 				}

@@ -451,7 +451,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 						 dgid->global.subnet_prefix)) {
 					cma_dev = cur_dev;
 					sgid = gid;
-					port = p;
+					id_priv->id.port_num = p;
 				}
 			}
 		}
@@ -462,7 +462,6 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)

 found:
 	cma_attach_to_dev(id_priv, cma_dev);
-	id_priv->id.port_num = port;
 	addr = (struct sockaddr_ib *) cma_src_addr(id_priv);
 	memcpy(&addr->sib_addr, &sgid, sizeof sgid);
 	cma_translate_ib(addr, &id_priv->id.route.addr.dev_addr);
--
1.8.1.4
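The idea behind the v2 patch can be illustrated outside the kernel. The following is only a minimal user-space sketch: the `demo_*` names and structures are hypothetical stand-ins, not the real `rdma_id_private` or GID cache API. It shows a search loop that writes the matching port straight into the destination structure, so no temporary variable is left for the compiler to flag as possibly uninitialized.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_GID_LEN    16	/* hypothetical: full GID length in bytes */
#define DEMO_PREFIX_LEN 8	/* hypothetical: subnet prefix length in bytes */

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct demo_id {
	unsigned char port_num;
};

struct demo_port {
	unsigned char num;
	unsigned char gid[DEMO_GID_LEN];
};

/*
 * Look for a port whose GID matches dgid exactly; failing that, fall back
 * to the first port that shares the subnet prefix. The port number is
 * stored directly in id->port_num at the point where the match is made.
 * Returns 0 if any match was found, -1 otherwise.
 */
int demo_resolve_port(struct demo_id *id, const struct demo_port *ports,
		      size_t nports, const unsigned char *dgid)
{
	int ret = -1;
	size_t i;

	for (i = 0; i < nports; i++) {
		/* Exact match: record the port and stop searching. */
		if (!memcmp(ports[i].gid, dgid, DEMO_GID_LEN)) {
			id->port_num = ports[i].num;
			return 0;
		}
		/* Prefix match: keep only the first such candidate. */
		if (ret && !memcmp(ports[i].gid, dgid, DEMO_PREFIX_LEN)) {
			id->port_num = ports[i].num;
			ret = 0;
		}
	}
	return ret;
}
```

When nothing matches, `id->port_num` is simply left untouched, which mirrors the patch: the field is only written on the paths that lead to the found label.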
Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
On 7/16/2013 6:11 PM, Bart Van Assche wrote:
> On 14/07/2013 3:43, Sagi Grimberg wrote:
>> Just wrote a small patch to allow srp_daemon spread connection across
>> HCA's completion vectors.
>
> Hello Sagi,
>
> How about the following approach:
> - Add support for reading the completion vector from srp_daemon.conf,
>   similar to how several other parameters are already read from that file.

Here we need to take into consideration that we are changing the functionality of srp_daemon.conf. Instead of simply allowing or disallowing targets with specific attributes, we would also be defining configuration attributes of the allowed targets.

It might be inconvenient for the user to explicitly write N target strings in srp_daemon.conf just to assign completion vectors. Perhaps srp_daemon.conf could contain a comma-separated list of reserved completion vectors for srp_daemon to spread CQs among. If this line doesn't exist, srp_daemon would spread the assignment across all of the HCA's completion vectors.

> - If the completion vector parameter has not been set in srp_daemon.conf,
>   let srp_daemon assign a completion vector such that IB interrupts for
>   different SRP hosts use different completion vectors.
>
> Bart.
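To make the proposal above concrete, here is a rough sketch of how a daemon might spread connections over an optional reserved list. Everything in it is an assumption for illustration: the `demo_cv_*` names, the fixed pool size, and the list syntax are hypothetical and are not taken from srp_daemon or its configuration format.

```c
#include <stdlib.h>

#define DEMO_MAX_VECTORS 64	/* hypothetical upper bound for this sketch */

/* Hypothetical pool of completion vectors handed out round-robin. */
struct demo_cv_pool {
	int vectors[DEMO_MAX_VECTORS];
	int count;
	int next;
};

/*
 * Build the pool from a comma-separated list such as "0,2,5". A NULL or
 * empty list means "use every vector from 0 to num_comp_vectors - 1".
 * Returns 0 on success, -1 on a malformed or out-of-range list.
 */
int demo_cv_pool_init(struct demo_cv_pool *pool, const char *list,
		      int num_comp_vectors)
{
	int i;

	pool->count = 0;
	pool->next = 0;
	if (num_comp_vectors <= 0)
		return -1;

	if (!list || !*list) {
		for (i = 0; i < num_comp_vectors && i < DEMO_MAX_VECTORS; i++)
			pool->vectors[pool->count++] = i;
		return 0;
	}

	while (*list) {
		char *end;
		long v = strtol(list, &end, 10);

		if (end == list || v < 0 || v >= num_comp_vectors ||
		    pool->count == DEMO_MAX_VECTORS)
			return -1;
		pool->vectors[pool->count++] = (int)v;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		list = end;
	}
	return 0;
}

/* Hand out the next vector, wrapping around; -1 if the pool is empty. */
int demo_cv_pool_next(struct demo_cv_pool *pool)
{
	int v;

	if (pool->count == 0)
		return -1;
	v = pool->vectors[pool->next];
	pool->next = (pool->next + 1) % pool->count;
	return v;
}
```

Round-robin is only one possible policy; it is used here because it is the simplest way to ensure that consecutive connections land on different vectors.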
[PATCH FIXES for-3.11 4/4] IB/ipoib: Fix pkey-change flow for Virtualization environments
From: Erez Shitrit ere...@mellanox.com

IPoIB's required behaviour w.r.t. the pkey used by the device is the following:

- For parent interfaces (e.g. ib0, ib1, etc.), which are created automatically as a result of hot-plug events from the IB core, the driver needs to take whatever pkey value it finds in index 0, and stick to that index.

- For child interfaces (e.g. ib0.8001, etc.) created by admin directive, the driver needs to use and stick to the value provided during its creation.

In an SR-IOV environment it's possible for the VF probe to take place before the cloud management software provisions the suitable pkey for the VF in the paravirtualized PKEY table index 0. When this is the case, the VF IB stack will find in index 0 an invalid pkey, which is all zeros.

Moreover, the cloud management can assign the pkey value at index 0 at any time of the guest life cycle.

The correct behavior for IPoIB to address these requirements for parent interfaces is to use PKEY_CHANGE event as trigger to optionally re-init the device pkey value and re-create all the relevant resources accordingly, if the value of the pkey in index 0 has changed (from invalid to valid or from valid value X to invalid value Y).

This patch enhances the heavy flushing code which is triggered by pkey change event, to behave correctly for parent devices. For child devices, the code remains the same, namely chases pkey value and not index.

Signed-off-by: Erez Shitrit ere...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        | 68 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 -
 2 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 2cfa76f..a2db524 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -932,12 +932,39 @@ int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 	return 0;
 }

+/*
+ * Takes whatever value which is in pkey index 0 and updates priv->pkey
+ * returns 0 if the pkey value was changed.
+ */
+static inline int update_parent_pkey_index(struct ipoib_dev_priv *priv)
+{
+	int result;
+	u16 prev_pkey;
+
+	prev_pkey = priv->pkey;
+	result = ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey);
+	if (result) {
+		ipoib_warn(priv, "ib_query_pkey port %d failed (ret = %d)\n",
+			   priv->port, result);
+		return result;
+	}
+
+	if (prev_pkey != priv->pkey) {
+		ipoib_dbg(priv, "pkey changed from 0x%x to 0x%x\n",
+			  prev_pkey, priv->pkey);
+		return 0;
+	}
+
+	return 1;
+}
+
 static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 				enum ipoib_flush_level level)
 {
 	struct ipoib_dev_priv *cpriv;
 	struct net_device *dev = priv->dev;
 	u16 new_index;
+	int result;

 	mutex_lock(&priv->vlan_mutex);

@@ -951,6 +978,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	mutex_unlock(&priv->vlan_mutex);

 	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) {
+		/* for non-child devices must check/update the pkey value here */
+		if (level == IPOIB_FLUSH_HEAVY &&
+		    !test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+			update_parent_pkey_index(priv);
 		ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
 		return;
 	}
@@ -961,21 +992,32 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	}

 	if (level == IPOIB_FLUSH_HEAVY) {
-		if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &new_index)) {
-			clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
-			ipoib_ib_dev_down(dev, 0);
-			ipoib_ib_dev_stop(dev, 0);
-			if (ipoib_pkey_dev_delay_open(dev))
+		/* child devices chase their origin pkey value, while non-child
+		 * (parent) devices should always takes what present in pkey index 0
+		 */
+		if (test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+			if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &new_index)) {
+				clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
+				ipoib_ib_dev_down(dev, 0);
+				ipoib_ib_dev_stop(dev, 0);
+				if (ipoib_pkey_dev_delay_open(dev))
+					return;
+			}
+			/* restart QP only if P_Key index is changed */
+			if (test_and_set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags) &&
+			    new_index ==
[PATCH FIXES for-3.11 1/4] IB/core: Create QP1 using the pkey index which contains the default pkey
From: Jack Morgenstein ja...@dev.mellanox.co.il

Currently, QP1 is created using pkey index 0. This patch simply looks for the index containing the default pkey, rather than hard-coding pkey index 0.

This change will have no effect in Native mode, since QP0 and QP1 are created before the SM configures the port, so pkey table will still be the default table defined by the IB Spec, in C10-123:

    "If non-volatile storage is not used to hold P_Key Table contents, then if a PM (Partition Manager) is not present, and prior to PM initialization of the P_Key Table, the P_Key Table must act as if it contains a single valid entry, at P_Key_ix = 0, containing the default partition key. All other entries in the P_Key Table must be invalid."

Thus, in the native mode case, the driver will find the default pkey at index 0 (so it will be no different than the hard-coding).

However, in SRIOV mode, for VFs, the pkey table may be paravirtualized, so that the VF's pkey index zero may not necessarily be mapped to the real pkey index 0. For VFs, therefore, it is important to find the virtual index which maps to the real default pkey.

This commit does the following for QP1 creation:

1. Find the pkey index containing the default pkey, and use that index if found. ib_find_pkey() returns the index of the limited-membership default pkey (0x7FFF) if the full-member default pkey is not in the table.

2. If neither form of the default pkey is found, use pkey index 0 (previous behavior).

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/mad.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index dc3fd1e..9be6754 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2663,6 +2663,7 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 	int ret, i;
 	struct ib_qp_attr *attr;
 	struct ib_qp *qp;
+	u16 pkey_index = 0;

 	attr = kmalloc(sizeof *attr, GFP_KERNEL);
 	if (!attr) {
@@ -2670,6 +2671,11 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 		return -ENOMEM;
 	}

+	ret = ib_find_pkey(port_priv->device, port_priv->port_num,
+			   IB_DEFAULT_PKEY_FULL, &pkey_index);
+	if (ret)
+		pkey_index = 0;
+
 	for (i = 0; i < IB_MAD_QPS_CORE; i++) {
 		qp = port_priv->qp_info[i].qp;
 		if (!qp)
@@ -2680,7 +2686,7 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 		 * one is needed for the Reset to Init transition
 		 */
 		attr->qp_state = IB_QPS_INIT;
-		attr->pkey_index = 0;
+		attr->pkey_index = pkey_index;
 		attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
 		ret = ib_modify_qp(qp, attr, IB_QP_STATE |
 					     IB_QP_PKEY_INDEX | IB_QP_QKEY);
--
1.7.1
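The two-step selection described in the commit message can be sketched in plain C. This is not the kernel's ib_find_pkey(); `demo_find_pkey` and `demo_qp1_pkey_index` are hypothetical helpers operating on an in-memory table, written under the assumption stated in the commit message that a limited-membership entry is accepted only when no full-member entry exists.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEFAULT_PKEY_FULL 0xFFFF	/* default pkey, full membership */

/*
 * Hypothetical lookup: find the table index holding pkey, comparing only
 * the low 15 bits and preferring an entry with the full-membership bit
 * (0x8000) set. Returns 0 and fills *index on success, -1 if absent.
 */
int demo_find_pkey(const uint16_t *table, size_t len, uint16_t pkey,
		   uint16_t *index)
{
	int partial = -1;
	size_t i;

	for (i = 0; i < len; i++) {
		if ((table[i] & 0x7FFF) != (pkey & 0x7FFF))
			continue;
		if (table[i] & 0x8000) {
			*index = (uint16_t)i;
			return 0;
		}
		if (partial < 0)
			partial = (int)i;	/* first limited-member match */
	}
	if (partial >= 0) {
		*index = (uint16_t)partial;
		return 0;
	}
	return -1;
}

/* Pick the pkey index for QP1: the default pkey's index, else index 0. */
uint16_t demo_qp1_pkey_index(const uint16_t *table, size_t len)
{
	uint16_t index = 0;

	if (demo_find_pkey(table, len, DEMO_DEFAULT_PKEY_FULL, &index))
		index = 0;
	return index;
}
```

With a native-mode table the helper returns 0, as before the patch; with a paravirtualized table it returns wherever the default pkey happens to sit.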
[PATCH FIXES for-3.11 3/4] IB/ipoib: Make sure child devices use valid/proper pkeys
Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for child devices.

Also, make sure to always set the full membership bit for the pkey of devices created by rtnl link ops.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c    |2 +-
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |9 +
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b6e049a..c6f71a8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1461,7 +1461,7 @@ static ssize_t create_child(struct device *dev,
 	if (sscanf(buf, "%i", &pkey) != 1)
 		return -EINVAL;

-	if (pkey < 0 || pkey > 0xffff)
+	if (pkey <= 0 || pkey > 0xffff || pkey == 0x8000)
 		return -EINVAL;

 	/*
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index 7468593..f81abe1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -119,6 +119,15 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 	} else
 		child_pkey = nla_get_u16(data[IFLA_IPOIB_PKEY]);

+	if (child_pkey == 0 || child_pkey == 0x8000)
+		return -EINVAL;
+
+	/*
+	 * Set the full membership bit, so that we join the right
+	 * broadcast group, etc.
+	 */
+	child_pkey |= 0x8000;
+
 	err = __ipoib_vlan_add(ppriv, netdev_priv(dev), child_pkey, IPOIB_RTNL_CHILD);

 	if (!err && data)
--
1.7.1
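The check added to the netlink path is small enough to restate as a standalone function. The sketch below uses a hypothetical `demo_child_pkey` helper and returns -1 where the kernel code returns -EINVAL; it is meant only to show the order of operations (reject the invalid values first, then force the full membership bit).

```c
#include <stdint.h>

/*
 * Hypothetical helper: validate a requested child pkey and normalize it.
 * 0x0000 and 0x8000 carry no partition number and are rejected. Any other
 * value gets the full membership bit (0x8000) set.
 * Returns 0 and fills *out on success, -1 for an invalid pkey.
 */
int demo_child_pkey(uint16_t requested, uint16_t *out)
{
	if (requested == 0 || requested == 0x8000)
		return -1;

	*out = requested | 0x8000;
	return 0;
}
```

Doing the validation before the OR matters: if the bit were set first, a requested value of 0 would turn into 0x8000 and the two invalid cases would have to be told apart some other way.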
[PATCH FIXES for-3.11 2/4] IB/mlx4: Use default pkey when creating tunnel QPs
From: Jack Morgenstein ja...@dev.mellanox.co.il

When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located.

If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior).

This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. It's possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0).

At some point after the VF probe, the cloud computing interface at the Hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails.

This commit causes the Hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index 0) at tunnel QP creation time, so that the tunnel QP creation will succeed.

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/hw/mlx4/mad.c | 10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 4d599ce..f2a3f48 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -1511,8 +1511,14 @@ static int create_pv_sqp(struct mlx4_ib_demux_pv_ctx *ctx,

 	memset(&attr, 0, sizeof attr);
 	attr.qp_state = IB_QPS_INIT;
-	attr.pkey_index =
-		to_mdev(ctx->ib_dev)->pkeys.virt2phys_pkey[ctx->slave][ctx->port - 1][0];
+	ret = 0;
+	if (create_tun)
+		ret = find_slave_port_pkey_ix(to_mdev(ctx->ib_dev), ctx->slave,
+					      ctx->port, IB_DEFAULT_PKEY_FULL,
+					      &attr.pkey_index);
+	if (ret || !create_tun)
+		attr.pkey_index =
+		to_mdev(ctx->ib_dev)->pkeys.virt2phys_pkey[ctx->slave][ctx->port - 1][0];
 	attr.qkey = IB_QP1_QKEY;
 	attr.port_num = ctx->port;
 	ret = ib_modify_qp(tun_qp->qp, &attr, qp_attr_mask_INIT);
--
1.7.1
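A simplified model of the lookup may help when reading the hunk above. The sketch assumes a flat physical pkey table and a per-slave virtual-to-physical index map; `demo_tunnel_pkey_index` is a hypothetical name, and unlike the driver's find_slave_port_pkey_ix() it matches only the full-member default pkey, so treat it as an illustration of the fallback rule rather than of the real helper.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEFAULT_PKEY_FULL 0xFFFF	/* default pkey, full membership */

/*
 * Hypothetical model: walk the slave's virtual pkey table, translate each
 * virtual index to a physical one, and return the physical index that
 * holds the default pkey. If the slave cannot see the default pkey, fall
 * back to whatever physical index its virtual index 0 maps to.
 */
uint16_t demo_tunnel_pkey_index(const uint16_t *phys_pkeys,
				const uint8_t *virt2phys, size_t virt_len)
{
	size_t i;

	for (i = 0; i < virt_len; i++) {
		if (phys_pkeys[virt2phys[i]] == DEMO_DEFAULT_PKEY_FULL)
			return virt2phys[i];
	}
	return virt2phys[0];	/* previous behavior */
}
```

In the scenario from the commit message, virtual index 0 maps to physical index 127 (pkey 0) while some higher virtual index maps to the physical slot holding 0xFFFF, so the search returns that slot instead of 127.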
[PATCH FIXES for-3.11 0/4] Pkey fixes for IB core and IPoIB
Hi Roland,

This set of fixes is critical for Virtualization environments when the VM para-virtualized PKEY table isn't fully configured at the time the VF is probed, or when the management pkey is provisioned to a non-zero index in the VF pkey table.

The first three patches are pretty much few-liners (two of them with a somewhat long change log...). The fourth one is a bit larger.

Would be happy to see them all going to -stable, either by you adding a Cc: sta...@vger.kernel.org when you push them or I can send them to Greg after they spend some time upstream.

Or.

Erez Shitrit (1):
  IB/ipoib: Fix pkey-change flow for Virtualization environments

Jack Morgenstein (2):
  IB/core: Create QP1 using the pkey index which contains the default pkey
  IB/mlx4: Use default pkey when creating tunnel QPs

Or Gerlitz (1):
  IB/ipoib: Make sure child devices use valid/proper pkeys

 drivers/infiniband/core/mad.c                  |8 +++-
 drivers/infiniband/hw/mlx4/mad.c               | 10 +++-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        | 68 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 12 -
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c   |9 +++
 6 files changed, 91 insertions(+), 18 deletions(-)

This is the sequence of events when the IPoIB patch is applied at the guest:

-- the VM pkey table contains 0x0000 in index 0, and hence the mgid has 0x8000 as the pkey

[root@xena017-3 infiniband]# cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/0
0x0000

[root@xena017-3 ~]# ip addr show ib0
22: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN qlen 256
    link/infiniband 80:00:05:8b:fe:80:00:00:00:00:00:00:00:14:05:00:00:00:04:f9 brd 00:ff:ff:ff:ff:12:40:1b:80:00:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.20.199/24 brd 192.168.20.255 scope global ib0

-- the hypervisor changed pkey value in index 0 of the VM pkey table to contain 0x8001

[root@xena017-3 infiniband]# cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/0
0x8001

[root@xena017-3 ~]# dmesg
ib0: bringing up interface
IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
ib0: multicast join failed for ff12:401b:8000:::::, status -22
ib0: multicast join failed for ff12:401b:8000:::::, status -22
[...]
ib0: Event 12 on device mlx4_0 port 1
ib0: pkey changed from 0x8000 to 0x8001
ib0: downing ib_dev
ib0: All sends and receives done.
ib0: Created ah 88011a11f4e0
IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
ib0: Created ah 88011a11f660
ib0: Created ah 88011a11f780

--- mgid changed to use 0x8001

[root@xena017-3 ~]# ip addr show ib0
22: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:05:8b:fe:80:00:00:00:00:00:00:00:14:05:00:00:00:04:f9 brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.20.199/24 brd 192.168.20.255 scope global ib0

ping works etc

-- when the mlx4 patch isn't applied on the host we see these errors

mlx4_ib create_pv_sqp: Couldn't change tunnel qp state to INIT (-22)
mlx4_ib create_pv_resources: Couldn't create tunnel for QP1 (-22)

-- this is the sequence of events when the IPoIB patch is not applied at the guest:

-- the VM pkey table contains 0x0000 in index 0, and hence the mgid has 0x8000 as the pkey

[root@xena017-3 infiniband]# cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/0
0x0000

[root@xena017-3 infiniband]# modprobe ib_ipoib debug_level=1
[root@xena017-3 infiniband]# ip a s ib0
30: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN qlen 256
    link/infiniband 80:00:05:93:fe:80:00:00:00:00:00:00:00:14:05:00:00:00:04:f9 brd 00:ff:ff:ff:ff:12:40:1b:80:00:00:00:00:00:00:00:ff:ff:ff:ff

[root@xena017-3 infiniband]# cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/0
0x8001

-- ipoib got the event, but nothing changed

[root@xena017-3 infiniband]# ip a s ib0
30: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN qlen 256
    link/infiniband 80:00:05:93:fe:80:00:00:00:00:00:00:00:14:05:00:00:00:04:f9 brd 00:ff:ff:ff:ff:12:40:1b:80:00:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.20.199/24 brd 192.168.20.255 scope global ib0

[root@xena017-3 infiniband]# dmesg
ib%d: max_srq_sge=31
ib%d: max_cm_mtu = 0xfff0, num_frags=16
ib%d: max_srq_sge=31
ib%d: max_cm_mtu = 0xfff0, num_frags=16
ib0: bringing up interface
IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
ib0: multicast join failed for ff12:401b:8000:::::, status -22
ib0: multicast join failed for ff12:401b:8000:::::, status -22
ib0: multicast join failed for ff12:401b:8000:::::, status -22
ib0: multicast join failed for ff12:401b:8000:::::, status -22
ib0: Event 12 on device mlx4_0 port 1
ib0: downing ib_dev
ib0: All sends and receives done.
[PATCH REPOST FIXES for-3.11 1/4] IB/core: Create QP1 using the pkey index which contains the default pkey
From: Jack Morgenstein ja...@dev.mellanox.co.il

Currently, QP1 is created using pkey index 0. This patch simply looks for the index containing the default pkey, rather than hard-coding pkey index 0.

This change will have no effect in Native mode, since QP0 and QP1 are created before the SM configures the port, so pkey table will still be the default table defined by the IB Spec, in C10-123:

    "If non-volatile storage is not used to hold P_Key Table contents, then if a PM (Partition Manager) is not present, and prior to PM initialization of the P_Key Table, the P_Key Table must act as if it contains a single valid entry, at P_Key_ix = 0, containing the default partition key. All other entries in the P_Key Table must be invalid."

Thus, in the native mode case, the driver will find the default pkey at index 0 (so it will be no different than the hard-coding).

However, in SRIOV mode, for VFs, the pkey table may be paravirtualized, so that the VF's pkey index zero may not necessarily be mapped to the real pkey index 0. For VFs, therefore, it is important to find the virtual index which maps to the real default pkey.

This commit does the following for QP1 creation:

1. Find the pkey index containing the default pkey, and use that index if found. ib_find_pkey() returns the index of the limited-membership default pkey (0x7FFF) if the full-member default pkey is not in the table.

2. If neither form of the default pkey is found, use pkey index 0 (previous behavior).

Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/core/mad.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index dc3fd1e..9be6754 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2663,6 +2663,7 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 	int ret, i;
 	struct ib_qp_attr *attr;
 	struct ib_qp *qp;
+	u16 pkey_index = 0;

 	attr = kmalloc(sizeof *attr, GFP_KERNEL);
 	if (!attr) {
@@ -2670,6 +2671,11 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 		return -ENOMEM;
 	}

+	ret = ib_find_pkey(port_priv->device, port_priv->port_num,
+			   IB_DEFAULT_PKEY_FULL, &pkey_index);
+	if (ret)
+		pkey_index = 0;
+
 	for (i = 0; i < IB_MAD_QPS_CORE; i++) {
 		qp = port_priv->qp_info[i].qp;
 		if (!qp)
@@ -2680,7 +2686,7 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
 		 * one is needed for the Reset to Init transition
 		 */
 		attr->qp_state = IB_QPS_INIT;
-		attr->pkey_index = 0;
+		attr->pkey_index = pkey_index;
 		attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
 		ret = ib_modify_qp(qp, attr, IB_QP_STATE |
 					     IB_QP_PKEY_INDEX | IB_QP_QKEY);
--
1.7.1
[PATCH REPOST FIXES for-3.11 4/4] IB/ipoib: Fix pkey-change flow for Virtualization environments
From: Erez Shitrit ere...@mellanox.com

IPoIB's required behaviour w.r.t. the pkey used by the device is the following:

- For parent interfaces (e.g. ib0, ib1, etc.), which are created automatically as a result of hot-plug events from the IB core, the driver needs to take whatever pkey value it finds in index 0, and stick to that index.

- For child interfaces (e.g. ib0.8001, etc.) created by admin directive, the driver needs to use and stick to the value provided during its creation.

In an SR-IOV environment it's possible for the VF probe to take place before the cloud management software provisions the suitable pkey for the VF in the paravirtualized PKEY table index 0. When this is the case, the VF IB stack will find in index 0 an invalid pkey, which is all zeros.

Moreover, the cloud management can assign the pkey value at index 0 at any time of the guest life cycle.

The correct behavior for IPoIB to address these requirements for parent interfaces is to use PKEY_CHANGE event as trigger to optionally re-init the device pkey value and re-create all the relevant resources accordingly, if the value of the pkey in index 0 has changed (from invalid to valid or from valid value X to invalid value Y).

This patch enhances the heavy flushing code which is triggered by pkey change event, to behave correctly for parent devices. For child devices, the code remains the same, namely chases pkey value and not index.

Signed-off-by: Erez Shitrit ere...@mellanox.com
Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 76 +-
 1 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 2cfa76f..196b1d1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -932,12 +932,47 @@ int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 	return 0;
 }

+/*
+ * Takes whatever value which is in pkey index 0 and updates priv->pkey
+ * returns 0 if the pkey value was changed.
+ */
+static inline int update_parent_pkey(struct ipoib_dev_priv *priv)
+{
+	int result;
+	u16 prev_pkey;
+
+	prev_pkey = priv->pkey;
+	result = ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey);
+	if (result) {
+		ipoib_warn(priv, "ib_query_pkey port %d failed (ret = %d)\n",
+			   priv->port, result);
+		return result;
+	}
+
+	priv->pkey |= 0x8000;
+
+	if (prev_pkey != priv->pkey) {
+		ipoib_dbg(priv, "pkey changed from 0x%x to 0x%x\n",
+			  prev_pkey, priv->pkey);
+		/*
+		 * Update the pkey in the broadcast address, while making sure to set
+		 * the full membership bit, so that we join the right broadcast group.
+		 */
+		priv->dev->broadcast[8] = priv->pkey >> 8;
+		priv->dev->broadcast[9] = priv->pkey & 0xff;
+		return 0;
+	}
+
+	return 1;
+}
+
 static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 				enum ipoib_flush_level level)
 {
 	struct ipoib_dev_priv *cpriv;
 	struct net_device *dev = priv->dev;
 	u16 new_index;
+	int result;

 	mutex_lock(&priv->vlan_mutex);

@@ -951,6 +986,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	mutex_unlock(&priv->vlan_mutex);

 	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) {
+		/* for non-child devices must check/update the pkey value here */
+		if (level == IPOIB_FLUSH_HEAVY &&
+		    !test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+			update_parent_pkey(priv);
 		ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
 		return;
 	}
@@ -961,21 +1000,32 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	}

 	if (level == IPOIB_FLUSH_HEAVY) {
-		if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &new_index)) {
-			clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
-			ipoib_ib_dev_down(dev, 0);
-			ipoib_ib_dev_stop(dev, 0);
-			if (ipoib_pkey_dev_delay_open(dev))
+		/* child devices chase their origin pkey value, while non-child
+		 * (parent) devices should always takes what present in pkey index 0
+		 */
+		if (test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+			if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &new_index)) {
+				clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
+				ipoib_ib_dev_down(dev, 0);
+				ipoib_ib_dev_stop(dev, 0);
+				if (ipoib_pkey_dev_delay_open(dev))
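The pkey and broadcast-address update performed by update_parent_pkey() in the reposted patch can be modelled in a few lines of user-space C. The structure and function below are hypothetical (`demo_priv`, `demo_update_parent_pkey`), and the pkey read from index 0 is passed in as a parameter instead of being queried from a device; the return convention (0 when the pkey changed, 1 when it did not) follows the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields of ipoib_dev_priv used here. */
struct demo_priv {
	uint16_t pkey;
	uint8_t broadcast[20];	/* IPoIB hardware broadcast address */
};

/*
 * Adopt the pkey found at table index 0, forcing the full membership bit.
 * If the resulting pkey differs from the current one, write it into bytes
 * 8 and 9 of the broadcast address (high byte first) and return 0.
 * Return 1 when nothing changed.
 */
int demo_update_parent_pkey(struct demo_priv *priv, uint16_t index0_pkey)
{
	uint16_t prev_pkey = priv->pkey;

	priv->pkey = index0_pkey | 0x8000;
	if (prev_pkey == priv->pkey)
		return 1;

	priv->broadcast[8] = (uint8_t)(priv->pkey >> 8);
	priv->broadcast[9] = (uint8_t)(priv->pkey & 0xff);
	return 0;
}
```

This lines up with the cover letter's log, where the tenth and eleventh groups of the `brd` address go from `80:00` to `80:01` once the hypervisor provisions pkey 0x8001.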
Re: [PATCH -next] IB/mlx5: use module_pci_driver to simplify the code
Looks to me like a convenience that we may need to give up later should we need to put any code in the init or cleanup functions.

On Wed, Jul 17, 2013 at 09:56:41AM +0800, Wei Yongjun wrote:
> From: Wei Yongjun yongjun_...@trendmicro.com.cn
>
> Use the module_pci_driver() macro to make the code simpler
> by eliminating module_init and module_exit calls.
>
> Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
> ---
>  drivers/infiniband/hw/mlx5/main.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 8000fff..0cdc185 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -1490,15 +1490,4 @@ static struct pci_driver mlx5_ib_driver = {
>  	.remove = remove_one
>  };
>
> -static int __init mlx5_ib_init(void)
> -{
> -	return pci_register_driver(&mlx5_ib_driver);
> -}
> -
> -static void __exit mlx5_ib_cleanup(void)
> -{
> -	pci_unregister_driver(&mlx5_ib_driver);
> -}
> -
> -module_init(mlx5_ib_init);
> -module_exit(mlx5_ib_cleanup);
> +module_pci_driver(mlx5_ib_driver);
Re: [PATCH] mlx5: qp: variable may be used uninitialized
Acked-by: Eli Cohen e...@mellanox.com

On Tue, Jul 16, 2013 at 03:35:01PM +0200, Andi Shyti wrote:
> In the sq_overhead() function, if qp_type is equal to IB_QPT_RC,
> size will be used uninitialized.
>
> Signed-off-by: Andi Shyti a...@etezian.org
> ---
>  drivers/infiniband/hw/mlx5/qp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index 16ac54c..045f8cd 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -199,7 +199,7 @@ static int set_rq_size(struct mlx5_ib_dev *dev, struct ib_qp_cap *cap,
>
>  static int sq_overhead(enum ib_qp_type qp_type)
>  {
> -	int size;
> +	int size = 0;
>
>  	switch (qp_type) {
>  	case IB_QPT_XRC_INI:
> --
> 1.8.3.2
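The kind of bug described above can be sketched in userspace as follows. The email does not show the body of the switch, so the case bodies, the enum and the numbers below are invented. The only assumption taken from the patch description is that at least one case adds to `size` without an earlier case having assigned it.

```c
/* Invented stand-in for the QP types named in the patch. */
enum qp_type_sketch { QPT_XRC_INI, QPT_RC, QPT_UD };

static int overhead_sketch(enum qp_type_sketch type)
{
	int size = 0;	/* without this, the QPT_RC path would read garbage */

	switch (type) {
	case QPT_XRC_INI:
		size += 16;	/* illustrative numbers only */
		/* fall through */
	case QPT_RC:
		size += 32;	/* first use of size when entered directly */
		break;
	case QPT_UD:
		size += 8;
		break;
	default:
		break;
	}
	return size;
}
```

With the initializer in place every entry point into the switch starts from a defined value, so `overhead_sketch(QPT_RC)` is well defined instead of depending on stack contents.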
Re: [patch -next] mlx5: return -EFAULT instead of -EPERM
Acked-by: Eli Cohen e...@mellanox.com

On Wed, Jul 10, 2013 at 01:58:59PM +0300, Dan Carpenter wrote:
> For copy_to/from_user() failure, the correct error code is -EFAULT not
> -EPERM.
>
> Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
>
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index e2daa8f..bd41df9 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -171,7 +171,7 @@ static ssize_t size_write(struct file *filp, const char __user *buf,
>  	int c;
>
>  	if (copy_from_user(lbuf, buf, sizeof(lbuf)))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	c = order2idx(dev, ent->order);
>  	lbuf[sizeof(lbuf) - 1] = 0;
> @@ -208,7 +208,7 @@ static ssize_t size_read(struct file *filp, char __user *buf, size_t count,
>  		return err;
>
>  	if (copy_to_user(buf, lbuf, err))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	*pos += err;
>
> @@ -233,7 +233,7 @@ static ssize_t limit_write(struct file *filp, const char __user *buf,
>  	int c;
>
>  	if (copy_from_user(lbuf, buf, sizeof(lbuf)))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	c = order2idx(dev, ent->order);
>  	lbuf[sizeof(lbuf) - 1] = 0;
> @@ -270,7 +270,7 @@ static ssize_t limit_read(struct file *filp, char __user *buf, size_t count,
>  		return err;
>
>  	if (copy_to_user(buf, lbuf, err))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	*pos += err;
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index c1c0eef..205753a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -693,7 +693,7 @@ static ssize_t dbg_write(struct file *filp, const char __user *buf,
>  		return -ENOMEM;
>
>  	if (copy_from_user(lbuf, buf, sizeof(lbuf)))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	lbuf[sizeof(lbuf) - 1] = 0;
>
> @@ -889,7 +889,7 @@ static ssize_t data_write(struct file *filp, const char __user *buf,
>  		return -ENOMEM;
>
>  	if (copy_from_user(ptr, buf, count)) {
> -		err = -EPERM;
> +		err = -EFAULT;
>  		goto out;
>  	}
>  	dbg->in_msg = ptr;
> @@ -919,7 +919,7 @@ static ssize_t data_read(struct file *filp, char __user *buf, size_t count,
>
>  	copy = min_t(int, count, dbg->outlen);
>  	if (copy_to_user(buf, dbg->out_msg, copy))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	*pos += copy;
>
> @@ -949,7 +949,7 @@ static ssize_t outlen_read(struct file *filp, char __user *buf, size_t count,
>  		return err;
>
>  	if (copy_to_user(buf, outlen, err))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	*pos += err;
>
> @@ -974,7 +974,7 @@ static ssize_t outlen_write(struct file *filp, const char __user *buf,
>  	dbg->outlen = 0;
>
>  	if (copy_from_user(outlen_str, buf, count))
> -		return -EPERM;
> +		return -EFAULT;
>
>  	outlen_str[7] = 0;
>
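The convention the patch enforces can be tried out in userspace with a mock. In the sketch below, `mock_copy_from_user()` is an assumption standing in for the kernel's copy_from_user(). It follows the same return convention (number of bytes not copied, 0 on success) and treats a NULL source as a bad user pointer. `write_sketch()` is a hypothetical handler, not one of the functions in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace mock: returns the number of bytes NOT copied, so 0 means
 * success. A NULL source models an invalid user pointer. */
static unsigned long mock_copy_from_user(void *to, const void *from,
					 unsigned long n)
{
	if (from == NULL)
		return n;
	memcpy(to, from, n);
	return 0;
}

/* Hypothetical write handler: a failed copy means a bad address, so the
 * error is -EFAULT rather than -EPERM. */
static long write_sketch(const char *ubuf, size_t count)
{
	char lbuf[20] = {0};

	if (count > sizeof(lbuf))
		count = sizeof(lbuf);	/* never copy past the local buffer */
	if (mock_copy_from_user(lbuf, ubuf, count))
		return -EFAULT;
	lbuf[sizeof(lbuf) - 1] = 0;	/* force NUL termination */
	return (long)count;
}
```

A caller that passes a bad pointer then sees EFAULT ("Bad address") rather than EPERM ("Operation not permitted"), which matches what actually went wrong.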
RE: [PATCH REPOST FIXES for-3.11 1/4] IB/core: Create QP1 using the pkey index which contains the default pkey
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index dc3fd1e..9be6754 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2663,6 +2663,7 @@ static int ib_mad_port_start(struct ib_mad_port_private *port_priv)
>  	int ret, i;
>  	struct ib_qp_attr *attr;
>  	struct ib_qp *qp;
> +	u16 pkey_index = 0;

This shouldn't need to be initialized, as it is always set further down in the patch.

Reviewed-by: Sean Hefty sean.he...@intel.com
Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU
On 07/16/2013 08:16 PM, Roland Dreier wrote:
> On Tue, Jul 16, 2013 at 10:11 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:
>> - doing it this way preserves ABI, so existing binaries are safe
>
> I still don't get this. Wouldn't an existing binary be pretty
> surprised to get a value wildly out of range of the enum?

Yes, but there's no way around that without simply lying about the MTU.

So, the argument was made in the thread that historically, applications have had to be modified when moved to a new link layer (iWARP meant IB apps had to be slightly modified for connection reasons, RoCE again required some slight app modifications, etc.). This was seen as a case of "the app will work on fabrics it already knows about, and will only get confused if moved to this new fabric, and in that case the app needs to be modified anyway", so that's acceptable breakage for keeping the apps working the rest of the time. That was the argument anyway.
RE: [PATCH v2] RDMA/cma: silence GCC warning
> Building cma.o triggers this GCC warning:
>     drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’:
>     drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>     drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here
>
> This is a false positive, as port will always be initialized if we're
> at found. But if we assign to id_priv->id.port_num directly, we can
> drop port. That will, obviously, silence GCC.
>
> Signed-off-by: Paul Bolle pebo...@tiscali.nl

Acked-by: Sean Hefty sean.he...@intel.com

> ---
> 0) v2: assign to id_priv->id.port_num directly, instead of initializing
> port to 0, as discussed with Sean.
>
> 1) Still only compile tested.

tested - thanks
RE: [PATCH V2] libibverbs: Allow arbitrary int values for MTU
I hadn't looked at the kernel side yet; I was waiting for the userspace side to sort itself out first.

I think it makes sense to start with how user space can get the data. Without eating up reserved fields, we're starting with 8 bit values.

Hmm. 16 bits is probably enough for the MTU values, but still, changing kern-abi.h will be problematic from an ABI perspective. Do people care about the kernel ABI, or is that mainly a userspace issue?

Well, we definitely care about the kernel to user ABI.

I can't imagine that we're dealing with more than a handful of actual MTU values. Maybe the simplest thing is to extend the mtu enum to include what new values are needed, plus add a function to convert it. (Can we call mulligan?)

I don't know how iwarp handles this. Does it just report the wrong mtu, since it doesn't necessarily matter? Steve - any idea here?

- Sean
[PATCH] RDMA/nes: Reversing commit bca1935ccdec to silence allmodconfig build warning
Reversing commit "Fix compilation error when nes_debug is enabled", which removed the uses of the variables nes_tcp_state_str and nes_iwarp_state_str on the assumption that they aren't defined. However, they are defined within an #ifdef NES_DEBUG block, so when NES_DEBUG is enabled and the uses are removed, the build produces a "defined but not used" compiler warning.

Signed-off-by: Tatyana Nikolova tatyana.e.nikol...@intel.com
Reported-by: Stephen Rothwell s...@canb.auug.org.au
---
 drivers/infiniband/hw/nes/nes_hw.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index 418004c..9020024 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -3570,10 +3570,10 @@ static void nes_process_iwarp_aeqe(struct nes_device *nesdev,
 	tcp_state = (aeq_info & NES_AEQE_TCP_STATE_MASK) >> NES_AEQE_TCP_STATE_SHIFT;
 	iwarp_state = (aeq_info & NES_AEQE_IWARP_STATE_MASK) >> NES_AEQE_IWARP_STATE_SHIFT;
 	nes_debug(NES_DBG_AEQ, "aeid = 0x%04X, qp-cq id = %d, aeqe = %p,"
-			" Tcp state = %d, iWARP state = %d\n",
+			" Tcp state = %s, iWARP state = %s\n",
 			async_event_id,
 			le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]), aeqe,
-			tcp_state, iwarp_state);
+			nes_tcp_state_str[tcp_state], nes_iwarp_state_str[iwarp_state]);

 	aeqe_cq_id = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]);
 	if (aeq_info & NES_AEQE_QP) {
--
1.7.1
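The patch prints state names by indexing a string table with the state value. A small userspace sketch of that lookup pattern follows. The table contents and the names `state_names` and `state_name()` are invented, since the driver's real tables are not shown in this email, and the bounds check is an addition that the patch itself does not make.

```c
#include <stddef.h>

/* Invented state names for illustration only. */
static const char *const state_names[] = { "closed", "idle", "rts" };

/* Map a numeric state to its name, falling back to "unknown" for
 * values outside the table. */
static const char *state_name(unsigned int state)
{
	size_t n = sizeof(state_names) / sizeof(state_names[0]);

	return state < n ? state_names[state] : "unknown";
}
```

A debug call would then pass `state_name(tcp_state)` for a `%s` conversion in place of the raw integer for `%d`.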
Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU
On 7/17/2013 4:41 PM, Hefty, Sean wrote:

I hadn't looked at the kernel side yet; I was waiting for the userspace side to sort itself out first.

I think it makes sense to start with how user space can get the data. Without eating up reserved fields, we're starting with 8 bit values.

Hmm. 16 bits is probably enough for the MTU values, but still, changing kern-abi.h will be problematic from an ABI perspective. Do people care about the kernel ABI, or is that mainly a userspace issue?

Well, we definitely care about the kernel to user ABI.

I can't imagine that we're dealing with more than a handful of actual MTU values. Maybe the simplest thing is to extend the mtu enum to include what new values are needed, plus add a function to convert it. (Can we call mulligan?)

I don't know how iwarp handles this. Does it just report the wrong mtu, since it doesn't necessarily matter? Steve - any idea here?

The iwarp drivers just report the nearest mtu enum. Apps don't need it for iwarp like they do for ib.
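The enum-plus-conversion idea and the "report the nearest enum" behaviour discussed above can be sketched like this. The enum is a local copy using the same numeric values as libibverbs' `enum ibv_mtu` (1 through 5 for 256 through 4096 bytes), so that the sketch compiles without `<infiniband/verbs.h>`. The function names are hypothetical, and rounding down is an assumption: the thread does not say which way the drivers round.

```c
/* Local copy of the MTU enum values, for illustration. */
enum mtu_sketch {
	MTU_256  = 1,
	MTU_512  = 2,
	MTU_1024 = 3,
	MTU_2048 = 4,
	MTU_4096 = 5
};

/* Enum to byte count; -1 for a value outside the enum. */
static int mtu_enum_to_bytes(enum mtu_sketch mtu)
{
	switch (mtu) {
	case MTU_256:  return 256;
	case MTU_512:  return 512;
	case MTU_1024: return 1024;
	case MTU_2048: return 2048;
	case MTU_4096: return 4096;
	}
	return -1;
}

/* Byte count to the largest enum value that does not exceed it
 * (assumed policy); anything below 512 maps to MTU_256. */
static enum mtu_sketch mtu_bytes_to_enum(int bytes)
{
	if (bytes >= 4096)
		return MTU_4096;
	if (bytes >= 2048)
		return MTU_2048;
	if (bytes >= 1024)
		return MTU_1024;
	if (bytes >= 512)
		return MTU_512;
	return MTU_256;
}
```

Under this policy a 1500-byte link MTU is reported as MTU_1024 and a 9000-byte one as MTU_4096, so the round trip through the enum loses the real value. That loss is the reason an exact integer MTU was being asked for in this thread.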
RE: [PATCH V3 for-next 3/4] IB/core: Export ib_create/destroy_flow through uverbs
> +ssize_t ib_uverbs_create_flow(struct ib_uverbs_file *file,
> +			      const char __user *buf, int in_len,
> +			      int out_len)
> +{
> +	struct ib_uverbs_create_flow cmd;
> +	struct ib_uverbs_create_flow_resp resp;
> +	struct ib_uobject *uobj;
> +	struct ib_flow *flow_id;
> +	struct ib_kern_flow_attr *kern_flow_attr;
> +	struct ib_flow_attr *flow_attr;
> +	struct ib_qp *qp;
> +	int err = 0;
> +	void *kern_spec;
> +	void *ib_spec;
> +	int i;
> +
> +	if (out_len < sizeof(resp))
> +		return -ENOSPC;
> +
> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
> +		return -EFAULT;
> +
> +	if ((cmd.flow_attr.type == IB_FLOW_ATTR_SNIFFER &&
> +	     !capable(CAP_NET_ADMIN)) || !capable(CAP_NET_RAW))
> +		return -EPERM;
> +
> +	if (cmd.flow_attr.num_of_specs) {
> +		kern_flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
> +		if (!kern_flow_attr)
> +			return -ENOMEM;
> +
> +		memcpy(kern_flow_attr, &cmd.flow_attr, sizeof(*kern_flow_attr));
> +		if (copy_from_user(kern_flow_attr + 1, buf + sizeof(cmd),
> +				   cmd.flow_attr.size - sizeof(cmd))) {
> +			err = -EFAULT;
> +			goto err_free_attr;
> +		}
> +	} else {
> +		kern_flow_attr = &cmd.flow_attr;
> +	}
> +
> +	uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
> +	if (!uobj) {
> +		err = -ENOMEM;
> +		goto err_free_attr;
> +	}
> +	init_uobj(uobj, 0, file->ucontext, &rule_lock_class);
> +	down_write(&uobj->mutex);
> +
> +	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
> +	if (!qp) {
> +		err = -EINVAL;
> +		goto err_uobj;
> +	}
> +
> +	flow_attr = kmalloc(cmd.flow_attr.size, GFP_KERNEL);
> +	if (!flow_attr) {
> +		err = -ENOMEM;
> +		goto err_put;
> +	}
> +
> +	flow_attr->type = kern_flow_attr->type;
> +	flow_attr->priority = kern_flow_attr->priority;
> +	flow_attr->num_of_specs = kern_flow_attr->num_of_specs;
> +	flow_attr->port = kern_flow_attr->port;
> +	flow_attr->flags = kern_flow_attr->flags;
> +	flow_attr->size = sizeof(*flow_attr);
> +
> +	kern_spec = kern_flow_attr + 1;
> +	ib_spec = flow_attr + 1;
> +	for (i = 0; i < flow_attr->num_of_specs; i++) {
> +		err = kern_spec_to_ib_spec(kern_spec, ib_spec);
> +		if (err)
> +			goto err_free;
> +		flow_attr->size +=
> +			((struct _ib_flow_spec *)ib_spec)->size;
> +		kern_spec += ((struct ib_kern_spec *)kern_spec)->size;
> +		ib_spec += ((struct _ib_flow_spec *)ib_spec)->size;

I didn't see where the ib_kern_spec size field was validated. Maybe add this check to kern_spec_to_ib_spec?

> +	}
> +	flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
> +	if (IS_ERR(flow_id)) {
> +		err = PTR_ERR(flow_id);
> +		goto err_free;
> +	}
> +	flow_id->qp = qp;
> +	flow_id->uobject = uobj;
> +	uobj->object = flow_id;
> +
> +	err = idr_add_uobj(&ib_uverbs_rule_idr, uobj);
> +	if (err)
> +		goto destroy_flow;
> +
> +	memset(&resp, 0, sizeof(resp));
> +	resp.flow_handle = uobj->id;
> +
> +	if (copy_to_user((void __user *)(unsigned long) cmd.response,
> +			 &resp, sizeof(resp))) {
> +		err = -EFAULT;
> +		goto err_copy;
> +	}
> +
> +	put_qp_read(qp);
> +	mutex_lock(&file->mutex);
> +	list_add_tail(&uobj->list, &file->ucontext->rule_list);
> +	mutex_unlock(&file->mutex);
> +
> +	uobj->live = 1;
> +
> +	up_write(&uobj->mutex);
> +	kfree(flow_attr);
> +	if (cmd.flow_attr.num_of_specs)
> +		kfree(kern_flow_attr);
> +	return in_len;
> +err_copy:
> +	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
> +destroy_flow:
> +	ib_destroy_flow(flow_id);
> +err_free:
> +	kfree(flow_attr);
> +err_put:
> +	put_qp_read(qp);
> +err_uobj:
> +	put_uobj_write(uobj);
> +err_free_attr:
> +	if (cmd.flow_attr.num_of_specs)
> +		kfree(kern_flow_attr);
> +	return err;
> +}
> +
> +ssize_t ib_uverbs_destroy_flow(struct ib_uverbs_file *file,
> +			       const char __user *buf, int in_len,
> +			       int out_len) {
> +	struct ib_uverbs_destroy_flow cmd;
> +	struct ib_flow *flow_id;
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	if (copy_from_user(&cmd, buf, sizeof(cmd)))
> +		return -EFAULT;
> +
> +	uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle,
> +			      file->ucontext);
> +	if (!uobj)
> +		return -EINVAL;
> +	flow_id = uobj->object;
> +
> +	ret = ib_destroy_flow(flow_id);
> +	if (!ret)
> +		uobj->live = 0;
> +
> +	put_uobj_write(uobj);
> +
> +
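The review comment about validating the per-spec size field can be illustrated with a userspace sketch of a bounds-checked walk over variable-length records. Everything named here (`struct spec_hdr_sketch`, `walk_specs_sketch()`, the header layout) is hypothetical and is not the layout used by the patch. It only shows the shape such a check might take.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header carried by every variable-length spec. */
struct spec_hdr_sketch {
	uint32_t type;
	uint16_t size;		/* total size of this spec, header included */
	uint16_t reserved;
};

/* Walk num_specs records packed back to back in buf[0..len).
 * Returns the number of bytes consumed, or -EINVAL if any size field
 * is smaller than a header or runs past the end of the buffer. */
static long walk_specs_sketch(const unsigned char *buf, size_t len,
			      unsigned int num_specs)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_specs; i++) {
		struct spec_hdr_sketch hdr;

		if (len - off < sizeof(hdr))
			return -EINVAL;
		memcpy(&hdr, buf + off, sizeof(hdr));	/* avoids unaligned reads */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -EINVAL;
		off += hdr.size;	/* safe: hdr.size fits in what remains */
	}
	return (long)off;
}
```

Rejecting a zero or undersized `size` keeps the loop from stalling on the same record, and rejecting an oversized one keeps the cursor inside the buffer supplied by the caller.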
Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU
On Jul 17, 2013, at 5:44 PM, Steve Wise sw...@opengridcomputing.com wrote:
> The iwarp drivers just report the nearest mtu enum. Apps don't need it
> for iwarp like they do for ib.

For RC, it doesn't matter much. So the fact that RoCE and iWARP lie about their MTU isn't a huge deal. It's wrong, but it doesn't matter much.

We need it for UD for our upcoming device, however, because the MTU is the only way to get the max message size.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/