[ANNOUNCE] uDAPL library: v1 - compat-1.2.15 and v2 - dapl-2.0.25 release
New release for uDAPL 1.2 and 2.0 available on the OFA download page and in my git tree.

md5sum: d7b710aebd8bb9b48b6b8982f71f62c7 compat-dapl-1.2.15.tar.gz
md5sum: 3a14b650bbfbe38243eb4a11157bef5d dapl-2.0.25.tar.gz

Summary of v1 and v2 changes since last release:

v1 - Release 1.2.15 fixes:
v1 - dtest, dapltest: conflict with dapl-2 utils package, change to dapl1, dapltest1
v1 - scm: fix compiler warning, unused variable

v2 - Release 2.0.25 fixes:
v2 - winof scm: initialize opt for NODELAY setsockopt
v2 - winof cma: windows definition for EADDRNOTAVAIL missing
v2 - scm: client side setsockopt NODELAY fails if data arrives before setting
v2 - cma: setup_listener Cannot assign requested address
v2 - common: seg fault in dapl_evd_wait with multi-thread application using CNO's.
v2 - ucm: inbound DREQ/DREP handshake should transition QP.
v2 - winof: Remove duplicate include of comp_channel.cpp from cm.c as it is included in opensm_ucb/device
v2 - winof: Utilize WinOF version of inet_ntop() for Windows OSes which do not support inet_ntop().

Vlad, please pull both new packages into latest OFED 1.5 build and install the following:

dapl-2.0.25-1
dapl-utils-2.0.25-1
dapl-devel-2.0.25-1
dapl-debuginfo-2.0.25-1
compat-dapl-1.2.15-1
compat-dapl-devel-1.2.15-1

See http://www.openfabrics.org/downloads/dapl/ for more details.

Thanks,

-arlin
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
QoS settings not mapped correctly per pkey ?
Hello,

Following the QoS experiments I carried out yesterday, I wanted to set up 3 IP networks, each one bound to a particular pkey, in order to achieve QoS for each network. Unfortunately, it seems that something is not mapped properly in the ULP layers (vlarb tables are fine).

The settings are as follows:

opensm.conf:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:8,1:1,2:1,3:4,4:0,5:0
qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

The corresponding VLArb tables are fine on both the server (pichu16) and the client (pichu22):

[r...@pichu22 network-scripts]# smpquery vlarb -D 0
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |

[r...@pichu16 ~]# smpquery vlarb -D 0
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap 8 HighCap 8
# Low priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |

partitions.conf:
---
default=0x7fff,ipoib: ALL=full;
ip_backbone=0x0001,ipoib: ALL=full;
ip_admin=0x0002,ipoib: ALL=full;

qos-policy.conf:
---
qos-ulps
    default: 0 # default SL
    ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF
    ipoib, pkey 0x1: 2 # backbone IP with pkey 0x1
    ipoib, pkey 0x2: 3 # admin IP with pkey 0x2
end-qos-ulps

Assigned IP addresses (in /etc/hosts):
---
10.12.1.4 pichu16-ic0 # default IPoIB network, pkey 0x7FFF
10.13.1.4 pichu16-backbone # IPoIB backbone network, pkey 0x1
10.14.1.4 pichu16-admin # IPoIB admin network, pkey 0x2

10.12.1.10 pichu22-ic0 # default IPoIB network, pkey 0x7FFF
10.13.1.10 pichu22-backbone # IPoIB backbone network, pkey 0x1
10.14.1.10 pichu22-admin # IPoIB admin network, pkey 0x2

Note that the netmask is /16, so the -ic0, -backbone and -admin networks cannot see each other.

IPoIB settings on server side:
---
[r...@pichu16 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
BOOTPROTO=static
IPADDR=10.12.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
BOOTPROTO=static
IPADDR=10.13.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
BOOTPROTO=static
IPADDR=10.14.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

[r...@pichu16 ~]# ip addr show ib0
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:05:6d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.12.1.4/16 brd 10.12.255.255 scope global ib0
    inet 10.13.1.4/16 brd 10.13.255.255 scope global ib0
    inet 10.14.1.4/16 brd 10.14.255.255 scope global ib0
    inet6 fe80::2e90:10:d00:56d/64 scope link
       valid_lft forever preferred_lft forever

IPoIB settings on client side:
---
[r...@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
BOOTPROTO=static
IPADDR=10.12.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
BOOTPROTO=static
IPADDR=10.13.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
BOOTPROTO=static
IPADDR=10.14.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

[r...@pichu22 ~]# ip addr show ib0
48: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:06:79 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.12.1.10/16 brd 10.12.255.255 scope global ib0
    inet 10.13.1.10/16 brd 10.13.255.255 scope global ib0
    inet 10.14.1.10/16 brd 10.14.255.255 scope global ib0
    inet6 fe80::2e90:10:d00:679/64 scope link
       valid_lft forever preferred_lft forever

Iperf servers on server side:
---
Quoting from iperf help:
-B, --bind host bind to host, an interface or multicast address
-s, --server run in server mode

Each iperf server is bound to a dedicated interface as follows:

[r...@pichu16 ~]# iperf -s -B pichu16-backbone
[r...@pichu16 ~]# iperf -s -B pichu16-admin
[r...@pichu16 ~]# iperf -s -B pichu16-ic0

Iperf clients on client side:
Re: RDMAoE verbs questions
On Nov 25, 2009, at 2:25 AM, Or Gerlitz wrote:

>> I was reviewing Mellanox's Open MPI patches for RDMAoE support
>
> Can you point us to the patch series (mail thread or some repository where they sit)?

Here's one thread:

http://www.open-mpi.org/community/lists/devel/2009/11/7063.php

The latest patch in that thread is here:

http://www.open-mpi.org/community/lists/devel/2009/11/7119.php

Here's another thread with a slightly different topic, but with elements of IBoE support in it:

http://www.open-mpi.org/community/lists/devel/2009/11/7120.php

>> 1. It looks like there is a new field on the ibv_port_attr struct: transport. Is it expected that all device drivers will start filling in this value, or is it done in the OF core code somewhere?
>
> Please note that this field isn't present in the distro provided IB stack and hence it is highly recommended to avoid referring to it in your code,

FWIW: we have configure tests checking for this field (just like we have configure tests checking for transport_type, because that wasn't always there, either).

However, it is a little disturbing that, based on this conversation, that field name may change, and therefore we'll have to add *more* configure logic to figure out what exact field to check. The same is true for all the IBoE code -- since none of that code has been approved yet, it's risky to base any code off it. :-\

> at least some of us (...) are for decoupling ompi from ofed, so let's not put sticks in the wheels of that process.

Hear hear (let's remove MPI from OFED! :-) ). But I think that this is a separate issue.

--
Jeff Squyres
jsquy...@cisco.com
Re: RDMAoE verbs questions
On Nov 24, 2009, at 11:52 PM, Jason Gunthorpe wrote:

>> OMPI uses RDMACM (among others), so I'm not sure I follow what you're asking me...?
>
> I think I'm asking you about the non RDMACM stuff in openmpi, ibcm, xoob, etc. I can't tell at a glance if any of them will be safe to run on RDMAoE as-is..

Wait, I think I might have been mistaken. I'm looking through the patches this morning and I don't see the "don't allow host-loopback if it's IBoE" logic. The only places I see the check for real IB vs. IBoE is when deciding to use IBCM or OOB connection schemes (which, as Pasha said, are designed to be [real] IB only).

But, as you mentioned, there definitely are apps that don't use RDMACM and use an out of band (i.e., OOB) mechanism for making IB QP's. They therefore might have similar issues (need to check for real IB vs. IBoE).

Sorry for the confusion... I'm going to chalk it up to the fact that it was late at night when I sent that. :-)

--
Jeff Squyres
jsquy...@cisco.com
Re: QoS settings not mapped correctly per pkey ?
Hi Vincent,

Vincent Ficet wrote:
> Hello,
>
> Following the QoS experiments I carried out yesterday, I wanted to set up 3 IP networks, each one bound to a particular pkey, in order to achieve QoS for each network. Unfortunately, it seems that something is not mapped properly in the ULP layers (vlarb tables are fine).
>
> The settings are as follows:
>
> opensm.conf:
>
> qos_max_vls 8
> qos_high_limit 1
> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
> qos_vlarb_low 0:8,1:1,2:1,3:4,4:0,5:0
> qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Please check section 7 of the QoS_management_in_OpenSM.txt doc. It explains exactly what the values in the VLArb table mean. It also explains the problem that you're seeing. Quoting from there:

  Keep in mind that ports usually transmit packets of size equal to MTU. For instance, for 4KB MTU a single packet will require 64 credits, so in order to achieve effective VL arbitration for packets of 4KB MTU, the weighting values for each VL should be multiples of 64.

-- Yevgeny

> The corresponding VLArb tables are fine on both the server (pichu16) and the client (pichu22):
>
> [r...@pichu22 network-scripts]# smpquery vlarb -D 0
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
>
> [r...@pichu16 ~]# smpquery vlarb -D 0
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
>
> partitions.conf:
> ---
> default=0x7fff,ipoib: ALL=full;
> ip_backbone=0x0001,ipoib: ALL=full;
> ip_admin=0x0002,ipoib: ALL=full;
>
> qos-policy.conf:
> ---
> qos-ulps
>     default: 0 # default SL
>     ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF
>     ipoib, pkey 0x1: 2 # backbone IP with pkey 0x1
>     ipoib, pkey 0x2: 3 # admin IP with pkey 0x2
> end-qos-ulps
>
> Assigned IP addresses (in /etc/hosts):
> ---
> 10.12.1.4 pichu16-ic0 # default IPoIB network, pkey 0x7FFF
> 10.13.1.4 pichu16-backbone # IPoIB backbone network, pkey 0x1
> 10.14.1.4 pichu16-admin # IPoIB admin network, pkey 0x2
>
> 10.12.1.10 pichu22-ic0 # default IPoIB network, pkey 0x7FFF
> 10.13.1.10 pichu22-backbone # IPoIB backbone network, pkey 0x1
> 10.14.1.10 pichu22-admin # IPoIB admin network, pkey 0x2
>
> Note that the netmask is /16, so the -ic0, -backbone and -admin networks cannot see each other.
>
> IPoIB settings on server side:
> ---
> [r...@pichu16 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
> BOOTPROTO=static
> IPADDR=10.12.1.4
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
> BOOTPROTO=static
> IPADDR=10.13.1.4
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
> BOOTPROTO=static
> IPADDR=10.14.1.4
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> [r...@pichu16 ~]# ip addr show ib0
> 4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
>     link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:05:6d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 10.12.1.4/16 brd 10.12.255.255 scope global ib0
>     inet 10.13.1.4/16 brd 10.13.255.255 scope global ib0
>     inet 10.14.1.4/16 brd 10.14.255.255 scope global ib0
>     inet6 fe80::2e90:10:d00:56d/64 scope link
>        valid_lft forever preferred_lft forever
>
> IPoIB settings on client side:
> ---
> [r...@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
> BOOTPROTO=static
> IPADDR=10.12.1.10
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
> BOOTPROTO=static
> IPADDR=10.13.1.10
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
> BOOTPROTO=static
> IPADDR=10.14.1.10
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> [r...@pichu22 ~]# ip addr show ib0
> 48: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
>     link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:06:79 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 10.12.1.10/16 brd 10.12.255.255 scope global ib0
>     inet 10.13.1.10/16 brd 10.13.255.255 scope global ib0
>     inet 10.14.1.10/16 brd 10.14.255.255
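Yevgeny's point about weights can be worked through numerically. The following is only a minimal sketch, assuming that one weight unit corresponds to 64 bytes (which is what "4KB MTU requires 64 credits" implies) and that a weight is an 8-bit value. The names sketch_credits_per_packet and sketch_scaled_weight are made up for illustration and are not part of OpenSM.

```c
#include <assert.h>

/* Assumption taken from the quoted doc: a 4KB packet costs 64 credits,
 * so one credit (one weight unit) covers 64 bytes. */
enum { SKETCH_VLARB_UNIT_BYTES = 64, SKETCH_VLARB_MAX_WEIGHT = 255 };

/* Credits consumed by one packet of mtu_bytes, rounded up. */
static inline unsigned sketch_credits_per_packet(unsigned mtu_bytes)
{
    assert(mtu_bytes > 0);
    return (mtu_bytes + SKETCH_VLARB_UNIT_BYTES - 1) / SKETCH_VLARB_UNIT_BYTES;
}

/* Turn a relative share (1, 2, 3, ...) into a weight that is a multiple
 * of the per-packet credit cost. The result is clamped to the 8-bit
 * maximum, in which case it is no longer an exact multiple. */
static inline unsigned sketch_scaled_weight(unsigned share, unsigned mtu_bytes)
{
    unsigned weight = share * sketch_credits_per_packet(mtu_bytes);
    return weight > SKETCH_VLARB_MAX_WEIGHT ? SKETCH_VLARB_MAX_WEIGHT : weight;
}
```

With a 4KB MTU, shares of 1, 2 and 3 come out as weights 64, 128 and 192, whereas the weights 8, 1, 1, 4 in the configuration above are all smaller than the cost of a single packet.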
[ANNOUNCE] OFED 1.5 rc3 release is available
OFED 1.5-rc3 is available

Notes:
The tarball is available on:
http://www.openfabrics.org/builds/ofed-1.5/release/OFED-1.5-rc3.tgz
To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.5

Vladimir & Tziporet

Release information:

Linux Operating Systems:
- RedHat EL4 up6: 2.6.9-67.ELsmp
- RedHat EL4 up7: 2.6.9-78.ELsmp
- RedHat EL4 up8: 2.6.9-89.ELsmp
- RedHat EL5 up2: 2.6.18-92.el5
- RedHat EL5 up3: 2.6.18-128.el5
- RedHat EL5 up4: 2.6.18-164.el5
- SLES10 SP2: 2.6.16.60-0.21-smp
- SLES10 SP3: 2.6.16.60-0.54-smp
- SLES11: 2.6.27.19-5-default
- OEL 4 up7: 2.6.9-78.ELsmp
- OEL 5 up2: 2.6.18-92.el5
- CentOS5.2: 2.6.18-92.el5
- CentOS5.3: 2.6.18-128.el5
- Fedora Core11: 2.6.29 *
- OpenSuSE 11: 2.6.25.5-1.1 *
- kernel.org: 2.6.29 and 2.6.30 *

* Minimal QA for these versions

Systems:
* x86_64
* x86
* ia64
* ppc64

Main changes from 1.5 rc2:

1. Updated packages:
   libipathverbs-1.2.tar.gz
   ibutils-1.2-0.1.ge8e69b7.tar.gz
   ib-bonding-0.9.0-41.src.rpm
   rnfs-utils-1.1.5-10.OFED.src.rpm
   - Management packages:
     libibumad-1.3.3_20091118_f25da9f.tar.gz
     libibmad-1.3.3_20091118_f25da9f.tar.gz
     opensm-3.3.3_20091118_f25da9f.tar.gz
     infiniband-diags-1.5.3_20091118_f25da9f.tar.gz
   - MPI packages:
     mvapich-1.2.0-3532.src.rpm (Bug fixes)
     mvapich2-1.4-2.src.rpm
     mpitests-3.2-916.src.rpm (Updated IMB and OSU benchmark tests)

2. Bug fixes
   See attached.

Limitations:
- SLES10 SP3 on IA64 is not supported yet

commit bb49143753189b46c195635972e44ee512e5ee5c
Author: Eli Cohen <e...@mellanox.co.il>
Date: Tue Nov 24 17:14:23 2009 +0200

    From 134e8ba45ac49b773ba912726b4faee6a2839fab Mon Sep 17 00:00:00 2001
    From: Eli Cohen <e...@mellanox.co.il>
    Date: Tue, 24 Nov 2009 17:06:35 +0200
    Subject: [PATCH] mlx4: properly mask MGM entry members count

    The members_count field size is 24 bits so mask it properly when reading it.

    Signed-off-by: Eli Cohen <e...@mellanox.co.il>

commit f78229e5764dd2c354adc36c766b166fb21dc24d
Merge: 9b7e7cf... cc39d9d...
Author: Vladimir Sokolovsky <v...@mellanox.co.il>
Date: Tue Nov 24 13:25:48 2009 +0200

    Merge branch 'ofed_1_5' of ssh://v...@sofa.openfabrics.org/~swise/scm/ofed_kernel into ofed_kernel_1_5

commit 9b7e7cf0b17afbc9aa0844e46338101aed0db032
Merge: 9cf376f... ea2b694...
Author: Vladimir Sokolovsky <v...@mellanox.co.il>
Date: Tue Nov 24 10:43:47 2009 +0200

    Merge branch 'ofed_kernel_1_5' of git://git.openfabrics.org/~amirv/ofed_1_5 into ofed_kernel_1_5

commit ea2b694c1a744fb9ece2724dedd43b4b9efc046b
Author: Amir Vadai <am...@mellanox.co.il>
Date: Tue Nov 24 09:33:53 2009 +0200

    sdp: Fixed annoying warning by memtrack

    kzalloc done in sdp_seq_open is freed by the system in a function that memtrack can't see. Therefore it printed false warning.

    Signed-off-by: Amir Vadai <am...@mellanox.co.il>

commit ab7b734e930d22a25f904c47e4f4898718cb150c
Author: Amir Vadai <am...@mellanox.co.il>
Date: Tue Nov 24 09:32:39 2009 +0200

    sdp: fixed BUG1796 - running out of memory on rx

    rcv queue could grow endlessly because minimal RX buffers in QP was set to SDP_MIN_TX_CREDITS + 1 - so there always were credits available for the sender.

    Signed-off-by: Amir Vadai <am...@mellanox.co.il>

commit ae23d8839e308cd28f0d75ce98fb7dc247c9459b
Author: Amir Vadai <am...@mellanox.co.il>
Date: Mon Nov 23 13:29:41 2009 +0200

    sdp: fixed sparse warnings

    Signed-off-by: Amir Vadai <am...@mellanox.co.il>

commit cc39d9d3858d37bf45ac58f880a36b9a2f2a8e8d
Author: Steve Wise <sw...@opengridcomputing.com>
Date: Mon Nov 23 11:46:02 2009 -0600

    cxgb3: pull in page unmap fix.

    Signed-off-by: Steve Wise <sw...@opengridcomputing.com>

commit 9cf376f7bad5bf5e52ca9ad0437c94e03115a30e
Author: Yevgeny Petrilin <yevge...@mellanox.co.il>
Date: Mon Nov 23 18:24:26 2009 +0200

    mlx4_core: Added missing device ID's

    Signed-off-by: Yevgeny Petrilin <yevge...@mellanox.co.il>

commit 29fbed919ea2c66a322d2c4c179b39c481292c93
Author: Yevgeny Petrilin <yevge...@mellanox.co.il>
Date: Mon Nov 23 12:32:58 2009 +0200

    Fixed 'round_jiffies_relative' definition in timer.h backports

    Signed-off-by: Yevgeny Petrilin <yevge...@mellanox.co.il>

commit 33518d251b6893262f7946243d5913511f29499e
Author: Yevgeny Petrilin <yevge...@mellanox.co.il>
Date: Sun Nov 22 15:42:20 2009 +0200

    mlx4_core: Revert commit 2d455685bfb144e60adde8e76737fa820bea9c3c

    Signed-off-by: Yevgeny Petrilin
Re: RDMAoE verbs questions
Jason Gunthorpe wrote:
> On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:
>
>> 2. I am somewhat confused by the overloading of the term transport. It appears that a device will have ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE devices. The only way to tell the difference is to examine the new ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or RDMA_TRANSPORT_RDMAOE.
>
> I haven't seen these patches but this seems poor to me. I think any app that isn't using rdmacm will need patching and support for RDMAOE (certainly all mine will). libibverbs shouldn't overload the existing transport_type checks for something that is not 100% compatible with IB.

Good catch - I agree that the ABI should be 100% backward compatible, and we will fix this. We can add a sysfs option to query the transport type, or add another verb.

Note that an application does not need to query the transport type, but we thought it would be good to know, also from a debug perspective. Thus I think sysfs is the best place.

Opinions?

Tziporet
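The two-level check Jeff describes (device transport_type plus per-port transport) can be illustrated as below. This is a hedged sketch only: the enums are local stand-ins invented for this example, not the libibverbs definitions, since the per-port field discussed in this thread had not been accepted and its name might change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ibv_device.transport_type values. */
enum sketch_dev_transport { SKETCH_DEV_IB, SKETCH_DEV_IWARP };

/* Hypothetical stand-ins for the proposed ibv_port_attr.transport values. */
enum sketch_port_transport { SKETCH_PORT_IB, SKETCH_PORT_RDMAOE };

/* A port counts as "real" IB only if both levels agree: the device
 * reports IB and the port itself is not running RDMAoE. Checking the
 * device-level value alone would misclassify an RDMAoE port as IB. */
static inline bool sketch_port_is_native_ib(enum sketch_dev_transport dev,
                                            enum sketch_port_transport port)
{
    return dev == SKETCH_DEV_IB && port == SKETCH_PORT_IB;
}
```

An application that builds QPs out of band would presumably gate its IB-only connection schemes on a check of this shape.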
Re: RDMAoE verbs questions
On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
> In practice, we have seen that applications *do* need to query the transport type -- at least (real) IB vs. iWARP. Is it your expectation that IB and IBoE will function identically? Can you discuss the transport vs. transport_type questions?

The reason for identifying each specific port with its own transport is to allow devices which may configure each port differently to be distinguishable. ConnectX is one such device.
Re: RDMAoE verbs questions
Jeff Squyres wrote:
> Here's one thread:
> http://www.open-mpi.org/community/lists/devel/2009/11/7063.php

Jeff, looking at the threads you have sent, I didn't find a way to download the patch in a form which can be applied on a source tree. Is there a way to do it through this archive? Are these patches available from some git tree @mellanox or elsewhere? Does anyone have the email address of Vasily Philipov (/vasily_at_[hidden]/)? If yes, can you or Pasha please ask him to send the proposed patch to me or, better, to this list? Many thanks.

Or
Re: QoS settings not mapped correctly per pkey ?
Yevgeny,

> OK, so there are three possible reasons that I can think of:
> 1. Something is wrong in the configuration.
> 2. The application does not saturate the link, thus QoS and the whole VL arbitration thing doesn't kick in.
> 3. There's some bug, somewhere.
>
> Let's start with reason no. 1. Please shut off each of the SLs one by one, and make sure that the application gets zero BW on these SLs. You can do it by mapping SL to VL15:
>
> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15

If I shut down this SL by moving it to VL15, the interfaces stop pinging. This is probably because some IPoIB multicast traffic gets cut off for pkey 0x7fff .. ?

So no results for this one.

> and then
> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

With this setup, and the following QoS settings:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

I get roughly the same values for SL 1 to SL3:

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec
[SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec
[SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec
[SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec
[SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec
[SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec
[SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec

> and then
> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15

Same results as the previous 0,1,15,3,... SL2vl mapping.

> If this part works well, then we will continue to reason no. 2.

In the above tests, I used -P8 to force 8 threads on the client side for each test. I have one quad core CPU (Intel E55400). This makes 24 iperf threads on 4 cores, which __should__ be fine (well I suppose ...)

And regarding reason #3. I still get the error I got yesterday, which you told me was not important because the SL's set in partitions.conf would override what was read from qos-policy.conf in the first place.

Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1)

Thanks for your help.

Vincent
Re: QoS settings not mapped correctly per pkey ?
Vincent Ficet wrote:
> Yevgeny,
>
>> OK, so there are three possible reasons that I can think of:
>> 1. Something is wrong in the configuration.
>> 2. The application does not saturate the link, thus QoS and the whole VL arbitration thing doesn't kick in.
>> 3. There's some bug, somewhere.
>>
>> Let's start with reason no. 1. Please shut off each of the SLs one by one, and make sure that the application gets zero BW on these SLs. You can do it by mapping SL to VL15:
>>
>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15
>
> If I shut down this SL by moving it to VL15, the interfaces stop pinging. This is probably because some IPoIB multicast traffic gets cut off for pkey 0x7fff .. ?

Could be, or because ALL interfaces are mapped to SL1, which is what the results below suggest.

> So no results for this one.
>
>> and then
>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15
>
> With this setup, and the following QoS settings:
>
> qos_max_vls 8
> qos_high_limit 1
> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0
> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15
>
> I get roughly the same values for SL 1 to SL3:

That doesn't look right. You have shut off SL2, so you can't see the same BW for this SL. Looks like there is a problem in configuration (or bug in SM). Have you validated somehow that the interfaces have been mapped to the right SLs?

> [r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec
> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec
> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec
>
> [r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec
> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec
> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec
>
> [r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec
> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec
> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec
>
>> and then
>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15
>
> Same results as the previous 0,1,15,3,... SL2vl mapping.
>
>> If this part works well, then we will continue to reason no. 2.
>
> In the above tests, I used -P8 to force 8 threads on the client side for each test. I have one quad core CPU (Intel E55400). This makes 24 iperf threads on 4 cores, which __should__ be fine (well I suppose ...)

Best would be having one qperf per CPU core, which is 4 qperf's in your case.

What is your subnet setup?

-- Yevgeny

> And regarding reason #3. I still get the error I got yesterday, which you told me was not important because the SL's set in partitions.conf would override what was read from qos-policy.conf in the first place.
>
> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3)
> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2)
> Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1)
>
> Thanks for your help.
>
> Vincent
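The "shut off one SL at a time" test relies on reading a qos_sl2vl line correctly. As a rough aid, here is a minimal sketch that parses such a line and reports which SLs end up on VL15 and therefore should carry no data traffic. The function names are hypothetical and this is not how OpenSM itself parses its configuration.

```c
#include <assert.h>
#include <stdlib.h>

enum { SKETCH_NUM_SL = 16, SKETCH_VL15 = 15 };

/* Parse a comma-separated list of exactly 16 VL numbers (0..15), one per
 * SL, as in "qos_sl2vl 0,1,15,3,...". Returns 0 on success, -1 if the
 * string is malformed. */
static inline int sketch_parse_sl2vl(const char *s, int vl_of_sl[SKETCH_NUM_SL])
{
    for (int sl = 0; sl < SKETCH_NUM_SL; sl++) {
        char *end;
        long vl = strtol(s, &end, 10);

        if (end == s || vl < 0 || vl > 15)
            return -1;
        vl_of_sl[sl] = (int)vl;

        if (sl < SKETCH_NUM_SL - 1) {
            if (*end != ',')
                return -1;
            s = end + 1;
        } else if (*end != '\0') {
            return -1;
        }
    }
    return 0;
}

/* Bitmask with bit N set when SL N is mapped to VL15. */
static inline unsigned sketch_blocked_sl_mask(const int vl_of_sl[SKETCH_NUM_SL])
{
    unsigned mask = 0;

    for (int sl = 0; sl < SKETCH_NUM_SL; sl++)
        if (vl_of_sl[sl] == SKETCH_VL15)
            mask |= 1u << sl;
    return mask;
}
```

For the mapping 0,1,15,3,... used above, the mask has bits 2 and 15 set, so an interface that still gets full bandwidth is most likely not sending on SL2 at all.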
[patch 2/8] ehca - ib_post_send fixes, Updated
Always set ib_post_send()'s bad_wr. Do not report success if an error occurred.

Signed-off-by: Frank Zago <fz...@systemfabricworks.com>

Index: linux-2.6.31/drivers/infiniband/hw/ehca/ehca_reqs.c
===
--- linux-2.6.31.orig/drivers/infiniband/hw/ehca/ehca_reqs.c 2009-11-24 18:59:31.0 -0600
+++ linux-2.6.31/drivers/infiniband/hw/ehca/ehca_reqs.c 2009-11-24 19:49:19.0 -0600
@@ -400,7 +400,6 @@
 
 static inline int post_one_send(struct ehca_qp *my_qp,
 			 struct ib_send_wr *cur_send_wr,
-			 struct ib_send_wr **bad_send_wr,
 			 int hidden)
 {
 	struct ehca_wqe *wqe_p;
@@ -412,8 +411,6 @@
 	wqe_p = ipz_qeit_get_inc(&my_qp->ipz_squeue);
 	if (unlikely(!wqe_p)) {
 		/* too many posted work requests: queue overflow */
-		if (bad_send_wr)
-			*bad_send_wr = cur_send_wr;
 		ehca_err(my_qp->ib_qp.device, "Too many posted WQEs "
 			 "qp_num=%x", my_qp->ib_qp.qp_num);
 		return -ENOMEM;
@@ -433,8 +430,6 @@
 	 */
 	if (unlikely(ret)) {
 		my_qp->ipz_squeue.current_q_offset = start_offset;
-		if (bad_send_wr)
-			*bad_send_wr = cur_send_wr;
 		ehca_err(my_qp->ib_qp.device, "Could not write WQE "
 			 "qp_num=%x", my_qp->ib_qp.qp_num);
 		return -EINVAL;
@@ -448,7 +443,6 @@
 		   struct ib_send_wr **bad_send_wr)
 {
 	struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp);
-	struct ib_send_wr *cur_send_wr;
 	int wqe_cnt = 0;
 	int ret = 0;
 	unsigned long flags;
@@ -457,7 +451,8 @@
 	if (unlikely(my_qp->state < IB_QPS_RTS)) {
 		ehca_err(qp->device, "Invalid QP state qp_state=%d qpn=%x",
 			 my_qp->state, qp->qp_num);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
 	/* LOCK the QUEUE */
@@ -476,24 +471,21 @@
 		struct ib_send_wr circ_wr;
 		memset(&circ_wr, 0, sizeof(circ_wr));
 		circ_wr.opcode = IB_WR_RDMA_READ;
-		post_one_send(my_qp, &circ_wr, NULL, 1); /* ignore retcode */
+		post_one_send(my_qp, &circ_wr, 1); /* ignore retcode */
 		wqe_cnt++;
 		ehca_dbg(qp->device, "posted circ wr qp_num=%x", qp->qp_num);
 		my_qp->message_count = my_qp->packet_count = 0;
 	}
 
 	/* loop processes list of send reqs */
-	for (cur_send_wr = send_wr; cur_send_wr != NULL;
-	     cur_send_wr = cur_send_wr->next) {
-		ret = post_one_send(my_qp, cur_send_wr, bad_send_wr, 0);
+	while(send_wr) {
+		ret = post_one_send(my_qp, send_wr, 0);
 		if (unlikely(ret)) {
-			/* if one or more WQEs were successful, don't fail */
-			if (wqe_cnt)
-				ret = 0;
 			goto post_send_exit0;
 		}
 		wqe_cnt++;
-	} /* eof for cur_send_wr */
+		send_wr = send_wr->next;
+	}
 
 post_send_exit0:
 	iosync(); /* serialize GAL register access */
@@ -503,6 +495,10 @@
 		 my_qp, qp->qp_num, wqe_cnt, ret);
 	my_qp->message_count += wqe_cnt;
 	spin_unlock_irqrestore(&my_qp->spinlock_s, flags);
+
+out:
+	if (ret)
+		*bad_send_wr = send_wr;
 	return ret;
 }
 
--
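The contract this patch enforces (return an error and point bad_wr at the first work request that was not posted, even when earlier ones succeeded) can be exercised outside the kernel. What follows is a user-space sketch under stated assumptions: struct sketch_wr, sketch_post_one and sketch_post_list are invented for illustration, and the "queue" is just a counter of free slots rather than the ehca send queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for a chained send work request. */
struct sketch_wr {
    int id;
    struct sketch_wr *next;
};

/* Pretend to post one WR: consumes a slot, or fails when none is left. */
static inline int sketch_post_one(int *free_slots)
{
    if (*free_slots <= 0)
        return -ENOMEM;
    (*free_slots)--;
    return 0;
}

/* Post a chain of WRs. On failure the error is returned as-is and
 * *bad_wr points at the first WR that was NOT posted, regardless of how
 * many earlier WRs went through. On success *bad_wr is left untouched. */
static inline int sketch_post_list(struct sketch_wr *wr, int *free_slots,
                                   struct sketch_wr **bad_wr)
{
    int ret = 0;

    assert(bad_wr != NULL);
    while (wr) {
        ret = sketch_post_one(free_slots);
        if (ret)
            break;
        wr = wr->next;
    }
    if (ret)
        *bad_wr = wr;
    return ret;
}
```

A caller can then resume from *bad_wr after draining completions, which is not possible if a partial failure is reported as success.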
Re: Re: OFED-1.x SRP's user manual
Pawel Dziekonski wrote:
> On Mon, 16 Nov 2009 at 05:39:45PM +0100, Pawel Dziekonski wrote:
>> Hello,
>>
>> where do I find SRP user manual? There is no manual in OFED distro, I have also checked main website and wiki. Manual is mentioned in OFED/docs/SRPT_README.txt.
>
> Hello again. I got no answers - does the manual exist at all?
>
> P

You can find the following in the ofed-1.5-rc2 (ofed-1.5-2009xxyy-)/docs directory:
a. srp_release_notes.txt -- srp initiator readme
b. SRPT_README.txt -- srp target readme

-vu
RE: [ANNOUNCE] OFED 1.5 rc3 release is available
> Subject: [ANNOUNCE] OFED 1.5 rc3 release is available
>
> OFED 1.5-rc3 is available

Vlad,

What happened to the new uDAPL libraries released last night? I don't see them on the list.

-arlin
Re: RDMAoE verbs questions
On Wed, Nov 25, 2009 at 04:41:08PM +0200, Eli Cohen wrote:
> On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
>> In practice, we have seen that applications *do* need to query the transport type -- at least (real) IB vs. iWARP. Is it your expectation that IB and IBoE will function identically? Can you discuss the transport vs. transport_type questions?
>
> The reason for identifying each specific port with its own transport is to allow devices which may configure each port differently to be distinguishable. ConnectX is one such device.

As far as I can tell there is no reason for a multi-port device to be represented through verbs as a single device with multiple protocols. If you have a single physical chip with two ports and they are running different protocols it seems much cleaner to me to report it to verbs apps as two devices.

Doing this avoids creating compatibility problems.

Jason
Re: RDMAoE verbs questions
Or, the patch is attached.

Regards, Pasha.

Or Gerlitz wrote:
> Jeff Squyres wrote:
>> Here's one thread: http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
> Jeff, looking at the threads you have sent, I didn't find a way to download the patch in a form which can be applied to a source tree. Is there a way to do it through this archive? Are these patches available from some git tree @mellanox or elsewhere? Does anyone have the email address of Vasily Philipov (/vasily_at_[hidden]/)? If yes, can you or Pasha please ask him to send me or, better, this list the proposed patch? Many thanks.
> Or

diff -r 16b0d6d73529 ompi/config/ompi_check_openib.m4
--- a/ompi/config/ompi_check_openib.m4 Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/config/ompi_check_openib.m4 Sun Nov 15 14:58:37 2009 +0200
@@ -13,7 +13,7 @@
 # Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved.
 # Copyright (c) 2006-2007 Los Alamos National Security, LLC. All rights
 #                         reserved.
-# Copyright (c) 2006-2008 Mellanox Technologies. All rights reserved.
+# Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -204,6 +204,21 @@
                         [$1_have_ibcm=1
                          $1_LIBS="-libcm $$1_LIBS"])])
        fi
+
+       # Check support for RDMAoE devices
+       $1_have_rdmaoe=0
+       AC_CHECK_DECLS([RDMA_TRANSPORT_RDMAOE],
+                      [$1_have_rdmaoe=1], [],
+                      [#include <infiniband/verbs.h>])
+
+       AC_MSG_CHECKING([if RDMAoE support is enabled])
+       if test 1 = $$1_have_rdmaoe; then
+           AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable RDMAoE support])
+           AC_MSG_RESULT([yes])
+       else
+           AC_MSG_RESULT([no])
+       fi
+
     ])
 
     # Check to see if infiniband/driver.h works. It is known to
diff -r 16b0d6d73529 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.c Sun Nov 15 14:58:37 2009 +0200
@@ -354,6 +354,13 @@
         }
 #endif
 
+#ifdef OMPI_HAVE_RDMAOE
+        if(RDMA_TRANSPORT_RDMAOE == (openib_btl->ib_port_attr.transport) &&
+                OPAL_PROC_ON_LOCAL_NODE(ompi_proc->proc_flags)) {
+            continue;
+        }
+#endif
+
         if(NULL == (ib_proc = mca_btl_openib_proc_create(ompi_proc))) {
             return OMPI_ERR_OUT_OF_RESOURCE;
         }
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/base.h
--- a/ompi/mca/btl/openib/connect/base.h Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/base.h Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2007-2008 Cisco Systems, Inc. All rights reserved.
  *
+ * Copyright (c) 2009 Mellanox Technologies. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -13,6 +14,17 @@
 
 #include "connect/connect.h"
 
+#ifdef OMPI_HAVE_RDMAOE
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl) \
+        (((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) || \
+        (RDMA_TRANSPORT_RDMAOE == ((btl)->ib_port_attr.transport))) ? \
+        true : false)
+#else
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl) \
+        ((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) ? \
+        true : false)
+#endif
+
 BEGIN_C_DECLS
 
 /*
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2007-2009 Cisco Systems, Inc. All rights reserved.
- * Copyright (c) 2008 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2008-2009 Mellanox Technologies. All rights reserved.
  *
  * $COPYRIGHT$
  *
@@ -653,7 +653,7 @@
        we're in an old version of OFED that is IB only (i.e., no
        iWarp), so we can safely assume that we can use this CPC. */
 #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
-    if (IBV_TRANSPORT_IB != btl->device->ib_dev->transport_type) {
+    if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
         BTL_VERBOSE(("ibcm CPC only supported on InfiniBand; skipped on %s:%d",
                      ibv_get_device_name(btl->device->ib_dev),
                      openib_btl->port_num));
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c Sun Nov 15 14:58:37 2009 +0200
Re: RDMAoE verbs questions
Pavel Shamis (Pasha) wrote:
> The patch is attached

Thanks. This patch basically replaces each check of whether the device transport type is not IB with a check that is true if either that holds or the port transport type is rdmaoe. As Jason and Tziporet noted, the port transport type seems to be a bad, non-compatible/non-interoperable idea, so it should and probably could be avoided.

I see another patch at http://www.open-mpi.org/community/lists/devel/2009/11/7063.php; can you send that one as well?

The patch you sent isn't signed, so I can't address the author in further replies (unless you are the author). It also wasn't generated with the -p option of diff, which shows the affected function for each change; doing so would help in the review.

Or.
Re: QoS settings not mapped correctly per pkey ?
Hello Yevgeny,

OK, so there are three possible reasons that I can think of:
1. Something is wrong in the configuration.
2. The application does not saturate the link, thus QoS and the whole VL arbitration thing doesn't kick in.
3. There's some bug, somewhere.

Let's start with reason no. 1. Please shut off each of the SLs one by one, and make sure that the application gets zero BW on these SLs. You can do it by mapping SL to VL15:

qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15

If I shut down this SL by moving it to VL15, the interfaces stop pinging. This is probably because some IPoIB multicast traffic gets cut off for pkey 0x7fff .. ?

Could be, or because ALL interfaces are mapped to SL1, which is what the results below suggest.

Yes, you are right (see below). So no results for this one.

and then

qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

With this setup, and the following QoS settings:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

I get roughly the same values for SL 1 to SL3:

That doesn't look right. You have shut off SL2, so you can't see the same BW for this SL. Looks like there is a problem in the configuration (or a bug in the SM).
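As an aside, the shut-off test above relies on reading qos_sl2vl positionally: entry i is the VL used by SL i, and data traffic mapped to VL15 is not forwarded. A minimal Python sketch under that reading lists which SLs a given line shuts off. The helper names parse_sl2vl and blocked_sls are made up for illustration and are not part of OpenSM.

```python
def parse_sl2vl(line):
    """Parse a 'qos_sl2vl v0,v1,...,v15' line into a list (index = SL, value = VL)."""
    key, values = line.split(None, 1)
    if key != "qos_sl2vl":
        raise ValueError("not a qos_sl2vl line: %r" % line)
    vls = [int(v) for v in values.split(",")]
    if len(vls) != 16:
        raise ValueError("expected 16 entries, got %d" % len(vls))
    return vls


def blocked_sls(vls):
    """Return the SLs whose data traffic is mapped to VL15 (i.e. shut off)."""
    return [sl for sl, vl in enumerate(vls) if vl == 15]


# The two test mappings quoted in the thread.
first = parse_sl2vl("qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15")
second = parse_sl2vl("qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15")

# SL15 shows up in both lists because the identity mapping already
# sends SL15 to VL15; the interesting entries are SL1 and SL2.
print(blocked_sls(first))
print(blocked_sls(second))
```

With the first mapping only SL1 (plus the always-present SL15) is shut off, so if every IPoIB interface stops pinging, that is consistent with all of them using SL1.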
Yes, that's correct: there could be a configuration issue or a bug in the SM. Current setup and results:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec
[SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec
[SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec
[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec
[SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec
[SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec
[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec
[SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec
[SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec

The -backbone bandwidth should be 0 here. Have you validated somehow that the interfaces have been mapped to the right SLs?
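For comparison with the measured numbers, here is a rough Python sketch of the split the low-priority weights would give if arbitration were in effect. It assumes that saturated, concurrently active VLs share the link in proportion to their qos_vlarb_low weights, which is a simplification of weighted round robin. The function names are hypothetical.

```python
def parse_vlarb(value):
    """Parse '0:1,1:64,...' into {VL: weight}; repeated VLs accumulate."""
    weights = {}
    for entry in value.split(","):
        vl, weight = entry.split(":")
        weights[int(vl)] = weights.get(int(vl), 0) + int(weight)
    return weights


def expected_shares(weights, active_vls):
    """Proportional share per VL, counting only VLs that carry traffic.

    Assumption: every active VL is saturated, so the arbiter is the
    bottleneck. VLs with weight 0 get no share.
    """
    active = {vl: w for vl, w in weights.items() if vl in active_vls and w > 0}
    total = sum(active.values())
    if total == 0:
        return {}
    return {vl: w / total for vl, w in active.items()}


low = parse_vlarb("0:1,1:64,2:128,3:192,4:0,5:0")

# With SL2 moved to VL15, only VL1 (SL1) and VL3 (SL3) can carry the
# IPoIB test traffic from the setup above.
shares = expected_shares(low, active_vls={1, 3})
for vl in sorted(shares):
    print("VL%d: %.0f%% of the link" % (vl, 100 * shares[vl]))
```

Under these assumptions, concurrent saturating flows on SL1 and SL3 would split the link roughly 1:3. This applies only when the flows run at the same time; a single flow on an otherwise idle link is not limited by its weight.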
Two things:

1/ Either the interfaces have not been mapped properly to the right SLs, but given the config files below, I doubt it:

[r...@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
BOOTPROTO=static
IPADDR=10.12.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2000

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
BOOTPROTO=static
IPADDR=10.13.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2000

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
BOOTPROTO=static
IPADDR=10.14.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2000

partitions.conf:
default=0x7fff,ipoib: ALL=full;
ip_backbone=0x0001,ipoib: ALL=full;
ip_admin=0x0002,ipoib: ALL=full;

qos-policy.conf:
qos-ulps
default: 0 # default SL
ipoib, pkey 0x7FFF: 1 # IP with default pkey 0x7FFF
ipoib, pkey 0x1: 2 # backbone IP with pkey 0x1
ipoib, pkey 0x2: 3 # admin IP with pkey 0x2
end-qos-ulps

ib0.8001 maps to pkey 1 (with the MSB set to 1 due to full membership: 0x8001 = (1 << 15) | 1)
ib0.8002 maps to pkey 2 (with the MSB set to 1 due to full membership: 0x8002 = (1 << 15) | 2)

2/ Somehow, the qos policy parsing does not map pkeys as we would expect, which is what the opensm messages would suggest:

Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1)

If the messages are correct and do reflect what opensm is actually doing, this would explain why shutting down SL1 (by moving it to VL15) prevented all interfaces from running.
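The pkey arithmetic above can be checked with a short Python sketch. It assumes the usual 16-bit pkey layout (bit 15 is the full-membership bit, the low 15 bits are the partition number) and the ib0.8001-style child interface names shown in the config files. The rule table repeats qos-policy.conf as posted. Matching rules on the low 15 bits is the expectation expressed in this thread, not a statement of what opensm actually does, and all function names are made up for illustration.

```python
FULL_MEMBER_BIT = 1 << 15  # 0x8000, the MSB of the 16-bit pkey


def full_member_pkey(base_pkey):
    """Return the 16-bit pkey with the full-membership bit set."""
    if not 0 <= base_pkey <= 0x7FFF:
        raise ValueError("base pkey must fit in 15 bits")
    return base_pkey | FULL_MEMBER_BIT


def child_interface_name(parent, base_pkey):
    """Name of the IPoIB child interface, e.g. ib0.8001 for pkey 0x1."""
    return "%s.%04x" % (parent, full_member_pkey(base_pkey))


# qos-ulps rules from the posted qos-policy.conf: base pkey -> SL.
IPOIB_RULES = {0x7FFF: 1, 0x0001: 2, 0x0002: 3}
DEFAULT_SL = 0


def expected_sl(pkey):
    """SL one would expect for IPoIB on this pkey (membership bit ignored)."""
    return IPOIB_RULES.get(pkey & 0x7FFF, DEFAULT_SL)


for base in (0x0001, 0x0002):
    name = child_interface_name("ib0", base)
    print(name, "-> expected SL", expected_sl(full_member_pkey(base)))
```

If the SLs observed on the wire differ from these expected values, the mismatch is in how the policy is applied and not in the interface-to-pkey naming.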
[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec
[SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec
[SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec
[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec
[SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec
[SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec
[r...@pichu22 ~]# while test -e