[ANNOUNCE] uDAPL library: v1 - compat-1.2.15 and v2 - dapl-2.0.25 release

2009-11-25 Thread Davis, Arlin R

A new release of uDAPL 1.2 and 2.0 is available on the OFA download page and in my
git tree.

md5sum: d7b710aebd8bb9b48b6b8982f71f62c7 compat-dapl-1.2.15.tar.gz 
md5sum: 3a14b650bbfbe38243eb4a11157bef5d dapl-2.0.25.tar.gz 

Summary of v1 and v2 changes since last release: 

v1 - Release 1.2.15 fixes: 
v1 - dtest, dapltest: conflict with dapl-2 utils package, change to dapl1, dapltest1
v1 - scm: fix compiler warning, unused variable 

v2 - Release 2.0.25 fixes: 
v2 - winof scm: initialize opt for NODELAY setsockopt 
v2 - winof cma: windows definition for EADDRNOTAVAIL missing 
v2 - scm: client side setsockopt NODELAY fails if data arrives before setting 
v2 - cma: setup_listener Cannot assign requested address 
v2 - common: seg fault in dapl_evd_wait with multi-thread application using CNO's.
v2 - ucm: inbound DREQ/DREP handshake should transition QP.
v2 - winof: Remove duplicate include of comp_channel.cpp from cm.c as it is included in opensm_ucb/device
v2 - winof: Utilize WinOF version of inet_ntop() for Windows OSes which do not support inet_ntop().

Vlad, please pull both new packages into the latest OFED 1.5 build and install the
following:
 
dapl-2.0.25-1 
dapl-utils-2.0.25-1 
dapl-devel-2.0.25-1 
dapl-debuginfo-2.0.25-1 
compat-dapl-1.2.15-1 
compat-dapl-devel-1.2.15-1 

See http://www.openfabrics.org/downloads/dapl/ for more details.

Thanks,

-arlin



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


QoS settings not mapped correctly per pkey ?

2009-11-25 Thread Vincent Ficet
Hello,

Following the QoS experiments I carried out yesterday, I wanted to set
up 3 IP networks, each one bound to a particular pkey, in order to
achieve QoS for each network.
Unfortunately, it seems that something is not mapped properly in the ULP
layers (vlarb tables are fine).

The settings are as follows:

opensm.conf:


qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low  0:8,1:1,2:1,3:4,4:0,5:0
qos_sl2vl  0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
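
The qos_vlarb_low value is a comma-separated list of VL:weight pairs, which is what shows up in the low-priority VL Arbitration table dumped by smpquery below. A minimal C sketch of parsing such a string, assuming the VL fits in 4 bits and the weight in 8 bits; the struct vlarb_entry and the helper parse_vlarb are hypothetical names, not OpenSM's own parser.

```c
#include <stdlib.h>

/* Hypothetical representation of one VLArb table entry. */
struct vlarb_entry {
    unsigned vl;     /* virtual lane, 0..15 */
    unsigned weight; /* weight in 64-byte credit units, 0..255 */
};

/* Parse a string such as "0:8,1:1,2:1,3:4,4:0,5:0" into out[].
 * Returns the number of entries parsed, or -1 on malformed input. */
static int parse_vlarb(const char *spec, struct vlarb_entry *out, int max)
{
    int n = 0;
    const char *p = spec;

    while (*p != '\0') {
        char *end;
        unsigned long vl, w;

        if (n == max)
            return -1;                 /* more pairs than the caller allows */
        vl = strtoul(p, &end, 10);
        if (end == p || *end != ':')
            return -1;                 /* missing VL or ':' separator */
        p = end + 1;
        w = strtoul(p, &end, 10);
        if (end == p || vl > 15 || w > 255)
            return -1;                 /* missing or out-of-range value */
        out[n].vl = (unsigned)vl;
        out[n].weight = (unsigned)w;
        n++;
        p = end;
        if (*p == ',')
            p++;                       /* move on to the next pair */
        else if (*p != '\0')
            return -1;                 /* junk after a pair */
    }
    return n;
}
```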

The corresponding VLArb tables are fine on both the server (pichu16) and
the client (pichu22):

[r...@pichu22 network-scripts]# smpquery vlarb -D 0
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap
8 HighCap 8
# Low priority VL Arbitration Table:
VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |

[r...@pichu16 ~]# smpquery vlarb -D 0
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap
8 HighCap 8
# Low priority VL Arbitration Table:
VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |

partitions.conf:
---

default=0x7fff,ipoib: ALL=full;
ip_backbone=0x0001,ipoib: ALL=full;
ip_admin=0x0002,ipoib: ALL=full;

qos-policy.conf:
---

qos-ulps
default: 0 # default SL
ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF
ipoib, pkey 0x1: 2 # backbone IP with pkey 0x1
ipoib, pkey 0x2: 3 # admin IP with pkey 0x2
end-qos-ulps
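
The qos-ulps section maps IPoIB traffic on each partition to a Service Level and falls back to the default SL otherwise. A sketch of the lookup this policy describes, assuming matching compares only the 15-bit partition number (so a full-member key such as 0x8001 still matches "pkey 0x1"); the rule table and ipoib_sl_for_pkey are hypothetical and not OpenSM's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_BASE_MASK 0x7fffu /* low 15 bits identify the partition */
#define DEFAULT_SL 0           /* "default: 0" in qos-ulps */

/* Hypothetical rule table mirroring the qos-ulps section above. */
struct ulp_rule {
    uint16_t pkey; /* partition key as written in the policy */
    uint8_t sl;    /* service level to assign */
};

static const struct ulp_rule ipoib_rules[] = {
    { 0x7fff, 1 }, /* default partition     */
    { 0x0001, 2 }, /* ip_backbone partition */
    { 0x0002, 3 }, /* ip_admin partition    */
};

/* Return the SL for IPoIB traffic on the given pkey, comparing only
 * the partition number (the membership bit is ignored). */
static uint8_t ipoib_sl_for_pkey(uint16_t pkey)
{
    size_t i;

    for (i = 0; i < sizeof(ipoib_rules) / sizeof(ipoib_rules[0]); i++)
        if ((ipoib_rules[i].pkey & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
            return ipoib_rules[i].sl;
    return DEFAULT_SL;
}
```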

Assigned IP addresses (in /etc/hosts):
-

10.12.1.4   pichu16-ic0 # default IPoIB network, pkey 0x7FFF
10.13.1.4   pichu16-backbone # IPoIB backbone network, pkey 0x1
10.14.1.4   pichu16-admin   # IPoIB admin network, pkey 0x2
10.12.1.10  pichu22-ic0 # default IPoIB network, pkey 0x7FFF
10.13.1.10  pichu22-backbone # IPoIB backbone network, pkey 0x1
10.14.1.10  pichu22-admin   # IPoIB admin network, pkey 0x2

Note that the netmask is /16, so the -ic0, -backbone and -admin networks
cannot see each other.
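
With a /16 mask, 10.12.x.x, 10.13.x.x and 10.14.x.x are three distinct IPv4 subnets. The subnet comparison can be sketched in C with the standard inet_pton(); the helper same_subnet is a hypothetical name used only for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Return 1 if a and b are in the same IPv4 subnet for the given prefix
 * length, 0 if not, and -1 if an address or the prefix is invalid. */
static int same_subnet(const char *a, const char *b, unsigned prefix)
{
    struct in_addr ia, ib;
    uint32_t mask;

    if (prefix > 32 || inet_pton(AF_INET, a, &ia) != 1 ||
        inet_pton(AF_INET, b, &ib) != 1)
        return -1;
    /* Build the netmask in host order; avoid shifting by 32 (undefined). */
    mask = prefix ? 0xffffffffu << (32 - prefix) : 0;
    return (ntohl(ia.s_addr) & mask) == (ntohl(ib.s_addr) & mask);
}
```

For example, same_subnet("10.12.1.4", "10.12.1.10", 16) is 1, while same_subnet("10.12.1.4", "10.13.1.10", 16) is 0.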

IPoIB settings on server side:
--

[r...@pichu16 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
== /etc/sysconfig/network-scripts/ifcfg-ib0 ==
BOOTPROTO=static
IPADDR=10.12.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

== /etc/sysconfig/network-scripts/ifcfg-ib0.8001 ==
BOOTPROTO=static
IPADDR=10.13.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

== /etc/sysconfig/network-scripts/ifcfg-ib0.8002 ==
BOOTPROTO=static
IPADDR=10.14.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
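
The child interface names encode the partition key with the full-membership bit (0x8000) set: partition 0x0001 becomes ib0.8001 and partition 0x0002 becomes ib0.8002. A small C sketch of that naming, assuming the "<parent>.<pkey as 4 hex digits>" convention the IPoIB driver uses for child interfaces; ipoib_child_name is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

#define PKEY_FULL_MEMBER 0x8000u /* membership bit: set = full member */

/* Build the IPoIB child interface name for a partition, e.g. "ib0.8001".
 * Returns the snprintf() result (length of the full name). */
static int ipoib_child_name(char *buf, size_t len, const char *parent,
                            uint16_t partition)
{
    unsigned pkey = (partition & 0x7fffu) | PKEY_FULL_MEMBER;

    return snprintf(buf, len, "%s.%04x", parent, pkey);
}
```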

[r...@pichu16 ~]# ip addr show ib0
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP qlen 256
link/infiniband
80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:05:6d brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.12.1.4/16 brd 10.12.255.255 scope global ib0
inet 10.13.1.4/16 brd 10.13.255.255 scope global ib0
inet 10.14.1.4/16 brd 10.14.255.255 scope global ib0
inet6 fe80::2e90:10:d00:56d/64 scope link
   valid_lft forever preferred_lft forever

IPoIB settings on client side:
--

[r...@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
== /etc/sysconfig/network-scripts/ifcfg-ib0 ==
BOOTPROTO=static
IPADDR=10.12.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

== /etc/sysconfig/network-scripts/ifcfg-ib0.8001 ==
BOOTPROTO=static
IPADDR=10.13.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

== /etc/sysconfig/network-scripts/ifcfg-ib0.8002 ==
BOOTPROTO=static
IPADDR=10.14.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044

[r...@pichu22 ~]# ip addr show ib0
48: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP qlen 256
link/infiniband
80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:06:79 brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.12.1.10/16 brd 10.12.255.255 scope global ib0
inet 10.13.1.10/16 brd 10.13.255.255 scope global ib0
inet 10.14.1.10/16 brd 10.14.255.255 scope global ib0
inet6 fe80::2e90:10:d00:679/64 scope link
   valid_lft forever preferred_lft forever

Iperf servers on server side:
-

Quoting from iperf help:
  -B, --bind <host>   bind to <host>, an interface or multicast address
  -s, --server run in server mode

Each iperf server is bound to a dedicated interface as follows:

[r...@pichu16 ~]# iperf -s -B pichu16-backbone
[r...@pichu16 ~]# iperf -s -B pichu16-admin
[r...@pichu16 ~]# iperf -s -B pichu16-ic0
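
Binding with -B ties each listening socket to one local address, so each server only accepts connections addressed to that network. A minimal POSIX C sketch of the same idea; listen_on is a hypothetical helper and not iperf's source.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket bound to one local IPv4 address,
 * similar in spirit to "iperf -s -B <host>". Port 0 lets the kernel
 * pick a free port. Returns the socket fd, or -1 on error. */
static int listen_on(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because each socket is bound to a specific address rather than the wildcard, three servers can share the same port (iperf's default is 5001), e.g. listen_on("10.13.1.4", 5001) for the backbone address.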

Iperf clients on client side:

Re: RDMAoE verbs questions

2009-11-25 Thread Jeff Squyres

On Nov 25, 2009, at 2:25 AM, Or Gerlitz wrote:


 I was reviewing Mellanox's Open MPI patches for RDMAoE support

Can you point us to the patch series (mail thread or some
repository where they sit)?



Here's one thread:

http://www.open-mpi.org/community/lists/devel/2009/11/7063.php

the latest patch in that thread is here:

http://www.open-mpi.org/community/lists/devel/2009/11/7119.php

Here's another thread on a slightly different topic, but with  
elements of IBoE support in it:


http://www.open-mpi.org/community/lists/devel/2009/11/7120.php


 1. It looks like there is a new field on the ibv_port_attr struct:
 transport. Is it expected that all device drivers will start filling
 in this value, or is it done in the OF core code somewhere?
Please note that this field isn't present in the distro-provided IB
stack and hence it is highly recommended to avoid referring to it in your
code,



FWIW: we have configure tests checking for this field (just like we  
have configure tests checking for transport_type, because that wasn't  
always there, either).  However, it is a little disturbing that based  
on this conversation, that field name may change, and therefore we'll  
have to add *more* configure logic to figure out what exact field to  
check.  The same is true for all the IBoE code -- since none of that  
code has been approved yet, it's risky to base any code off it.  :-\



at least some of us (...) are for decoupling ompi from ofed, so
let's not put sticks in the wheels of that process.



Hear hear (let's remove MPI from OFED! :-) ).  But I think that this  
is a separate issue.


--
Jeff Squyres
jsquy...@cisco.com



Re: RDMAoE verbs questions

2009-11-25 Thread Jeff Squyres

On Nov 24, 2009, at 11:52 PM, Jason Gunthorpe wrote:

 OMPI uses RDMACM (among others), so I'm not sure I follow what you're
 asking me...?

I think I'm asking you about the non-RDMACM stuff in openmpi, ibcm,
xoob, etc. I can't tell at a glance if any of them will be safe to run
on RDMAoE as-is.




Wait, I think I might have been mistaken.  I'm looking through the  
patches this morning and I don't see the "don't allow host-loopback if  
it's IBoE" logic.  The only places I see the check for real IB vs.  
IBoE are when deciding to use IBCM or OOB connection schemes (which, as  
Pasha said, are designed to be [real] IB only).


But, as you mentioned, there definitely are apps that don't use RDMACM  
and use an out of band (i.e., OOB) mechanism for making IB QP's.   
They therefore might have similar issues (need to check for real IB  
vs. IBoE).


Sorry for the confusion... I'm going to chalk it up to the fact that  
it was late at night when I sent that.  :-)


--
Jeff Squyres
jsquy...@cisco.com



Re: QoS settings not mapped correctly per pkey ?

2009-11-25 Thread Yevgeny Kliteynik

Hi Vincent,

Vincent Ficet wrote:

Hello,

Following the QoS experiments I carried out yesterday, I wanted to set
up 3 IP networks, each one bound to a particular pkey, in order to
achieve QoS for each network.
Unfortunately, it seems that something is not mapped properly in the ULP
layers (vlarb tables are fine).

The settings are as follows:

opensm.conf:


qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low  0:8,1:1,2:1,3:4,4:0,5:0
qos_sl2vl  0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15


Please check section 7 of the QoS_management_in_OpenSM.txt
doc. It explains exactly what the values in the VLArb table
mean, and it also explains the problem that you're seeing.
Quoting from there:

 Keep in mind that ports usually transmit packets of
 size equal to MTU. For instance, for 4KB MTU a single
 packet will require 64 credits, so in order to achieve
 effective VL arbitration for packets of 4KB MTU, the
 weighting values for each VL should be multiples of 64.

-- Yevgeny
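
The arithmetic behind that advice, as a small C sketch: one weight unit is a 64-byte credit, so a packet of mtu bytes consumes ceil(mtu / 64) credits (headers ignored, as in the quoted rule of thumb). An IPoIB MTU of 2044 corresponds to a 2048-byte IB MTU, i.e. 32 credits per packet, so weights such as 8, 1, 1 and 4 are each smaller than a single packet. The helper names are hypothetical; round_weight assumes the 8-bit weight field caps values at 255.

```c
#define VLARB_CREDIT_BYTES 64u  /* one VLArb weight unit */
#define VLARB_MAX_WEIGHT   255u /* weight is an 8-bit field */

/* Credits consumed by one packet of `payload` bytes. */
static unsigned credits_per_packet(unsigned payload)
{
    return (payload + VLARB_CREDIT_BYTES - 1) / VLARB_CREDIT_BYTES;
}

/* Round a desired weight up to a whole number of packets, clamped to
 * the largest multiple that still fits in the 8-bit field. A zero
 * weight stays zero (the table entry is then skipped). */
static unsigned round_weight(unsigned weight, unsigned mtu)
{
    unsigned per_pkt = credits_per_packet(mtu);
    unsigned max_mult = (VLARB_MAX_WEIGHT / per_pkt) * per_pkt;
    unsigned w;

    if (weight == 0)
        return 0;
    w = ((weight + per_pkt - 1) / per_pkt) * per_pkt;
    return w > max_mult ? max_mult : w;
}
```

With a 4KB MTU the usable non-zero weights are therefore 64, 128 and 192.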




[ANNOUNCE] OFED 1.5 rc3 release is available

2009-11-25 Thread Vladimir Sokolovsky

OFED 1.5-rc3 is available

Notes:

The tarball is available on:
http://www.openfabrics.org/builds/ofed-1.5/release/OFED-1.5-rc3.tgz

To get the BUILD_ID, run ofed_info.

Please report any issues in bugzilla https://bugs.openfabrics.org/  for
OFED 1.5

Vladimir  Tziporet




Release information:

Linux Operating Systems:
   - RedHat EL4 up6:   2.6.9-67.ELsmp
   - RedHat EL4 up7:   2.6.9-78.ELsmp
   - RedHat EL4 up8:   2.6.9-89.ELsmp
   - RedHat EL5 up2:   2.6.18-92.el5
   - RedHat EL5 up3:   2.6.18-128.el5
   - RedHat EL5 up4:   2.6.18-164.el5
   - SLES10 SP2:   2.6.16.60-0.21-smp
   - SLES10 SP3:   2.6.16.60-0.54-smp
   - SLES11:   2.6.27.19-5-default
   - OEL 4 up7:   2.6.9-78.ELsmp
   - OEL 5 up2:   2.6.18-92.el5
   - CentOS5.2:   2.6.18-92.el5
   - CentOS5.3:   2.6.18-128.el5
   - Fedora Core11:   2.6.29 *
   - OpenSuSE 11:   2.6.25.5-1.1 *
   - kernel.org:   2.6.29 and 2.6.30


 * Minimal QA for these versions

Systems:
 * x86_64
 * x86
 * ia64
 * ppc64


Main changes from 1.5 rc2:

1. Updated packages:
libipathverbs-1.2.tar.gz
ibutils-1.2-0.1.ge8e69b7.tar.gz
ib-bonding-0.9.0-41.src.rpm
rnfs-utils-1.1.5-10.OFED.src.rpm

- Management packages:
libibumad-1.3.3_20091118_f25da9f.tar.gz
libibmad-1.3.3_20091118_f25da9f.tar.gz
opensm-3.3.3_20091118_f25da9f.tar.gz
infiniband-diags-1.5.3_20091118_f25da9f.tar.gz

- MPI packages:
mvapich-1.2.0-3532.src.rpm (Bug fixes)
mvapich2-1.4-2.src.rpm
mpitests-3.2-916.src.rpm (Updated IMB and OSU benchmark tests)

2. Bug fixes
   See attached.


Limitations:

- SLES10 SP3 on IA64 is not supported yet

commit bb49143753189b46c195635972e44ee512e5ee5c
Author: Eli Cohen e...@mellanox.co.il
Date:   Tue Nov 24 17:14:23 2009 +0200

From 134e8ba45ac49b773ba912726b4faee6a2839fab Mon Sep 17 00:00:00 2001
From: Eli Cohen e...@mellanox.co.il
Date: Tue, 24 Nov 2009 17:06:35 +0200
Subject: [PATCH] mlx4: properly mask MGM entry members count

The members_count field size is 24 bits so mask it properly when reading it.

Signed-off-by: Eli Cohen e...@mellanox.co.il
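
A user-space C sketch of the masking this patch describes, assuming (as the commit message implies) that the member count occupies the low 24 bits of a 32-bit big-endian word whose top byte carries other data. ntohl() stands in for the kernel's be32_to_cpu(), and mgm_members_count is a hypothetical helper, not the mlx4 code.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define MGM_MEMBERS_COUNT_MASK 0x00ffffffu /* 24-bit members count */

/* Extract the member count from the raw big-endian word, discarding
 * whatever the top 8 bits hold. */
static uint32_t mgm_members_count(uint32_t raw_be)
{
    return ntohl(raw_be) & MGM_MEMBERS_COUNT_MASK;
}
```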

commit f78229e5764dd2c354adc36c766b166fb21dc24d
Merge: 9b7e7cf... cc39d9d...
Author: Vladimir Sokolovsky v...@mellanox.co.il
Date:   Tue Nov 24 13:25:48 2009 +0200

Merge branch 'ofed_1_5' of 
ssh://v...@sofa.openfabrics.org/~swise/scm/ofed_kernel into ofed_kernel_1_5

commit 9b7e7cf0b17afbc9aa0844e46338101aed0db032
Merge: 9cf376f... ea2b694...
Author: Vladimir Sokolovsky v...@mellanox.co.il
Date:   Tue Nov 24 10:43:47 2009 +0200

Merge branch 'ofed_kernel_1_5' of git://git.openfabrics.org/~amirv/ofed_1_5 
into ofed_kernel_1_5

commit ea2b694c1a744fb9ece2724dedd43b4b9efc046b
Author: Amir Vadai am...@mellanox.co.il
Date:   Tue Nov 24 09:33:53 2009 +0200

sdp: Fixed annoying warning by memtrack

kzalloc done in sdp_seq_open is freed by the system in a function
that memtrack can't see. Therefore it printed a false warning.

Signed-off-by: Amir Vadai am...@mellanox.co.il

commit ab7b734e930d22a25f904c47e4f4898718cb150c
Author: Amir Vadai am...@mellanox.co.il
Date:   Tue Nov 24 09:32:39 2009 +0200

sdp: fixed BUG1796 - running out of memory on rx

rcv queue could grow endlessly because minimal RX buffers in QP
was set to SDP_MIN_TX_CREDITS + 1 - so there always were credits
available for the sender.

Signed-off-by: Amir Vadai am...@mellanox.co.il

commit ae23d8839e308cd28f0d75ce98fb7dc247c9459b
Author: Amir Vadai am...@mellanox.co.il
Date:   Mon Nov 23 13:29:41 2009 +0200

sdp: fixed sparse warnings

Signed-off-by: Amir Vadai am...@mellanox.co.il

commit cc39d9d3858d37bf45ac58f880a36b9a2f2a8e8d
Author: Steve Wise sw...@opengridcomputing.com
Date:   Mon Nov 23 11:46:02 2009 -0600

cxgb3: pull in page unmap fix.

Signed-off-by: Steve Wise sw...@opengridcomputing.com

commit 9cf376f7bad5bf5e52ca9ad0437c94e03115a30e
Author: Yevgeny Petrilin yevge...@mellanox.co.il
Date:   Mon Nov 23 18:24:26 2009 +0200

mlx4_core: Added missing device ID's

Signed-off-by: Yevgeny Petrilin yevge...@mellanox.co.il

commit 29fbed919ea2c66a322d2c4c179b39c481292c93
Author: Yevgeny Petrilin yevge...@mellanox.co.il
Date:   Mon Nov 23 12:32:58 2009 +0200

Fixed 'round_jiffies_relative' definition in timer.h backports

Signed-off-by: Yevgeny Petrilin yevge...@mellanox.co.il

commit 33518d251b6893262f7946243d5913511f29499e
Author: Yevgeny Petrilin yevge...@mellanox.co.il
Date:   Sun Nov 22 15:42:20 2009 +0200

mlx4_core: Revert commit 2d455685bfb144e60adde8e76737fa820bea9c3c

Signed-off-by: Yevgeny Petrilin 

Re: RDMAoE verbs questions

2009-11-25 Thread Tziporet Koren

Jason Gunthorpe wrote:

On Tue, Nov 24, 2009 at 06:23:15PM -0500, Jeff Squyres wrote:

  
2. I am somewhat confused by the overloading of the term transport.   
It appears that a device will have  
ibv_device.transport_type==IBV_TRANSPORT_IB for both IB and RDMAOE  
devices.  The only way to tell the difference is to examine the new  
ibv_port_attr.transport field to see if it is RDMA_TRANSPORT_IB or  
RDMA_TRANSPORT_RDMAOE.
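
The check described above can be sketched in C. Because the port-level field was still under discussion in this thread, the enums below are hypothetical local stand-ins that only mirror the names IBV_TRANSPORT_IB/IBV_TRANSPORT_IWARP and the proposed RDMA_TRANSPORT_IB/RDMA_TRANSPORT_RDMAOE; they are not the real libibverbs declarations.

```c
/* Hypothetical stand-ins; not the real libibverbs declarations. */
enum dev_transport { DEV_TRANSPORT_IB, DEV_TRANSPORT_IWARP };
enum port_transport { PORT_TRANSPORT_IB, PORT_TRANSPORT_RDMAOE };

enum link_kind { LINK_IB, LINK_IBOE, LINK_IWARP };

/* Classify a port. The device-level transport alone cannot separate
 * IB from RDMAoE, since both report the IB transport, so the
 * per-port field must be consulted as well. */
static enum link_kind classify_port(enum dev_transport dev,
                                    enum port_transport port)
{
    if (dev == DEV_TRANSPORT_IWARP)
        return LINK_IWARP;
    return port == PORT_TRANSPORT_RDMAOE ? LINK_IBOE : LINK_IB;
}
```

An application using an out-of-band connection scheme designed for native IB would, under this model, only enable it when classify_port() returns LINK_IB.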



I haven't seen these patches but this seems poor to me. I think any
app that isn't using rdmacm will need patching and support for RDMAOE
(certainly all mine will). libibverbs shouldn't overload the existing
transport_type checks for something that is not 100% compatible with
IB.

  
Good catch - I agree that the ABI should be 100% backward compatible, 
and we will fix this.

We can add a sysfs option to query the transport type, or add another verb.

Note that applications do not need to query the transport type, but we 
thought it could be useful to know, also from a debugging perspective.

Thus I think sysfs is the best place.

Opinions?

Tziporet



Re: RDMAoE verbs questions

2009-11-25 Thread Eli Cohen
On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
 
 In practice, we have seen that applications *do* need to query the
 transport type -- at least (real) IB vs. iWARP.  Is it your
 expectation that IB and IBoE will function identically?
 
 Can you discuss the transport vs. transport_type questions?
 

The reason for identifying each specific port with its own transport
is to allow devices which may configure each port differently to be
distinguishable. ConnectX is one such device.


Re: RDMAoE verbs questions

2009-11-25 Thread Or Gerlitz

Jeff Squyres wrote:
Here's one thread:  
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Jeff, looking at the threads you sent, I didn't find a way to 
download the patch in a form that can be applied to a source tree. Is 
there a way to do it through this archive? Are these patches available 
from some git tree @mellanox or elsewhere? Does anyone have the email 
address of Vasily Philipov (/vasily_at_[hidden]/)? If so, can you or 
Pasha please ask him to send me, or better, this list the proposed patch? 
Many thanks.


Or



Re: QoS settings not mapped correctly per pkey ?

2009-11-25 Thread Vincent Ficet
Yevgeny,

 OK, so there are three possible reasons that I can think of:
 1. Something is wrong in the configuration.
 2. The application does not saturate the link, thus QoS
   and the whole VL arbitration thing doesn't kick in.
 3. There's some bug, somewhere.

 Let's start with reason no. 1.
 Please shut off each of the SLs one by one, and
 make sure that the application gets zero BW on
 these SLs. You can do it by mapping SL to VL15:

 qos_sl2vl  0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15
If I shut down this SL by moving it to VL15, the interfaces stop pinging.
This is probably because some IPoIB multicast traffic gets cut off for
pkey 0x7fff .. ?

So no results for this one.

 and then
 qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

With this setup, and the following QoS settings:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low  0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15
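
Under an idealized model (every SL saturated, only the low-priority table in use, weights honored exactly, at most one active SL per VL), each VL gets bandwidth in proportion to its weight and an SL mapped to VL15 gets none. A C sketch of that model for the settings above; expected_shares is a hypothetical helper, and it ignores the packet-granularity effect described in the 64-byte-credit rule quoted earlier in the thread.

```c
#define NUM_SLS 16
#define NUM_VLS 8 /* qos_max_vls 8 */

/* Fill share[sl] with the fraction of link bandwidth each SL would get
 * if every SL were saturated. SLs mapped to VL15 (or any VL >= NUM_VLS)
 * are treated as carrying no data. A VL counts toward the total only if
 * some SL maps to it. Assumes each active VL carries at most one SL. */
static void expected_shares(const unsigned sl2vl[NUM_SLS],
                            const unsigned weight[NUM_VLS],
                            double share[NUM_SLS])
{
    int used[NUM_VLS] = { 0 };
    unsigned total = 0;
    int sl, vl;

    for (sl = 0; sl < NUM_SLS; sl++)
        if (sl2vl[sl] < NUM_VLS)
            used[sl2vl[sl]] = 1;
    for (vl = 0; vl < NUM_VLS; vl++)
        if (used[vl])
            total += weight[vl];
    for (sl = 0; sl < NUM_SLS; sl++) {
        vl = (int)sl2vl[sl];
        share[sl] = (vl < NUM_VLS && total) ? (double)weight[vl] / total : 0.0;
    }
}
```

For the weights 0:1,1:64,2:128,3:192 with SL2 mapped to VL15, this model predicts about 25% (64/257) for SL1, about 75% (192/257) for SL3 and nothing for SL2, which does not match the near-identical iperf results reported below.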

I get roughly the same values for SL 1 to SL3:

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  6.15 GBytes  5.28 Gbits/sec
[SUM]  0.0-10.0 sec  6.00 GBytes  5.16 Gbits/sec
[SUM]  0.0-10.1 sec  5.38 GBytes  4.59 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  6.09 GBytes  5.23 Gbits/sec
[SUM]  0.0-10.0 sec  6.41 GBytes  5.51 Gbits/sec
[SUM]  0.0-10.0 sec  4.72 GBytes  4.05 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.1 sec  6.96 GBytes  5.92 Gbits/sec
[SUM]  0.0-10.1 sec  5.89 GBytes  5.00 Gbits/sec
[SUM]  0.0-10.0 sec  5.35 GBytes  4.58 Gbits/sec

 and then
 qos_sl2vl  0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15
Same results as the previous 0,1,15,3,... SL2vl mapping.

 If this part works well, then we will continue to
 reason no. 2.
In the above tests, I used -P8 to force 8 threads on the client side for
each test.
I have one quad-core CPU (Intel E55400).
This makes 24 iperf threads on 4 cores, which __should__ be fine (well I
suppose ...)

And regarding reason #3. I still get the error I got yesterday, which
you told me was not important because the SL's set in partitions.conf
would override what was read from qos-policy.conf in the first place.

Nov 25 13:13:05 664690 [373E910] 0x01 - __qos_policy_validate_pkey: ERR
AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS
Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 - __qos_policy_validate_pkey: ERR
AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS
Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 - __qos_policy_validate_pkey: ERR
AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS
Level SL (1)

Thanks for your help.

Vincent


Re: QoS settings not mapped correctly per pkey ?

2009-11-25 Thread Yevgeny Kliteynik

Vincent Ficet wrote:

Yevgeny,

OK, so there are three possible reasons that I can think of:
1. Something is wrong in the configuration.
2. The application does not saturate the link, thus QoS
  and the whole VL arbitration thing doesn't kick in.
3. There's some bug, somewhere.

Let's start with reason no. 1.
Please shut off each of the SLs one by one, and
make sure that the application gets zero BW on
these SLs. You can do it by mapping SL to VL15:

qos_sl2vl  0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15

If I shut down this SL by moving it to VL15, the interfaces stop pinging.
This is probably because some IPoIB multicast traffic gets cut off for
pkey 0x7fff .. ?


Could be, or because ALL interfaces are mapped to
SL1, which is what the results below suggest.


So no results for this one.

and then
qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15


With this setup, and the following QoS settings:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low  0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

I get roughly the same values for SL 1 to SL3:


That doesn't look right.
You have shut off SL2, so you can't see same
BW for this SL. Looks like there is a problem
in configuration (or bug in SM).

Have you validated somehow that the interfaces
have been mapped to the right SLs?


[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  6.15 GBytes  5.28 Gbits/sec
[SUM]  0.0-10.0 sec  6.00 GBytes  5.16 Gbits/sec
[SUM]  0.0-10.1 sec  5.38 GBytes  4.59 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  6.09 GBytes  5.23 Gbits/sec
[SUM]  0.0-10.0 sec  6.41 GBytes  5.51 Gbits/sec
[SUM]  0.0-10.0 sec  4.72 GBytes  4.05 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.1 sec  6.96 GBytes  5.92 Gbits/sec
[SUM]  0.0-10.1 sec  5.89 GBytes  5.00 Gbits/sec
[SUM]  0.0-10.0 sec  5.35 GBytes  4.58 Gbits/sec


and then
qos_sl2vl  0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15

Same results as the previous 0,1,15,3,... SL2vl mapping.

If this part works well, then we will continue to
reason no. 2.

In the above tests, I used -P8 to force 8 threads on the client side for
each test.
I have one quad-core CPU (Intel E55400).
This makes 24 iperf threads on 4 cores, which __should__ be fine (well I
suppose ...)


Best would be having one qperf per CPU core,
which is 4 qperf's in your case.

What is your subnet setup?

-- Yevgeny



And regarding reason #3. I still get the error I got yesterday, which
you told me was not important because the SL's set in partitions.conf
would override what was read from qos-policy.conf in the first place.

Nov 25 13:13:05 664690 [373E910] 0x01 - __qos_policy_validate_pkey: ERR
AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS
Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 - __qos_policy_validate_pkey: ERR
AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS
Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 - __qos_policy_validate_pkey: ERR
AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS
Level SL (1)

Thanks for your help.

Vincent





[patch 2/8] ehca - ib_post_send fixes, Updated

2009-11-25 Thread frank zago

Always set ib_post_send()'s bad_wr.
Do not report success if an error occurred.

Signed-off-by: Frank Zago fz...@systemfabricworks.com
Index: linux-2.6.31/drivers/infiniband/hw/ehca/ehca_reqs.c
===
--- linux-2.6.31.orig/drivers/infiniband/hw/ehca/ehca_reqs.c	2009-11-24 18:59:31.0 -0600
+++ linux-2.6.31/drivers/infiniband/hw/ehca/ehca_reqs.c	2009-11-24 19:49:19.0 -0600

@@ -400,7 +400,6 @@

 static inline int post_one_send(struct ehca_qp *my_qp,
 struct ib_send_wr *cur_send_wr,
-struct ib_send_wr **bad_send_wr,
 int hidden)
 {
struct ehca_wqe *wqe_p;
@@ -412,8 +411,6 @@
wqe_p = ipz_qeit_get_inc(&my_qp->ipz_squeue);
if (unlikely(!wqe_p)) {
/* too many posted work requests: queue overflow */
-   if (bad_send_wr)
-   *bad_send_wr = cur_send_wr;
ehca_err(my_qp->ib_qp.device, "Too many posted WQEs "
 "qp_num=%x", my_qp->ib_qp.qp_num);
return -ENOMEM;
@@ -433,8 +430,6 @@
 */
if (unlikely(ret)) {
my_qp->ipz_squeue.current_q_offset = start_offset;
-   if (bad_send_wr)
-   *bad_send_wr = cur_send_wr;
ehca_err(my_qp->ib_qp.device, "Could not write WQE "
 "qp_num=%x", my_qp->ib_qp.qp_num);
return -EINVAL;
@@ -448,7 +443,6 @@
   struct ib_send_wr **bad_send_wr)
 {
struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp);
-   struct ib_send_wr *cur_send_wr;
int wqe_cnt = 0;
int ret = 0;
unsigned long flags;
@@ -457,7 +451,8 @@
if (unlikely(my_qp->state < IB_QPS_RTS)) {
ehca_err(qp->device, "Invalid QP state  qp_state=%d qpn=%x",
 my_qp->state, qp->qp_num);
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out;
}

/* LOCK the QUEUE */
@@ -476,24 +471,21 @@
struct ib_send_wr circ_wr;
memset(&circ_wr, 0, sizeof(circ_wr));
circ_wr.opcode = IB_WR_RDMA_READ;
-   post_one_send(my_qp, &circ_wr, NULL, 1); /* ignore retcode */
+   post_one_send(my_qp, &circ_wr, 1); /* ignore retcode */
wqe_cnt++;
ehca_dbg(qp->device, "posted circ wr  qp_num=%x", qp->qp_num);
my_qp->message_count = my_qp->packet_count = 0;
}

/* loop processes list of send reqs */
-   for (cur_send_wr = send_wr; cur_send_wr != NULL;
-cur_send_wr = cur_send_wr->next) {
-   ret = post_one_send(my_qp, cur_send_wr, bad_send_wr, 0);
+   while(send_wr) {
+   ret = post_one_send(my_qp, send_wr, 0);
if (unlikely(ret)) {
-   /* if one or more WQEs were successful, don't fail */
-   if (wqe_cnt)
-   ret = 0;
goto post_send_exit0;
}
wqe_cnt++;
-   } /* eof for cur_send_wr */
+   send_wr = send_wr->next;
+   }

 post_send_exit0:
iosync(); /* serialize GAL register access */
@@ -503,6 +495,10 @@
 my_qp, qp->qp_num, wqe_cnt, ret);
my_qp->message_count += wqe_cnt;
spin_unlock_irqrestore(&my_qp->spinlock_s, flags);
+
+out:
+   if (ret)
+   *bad_send_wr = send_wr;
return ret;
 }
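
The calling convention this patch enforces can be modeled in user-space C: walk the work-request list, stop at the first request that cannot be posted, point *bad_wr at it, and return the error even if earlier requests were queued. The struct wr and struct sq types and post_send() are hypothetical stand-ins for struct ib_send_wr, the ehca send queue and ehca_post_send(), not the driver code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_send_wr: a singly linked list. */
struct wr {
    int id;
    struct wr *next;
};

/* Hypothetical send queue with a fixed number of free slots. */
struct sq {
    int free_slots;
    int posted;
};

/* Post every request in the list. On failure, return a negative errno
 * and set *bad_wr to the first request that was NOT posted; success is
 * never reported once an error occurred, even if some WRs were queued. */
static int post_send(struct sq *q, struct wr *wr, struct wr **bad_wr)
{
    int ret = 0;

    while (wr) {
        if (q->free_slots == 0) {
            ret = -ENOMEM; /* queue overflow */
            break;
        }
        q->free_slots--;
        q->posted++;
        wr = wr->next;
    }
    if (ret)
        *bad_wr = wr;
    return ret;
}
```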


--



Re: Re: OFED-1.x SRP's user manual

2009-11-25 Thread Vu Pham

Pawel Dziekonski wrote:

On Mon, 16 Nov 2009 at 05:39:45PM +0100, Pawel Dziekonski wrote:
  

Hello,

where do I find SRP user manual? There is no manual in OFED distro,
I have also checked main website and wiki. Manual is mentioned in
OFED/docs/SRPT_README.txt.



Hello again.

I got no answers - does the manual exist at all?
P


  

You can find them in the ofed-1.5-rc2 (ofed-1.5-2009xxyy-)/docs directory:
a. srp_release_notes.txt  -- srp initiator readme
b. SRPT_README.txt -- srp target readme

-vu


RE: [ANNOUNCE] OFED 1.5 rc3 release is available

2009-11-25 Thread Davis, Arlin R
 
Subject: [ANNOUNCE] OFED 1.5 rc3 release is available

OFED 1.5-rc3 is available


Vlad,

What happened to the new uDAPL libraries released last night?
I don't see them on the list.

-arlin






Re: RDMAoE verbs questions

2009-11-25 Thread Jason Gunthorpe
On Wed, Nov 25, 2009 at 04:41:08PM +0200, Eli Cohen wrote:
 On Wed, Nov 25, 2009 at 09:30:40AM -0500, Jeff Squyres wrote:
  
  In practice, we have seen that applications *do* need to query the
  transport type -- at least (real) IB vs. iWARP.  It is your
  expectation that IB and IBoE will function identically?
  
  Can you discuss the transport vs. transport_type questions?
  
 
 The reason for identifying each specific port with its own transport
 is to allow devices which may configure each port differently to be
 distinguishable. ConnectX is one such device.

As far as I can tell there is no reason for a multi-port device to
be represented through verbs as a single device with multiple
protocols.

If you have a single physical chip with two ports and they are running
different protocols it seems much cleaner to me to report it to verbs
apps as two devices.

Doing this avoids creating compatibility problems.

Jason


Re: RDMAoE verbs questions

2009-11-25 Thread Pavel Shamis (Pasha)

Or,
The patch is attached.

Regards,
Pasha.

Or Gerlitz wrote:

Jeff Squyres wrote:
Here's one thread:  
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Jeff, looking at the threads you sent, I didn't find a way to
download the patch in a form which can be applied to a source tree. Is
there a way to do it through this archive? Are these patches available
from some git tree @mellanox or elsewhere? Does anyone have the email
address of Vasily Philipov (/vasily_at_[hidden]/)? If yes, can you or
Pasha please ask him to send me, or better, this list, the proposed
patch? Many thanks.


Or




diff -r 16b0d6d73529 ompi/config/ompi_check_openib.m4
--- a/ompi/config/ompi_check_openib.m4	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/config/ompi_check_openib.m4	Sun Nov 15 14:58:37 2009 +0200
@@ -13,7 +13,7 @@
 # Copyright (c) 2006-2008 Cisco Systems, Inc.  All rights reserved.
 # Copyright (c) 2006-2007 Los Alamos National Security, LLC.  All rights
 # reserved.
-# Copyright (c) 2006-2008 Mellanox Technologies. All rights reserved.
+# Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved.
 # $COPYRIGHT$
 # 
 # Additional copyrights may follow
@@ -204,6 +204,21 @@
[$1_have_ibcm=1
$1_LIBS="-libcm $$1_LIBS"])])
fi
+		   
+   # Check support for RDMAoE devices
+   $1_have_rdmaoe=0
+   AC_CHECK_DECLS([RDMA_TRANSPORT_RDMAOE],
+  [$1_have_rdmaoe=1], [],
+  [#include <infiniband/verbs.h>])
+
+   AC_MSG_CHECKING([if RDMAoE support is enabled])
+   if test 1 = $$1_have_rdmaoe; then
+AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMAOE], [$$1_have_rdmaoe], [Enable RDMAoE support])
+AC_MSG_RESULT([yes])
+   else
+AC_MSG_RESULT([no])
+   fi
+
   ])
 
 # Check to see if infiniband/driver.h works.  It is known to
diff -r 16b0d6d73529 ompi/mca/btl/openib/btl_openib.c
--- a/ompi/mca/btl/openib/btl_openib.c	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/btl_openib.c	Sun Nov 15 14:58:37 2009 +0200
@@ -354,6 +354,13 @@
 }
 #endif
 
+#ifdef OMPI_HAVE_RDMAOE
+if(RDMA_TRANSPORT_RDMAOE == (openib_btl->ib_port_attr.transport) &&
+OPAL_PROC_ON_LOCAL_NODE(ompi_proc->proc_flags)) {
+continue;
+}
+#endif
+
 if(NULL == (ib_proc = mca_btl_openib_proc_create(ompi_proc))) {
 return OMPI_ERR_OUT_OF_RESOURCE;
 }
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/base.h
--- a/ompi/mca/btl/openib/connect/base.h	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/base.h	Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2007-2008 Cisco Systems, Inc.  All rights reserved.
  *
+ * Copyright (c) 2009  Mellanox Technologies.  All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -13,6 +14,17 @@
 
 #include "connect/connect.h"
 
+#ifdef OMPI_HAVE_RDMAOE
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)   \
+(((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) || \
+(RDMA_TRANSPORT_RDMAOE == ((btl)->ib_port_attr.transport))) ?  \
+true : false)
+#else
+#define BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)   \
+((IBV_TRANSPORT_IB != ((btl)->device->ib_dev->transport_type)) ?   \
+true : false)
+#endif
+
 BEGIN_C_DECLS
 
 /*
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c	Sun Nov 15 14:58:37 2009 +0200
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2007-2009 Cisco Systems, Inc.  All rights reserved.
- * Copyright (c) 2008  Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2008-2009 Mellanox Technologies. All rights reserved.
  *
  * $COPYRIGHT$
  * 
@@ -653,7 +653,7 @@
we're in an old version of OFED that is IB only (i.e., no
iWarp), so we can safely assume that we can use this CPC. */
 #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
-if (IBV_TRANSPORT_IB != btl->device->ib_dev->transport_type) {
+if (BTL_OPENIB_CONNECT_BASE_CHECK_IF_NOT_IB(btl)) {
 BTL_VERBOSE(("ibcm CPC only supported on InfiniBand; skipped on %s:%d",
  ibv_get_device_name(btl->device->ib_dev),
  openib_btl->port_num));
diff -r 16b0d6d73529 ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
--- a/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c	Tue Nov 03 20:00:16 2009 -0800
+++ b/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c	Sun Nov 15 14:58:37 2009 +0200

Re: RDMAoE verbs questions

2009-11-25 Thread Or Gerlitz

Pavel Shamis (Pasha) wrote:

The patch is attached
Thanks. This patch basically replaces the checks that the device transport
type is IB with a check that either the former holds or the port transport
type is RDMAoE. As Jason, Tziporet and noted, the port transport type
seems to be a bad and non-compatible/interoperable idea, so it should,
and probably could, be avoided.


I see another patch @ 
http://www.open-mpi.org/community/lists/devel/2009/11/7063.php
Can you send that one as well? The patch you sent isn't signed, so I
can't address the author in further replies (unless you are the author).
Also, it wasn't generated with the -p option of diff, which would show
the affected function for each change; doing so would help in the
review.


Or.



Re: QoS settings not mapped correctly per pkey ?

2009-11-25 Thread Vincent Ficet
Hello Yevgeny,

 OK, so there are three possible reasons that I can think of:
 1. Something is wrong in the configuration.
 2. The application does not saturate the link, thus QoS
   and the whole VL arbitration thing doesn't kick in.
 3. There's some bug, somewhere.

 Let's start with reason no. 1.
 Please shut off each of the SLs one by one, and
 make sure that the application gets zero BW on
 these SLs. You can do it by mapping SL to VL15:

 qos_sl2vl  0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15
 If I shut down this SL by moving it to VL15, the interfaces stop
 pinging.
 This is probably because some IPoIB multicast traffic gets cut off for
 pkey 0x7fff .. ?

 Could be, or because ALL interfaces are mapped to
 SL1, which is what the results below suggest.
Yes, you are right (see below).

 So no results for this one.
 and then
 qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

 With this setup, and the following QoS settings:

 qos_max_vls    8
 qos_high_limit 1
 qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
 qos_vlarb_low  0:1,1:64,2:128,3:192,4:0,5:0
 qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

 I get roughly the same values for SL 1 to SL3:

 That doesn't look right.
 You have shut off SL2, so you can't see same
 BW for this SL. Looks like there is a problem
 in configuration (or bug in SM).
Yes, that's correct: there could be a configuration issue or a bug in the SM:

Current setup and results:

qos_max_vls    8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low  0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl  0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.1 sec  9.78 GBytes  8.28 Gbits/sec
[SUM]  0.0-10.0 sec  5.69 GBytes  4.89 Gbits/sec
[SUM]  0.0-10.0 sec  4.30 GBytes  3.69 Gbits/sec
[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.2 sec  6.44 GBytes  5.45 Gbits/sec
[SUM]  0.0-10.1 sec  6.64 GBytes  5.66 Gbits/sec
[SUM]  0.0-10.0 sec  6.03 GBytes  5.15 Gbits/sec
[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  5.80 GBytes  4.98 Gbits/sec
[SUM]  0.0-10.0 sec  7.04 GBytes  6.02 Gbits/sec
[SUM]  0.0-10.0 sec  6.60 GBytes  5.67 Gbits/sec

The -backbone bandwidth should be 0 here.


 Have you validated somehow that the interfaces
 have been mapped to the right SLs?
Two things:
1/ Either the interfaces have not been mapped properly to the right SLs,
but given the config files below, I doubt it:

[r...@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
BOOTPROTO=static
IPADDR=10.12.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2000

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
BOOTPROTO=static
IPADDR=10.13.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2000

==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
BOOTPROTO=static
IPADDR=10.14.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2000

partitions.conf:

default=0x7fff,ipoib: ALL=full;
ip_backbone=0x0001,ipoib: ALL=full;
ip_admin=0x0002,ipoib: ALL=full;

qos-policy.conf:

qos-ulps
default: 0 # default SL
ipoib, pkey 0x7FFF: 1 # IP with default pkey 0x7FFF
ipoib, pkey 0x1: 2 # backbone IP with pkey 0x1
ipoib, pkey 0x2: 3 # admin IP with pkey 0x2
end-qos-ulps

ib0.8001 maps to pkey 1 (with the MSB set to 1 for full membership:
0x8001 = (1 << 15) | 1)
ib0.8002 maps to pkey 2 (with the MSB set to 1 for full membership:
0x8002 = (1 << 15) | 2)

2/ Somehow, the qos policy parsing does not map pkeys as we would
expect, which is what the opensm messages would suggest:

Nov 25 13:13:05 664690 [373E910] 0x01 - __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 - __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 - __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1)

If the messages are correct and do reflect what opensm is actually
doing, this would explain why shutting down SL1 (by moving it to VL15)
prevented all interfaces from running.


 [r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
 [SUM]  0.0-10.0 sec  6.15 GBytes  5.28 Gbits/sec
 [SUM]  0.0-10.0 sec  6.00 GBytes  5.16 Gbits/sec
 [SUM]  0.0-10.1 sec  5.38 GBytes  4.59 Gbits/sec

 [r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
 [SUM]  0.0-10.0 sec  6.09 GBytes  5.23 Gbits/sec
 [SUM]  0.0-10.0 sec  6.41 GBytes  5.51 Gbits/sec
 [SUM]  0.0-10.0 sec  4.72 GBytes  4.05 Gbits/sec

 [r...@pichu22 ~]# while test -e