Re: Slow performance with librspreload.so

2013-09-16 Thread Gandalf Corvotempesta
2013/9/3 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 $ sudo qperf -ub  172.17.0.2 rc_bi_bw rc_lat rc_bw rc_rdma_read_lat
 rc_rdma_read_bw rc_rdma_write_lat rc_rdma_write_bw tcp_lat tcp_bw
 rc_bi_bw:
 bw  =  20.5 Gb/sec
 rc_lat:
 latency  =  15.4 us
 rc_bw:
 bw  =  13.7 Gb/sec
 rc_rdma_read_lat:
 latency  =  12.9 us
 rc_rdma_read_bw:
 bw  =  11.5 Gb/sec
 rc_rdma_write_lat:
 latency  =  15.2 us
 rc_rdma_write_bw:
 bw  =  13.7 Gb/sec
 tcp_lat:
 latency  =  48.8 us
 tcp_bw:
 bw  =  12.5 Gb/sec

 I don't know if they are good for a DDR fabric.

Just to clarify: why am I getting the same bandwidth with
librspreload.so as with plain IPoIB?
Is there something I should check?


Re: Slow performance with librspreload.so

2013-09-03 Thread Gandalf Corvotempesta
2013/9/1 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 What is strange to me is that rsockets is slower than IPoIB and limited
 to roughly 10 Gbit/s, while with IPoIB I'm able to reach 12.5 Gbit/s.

qperf is giving the same strange speed:

FROM NODE1 to NODE2:
$ sudo qperf -ub 77.95.175.106 ud_lat ud_bw
ud_lat:
latency  =  12.5 us
ud_bw:
send_bw  =  12.5 Gb/sec
recv_bw  =  12.5 Gb/sec


FROM NODE1 TO NODE2, slower and with higher latency than the remote host!
$ sudo qperf -ub 172.17.0.1 ud_lat ud_bw
ud_lat:
latency  =  13.8 us
ud_bw:
send_bw  =  11.9 Gb/sec
recv_bw  =  11.9 Gb/sec


How can I check whether this is due to a hardware bottleneck? CPU and RAM are fine.
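
One low-risk hardware check, if it helps: confirm the PCIe link the HCA actually
negotiated (a sketch only; 04:00.0 is the device address from the lspci output
elsewhere in this thread, so substitute each node's own address):

$ sudo lspci -vv -s 04:00.0 | grep -E 'LnkCap|LnkSta'
# LnkCap is what the card supports, LnkSta is what was negotiated; a link that
# trained at x4, or at a lower speed than LnkCap, will cap a DDR HCA well below line rate.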


Re: Slow performance with librspreload.so

2013-09-03 Thread Gandalf Corvotempesta
2013/9/3 Hal Rosenstock h...@dev.mellanox.co.il:
 With mthca, due to a quirk, optimal performance is achieved at 1K MTU.
 OpenSM can reduce the MTU in returned PathRecords to 1K when one end of
 the path is mthca and the actual path MTU is > 1K. This is controlled by the
 enable_quirks config parameter, which defaults to FALSE (don't do this).

I'll try.
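
A sketch of what that might look like, assuming OpenSM reads its options from
/etc/opensm/opensm.conf (the path and the service name vary by distro):

$ sudo opensm --create-config /etc/opensm/opensm.conf      # dump the current defaults
$ sudo sed -i 's/^enable_quirks FALSE/enable_quirks TRUE/' /etc/opensm/opensm.conf
$ sudo service opensm restart                              # restart OpenSM however it is managed locally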

Actually these are my results, from node1 to node2

$ sudo qperf -ub  172.17.0.2 rc_bi_bw rc_lat rc_bw rc_rdma_read_lat
rc_rdma_read_bw rc_rdma_write_lat rc_rdma_write_bw tcp_lat tcp_bw
rc_bi_bw:
bw  =  20.5 Gb/sec
rc_lat:
latency  =  15.4 us
rc_bw:
bw  =  13.7 Gb/sec
rc_rdma_read_lat:
latency  =  12.9 us
rc_rdma_read_bw:
bw  =  11.5 Gb/sec
rc_rdma_write_lat:
latency  =  15.2 us
rc_rdma_write_bw:
bw  =  13.7 Gb/sec
tcp_lat:
latency  =  48.8 us
tcp_bw:
bw  =  12.5 Gb/sec

I don't know if they are good for a DDR fabric.


RE: Slow performance with librspreload.so

2013-09-01 Thread Rupert Dance
My guess is that it will not make a huge difference and that the solution
lies elsewhere.

-Original Message-
From: Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com] 
Sent: Saturday, August 31, 2013 3:51 PM
To: Rupert Dance
Cc: Hefty, Sean; linux-rdma@vger.kernel.org
Subject: Re: Slow performance with librspreload.so

2013/8/31 Rupert Dance rsda...@soft-forge.com:
 The Vendor ID indicates that this is a Voltaire card which probably 
 means it is an older card. Some of the early Mellanox based cards did 
 not support anything bigger than 2048.

Yes, it's an older card used just for this test.
By the way, would increasing the MTU to 4096 give me more performance?




Re: Slow performance with librspreload.so

2013-09-01 Thread Gandalf Corvotempesta
2013/9/1 Rupert Dance rsda...@soft-forge.com:
 My guess is that it will not make a huge difference and that the solution
 lies elsewhere.

What is strange to me is that rsockets is slower than IPoIB and limited
to roughly 10 Gbit/s, while with IPoIB I'm able to reach 12.5 Gbit/s.


Re: Slow performance with librspreload.so

2013-08-31 Thread Gandalf Corvotempesta
2013/8/30 Rupert Dance rsda...@soft-forge.com:
 One way to set or check mtu is with the ibportstate utility:

 Usage: ibportstate [options] dest dr_path|lid|guid portnum [op]
 Supported ops: enable, disable, reset, speed, width, query, down, arm,
 active, vls, mtu, lid, smlid, lmc

I've tried, but the max MTU is 2048 on one device:

$ sudo ibv_devinfo
hca_id: mthca0
transport: InfiniBand (0)
fw_ver: 4.7.600
node_guid: 0008:f104:0398:14cc
sys_image_guid: 0008:f104:0398:14cf
vendor_id: 0x08f1
vendor_part_id: 25208
hw_ver: 0xA0
board_id: VLT0040010001
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 2
port_lmc: 0x00
link_layer: InfiniBand

Any workaround? Maybe a firmware update?


RE: Slow performance with librspreload.so

2013-08-31 Thread Rupert Dance
The Vendor ID indicates that this is a Voltaire card which probably means it
is an older card. Some of the early Mellanox based cards did not support
anything bigger than 2048. 

  00-08-F1   (hex)  Voltaire
  0008F1 (base 16)  Voltaire
9 Hamenofim st.
Herzelia  46725
ISRAEL

Checking for FW updates cannot hurt, but you may well be restricted to 2048.

-Original Message-
From: Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com] 
Sent: Saturday, August 31, 2013 5:21 AM
To: Rupert Dance
Cc: Hefty, Sean; linux-rdma@vger.kernel.org
Subject: Re: Slow performance with librspreload.so

2013/8/30 Rupert Dance rsda...@soft-forge.com:
 One way to set or check mtu is with the ibportstate utility:

 Usage: ibportstate [options] dest dr_path|lid|guid portnum [op] 
 Supported ops: enable, disable, reset, speed, width, query, down, arm, 
 active, vls, mtu, lid, smlid, lmc

I've tried, but the max MTU is 2048 on one device:

$ sudo ibv_devinfo
hca_id: mthca0
transport: InfiniBand (0)
fw_ver: 4.7.600
node_guid: 0008:f104:0398:14cc
sys_image_guid: 0008:f104:0398:14cf
vendor_id: 0x08f1
vendor_part_id: 25208
hw_ver: 0xA0
board_id: VLT0040010001
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 2
port_lmc: 0x00
link_layer: InfiniBand

Any workaround? Maybe a firmware update?




Re: Slow performance with librspreload.so

2013-08-31 Thread Gandalf Corvotempesta
2013/8/31 Rupert Dance rsda...@soft-forge.com:
 The Vendor ID indicates that this is a Voltaire card which probably means it
 is an older card. Some of the early Mellanox based cards did not support
 anything bigger than 2048.

Yes, it's an older card used just for this test.
By the way, would increasing the MTU to 4096 give me more performance?


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/29 Hefty, Sean sean.he...@intel.com:
 12 Gbps on a 20 Gb link actually seems reasonable to me.  I only see around 
 25 Gbps on a 40 Gb link, with raw perftest performance coming in at about 26 
 Gbps.

Is this an rstream limit or an IB limit? I've read somewhere that DDR
should transfer at 16 Gbps.
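
For what it's worth, the 16 Gbps figure is just the standard IB encoding
arithmetic (these numbers are not from this thread):

4X DDR signaling rate:      4 lanes x 5 Gb/s = 20 Gb/s
8b/10b-encoded data rate:   20 Gb/s x 8/10   = 16 Gb/s

Protocol headers and PCIe overhead push the achievable application rate lower
still, which is consistent with Sean's comment above.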

By the way, moving the HBA to the second slot brought me to 12 Gbps on
both hosts.


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/30 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 By the way, moving the HBA to the second slot brought me to 12 Gbps on
 both hosts.

This is great:

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)

[  3] local 172.17.0.1 port 34108 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  12.2 GBytes  10.5 Gbits/sec
$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c
172.17.0.2 -P 2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)

[  4] local 172.17.0.1 port 55323 connected with 172.17.0.2 port 5001
[  3] local 172.17.0.1 port 36579 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.0 sec  7.46 GBytes  6.41 Gbits/sec
[  3]  0.0-10.0 sec  7.46 GBytes  6.41 Gbits/sec
[SUM]  0.0-10.0 sec  14.9 GBytes  12.8 Gbits/sec


With 2 parallel connections I'm able to reach the rated speed with iperf,
the same speed achieved with rstream.
Is iperf affected by the IPoIB MTU size when used with librspreload.so?


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/30 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 Is iperf affected by the IPoIB MTU size when used with librspreload.so?

Another strange issue:

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)

[  3] local 172.17.0.1 port 57926 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  12.2 GBytes  10.4 Gbits/sec

$ iperf -c 172.17.0.2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  648 KByte (default)

[  3] local 172.17.0.1 port 58113 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  14.5 GBytes  12.5 Gbits/sec



Is rsockets slower than IPoIB?


RE: Slow performance with librspreload.so

2013-08-30 Thread Hefty, Sean
 With 2 parallel connections I'm able to reach the rated speed with iperf,
 the same speed achieved with rstream.
 Is iperf affected by the IPoIB MTU size when used with librspreload.so?

Not directly.  The ipoib mtu is usually set based on the mtu of the IB link.  
The latter does affect rsocket performance.  However if the ipoib mtu is 
changed separately from the IB link mtu, it will not affect rsockets.

- Sean


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/30 Hefty, Sean sean.he...@intel.com:
 Not directly.  The ipoib mtu is usually set based on the mtu of the IB link.  
 The latter does affect rsocket performance.  However if the ipoib mtu is 
 changed separately from the IB link mtu, it will not affect rsockets.

Actually I'm going faster with IPoIB than with rsockets.
How can I change the MTU of the IB link?
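
A quick way to see both values being discussed (device and interface names are
the ones appearing in this thread; adjust if yours differ):

$ ip link show ib0 | grep mtu                                  # IPoIB interface MTU
$ sudo ibv_devinfo -d mlx4_0 | grep -E 'max_mtu|active_mtu'    # IB MTU at the HCA port

The active IB MTU is set by the subnet manager, so changing it is normally an
OpenSM or ibportstate matter rather than an ip link one.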


Re: Slow performance with librspreload.so

2013-08-30 Thread Atchley, Scott
On Aug 30, 2013, at 1:38 PM, Hefty, Sean sean.he...@intel.com wrote:

 Another strange issue:
 
 $ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c
 172.17.0.2
 
 Client connecting to 172.17.0.2, TCP port 5001
 TCP window size:  128 KByte (default)
 
 Increasing the window size may improve the results.  E.g. on my systems I go 
 from 17.7 Gbps at 128 KB to 24.3 Gbps for 512 KB.
 
 
 [  3] local 172.17.0.1 port 57926 connected with 172.17.0.2 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  12.2 GBytes  10.4 Gbits/sec
 
 $ iperf -c 172.17.0.2
 
 Client connecting to 172.17.0.2, TCP port 5001
 TCP window size:  648 KByte (default)
 
 [  3] local 172.17.0.1 port 58113 connected with 172.17.0.2 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  14.5 GBytes  12.5 Gbits/sec
 
 Is rsockets slower than IPoIB?
 
 This is surprising to me - just getting 12.5 Gbps out of ipoib is surprising. 
  Does iperf use sendfile()?

I have a pair of nodes connected by QDR via a switch. Using normal IPoIB, a 
single Netperf can reach 18.4 Gb/s if I bind to the same core that the IRQ 
handler is bound to. With four concurrent Netperfs, I can reach 23 Gb/s. This 
is in datagram mode. Connected mode is slower.

I have not tried rsockets on these nodes.

Scott
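
A sketch of the kind of pinning Scott describes (the IRQ number and core are
illustrative; the idea is to find the HCA's interrupt line and run the benchmark
on the core that services it):

$ grep -E 'mlx4|mthca' /proc/interrupts     # find the HCA's interrupt lines
$ cat /proc/irq/53/smp_affinity_list        # core(s) servicing a given IRQ (53 is made up)
$ taskset -c 2 netperf -H 172.17.0.2        # pin netperf to that core
$ cat /sys/class/net/ib0/mode               # datagram vs connected mode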


 
 My results with iperf (version 2.0.5) over ipoib (default configurations) 
 vary considerably based on the TCP window size.  (Note that this is a 40 Gbps 
 link.)  Results summarized:
 
 TCP window size: 27.9 KByte (default)
 [  3]  0.0-10.0 sec  12.8 GBytes  11.0 Gbits/sec
 
 TCP window size:  416 KByte (WARNING: requested  500 KByte)
 [  3]  0.0-10.0 sec  8.19 GBytes  7.03 Gbits/sec
 
 TCP window size:  250 KByte (WARNING: requested  125 KByte)
 [  3]  0.0-10.0 sec  4.99 GBytes  4.29 Gbits/sec
 
 I'm guessing that there are some settings I can change to increase the ipoib 
 performance on my systems.  Using rspreload, I get:
 
 LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 192.168.0.103
 TCP window size:  512 KByte (default)
 [  3]  0.0-10.0 sec  28.3 GBytes  24.3 Gbits/sec
 
 It seems that ipoib bandwidth should be close to rsockets, similar to what 
 you see.  I also don't understand the effect that the TCP window size is 
 having on the results.  The smallest window gives the best bandwidth for 
 ipoib?!
 
 - Sean



RE: Slow performance with librspreload.so

2013-08-30 Thread Hefty, Sean
 Another strange issue:
 
 $ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c
 172.17.0.2
 
 Client connecting to 172.17.0.2, TCP port 5001
 TCP window size:  128 KByte (default)

Increasing the window size may improve the results.  E.g. on my systems I go 
from 17.7 Gbps at 128 KB to 24.3 Gbps for 512 KB.
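
One way to experiment with that (a sketch; the buffer values are illustrative,
and the kernel's net.core.*mem_max limits have to be at least as large as the
-w request for it to take effect):

$ sudo sysctl -w net.core.rmem_max=8388608 net.core.wmem_max=8388608
$ LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2 -w 512K
$ iperf -c 172.17.0.2 -w 512K               # same window over plain IPoIB for comparison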

 
 [  3] local 172.17.0.1 port 57926 connected with 172.17.0.2 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  12.2 GBytes  10.4 Gbits/sec
 
 $ iperf -c 172.17.0.2
 
 Client connecting to 172.17.0.2, TCP port 5001
 TCP window size:  648 KByte (default)
 
 [  3] local 172.17.0.1 port 58113 connected with 172.17.0.2 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  14.5 GBytes  12.5 Gbits/sec
 
 Is rsockets slower than IPoIB?

This is surprising to me - just getting 12.5 Gbps out of ipoib is surprising.  
Does iperf use sendfile()?

My results with iperf (version 2.0.5) over ipoib (default configurations) vary 
considerably based on the TCP window size.  (Note that this is a 40 Gbps link.) 
 Results summarized:

TCP window size: 27.9 KByte (default)
 [  3]  0.0-10.0 sec  12.8 GBytes  11.0 Gbits/sec

TCP window size:  416 KByte (WARNING: requested  500 KByte)
[  3]  0.0-10.0 sec  8.19 GBytes  7.03 Gbits/sec

TCP window size:  250 KByte (WARNING: requested  125 KByte)
 [  3]  0.0-10.0 sec  4.99 GBytes  4.29 Gbits/sec

I'm guessing that there are some settings I can change to increase the ipoib 
performance on my systems.  Using rspreload, I get:

LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 192.168.0.103
TCP window size:  512 KByte (default)
[  3]  0.0-10.0 sec  28.3 GBytes  24.3 Gbits/sec

It seems that ipoib bandwidth should be close to rsockets, similar to what you 
see.  I also don't understand the effect that the TCP window size is having on 
the results.  The smallest window gives the best bandwidth for ipoib?!

- Sean


RE: Slow performance with librspreload.so

2013-08-30 Thread Rupert Dance
One way to set or check mtu is with the ibportstate utility:

Usage: ibportstate [options] dest dr_path|lid|guid portnum [op]
Supported ops: enable, disable, reset, speed, width, query, down, arm,
active, vls, mtu, lid, smlid, lmc
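
For example (a hedged sketch; LID 2 and port 1 are the values from the
ibv_devinfo output in this thread, and the destination is addressed by LID by
default):

$ sudo ibportstate 2 1 query                # dump PortInfo (state, width, speed, ...) for LID 2, port 1
# the 'mtu' op changes the port MTU; see ibportstate(8) for the exact value encoding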

-Original Message-
From: linux-rdma-ow...@vger.kernel.org
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Gandalf Corvotempesta
Sent: Friday, August 30, 2013 12:27 PM
To: Hefty, Sean
Cc: linux-rdma@vger.kernel.org
Subject: Re: Slow performance with librspreload.so

2013/8/30 Hefty, Sean sean.he...@intel.com:
 Not directly.  The ipoib mtu is usually set based on the mtu of the IB
link.  The latter does affect rsocket performance.  However if the ipoib mtu
is changed separately from the IB link mtu, it will not affect rsockets.

Actually I'm going faster with IPoIB than with rsockets.
How can I change the MTU of the IB link?




Fwd: Slow performance with librspreload.so

2013-08-29 Thread Gandalf Corvotempesta
-- Forwarded message --
From: Gandalf Corvotempesta gandalf.corvotempe...@gmail.com
Date: 2013/8/29
Subject: Re: Slow performance with librspreload.so
To: Hefty, Sean sean.he...@intel.com


2013/8/28 Hefty, Sean sean.he...@intel.com:
 If you can provide your PCIe information and the results from running the 
 perftest tools (rdma_bw), that could help as well.

node1 (172.17.0.1 is ip configured on ib0):

$ sudo ./rstream -s 172.17.0.1
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 0.26s  0.40   1.28
4k_lat4k  1   10k 78m 0.17s  3.96   8.28
64k_lat   64k 1   1k  125m0.11s  9.86  53.19
1m_lat1m  1   100 200m0.14s 12.34 679.73
64_bw 64  100k1   12m 0.06s  1.75   0.29
4k_bw 4k  10k 1   78m 0.06s 11.79   2.78
64k_bw64k 1k  1   125m0.09s 12.20  42.97
1m_bw 1m  100 1   200m0.13s 12.78 656.55

$ lspci | grep -i infiniband
04:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe
2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)


node2 (172.17.0.2 is ip configured on ib0):
$ sudo ./rstream -s 172.17.0.2
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 1.10s  0.09   5.49
4k_lat4k  1   10k 78m 0.43s  1.53  21.49
64k_lat   64k 1   1k  125m0.29s  3.64 143.99
1m_lat1m  1   100 200m0.37s  4.531852.70
64_bw 64  100k1   12m 0.42s  0.24   2.12
4k_bw 4k  10k 1   78m 0.16s  4.16   7.87
64k_bw64k 1k  1   125m0.23s  4.49 116.69
1m_bw 1m  100 1   200m0.36s  4.631813.52

$ lspci | grep -i infiniband
02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
(Tavor compatibility mode) (rev 20)
(this is a Voltaire 400Ex-D card)

Same result when using 127.0.0.1 on both hosts, obviously.

I'm unable to run rdma_bw due to the different CPU speeds, and my version
doesn't have the ignore flag.
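
If it helps, the newer perftest tools (ib_send_bw, ib_write_bw) accept a
-F / --CPU-freq flag that tolerates the CPU-frequency check; a sketch, assuming
your perftest build includes them:

node2:$ ib_write_bw -F                      # server
node1:$ ib_write_bw -F 172.17.0.2           # client, pointed at the server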


Re: Slow performance with librspreload.so

2013-08-29 Thread Gandalf Corvotempesta
2013/8/29 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 node1 (172.17.0.1 is ip configured on ib0):

 $ sudo ./rstream -s 172.17.0.1
 name  bytes   xfers   iters   total   time Gb/secusec/xfer
 64_lat64  1   100k12m 0.26s  0.40   1.28
 4k_lat4k  1   10k 78m 0.17s  3.96   8.28
 64k_lat   64k 1   1k  125m0.11s  9.86  53.19
 1m_lat1m  1   100 200m0.14s 12.34 679.73
 64_bw 64  100k1   12m 0.06s  1.75   0.29
 4k_bw 4k  10k 1   78m 0.06s 11.79   2.78
 64k_bw64k 1k  1   125m0.09s 12.20  42.97
 1m_bw 1m  100 1   200m0.13s 12.78 656.55

With standard sockets:

$ sudo ./rstream -s 172.17.0.1 -T s
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 1.07s  0.10   5.36
4k_lat4k  1   10k 78m 0.13s  4.89   6.70
64k_lat   64k 1   1k  125m0.06s 18.38  28.52
1m_lat1m  1   100 200m0.06s 25.90 323.89
64_bw 64  100k1   12m 0.98s  0.10   4.91
4k_bw 4k  10k 1   78m 0.12s  5.29   6.20
64k_bw64k 1k  1   125m0.04s 27.04  19.39
1m_bw 1m  100 1   200m0.05s 31.52 266.14


Re: Slow performance with librspreload.so

2013-08-29 Thread Gandalf Corvotempesta
2013/8/29 Hefty, Sean sean.he...@intel.com:
 12 Gbps on a 20 Gb link actually seems reasonable to me.  I only see around 
 25 Gbps on a 40 Gb link, with raw perftest performance coming in at about 26 
 Gbps.

OK.
I think I've connected the HBA to the wrong PCI-Express slot.
My Dell R200 has 3 PCI-Express slots, but one of them is only x4; I've
probably connected the card to that one.

Tomorrow I'll try moving the HBA to the x8 slot.
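
A back-of-the-envelope check of why the slot matters (per-direction raw figures
for the 2.5 GT/s PCIe generation reported by lspci elsewhere in this thread,
before packet overhead):

PCIe x4 at 2.5 GT/s:   4 lanes x 2.5 GT/s x 8/10 =  8 Gb/s usable
PCIe x8 at 2.5 GT/s:   8 lanes x 2.5 GT/s x 8/10 = 16 Gb/s usable

So an x4 slot of that generation cannot carry a DDR link's 16 Gb/s data rate,
while x8 just about can.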


Slow performance with librspreload.so

2013-08-28 Thread Gandalf Corvotempesta
Hi,
I'm trying the preload library librspreload.so on two directly connected hosts:

host1:$ sudo ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80::::0002:c903:004d:dd45
base lid: 0x1
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
link_layer: InfiniBand

Infiniband device 'mlx4_0' port 2 status:
default gid: fe80::::0002:c903:004d:dd46
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
link_layer: InfiniBand


host2:$ sudo ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80::::0008:f104:0398:14cd
base lid: 0x2
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
link_layer: InfiniBand

Infiniband device 'mthca0' port 2 status:
default gid: fe80::::0008:f104:0398:14ce
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
link_layer: InfiniBand



I've connected just one port between the two hosts.
The port is detected properly as 20 Gb/s (4X DDR), but I'm unable to reach
speeds over 5 Gbit/s:

host1:$ sudo LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so
NPtcp -h 172.17.0.2
Send and receive buffers are 131072 and 131072 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:   1 bytes  17008 times --  1.24 Mbps in   6.13 usec
  1:   2 bytes  16306 times --  2.02 Mbps in   7.56 usec
  2:   3 bytes  13223 times --  3.10 Mbps in   7.38 usec
  3:   4 bytes   9037 times --  4.21 Mbps in   7.25 usec
  4:   6 bytes  10345 times --  6.49 Mbps in   7.05 usec
  5:   8 bytes   7093 times --  7.77 Mbps in   7.85 usec
  6:  12 bytes   7957 times -- 17.08 Mbps in   5.36 usec
  7:  13 bytes   7772 times -- 14.75 Mbps in   6.73 usec
  8:  16 bytes   6861 times -- 16.11 Mbps in   7.58 usec
  9:  19 bytes   7424 times -- 18.91 Mbps in   7.67 usec
 10:  21 bytes   8237 times -- 17.69 Mbps in   9.06 usec
 11:  24 bytes   7361 times -- 19.72 Mbps in   9.28 usec
 12:  27 bytes   7628 times -- 24.14 Mbps in   8.53 usec
 13:  29 bytes   5207 times -- 29.81 Mbps in   7.42 usec
 14:  32 bytes   6504 times -- 29.42 Mbps in   8.30 usec
 15:  35 bytes   6401 times -- 39.08 Mbps in   6.83 usec
 16:  45 bytes   8362 times -- 45.19 Mbps in   7.60 usec
 17:  48 bytes   8774 times -- 46.10 Mbps in   7.94 usec
 18:  51 bytes   8654 times -- 55.19 Mbps in   7.05 usec
 19:  61 bytes   5562 times -- 57.42 Mbps in   8.10 usec
 20:  64 bytes   6068 times -- 72.31 Mbps in   6.75 usec
 21:  67 bytes   7636 times -- 42.93 Mbps in  11.91 usec
 22:  93 bytes   4512 times -- 55.84 Mbps in  12.71 usec
 23:  96 bytes   5246 times -- 60.13 Mbps in  12.18 usec
 24:  99 bytes   5558 times -- 59.49 Mbps in  12.70 usec
 25: 125 bytes   2864 times -- 75.25 Mbps in  12.67 usec
 26: 128 bytes   3913 times -- 75.78 Mbps in  12.89 usec
 27: 131 bytes   3940 times -- 74.77 Mbps in  13.37 usec
 28: 189 bytes   3883 times --113.42 Mbps in  12.71 usec
 29: 192 bytes   5243 times --109.85 Mbps in  13.33 usec
 30: 195 bytes   5038 times --115.66 Mbps in  12.86 usec
 31: 253 bytes   2710 times --146.61 Mbps in  13.17 usec
 32: 256 bytes   3782 times --142.77 Mbps in  13.68 usec
 33: 259 bytes   3683 times --144.75 Mbps in  13.65 usec
 34: 381 bytes   3733 times --201.64 Mbps in  14.42 usec
 35: 384 bytes   4624 times --204.22 Mbps in  14.35 usec
 36: 387 bytes   4665 times --204.65 Mbps in  14.43 usec
 37: 509 bytes   2364 times --265.12 Mbps in  14.65 usec
 38: 512 bytes   3406 times --267.89 Mbps in  14.58 usec
 39: 515 bytes   3442 times --266.90 Mbps in  14.72 usec
 40: 765 bytes   3429 times --381.51 Mbps in  15.30 usec
 41: 768 bytes   4357 times --384.85 Mbps in  15.23 usec
 42: 771 bytes   4387 times --386.35 Mbps in  15.23 usec
 43:1021 bytes   2214 times --495.38 Mbps in  15.72 usec
 44:1024 bytes   3176 times --499.56 Mbps in  15.64 usec
 45:1027 bytes   3203 times --497.19 Mbps in  15.76 usec
 46:1533 bytes   3188 times --692.19 Mbps in  16.90 usec
 47:1536 bytes   3945 times --688.52 Mbps in  17.02 usec
 48:1539 bytes   3920 times --693.85 Mbps in  16.92 usec
 49:2045 bytes   1981 times --858.05 Mbps in  18.18 usec
 50:2048 bytes   2748 times --862.22 Mbps in  18.12 usec
 51:2051 bytes   2761 times --832.50 Mbps in  18.80 usec
 52:3069 bytes   2666 times --   1174.72 Mbps in  19.93 usec
 53:3072 bytes   

RE: Slow performance with librspreload.so

2013-08-28 Thread Hefty, Sean
 I've connected just one port between the two hosts.
 The port is detected properly as 20 Gb/s (4X DDR), but I'm unable to reach
 speeds over 5 Gbit/s:

It's possible that this is falling back to using normal TCP sockets.

Can you run the rstream test program to verify that you can get faster than 5 
Gbps?

rstream without any options will use rsockets directly.  If you use the -T s 
option, it will use standard TCP sockets.  You can use LD_PRELOAD with -T s to 
verify that the preload brings your performance to the same level as using 
rsockets directly.
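
Spelled out, the three comparisons are (server side shown; the client adds
-s <server-address>, and the library path is the one used elsewhere in this
thread):

$ ./rstream                                                          # rsockets directly
$ ./rstream -T s                                                     # standard TCP sockets
$ LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so ./rstream -T s   # preload intercepting the socket calls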

- Sean


Re: Slow performance with librspreload.so

2013-08-28 Thread Gandalf Corvotempesta
2013/8/28 Hefty, Sean sean.he...@intel.com:
 Can you run the rstream test program to verify that you can get faster than 5 
 Gbps?

 rstream without any options will use rsockets directly.  If you use the -T s 
 option, it will use standard TCP sockets.  You can use LD_PRELOAD with -T s 
 to verify that the preload brings your per performance to the same level as 
 using rsockets directly.

5 Gb/s with rstream:

$ sudo ./rstream -s 172.17.0.2
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 0.70s  0.15   3.52
4k_lat4k  1   10k 78m 0.29s  2.23  14.69
64k_lat   64k 1   1k  125m0.21s  4.94 106.07
1m_lat1m  1   100 200m0.30s  5.611495.89
64_bw 64  100k1   12m 0.25s  0.42   1.23
4k_bw 4k  10k 1   78m 0.13s  5.17   6.34
64k_bw64k 1k  1   125m0.19s  5.58  94.03
1m_bw 1m  100 1   200m0.30s  5.641486.53


Re: Slow performance with librspreload.so

2013-08-28 Thread Gandalf Corvotempesta
2013/8/28 Hefty, Sean sean.he...@intel.com:
 Can you explain your environment more?  The performance seems low.

Ubuntu 13.04 Server on both nodes.

node1:

$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz

$ free -m
 total   used   free sharedbuffers cached
Mem: 16022966  15056  0 95534
-/+ buffers/cache:336  15686
Swap:16353  0  16353


node2:

$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) CPU3065  @ 2.33GHz
model name : Intel(R) Xeon(R) CPU3065  @ 2.33GHz

$ free -m
 total   used   free sharedbuffers cached
Mem:  2001718   1282  0 53516
-/+ buffers/cache:148   1853
Swap: 2044  0   2044


RE: Slow performance with librspreload.so

2013-08-28 Thread Hefty, Sean
 2013/8/28 Hefty, Sean sean.he...@intel.com:
  Can you explain your environment more?  The performance seems low.
 
 Ubuntu 13.04 Server on both nodes.
 
 node1:
 
 $ cat /proc/cpuinfo | grep 'model name'
 model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz


 $ cat /proc/cpuinfo | grep 'model name'
 model name : Intel(R) Xeon(R) CPU3065  @ 2.33GHz

Can you run rstream using the loopback address?
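
For example, on one node (a minimal sketch):

$ ./rstream &                               # server
$ ./rstream -s 127.0.0.1                    # client over loopback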


RE: Slow performance with librspreload.so

2013-08-28 Thread Hefty, Sean
 Ubuntu 13.04 Server on both nodes.
 
 node1:
 
 $ cat /proc/cpuinfo | grep 'model name'
 model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz

If you can provide your PCIe information and the results from running the 
perftest tools (rdma_bw), that could help as well.
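
A sketch of one way to gather both (the PCI address is whatever lspci reports
for the HCA, and rdma_bw is the legacy perftest tool, run bare on one node and
pointed at it from the other):

$ lspci | grep -i infiniband                            # find the HCA's PCI address
$ sudo lspci -vv -s 04:00.0 | grep -E 'LnkCap|LnkSta'   # negotiated PCIe width/speed
node2:$ rdma_bw                                         # server
node1:$ rdma_bw 172.17.0.2                              # client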