Re: [ceph-users] Help needed porting Ceph to RSockets

2014-02-05 Thread Gandalf Corvotempesta
2013-10-31 Hefty, Sean sean.he...@intel.com:
 Can you please try the attached patch in place of all previous patches?

Any updates on ceph with rsockets?


Re: IPoIB and STP

2013-12-29 Thread Gandalf Corvotempesta
OpenSM will be running on both nodes and both node ports will be
bonded together in active-failover (or active-active, if possible). My
issue is with interconnection links between switches.
Last time I tried, I had issues due to the interface bonding in
active-active.
When using active-active I saw IPoIB running at just a couple of
KBytes/s, while in active-failover it was running at 10-11 Gbit/s.

So, are you suggesting:
OpenSM running on both node1 and node2
Both nodes connected to both switches (1 port per switch)
Both switches interconnected with 2 or more links to have aggregated bandwidth

This wouldn't require any special configuration? How can I ensure that
both links between the switches are used together? Is there any command
to run with OpenSM?

2013/12/29 Gennadiy Nerubayev para...@gmail.com:
 On Sat, Dec 28, 2013 at 6:08 AM, Gandalf Corvotempesta
 gandalf.corvotempe...@gmail.com wrote:

 Hi,
 i'm trying to configure a redundant IPoIB network.
 Obviously, IB switches don't talk IP and so don't have STP support.

 How can I interconnect two different switches with multiple cables and
 avoid loops?

 For example:
 http://pastebin.com/raw.php?i=w8ySRibG

 To have a fully redundant network, I have to interconnect switch1 and
 switch2 with at least 2 cables and enable STP to avoid loops.

 Is OpenSM smart enough to detect multiple links and shut down one port
 automatically (doing the same as STP)?

 InfiniBand uses a switched fabric topology, so regular Ethernet STP
 topology rules do not apply. In this scenario each link, whether direct
 or via the interconnects to the other switch, is just an alternate
 path to the target. If I recall correctly, a running OpenSM instance
 only binds to one port, which is why you'd want those links between
 the switches, so that you have a single fabric. Of course, losing the
 link on that port also means you will have no subnet manager, so you
 might want to consider running it on one more host as well.

 -Gennadiy
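
For reference, a quick way to verify which subnet manager is master and to
see the inter-switch links is the infiniband-diags tools (a sketch only;
the command names are the standard ones, output will vary with the fabric):

$ sudo sminfo                # LID/GUID and state of the master SM
$ sudo ibswitches            # switches visible on the fabric
$ sudo iblinkinfo            # per-port link report; switch-to-switch ports are the ISLs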


Re: rsockets addressing

2013-09-17 Thread Gandalf Corvotempesta
2013/9/17 Yann Droneaud ydrone...@opteya.com:
 Are we talking about an Ethernet network or an InfiniBand fabric?

IB fabric

 In case of InfiniBand, the subnet manager should take care of it.

So, interconnecting 2 IB switches is good.


rsockets addressing

2013-09-16 Thread Gandalf Corvotempesta
Hi to all,
which addressing scheme is used by rsockets to address the remote host?
Is it using IPoIB?

Will I be able to support two redundant fabrics, with failover managed
by OpenSM, when using rsockets?
For example, two nodes connected (with 2 HBAs each) to two different
IB fabrics.
In case of an HBA failure, will rsockets be able to re-establish a
connection using the second fabric?
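
A quick way to check that RDMA CM can resolve and connect over a given IPoIB
address (rsockets resolves addresses through librdmacm in the same way) is
rping, which ships with the librdmacm examples. A sketch, assuming 172.17.0.2
is the server's IPoIB address:

# server side
$ rping -s -a 172.17.0.2 -v
# client side
$ rping -c -a 172.17.0.2 -v -C 5     # 5 ping-pong iterations, verbose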


Re: rsockets addressing

2013-09-16 Thread Gandalf Corvotempesta
2013/9/16 Hefty, Sean sean.he...@intel.com:
 rsockets does not implement failover.  An application would need to 
 reestablish a connection in the case of a failure.  I have not looked to see 
 what it would take to implement failover inside rsockets, and that's not 
 something I would have time to implement anytime soon.

Connection re-establishment is fine, but in case of a port failure, will
IPoIB remap the remote IP to the new port/fabric?


Re: Slow performance with librspreload.so

2013-09-16 Thread Gandalf Corvotempesta
2013/9/3 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 $ sudo qperf -ub  172.17.0.2 rc_bi_bw rc_lat rc_bw rc_rdma_read_lat
 rc_rdma_read_bw rc_rdma_write_lat rc_rdma_write_bw tcp_lat tcp_bw
 rc_bi_bw:
 bw  =  20.5 Gb/sec
 rc_lat:
 latency  =  15.4 us
 rc_bw:
 bw  =  13.7 Gb/sec
 rc_rdma_read_lat:
 latency  =  12.9 us
 rc_rdma_read_bw:
 bw  =  11.5 Gb/sec
 rc_rdma_write_lat:
 latency  =  15.2 us
 rc_rdma_write_bw:
 bw  =  13.7 Gb/sec
 tcp_lat:
 latency  =  48.8 us
 tcp_bw:
 bw  =  12.5 Gb/sec

 I don't know if they are good for a DDR fabric.

Just to clarify: why am I getting the same bandwidth with
librspreload.so as with plain IPoIB?
Should I check something?
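
One thing worth checking is whether the preload is actually taking effect and
the process is really using the RDMA devices. A rough sketch (library path as
used earlier in this thread, not verified here):

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2 -t 30
# in another terminal, while the test runs:
$ sudo lsof -c iperf | grep /dev/infiniband    # expect uverbs/rdma_cm entries if rsockets is in use
$ LD_DEBUG=libs LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -v 2>&1 | grep rspreload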


Re: rsockets addressing

2013-09-16 Thread Gandalf Corvotempesta
2013/9/16 Yann Droneaud ydrone...@opteya.com:
 The InfiniBand fabrics support ISL ... indeed.
[cut]
 You definitely need a link between the switches if you want a
 high-availability fabric topology with HCA ports connected to different
 switches. (Additionally, a switch should have a link to two other
 switches ... then it starts to get complicated, since you have to
 design your fabric topology to match the communication pattern / data
 locality used by your application ...).

I've read somewhere that this is not recommended, because issues on one
switch could also affect the second switch, but doing so would allow me
to use both ports in hot-standby failover.

If one port fails, IPoIB bonding will switch to the second port and, at
the same time, traffic will be routed through the ISL link.

Without the ISL, a port failure will bring down the whole node.

So:
 - switch1 connected to switch2
 - node1 connected to both switches
 - node2 connected to both switches
 - IPoIB on each node with active-passive bonding for each IB port.

How can I create a redundant ISL? Should I just connect two or more ports
and let the subnet manager take care of it automatically, or do I have to
configure something, as with STP on plain Ethernet networks?
Will the ISLs be load-balanced when 2 or more cables are connected?
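
For the node-side active-passive bonding listed above, a minimal iproute2
sketch (interface names and the address are only examples; for IPoIB slaves
only active-backup mode is generally usable, and depending on the kernel the
bonding fail_over_mac option may also be needed; untested here):

$ sudo modprobe bonding
$ sudo ip link add bond0 type bond mode active-backup miimon 100
$ sudo ip link set ib0 down && sudo ip link set ib0 master bond0
$ sudo ip link set ib1 down && sudo ip link set ib1 master bond0
$ sudo ip link set bond0 up
$ sudo ip addr add 172.17.0.1/24 dev bond0
$ cat /proc/net/bonding/bond0        # shows which slave is currently active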


Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-12 Thread Gandalf Corvotempesta
2013/9/10 Andreas Bluemle andreas.blue...@itxperts.de:
 Since I have added these workarounds to my version of the librdmacm
 library, I can at least start up ceph using LD_PRELOAD and end up in
 a healthy ceph cluster state.

Have you seen any performance improvement by using LD_PRELOAD with Ceph?
What throughput are you able to achieve with rsockets and Ceph?


Re: Slow performance with librspreload.so

2013-09-03 Thread Gandalf Corvotempesta
2013/9/1 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 What is strange to me is that rsockets is slower than IPoIB and limited
 to roughly 10 Gbit/s. With IPoIB I'm able to reach 12.5 Gbit/s.

qperf is giving the same strange speed:

FROM NODE1 to NODE2:
$ sudo qperf -ub 77.95.175.106 ud_lat ud_bw
ud_lat:
latency  =  12.5 us
ud_bw:
send_bw  =  12.5 Gb/sec
recv_bw  =  12.5 Gb/sec


FROM NODE1 TO NODE1 (itself), slower and with more latency than to the remote host!
$ sudo qperf -ub 172.17.0.1 ud_lat ud_bw
ud_lat:
latency  =  13.8 us
ud_bw:
send_bw  =  11.9 Gb/sec
recv_bw  =  11.9 Gb/sec


How can I check whether this is due to a hardware bottleneck? CPU and RAM are fine.


Re: Slow performance with librspreload.so

2013-09-03 Thread Gandalf Corvotempesta
2013/9/3 Hal Rosenstock h...@dev.mellanox.co.il:
 With mthca, due to a quirk, optimal performance is achieved at 1K MTU.
 OpenSM can reduce the MTU in returned PathRecords to 1K when one end of
 the path is mthca and the actual path MTU is > 1K. This is controlled by the
 enable_quirks config parameter, which defaults to FALSE (don't do this).

I'll try.

Actually these are my results, from node1 to node2

$ sudo qperf -ub  172.17.0.2 rc_bi_bw rc_lat rc_bw rc_rdma_read_lat
rc_rdma_read_bw rc_rdma_write_lat rc_rdma_write_bw tcp_lat tcp_bw
rc_bi_bw:
bw  =  20.5 Gb/sec
rc_lat:
latency  =  15.4 us
rc_bw:
bw  =  13.7 Gb/sec
rc_rdma_read_lat:
latency  =  12.9 us
rc_rdma_read_bw:
bw  =  11.5 Gb/sec
rc_rdma_write_lat:
latency  =  15.2 us
rc_rdma_write_bw:
bw  =  13.7 Gb/sec
tcp_lat:
latency  =  48.8 us
tcp_bw:
bw  =  12.5 Gb/sec

I don't know if they are good for a DDR fabric.


Re: Slow performance with librspreload.so

2013-09-01 Thread Gandalf Corvotempesta
2013/9/1 Rupert Dance rsda...@soft-forge.com:
 My guess is that it will not make a huge difference and that the solution
 lies elsewhere.

What is strange to me is that rsockets is slower than IPoIB and limited
to roughly 10 Gbit/s. With IPoIB I'm able to reach 12.5 Gbit/s.


Re: Slow performance with librspreload.so

2013-08-31 Thread Gandalf Corvotempesta
2013/8/30 Rupert Dance rsda...@soft-forge.com:
 One way to set or check mtu is with the ibportstate utility:

 Usage: ibportstate [options] dest dr_path|lid|guid portnum [op]
 Supported ops: enable, disable, reset, speed, width, query, down, arm,
 active, vls, mtu, lid, smlid, lmc

I've tried but max MTU is 2048 on one device:

$ sudo ibv_devinfo
hca_id: mthca0
transport: InfiniBand (0)
fw_ver: 4.7.600
node_guid: 0008:f104:0398:14cc
sys_image_guid: 0008:f104:0398:14cf
vendor_id: 0x08f1
vendor_part_id: 25208
hw_ver: 0xA0
board_id: VLT0040010001
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 2
port_lmc: 0x00
link_layer: InfiniBand

Any workaround? Maybe a firmware update?


Re: Slow performance with librspreload.so

2013-08-31 Thread Gandalf Corvotempesta
2013/8/31 Rupert Dance rsda...@soft-forge.com:
 The Vendor ID indicates that this is a Voltaire card which probably means it
 is an older card. Some of the early Mellanox based cards did not support
 anything bigger than 2048.

Yes, it's an older card used just for this test.
By the way, would increasing the MTU to 4096 give me more performance?


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/29 Hefty, Sean sean.he...@intel.com:
 12 Gbps on a 20 Gb link actually seems reasonable to me.  I only see around 
 25 Gbps on a 40 Gb link, with raw perftest performance coming in at about 26 
 Gbps.

Is this an rstream limit or an IB limit? I've read somewhere that DDR
should transfer at 16 Gbit/s.

By the way, moving the HBA to the second slot brought me to 12 Gbit/s on
both hosts.
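
For what it's worth, the usual back-of-the-envelope figure for 4X DDR,
assuming 8b/10b encoding on the link and ignoring protocol overhead:

$ echo $(( 4 * 5 * 8 / 10 ))     # 4 lanes x 5 Gb/s signalling, 8b/10b encoding
16                               # roughly 16 Gbit/s of usable data rate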


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/30 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 By the way, moving the HBA on the second slot, brought me to 12Gbps on
 both hosts.

This is great:

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)

[  3] local 172.17.0.1 port 34108 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  12.2 GBytes  10.5 Gbits/sec
$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c
172.17.0.2 -P 2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)

[  4] local 172.17.0.1 port 55323 connected with 172.17.0.2 port 5001
[  3] local 172.17.0.1 port 36579 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.0 sec  7.46 GBytes  6.41 Gbits/sec
[  3]  0.0-10.0 sec  7.46 GBytes  6.41 Gbits/sec
[SUM]  0.0-10.0 sec  14.9 GBytes  12.8 Gbits/sec


With 2 parallel connections I'm able to reach the full rate with iperf,
the same speed achieved with rstream.
Is iperf affected by the IPoIB MTU size when used with librspreload.so?


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/30 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 Is iperf affected by IPoIB MTU size when used with librspreload.so ?

Another strange issue:

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)

[  3] local 172.17.0.1 port 57926 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  12.2 GBytes  10.4 Gbits/sec

$ iperf -c 172.17.0.2

Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  648 KByte (default)

[  3] local 172.17.0.1 port 58113 connected with 172.17.0.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  14.5 GBytes  12.5 Gbits/sec



Is rsockets slower than IPoIB?


Re: Slow performance with librspreload.so

2013-08-30 Thread Gandalf Corvotempesta
2013/8/30 Hefty, Sean sean.he...@intel.com:
 Not directly.  The ipoib mtu is usually set based on the mtu of the IB link.  
 The latter does affect rsocket performance.  However if the ipoib mtu is 
 changed separately from the IB link mtu, it will not affect rsockets.

Actually I'm going faster with IPoIB than with rsockets.
How can I change the MTU of the IB link?
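
One commonly mentioned knob (a sketch only, not verified on this setup) is
the partition definition that OpenSM distributes, which sets the MTU of the
IPoIB broadcast group; the mtu code 4 means 2048 and 5 means 4096, and it
cannot exceed what the HCA reports as max_mtu (2048 for the mthca card seen
elsewhere in this thread). Paths and the restart command depend on the distro:

$ sudo sh -c 'echo "Default=0x7fff, ipoib, mtu=5 : ALL=full;" >> /etc/opensm/partitions.conf'
$ sudo service opensm restart
$ ibv_devinfo | grep -E 'max_mtu|active_mtu'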


Fwd: Slow performance with librspreload.so

2013-08-29 Thread Gandalf Corvotempesta
-- Forwarded message --
From: Gandalf Corvotempesta gandalf.corvotempe...@gmail.com
Date: 2013/8/29
Subject: Re: Slow performance with librspreload.so
To: Hefty, Sean sean.he...@intel.com


2013/8/28 Hefty, Sean sean.he...@intel.com:
 If you can provide your PCIe information and the results from running the 
 perftest tools (rdma_bw), that could help as well.

node1 (172.17.0.1 is the IP configured on ib0):

$ sudo ./rstream -s 172.17.0.1
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 0.26s  0.40   1.28
4k_lat4k  1   10k 78m 0.17s  3.96   8.28
64k_lat   64k 1   1k  125m0.11s  9.86  53.19
1m_lat1m  1   100 200m0.14s 12.34 679.73
64_bw 64  100k1   12m 0.06s  1.75   0.29
4k_bw 4k  10k 1   78m 0.06s 11.79   2.78
64k_bw64k 1k  1   125m0.09s 12.20  42.97
1m_bw 1m  100 1   200m0.13s 12.78 656.55

$ lspci | grep -i infiniband
04:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe
2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)


node2 (172.17.0.2 is the IP configured on ib0):
$ sudo ./rstream -s 172.17.0.2
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 1.10s  0.09   5.49
4k_lat4k  1   10k 78m 0.43s  1.53  21.49
64k_lat   64k 1   1k  125m0.29s  3.64 143.99
1m_lat1m  1   100 200m0.37s  4.531852.70
64_bw 64  100k1   12m 0.42s  0.24   2.12
4k_bw 4k  10k 1   78m 0.16s  4.16   7.87
64k_bw64k 1k  1   125m0.23s  4.49 116.69
1m_bw 1m  100 1   200m0.36s  4.631813.52

$ lspci | grep -i infiniband
02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
(Tavor compatibility mode) (rev 20)
(this is a Voltaire 400Ex-D card)

Same result by using 127.0.0.1 on both hosts, obviously.

I'm unable to run rdma_bw due to the different CPU speeds, and my version
doesn't have the ignore flag.


Re: Slow performance with librspreload.so

2013-08-29 Thread Gandalf Corvotempesta
2013/8/29 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 node1 (172.17.0.1 is ip configured on ib0):

 $ sudo ./rstream -s 172.17.0.1
 name  bytes   xfers   iters   total   time Gb/secusec/xfer
 64_lat64  1   100k12m 0.26s  0.40   1.28
 4k_lat4k  1   10k 78m 0.17s  3.96   8.28
 64k_lat   64k 1   1k  125m0.11s  9.86  53.19
 1m_lat1m  1   100 200m0.14s 12.34 679.73
 64_bw 64  100k1   12m 0.06s  1.75   0.29
 4k_bw 4k  10k 1   78m 0.06s 11.79   2.78
 64k_bw64k 1k  1   125m0.09s 12.20  42.97
 1m_bw 1m  100 1   200m0.13s 12.78 656.55

With standard sockets:

$ sudo ./rstream -s 172.17.0.1 -T s
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 1.07s  0.10   5.36
4k_lat4k  1   10k 78m 0.13s  4.89   6.70
64k_lat   64k 1   1k  125m0.06s 18.38  28.52
1m_lat1m  1   100 200m0.06s 25.90 323.89
64_bw 64  100k1   12m 0.98s  0.10   4.91
4k_bw 4k  10k 1   78m 0.12s  5.29   6.20
64k_bw64k 1k  1   125m0.04s 27.04  19.39
1m_bw 1m  100 1   200m0.05s 31.52 266.14


Re: Slow performance with librspreload.so

2013-08-29 Thread Gandalf Corvotempesta
2013/8/29 Hefty, Sean sean.he...@intel.com:
 12 Gbps on a 20 Gb link actually seems reasonable to me.  I only see around 
 25 Gbps on a 40 Gb link, with raw perftest performance coming in at about 26 
 Gbps.

OK.
I think I've connected the HBA to the wrong PCI Express slot.
I have a Dell R200 that has 3 PCI Express slots, but one of them is only
x4; I've probably connected the card to that one.

Tomorrow I'll try to connect the HBA to the x8 slot.
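
A quick way to check what the card actually negotiated, without opening the
case (use the PCI address shown by lspci | grep -i infiniband; 02:00.0 and
04:00.0 in this thread):

$ sudo lspci -vv -s 02:00.0 | grep -E 'LnkCap|LnkSta'
# LnkCap is what the card supports (e.g. 2.5GT/s, x8); LnkSta is the
# negotiated link. A DDR HCA behind a 2.5GT/s x4 link tops out around
# 8 Gbit/s of usable PCIe bandwidth, well below the IB rate.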


Slow performance with librspreload.so

2013-08-28 Thread Gandalf Corvotempesta
Hi,
I'm trying the preload library librspreload.so on two directly connected hosts:

host1:$ sudo ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80::::0002:c903:004d:dd45
base lid: 0x1
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
link_layer: InfiniBand

Infiniband device 'mlx4_0' port 2 status:
default gid: fe80::::0002:c903:004d:dd46
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
link_layer: InfiniBand


host2:$ sudo ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80::::0008:f104:0398:14cd
base lid: 0x2
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
link_layer: InfiniBand

Infiniband device 'mthca0' port 2 status:
default gid: fe80::::0008:f104:0398:14ce
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
link_layer: InfiniBand



I've connected just one port between the two hosts.
The port is detected properly as 20 Gb/s (4X DDR), but I'm unable to reach
speeds over 5 Gbit/s:

host1:$ sudo LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so
NPtcp -h 172.17.0.2
Send and receive buffers are 131072 and 131072 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:   1 bytes  17008 times --  1.24 Mbps in   6.13 usec
  1:   2 bytes  16306 times --  2.02 Mbps in   7.56 usec
  2:   3 bytes  13223 times --  3.10 Mbps in   7.38 usec
  3:   4 bytes   9037 times --  4.21 Mbps in   7.25 usec
  4:   6 bytes  10345 times --  6.49 Mbps in   7.05 usec
  5:   8 bytes   7093 times --  7.77 Mbps in   7.85 usec
  6:  12 bytes   7957 times -- 17.08 Mbps in   5.36 usec
  7:  13 bytes   7772 times -- 14.75 Mbps in   6.73 usec
  8:  16 bytes   6861 times -- 16.11 Mbps in   7.58 usec
  9:  19 bytes   7424 times -- 18.91 Mbps in   7.67 usec
 10:  21 bytes   8237 times -- 17.69 Mbps in   9.06 usec
 11:  24 bytes   7361 times -- 19.72 Mbps in   9.28 usec
 12:  27 bytes   7628 times -- 24.14 Mbps in   8.53 usec
 13:  29 bytes   5207 times -- 29.81 Mbps in   7.42 usec
 14:  32 bytes   6504 times -- 29.42 Mbps in   8.30 usec
 15:  35 bytes   6401 times -- 39.08 Mbps in   6.83 usec
 16:  45 bytes   8362 times -- 45.19 Mbps in   7.60 usec
 17:  48 bytes   8774 times -- 46.10 Mbps in   7.94 usec
 18:  51 bytes   8654 times -- 55.19 Mbps in   7.05 usec
 19:  61 bytes   5562 times -- 57.42 Mbps in   8.10 usec
 20:  64 bytes   6068 times -- 72.31 Mbps in   6.75 usec
 21:  67 bytes   7636 times -- 42.93 Mbps in  11.91 usec
 22:  93 bytes   4512 times -- 55.84 Mbps in  12.71 usec
 23:  96 bytes   5246 times -- 60.13 Mbps in  12.18 usec
 24:  99 bytes   5558 times -- 59.49 Mbps in  12.70 usec
 25: 125 bytes   2864 times -- 75.25 Mbps in  12.67 usec
 26: 128 bytes   3913 times -- 75.78 Mbps in  12.89 usec
 27: 131 bytes   3940 times -- 74.77 Mbps in  13.37 usec
 28: 189 bytes   3883 times --113.42 Mbps in  12.71 usec
 29: 192 bytes   5243 times --109.85 Mbps in  13.33 usec
 30: 195 bytes   5038 times --115.66 Mbps in  12.86 usec
 31: 253 bytes   2710 times --146.61 Mbps in  13.17 usec
 32: 256 bytes   3782 times --142.77 Mbps in  13.68 usec
 33: 259 bytes   3683 times --144.75 Mbps in  13.65 usec
 34: 381 bytes   3733 times --201.64 Mbps in  14.42 usec
 35: 384 bytes   4624 times --204.22 Mbps in  14.35 usec
 36: 387 bytes   4665 times --204.65 Mbps in  14.43 usec
 37: 509 bytes   2364 times --265.12 Mbps in  14.65 usec
 38: 512 bytes   3406 times --267.89 Mbps in  14.58 usec
 39: 515 bytes   3442 times --266.90 Mbps in  14.72 usec
 40: 765 bytes   3429 times --381.51 Mbps in  15.30 usec
 41: 768 bytes   4357 times --384.85 Mbps in  15.23 usec
 42: 771 bytes   4387 times --386.35 Mbps in  15.23 usec
 43:1021 bytes   2214 times --495.38 Mbps in  15.72 usec
 44:1024 bytes   3176 times --499.56 Mbps in  15.64 usec
 45:1027 bytes   3203 times --497.19 Mbps in  15.76 usec
 46:1533 bytes   3188 times --692.19 Mbps in  16.90 usec
 47:1536 bytes   3945 times --688.52 Mbps in  17.02 usec
 48:1539 bytes   3920 times --693.85 Mbps in  16.92 usec
 49:2045 bytes   1981 times --858.05 Mbps in  18.18 usec
 50:2048 bytes   2748 times --862.22 Mbps in  18.12 usec
 51:2051 bytes   2761 times --832.50 Mbps in  18.80 usec
 52:3069 bytes   2666 times --   1174.72 Mbps in  19.93 usec
 53:3072 bytes   

Re: Slow performance with librspreload.so

2013-08-28 Thread Gandalf Corvotempesta
2013/8/28 Hefty, Sean sean.he...@intel.com:
 Can you run the rstream test program to verify that you can get faster than 5 
 Gbps?

 rstream without any options will use rsockets directly.  If you use the -T s 
 option, it will use standard TCP sockets.  You can use LD_PRELOAD with -T s 
 to verify that the preload brings your per performance to the same level as 
 using rsockets directly.

5Gb/s with rstream:

$ sudo ./rstream -s 172.17.0.2
name  bytes   xfers   iters   total   time Gb/secusec/xfer
64_lat64  1   100k12m 0.70s  0.15   3.52
4k_lat4k  1   10k 78m 0.29s  2.23  14.69
64k_lat   64k 1   1k  125m0.21s  4.94 106.07
1m_lat1m  1   100 200m0.30s  5.611495.89
64_bw 64  100k1   12m 0.25s  0.42   1.23
4k_bw 4k  10k 1   78m 0.13s  5.17   6.34
64k_bw64k 1k  1   125m0.19s  5.58  94.03
1m_bw 1m  100 1   200m0.30s  5.641486.53
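
For completeness, the three comparisons described above would look roughly
like this (start ./rstream on the remote side with the matching -T option
before each run; paths as used elsewhere in this thread):

$ ./rstream -s 172.17.0.2                    # rsockets directly
$ ./rstream -s 172.17.0.2 -T s               # standard TCP sockets over IPoIB
$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so ./rstream -s 172.17.0.2 -T s
                                             # TCP sockets + preload; should approach the first run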


Re: Slow performance with librspreload.so

2013-08-28 Thread Gandalf Corvotempesta
2013/8/28 Hefty, Sean sean.he...@intel.com:
 Can you explain your environment more?  The performance seems low.

Ubuntu 13.04 Server on both nodes.

node1:

$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz

$ free -m
 total   used   free   shared   buffers   cached
Mem: 16022966  15056  0 95534
-/+ buffers/cache:336  15686
Swap:16353  0  16353


node2:

$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) CPU3065  @ 2.33GHz
model name : Intel(R) Xeon(R) CPU3065  @ 2.33GHz

$ free -m
 total   used   free   shared   buffers   cached
Mem:  2001718   1282  0 53516
-/+ buffers/cache:148   1853
Swap: 2044  0   2044


Re: Dual star topology

2013-07-27 Thread Gandalf Corvotempesta
2013/7/25 Hefty, Sean sean.he...@intel.com:
 Currently, rsockets cannot fail over between local HCA ports.  This is an 
 implementation restriction resulting from no automatic path migration support.

This is still unclear.
From the README (
http://git.openfabrics.org/git?p=~shefty/librdmacm.git;a=blob;f=README;h=e1f222740144ed8f7d8bc935ea6643355b077bcd;hb=refs/heads/master
) I can read:
Using multiple interfaces
  The librdmacm does support multiple interfaces.

So, multiple interfaces are supported. Why can't I use both ports? Each
port is treated as a single interface.

Next question: is rsockets addressing done via IPoIB? How could I reach
a remote IP address without IPoIB?
If it is, I think I could use a bonded IPoIB interface to have an
always-on path.

English is not my primary language, so let me repeat: today, is it not
possible to use librdmacm in a fault-tolerant architecture? I
don't need bandwidth aggregation, just fault tolerance across 2
switches.


Re: Dual star topology

2013-07-26 Thread Gandalf Corvotempesta
2013/7/25 Hefty, Sean sean.he...@intel.com:
 Currently, rsockets cannot fail over between local HCA ports.  This is an 
 implementation restriction resulting from no automatic path migration support.

So, currently there is no way to have a fully redundant IB network
with a dual-port HCA?

In the case of IPoIB, how can I use multiple ports for interconnecting
both switches? In a standard IP network I would have to bond them together
to avoid loops. Should I do the same in IB?


Re: Dual star topology

2013-07-26 Thread Gandalf Corvotempesta
2013/7/26 Hal Rosenstock h...@dev.mellanox.co.il:
 There is support for IPoIB bonding.

I'm talking about the switch interconnection. How can I create bonding on the switches?
Are there any recent benchmarks for IPoIB?


Re: Dual star topology

2013-07-26 Thread Gandalf Corvotempesta
2013/7/26 Hal Rosenstock h...@dev.mellanox.co.il:
 The SM reroutes based on links coming and going so if there are multiple
 links between switches, this addresses the internal link failover scenario.

OK, so: IPoIB across bonded ports on each host, and multiple links
between switches managed automatically by the SM.
Will these multiple links be used in parallel, aggregating bandwidth,
or only for failover?
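
Since IB forwarding is destination-LID based, a given source/destination pair
normally follows a single path at a time; the SM's routing engine can spread
different LID pairs across parallel ISLs, but a single pair does not aggregate
bandwidth across them. A sketch to see which route a pair actually uses
(LID values are only examples):

$ sudo ibtracert 1 2        # hop-by-hop route from LID 1 to LID 2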


Re: Dual star topology

2013-07-26 Thread Gandalf Corvotempesta
2013/7/26 Hal Rosenstock h...@dev.mellanox.co.il:
 Note that in terms of IPoIB bonding, I think it's an active/standby
 rather than active/active model.

Why? Can't I create an active/active bonded interface?


Fwd: Dual star topology

2013-07-25 Thread Gandalf Corvotempesta
IPoIB is very slow; I'd prefer to try rsockets and the preload library.

On 24 Jul 2013, at 23:18, Hal Rosenstock h...@dev.mellanox.co.il
wrote:

 On 7/24/2013 4:56 PM, Gandalf Corvotempesta wrote:
  2013/7/24 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
  I have to configure ceph on these subnets and ceph doesn't allow to
  set multiple addresses for each service.
 
  Let me try to explain in a better way.
  I would like to create a ceph cluster over an infiniband network.
  Each server has a single dual-port HBA.
  Ceph is running with a *single* IP address on each server.
 
  In a standard IP network, I have to interconnect both switches or I'll
  lose some traffic in case of a single port failure:
 
  server1.port1 --- ib switch 1 --- server2.port1
  server1.port2 --- ib switch 2
 
 
  in this case, server1 will not be able to reach server2, because of split 
  brain.
  An interconnection between both switches will solve this in a standard
  IP network.

 So does ceph run on top of IP ? If so, could you use IPoIB bonding (and
 interconnect the switches with some number of links) ?

 -- Hal

  How can I achieve this in an InfiniBand network?


Dual star topology

2013-07-24 Thread Gandalf Corvotempesta
Hi all,
I'm probably OT, but I don't know where else to ask.
I'm looking for some advice on creating a dual-star topology to get
full path redundancy.
I have one dual-port DDR card on each server and two switches.
I'll connect one port to each switch, but should I also interconnect
both switches, like in a standard Ethernet network?


Fwd: Dual star topology

2013-07-24 Thread Gandalf Corvotempesta
In this way all services would have to be dual-homed, with at least two
different addresses, one for each InfiniBand subnet.

I have to configure Ceph on these subnets, and Ceph doesn't allow
setting multiple addresses for each service.

On 24 Jul 2013, at 20:04, Hal Rosenstock h...@dev.mellanox.co.il
wrote:

 Hi Gandalf,

 On 7/24/2013 11:20 AM, Gandalf Corvotempesta wrote:
  Hi all,
  I'm probably OT, but I don't know where else to ask.
  I'm looking for some advice on creating a dual-star topology to get
  full path redundancy.
  I have one dual-port DDR card on each server and two switches.
  I'll connect one port to each switch, but should I also interconnect
  both switches, like in a standard Ethernet network?

 The most fully redundant model would be to _not_ interconnect the 2
 switches so it's 2 IB subnets rather than a single subnet. This means
 you need at least 1 SM on each subnet (with different subnet prefix) but
 you might want more if you're worried about some SM failing and not
 having an SM on that subnet.

 -- Hal
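
A minimal sketch of what a different subnet prefix means in OpenSM terms,
assuming one SM per fabric and the stock config file path (values are only
examples):

# SM serving fabric A (switch 1)
$ sudo sh -c 'echo "subnet_prefix 0xfe80000000000001" >> /etc/opensm/opensm.conf'
# SM serving fabric B (switch 2)
$ sudo sh -c 'echo "subnet_prefix 0xfe80000000000002" >> /etc/opensm/opensm.conf'
# restart opensm on each; the port GID shown by ibv_devinfo -v (GID[0])
# starts with the prefix of the subnet that port ended up on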


Re: Dual star topology

2013-07-24 Thread Gandalf Corvotempesta
2013/7/24 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com:
 I have to configure ceph on these subnets and ceph doesn't allow to
 set multiple addresses for each service.

Let me try to explain it better.
I would like to create a Ceph cluster over an InfiniBand network.
Each server has a single dual-port HBA.
Ceph is running with a *single* IP address on each server.

In a standard IP network, I have to interconnect both switches or I'll
lose some traffic in case of a single port failure:

server1.port1 --- ib switch 1 --- server2.port1
server1.port2 --- ib switch 2


in this case, server1 will not be able to reach server2, because of split brain.
An interconnection between both switches will solve this in a standard
IP network.

How can I achieve this in an InfiniBand network?