Re: [ceph-users] Help needed porting Ceph to RSockets
2013-10-31 Hefty, Sean <sean.he...@intel.com>:
> Can you please try the attached patch in place of all previous patches?

Any updates on ceph with rsockets?
Re: IPoIB and STP
OpenSM will be running on both nodes, and both node ports will be bonded together in active-failover (or active-active, if possible). My issue is with the interconnection links between the switches. Last time I tried, I had issues due to the interface bonding in active-active mode: with active-active, IPoIB ran at just a couple of KBytes/s, while with active-failover it ran at 10-11 Gbps.

So, are you suggesting:
- OpenSM running on both node1 and node2
- Both nodes connected to both switches (1 port per switch)
- Both switches interconnected with 2 or more links to get aggregated bandwidth

This wouldn't require any special configuration? How can I ensure that both links between the switches are used together? Any command to run with OpenSM?

2013/12/29 Gennadiy Nerubayev <para...@gmail.com>:
> On Sat, Dec 28, 2013 at 6:08 AM, Gandalf Corvotempesta
> <gandalf.corvotempe...@gmail.com> wrote:
>> Hi, I'm trying to configure a redundant IPoIB network. Obviously, IB switches
>> don't talk IP and so don't have STP support. How can I interconnect two
>> different switches with multiple cables and avoid loops? For example:
>> http://pastebin.com/raw.php?i=w8ySRibG
>> To have a fully redundant network, I have to interconnect switch1 and switch2
>> with at least 2 cables and enable STP to avoid loops. Is OpenSM smart enough
>> to detect a multiple link and shut down one port automatically (doing the same
>> as STP)?
>
> Infiniband uses a switched fabric topology, so regular Ethernet STP topology
> rules do not apply. In this scenario each link, direct or via the interconnects
> to the other switch, would just be an alternate path to the target. If I recall
> correctly, a running OpenSM only binds to one port, so this is why you'd
> technically want those links between the switches to form a single fabric. Of
> course, losing the link to that port also means you will have no subnet
> manager, so you might want to consider having it running on one more host as
> well.
>
> -Gennadiy
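PS: for reference, this is roughly the host-side active-failover bonding I have in mind. It's only a sketch, assuming Debian/Ubuntu ifupdown with the ifenslave package; the ib0/ib1 names and the 172.17.0.1 address are placeholders, not taken from this setup:

# /etc/network/interfaces (sketch, untested here)
auto bond0
iface bond0 inet static
    address 172.17.0.1
    netmask 255.255.255.0
    bond-slaves ib0 ib1
    bond-mode active-backup    # IPoIB bonding is generally limited to failover mode
    bond-miimon 100            # link monitoring interval in ms
    bond-primary ib0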
Re: rsockets addressing
2013/9/17 Yann Droneaud <ydrone...@opteya.com>:
> Are you talking about an Ethernet network or an InfiniBand fabric?

IB fabric.

> In case of InfiniBand, the subnet manager should take care of it.

So, interconnecting 2 IB switches is fine.
rsockets addressing
Hi to all, what kind of addressing does rsockets use to reach the remote host? Is it using IPoIB? Will I be able to support two redundant fabrics with failover managed by OpenSM with rsockets? For example, two nodes connected (with 2 HBAs each) to two different IB fabrics. In case of an HBA failure, will rsockets be able to re-establish a connection using the second fabric?
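As far as I understand, rsockets takes ordinary IP addresses (typically the IPoIB addresses) and resolves them to an IB path through the RDMA CM. A quick way to check that a given address resolves to an RDMA path is rping from librdmacm-utils; a sketch, with 172.17.0.2 standing in for the remote IPoIB address:

# on the remote node (server side)
$ rping -s -a 172.17.0.2 -v
# on the local node (client side); if this completes, the RDMA CM can resolve
# that IP to an IB route, which is the same resolution step rsockets relies on
$ rping -c -a 172.17.0.2 -v -C 10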
Re: rsockets addressing
2013/9/16 Hefty, Sean <sean.he...@intel.com>:
> rsockets does not implement failover. An application would need to reestablish
> a connection in the case of a failure. I have not looked to see what it would
> take to implement failover inside rsockets, and that's not something I would
> have time to implement anytime soon.

Connection re-establishment is fine, but in case of a port failure, will IPoIB remap the remote IP onto the new port/fabric?
Re: Slow performance with librspreload.so
2013/9/3 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
> $ sudo qperf -ub 172.17.0.2 rc_bi_bw rc_lat rc_bw rc_rdma_read_lat rc_rdma_read_bw rc_rdma_write_lat rc_rdma_write_bw tcp_lat tcp_bw
> rc_bi_bw:           bw       =  20.5 Gb/sec
> rc_lat:             latency  =  15.4 us
> rc_bw:              bw       =  13.7 Gb/sec
> rc_rdma_read_lat:   latency  =  12.9 us
> rc_rdma_read_bw:    bw       =  11.5 Gb/sec
> rc_rdma_write_lat:  latency  =  15.2 us
> rc_rdma_write_bw:   bw       =  13.7 Gb/sec
> tcp_lat:            latency  =  48.8 us
> tcp_bw:             bw       =  12.5 Gb/sec
> I don't know if they are good for a DDR fabric.

Just to clarify: why am I getting the same bandwidth with librspreload.so as with plain IPoIB? Should I check something?
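One sanity check I can think of (my own idea, not from the replies): make sure the preload library is actually being loaded for the benchmark process, otherwise the test silently falls back to plain IPoIB sockets. glibc's LD_DEBUG can show that:

$ LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so LD_DEBUG=libs \
      iperf -c 172.17.0.2 2>&1 | grep -i rspreload
# if nothing matching librspreload.so is printed, the preload was never loaded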
Re: rsockets addressing
2013/9/16 Yann Droneaud <ydrone...@opteya.com>:
> The InfiniBand fabrics support ISL ... indeed.
[cut]
> You definitely need links between switches if you want to use a high
> availability fabric topology with HCA ports connected to different switches.
> (Additionally, a switch should have a link to two other switches ... then it
> starts to get complicated, since you have to design your fabric topology to
> match the communication pattern / data locality used by your application ...)

I've read somewhere that this is not recommended, because issues on one switch could also affect the second switch, but doing so would allow me to use both ports in hot-standby failover. If one port fails, IPoIB bonding will switch to the second port and, at the same time, traffic will be routed through the ISL. Without the ISL, a port failure would cut off the whole node. So:
- switch1 connected to switch2
- node1 connected to both switches
- node2 connected to both switches
- IPoIB on each node with active-passive bonding across the two IB ports

How can I create a redundant ISL? Should I just connect two or more ports and let the subnet manager take care of it automatically, or do I have to configure something, like STP on plain Ethernet networks? Will the ISL links be load balanced when 2 or more cables are connected?
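If it helps the discussion, these are the checks I would run from a host to see what the SM actually did with the ISLs (a sketch; the LIDs are placeholders):

$ sudo iblinkinfo        # lists every link in the fabric, so both inter-switch cables should show up as active
$ sudo ibtracert 2 11    # trace the path the SM routed from LID 2 to LID 11 (example LIDs), to see which ISL it goes through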
Re: [ceph-users] Help needed porting Ceph to RSockets
2013/9/10 Andreas Bluemle <andreas.blue...@itxperts.de>:
> Since I have added these workarounds to my version of the librdmacm library,
> I can at least start up ceph using LD_PRELOAD and end up in a healthy ceph
> cluster state.

Have you seen any performance improvement by using LD_PRELOAD with ceph? What throughput are you able to achieve with rsockets and ceph?
Re: Slow performance with librspreload.so
2013/9/1 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
> What is strange to me is that rsockets is slower than IPoIB and limited to
> 10 Gbit more or less. With IPoIB I'm able to reach 12.5 Gbit.

qperf is giving the same strange speed.

FROM NODE1 to NODE2:
$ sudo qperf -ub 77.95.175.106 ud_lat ud_bw
ud_lat:  latency  =  12.5 us
ud_bw:   send_bw  =  12.5 Gb/sec
         recv_bw  =  12.5 Gb/sec

FROM NODE1 TO NODE2, slower and with more latency than the remote host!
$ sudo qperf -ub 172.17.0.1 ud_lat ud_bw
ud_lat:  latency  =  13.8 us
ud_bw:   send_bw  =  11.9 Gb/sec
         recv_bw  =  11.9 Gb/sec

How can I check if this is due to a hardware bottleneck? CPU and RAM are good.
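One hardware check worth doing (my own suggestion, not from the replies): verify the PCIe link the HCA actually negotiated, since an x4 slot or a degraded link caps throughput well below DDR rate. A sketch, assuming the 04:00.0 device address shown in the lspci output elsewhere in this thread:

$ sudo lspci -s 04:00.0 -vv | grep -E 'LnkCap|LnkSta'
# LnkCap shows what the card supports (e.g. 2.5GT/s, x8),
# LnkSta shows what was actually negotiated in the current slot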
Re: Slow performance with librspreload.so
2013/9/3 Hal Rosenstock <h...@dev.mellanox.co.il>:
> With mthca, due to a quirk, optimal performance is achieved at 1K MTU. OpenSM
> can reduce the MTU in returned PathRecords to 1K when one end of the path is
> mthca and the actual path MTU is above 1K. This is controlled by the
> enable_quirks config parameter, which defaults to FALSE (don't do this).

I'll try. Actually these are my results, from node1 to node2:

$ sudo qperf -ub 172.17.0.2 rc_bi_bw rc_lat rc_bw rc_rdma_read_lat rc_rdma_read_bw rc_rdma_write_lat rc_rdma_write_bw tcp_lat tcp_bw
rc_bi_bw:           bw       =  20.5 Gb/sec
rc_lat:             latency  =  15.4 us
rc_bw:              bw       =  13.7 Gb/sec
rc_rdma_read_lat:   latency  =  12.9 us
rc_rdma_read_bw:    bw       =  11.5 Gb/sec
rc_rdma_write_lat:  latency  =  15.2 us
rc_rdma_write_bw:   bw       =  13.7 Gb/sec
tcp_lat:            latency  =  48.8 us
tcp_bw:             bw       =  12.5 Gb/sec

I don't know if they are good for a DDR fabric.
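For the record, the change I'll try is just flipping that option in the OpenSM configuration (a sketch; I'm assuming the config file lives at /etc/opensm/opensm.conf, the path varies by distribution):

# /etc/opensm/opensm.conf
# return 1K MTU in PathRecords when one end of the path is an mthca HCA
enable_quirks TRUE

# then restart the subnet manager (however your distribution does it) so the
# new PathRecords are handed out, e.g.:
$ sudo service opensm restart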
Re: Slow performance with librspreload.so
2013/9/1 Rupert Dance <rsda...@soft-forge.com>:
> My guess is that it will not make a huge difference and that the solution
> lies elsewhere.

What is strange to me is that rsockets is slower than IPoIB and limited to 10 Gbit more or less. With IPoIB I'm able to reach 12.5 Gbit.
Re: Slow performance with librspreload.so
2013/8/30 Rupert Dance <rsda...@soft-forge.com>:
> One way to set or check the mtu is with the ibportstate utility:
>
> Usage: ibportstate [options] <dest dr_path|lid|guid> <portnum> [<op>]
> Supported ops: enable, disable, reset, speed, width, query, down, arm,
> active, vls, mtu, lid, smlid, lmc

I've tried, but the max MTU is 2048 on one device:

$ sudo ibv_devinfo
hca_id: mthca0
        transport:          InfiniBand (0)
        fw_ver:             4.7.600
        node_guid:          0008:f104:0398:14cc
        sys_image_guid:     0008:f104:0398:14cf
        vendor_id:          0x08f1
        vendor_part_id:     25208
        hw_ver:             0xA0
        board_id:           VLT0040010001
        phys_port_cnt:      2
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         1
                        port_lid:       2
                        port_lmc:       0x00
                        link_layer:     InfiniBand

Any workaround? Maybe a firmware update?
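For completeness, a port query based on the usage above (a sketch; LID 2 / port 1 are taken from the ibv_devinfo output, and I'm assuming LID addressing is accepted directly as the destination):

$ sudo ibportstate 2 1 query
# prints the PortInfo attributes for that port (state, active link width/speed, MTU, ...)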
Re: Slow performance with librspreload.so
2013/8/31 Rupert Dance <rsda...@soft-forge.com>:
> The Vendor ID indicates that this is a Voltaire card, which probably means it
> is an older card. Some of the early Mellanox based cards did not support
> anything bigger than 2048.

Yes, it's an older card used just for this test. By the way, would increasing the MTU to 4096 give me more performance?
Re: Slow performance with librspreload.so
2013/8/29 Hefty, Sean <sean.he...@intel.com>:
> 12 Gbps on a 20 Gb link actually seems reasonable to me. I only see around
> 25 Gbps on a 40 Gb link, with raw perftest performance coming in at about
> 26 Gbps.

Is this an rstream limit or an IB limit? I've read somewhere that DDR should transfer at 16 Gbps. By the way, moving the HBA to the second slot brought me to 12 Gbps on both hosts.
Re: Slow performance with librspreload.so
2013/8/30 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
> By the way, moving the HBA to the second slot brought me to 12 Gbps on both hosts.

This is great:

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2
Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)
[  3] local 172.17.0.1 port 34108 connected with 172.17.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  12.2 GBytes  10.5 Gbits/sec

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2 -P 2
Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)
[  4] local 172.17.0.1 port 55323 connected with 172.17.0.2 port 5001
[  3] local 172.17.0.1 port 36579 connected with 172.17.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  7.46 GBytes  6.41 Gbits/sec
[  3]  0.0-10.0 sec  7.46 GBytes  6.41 Gbits/sec
[SUM]  0.0-10.0 sec  14.9 GBytes  12.8 Gbits/sec

With 2 parallel connections I'm able to reach the rated speed with iperf, the same speed achieved with rstream. Is iperf affected by the IPoIB MTU size when used with librspreload.so?
Re: Slow performance with librspreload.so
2013/8/30 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
> Is iperf affected by the IPoIB MTU size when used with librspreload.so?

Another strange issue:

$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so iperf -c 172.17.0.2
Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  128 KByte (default)
[  3] local 172.17.0.1 port 57926 connected with 172.17.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  12.2 GBytes  10.4 Gbits/sec

$ iperf -c 172.17.0.2
Client connecting to 172.17.0.2, TCP port 5001
TCP window size:  648 KByte (default)
[  3] local 172.17.0.1 port 58113 connected with 172.17.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  14.5 GBytes  12.5 Gbits/sec

rsockets slower than IPoIB?
Re: Slow performance with librspreload.so
2013/8/30 Hefty, Sean <sean.he...@intel.com>:
> Not directly. The ipoib mtu is usually set based on the mtu of the IB link.
> The latter does affect rsocket performance. However, if the ipoib mtu is
> changed separately from the IB link mtu, it will not affect rsockets.

Actually I'm going faster with IPoIB than with rsockets. How can I change the MTU of the IB link?
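What I found so far (an assumption on my side, please correct me if wrong): the maximum MTU of the port is fixed by the HCA, but the MTU that OpenSM advertises in path records and for the IPoIB broadcast group can be set in the partition configuration, for example:

# /etc/opensm/partitions.conf (sketch; mtu=5 means 4096, mtu=4 means 2048)
Default=0x7fff, ipoib, mtu=5 : ALL=full;

# restart OpenSM afterwards so the new broadcast-group / path-record MTU takes effect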
Fwd: Slow performance with librspreload.so
---------- Forwarded message ----------
From: Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>
Date: 2013/8/29
Subject: Re: Slow performance with librspreload.so
To: Hefty, Sean <sean.he...@intel.com>

2013/8/28 Hefty, Sean <sean.he...@intel.com>:
> If you can provide your PCIe information and the results from running the
> perftest tools (rdma_bw), that could help as well.

node1 (172.17.0.1 is the IP configured on ib0):

$ sudo ./rstream -s 172.17.0.1
name      bytes   xfers   iters   total    time     Gb/sec    usec/xfer
64_lat    64      1       100k    12m      0.26s      0.40       1.28
4k_lat    4k      1       10k     78m      0.17s      3.96       8.28
64k_lat   64k     1       1k      125m     0.11s      9.86      53.19
1m_lat    1m      1       100     200m     0.14s     12.34     679.73
64_bw     64      100k    1       12m      0.06s      1.75       0.29
4k_bw     4k      10k     1       78m      0.06s     11.79       2.78
64k_bw    64k     1k      1       125m     0.09s     12.20      42.97
1m_bw     1m      100     1       200m     0.13s     12.78     656.55

$ lspci | grep -i infiniband
04:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)

node2 (172.17.0.2 is the IP configured on ib0):

$ sudo ./rstream -s 172.17.0.2
name      bytes   xfers   iters   total    time     Gb/sec    usec/xfer
64_lat    64      1       100k    12m      1.10s      0.09       5.49
4k_lat    4k      1       10k     78m      0.43s      1.53      21.49
64k_lat   64k     1       1k      125m     0.29s      3.64     143.99
1m_lat    1m      1       100     200m     0.37s      4.53    1852.70
64_bw     64      100k    1       12m      0.42s      0.24       2.12
4k_bw     4k      10k     1       78m      0.16s      4.16       7.87
64k_bw    64k     1k      1       125m     0.23s      4.49     116.69
1m_bw     1m      100     1       200m     0.36s      4.63    1813.52

$ lspci | grep -i infiniband
02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)
(this is a Voltaire 400Ex-D card)

Same result when using 127.0.0.1 on both hosts, obviously. I'm unable to run rdma_bw due to the different CPU speeds, and my version doesn't have the ignore flag.
Re: Slow performance with librspreload.so
2013/8/29 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
> node1 (172.17.0.1 is the IP configured on ib0):
>
> $ sudo ./rstream -s 172.17.0.1
> name      bytes   xfers   iters   total    time     Gb/sec    usec/xfer
> 64_lat    64      1       100k    12m      0.26s      0.40       1.28
> 4k_lat    4k      1       10k     78m      0.17s      3.96       8.28
> 64k_lat   64k     1       1k      125m     0.11s      9.86      53.19
> 1m_lat    1m      1       100     200m     0.14s     12.34     679.73
> 64_bw     64      100k    1       12m      0.06s      1.75       0.29
> 4k_bw     4k      10k     1       78m      0.06s     11.79       2.78
> 64k_bw    64k     1k      1       125m     0.09s     12.20      42.97
> 1m_bw     1m      100     1       200m     0.13s     12.78     656.55

With standard sockets:

$ sudo ./rstream -s 172.17.0.1 -T s
name      bytes   xfers   iters   total    time     Gb/sec    usec/xfer
64_lat    64      1       100k    12m      1.07s      0.10       5.36
4k_lat    4k      1       10k     78m      0.13s      4.89       6.70
64k_lat   64k     1       1k      125m     0.06s     18.38      28.52
1m_lat    1m      1       100     200m     0.06s     25.90     323.89
64_bw     64      100k    1       12m      0.98s      0.10       4.91
4k_bw     4k      10k     1       78m      0.12s      5.29       6.20
64k_bw    64k     1k      1       125m     0.04s     27.04      19.39
1m_bw     1m      100     1       200m     0.05s     31.52     266.14
Re: Slow performance with librspreload.so
2013/8/29 Hefty, Sean <sean.he...@intel.com>:
> 12 Gbps on a 20 Gb link actually seems reasonable to me. I only see around
> 25 Gbps on a 40 Gb link, with raw perftest performance coming in at about
> 26 Gbps.

Ok. I think I've connected the HBA to the wrong PCI-Express slot. I have a DELL R200 that has 3 PCI-Express slots, but one of them is only x4; probably I've connected the card to that one. Tomorrow I'll try to move the HBA to the x8 slot.
Slow performance with librspreload.so
Hi, I'm trying the preloader librspreload.so on two directly connected hosts:

host1:$ sudo ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:004d:dd45
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:004d:dd46
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            10 Gb/sec (4X)
        link_layer:      InfiniBand

host2:$ sudo ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0008:f104:0398:14cd
        base lid:        0x2
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
        link_layer:      InfiniBand

Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0008:f104:0398:14ce
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            10 Gb/sec (4X)
        link_layer:      InfiniBand

I've connected just one port between the two hosts. The port is detected properly as 20 Gb/s (4X DDR), but I'm unable to reach speeds over 5 Gbit/s:

host1:$ sudo LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so NPtcp -h 172.17.0.2
Send and receive buffers are 131072 and 131072 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:     1 bytes  17008 times --    1.24 Mbps in    6.13 usec
  1:     2 bytes  16306 times --    2.02 Mbps in    7.56 usec
  2:     3 bytes  13223 times --    3.10 Mbps in    7.38 usec
  3:     4 bytes   9037 times --    4.21 Mbps in    7.25 usec
  4:     6 bytes  10345 times --    6.49 Mbps in    7.05 usec
  5:     8 bytes   7093 times --    7.77 Mbps in    7.85 usec
  6:    12 bytes   7957 times --   17.08 Mbps in    5.36 usec
  7:    13 bytes   7772 times --   14.75 Mbps in    6.73 usec
  8:    16 bytes   6861 times --   16.11 Mbps in    7.58 usec
  9:    19 bytes   7424 times --   18.91 Mbps in    7.67 usec
 10:    21 bytes   8237 times --   17.69 Mbps in    9.06 usec
 11:    24 bytes   7361 times --   19.72 Mbps in    9.28 usec
 12:    27 bytes   7628 times --   24.14 Mbps in    8.53 usec
 13:    29 bytes   5207 times --   29.81 Mbps in    7.42 usec
 14:    32 bytes   6504 times --   29.42 Mbps in    8.30 usec
 15:    35 bytes   6401 times --   39.08 Mbps in    6.83 usec
 16:    45 bytes   8362 times --   45.19 Mbps in    7.60 usec
 17:    48 bytes   8774 times --   46.10 Mbps in    7.94 usec
 18:    51 bytes   8654 times --   55.19 Mbps in    7.05 usec
 19:    61 bytes   5562 times --   57.42 Mbps in    8.10 usec
 20:    64 bytes   6068 times --   72.31 Mbps in    6.75 usec
 21:    67 bytes   7636 times --   42.93 Mbps in   11.91 usec
 22:    93 bytes   4512 times --   55.84 Mbps in   12.71 usec
 23:    96 bytes   5246 times --   60.13 Mbps in   12.18 usec
 24:    99 bytes   5558 times --   59.49 Mbps in   12.70 usec
 25:   125 bytes   2864 times --   75.25 Mbps in   12.67 usec
 26:   128 bytes   3913 times --   75.78 Mbps in   12.89 usec
 27:   131 bytes   3940 times --   74.77 Mbps in   13.37 usec
 28:   189 bytes   3883 times --  113.42 Mbps in   12.71 usec
 29:   192 bytes   5243 times --  109.85 Mbps in   13.33 usec
 30:   195 bytes   5038 times --  115.66 Mbps in   12.86 usec
 31:   253 bytes   2710 times --  146.61 Mbps in   13.17 usec
 32:   256 bytes   3782 times --  142.77 Mbps in   13.68 usec
 33:   259 bytes   3683 times --  144.75 Mbps in   13.65 usec
 34:   381 bytes   3733 times --  201.64 Mbps in   14.42 usec
 35:   384 bytes   4624 times --  204.22 Mbps in   14.35 usec
 36:   387 bytes   4665 times --  204.65 Mbps in   14.43 usec
 37:   509 bytes   2364 times --  265.12 Mbps in   14.65 usec
 38:   512 bytes   3406 times --  267.89 Mbps in   14.58 usec
 39:   515 bytes   3442 times --  266.90 Mbps in   14.72 usec
 40:   765 bytes   3429 times --  381.51 Mbps in   15.30 usec
 41:   768 bytes   4357 times --  384.85 Mbps in   15.23 usec
 42:   771 bytes   4387 times --  386.35 Mbps in   15.23 usec
 43:  1021 bytes   2214 times --  495.38 Mbps in   15.72 usec
 44:  1024 bytes   3176 times --  499.56 Mbps in   15.64 usec
 45:  1027 bytes   3203 times --  497.19 Mbps in   15.76 usec
 46:  1533 bytes   3188 times --  692.19 Mbps in   16.90 usec
 47:  1536 bytes   3945 times --  688.52 Mbps in   17.02 usec
 48:  1539 bytes   3920 times --  693.85 Mbps in   16.92 usec
 49:  2045 bytes   1981 times --  858.05 Mbps in   18.18 usec
 50:  2048 bytes   2748 times --  862.22 Mbps in   18.12 usec
 51:  2051 bytes   2761 times --  832.50 Mbps in   18.80 usec
 52:  3069 bytes   2666 times -- 1174.72 Mbps in   19.93 usec
 53:  3072 bytes
Re: Slow performance with librspreload.so
2013/8/28 Hefty, Sean <sean.he...@intel.com>:
> Can you run the rstream test program to verify that you can get faster than
> 5 Gbps? rstream without any options will use rsockets directly. If you use
> the -T s option, it will use standard TCP sockets. You can use LD_PRELOAD
> with -T s to verify that the preload brings your performance to the same
> level as using rsockets directly.

5 Gb/s with rstream:

$ sudo ./rstream -s 172.17.0.2
name      bytes   xfers   iters   total    time     Gb/sec    usec/xfer
64_lat    64      1       100k    12m      0.70s      0.15       3.52
4k_lat    4k      1       10k     78m      0.29s      2.23      14.69
64k_lat   64k     1       1k      125m     0.21s      4.94     106.07
1m_lat    1m      1       100     200m     0.30s      5.61    1495.89
64_bw     64      100k    1       12m      0.25s      0.42       1.23
4k_bw     4k      10k     1       78m      0.13s      5.17       6.34
64k_bw    64k     1k      1       125m     0.19s      5.58      94.03
1m_bw     1m      100     1       200m     0.30s      5.64    1486.53
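For completeness, these are the commands I'm using for the comparison Sean describes (a sketch; I'm assuming rstream with no arguments acts as the server on the remote node):

# on 172.17.0.2 (server side)
$ sudo ./rstream
# on the client: rsockets directly, then standard TCP sockets,
# then TCP sockets with the preload pulled in
$ sudo ./rstream -s 172.17.0.2
$ sudo ./rstream -s 172.17.0.2 -T s
$ sudo LD_PRELOAD=/usr/local/lib/rsocket/librspreload.so ./rstream -s 172.17.0.2 -T s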
Re: Slow performance with librspreload.so
2013/8/28 Hefty, Sean <sean.he...@intel.com>:
> Can you explain your environment more? The performance seems low.

Ubuntu 13.04 Server on both nodes.

node1:
$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
model name : Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
$ free -m
             total       used       free     shared    buffers     cached
Mem:         16022        966      15056          0         95        534
-/+ buffers/cache:        336      15686
Swap:        16353          0      16353

node2:
$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) CPU 3065 @ 2.33GHz
model name : Intel(R) Xeon(R) CPU 3065 @ 2.33GHz
$ free -m
             total       used       free     shared    buffers     cached
Mem:          2001        718       1282          0         53        516
-/+ buffers/cache:        148       1853
Swap:         2044          0       2044
Re: Dual star topology
2013/7/25 Hefty, Sean <sean.he...@intel.com>:
> Currently, rsockets cannot fail over between local HCA ports. This is an
> implementation restriction resulting from no automatic path migration support.

This is still unclear. From the README
(http://git.openfabrics.org/git?p=~shefty/librdmacm.git;a=blob;f=README;h=e1f222740144ed8f7d8bc935ea6643355b077bcd;hb=refs/heads/master)
I can read:

> Using multiple interfaces
> The librdmacm does support multiple interfaces.

So multiple interfaces are supported. Why can't I use both ports? Each port is treated as a single interface.

Next question: is rsockets addressing done via IPoIB? How can I reach a remote IP address without IPoIB? If so, I think I could use a bonded IPoIB interface to have an always-on path.

English is not my primary language, so let me repeat: today, is it not possible to use librdmacm in a fault-tolerant architecture? I don't need bandwidth aggregation, just fault tolerance across 2 switches.
Re: Dual star topology
2013/7/25 Hefty, Sean <sean.he...@intel.com>:
> Currently, rsockets cannot fail over between local HCA ports. This is an
> implementation restriction resulting from no automatic path migration support.

So, currently there is no way to have a fully redundant IB network with dual-port HCAs?

In the case of IPoIB, how can I use multiple ports for interconnecting both switches? In a standard IP network I would have to bond them together to avoid loops. Should I do the same in IB?
Re: Dual star topology
2013/7/26 Hal Rosenstock <h...@dev.mellanox.co.il>:
> There is support for IPoIB bonding.

I'm talking about the switch interconnection. How can I create bonding on the switches? Any recent benchmarks of IPoIB?
Re: Dual star topology
2013/7/26 Hal Rosenstock <h...@dev.mellanox.co.il>:
> The SM reroutes based on links coming and going, so if there are multiple
> links between switches, this addresses the internal link failover scenario.

Ok, so: IPoIB across bonded ports on each host, and multiple links between the switches managed automatically by the SM. Will these multiple links be used in parallel, aggregating bandwidth, or only for failover?
Re: Dual star topology
2013/7/26 Hal Rosenstock <h...@dev.mellanox.co.il>:
> Note that in terms of IPoIB bonding, I think it's an active/standby rather
> than active/active model.

Why? Can't I create an active/active bonded interface?
Fwd: Dual star topology
IPoIB is very slow; I'd prefer to try rsockets and the preloader library.

On 24 Jul 2013 23:18, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
> On 7/24/2013 4:56 PM, Gandalf Corvotempesta wrote:
>> 2013/7/24 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
>>> I have to configure ceph on these subnets and ceph doesn't allow to set
>>> multiple addresses for each service.
>>
>> Let me try to explain in a better way. I would like to create a ceph cluster
>> over an infiniband network. Each server has a single dual-port HBA. Ceph is
>> running with a *single* IP address on each server. In a standard IP network,
>> I have to interconnect both switches or I'll lose some traffic in case of a
>> single port failure:
>>
>> server1.port1 ---- ib switch 1 ---- server2.port1
>> server1.port2 ---- ib switch 2
>>
>> In this case, server1 will not be able to reach server2, because of split
>> brain. An interconnection between both switches would solve this in a
>> standard IP network.
>
> So does ceph run on top of IP? If so, could you use IPoIB bonding (and
> interconnect the switches with some number of links)?
>
> -- Hal
>
>> How can I achieve this in an infiniband network?
Dual star topology
Hi to all, I'm probably OT but I don't know where else to ask. I'm looking for some advice on creating a dual-star topology to get full path redundancy. I have one dual-port DDR card in each server and two switches. I'll connect one port to each switch, but should I also interconnect both switches, like in a standard ethernet network?
Fwd: Dual star topology
In this way all services would have to be dual stack, with at least two different addresses, one for each infiniband subnet. I have to configure ceph on these subnets and ceph doesn't allow setting multiple addresses for each service.

On 24 Jul 2013 20:04, Hal Rosenstock <h...@dev.mellanox.co.il> wrote:
> Hi Gandalf,
>
> On 7/24/2013 11:20 AM, Gandalf Corvotempesta wrote:
>> Hi to all, I'm probably OT but I don't know where else to ask. I'm looking
>> for some advice on creating a dual-star topology to get full path redundancy.
>> I have one dual-port DDR card in each server and two switches. I'll connect
>> one port to each switch, but should I also interconnect both switches, like
>> in a standard ethernet network?
>
> The most fully redundant model would be to _not_ interconnect the 2 switches,
> so it's 2 IB subnets rather than a single subnet. This means you need at
> least 1 SM on each subnet (with different subnet prefixes), but you might
> want more if you're worried about an SM failing and not having an SM on that
> subnet.
>
> -- Hal
Re: Dual star topology
2013/7/24 Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com>:
> I have to configure ceph on these subnets and ceph doesn't allow to set
> multiple addresses for each service.

Let me try to explain in a better way. I would like to create a ceph cluster over an infiniband network. Each server has a single dual-port HBA. Ceph is running with a *single* IP address on each server. In a standard IP network, I have to interconnect both switches or I'll lose some traffic in case of a single port failure:

server1.port1 ---- ib switch 1 ---- server2.port1
server1.port2 ---- ib switch 2

In this case, server1 will not be able to reach server2, because of split brain. An interconnection between both switches would solve this in a standard IP network. How can I achieve this in an infiniband network?