Re: [ceph-users] Typical 10GbE latency
On 13-11-14 19:39, Stephan Seitz wrote: Indeed, there must be something! But I can't figure it out yet. Same controllers, tried the same OS, direct cables, but the latency is 40% higher. Wido, just an educated guess: did you check the offload settings of your NIC? Could you provide the output of ethtool -k IFNAME? - Stephan

Yes, I tested that as well. But I have to add that the other deployments I tested all run with the default settings and still get good latency. I turned off LRO, for example, but that didn't help either.

-- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
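For anyone repeating that check, a minimal sketch of dumping and temporarily toggling the offloads most often implicated in latency tests; the interface name eth0 is a placeholder, and ethtool -K changes are lost on reboot unless persisted in the distribution's network configuration:

# Dump the current offload settings (replace eth0 with the real interface)
ethtool -k eth0

# Temporarily disable the usual suspects one by one and re-run the ping test
ethtool -K eth0 lro off
ethtool -K eth0 gro off
ethtool -K eth0 tso off
ethtool -K eth0 gso off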
Re: [ceph-users] Typical 10GbE latency
On 12-11-14 21:12, Udo Lembke wrote: Hi Wido, On 12.11.2014 12:55, Wido den Hollander wrote: (back to list) Indeed, there must be something! But I can't figure it out yet. Same controllers, tried the same OS, direct cables, but the latency is 40% higher. Perhaps something with PCIe ordering / interrupts? Have you checked the BIOS settings or tried another PCIe slot? Udo

That's indeed a good suggestion. I haven't tried it, but it is something I should try. It will take me a while to get that tested, but I will give it a try.

-- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
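A quick way to compare PCIe link training and interrupt placement between a fast and a slow box; the PCI address and interface name below are taken from other mails in this thread and will likely differ per host:

# Negotiated PCIe link speed/width of the NIC (compare LnkCap vs LnkSta, run as root);
# a card trained at a lower width or generation costs both latency and bandwidth
lspci -vv -s 04:00.0 | grep -E 'LnkCap:|LnkSta:'

# Which CPUs service the NIC queue interrupts (compare between hosts)
grep p4p1 /proc/interrupts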
Re: [ceph-users] Typical 10GbE latency
Indeed, there must be something! But I can't figure it out yet. Same controllers, tried the same OS, direct cables, but the latency is 40% higher.

Wido, just an educated guess: did you check the offload settings of your NIC? Could you provide the output of ethtool -k IFNAME?

- Stephan

-- Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-44 Fax: 030 / 405051-19 Mandatory information per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Are there any special parameters (or best practices) regarding the offload settings for the NICs? I have two ports: p4p1 (public net) and p4p2 (cluster internal); the cluster-internal one has MTU 9000 across all the OSD servers and, of course, on the switch ports:

ceph@cephosd01:~$ ethtool -k p4p1 Features for p4p1: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: on [fixed] tx-checksum-sctp: on scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: off [fixed] tx-tcp6-segmentation: on udp-fragmentation-offload: off [fixed] generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: on rx-vlan-offload: on tx-vlan-offload: on ntuple-filters: off receive-hashing: on highdma: on [fixed] rx-vlan-filter: on vlan-challenged: off [fixed] tx-lockless: off [fixed] netns-local: off [fixed] tx-gso-robust: off [fixed] tx-fcoe-segmentation: on [fixed] tx-gre-segmentation: off [fixed] tx-ipip-segmentation: off [fixed] tx-sit-segmentation: off [fixed] tx-udp_tnl-segmentation: off [fixed] tx-mpls-segmentation: off [fixed] fcoe-mtu: off [fixed] tx-nocache-copy: off loopback: off [fixed] rx-fcs: off [fixed] rx-all: off tx-vlan-stag-hw-insert: off [fixed] rx-vlan-stag-hw-parse: off [fixed] rx-vlan-stag-filter: off [fixed] l2-fwd-offload: off busy-poll: on [fixed]

ceph@cephosd01:~$ ethtool -k p4p2 Features for p4p2: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: on [fixed] tx-checksum-sctp: on scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: off [fixed] tx-tcp6-segmentation: on udp-fragmentation-offload: off [fixed] generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: on rx-vlan-offload: on tx-vlan-offload: on ntuple-filters: off receive-hashing: on highdma: on [fixed] rx-vlan-filter: on vlan-challenged: off [fixed] tx-lockless: off [fixed] netns-local: off [fixed] tx-gso-robust: off [fixed] tx-fcoe-segmentation: on [fixed] tx-gre-segmentation: off [fixed] tx-ipip-segmentation: off [fixed] tx-sit-segmentation: off [fixed] tx-udp_tnl-segmentation: off [fixed] tx-mpls-segmentation: off [fixed] fcoe-mtu: off [fixed] tx-nocache-copy: off loopback: off [fixed] rx-fcs: off [fixed] rx-all: off tx-vlan-stag-hw-insert: off [fixed] rx-vlan-stag-hw-parse: off [fixed] rx-vlan-stag-filter: off [fixed] l2-fwd-offload: off busy-poll: on [fixed]

ceph@cephosd01:~$

German Anders

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
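To compare the two ports at a glance (and to confirm the MTU 9000 on the cluster port), a small loop along these lines can help; the interface names are those from the mail above, and the grep only keeps the offload lines usually relevant to latency:

# Compare MTU and the latency-relevant offload flags on both ports
for i in p4p1 p4p2; do
  echo "== $i =="
  ip link show "$i" | grep -o 'mtu [0-9]*'
  ethtool -k "$i" | grep -E 'segmentation-offload|receive-offload|checksumming'
done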
Re: [ceph-users] Typical 10GbE latency
Is this with an 8192 byte payload? Oh, sorry, it was with 1500. I'll try to send a report with 8192 tomorrow.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
(back to list) On 11/10/2014 06:57 PM, Gary M wrote: Hi Wido, That is a bit weird.. I'd also check the Ethernet controller firmware version and settings between the other configurations. There must be something different.

Indeed, there must be something! But I can't figure it out yet. Same controllers, tried the same OS, direct cables, but the latency is 40% higher.

I can understand wanting to do a simple latency test.. But as we get closer to hw speeds and microsecond measurements, the measurements appear to be more unstable through software stacks. -gary

I fully agree with you. But a basic ICMP test on an idle machine should be a baseline from which you can start further diagnosing network latency using better tools like netperf.

Wido

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
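Comparing the controller driver and firmware, as suggested above, is a one-liner per host; a sketch, using an interface name taken from elsewhere in this thread:

# Driver, driver version and NIC firmware; compare between the fast and slow hosts
ethtool -i p4p1

# Negotiated link speed and link state, to rule out an autoneg surprise
ethtool p4p1 | grep -E 'Speed|Link detected'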
Re: [ceph-users] Typical 10GbE latency
I don't have 10GbE yet, but here is my result with simple LACP on 2 gigabit links and a Cisco 6500: rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms (seems to be lower than your 10GbE Nexus)

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Is this with an 8192 byte payload? The theoretical transfer time at 1 Gbps (you are only sending one packet, so LACP won't help) in one direction is 0.061 ms; double that and you are at 0.122 ms of bits in flight. Then there is context switching, switch latency (store-and-forward assumed for 1 Gbps), etc., which I'm not sure would fit in the remaining 0.057 ms of your min time. If it is an 8192 byte payload, then I'm really impressed!

On Tue, Nov 11, 2014 at 11:56 AM, Alexandre DERUMIER aderum...@odiso.com wrote: I don't have 10GbE yet, but here is my result with simple LACP on 2 gigabit links and a Cisco 6500: rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms (seems to be lower than your 10GbE Nexus)

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
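Robert's back-of-the-envelope number can be reproduced with a quick calculation; the sketch below ignores Ethernet/IP header and fragmentation overhead, so it is a lower bound that lands in the same ballpark as the 0.061 ms quoted above:

# One-way serialization time of an 8192-byte payload, headers ignored
awk 'BEGIN { b = 8192 * 8; printf "1 Gbps:  %.3f ms\n", b / 1e9  * 1000;
             printf "10 Gbps: %.3f ms\n", b / 1e10 * 1000 }'
# -> roughly 0.066 ms at 1 Gbps and 0.007 ms at 10 Gbps, before any switch,
#    NIC or kernel latency is added on top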
Re: [ceph-users] Typical 10GbE latency
On 08-11-14 02:42, Gary M wrote: Wido, take the switch out of the path between nodes and remeasure. ICMP echo requests are very low priority traffic for switches and network stacks.

I tried with a direct TwinAx and fiber cable. No difference.

If you really want to know, place a network analyzer between the nodes to measure the request-packet-to-response-packet latency. The ICMP timing reported to the ping application is not accurate in the sub-millisecond range and should only be used as a rough estimate.

True, I fully agree with you. But why is everybody showing a lower latency here? My latencies are about 40% higher than what I see reported here and in other setups.

You also may want to install the high resolution timer patch, sometimes called HRT, to the kernel, which may give you different results. ICMP traffic takes a different path than the TCP traffic and should not be considered an indicator of a defect.

Yes, I'm aware. But it still doesn't explain why the latency on other systems, which are in production, is lower than on this idle system.

I believe the ping app calls the sendto system call (sorry, it's been a while since I last looked). System calls can take between 0.1 us and 0.2 us each. However, the ping application makes several of these calls and waits for a signal from the kernel. The wait for a signal means the ping application must wait to be rescheduled to report the time. Rescheduling will depend on a lot of other factors in the OS, e.g. timers, card interrupts, and other tasks with higher priorities. Reporting the time adds a few more system calls, and the ping application then loops to post the next ping request, which again requires a few system calls and may cause a task switch while in each system call. For the above reasons, the ping application is not a good representation of network performance, due to factors in the application and the traffic shaping performed at the switch and in the TCP stacks. cheers, gary

I think that netperf is probably a better tool, but that also does TCP latencies. I want the real IP latency, so I assumed that ICMP would be the simplest one. The other setups I have access to are in production and do not have any special tuning, yet their latency is still lower than on this new deployment. That's what gets me confused.

Wido

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
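For the record, netperf's request/response tests measure application-level round-trip latency and avoid some of the ping caveats discussed above. A minimal sketch, assuming netperf is installed on both hosts and 10.0.0.2 is a placeholder address:

# On the remote host: start the netperf daemon
netserver

# On the local host: 1-byte TCP and UDP request/response tests; the reported
# transaction rate is roughly 1 / round-trip latency
netperf -H 10.0.0.2 -t TCP_RR -- -r 1,1
netperf -H 10.0.0.2 -t UDP_RR -- -r 1,1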
Re: [ceph-users] Typical 10GbE latency
Hi, this is with an Intel 10GbE bonded (2x10Gbit/s) network: rtt min/avg/max/mdev = 0.053/0.107/0.184/0.034 ms I thought that the Mellanox stuff had lower latencies.

Stefan

On 06.11.2014 at 18:09, Robert LeBlanc wrote: rtt min/avg/max/mdev = 0.130/0.157/0.190/0.016 ms IPoIB, Mellanox ConnectX-3 MT27500 FDR adapter and Mellanox IS5022 QDR switch, MTU set to 65520. CentOS 7.0.1406 running 3.17.2-1.el7.elrepo.x86_64 on an Intel(R) Atom(TM) CPU C2750 with 32 GB of RAM.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Infiniband has much lower latencies when performing RDMA and native IB traffic. Doing IPoIB adds all the Ethernet stuff that has to be done in software. Still, it is comparable to Ethernet even with this disadvantage. Once Ceph has the ability to do native RDMA, Infiniband should have an edge.

Robert LeBlanc Sent from a mobile device, please excuse any typos.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Mellanox is also doing Ethernet now, see http://www.mellanox.com/page/products_dyn?product_family=163mtag=sx1012 for example: 220 ns for 40GbE and 280 ns for 10GbE. And I think it's also possible to do RoCE (RDMA over Converged Ethernet) with Mellanox ConnectX-3 adapters.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi,

rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms

04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) at both hosts, with an Arista 7050S-64 in between. Both hosts were part of an active Ceph cluster.

-- Łukasz Jagiełło lukaszatjagiellodotorg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Wido, take the switch out of the path between nodes and remeasure. ICMP echo requests are very low priority traffic for switches and network stacks. If you really want to know, place a network analyzer between the nodes to measure the request-packet-to-response-packet latency. The ICMP timing reported to the ping application is not accurate in the sub-millisecond range and should only be used as a rough estimate.

You also may want to install the high resolution timer patch, sometimes called HRT, to the kernel, which may give you different results. ICMP traffic takes a different path than the TCP traffic and should not be considered an indicator of a defect.

I believe the ping app calls the sendto system call (sorry, it's been a while since I last looked). System calls can take between 0.1 us and 0.2 us each. However, the ping application makes several of these calls and waits for a signal from the kernel. The wait for a signal means the ping application must wait to be rescheduled to report the time. Rescheduling will depend on a lot of other factors in the OS, e.g. timers, card interrupts, and other tasks with higher priorities. Reporting the time adds a few more system calls, and the ping application then loops to post the next ping request, which again requires a few system calls and may cause a task switch while in each system call. For the above reasons, the ping application is not a good representation of network performance, due to factors in the application and the traffic shaping performed at the switch and in the TCP stacks.

cheers, gary

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
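Short of a hardware analyzer, capturing the ICMP traffic on the wire at least takes the ping application's own scheduling out of the measurement; a rough sketch (the interface name is a placeholder, and kernel timestamping still has some jitter of its own):

# -ttt prints the time delta to the previous packet, so on an otherwise quiet
# interface the delta shown on each echo-reply line is the request->reply gap
tcpdump -ttt -n -i eth0 -c 20 icmp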
[ceph-users] Typical 10GbE latency
Hello,

While working at a customer I've run into 10GbE latency which seems high to me. I have access to a couple of Ceph clusters and I ran a simple ping test:

$ ping -s 8192 -c 100 -n ip

Two results I got:

rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

Both these environments are running with Intel 82599ES 10Gbit cards in LACP. One with Extreme Networks switches, the other with Arista.

Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm seeing:

rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

As you can see, the Cisco Nexus network has higher latency compared to the other setups. You would say the switches are to blame, but we also tried with a direct TwinAx connection, and that didn't help. This setup also uses the Intel 82599ES cards, so the cards don't seem to be the problem. The MTU is set to 9000 on all these networks and cards.

I was wondering: others with a Ceph cluster running on 10GbE, could you perform a simple network latency test like this? I'd like to compare the results.

-- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
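Since all of these networks run with MTU 9000, it is also worth confirming that the 8192-byte echo really travels unfragmented end to end; a quick sketch using the Linux ping's don't-fragment flag (the address is a placeholder, and 8972 = 9000 minus 20 bytes IP header and 8 bytes ICMP header):

# Fails with "message too long" if any hop along the path has an MTU below 9000
ping -M do -s 8972 -c 10 -n 10.0.0.2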
Re: [ceph-users] Typical 10GbE latency
Between two hosts on an HP Procurve 6600, no jumbo frames: rtt min/avg/max/mdev = 0.096/0.128/0.151/0.019 ms

Cheers, Dan

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi Wido, What is the full topology? Are you using north-south or east-west? So far I've seen that east-west is slightly slower. What are the fabric modes you have configured? How is everything connected? Also, you give no information on the OS - if I remember correctly there were a lot of improvements in the latest kernels... And what about the bandwidth? The values you present don't seem awfully high, and the deviation seems low.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Also, between two hosts on a NetGear switch at 10GbE: rtt min/avg/max/mdev = 0.104/0.196/0.288/0.055 ms

German Anders

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
On 11/06/2014 02:38 PM, Luis Periquito wrote: Hi Wido, What is the full topology? Are you using north-south or east-west? So far I've seen that east-west is slightly slower. What are the fabric modes you have configured? How is everything connected? Also, you give no information on the OS - if I remember correctly there were a lot of improvements in the latest kernels...

The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are two 7000 units and 8 3000s spread out over 4 racks. But the test I did was with two hosts connected to the same Nexus 3000 switch using TwinAx cabling of 3m. The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but that didn't make a difference.

And what about the bandwidth?

Just fine, no problems getting 10Gbit through the NICs.

The values you present don't seem awfully high, and the deviation seems low.

No, they don't seem high, but they are about 40% higher than the values I see on other environments. 40% is a lot. This Ceph cluster is SSD-only, so the lower the latency, the more IOPS the system can do.

Wido

-- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi, from one host to five OSD hosts. NIC: Intel 82599EB; jumbo frames; single switch: IBM G8124 (blade network).

rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms
rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms
rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms
rtt min/avg/max/mdev = 0.083/0.115/0.183/0.030 ms
rtt min/avg/max/mdev = 0.087/0.144/0.190/0.028 ms

Udo

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
What is the CoPP (control-plane policing) configuration?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi, Udo. Good values :) Did you do any additional optimization on the host? Thanks.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi, 2 LACP-bonded Intel Corporation Ethernet 10G 2P X520 adapters, no jumbo frames, here:

rtt min/avg/max/mdev = 0.141/0.207/0.313/0.040 ms
rtt min/avg/max/mdev = 0.124/0.223/0.289/0.044 ms
rtt min/avg/max/mdev = 0.302/0.378/0.460/0.038 ms
rtt min/avg/max/mdev = 0.282/0.389/0.473/0.035 ms

All hosts are on the same stacked pair of Dell N4032F switches.

Regards

-- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Mandatory information per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
On 11/06/2014 02:58 PM, Luis Periquito wrote: What is the CoPP (control-plane policing) configuration?

Nothing special, default settings: 200 ICMP packets/second. But we also tested with a direct TwinAx cable between two hosts, so no switch involved. That did not improve the latency. So this seems to be a kernel/driver issue somewhere, but I can't think of anything. The systems I have access to have no special tuning and get much better latency.

Wido

-- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
Hi, no special optimizations on the host. In this case the pings are from a Proxmox VE host to the Ceph OSDs (Ubuntu + Debian). The pings from one OSD to the others are comparable.

Udo

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Typical 10GbE latency
rtt min/avg/max/mdev = 0.130/0.157/0.190/0.016 ms

IPoIB, Mellanox ConnectX-3 MT27500 FDR adapter and Mellanox IS5022 QDR switch, MTU set to 65520. CentOS 7.0.1406 running 3.17.2-1.el7.elrepo.x86_64 on an Intel(R) Atom(TM) CPU C2750 with 32 GB of RAM.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com