On 2018-01-23 08:27, Blair Bethwaite wrote:

> Firstly, the OP's premise in asking, "Or should there be a difference
> of 10x", is fundamentally incorrect. Greater bandwidth does not mean
> lower latency, though lower latency does usually translate into
> greater usable bandwidth.
> Unfortunately, changing the speed of light remains a difficult
> engineering challenge :-). However, you can do things like: add
> multiple links, overlap signals on the wire, and tweak error
> correction encodings; all to get more bits on the wire without making
> the wire itself any faster. Take Mellanox 100Gb Ethernet: a single
> lane runs at 25Gb/s, two lanes are bonded to get 50Gb/s and four to
> get 100Gb/s - the latency of transmitting a single bit is more or less
> unchanged. Also note that with UDP/TCP pings, or actual Ceph traffic,
> we go via the kernel stack running on the CPU, so the speed and
> power-management settings of the CPU can make quite a difference.
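> 
> For example (a sketch only - tool names and available governors vary
> by distro and driver), one way to check and pin the CPU frequency
> governor before running a latency test:
> 
> $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> $ sudo cpupower frequency-set -g performance   # or: tuned-adm profile network-latency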
> 
> Example 25GE on a dual-port CX-4 card in LACP bond, RHEL7 host.
> 
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.3 (Maipo)
> $ ofed_info | head -1
> MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1):
> $ grep 'model name' /proc/cpuinfo | uniq
> model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> $ ibv_devinfo
> hca_id: mlx5_1
> transport:                      InfiniBand (0)
> fw_ver:                         14.18.1000
> node_guid:                      ...
> sys_image_guid:                 ...
> vendor_id:                      0x02c9
> vendor_part_id:                 4117
> hw_ver:                         0x0
> board_id:                       MT_2420110034
> ...
> 
> $ sudo ping -M do -s 8972 -c 100000 -f ...
> 100000 packets transmitted, 100000 received, 0% packet loss, time 4652ms
> rtt min/avg/max/mdev = 0.029/0.031/2.711/0.015 ms, ipg/ewma 0.046/0.031 ms
> 
> $ sudo ping -M do -s 3972 -c 100000 -f ...
> 100000 packets transmitted, 100000 received, 0% packet loss, time 3321ms
> rtt min/avg/max/mdev = 0.019/0.022/0.364/0.003 ms, ipg/ewma 0.033/0.022 ms
> 
> $ sudo ping -M do -s 1972 -c 100000 -f ...
> 100000 packets transmitted, 100000 received, 0% packet loss, time 2818ms
> rtt min/avg/max/mdev = 0.017/0.018/0.086/0.005 ms, ipg/ewma 0.028/0.021 ms
> 
> $ sudo ping -M do -s 472 -c 100000 -f ...
> 100000 packets transmitted, 100000 received, 0% packet loss, time 2498ms
> rtt min/avg/max/mdev = 0.014/0.016/0.305/0.005 ms, ipg/ewma 0.024/0.017 ms
> 
> $ sudo ping -M do -c 100000 -f ...
> 100000 packets transmitted, 100000 received, 0% packet loss, time 2363ms
> rtt min/avg/max/mdev = 0.014/0.015/0.322/0.006 ms, ipg/ewma 0.023/0.016 ms
> 
> On 22 January 2018 at 22:37, Nick Fisk <n...@fisk.me.uk> wrote: 
> 
>> Anyone with 25G ethernet willing to do the test? Would love to see what the
>> latency figures are for that.
>> 
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Maged Mokhtar
>> Sent: 22 January 2018 11:28
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] What is the should be the expected latency of
>> 10Gbit network connections
>> 
>> On 2018-01-22 08:39, Wido den Hollander wrote:
>> 
>> On 01/20/2018 02:02 PM, Marc Roos wrote:
>> 
>> If I test my connections with sockperf via a 1Gbit switch I get around
>> 25usec; when I test the 10Gbit connection via the switch I get around
>> 12usec. Is that normal? Or should there be a difference of 10x?
>> 
>> No, that's normal.
>> 
>> Tests I did with 8k ping packets over different links:
>> 
>> 1GbE:  0.800ms
>> 10GbE: 0.200ms
>> 40GbE: 0.150ms
>> 
>> Wido
>> 
>> sockperf ping-pong
>> 
>> sockperf: Warmup stage (sending a few dummy messages)...
>> sockperf: Starting test...
>> sockperf: Test end (interrupted by timer)
>> sockperf: Test ended
>> sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
>> ReceivedMessages=432874
>> sockperf: ========= Printing statistics for Server No: 0
>> sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=428640;
>> ReceivedMessages=428640
>> sockperf: ====> avg-lat= 11.609 (std-dev=1.684)
>> sockperf: # dropped messages = 0; # duplicated messages = 0; #
>> out-of-order messages = 0
>> sockperf: Summary: Latency is 11.609 usec
>> sockperf: Total 428640 observations; each percentile contains 4286.40
>> observations
>> sockperf: ---> <MAX> observation =  856.944
>> sockperf: ---> percentile  99.99 =   39.789
>> sockperf: ---> percentile  99.90 =   20.550
>> sockperf: ---> percentile  99.50 =   17.094
>> sockperf: ---> percentile  99.00 =   15.578
>> sockperf: ---> percentile  95.00 =   12.838
>> sockperf: ---> percentile  90.00 =   12.299
>> sockperf: ---> percentile  75.00 =   11.844
>> sockperf: ---> percentile  50.00 =   11.409
>> sockperf: ---> percentile  25.00 =   11.124
>> sockperf: ---> <MIN> observation =    8.888
>> 
>> sockperf: Warmup stage (sending a few dummy messages)...
>> sockperf: Starting test...
>> sockperf: Test end (interrupted by timer)
>> sockperf: Test ended
>> sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
>> ReceivedMessages=22064
>> sockperf: ========= Printing statistics for Server No: 0
>> sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=20056;
>> ReceivedMessages=20056
>> sockperf: ====> avg-lat= 24.861 (std-dev=1.774)
>> sockperf: # dropped messages = 0; # duplicated messages = 0; #
>> out-of-order messages = 0
>> sockperf: Summary: Latency is 24.861 usec
>> sockperf: Total 20056 observations; each percentile contains 200.56
>> observations
>> sockperf: ---> <MAX> observation =   77.158
>> sockperf: ---> percentile  99.99 =   54.285
>> sockperf: ---> percentile  99.90 =   37.864
>> sockperf: ---> percentile  99.50 =   34.406
>> sockperf: ---> percentile  99.00 =   33.337
>> sockperf: ---> percentile  95.00 =   27.497
>> sockperf: ---> percentile  90.00 =   26.072
>> sockperf: ---> percentile  75.00 =   24.618
>> sockperf: ---> percentile  50.00 =   24.443
>> sockperf: ---> percentile  25.00 =   24.361
>> sockperf: ---> <MIN> observation =   16.746
>> [root@c01 sbin]# sockperf ping-pong -i 192.168.0.12 -p 5001 -t 10
>> sockperf: == version #2.6 ==
>> sockperf[CLIENT] send on:sockperf: using recvfrom() to block on
>> socket(s)
>> 
>> I find the ping command with the flood option handy for measuring
>> latency; it gives min/max/average/std-deviation stats.
>> 
>> example:
>> 
>> ping  -c 100000 -f 10.0.1.12
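>> 
>> For a user-space comparison, the matching sockperf pair would look
>> something like this (IP, port and duration are just illustrative):
>> 
>> sockperf server -i 10.0.1.12 -p 5001              # on the target host
>> sockperf ping-pong -i 10.0.1.12 -p 5001 -t 10     # on the client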
>> 
>> Maged
>> 

The IP flood test shows latency close to the hardware/link level.
sockperf shows the latency that user-space TCP socket applications will
see, including kernel context switches, interrupts, transmit buffering,
TCP ACKs, etc. So sockperf is the better measure of the latency Ceph
clients will see, while the flood latency gives a better picture of
expected IOPS, which at the link level is the inverse of latency. (At
the application level, with concurrency, IOPS is not tied to latency in
that way.)
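
A quick back-of-envelope using the sockperf averages above (illustrative
only; a single, fully serialized requester):

awk 'BEGIN { printf "%.0f\n", 1e6 / 11.6 }'   # ~86k round trips/s at the 11.6 usec 10GbE average
awk 'BEGIN { printf "%.0f\n", 1e6 / 24.9 }'   # ~40k round trips/s at the 24.9 usec 1GbE average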

Maybe with SPDK/RDMA, Ceph latency will be close to link latency.
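
To get a feel for the raw RDMA/link latency floor, the perftest tools
can be used (a sketch; device and GID selection omitted, run the same
binary on both ends):

ib_send_lat                    # on the server
ib_send_lat <server address>   # on the client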