Hello German
Could you remind me what type of bus/adapter your HDs are connected to? (A couple of commands that would show this are sketched at the bottom of this message.)

Julien

> On 28 Dec 2013, at 00:10, German Anders <gand...@despegar.com> wrote:
>
> Hi Mark,
> I've already made those changes but the performance is almost the same. I ran another test with a dd statement and the results were the same (I used all of the 73GB disks for the OSDs and also put the journal inside the OSD device). I also noticed that the network is at Gb:
>
> ceph@ceph-node04:~$ sudo rbd -m 10.1.1.151 -p ceph-cloud --size 102400 create rbdCloud -k /etc/ceph/ceph.client.admin.keyring
> ceph@ceph-node04:~$ sudo rbd map -m 10.1.1.151 rbdCloud --pool ceph-cloud --id admin -k /etc/ceph/ceph.client.admin.keyring
> ceph@ceph-node04:~$ sudo mkdir /mnt/rbdCloud
> ceph@ceph-node04:~$ sudo mkfs.xfs -l size=64m,lazy-count=1 -f /dev/rbd/ceph-cloud/rbdCloud
> log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/rbd/ceph-cloud/rbdCloud isize=256    agcount=17, agsize=1637376 blks
>          =                             sectsz=512   attr=2, projid32bit=0
> data     =                             bsize=4096   blocks=26214400, imaxpct=25
>          =                             sunit=1024   swidth=1024 blks
> naming   =version 2                    bsize=4096   ascii-ci=0
> log      =internal log                 bsize=4096   blocks=16384, version=2
>          =                             sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                         extsz=4096   blocks=0, rtextents=0
> ceph@ceph-node04:~$ sudo mount /dev/rbd/ceph-cloud/rbdCloud /mnt/rbdCloud
> ceph@ceph-node04:~$ cd /mnt/rbdCloud
> ceph@ceph-node04:/mnt/rbdCloud$ for i in 1 2 3 4; do sudo dd if=/dev/zero of=a bs=1M count=1000 conv=fdatasync; done
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.2545 s, 102 MB/s
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.0554 s, 104 MB/s
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.2352 s, 102 MB/s
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10.1197 s, 104 MB/s
> ceph@ceph-node04:/mnt/rbdCloud$
>
> OSD tree:
>
> ceph@ceph-node05:~/ceph-cluster-prd$ sudo ceph osd tree
> # id    weight  type name       up/down reweight
> -1      3.43    root default
> -2      0.6299          host ceph-node01
> 12      0.06999                 osd.12  up      1
> 13      0.06999                 osd.13  up      1
> 14      0.06999                 osd.14  up      1
> 15      0.06999                 osd.15  up      1
> 16      0.06999                 osd.16  up      1
> 17      0.06999                 osd.17  up      1
> 18      0.06999                 osd.18  up      1
> 19      0.06999                 osd.19  up      1
> 20      0.06999                 osd.20  up      1
> -3      0.6999          host ceph-node02
> 22      0.06999                 osd.22  up      1
> 23      0.06999                 osd.23  up      1
> 24      0.06999                 osd.24  up      1
> 25      0.06999                 osd.25  up      1
> 26      0.06999                 osd.26  up      1
> 27      0.06999                 osd.27  up      1
> 28      0.06999                 osd.28  up      1
> 29      0.06999                 osd.29  up      1
> 30      0.06999                 osd.30  up      1
> 31      0.06999                 osd.31  up      1
> -4      0.6999          host ceph-node03
> 32      0.06999                 osd.32  up      1
> 33      0.06999                 osd.33  up      1
> 34      0.06999                 osd.34  up      1
> 35      0.06999                 osd.35  up      1
> 36      0.06999                 osd.36  up      1
> 37      0.06999                 osd.37  up      1
> 38      0.06999                 osd.38  up      1
> 39      0.06999                 osd.39  up      1
> 40      0.06999                 osd.40  up      1
> 41      0.06999                 osd.41  up      1
> -5      0.6999          host ceph-node04
> 0       0.06999                 osd.0   up      1
> 1       0.06999                 osd.1   up      1
> 2       0.06999                 osd.2   up      1
> 3       0.06999                 osd.3   up      1
> 4       0.06999                 osd.4   up      1
> 5       0.06999                 osd.5   up      1
> 6       0.06999                 osd.6   up      1
> 7       0.06999                 osd.7   up      1
> 8       0.06999                 osd.8   up      1
> 9       0.06999                 osd.9   up      1
> -6      0.6999          host ceph-node05
> 10      0.06999                 osd.10  up      1
> 11      0.06999                 osd.11  up      1
> 42      0.06999                 osd.42  up      1
> 43      0.06999                 osd.43  up      1
> 44      0.06999                 osd.44  up      1
> 45      0.06999                 osd.45  up      1
> 46      0.06999                 osd.46  up      1
> 47      0.06999                 osd.47  up      1
> 48      0.06999                 osd.48  up      1
> 49      0.06999                 osd.49  up      1
>
> Any ideas?
>
> Thanks in advance,
>
> German Anders
>
>> --- Original message ---
>> Subject: Re: [ceph-users] Cluster Performance very Poor
>> From: Mark Nelson <mark.nel...@inktank.com>
>> To: <ceph-users@lists.ceph.com>
>> Date: Friday, 27/12/2013 15:39
>>
>>> On 12/27/2013 12:19 PM, German Anders wrote:
>>> Hi Cephers,
>>>
>>> I've run a rados bench to measure the throughput of the cluster, and found that the performance is really poor:
>>>
>>> The setup is the following:
>>>
>>> OS: Ubuntu 12.10 Server 64 bits
>>>
>>> ceph-node01 (mon)     10.77.0.101   ProLiant BL460c G7   32GB   8 x 2 GHz
>>>                       10.1.1.151    D2200sb Storage Blade (Firmware: 2.30)
>>> ceph-node02 (mon)     10.77.0.102   ProLiant BL460c G7   64GB   8 x 2 GHz
>>>                       10.1.1.152    D2200sb Storage Blade (Firmware: 2.30)
>>> ceph-node03 (mon)     10.77.0.103   ProLiant BL460c G6   32GB   8 x 2 GHz
>>>                       10.1.1.153    D2200sb Storage Blade (Firmware: 2.30)
>>> ceph-node04           10.77.0.104   ProLiant BL460c G7   32GB   8 x 2 GHz
>>>                       10.1.1.154    D2200sb Storage Blade (Firmware: 2.30)
>>> ceph-node05 (deploy)  10.77.0.105   ProLiant BL460c G6   32GB   8 x 2 GHz
>>>                       10.1.1.155    D2200sb Storage Blade (Firmware: 2.30)
>>
>> If your servers have controllers with writeback cache, please make sure it is enabled as that will likely help.
>>
>>> ceph-node01:
>>>
>>> /dev/sda   73G  (OSD)
>>> /dev/sdb   73G  (OSD)
>>> /dev/sdc   73G  (OSD)
>>> /dev/sdd   73G  (OSD)
>>> /dev/sde   73G  (OSD)
>>> /dev/sdf   73G  (OSD)
>>> /dev/sdg   73G  (OSD)
>>> /dev/sdh   73G  (OSD)
>>> /dev/sdi   73G  (OSD)
>>> /dev/sdj   73G  (Journal)
>>> /dev/sdk  500G  (OSD)
>>> /dev/sdl  500G  (OSD)
>>> /dev/sdn  146G  (Journal)
>>>
>>> ceph-node02:
>>>
>>> /dev/sda   73G  (OSD)
>>> /dev/sdb   73G  (OSD)
>>> /dev/sdc   73G  (OSD)
>>> /dev/sdd   73G  (OSD)
>>> /dev/sde   73G  (OSD)
>>> /dev/sdf   73G  (OSD)
>>> /dev/sdg   73G  (OSD)
>>> /dev/sdh   73G  (OSD)
>>> /dev/sdi   73G  (OSD)
>>> /dev/sdj   73G  (Journal)
>>> /dev/sdk  500G  (OSD)
>>> /dev/sdl  500G  (OSD)
>>> /dev/sdn  146G  (Journal)
>>>
>>> ceph-node03:
>>>
>>> /dev/sda   73G  (OSD)
>>> /dev/sdb   73G  (OSD)
>>> /dev/sdc   73G  (OSD)
>>> /dev/sdd   73G  (OSD)
>>> /dev/sde   73G  (OSD)
>>> /dev/sdf   73G  (OSD)
>>> /dev/sdg   73G  (OSD)
>>> /dev/sdh   73G  (OSD)
>>> /dev/sdi   73G  (OSD)
>>> /dev/sdj   73G  (Journal)
>>> /dev/sdk  500G  (OSD)
>>> /dev/sdl  500G  (OSD)
>>> /dev/sdn   73G  (Journal)
>>>
>>> ceph-node04:
>>>
>>> /dev/sda   73G  (OSD)
>>> /dev/sdb   73G  (OSD)
>>> /dev/sdc   73G  (OSD)
>>> /dev/sdd   73G  (OSD)
>>> /dev/sde   73G  (OSD)
>>> /dev/sdf   73G  (OSD)
>>> /dev/sdg   73G  (OSD)
>>> /dev/sdh   73G  (OSD)
>>> /dev/sdi   73G  (OSD)
>>> /dev/sdj   73G  (Journal)
>>> /dev/sdk  500G  (OSD)
>>> /dev/sdl  500G  (OSD)
>>> /dev/sdn  146G  (Journal)
>>>
>>> ceph-node05:
>>>
>>> /dev/sda   73G  (OSD)
>>> /dev/sdb   73G  (OSD)
>>> /dev/sdc   73G  (OSD)
>>> /dev/sdd   73G  (OSD)
>>> /dev/sde   73G  (OSD)
>>> /dev/sdf   73G  (OSD)
>>> /dev/sdg   73G  (OSD)
>>> /dev/sdh   73G  (OSD)
>>> /dev/sdi   73G  (OSD)
>>> /dev/sdj   73G  (Journal)
>>> /dev/sdk  500G  (OSD)
>>> /dev/sdl  500G  (OSD)
>>> /dev/sdn   73G  (Journal)
>>
>> Am I correct in assuming that you've put all of your journals for every disk in each node on two spinning disks? This is going to be quite slow, because Ceph does a full write of the data to the journal for every real write. The general solution is to either use SSDs for journals (preferably multiple fast SSDs with high write endurance and only 3-6 OSD journals each), or put the journals on a partition on the data disk.
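On the second option Mark mentions (journal on a partition of the data disk): with ceph-deploy a minimal sketch would be something along these lines. The host and device names are only placeholders, so adapt them to the layout above; with no separate journal device given, ceph-deploy/ceph-disk normally carve a small journal partition on the same disk and use the remainder for data.

    # wipe the disk and let ceph-disk partition it (data + co-located journal)
    ceph-deploy disk zap ceph-node01:sdb
    ceph-deploy osd create ceph-node01:sdb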
>>
>>> And the OSD tree is:
>>>
>>> root@ceph-node03:/home/ceph# ceph osd tree
>>> # id    weight  type name       up/down reweight
>>> -1      7.27    root default
>>> -2      1.15            host ceph-node01
>>> 12      0.06999                 osd.12  up      1
>>> 13      0.06999                 osd.13  up      1
>>> 14      0.06999                 osd.14  up      1
>>> 15      0.06999                 osd.15  up      1
>>> 16      0.06999                 osd.16  up      1
>>> 17      0.06999                 osd.17  up      1
>>> 18      0.06999                 osd.18  up      1
>>> 19      0.06999                 osd.19  up      1
>>> 20      0.06999                 osd.20  up      1
>>> 21      0.45                    osd.21  up      1
>>> 22      0.06999                 osd.22  up      1
>>> -3      1.53            host ceph-node02
>>> 23      0.06999                 osd.23  up      1
>>> 24      0.06999                 osd.24  up      1
>>> 25      0.06999                 osd.25  up      1
>>> 26      0.06999                 osd.26  up      1
>>> 27      0.06999                 osd.27  up      1
>>> 28      0.06999                 osd.28  up      1
>>> 29      0.06999                 osd.29  up      1
>>> 30      0.06999                 osd.30  up      1
>>> 31      0.06999                 osd.31  up      1
>>> 32      0.45                    osd.32  up      1
>>> 33      0.45                    osd.33  up      1
>>> -4      1.53            host ceph-node03
>>> 34      0.06999                 osd.34  up      1
>>> 35      0.06999                 osd.35  up      1
>>> 36      0.06999                 osd.36  up      1
>>> 37      0.06999                 osd.37  up      1
>>> 38      0.06999                 osd.38  up      1
>>> 39      0.06999                 osd.39  up      1
>>> 40      0.06999                 osd.40  up      1
>>> 41      0.06999                 osd.41  up      1
>>> 42      0.06999                 osd.42  up      1
>>> 43      0.45                    osd.43  up      1
>>> 44      0.45                    osd.44  up      1
>>> -5      1.53            host ceph-node04
>>> 0       0.06999                 osd.0   up      1
>>> 1       0.06999                 osd.1   up      1
>>> 2       0.06999                 osd.2   up      1
>>> 3       0.06999                 osd.3   up      1
>>> 4       0.06999                 osd.4   up      1
>>> 5       0.06999                 osd.5   up      1
>>> 6       0.06999                 osd.6   up      1
>>> 7       0.06999                 osd.7   up      1
>>> 8       0.06999                 osd.8   up      1
>>> 9       0.45                    osd.9   up      1
>>> 10      0.45                    osd.10  up      1
>>> -6      1.53            host ceph-node05
>>> 11      0.06999                 osd.11  up      1
>>> 45      0.06999                 osd.45  up      1
>>> 46      0.06999                 osd.46  up      1
>>> 47      0.06999                 osd.47  up      1
>>> 48      0.06999                 osd.48  up      1
>>> 49      0.06999                 osd.49  up      1
>>> 50      0.06999                 osd.50  up      1
>>> 51      0.06999                 osd.51  up      1
>>> 52      0.06999                 osd.52  up      1
>>> 53      0.45                    osd.53  up      1
>>> 54      0.45                    osd.54  up      1
>>
>> Based on this, it appears your 500GB drives are weighted much higher than the 73GB drives. This will help even out data distribution, but unfortunately it will cause the system to be slower if all of the OSDs are in the same pool. What this does is cause the 500GB drives to get a higher proportion of the writes than the other drives, but those drives are almost certainly no faster than the other ones. Because there is a limited number of outstanding IOs you can have (due to memory constraints), eventually all outstanding IOs will be waiting on the 500GB disks while the 73GB disks mostly sit around waiting for work.
>>
>> What I'd suggest doing is putting all of your 73GB disks in the same pool and your 500GB disks in another pool. I suspect that if you do that and put your journals on the first partition of each disk, you'll see some improvement in your benchmark results.
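For the pool split Mark suggests, a rough outline would be to add a second CRUSH root containing only the 500GB OSDs, give it its own rule, and point a dedicated pool at that rule. The pool name, PG counts and ruleset id below are just placeholders:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt: add e.g. "root default-500g" holding only the
    # 500GB OSDs, plus a replicated rule that takes from that root
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
    ceph osd pool create ceph-cloud-500g 512 512
    ceph osd pool set ceph-cloud-500g crush_ruleset 3   # id of the new rule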
>>
>>> And the result:
>>>
>>> root@ceph-node03:/home/ceph# rados bench -p ceph-cloud 20 write -t 10
>>> Maintaining 10 concurrent writes of 4194304 bytes for up to 20 seconds or 0 objects
>>> Object prefix: benchmark_data_ceph-node03_29727
>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>     0       0         0         0         0         0         -         0
>>>     1      10        30        20   79.9465        80  0.159295  0.378849
>>>     2      10        52        42   83.9604        88  0.719616  0.430293
>>>     3      10        74        64   85.2991        88  0.487685  0.412956
>>>     4      10        97        87   86.9676        92  0.351122  0.418814
>>>     5      10       123       113   90.3679       104  0.317011  0.418876
>>>     6      10       147       137   91.3012        96  0.562112  0.418178
>>>     7      10       172       162   92.5398       100  0.691045  0.413416
>>>     8      10       197       187    93.469       100  0.459424  0.415459
>>>     9      10       222       212   94.1915       100  0.798889  0.416093
>>>    10      10       248       238   95.1697       104  0.440002  0.415609
>>>    11      10       267       257   93.4252        76   0.48959   0.41531
>>>    12      10       289       279   92.9707        88  0.524622  0.420145
>>>    13      10       313       303   93.2016        96   1.02104  0.423955
>>>    14      10       336       326   93.1136        92  0.477328  0.420684
>>>    15      10       359       349    93.037        92  0.591118  0.418589
>>>    16      10       383       373   93.2204        96  0.600392  0.421916
>>>    17      10       407       397   93.3812        96  0.240166  0.419829
>>>    18      10       431       421    93.526        96  0.746706  0.420971
>>>    19      10       457       447   94.0757       104  0.237565  0.419025
>>> 2013-12-27 13:13:21.817874 min lat: 0.101352 max lat: 1.81426 avg lat: 0.418242
>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>    20      10       480       470   93.9709        92  0.489254  0.418242
>>> Total time run:          20.258064
>>> Total writes made:       481
>>> Write size:              4194304
>>> Bandwidth (MB/sec):      94.975
>>>
>>> Stddev Bandwidth:        21.7799
>>> Max bandwidth (MB/sec):  104
>>> Min bandwidth (MB/sec):  0
>>> Average Latency:         0.420573
>>> Stddev Latency:          0.226378
>>> Max latency:             1.81426
>>> Min latency:             0.101352
>>> root@ceph-node03:/home/ceph#
>>>
>>> Thanks in advance,
>>>
>>> Best regards,
>>>
>>> *German Anders*
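Back to my question about the bus/adapter: a few commands run on one of the OSD nodes should show how the disks are attached. These are just generic Linux tools; the exact output of course depends on the controller in the D2200sb setup:

    lspci | grep -iE 'raid|sas|sata'         # which controller the disks sit behind
    lsscsi                                   # disk-to-host-adapter mapping
    sudo smartctl -i /dev/sda                # model, transport, rotation rate
    sudo hdparm -I /dev/sda | grep -i speed  # negotiated link speed (SATA only)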
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com