I decreased the rbd debug level from 5 to 0, so that almost all debugging and
logging is disabled. It doesn't help:
ceph tell osd.* injectargs '--debug_rbd 0\/0'
ceph tell osd.* injectargs '--debug_objectcacher 0\/0'
ceph tell osd.* injectargs '--debug_rbd_replay 0\/0'
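
If it helps, the same thing can also be made persistent in ceph.conf instead of
being injected at runtime. This is only a minimal sketch; which additional debug
subsystems matter here is an assumption on my part, not something I have measured:

[global]
debug rbd = 0/0
debug objectcacher = 0/0
debug rbd replay = 0/0
# additional commonly noisy subsystems (assumption, not tested above)
debug ms = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0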


2014-10-17 8:45 GMT+08:00 Shu, Xinxin <xinxin....@intel.com>:

>  We do observe the same issue on our 12-SSD setup; disabling all logging
> may be helpful.
>
>
>
> Cheers,
>
> xinxin
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Mark Wu
> *Sent:* Friday, October 17, 2014 12:18 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] Performance doesn't scale well on a full ssd
> cluster.
>
>
>
> Hi list,
>
>
>
> During my test, I found Ceph doesn't scale as I expected on a 30-OSD
> cluster.
>
> The following is the information of my setup:
>
> HW configuration:
>
>    15 Dell R720 servers, and each server has:
>
>       Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores, hyper-threading
> enabled.
>
>       128GB memory
>
>       two Intel 3500 SSDs, connected to a MegaRAID SAS 2208 controller;
> each disk is configured as a separate RAID0 volume.
>
>       two bonded 10GbE NICs, used for both the public network and the
> cluster network.
>
>
>
> SW configuration:
>
>    OS CentOS 6.5, Kernel 3.17,  Ceph 0.86
>
>    XFS as file system for data.
>
>    each SSD disk has two partitions: one for OSD data and the other for the OSD
> journal.
>
>    the pool has 2048 PGs and 2 replicas (creation commands are sketched below).
>
>    5 monitors running on 5 of the 15 servers.
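>
> For reference, a minimal sketch of how such a pool could be created; the pool
> name "rbd" and the pg_num/pgp_num values shown are examples, not taken from my
> exact setup:
>
> # create a pool with 2048 placement groups (pg_num and pgp_num)
> ceph osd pool create rbd 2048 2048
> # keep 2 replicas of each object
> ceph osd pool set rbd size 2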
>
>    Ceph configuration (in-memory debugging options are disabled)
>
>
>
> [osd]
>
> osd data = /var/lib/ceph/osd/$cluster-$id
>
> osd journal = /var/lib/ceph/osd/$cluster-$id/journal
>
> osd mkfs type = xfs
>
> osd mkfs options xfs = -f -i size=2048
>
> osd mount options xfs = rw,noatime,logbsize=256k,delaylog
>
> osd journal size = 20480
>
> osd mon heartbeat interval = 30
>
> osd_max_backfills = 10
>
> osd_recovery_max_active = 15
>
> # Performance tuning
>
> filestore merge threshold = 40
>
> filestore split multiple = 8
>
> filestore fd cache size = 1024
>
> osd op threads = 64
>
> # Recovery tuning
>
> osd recovery max active = 1
>
> osd max backfills = 1
>
> osd recovery op priority = 1
>
> throttler perf counter = false
>
> osd enable op tracker = false
>
> filestore_queue_max_ops = 5000
>
> filestore_queue_committing_max_ops = 5000
>
> journal_max_write_entries = 1000
>
> journal_queue_max_ops = 5000
>
> objecter_inflight_ops = 8192
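>
> As a sanity check (a sketch only; the OSD id is just an example), the values
> actually in effect on a running OSD can be confirmed through the admin socket:
>
> # show the running configuration of osd.0 and filter a few of the tuned options
> ceph daemon osd.0 config show | grep -E 'filestore_queue|journal_queue|osd_op_threads'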
>
>
>
>
>
>   When I test with 7 servers (14 OSDs), the maximum IOPS of 4k random writes
> I see is 17k on a single volume and 44k on the whole cluster.
>
> Scaling that linearly, I expected the 30-OSD cluster to reach roughly 90k
> (44k x 30/14 ≈ 94k). But unfortunately, with 30 OSDs it delivers almost the
> same performance as 14 OSDs, and sometimes worse. I checked the iostat output
> on all the nodes and the numbers are similar: the load is well distributed,
> but disk utilization is low.
>
> In the test with 14 OSDs, I see higher disk utilization (80%~90%). Do you
> have any tuning suggestions to improve the performance with 30 OSDs?
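>
> For context, this is roughly the kind of job behind the numbers above. It is
> only a sketch assuming fio with the librbd engine; the client name, pool name,
> image name, and queue depth are hypothetical, not the exact parameters of my
> runs:
>
> [global]
> ioengine=rbd          ; librbd engine (requires fio built with rbd support)
> clientname=admin      ; cephx user (hypothetical)
> pool=rbd              ; pool name (hypothetical)
> rbdname=testimg       ; pre-created RBD image (hypothetical)
> rw=randwrite
> bs=4k
> time_based=1
> runtime=120
>
> [rbd-randwrite]
> iodepth=64            ; queue depth (hypothetical)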
>
> Any feedback is appreciated.
>
>
>
>
>
> iostat output:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00     0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
> sdb               0.00    88.50    0.00 5188.00      0.00  93397.00    18.00     0.90    0.17   0.09  47.85
> sdc               0.00   443.50    0.00 5561.50      0.00  97324.00    17.50     4.06    0.73   0.09  47.90
> dm-0              0.00     0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
> dm-1              0.00     0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00    17.50    0.00   28.00      0.00   3948.00   141.00     0.01    0.29   0.05   0.15
> sdb               0.00    69.50    0.00 4932.00      0.00  87067.50    17.65     2.27    0.46   0.09  43.45
> sdc               0.00    69.00    0.00 4855.50      0.00 105771.50    21.78     0.95    0.20   0.10  46.40
> dm-0              0.00     0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
> dm-1              0.00     0.00    0.00   42.50      0.00   3948.00    92.89     0.01    0.19   0.04   0.15
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rsec/s    wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00    12.00    0.00    8.00      0.00    568.00    71.00     0.00    0.12   0.12   0.10
> sdb               0.00    72.50    0.00 5046.50      0.00 113198.50    22.43     1.09    0.22   0.10  51.40
> sdc               0.00    72.50    0.00 4912.00      0.00  91204.50    18.57     2.25    0.46   0.09  43.60
> dm-0              0.00     0.00    0.00    0.00      0.00      0.00     0.00     0.00    0.00   0.00   0.00
> dm-1              0.00     0.00    0.00   18.00      0.00    568.00    31.56     0.00    0.17   0.06   0.10
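>
> The snapshots above are per-device extended statistics; something along the
> lines of the following would produce this format (the exact flags and interval
> are an assumption):
>
> # extended per-device stats, refreshed every second
> iostat -x 1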
>
>
>
>
>
>
>
> Regards,
>
> Mark Wu
>
>
>