Hello all,

I'm facing poor performance on RBD images.

First, my lab's hardware consists of 3 Intel servers, each with:
- 2x Intel Xeon E5-2660 v4 (all power-saving features are disabled in the BIOS)
- Intel S2600TPR motherboard
- 256 GB RAM
- 4x SATA SSD Intel DC S3520 960 GB for OSDs
- 2x SATA SSD Intel DC S3520 480 GB for the OS
- 1x PCIe NVMe Intel DC P3700 800 GB for the writeback pool
- dual-port ixgbe 10 Gb/s NIC

All of this runs under CentOS 7.6 on kernel 4.14.15-1.el7.elrepo.x86_64. Network interfaces run in teaming.
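In case it matters, this is the kind of sanity check I run on each node to confirm the power-saving and teaming setup; the device names team0 and eth0 are placeholders for my interfaces:

# confirm every core is really running at the performance governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# confirm both ports of the team are up and which runner is active
teamdctl team0 state

# confirm the ports negotiated 10 Gb/s
ethtool eth0 | grep Speed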
Each of these 3 servers acts as a mon host, OSD host, mgr host and RBD host.

ceph -s:

  cluster:
    id:     6dc5b328-f8be-4c52-96b7-d20a1f78b067
    health: HEALTH_WARN
            Failed to send data to Zabbix
            1548 slow ops, oldest one blocked for 63205 sec, mon.alfa-csn-03 has slow ops

  services:
    mon: 3 daemons, quorum alfa-csn-01,alfa-csn-02,alfa-csn-03
    mgr: alfa-csn-03(active), standbys: alfa-csn-02, alfa-csn-01
    osd: 27 osds: 27 up, 27 in
    rgw: 3 daemons active

  data:
    pools:   8 pools, 2592 pgs
    objects: 219.0 k objects, 810 GiB
    usage:   1.3 TiB used, 9.4 TiB / 11 TiB avail
    pgs:     2592 active+clean

I created 2 OSDs per SSD and use them to store data, plus 1 OSD on each NVMe for the write cache.

I also created an erasure profile:

crush-device-class=
crush-failure-domain=host
crush-root=default
k=2
m=1
plugin=isa
technique=reed_sol_van

and organized the pool `vmstor' under this profile with 1024 pg and pgp.

Here is the CRUSH rule for the `vmstor' pool:

rule vmstor {
        id 1
        type erasure
        min_size 3
        max_size 3
        step set_chooseleaf_tries 50
        step set_choose_tries 100
        step take data
        step chooseleaf indep 0 type host-data
        step emit
}

host-data alfa-csn-01-ssd {
        id -5           # do not change unnecessarily
        id -6 class ssd # do not change unnecessarily
        alg straw2
        hash 0          # rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
        item osd.2 weight 1.000
        item osd.3 weight 1.000
        item osd.4 weight 1.000
        item osd.5 weight 1.000
        item osd.6 weight 1.000
        item osd.7 weight 1.000
}
host-data alfa-csn-02-ssd {
        id -7           # do not change unnecessarily
        id -8 class ssd # do not change unnecessarily
        alg straw2
        hash 0          # rjenkins1
        item osd.8 weight 1.000
        item osd.9 weight 1.000
        item osd.10 weight 1.000
        item osd.11 weight 1.000
        item osd.12 weight 1.000
        item osd.13 weight 1.000
        item osd.14 weight 1.000
        item osd.15 weight 1.000
}
host-data alfa-csn-03-ssd {
        id -9            # do not change unnecessarily
        id -10 class ssd # do not change unnecessarily
        alg straw2
        hash 0           # rjenkins1
        item osd.16 weight 1.000
        item osd.17 weight 1.000
        item osd.18 weight 1.000
        item osd.19 weight 1.000
        item osd.20 weight 1.000
        item osd.21 weight 1.000
        item osd.22 weight 1.000
        item osd.23 weight 1.000
}

I also created a pool named `wb-vmstor' with 256 pg and pgp as a hot tier for `vmstor':

rule wb-vmstor {
        id 4
        type replicated
        min_size 2
        max_size 3
        step take wb
        step set_chooseleaf_tries 50
        step set_choose_tries 100
        step chooseleaf firstn 0 type host-wb
        step emit
}

Then the pool `vmstor' was initialized as an RBD pool (the tiering and init commands are sketched below, after the test results), and a few images were created in it. These images were attached as disks to 2 qemu-kvm virtual machines, 4 images per VM, using the native RBD support in QEMU. The QEMU hosts are the same class of server (but separate machines), i.e. Xeon E5-2660 v4, 256 GB RAM and so on.

Then fio tests were performed on these disks. Results:

1) When using these virtual drives as raw block devices I get about 400 IOPS on random write with 4 KB or 8 KB blocks (or any other block size up to 1 MB).
2) After I created filesystems on these drives and mounted them, I get about 20k IOPS. And it doesn't matter whether I run the test on a single VM or on both VMs - I get 20k IOPS in total.
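Coming back to the pool setup mentioned above: the cache tier and RBD initialization were done with roughly the standard sequence. The commands below are a sketch of that sequence rather than a verbatim transcript of what I ran:

ceph osd tier add vmstor wb-vmstor
ceph osd tier cache-mode wb-vmstor writeback
ceph osd tier set-overlay vmstor wb-vmstor
ceph osd pool set wb-vmstor hit_set_type bloom
rbd pool init vmstor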
I mean: I run the fio test on one VM and get 20k IOPS; then I run the fio test on 2 VMs and get 10k IOPS on each VM.

My fio job is (the raw block device variant used for result (1) is sketched at the end of this message):

[global]
numjobs=1
ioengine=libaio
buffered=0
direct=1
bs=8k
rw=randrw
rwmixread=0
iodepth=8
group_reporting=1
time_based=1

[vdb]
size=10G
directory=/mnt
filename=vdb

[vdc]
size=10G
directory=/mnt1
filename=vdc

[vdd]
size=10G
directory=/mnt2
filename=vdd

[vde]
size=10G
directory=/mnt3
filename=vde

[vdf]
size=10G
directory=/mnt4
filename=vdf

To my mind this result is not that good, and I believe this hardware and Ceph can deliver much more.

Please help me find out what I'm doing wrong.
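P.S. For reference, the raw block device runs from result (1) used essentially the same parameters, pointed at the devices instead of files. Roughly the job below; the device name is illustrative for one of my VM disks, and the runtime value is only an example:

[global]
numjobs=1
ioengine=libaio
buffered=0
direct=1
bs=8k
rw=randrw
rwmixread=0
iodepth=8
group_reporting=1
time_based=1
runtime=60      # example value, added here for illustration

[vdb-raw]
filename=/dev/vdb
size=10G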