I haven't been able to make any headway on this despite some significant effort.

-Tested all 48 SSDs with fio directly; all tested within 10% of each other for 4k
IOPS in random/sequential read/write (example commands after this list).
-Disabled all CPU power saving.
-Tested with the rbd cache both enabled and disabled on the client.
-Tested with drive caches enabled and disabled (hdparm).
-Minimal TCP retransmissions under load (<10 over a 2 minute duration).
-No drops/pause frames noted on upstream switches.
-CPU load on the OSD nodes peaks at ~6.
-iostat shows latency peaking at 15ms under read/write workloads; %util peaks at
about 10%.
-Swapped out the RBD client for a bigger box, since load was peaking at 16. On
the new 24-core box, load still peaks at 16.
-Disabled cephx signatures.
-Verified hardware health (nothing in dmesg, CIMC fault logs, or storage
controller logs).
-Tested multiple SSDs at once to find the controller's IOPS limit, which is
apparently 650k @ 4k.
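
(For reference, the raw-device baseline and cache toggling were done with
commands along these lines; the device path and exact parameters below are
illustrative rather than a verbatim copy of what was run:)

# 4k random-read baseline against a single raw SSD
fio --name=baseline --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --rw=randread

# toggle the drive cache and repeat the tests with each setting
hdparm -W0 /dev/sdX   # cache off
hdparm -W1 /dev/sdX   # cache on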

Nothing has made a noticeable difference here. I'm pretty baffled as to what
could cause such poor sequential read and write performance while still
allowing good random read/write speeds.

I switched the fio testing methodology to use more parallel jobs, but this
didn't seem to help either:

[global]
bs=4k
ioengine=rbd
iodepth=32
size=5g
runtime=120
numjobs=4
group_reporting=1
pool=rbd_af1
rbdname=image1
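# note: with ioengine=rbd, each numjobs worker opens its own librbd image
# handle; group_reporting rolls the four jobs into a single result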

[seq-read]
rw=read
stonewall

[rand-read]
rw=randread
stonewall

[seq-write]
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall
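
(For completeness, the job file above is run with a plain fio invocation; the
filename here is just whatever you save it as:)

fio rbd-4k.fio

The rbd ioengine goes through librbd, so the test host needs a readable
/etc/ceph/ceph.conf and client keyring.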

Any pointers are appreciated at this point. I've been following other RBD
performance threads on the mailing list and looking through the archives, but
none of the solutions that worked for others seem to have helped this setup.

Thanks,
Anthony

________________________________
From: Anthony Brandelli (abrandel) <abran...@cisco.com>
Sent: Tuesday, January 14, 2020 12:43 AM
To: ceph-users@lists.ceph.com <ceph-users@lists.ceph.com>
Subject: Slow Performance - Sequential IO


I have a newly set up test cluster that is giving some surprising numbers when
running fio against an RBD. The end goal here is to see how viable a Ceph-based
iSCSI SAN of sorts is for VMware clusters, which generate a lot of random IO.



Hardware:

2x E5-2630L v2 (2.4GHz, 6 core)
256GB RAM
2x 10Gbps bonded network, Intel X520
LSI 9271-8i, SSDs used for OSDs in JBOD mode
Mons: 2x 1.2TB 10K SAS in RAID1
OSDs: 12x Samsung MZ6ER800HAGL-00003 800GB SAS SSDs, super cap/power loss
protection



Cluster setup:

Three mon nodes, four OSD nodes
Two OSDs per SSD (sketch below)
Replica 3 pool
Ceph 14.2.5
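
(Aside: the two-OSDs-per-SSD layout is the sort of thing ceph-volume's batch
mode creates; a sketch, not necessarily the exact command used on this
cluster:)

# split each SSD into two OSDs (example device list)
ceph-volume lvm batch --osds-per-device 2 /dev/sdb /dev/sdc /dev/sdd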



Ceph status:

  cluster:
    id:     e3d93b4a-520c-4d82-a135-97d0bda3e69d
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 6d)
    mgr: mon2(active, since 6d), standbys: mon3, mon1
    osd: 96 osds: 96 up (since 3d), 96 in (since 3d)

  data:
    pools:   1 pools, 3072 pgs
    objects: 857.00k objects, 1.8 TiB
    usage:   432 GiB used, 34 TiB / 35 TiB avail
    pgs:     3072 active+clean



Network between nodes tests at 9.88Gbps. Direct fio testing of the SSDs with a
4K block size shows 127k sequential read, 86k random read, 107k sequential
write, and 52k random write IOPS. No high CPU load or interface saturation is
noted when running tests against the RBD.



When testing with a 4K block size against an RBD from a dedicated bare-metal
test host (same specs as the other cluster nodes noted above), I get the
numbers below. The command was similar to:

fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=32 -rw=XXXX
-pool=scbench -runtime=60 -rbdname=datatest



10k sequential read IOPS
69k random read IOPS
13k sequential write IOPS
22k random write IOPS



I'm not clear on why the random ops, especially reads, would be so much faster
than the sequential ops.



Any pointers appreciated.



Thanks,

Anthony
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
