I'm trying to debug low performance on an NVMe-based Ceph cluster.
I have 24 NVMe drives across 4 servers, plenty of CPU, and the cluster is
perfectly balanced; no scrubbing or replication traffic at the moment,
1024 PGs for 24 OSDs / 17 TB.
I expect to see decent performance (~250k IOPS total, 50% read/write, a few
hundred volumes capped by IOPS, pre-warmed). I see about half of that.
I looked at drive utilization: it's about 70% (per atop), but I noticed
that the in-flight count for the drives is basically around 1. That means
that at any given time only one request is being processed. This matches
the OSD count / 3 / latency formula, and with one request in flight each
NVMe delivers about 10% of its spec (Intel, DC grade).
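To make the back-of-the-envelope math explicit, here is a minimal sketch of the latency-bound model I mean (the function name and the example numbers are illustrative, not measured values):

```python
# Rough latency-bound IOPS model: with an effective queue depth of ~1 per
# OSD, each 3x-replicated write occupies `replicas` OSDs for one latency
# period, so total client IOPS is capped at num_osds / (replicas * latency).
def latency_bound_iops(num_osds: int, replicas: int, latency_s: float) -> float:
    return num_osds / (replicas * latency_s)

# e.g. 24 OSDs, 3x replication, 200 us per-op latency (assumed, not measured):
print(latency_bound_iops(24, 3, 0.0002))
```

With numbers in that ballpark the ceiling lands well below the drives' rated throughput, which is consistent with what I observe.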
I looked at the OSDs (they are loaded uniformly, so any of them shows the
same results).
I see "op_wip": 10-27, but in-flight value is about 0-2, mostly around 1.
I can't get away from the feeling that the OSD is somehow processing
operations (almost) sequentially. I have already played with
osd_op_num_threads_per_shard (4), osd_op_num_shards (8), set the mclock
profile to high_client_ops, and ms_async_op_threads 24, but I can't get
more in-flight IOs.
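Inverting the same toy model gives a feel for the per-drive queue depth that the target would require (again illustrative numbers and a hypothetical helper, not a Ceph API):

```python
# Per-OSD queue depth needed to sustain target_iops under the toy model:
# target_iops * replicas ops/sec spread across num_osds OSDs, with each
# op occupying the drive for latency_s seconds.
def required_queue_depth(target_iops: float, replicas: int,
                         num_osds: int, latency_s: float) -> float:
    return target_iops * replicas * latency_s / num_osds

# e.g. 250k IOPS at 3x replication, 24 OSDs, 200 us latency (assumed):
print(required_queue_depth(250_000, 3, 24, 0.0002))
```

So to hit the target each drive would need several requests in flight concurrently, not the ~1 I'm seeing.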
I feel I'm missing something. How can I make Ceph send more requests to the
underlying NVMe drives?
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]