> We're seeing ~5800 IOPS (~23 MiB/s) with 4 KiB I/O (stripe_width 8192) on a pool that can do 3 GiB/s at a 4M block size. That seems rather harsh, even for EC.
4 KiB I/O is slow in Ceph even without EC. Your 3 GiB/s of linear writes doesn't tell you anything about small-block performance: Ceph adds significant overhead to every operation.
From my observations, 4 KiB random write throughput at iodepth=128 in an all-flash cluster is only ~30% lower with EC 2+1 than with 3 replicas.
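(For reference, the kind of test I mean is a 4 KiB random-write fio run against an RBD image, something like the following; the pool and image names are just placeholders:

    fio --name=bench --ioengine=rbd --pool=rbd-bench --rbdname=bench \
        --rw=randwrite --bs=4k --iodepth=128 --direct=1 \
        --runtime=60 --time_based

Rerun with --iodepth=1 for the latency-bound case below.)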
With iodepth=1 in an HDD+SSD setup it's worse: I get 100-120 write IOPS with EC versus 500+ with 3 replicas. I guess this is because in a replicated pool Ceph can just put the new block into the deferred write queue, while with EC it must first read the corresponding chunk from another OSD to recompute parity, and the SSD journal doesn't help reads. But I don't remember the exact results for iodepth=128...
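To illustrate why the gap is so big, here's a toy model in Python (not actual Ceph code, just one plausible accounting of device ops per 4 KiB random write under the assumptions above):

    # Replicated pool: each replica can defer the small write into its
    # SSD journal, no read required.
    def replicated_write_ops(replicas=3):
        return {"reads": 0, "writes": replicas}

    # EC pool, partial-stripe update: read the old data chunk plus the
    # old parity chunks to recompute parity, then write both back.
    def ec_partial_write_ops(k=2, m=1):
        return {"reads": 1 + m, "writes": 1 + m}

    print(replicated_write_ops())    # {'reads': 0, 'writes': 3}
    print(ec_partial_write_ops())    # {'reads': 2, 'writes': 2}

The replicated writes all land on the SSD journals, while the EC reads have to hit the HDDs, and at iodepth=1 that read latency is the whole story.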
--
With best regards,
  Vitaliy Filippov