> Hi all,
> 
> My Ceph setup:
> - 12 OSD nodes, 4 OSD nodes per rack. Replication of 3, 1 replica per rack.
> - 20 spinning SAS disks per node.

Don't use legacy HDDs if you care about performance.

> - Some nodes have 256GB RAM, some nodes 128GB.

128GB is on the low side for 20 OSDs.
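For rough numbers: the default osd_memory_target is 4 GiB, so 20 OSDs will try 
to use ~80 GiB before you count the OS, page cache and any colocated daemons.  
On the 128GB nodes you could lower the target for just those hosts.  A sketch, 
assuming a recent config db and an example host name:

  # cap each OSD at ~3 GiB on the memory-constrained hosts
  ceph config set osd/host:node01 osd_memory_target 3221225472
  # confirm what a daemon actually picked up
  ceph config show osd.0 osd_memory_target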

> - CPU varies between Intel E5-2650 and Intel Gold 5317.

The E5-2650 is underpowered for 20 OSDs.  The Gold 5317 isn't an ideal fit 
either; it would make a decent MDS box, but assuming a dual-socket system that 
is 2 x 12 cores x 2 HT = 48 threads, roughly 2 threads per OSD.  That's maybe 
acceptable for HDDs, but I assume some of those nodes also carry mon/mgr/rgw 
daemons.


> - Each node has 10Gbit/s network.
> 
> Using rados bench I am getting decent results (depending on block size):
> - 1000 MB/s throughput, 1000 IOps with 1MB block size
> - 30 MB/s throughput, 7500 IOps with 4K block size

rados bench is useful for smoke testing, but it isn't always a reflection of 
the end-to-end experience: an RGW PUT also hits the bucket index and RGW 
metadata, not just the data pool.
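If you want to localize where the obj/s falls off, one cheap check is a 
small-block rados bench directly against the RGW data pool (pool name assumes 
the default zone):

  # 60s of 4 KiB writes, 16 concurrent ops
  rados bench -p default.rgw.buckets.data 60 write -b 4096 -t 16 --no-cleanup
  rados -p default.rgw.buckets.data cleanup

If that is already slow, it's the HDD data path; if it's fast, look harder at 
the index pool and the RGW daemons themselves.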

> 
> Unfortunately not getting the same performance with Rados Gateway (S3).
> 
> - 1x HAProxy with 3 backend RGW's.

Run an RGW on every node.
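With cephadm on Octopus that is a single apply call (realm/zone names here are 
examples; adjust if you deploy with ceph-ansible or manually), then list all of 
them as HAProxy backends:

  ceph orch apply rgw myrealm myzone --placement="*"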


> I am using Minio Warp for benchmarking (PUT). I am using 1 Warp server and 5 
> Warp clients. Benchmarking towards the HAProxy.
> 
> Results:
> - Using 10MB object size, I am hitting the 10Gbit/s link of the HAProxy 
> server. That's good.
> - Using 500K object size, I am getting a throughput of 70 to 150 MB/s with 
> 140 to 300 obj/s.

Tiny objects are the devil of any object storage deployment.  The HDDs are 
killing you here, especially for the index pool, where every PUT is an omap 
write.  You might get a bit better by upping pg_num beyond the party line so 
the index and data pools spread across more OSDs.
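Something along these lines (pool names assume the default zone, the numbers 
are illustrative; keep an eye on the per-OSD PG count in ceph osd df):

  ceph osd pool ls detail
  # pgp_num follows pg_num automatically on Nautilus and later
  ceph osd pool set default.rgw.buckets.index pg_num 128
  ceph osd pool set default.rgw.buckets.data pg_num 1024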

You might also disable Nagle on the RGW nodes.
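With the beast frontend that is a frontend option (syntax from memory, 
double-check the docs for 15.2):

  ceph config set client.rgw rgw_frontends "beast port=7480 tcp_nodelay=1"

ms_tcp_nodelay should already default to true on the RADOS side.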


> It depends on the concurrency setting of Warp.
> 
> It looks like the objects/s is the bottleneck, not the throughput.
> 
> Max memory usage is about 80-90GB per node. CPUs are mostly idle.
> 
> Is it reasonable to expect more IOps / objects/s for RGW with my setup? At 
> this moment I am not able to find the bottleneck that is causing the low 
> obj/s.

HDDs are a false economy.
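If you can add even a few SSD/NVMe OSDs, the single biggest win is usually 
pinning the index pool to them.  A sketch, assuming device classes are set and 
keeping your rack failure domain:

  ceph osd crush rule create-replicated rgw-index-ssd default rack ssd
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd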

> 
> Ceph version is 15.2.
> 
> Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io