On Tuesday, December 16, 2025 at 03:28:01 p.m. EST, Peter Grandi via ceph-users <[email protected]> wrote:

> * Ceph is designed around the idea of many servers and many
>  small drives, with only a few drives per server, as it needs
>  lots of IOPS-per-TB and low impact per system failure.
> * Minimizing up-front costs means fewer larger drives and higher
>  density servers.
> * The common result is extreme congestion during balancing and
>  recovery times and high latencies during parallel accesses.
>  This mailing list is full of "cost-optimized" horror stories.

Good to know, and thanks for the further explanations why.

> Note: larger HDDs have really low IOPS-per-TB; SSDs avoid that
> issue but cheap SSDs do not have PLP so write IOPS are much
> lower than read IOPS.

That is something I've seen mentioned a lot, so we've only got PLP drives on 
the shopping list: tentatively 24x 7.68TB Samsung PM893 or Kingston DC600M 
drives.
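
For my own sanity, a back-of-envelope IOPS-per-TB comparison (the per-drive 
figures below are assumed ballpark numbers, not measurements from our 
hardware):

    # Back-of-envelope IOPS-per-TB; per-drive figures are assumed
    # ballpark values, not measurements.
    drives = {
        "20TB 7.2k HDD": (20.0, 200),                    # assuming ~200 random write IOPS
        "7.68TB SATA SSD (PM893-class)": (7.68, 30000),  # assuming ~30k steady-state write IOPS
    }
    for name, (tb, iops) in drives.items():
        print(f"{name}: {iops / tb:,.0f} write IOPS per TB")

That comes out to roughly 10 write IOPS per TB for the big HDD versus a few 
thousand for the SSD, which makes the point vivid.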

> Whether the drive is SSD or HDD, larger
> ones also usually mean large PGs, which is not so good. With SSDs
> at least it is possible (and in some cases advisable) to split
> them into multiple OSDs though.

Could we just increase the number of PGs to avoid this?
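
Partly answering my own question: more PGs does shrink each PG, but the 
per-OSD PG budget caps how far that goes with few OSDs. A minimal sketch of 
the usual rule-of-thumb sizing from the Ceph docs (~100 PGs per OSD); the 
pool assumptions are my guesses:

    import math

    # Rule-of-thumb pg_num sizing (Ceph docs' ~100 PGs per OSD target).
    # Layout numbers are just our tentative plan, for illustration.
    osds = 24                  # 24 drives, one OSD per drive
    replica_size = 3           # assuming a replicated pool with size=3
    target_pgs_per_osd = 100

    raw = osds * target_pgs_per_osd / replica_size
    pg_num = 2 ** round(math.log2(raw))  # round to the nearest power of two
    print(pg_num)                        # -> 1024 with these numbers

If I split each 7.68TB drive into two OSDs (ceph-volume lvm batch 
--osds-per-device 2, if I've understood the tooling right), the OSD count 
doubles and so does the PG headroom, which I take to be your point about 
large drives.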

> That is indeed a good suggestion: the fewer the drives per
> server the better. Ideally just one drive per server :-).

This might just be possible, since we've got a couple of racks of render nodes 
that I can probably make the case to retire from render duties. Would I 
actually see a major advantage going from 6 nodes to 8, from 8 to 12, or from 
12 to 24? (Given 24 disks in each case.)
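Here's the toy arithmetic I've been using to think about it; it assumes a 
host failure domain and perfectly even data placement:

    # Toy model of one node failure with a host failure domain and a
    # fixed 24 drives, assuming data is spread evenly across nodes.
    total_drives = 24
    for nodes in (6, 8, 12, 24):
        drives_per_node = total_drives // nodes
        lost = 1 / nodes       # share of cluster data on the failed node
        print(f"{nodes:2d} nodes x {drives_per_node} drives: "
              f"{lost:.1%} of data re-replicated by {nodes - 1} survivors")

By that reading, going from 6 to 24 nodes cuts the data put in flight per 
node failure by 4x and spreads the rebuild over nearly 5x as many survivors, 
though the marginal gain per step shrinks.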

Andrew

  