[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
On 14/1/2024 1:57 pm, Anthony D'Atri wrote:
> The OP is asking about new servers I think.

I was looking at his statement below about using hardware he had laying around, just putting out there some options which worked for us.

> So we were going to replace a Ceph cluster with some hardware we had laying around using SATA HBAs but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.

Cheers
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
The OP is asking about new servers I think.

> On Jan 13, 2024, at 9:36 PM, Mike O'Connor wrote:
>
> Because it's almost impossible to purchase the equipment required to convert old drive bays to U.2 etc.
>
> The M.2s we purchased are enterprise class.
>
> Mike
>
>> On 14/1/2024 12:53 pm, Anthony D'Atri wrote:
>> Why use such a card, and M.2 drives that I suspect aren't enterprise-class? Instead of U.2, E1.s, or E3.s?
>>
>>> On Jan 13, 2024, at 5:10 AM, Mike O'Connor wrote:
>>>
>>> On 13/1/2024 1:02 am, Drew Weaver wrote:
>>>> Hello,
>>>>
>>>> So we were going to replace a Ceph cluster with some hardware we had laying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.
>>>>
>>>> Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the motherboard without a bridge or HBA, for Ceph specifically?
>>>>
>>>> Thanks,
>>>> -Drew
>>>
>>> Hi
>>>
>>> You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap enough, around USD $180 from AliExpress.
>>>
>>> There are companies with cards which have many more M.2 ports, but the cost goes up greatly.
>>>
>>> We just built a 3 x 1RU HP G9 cluster with 4 x 2T M.2 NVMe, using dual 40G Ethernet ports and dual 10G Ethernet and a second-hand Arista 16-port 40G switch.
>>>
>>> It works really well.
>>>
>>> Cheers
>>>
>>> Mike
[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
Because it's almost impossible to purchase the equipment required to convert old drive bays to U.2 etc.

The M.2s we purchased are enterprise class.

Mike

On 14/1/2024 12:53 pm, Anthony D'Atri wrote:
> Why use such a card, and M.2 drives that I suspect aren't enterprise-class? Instead of U.2, E1.s, or E3.s?
>
>> On Jan 13, 2024, at 5:10 AM, Mike O'Connor wrote:
>>
>> On 13/1/2024 1:02 am, Drew Weaver wrote:
>>> Hello,
>>>
>>> So we were going to replace a Ceph cluster with some hardware we had laying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.
>>>
>>> Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the motherboard without a bridge or HBA, for Ceph specifically?
>>>
>>> Thanks,
>>> -Drew
>>
>> Hi
>>
>> You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap enough, around USD $180 from AliExpress.
>>
>> There are companies with cards which have many more M.2 ports, but the cost goes up greatly.
>>
>> We just built a 3 x 1RU HP G9 cluster with 4 x 2T M.2 NVMe, using dual 40G Ethernet ports and dual 10G Ethernet and a second-hand Arista 16-port 40G switch.
>>
>> It works really well.
>>
>> Cheers
>>
>> Mike
[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
Why use such a card, and M.2 drives that I suspect aren't enterprise-class? Instead of U.2, E1.s, or E3.s?

> On Jan 13, 2024, at 5:10 AM, Mike O'Connor wrote:
>
> On 13/1/2024 1:02 am, Drew Weaver wrote:
>> Hello,
>>
>> So we were going to replace a Ceph cluster with some hardware we had laying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.
>>
>> Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the motherboard without a bridge or HBA, for Ceph specifically?
>>
>> Thanks,
>> -Drew
>
> Hi
>
> You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap enough, around USD $180 from AliExpress.
>
> There are companies with cards which have many more M.2 ports, but the cost goes up greatly.
>
> We just built a 3 x 1RU HP G9 cluster with 4 x 2T M.2 NVMe, using dual 40G Ethernet ports and dual 10G Ethernet and a second-hand Arista 16-port 40G switch.
>
> It works really well.
>
> Cheers
>
> Mike
[ceph-users] Re: Recommended number of k and m for erasure code
There are nuances, but in general the higher the sum of k+m, the lower the performance, because *every* operation has to hit that many drives, which is especially impactful with HDDs. So there's a tradeoff between storage efficiency and performance. And as you've seen, larger parity groups especially mean slower recovery/backfill.

There's also a modest benefit to choosing values of m and k that have small prime factors, but I wouldn't worry too much about that.

You can find EC efficiency tables on the net:
https://docs.netapp.com/us-en/storagegrid-116/ilm/what-erasure-coding-schemes-are.html

I should really add a table to the docs, making a note to do that. There's a nice calculator at the OSNEXUS site:
https://www.osnexus.com/ceph-designer

The overhead factor is (k+m) / k:

So for a 4,2 profile, that's 6 / 4 = 1.5
For 6,2, 8 / 6 = 1.33
For 10,2, 12 / 10 = 1.2

and so forth. As k increases, the incremental efficiency gain sees diminishing returns, but performance continues to decrease.

Think of m as the number of copies you can lose without losing data, and m-1 as the number you can lose / have down and still have data *available*. I also suggest that the number of failure domains — in your case this means OSD nodes — be *at least* k+m+1, so in your case you want k+m to be at most 9.

With RBD and many CephFS implementations, we mostly have relatively large RADOS objects that are striped over many OSDs. When using RGW especially, one should attend to average and median S3 object size. There's an analysis of the potential for space amplification in the docs, so I won't repeat it here in detail. This sheet visually demonstrates it:
https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit#gid=358760253

Basically, for an RGW bucket pool — or for a CephFS data pool storing unusually small objects — if you have a lot of S3 objects in the multiples-of-KB size range, you waste a significant fraction of underlying storage.
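The overhead factor and the failure-domain rule of thumb above can be checked with a few lines of arithmetic. This is an illustrative sketch (the 10-node count comes from the original question; the profile list is just examples):

```python
# Raw-space overhead for a Ceph k+m erasure-code profile is (k + m) / k,
# and a rule of thumb is to have at least k+m+1 failure domains (OSD hosts).

NODES = 10  # failure domains in the cluster being discussed

def overhead(k: int, m: int) -> float:
    """Raw bytes consumed per logical byte for a k+m EC profile."""
    return (k + m) / k

for k, m in [(4, 2), (6, 2), (8, 2), (10, 2)]:
    fits = (k + m + 1) <= NODES  # does this profile leave a spare failure domain?
    print(f"EC {k}+{m}: overhead {overhead(k, m):.2f}x, "
          f"efficiency {k / (k + m):.0%}, fits {NODES} nodes: {fits}")
```

With 10 nodes, 4+2 and 6+2 satisfy the k+m+1 guideline, while 8+2 and 10+2 do not, which matches the "k+m at most 9" advice.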
This is exacerbated by EC, and the larger the sum of k+m, the more waste.

When people ask me about replication vs EC and EC profiles, the first question I ask is what they're storing. When EC isn't a non-starter, I tend to recommend 4,2 as a profile until / unless someone has specific needs and can understand the tradeoffs. This lets you store roughly 2x the data of 3x replication while not going overboard on the performance hit.

If you care about your data, do not set m=1. If you need to survive the loss of many drives, say if your cluster is across multiple buildings or sites, choose a larger value of m. There are people running profiles like 4,6 because they have unusual and specific needs.

> On Jan 13, 2024, at 10:32 AM, Phong Tran Thanh wrote:
>
> Hi Ceph users!
>
> I need to determine which erasure code values (k and m) to choose for a Ceph cluster with 10 nodes.
>
> I am using the Reef release with RBD. Furthermore, when using a larger k, for example EC 6+2 vs EC 4+2, which erasure coding profile performs better, and what are the criteria for choosing the appropriate erasure coding? Please help me.
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
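The small-object space amplification discussed above can be sketched numerically. This is a rough, illustrative model, not Ceph's actual allocator: it assumes each of the k+m chunks of an object is rounded up to BlueStore's default 4 KiB min_alloc_size, and ignores metadata and striping details.

```python
import math

MIN_ALLOC = 4096  # assumed BlueStore min_alloc_size; varies by release/media

def raw_bytes(obj_size: int, k: int, m: int) -> int:
    """Approximate raw bytes consumed by one object under a k+m EC profile,
    with every chunk rounded up to the allocation unit."""
    chunk = math.ceil(obj_size / k)                     # per-chunk payload
    alloc = math.ceil(chunk / MIN_ALLOC) * MIN_ALLOC    # rounded-up on-disk size
    return alloc * (k + m)                              # k data + m coding chunks

for size in (1024, 16 * 1024, 4 * 1024 * 1024):
    for k, m in ((4, 2), (10, 2)):
        r = raw_bytes(size, k, m)
        print(f"{size:>8} B object, EC {k}+{m}: {r} raw bytes "
              f"({r / size:.1f}x actual vs {(k + m) / k:.2f}x nominal)")
```

Under this model a 1 KiB object consumes 24 KiB raw under 4+2 (24x, versus the nominal 1.5x) and twice that under 10+2, while a 4 MiB object lands close to the nominal overhead — which is why larger k+m sums hurt small-object workloads most.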
[ceph-users] Recommended number of k and m for erasure code
Hi Ceph users!

I need to determine which erasure code values (k and m) to choose for a Ceph cluster with 10 nodes.

I am using the Reef release with RBD. Furthermore, when using a larger k, for example EC 6+2 vs EC 4+2, which erasure coding profile performs better, and what are the criteria for choosing the appropriate erasure coding? Please help me.

Email: tranphong...@gmail.com
Skype: tranphong079
[ceph-users] Re: Ceph Nautilus 14.2.22 slow OSD memory leak?
Hi,

> On Jan 12, 2024, at 12:01, Frédéric Nass wrote:
>
> Hard to tell for sure since this bug hit different major versions of the kernel, at least RHEL's from what I know.

In which RH kernel release was this issue fixed?

Thanks,
k
[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?
On 13/1/2024 1:02 am, Drew Weaver wrote:
> Hello,
>
> So we were going to replace a Ceph cluster with some hardware we had laying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct attach NVMe.
>
> Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct attached to the motherboard without a bridge or HBA, for Ceph specifically?
>
> Thanks,
> -Drew

Hi

You need to use a PCIe card with a PCIe switch; cards with 4 x M.2 NVMe are cheap enough, around USD $180 from AliExpress.

There are companies with cards which have many more M.2 ports, but the cost goes up greatly.

We just built a 3 x 1RU HP G9 cluster with 4 x 2T M.2 NVMe, using dual 40G Ethernet ports and dual 10G Ethernet and a second-hand Arista 16-port 40G switch.

It works really well.

Cheers

Mike