[ceph-users] Re: NVMe and 2x Replica

Adam Boyhan Thu, 04 Feb 2021 09:58:11 -0800

All great input and points guys. 

Helps me lean towards 3 copies a bit more.

I mean honestly NVMe cost per TB isn't that much more than SATA SSD now. 
Somewhat surprised the salesmen aren't pitching 3x replication as it makes them 
more money. 

From: "Anthony D'Atri" <anthony.da...@gmail.com> 
To: "ceph-users" <ceph-users@ceph.io> 
Sent: Thursday, February 4, 2021 12:47:27 PM 
Subject: [ceph-users] Re: NVMe and 2x Replica 

> I searched each to find the section where 2x was discussed. What I found was 
> interesting. First, there are really only 2 positions here: Micron's and Red 
> Hat's. Supermicro copies Micron's position paragraph word for word. Not
> surprising considering that they are advertising a Supermicro / Micron 
> solution. 

FWIW, at Cephalocon another vendor made a similar claim during a talk. 

* Failure rates are averages, not minima. Some drives will always fail sooner 
* Firmware and other design flaws can result in much higher rates of failure or 
insidious UREs that can result in partial data unavailability or loss 
* Latent soft failures may not be detected until a deep scrub succeeds, which 
could be weeks later 
* In a distributed system, there are up/down/failure scenarios where the 
location of even one good / canonical / latest copy of data is unclear, 
especially when drive or HBA cache is in play. 
* One of these is a power failure. Sure PDU / PSU redundancy helps, but stuff 
happens, like a DC underprovisioning amps, so that a spike in user traffic 
results in the whole row going down :-x Various unpleasant things can happen. 

I was championing R3 even pre-Ceph when I was using ZFS or HBA RAID. As others 
have written, as drives get larger the time to fill them with replica data 
increases, as does the chance of overlapping failures. I’ve experienced R2
overlapping failures more than once, with and before Ceph. 
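
A rough back-of-the-envelope sketch of that point, in Python. The function
names, the sustained recovery rate, and the constant, independent annual
failure rate (AFR) are all assumptions for illustration. The bullets above
(averages not minima, firmware flaws, power events) suggest real failures are
more correlated than this, so treat the result as optimistic.

```python
import math

def refill_hours(capacity_tb: float, recovery_mb_s: float) -> float:
    """Hours to write capacity_tb of replica data at a sustained recovery rate.

    Uses decimal units (1 TB = 1e6 MB). The recovery rate is an assumed
    input, not a measured Ceph figure.
    """
    return capacity_tb * 1e6 / recovery_mb_s / 3600

def p_overlap(afr: float, window_hours: float, peers: int) -> float:
    """Chance that at least one of `peers` drives fails within the window.

    Assumes independent failures at a constant rate derived from the AFR,
    which is a simplification.
    """
    rate_per_hour = -math.log(1 - afr) / 8760  # 8760 hours per year
    return 1 - math.exp(-rate_per_hour * window_hours * peers)

# Hypothetical comparison: same recovery rate, small vs. large drive.
for tb in (4, 16):
    hours = refill_hours(tb, recovery_mb_s=200)
    print(f"{tb} TB: {hours:.1f} h to refill, "
          f"overlap chance {p_overlap(0.02, hours, peers=10):.5f}")
```

Under these assumptions the refill window grows linearly with drive size, and
the chance of a second failure landing inside it grows with it.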

My sense has been that not many people run R2 for data they care about, and as 
has been written recently 2,2 EC is safer with the same raw:usable ratio. I’ve 
figured that vendors make R2 statements like these as a selling point to assert 
lower TCO. My first response is often “How much would it cost you directly, and 
indirectly in terms of user / customer goodwill, to lose data?”. 
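
To make the raw:usable comparison concrete, here is a small Python sketch. It
reads "2,2 EC" as an erasure-code profile with k=2 data chunks and m=2 coding
chunks, which is my assumption; the helper names are made up for illustration.
It only counts how many copies or chunks can be lost before data is gone, and
says nothing about availability settings or performance.

```python
def replicated(size: int) -> dict:
    """Raw capacity per usable unit and tolerated losses for N-way replication."""
    return {"raw_per_usable": float(size), "losses_tolerated": size - 1}

def erasure_coded(k: int, m: int) -> dict:
    """Same figures for an erasure-coded layout with k data and m coding chunks."""
    return {"raw_per_usable": (k + m) / k, "losses_tolerated": m}

layouts = {
    "R2": replicated(2),
    "R3": replicated(3),
    "EC k=2,m=2": erasure_coded(2, 2),  # assumed meaning of "2,2 EC"
}
for name, figures in layouts.items():
    print(name, figures)
```

R2 and EC k=2,m=2 both consume 2 units of raw space per usable unit, but the
EC layout survives the loss of two chunks where R2 survives only one copy.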

> Personally, this looks like marketing BS to me. SSD shops want to sell SSDs, 
> but because of the cost difference they have to convince buyers that their 
> products are competitive. 

^this. I’m watching the QLC arena with interest for the potential to narrow the 
CapEx gap. Durability has been one concern, though I’m seeing newer products 
claiming that e.g. ZNS improves that. It also seems that there are something 
like, what, *4* separate EDSFF / ruler form factors. I really want to embrace 
those, e.g. for object clusters, but I’m VERY wary of the longevity of competing 
standards and any single source for chassis or drives. 

> Our products cost twice as much, but LOOK you only need 2/3 as many, and you 
> get all these other benefits (performance). Plus, if you replace everything 
> in 2 or 3 years anyway, then you won't have to worry about them failing. 

Refresh timelines. You’re funny ;) Every time, every single time, that I’ve 
worked in an organization that claims a 3 (or 5, or whatever) year hardware 
refresh cycle, it hasn’t happened. When you start getting close, the capex 
doesn’t materialize, nor does the opex for DC hands and operational oversight. “How do 
you know that the drives will start failing or getting slower? Let’s revisit 
this in 6 months”. Etc. 

_______________________________________________ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 