On Sun, Jun 15, 2025 at 05:02:07PM +0000, Aaron Friel wrote: > > > > > I'm looking to see if I can submit a patch to fix this, but it seems > > > > > like the durability bit field for devices may be only 2 bits, is that > > > > > right? > > > > > > > > That gets you values of 0-3. Why is that not enough? > > > > > > In bch2_mi_to_cpu, it looks like durability is encoded with a "bias" > > > (default value) that maps {0,1,2,3} => {1,0,1,2}. > > > > > > .durability = BCH_MEMBER_DURABILITY(mi) > > > ? BCH_MEMBER_DURABILITY(mi) - 1 > > > : 1, > > > > > > This is pretty unfortunate, because it looks like if I want to use RAID6 > > > (replicas=3), I can't represent a device as having inherent durability of > > > RAID6 (durability=3). > > > > > > It doesn't look like too much work to add a feature flag > > > `BCH_FEATURE_durability_bias_v2` which when set, modifies the bias to > > > unconditionally add one to the 2-bit field, mapping {0,1,2,3} to > > > {1,2,3,4}. That would support even very large erasure encoded arrays as > > > well, where you might use something like RS (56,4) for a common 60 drive > > > JBOD. Practically speaking though I don't think anyone uses stripes that > > > wide in a single array. At least not for spinning rust, but it's been a > > > long time since I've worked with enterprise storage and I understand the > > > rules have changed with flash now. > > > > > > I can submit patches for implementing the feature if you want me to > > > submit them as a PR. Not sure about your stance on LLM-authored code > > > though. > > > > Actually there's an easier way, which I've done a few different times > before. We can extend BCH_MEMBER_DURABILITY to 4 bits (should be > sufficient, no?), with the high bits going whenever we've got room in > bch_member. > > > > Rename BCH_MEMBER_DURABILITY -> BCH_MEMBER_DURABILTIY_LO > > > > BCH_MEMBER_DURABILITY_HI for the new two bits > > > > Then write new get/set functions for BCH_MEMBER_DURABILITY that > reads/stores from the lo and hi fields. > > > > But we'd still want a new on disk format version for this, and then use > bch2_request_incompat_feature() whenever attempting to set a durability > htat doesn't fit in the old 2 bit field. > > Do we want the new field to be additive after saturating > BCH_MEMBER_DURABILITY_LO at 2, rather than treat it as a 4 bit field which > could result in an older kernel seeing 0b01 and interpreting it as 0? So: > > LO HI VALUE > # existing range: > 00 00 1 > 01 00 0 > 10 00 1 > 11 00 2 > # expanded range: > 11 01 3 > 11 10 4 > 11 11 5 > > Then an older kernel will read any device with durability >2 as having > durability=2, which is not ideal but I worry that durability=0 might result > in undefined (or unspecified?) behavior.
No, just make it an incompat feature, it's way simpler - older kernels that don't understand durability > 2 will never see them