On 28.06.2018 12:15, Qu Wenruo wrote:
>
>
> On 2018-06-28 16:16, Andrei Borzenkov wrote:
>> On Thu, Jun 28, 2018 at 8:39 AM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>
>>>
>>> On 2018-06-28 11:14, r...@georgianit.com wrote:
>>>>
>>>>
>>>> On Wed, Jun 27, 2018, at 10:55 PM, Qu Wenruo wrote:
>>>>
>>>>>
>>>>> Please get yourself clear of what other raid1 is doing.
>>>>
>>>> A drive failure, where the drive is still there when the computer
>>>> reboots, is a situation that *any* raid 1 (or for that matter, raid 5,
>>>> raid 6, anything but raid 0) will recover from perfectly without
>>>> raising a sweat. Some will rebuild the array automatically,
>>>
>>> WOW, that's black magic, at least for RAID1.
>>> Plain RAID1 has no idea which copy is correct, unlike btrfs, which
>>> has datasum.
>>>
>>> Never mind everything else, just tell me how to determine which one
>>> is correct.
>>>
>>
>> When one drive fails, it is recorded in the metadata on the remaining
>> drives; probably a configuration generation number is increased. Next
>> time, the drive with the older generation is not incorporated.
>> Hardware controllers also keep this information in NVRAM and so do not
>> even depend on scanning the other disks.
>
> Yep, the only possible way to detect such a case is from external info.
>
> For a device generation number, it's possible to enhance btrfs, but at
> least we could start by detecting the situation and refusing to mount
> read-write, to avoid possible further corruption.
> But anyway, if one really cares about such a case, a hardware RAID
> controller seems to be the only solution, as other software may have
> the same problem.
>
> And the hardware solution looks pretty interesting: is the write to
> NVRAM 100% atomic, even on power loss?
>
>>
>>> The only possibility is that the misbehaving device missed several
>>> superblock updates, so we have a chance to detect that it's
>>> out of date.
>>> But that's not always going to work.
>>>
>>
>> Why should it not work, as long as any write to the array is suspended
>> until the superblock on the remaining devices is updated?
>
> What happens if there is no generation gap in the device superblocks?
>
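For illustration, the event-counter scheme described earlier in the thread (the mdraid-style approach of bumping a monotonic counter on the surviving drives before accepting further writes) can be sketched roughly as follows. This is a hypothetical model, not btrfs or md code; the names `Superblock` and `assemble_mirror` are invented:

```python
# Sketch of an mdraid-style event counter for detecting a stale mirror.
# All names here (Superblock, assemble_mirror) are illustrative only.

from dataclasses import dataclass

@dataclass
class Superblock:
    device: str
    events: int  # monotonic counter, bumped on every array state change

def assemble_mirror(superblocks):
    """Admit only devices whose event counter matches the newest one.

    When a drive fails, the surviving drives' counters are bumped and
    written out *before* any further data is accepted, so a returning
    stale drive is recognizable by its older counter.
    """
    newest = max(sb.events for sb in superblocks)
    fresh = [sb for sb in superblocks if sb.events == newest]
    stale = [sb for sb in superblocks if sb.events < newest]
    return fresh, stale

fresh, stale = assemble_mirror([
    Superblock("sda", events=42),
    Superblock("sdb", events=40),  # missed two state changes while failed
])
# sdb is kicked out as stale rather than commingled with sda
```

This only works, as noted above, if the counter bump on the surviving devices is ordered strictly before any new data write; Qu's nodatacow scenario is exactly the case where no such ordering point exists, so both superblocks still carry equal counters.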
Well, you use "generation" in the strict btrfs sense; I use "generation"
generically. That is exactly what btrfs apparently lacks currently: some
monotonic counter that is used to record such an event.

> If one device got some of its (nodatacow) data written to disk, while
> the other device didn't get the data written, and neither of them
> reached the superblock update, there is no difference in the device
> superblocks, and thus no way to detect which one is correct.
>

Again, the very fact that a device failed should have triggered an
update of the superblock to record this information, which presumably
should increase some counter.

>>
>>> If you're talking about the missing generation check in btrfs, that's
>>> valid, but it's far from a "major design flaw", as there are a lot of
>>> cases where other RAID1 implementations (mdraid or mirrored LVM) can
>>> also be affected (the split-brain case).
>>>
>>
>> That's different. Yes, with software-based RAID there is usually no
>> way to detect an outdated copy if no other copies are present. Having
>> older valid data is still very different from corrupting newer data.
>
> For the VDI case (or any VM image format other than raw), older valid
> data normally means corruption,
> unless they have their own write-ahead log.
> Some file formats may detect such problems by themselves if they have
> internal checksums, but anyway, older data normally means corruption,
> especially when it is partially new and partially old.

Yes, that's true. But there is really nothing that can be done here,
even theoretically; it is hardly a reason not to do what looks possible.

> On the other hand, with data COW and csum, btrfs can ensure the whole
> filesystem update is atomic (at least for a single device).
> So the title, especially the "major design flaw" part, couldn't be more
> wrong.
>
>>
>>>> others will automatically kick out the misbehaving drive.
>>>> *None* of
>>>> them will take back the drive with old data and start commingling
>>>> that data with the good copy. This behaviour from BTRFS is
>>>> completely abnormal and defeats even the most basic expectations of
>>>> RAID.
>>>
>>> RAID1 can only tolerate one missing device; it has nothing to do with
>>> error detection.
>>> And it's impossible to detect such a case without extra help.
>>>
>>> Your expectation is completely wrong.
>>>
>>
>> Well ... somehow it is my experience as well ... :)
>
> Acceptable, but it doesn't really apply to software-based RAID1.
>
> Thanks,
> Qu
>
>>
>>>>
>>>> I'm not the one who has to clear his expectations here.
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>
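As a footnote to the datasum argument above: with a stored data checksum, a mirrored read can pick the correct copy even when both mirrors are present and only one is stale, which plain block-level RAID1 cannot do. A toy sketch of that idea follows; the names are invented, and real btrfs uses crc32c/xxhash over extents in a csum tree, not SHA-256 over whole buffers:

```python
# Toy illustration: a stored checksum lets the reader reject a stale
# mirror. Names (csum, read_mirrored) are hypothetical, not btrfs API.

import hashlib

def csum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def read_mirrored(copies, expected_csum):
    """Return the first copy whose checksum matches, else None."""
    for data in copies:
        if csum(data) == expected_csum:
            return data
    return None

new = b"committed data, generation 42"
stale = b"old data from the returned drive"

# Even if the stale mirror happens to be read first, its checksum
# fails against the stored value and the good copy is returned instead.
result = read_mirrored([stale, new], csum(new))
```

Note this protects reads after the fact; it does not by itself stop the filesystem from writing new data onto the stale device, which is the generation-counter problem discussed above.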