On Mon, 18 Sep 2017 20:30:41 +0200, Holger Hoffstätte <hol...@applied-asynchrony.com> wrote:
> On 09/18/17 19:09, Liu Bo wrote:
> > This 'mirror 0' looks fishy (as mirror comes from
> > btrfs_io_bio->mirror_num, which should be at least 1 if a raid1 setup
> > is in use).
> >
> > Not sure if 4.13.2-gentoo made any changes on btrfs, but can you
>
> No, it did not; Gentoo always strives to be as close to mainline as
> possible except for urgent security & low-risk convenience fixes.

According to https://dev.gentoo.org/~mpagano/genpatches/patches-4.13-2.htm it's not only security patches. But as the list shows, there are indeed no btrfs patches. There is one patch that could change btrfs behavior (though it's unlikely): it enables native gcc optimizations if you opt in. I don't think that's a default option in Gentoo. I'm using native optimizations myself and see no strange mirror issues in btrfs. OTOH, I've lately switched to the Gentoo ck patchset to get better behavior for gaming and realtime apps, but it's still on the 4.12 series.

Are you sure the system crashed and wasn't just stuck reading from the disks? If the drives have internal error correction and recovery enabled, the Linux block layer times out on requests that the drives eventually won't fix anyway and resets the link after 30 s. The drive's own recovery timeout is 120 s by default. You can change that on enterprise-grade and NAS-ready drives, and a handful of desktop drives support it too. Smartctl is used to set the values; just google "smartctl scterc". You could also raise the SCSI layer's timeout above the drive timeout, meaning above 120 s, if you cannot change scterc. I think it makes most sense not to reset the link before the drive has had its chance to answer the request.

There are pros and cons to changing these values. I always recommend raising the SCSI timeout above the scterc timeout. Personally, I lower the scterc timeout to 7 seconds (a value of 70 in smartctl's 100 ms units) and leave the SCSI timeout at its default.
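For illustration, the two knobs look roughly like this (device names are placeholders for your actual disks; run as root, and note the scterc values are in units of 100 ms):

```shell
# Show the drive's current SCT Error Recovery Control timers, if supported:
smartctl -l scterc /dev/sda

# Cap read/write error recovery at 7.0 s (70 x 100 ms), so the drive gives
# up early and the RAID layer can fetch a good copy instead:
smartctl -l scterc,70,70 /dev/sda

# Alternatively, raise the kernel's SCSI command timeout (default 30 s)
# above the drive's ~120 s internal recovery limit:
echo 180 > /sys/block/sda/device/timeout
```

Note that scterc settings are not persistent across power cycles on most drives, so people typically reapply them from a udev rule or boot script.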
RAID setups should use this to take control of their own error correction: the drive returns from the request early, and the RAID layer (btrfs or mdraid) can do its job of reading from another copy, then repair the bad sector by writing back a correct copy, which the drive turns into a sector relocation, i.e. self-repair.

Other people may jump in and recommend their own perspective on why (or why not) to change which knob to which value. But as long as you saw no SCSI errors reported when the "crash" occurred, these values are not involved in your problem anyway.

What about "btrfs device stats"?

-- 
Regards,
Kai

Replies to list-only preferred.
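For reference, the per-device error counters mentioned above can be queried like this (assuming the filesystem is mounted at /mnt, a placeholder path):

```shell
# Print cumulative per-device counters for the filesystem at /mnt:
# write_io_errs, read_io_errs, flush_io_errs, corruption_errs, generation_errs.
btrfs device stats /mnt

# With -c (check), recent btrfs-progs exit non-zero if any counter is
# non-zero, which is handy in monitoring scripts:
btrfs device stats -c /mnt
```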