On Wed, Sep 25, 2019 at 3:32 PM Pallissard, Matthew <m...@pallissard.net> wrote:
>
> On 2019-09-25T15:05:44, Chris Murphy wrote:
> > Definitely deal with the timing issue first. If by chance there are bad 
> > sectors on any of the drives, they must be properly reported by the drive 
> > with a discrete read error in order for Btrfs to do a proper fixup. If the 
> > times are mismatched, then Linux can get tired waiting, and do a link reset 
> > on the drive before the read error happens. And now the whole command queue 
> > is lost and the problem isn't fixed.
>
> Good to know, that seems like a critical piece of information.  A few 
> searches turned up this page, https://wiki.debian.org/Btrfs#FAQ.
>
> Should this be noted on the 'gotchas' or 'getting started' page as well?  I'd 
> be happy to make edits should the powers that be allow it.

Should what be noted as a gotcha? The timing stuff? That's not Btrfs
specific. It's just a default that's become shitty because of the
crazy amount of time consumer drives can spend on "deep recovery" of
bad sectors, which can exceed a minute. It's incredible how slow
that is and how many attempts are being made. But I guess on rare
occasions this does recover the data, while also making your computer
slow as balls. Anyway, this 30 second timer is obsolete, but kernel
developers so far refuse to change it, arguing that every distribution
that cares about desktop users, and about users who use consumer drives
for data storage, should change the timer default for their users with
a udev rule. Except no distro I know of does that. This affects everyone
with consumer drives that do deep recovery, which is most common with
hard drives. But it's especially harmful on large data storage stacks
using any kind of RAID. You'll find this problem all over the
linux-raid@ archive; it comes up all the time. Still.

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
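
The mismatch comes down to comparing two timers: the drive's own error
recovery limit (SCT ERC, which `smartctl -l scterc` reports in
deciseconds) and the kernel's per-device command timer in
`/sys/block/<dev>/device/timeout` (30 seconds by default). Below is a
minimal Python sketch of that check. The function names and the 180
second worst-case figure for drives without ERC are assumptions chosen
for illustration, not values taken from the kernel or the wiki.

```python
from pathlib import Path
from typing import Optional

# ASSUMPTION: illustrative worst-case "deep recovery" time, in seconds, for a
# consumer drive whose SCT ERC is unsupported or disabled. The point is only
# that it can run well past the kernel's 30 second default.
ASSUMED_DEEP_RECOVERY_S = 180


def timeout_mismatch(kernel_timeout_s: int, erc_read_ds: Optional[int]) -> bool:
    """Return True if the kernel may give up (and reset the link) before the
    drive reports a discrete read error.

    kernel_timeout_s: contents of /sys/block/<dev>/device/timeout, in seconds.
    erc_read_ds: SCT ERC read limit in deciseconds, as shown by
        `smartctl -l scterc`, or None if ERC is unsupported or disabled.
    """
    if erc_read_ds is None:
        # No drive-side limit: assume recovery may take the worst case.
        return kernel_timeout_s < ASSUMED_DEEP_RECOVERY_S
    # With ERC, the drive gives up after erc_read_ds / 10 seconds; that has to
    # happen before the kernel's timer expires.
    return erc_read_ds / 10 >= kernel_timeout_s


def read_kernel_timeout(dev: str) -> int:
    """Read the kernel command timer, in seconds, for a device such as "sda"."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())


if __name__ == "__main__":
    # ERC is not queried here (that needs smartctl and root), so every drive is
    # checked as if ERC were unsupported, which is the conservative case.
    for dev_dir in sorted(Path("/sys/block").glob("sd*")):
        try:
            timeout = read_kernel_timeout(dev_dir.name)
        except (OSError, ValueError):
            continue
        status = "possible mismatch" if timeout_mismatch(timeout, None) else "ok"
        print(f"{dev_dir.name}: timeout={timeout}s ({status})")
```

The sketch only reports; it changes nothing. The usual remedies are
either enabling ERC on drives that support it, or raising the kernel
timer on drives that don't, e.g. with the udev rule mentioned above.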

-- 
Chris Murphy
