On 10/14/2018 07:08 PM, waxhead wrote:
In case BTRFS fails to WRITE to a disk, what happens?

Does the bad area get mapped out somehow?

There was a proposed patch, but it wasn't convincing: disks already do bad-block relocation transparently to the host, and if a disk runs out of its reserved list it is probably time to replace it. In my experience a disk will have failed for other, non-media errors before it exhausts the reserved list, and in that case host-side relocation won't help. Furthermore, at the file-system level you can't accurately determine whether a block write failed because of a bad-media error rather than, say, a fault in the target circuitry.

Does it try again until it succeeds, or until it "times out" or reaches a threshold counter?

Block I/O timeouts and retries are properties of the block layer; whether it retries depends on the type of error.

The SD module already retries 5 times (when failfast is not set). That count should be tunable, and I think there was a patch for that on the ML.
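To illustrate the idea, here is a small user-space sketch of that retry policy. It is not the kernel code (the actual cap is the hard-coded SD_MAX_RETRIES in drivers/scsi/sd.h); submit_once() and MAX_RETRIES are made-up names for illustration.

/*
 * Toy user-space model of the bounded retry policy described above.
 * "failfast" stands in for the REQ_FAILFAST_* request flags.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_RETRIES 5   /* mirrors the hard-coded cap; the tunable-retry idea is to make this configurable */

/* Hypothetical stand-in for issuing one I/O to the device. */
static bool submit_once(void)
{
        return false;   /* pretend the write keeps failing */
}

static bool submit_with_retries(bool failfast)
{
        int allowed = failfast ? 1 : MAX_RETRIES;

        for (int attempt = 1; attempt <= allowed; attempt++) {
                if (submit_once())
                        return true;    /* success, no retry needed */
                fprintf(stderr, "attempt %d/%d failed\n", attempt, allowed);
        }
        return false;                   /* error is reported upward to the filesystem */
}

int main(void)
{
        if (!submit_with_retries(false))
                fprintf(stderr, "giving up: error propagated to the filesystem\n");
        return 0;
}

Making the equivalent of MAX_RETRIES a module parameter is essentially what the tunable-retry proposal amounts to.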

We had a few discussions on the retry part in the past. [1]
[1]
https://www.spinics.net/lists/linux-btrfs/msg70240.html
https://www.spinics.net/lists/linux-btrfs/msg71779.html


Does it eventually try to write to a different disk (in case of using the raid1/10 profile)?

When there is a mirror copy, the FS does not go read-only; instead it leaves write holes scattered across transactions, because we don't fail the disk at the first failed transaction. That means if a disk's super-block says it is at the nth transaction, it is not guaranteed that all previous transactions made it to that disk successfully in mirrored configs. I consider this a bug. There is also a danger of reading junk data, which is hard but not impossible to hit because of our unreasonable hard-coded pid-based read-mirror policy (there is a patch in the ML to address that as well).
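For context, the read-mirror policy being criticized boils down to a modulo on the reader's pid. Below is a user-space model of it (the real selection happens in fs/btrfs/volumes.c; the mirror count and pid here are just examples):

/*
 * User-space model of btrfs's pid-based read-mirror selection. Whichever
 * mirror `pid % num_mirrors` lands on is read, with no regard for which
 * copy might be stale after missed writes.
 */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static int pick_read_mirror(pid_t pid, int num_mirrors)
{
        return pid % num_mirrors;
}

int main(void)
{
        int num_mirrors = 2;            /* e.g. a two-device raid1 */
        pid_t pid = getpid();

        printf("pid %d reads from mirror %d\n",
               (int)pid, pick_read_mirror(pid, num_mirrors));
        /* If that mirror is the one with the missed writes, stale data is
         * returned even though a good copy exists on the other device. */
        return 0;
}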

I sent a patch to fail the disk when the first write fails, so that we know the last transaction id at which the FS was fully intact on that disk. That was a long time back; I still believe it is an important patch. There weren't enough comments, I guess, for it to go to the next step.
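A minimal user-space sketch of the policy that patch argues for; the names (device_state, last_good_transid, commit_write) are invented for illustration and are not the patch's actual identifiers:

/*
 * Model of the "fail the device on its first write error" policy: once a
 * write fails, the device takes no further writes, so everything up to
 * last_good_transid is known to be intact on it and nothing after is.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct device_state {
        bool     failed;            /* no further writes are attempted */
        uint64_t last_good_transid; /* last transaction fully on this device */
};

static void commit_write(struct device_state *dev, uint64_t transid, bool write_ok)
{
        if (dev->failed)
                return;                     /* device already taken out of service */

        if (!write_ok) {
                dev->failed = true;         /* fail on the first write error */
                fprintf(stderr, "device failed; last good transid %llu\n",
                        (unsigned long long)dev->last_good_transid);
                return;
        }
        dev->last_good_transid = transid;   /* transaction fully made it to disk */
}

int main(void)
{
        struct device_state dev = { .failed = false, .last_good_transid = 0 };

        commit_write(&dev, 100, true);      /* ok */
        commit_write(&dev, 101, false);     /* first failure: device is failed */
        commit_write(&dev, 102, true);      /* ignored: device already failed */
        return 0;
}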

The current solution is to replace the offending disk _without_ reading from it, which gives a clean recovery from the failed disk. Since data centers can't rely on admin-initiated manual recovery, there are also patches in the ML to do this automatically using the auto-replace feature. Again, there weren't enough comments, I guess, for it to go to the next step.

Thanks, Anand
