Austin S Hemmelgarn posted on Mon, 07 Dec 2015 10:39:05 -0500 as excerpted:
> On 2015-12-07 10:12, Jon Panozzo wrote:
>> This is what I was thinking as well. In my particular use-case, parity
>> is only really used today to reconstruct an entire device due to a
>> device failure. I think if btrfs scrub detected errors on a single
>> device, I could do a "reverse reconstruct" where instead of syncing TO
>> the parity disk, I sync FROM the parity disk TO the btrfs single device
>> with the error, replacing physical blocks that are out of sync with
>> parity (thus repairing the scrub-found errors). The downside to this
>> approach is I would have to perform the reverse-sync against the entire
>> btrfs block device, which could be much more time-consuming than if I
>> could single out the specific block addresses and just sync those.
>> That said, I guess option A is better than no option at all.
>>
>> I would be curious if any of the devs or other members of this mailing
>> list have tried to correlate btrfs internal block addresses to a true
>> block-address on the device being used. Any interesting articles /
>> links that show how to do this? Not expecting much, but if someone
>> does know, I'd be very grateful.

> I think there is a tool in btrfs-progs to do it, but I've never used it,
> and you would still need to get scrub to spit out actual error addresses
> for you.

btrfs-debug-tree is what you're looking for. =:^)

As I understand things, the complexity is due to btrfs' chunk
abstraction, along with the multi-device feature.

On a normal filesystem, block addresses map linearly to absolute byte
offsets on the underlying device, and there's just the one device to
worry about, so there's effectively little or no translation to be done.

On btrfs, by contrast, block addresses map into chunks, also known as
block groups, which are designed to be more or less arbitrarily
relocatable within the filesystem using balance (originally called the
restriper).
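
To make the idea concrete, here is a minimal sketch of such a lookup in
Python. It assumes the simplest case only (single profile, one stripe per
chunk). The chunk table, its values, and the names Chunk and
logical_to_physical are all hypothetical, not anything btrfs-progs
provides. Real mappings would have to be read from the filesystem's chunk
tree.

```python
from bisect import bisect_right
from typing import NamedTuple

MiB = 1024 * 1024


class Chunk(NamedTuple):
    """One hypothetical single-profile chunk (one stripe, one device)."""
    logical_start: int   # first filesystem-logical byte address covered
    length: int          # chunk size in bytes
    devid: int           # device holding the chunk
    physical_start: int  # byte offset of the chunk on that device


# Made-up table for illustration, sorted by logical_start.
CHUNKS = [
    Chunk(logical_start=12 * MiB, length=8 * MiB, devid=1, physical_start=20 * MiB),
    Chunk(logical_start=20 * MiB, length=1024 * MiB, devid=2, physical_start=1 * MiB),
]


def logical_to_physical(chunks, logical):
    """Return (devid, physical byte offset) for a logical byte address.

    Raises LookupError if no chunk covers the address.
    """
    starts = [c.logical_start for c in chunks]
    # Index of the last chunk whose start is <= logical, or -1 if none.
    i = bisect_right(starts, logical) - 1
    if i < 0:
        raise LookupError(f"logical address {logical} is below every chunk")
    chunk = chunks[i]
    offset = logical - chunk.logical_start
    if offset >= chunk.length:
        raise LookupError(f"logical address {logical} is not inside any chunk")
    return chunk.devid, chunk.physical_start + offset


if __name__ == "__main__":
    devid, physical = logical_to_physical(CHUNKS, 13 * MiB)
    print(f"devid {devid}, physical offset {physical}")
```

With the made-up table above, logical address 13 MiB falls 1 MiB into the
first chunk, so it resolves to device 1 at physical offset 21 MiB. Striped,
mirrored, or parity profiles would need more than one (device, offset)
pair per chunk, which this sketch deliberately leaves out.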

Further, these block groups can be single, striped across multiple
devices (raid0 and the 0 side of raid10), duplicated on the same device
(dup) or across multiple devices (raid1 and the 1 side of raid10; only
two devices currently, N-way-mirroring is on the roadmap), or striped
with parity (raid5 and 6).

So while block addresses can map more or less linearly into block groups,
btrfs has to maintain an additional layer of mapping that tells the
filesystem where to look for that block group: on what device (or across
what devices, if striped), and at what absolute bytenr offset into the
device.

And again, keep in mind that even with a constant single/dup/raid
mapping, and even in the simplest single mode on a single device, balance
can and does more or less arbitrarily relocate block groups within the
filesystem. The mapping you see today may or may not be the mapping you
see tomorrow, depending on whether a balance was run in the meantime.

Obviously the devs are going to need a tool to help them debug this
additional complexity, and that's where btrfs-debug-tree comes in. =:^)

But for "ordinary mortal admins", yes, btrfs is open source and
btrfs-debug-tree is available for those who want to use it, but once they
realize the complexity, most (including me) are going to simply be
content to treat it as a black box and not worry too much about
investigating its innards.

So while specific block and/or byte mapping can be done and there are
tools available for and appropriate to the task, it's the type of thing
most admins are very content to leave well enough alone, once they
understand the complexities involved.

"Btrfs, while he might use it, it ain't your grandfather's filesystem!"
(TM) =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
Richard Stallman