On 2019-04-29 12:16, Hendrik Friedel wrote:
Hello,
With the "single" data profile you won't lose the filesystem, but you will
irretrievably lose any data on the missing drive. Also, the "single" profile
does not support auto-healing (repairing a bad copy from a good copy). If
this is acceptable to you, then yes, both variants will do what you want.
Actually, it's potentially a bit worse than that. With the proposed setup
you may lose individual files if you lose one disk, but you may also lose
_parts_ of individual files, especially if you have lots of large (>1-5GB)
files.
You mean if parts of the files are on the failed drive, or what do you
have in mind?
Yes, it's if parts of the files are on the failed drive. Essentially, if
a file has more than one extent, then with the single profile those
extents may be stored on different drives. The common case for this is
dealing with files larger than the data chunk size for the filesystem
(typically between 1-5GB on most reasonably sized volumes), because an
extent can't be larger than a chunk.
And on top of this, finding what data went missing will essentially
require trying to read every byte of every file in the volume.
Why is that and how would it be done (scrub, I suppose?)
There's no other way, short of scanning the filesystem internals, to
figure out which chunks were on the missing disk and then map the
contents of those chunks back to the files they are part of. Ideally,
this wouldn't be the case, but it's an unusual enough situation that
providing a tool for it just hasn't been a priority.
As far as the actual process itself, scrub is one way to do it, but it
requires using a separate tool to map the inode numbers spit out by the
scrub messages in the kernel logs to actual file names. There are a
bunch of other ways to do it, though. Personally, I'd probably throw
something together in Python that tries to read each file all the way
through, bails on _any_ I/O error, and logs the names of the files it
found errors in, though even something just chaining `find` and `cat`
together and watching the kernel log for I/O error messages would be
enough.
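The Python approach described above can be sketched roughly like this. This is a hypothetical illustration, not a tested recovery tool; the path handling and chunked reads are my assumptions:

```python
import os

def find_damaged_files(root):
    """Walk `root`, read every file fully, and yield (path, error) for
    any file that cannot be read all the way through."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as f:
                    # Read in 1 MiB chunks so large files don't fill RAM.
                    while f.read(1024 * 1024):
                        pass
            except OSError as err:
                # An EIO here means part of this file's data is gone.
                yield path, err

# Example usage (path is illustrative):
# for path, err in find_damaged_files('/mnt/volume'):
#     print(f'{path}: {err}')
```

Note that this reads every byte of every file, so on a large volume it will take about as long as a full scrub.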
I am wondering why the design of 'single' is that way? It seems to me
that this unnecessarily increases the failure probability. My thinking:
if I have two separate filesystems, I have a failure probability (FP) of
Z, with Z the probability of one drive failing. If I have one btrfs
filesystem in single profile, a file whose extents span drives has an FP
of roughly 1-(1-Z)^N (any one of the N drives failing can take part of
it), whereas it could, with a different design, still be Z, no?
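For concreteness, the two failure probabilities being compared can be sketched numerically, assuming independent drive failures (the 5% figure below is purely illustrative, not a real drive statistic):

```python
def loss_probability(z, n):
    """Probability that at least one of n independent drives fails."""
    return 1 - (1 - z) ** n

# With two separate filesystems, each file sees only one drive's risk.
# With one two-drive 'single' volume, a file whose extents span both
# drives is lost if either drive fails.
z = 0.05  # illustrative per-drive failure probability
assert round(loss_probability(z, 1), 4) == 0.05    # file on one drive
assert round(loss_probability(z, 2), 4) == 0.0975  # file spanning two drives
```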
Yes, it is technically possible, you just place each file entirely on
one device. In fact, you can see this as a placement option in many
distributed filesystems. There are a couple of reasons it's not done
with local filesystems backed with conventional block storage:
* It adds an extra layer of complexity. In a distributed filesystem, or
even with mhddfs, you already have a nice, easy to use filesystem
interface (or an object-storage interface) so you don't have to handle
block mapping. With a local filesystem though, you still have to do
block translation, which then becomes far more complicated because of
the new, extra, constraint on where each block can go.
* It is very good at confusing regular end users. Assume you have to
place a 4GB file on a volume arranged like this, but only have 2GB of
space left on each disk. You still technically have 4GB of free space,
but you can't put the file on the volume because neither disk has enough
room for it on its own. This type of situation is extremely confusing
for normal users, and is not all that uncommon in desktop usage
scenarios. BTRFS already has issues like this to begin with, and adding
another source of them is not a good idea.
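That free-space confusion can be made concrete with a toy sketch (the numbers mirror the example above; this is not real btrfs allocator code):

```python
def can_place_whole_file(size_gb, free_per_disk_gb):
    """Whole-file placement: the file must fit entirely on one disk."""
    return any(free >= size_gb for free in free_per_disk_gb)

free = [2, 2]           # GB free on each of two disks
assert sum(free) == 4   # the volume would report 4 GB free...
# ...yet a 4 GB file cannot be placed without splitting it across disks.
assert not can_place_whole_file(4, free)
```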
* The exact benefits of this usually don't matter for (comparatively)
small local storage devices. The primary reason it's done at all is for
big hosting companies so that they can trivially guarantee that services
will be fully functional if they can actually see all the files. For a
regular user on a small desktop, it just doesn't matter in most cases.
As of today there is no provision for automatically mounting an
incomplete multi-device btrfs in degraded mode. Actually, with systemd
it is flat-out impossible to mount an incomplete btrfs, because the
standard framework only proceeds to mount it after all devices have been
seen.
Do you talk about the mount during boot or about mounting in general?
Both, unless you do some heavy modifications of some of the standard
installed files (you need to disable some specific udev rules and then
replace the standard `mount.btrfs` wrapper that systemd uses).
> If I were you, with your use case I would consider using mhddfs
> https://romanrm.net/mhddfs which is a filesystem-agnostic layer on top of
> 2x [-m DUP, -d SINGLE] BTRFS drives. Last time I tested mhddfs (about 5+
> years ago) it was dead slow, but that might not be very important to you.
> For what it does it works great!
In fact, that is what I am using today. But when using snapshots, this
would become a bit messy (having to do the snapshot on each device
separately, but identically).
> remember that a backup is not a backup unless it has an extra backup
I do have two backups (one offsite) of all data that is irreplaceable,
and one backup of data that is nice to have (TV recordings).
Greetings,
Hendrik