Balance conversion to metadata RAID1, data RAID1 leaves some metadata as DUP
I recently created a fresh filesystem on one disk and recovered from backups, with data as SINGLE and metadata as DUP. I added a second disk yesterday and ran a balance with -dconvert=raid1 -mconvert=raid1. I did reboot during the process for a couple of reasons (putting the sides back on the PC case and putting it back under the desk), and I updated the kernel from 5.3.9 to 5.2.13 at some point during this process. Balance resumed as one would expect, and has now completed:

root@phoenix:~# btrfs balance status /home_data
No balance found on '/home_data'

However, some metadata remains as DUP, which does not seem right:

root@phoenix:~# btrfs fi usage /home_data/
Overall:
    Device size:                 10.92TiB
    Device allocated:             4.69TiB
    Device unallocated:           6.23TiB
    Device missing:                 0.00B
    Used:                         4.61TiB
    Free (estimated):             3.15TiB   (min: 3.15TiB)
    Data ratio:                      2.00
    Metadata ratio:                  2.00
    Global reserve:             512.00MiB   (used: 0.00B)

Data,RAID1: Size:2.34TiB, Used:2.30TiB
   /dev/mapper/data_disk_ESFH      2.34TiB
   /dev/mapper/data_disk_EVPC      2.34TiB

Metadata,RAID1: Size:7.00GiB, Used:4.48GiB
   /dev/mapper/data_disk_ESFH      7.00GiB
   /dev/mapper/data_disk_EVPC      7.00GiB

Metadata,DUP: Size:1.00GiB, Used:257.22MiB
   /dev/mapper/data_disk_ESFH      2.00GiB

System,RAID1: Size:32.00MiB, Used:368.00KiB
   /dev/mapper/data_disk_ESFH     32.00MiB
   /dev/mapper/data_disk_EVPC     32.00MiB

Unallocated:
   /dev/mapper/data_disk_ESFH      3.11TiB
   /dev/mapper/data_disk_EVPC      3.11TiB

root@phoenix:~# btrfs --version
btrfs-progs v5.1

I presume running another balance will fix this, but surely all metadata should have been converted? Is there a way to only balance the DUP metadata?
Re: Chasing IO errors. BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2907: errno=-5 IO failure
On 8/22/19 12:32 AM, Qu Wenruo wrote:
>>> Then I'd recommend to do regular rescue procedure:
>>> - Try that skip_bg patchset if possible
>>>   This provides the best salvage method so far, full subvolume
>>>   available, although needs out-of-tree patches.
>>>   https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>>
>> I can give that a go, but not for a while.
>>
>> I seem to be able to read the file system as is, as it goes read only.
>> But perhaps 'seems' is the operative word.
>
> As long as you can mount RO, it should be mostly OK for data salvage.

Posting this for completeness, as this started just before I went away for a while. The filesystem was RO, but there were failures copying the affected files, meaning my copy from the RO filesystem was incomplete. So I likely _should_ have applied the patch suggested above, if that had been my only copy. Instead I recovered from backups.

Thanks for your help.

Pete
Re: Balance conversion to metadata RAID1, data RAID1 leaves some metadata as DUP
On 9/8/19 9:09 AM, Pete wrote:
(snip)
> I presume running another balance will fix this, but surely all metadata
> should have been converted? Is there a way to only balance the DUP
> metadata?

Adding "soft" to -mconvert should do exactly that; it will then skip any chunks that are already in the target profile.

-h
Re: Balance conversion to metadata RAID1, data RAID1 leaves some metadata as DUP
On 9/8/19 8:57 AM, Holger Hoffstätte wrote:
> On 9/8/19 9:09 AM, Pete wrote:
> (snip)
>> I presume running another balance will fix this, but surely all metadata
>> should have been converted? Is there a way to only balance the DUP
>> metadata?
>
> Adding "soft" to -mconvert should do exactly that; it will then skip
> any chunks that are already in the target profile.
>
> -h

Appreciated. Fixed it, very rapidly, with:

btrfs bal start -mconvert=raid1,soft /home_data

I think a few examples on the wiki page would be helpful. As I don't do this sort of maintenance every day, I looked at the filter section on the wiki / man pages online following your prompting, and was adding 'type=soft' in various places with no success; it was about the 3rd reading of the relevant area before I came up with the above, which worked.

Pete
Re: Balance conversion to metadata RAID1, data RAID1 leaves some metadata as DUP
On 9/8/19 11:18 AM, Pete wrote:
> On 9/8/19 8:57 AM, Holger Hoffstätte wrote:
>> On 9/8/19 9:09 AM, Pete wrote:
>> (snip)
>>> I presume running another balance will fix this, but surely all metadata
>>> should have been converted? Is there a way to only balance the DUP
>>> metadata?
>>
>> Adding "soft" to -mconvert should do exactly that; it will then skip
>> any chunks that are already in the target profile.
>>
>> -h
>
> Appreciated. Fixed it, very rapidly, with:
>
> btrfs bal start -mconvert=raid1,soft /home_data
>
> I think a few examples on the wiki page would be helpful. As I don't do
> this sort of maintenance every day, I looked at the filter section on the
> wiki / man pages online following your prompting, and was adding
> 'type=soft' in various places with no success; it was about the 3rd
> reading of the relevant area before I came up with the above, which worked.

IMHO 'soft' should just be the implicit default behaviour for convert, since it almost always does what one would expect when converting. I can't think of a good reason why it shouldn't be, but maybe there is one - Dave?

-h
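For the archives, a sketch of the thread's fix plus a verification step (mountpoint as in the thread; the grep check is my addition, not from the thread):

```shell
# 'soft' makes a convert balance skip chunks already in the target
# profile, so only the leftover DUP metadata chunks get rewritten.
btrfs balance start -mconvert=raid1,soft /home_data

# Verify no DUP block groups remain; the grep should print nothing.
btrfs filesystem usage /home_data | grep DUP
```

Without 'soft', the same command would needlessly rewrite every metadata chunk that is already RAID1.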
user_subvol_rm_allowed vs rmdir_subvol
Came across this podman issue yesterday:
https://github.com/containers/libpod/issues/3963

Question 1: For the unprivileged use case, is it intentional that the user creates a subvolume/snapshot using 'btrfs sub create' but deletes it with 'rm -rf'? And is the consequence of this performance? Because I see 'rm -rf' must individually remove all files and dirs from the subvolume first, before rmdir() is called to remove the subvolume itself. Whereas 'btrfs sub del' calls the BTRFS_IOC_SNAP_DESTROY ioctl, which is pretty much immediate, with cleanup happening in the background.

Question 2: As it relates to the podman issue, what do Btrfs developers recommend? If kernel > 4.18 and unprivileged, use 'rm -rf' to delete subvolumes? Otherwise use 'btrfs sub del' with root privilege?

Question 3: man 5 btrfs has a confusing note for the user_subvol_rm_allowed mount option:

    Note historically, any user could create a snapshot even if he was
    not owner of the source subvolume, the subvolume deletion has been
    restricted for that reason. The subvolume creation has been
    restricted but this mount option is still required. This is a
    usability issue.

The 2nd sentence, "subvolume creation has been restricted" - I can't parse that. Is it an error, or can it be worded differently?

-- 
Chris Murphy
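The two deletion paths being compared can be sketched as follows (the subvolume path is hypothetical; 'btrfs sub del' needs root unless the filesystem is mounted with user_subvol_rm_allowed):

```shell
# Unprivileged path (kernel >= 4.18): rm unlinks every file and
# directory first, then the final rmdir() on the now-empty subvolume
# is accepted by the kernel. Cost scales with the number of inodes.
rm -rf ~/.local/share/containers/storage/btrfs/subvolumes/abc123

# Privileged path: the BTRFS_IOC_SNAP_DESTROY ioctl returns almost
# immediately; actual extent cleanup happens in the background.
sudo btrfs subvolume delete /var/lib/containers/storage/btrfs/subvolumes/abc123
```

This illustrates the performance asymmetry raised in Question 1: the ioctl path is O(1) from the caller's point of view, while rm must walk the whole tree.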
Feature requests: online backup - defrag - change RAID level
Hello everyone!

I have been programming for a long time (over 20 years), and I am quite interested in a lot of low-level stuff, but in reality I have never done anything related to kernels or filesystems. I have, however, done a lot of assembly, C, OS stuff, etc.

Looking at your project status page (at https://btrfs.wiki.kernel.org/index.php/Status), I must say that your priorities don't quite match mine. Of course, opinions usually differ. It is my opinion that there are some quite essential features which btrfs is, unfortunately, still missing. So here is a list of features which I would rate as very important (for a modern COW filesystem like btrfs), so perhaps you can think about it at least a little bit.

1) Full online backup (or copy, whatever you want to call it)

btrfs backup <source> <destination> [-f]
- Backs up a btrfs filesystem given by <source> to a partition <destination> (with all subvolumes).
- To be performed by creating a new btrfs filesystem in the destination partition <destination>, with a new GUID.
- All data from the source filesystem is then copied to the destination partition, similar to how RAID1 works.
- The size of the destination partition must be sufficient to hold the used data from the source filesystem, otherwise the operation fails. The point is that the destination doesn't have to be as large as the source, just sufficient to hold the data (of course, many details and concerns are skipped in this short proposal).
- When the operation completes, the destination partition contains a fully featured, mountable and unmountable btrfs filesystem, which is an exact copy of the source filesystem at some point in time, with all the snapshots and subvolumes of the source filesystem.
- There are two possible implementations of this operation, depending on whether the destination drive is slower than the source drive(s) or not (like when the destination is an HDD and the source is an SSD).
If the source and the destination are of similar speed, then a RAID1-like algorithm can be used (all writes simultaneously go to the source and the destination). This mode can also be used if the user/admin is willing to tolerate a performance hit for some relatively short period of time. The second possible implementation is a bit more complex: it can be done by creating a temporary snapshot, or by buffering all the current writes until they can be written to the destination drive, but this implementation is of lesser priority (see if you can make the RAID1 implementation work first).

2) Sensible defrag

The defrag is currently a joke. If you use defrag, then you had better not use subvolumes/snapshots. That's... very hard to tolerate. Quite a necessary feature. I mean, defrag is an operation that should be performed in many circumstances, and in many cases it is even automatically initiated. But btrfs defrag is virtually unusable, and it is unusable where it is most needed, as the presence of subvolumes will, predictably, increase fragmentation by quite a lot.

How to do it:
- The extents must not be unshared, but just shuffled a bit. Unsharing the extents is, in most situations, not tolerable.
- The defrag should work by doing a full defrag of one 'selected subvolume' (which can be selected by the user, or it can be guessed, because the user probably wants to defrag the currently mounted subvolume or the default subvolume). The other subvolumes should then share data (shared extents) with the 'selected subvolume' (as much as possible).
- If you want it even more feature-full and complicated, then you could allow the user to specify a list of selected subvolumes, like subvol1, subvol2, subvol3, etc., and the defrag algorithm then defrags subvol1 in full, then subvol2 as much as possible while not changing subvol1 and at the same time sharing extents with subvol1, then defrags subvol3 while not changing subvol1 and subvol2, etc.
- I think it would be wrong to use a general deduplication algorithm for this. Instead, the information about the shared extents should be analyzed given the starting state of the filesystem, and then the algorithm should produce an optimal solution based on the currently shared extents. Deduplication is a different task.

3) Downgrade to 'single' or 'DUP' (also, a general easy way to switch between RAID levels)

Currently, as far as I can gather, the user has to do a "btrfs balance start -dconvert=single -mconvert=single <mountpoint>", then delete a drive, which is a bit of a ridiculous sequence of operations.

Can you do something like "btrfs delete", but such that it also simultaneously converts to 'single', or some other chosen RAID level?

##

I hope that you will consider my suggestions; I hope that I'm helpful (although, I guess, the short time I spent working with btrfs and writing this mail cannot compare with the amount of work you are putting into it). Perhaps, teams sometimes need a different perspective,
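For reference, the two-step sequence being criticized above currently looks roughly like this (device name and mountpoint hypothetical):

```shell
# Step 1: rewrite every chunk down to a single-copy profile so that
# no chunk still requires two devices. (For metadata on a single
# device, DUP is the commonly recommended target rather than single.)
btrfs balance start -dconvert=single -mconvert=dup /mnt

# Step 2: only now can the second device be removed; this relocates
# any remaining chunks off the departing device.
btrfs device remove /dev/sdb /mnt
```

The feature request amounts to collapsing these two steps into a single device-removal command that converts profiles as it relocates.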
Re: Feature requests: online backup - defrag - change RAID level
On 2019/9/9 上午10:55, zedlr...@server53.web-hosting.com wrote:
> Hello everyone!
> [...]
>
> 1) Full online backup (or copy, whatever you want to call it)
> btrfs backup <source> <destination> [-f]
> - Backs up a btrfs filesystem given by <source> to a partition
> <destination> (with all subvolumes).

Why not just btrfs send? Or do you want to keep the whole subvolume structure/layout?

> - To be performed by creating a new btrfs filesystem in the destination
> partition <destination>, with a new GUID.

I'd say the current send/receive is more flexible.

And you also need to understand that btrfs also integrates volume management, thus it's not just <destination>: you also need the RAID level and things like that.

> - All data from the source filesystem is then copied
> to the destination partition, similar to how RAID1 works.
> - The size of the destination partition must be sufficient to hold the
> used data from the source filesystem, otherwise the operation fails. The
> point is that the destination doesn't have to be as large as the source,
> just sufficient to hold the data (of course, many details and concerns
> are skipped in this short proposal).

All of this can be done already by send/receive, although at the subvolume level. Please check whether send/receive is suitable for your use case.

[...]

> 2) Sensible defrag
> The defrag is currently a joke. If you use defrag, then you had better not
> use subvolumes/snapshots. That's... very hard to tolerate. Quite a
> necessary feature. I mean, defrag is an operation that should be
> performed in many circumstances, and in many cases it is even
> automatically initiated. But btrfs defrag is virtually unusable, and
> it is unusable where it is most needed, as the presence of subvolumes
> will, predictably, increase fragmentation by quite a lot.
>
> How to do it:
> - The extents must not be unshared, but just shuffled a bit. Unsharing
> the extents is, in most situations, not tolerable.

I definitely see cases where unsharing extents makes sense, so at least we should let users determine what they want.
> - The defrag should work by doing a full defrag of one 'selected
> subvolume' (which can be selected by the user, or it can be guessed
> because the user probably wants to defrag the currently mounted
> subvolume, or the default subvolume). The other subvolumes should then
> share data (shared extents) with the 'selected subvolume' (as much as
> possible).

What's wrong with the current file-based defrag? If you want to defrag a subvolume, just iterate through all its files.

> - I think it would be wrong to use a general deduplication algorithm for
> this. Instead, the information about the shared extents should be
> analyzed given the starting state of the filesystem, and then the
> algorithm should produce an optimal solution based on the currently
> shared extents.

Please be more specific, e.g. by giving an example.

> Deduplication is a different task.
>
> 3) Downgrade to 'single' or 'DUP' (also, a general easy way to switch
> between RAID levels)
>
> Currently, as far as I can gather, the user has to do a "btrfs balance
> start -dconvert=single -mconvert=single <mountpoint>", then delete a
> drive, which is a bit of a ridiculous sequence of operations.
>
> Can you do something like "btrfs delete", but such that it also
> simultaneously converts to 'single', or some other chosen RAID level?

That's a shortcut for a chunk profile change. My first impression is that it could cause more problems than benefits. (It only helps profile downgrades, thus only makes sense for RAID1->SINGLE, DUP->SINGLE, and RAID10->RAID0, nothing else.)

I still prefer the safer allocate-new-chunk way to convert chunks, even at the cost of extra IO.

Thanks,
Qu

> ## I hope that you will consider my suggestions; I hope that I'm helpful
> (although, I guess, the short time I spent working with btrfs and
> writing this mail cannot compare with the amount of work you are
> putting into it). Perhaps, teams sometimes need a different perspective,
> an outsider's perspective, in order to better understand the situation.
>
> So long!
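A minimal send/receive sketch along the lines Qu suggests (all paths hypothetical; /mnt/dst must be a separate, already-created btrfs filesystem):

```shell
# send requires a read-only snapshot, so take one first.
btrfs subvolume snapshot -r /mnt/src/home /mnt/src/home.backup

# Stream the snapshot into the destination filesystem; receive
# recreates it as a subvolume under /mnt/dst.
btrfs send /mnt/src/home.backup | btrfs receive /mnt/dst
```

Unlike the proposed whole-filesystem "btrfs backup", this works one subvolume at a time, and the destination keeps its own profiles and device layout; incremental updates are possible later with `btrfs send -p`.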