Re: Why do full balance and deduplication reduce available free space?
On Mon, 02 Oct 2017 22:19:32 +0200, Niccolò Belli wrote:

> On 2017-10-02 21:35, Kai Krakow wrote:
>> Besides defragging removing the reflinks, duperemove will unshare
>> your snapshots when used in this way: if it sees duplicate blocks
>> within the subvolumes you give it, it will potentially unshare
>> blocks from the snapshots while rewriting extents.
>>
>> BTW, you should be able to use duperemove with read-only snapshots
>> if used in read-only-open mode. But I'd rather suggest using bees
>> instead: it works at the whole-volume level, walking extents instead
>> of files. That way it is much faster, doesn't reprocess already
>> deduplicated extents, and it works with read-only snapshots.
>>
>> Until my patch it didn't like mixed nodatasum/datasum workloads.
>> Currently this is fixed by simply leaving nocow data alone, since
>> users probably set nocow for exactly the reason of not fragmenting
>> extents and not relocating blocks.
>
> "Bad Btrfs Feature Interactions: btrfs read-only snapshots (never
> tested, probably wouldn't work well)"
>
> Unfortunately it seems that bees doesn't support read-only snapshots,
> so it's a no-go.
>
> P.S.
> I tried duperemove with -A, but besides taking much longer it didn't
> improve the situation.
> Are you sure that the culprit is duperemove? AFAIK it shouldn't
> unshare extents...

Whether extents get unshared depends... If an extent is shared between
a r/o and a r/w snapshot, rewriting the extent for deduplication ends
up in a shared extent again, but one that is no longer reflinked with
the original r/o snapshot. At least if btrfs doesn't allow changing
extents that are part of a r/o snapshot... which you all tell me is
the case...

And then there's unsharing of metadata by the deduplication process
itself.

Both effects should be minimal, though. But since chunks are allocated
in 1GB sizes, allocation may jump by 1GB just for a few extra MB
needed. A metadata rebalance may fix this.

-- 
Regards,
Kai

Replies to list-only preferred.
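The metadata rebalance Kai suggests can be done with a filtered balance rather than a full one; a minimal sketch, where the mount point and the 50% usage threshold are illustrative assumptions, not values given in the thread:

```shell
# Rewrite only metadata chunks that are at most 50% full, packing
# their contents into fewer chunks and returning the freed chunks
# to unallocated space. Much cheaper than a full balance.
btrfs balance start -musage=50 /mnt/rootfs

# Check the result: the metadata "Size" figure should move closer
# to the metadata "Used" figure.
btrfs filesystem usage /mnt/rootfs
```

A filtered balance like this avoids rewriting data chunks at all, which is usually the bulk of a full balance's work.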
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Why do full balance and deduplication reduce available free space?
On 10/02/2017 12:02 PM, Niccolò Belli wrote:
> Hi,
> I have several subvolumes mounted with compress-force=lzo and
> autodefrag. Since I use lots of snapshots (snapper keeps around 24
> hourly snapshots, 7 daily snapshots and 4 weekly snapshots) I had to
> create a systemd timer to perform a full balance and deduplication
> each night. In fact data needs to be already deduplicated when
> snapshots are created, otherwise I have no other way to deduplicate
> snapshots.
> [...]
> Data,single: Size:44.00GiB, Used:40.00GiB
>    /dev/sda5      44.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.78GiB
>    /dev/sda5       5.00GiB
> [...]
> Data,single: Size:41.00GiB, Used:40.01GiB
>    /dev/sda5      41.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.77GiB
>    /dev/sda5       5.00GiB
> [...]
> Data,single: Size:41.00GiB, Used:40.03GiB
>    /dev/sda5      41.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.84GiB
>    /dev/sda5       5.00GiB
> [...]
> Data,single: Size:41.00GiB, Used:40.04GiB
>    /dev/sda5      41.00GiB
>
> Metadata,single: Size:5.00GiB, Used:3.97GiB
>    /dev/sda5       5.00GiB
> [...]
>
> It further reduced the available free space! Balance and
> deduplication actually reduced my available free space by 400MB!
> 400MB each night!

Your data increased by 40MB (over 40GB, about 0.1%); your metadata
instead increased by about 200MB (over ~4GB, about 2%). So:

1) it seems to me that your data is already quite well deduped;
2) (NB: this is my guess) I think that deduping (and/or rebalancing)
   rearranges the metadata, leading to increased disk usage.

The only explanation that I found is that deduping breaks the sharing
of metadata with the snapshots:

- a snapshot shares the metadata, which in turn refers to the data.
  Because the metadata is shared, there is only one copy. The metadata
  remains shared until it is changed/updated.
- dedupe, when it shares a file block, updates the metadata, breaking
  the sharing with its snapshot and thus creating a copy of it.

NB: updating snapshot metadata is the same as updating subvolume
metadata.

> How is it possible? Should I avoid doing balances and deduplications
> at all?

Try a few days without deduplication, and check if something changes.
It may be sufficient to delay the deduping: not each night, but each
week or month.

Another option is running dedupe on all the files (including the
snapshotted ones). This would still break the metadata sharing, but
the extents should still be shared (IMHO :-) ). Of course the cost of
deduping would increase a lot (about 24+7+4 = 35 times).

> Thanks,
> Niccolò

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
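The "dedupe everything, snapshots included" option could look roughly like the sketch below. The snapper directory layout and hashfile path are assumptions, and note that the dedupe ioctl fails on snapshots while they are still read-only, so this only works once they are writable:

```shell
# Hypothetical invocation: hash and dedupe the live subvolume together
# with every snapper snapshot, so that duplicate extents end up shared
# across all of them instead of being unshared from the snapshots.
# Read-only snapshots would have to be made writable first.
duperemove -drh --dedupe-options=noblock \
    --hashfile=/var/cache/rootfs.hash \
    /rootfs /rootfs/.snapshots/*/snapshot
```

The hashfile means unchanged extents are not rehashed on later runs, which softens the ~35x cost somewhat, though the dedupe pass itself still has to consider every snapshot.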
Re: Why do full balance and deduplication reduce available free space?
On 2017-10-02 21:35, Kai Krakow wrote:
> Besides defragging removing the reflinks, duperemove will unshare
> your snapshots when used in this way: If it sees duplicate blocks
> within the subvolumes you give it, it will potentially unshare
> blocks from the snapshots while rewriting extents.
>
> BTW, you should be able to use duperemove with read-only snapshots
> if used in read-only-open mode. But I'd rather suggest to use bees
> instead: It works at whole-volume level, walking extents instead of
> files. That way it is much faster, doesn't reprocess already
> deduplicated extents, and it works with read-only snapshots.
>
> Until my patch it didn't like mixed nodatasum/datasum workloads.
> Currently this is fixed by just leaving nocow data alone as users
> probably set nocow for exactly the reason to not fragment extents
> and relocate blocks.

"Bad Btrfs Feature Interactions: btrfs read-only snapshots (never
tested, probably wouldn't work well)"

Unfortunately it seems that bees doesn't support read-only snapshots,
so it's a no-go.

P.S.
I tried duperemove with -A, but besides taking much longer it didn't
improve the situation.
Are you sure that the culprit is duperemove? AFAIK it shouldn't
unshare extents...

Niccolò
Re: Why do full balance and deduplication reduce available free space?
On Mon, 02 Oct 2017 12:02:16 +0200, Niccolò Belli wrote:

> This is how I perform balance:
> btrfs balance start --full-balance rootfs
>
> This is how I perform deduplication (duperemove is from git master):
> duperemove -drh --dedupe-options=noblock --hashfile=../rootfs.hash

Besides defragging removing the reflinks, duperemove will unshare your
snapshots when used in this way: if it sees duplicate blocks within
the subvolumes you give it, it will potentially unshare blocks from
the snapshots while rewriting extents.

BTW, you should be able to use duperemove with read-only snapshots if
used in read-only-open mode. But I'd rather suggest using bees
instead: it works at the whole-volume level, walking extents instead
of files. That way it is much faster, doesn't reprocess already
deduplicated extents, and it works with read-only snapshots.

Until my patch it didn't like mixed nodatasum/datasum workloads.
Currently this is fixed by simply leaving nocow data alone, since
users probably set nocow for exactly the reason of not fragmenting
extents and not relocating blocks.

-- 
Regards,
Kai

Replies to list-only preferred.
Re: Why do full balance and deduplication reduce available free space?
Maybe this is because of the autodefrag mount option? I thought it
wasn't supposed to unshare lots of extents...

Niccolò
RE: Why do full balance and deduplication reduce available free space?
> -----Original Message-----
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Niccolò Belli
> Sent: Monday, 2 October 2017 9:29 PM
> To: Hans van Kranenburg <hans.van.kranenb...@mendix.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: Why do full balance and deduplication reduce available
> free space?
>
> On 2017-10-02 12:16, Hans van Kranenburg wrote:
>> On 10/02/2017 12:02 PM, Niccolò Belli wrote:
>>> [...]
>>>
>>> Since I use lots of snapshots [...] I had to create a systemd timer
>>> to perform a full balance and deduplication each night.
>>
>> Can you explain what's your reasoning behind this 'because X it
>> needs Y'? I don't follow.
>
> Available free space is important to me, so I want snapshots to be
> deduplicated as well. Since I cannot deduplicate snapshots because
> they are read-only, the data must already be deduplicated before the
> snapshots are taken. I don't count the hourly snapshots, because in a
> day they will be gone anyway, but the daily snapshots will stay
> around much longer, so I want them to be deduplicated.

I use bees for deduplication and it will quite happily dedupe
read-only snapshots. You could always change them to RW while dedupe
is running, then change them back to RO.

Paul.
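Toggling a snapshot's read-only flag around a dedupe run, as suggested above, can be done with `btrfs property`; a minimal sketch, where the snapshot path is a hypothetical snapper location:

```shell
# Temporarily make a snapper snapshot writable so the dedupe ioctl can
# rewrite its extents, then restore the read-only flag afterwards.
SNAP=/rootfs/.snapshots/42/snapshot   # hypothetical snapshot path

btrfs property set -ts "$SNAP" ro false
duperemove -drh --hashfile=/var/cache/rootfs.hash /rootfs "$SNAP"
btrfs property set -ts "$SNAP" ro true
```

One caveat: flipping the ro flag on a snapshot that serves as a parent for incremental `btrfs send` can break that send chain, so this trick is best kept away from snapshots used for replication.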
Re: Why do full balance and deduplication reduce available free space?
On 2017-10-02 12:16, Hans van Kranenburg wrote:
> On 10/02/2017 12:02 PM, Niccolò Belli wrote:
>> [...]
>>
>> Since I use lots of snapshots [...] I had to create a systemd timer
>> to perform a full balance and deduplication each night.
>
> Can you explain what's your reasoning behind this 'because X it needs
> Y'? I don't follow.

Available free space is important to me, so I want snapshots to be
deduplicated as well. Since I cannot deduplicate snapshots because
they are read-only, the data must already be deduplicated before the
snapshots are taken. I don't count the hourly snapshots, because in a
day they will be gone anyway, but the daily snapshots will stay around
much longer, so I want them to be deduplicated.

Niccolò
Re: Why do full balance and deduplication reduce available free space?
On 10/02/2017 12:02 PM, Niccolò Belli wrote:
> [...]
>
> Since I use lots of snapshots [...] I had to create a systemd timer
> to perform a full balance and deduplication each night.

Can you explain what's your reasoning behind this 'because X it needs
Y'? I don't follow.

-- 
Hans van Kranenburg
Why do full balance and deduplication reduce available free space?
Hi,
I have several subvolumes mounted with compress-force=lzo and
autodefrag.

Since I use lots of snapshots (snapper keeps around 24 hourly
snapshots, 7 daily snapshots and 4 weekly snapshots) I had to create a
systemd timer to perform a full balance and deduplication each night.
In fact data needs to be already deduplicated when snapshots are
created, otherwise I have no other way to deduplicate snapshots.

This is how I perform balance:
btrfs balance start --full-balance rootfs

This is how I perform deduplication (duperemove is from git master):
duperemove -drh --dedupe-options=noblock --hashfile=../rootfs.hash

Looking at the logs I noticed something weird: available free space
actually decreases after balance or deduplication.

This is just before the timer starts:

Overall:
    Device size:                128.00GiB
    Device allocated:            49.03GiB
    Device unallocated:          78.97GiB
    Device missing:                 0.00B
    Used:                        43.78GiB
    Free (estimated):            82.97GiB  (min: 82.97GiB)
    Data ratio:                      1.00
    Metadata ratio:                  1.00
    Global reserve:             512.00MiB  (used: 0.00B)

Data,single: Size:44.00GiB, Used:40.00GiB
   /dev/sda5      44.00GiB

Metadata,single: Size:5.00GiB, Used:3.78GiB
   /dev/sda5       5.00GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda5      32.00MiB

Unallocated:
   /dev/sda5      78.97GiB

I also manually performed a full balance just before the timer
started:

Overall:
    Device size:                128.00GiB
    Device allocated:            46.03GiB
    Device unallocated:          81.97GiB
    Device missing:                 0.00B
    Used:                        43.78GiB
    Free (estimated):            82.96GiB  (min: 82.96GiB)
    Data ratio:                      1.00
    Metadata ratio:                  1.00
    Global reserve:             512.00MiB  (used: 0.00B)

Data,single: Size:41.00GiB, Used:40.01GiB
   /dev/sda5      41.00GiB

Metadata,single: Size:5.00GiB, Used:3.77GiB
   /dev/sda5       5.00GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda5      32.00MiB

Unallocated:
   /dev/sda5      81.97GiB

As you can see, even doing a full balance was enough to reduce the
available free space!
Then the timer started and it performed the deduplication:

Overall:
    Device size:                128.00GiB
    Device allocated:            46.03GiB
    Device unallocated:          81.97GiB
    Device missing:                 0.00B
    Used:                        43.87GiB
    Free (estimated):            82.94GiB  (min: 82.94GiB)
    Data ratio:                      1.00
    Metadata ratio:                  1.00
    Global reserve:             512.00MiB  (used: 176.00KiB)

Data,single: Size:41.00GiB, Used:40.03GiB
   /dev/sda5      41.00GiB

Metadata,single: Size:5.00GiB, Used:3.84GiB
   /dev/sda5       5.00GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda5      32.00MiB

Unallocated:
   /dev/sda5      81.97GiB

Once again it reduced the available free space! Then, after the
deduplication, the timer also performed a full balance:

Overall:
    Device size:                128.00GiB
    Device allocated:            46.03GiB
    Device unallocated:          81.97GiB
    Device missing:                 0.00B
    Used:                        44.00GiB
    Free (estimated):            82.93GiB  (min: 82.93GiB)
    Data ratio:                      1.00
    Metadata ratio:                  1.00
    Global reserve:             512.00MiB  (used: 0.00B)

Data,single: Size:41.00GiB, Used:40.04GiB
   /dev/sda5      41.00GiB

Metadata,single: Size:5.00GiB, Used:3.97GiB
   /dev/sda5       5.00GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda5      32.00MiB

Unallocated:
   /dev/sda5      81.97GiB

It further reduced the available free space! Balance and deduplication
actually reduced my available free space by 400MB!
400MB each night!

How is it possible? Should I avoid doing balances and deduplications
at all?

Thanks,
Niccolò
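For single-profile chunks, the "Free (estimated)" figure in the dumps above is just unallocated space plus the unused slack inside allocated data chunks; a small sketch reproducing all four readings from the numbers quoted above:

```python
# Reproduce btrfs's "Free (estimated)" for single-profile data:
#   estimated free = unallocated + (data chunk size - data used)
# Readings (GiB) taken from the four `btrfs fi usage` dumps above.
readings = [
    # (unallocated, data_size, data_used, meta_used)
    (78.97, 44.00, 40.00, 3.78),  # before the timer
    (81.97, 41.00, 40.01, 3.77),  # after the manual full balance
    (81.97, 41.00, 40.03, 3.84),  # after deduplication
    (81.97, 41.00, 40.04, 3.97),  # after the timer's full balance
]

for unalloc, size, used, meta in readings:
    free = unalloc + (size - used)
    print(f"free ~= {free:.2f} GiB  (data used {used:.2f},"
          f" metadata used {meta:.2f})")
```

This yields 82.97, 82.96, 82.94 and 82.93 GiB, matching the output above: the estimate shrinks because data "Used" (40.00 to 40.04 GiB) and metadata "Used" (3.78 to 3.97 GiB) grow, not because unallocated space changes.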