Re: Very slow balance / btrfs-transaction
February 3, 2017 11:26 PM, "Goldwyn Rodrigues" wrote:

> On 02/03/2017 04:13 PM, j...@capsec.org wrote:
>> Hi,
>>
>> I'm currently running a balance (without any filters) on a 4-drive raid1
>> filesystem. The array contains three 3TB drives and one 6TB drive; I'm
>> running the rebalance because the 6TB drive recently replaced a 2TB drive.
>>
>> I know that balance is not supposed to be a fast operation, but this one
>> has now been running for ~6 days and has managed to balance ~18% (754 out
>> of about 4250 chunks balanced (755 considered), 82% left) -- so I expect
>> it to take another ~4 weeks.
>>
>> That seems excessively slow for ~8TiB of data.
>>
>> Is this expected behavior? In case it's not: is there anything I can do
>> to help debug it?
>
> Do you have quotas enabled?
>
> --
> Goldwyn

Just dropping in -- I don't normally follow the list, but I found this thread while troubleshooting balance issues (kernel 4.11, converting raid1 to raid10). Disabling quotas had an immense impact on performance, and it would be helpful if notes about this could be added in *lots* of places.
With quotas on, each block group took 30 minutes to over an hour to convert, and the system was only usable for a few seconds per iteration:

Jun 28 00:42:41 overkill kernel: BTRFS info (device sdc2): relocating block group 7141922439168 flags data|raid1
Jun 28 01:32:13 overkill kernel: BTRFS info (device sdc2): relocating block group 7140848697344 flags data|raid1
Jun 28 02:48:59 overkill kernel: BTRFS info (device sdc2): relocating block group 7139774955520 flags data|raid1
Jun 28 03:50:12 overkill kernel: BTRFS info (device sdc2): relocating block group 7138701213696 flags data|raid1
Jun 28 05:20:58 overkill kernel: BTRFS info (device sdc2): relocating block group 7137627471872 flags data|raid1
Jun 28 06:49:00 overkill kernel: BTRFS info (device sdc2): relocating block group 7136553730048 flags data|raid1
Jun 28 07:23:58 overkill kernel: BTRFS info (device sdc2): relocating block group 7135479988224 flags data|raid1
Jun 28 08:03:39 overkill kernel: BTRFS info (device sdc2): relocating block group 7134406246400 flags data|raid1
Jun 28 08:40:11 overkill kernel: BTRFS info (device sdc2): relocating block group 712504576 flags data|raid1
Jun 28 09:44:46 overkill kernel: BTRFS info (device sdc2): relocating block group 7132258762752 flags data|raid1
Jun 28 10:24:17 overkill kernel: BTRFS info (device sdc2): relocating block group 7131185020928 flags data|raid1
Jun 28 11:35:39 overkill kernel: BTRFS info (device sdc2): relocating block group 7130111279104 flags data|raid1
Jun 28 12:53:56 overkill kernel: BTRFS info (device sdc2): relocating block group 7129037537280 flags data|raid1
Jun 28 13:37:00 overkill kernel: BTRFS info (device sdc2): relocating block group 7127963795456 flags data|raid1
Jun 28 14:32:19 overkill kernel: BTRFS info (device sdc2): relocating block group 7126890053632 flags data|raid1
Jun 28 15:45:19 overkill kernel: BTRFS info (device sdc2): relocating block group 7125816311808 flags data|raid1
Jun 28 16:30:01 overkill kernel: BTRFS info (device sdc2): relocating block group 7124742569984 flags data|raid1
Jun 28 17:26:57 overkill kernel: BTRFS info (device sdc2): relocating block group 7123668828160 flags data|raid1
Jun 28 18:15:01 overkill kernel: BTRFS info (device sdc2): relocating block group 7122595086336 flags data|raid1
Jun 28 18:48:05 overkill kernel: BTRFS info (device sdc2): relocating block group 7121521344512 flags data|raid1
Jun 28 19:25:59 overkill kernel: BTRFS info (device sdc2): relocating block group 7120447602688 flags data|raid1
Jun 28 19:55:46 overkill kernel: BTRFS info (device sdc2): relocating block group 7119373860864 flags data|raid1
Jun 28 20:30:41 overkill kernel: BTRFS info (device sdc2): relocating block group 7118300119040 flags data|raid1
Jun 28 21:28:43 overkill kernel: BTRFS info (device sdc2): relocating block group 7117226377216 flags data|raid1
Jun 28 22:55:34 overkill kernel: BTRFS info (device sdc2): relocating block group 7114005151744 flags data|raid1
Jun 28 23:19:06 overkill kernel: BTRFS info (device sdc2): relocating block group 7110783926272 flags data|raid1

With quotas off, it takes ~20 seconds to convert each block group and the system is completely usable:

Jul 01 09:56:42 overkill kernel: BTRFS info (device sde): relocating block group 7085014122496 flags data|raid1
Jul 01 09:56:59 overkill kernel: BTRFS info (device sde): relocating block group 7083940380672 flags data|raid1
Jul 01 09:57:18 overkill kernel: BTRFS info (device sde): relocating block group 7082866638848 flags data|raid1
Jul 01 09:57:39 overkill kernel: BTRFS info (device sde): relocating block group 7081792897024 flags data|raid1
Jul 01 09:58:01 overkill kernel: BTRFS info (device
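For anyone who lands here with the same symptom, the relevant knobs are the standard btrfs-progs quota commands. This is a sketch of the workflow described in the thread; `/mnt/data` is a placeholder mount point, the commands need root, and on current kernels re-enabling quotas kicks off a rescan that rebuilds the qgroup numbers:

```shell
# Check whether quotas are in use: qgroup listing only works when they are enabled
btrfs qgroup show /mnt/data

# Disable quotas before a large balance...
btrfs quota disable /mnt/data

# ...run the balance (optionally with filters)...
btrfs balance start /mnt/data

# ...then re-enable quotas and wait for the rescan to make the numbers consistent again
btrfs quota enable /mnt/data
btrfs quota rescan -w /mnt/data
```

Note that the rescan itself has to walk all extents, so on a large filesystem it takes a while; it just does so without stalling the balance.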
Re: Very slow balance / btrfs-transaction
At 02/08/2017 09:56 PM, Filipe Manana wrote:
> On Wed, Feb 8, 2017 at 12:39 AM, Qu Wenruo wrote:
>> At 02/07/2017 11:55 PM, Filipe Manana wrote:
>>> On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo wrote:
>>>> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote:
>>>>> Hi Qu,
>>>>>
>>>>> On 02/05/2017 07:45 PM, Qu Wenruo wrote:
>>>>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
>>>>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>>>>>>>
>>>>>>> Quota support was indeed active -- and it warned me that the qgroup data was inconsistent.
>>>>>>>
>>>>>>> Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now! From a quick glance at iostat I would guess it's at least a factor 100 faster.
>>>>>>>
>>>>>>> Should quota support generally be disabled during balances? Or did I somehow push my fs into a weird state where it triggered a slow path?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> j
>>>>>>
>>>>>> Would you please provide the kernel version?
>>>>>>
>>>>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup byte leaking, but also hugely slows down the balance process:
>>>>>>
>>>>>> commit 62b99540a1d91e46422f0e04de50fc723812c421
>>>>>> Author: Qu Wenruo
>>>>>> Date: Mon Aug 15 10:36:51 2016 +0800
>>>>>>
>>>>>>     btrfs: relocation: Fix leaking qgroups numbers on data extents
>>>>>>
>>>>>> Sorry for that.
>>>>>>
>>>>>> And in v4.10, a better method is applied to fix the byte leaking problem, and it should be a little faster than the previous one:
>>>>>>
>>>>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
>>>>>> Author: Qu Wenruo
>>>>>> Date: Tue Oct 18 09:31:29 2016 +0800
>>>>>>
>>>>>>     btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>>>>>>
>>>>>> However, using balance with qgroups is still slower than balance without them; the root fix needs us to rework the current backref iteration.
>>>>>
>>>>> This patch has made btrfs balance performance worse. The balance task has become more CPU intensive than before and takes longer to complete, besides hogging resources. While correctness is important, we need to figure out how this can be made more efficient.
>>>>
>>>> The cause is already known.
>>>>
>>>> It's find_parent_nodes() which takes most of the time to find all referencers of an extent.
>>>>
>>>> And it's also the cause of the FIEMAP softlockup (fixed in a recent release by quitting early).
>>>>
>>>> The biggest problem is that the current find_parent_nodes() uses a list to iterate, which is quite slow, especially since it's done in a loop. In the real world find_parent_nodes() is about O(n^3). We can either improve find_parent_nodes() by using an rb_tree, or introduce some cache for find_parent_nodes().
>>>
>>> Even if anyone is able to reduce that function's complexity from O(n^3) down to, let's say, O(n^2) or O(n log n), the current implementation of qgroups will always be a problem. The real problem is that this more recent rework of qgroups does all this accounting inside the critical section of a transaction commit - blocking any other tasks that want to start a new transaction or attempt to join the current transaction. Not to mention that on systems with small amounts of memory (2GB or 4GB, from what I've seen in user reports) we also OOM due to the allocation of a struct btrfs_qgroup_extent_record per delayed data reference head, which are used for that accounting phase in the critical section of a transaction commit.
>>>
>>> Let's face it and be realistic: even if someone manages to make find_parent_nodes() much, much better - O(n), for example - it will always be a problem for the reasons mentioned before. Many extents touched per transaction and many subvolumes/snapshots will always expose the root problem - doing the accounting in the transaction commit critical section.
>>
>> You must accept the fact that we must call find_parent_nodes() at least twice to get the correct owner modification for each touched extent, or the qgroup numbers will never be correct: once for old_roots, by searching the commit root, and once for new_roots, by searching the current root.
>>
>> You can call find_parent_nodes() as many times as you like, but that's just wasting your CPU time. Only the final find_parent_nodes() will determine new_roots for that extent, and there is no better timing than commit_transaction().
>
> You're missing my point.
>
> My point is not about needing to call find_parent_nodes(), nor how many times to call it, or whether it's needed or not. My point is about doing expensive things inside the critical section of a transaction commit, which leads not only to low performance but to a system becoming unresponsive and suffering very high latency - and this is not theory or speculation; there are upstream reports about this, as well as several in SUSE's bugzilla, all caused when qgroups are enabled on 4.2+ kernels (when the last major qgroup changes landed).
>
> Judging from that code, and from your replies to this and other threads, it seems you didn't understand the consequences of doing all that accounting inside the critical section of a transaction commit.

NO, I know what you're talking about. Or I won't send the patch to
Re: Very slow balance / btrfs-transaction
On Wed, Feb 8, 2017 at 12:39 AM, Qu Wenruo wrote:
> At 02/07/2017 11:55 PM, Filipe Manana wrote:
>> On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo wrote:
>>> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote:
>>>> Hi Qu,
>>>>
>>>> On 02/05/2017 07:45 PM, Qu Wenruo wrote:
>>>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
>>>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>>>>>>
>>>>>> Quota support was indeed active -- and it warned me that the qgroup data was inconsistent.
>>>>>>
>>>>>> Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now! From a quick glance at iostat I would guess it's at least a factor 100 faster.
>>>>>>
>>>>>> Should quota support generally be disabled during balances? Or did I somehow push my fs into a weird state where it triggered a slow path?
>>>>>>
>>>>>> Thanks!
>>>>>> j
>>>>>
>>>>> Would you please provide the kernel version?
>>>>>
>>>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup byte leaking, but also hugely slows down the balance process:
>>>>>
>>>>> commit 62b99540a1d91e46422f0e04de50fc723812c421
>>>>> Author: Qu Wenruo
>>>>> Date: Mon Aug 15 10:36:51 2016 +0800
>>>>>
>>>>>     btrfs: relocation: Fix leaking qgroups numbers on data extents
>>>>>
>>>>> Sorry for that.
>>>>>
>>>>> And in v4.10, a better method is applied to fix the byte leaking problem, and it should be a little faster than the previous one:
>>>>>
>>>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
>>>>> Author: Qu Wenruo
>>>>> Date: Tue Oct 18 09:31:29 2016 +0800
>>>>>
>>>>>     btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>>>>>
>>>>> However, using balance with qgroups is still slower than balance without them; the root fix needs us to rework the current backref iteration.
>>>>
>>>> This patch has made btrfs balance performance worse. The balance task has become more CPU intensive than before and takes longer to complete, besides hogging resources. While correctness is important, we need to figure out how this can be made more efficient.
>>>
>>> The cause is already known.
>>>
>>> It's find_parent_nodes() which takes most of the time to find all referencers of an extent.
>>>
>>> And it's also the cause of the FIEMAP softlockup (fixed in a recent release by quitting early).
>>>
>>> The biggest problem is that the current find_parent_nodes() uses a list to iterate, which is quite slow, especially since it's done in a loop. In the real world find_parent_nodes() is about O(n^3). We can either improve find_parent_nodes() by using an rb_tree, or introduce some cache for find_parent_nodes().
>>
>> Even if anyone is able to reduce that function's complexity from O(n^3) down to, let's say, O(n^2) or O(n log n), the current implementation of qgroups will always be a problem. The real problem is that this more recent rework of qgroups does all this accounting inside the critical section of a transaction - blocking any other tasks that want to start a new transaction or attempt to join the current transaction. Not to mention that on systems with small amounts of memory (2GB or 4GB, from what I've seen in user reports) we also OOM due to the allocation of a struct btrfs_qgroup_extent_record per delayed data reference head, which are used for that accounting phase in the critical section of a transaction commit.
>>
>> Let's face it and be realistic: even if someone manages to make find_parent_nodes() much, much better - O(n), for example - it will always be a problem for the reasons mentioned before. Many extents touched per transaction and many subvolumes/snapshots will always expose the root problem - doing the accounting in the transaction commit critical section.
>
> You must accept the fact that we must call find_parent_nodes() at least twice to get the correct owner modification for each touched extent, or the qgroup numbers will never be correct: once for old_roots, by searching the commit root, and once for new_roots, by searching the current root.
>
> You can call find_parent_nodes() as many times as you like, but that's just wasting your CPU time. Only the final find_parent_nodes() will determine new_roots for that extent, and there is no better timing than commit_transaction().

You're missing my point.

My point is not about needing to call find_parent_nodes(), nor how many times to call it, or whether it's needed or not. My point is about doing expensive things inside the critical section of a transaction commit, which leads not only to low performance but to a system becoming unresponsive and
Re: Very slow balance / btrfs-transaction
At 02/07/2017 11:55 PM, Filipe Manana wrote:
> On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo wrote:
>> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote:
>>> Hi Qu,
>>>
>>> On 02/05/2017 07:45 PM, Qu Wenruo wrote:
>>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
>>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>>>>>
>>>>> Quota support was indeed active -- and it warned me that the qgroup data was inconsistent.
>>>>>
>>>>> Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now! From a quick glance at iostat I would guess it's at least a factor 100 faster.
>>>>>
>>>>> Should quota support generally be disabled during balances? Or did I somehow push my fs into a weird state where it triggered a slow path?
>>>>>
>>>>> Thanks!
>>>>> j
>>>>
>>>> Would you please provide the kernel version?
>>>>
>>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup byte leaking, but also hugely slows down the balance process:
>>>>
>>>> commit 62b99540a1d91e46422f0e04de50fc723812c421
>>>> Author: Qu Wenruo
>>>> Date: Mon Aug 15 10:36:51 2016 +0800
>>>>
>>>>     btrfs: relocation: Fix leaking qgroups numbers on data extents
>>>>
>>>> Sorry for that.
>>>>
>>>> And in v4.10, a better method is applied to fix the byte leaking problem, and it should be a little faster than the previous one:
>>>>
>>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
>>>> Author: Qu Wenruo
>>>> Date: Tue Oct 18 09:31:29 2016 +0800
>>>>
>>>>     btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>>>>
>>>> However, using balance with qgroups is still slower than balance without them; the root fix needs us to rework the current backref iteration.
>>>
>>> This patch has made btrfs balance performance worse. The balance task has become more CPU intensive than before and takes longer to complete, besides hogging resources. While correctness is important, we need to figure out how this can be made more efficient.
>>
>> The cause is already known.
>>
>> It's find_parent_nodes() which takes most of the time to find all referencers of an extent.
>>
>> And it's also the cause of the FIEMAP softlockup (fixed in a recent release by quitting early).
>>
>> The biggest problem is that the current find_parent_nodes() uses a list to iterate, which is quite slow, especially since it's done in a loop. In the real world find_parent_nodes() is about O(n^3). We can either improve find_parent_nodes() by using an rb_tree, or introduce some cache for find_parent_nodes().
>
> Even if anyone is able to reduce that function's complexity from O(n^3) down to, let's say, O(n^2) or O(n log n), the current implementation of qgroups will always be a problem. The real problem is that this more recent rework of qgroups does all this accounting inside the critical section of a transaction - blocking any other tasks that want to start a new transaction or attempt to join the current transaction. Not to mention that on systems with small amounts of memory (2GB or 4GB, from what I've seen in user reports) we also OOM due to the allocation of a struct btrfs_qgroup_extent_record per delayed data reference head, which are used for that accounting phase in the critical section of a transaction commit.
>
> Let's face it and be realistic: even if someone manages to make find_parent_nodes() much, much better - O(n), for example - it will always be a problem for the reasons mentioned before. Many extents touched per transaction and many subvolumes/snapshots will always expose the root problem - doing the accounting in the transaction commit critical section.

You must accept the fact that we must call find_parent_nodes() at least twice to get the correct owner modification for each touched extent, or the qgroup numbers will never be correct: once for old_roots, by searching the commit root, and once for new_roots, by searching the current root.

You can call find_parent_nodes() as many times as you like, but that's just wasting your CPU time. Only the final find_parent_nodes() will determine new_roots for that extent, and there is no better timing than commit_transaction().

Or you can waste more time calling find_parent_nodes() every time you touch an extent, saving one find_parent_nodes() call in commit_transaction() at the cost of more calls elsewhere. Is that what you want?

I can move the find_parent_nodes() call for old_roots out of commit_transaction(), but that will only cut about 50% of the time spent in commit_transaction(). Compared to the O(n^3) find_parent_nodes() itself, that isn't even the determining factor.

Thanks,
Qu

>> IIRC SUSE guys (maybe Jeff?) are working on it with the first method, but I didn't hear anything about it recently.
>>
>> Thanks,
>> Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
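Qu's old_roots/new_roots argument can be sketched with plain sets: accounting needs the set of roots referencing an extent before the transaction (found via the commit root) and after it (via the current root), and the difference between the two drives the per-qgroup updates. This is a deliberately simplified, hypothetical model — real qgroup accounting also distinguishes referenced vs exclusive bytes and handles qgroup hierarchies:

```python
def qgroup_delta(old_roots, new_roots, nbytes):
    """Per-root change in referenced bytes for one extent of size nbytes."""
    added = new_roots - old_roots      # roots that gained a reference
    removed = old_roots - new_roots    # roots that dropped their reference
    return {r: +nbytes for r in added} | {r: -nbytes for r in removed}

# Extent of 16 KiB: root 257 keeps it, root 258 drops it, root 259 picks it up.
delta = qgroup_delta({257, 258}, {257, 259}, 16384)
assert delta == {259: 16384, 258: -16384}
```

The point of "no better timing than commit_transaction()" is that new_roots is only final once no more references can change in this transaction, which is exactly what the commit guarantees.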
Re: Very slow balance / btrfs-transaction
On 2017-02-07 14:47, Kai Krakow wrote:
> Am Mon, 6 Feb 2017 08:19:37 -0500 schrieb "Austin S. Hemmelgarn":
>>> MDRAID uses stripe selection based on latency and other measurements
>>> (like head position). It would be nice if btrfs implemented similar
>>> functionality. This would also be helpful for selecting a disk if
>>> there're more disks than stripesets (for example, I have 3 disks in my
>>> btrfs array). This could write new blocks to the most idle disk always.
>>> I think this wasn't covered by the above mentioned patch. Currently,
>>> selection is based only on the disk with most free space.
>>
>> You're confusing read selection and write selection. MDADM and DM-RAID
>> both use a load-balancing read selection algorithm that takes latency
>> and other factors into account. However, they use a round-robin write
>> selection algorithm that only cares about the position of the block in
>> the virtual device modulo the number of physical devices.
>
> Thanks for clearing that point.
>
>> As an example, say you have a 3-disk RAID10 array set up using MDADM
>> (this is functionally the same as a 3-disk raid1 mode BTRFS
>> filesystem). Every third block starting from block 0 will be on disks 1
>> and 2, every third block starting from block 1 will be on disks 3 and
>> 1, and every third block starting from block 2 will be on disks 2 and
>> 3. No latency measurements are taken; literally nothing is factored in
>> except the block's position in the virtual device.
>
> I didn't know MDADM could do RAID10 on an odd number of disks... Nice.
> I'll keep that in mind. :-)

It's one of those neat features that I stumbled across by accident a while back that not many people know about. It's kind of ironic when you think about it too, since the MD RAID10 profile with only 2 replicas is actually a more accurate comparison for the BTRFS raid1 profile than the MD RAID1 profile. FWIW, it can (somewhat paradoxically) sometimes get better read and write performance than MD RAID0 across the same number of disks.
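Austin's 3-disk RAID10 placement follows from MD's "near" layout with two copies: the replicas of block k simply occupy linear positions 2k and 2k+1, laid out row-major across the disks. A small sketch of that placement rule (my own illustration of the layout, not MD code; disks are numbered from 0 here rather than 1):

```python
def raid10_near_mirrors(block, ndisks, copies=2):
    """Return the disks holding each replica of a block in MD's 'near' layout."""
    # Replicas occupy consecutive slots in the row-major device layout.
    return [(copies * block + c) % ndisks for c in range(copies)]

# Matches the 3-disk example from the thread (0-based disk numbering):
assert raid10_near_mirrors(0, 3) == [0, 1]   # "disks 1 and 2"
assert raid10_near_mirrors(1, 3) == [2, 0]   # "disks 3 and 1"
assert raid10_near_mirrors(2, 3) == [1, 2]   # "disks 2 and 3"
assert raid10_near_mirrors(3, 3) == [0, 1]   # the pattern repeats every 3 blocks
```

Nothing here depends on device load or latency, which is exactly the "literally nothing is factored in except the block's position" point.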
Re: Very slow balance / btrfs-transaction
Am Mon, 6 Feb 2017 08:19:37 -0500 schrieb "Austin S. Hemmelgarn":
>> MDRAID uses stripe selection based on latency and other measurements
>> (like head position). It would be nice if btrfs implemented similar
>> functionality. This would also be helpful for selecting a disk if
>> there're more disks than stripesets (for example, I have 3 disks in
>> my btrfs array). This could write new blocks to the most idle disk
>> always. I think this wasn't covered by the above mentioned patch.
>> Currently, selection is based only on the disk with most free space.
>
> You're confusing read selection and write selection. MDADM and
> DM-RAID both use a load-balancing read selection algorithm that takes
> latency and other factors into account. However, they use a
> round-robin write selection algorithm that only cares about the
> position of the block in the virtual device modulo the number of
> physical devices.

Thanks for clearing that point.

> As an example, say you have a 3-disk RAID10 array set up using MDADM
> (this is functionally the same as a 3-disk raid1 mode BTRFS
> filesystem). Every third block starting from block 0 will be on disks
> 1 and 2, every third block starting from block 1 will be on disks 3
> and 1, and every third block starting from block 2 will be on disks 2
> and 3. No latency measurements are taken; literally nothing is
> factored in except the block's position in the virtual device.

I didn't know MDADM could do RAID10 on an odd number of disks... Nice.
I'll keep that in mind. :-)

--
Regards,
Kai

Replies to list-only preferred.
Re: Very slow balance / btrfs-transaction
On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo wrote:
> At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote:
>> Hi Qu,
>>
>> On 02/05/2017 07:45 PM, Qu Wenruo wrote:
>>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
>>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>>>>
>>>> Quota support was indeed active -- and it warned me that the qgroup
>>>> data was inconsistent.
>>>>
>>>> Disabling quotas had an immediate impact on balance throughput --
>>>> it's *much* faster now! From a quick glance at iostat I would guess
>>>> it's at least a factor 100 faster.
>>>>
>>>> Should quota support generally be disabled during balances? Or did I
>>>> somehow push my fs into a weird state where it triggered a slow path?
>>>>
>>>> Thanks!
>>>> j
>>>
>>> Would you please provide the kernel version?
>>>
>>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely
>>> fix qgroup byte leaking, but also hugely slows down the balance
>>> process:
>>>
>>> commit 62b99540a1d91e46422f0e04de50fc723812c421
>>> Author: Qu Wenruo
>>> Date: Mon Aug 15 10:36:51 2016 +0800
>>>
>>>     btrfs: relocation: Fix leaking qgroups numbers on data extents
>>>
>>> Sorry for that.
>>>
>>> And in v4.10, a better method is applied to fix the byte leaking
>>> problem, and it should be a little faster than the previous one:
>>>
>>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
>>> Author: Qu Wenruo
>>> Date: Tue Oct 18 09:31:29 2016 +0800
>>>
>>>     btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>>>
>>> However, using balance with qgroups is still slower than balance
>>> without them; the root fix needs us to rework the current backref
>>> iteration.
>>
>> This patch has made btrfs balance performance worse. The balance task
>> has become more CPU intensive than before and takes longer to complete,
>> besides hogging resources. While correctness is important, we need to
>> figure out how this can be made more efficient.
>
> The cause is already known.
>
> It's find_parent_nodes() which takes most of the time to find all
> referencers of an extent.
>
> And it's also the cause of the FIEMAP softlockup (fixed in a recent
> release by quitting early).
>
> The biggest problem is that the current find_parent_nodes() uses a list
> to iterate, which is quite slow, especially since it's done in a loop.
> In the real world find_parent_nodes() is about O(n^3).
> We can either improve find_parent_nodes() by using an rb_tree, or
> introduce some cache for find_parent_nodes().

Even if anyone is able to reduce that function's complexity from O(n^3) down to, let's say, O(n^2) or O(n log n), the current implementation of qgroups will always be a problem. The real problem is that this more recent rework of qgroups does all this accounting inside the critical section of a transaction - blocking any other tasks that want to start a new transaction or attempt to join the current transaction. Not to mention that on systems with small amounts of memory (2GB or 4GB, from what I've seen in user reports) we also OOM due to the allocation of a struct btrfs_qgroup_extent_record per delayed data reference head, which are used for that accounting phase in the critical section of a transaction commit.

Let's face it and be realistic: even if someone manages to make find_parent_nodes() much, much better - O(n), for example - it will always be a problem for the reasons mentioned before. Many extents touched per transaction and many subvolumes/snapshots will always expose the root problem - doing the accounting in the transaction commit critical section.

> IIRC SUSE guys (maybe Jeff?) are working on it with the first method,
> but I didn't hear anything about it recently.
>
> Thanks,
> Qu

--
Filipe David Manana,

"People will forget what you said,
 people will forget what you did,
 but people will never forget how you made them feel."
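The critical-section objection is independent of the lookup's complexity, and a toy model makes it concrete: whatever work runs while holding the commit lock serializes every other would-be committer, while work hoisted out of the lock can run concurrently. A hypothetical Python sketch, emphatically not btrfs code — `account()` stands in for the expensive backref walking:

```python
import threading

commit_lock = threading.Lock()
results = []

def account(extents):
    # Stand-in for the expensive qgroup accounting work.
    return sum(extents)

def commit_accounting_inside(extents):
    with commit_lock:                 # expensive work blocks all other committers
        results.append(account(extents))

def commit_accounting_outside(extents):
    total = account(extents)          # expensive work done without the lock held
    with commit_lock:                 # critical section is now just a cheap update
        results.append(total)

# Both placements yield the same totals; only the blocking behavior differs.
for worker in (commit_accounting_inside, commit_accounting_outside):
    results.clear()
    threads = [threading.Thread(target=worker, args=(range(1000),)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert results == [499500] * 4
```

The correctness caveat from the rest of the thread applies, of course: the final new_roots lookup cannot be hoisted, because its answer isn't stable until the commit.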
Re: Very slow balance / btrfs-transaction
At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote:
> Hi Qu,
>
> On 02/05/2017 07:45 PM, Qu Wenruo wrote:
>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>>>
>>> Quota support was indeed active -- and it warned me that the qgroup
>>> data was inconsistent.
>>>
>>> Disabling quotas had an immediate impact on balance throughput -- it's
>>> *much* faster now! From a quick glance at iostat I would guess it's at
>>> least a factor 100 faster.
>>>
>>> Should quota support generally be disabled during balances? Or did I
>>> somehow push my fs into a weird state where it triggered a slow path?
>>>
>>> Thanks!
>>> j
>>
>> Would you please provide the kernel version?
>>
>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely
>> fix qgroup byte leaking, but also hugely slows down the balance process:
>>
>> commit 62b99540a1d91e46422f0e04de50fc723812c421
>> Author: Qu Wenruo
>> Date: Mon Aug 15 10:36:51 2016 +0800
>>
>>     btrfs: relocation: Fix leaking qgroups numbers on data extents
>>
>> Sorry for that.
>>
>> And in v4.10, a better method is applied to fix the byte leaking
>> problem, and it should be a little faster than the previous one:
>>
>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
>> Author: Qu Wenruo
>> Date: Tue Oct 18 09:31:29 2016 +0800
>>
>>     btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>>
>> However, using balance with qgroups is still slower than balance
>> without them; the root fix needs us to rework the current backref
>> iteration.
>
> This patch has made btrfs balance performance worse. The balance task
> has become more CPU intensive than before and takes longer to complete,
> besides hogging resources. While correctness is important, we need to
> figure out how this can be made more efficient.

The cause is already known.

It's find_parent_nodes() which takes most of the time to find all referencers of an extent.

And it's also the cause of the FIEMAP softlockup (fixed in a recent release by quitting early).

The biggest problem is that the current find_parent_nodes() uses a list to iterate, which is quite slow, especially since it's done in a loop. In the real world find_parent_nodes() is about O(n^3). We can either improve find_parent_nodes() by using an rb_tree, or introduce some cache for find_parent_nodes().

IIRC SUSE guys (maybe Jeff?) are working on it with the first method, but I didn't hear anything about it recently.

Thanks,
Qu
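The cost of list iteration inside a loop is easy to see in miniature: membership tests against a plain list are a linear scan each, so a loop of them multiplies the cost, while an indexed structure (a hash set here; an rb_tree would serve the same role in kernel code) makes each test cheap. A hypothetical Python sketch of the data-structure point only — not the kernel's backref code:

```python
def count_hits_list(refs, queries):
    # Linear scan per query: O(len(queries) * len(refs)) overall.
    return sum(1 for q in queries if q in refs)        # refs is a plain list

def count_hits_set(refs, queries):
    # One O(len(refs)) pass to build the set, then O(1) average per query.
    refset = set(refs)
    return sum(1 for q in queries if q in refset)

refs = list(range(0, 10000, 2))    # even "backref" ids
queries = [1, 2, 9998, 9999]
assert count_hits_list(refs, queries) == count_hits_set(refs, queries) == 2
```

Same answers, very different scaling — which is the whole argument for switching the iteration structure or adding a cache, even before the separate question of where the work runs.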
Re: Very slow balance / btrfs-transaction
Hi Qu,

On 02/05/2017 07:45 PM, Qu Wenruo wrote:
> At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>>
>> Quota support was indeed active -- and it warned me that the qgroup
>> data was inconsistent.
>>
>> Disabling quotas had an immediate impact on balance throughput -- it's
>> *much* faster now! From a quick glance at iostat I would guess it's at
>> least a factor 100 faster.
>>
>> Should quota support generally be disabled during balances? Or did I
>> somehow push my fs into a weird state where it triggered a slow path?
>>
>> Thanks!
>> j
>
> Would you please provide the kernel version?
>
> v4.9 introduced a bad fix for qgroup balance, which doesn't completely
> fix qgroup byte leaking, but also hugely slows down the balance process:
>
> commit 62b99540a1d91e46422f0e04de50fc723812c421
> Author: Qu Wenruo
> Date: Mon Aug 15 10:36:51 2016 +0800
>
>     btrfs: relocation: Fix leaking qgroups numbers on data extents
>
> Sorry for that.
>
> And in v4.10, a better method is applied to fix the byte leaking
> problem, and it should be a little faster than the previous one:
>
> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
> Author: Qu Wenruo
> Date: Tue Oct 18 09:31:29 2016 +0800
>
>     btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>
> However, using balance with qgroups is still slower than balance without
> them; the root fix needs us to rework the current backref iteration.

This patch has made btrfs balance performance worse. The balance task has become more CPU intensive than before and takes longer to complete, besides hogging resources. While correctness is important, we need to figure out how this can be made more efficient.

--
Goldwyn
Re: Very slow balance / btrfs-transaction
On 2017-02-04 16:10, Kai Krakow wrote: Am Sat, 04 Feb 2017 20:50:03 + schrieb "Jorg Bornschein": February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote: Yes, please check if disabling quotas makes a difference in execution time of btrfs balance. Just FYI: With quotas disabled it took ~20h to finish the balance instead of the projected >30 days. Therefore, in my case, there was a speedup of factor ~35. and thanks for the quick reply! (and for btrfs general!) BTW: I'm wondering how much sense it makes to activate the underlying bcache for my raid1 fs again. I guess btrfs chooses randomly (or based predicted of disk latency?) which copy of a given extend to load? As far as I know, it uses PID modulo only currently, no round-robin, no random value. There are no performance optimizations going into btrfs yet because there're still a lot of ongoing feature implementations. I think there were patches to include a rotator value in the stripe selection. They don't apply to the current kernel. I tried it once and didn't see any subjective difference for normal desktop workloads. But that's probably because I use RAID1 for metadata only. I had tested similar patches myself using raid1 for everything, and saw near zero improvement unless I explicitly tried to create a worst-case performance situation. The reality is that the current algorithm is actually remarkably close to being optimal for most use cases while using an insanely small amount of processing power and memory compared to an optimal algorithm (and a truly optimal algorithm is in fact functionally impossible in almost all cases because it would require predicting the future). MDRAID uses stripe selection based on latency and other measurements (like head position). It would be nice if btrfs implemented similar functionality. This would also be helpful for selecting a disk if there're more disks than stripesets (for example, I have 3 disks in my btrfs array). This could write new blocks to the most idle disk always. 
> I think this wasn't covered by the above mentioned patch. Currently, selection is based only on the disk with the most free space.

You're confusing read selection and write selection. MDADM and DM-RAID both use a load-balancing read selection algorithm that takes latency and other factors into account. However, they use a round-robin write selection algorithm that cares about nothing except the position of the block in the virtual device modulo the number of physical devices.

As an example, say you have a 3-disk RAID10 array set up using MDADM (this is functionally the same as a 3-disk raid1-mode BTRFS filesystem). Every third block starting from block 0 will be on disks 1 and 2, every third block starting from block 1 will be on disks 3 and 1, and every third block starting from block 2 will be on disks 2 and 3. No latency measurements are taken; literally nothing is factored in except the block's position in the virtual device.

Now, that said, BTRFS does behave differently under the same circumstances, but this is because the striping is different for BTRFS: it happens at the chunk level instead of the block level. If we look at an example using the same 3 devices as the MDADM example, and for simplicity assume that you end up allocating alternating data and metadata chunks, things might look a bit like this:

* System chunk: devices 1 and 2
* Metadata chunk 0: devices 3 and 1
* Data chunk 0: devices 2 and 3
* Metadata chunk 1: devices 1 and 2
* Data chunk 1: devices 1 and 2

Overall, there is technically a pattern, but it has a very long repetition period. This is still, however, a near-optimal allocation pattern given the constraints. It also gives (just like the MDADM and DM-RAID method) 100% deterministic behavior; the only difference is that it depends on a slightly different factor.
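The 3-disk MDADM RAID10 block placement described above can be modelled in a few lines. This is a sketch of the "near" layout with 2 copies (disk numbers 1-based to match the text), not MDADM's actual code:

```python
def raid10_near_copies(block, ndisks=3, ncopies=2):
    """Return the disks holding the copies of `block` in a RAID10
    'near' layout: copies are laid out consecutively, round-robin
    across disks. Position in the array is all that matters."""
    start = block * ncopies
    return tuple((start + i) % ndisks + 1 for i in range(ncopies))

# Matches the pattern in the text:
assert raid10_near_copies(0) == (1, 2)   # every 3rd block from 0 -> disks 1 and 2
assert raid10_near_copies(1) == (3, 1)   # every 3rd block from 1 -> disks 3 and 1
assert raid10_near_copies(2) == (2, 3)   # every 3rd block from 2 -> disks 2 and 3
```

Nothing here consults device state; the mapping is a pure function of the block number, which is exactly the determinism being described.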
Changing this to select the most idle disk, as you suggest, would remove that determinism, increase the likelihood of sub-optimal layouts in terms of space usage, increase the number of cases where you could get ENOSPC, and provide near zero net performance benefit except under heavy load. IOW, it would provide a pretty negative net benefit.

What actually needs to happen to improve write performance is that BTRFS needs to quit serializing writes when writing chunks across multiple devices. In the case of a raid1 setup, it writes first to one device, then to the other, alternating back and forth as it updates each extent. This, combined with the COW behavior causing write amplification, is what makes write performance so much worse for BTRFS than for MDADM or DM-RAID. It's not that we have bad device selection for writes; it's that we don't even try to do any kind of practical parallelization, despite it being an embarrassingly parallel task (and yes, that seriously is what something that's trivial to parallelize is called in scientific papers...).
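For contrast with the "most idle disk" proposal, here is a toy model of the current "two devices with the most unallocated space" raid1 chunk allocation. It also shows why, in the original poster's situation, a freshly added 6TB drive receives one copy of nearly every new chunk until free space evens out. The device names and sizes are illustrative only, not real allocator code:

```python
def allocate_raid1_chunks(free_gib, nchunks, chunk_gib=1):
    """Greedy raid1 allocation: each chunk mirrors onto the two
    devices that currently have the most unallocated space."""
    free = dict(free_gib)  # device -> free GiB
    placements = []
    for _ in range(nchunks):
        # Pick the two devices with the most free space.
        a, b = sorted(free, key=free.get, reverse=True)[:2]
        free[a] -= chunk_gib
        free[b] -= chunk_gib
        placements.append((a, b))
    return placements

# Three nearly-full 3TB drives plus one freshly added 6TB drive:
layout = allocate_raid1_chunks({"d1": 100, "d2": 100, "d3": 100, "d6": 3100}, 50)
# The big drive takes one copy of every chunk until it catches up:
assert all("d6" in pair for pair in layout)
```

The selection is fully deterministic given the free-space state, which is the property the paragraph above argues is worth keeping.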
Re: Very slow balance / btrfs-transaction
At 02/06/2017 05:14 PM, Jorg Bornschein wrote:
> February 6, 2017 1:45 AM, "Qu Wenruo" wrote:
>
>> Would you please provide the kernel version?
>>
>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup bytes leaking, but also hugely slows down the balance process.
>
> I'm a bit behind the times: 4.8.13-1-ARCH
>
> j

Unfortunately, v4.8 also has that bad commit :(.

So if you have some spare time, you could try v4.10. Although for Archlinux it would take some time before v4.10 moves from [testing] to [core].

Thanks,
Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Very slow balance / btrfs-transaction
February 6, 2017 1:45 AM, "Qu Wenruo" wrote:

> Would you please provide the kernel version?
>
> v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup bytes leaking, but also hugely slows down the balance process.

I'm a bit behind the times: 4.8.13-1-ARCH

j
Re: Very slow balance / btrfs-transaction
At 02/04/2017 09:47 AM, Jorg Bornschein wrote:
> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>
>> On 02/03/2017 06:30 PM, Jorg Bornschein wrote:
>>
>>> February 3, 2017 11:26 PM, "Goldwyn Rodrigues" wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.
>>>>
>>>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.
>>>>
>>>> That seems excessively slow for ~8TiB of data.
>>>>
>>>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?
>>>
>>> Do you have quotas enabled?
>>>
>>> I might have activated it when playing with "snapper" -- I remember using some quota command without knowing what it does.
>>>
>>> How can I check if it's active? Shall I just disable it with "btrfs quota disable"?
>>
>> To check your quota limits:
>> # btrfs qgroup show
>>
>> To disable:
>> # btrfs quota disable
>>
>> Yes, please check if disabling quotas makes a difference in execution time of btrfs balance.
>
> Quota support was indeed active -- and it warned me that the qgroup data was inconsistent. Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now! From a quick glance at iostat I would guess it's at least a factor of 100 faster.
>
> Should quota support generally be disabled during balances? Or did I somehow push my fs into a weird state where it triggered a slow path?
>
> Thanks!
>
> j

Would you please provide the kernel version?

v4.9 introduced a bad fix for qgroup balance, which doesn't completely fix qgroup bytes leaking, but also hugely slows down the balance process:

commit 62b99540a1d91e46422f0e04de50fc723812c421
Author: Qu Wenruo
Date: Mon Aug 15 10:36:51 2016 +0800

    btrfs: relocation: Fix leaking qgroups numbers on data extents

Sorry for that.
And in v4.10, a better method is applied to fix the byte leaking problem, and it should be a little faster than the previous one:

commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca
Author: Qu Wenruo
Date: Tue Oct 18 09:31:29 2016 +0800

    btrfs: qgroup: Fix qgroup data leaking by using subtree tracing

However, balance with qgroup is still slower than balance without qgroup; the root fix requires us to rework the current backref iteration.

Thanks,
Qu
Re: Very slow balance / btrfs-transaction
On Sat, 04 Feb 2017 20:50:03 +0000, "Jorg Bornschein" wrote:

> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:
>
>> Yes, please check if disabling quotas makes a difference in execution time of btrfs balance.
>
> Just FYI: With quotas disabled it took ~20h to finish the balance instead of the projected >30 days. Therefore, in my case, there was a speedup of factor ~35.
>
> And thanks for the quick reply! (and for btrfs in general!)
>
> BTW: I'm wondering how much sense it makes to activate the underlying bcache for my raid1 fs again. I guess btrfs chooses randomly (or based on predicted disk latency?) which copy of a given extent to load?

As far as I know, it uses PID modulo only currently, no round-robin, no random value. There are no performance optimizations going into btrfs yet because there're still a lot of ongoing feature implementations.

I think there were patches to include a rotator value in the stripe selection. They don't apply to the current kernel. I tried it once and didn't see any subjective difference for normal desktop workloads. But that's probably because I use RAID1 for metadata only.

MDRAID uses stripe selection based on latency and other measurements (like head position). It would be nice if btrfs implemented similar functionality. This would also be helpful for selecting a disk if there're more disks than stripesets (for example, I have 3 disks in my btrfs array). This could always write new blocks to the most idle disk.

I think this wasn't covered by the above mentioned patch. Currently, selection is based only on the disk with the most free space.

> I guess that would mean the effective cache size would only be half of the actual cache-set size (+- additional overhead)? Or does btrfs try a deterministically determined copy of each extent first?

I'm currently using a 500GB bcache; it helps a lot during system start -- and probably also while using the system.
I think that bcache mostly caches metadata access, which should improve a lot of btrfs performance issues. The downside of the RAID1 profile is that probably every second access is a cache miss unless the extent has already been cached. Thus, the cache is only half as effective as it could be.

I'm using write-back bcache caching, and RAID0 for data (I do daily backups with borgbackup, so I can easily recover broken files). So writing with bcache is not such an issue for me. The cache is big enough that double metadata writes are no problem.

--
Regards,
Kai
Replies to list-only preferred.
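Kai's "half as effective" point can be illustrated with a toy simulation: if each process deterministically reads copy `pid % 2` of every raid1 extent, a cache that indexes physical locations ends up holding up to two copies of the same logical extent. This is a hypothetical model, not bcache or btrfs code:

```python
import random

def cached_footprint(num_extents, num_readers, num_copies=2):
    """Model a physical-location cache under btrfs raid1: each
    reader (random PID) deterministically reads copy pid % num_copies,
    so the cache accumulates up to num_copies entries per extent."""
    cache = set()
    for _ in range(num_readers):
        pid = random.randrange(1 << 15)
        copy = pid % num_copies
        for extent in range(num_extents):
            cache.add((copy, extent))
    return len(cache)

random.seed(0)
footprint = cached_footprint(num_extents=1000, num_readers=20)
# With many readers, nearly every extent gets cached twice,
# so the effective cache capacity is roughly halved:
assert footprint > 1.9 * 1000
```

With RAID1 for metadata only (as above), the doubled footprint applies only to the comparatively small metadata working set, which matches the observation that it's not a big problem in practice.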
Re: Very slow balance / btrfs-transaction
February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:

> Yes, please check if disabling quotas makes a difference in execution time of btrfs balance.

Just FYI: With quotas disabled it took ~20h to finish the balance instead of the projected >30 days. Therefore, in my case, there was a speedup of factor ~35.

And thanks for the quick reply! (and for btrfs in general!)

BTW: I'm wondering how much sense it makes to activate the underlying bcache for my raid1 fs again. I guess btrfs chooses randomly (or based on predicted disk latency?) which copy of a given extent to load? I guess that would mean the effective cache size would only be half of the actual cache-set size (+- additional overhead)? Or does btrfs try a deterministically determined copy of each extent first?

j
Re: Very slow balance / btrfs-transaction
Lakshmipathi.G posted on Sat, 04 Feb 2017 08:25:04 +0530 as excerpted:

>> Should quota support generally be disabled during balances?
>
> If this is true and quota impacts balance throughput, at least there should be an alert message like "Running balance with quota will affect performance" or similar before starting.

The problem isn't that, exactly, tho that's part of it.

The problem with quotas is that the feature itself isn't yet mature. At least until very recently, and possibly still, quotas couldn't be depended upon to work correctly (various not entirely uncommon corner-cases would trigger negative numbers, etc), and even when they do work correctly, they simply don't scale well in combination with balance, check, etc -- that 10X difference isn't uncommon.

So my recommendation for quotas has been and remains: unless you're actively working with the devs on improving them, it's probably better to keep them disabled. Either you actually need quota functionality or you don't. If you do, it's better to use a mature filesystem where quotas are a mature feature that works dependably. If you don't, just leave the feature off, as it continues to simply not be worth the troubles and scaling issues it triggers.

IOW, btrfs quotas might work and scale well some day, but that day isn't today, and it's not going to be tomorrow or next kernel cycle, either. It's going to take awhile, and you'll be much happier with btrfs in the mean time if you don't have them enabled.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: Very slow balance / btrfs-transaction
> Should quota support generally be disabled during balances?

If this is true and quota impacts balance throughput, at least there should be an alert message like "Running balance with quota will affect performance" or similar before starting.

Cheers,
Lakshmipathi.G
Re: Very slow balance / btrfs-transaction
February 4, 2017 1:07 AM, "Goldwyn Rodrigues" wrote:

> On 02/03/2017 06:30 PM, Jorg Bornschein wrote:
>
>> February 3, 2017 11:26 PM, "Goldwyn Rodrigues" wrote:
>>
>>> Hi,
>>>
>>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.
>>>
>>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.
>>>
>>> That seems excessively slow for ~8TiB of data.
>>>
>>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?
>>
>>> Do you have quotas enabled?
>>
>> I might have activated it when playing with "snapper" -- I remember using some quota command without knowing what it does.
>>
>> How can I check if it's active? Shall I just disable it with "btrfs quota disable"?
>
> To check your quota limits:
> # btrfs qgroup show
>
> To disable:
> # btrfs quota disable
>
> Yes, please check if disabling quotas makes a difference in execution time of btrfs balance.

Quota support was indeed active -- and it warned me that the qgroup data was inconsistent. Disabling quotas had an immediate impact on balance throughput -- it's *much* faster now! From a quick glance at iostat I would guess it's at least a factor of 100 faster.

Should quota support generally be disabled during balances? Or did I somehow push my fs into a weird state where it triggered a slow path?

Thanks!

j
Re: Very slow balance / btrfs-transaction
On 02/03/2017 06:30 PM, Jorg Bornschein wrote:

> February 3, 2017 11:26 PM, "Goldwyn Rodrigues" wrote:
>
>>> Hi,
>>>
>>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.
>>>
>>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.
>>>
>>> That seems excessively slow for ~8TiB of data.
>>>
>>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?
>>
>> Do you have quotas enabled?
>
> I might have activated it when playing with "snapper" -- I remember using some quota command without knowing what it does.
>
> How can I check if it's active? Shall I just disable it with "btrfs quota disable"?

To check your quota limits:
# btrfs qgroup show

To disable:
# btrfs quota disable

Yes, please check if disabling quotas makes a difference in execution time of btrfs balance.

--
Goldwyn
Re: Very slow balance / btrfs-transaction
February 3, 2017 11:26 PM, "Goldwyn Rodrigues" wrote:

>> Hi,
>>
>> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.
>>
>> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.
>>
>> That seems excessively slow for ~8TiB of data.
>>
>> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?
>
> Do you have quotas enabled?

I might have activated it when playing with "snapper" -- I remember using some quota command without knowing what it does.

How can I check if it's active? Shall I just disable it with "btrfs quota disable"?

j
Re: Very slow balance / btrfs-transaction
On 02/03/2017 04:13 PM, j...@capsec.org wrote:

> Hi,
>
> I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.
>
> I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.
>
> That seems excessively slow for ~8TiB of data.
>
> Is this expected behavior? In case it's not: Is there anything I can do to help debug it?

Do you have quotas enabled?

--
Goldwyn
Very slow balance / btrfs-transaction
Hi,

I'm currently running a balance (without any filters) on a 4 drives raid1 filesystem. The array contains 3 3TB drives and one 6TB drive; I'm running the rebalance because the 6TB drive recently replaced a 2TB drive.

I know that balance is not supposed to be a fast operation, but this one is now running for ~6 days and it managed to balance ~18% (754 out of about 4250 chunks balanced (755 considered), 82% left) -- so I expect it to take another ~4 weeks.

That seems excessively slow for ~8TiB of data.

Is this expected behavior? In case it's not: Is there anything I can do to help debug it?

The 4 individual devices are bcache devices with currently no SSD cache partition attached; the bcache backing devices sit on top of LUKS encrypted devices.

Maybe a few words about the history of this fs: It used to be a 1-drive btrfs on top of a bcache partition with a 30GiB SSD cache (actively used for >1 year). During the last month, I gradually added devices (always with active bcaches). At some point, after adding the 4th device, I deactivated (detached) the bcache caching device and instead activated raid1 for data and metadata and ran a rebalance (which was reasonably fast -- I don't remember how fast exactly, but probably <24h). The final steps that led to the current situation: I activated "nossd" and replaced the smallest device with "btrfs dev replace" (which was also reasonably fast, <12h).
Best & thanks,

j

--

[joerg@dorsal ~]$ lsblk
NAME              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                 8:0    0 111.8G  0 disk
├─sda1              8:1    0     1G  0 part  /boot
└─sda2              8:2    0 110.8G  0 part
  └─crypted       254:0    0 110.8G  0 crypt
    ├─ssd-root    254:1    0  72.8G  0 lvm   /
    ├─ssd-swap    254:2    0     8G  0 lvm   [SWAP]
    └─ssd-cache   254:3    0    30G  0 lvm
sdb                 8:16   0   2.7T  0 disk
└─sdb1              8:17   0   2.7T  0 part
  └─crypted-sdb   254:7    0   2.7T  0 crypt
    └─bcache2     253:2    0   2.7T  0 disk
sdc                 8:32   0   2.7T  0 disk
└─sdc1              8:33   0   2.7T  0 part
  └─crypted-sdc   254:4    0   2.7T  0 crypt
    └─bcache1     253:1    0   2.7T  0 disk
sdd                 8:48   0   2.7T  0 disk
└─sdd1              8:49   0   2.7T  0 part
  └─crypted-sdd   254:6    0   2.7T  0 crypt
    └─bcache0     253:0    0   2.7T  0 disk
sde                 8:64   0   5.5T  0 disk
└─sde1              8:65   0   5.5T  0 part
  └─crypted-sde   254:5    0   5.5T  0 crypt
    └─bcache3     253:3    0   5.5T  0 disk  /storage

--

[joerg@dorsal ~]$ sudo btrfs fi usage -h /storage/
Overall:
    Device size:                  13.64TiB
    Device allocated:              8.35TiB
    Device unallocated:            5.29TiB
    Device missing:                  0.00B
    Used:                          8.34TiB
    Free (estimated):              2.65TiB  (min: 2.65TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 15.77MiB)

Data,RAID1: Size:4.17TiB, Used:4.16TiB
   /dev/bcache0    2.38TiB
   /dev/bcache1    2.37TiB
   /dev/bcache2    2.38TiB
   /dev/bcache3    1.20TiB

Metadata,RAID1: Size:9.00GiB, Used:7.49GiB
   /dev/bcache1    8.00GiB
   /dev/bcache2    1.00GiB
   /dev/bcache3    9.00GiB

System,RAID1: Size:32.00MiB, Used:624.00KiB
   /dev/bcache1   32.00MiB
   /dev/bcache3   32.00MiB

Unallocated:
   /dev/bcache0  355.52GiB
   /dev/bcache1  356.49GiB
   /dev/bcache2  355.52GiB
   /dev/bcache3    4.25TiB

--

[joerg@dorsal ~]$ ps -xal | grep btrfs
1  0  227  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-worker]
1  0  229  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-worker-hi]
1  0  230  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-delalloc]
1  0  231  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-flush_del]
1  0  232  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-cache]
1  0  233  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-submit]
1  0  234  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-fixup]
1  0  235  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-endio]
1  0  236  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-endio-met]
1  0  237  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-endio-met]
1  0  238  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-endio-rai]
1  0  239  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-endio-rep]
1  0  240  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-rmw]
1  0  241  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-endio-wri]
1  0  242  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-freespace]
1  0  243  2  0 -20  0  0 -  S<  ?  0:00 [btrfs-delayed-m]
1  0  244  2  0 -20  0