On Tue, Feb 7, 2017 at 12:22 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote: > > > At 02/07/2017 12:09 AM, Goldwyn Rodrigues wrote: >> >> >> Hi Qu, >> >> On 02/05/2017 07:45 PM, Qu Wenruo wrote: >>> >>> >>> >>> At 02/04/2017 09:47 AM, Jorg Bornschein wrote: >>>> >>>> February 4, 2017 1:07 AM, "Goldwyn Rodrigues" <rgold...@suse.de> wrote: >> >> >> <snipped> >> >>>> >>>> >>>> Quata support was indeed active -- and it warned me that the qroup >>>> data was inconsistent. >>>> >>>> Disabling quotas had an immediate impact on balance throughput -- it's >>>> *much* faster now! >>>> From a quick glance at iostat I would guess it's at least a factor 100 >>>> faster. >>>> >>>> >>>> Should quota support generally be disabled during balances? Or did I >>>> somehow push my fs into a weired state where it triggered a slow-path? >>>> >>>> >>>> >>>> Thanks! >>>> >>>> j >>> >>> >>> Would you please provide the kernel version? >>> >>> v4.9 introduced a bad fix for qgroup balance, which doesn't completely >>> fix qgroup bytes leaking, but also hugely slow down the balance process: >>> >>> commit 62b99540a1d91e46422f0e04de50fc723812c421 >>> Author: Qu Wenruo <quwen...@cn.fujitsu.com> >>> Date: Mon Aug 15 10:36:51 2016 +0800 >>> >>> btrfs: relocation: Fix leaking qgroups numbers on data extents >>> >>> Sorry for that. >>> >>> And in v4.10, a better method is applied to fix the byte leaking >>> problem, and should be a little faster than previous one. >>> >>> commit 824d8dff8846533c9f1f9b1eabb0c03959e989ca >>> Author: Qu Wenruo <quwen...@cn.fujitsu.com> >>> Date: Tue Oct 18 09:31:29 2016 +0800 >>> >>> btrfs: qgroup: Fix qgroup data leaking by using subtree tracing >>> >>> >>> However, using balance with qgroup is still slower than balance without >>> qgroup, the root fix needs us to rework current backref iteration. >>> >> >> This patch has made the btrfs balance performance worse. The balance >> task has become more CPU intensive compared to earlier and takes longer >> to complete, besides hogging resources. While correctness is important, >> we need to figure out how this can be made more efficient. >> > The cause is already known. > > It's find_parent_node() which takes most of the time to find all referencer > of an extent. > > And it's also the cause for FIEMAP softlockup (fixed in recent release by > early quit). > > The biggest problem is, current find_parent_node() uses list to iterate, > which is quite slow especially it's done in a loop. > In real world find_parent_node() is about O(n^3). > We can either improve find_parent_node() by using rb_tree, or introduce some > cache for find_parent_node().
Even if anyone is able to reduce that function's complexity from O(n^3) down to lets say O(n^2) or O(n log n) for example, the current implementation of qgroups will always be a problem. The real problem is that this more recent rework of qgroups does all this accounting inside the critical section of a transaction - blocking any other tasks that want to start a new transaction or attempt to join the current transaction. Not to mention that on systems with small amounts of memory (2Gb or 4Gb from what I've seen from user reports) we also OOM due this allocation of struct btrfs_qgroup_extent_record per delayed data reference head, that are used for that accounting phase in the critical section of a transaction commit. Let's face it and be realistic, even if someone manages to make find_parent_node() much much better, like O(n) for example, it will always be a problem due to the reasons mentioned before. Many extents touched per transaction and many subvolumes/snapshots, will always expose that root problem - doing the accounting in the transaction commit critical section. > > > IIRC SUSE guys(maybe Jeff?) are working on it with the first method, but I > didn't hear anything about it recently. > > Thanks, > Qu > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, "People will forget what you said, people will forget what you did, but people will never forget how you made them feel." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html