Re: Rework qgroup accounting

Liu Bo Wed, 18 Dec 2013 18:02:02 -0800

On Wed, Dec 18, 2013 at 04:07:26PM -0500, Josef Bacik wrote:
> People have been complaining about autodefrag/defrag killing their box with
> OOM.  This is because the snapshot aware defrag stuff super sucks if you have
> lots of snapshots, and so that needs to be reworked.  The problem is once that
> is fixed you start to hit horrible lock contention on the delayed refs lock
> because we have thousands of like entries that can't be merged until when we
> go to actually run the delayed ref.  This problem exists because of the
> delayed ref sequence number.
> 
> The major user of the delayed ref sequence number is the qgroup code.  It uses
> it to pass into btrfs_find_all_roots to see what roots pointed to a particular
> bytenr either before or including the current operation.  It needs this
> information to know if we were removing the last ref or just the last ref for
> this particular root.  The problem with this is that it has made the delayed
> ref code incredibly fragile and has forced us to do things like
> btrfs_merge_delayed_refs which is what is causing us so much pain when we have
> thousands of ref updates for the same block.
> 
> In order to fix this I'm introducing a new way of adjusting quota counts.
> I've called them qgroup operations, and we apply them in very specific
> situations.  We only add these when we add or remove the only ref for a
> particular root.  Obviously we have to account for shared refs as well so
> there is some extra code for these special cases, but basically we make the
> qgroup accounting only happen when we know there was a real change (or likely
> a real change in the case of shared refs).
> 
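
The rule quoted above ("we only add these when we add or remove the only ref
for a particular root") can be sketched roughly as below. This is only an
illustration of the idea, not code from the patch: struct root_ref,
enum qgroup_op and update_root_ref() are hypothetical names, and the shared
ref special cases mentioned above are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical: how many refs one root holds on one extent (bytenr). */
struct root_ref {
	unsigned long long rootid;
	unsigned int refs;
};

/* Hypothetical: what kind of qgroup operation, if any, must be queued. */
enum qgroup_op {
	QGROUP_OP_NONE,   /* root already had, and still has, a ref */
	QGROUP_OP_ADD,    /* root gained its first ref on the extent */
	QGROUP_OP_REMOVE, /* root dropped its only ref on the extent */
};

/*
 * Apply one ref change for a root and report whether quota accounting
 * has to run.  Only the 0 -> 1 and 1 -> 0 transitions count as a real
 * change; everything in between is ignored.
 */
static enum qgroup_op update_root_ref(struct root_ref *r, bool add)
{
	if (add) {
		r->refs++;
		return r->refs == 1 ? QGROUP_OP_ADD : QGROUP_OP_NONE;
	}

	if (r->refs == 0)
		return QGROUP_OP_NONE; /* nothing to drop */

	r->refs--;
	return r->refs == 0 ? QGROUP_OP_REMOVE : QGROUP_OP_NONE;
}
```

With a root starting at zero refs, two adds followed by two removes would
queue one add operation and one remove operation; the middle two updates
would queue nothing, which is the point of the scheme as described.
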
> In order to do this I've also introduced lock/unlock_ref.  This only gets used
> if we actually have qgroups enabled, but it will be relatively low cost even
> if we have qgroups enabled as it only locks the bytenr for reference updates.
> So delayed ref updates will not trip over this since we only do one at a time
> anyway, so we'll only have contention if we have delayed refs running at the
> same time as a qgroup operation update.
> 
> Then all we need to account for is the fact that we will get the full view of
> the roots at the time we run the operations, not what they were when our
> particular operation occurred.  This is ok because we will either ignore our
> root in the case of add or not ignore it in case of remove when calculating
> the ref counts.  We use the same ref counting scheme that Arne developed as
> it's pretty freaking awesome, and just adjust how we count the ref counts
> based on our operations.
> 
> In addition to all of this new code I've added a big set of sanity tests to
> make sure everything is working right.  Between this and the qgroups xfstests
> I'm pretty certain I haven't broken anything obvious with qgroups.  This is
> just the first step in getting rid of the delayed ref sequence number and
> fixing the defrag OOM mess but it is the biggest part.  Thanks,


I'd say I love the idea, will look at it closer.

-liubo
