On Wed, Oct 12, 2016 at 11:18:49PM +1100, Dave Chinner wrote:
> Hi Linus,
>
> This is the second part of the XFS updates for this merge cycle.
> This pullreq contains the new shared data extents feature for XFS,
> and can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git
> tags/xfs-reflink-for-linus-4.9-rc1
>
> The full pull request output is below.
>
> Given the complexity and size of this change I am expecting - like
> the addition of reverse mapping last cycle - that there will be some
> follow-up bug fixes and cleanups around the -rc3 stage for issues
> that I'm sure will show up once the code hits a wider userbase.
>
> What it is:
>
> At the most basic level we are simply adding shared data extents to
> XFS - i.e. a single extent on disk can now have multiple owners. To
> do this we have to add new on-disk features to both track the shared
> extents and the number of times they've been shared. This is done by
> the new "refcount" btree that sits in every allocation group. When
> we share or unshare an extent, this tree gets updated.
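
A rough illustration of the bookkeeping described above, as a toy
in-memory model. This is only a sketch: the class and method names are
made up for the example, it counts per block rather than per extent, and
it says nothing about the real on-disk btree record format.

```python
from collections import Counter


class RefcountMap:
    """Toy per-block owner counts (hypothetical; not the XFS btree format)."""

    def __init__(self):
        self.counts = Counter()  # block number -> number of owners

    def share(self, start, length):
        # Adding an owner to an extent bumps the count of every block in it.
        for block in range(start, start + length):
            self.counts[block] += 1

    def unshare(self, start, length):
        # Dropping an owner decrements each count; blocks that reach zero
        # have no owners left and are returned as freeable.
        freed = []
        for block in range(start, start + length):
            if self.counts[block] == 0:
                raise ValueError(f"block {block} has no owner")
            self.counts[block] -= 1
            if self.counts[block] == 0:
                del self.counts[block]
                freed.append(block)
        return freed
```

With two owners of blocks 100-103, the first unshare of blocks 102-103
frees nothing and the second one frees both blocks.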
>
> Along with this new tree, the reverse mapping tree needs to be
> updated to track each owner of a shared extent. This also needs to
> be updated on every share/unshare operation. These interactions at
> extent allocation and freeing time have complex ordering and
> recovery constraints, so there's a significant amount of new
> intent-based transaction code to ensure that operations are
> performed atomically from both the runtime and integrity/crash
> recovery perspectives.
>
> We also need to break sharing when writes hit a shared extent - this
> is where the new copy-on-write implementation comes in. We allocate
> new storage and copy the original data along with the overwrite data
> into the new location. We only do this for data as we don't share
> metadata at all - each inode has its own metadata that tracks the
> shared data extents, the extents undergoing CoW and its own private
> extents.
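
The write path described above can be sketched with a toy model. The
names here (ToyFS, BLOCK_SIZE and so on) are hypothetical and the model
ignores delayed allocation, transactions and recovery entirely; it only
shows that a write to a shared block lands in newly allocated storage
while the other owner keeps the original.

```python
BLOCK_SIZE = 8  # tiny block size so the example stays readable


class ToyFS:
    """Toy copy-on-write model (hypothetical; not how XFS is structured)."""

    def __init__(self):
        self.storage = {}   # physical block -> bytearray of BLOCK_SIZE bytes
        self.refcount = {}  # physical block -> number of owners
        self.next_free = 0

    def _alloc(self, data):
        pblock = self.next_free
        self.next_free += 1
        self.storage[pblock] = bytearray(data)
        self.refcount[pblock] = 1
        return pblock

    def create(self, data):
        # A "file" is just its private mapping: a list of physical blocks.
        nblocks = -(-len(data) // BLOCK_SIZE)
        padded = data.ljust(nblocks * BLOCK_SIZE, b"\0")
        return [self._alloc(padded[i:i + BLOCK_SIZE])
                for i in range(0, len(padded), BLOCK_SIZE)]

    def clone(self, file_map):
        # Share every data block; only the mapping (metadata) is copied.
        for pblock in file_map:
            self.refcount[pblock] += 1
        return list(file_map)

    def write(self, file_map, lblock, offset, data):
        if offset < 0 or offset + len(data) > BLOCK_SIZE:
            raise ValueError("write must fit inside one block")
        pblock = file_map[lblock]
        if self.refcount[pblock] > 1:
            # Shared block: allocate new storage, copy the original data,
            # remap this file to the copy and drop its reference to the old.
            new_pblock = self._alloc(self.storage[pblock])
            self.refcount[pblock] -= 1
            file_map[lblock] = pblock = new_pblock
        self.storage[pblock][offset:offset + len(data)] = data

    def read(self, file_map):
        return b"".join(bytes(self.storage[p]) for p in file_map)
```

After cloning a two-block file and writing into the clone's second
block, both files still share the first block and the original file's
contents are unchanged.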
>
> Of course, being XFS, nothing is simple - we use delayed allocation
> for CoW similar to how we use it for normal writes. ENOSPC is a
> significant issue here - we build on the reservation code added
> in 4.8-rc1 with the reverse mapping feature to ensure we don't get
> spurious ENOSPC issues part way through a CoW operation. These
> mechanisms also help minimise fragmentation due to repeated CoW
> operations. To further reduce fragmentation overhead, we've also
> introduced a CoW extent size hint, which indicates how large a
> region we should allocate when we execute a CoW operation.
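
To make the effect of the hint concrete, here is a small sketch. It
assumes the hint is applied by rounding the CoW range outward to
hint-sized boundaries, with everything measured in filesystem blocks;
the function name is made up and the real allocator may well behave
differently.

```python
def cow_allocation_range(offset, length, hint):
    """Round [offset, offset + length) outward to multiples of hint.

    All values are in blocks. Returns (start, length) of the region to
    allocate. This rounding policy is an assumption for illustration.
    """
    if hint <= 0 or length <= 0:
        raise ValueError("hint and length must be positive")
    start = (offset // hint) * hint              # round start down
    end = -(-(offset + length) // hint) * hint   # round end up
    return start, end - start
```

A 5-block overwrite at block 37 with a 32-block hint would then
allocate blocks 32-63 in one go, so later CoW writes nearby land in the
same contiguous region instead of fragmenting the file.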
>
> With all this functionality in place, we can hook up
> .copy_file_range, .clone_file_range and .dedupe_file_range and we
> gain all the capabilities of reflink and other vfs provided
> functionality that enables manipulation of shared extents. We also
> added a fallocate mode that explicitly unshares a range of a file,
> which we implemented as an explicit CoW of all the shared extents in
> a file.
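
From userspace, the clone path can be exercised with the FICLONE ioctl
from <linux/fs.h>. A minimal sketch follows; the helper name is made up,
the constant is written out by hand rather than taken from a library,
and the fallback to a plain copy on any OSError is a simplification for
the example.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int), written out by hand.
FICLONE = 0x40049409


def reflink_or_copy(src_path, dst_path):
    """Make dst share src's extents if possible, else copy the bytes.

    Returns True if the clone ioctl succeeded, False if data was copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Ask the filesystem to share all of src's extents with dst.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # For example: no reflink support on this filesystem, or the
            # two files live on different filesystems.
            shutil.copyfileobj(src, dst)
            return False
```

On a filesystem without shared extent support the helper returns False
and the destination is an ordinary copy; either way the contents match.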
>
> As such, it's a huge chunk of new functionality with new on-disk
> format features and internal infrastructure. At mount time it warns
> that it is an experimental feature and may eat data (as we do with
> all new on-disk features until they stabilise). We have not
> released userspace support for it yet - userspace support currently
> requires download from Darrick's xfsprogs repo and build from
> source, so the access to this feature is really developer/tester
> only at this point. Initial userspace support will be released at
> the same time the kernel with this code in it is released.

Userland support is in this branch:
https://github.com/djwong/xfsprogs/tree/for-dave-for-4.9-15

There will undoubtedly be more of these since Dave will libxfs-apply
the kernel patches into for-next after the merge window closes, after
which I'll rebase the tool patches against that.

> The new code causes 5-6 new failures with xfstests - these aren't
> serious functional failures but things like the output of tests changing
> slightly due to perturbations in layouts, space usage, etc. OTOH,
> we've added 150+ new tests to xfstests that specifically exercise
> this new functionality so it's got far better test coverage than any
> functionality we've previously added to XFS.

https://github.com/djwong/xfstests/tree/djwong-devel
has fixes to some of the tests, if you dare. :)

I'll resync with upstream the next time I see an xfstests.git update.
(Merge window is open, so I don't anticipate that until next week.)

> Darrick has done a pretty amazing job getting us to this stage, and
> special mention also needs to go to Christoph (review, testing,
> improvements and bug fixes) and Brian (caught several intricate
> bugs during review) for the effort they've also put in.

Yes, my hearty thanks to Dave, Christoph, and Brian for their support!

--D
>
> Thanks,
>
> -Dave.
>
> --
> The following changes since commit 155cd433b516506df065866f3d974661f6473572