Hi Eric,

On Wed, Mar 1, 2023 at 10:26 PM Eric Wheeler <dm-de...@lists.ewheeler.net> wrote:

>
> Hurrah! I've been looking forward to this for a long time...
>
>
> ...So if you have any commentary on the future of dm-thin with respect
> to metadata range support, or dm-thin performance in general, that I would
> be very curious about your roadmap and your plans.
>

The plan over the next few months is roughly:

- Get people using the new Rust tools.  They are _so_ much faster than the
  old C++ ones.  [Available now]
- Push upstream a set of patches I've been working on to boost thin
  concurrency performance.  These are nearing completion and are available
  here for those who are interested:
  https://github.com/jthornber/linux/tree/2023-02-28-thin-concurrency-7
  These are making a huge difference to performance in my testing, e.g., fio
  with 16 jobs running concurrently gets several times the throughput.
  [Upstream in the next month, hopefully]
- Change thinp metadata to store ranges rather than individual mappings.
  This will reduce the amount of space the metadata consumes, and have the
  knock-on effect of boosting performance slightly (less metadata means
  faster lookups).  However, I consider this a halfway house, in that I'm
  only going to change the metadata and not start using ranges within the
  core target (I'm not moving away from fixed block sizes).  [Next 3 months]
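
As a rough illustration of why storing ranges shrinks the metadata, here is a
minimal Python sketch.  It assumes a simple in-memory dict from virtual block
to data block; the real thinp metadata is an on-disk btree, and the names
`Range`, `coalesce` and `RangeMap` are hypothetical.  Runs of mappings that
are consecutive on both the virtual and the data side collapse into a single
(virt_begin, data_begin, length) entry, and a lookup becomes a binary search
over the range starts.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional


class Range(NamedTuple):
    virt_begin: int  # first block on the thin (virtual) device
    data_begin: int  # first block on the pool's data device
    length: int      # number of consecutive blocks covered


def coalesce(mappings: Dict[int, int]) -> List[Range]:
    """Merge per-block mappings into runs contiguous on both sides."""
    ranges: List[Range] = []
    for vblock in sorted(mappings):
        dblock = mappings[vblock]
        if ranges:
            last = ranges[-1]
            n = last.length
            if last.virt_begin + n == vblock and last.data_begin + n == dblock:
                # Extends the previous run: grow it instead of adding an entry.
                ranges[-1] = last._replace(length=n + 1)
                continue
        ranges.append(Range(vblock, dblock, 1))
    return ranges


class RangeMap:
    """Hypothetical range-based mapping table with binary-search lookup."""

    def __init__(self, mappings: Dict[int, int]) -> None:
        self.ranges = coalesce(mappings)
        self.starts = [r.virt_begin for r in self.ranges]

    def lookup(self, vblock: int) -> Optional[int]:
        """Return the data block for vblock, or None if unprovisioned."""
        i = bisect_right(self.starts, vblock) - 1
        if i >= 0:
            r = self.ranges[i]
            if vblock < r.virt_begin + r.length:
                return r.data_begin + (vblock - r.virt_begin)
        return None


# Seven individual mappings collapse into three ranges.
mappings = {0: 100, 1: 101, 2: 102, 3: 103, 10: 500, 11: 501, 20: 7}
table = RangeMap(mappings)
print(len(mappings), "mappings ->", len(table.ranges), "ranges")  # 7 mappings -> 3 ranges
```

The saving depends entirely on how contiguous the allocations are: a fully
fragmented device gains nothing, while one provisioned in long sequential
runs shrinks dramatically.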

I don't envisage significant changes to dm-thin or dm-cache after this.


Longer term, I think we're nearing a crunch point where we'll have to
drastically change how we do things.  Since I wrote device-mapper in 2001,
the speed of devices has increased so much that I think dm is no longer doing
a good job:

- The layering approach introduces an inefficiency with each layer.  Sure,
  it may only be a 5% hit to add another linear mapping to the stack, but
  those 5%s add up.
- dm targets only see individual bios rather than the whole request queue.
  This prevents a lot of really useful optimisations.  Think how much
  smarter dm-cache and dm-thin could be if they could look at the whole
  queue.
- The targets are getting too complicated.  I think dm-thin is around 8k
  lines of code, though it shares most of that with dm-cache.  I understand
  the dedup target from the VDO guys weighs in at 64k lines.  Kernel
  development is fantastically expensive (or slow, depending on how you want
  to look at it).  I did a lot of development work on thinp v2, and it was
  looking a lot like a filesystem shoehorned into the block layer.  I can
  see why bcache turned into bcachefs.
- Code within the block layer is memory-constrained.  We can't make
  arbitrarily sized allocations within targets; instead we have to use
  mempools of fixed-size objects (frowned upon these days), or declare up
  front how much memory we need to service a bio (forcing us to assume the
  worst case).  None of this is hard, just tedious, and it makes coding
  sophisticated targets pretty joyless.
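
To put a rough number on how the per-layer hits compound, assume (purely for
illustration) that each extra layer independently costs 5% of the throughput
reaching it.  A stack of n layers over a device with raw throughput T_0 then
delivers:

```latex
T_n = T_0 \times 0.95^{n}, \qquad
\frac{T_4}{T_0} = 0.95^{4} \approx 0.815, \qquad
\frac{T_5}{T_0} = 0.95^{5} \approx 0.774
```

So under that assumption, four or five modest layers already give up roughly
19% to 23% of the raw device throughput.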

So my plan going forward is to keep the fast paths of these targets in the
kernel (e.g., a write to a provisioned, unsnapshotted region), but move the
slow paths out to userland.  I think io_uring and ublk have shown us that
this is viable.  That way a snapshot copy-on-write or a dm-cache data
migration, both very slow operations, can be done with ordinary userland
code.  For the fast paths, layering will be removed by having userland give
the kernel instructions to execute for specific regions of the virtual device
(i.e., "remap to here").  The kernel driver will have nothing specific to
thin, cache, etc.  I'm not sure how many of the current dm targets would fit
into this model, but I'm sure thin provisioning, caching, linear, and stripe
can.
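
To make the proposed split a little more concrete, here is a minimal Python
sketch of a generic, region-based instruction table of the kind described
above.  Everything in it is hypothetical and illustrative: the names `Region`,
`InstructionTable` and `submit`, the single "remap" instruction, and the
per-region `writable` flag are assumptions, not an existing or planned
interface.  I/Os covered by an instruction are remapped directly (fast path);
anything else, such as a write to a region shared with a snapshot, is queued
for userland (slow path).

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """One hypothetical instruction from userland: remap [begin, end) to dev."""
    begin: int      # first sector of the region on the virtual device
    end: int        # one past the last sector of the region
    dev: str        # device the region is remapped to
    offset: int     # sector on dev that corresponds to `begin`
    writable: bool  # False for regions shared with a snapshot (writes need CoW)


@dataclass
class InstructionTable:
    """Hypothetical generic table: knows nothing about thin or cache."""
    regions: List[Region]
    userland_queue: List[Tuple[int, bool]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.regions = sorted(self.regions, key=lambda r: r.begin)
        self.starts = [r.begin for r in self.regions]

    def submit(self, sector: int, is_write: bool) -> Optional[Tuple[str, int]]:
        """Fast path: return (dev, sector) if an instruction covers this I/O.

        Otherwise queue the I/O for userland (slow path) and return None.
        """
        i = bisect_right(self.starts, sector) - 1
        if i >= 0:
            r = self.regions[i]
            if sector < r.end and (r.writable or not is_write):
                return (r.dev, r.offset + (sector - r.begin))
        # Unmapped, or a write to a shared region: punt to userland, which
        # could do the copy-on-write and then install a new instruction.
        self.userland_queue.append((sector, is_write))
        return None


table = InstructionTable([
    Region(0, 1024, "data", 8192, writable=True),   # provisioned, unshared
    Region(1024, 2048, "data", 0, writable=False),  # shared with a snapshot
])
print(table.submit(10, is_write=True))    # ('data', 8202): fast-path remap
print(table.submit(1030, is_write=True))  # None: needs CoW, goes to userland
```

In this sketch all target-specific policy (when to share, when to copy, where
to allocate) would live in the userland process that fills the table, which
is what lets one kernel-side mechanism serve several targets.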

- Joe

> Thanks again for all your great work on this.
>
> -Eric
>
> > [note: _data_ sharing was always maintained, this is purely about
> > metadata space usage]
> >
> > # thin_metadata_pack/unpack
> >
> > These are a couple of new tools that are used for support.  They compress
> > thin metadata, typically to a tenth of the size (much better than you'd
> > get with generic compressors).  This makes it easier to pass damaged
> > metadata around for inspection.
> >
> > # blk-archive
> >
> > The blk-archive tools were initially part of this thin-provisioning-tools
> > package.  But have now been split off to their own project:
> >
> >     https://github.com/jthornber/blk-archive
> >
> > They allow efficient archiving of thin devices (data deduplication
> > and compression).  Which will be of interest to those of you who are
> > holding large numbers of snapshots in thin pools as a poor man's backup.
> >
> > In particular:
> >
> >     - Thin snapshots can be used to archive live data.
> >     - it avoids reading unprovisioned areas of thin devices.
> >     - it can calculate deltas between thin devices to minimise
> >       how much data is read and deduped (incremental backups).
> >     - restoring to a thin device tries to maximise data sharing
> >       within the thin pool (a big win if you're restoring snapshots).
> >
> >
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel