Re: Transactional btrfs

Adam Borowski Sat, 08 Sep 2018 14:28:40 -0700

On Sat, Sep 08, 2018 at 08:45:47PM +0000, Martin Raiber wrote:
> On 08.09.2018 at 18:24, Adam Borowski wrote:
> > On Thu, Sep 06, 2018 at 06:08:33AM -0400, Austin S. Hemmelgarn wrote:
> >> On 2018-09-06 03:23, Nathan Dehnel wrote:
> >>> So I guess my question is, does btrfs support atomic writes across
> >>> multiple files? Or is anyone interested in such a feature?
> >>>
> >> I'm fairly certain that it does not currently, but in theory it would not be
> >> hard to add.


> >> However, if this were extended to include rename, unlink, touch, and a
> >> handful of other VFS operations, then I can easily think of a few dozen use
> >> cases.  Package managers in particular would likely be very interested in
> >> being able to atomically rename a group of files as a single transaction, as
> >> it would make their job _much_ easier.

> > I wonder, what about:
> > sync; mount -o remount,commit=9999999,flushoncommit
> > eatmydata apt dist-upgrade
> > sync; mount -o remount,commit=30,noflushoncommit
> >
> > Obviously, this gets fooled by fsyncs, and makes the transaction affect the
> > whole system (if you have unrelated writes they won't get committed until
> > the end of transaction).  Then there are nocow files, but you already made
> > the decision to disable most features of btrfs for them.

> Now combine this with snapshot root, then on success rename exchange to
> root and you are there.

No need: no unsuccessful transactions ever get written to the disk.
(Not counting unreachable stuff.)

> Btrfs had in the past TRANS_START and TRANS_END ioctls (for ceph, I
> think), but no rollback (and therefore no error handling incl. ENOSPC).
> 
> If you want to look at a working file system transaction mechanism, you
> should look at transactional NTFS (TxF). They write that they are
> deprecating it, so it's perhaps not very widely used. Windows uses it
> for updates, I think.

You're talking about multiple simultaneous transactions; they have a massive
complexity cost.  And btrfs is already ridiculously complex.  I don't really
see a good way to tie this with the POSIX API without some serious
rethinking.

dpkg can already recover from a properly returned error (although not as
nicely as a full rollback); what is fatal for it is having its status
database corrupted/out of sync.  That's why it does a multiple fsync dance
and keeps fully rewriting its files over and over and over.
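
For readers unfamiliar with that dance, here is a minimal sketch of the
generic write-fsync-rename pattern for a single file.  This is not dpkg's
actual code; the function name atomic_replace and the ".tmp-" prefix are
made up for illustration.  It shows why every such update costs a full
rewrite of the file plus two fsyncs.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` so that a crash leaves either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # first fsync: new data reaches the disk
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Clean up the temporary file if the rename did not happen.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Second fsync: make the rename itself durable by syncing the directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that this only makes one file atomic; keeping several files consistent
with each other is exactly what it cannot do.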

Atomic operations are pretty useful even without a rollback: you still need
to be able to handle failure, but not a crash.

> Specifically for btrfs, the problem would be that it really needs to
> support multiple simultaneous writers, otherwise one transaction can
> block the whole system.

My dirty hack above doesn't suffer from such a block: it only suffers from
compromising durability of concurrent writers.  During that userspace
transaction, there are no commits until it finishes; this means that if
there's unrelated activity it may suffer from losing writes that were done
between transaction start and crash.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄⠀⠀⠀⠀ • use glitches to walk on water [Mt14:25-26]
