Re: How does Suse do live filesystem revert with btrfs?

2014-05-07 Thread Marc MERLIN
On Tue, May 06, 2014 at 04:26:48PM +, Duncan wrote:
 Marc MERLIN posted on Sun, 04 May 2014 22:04:59 -0700 as excerpted:
 
  On Mon, May 05, 2014 at 01:36:39AM +0100, Hugo Mills wrote:
 I'm guessing it involves reflink copies of files from the snapshot
  back to the original, and then restarting affected services. That's
  about the only other thing that I can think of, but it's got loads of
  race conditions in it (albeit difficult to hit in most cases, I
  suspect).
  
  Aaah, right, you can use a script to see the file differences between
  two snapshots, and then restore that with reflink if you can truly get a
  list of all changed files.
  However, that is indeed not atomic at all, even if faster than rsync.
 
 Would send/receive help in such a script?

Not really, you still end up with a new snapshot that you can't live
switch to.

It's really either
1) reboot
2) use cp --reflink to copy a list of changed files (as well as rm to
delete the ones that were removed).
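
Roughly, option 2 would look something like this (just a sketch; to_copy.list
and to_remove.list are hypothetical inputs, and producing them correctly is
the hard part):

SNAP=/mnt/btrfs_pool/root_snapshot   # snapshot to revert to (example path)
LIVE=/mnt/btrfs_pool/root            # live subvolume (example path)

# files present in the live tree but not in the snapshot: remove them
while read -r f; do rm -f "$LIVE/$f"; done < to_remove.list

# files that changed (or were deleted) since the snapshot: copy the snapshot's
# version back, sharing extents instead of duplicating data
while read -r f; do
    mkdir -p "$LIVE/$(dirname "$f")"
    cp -a --reflink=always "$SNAP/$f" "$LIVE/$f"
done < to_copy.list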

I'm currently using btrfs-diff (below) which shows changed files but it
doesn't show files deleted.

Is there something better that would show me which files changed and how
between 2 snapshots?

btrfs-diff:
-
#!/bin/bash

usage() { echo "$@" >&2; echo "Usage: $0 <older-snapshot> <newer-snapshot>" >&2; exit 1; }

[ $# -eq 2 ] || usage "Incorrect invocation"
SNAPSHOT_OLD="$1"
SNAPSHOT_NEW="$2"

[ -d "$SNAPSHOT_OLD" ] || usage "$SNAPSHOT_OLD does not exist"
[ -d "$SNAPSHOT_NEW" ] || usage "$SNAPSHOT_NEW does not exist"

# Asking for changes since an impossibly high generation makes find-new print
# only the "transid marker was N" line, i.e. the old snapshot's generation.
OLD_TRANSID=$(btrfs subvolume find-new "$SNAPSHOT_OLD" 9999999)
OLD_TRANSID=${OLD_TRANSID#transid marker was }
[ -n "$OLD_TRANSID" ] && [ "$OLD_TRANSID" -gt 0 ] || usage "Failed to find generation for $SNAPSHOT_OLD"

# List files in the new snapshot modified since that generation, drop the
# trailing transid marker line, and keep only the path column.
btrfs subvolume find-new "$SNAPSHOT_NEW" "$OLD_TRANSID" | sed '$d' | cut -f17- -d' ' | sort | uniq
-
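
For example (made-up snapshot paths, just to show the invocation):

./btrfs-diff /mnt/btrfs_pool/root_snapshot.old /mnt/btrfs_pool/root_snapshot.new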

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: How does Suse do live filesystem revert with btrfs?

2014-05-07 Thread Duncan
Marc MERLIN posted on Wed, 07 May 2014 01:56:12 -0700 as excerpted:

 On Tue, May 06, 2014 at 04:26:48PM +, Duncan wrote:
 Marc MERLIN posted on Sun, 04 May 2014 22:04:59 -0700 as excerpted:
 
  
  Aaah, right, you can use a script to see the file differences between
  two snapshots, and then restore that with reflink if you can truly
  get a list of all changed files.
  However, that is indeed not atomic at all, even if faster than rsync.
 
 Would send/receive help in such a script?
 
 Not really, you still end up with a new snapshot that you can't live
 switch to.
 
 It's really either 1) reboot 2) use cp --reflink to copy a list of
 changed files (as well as rm to delete the ones that were removed).

What I meant was... use send/receive locally, in place of the
cp --reflink.

But now that I think of it, that wouldn't work, at least in the normal 
sense, since send is like diff and receive is like patch, but what would 
be needed here is an option similar to patch --reverse.  With something 
like that, you could (in theory; in practice it'd be racy if other 
running apps were writing to it too) revert the live subvolume to the 
state of the snapshot.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: How does Suse do live filesystem revert with btrfs?

2014-05-07 Thread Marc MERLIN
On Wed, May 07, 2014 at 11:35:52AM +, Duncan wrote:
 Marc MERLIN posted on Wed, 07 May 2014 01:56:12 -0700 as excerpted:
 
  On Tue, May 06, 2014 at 04:26:48PM +, Duncan wrote:
  Marc MERLIN posted on Sun, 04 May 2014 22:04:59 -0700 as excerpted:
  
   
   Aaah, right, you can use a script to see the file differences between
   two snapshots, and then restore that with reflink if you can truly
   get a list of all changed files.
   However, that is indeed not atomic at all, even if faster than rsync.
  
  Would send/receive help in such a script?
  
  Not really, you still end up with a new snapshot that you can't live
  switch to.
  
  It's really either 1) reboot 2) use cp --reflink to copy a list of
  changed files (as well as rm to delete the ones that were removed).
 
 What I meant was... use send/receive locally, in place of the
 cp --reflink.

This won't work, since receive can only create another read-only subvolume;
it can't apply the changes to the live one.

But you could use btrfs send -p to get a list of changes between 2
snapshots, decode that (without btrfs receive) just to spit out the
names of the files that changed or got deleted.
It would be wasteful since it would cause all the changed blocks to be
read on the source, but still better than nothing.

Really, we'd just need a btrfs send --dry-run -v -p vol1 vol2
which would spit out a list of the file ops it would do.

That'd be enough to simply grep out the deletes, do them locally and
then use cp --reflink on everything else.
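
Something close to that can be approximated by dumping the send stream
instead of receiving it (a sketch; it assumes a btrfs-progs new enough to
have btrfs receive --dump, which may postdate this thread, and the exact
operation names in the output may vary):

# list the operations needed to turn the old snapshot into the new one,
# without ever applying them
btrfs send -p /mnt/snap.old /mnt/snap.new | btrfs receive --dump > ops.txt

# deletions show up as unlink/rmdir operations; everything else is a
# candidate for cp --reflink back from the snapshot
grep -E '^(unlink|rmdir)' ops.txt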

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: How does Suse do live filesystem revert with btrfs?

2014-05-07 Thread Goffredo Baroncelli
On 05/07/2014 01:39 PM, Marc MERLIN wrote:
 On Wed, May 07, 2014 at 11:35:52AM +, Duncan wrote:
 Marc MERLIN posted on Wed, 07 May 2014 01:56:12 -0700 as excerpted:

 On Tue, May 06, 2014 at 04:26:48PM +, Duncan wrote:
 Marc MERLIN posted on Sun, 04 May 2014 22:04:59 -0700 as excerpted:


 Aaah, right, you can use a script to see the file differences between
 two snapshots, and then restore that with reflink if you can truly
 get a list of all changed files.
 However, that is indeed not atomic at all, even if faster than rsync.

 Would send/receive help in such a script?

 Not really, you still end up with a new snapshot that you can't live
 switch to.

 It's really either 1) reboot 2) use cp --reflink to copy a list of
 changed files (as well as rm to delete the ones that were removed).

 What I meant was... use send/receive locally, in place of the
 cp --reflink.
 
 This won't work since it can only work on another read-only subvolume.
 
 But you could use btrfs send -p to get a list of changes between 2
 snapshots, decode that (without btrfs receive) just to spit out the
 names of the files that changed or got deleted.
 It would be wasteful since it would cause all the changed blocks to be
 read on the source, but still better than nothing.
 
 Really, we'd just need a btrfs send --dry-run -v -p vol1 vol2
 which would spit out a list of the file ops it would do.
 
 That'd be enough to simply grep out the deletes, do them locally and
 then use cp --reflink on everything else.

What happens to files that are already open?  I suppose a process which has 
already opened a file would keep seeing the old one, while a new open would 
see the new one.
If that is acceptable, why not do a mount --bind /snapshot /, or use 
pivot_root(2), or an overlay filesystem?
Maybe we would also need to move the other already-mounted filesystems (like 
/proc, /sys)...



 
 Marc
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: How does Suse do live filesystem revert with btrfs?

2014-05-06 Thread Duncan
Marc MERLIN posted on Sat, 03 May 2014 17:52:57 -0700 as excerpted:

 (more questions I'm asking myself while writing my talk slides)
 
 I know Suse uses btrfs to roll back filesystem changes.
 
 So I understand how you can take a snapshot before making a change, but
 not how you revert to that snapshot without rebooting or using rsync,
 
 How do you do a pivot-root like mountpoint swap to an older snapshot,
 especially if you have filehandles opened on the current snapshot?
 
 Is that what Suse manages, or are they doing something simpler?

While I don't have any OpenSuSE specific knowledge on this, I strongly 
suspect their solution is more along the select-the-root-snapshot-to-roll-
back-to-from-the-initramfs/initrd line.

Consider: they take the snapshot, then do the upgrade.  The in-use files 
won't be entirely removed, and the upgrade won't actually take effect for 
the running apps, until a reboot or at least an application restart[1] 
frees those in-use files anyway.  At that point, if the user finds 
something broke, they've just rebooted[1], so rebooting[1] again to 
select the pre-upgrade rootfs snapshot won't be too big a deal: they've 
already disrupted their normal high level session and attempted a reload 
in order to discover the breakage in the first place.

IOW, for the rootfs and main system, anyway, the rollback technology is a 
great step up from not having that snapshot to roll back to in the first 
place, but it's /not/ /magic/; if a rollback is needed, they almost 
certainly will need to reboot[1] and from there select the rootfs 
snapshot to roll back to, in order to mount it and accomplish that 
rollback.

---
[1] Reboot:  Or possibly a drop to single-user mode, and/or to the 
initramfs, which they'd need to reload and switch-root into for the 
purpose.  Systemd is doing just that sort of thing these days, in order 
to properly unmount the rootfs after upgrades before shutdown (a step 
safer than the old-style remount read-only), and implementing a snapshot 
selector and remount of the rootfs in that initr* instead of dropping 
all the way to a full reboot is only a small step from there.
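
For what it's worth, selecting the pre-upgrade snapshot at that point is 
just a mount option (a sketch; the device and snapshot path are example 
values):

# from the initramfs or a rescue shell, mount the chosen snapshot as the new root
mount -o subvol=.snapshots/pre-update /dev/sda2 /sysroot
# or have the bootloader pass the same thing on the kernel command line:
#   rootflags=subvol=.snapshots/pre-update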

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: How does Suse do live filesystem revert with btrfs?

2014-05-06 Thread Duncan
Marc MERLIN posted on Sun, 04 May 2014 22:04:59 -0700 as excerpted:

 On Mon, May 05, 2014 at 01:36:39AM +0100, Hugo Mills wrote:
I'm guessing it involves reflink copies of files from the snapshot
 back to the original, and then restarting affected services. That's
 about the only other thing that I can think of, but it's got loads of
 race conditions in it (albeit difficult to hit in most cases, I
 suspect).
 
 Aaah, right, you can use a script to see the file differences between
 two snapshots, and then restore that with reflink if you can truly get a
 list of all changed files.
 However, that is indeed not atomic at all, even if faster than rsync.

Would send/receive help in such a script?

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: How does Suse do live filesystem revert with btrfs?

2014-05-05 Thread Marc MERLIN
On Sun, May 04, 2014 at 09:23:12PM -0600, Chris Murphy wrote:
 
 On May 4, 2014, at 5:26 PM, Marc MERLIN m...@merlins.org wrote:
 
  Actually, never mind Suse, does someone know whether you can revert to
  an older snapshot in place?
 
 They are using snapper. Updates are not atomic, that is they
 are applied to the currently mounted fs, not the snapshot, and
 after update the system is rebooted using the same (now updated)
 subvolumes. The rollback I think creates another snapshot and an
 earlier snapshot is moved into place because they are using the top
 level (subvolume id 5) for rootfs.

Ok. If they are rebooting, then it's easy, I know how to do this myself
:)
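
With a reboot in the picture it amounts to: find the snapshot's subvolume id, 
make it the default, and reboot (a sketch; the id is an example, and this only 
takes effect if the rootfs isn't mounted with an explicit subvol= option):

btrfs subvolume list /mnt/btrfs_pool              # note the snapshot's id
btrfs subvolume set-default 257 /mnt/btrfs_pool   # 257 = example id
reboot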
 
 Production baremetal systems need well tested and safe update
 strategies that avoid update related problems, so that rollbacks
 aren't even necessary. Or such systems can tolerate rebooting.

I wasn't worried about rollbacks as much as doing a btrfs send to a new
snapshot and then atomically switching to it without rebooting.
I work at google where we do file level OS upgrades (not on btrfs since
this was designed over 10 years ago), and I was kind of curious how I
could re-implement that with btrfs send/receive.
While this is off topic here, if you're curious about our update system:
http://marc.merlins.org/linux/talks/ProdNG-LISA/html/
or
http://marc.merlins.org/linux/talks/ProdNG-LISA/Paper/ for the detailed
paper.

 If the use case considers rebooting a big problem, then either a
 heavy weight virtual machine should be used, or something lighter
 weight like LXC containers. systemd-nspawn containers I think are
 still not considered for production use, but for testing and proof of
 concept you could see if it can boot arbitrary subvolumes - I think
 it can. And they boot really fast, like maybe a few seconds fast. For
 user space applications needing rollbacks, that's where application
 containers come in handy - you could either have two applications
 icons available (current and previous) and if on Btrfs the previous
 version could be a reflink copy.
 
1) containers/VMs and boots (even if fast) were not something I wanted
to involve in my design, but your point is well taken that in my cases
they work fine.
2) reflink seems like the way to update an existing volume with data
from another one you just btrfs received on, but can't atomically mount.


 Maybe there's some way to quit everything but the kernel and PID 1
 switching back to an initrd, and then at switch root time, use a new
 root with all new daemons and libraries. It'd be faster than a warm
 reboot. It probably takes a special initrd to do this. The other thing
 you can consider is kexec, but then going forward realize this isn't
 compatible with a UEFI Secure Boot world.

Secure boot is not a problem for me :)
But yes, kexec is basically my fallback for something better than a full
boot.

 Well I think the bigger issue with system updates is the fact they're not 
 atomic right now. The running system has a bunch of libraries yanked out from 
 under it during the update process, things are either partially updated, or 
 wholly replaced, and it's just a matter of time before something up in user 
 space really doesn't like that. This was a major motivation for offline 
 updates in gnome, so certain updates require reboot/poweroff.

Gnome and ubuntu are lazy :)
(but seriously, they are)

We've been doing almost atomic live system upgrades at google for
about 12 years. It's not trivial, but it's very possible.
Mind you, when you have something with a spaghetti library dependency like
gnome, that sure doesn't help, but one could argue that gnome is
part of the problem :)
 
 To take advantage of Btrfs (and LVM thinp snapshots for that matter) what we
 ought to do is take a snapshot of rootfs and update the snapshot in a chroot
 or a container. And then the user can reboot whenever it's convenient for them,
 and instead of a much, much longer reboot as the updates are applied, they get
 a normal boot. Plus there could be some metric to test for whether the update
 process was even successful, or likely to result in an unbootable system; and
 at that point the snapshot could just be obliterated and the reasons logged.

While this is not as good as the update system I'm currently working
with at work, I agree it's decent and simple way to do things.

 Look at how Fedora already does this. The file system at the top level
 of a Btrfs volume is not FHS. It's its own thing, and only via fstab
 do the subvolumes at the top level get mounted in accordance with
 the FHS. So that means you get to look at fstab to figure out how a
 system is put together when troubleshooting it, if you're not already
 familiar with the layout. Will every distribution end up doing their
 own thing? Almost certainly yes, SUSE does it differently still as a
 consequence of installing the whole OS to the top level, making every
 snapshot navigable from the always mounted top level. *shrug*

Right. Brave new world.

Re: How does Suse do live filesystem revert with btrfs?

2014-05-04 Thread Marc MERLIN
Actually, never mind Suse, does someone know whether you can revert to
an older snapshot in place?
The only way I can think of is to mount the snapshot on top of the other
filesystem. This gets around the umounting a filesystem with open
filehandles problem, but this also means that you have to keep track of
daemons that are still accessing filehandles on the overlayed
filesystem.
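
For a subvolume mounted somewhere below / the mount-on-top trick is easy to 
express (a sketch; the device, subvolume name and service are example values):

# /usr is assumed here to be its own subvolume, mounted from the pool
mount -o subvol=usr_snapshot.pre-update /dev/sda2 /usr
# new opens now resolve in the snapshot; daemons that already had files open
# keep the old copies until they are restarted
systemctl restart some-service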

My one concern with this approach is that you can't free up the
subvolume/snapshot of the underlying filesystem if it's mounted and even
after you free up filehandles pointing to it, I don't think you can
umount it.

In other words, you can play this trick to delay a reboot a bit, but
ultimately you'll have to reboot to free up the mountpoints, old
subvolumes, and be able to delete them.
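
Tracking down what still pins the old tree is at least scriptable (a sketch; 
the mount point is an example):

# processes still holding files open on the over-mounted (old) filesystem
fuser -vm /mnt/old_root
# or the lsof equivalent
lsof /mnt/old_root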

Somehow I'm thinking Suse came up with a better method.

Even if you don't know Suse, can you think of a better way to do this?

Thanks,
Marc

On Sat, May 03, 2014 at 05:52:57PM -0700, Marc MERLIN wrote:
 (more questions I'm asking myself while writing my talk slides)
 
 I know Suse uses btrfs to roll back filesystem changes.
 
 So I understand how you can take a snapshot before making a change, but
 not how you revert to that snapshot without rebooting or using rsync,
 
 How do you do a pivot-root like mountpoint swap to an older snapshot,
 especially if you have filehandles opened on the current snapshot?
 
 Is that what Suse manages, or are they doing something simpler?
 
 Thanks,
 Marc

-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: How does Suse do live filesystem revert with btrfs?

2014-05-04 Thread Hugo Mills
On Sun, May 04, 2014 at 04:26:45PM -0700, Marc MERLIN wrote:
 Actually, never mind Suse, does someone know whether you can revert to
 an older snapshot in place?

   Not while the system's running useful services, no.

 The only way I can think of is to mount the snapshot on top of the other
 filesystem. This gets around the umounting a filesystem with open
 filehandles problem, but this also means that you have to keep track of
 daemons that are still accessing filehandles on the overlayed
 filesystem.

   You have a good handle on the problems.

 My one concern with this approach is that you can't free up the
 subvolume/snapshot of the underlying filesystem if it's mounted and even
 after you free up filehandles pointing to it, I don't think you can
 umount it.
 
 In other words, you can play this trick to delay a reboot a bit, but
 ultimately you'll have to reboot to free up the mountpoints, old
 subvolumes, and be able to delete them.

   Yup.

 Somehow I'm thinking Suse came up with a better method.

   I'm guessing it involves reflink copies of files from the snapshot
back to the original, and then restarting affected services. That's
about the only other thing that I can think of, but it's got loads of
race conditions in it (albeit difficult to hit in most cases, I
suspect).

   Hugo.

 Even if you don't know Suse, can you think of a better way to do this?
 
 Thanks,
 Marc
 
 On Sat, May 03, 2014 at 05:52:57PM -0700, Marc MERLIN wrote:
  (more questions I'm asking myself while writing my talk slides)
  
  I know Suse uses btrfs to roll back filesystem changes.
  
  So I understand how you can take a snapshot before making a change, but
  not how you revert to that snapshot without rebooting or using rsync,
  
  How do you do a pivot-root like mountpoint swap to an older snapshot,
  especially if you have filehandles opened on the current snapshot?
  
  Is that what Suse manages, or are they doing something simpler?
  
  Thanks,
  Marc
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- That's not rain,  that's a lake with slots in it. ---




Re: How does Suse do live filesystem revert with btrfs?

2014-05-04 Thread Chris Murphy

On May 4, 2014, at 5:26 PM, Marc MERLIN m...@merlins.org wrote:

 Actually, never mind Suse, does someone know whether you can revert to
 an older snapshot in place?

They are using snapper. Updates are not atomic, that is they are applied to the 
currently mounted fs, not the snapshot, and after update the system is rebooted 
using the same (now updated) subvolumes. The rollback I think creates another 
snapshot and an earlier snapshot is moved into place because they are using the 
top level (subvolume id 5) for rootfs.

 The only way I can think of is to mount the snapshot on top of the other
 filesystem. This gets around the umounting a filesystem with open
 filehandles problem, but this also means that you have to keep track of
 daemons that are still accessing filehandles on the overlayed
 filesystem.

Production baremetal systems need well tested and safe update strategies that 
avoid update related problems, so that rollbacks aren't even necessary. Or such 
systems can tolerate rebooting.

If the use case considers rebooting a big problem, then either a heavy weight 
virtual machine should be used, or something lighter weight like LXC 
containers. systemd-nspawn containers I think are still not considered for 
production use, but for testing and proof of concept you could see if it can 
boot arbitrary subvolumes - I think it can. And they boot really fast, like 
maybe a few seconds fast. For user space applications needing rollbacks, that's 
where application containers come in handy - you could either have two 
applications icons available (current and previous) and if on Btrfs the 
previous version could be a reflink copy.

Maybe there's some way to quit everything but the kernel and PID 1, switching 
back to an initrd, and then at switch root time, use a new root with all new 
daemons and libraries. It'd be faster than a warm reboot. It probably takes a 
special initrd to do this. The other thing you can consider is kexec, but then 
going forward realize this isn't compatible with a UEFI Secure Boot world.
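
The kexec variant looks roughly like this (a sketch; kernel and initrd paths 
are examples):

# stage the running kernel and initrd, reusing the current command line
kexec -l /boot/vmlinuz --initrd=/boot/initrd.img --reuse-cmdline
# then either jump straight into it, skipping firmware and the bootloader...
kexec -e
# ...or let systemd take services down cleanly first
systemctl kexec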

 
 My one concern with this approach is that you can't free up the
 subvolume/snapshot of the underlying filesystem if it's mounted and even
 after you free up filehandles pointing to it, I don't think you can
 umount it.
 
 In other words, you can play this trick to delay a reboot a bit, but
 ultimately you'll have to reboot to free up the mountpoints, old
 subvolumes, and be able to delete them.

Well I think the bigger issue with system updates is the fact they're not 
atomic right now. The running system has a bunch of libraries yanked out from 
under it during the update process, things are either partially updated, or 
wholly replaced, and it's just a matter of time before something up in user 
space really doesn't like that. This was a major motivation for offline updates 
in gnome, so certain updates require reboot/poweroff.

To take advantage of Btrfs (and LVM thinp snapshots for that matter) what we 
ought to do is take a snapshot of rootfs and update the snapshot in a chroot or 
a container. And then the user can reboot whenever it's convenient for them, and 
instead of a much, much longer reboot as the updates are applied, they get a 
normal boot. Plus there could be some metric to test for whether the update 
process was even successful, or likely to result in an unbootable system; and 
at that point the snapshot could just be obliterated and the reasons logged.
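
As a sketch (paths, ids and the package manager call are examples, and the 
details would be distro-specific):

# snapshot the running root and apply the update to the copy, not the live fs
btrfs subvolume snapshot / /.snapshots/next
for d in dev proc sys; do mount --bind /$d /.snapshots/next/$d; done
chroot /.snapshots/next zypper -n update      # or yum/apt, as appropriate
# if the result looks bootable, point the next boot at the new snapshot
# (the id comes from 'btrfs subvolume list /'); otherwise delete it and log why
btrfs subvolume set-default 260 /             # 260 = example id
# btrfs subvolume delete /.snapshots/next     # the bail-out path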

Of course this update-the-snapshot idea poses some problems with the FHS, 
because there are things in /var that the current system needs to continue to 
write to, and yet so does the new system, and they shouldn't necessarily be 
separate, e.g. logs. /usr is a given, /boot is a given, and then /home should 
be dealt with differently, because we probably shouldn't ever have rollbacks of 
/home but rather retrieval of deleted files from a snapshot into the current 
/home using reflink. So we either need some FHS re-evaluation with atomic 
system updates and system rollbacks in mind, or we end up needing a lot of 
subvolumes to carve out the necessary snapshotting/rollback granularity. 
And that makes for a less well understood system: how it functions, how to 
troubleshoot it, etc. So I'm more in favor of changes to the FHS.

Look at how Fedora already does this. The file system at the top level of a 
Btrfs volume is not FHS. It's its own thing, and only via fstab do the 
subvolumes at the top level get mounted in accordance with the FHS. So that 
means you get to look at fstab to figure out how a system is put together when 
troubleshooting it, if you're not already familiar with the layout. Will every 
distribution end up doing their own thing? Almost certainly yes, SUSE does it 
differently still as a consequence of installing the whole OS to the top level, 
making every snapshot navigable from the always mounted top level. *shrug*
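
For illustration, that fstab mapping looks something like this (the UUID and 
subvolume names are made-up examples of the Fedora-style layout):

UUID=0123-example  /      btrfs  subvol=root  0 0
UUID=0123-example  /home  btrfs  subvol=home  0 0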

Chris Murphy
