Re: Explaining snapshots (for backup)

2022-11-16 Thread Curt
On 2022-11-16, David Christensen  wrote:
>
> Of course, a reverse operation is desirable -- e.g. "restore".

Very, very desirable.



Re: Explaining snapshots (for backup)

2022-11-15 Thread David Christensen

On 11/15/22 10:32, to...@tuxteam.de wrote:


[1] https://btrfs.wiki.kernel.org/images-btrfs/6/68/Btree_TOS.pdf



Thank you for the link.


Thank you IBM for making that paper public (and not behind a pay wall).


David




Re: Explaining snapshots (for backup)

2022-11-15 Thread David Christensen

On 11/15/22 06:27, rhkra...@gmail.com wrote:

> I'm not really clear on the concept of a snapshot (for backup) -- I've done a
> little googling but haven't found an explanation that "satisfies" me.
>
> In this email I want to hypothesize on what a snapshot might be in the hope
> that others can correct / amplify it when I go wrong.
>
> Starting from a beginning, I suppose I could copy the entire contents of
> whatever I wanted to make a snapshot of (by any of a variety of tools -- dd,
> cp, ...) and call that a snapshot, although the more common name for it would
> be a "full backup".
>
> Later, I could copy any files (as files) that have changed since the date I made
> the full backup and call that a snapshot.  (I might use find to identify the
> files that have changed, and then any of a variety of tools to actually copy
> them somewhere (and call that a snapshot).)
>
> I suppose it would be possible to do something like this at a block level,
> that is copying only blocks of a file which have changed.  (Not sure which
> commands might help me do that, but I don't think I'd really be interested in
> doing that.)
>
> I also hear (i.e., read) statements from which I infer that some snapshots
> included only the metadata of the files (or blocks???), but I'm not sure of the
> value of either of those to me -- how can you reconstruct a possibly missing
> file (or block) from the metadata?



In the context of computers, "backup" has many meanings for myself:

1.  Copying files and directories from one filesystem to another 
filesystem, using tools such as cp(1), scp(1), and rsync(1).  Both 
filesystems are mounted.


2.  Copying files and directories to a database or file structure, using 
tools such as amanda(8) and borg(1).  The source filesystem is mounted 
and the destination may be files and directories on a mounted filesystem.


3.  Serializing files and directories to a file, using tools such as 
cpio(1) and tar(1).  (I tend to use the term "archive".)  The source 
filesystem is mounted and the destination file is on a mounted filesystem.


4.  Serializing a filesystem to a file, using tools such as dump(8). (I 
tend to use the term "dump".)  The source filesystem is not mounted and 
the destination file is on a mounted filesystem.


5.  Storage media used to store "backups", created by whatever means; 
mounted or unmounted.


6. Secondary networks, hardware, software, data, etc. ("resources") that 
can be used in place of primary resources.



I expect there are many more meanings for "backup"; both those that I 
have forgotten and meanings known to other people.



Of course, a reverse operation is desirable -- e.g. "restore".


For myself, "snapshot" is a feature of advanced filesystems and volume 
managers, such as btrfs(8), zfs(8), and lvm(8).  My understanding is 
that snapshots are internal to the filesystem or volume manager. 
Additional operations include "roll back", "send", "receive", 
"replicate", and "destroy".  Of course, snapshots can be backed up, to 
backup media or servers, etc.



Related terms include "image" and "clone", which operate at the device 
(block) level using tools such as dd(1).  Operations include copying one 
device to another device ("cloning"), serializing a device to a file 
("taking an image"), and deserializing a file to a device ("restoring an 
image").   Sophisticated tools such as clonezilla 
(https://clonezilla.org/) understand and can perform partitioning, 
volume, filesystem, bootloader, and other operations.



To further add confusion, people can and do use the above terms both as 
verbs and as nouns, in multiple contexts, in the same sentence.



David



Re: Explaining snapshots (for backup)

2022-11-15 Thread tomas
On Tue, Nov 15, 2022 at 11:10:16AM -0500, rhkra...@gmail.com wrote:
> Intentionally top-posting;  Thanks to all who replied, I think I have a pretty
> good understanding and I think the biggest thing I was missing was how a file
> could be reconstructed using only the metadata, but I now see the explanation
> that filesystems that can do snapshots are COW and now things make sense.

The underlying mechanism for all that is, most of the time, some variant
of a B-tree, where a change builds a modified copy of the tree from the
affected leaves up to the root. The new root is just the new file system's
state. If you want to hold onto the old one (aka snapshot), you hold onto
the old root.
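
The "hold onto the old root" idea can be sketched in a few lines. This is only an illustration under loud assumptions: it uses a plain binary search tree rather than a real B-tree, and the names Node, insert and lookup are made up for the example, not taken from btrfs or any other filesystem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; a stand-in for an on-disk tree block."""
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: str, value: str) -> Node:
    """Return a new root. Only the nodes on the path to `key` are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: str) -> Optional[str]:
    """Walk down from `root` and return the value stored for `key`, if any."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


root = None
for name, data in [("m.txt", "v1"), ("a.txt", "v1"), ("z.txt", "v1")]:
    root = insert(root, name, data)

snapshot = root                      # taking a snapshot: keep the old root
root = insert(root, "a.txt", "v2")   # new root; the old tree is untouched

print(lookup(root, "a.txt"))         # v2
print(lookup(snapshot, "a.txt"))     # v1
print(root.right is snapshot.right)  # True: the unchanged subtree is shared
```

Nothing is copied when the snapshot is taken; the cost only shows up later, as old nodes that can no longer be thrown away.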

This is an idea which originated in the database world (called MVCC, Multi
Version Concurrency Control); Postgres (PostgreSQL's grandma) was one of
the pioneers in that area, AFAIK. Back then you could do "time travel",
rolling back the whole database. Needless to say, it ate storage for
breakfast. An expensive breakfast, back then ;-)

Here's a paper [1] on how btrfs does that (it has some pretty pics :).
I'd expect the other snapshot capable file systems to work similarly.

So a snapshot itself is extremely fast and cheap -- just holding on to many
of them will exhaust your disk at some point.

Cheers

[1] https://btrfs.wiki.kernel.org/images-btrfs/6/68/Btree_TOS.pdf
-- 
t




Re: Explaining snapshots (for backup)

2022-11-15 Thread Dan Ritter
pa...@quillandmouse.com wrote: 
> On Tue, 15 Nov 2022 09:40:11 -0500
> Dan Ritter  wrote:
> 
> > rhkra...@gmail.com wrote: 
> > > I'm not really clear on the concept of a snapshot (for backup) --
> > > I've done a little googling but haven't found an explanation that
> > > "satisfies" me.
> > > 
> > > Starting from a beginning, I suppose I could copy the entire
> > > contents of whatever I wanted to make a snapshot of (by any of a
> > > variety of tools -- dd, cp, ...) and call that a snapshot, although
> > > the more common name for it would be a "full backup".
> > 
> > Let's look at the larger circumstances.
> > 
> > In ordinary usage, there are tens to thousands of processes
> > running on your system. Some of them are emitting logs or
> > writing files.
> > 
> > Taking a backup takes some time. During that time, some files
> > get written, some get opened, and some are related to each other
> > (by the processes) in ways which are inconsistent until all of
> > them are written.
> > 
> > A snapshot differs from a backup in two important regards:
> > 
> > - first, it requires the filesystem to bring writes to a halt.
> > There is now a consistent view.
> > 
> > - second, it doesn't actually copy things. It just records their
> > state and, when done, allows future writes to continue -- writes
> > which are not part of this snapshot.
> > 
> > As a result, you can take a snapshot and then:
> > 
> > - discard it (trivial)
> > 
> > - look through it and copy off any file or group of files, thus
> > getting what they contained at the time of the snapshot, not
> > what they contain now (excellent for recovering from an
> > accidental delete)
> > 
> > - copy all of it off elsewhere, producing a consistent full
> > backup.
> > 
> 
> Assuming a snapshot is taken so that you can recover a filesystem to a
> previous state (or the current state). Is that correct?

Yes.

> I don't understand "recording the state" of files. To me, this means
> the ownership, size, etc., not the contents. That doesn't seem valuable
> for recovering the state of a system.

That's the metadata. A snapshot fixes the data as well as the
metadata. State is everything written to disk.

> Let's assume, as the OP says, you do an original full backup. A
> snapshot ought to record either the contents of all the files which
> have changed, or record the delta of each file which has changed.
> Thus, you'd be able to recover a filesystem to either some prior state
> or its current state, using the snapshot.
> 
> Am I missing something?

Well, you don't need to recover it to the current state, because
you already have that. But, yes, a snapshot from yesterday
allows you to copy from that snapshot any or all changed files
to the current filesystem.

-dsr-



Re: Explaining snapshots (for backup)

2022-11-15 Thread debian-user
> On Tue, 15 Nov 2022 09:40:11 -0500
> Dan Ritter  wrote:
> 
> > rhkra...@gmail.com wrote:   
> > > I'm not really clear on the concept of a snapshot (for backup) --
> > > I've done a little googling but haven't found an explanation that
> > > "satisfies" me.
> > > 
> > > Starting from a beginning, I suppose I could copy the entire
> > > contents of whatever I wanted to make a snapshot of (by any of a
> > > variety of tools -- dd, cp, ...) and call that a snapshot,
> > > although the more common name for it would be a "full backup".  
> > 
> > Let's look at the larger circumstances.
> > 
> > In ordinary usage, there are tens to thousands of processes
> > running on your system. Some of them are emitting logs or
> > writing files.
> > 
> > Taking a backup takes some time. During that time, some files
> > get written, some get opened, and some are related to each other
> > (by the processes) in ways which are inconsistent until all of
> > them are written.
> > 
> > A snapshot differs from a backup in two important regards:
> > 
> > - first, it requires the filesystem to bring writes to a halt.
> > There is now a consistent view.
> > 
> > - second, it doesn't actually copy things. It just records their
> > state and, when done, allows future writes to continue -- writes
> > which are not part of this snapshot.
> > 
> > As a result, you can take a snapshot and then:
> > 
> > - discard it (trivial)
> > 
> > - look through it and copy off any file or group of files, thus
> > getting what they contained at the time of the snapshot, not
> > what they contain now (excellent for recovering from an
> > accidental delete)
> > 
> > - copy all of it off elsewhere, producing a consistent full
> > backup.
> >   
> 
> Assuming a snapshot is taken so that you can recover a filesystem to a
> previous state (or the current state). Is that correct?
> 
> I don't understand "recording the state" of files. To me, this means
> the ownership, size, etc., not the contents. That doesn't seem
> valuable for recovering the state of a system.
> 
> Let's assume, as the OP says, you do an original full backup. A
> snapshot ought to record either the contents of all the files which
> have changed, or record the delta of each file which has changed.
> Thus, you'd be able to recover a filesystem to either some prior state
> or its current state, using the snapshot.
> 
> Am I missing something?

I think what you're missing is that 'snapshots' are usually an
attribute of a copy-on-write (COW) filesystem such as btrfs or zfs. So
a snapshot is a list of the blocks that belong to particular files at
some particular moment. Once the snapshot has been taken, the
filesystem can resume writes by copying any blocks that need to be
written subsequently and updating the current map of which blocks
belong to which files, but not touching the snapshot's map. And when
looking for empty space on the disk to use, it ignores any block that
appears in any map.
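
A toy model of that description, as a sketch only: the class ToyCowDisk and its methods are invented for illustration and do not correspond to btrfs or zfs internals. It keeps one current map of files to blocks, one frozen map per snapshot, and allocates only blocks that appear in no map.

```python
class ToyCowDisk:
    """Hypothetical copy-on-write block store with per-snapshot block maps."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks
        self.current = {}      # file name -> list of block numbers
        self.snapshots = {}    # snapshot name -> frozen copy of such a map

    def _in_use(self):
        """Every block number that appears in the current map or any snapshot map."""
        maps = [self.current, *self.snapshots.values()]
        return {b for m in maps for blks in m.values() for b in blks}

    def _allocate(self):
        """Pick the first block that no map refers to."""
        used = self._in_use()
        for i in range(len(self.blocks)):
            if i not in used:
                return i
        raise OSError("disk full")

    def write(self, name, index, data):
        """Write block `index` of file `name`; index == current length appends."""
        blks = list(self.current.get(name, []))
        new = self._allocate()          # never overwrite a block in place
        self.blocks[new] = data
        if index == len(blks):
            blks.append(new)
        else:
            blks[index] = new
        self.current[name] = blks       # only the current map is updated

    def snapshot(self, snap):
        """Freeze the current map; no data blocks are copied."""
        self.snapshots[snap] = {n: list(b) for n, b in self.current.items()}

    def read(self, name, snap=None):
        """Read a file from the current state or from a named snapshot."""
        m = self.snapshots[snap] if snap else self.current
        return [self.blocks[b] for b in m[name]]


disk = ToyCowDisk(4)
disk.write("f", 0, "old-0")
disk.write("f", 1, "old-1")
disk.snapshot("monday")
disk.write("f", 0, "new-0")

print(disk.read("f"))             # ['new-0', 'old-1']
print(disk.read("f", "monday"))   # ['old-0', 'old-1']
```

The snapshot is just the saved map, which is why the metadata alone is enough to get the old file contents back: the old blocks are still on disk because nothing was allowed to reuse them.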



Re: Explaining snapshots (for backup)

2022-11-15 Thread Joe
On Tue, 15 Nov 2022 09:27:05 -0500
rhkra...@gmail.com wrote:

> I'm not really clear on the concept of a snapshot (for backup) --
> I've done a little googling but haven't found an explanation that
> "satisfies" me.
> 
> In this email I want to hypothesize on what a snapshot might be in
> the hope that others can correct / amplify it when I go wrong.
> 
> Starting from a beginning, I suppose I could copy the entire contents
> of whatever I wanted to make a snapshot of (by any of a variety of
> tools -- dd, cp, ...) and call that a snapshot, although the more
> common name for it would be a "full backup".
> 
> Later, I could copy any files (as files) that have changed since the
> date I made the full backup and call that a snapshot.  (I might use
> find to identify the files that have changed, and then any of a
> variety of tools to actually copy them somewhere (and call that a
> snapshot).)
> 
> I suppose it would be possible to do something like this at a block
> level, that is copying only blocks of a file which have changed.
> (Not sure which commands might help me do that, but I don't think I'd
> really be interested in doing that.)
> 
> I also hear (i.e., read) statements from which I infer that some
> snapshots included only the metadata of the files (or blocks???), but
> I'm not sure of the value of either of those to me -- how can you
> reconstruct a possibly missing file (or block) from the metadata? 
> 
> Thanks!
> 

LVM can be used to make a snapshot, which can then be copied to make a
consistent backup. Here's one tutorial, there are others:

https://devconnected.com/lvm-snapshots-backup-and-restore-on-linux/

It's obviously easier if you already use LVM for your filesystems. You
need unallocated space, preferably as large as the entity you are
copying, though on a quiet system you can get away with less. If you
run out of space during the snapshot, it is worthless, and you need to
start again. You can create an LVM volume on an external drive.

Here are a couple of explanations of LVM snapshots, which are somewhat
similar to how the Shadow Copy function of Windows works:

https://www.thomas-krenn.com/en/wiki/LVM_Snapshots_Information

https://documentation.suse.com/sles/12-SP4/html/SLES-all/cha-lvm-snapshots.html

-- 
Joe
 



Re: Explaining snapshots (for backup)

2022-11-15 Thread rhkramer
Intentionally top-posting;  Thanks to all who replied, I think I have a pretty 
good understanding and I think the biggest thing I was missing was how a file 
could be reconstructed using only the metadata, but I now see the explanation 
that filesystems that can do snapshots are COW and now things make sense.

Thanks again to all who replied!

On Tuesday, November 15, 2022 10:36:59 AM Thomas Schmitt wrote:
> Nitpicking mode on.

-- 
rhk

If you reply: snip, snip, and snip again; leave attributions; avoid HTML; 
avoid top posting; and keep it "on list".  (Oxford comma included at no 
charge.)  If you change topics, change the Subject: line. 

Writing is often meant for others to read and understand (legal agreements 
excepted?) -- make it easier for your reader by various means, including 
liberal use of whitespace and minimal use of (obscure?) jargon, abbreviations, 
acronyms, and references.

If someone else has already responded to a question, decide whether any 
response you add will be helpful or not ...

A picture is worth a thousand words -- divide by 10 for each minute of video 
(or audio) or create a transcript and edit it to 10% of the original.



Re: Explaining snapshots (for backup)

2022-11-15 Thread Thomas Schmitt
Hi,

rhkra...@gmail.com wrote:
> In this email I want to hypothesize on what a snapshot might be in the hope
> that others can correct / amplify it when I go wrong.

Nitpicking mode on.


> I suppose I could copy the entire contents of
> whatever I wanted to make a snapshot of (by any of a variety of tools -- dd,
> cp, ...) and call that a snapshot,

This lacks the single-point-in-time aspect of a snapshot.

The copy process will last some time in which changes may occur which will
only partly be backed up, because some parts of the change did not yet
exist when their affected files were backed up.
A filesystem snapshot should reduce the risk for such inconsistencies by
freezing the filesystem in a state with no pending write operations.
(There still remains the risk that the filesystem gets frozen while e.g.
a large file is being written by multiple write operations. The snapshot
will then show the file as incomplete, as if you tried to read it too early.)
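
A small simulation of the single-point-in-time problem, as a sketch under stated assumptions: the "filesystem" is just a dict, the two account files and the transfer function are invented for the example, and dict(live) stands in for a snapshot even though a real snapshot would not copy the data.

```python
live = {"account_a": 70, "account_b": 30}   # invariant: the total is always 100


def transfer(fs, amount):
    """A related change to two files, as an application might make it."""
    fs["account_a"] -= amount
    fs["account_b"] += amount


def slow_copy(fs, between_files):
    """Copy files one at a time; `between_files` runs while the copy is underway."""
    backup = {}
    for name in sorted(fs):
        backup[name] = fs[name]
        between_files()
    return backup


frozen = dict(live)   # stand-in for a snapshot: one consistent point in time

# Copying the live data while transfers keep happening:
naive = slow_copy(live, lambda: transfer(live, 10))

# Copying from the frozen view while transfers keep happening:
from_snapshot = slow_copy(frozen, lambda: transfer(live, 10))

print(naive, sum(naive.values()))                  # total is 110: inconsistent
print(from_snapshot, sum(from_snapshot.values()))  # total is 100: consistent
```

The naive copy picked up account_a before a transfer and account_b after it, so it describes a state that never existed.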

The freezing of the overall filesystem will normally not last long, because
a common aspect of most snapshot concepts is copy-on-write. I.e. the data
blocks get marked read-only. When a write operation is desired, then the
affected old block loses its active job to a copy which is created in
a previously unused block. The write operation then happens on the new
block, while the old block remains valid only in the snapshot.


> I also hear (i.e., read) statements from which I infer that some snapshots
> included only the metadata of the files (or blocks???), but I'm not sure of
> the value of either of those to me -- how can you reconstruct a possibly
> missing file (or block) from the metadata?

I think these statements refer to the copy-on-write concept:
Mark as read-only now, copy later.
For such a mark, you need some kind of block management system, which of
course needs to maintain data about the data blocks, i.e. metadata.


pa...@quillandmouse.com wrote:
> Assuming a snapshot is taken so that you can recover a filesystem to a
> previous state (or the current state). Is that correct?

Not necessarily. In backup scenarios, which involve real copying, the
snapshot shall reduce the risk for consistency mishaps.


> Let's assume, as the OP says, you do an original full backup. A
> snapshot ought to record either the contents of all the files which
> have changed, or record the delta of each file which has changed.

No. That's "incremental" or "differential" backup.
  https://en.wikipedia.org/wiki/Incremental_backup
  https://en.wikipedia.org/wiki/Differential_backup

Snapshots can be used to emulate such backups. (I'd frown on a backup
concept, though, which does not store copies at some other storage medium.)
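
The difference between the two can be sketched as a choice of reference time. This is a minimal illustration, assuming modification times are trustworthy; the function names changed_since and files_to_back_up are hypothetical, and the approach does not notice deleted files.

```python
import os


def changed_since(top, since):
    """Return paths under `top` whose modification time is later than `since`."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime > since:
                changed.append(path)
    return sorted(changed)


def files_to_back_up(top, last_full, last_backup, mode):
    """Select files for an incremental or a differential backup.

    incremental:  everything changed since the most recent backup of any kind
    differential: everything changed since the last full backup
    Both times are seconds since the epoch, as returned by time.time().
    """
    if mode == "incremental":
        return changed_since(top, last_backup)
    if mode == "differential":
        return changed_since(top, last_full)
    raise ValueError("mode must be 'incremental' or 'differential'")
```

With a full backup on Sunday and incrementals every night, Wednesday's incremental holds only Wednesday's changes, while Wednesday's differential holds everything changed since Sunday. Either way the selected files still have to be copied somewhere else, which is what makes it a backup rather than a snapshot.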


Have a nice day :)

Thomas



Re: Explaining snapshots (for backup)

2022-11-15 Thread paulf
On Tue, 15 Nov 2022 09:40:11 -0500
Dan Ritter  wrote:

> rhkra...@gmail.com wrote: 
> > I'm not really clear on the concept of a snapshot (for backup) --
> > I've done a little googling but haven't found an explanation that
> > "satisfies" me.
> > 
> > Starting from a beginning, I suppose I could copy the entire
> > contents of whatever I wanted to make a snapshot of (by any of a
> > variety of tools -- dd, cp, ...) and call that a snapshot, although
> > the more common name for it would be a "full backup".
> 
> Let's look at the larger circumstances.
> 
> In ordinary usage, there are tens to thousands of processes
> running on your system. Some of them are emitting logs or
> writing files.
> 
> Taking a backup takes some time. During that time, some files
> get written, some get opened, and some are related to each other
> (by the processes) in ways which are inconsistent until all of
> them are written.
> 
> A snapshot differs from a backup in two important regards:
> 
> - first, it requires the filesystem to bring writes to a halt.
> There is now a consistent view.
> 
> - second, it doesn't actually copy things. It just records their
> state and, when done, allows future writes to continue -- writes
> which are not part of this snapshot.
> 
> As a result, you can take a snapshot and then:
> 
> - discard it (trivial)
> 
> - look through it and copy off any file or group of files, thus
> getting what they contained at the time of the snapshot, not
> what they contain now (excellent for recovering from an
> accidental delete)
> 
> - copy all of it off elsewhere, producing a consistent full
> backup.
> 

Assuming a snapshot is taken so that you can recover a filesystem to a
previous state (or the current state). Is that correct?

I don't understand "recording the state" of files. To me, this means
the ownership, size, etc., not the contents. That doesn't seem valuable
for recovering the state of a system.

Let's assume, as the OP says, you do an original full backup. A
snapshot ought to record either the contents of all the files which
have changed, or record the delta of each file which has changed.
Thus, you'd be able to recover a filesystem to either some prior state
or its current state, using the snapshot.

Am I missing something?

Paul

-- 
Paul M. Foster
Personal Blog: http://noferblatz.com
Company Site: http://quillandmouse.com
Software Projects: https://gitlab.com/paulmfoster



Re: Explaining snapshots (for backup)

2022-11-15 Thread Dan Ritter
rhkra...@gmail.com wrote: 
> I'm not really clear on the concept of a snapshot (for backup) -- I've done a 
> little googling but haven't found an explanation that "satisfies" me.
> 
> Starting from a beginning, I suppose I could copy the entire contents of 
> whatever I wanted to make a snapshot of (by any of a variety of tools -- dd, 
> cp, ...) and call that a snapshot, although the more common name for it would 
> be a "full backup".

Let's look at the larger circumstances.

In ordinary usage, there are tens to thousands of processes
running on your system. Some of them are emitting logs or
writing files.

Taking a backup takes some time. During that time, some files
get written, some get opened, and some are related to each other
(by the processes) in ways which are inconsistent until all of
them are written.

A snapshot differs from a backup in two important regards:

- first, it requires the filesystem to bring writes to a halt.
There is now a consistent view.

- second, it doesn't actually copy things. It just records their
state and, when done, allows future writes to continue -- writes
which are not part of this snapshot.

As a result, you can take a snapshot and then:

- discard it (trivial)

- look through it and copy off any file or group of files, thus
getting what they contained at the time of the snapshot, not
what they contain now (excellent for recovering from an
accidental delete)

- copy all of it off elsewhere, producing a consistent full
backup.

Does that help?

-dsr-



Re: Explaining snapshots (for backup)

2022-11-15 Thread DdB
Am 15.11.2022 um 15:27 schrieb rhkra...@gmail.com:
> I'm not really clear on the concept of a snapshot (for backup) -- I've done a 
> little googling but haven't found an explanation that "satisfies" me.

I am familiar with snapshots on zfs. There might be different meanings
in other contexts, idk ...

zfs is a COW filesystem, meaning "copy-on-write". Whenever you write to
an already existing block (piece of a file), a new block gets created
(copied and modified) and on closing the write (or on sync), the
(metadata-) pointer/directory entry changes to point to the new block.
This algorithm primarily implements transactions on the filesystem to
guarantee that it has a consistent state at all times. But one side effect
is that the old filesystem state still hangs around. A snapshot
basically points to that outdated state of the filesystem, and keeping a
snapshot means preventing the reuse of those blocks until the snapshot
gets freed up again.
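
The "prevent the reuse of those blocks until the snapshot gets freed" part can be modelled with simple reference counts. This is a sketch under assumptions, not how zfs actually tracks blocks: the Pool class is hypothetical and each file is a single block for brevity.

```python
from collections import Counter


class Pool:
    """Hypothetical pool where a block stays allocated while anything points at it."""

    def __init__(self):
        self.refs = Counter()   # block id -> number of pointers to it
        self.live = {}          # file name -> block id (one block per file)
        self.snapshots = {}     # snapshot name -> {file name: block id}
        self._next = 0

    def _unref(self, block):
        """Drop one pointer; a block with no pointers left is free for reuse."""
        self.refs[block] -= 1
        if self.refs[block] == 0:
            del self.refs[block]

    def write(self, name):
        """Copy-on-write: fill a new block, then switch the pointer to it."""
        new = self._next
        self._next += 1
        self.refs[new] += 1
        old = self.live.get(name)
        self.live[name] = new
        if old is not None:
            self._unref(old)    # freed right away unless a snapshot holds it
        return new

    def snapshot(self, snap):
        """Remember the current pointers and pin the blocks they refer to."""
        self.snapshots[snap] = dict(self.live)
        for block in self.live.values():
            self.refs[block] += 1

    def destroy(self, snap):
        """Drop a snapshot and release the blocks only it was holding."""
        for block in self.snapshots.pop(snap).values():
            self._unref(block)

    def allocated(self):
        return sorted(self.refs)


pool = Pool()
pool.write("f")            # block 0
pool.snapshot("s1")
pool.write("f")            # block 1; block 0 is kept alive by the snapshot
print(pool.allocated())    # [0, 1]
pool.destroy("s1")
print(pool.allocated())    # [1]
```

This is also why holding on to many snapshots eventually fills the disk: every block that has since been overwritten stays allocated for as long as some snapshot still points at it.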

Does that help your understanding?



Explaining snapshots (for backup)

2022-11-15 Thread rhkramer
I'm not really clear on the concept of a snapshot (for backup) -- I've done a 
little googling but haven't found an explanation that "satisfies" me.

In this email I want to hypothesize on what a snapshot might be in the hope 
that others can correct / amplify it when I go wrong.

Starting from a beginning, I suppose I could copy the entire contents of 
whatever I wanted to make a snapshot of (by any of a variety of tools -- dd, 
cp, ...) and call that a snapshot, although the more common name for it would 
be a "full backup".

Later, I could copy any files (as files) that have changed since the date I
made the full backup and call that a snapshot.  (I might use find to identify
the files that have changed, and then any of a variety of tools to actually
copy them somewhere (and call that a snapshot).)

I suppose it would be possible to do something like this at a block level, 
that is copying only blocks of a file which have changed.  (Not sure which 
commands might help me do that, but I don't think I'd really be interested in 
doing that.)

I also hear (i.e., read) statements from which I infer that some snapshots 
included only the metadata of the files (or blocks???), but I'm not sure of the 
value of either of those to me -- how can you reconstruct a possibly missing 
file (or block) from the metadata? 

Thanks!

-- 
rhk
