On 08.02.2017 14:08 Austin S. Hemmelgarn wrote:
> On 2017-02-08 07:14, Martin Raiber wrote:
>> Hi,
>>
>> On 08.02.2017 03:11 Peter Zaitsev wrote:
>>> Out of curiosity, I see one problem here:
>>> If you're doing snapshots of the live database, each snapshot leaves
>>> the database files in the state they would be in if the database had
>>> been killed in-flight, like shutting the system down in the middle of
>>> writing data.
>>>
>>> This is because I think there's no API for user space to subscribe to
>>> events like a snapshot - unlike e.g. the VSS API (volume snapshot
>>> service) in Windows. You should put the database into frozen state to
>>> prepare it for a hotcopy before creating the snapshot, then ensure all
>>> data is flushed before continuing.
>>>
>>> I think I've read that btrfs snapshots do not guarantee single point in
>>> time snapshots - the snapshot may be smeared across a longer period of
>>> time while the kernel is still writing data. So parts of your writes
>>> may still end up in the snapshot after issuing the snapshot command,
>>> instead of in the working copy as expected.
>>>
>>> How is this going to be addressed? Is there some snapshot aware API to
>>> let user space subscribe to such events and do proper preparation? Is
>>> this planned? LVM could be a user of such an API, too. I think this
>>> could have nice enterprise-grade value for Linux.
>>>
>>> XFS has xfs_freeze and xfs_thaw for this, to prepare LVM snapshots. But
>>> still, also this needs to be integrated with MySQL to properly work. I
>>> once (years ago) researched on this but gave up on my plans when I
>>> planned database backups for our web server infrastructure. We moved to
>>> creating SQL dumps instead, although there're binlogs which can be used
>>> to recover to a clean and stable transactional state after taking
>>> snapshots. But I simply didn't want to fiddle around with properly
>>> cleaning up binlogs, which accumulate a horrible amount of space over
>>> time. The cleanup process requires creating a cold copy or dump of the
>>> complete database from time to time; only then is it safe to remove all
>>> binlogs up to that point in time.
>>
>> little bit off topic, but I for one would be on board with such an
>> effort. It "just" needs coordination between the backup
>> software/snapshot tools, the backed up software and the various snapshot
>> providers. If you look at the Windows VSS API, this would be a
>> relatively large undertaking if all the corner cases are taken into
>> account, like e.g. a database having the database log on a separate
>> volume from the data, dependencies between different components etc.
>>
>> You'll know more about this, but databases usually fsync quite often in
>> their default configuration, so btrfs snapshots shouldn't be much behind
>> the properly snapshotted state, so I see the advantages more with
>> usability and taking care of corner cases automatically.
> Just my perspective, but BTRFS (and XFS, and OCFS2) already provide
> reflinking to userspace, and therefore it's fully possible to
> implement this in userspace.  Having a version of the fsfreeze (the
> generic form of xfs_freeze) stuff that worked on individual sub-trees
> would be nice from a practical perspective, but implementing it would
> not be easy by any means, and would be essentially necessary for a
> VSS-like API.  In the meantime though, it is fully possible for the
> application software to implement this itself without needing anything
> more from the kernel.

VSS snapshots whole volumes, not individual files (so it is comparable to
an LVM snapshot). A sub-folder freeze would be useful in some situations,
but duplicating the files+extents might also take too long in a lot of
situations. You are correct that the kernel features are there; what is
missing is a user-space daemon, plus a protocol that
facilitates/coordinates the backups/snapshots.
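
For reference, the per-file duplication via reflink could look roughly
like the sketch below. This is only an illustration, not something any
existing tool does this way: the function name reflink_copy and the
injectable ioctl parameter are made up for the example, and the FICLONE
value is the encoding used on x86/ARM (other architectures may encode
ioctl numbers differently). It needs a filesystem with reflink support
and both files on the same filesystem; otherwise the ioctl fails with
OSError.

```python
import fcntl
import os

# FICLONE = _IOW(0x94, 9, int). Value as encoded on x86/ARM; assumed here,
# check linux/fs.h on other architectures.
FICLONE = 0x40049409


def reflink_copy(src, dst, ioctl=fcntl.ioctl):
    """Make dst share the extents of src instead of copying the data.

    Hypothetical helper for illustration. The ioctl parameter exists only
    so the call can be replaced in a dry run.
    """
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            # The clone is requested on the destination fd and takes the
            # source fd as its argument.
            ioctl(dst_fd, FICLONE, src_fd)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Each call clones one file, so a database with many files would still
need the application to hold writes while the loop over all files runs,
which is where the time concern above comes from.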

Sending a FIFREEZE ioctl, taking a snapshot and then thawing does not
really help in some situations, as e.g. MySQL InnoDB uses O_DIRECT and
manages its own buffer pool, which won't get flushed by the FIFREEZE.
But, as said, the default configuration is to flush/fsync on every commit.
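
To make the freeze/snapshot/thaw sequence concrete, here is a minimal
sketch. The function name freeze_snapshot_thaw, the take_snapshot
callback and the injectable ioctl parameter are hypothetical; the
FIFREEZE/FITHAW values are the x86/ARM encodings and are an assumption
for other architectures. The real ioctls need CAP_SYS_ADMIN.

```python
import fcntl
import os

# FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int).
# Values as encoded on x86/ARM; assumed here, check linux/fs.h elsewhere.
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


def freeze_snapshot_thaw(mountpoint, take_snapshot, ioctl=fcntl.ioctl):
    """Freeze the filesystem, run take_snapshot(), then always thaw.

    take_snapshot is a caller-supplied callable (hypothetical); it should
    snapshot at a layer below the frozen filesystem, e.g. the block device.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # Flushes dirty page cache and blocks new writes to the filesystem.
        ioctl(fd, FIFREEZE, 0)
        try:
            return take_snapshot()
        finally:
            # Thaw even if the snapshot step failed, so writers are not
            # left blocked.
            ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

Two caveats with this sketch: the freeze only flushes what the kernel
knows about, so data still sitting in an application's own buffers (the
InnoDB case above) is not covered, and I would expect a btrfs subvolume
snapshot issued on the frozen filesystem itself to block, since creating
it is a write to that filesystem.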


