Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-23 Thread Thomas Schmitt
Hi, hw wrote: > with CDs/DVDs, writing is not so easy. Thus it is not as easy to overwrite them by mistake. The complicated part of optical burning can be put into scripts. But i agree that modern HDD sizes cannot be easily covered by optical media. I wrote: > > [...] LTO tapes [...] hw

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-23 Thread hw
On Thu, 2022-11-10 at 15:32 +0100, Thomas Schmitt wrote: > Hi, > > i wrote: > > > the time window in which the backuped data > > > can become inconsistent on the application level. > > hw wrote: > > Or are you referring to the data being altered while a backup is in > > progress? > > Yes. Ah I

Re: definiing deduplication

2022-11-13 Thread Michael Stone
On Sat, Nov 12, 2022 at 01:39:56PM -0500, Stefan Monnier wrote: But as I mentioned, higher-layers (the filesystem layer, and the applications running on top of that) *should* try and make sure that a hard failure (kernel crash, power failure, ... these and up taking a snapshot of your block

Re: definiing deduplication

2022-11-12 Thread Stefan Monnier
> It took me a while to find out how the block layer can ensure that a > snapshot is consistent on the filesystem level. The answer is Linux VFS > method super_operations.freeze_fs(). > https://www.kernel.org/doc/html/latest/filesystems/vfs.html > Without it a snapshot on block level would be

Re: definiing deduplication

2022-11-12 Thread Thomas Schmitt
Hi, Stefan Monnier wrote: > Presumably the "backuper" is the sysadmin, i.e. the same (group of) > person who chose the filesystem, so I'd say yes the "backuper" is > to blame. I rather mean the whole complex of system maintainer, users, and backup software. But even if there is a qualified

Re: definiing deduplication

2022-11-11 Thread Stefan Monnier
>> Arguably this can be considered as a bug in the application (because >> a failure in the middle could thus result in an inconsistent state). > A backup programmer or operator does not necessarily have influence on > such applications. Indeed it remains a real problem, that can be solved only

Re: definiing deduplication

2022-11-11 Thread Thomas Schmitt
Hi, i wrote: > > Data of different files or at different places in the same file > > may have relations which may become inconsistent during change operations > > until the overall change is complete. Stefan Monnier wrote: > Arguably this can be considered as a bug in the application (because >

Re: definiing deduplication

2022-11-10 Thread Stefan Monnier
>> Or are you referring to the data being altered while a backup is in >> progress? > Yes. Data of different files or at different places in the same file > may have relations which may become inconsistent during change operations > until the overall change is complete. Arguably this can be

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-10 Thread Thomas Schmitt
Hi, i wrote: > > the time window in which the backuped data > > can become inconsistent on the application level. hw wrote: > Or are you referring to the data being altered while a backup is in > progress? Yes. Data of different files or at different places in the same file may have relations
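The cross-file inconsistency described here can be shown with a small simulation. This is a hypothetical sketch, not code from the thread. Two in-memory "files" (`data` and `count`) stand in for related application files, and the assumed invariant is that `count` equals the number of records in `data`. A backup that copies one file, lets the application change both, and only then copies the other ends up with a mix of old and new state. Copying both at a single instant, as a snapshot does, keeps the invariant.

```python
def consistent(state: dict) -> bool:
    """Hypothetical application invariant: the count file matches the data file."""
    return len(state["data"]) == state["count"]


def append_record(fs: dict) -> None:
    """Hypothetical application step that changes BOTH files."""
    fs["data"].append("record")
    fs["count"] += 1


def naive_backup(fs: dict, app_step) -> dict:
    """Copy the two 'files' one after the other while the application keeps running."""
    backup = {"data": list(fs["data"])}  # first file copied
    app_step(fs)                         # application modifies both files meanwhile
    backup["count"] = fs["count"]        # second file copied later
    return backup


def snapshot_backup(fs: dict, app_step) -> dict:
    """Freeze both 'files' at one instant, then let the application continue."""
    snap = {"data": list(fs["data"]), "count": fs["count"]}
    app_step(fs)                         # later changes do not affect the snapshot
    return snap


if __name__ == "__main__":
    print(consistent(naive_backup({"data": ["r1"], "count": 1}, append_record)))
    print(consistent(snapshot_backup({"data": ["r1"], "count": 1}, append_record)))
```

The live "filesystem" is consistent before and after the application step. Only the naive backup, which straddles the change, captures a state that never existed on disk.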

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-10 Thread hw
On Wed, 2022-11-09 at 12:08 +0100, Thomas Schmitt wrote: > Hi, > > i wrote: > > >   https://github.com/dm-vdo/kvdo/issues/18 > > hw wrote: > > So the VDO ppl say 4kB is a good block size > > They actually say that it's the only size which they support. > > > > Deduplication doesn't work when

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread David Christensen
On 11/9/22 03:08, Thomas Schmitt wrote: So i would use at least four independent storage facilities interchangeably. I would make snapshots, if the filesystem supports them, and backup those instead of the changeable filesystem. I would try to reduce the activity of applications on the

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread hw
On Wed, 2022-11-09 at 14:44 +0100, didier gaumet wrote: > On 09/11/2022 at 14:25, hw wrote: > > > I don't think it was, see https://docs.freebsd.org/en/books/handbook/zfs/ > > > > It does mention performance, but I remember other statements saying that it was > > designed for arrays with 40+ disks

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread didier gaumet
On 09/11/2022 at 14:25, hw wrote: I don't think it was, see https://docs.freebsd.org/en/books/handbook/zfs/ It does mention performance, but I remember other statements saying that it was designed for arrays with 40+ disks and, besides data integrity, with ease of use in mind. Performance

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread hw
On Wed, 2022-11-09 at 11:05 +0100, didier gaumet wrote: > On 09/11/2022 at 10:27, hw wrote: > [...] > > Yes, I've seen those. I can only wonder how much performance impact VDO > > would > > have for backups. And I wonder why it doesn't require as much memory as ZFS > > seems to need for

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread Thomas Schmitt
Hi, i wrote: > >   https://github.com/dm-vdo/kvdo/issues/18 hw wrote: > So the VDO ppl say 4kB is a good block size They actually say that it's the only size which they support. > Deduplication doesn't work when files aren't sufficiently identical, The definition of sufficiently identical
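What "sufficiently identical" means for a block-level deduplicator can be illustrated with a toy model. This is not VDO's code. It assumes fixed 4096-byte blocks (the size the linked issue says VDO supports) and uses SHA-256 digests as block fingerprints, which is an illustrative choice and not a claim about VDO's internal hashing. Identical data deduplicates completely, but inserting a single byte at the front shifts every block boundary, so essentially no block matches anymore.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # fixed block size assumed for this sketch


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count the blocks of b whose content already exists as a block of a."""
    known = set(block_digests(a))
    return sum(1 for d in block_digests(b) if d in known)


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(16 * BLOCK_SIZE))
    print(shared_blocks(original, bytes(original)))  # every block is a duplicate
    print(shared_blocks(original, b"X" + original))  # boundaries shifted by one byte
```

Tools that cut data at content-defined boundaries instead of fixed offsets are less sensitive to such shifts. A fixed-block device only sees two files as "identical" where their contents line up block for block.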

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread hw
On Tue, 2022-11-08 at 11:11 +0100, Thomas Schmitt wrote: > Hi, > > hw wrote: > > I still wonder how VDO actually works. > > There is a comparer/decider named UDS which holds an index of the valid > storage blocks, and a device driver named VDO which performs the > deduplication and hides its

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread didier gaumet
On 09/11/2022 at 10:27, hw wrote: [...] Yes, I've seen those. I can only wonder how much performance impact VDO would have for backups. And I wonder why it doesn't require as much memory as ZFS seems to need for deduplication. It's *only* a hypothesis, but I would suppose that ZFS was

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread hw
On Tue, 2022-11-08 at 10:04 +0100, didier gaumet wrote: > On 08/11/2022 at 05:13, hw wrote: > > On Mon, 2022-11-07 at 13:57 -0500, rhkra...@gmail.com wrote: > > > > > > > > > I didn't (and don't) know much about deduplication (beyond what you might > > > deduce from the name), so I googled and

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-08 Thread Thomas Schmitt
Hi, hw wrote: > I still wonder how VDO actually works. There is a comparer/decider named UDS which holds an index of the valid storage blocks, and a device driver named VDO which performs the deduplication and hides its internals from the user by providing a block device on top of the real
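The division of labour described here (an index of stored blocks, plus a driver that presents an ordinary block device and hides the sharing) can be sketched as a toy in-memory model. The class and method names are hypothetical and this is in no way VDO's or UDS's actual design. `index` maps a content fingerprint (SHA-256 here, an assumption) to the physical block holding that content. `lba_map` gives users plain logical block addresses. Reference counts let a physical block be freed once no logical block points at it. SHA-256 collisions are assumed never to happen.

```python
import hashlib

BLOCK_SIZE = 4096


class ToyDedupDevice:
    """Hypothetical deduplicating block device; an illustration, not VDO."""

    def __init__(self) -> None:
        self.physical: dict[int, bytes] = {}  # physical block number -> data
        self.refcount: dict[int, int] = {}    # physical block number -> users
        self.index: dict[str, int] = {}       # fingerprint -> physical block (UDS-like role)
        self.lba_map: dict[int, int] = {}     # logical block -> physical block
        self.next_pbn = 0

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self._release(lba)  # drop whatever this logical block pointed to before
        fp = hashlib.sha256(data).hexdigest()
        pbn = self.index.get(fp)
        if pbn is None:
            # Content not stored yet: allocate a new physical block for it.
            pbn = self.next_pbn
            self.next_pbn += 1
            self.physical[pbn] = data
            self.refcount[pbn] = 0
            self.index[fp] = pbn
        self.refcount[pbn] += 1  # duplicate content just gains another user
        self.lba_map[lba] = pbn

    def read(self, lba: int) -> bytes:
        pbn = self.lba_map.get(lba)
        return bytes(BLOCK_SIZE) if pbn is None else self.physical[pbn]

    def physical_blocks_used(self) -> int:
        return len(self.physical)

    def _release(self, lba: int) -> None:
        pbn = self.lba_map.pop(lba, None)
        if pbn is None:
            return
        self.refcount[pbn] -= 1
        if self.refcount[pbn] == 0:
            # Last user gone: free the block and forget its fingerprint.
            data = self.physical.pop(pbn)
            del self.refcount[pbn]
            del self.index[hashlib.sha256(data).hexdigest()]
```

For example, writing the same 4096-byte block to logical blocks 0 and 1 consumes one physical block. Reading either address still returns the full data, which is the "hides its internals" part.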

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-08 Thread didier gaumet
On 08/11/2022 at 05:13, hw wrote: On Mon, 2022-11-07 at 13:57 -0500, rhkra...@gmail.com wrote: I didn't (and don't) know much about deduplication (beyond what you might deduce from the name), so I googled and found this article which was helpful to me: *

Re: definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-07 Thread hw
On Mon, 2022-11-07 at 13:57 -0500, rhkra...@gmail.com wrote: > > > I didn't (and don't) know much about deduplication (beyond what you might > deduce from the name), so I googled and found this article which was helpful to > me: > > *

definiing deduplication (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-07 Thread rhkramer
> didier gaumet wrote: > > I may be mistaken, but I think there is a confusion here about a > > deduplication at filesystem level and at backup tool level. I didn't (and don't) know much about deduplication (beyond what you might deduce from the name), so I googled and found this article which