Hi,

You are all bastards!

Pieter

On Thu, Jun 4, 2009 at 8:52 AM, Pieter Wuille <s...@users.sourceforge.net> wrote:

> On Wed, Jun 03, 2009 at 07:36:22PM -0400, Jeffrey J. Kosowsky wrote:
> > Holger Parplies wrote at about 23:45:35 +0200 on Wednesday, June 3, 2009:
> >  > Hi,
> >  >
> >  > Peter Walter wrote on 2009-06-03 16:15:37 -0400 [Re: [BackupPC-users] Backing up a BackupPC server]:
> >  > > [...]
> >  > > My understanding is that, if it were not for the
> >  > > hardlinks, rsync transfers to another server would be more
> >  > > feasible;
> >  >
> >  > right.
> >  >
> >  > > that processing the hardlinks requires significant cpu
> >  > > resources, memory resources, and that access times are very slow,
> >  >
> >  > Memory: yes. CPU: I don't think so. Access times very slow? Well, the inodes
> >  > referenced from one directory are probably scattered all over the place, so
> >  > traversing the file tree (e.g. "find $TopDir -ls") is probably slower than
> >  > in "normal" directories. Or do you mean swapping slows down memory accesses
> >  > by several orders of magnitude?
> >  >
> >  > > compared to processing ordinary files. Is my understanding correct? If
> >  > > so, then what I would think of doing is (a) shutting down backuppc (b)
> >  > > creating a "dump" file containing the hardlink metadata (c) backing up
> >  > > the pooled files and the dump file using rsync (d) restarting backuppc.
> >  > > I really don't need a live, working copy of the backuppc file system -
> >  > > just a way to recreate it from a backup if necessary, using an "undump"
> >  > > program that recreated the hardlinks from the dump file. Is this
> >  > > approach feasible?
> >  >
> >  > Yes. I'm just not certain how you would test it. You can undoubtedly
> >  > restore your pool to a new location, but apart from browsing a few random
> >  > files, how would you verify it? Maybe create a new "dump" and compare the
> >  > two ...
> >  >
> >  > Have you got the resources to try this? I believe I've got most of the code
> >  > we'd need. I'd just need to take it apart ...
> >  >
> >
> > Holger, one thing I don't understand: if you create a dump
> > table associating inodes with pool file hashes, aren't we back in the
> > same situation as with rsync -H? I.e., for large pool sizes, the
> > table ends up using all memory and bleeding into swap, which means
> > that lookups start taking forever and the system
> > thrashes. Specifically, I would assume that rsync -H is basically
> > constructing a similar table when it deals with hard links, though
> > perhaps there are some savings in this case since we know something
> > about the structure of the BackupPC file data -- i.e., we know that
> > all the hard links have, as one of their links, a link to a pool file.
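>
> (For what it's worth, here is a minimal sketch in Python of the kind of table
> such a copy has to keep -- roughly what cp -ad or rsync -H must track to
> recreate the hardlinks. The function name and paths are made up and it is
> untested; it only illustrates why the table grows with the number of pooled
> files.)
>
>   import os, shutil, sys
>
>   def copy_tree_with_hardlinks(src, dst):
>       seen = {}                               # (st_dev, st_ino) -> first dst path
>       for dirpath, dirnames, filenames in os.walk(src):
>           outdir = os.path.join(dst, os.path.relpath(dirpath, src))
>           os.makedirs(outdir, exist_ok=True)
>           for name in filenames:
>               s = os.path.join(dirpath, name)
>               d = os.path.join(outdir, name)
>               if os.path.islink(s):
>                   continue                    # symlinks/specials skipped in this sketch
>               st = os.lstat(s)
>               key = (st.st_dev, st.st_ino)
>               if st.st_nlink > 1 and key in seen:
>                   os.link(seen[key], d)       # recreate the hard link
>               else:
>                   shutil.copy2(s, d)          # first (or only) occurrence: copy data
>                   if st.st_nlink > 1:
>                       seen[key] = d           # this dict is the memory hog
>
>   if __name__ == '__main__':
>       copy_tree_with_hardlinks(sys.argv[1], sys.argv[2])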
> >
> [...]
> > This would allow the entire above algorithm to be done in O(m log m)
> > time, with the only memory-intensive steps being those required to sort
> > the pool and pc tables. However, since sorting is a well-studied
> > problem, we should be able to use memory-efficient algorithms for
> > that.
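>
> (A rough illustration of that sort-and-merge join, again in Python and
> untested -- the function names are mine, and a real run over a multi-terabyte
> pool would push the sorts out to disk, e.g. with GNU sort on temp files,
> rather than sorting in RAM:)
>
>   import os
>
>   def inode_table(root):
>       """List of (inode, path) pairs for every file under root."""
>       table = []
>       for dirpath, dirnames, filenames in os.walk(root):
>           for name in filenames:
>               p = os.path.join(dirpath, name)
>               table.append((os.lstat(p).st_ino, p))
>       return table
>
>   def pc_to_pool_map(pooldir, pcdir):
>       pool = sorted(inode_table(pooldir))    # m entries, O(m log m)
>       pc   = sorted(inode_table(pcdir))      # n entries, O(n log n)
>       mapping, i = [], 0
>       for ino, pcpath in pc:                 # one merge pass, O(m + n)
>           while i < len(pool) and pool[i][0] < ino:
>               i += 1
>           if i < len(pool) and pool[i][0] == ino:
>               mapping.append((pcpath, pool[i][1]))
>           # else: a pc/ file that is not pooled (attrib files, logs, ...)
>       return mapping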
>
> You didn't use the knowledge that the files in the pool have names that
> correspond (apart from a few hash chains) to the partial md5sums of the
> data in them, the way BackupPC_tarPCcopy does. I've never used/tested this
> tool, but if I understand it correctly, it builds a tar file that contains
> hardlink entries pointing into the pool directory instead of the actual data.
> This, combined with a verbatim copy of the pool directory itself, should
> suffice to copy the entire topdir in O(m+n) time and O(1) memory, since
> looking up which pool file a given hardlinked file in a pc/ dir points to
> can be done in O(1) time and space (except for a sporadic hash chain).
> In practice, however, doing the copy at the block level will be significantly
> faster still, because no continual seeking is required.
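>
> (To make that concrete, a rough Python sketch of building such a link-only
> tar -- this is not the real BackupPC_tarPCcopy, and pool_path_for() is a
> hypothetical stand-in for the md5sum-based name lookup plus hash-chain
> handling; untested:)
>
>   import os, tarfile
>
>   def write_link_tar(pcdir, out_tar, pool_path_for):
>       # Every file under pc/ becomes a hardlink *entry* whose target is its
>       # pool file, so the archive carries headers only, no file data.
>       with tarfile.open(out_tar, 'w') as tar:
>           for dirpath, dirnames, filenames in os.walk(pcdir):
>               for name in filenames:
>                   path = os.path.join(dirpath, name)
>                   info = tarfile.TarInfo(name=os.path.relpath(path, pcdir))
>                   info.type = tarfile.LNKTYPE            # hard-link member
>                   info.linkname = pool_path_for(path)    # e.g. ../cpool/1/2/3/<hash>
>                   tar.addfile(info)                      # header only
>
> Extracting such a tar on a box that already holds a verbatim copy of the pool
> then recreates all the hardlinks without shipping the file data a second time.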
>
> > I would be curious to know how, in the real world, the time (and
> > memory usage) needed to copy over a large (say, multi-terabyte)
> > BackupPC topdir varies for the following methods:
> >
> > 1. cp -ad
> > 2. rsync -H
> > 3. Copy using a single table of pool inode numbers
> > 4. Copy using a sorted table of pool inode numbers and pc hierarchy
> >    inode numbers
> Add:
>  5. Copy the pool dir and use tarPCcopy for the rest
>  6. Copy the block device
>
> --
> Pieter
>
>