As you surmised, these are mostly gtar questions.
If your DLE is not a filesystem, then the other dump-ish choices are out.

But gtar (and star as well as the various flavors of dump) does _try_
to save space when it encounters hard links.

For instance (gtar 1.26):
mkdir xx
dd if=/dev/zero of=xx/z count=1000
gtar cf a.tar xx
ln xx/z xx/l
gtar cf b.tar xx

a.tar & b.tar should be the same or similar size.
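
Spelled out as a self-contained sketch (GNU tar; the binary is plain
"tar" on most Linux systems and "gtar" on BSDs), the point is that when
both names land in the same archive, tar stores the data once and the
second name as just a link header:

```shell
# Same hard-link test, run in a scratch directory (GNU tar).
cd "$(mktemp -d)"
mkdir xx
dd if=/dev/zero of=xx/z count=1000 2>/dev/null   # 512000 bytes of data
tar cf a.tar xx                                  # one file, full data
ln xx/z xx/l                                     # second name, same inode
tar cf b.tar xx                                  # data once + a link header
ls -l a.tar b.tar                                # same (or nearly same) size
```

The link entry costs only a header block, so b.tar is not ~1 MB.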

However, for an incremental dump on a DLE which has a new hard link, I
don't think gtar will get you the savings you want (if I'm reading
your reasoning correctly).  It's only when the original is in the same
tar image that you get the savings (i.e., not when the original is in
a different tarball).

mkdir xx
dd if=/dev/zero of=xx/z count=1000
/bin/rm -f l0    # clear out any stale snapshot file
gtar cf 0.tar --listed-incremental=l0 xx
ln xx/z xx/l
cp -p l0 l1
gtar cf 1.tar --listed-incremental=l1 xx
touch xx/foo
cp -p l1 l2
gtar cf 2.tar --listed-incremental=l2 xx

In theory, tar's incremental mode could realize that 'l' points to 'z'
and that 'z' hasn't changed, and so archive just the hard-link
"meta-data" (i.e., not the contents).  But gtar doesn't roll like
that - I'm not sure where (or if) that theory has holes, but it isn't
implemented that way at this time.

tar tvf 1.tar
drwxr-xr-x jhein/jhein       7 2011-06-02 14:41 xx/
-rw-r--r-- jhein/jhein  512000 2011-06-02 14:41 xx/l
hrw-r--r-- jhein/jhein       0 2011-06-02 14:41 xx/z link to xx/l

In this simple test 1.tar is just as big as 0.tar.  2.tar is
smaller, of course.

And if you just touch xx/l (or xx/z), then a 3.tar will be "big" again.
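
Putting those observations together, here's a sketch of the whole
sequence through a 3.tar (again GNU tar, "gtar" on some systems; the
sleeps are only there to defeat coarse timestamp resolution on some
filesystems):

```shell
# Incremental hard-link test, run in a scratch directory (GNU tar).
cd "$(mktemp -d)"
mkdir xx
dd if=/dev/zero of=xx/z count=1000 2>/dev/null   # 512000 bytes
tar cf 0.tar --listed-incremental=l0 xx          # level 0: big
sleep 1
ln xx/z xx/l                                     # new hard link
sleep 1
cp -p l0 l1
tar cf 1.tar --listed-incremental=l1 xx          # big: data re-archived
sleep 1
touch xx/foo                                     # unrelated new file
sleep 1
cp -p l1 l2
tar cf 2.tar --listed-incremental=l2 xx          # small: only xx/foo
sleep 1
touch xx/l                                       # bump the shared inode's times
cp -p l2 l3
tar cf 3.tar --listed-incremental=l3 xx          # "big" again
ls -l ?.tar
```

Only 2.tar, where nothing touched the linked inode, stays small; 1.tar
and 3.tar carry the full 512000 bytes of data again.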

Testing for dumps (ufs, zfs) is left as an exercise for the reader ;).
If you find out, let us know.  A quick test with star suggests it
behaves the same as gtar.

Chris Hoogendyk wrote at 15:18 -0400 on Jun  2, 2011:
 > OK, so maybe I shot myself in the foot by asking too much (no replies
 > from anyone in over 24 hours) ;-).
 > 
 > Let me pare this down to one simple question -- Will Amanda
 > efficiently do server-side incremental backups of hard-link-intensive
 > rsync-based backups being stored on the server from a workstation
 > (Mac or otherwise)? In other words, if the workstation creates a new
 > folder on the server and populates it with hard links before running
 > an rsync against it, will Amanda see that as all new and end up
 > backing up essentially a full of the user's files?
 > 
 > I understand Amanda uses native tools, and there is a possibility
 > that this will vary depending on whether the server is using gnutar
 > on a zfs volume, or ufsdump on a ufs volume, etc. I'm just hoping
 > that someone has some specific experience they can relate, especially
 > since Zmanda is working with BackupPC now.
 > 
 > I'm guessing from Dustin's April 12, 2010 blog at
 > http://code.v.igoro.us/ (cyrus imap under list of possible projects)
 > that gnutar probably still doesn't deal well with the hard links. I
 > saw some references while I was digging that imply that ufsdump
 > should be alright. But I'd still like to hear from anyone who has
 > first-hand experience or definitive knowledge.
 > 
 > TIA,
 > 
 > Chris Hoogendyk
 > 
 > 
 > On 6/1/11 11:20 AM, Chris Hoogendyk wrote:
 > > I haven't tried this yet, but I'm hoping to get some comments and
 > > guidance from others who may be doing it. One particular question
 > > is set off in all caps below, but, more generally, I'm open to
 > > comments and advice on any of this.
 > >
 > > I have a number of Amanda installations in different departments
 > > and buildings that back up all my servers. They've all got tape
 > > libraries now and typically run a 6-week or better daily rotation
 > > with a weekly dump cycle.
 > >
 > > In the past I have punted on desktops, providing share space on the
 > > server and advising people to put what they want backed up on the
 > > server. Now we have converted most of our office staff to Macs, and
 > > I want to take a more integrated and automated approach for
 > > critical staff machines. I have a few options I'm looking at. One
 > > would be to automate Time Machine to a share on the server and back
 > > that up with Amanda. Another would be to script rsync to a server
 > > share and back that up with Amanda (we're using Samba for shares).
 > > The third would be to implement Amanda Client on Mac OS X for the
 > > staff and back that up from the server. Each of these approaches
 > > has advantages and disadvantages.
 > >
 > > If you have seen W. Curtis Preston's analysis of Time Machine, that
 > > provides some background to my questions. He wrote two blog posts.
 > > One breaks down Time Machine and expresses some complaints about
 > > it. The second replicates what Time Machine is doing using
 > > scripting with rsync.
 > >
 > > http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/280-time-machine.html/
 > > http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/282-time-machine-rsync.html
 > >
 > > Curtis' basic complaint about Time Machine was that if you point it
 > > at a server, it creates a sparse bundle so that it can have a Mac
 > > OS X native file system to write to. Then all the backups go into
 > > that sparse bundle. It is better than a disk image, because it
 > > starts small and grows, instead of just being the max size from the
 > > beginning (http://en.wikipedia.org/wiki/Sparse_bundle). It also
 > > works smoothly with rsync if you want to copy that sparse bundle to
 > > another server. rsync will copy the new bands of data without
 > > having to copy the whole bundle. The disadvantage is that it is one
 > > file to be backed up from the server, so you don't get any
 > > advantage from Amanda's planning strategies with incremental dumps.
 > >
 > > Curtis' script with rsync is a command-line implementation of the
 > > same backup strategy that Time Machine uses, but without the sparse
 > > bundle. So this is like the traditional rsync snapshot with lots of
 > > hard links.
 > >
 > > Both of those methods have the advantage that the user would have
 > > quick access to their own backups over the time frame we configure
 > > it for. Using a scripted rsync means that systems administrators
 > > would also have access to the files on the server and could
 > > retrieve things for someone if they needed the help. It also means
 > > that, because the backup is composed of individual files and hard
 > > links, Amanda's strategies with incremental backups should work
 > > efficiently. The [QUESTION] I have is: when a folder is created
 > > full of hard links pointing to files in the previous backup folder,
 > > and then a couple of those are replaced with updated files, will
 > > Amanda see everything in this folder as new and end up backing up
 > > all of the files, because the hard link ends up behaving like the
 > > whole file? And will a full backup end up with multiple copies of a
 > > single file getting backed up because of the hard links? I think it
 > > has to be intelligent with respect to hard links, but I am hoping
 > > someone with real experience can say so.
 > >
 > > That same question would apply to BackupPC, but that's not one of
 > > the options I am looking at.
 > >
 > > Configuring Amanda Client on the staff Mac OS X desktops would have
 > > all the advantages of Amanda, but it would mean that I would be in
 > > charge of configuring all the details of each machine, and I would
 > > be responsible for recovering files anytime anyone asked, including
 > > waiting for the daily backups to be completed and then queuing up
 > > the right tape to get their file back from the date they want.
 > >
 > > Advice?
 > 
 > -- 
 > ---------------
 > 
 > Chris Hoogendyk
 > 
 > -
 >     O__  ---- Systems Administrator
 >    c/ /'_ --- Biology&  Geology Departments
 >   (*) \(*) -- 140 Morrill Science Center
 > ~~~~~~~~~~ - University of Massachusetts, Amherst
 > 
 > <hoogen...@bio.umass.edu>
 > 
 > ---------------
 > 
 > Erdös 4
 > 
 > 
