Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Rory Campbell-Lange wrote: On 14/08/09, Frank Smith (fsm...@hoovers.com) wrote: Chris Hoogendyk wrote: Amanda will do the compression for you. You define it in the dumptype in amanda.conf. If you have a holding disk, then it will compress the data as it goes onto the holding disk. If you don't have a holding disk, then you might have issues with being able to stream a backup to tape, compressing it on the fly. Even with a really fast cpu, I don't know if you can maintain the throughput to drive LTO4 at a good speed. You might want to consider configuring for client compression. Not only will that give you more CPU for feeding your tape, it also minimizes network bandwidth. As usual, YMMV, it all depends on where the bottlenecks are in your environment. In our case the server _is_ the only client, with up to 30TB of direct attached storage, with the storage running at between 80MB/s and 120MB/s access speeds (Bytes rather than bytes). I don't know if this is fast enough to deal with a SAS connected LTO4 drive, particularly if it is doing software compression along the way. With reference to Chris Hoogendyk's email clarification on parallelism, I am very curious to learn if Amanda ...still require[s] a DLE to be completed to holding disk before it will send any of it to tape... In our case this is a particularly important question as, although we can add in more AoE storage for a DLE, this will only run at the speeds above. Do we need a 1TB SAS disk array too? You will get the best performance if you can do that. If the disk that is being copied to tape can give the speed the tape needs, that's going to do a better job of keeping things moving. You have a couple of options. You can go without a holding disk, and then each DLE will be streamed sequentially to tape. This will stretch out your backups. It will also mean that any compression you do in software will be done in line with that sequential stream. 
Your system may not be able to keep that all flying fast enough for the tape, and you may end up with shoe shining and very low speeds. You can certainly try it and see what happens. If (when?) that fails, you could try using hardware compression on the tape drive. The backups will still be sequential, one DLE streaming to tape at a time, and if your drives can't keep up, it will be slower than you might like. But, at least you are not dealing with network backups. The option I would try, budget allowing, would be to add a couple of SAS drives to be used as holding disks. Then break up your DLEs so that each DLE is substantially smaller than the holding disks. Then Amanda can run them in parallel, compress them on the holding disk, and then stream completed, compressed DLEs from the holding disk to the tape. I wouldn't put the holding disks in raid. -- --- Chris Hoogendyk - O__ Systems Administrator c/ /'_ --- Biology Geology Departments (*) \(*) -- 140 Morrill Science Center ~~ - University of Massachusetts, Amherst hoogen...@bio.umass.edu --- Erdös 4
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Chris Hoogendyk wrote: I wouldn't put the holding disks in raid.

Hu hu... Interesting... I have a 4-disk RAID-0 holding disk, and it isn't fast... I always wondered if I should use separate (non-RAID) drives...

Cyrille Bollu Responsable systèmes Fedasil - ICT tel: +32.2.213.43.49 gsm: +32.478.23.08.15
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On Monday 17 August 2009, Cyrille Bollu wrote: I wouldn't put the holding disks in raid. Hu hu... Interesting... I have a 4 disks RAID-0 holding disk, and it isn't fast... I always wondered if I should use separate (non-RAID) drives...

It's been my observation that software raids are slower, though not tremendously so. When Jim put together a raid-5 with 4 drives several years ago, the drives were about 70 MB/sec drives, and the overall throughput was just a hair over 50 MB/sec. I know he has rebuilt it with bigger, faster drives 2 or 3 times since, along with more iron in the cpu, so I don't know its current speed. Being retired means being out of the loop. :(

-- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) The NRA is offering FREE Associate memberships to anyone who wants them. https://www.nrahq.org/nrabonus/accept-membership.asp A prig is a fellow who is always making you a present of his opinions. -- George Eliot
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On Monday 17 August 2009, Cyrille Bollu wrote: I wouldn't put the holding disks in raid. Hu hu... Interesting... I have a 4 disks RAID-0 holding disk, and it isn't fast... I always wondered if I should use seperated (non-RAID) drives... To drive an LTO-4 your holding disk needs to read somewhat over 100MB/sec sequentially, which requires at least 2 drives striped, but should be easy enough with any modern raid controller or software raid. Complicating this, it may also need to write at similar speed, simultaneously, which introduces a random access element and ups the demand considerably. I would think you would probably want at least 4 big SATA drives striped together to reliably feed an LTO-4 drive at full speed. Conveniently, this could also give you over 5TB of very cheap holding disk space. Also, unless you're backing up exclusively large files over a fast SAN link (faster than Gig-E), I doubt you could get anywhere close to full tape performance without a holding disk. -- ... a serious depression seems improbable; [we expect] recovery of business next spring, with further improvement in the fall. Harvard Economic Society, November 10, 1929
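The sizing argument in this message can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: `drives_needed` is a hypothetical helper, and both the per-drive rate (70 MB/s, borrowed from the figure quoted elsewhere in the thread) and the efficiency factor for mixed read/write access are illustrative assumptions, not measurements.

```python
import math

def drives_needed(tape_mb_s, incoming_mb_s, per_drive_mb_s, mixed_efficiency):
    """Estimate how many striped drives a holding disk needs.

    tape_mb_s        -- rate at which the tape drive reads from the holding disk
    incoming_mb_s    -- rate at which dumps are written to it at the same time
    per_drive_mb_s   -- assumed sequential rate of a single drive
    mixed_efficiency -- assumed fraction of that rate that survives when
                        reads and writes compete for the same heads
    """
    required = tape_mb_s + incoming_mb_s
    usable_per_drive = per_drive_mb_s * mixed_efficiency
    return math.ceil(required / usable_per_drive)

# Read-only case: a bit over 100 MB/s to tape, nothing being written.
print(drives_needed(100, 0, 70, 1.0))      # 2

# Simultaneous writes at a similar speed, with an assumed 25% seek penalty.
print(drives_needed(100, 100, 70, 0.75))   # 4
```

With these assumed inputs the result happens to match the "at least 2" and "at least 4" figures above, but the outcome is very sensitive to the efficiency factor, so it is worth measuring real drives under a mixed load before buying hardware.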
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Cyrille Bollu wrote: Chris Hoogendyk wrote: I wouldn't put the holding disks in raid. Hu hu... Interesting... I have a 4 disks RAID-0 holding disk, and it isn't fast... I always wondered if I should use seperated (non-RAID) drives... Here is an extremely interesting article that everyone should take a look at. Rather than just giving theoretical comparisons that you see in places like the Wikipedia raid article (which is nevertheless an excellent reference), this article is a case study analysis with both real and test environments throwing data at raid and instrumenting it -- http://blogs.zdnet.com/Ou/?p=484. While this guy is looking at things like database servers and exchange, we ought to be able to interpret this for Amanda. A couple of points to note for Amanda: Amanda will use all the holding disk drives while doing parallel backups and storing output on the holding disks. When it is writing to tape, it is constrained by the sequential nature of the tape, and will only be doing one DLE at a time from those that it has completed on the holding disks. Also, Amanda's access is heavily sequential, although it may have multiple parallel processes hitting the drives. -- --- Chris Hoogendyk - O__ Systems Administrator c/ /'_ --- Biology Geology Departments (*) \(*) -- 140 Morrill Science Center ~~ - University of Massachusetts, Amherst hoogen...@bio.umass.edu --- Erdös 4
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On 17/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote: Cyrille Bollu wrote: Chris Hoogendyk wrote: I wouldn't put the holding disks in raid. snip http://blogs.zdnet.com/Ou/?p=484. While this guy is looking at things like database servers and exchange, we ought to be able to interpret this for Amanda. A couple of points to note for Amanda: Amanda will use all the holding disk drives while doing parallel backups and storing output on the holding disks. When it is writing to tape, it is constrained by the sequential nature of the tape, and will only be doing one DLE at a time from those that it has completed on the holding disks. Also, Amanda's access is heavily sequential, although it may have multiple parallel processes hitting the drives.

A slow RAID1 of two 7200 RPM SATA disks on a BBU-backed LSI hardware raid controller can do about 62031 KiB/s write and 86399 KiB/s read. Those sorts of numbers improve steadily with the number of spindles you add to a RAID set, with the RAID level and (in the case of writing) with caching enabled.

On 16/08/09, Rory Campbell-Lange (r...@campbell-lange.net) wrote: On 14/08/09, Frank Smith (fsm...@hoovers.com) wrote: Chris Hoogendyk wrote: With reference to Chris Hoogendyk's email clarification on parallelism, I am very curious to learn if Amanda ...still require[s] a DLE to be completed to holding disk before it will send any of it to tape... In our case this is a particularly important question as, although we can add in more AoE storage for a DLE, this will only run at the speeds above. Do we need a 1TB SAS disk array too?

From the discussion here it seems preferable to have a holding disk on two major counts. One is that compression can happen before the data is written to tape rather than in line with the tape stream, which could result in shoe-shining. The other is that Amanda will be clearer about the amount of data it will be trying to write to a tape; in other words, it will do a better fit of data to tape.
The most important question I now have to ask is: How fast can a SAS-based LTO4 drive write to tape? Regards Rory -- Rory Campbell-Lange Director r...@campbell-lange.net Campbell-Lange Workshop www.campbell-lange.net 0207 6311 555 3 Tottenham Street London W1T 2AF Registered in England No. 04551928
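For a sense of scale on the speed question: LTO-4 is specified at 120 MB/s native (uncompressed) transfer rate and 800 GB native capacity per cartridge. The short sketch below turns that, together with the 80 to 120 MB/s storage speeds and the 15 TB pool size quoted in this thread, into hours and cartridge counts. The helper names are hypothetical, decimal units are assumed, and compression and tape changes are ignored.

```python
import math

LTO4_NATIVE_MB_S = 120   # LTO-4 native (uncompressed) transfer rate
LTO4_NATIVE_GB = 800     # LTO-4 native capacity per cartridge

def hours_to_write(data_tb, rate_mb_s):
    """Hours needed to move data_tb terabytes at a sustained rate."""
    seconds = data_tb * 1_000_000 / rate_mb_s
    return seconds / 3600

def tapes_needed(data_tb, tape_gb=LTO4_NATIVE_GB):
    """Cartridges required for data_tb terabytes, ignoring compression."""
    return math.ceil(data_tb * 1000 / tape_gb)

for rate in (80, LTO4_NATIVE_MB_S):
    print(f"15 TB at {rate} MB/s: {hours_to_write(15, rate):.1f} h")
print("tapes for 15 TB:", tapes_needed(15))
```

Even at the full native rate a 15 TB pool needs well over a day of continuous streaming, and without compression it would need 19 cartridges, which is more than the 16 slots of the autoloader described in the original post.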
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On 14/08/09, Frank Smith (fsm...@hoovers.com) wrote: Chris Hoogendyk wrote: Amanda will do the compression for you. You define it in the dumptype in amanda.conf. If you have a holding disk, then it will compress the data as it goes onto the holding disk. If you don't have a holding disk, then you might have issues with being able to stream a backup to tape, compressing it on the fly. Even with a really fast cpu, I don't know if you can maintain the throughput to drive LTO4 at a good speed. You might want to consider configuring for client compression. Not only will that give you more CPU for feeding your tape, it also minimizes network bandwidth. As usual, YMMV, it all depends on where the bottlenecks are in your environment. In our case the server _is_ the only client, with up to 30TB of direct attached storage, with the storage running at between 80MB/s and 120MB/s access speeds (Bytes rather than bytes). I don't know if this is fast enough to deal with a SAS connected LTO4 drive, particularly if it is doing software compression along the way. With reference to Chris Hoogendyk's email clarification on parallelism, I am very curious to learn if Amanda ...still require[s] a DLE to be completed to holding disk before it will send any of it to tape... In our case this is a particularly important question as, although we can add in more AoE storage for a DLE, this will only run at the speeds above. Do we need a 1TB SAS disk array too? Rory -- Rory Campbell-Lange Director r...@campbell-lange.net Campbell-Lange Workshop www.campbell-lange.net 0207 6311 555 3 Tottenham Street London W1T 2AF Registered in England No. 04551928
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Hi Chris On 13/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote: ... the solution is akin to the Japanese monks caring for Bonzai I liked this idea about tape archives -- constant pruning and maintenance. Difficult to sell though. As for your specific questions: You should be able to do LVM snapshots. I use fssnap on Solaris 9 and 10, and scanning through, here are just a couple of references I find to people using LVM snapshots with Amanda: snip With the latest releases of Amanda, there is a new API that could make it even easier to implement. Great; thanks for the pointers. Typically, we set up Amanda with holding disk space. snip If all the storage is locally attached (actually, AoE drives storage units connected over Ethernet), I am hoping to avoid the disk space if I can write to tape fast enough. I'd like to avoid paying for up to 15TB of fast holding disk space if I can avoid it. Compression can be done either on the client, on the server, or on the tape drive. Obviously, if you use software compression, you want to turn off the tape drive compression. I use server side compression, because I have a dedicated Amanda server that can handle it. By not using the tape drive compression, Amanda has more complete information on data size and tape usage for its planning. If your server is more constrained than your clients, you could use client compression. This is specified in your dumptypes in your amanda.conf. I don't have any clients, so this is an interesting observation. I'll be trying to do sofware compression then I think. The Unix backup book (google for amanda software compression) suggests that compression can be used on a per-image basis; presumably I can pass the backup data stream through gzip or bzip2 on the way to a tape? Deduplication is not available with Amanda. However, some people stage different kinds of tools and use Amanda for the final staging and management of tapes and archives. 
So, in some situations, BackupPC could be used to do deduplication from, say, desktop clients to a server archive which is then backed up by Amanda. That could start complicating your 12 year recovery scenario and what happens when software is not available or doesn't run. Great -- thanks for the details. Amanda uses the term index rather than catalog -- see http://wiki.zmanda.com/index.php/Amanda_Index. Note that if you are putting tapes into a long term archive with no intent of recycling them in subsequent backups, you can use amadmin to mark them as no-reuse. I periodically (typically at the end of semesters) do a force full, mark the tapes as no-reuse, and then pull them out of my tapecycle and put them in storage. Very useful again, thanks. Regards Rory -- Rory Campbell-Lange Director r...@campbell-lange.net Campbell-Lange Workshop www.campbell-lange.net 0207 6311 555 3 Tottenham Street London W1T 2AF Registered in England No. 04551928
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Hi,

Here's my (very) small personal experience: a few years ago, when I tried it, I couldn't enable server-side software compression while bypassing the holding disk with my IBM Ultrium LTO-3 drive: tape speed sank to about 5MB/s. My backup server was a Dell PowerEdge 2850 with 4 Intel Xeon 3GHz and 8MB RAM using RHEL-4.0 and amanda-2.4.4p3-1. Maybe I did something wrong at that time (I only tried once). Beware though.

Cyrille
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Cyrille Bollu wrote: Here's my (very) small personnal experience: A few years ago, when I tried it, I couldn't enable server-side software compression while bypassing the holding disk with my IBM ULTIUM LTO-3 drive: Tape speed was sinking to about 5MB/s. My backup server was a Dell PowerEdge 2850 with 4 Intel Xeon 3GHz and 8MB RAM using RHEL-4.0 and amanda-2.4.4p3-1. Maybe did I do something wrong at that time (I just had 1 try). Beware though. That actually makes perfect sense. By not using a holding disk, you are disabling Amanda's ability to run multiple things in parallel. The tape device now controls everything. That is to say, you cannot do a backup without streaming it to the tape. So, you cannot do more than one at a time. Furthermore, as that one backup gets done, the compression has to be done as it is being streamed to the tape. So all the processes from reading a remote disk, to transferring it over the network, to compressing it, to writing it to the tape are all tied together in a single pipe. Any slowdown along that pipe will affect everything else. When the tape doesn't get what it needs to keep going, it will stop and then have to start up again and reposition, and then you get shoe shining. When you use a holding disk, Amanda can stream multiple backups to the holding disk simultaneously. It can compress them there when it has them and do it in parallel with other processes. Once it has something ready to go to tape, it can dd it straight from the disk to the tape as an independent process in parallel with the other things that are going on. That final step out to the tape is no longer constrained by any of the other steps along the way. Now all you have to worry about is tuning various pieces of hardware and software to get the throughput you want. 
--
Chris Hoogendyk
Systems Administrator
Biology Geology Departments
140 Morrill Science Center
University of Massachusetts, Amherst
hoogen...@bio.umass.edu
Erdös 4
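The "single pipe" reasoning in this message can be illustrated with a toy model: when every stage is chained together, the whole chain runs at the speed of its slowest stage, and the tape starts shoe shining once that speed falls below what the drive needs to keep streaming. All figures below are made up for illustration, the function names are hypothetical, and the 40 MB/s minimum streaming rate is an assumed parameter rather than a specification of any particular drive.

```python
def pipeline_rate(stage_rates_mb_s):
    """Stages chained in one pipe move data only as fast as the slowest stage."""
    return min(stage_rates_mb_s.values())

def will_shoe_shine(rate_mb_s, min_streaming_mb_s):
    """True when the feed rate is below what the drive needs to keep streaming."""
    return rate_mb_s < min_streaming_mb_s

# Without a holding disk: read, network, compression and tape form one pipe.
no_holding_disk = {"disk read": 100, "network": 110, "gzip": 20, "tape": 120}
rate = pipeline_rate(no_holding_disk)
print(rate, will_shoe_shine(rate, min_streaming_mb_s=40))    # 20 True

# With a holding disk: the tape only depends on reading finished images.
from_holding_disk = {"holding disk read": 150, "tape": 120}
rate = pipeline_rate(from_holding_disk)
print(rate, will_shoe_shine(rate, min_streaming_mb_s=40))    # 120 False
```

The model ignores buffering and bursts, but it shows why a slow compression stage alone can drag the tape down to a few MB/s, as in the LTO-3 report quoted above.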
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Rory Campbell-Lange wrote: Hi Chris On 13/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote: snip Typically, we set up Amanda with holding disk space. snip If all the storage is locally attached (actually, AoE drives storage units connected over Ethernet), I am hoping to avoid the disk space if I can write to tape fast enough. I'd like to avoid paying for up to 15TB of fast holding disk space if I can avoid it. So, one way would be to logically divide the storage into smaller DLE's. A DLE (Disk List Entry -- http://wiki.zmanda.com/man/disklist.5.html) for Amanda can be a mount point or directory. Obviously, I don't know how your storage is organized; but, if you can define your DLE's as separate directories on the storage device, each one of which is much smaller, then you could use a smaller holding disk and still benefit from Amanda's parallelism. In one of the other departments here, the sysadmin has successfully divided a large array this way and is driving LTO4 near top speed. Compression can be done either on the client, on the server, or on the tape drive. Obviously, if you use software compression, you want to turn off the tape drive compression. I use server side compression, because I have a dedicated Amanda server that can handle it. By not using the tape drive compression, Amanda has more complete information on data size and tape usage for its planning. If your server is more constrained than your clients, you could use client compression. This is specified in your dumptypes in your amanda.conf. I don't have any clients, so this is an interesting observation. I'll be trying to do sofware compression then I think. The Unix backup book (google for amanda software compression) suggests that compression can be used on a per-image basis; presumably I can pass the backup data stream through gzip or bzip2 on the way to a tape? Amanda will do the compression for you. You define it in the dumptype in amanda.conf. 
If you have a holding disk, then it will compress the data as it goes onto the holding disk. If you don't have a holding disk, then you might have issues with being able to stream a backup to tape, compressing it on the fly. Even with a really fast cpu, I don't know if you can maintain the throughput to drive LTO4 at a good speed. -- --- Chris Hoogendyk - O__ Systems Administrator c/ /'_ --- Biology Geology Departments (*) \(*) -- 140 Morrill Science Center ~~ - University of Massachusetts, Amherst hoogen...@bio.umass.edu --- Erdös 4
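The suggestion of defining DLEs as separate directories can be sketched as a small script that emits one disklist line per top-level directory of a large volume. This assumes the basic disklist line form of hostname, disk name and dumptype, and a dumptype that uses GNU tar (as far as I know, directory-level DLEs need tar rather than dump). The host name, the dumptype name and the `disklist_entries` helper are placeholders, not values taken from this thread.

```python
import os
import tempfile

def disklist_entries(hostname, top_dir, dumptype):
    """Return one disklist line per immediate subdirectory of top_dir.

    Each line has the form "hostname diskname dumptype".
    """
    entries = []
    for name in sorted(os.listdir(top_dir)):
        path = os.path.join(top_dir, name)
        if os.path.isdir(path):          # plain files at the top level are skipped
            entries.append(f"{hostname} {path} {dumptype}")
    return entries

# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as top:
    for sub in ("projects", "archive", "home"):
        os.mkdir(os.path.join(top, sub))
    for line in disklist_entries("backup.example.com", top, "comp-user-tar"):
        print(line)
```

Files sitting directly in the top-level directory are not covered by any generated entry, so a real setup would need an additional DLE (or a check) for them.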
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Chris Hoogendyk wrote: Rory Campbell-Lange wrote: Hi Chris On 13/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote: snip Typically, we set up Amanda with holding disk space. snip If all the storage is locally attached (actually, AoE drives storage units connected over Ethernet), I am hoping to avoid the disk space if I can write to tape fast enough. I'd like to avoid paying for up to 15TB of fast holding disk space if I can avoid it. So, one way would be to logically divide the storage into smaller DLE's. A DLE (Disk List Entry -- http://wiki.zmanda.com/man/disklist.5.html) for Amanda can be a mount point or directory. Obviously, I don't know how your storage is organized; but, if you can define your DLE's as separate directories on the storage device, each one of which is much smaller, then you could use a smaller holding disk and still benefit from Amanda's parallelism. In one of the other departments here, the sysadmin has successfully divided a large array this way and is driving LTO4 near top speed. Compression can be done either on the client, on the server, or on the tape drive. Obviously, if you use software compression, you want to turn off the tape drive compression. I use server side compression, because I have a dedicated Amanda server that can handle it. By not using the tape drive compression, Amanda has more complete information on data size and tape usage for its planning. If your server is more constrained than your clients, you could use client compression. This is specified in your dumptypes in your amanda.conf. I don't have any clients, so this is an interesting observation. I'll be trying to do sofware compression then I think. The Unix backup book (google for amanda software compression) suggests that compression can be used on a per-image basis; presumably I can pass the backup data stream through gzip or bzip2 on the way to a tape? Amanda will do the compression for you. You define it in the dumptype in amanda.conf. 
If you have a holding disk, then it will compress the data as it goes onto the holding disk. If you don't have a holding disk, then you might have issues with being able to stream a backup to tape, compressing it on the fly. Even with a really fast cpu, I don't know if you can maintain the throughput to drive LTO4 at a good speed. You might want to consider configuring for client compression. Not only will that give you more CPU for feeding your tape, it also minimizes network bandwidth. As usual, YMMV, it all depends on where the bottlenecks are in your environment. Frank -- Frank Smith fsm...@hoovers.com Sr. Systems Administrator Voice: 512-374-4673 Hoover's Online Fax: 512-374-4501
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On Thu, 13 Aug 2009 01:08:03 -0400 Jon LaBadie j...@jgcomp.com wrote: On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote: So maybe you should provide a complete OS distribution, including the backup software. Like a customized version of one of the live CD releases of Linux. But wait, will that distribution's included device drivers work on the devices that will exist in 12 years? Will that era's computers still have CD drives? Will they be bootable?

Oh, folks 12 years hence ought to be able to dig out 12 year old computers to run their 12 year old distributions on. I suspect the bottleneck will be finding drives to read LTO-4 tapes; maybe keep one or two spares around? Or put two or three complete machines aside and keep them for the purpose.

--
Charles Curley
Looking for fine software and/or writing?
http://www.charlescurley.com/
ASCII Ribbon Campaign: Respect for open standards. No HTML/RTF in email. No M$ Word docs in email.
Key fingerprint = CE5C 6645 A45A 64E4 94C0 809C FFF6 4C48 4ECD DFDB
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On 13/08/09, Charles Curley (charlescur...@charlescurley.com) wrote: On Thu, 13 Aug 2009 01:08:03 -0400 Jon LaBadie j...@jgcomp.com wrote: On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote: So maybe you should provide a complete OS distribution, including the backup software. Like a customized version of one of the live CD releases of Linux. But wait, will that distribution's included device drivers work on the devices that will exist in 12 years? Will that era's computers still have CD drives. Will they be bootable? Oh, folks 12 years hence ought to be able to dig out 12 year old computers to run their 12 year old distributions on. Many thanks for this note, Charles, and to the other notes Chris, Charles and Jon about their commentary about using Amanda to provide a long-term archive format. The points about being able to use standard Unix tools to retrieve information is well made, as is the point that the current machines and architectures (and CDs!) may not be around in 12 years' time. Thanks very much for those observations. I'd like to return the other part of my question if I may: The backup tape format is to be LT04 and we have a second-hand Dell PowerVault 124T 16 tape autoloader to work with currently. Backup from a pool may be taken off a Linux LVM (or hopefully soon a BTRFS) snapshot ensuring that the source data does not change during the backup process. We have the possibility of pre-preparing backup or compressed images if this is advisable. I'd be grateful to learn specifically if the approach I have set out seems feasible. Also: - is the snapshot volume or secondary holding pool advisable? - is compression / deduplication possible? - after scanning through the wiki I can't see any references to what I think of as a backup job catalogue. How does one know what files were part of a particular backup job? Thanks for any further advice. 
Rory -- Rory Campbell-Lange Director r...@campbell-lange.net Campbell-Lange Workshop www.campbell-lange.net 0207 6311 555 3 Tottenham Street London W1T 2AF Registered in England No. 04551928
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
Rory Campbell-Lange wrote: On 13/08/09, Charles Curley (charlescur...@charlescurley.com) wrote: On Thu, 13 Aug 2009 01:08:03 -0400 Jon LaBadie j...@jgcomp.com wrote: On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote: So maybe you should provide a complete OS distribution, including the backup software. Like a customized version of one of the live CD releases of Linux. But wait, will that distribution's included device drivers work on the devices that will exist in 12 years? Will that era's computers still have CD drives. Will they be bootable? Oh, folks 12 years hence ought to be able to dig out 12 year old computers to run their 12 year old distributions on. Many thanks for this note, Charles, and to the other notes Chris, Charles and Jon about their commentary about using Amanda to provide a long-term archive format. The points about being able to use standard Unix tools to retrieve information is well made, as is the point that the current machines and architectures (and CDs!) may not be around in 12 years' time. Thanks very much for those observations. I'd like to return the other part of my question if I may: The backup tape format is to be LT04 and we have a second-hand Dell PowerVault 124T 16 tape autoloader to work with currently. Backup from a pool may be taken off a Linux LVM (or hopefully soon a BTRFS) snapshot ensuring that the source data does not change during the backup process. We have the possibility of pre-preparing backup or compressed images if this is advisable. I'd be grateful to learn specifically if the approach I have set out seems feasible. Also: - is the snapshot volume or secondary holding pool advisable? - is compression / deduplication possible? - after scanning through the wiki I can't see any references to what I think of as a backup job catalogue. How does one know what files were part of a particular backup job? Thanks for any further advice. 
One further comment on the nature of long term archives (and then on to your specific questions):

I used to work in the Systems Office of the University Library. I handled backups there, and had close contact with a group of librarians who were into digital content, archives and special collections. Among other things, we kicked around ideas about how to archive digital collections, the life expectancies and failure rates of various types of CDs (generally terrible in reality), etc. When librarians talk about archives, they don't just talk decades. They expect things to last a hundred years and more. In that light, they have concluded that the solution is akin to the Japanese monks caring for Bonsai. There are records of Bonsai trees that have been cared for for hundreds of years. So, think of sysadmins as monks caring for data.

The archive librarians' solution is raid6 with hot spares, mirrored to another location. The sysadmins maintain and update hardware and software and transfer data when necessary. Although I haven't kept up with that area, they were developing cooperative distributed archive software as open source. The idea was that different libraries join the cooperative, run the software, and end up with multiple copies of their digital collections distributed geographically among other libraries. If your library burns down, you rebuild, set up the software, and bring your collection back. Sort of a cloud library, if you will.

So, as technology changes, you need to be frequently reviewing the state of your archives, keeping an eye on compatibility bottlenecks and transferring data to newer media when it becomes necessary. I have a faculty member who ran his own backups on AIT2 for years. His drives are fairly old now. I periodically urge him to read them back in to a disk archive and allow me to put them on AIT5. He's too busy. Ah, well. It's his data.

--

As for your specific questions: You should be able to do LVM snapshots.
I use fssnap on Solaris 9 and 10, and scanning through, here are just a couple of references I find to people using LVM snapshots with Amanda:

http://wiki.zmanda.com/index.php/FAQ:Which_backup_program_for_filesystems_is_better%3F
http://archives.zmanda.com/amanda-archives/viewtopic.php?t=2711&sid=f1535cf0b0782bf2b99aebc033e91c9c
http://archives.zmanda.com/amanda-archives/viewtopic.php?p=9823&sid=8e54f6a0b4ab2cd58bd02e048c299844

In the past that sort of thing had always been done with a wrapper script (described toward the end of http://wiki.zmanda.com/index.php/Backup_client). Paul Bijens refers to a script that he uses in one of the above links. With the latest releases of Amanda, there is a new API that could make it even easier to implement.

Typically, we set up Amanda with holding disk space. See the section of the sample amanda.conf partway down regarding holding
[Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
I'm going to cross-post this text on the Amanda and Bacula lists. Apologies in advance if you see this twice.

Our company is about to provide centralised backups for several pools of backup data of between 1 and 15TB in size. Each pool changes daily but backups to tape will only occur once a month for each pool.

The backup tape format is to be LTO-4 and we have a second-hand Dell PowerVault 124T 16 tape autoloader to work with currently. Backup from a pool may be taken off a Linux LVM (or hopefully soon a BTRFS) snapshot ensuring that the source data does not change during the backup process. We have the possibility of pre-preparing backup or compressed images if this is advisable.

An important aspect of the system is that the tapes should be readable for 12 years, by other parties if necessary. From this point of view we like the idea of providing a CD with each tape set of the software needed to extract the contents, together with a listing of the enclosed files in a UTF-8 text file. We will be required to audit each backup set by successfully extracting files from tape.

We are very familiar with working on the command-line in Linux, PostgreSQL and Python. As we have not run backup to tape on Linux before I would be very grateful to receive advice on what approach members of this list would take to meeting the above requirements.

Many thanks,
Rory

+--
|This was sent by r...@campbell-lange.net via Backup Central.
|Forward SPAM to ab...@backupcentral.com.
+--
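The "listing of the enclosed files in a UTF-8 text file" requirement could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes the backup image is a tar archive (optionally compressed) that has already been read back to disk, and the function name write_listing and the tab-separated line layout are made up for this example. It is not how Amanda or Bacula builds its own index.

```python
import tarfile
import time


def write_listing(archive_path: str, listing_path: str) -> int:
    """Write one line per archive member to a UTF-8 text file.

    Each line is: kind, size in bytes, modification time (UTC), path,
    separated by tabs. Returns the number of members listed.
    The layout is a hypothetical choice for this sketch.
    """
    count = 0
    # "r:*" lets tarfile detect gzip/bzip2/xz compression by itself.
    # backslashreplace keeps the listing writable even when a member
    # name contains bytes that are not valid UTF-8.
    with tarfile.open(archive_path, mode="r:*") as tar, open(
        listing_path, "w", encoding="utf-8",
        errors="backslashreplace", newline="\n",
    ) as out:
        for member in tar:
            if member.isdir():
                kind = "d"
            elif member.isfile():
                kind = "f"
            else:
                kind = "o"  # symlinks, devices and other special entries
            stamp = time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(member.mtime)
            )
            out.write(f"{kind}\t{member.size}\t{stamp}\t{member.name}\n")
            count += 1
    return count
```

Reading the whole archive back to produce the listing also exercises the media end to end, which goes some way toward the audit requirement, although it does not replace a real test restore of selected files.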
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
rorycl wrote:

An important aspect of the system is that the tapes should be readable for 12 years, by other parties if necessary. From this point of view we like the idea of providing a CD with each tape set of the software needed to extract the contents, together with a listing of the enclosed files in a UTF-8 text file. We will be required to audit each backup set by successfully extracting files from tape.

Just taking up that one point for the moment -- Amanda is not just open source and open format, but the tape format is based on standard UNIX/Linux tools. If you pull off the first file of the tape, it actually tells you how to read the tape. You don't need Amanda or any Amanda tools to read it. Just standard UNIX/Linux tools that come with every distribution, such as dd, gnutar, and gzip.

That said, it is easier to read and recover using the Amanda tools, because they will give you an index, allow you to specify what it is you want to recover, tell you which tapes you need, and get it for you. But, in the event that the tape lands in the hands of a UNIX/Linux admin who has never heard of Amanda, but who needs to recover the data, it can be done. And those tools are more likely to be available in stable or compatible forms in 12 years.

It just happens that 12 years is about the lifecycle of a particular version of Solaris. That is, from the first introduction of Solaris X to its final EOL and drop of all support is about 12 years. I think Linux turns over faster than that, but the basic tools are typically compatible between versions.

If you want, you can use amreport to generate a report on the contents of a backup. Since you won't need a CD of software (and won't need to worry about whether it will run, whether the right libraries will be available, etc.), you might decide that a printout provided with each tape might be easier. Sysadmin looks at printout and immediately sees what's on the tape and, Oh, gee, it's that easy to read the tape. That avoids the difficulty of a CD not being stable or readable. The tapes are typically going to outlive a CD.

--
Chris Hoogendyk
Systems Administrator
Biology & Geology Departments
140 Morrill Science Center
University of Massachusetts, Amherst
hoogen...@bio.umass.edu
Erdös 4
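To make the point about the tape describing how to read itself a little more concrete, here is a small Python sketch that parses a text header block of the kind Amanda places ahead of dump data. Treat every detail as an assumption for illustration: the sample header text, the field order, and the 32 KiB block size are reconstructed from memory and may differ between Amanda versions, and parse_amanda_header is a hypothetical helper, not part of Amanda.

```python
def parse_amanda_header(block: bytes) -> dict:
    """Parse an Amanda-style text header block into a dict.

    Assumed layout (illustrative only): a first line of the form
      AMANDA: FILE <datestamp> <host> <disk> lev <n> comp <ext> program <path>
    followed by human-readable restore instructions, a form feed,
    and NUL padding up to the block size.
    """
    # Keep only the text before the form feed, and drop NUL padding.
    text = block.split(b"\x0c", 1)[0].rstrip(b"\x00")
    lines = [ln.strip() for ln in text.decode("ascii", "replace").splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines or not lines[0].startswith("AMANDA:"):
        raise ValueError("block does not look like an Amanda header")

    fields = lines[0].split()
    if len(fields) < 5:
        raise ValueError("header line is shorter than expected")
    info = {
        "type": fields[1],
        "datestamp": fields[2],
        "host": fields[3],
        "disk": fields[4],  # note: quoted disk names with spaces not handled
    }
    # The remaining tokens are treated as key/value pairs (lev 0, comp .gz, ...).
    rest = fields[5:]
    info.update(zip(rest[0::2], rest[1::2]))
    # The restore hint is the dd pipeline spelled out in the header text.
    info["restore_hint"] = next(
        (ln for ln in lines[1:] if ln.startswith("dd ")), None
    )
    return info


# Made-up sample block, padded to an assumed 32 KiB header size.
SAMPLE = (
    b"AMANDA: FILE 20090812 client.example.com /home "
    b"lev 0 comp .gz program /bin/tar\n"
    b"To restore, position tape at start of file and run:\n"
    b"\tdd if=<tape> bs=32k skip=1 | /bin/gzip -dc | /bin/tar -xpGf - ...\n"
    b"\x0c\n"
).ljust(32 * 1024, b"\x00")

if __name__ == "__main__":
    for key, value in parse_amanda_header(SAMPLE).items():
        print(f"{key}: {value}")
```

The useful part for a third party is the restore hint: the header itself spells out a dd, gzip and tar pipeline, which is why no Amanda software is needed to get the data back.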
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On Wed, 12 Aug 2009 18:17:17 -0400 rorycl amanda-fo...@backupcentral.com wrote:

An important aspect of the system is that the tapes should be readable for 12 years, by other parties if necessary. From this point of view we like the idea of providing a CD with each tape set of the software needed to extract the contents, together with a listing of the enclosed files in a UTF-8 text file. We will be required to audit each backup set by successfully extracting files from tape.

I assume you already have verified that your tapes will last that long before print-through makes them unreadable.

Another thought is to provide a CD/DVD of a suitable distribution of Linux with Amanda, mt, etc. That way you don't have compatibility problems with the current version of Amanda and some future distribution of Linux. Install and go, and at least then you should be able to read the tapes.

--
Charles Curley
Looking for fine software and/or writing?
http://www.charlescurley.com/
ASCII Ribbon Campaign: Respect for open standards. No HTML/RTF in email. No M$ Word docs in email.
Key fingerprint = CE5C 6645 A45A 64E4 94C0 809C FFF6 4C48 4ECD DFDB
Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:

I'm going to cross-post this text on the Amanda and Bacula lists. Apologies in advance if you see this twice.

Our company is about to provide centralised backups for several pools of backup data of between 1 and 15TB in size. Each pool changes daily but backups to tape will only occur once a month for each pool.

The backup tape format is to be LTO-4 and we have a second-hand Dell PowerVault 124T 16 tape autoloader to work with currently. Backup from a pool may be taken off a Linux LVM (or hopefully soon a BTRFS) snapshot ensuring that the source data does not change during the backup process. We have the possibility of pre-preparing backup or compressed images if this is advisable.

An important aspect of the system is that the tapes should be readable for 12 years, by other parties if necessary. From this point of view we like the idea of providing a CD with each tape set of the software needed to extract the contents, together with a listing of the enclosed files in a UTF-8 text file. We will be required to audit each backup set by successfully extracting files from tape.

Others have mentioned that even without amanda software, amanda backups are recoverable with standard unix/linux tools.

I question whether the concept of providing the software is reasonable. Programs are compiled for a particular environment, versions of libraries, devices, operating system, etc. Amanda software, compiled for today's systems, is unlikely to be able to run on systems a dozen years from now. What systems were around in 1997? Many instances of them still running?

So maybe you should provide a complete OS distribution, including the backup software. Like a customized version of one of the live CD releases of Linux. But wait, will that distribution's included device drivers work on the devices that will exist in 12 years? Will that era's computers still have CD drives? Will they be bootable?

This requirement may take some additional thought.

jl
--
Jon H. LaBadie j...@jgcomp.com
JG Computing
12027 Creekbend Drive (703) 787-0884
Reston, VA 20194 (703) 787-0922 (fax)