Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-17 Thread Chris Hoogendyk



Rory Campbell-Lange wrote:

On 14/08/09, Frank Smith (fsm...@hoovers.com) wrote:
  

Chris Hoogendyk wrote:

Amanda will do the compression for you. You define it in the dumptype in 
amanda.conf. If you have a holding disk, then it will compress the data 
as it goes onto the holding disk. If you don't have a holding disk, then 
you might have issues with being able to stream a backup to tape, 
compressing it on the fly. Even with a really fast cpu, I don't know if 
you can maintain the throughput to drive LTO4 at a good speed.
  

You might want to consider configuring for client compression.  Not
only will that give you more CPU for feeding your tape, it also
minimizes network bandwidth. As usual, YMMV, it all depends on where
the bottlenecks are in your environment.



In our case the server _is_ the only client, with up to 30TB of direct
attached storage, with the storage running at between 80MB/s and 120MB/s
access speeds (bytes rather than bits).

I don't know if this is fast enough to deal with a SAS connected LTO4
drive, particularly if it is doing software compression along the way.

With reference to Chris Hoogendyk's email clarification on
parallelism, I am very curious to learn if Amanda ...still require[s]
a DLE to be completed to holding disk before it will send any of it to
tape... In our case this is a particularly important question as,
although we can add in more AoE storage for a DLE, this will only run at
the speeds above. Do we need a 1TB SAS disk array too?


You will get the best performance if you can do that. If the disk that 
is being copied to tape can give the speed the tape needs, that's going 
to do a better job of keeping things moving.


You have a couple of options.

You can go without a holding disk, and then each DLE will be streamed 
sequentially to tape. This will stretch out your backups. It will also 
mean that any compression you do in software will be done in line with 
that sequential stream. Your system may not be able to keep that all 
flying fast enough for the tape, and you may end up with shoe shining 
and very low speeds. You can certainly try it and see what happens. If 
(when?) that fails, you could try using hardware compression on the tape 
drive. The backups will still be sequential, one DLE streaming to tape 
at a time, and if your drives can't keep up, it will be slower than you 
might like. But, at least you are not dealing with network backups.


The option I would try, budget allowing, would be to add a couple of SAS 
drives to be used as holding disks. Then break up your DLEs so that each 
DLE is substantially smaller than the holding disks. Then Amanda can run 
them in parallel, compress them on the holding disk, and then stream 
completed, compressed DLEs from the holding disk to the tape.
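
As a rough illustration of the sizing rule above (each DLE substantially smaller than the holding disk), here is a minimal Python sketch. The directory names, the sizes, and the 25% threshold are made-up assumptions for illustration only; Amanda itself does not define such a threshold.

```python
def split_by_holding_disk(dle_sizes_gb, holding_disk_gb, max_fraction=0.25):
    """Separate DLEs that fit comfortably on the holding disk from those that do not.

    max_fraction is an assumed rule of thumb, not an Amanda setting.
    """
    limit_gb = holding_disk_gb * max_fraction
    fits = {name: size for name, size in dle_sizes_gb.items() if size <= limit_gb}
    too_big = {name: size for name, size in dle_sizes_gb.items() if size > limit_gb}
    return fits, too_big


# Hypothetical DLEs (sizes in GB) and a hypothetical 1000 GB holding disk.
dles = {"/data/projects": 180, "/data/scans": 640, "/data/home": 90}
fits, too_big = split_by_holding_disk(dles, holding_disk_gb=1000)
print("fine as they are:", sorted(fits))
print("candidates for splitting:", sorted(too_big))
```

Anything that lands in the second group would be a candidate for breaking into smaller directories before it goes into the disklist.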


I wouldn't put the holding disks in raid.


--
---

Chris Hoogendyk

-
  O__   Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst 


hoogen...@bio.umass.edu

--- 


Erdös 4




Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-17 Thread Cyrille Bollu
 
 I wouldn't put the holding disks in raid.

Hu hu... Interesting... I have a 4-disk RAID-0 holding disk, and it isn't 
fast... I always wondered if I should use separate (non-RAID) drives... 

Cyrille Bollu
Responsable systèmes
Fedasil - ICT
tel: +32.2.213.43.49
gsm: +32.478.23.08.15

Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-17 Thread Gene Heskett
On Monday 17 August 2009, Cyrille Bollu wrote:
 I wouldn't put the holding disks in raid.

Hu hu... Interesting... I have a 4-disk RAID-0 holding disk, and it isn't
fast... I always wondered if I should use separate (non-RAID) drives...

It's been my observation that software raids are slower. Not tremendously so 
though.  When Jim put together a raid-5 with 4 drives several years ago, the 
drives were about 70meg/sec drives, and the overall was just a hair over 
50meg/sec.  I know he has rebuilt it with bigger & faster drives 2 or 3 times 
since, along with more iron in the cpu, so I don't know its current speed.  
Being retired means being out of the loop. :(


-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
The NRA is offering FREE Associate memberships to anyone who wants them.
https://www.nrahq.org/nrabonus/accept-membership.asp

A prig is a fellow who is always making you a present of his opinions.
-- George Eliot



Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-17 Thread Alan Hodgson
 On Monday 17 August 2009, Cyrille Bollu wrote:
  I wouldn't put the holding disks in raid.
 
 Hu hu... Interesting... I have a 4-disk RAID-0 holding disk, and it
  isn't fast... I always wondered if I should use separate (non-RAID)
  drives...

To drive an LTO-4 your holding disk needs to read somewhat over 100MB/sec 
sequentially, which requires at least 2 drives striped, but should be easy 
enough with any modern raid controller or software raid. 

Complicating this, it may also need to write at similar speed, 
simultaneously, which introduces a random access element and ups the demand 
considerably. I would think you would probably want at least 4 big SATA 
drives striped together to reliably feed an LTO-4 drive at full speed. 
Conveniently, this could also give you over 5TB of very cheap holding disk 
space.

Also, unless you're backing up exclusively large files over a fast SAN link 
(faster than Gig-E), I doubt you could get anywhere close to full tape 
performance without a holding disk.
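
The arithmetic behind those drive counts can be sketched in Python. This is only a back-of-the-envelope model: the 70 MB/s per-drive figure is borrowed from the numbers mentioned elsewhere in this thread, and the 0.75 efficiency factor for mixed read/write access is purely an assumption, not a measurement.

```python
import math


def drives_needed(read_mb_s, write_mb_s, per_drive_mb_s, efficiency=1.0):
    """Estimate how many striped drives cover a combined read + write demand.

    efficiency < 1.0 is an assumed penalty for the seeking caused by
    reading and writing at the same time.
    """
    demand = read_mb_s + write_mb_s
    return math.ceil(demand / (per_drive_mb_s * efficiency))


# Reading only, feeding the tape at about 100 MB/s:
read_only = drives_needed(100, 0, per_drive_mb_s=70)
# Reading and writing at about 100 MB/s each, with the assumed penalty:
read_write = drives_needed(100, 100, per_drive_mb_s=70, efficiency=0.75)
print(read_only, read_write)
```

Under those assumptions the model gives 2 drives for the read-only case and 4 for simultaneous reading and writing; real arrays should be benchmarked rather than trusted to a formula like this.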


-- 
... a serious depression seems improbable; [we expect] recovery of business
next spring, with further improvement in the fall. Harvard Economic
Society, November 10, 1929


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-17 Thread Chris Hoogendyk



Cyrille Bollu wrote:

Chris Hoogendyk wrote:
 I wouldn't put the holding disks in raid.

Hu hu... Interesting... I have a 4-disk RAID-0 holding disk, and it 
isn't fast... I always wondered if I should use separate (non-RAID) 
drives...


Here is an extremely interesting article that everyone should take a 
look at. Rather than just giving theoretical comparisons that you see in 
places like the Wikipedia raid article (which is nevertheless an 
excellent reference), this article is a case study analysis with both 
real and test environments throwing data at raid and instrumenting it -- 
http://blogs.zdnet.com/Ou/?p=484.


While this guy is looking at things like database servers and exchange, 
we ought to be able to interpret this for Amanda. A couple of points to 
note for Amanda: Amanda will use all the holding disk drives while doing 
parallel backups and storing output on the holding disks. When it is 
writing to tape, it is constrained by the sequential nature of the tape, 
and will only be doing one DLE at a time from those that it has 
completed on the holding disks. Also, Amanda's access is heavily 
sequential, although it may have multiple parallel processes hitting the 
drives.



--
---

Chris Hoogendyk

-
O__  Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst

hoogen...@bio.umass.edu

---

Erdös 4



Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-17 Thread Rory Campbell-Lange

On 17/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote:
 Cyrille Bollu wrote:
 Chris Hoogendyk wrote:
  I wouldn't put the holding disks in raid.
snip
 http://blogs.zdnet.com/Ou/?p=484.
 
 While this guy is looking at things like database servers and
 exchange, we ought to be able to interpret this for Amanda. A couple
 of points to note for Amanda: Amanda will use all the holding disk
 drives while doing parallel backups and storing output on the
 holding disks. When it is writing to tape, it is constrained by the
 sequential nature of the tape, and will only be doing one DLE at a
 time from those that it has completed on the holding disks. Also,
 Amanda's access is heavily sequential, although it may have multiple
 parallel processes hitting the drives.

A slow RAID1 off two 7200 RPM SATA disks on a BBU-backed LSI hardware
raid controller can do about 62031 KiB/s write and 86399 KiB/s read.
Those sorts of numbers improve steadily with the number of spindles you
add to a RAID set, with higher RAID levels and, in the case of writing,
with caching enabled.

On 16/08/09, Rory Campbell-Lange (r...@campbell-lange.net) wrote:
 On 14/08/09, Frank Smith (fsm...@hoovers.com) wrote:
  Chris Hoogendyk wrote:
 With reference to Chris Hoogendyk's email clarification on
 parallelism, I am very curious to learn if Amanda ...still require[s]
 a DLE to be completed to holding disk before it will send any of it to
 tape... In our case this is a particularly important question as,
 although we can add in more AoE storage for a DLE, this will only run at
 the speeds above. Do we need a 1TB SAS disk array too?

From the discussion here it seems preferable to have a holding disk on two
major counts. One is that compression can happen before the data is written
to tape (compressing on the fly could result in shoe-shining), and another
is that Amanda will be clearer about the amount of data it will be trying
to write to a tape; in other words, it will do a better fit of data to tape.

The most important question I now have to ask is:

How fast can a SAS-based LTO4 drive write to tape?
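
For a rough sense of scale: the published LTO-4 figures are a native (uncompressed) transfer rate of 120 MB/s and a native capacity of 800 GB per cartridge. The Python sketch below simply turns those figures, plus the 30TB and 80MB/s numbers from this thread, into hours and tape counts. It assumes decimal units and a drive that streams continuously, which a real backup run will not quite achieve.

```python
import math

LTO4_NATIVE_MB_S = 120   # native (uncompressed) LTO-4 transfer rate
LTO4_NATIVE_GB = 800     # native (uncompressed) LTO-4 cartridge capacity


def hours_to_write(data_tb, rate_mb_s):
    """Hours needed to move data_tb terabytes at a sustained rate_mb_s."""
    return data_tb * 1_000_000 / rate_mb_s / 3600


def tapes_needed(data_tb, tape_gb=LTO4_NATIVE_GB):
    """Whole cartridges needed for data_tb terabytes of uncompressed data."""
    return math.ceil(data_tb * 1000 / tape_gb)


print(f"30 TB at tape speed:  {hours_to_write(30, LTO4_NATIVE_MB_S):.1f} h")
print(f"30 TB at 80 MB/s:     {hours_to_write(30, 80):.1f} h")
print(f"cartridges for 30 TB: {tapes_needed(30)}")
```

So a full 30TB pass is a matter of days rather than hours, even when the drive is kept streaming.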

Regards
Rory


-- 
Rory Campbell-Lange
Director
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-16 Thread Rory Campbell-Lange
On 14/08/09, Frank Smith (fsm...@hoovers.com) wrote:
 Chris Hoogendyk wrote:
  Amanda will do the compression for you. You define it in the dumptype in 
  amanda.conf. If you have a holding disk, then it will compress the data 
  as it goes onto the holding disk. If you don't have a holding disk, then 
  you might have issues with being able to stream a backup to tape, 
  compressing it on the fly. Even with a really fast cpu, I don't know if 
  you can maintain the throughput to drive LTO4 at a good speed.
 
 You might want to consider configuring for client compression.  Not
 only will that give you more CPU for feeding your tape, it also
 minimizes network bandwidth. As usual, YMMV, it all depends on where
 the bottlenecks are in your environment.

In our case the server _is_ the only client, with up to 30TB of direct
attached storage, with the storage running at between 80MB/s and 120MB/s
access speeds (bytes rather than bits).

I don't know if this is fast enough to deal with a SAS connected LTO4
drive, particularly if it is doing software compression along the way.

With reference to Chris Hoogendyk's email clarification on
parallelism, I am very curious to learn if Amanda ...still require[s]
a DLE to be completed to holding disk before it will send any of it to
tape... In our case this is a particularly important question as,
although we can add in more AoE storage for a DLE, this will only run at
the speeds above. Do we need a 1TB SAS disk array too?

Rory

-- 
Rory Campbell-Lange
Director
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-14 Thread Rory Campbell-Lange
Hi Chris

On 13/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote:
 ... the solution is akin to the Japanese monks caring for bonsai

I liked this idea about tape archives -- constant pruning and
maintenance. Difficult to sell though.

 As for your specific questions:
 
 You should be able to do LVM snapshots. I use fssnap on Solaris 9 and
 10, and scanning through, here are just a couple of references I find
 to people using LVM snapshots with Amanda:
snip
 With the latest releases of Amanda, there is a new API that could make
 it even easier to implement.

Great; thanks for the pointers.

 Typically, we set up Amanda with holding disk space.
snip

If all the storage is locally attached (actually, AoE drives storage
units connected over Ethernet), I am hoping to avoid the disk space if I
can write to tape fast enough. I'd like to avoid paying for up to 15TB
of fast holding disk space if I can avoid it.

 Compression can be done either on the client, on the server, or on
 the tape drive. Obviously, if you use software compression, you want
 to turn off the tape drive compression. I use server side
 compression, because I have a dedicated Amanda server that can
 handle it. By not using the tape drive compression, Amanda has more
 complete information on data size and tape usage for its planning.
 If your server is more constrained than your clients, you could use
 client compression. This is specified in your dumptypes in your
 amanda.conf.

I don't have any clients, so this is an interesting observation. I'll be
trying to do software compression then, I think. The Unix backup book
(google for amanda software compression) suggests that compression can
be used on a per-image basis; presumably I can pass the backup data
stream through gzip or bzip2 on the way to a tape?

 Deduplication is not available with Amanda. However, some people
 stage different kinds of tools and use Amanda for the final staging
 and management of tapes and archives. So, in some situations,
 BackupPC could be used to do deduplication from, say, desktop
 clients to a server archive which is then backed up by Amanda. That
 could start complicating your 12 year recovery scenario and what
 happens when software is not available or doesn't run.

Great -- thanks for the details.

 Amanda uses the term index rather than catalog -- see
 http://wiki.zmanda.com/index.php/Amanda_Index.
 
 Note that if you are putting tapes into a long term archive with no
 intent of recycling them in subsequent backups, you can use amadmin
 to mark them as no-reuse. I periodically (typically at the end of
 semesters) do a force full, mark the tapes as no-reuse, and then
 pull them out of my tapecycle and put them in storage.

Very useful again, thanks.

Regards
Rory
-- 
Rory Campbell-Lange
Director
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-14 Thread Cyrille Bollu
Hi,

Here's my (very) small personal experience:

A few years ago, when I tried it, I couldn't use server-side software 
compression while bypassing the holding disk with my IBM Ultrium LTO-3 
drive: tape speed sank to about 5MB/s.

My backup server was a Dell PowerEdge 2850 with 4 Intel Xeon 3GHz and 8MB 
RAM using RHEL-4.0 and amanda-2.4.4p3-1.

Maybe I did something wrong at that time (I only tried once). Beware 
though.

Cyrille



Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-14 Thread Chris Hoogendyk



Cyrille Bollu wrote:

Here's my (very) small personal experience:

A few years ago, when I tried it, I couldn't use server-side 
software compression while bypassing the holding disk with my IBM 
Ultrium LTO-3 drive: tape speed sank to about 5MB/s.


My backup server was a Dell PowerEdge 2850 with 4 Intel Xeon 3GHz and 
8MB RAM using RHEL-4.0 and amanda-2.4.4p3-1.


Maybe I did something wrong at that time (I only tried once). Beware 
though. 


That actually makes perfect sense.

By not using a holding disk, you are disabling Amanda's ability to run 
multiple things in parallel. The tape device now controls everything. 
That is to say, you cannot do a backup without streaming it to the tape. 
So, you cannot do more than one at a time. Furthermore, as that one 
backup gets done, the compression has to be done as it is being streamed 
to the tape. So all the processes from reading a remote disk, to 
transferring it over the network, to compressing it, to writing it to 
the tape are all tied together in a single pipe. Any slowdown along that 
pipe will affect everything else. When the tape doesn't get what it 
needs to keep going, it will stop and then have to start up again and 
reposition, and then you get shoe shining.


When you use a holding disk, Amanda can stream multiple backups to the 
holding disk simultaneously. It can compress them there when it has them 
and do it in parallel with other processes. Once it has something ready 
to go to tape, it can dd it straight from the disk to the tape as an 
independent process in parallel with the other things that are going on. 
That final step out to the tape is no longer constrained by any of the 
other steps along the way. Now all you have to worry about is tuning 
various pieces of hardware and software to get the throughput you want.
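
The difference between the two setups can be put into a few lines of Python. This is a toy model, not a description of Amanda's internals: a single unbuffered pipe is assumed to run at the speed of its slowest stage, and all the MB/s figures are invented for illustration.

```python
def pipeline_rate(stage_rates_mb_s):
    """A single unbuffered pipe runs at the speed of its slowest stage."""
    return min(stage_rates_mb_s.values())


def bottleneck(stage_rates_mb_s):
    """Name of the stage that limits the whole pipe."""
    return min(stage_rates_mb_s, key=stage_rates_mb_s.get)


# Without a holding disk: everything is tied together in one pipe.
direct = {"disk read": 100, "compression": 25, "tape write": 120}
# With a holding disk: the tape is fed from finished, compressed dumps.
staged = {"holding disk read": 150, "tape write": 120}

print("direct:", pipeline_rate(direct), "MB/s, limited by", bottleneck(direct))
print("staged:", pipeline_rate(staged), "MB/s, limited by", bottleneck(staged))
```

In the direct case the assumed compression rate drags the tape down to 25 MB/s, which is shoe-shining territory. In the staged case the tape drive itself becomes the limit.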



--
---

Chris Hoogendyk

-
  O__   Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst 


hoogen...@bio.umass.edu

--- 


Erdös 4




Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-14 Thread Chris Hoogendyk



Rory Campbell-Lange wrote:

Hi Chris

On 13/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote:
  
snip

Typically, we set up Amanda with holding disk space.


snip

If all the storage is locally attached (actually, AoE drives storage
units connected over Ethernet), I am hoping to avoid the disk space if I
can write to tape fast enough. I'd like to avoid paying for up to 15TB
of fast holding disk space if I can avoid it.


So, one way would be to logically divide the storage into smaller DLEs. 
A DLE (Disk List Entry -- http://wiki.zmanda.com/man/disklist.5.html) 
for Amanda can be a mount point or directory. Obviously, I don't know 
how your storage is organized; but, if you can define your DLEs as 
separate directories on the storage device, each one of which is much 
smaller, then you could use a smaller holding disk and still benefit 
from Amanda's parallelism. In one of the other departments here, the 
sysadmin has successfully divided a large array this way and is driving 
LTO4 near top speed.
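
A small Python sketch of what that division might look like in practice. Each disklist line has the form "hostname diskname dumptype" (see the disklist man page linked above). The host name, directory names, and dumptype name used here are hypothetical placeholders; the dumptype would have to be defined in your own amanda.conf.

```python
def disklist_entries(hostname, directories, dumptype):
    """Build one disklist line (hostname diskname dumptype) per directory."""
    return [f"{hostname} {directory} {dumptype}" for directory in sorted(directories)]


# Hypothetical host, directories and dumptype, for illustration only.
entries = disklist_entries(
    "backup.example.com",
    ["/storage/projects", "/storage/archive", "/storage/home"],
    "comp-user-tar",
)
print("\n".join(entries))
```

Each directory then becomes its own DLE that Amanda can schedule, dump to the holding disk, and compress independently of the others.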



Compression can be done either on the client, on the server, or on
the tape drive. Obviously, if you use software compression, you want
to turn off the tape drive compression. I use server side
compression, because I have a dedicated Amanda server that can
handle it. By not using the tape drive compression, Amanda has more
complete information on data size and tape usage for its planning.
If your server is more constrained than your clients, you could use
client compression. This is specified in your dumptypes in your
amanda.conf.



I don't have any clients, so this is an interesting observation. I'll be
trying to do software compression then, I think. The Unix backup book
(google for amanda software compression) suggests that compression can
be used on a per-image basis; presumably I can pass the backup data
stream through gzip or bzip2 on the way to a tape?


Amanda will do the compression for you. You define it in the dumptype in 
amanda.conf. If you have a holding disk, then it will compress the data 
as it goes onto the holding disk. If you don't have a holding disk, then 
you might have issues with being able to stream a backup to tape, 
compressing it on the fly. Even with a really fast cpu, I don't know if 
you can maintain the throughput to drive LTO4 at a good speed.



--
---

Chris Hoogendyk

-
  O__   Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst 


hoogen...@bio.umass.edu

--- 


Erdös 4




Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-14 Thread Frank Smith
Chris Hoogendyk wrote:
 
 Rory Campbell-Lange wrote:
 Hi Chris

 On 13/08/09, Chris Hoogendyk (hoogen...@bio.umass.edu) wrote:
   
 snip
 Typically, we set up Amanda with holding disk space.
 
 snip

 If all the storage is locally attached (actually, AoE drives storage
 units connected over Ethernet), I am hoping to avoid the disk space if I
 can write to tape fast enough. I'd like to avoid paying for up to 15TB
 of fast holding disk space if I can avoid it.
 
 So, one way would be to logically divide the storage into smaller DLEs. 
 A DLE (Disk List Entry -- http://wiki.zmanda.com/man/disklist.5.html) 
 for Amanda can be a mount point or directory. Obviously, I don't know 
 how your storage is organized; but, if you can define your DLEs as 
 separate directories on the storage device, each one of which is much 
 smaller, then you could use a smaller holding disk and still benefit 
 from Amanda's parallelism. In one of the other departments here, the 
 sysadmin has successfully divided a large array this way and is driving 
 LTO4 near top speed.
 
 Compression can be done either on the client, on the server, or on
 the tape drive. Obviously, if you use software compression, you want
 to turn off the tape drive compression. I use server side
 compression, because I have a dedicated Amanda server that can
 handle it. By not using the tape drive compression, Amanda has more
 complete information on data size and tape usage for its planning.
 If your server is more constrained than your clients, you could use
 client compression. This is specified in your dumptypes in your
 amanda.conf.
 
 I don't have any clients, so this is an interesting observation. I'll be
 trying to do software compression then, I think. The Unix backup book
 (google for amanda software compression) suggests that compression can
 be used on a per-image basis; presumably I can pass the backup data
 stream through gzip or bzip2 on the way to a tape?
 
 Amanda will do the compression for you. You define it in the dumptype in 
 amanda.conf. If you have a holding disk, then it will compress the data 
 as it goes onto the holding disk. If you don't have a holding disk, then 
 you might have issues with being able to stream a backup to tape, 
 compressing it on the fly. Even with a really fast cpu, I don't know if 
 you can maintain the throughput to drive LTO4 at a good speed.

You might want to consider configuring for client compression.  Not
only will that give you more CPU for feeding your tape, it also
minimizes network bandwidth. As usual, YMMV, it all depends on where
the bottlenecks are in your environment.

Frank


-- 
Frank Smith  fsm...@hoovers.com
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-13 Thread Charles Curley
On Thu, 13 Aug 2009 01:08:03 -0400
Jon LaBadie j...@jgcomp.com wrote:

 On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:
  

 So maybe you should provide a complete OS distribution, including the
 backup software.  Like a customized version of one of the live CD
 releases of Linux.  But wait, will that distribution's included
 device drivers work on the devices that will exist in 12 years?  Will
 that era's computers still have CD drives?  Will they be bootable?

Oh, folks 12 years hence ought to be able to dig out 12 year old
computers to run their 12 year old distributions on. I suspect the
bottleneck will be finding drives to read LTO4 tapes; maybe keep one or
two spares around?

Or put two or three complete machines aside and keep them for the
purpose.


-- 

Charles Curley  /\ASCII Ribbon Campaign
Looking for fine software   \ /Respect for open standards
and/or writing?  X No HTML/RTF in email
http://www.charlescurley.com/ \No M$ Word docs in email

Key fingerprint = CE5C 6645 A45A 64E4 94C0  809C FFF6 4C48 4ECD DFDB


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-13 Thread Rory Campbell-Lange
On 13/08/09, Charles Curley (charlescur...@charlescurley.com) wrote:
 On Thu, 13 Aug 2009 01:08:03 -0400
 Jon LaBadie j...@jgcomp.com wrote:
  On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:

  So maybe you should provide a complete OS distribution, including the
  backup software.  Like a customized version of one of the live CD
  releases of Linux.  But wait, will that distribution's included
  device drivers work on the devices that will exist in 12 years?  Will
  that era's computers still have CD drives?  Will they be bootable?
 
 Oh, folks 12 years hence ought to be able to dig out 12 year old
 computers to run their 12 year old distributions on. 

Many thanks for this note, Charles, and thanks also to Chris and Jon for
their commentary on using Amanda to provide a long-term archive format.
The points about being able to use standard Unix tools to retrieve
information are well made, as is the point that the current machines and
architectures (and CDs!) may not be around in 12 years' time. Thanks very
much for those observations.

I'd like to return to the other part of my question if I may:

 The backup tape format is to be LTO4 and we have a second-hand Dell
 PowerVault 124T 16 tape autoloader to work with currently. Backup from
 a pool may be taken off a Linux LVM (or hopefully soon a BTRFS)
 snapshot ensuring that the source data does not change during the
 backup process. We have the possibility of pre-preparing backup or
 compressed images if this is advisable.

I'd be grateful to learn specifically if the approach I have set out
seems feasible. Also:
 
- is the snapshot volume or secondary holding pool advisable?
- is compression / deduplication possible?   
- after scanning through the wiki I can't see any references to what
  I think of as a backup job catalogue. How does one know what
  files were part of a particular backup job?   

Thanks for any further advice.

Rory

-- 
Rory Campbell-Lange
Director
r...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-13 Thread Chris Hoogendyk



Rory Campbell-Lange wrote:

On 13/08/09, Charles Curley (charlescur...@charlescurley.com) wrote:
  

On Thu, 13 Aug 2009 01:08:03 -0400
Jon LaBadie j...@jgcomp.com wrote:


On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:
  
So maybe you should provide a complete OS distribution, including the

backup software.  Like a customized version of one of the live CD
releases of Linux.  But wait, will that distribution's included
device drivers work on the devices that will exist in 12 years?  Will
that era's computers still have CD drives?  Will they be bootable?
  

Oh, folks 12 years hence ought to be able to dig out 12 year old
computers to run their 12 year old distributions on. 



Many thanks for this note, Charles, and thanks also to Chris and Jon for
their commentary on using Amanda to provide a long-term archive format.
The points about being able to use standard Unix tools to retrieve
information are well made, as is the point that the current machines and
architectures (and CDs!) may not be around in 12 years' time. Thanks very
much for those observations.


I'd like to return to the other part of my question if I may:

  

The backup tape format is to be LTO4 and we have a second-hand Dell
PowerVault 124T 16 tape autoloader to work with currently. Backup from
a pool may be taken off a Linux LVM (or hopefully soon a BTRFS)
snapshot ensuring that the source data does not change during the
backup process. We have the possibility of pre-preparing backup or
compressed images if this is advisable.



I'd be grateful to learn specifically if the approach I have set out
seems feasible. Also:
 
- is the snapshot volume or secondary holding pool advisable?
- is compression / deduplication possible?   
- after scanning through the wiki I can't see any references to what
  I think of as a backup job catalogue. How does one know what
  files were part of a particular backup job?   


Thanks for any further advice.


One further comment on the nature of long term archives (and then on to 
your specific questions):


I used to work in the Systems Office of the University Library. I 
handled backups there, and had close contact with a group of librarians 
who were into digital content, archives and special collections. Among 
other things, we kicked around ideas about how to archive digital 
collections, the life expectancies and failure rates of various types of 
CDs (generally terrible in reality), etc. When librarians talk about 
archives, they don't just talk decades. They expect things to last a 
hundred years and more. In that light, they have concluded that the 
solution is akin to Japanese monks caring for bonsai. There are 
records of bonsai trees that have been cared for for hundreds of years. 
So, think of sysadmins as monks caring for data. The archive librarians' 
solution is RAID 6 with hot spares, mirrored to another location. The 
sysadmins maintain and update hardware and software and transfer data 
when necessary. Although I haven't kept up with that area, they were 
developing cooperative distributed archive software as open source. 
The idea was that different libraries join the cooperative, run the 
software, and they end up with multiple copies of their digital 
collections distributed geographically among other libraries. If your 
library burns down, you rebuild, set up the software, and bring your 
collection back. Sort of a cloud library, if you will.


So, as technology changes, you need to be frequently reviewing the state 
of your archives, keeping an eye on compatibility bottlenecks and 
transferring data to newer media when it becomes necessary. I have a 
faculty member who ran his own backups on AIT2 for years. His drives are 
fairly old now. I periodically urge him to read them back in to a disk 
archive and allow me to put them on AIT5. He's too busy. Ah, well. It's 
his data.


--

As for your specific questions:

You should be able to do LVM snapshots. I use fssnap on Solaris 9 and 
10, and scanning through, here are just a couple of references I find to 
people using LVM snapshots with Amanda:

http://wiki.zmanda.com/index.php/FAQ:Which_backup_program_for_filesystems_is_better%3F
http://archives.zmanda.com/amanda-archives/viewtopic.php?t=2711&sid=f1535cf0b0782bf2b99aebc033e91c9c
http://archives.zmanda.com/amanda-archives/viewtopic.php?p=9823&sid=8e54f6a0b4ab2cd58bd02e048c299844

In the past that sort of thing had always been done with a wrapper 
script (described toward the end of 
http://wiki.zmanda.com/index.php/Backup_client). Paul Bijnens refers to a 
script that he uses in one of the above links. With the latest releases 
of Amanda, there is a new API that could make it even easier to implement.
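
For what it's worth, the general shape of such a wrapper can be sketched as below. This is only an illustration and not Paul's script or anything shipped with Amanda: the function name, the `_amsnap` suffix, the mount point and the `backup` callback are all made up for the example, and it defaults to a dry run that just records the commands it would execute. The `lvcreate`, `mount`, `umount` and `lvremove` invocations use standard options, but check them against your own LVM version and filesystem before relying on them.

```python
import shlex
import subprocess


def run_with_snapshot(vg, lv, backup, mountpoint="/mnt/amanda-snap",
                      snap_size="10G", dry_run=True):
    """Snapshot /dev/<vg>/<lv>, mount it read-only, run backup(mountpoint),
    then always unmount and remove the snapshot.

    All names here are hypothetical; with dry_run=True nothing is executed
    and the commands are only recorded in the returned log.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_name = f"{lv}_amsnap"          # hypothetical naming convention
    snap_dev = f"/dev/{vg}/{snap_name}"
    log = []

    def run(*cmd):
        log.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

    # snap_size is the copy-on-write space reserved for changes made to the
    # origin volume while the backup is running.
    run("lvcreate", "--snapshot", "--size", snap_size,
        "--name", snap_name, origin)
    try:
        run("mount", "-o", "ro", snap_dev, mountpoint)
        try:
            backup(mountpoint)          # e.g. hand the path to the dump program
        finally:
            run("umount", mountpoint)
    finally:
        run("lvremove", "--force", snap_dev)
    return log


# Dry run: show what would be executed for volume group vg0, volume data.
for line in run_with_snapshot("vg0", "data", backup=lambda path: None):
    print(line)
```

The nested try/finally is the point of the exercise: the snapshot is removed even if the mount or the backup fails, so a failed run does not leave a snapshot behind slowly filling its copy-on-write space.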


Typically, we set up Amanda with holding disk space.
See the section of the sample amanda.conf partway down regarding holding 

[Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-12 Thread rorycl

I'm going to cross-post this text on the Amanda and Bacula lists.
Apologies in advance if you see this twice.

Our company is about to provide centralised backups for several pools of
backup data of between 1 and 15TB in size. Each pool changes daily but
backups to tape will only occur once a month for each pool.

The backup tape format is to be LTO-4 and we have a second-hand Dell
PowerVault 124T 16-tape autoloader to work with currently. Backup from a
pool may be taken off a Linux LVM (or hopefully soon a BTRFS) snapshot
ensuring that the source data does not change during the backup process.
We have the possibility of pre-preparing backup or compressed images if
this is advisable.
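
As a rough sizing check on the figures above, the arithmetic can be sketched as follows. The LTO-4 numbers are assumptions brought in for the example rather than anything measured here: 800 GB native capacity and 120 MB/s native transfer rate, decimal units, no compression, and a drive that is kept streaming the whole time.

```python
import math

# Assumed LTO-4 figures (native, uncompressed, decimal units).
LTO4_NATIVE_BYTES = 800 * 10**9
LTO4_NATIVE_RATE = 120 * 10**6   # bytes per second
AUTOLOADER_SLOTS = 16            # the 16-tape autoloader described above


def tapes_needed(pool_bytes):
    """Number of LTO-4 tapes for one uncompressed full backup of a pool."""
    return math.ceil(pool_bytes / LTO4_NATIVE_BYTES)


def hours_to_write(pool_bytes, rate=LTO4_NATIVE_RATE):
    """Best-case hours to write the pool at a sustained rate in bytes/s."""
    return pool_bytes / rate / 3600


for tb in (1, 15):
    size = tb * 10**12
    n = tapes_needed(size)
    print(f"{tb} TB pool: {n} tapes, {hours_to_write(size):.1f} h at best, "
          f"fits in one magazine load: {n <= AUTOLOADER_SLOTS}")
```

Under those assumptions a 15 TB pool needs 19 tapes, which is more than one load of a 16-slot autoloader, and takes roughly a day and a half of continuous streaming. Compression or slower source disks will move both numbers.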

An important aspect of the system is that the tapes should be readable
for 12 years, by other parties if necessary. From this point of view we
like the idea of providing a CD with each tape set of the software
needed to extract the contents, together with a listing of the enclosed
files in a UTF8 text file. We will be required to audit each backup set
by successfully extracting files from tape.
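
To make the listing and audit idea concrete, here is a minimal sketch of what we have in mind. It is independent of whichever backup software we end up with, and the function names and the manifest layout (SHA-256, size, relative path, separated by two spaces) are just our own invention for illustration. It assumes file names are valid UTF-8 and contain no newlines.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write one 'sha256  size  relative/path' line per regular file.

    The manifest should live outside root so it does not list itself.
    Returns the number of files listed.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()                      # deterministic order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if os.path.islink(full) or not os.path.isfile(full):
                    continue                     # skip symlinks and specials
                rel = os.path.relpath(full, root)
                out.write(f"{sha256_of(full)}  {os.path.getsize(full)}  {rel}\n")
                count += 1
    return count


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or differ under root."""
    bad = []
    with open(manifest_path, encoding="utf-8") as fh:
        for line in fh:
            digest, _size, rel = line.rstrip("\n").split("  ", 2)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or sha256_of(full) != digest:
                bad.append(rel)
    return bad
```

The manifest would be written from the snapshot at backup time; for the audit, files are extracted from tape into a scratch directory and `verify_manifest` is run against that directory. An empty list means every listed file came back intact.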

We are very familiar with working on the command-line in Linux,
Postgresql and Python.

As we have not run backup to tape on Linux before I would be very
grateful to receive advice on what approach members of this list would
take to meeting the above requirements.

Many thanks,
Rory

+--
|This was sent by r...@campbell-lange.net via Backup Central.
|Forward SPAM to ab...@backupcentral.com.
+--




Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-12 Thread Chris Hoogendyk



rorycl wrote:

An important aspect of the system is that the tapes should be readable
for 12 years, by other parties if necessary. From this point of view we
like the idea of providing a CD with each tape set of the software
needed to extract the contents, together with a listing of the enclosed
files in a UTF8 text file. We will be required to audit each backup set
by successfully extracting files from tape.


Just taking up that one point for the moment -- Amanda is not just open 
source and open format, but the tape format is based on standard 
UNIX/Linux tools. If you pull off the first file of the tape, it 
actually tells you how to read the tape. You don't need Amanda or any 
Amanda tools to read it. Just standard UNIX/Linux tools that come with 
every distribution, such as dd, gnutar, and gzip.
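
To show the idea without a tape drive, here is a small sketch that builds a stand-in for one tape file in memory and reads it back using nothing Amanda-specific. Treat the details as assumptions: it assumes the traditional 32 KiB header block followed by a gzip-compressed tar stream, and the header wording is my approximation from memory, not an authoritative copy of what Amanda writes. The host name and path are made up.

```python
import gzip
import io
import tarfile

BLOCK = 32 * 1024  # assumed size of the text header block


def make_fake_dump_file(files):
    """Build a stand-in for one tape file: text header + gzip'd tar."""
    header = (
        "AMANDA: FILE 20090812 example.host /data lev 0 comp .gz program /bin/tar\n"
        "To restore, position tape at start of file and run:\n"
        "\tdd if=<tape> bs=32k skip=1 | /bin/gzip -dc | /bin/tar -xpGf - ...\n"
    ).encode("ascii").ljust(BLOCK, b"\0")       # pad to one full block
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return header + gzip.compress(tar_buf.getvalue())


def read_dump_file(raw):
    """In-memory equivalent of: dd bs=32k skip=1 | gzip -dc | tar -x."""
    header = raw[:BLOCK].rstrip(b"\0").decode("ascii")   # the human-readable part
    payload = gzip.decompress(raw[BLOCK:])               # skip one block, gunzip
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return header, contents


raw = make_fake_dump_file({"etc/hosts": b"127.0.0.1 localhost\n"})
header, contents = read_dump_file(raw)
print(header)
print(sorted(contents))
```

On a real tape the same three steps are done with dd, gzip and tar against the no-rewind tape device, after positioning to the file you want with mt.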


That said, it is easier to read and recover using the Amanda tools, 
because they will give you an index, allow you to specify what it is you 
want to recover, tell you which tapes you need, and get it for you. But, 
in the event that the tape lands in the hands of a UNIX/Linux admin who 
has never heard of Amanda, but who needs to recover the data, it can be 
done. And those tools are more likely to be available in stable or 
compatible forms in 12 years. It just happens that 12 years is about the 
lifecycle of a particular version of Solaris. That is, from the first 
introduction of Solaris X to its final EOL and drop of all support is 
about 12 years. I think Linux turns over faster than that, but the basic 
tools are typically compatible between versions.


If you want, you can use amreport to generate a report on the contents 
of a backup. Since you won't need a CD of software (and won't need to 
worry about whether it will run, whether the right libraries will be 
available, etc.), you might decide that a printout provided with each 
tape might be easier. Sysadmin looks at printout and immediately sees 
what's on the tape and, Oh, gee, it's that easy to read the tape. That 
avoids the difficulty of a CD not being stable or readable. The tapes 
are typically going to outlive a CD.



--
---

Chris Hoogendyk

-
  O__   Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~ - University of Massachusetts, Amherst 


hoogen...@bio.umass.edu

--- 


Erdös 4




Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-12 Thread Charles Curley
On Wed, 12 Aug 2009 18:17:17 -0400
rorycl amanda-fo...@backupcentral.com wrote:


 An important aspect of the system is that the tapes should be readable
 for 12 years, by other parties if necessary. From this point of view
 we like the idea of providing a CD with each tape set of the software
 needed to extract the contents, together with a listing of the
 enclosed files in a UTF8 text file. We will be required to audit each
 backup set by successfully extracting files from tape.

I assume you already have verified that your tapes will last that long
before print-through makes them unreadable.

Another thought is to provide a CD/DVD of a suitable distribution of
Linux with Amanda, mt, etc. That way you don't have compatibility
problems with the current version of Amanda and some future
distribution of Linux. Install and go, and at least then you should be
able to read the tapes.

-- 

Charles Curley  /\ASCII Ribbon Campaign
Looking for fine software   \ /Respect for open standards
and/or writing?  X No HTML/RTF in email
http://www.charlescurley.com/ \No M$ Word docs in email

Key fingerprint = CE5C 6645 A45A 64E4 94C0  809C FFF6 4C48 4ECD DFDB


Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape

2009-08-12 Thread Jon LaBadie
On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:
 
 I'm going to cross-post this text on the Amanda and Bacula lists.
 Apologies in advance if you see this twice.
 
 Our company is about to provide centralised backups for several pools of
 backup data of between 1 and 15TB in size. Each pool changes daily but
 backups to tape will only occur once a month for each pool.
 
 The backup tape format is to be LTO-4 and we have a second-hand Dell
 PowerVault 124T 16-tape autoloader to work with currently. Backup from a
 pool may be taken off a Linux LVM (or hopefully soon a BTRFS) snapshot
 ensuring that the source data does not change during the backup process.
 We have the possibility of pre-preparing backup or compressed images if
 this is advisable.
 
 An important aspect of the system is that the tapes should be readable
 for 12 years, by other parties if necessary. From this point of view we
 like the idea of providing a CD with each tape set of the software
 needed to extract the contents, together with a listing of the enclosed
 files in a UTF8 text file. We will be required to audit each backup set
 by successfully extracting files from tape.

Others have mentioned that even without amanda software, amanda backups
are recoverable with standard unix/linux tools.

I question whether the concept of providing the software is reasonable.
Programs are compiled for a particular environment, versions of libraries,
devices, operating system, etc.  Amanda software, compiled for today's
systems, is unlikely to be able to run on systems a dozen years from now.
What systems were around in 1997?  Many instances of them still running?  

So maybe you should provide a complete OS distribution, including the backup
software.  Like a customized version of one of the live CD releases of
Linux.  But wait, will that distribution's included device drivers work on
the devices that will exist in 12 years?  Will that era's computers still
have CD drives?  Will they be bootable?

This requirement may take some additional thought.

jl
-- 
Jon H. LaBadie  j...@jgcomp.com
 JG Computing
 12027 Creekbend Drive  (703) 787-0884
 Reston, VA  20194  (703) 787-0922 (fax)