Re: Is tape spanning documented anywhere?

2006-06-16 Thread Josef Wolf
On Tue, Jun 13, 2006 at 05:20:13PM +0200, Toralf Lund wrote:
 
 And like I also said, in general, allowing partial flush would also 
 address another issue: the one of blocking the entire tape operation 
 when using a holding disk and getting a dump that won't fit on 
 the runtapes tapes even though it was expected to (either because of 
 miscalculations during the planner phase or because specifying the 
 tape size seems to be a rather inexact science.)

This issue can be solved by a much simpler change:  Simply assume an
infinitely large runtapes setting as long as no taping has succeeded on the
current run.
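
A minimal sketch of that rule, written in Python rather than Amanda's own C. The names `effective_runtapes`, `may_load_next_tape` and `dumps_taped_this_run` are hypothetical and do not come from the Amanda source; the sketch only illustrates the proposed decision, not how the taper is actually structured.

```python
import math


def effective_runtapes(runtapes: int, dumps_taped_this_run: int) -> float:
    """Tape limit under the proposed rule (hypothetical helper).

    Until at least one dump has been taped completely in this run, behave
    as if runtapes were unlimited; afterwards honour the configured value.
    """
    if dumps_taped_this_run == 0:
        return math.inf
    return runtapes


def may_load_next_tape(tapes_used: int, runtapes: int,
                       dumps_taped_this_run: int) -> bool:
    """True if the taper would be allowed to ask for one more tape."""
    return tapes_used < effective_runtapes(runtapes, dumps_taped_this_run)
```

With runtapes 3, a run that has used three tapes without completing a single dump would still be allowed a fourth tape, while a run that has already taped something would stop at three.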


Re: Is tape spanning documented anywhere?

2006-06-14 Thread Toralf Lund


I also have one other scenario in mind, though - which is one I've 
actually come across a number of times: What if a certain DLE due for 
backup is estimated to be slightly smaller than runtapes*tape size, 
and thus dumped to holding disk, but then turns out to be slightly 
larger?



Wouldn't it be more accurate to say the scenario you ran into previously
was DLE larger than tape size because the tape spanning feature was
not available at that time?
  

Yes. But the key issue remains unchanged.
With the current setup, amanda will obviously run out of 
tape-space during the original dump and also if you try amflush. And if 
auto-flush is enabled, the next dump will hit end-of-tape before any of 
the new dumps have been written, and the next one after that, and so on; 
this holding disk image will effectively block the tape operation of all 
the following backups, and eventually, the holding disk will be full, 
too, so amdump won't be able to do anything at all.



What is different with the tape spanning feature is that you could get the
large DLE to tape by simply increasing runtapes, even if only temporarily.
Thus, no system lockup.
  
Yes, but that requires manual intervention, and we were talking about 
safety. All situations where you have to do manual work in order to 
allow the backup to continue mean reducing safety, IMO, or differently 
put, a change that means they won't occur, increases safety.


- Toralf




Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund




Anyhow, I'd really like to know more about how the spanning actually 
works. Is it documented anywhere? http://www.amanda.org/docs and FAQ 
still say that the option does not exist...





Try http://wiki.zmanda.org/index.php/Splitting_dumps_across_tapes
  

Yes. Thanks. That's quite helpful.

A few questions still remain, though. Like

  1. Does tape splitting in any way affect the scheduling algorithm? I
 mean, what if Amanda wants to dump a certain amount in order to
 reach  the balanced dump size, and only much larger DLEs are
 available? Might one of those be scheduled just so that one
 split part can be used to fill up the remaining space?
  2. What happens to the holding disk file after a dump is partially
 written to tape? Will Amanda keep the entire file, or just what
 will be written next time around? And what if the holding disk
 data is split into chunks?

But I'm testing a setup using tape_splitsize right now, so maybe I'll 
find out...


- Toralf



Re: Is tape spanning documented anywhere?

2006-06-13 Thread Paul Bijnens

On 2006-06-13 09:41, Toralf Lund wrote:




Anyhow, I'd really like to know more about how the spanning actually 
works. Is it documented anywhere? http://www.amanda.org/docs and FAQ 
still say that the option does not exist...





Try http://wiki.zmanda.org/index.php/Splitting_dumps_across_tapes
  

Yes. Thanks. That's quite helpful.

A few questions still remain, though. Like

  1. Does tape splitting in any way affect the scheduling algorithm? I
 mean, what if Amanda wants to dump a certain amount in order to
 reach  the balanced dump size, and only much larger DLEs are
 available? Might one of those be scheduled just so that one
 split part can be used to fill up the remaining space?


While scheduling/planning Amanda does not take into account any
tape-splitting.   When a DLE is larger than the total amount of
level 0 dumps divided by runspercycle (= the optimal balanced amount
of level 0 dumps to schedule each run) then you have an unbalanced
configuration.  Amanda will try to reschedule full dumps (= promoting
the smaller ones sooner) but will never succeed.
The result is that you have even more level 0 dumps in a run than
in a balanced situation, using even more tape.
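
The balance rule described above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic (total level 0 size divided by runspercycle), not of the planner itself; the function names and the example sizes are made up for illustration.

```python
def balanced_full_size(full_sizes_gb: dict, runspercycle: int) -> float:
    """Optimal amount of level 0 data to schedule per run."""
    return sum(full_sizes_gb.values()) / runspercycle


def oversized_dles(full_sizes_gb: dict, runspercycle: int) -> list:
    """DLEs whose level 0 alone exceeds the balanced per-run amount.

    Any such DLE makes the configuration unbalanced, regardless of
    whether tape splitting is enabled.
    """
    target = balanced_full_size(full_sizes_gb, runspercycle)
    return sorted(name for name, size in full_sizes_gb.items() if size > target)


# Hypothetical disklist: level 0 sizes in GB.
sizes = {"/home": 300, "/var": 20, "/usr": 30, "/opt": 50}
print(balanced_full_size(sizes, 5))   # balanced amount per run
print(oversized_dles(sizes, 5))       # DLEs that break the balance
```

In this made-up example the total is 400 GB over 5 runs, so the balanced amount is 80 GB per run and the 300 GB DLE can never fit within it.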



  2. What happens to the holding disk file after a dump is partially
 written to tape? Will Amanda keep the entire file, or just what
 will be written next time around? And what if the holding disk
 data is split into chunks?


Amanda keeps the entire dump, and it will be flushed entirely again
on the next amflush or autoflush.

The holdingdisk chunking is completely independent of the tape chunking.


--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***



Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund





  2. What happens to the holding disk file after a dump is partially
 written to tape? Will Amanda keep the entire file, or just what
 will be written next time around? And what if the holding disk
 data is split into chunks?


Amanda keeps the entire dump, and it will be flushed entirely again
on the next amflush or autoflush.
You mean entirely as in the whole DLE? That would mean that tape 
splitting is restricted to multiple tapes written in the same run, which 
would be rather disappointing.


Or maybe you misunderstood the question. Sorry if I was a bit unclear, 
but I'm not sure what terminology to use, now. What do you (should we) 
call one piece of output from the tape split? And what should dump 
be taken to mean? The entire output from the backup of one DLE, or one 
entry from the tape splitting. And what will a holding disk file 
contain, anyway? Again, will it be data for the whole DLE, or one 
instance of output from the split operation? (Like I said, I'm testing a 
bit as I write this, but haven't been able to draw any conclusions yet, 
mainly because I had to re-build amanda just now...)


Anyhow, when I said holding disk file earlier, what I meant was the 
holding disk data for one DLE, and when I said partially written to 
tape, what I meant was some, but not all, split sections completely 
written to tape (i.e. I was not talking about sections that are 
incomplete because end-of-tape is reached.)


- Toralf



Re: Is tape spanning documented anywhere?

2006-06-13 Thread Paul Bijnens

On 2006-06-13 10:32, Toralf Lund wrote:





  2. What happens to the holding disk file after a dump is partially
 written to tape? Will Amanda keep the entire file, or just what
 will be written next time around? And what if the holding disk
 data is split into chunks?


Amanda keeps the entire dump, and it will be flushed entirely again
on the next amflush or autoflush.
You mean entirely as in the whole DLE? That would mean that tape 
splitting is restricted to multiple tapes written in the same run, which 
would be rather disappointing.


Yes indeed.  The whole DLE.  A single DLE still needs to be written
in one run, possibly using many tapes.

But you may still split a very large DLE into smaller ones
using the include/exclude mechanism that has existed for a long time.
Only now you are not limited by the capacity of a single tape anymore.




Or maybe you misunderstood the question. Sorry if I was a bit unclear, 
but I'm not sure what terminology to use, now. What do you (should we) 
call one piece of output from the tape split? And what should dump 
be taken to mean? The entire output from the backup of one DLE, or one 
entry from the tape splitting. And what will a holding disk file 
contain, anyway? Again, will it be data for the whole DLE, or one 
instance of output from the split operation? (Like I said, I'm testing a 
bit as I write this, but haven't been able to draw any conclusions yet, 
mainly because I had to re-build amanda just now...)


A piece of a DLE on tape:  a tape-chunk.
A dump:  the entire output of dump/gnutar for one DLE.

One dump may be split in chunks on the holdingdisk.
One dump may be split in chunks on a tape.
These two chunking mechanisms are completely independent.

Note also that when you bypass the holdingdisk and do want tape-chunking
there is an additional parameter split_diskbuffer which gives the path
of a file Amanda can use to buffer one tape-chunk on disk (to be able
to feed the tape as fast as possible).  And there is even a third
parameter fallback_splitsize, which gives the size of the RAM buffer
used when the split_diskbuffer fails.


But even when only the last byte of a single DLE did not make it to tape,
the complete DLE is considered failed (it is still on the holdingdisk, or
it will be tried again at the same level on the next run).




Anyhow, when I said holding disk file earlier, what I meant was the 
holding disk data for one DLE, and when I said partially written to 
tape, what I meant was some, but not all, split sections completely 
written to tape (i.e. I was not talking about sections that are 
incomplete because end-of-tape is reached.)


I think I completely understood the question.  I had the same
terminology in my mind.

To confirm:  suppose you have one DLE:

Total DLE size (after compression):   295 Gbyte.
tape_splitsize 10 G
tape-capacity 101 G
runtapes 3
holdingdisk chunksize 2 G

And you also have some smaller dumps from other DLE's resulting in 20
Gbyte, which happen to be taped first.

The large DLE will be spread over about 148 holdingdisk
chunks of 2 Gbyte each.

The first tape contains all those smaller ones already occupying
about 20 Gbyte, followed by 8 tape-chunks of the large DLE.
While writing chunk number 9, Amanda bumps into EOT, and restarts
that chunk all over again on the second tape.
The second tape contains 10 more tape-chunks, nr 9-18.
While writing chunk 19 Amanda bumps into EOT again.
That 19th chunk is ignored on tape 2 and restarted again on tape 3.
Tape 3 contains tape-chunks 19-28.

And while writing chunk 29 it bumps into EOT again.  But this was
the last tape we could use.  So the entire DLE, all the 295 GByte
is considered failed-to-tape.
All the 148 holdingdisk chunks are kept on disk.
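
The example above can be replayed with a small simulation. This is a sketch under simplifying assumptions, not Amanda's taper logic: sizes are whole Gbytes, a tape-chunk either fits completely in the space left on the tape or counts as hitting EOT, and the function name `simulate_spanning` is invented for illustration.

```python
import math


def simulate_spanning(dump_gb, splitsize_gb, tape_gb, runtapes,
                      already_on_first_tape_gb=0):
    """Return (layout, success) for one spanned dump.

    layout[i] lists the tape-chunk numbers written completely to tape i+1.
    A chunk that hits EOT is restarted from scratch on the next tape.
    """
    total_chunks = math.ceil(dump_gb / splitsize_gb)
    next_chunk = 1
    layout = []
    for tape in range(runtapes):
        free = tape_gb - (already_on_first_tape_gb if tape == 0 else 0)
        written = []
        while next_chunk <= total_chunks:
            # The last chunk may be smaller than splitsize.
            size = min(splitsize_gb, dump_gb - (next_chunk - 1) * splitsize_gb)
            if size > free:
                break  # EOT: this chunk is retried on the next tape
            free -= size
            written.append(next_chunk)
            next_chunk += 1
        layout.append(written)
        if next_chunk > total_chunks:
            break
    return layout, next_chunk > total_chunks


layout, ok = simulate_spanning(295, 10, 101, runtapes=3,
                               already_on_first_tape_gb=20)
for number, chunks in enumerate(layout, start=1):
    print(f"tape {number}: chunks {chunks[0]}-{chunks[-1]}")
print("taped" if ok else "failed-to-tape")
```

With the figures from the example the dump needs 30 tape-chunks but only 28 fit on three tapes, so the whole DLE fails; with runtapes 4 the same call succeeds.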





Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund




Yes indeed.  The whole DLE.  A single DLE still needs to be written
in one run, possibly using many tapes.
Oh no... Like I said, that's a big disappointment. I'm tempted to say 
that it is not correct to claim that Amanda now supports tape spanning, 
if it can't span dumps across tapes written in separate runs.


Shouldn't it be able to delete the corresponding data from the holding 
disk file as tape-chunks are successfully written - so that only the 
remaining chunks would be flushed on the next run? Seems like this 
should be easy enough to implement, especially if you interact with 
holding disk chunks in a constructive manner. Is there any reason why 
nobody has looked into this, except for lack of time?
... or maybe what's on the holding disk does not really matter and/or is 
a separate issue. I suppose the taper already knows how to find a 
certain tape-chunk within the holding disk data, so it's more a matter 
of being able to tell it to start writing from a certain chunk 
(different from 0) during flush. The flush operation would obviously 
have to find the correct index somewhere in the database. Does it do a 
lookup at all today, or just blindly tape whatever is on the holding disk?


- Toralf






Re: Is tape spanning documented anywhere?

2006-06-13 Thread Paul Bijnens

On 2006-06-13 12:10, Toralf Lund wrote:




Yes indeed.  The whole DLE.  A single DLE still needs to be written
in one run, possibly using many tapes.
Oh no... Like I said, that's a big disappointment. I'm tempted to say 
that it is not correct to claim that Amanda now supports tape spanning, 
if it can't span dumps across tapes written in separate runs.


Shouldn't it be able to delete the corresponding data from the holding 
disk file as tape-chunks are successfully written - so that only the 
remaining chunks would be flushed on the next run? Seems like this 
should be easy enough to implement, especially if you interact with 
holding disk chunks in a constructive manner. Is there any reason why 
nobody has looked into this, except for lack of time?
... or maybe what's on the holding disk does not really matter and/or is 
a separate issue. I suppose the taper already knows how to find a 
certain tape-chunk within the holding disk data, so it's more a matter 
of being able to tell it to start writing from a certain chunk 
(different from 0) during flush. The flush operation would obviously 
have to find the correct index somewhere in the database. Does it do a 
lookup at all today, or just blindly tape whatever is on the holding disk?


Taping one DLE in several runs opens a can of worms:  you have to
add a notion of partially succeeded.  Restoring then needs some tapes
and some holdingdisk files.  What if the holdingdisk crashes, or someone
accidentally rm's the files before all of it is written to tape? etc.

As Stefan used to say:  AAPW(*).

(*) http://article.gmane.org/gmane.comp.archivers.amanda.user/25064




Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund

Paul Bijnens wrote:

On 2006-06-13 12:10, Toralf Lund wrote:




Yes indeed.  The whole DLE.  A single DLE still needs to be written
in one run, possibly using many tapes.
Oh no... Like I said, that's a big disappointment. I'm tempted to 
say that it is not correct to claim that Amanda now supports tape 
spanning, if it can't span dumps across tapes written in separate runs.


Shouldn't it be able to delete the corresponding data from the 
holding disk file as tape-chunks are successfully written - so that 
only the remaining chunks would be flushed on the next run? Seems 
like this should be easy enough to implement, especially if you 
interact with holding disk chunks in a constructive manner. Is there 
any reason why nobody has looked into this, except for lack of time?
... or maybe what's on the holding disk does not really matter and/or 
is a separate issue. I suppose the taper already knows how to find a 
certain tape-chunk within the holding disk data, so it's more a 
matter of being able to tell it to start writing from a certain chunk 
(different from 0) during flush. The flush operation would obviously 
have to find the correct index somewhere in the database. Does it do 
a lookup at all today, or just blindly tape whatever is on the 
holding disk?


Taping one DLE in several runs opens a can of worms:  you have to
add a notion of partially succeeded.  Restoring then needs some tapes
and some holdingdisk files.  What if the holdingdisk crashes, or someone
accidentally rm's the files before all of it is written to tape? etc.
This would be a bit of an issue, of course. I'm wondering if the 
situation would be that much different from the one we have today, though. 
Holding disk crash or file removal is always going to be a serious 
problem, of course. But if you have a partial tape dump of the data, you 
will at least be able to recover some of it... Maybe it would be wise to 
keep all data on the disk until all tape-chunks are fully written, 
though, and also use the partial status only for flush purposes, i.e. 
consider partial writes as unsuccessful when doing restore etc.


What happens in the current version if amdump is interrupted while 
writing the 2nd tape, by the way?


As Stefan used to say:  AAPW(*).
Well, yes. I was partly also asking if a P would be W in this case, 
though, or if someone for some good reason had decided that tape split 
across runs should not be supported.


- Toralf





Re: Is tape spanning documented anywhere?

2006-06-13 Thread Joshua Baker-LePain

On Tue, 13 Jun 2006 at 12:55pm, Toralf Lund wrote


Paul Bijnens wrote:


Taping one DLE in several runs opens a can of worms:  you have to
add a notion of partially succeeded.  Restoring then needs some tapes
and some holdingdisk files.  What if the holdingdisk crashes, or someone
accidentally rm's the files before all of it is written to tape? etc.


This would be a bit of an issue, of course. I'm wondering if the 
situation would be that much different from the one we have today, though. Holding 
disk crash or file removal is always going to be a serious problem, of 
course. But if you have a partial tape dump of the data, you will at least be 
able to recover some of it... Maybe it would be wise to keep all data on the


To throw my $.02 in here, the situations would be very different.  If one 
is forced to have all DLEs tapeable in one amdump run, then 
(theoretically), nothing will be left on the holding disk to lose should 
said disk die.  That's obviously not the case if single DLEs are allowed 
to span amdumps, and the holding disk dies between amdumps.


Having the entire night's amdump run on tape at the end of the amdump 
gives me that warm fuzzy feeling inside.  Maybe it's just me being 
curmudgeonly (it wouldn't be the first time -- hell, I haven't found a WM 
I like more than fvwm2) and slavishly adhering to the KISS method.  But I 
think backups *should* adhere to the KISS method.


What happens in the current version if amdump is interrupted while writing 
the 2nd tape, by the way?


I assume the same thing that would happen if amdump were interrupted while 
writing to the 1st tape.  The image being written to tape would be marked 
FAILED TO TAPE and be left on the holding disk (along with any other 
images that hadn't been written yet), and the user would be encouraged to 
run amflush.


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Is tape spanning documented anywhere?

2006-06-13 Thread Paul Bijnens

On 2006-06-13 12:55, Toralf Lund wrote:

Paul Bijnens wrote:

On 2006-06-13 12:10, Toralf Lund wrote:




Yes indeed.  The whole DLE.  A single DLE still needs to be written
in one run, possibly using many tapes.
Oh no... Like I said, that's a big disappointment. I'm tempted to 
say that it is not correct to claim that Amanda now supports tape 
spanning, if it can't span dumps across tapes written in separate runs.


Shouldn't it be able to delete the corresponding data from the 
holding disk file as tape-chunks are successfully written - so that 
only the remaining chunks would be flushed on the next run? Seems 
like this should be easy enough to implement, especially if you 
interact with holding disk chunks in a constructive manner. Is there 
any reason why nobody has looked into this, except for lack of time?
... or maybe what's on the holding disk does not really matter and/or 
is a separate issue. I suppose the taper already knows how to find a 
certain tape-chunk within the holding disk data, so it's more a 
matter of being able to tell it to start writing from a certain chunk 
(different from 0) during flush. The flush operation would obviously 
have to find the correct index somewhere in the database. Does it do 
a lookup at all today, or just blindly tape whatever is on the 
holding disk?


Taping one DLE in several runs opens a can of worms:  you have to
add a notion of partially succeeded.  Restoring then needs some tapes
and some holdingdisk files.  What if the holdingdisk crashes, or someone
accidentally rm's the files before all of it is written to tape? etc.
This would be a bit of an issue, of course. I'm wondering if the 
situation would be that much different from the one we have today, though. 
Holding disk crash or file removal is always going to be a serious 
problem, of course. But if you have a partial tape dump of the data, you 
will at least be able to recover some of it... Maybe it would be wise to 
keep all data on the disk until all tape-chunks are fully written, 
though, and also use the partial status only for flush purposes, i.e. 
consider partial writes as unsuccessful when doing restore etc.


What happens in the current version if amdump is interrupted while 
writing the 2nd tape, by the way?


As Stefan used to say:  AAPW(*).
Well, yes. I was partly also asking if a P would be W in this case, 
though, or if someone for some good reason had decided that tape split 
across runs should not be supported.


I think any AP would be very W.   :-)

It could also help the current minor problem that taping starts only
when the DLE is completely dumped to holdingdisk.
The current implementation also assumes the tape chunks follow
sequentially on the tape.  This is not strictly necessary either.

Allowing tape-chunks to be interspersed with chunks from other DLE's
together with multi-run taping... Wow, that would make Amanda really
one of the best free backup programs!

Only a small step further and you can use the gnutar option
--record-number (show record number within archive of a particular
file) making it possible to restore from only a few tape-chunks,
instead of feeding the complete 300 Gbyte image to tar, to extract
only one file, which happens to be at the end of the image by
murphy's law anyway.  :-)
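
A rough sketch of the arithmetic behind that idea, in Python. It rests on several assumptions that are not confirmed anywhere in this thread: the record number reported by gnutar counts 512-byte tar blocks from the start of the image, the image is not compressed after tar produced it, and the tape-chunks are plain consecutive slices of tape_splitsize bytes (the per-chunk Amanda header is ignored). The function name is hypothetical.

```python
TAR_BLOCK_BYTES = 512  # assumption: gnutar record numbers count 512-byte blocks


def chunks_for_member(record_number: int, member_bytes: int,
                      splitsize_bytes: int) -> list:
    """Tape-chunk numbers (1-based) that hold one archive member.

    The member occupies one 512-byte tar header plus its data, padded
    up to a multiple of 512 bytes, starting at the given record.
    """
    start = record_number * TAR_BLOCK_BYTES
    padded = -(-member_bytes // TAR_BLOCK_BYTES) * TAR_BLOCK_BYTES
    end = start + TAR_BLOCK_BYTES + padded - 1   # last byte of the member
    first = start // splitsize_bytes + 1
    last = end // splitsize_bytes + 1
    return list(range(first, last + 1))


GIB = 1024 ** 3
# A 1 MiB file whose tar header starts 285 GiB into the image,
# with a 10 GiB tape_splitsize: only chunk 29 is needed.
print(chunks_for_member(285 * GIB // TAR_BLOCK_BYTES, 1024 ** 2, 10 * GIB))
```

A member that straddles a chunk boundary yields two consecutive chunk numbers, so at most a couple of tape-chunks would have to be read instead of the whole image.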




Re: Is tape spanning documented anywhere?

2006-06-13 Thread Joshua Baker-LePain

On Tue, 13 Jun 2006 at 2:46pm, Toralf Lund wrote


Joshua Baker-LePain wrote:


To throw my $.02 in here, the situations would be very different.  If one 
is forced to have all DLEs tapeable in one amdump run, then 
(theoretically), nothing will be left on the holding disk to lose should 
said disk die.


But we're talking about a situation where the DLEs are not tapeable. The


With tape spanning as implemented, any DLE is tapeable if runtapes is big 
enough.  :)


  Maybe it's just me being curmudgeonly (it wouldn't be the first time -- 
hell, I haven't found a WM I like more than fvwm2) and slavishly adhering 
to the KISS method.  But I think backups *should* adhere to the KISS 
method.


Normally I would agree, but I have to back up 3Tb of data organised as one 
single volume. The only simple option would be to have one 3Tb tape as 
well, but such a thing isn't available (to me at least.) Also, I think the 
whole tape splitting concept is inherently complex, and what I suggest here 
doesn't change the complexity level. The complexity was introduced already, 
I'm just talking about a *simple* implementation adjustment...


I agree that it doesn't change the complexity level.  But it does change
the safety level.  Suddenly you're making yourself far more vulnerable to
losing parts of a backup image.

On a practical level, I'm pretty sure that the setup you're proposing 
would require you to have a 3TB holding disk (or at least 3TB-tapelength) 
to hold your level 0.  Looking at the amanda.conf man page, amanda *can* 
span tapes without using a holding disk, but doing so requires either a 
disk buffer (different from a holding disk in that the whole dump image 
isn't buffered there, just the chunks that have come from the dump disk 
but haven't made it to tape yet) or buffering the chunks to system RAM. 
Am I missing something?


I don't know about you, but I'd have a hard time convincing my boss I 
needed a 2nd 3TB server to backup the first 3TB server.  :)


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Is tape spanning documented anywhere?

2006-06-13 Thread Joshua Baker-LePain

On Tue, 13 Jun 2006 at 3:05pm, Paul Bijnens wrote


On 2006-06-13 12:55, Toralf Lund wrote:



It could also help the current minor problem that taping starts only
when the DLE is completely dumped to holdingdisk.
The current implementation also assumes the tape chunks follow
sequentially on the tape.  This is not strictly necessary either.

Allowing tape-chunks to be interspersed with chunks from other DLE's
together with multi-run taping... Wow, that would make Amanda really
one of the best free backup programs!


Again, let the curmudgeon step in here.  One of the initial design 
principles of amanda was the ability to get your data off the tapes with 
*no* amanda tools -- mt, dd, and tar or restore were all that was needed. 
Tape spanning as implemented has already broken that, requiring 
amfetchdump to reassemble spanned DLEs...


I think.  I honestly don't know how badly the principle has been broken. 
Can one simply cat the 2 (or more) spanned images together (minus some 
header info perhaps) and get the whole image back?


But I do know that interspersing tape chunks from multiple DLEs would 
absolutely destroy any hopes of getting your data off the tapes without 
amanda's tools *and* record keeping.  With live CDs so prevalent these 
days, keeping copies of the amanda tools around is dead easy.  But, 
IMNSHO, losing the ability to get your data off the tapes if you lose your 
amanda database is unacceptable.



Only a small step further and you can use the gnutar option
--record-number (show record number within archive of a particular
file) making it possible to restore from only a few tape-chunks,
instead of feeding the complete 300 Gbyte image to tar, to extract
only one file, which happens to be at the end of the image by
murphy's law anyway.  :-)


This *is* something that would be nice to implement.  But I'd like to see 
it implemented in a way that makes it optional.  Amanda holds onto record 
numbers.  But, again, if you lose your amanda database, the whole tarball 
would still be there for you to recover and feed to tar directly.


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Is tape spanning documented anywhere?

2006-06-13 Thread Paul Bijnens

On 2006-06-13 15:35, Joshua Baker-LePain wrote:


Again, let the curmudgeon step in here.  One of the initial design 
principles of amanda was the ability to get your data off the tapes with 
*no* amanda tools -- mt, dd, and tar or restore were all that was 
needed. Tape spanning as implemented has already broken that, requiring 
amfetchdump to reassemble spanned DLEs...


I think.  I honestly don't know how badly the principle has been broken. 
Can one simply cat the 2 (or more) spanned images together (minus some 
header info perhaps) and get the whole image back?


http://wiki.zmanda.com/index.php/Restoring_files#Using_amrestore_with_split_dumps

In that explanation I used amrestore to fetch the chunks from tape to
disk, but doing it with a shell script is still doable:
- read the first 32K block of the tape chunk
- get the first line and decide if this is a chunk you need
  (we can still keep the requirement that chunks should have been
  written monotonically, but they can be interspersed with other chunks)
- if not, just skip to the next tape chunk
- if yes, save the rest of the tape chunk to disk
- output that chunk to stdout once the next chunk header has been read
  and carries a different number (because incomplete chunks are rewritten
  at the beginning of the next tape).

When I find some more time, I'll test that method, and add it to the
webpage.
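
The steps above could be sketched as follows, in Python instead of shell. Everything about the header layout is an assumption: the regular expression expects the first line of the 32K header to look like `AMANDA: SPLIT_FILE <date> <host> <disk> part <n>/<total> ...`, which should be checked against a real tape before relying on it. The sketch works on tape files that have already been read into memory (for instance after pulling them off tape with dd), and `reassemble` is a hypothetical name.

```python
import re

HEADER_BYTES = 32 * 1024  # the first 32K block of every tape file

# Assumed layout of the first header line of a tape-chunk.
PART_RE = re.compile(rb"AMANDA: SPLIT_FILE (\S+) (\S+) (\S+) part (\d+)/(\S+)")


def reassemble(tape_files, host: str, disk: str) -> bytes:
    """Concatenate the tape-chunks of one DLE in part order.

    tape_files is an iterable of bytes objects, one per tape file, in the
    order they were read. Chunks of other DLEs are skipped. If a part
    number shows up twice (an incomplete copy at end of tape, then the
    rewritten copy on the next tape), the later copy wins.
    """
    parts = {}
    for blob in tape_files:
        first_line = blob[:HEADER_BYTES].split(b"\n", 1)[0]
        match = PART_RE.match(first_line)
        if match is None:
            continue  # not a tape-chunk header we recognise
        if match.group(2) != host.encode() or match.group(3) != disk.encode():
            continue  # chunk of another DLE, skip to the next tape file
        parts[int(match.group(4))] = blob[HEADER_BYTES:]
    return b"".join(parts[number] for number in sorted(parts))
```

The returned bytes would then be fed to tar or restore (after uncompressing, if the dump was compressed). A real script would stream to stdout rather than hold a whole image in memory.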

amrestore does not need any database or amanda.conf file at all, so
this is not too dependent on Amanda-specific things.

The recommended way is to back up the DLEs that contain Amanda related
files with the KISS principle (no splitting, maybe not even gzipping!)
and then use the more complicat-ing/ed features for those DLEs that 
need it.

But I agree.  I like to Keep It Simple too.

In general Backup Programs should make it easy to restore. That's even
more important than easy to backup.






Re: Is tape spanning documented anywhere?

2006-06-13 Thread Jon LaBadie
On Tue, Jun 13, 2006 at 09:35:31AM -0400, Joshua Baker-LePain wrote:
 On Tue, 13 Jun 2006 at 3:05pm, Paul Bijnens wrote
 
 On 2006-06-13 12:55, Toralf Lund wrote:
 
 It could also help the current minor problem that taping starts only
 when the DLE is completely dumped to holdingdisk.
 The current implementation also assumes the tape chunks follow
 sequentially on the tape.  This is not strictly necessary either.
 
 Allowing tape-chunks to be interspersed with chunks from other DLE's
 together with multi-run taping... Wow, that would make Amanda really
 one of the best free backup programs!
 
 Again, let the curmudgeon step in here.  One of the initial design 
 principles of amanda was the ability to get your data off the tapes with 
 *no* amanda tools -- mt, dd, and tar or restore were all that was needed. 
 Tape spanning as implemented has already broken that, requiring 
 amfetchdump to reassemble spanned DLEs...
 
 I think.  I honestly don't know how badly the principle has been broken. 
 Can one simply cat the 2 (or more) spanned images together (minus some 
 header info perhaps) and get the whole image back?
 
 But I do know that interspersing tape chunks from multiple DLEs would 
 absolutely destroy any hopes of getting your data off the tapes without 
 amanda's tools *and* record keeping.  With live CDs so prevalent these 
 days, keeping copies of the amanda tools around is dead easy.  But, 
 IMNSHO, losing the ability to get your data off the tapes if you lose your 
 amanda database is unacceptable.
 

My feelings exactly JLB.

Tape spanning was an important addition.  I was willing to accept
the loss of easy recovery without amanda because of its importance
and because it is optional on a DLE by DLE basis.

Plus I feel, without confirming this, that you could fairly easily
combine the tape splits (should we call them splits vs holding disk
chunks?) using standard tools.

But I would certainly hesitate to go much further and further
complicate standard tool recovery.  That ability saved me twice already.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)


Re: Is tape spanning documented anywhere?

2006-06-13 Thread Jon LaBadie
On Tue, Jun 13, 2006 at 02:46:31PM +0200, Toralf Lund wrote:
 
 Normally I would agree, but I have to back up 3Tb of data organised as 
 one single volume. The only simple option would be to have one 3Tb 
 tape as well, but such a thing isn't available (to me at least.)

Toralf,
perhaps I'm being dense, but why isn't your situation satisfied by
the current tape-spanning?  I'm envisioning something like lto-2
or lto-3 drives and using no holding disk but sufficient buffer
space.  If your data compresses to say 1.6TB with the 400GB lto-3
tapes, a setting of runtapes 5 or 6 will accept an entire level 0
dump with only part of the final tape wasted.  On incremental dumps,
amanda would use only as many tapes as necessary.  Again, only the
final tape would have wasted space.
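
A rough check of the tape arithmetic above, as a minimal sketch. The 1.6TB and
400GB figures come from the example; the function name and the use of decimal
gigabytes are illustrative assumptions, and real capacity depends on how well
the data compresses and on the tapetype definition.

```python
import math


def tapes_needed(dump_gb: float, tape_gb: float) -> int:
    """Smallest whole number of tapes that can hold a dump of dump_gb."""
    return math.ceil(dump_gb / tape_gb)


# Figures from the example: 1.6TB of compressed data on 400GB LTO-3 tapes.
level0 = tapes_needed(1600, 400)
print("tapes for the level 0:", level0)

# runtapes 5 or 6 leaves headroom in case the estimate is a little low.
for runtapes in (5, 6):
    print("runtapes", runtapes, "spare tapes:", runtapes - level0)
```

If the estimate is exact, the level 0 fills four tapes, so the extra one or
two tapes in runtapes are only a safety margin and stay unused.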


-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)


Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund


To throw my $.02 in here, the situations would be very different.  
If one is forced to have all DLEs tapeable in one amdump run, 
then (theoretically), nothing will be left on the holding disk to 
lose should said disk die.


But we're talking about a situation where the DLEs are not 
tapeable. The


With tape spanning as implemented, any DLE is tapeable if runtapes is 
big enough.  :)


What I'm slightly worried about, is the unbalanced setup a low number 
of large DLEs combined with a large runtapes value will give me. I mean, 
it implies that several tapes will have to be written on some nights, 
while nothing at all is taped on others - at least if we disregard 
incrementals for the moment. This could mean that the write operation 
will continue long into the following day, when we want to use the 
server's capacity for other purposes, or (even worse) isn't finished 
when the next dump is supposed to start. Actually, maybe there won't be 
any serious issues associated with this, but I'd just feel more 
comfortable if I could spread the work more evenly and/or use the idle 
hours of every night. And a different flush operation would help me 
achieve at least part of that, even though the actual dump would still 
be pretty unbalanced.


Some of my colleagues have just nearly convinced me that I worry too 
much, though ;-/




  Maybe it's just me being curmudgeonly (it wouldn't be the first 
time -- hell, I haven't found a WM I like more than fvwm2) and 
slavishly adhering to the KISS method.  But I think backups *should* 
adhere to the KISS method.


Normally I would agree, but I have to back up 3Tb of data organised 
as one single volume. The only simple option would be to have one 
3Tb tape as well, but such a thing isn't available (to me at least.) 
Also, I think the whole tape splitting concept is inherently complex, 
and what I suggest here doesn't change the complexity level. The 
complexity was introduced already, I'm just talking about a *simple* 
implementation adjustment...


I agree that it doesn't change the complexity level.  But it does change
the safety level.  Suddenly you're making yourself far more vulnerable to
losing parts of a backup image.

On a practical level, I'm pretty sure that the setup you're proposing 
would require you to have a 3TB holding disk (or at least 
3TB-tapelength) to hold your level 0.
It's not quite as bad as that, fortunately. While there is one 3TB 
volume, I can actually split it into more than one DLE quite easily. 
Splitting it into (much) more than tapes-per-cycle entries (which seems 
to be a requirement if you want a balanced setup) is however going to 
be very hard. But you are right, holding disk space is also going to be a 
bit of an issue.


I also have one other scenario in mind, though - which is one I've 
actually come across a number of times: What if a certain DLE due for 
backup is estimated to be slightly smaller than runtapes*tape size, 
and thus dumped to holding disk, but then turns out to be slightly 
larger? With the current setup, amanda will obviously run out of 
tape-space during the original dump and also if you try amflush. And if 
auto-flush is enabled, the next dump will hit end-of-tape before any of 
the new dumps have been written, and the next one after that, and so on; 
this holding disk image will effectively block the tape operation of all 
the following backups, and eventually, the holding disk will be full, 
too, so amdump won't be able to do anything at all.


If we were to introduce partial tape write as discussed here, but 
leave the scheduling algorithm unchanged, we would actually increase the 
safety in this area - an oversized dump would also be flushed 
eventually, and not lock up the system. We would not compromise the 
safety in other ways, as Amanda would still try to schedule only 
runtapes*tape size's worth of data (so nothing would be left on the 
holding disk if everything went according to plan.)
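
To make the difference concrete, here is a toy model of the two behaviours.
It is only a sketch under stated assumptions: it is not Amanda's taper logic,
the function and parameter names are made up, sizes are in arbitrary units,
and one oversized image is assumed to be sitting on the holding disk already
while a fixed amount of new data arrives every night.

```python
def simulate(runs, capacity, nightly, oversized, partial):
    """Return the amount left on the holding disk after each run.

    capacity -- runtapes * tape size available per run (assumed constant)
    nightly  -- size of the new dump arriving each run
    oversized -- size of the image already waiting, larger than capacity
    partial  -- True to allow writing part of an image and keeping the rest
    """
    holding = [oversized]          # oldest image first
    history = []
    for _ in range(runs):
        holding.append(nightly)
        left = capacity
        while holding and left > 0:
            size = holding[0]
            if size <= left:
                left -= size       # whole image fits: tape it
                holding.pop(0)
            elif partial:
                holding[0] = size - left   # tape what fits, keep the rest
                left = 0
            else:
                left = 0           # end of tape hit, nothing else gets taped
        history.append(sum(holding))
    return history


print(simulate(3, 800, 100, 850, partial=False))
print(simulate(3, 800, 100, 850, partial=True))
```

In this model the holding disk keeps growing without partial writes, while
with partial writes the oversized image is gone after the second run.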


- Toralf



Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund

Jon LaBadie wrote:

On Tue, Jun 13, 2006 at 02:46:31PM +0200, Toralf Lund wrote:
  
Normally I would agree, but I have to back up 3Tb of data organised as 
one single volume. The only simple option would be to have one 3Tb 
tape as well, but such a thing isn't available (to me at least.)



Toralf,
perhaps I'm being dense, but why isn't your situation satisfied by
the current tape-spanning?  I'm envisioning something like lto-2
or lto-3 drives and using no holding disk but sufficient buffer
space.  If your data compresses to say 1.6TB with the 400GB lto-3
tapes, a setting of runtapes 5 or 6 will accept an entire level 0
dump with only part of the final tape wasted. 
Well, like I just said in another post - maybe I worry too much, but I'm 
a bit concerned about dumping 5 or 6 tapes during one run and nothing 
during others, based on timing/system load considerations. It just seems 
nicer to spread the work as evenly as possible across runs...


And like I also said, in general, allowing partial flush would also 
address another issue: The one of blocking the entire tape operation 
when using a holding disk, and getting a dump that won't fit on 
the runtapes tapes even though it was expected to (either because of 
miscalculations during the planner phase or because specifying the 
tape size seems to be a rather inexact science.)


We're talking about an LTO-2 changer, by the way...

- Toralf



Re: Is tape spanning documented anywhere?

2006-06-13 Thread Toralf Lund

Toralf Lund wrote:

Jon LaBadie wrote:

On Tue, Jun 13, 2006 at 02:46:31PM +0200, Toralf Lund wrote:
 
Normally I would agree, but I have to back up 3Tb of data organised 
as one single volume. The only simple option would be to have one 
3Tb tape as well, but such a thing isn't available (to me at least.)



Toralf,
perhaps I'm being dense, but why isn't your situation satisfied by
the current tape-spanning?  I'm envisioning something like lto-2
or lto-3 drives and using no holding disk but sufficient buffer
space.  If your data compresses to say 1.6TB with the 400GB lto-3
tapes, a setting of runtapes 5 or 6 will accept an entire level 0
dump with only part of the final tape wasted. 
Well, like I just said in another post - maybe I worry too much, but 
I'm a bit concerned about dumping 5 or 6 tapes during one run and 
nothing during others, based on timing/system load considerations. It 
just seems nicer to spread the work as evenly as possible across runs...
Also, I was thinking that I might be able to split up the directory 
enough to make do with 2 tapes per DLE. With the current tape-spanning 
and runtapes 2, the waste of tape would then start getting rather 
significant - I would waste space on every other tape, rather than just 
one out of 5 or 6...
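
A back-of-the-envelope version of that comparison, as a sketch only. It
assumes that the last tape of every dump ends up half full on average, which
is a guess for illustration and not a measured figure; the function name is
made up as well.

```python
def wasted_fraction(tapes_per_dump: int, last_tape_fill: float = 0.5) -> float:
    """Fraction of total tape capacity left unused when only the last
    tape of each dump is partly empty (assumed fill level)."""
    return (1.0 - last_tape_fill) / tapes_per_dump


for tapes in (2, 5, 6):
    print(tapes, "tapes per dump:", round(100 * wasted_fraction(tapes), 1), "% unused")
```

Under that assumption, going from 5 or 6 tapes per dump down to 2 roughly
doubles or triples the share of tape that stays empty.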


But maybe I shouldn't worry too much about extra tape usage, either, 
since the tapes are a one-time cost with the normal reuse setup. Wasted 
tapes means slightly more work for the person responsible for changing 
the tapes, though...


- T



Re: Is tape spanning documented anywhere?

2006-06-13 Thread Joshua Baker-LePain

On Tue, 13 Jun 2006 at 4:04pm, Paul Bijnens wrote


http://wiki.zmanda.com/index.php/Restoring_files#Using_amrestore_with_split_dumps

In that explanation I used amrestore to fetch the chunks from tape to
disk, but doing it with a shell script is still doable:
- read the first 32K block of the tape chunk
- get the first line and decide if this is a chunk you need
 (we can still keep the requirement that chunks should have been
 written monotonically, but they can be interspersed with other chunks)
- if not, just skip to the next tape chunk
- if yes, save the rest of the tape chunk to disk
- output that chunk to stdout once the next chunk header has been read
 and has a different number (because incomplete blocks are rewritten
 at the beginning of the next tape).
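
The steps above could be sketched roughly as follows, in Python rather than
shell. This is an illustration under assumptions, not a tested recovery tool:
it works on tape files already read into memory (for instance after pulling
them off tape with dd), the function names are made up, and the "part N"
pattern used to find the chunk number is a guess at the header layout. Look
at a real header before relying on anything like this.

```python
import re
from typing import Iterable, Iterator

HEADER_SIZE = 32 * 1024  # the 32K header block at the start of each tape file


def first_line(tape_file: bytes) -> str:
    """Return the first text line of the header block."""
    header = tape_file[:HEADER_SIZE]
    return header.split(b"\n", 1)[0].decode("ascii", errors="replace")


def reassemble(tape_files: Iterable[bytes], wanted: str) -> Iterator[bytes]:
    """Yield the payloads of the wanted chunks, in tape order.

    wanted is a substring identifying the dump in the header line
    (assumption). A chunk is held back until a header with a different
    chunk number turns up, because a chunk cut off by end of tape is
    written again at the start of the next tape.
    """
    pending_part = None
    pending_data = b""
    for tape_file in tape_files:
        line = first_line(tape_file)
        match = re.search(r"part (\d+)", line)  # hypothetical header layout
        if wanted not in line or match is None:
            continue  # not a chunk we need: skip to the next tape file
        part = int(match.group(1))
        if pending_part is not None and part != pending_part:
            yield pending_data  # previous chunk is confirmed complete
        # Same number again means the earlier copy was incomplete: replace it.
        pending_part, pending_data = part, tape_file[HEADER_SIZE:]
    if pending_part is not None:
        yield pending_data  # last chunk of the dump
```

Concatenating the yielded payloads and piping the result to tar or restore
would then give back the image, provided the chunks really were written in
increasing order.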


So it sounds like tape chunks are just like full dump images, with the 
standard 32KB amanda header with added info about which chunk it is. 
That's a Good Thing.  Thanks for the pointer when I was being lazy.


Given that, interspersing chunks from different images could be done 
without creating too much extra hassle.  *But* I don't know that I see the 
utility.  You liked the idea of starting to tape a dump from holding disk 
before the dump from the client is done.  While I can see the utility, 
what happens when the client or the network dies mid-dump?  You just 
wasted a bunch of tape.


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Is tape spanning documented anywhere?

2006-06-13 Thread Jon LaBadie
On Tue, Jun 13, 2006 at 05:10:53PM +0200, Toralf Lund wrote:
 
 I also have one other scenario in mind, though - which is one I've 
 actually come across a number of times: What if a certain DLE due for 
 backup is estimated to be slightly smaller than runtapes*tape size, 
 and thus dumped to holding disk, but then turns out to be slightly 
 larger?

Wouldn't it be more accurate to say the scenario you ran into previously
was DLE larger than tape size because the tape spanning feature was
not available at that time?

 With the current setup, amanda will obviously run out of 
 tape-space during the original dump and also if you try amflush. And if 
 auto-flush is enabled, the next dump will hit end-of-tape before any of 
 the new dumps have been written, and the next one after that, and so on; 
 this holding disk image will effectively block the tape operation of all 
 the following backups, and eventually, the holding disk will be full, 
 too, so amdump won't be able to do anything at all.

What is different with the tape spanning feature is that you could get the
large DLE to tape by simply increasing runtapes, even if only temporarily.
Thus, no system lockup.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)


Re: Is tape spanning documented anywhere?

2006-06-12 Thread Matt Hyclak
On Mon, Jun 12, 2006 at 10:18:46AM +0200, Toralf Lund enlightened us:
 I haven't been following the posts to this list too closely, or bothered 
 to upgrade amanda, for some time (since our existing setup *works*...), 
 so I didn't find out until right now that tape spanning is supported in 
 the current release.
 
 Anyhow, I'd really like to know more about how the spanning actually 
 works. Is it documented anywhere? http://www.amanda.org/docs and FAQ 
 still say that the option does not exist...
 

Try http://wiki.zmanda.org/index.php/Splitting_dumps_across_tapes

Matt

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263