Re: Is tape spanning documented anywhere?
On Tue, Jun 13, 2006 at 05:20:13PM +0200, Toralf Lund wrote:
> And like I also said, in general, allowing "partial flush" would also address another issue: the one of blocking the entire tape operation when using a holding disk and getting a dump that won't fit on the tapes even though it was expected to (either because of miscalculations during the planner phase or because specifying the tape size seems to be a rather inexact science.)

This issue can be solved by a much simpler change: simply assume an infinitely large runtapes setting as long as no taping has succeeded in the current run.
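The rule proposed above can be written down in a few lines. This is only a sketch of the idea, not Amanda code: the function names and the way the taper would count successfully taped dumps are assumptions made for illustration.

```python
import math


def effective_runtapes(configured_runtapes, dumps_taped_this_run):
    """Tape limit under the proposed rule (hypothetical helper).

    Until at least one dump has been taped in this run, behave as if
    runtapes were unlimited, so one oversized image cannot stall the run.
    """
    if dumps_taped_this_run == 0:
        return math.inf
    return configured_runtapes


def may_load_next_tape(tapes_used, configured_runtapes, dumps_taped_this_run):
    """True if the taper would be allowed to ask for another tape."""
    limit = effective_runtapes(configured_runtapes, dumps_taped_this_run)
    return tapes_used < limit
```

With runtapes set to 2 and two tapes already used, `may_load_next_tape(2, 2, 0)` is true while nothing has been taped yet, and `may_load_next_tape(2, 2, 1)` is false once one dump has made it to tape.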
Re: Is tape spanning documented anywhere?
> > I also have one other scenario in mind, though - which is one I've actually come across a number of times: What if a certain DLE due for backup is estimated to be slightly smaller than *, and thus dumped to holding disk, but then turns out to be slightly larger?
>
> Wouldn't it be more accurate to say the scenario you ran into previously was DLE larger than because the tape spanning feature was not available at that time.

Yes. But the key issue remains unchanged.

> > With the current setup, amanda will obviously run out of tape-space during the original dump and also if you try amflush. And if auto-flush is enabled, the next dump will hit end-of-tape before any of the new dumps have been written, and the next one after that, and so on; this holding disk image will effectively block the tape operation of all the following backups, and eventually, the holding disk will be full, too, so amdump won't be able to do anything at all.
>
> What is different with the tape spanning feature is that you could get the large DLE to tape by simply increasing runtapes, even if only temporarily. Thus, no system lockup.

Yes, but that requires manual intervention, and we were talking about safety. All situations where you have to do manual work in order to allow the backup to continue mean reducing safety, IMO, or differently put, a change that means they won't occur, increases safety.

- Toralf
Re: Is tape spanning documented anywhere?
On Tue, Jun 13, 2006 at 05:10:53PM +0200, Toralf Lund wrote: > > I also have one other scenario in mind, though - which is one I've > actually come across a number of times: What if a certain DLE due for > backup is estimated to be slightly smaller than *, > and thus dumped to holding disk, but then turns out to be slightly > larger? Wouldn't it be more accurate to say the scenario you ran into previously was DLE larger than because the tape spanning feature was not available at that time. > With the current setup, amanda will obviously run out of > tape-space during the original dump and also if you try amflush. And if > auto-flush is enabled, the next dump will hit end-of-tape before any of > the new dumps have been written, and the next one after that, and so on; > this holding disk image will effectively block the tape operation of all > the following backups, and eventually, the holding disk will be full, > too, so amdump won't be able to do anything at all. What is different with the tape spanning feature is that you could get the large DLE to tape by simply increasing runtapes, even if only temporarily. Thus, no system lockup. -- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road(609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)
Re: Is tape spanning documented anywhere?
On Tue, 13 Jun 2006 at 4:04pm, Paul Bijnens wrote

> http://wiki.zmanda.com/index.php/Restoring_files#Using_amrestore_with_split_dumps
>
> In that explanation I used amrestore to fetch the chunks from tape to disk, but doing it with a shell script is still doable:
> - read the first 32K block of the tape chunk
> - get the first line and decide if this is a chunk you need (we can still keep the requirement that chunks should have been written monotonically, but they can be interspersed with other chunks)
> - if not, just skip to the next tape chunk
> - if yes, save the rest of the tape chunk to disk
> - output that chunk to stdout when reading the next chunk header and that has a different number (because incomplete blocks are rewritten in the beginning of the next tape).

So it sounds like tape chunks are just like full dump images, with the standard 32KB amanda header with added info about which chunk it is. That's a Good Thing. Thanks for the pointer when I was being lazy.

Given that, interspersing chunks from different images could be done without creating too much extra hassle. *But* I don't know that I see the utility. You liked the idea of starting to tape a dump from holding disk before the dump from the client is done. While I can see the utility, what happens when the client or the network dies mid-dump? You just wasted a bunch of tape.

-- Joshua Baker-LePain Department of Biomedical Engineering Duke University
Re: Is tape spanning documented anywhere?
Toralf Lund wrote:
> Jon LaBadie wrote:
> > On Tue, Jun 13, 2006 at 02:46:31PM +0200, Toralf Lund wrote:
> > > Normally I would agree, but I have to back up 3Tb of data organised as one single volume. The only "simple" option would be to have one 3Tb tape as well, but such a thing isn't available (to me at least.)
> >
> > Toralf, perhaps I'm being dense, but why isn't your situation satisfied by the current tape-spanning. I'm envisioning something like lto-2 or lto-3 drives and using no holding disk but sufficient buffer space. If your data compresses to say 1.6TB with the 400GB lto-3 tapes, a setting of runtapes 5 or 6 will accept an entire level 0 dump with only part of the final tape wasted.
>
> Well, like I just said in another post - maybe I worry too much, but I'm a bit concerned about dumping 5 or 6 tapes during one run and nothing during others, based on timing/system load considerations. It just seems nicer to spread the work as evenly as possible across runs...

Also, I was thinking that I might be able to split up the directory enough to make do with 2 tapes per DLE. With the current tape-spanning and "runtapes 2", the waste of tape would then start getting rather significant - I would waste space on every other tape, rather than just one out of 5 or 6... But maybe I shouldn't worry too much about extra tape usage, either, since the tapes are a one-time cost with the normal reuse setup. Wasted tapes mean slightly more work for the person responsible for changing the tapes, though...

- T
Re: Is tape spanning documented anywhere?
Jon LaBadie wrote: On Tue, Jun 13, 2006 at 02:46:31PM +0200, Toralf Lund wrote: Normally I would agree, but I have to back up 3Tb of data organised as one single volume. The only "simple" option would be to have one 3Tb tape as well, but such a thing isn't available (to me at least.) Toralf, perhaps I'm being dense, but why isn't your situation satisfied by the current tape-spanning. I'm envisioning something like lto-2 or lto-3 drives and using no holding disk but sufficient buffer space. If your data compresses to say 1.6TB with the 400GB lto-3 tapes, a setting of runtapes 5 or 6 will accept an entire level 0 dump with only part of the final tape wasted. Well, like I just said in another post - maybe I worry to much, but I'm a bit concerned about dumping 5 or 6 tapes during one run and nothing during others, based in timing/system load considerations. It just seems nicer to spread the work as evenly as possibly across runs... And like I also said, in general, allowing "partial flush" would also address another issue: The one of blocking the entire tape operation when using a holding disk, and getting a dump larger that won't fit on the tapes even though it was expected to (either because of miscalculations during the planner phase or because it specifying the tape size seems to be a rather inexact science.) We're talking about an LTO-2 changer, by the way... - Toralf
Re: Is tape spanning documented anywhere?
To throw my $.02 in here, the situations would be very different. If one is "forced" to have all DLEs "tapeable" in one amdump run, then (theoretically), nothing will be left on the holding disk to lose should said disk die. But we're talking about a situation where the DLEs are not "tapeable". The With tape spanning as implemented, any DLE is tapeable if runtapes is big enough. :) What I'm slightly worried about, is the "unbalanced" setup a low number of large DLEs combined with a large runtapes value will give me. I mean, it implies that several tapes will have to be written on some nights, while nothing at all is taped on others - at least if we disregard incrementals for the moment. This could mean that the write operation will continue long into the following day, when we want to use the server's capacity for other purposes, or (even worse) isn't finished when the next dump is supposed to start. Actually, maybe there won't be any serious issues associated with this, but I'd just feel more comfortable if I could spread the work more evenly and/or use the idle hours of every night. And a different "flush" operation would help me achieve at least part of that, even though the actual dump would still be pretty unbalanced. Some of my colleagues have just nearly convinced me that I worry too much, though ;-/ Maybe it's just me being curmudgeonly (it wouldn't be the first time -- hell, I haven't found a WM I like more than fvwm2) and slavishly adhering to the KISS method. But I think backups *should* adhere to the KISS method. Normally I would agree, but I have to back up 3Tb of data organised as one single volume. The only "simple" option would be to have one 3Tb tape as well, but such a thing isn't available (to me at least.) Also, I think the whole tape splitting concept is inherently complex, and what I suggest here doesn't change the complexity level. The complexity was introduced already, I'm just talking about a *simple* implementation adjustment... 
> I agree that it doesn't change the complexity level. But it does change the safety level. Suddenly you're making yourself far more vulnerable to losing parts of a backup image. On a practical level, I'm pretty sure that the setup you're proposing would require you to have a 3TB holding disk (or at least 3TB-tapelength) to hold your level 0.

It's not quite as bad as that, fortunately. While there is one 3TB volume, I can actually split it into more than one DLE quite easily. Splitting it into (much) more than tapes-per-cycle entries (which seems to be a requirement if you want a "balanced" setup) is however going to be very hard. But you are right, holding disk space is also going to be a bit of an issue.

I also have one other scenario in mind, though - which is one I've actually come across a number of times: What if a certain DLE due for backup is estimated to be slightly smaller than *, and thus dumped to holding disk, but then turns out to be slightly larger? With the current setup, amanda will obviously run out of tape-space during the original dump and also if you try amflush. And if auto-flush is enabled, the next dump will hit end-of-tape before any of the new dumps have been written, and the next one after that, and so on; this holding disk image will effectively block the tape operation of all the following backups, and eventually, the holding disk will be full, too, so amdump won't be able to do anything at all.

If we were to introduce "partial tape write" as discussed here, but leave the scheduling algorithm unchanged, we would actually increase the safety in this area - an oversized dump would also be flushed eventually, and not "lock up" the system. We would not compromise the safety in other ways, as Amanda would still try to schedule only *'s worth of data (so nothing would be left on the holding disk if everything went according to plan.)

- Toralf
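The lock-up scenario described above can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of how Amanda's taper actually works: sizes are abstract units, each run gets a fixed amount of tape, images are taped strictly in holding-disk order, and the function and parameter names are invented for the example.

```python
from collections import deque


def simulate(runs, tape_per_run, nightly_dump, stuck_image, partial_flush):
    """Return the holding-disk usage left after each run (toy model)."""
    queue = deque([stuck_image])  # oversized image already on holding disk
    history = []
    for _ in range(runs):
        queue.append(nightly_dump)  # tonight's dump lands on holding disk
        room = tape_per_run
        while queue and room > 0:
            head = queue[0]
            if head <= room:
                room -= head  # image fits: tape it completely
                queue.popleft()
            elif partial_flush:
                queue[0] = head - room  # tape what fits, keep the remainder
                room = 0
            else:
                break  # end of tape: nothing queued behind it gets written
        history.append(sum(queue))
    return history
```

With 100 units of tape per run, 40 units of new dumps per night and a 110-unit image stuck on the holding disk, `simulate(3, 100, 40, 110, False)` gives `[150, 190, 230]` (the holding disk only grows), while `simulate(3, 100, 40, 110, True)` gives `[50, 0, 0]`.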
Re: Is tape spanning documented anywhere?
On Tue, Jun 13, 2006 at 02:46:31PM +0200, Toralf Lund wrote: > > > Normally I would agree, but I have to back up 3Tb of data organised as > one single volume. The only "simple" option would be to have one 3Tb > tape as well, but such a thing isn't available (to me at least.) Toralf, perhaps I'm being dense, but why isn't your situation satisfied by the current tape-spanning. I'm envisioning something like lto-2 or lto-3 drives and using no holding disk but sufficient buffer space. If your data compresses to say 1.6TB with the 400GB lto-3 tapes, a setting of runtapes 5 or 6 will accept an entire level 0 dump with only part of the final tape wasted. On incremental dumps, amanda would use only as many tapes as necessary. Again, only the final tape would have wasted space. -- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road(609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)
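The tape arithmetic in the message above can be checked with a short calculation. This is a sketch using nominal capacities only; the helper name is made up for the example, and real tapes rarely hold exactly their rated size, which is presumably why a runtapes value a little above the bare minimum is suggested.

```python
def tapes_for_dump(dump_gb, tape_gb):
    """Return (tapes needed, unused GB on the last tape) for one spanned dump."""
    tapes = -(-dump_gb // tape_gb)  # integer ceiling division
    unused = tapes * tape_gb - dump_gb
    return tapes, unused
```

For the figures quoted, `tapes_for_dump(1600, 400)` returns `(4, 0)`. If the data compressed 10% worse than expected, `tapes_for_dump(1760, 400)` returns `(5, 240)`, so a setting of runtapes 5 or 6 leaves headroom and only the final tape is partly unused.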
Re: Is tape spanning documented anywhere?
On Tue, Jun 13, 2006 at 09:35:31AM -0400, Joshua Baker-LePain wrote: > On Tue, 13 Jun 2006 at 3:05pm, Paul Bijnens wrote > > >On 2006-06-13 12:55, Toralf Lund wrote: > > >It could also help the current minor problem that taping starts only > >when the DLE is completely dumped to holdingdisk. > >The current implementation also assumes the tape chunks follow > >sequentially on the tape. This is not strictly necessary either. > > > >Allowing tape-chunks to be interspersed with chunks from other DLE's > >together with multi-run taping... Wow, that would make Amanda really > >one of the best free backup programs! > > Again, let the curmudgeon step in here. One of the initial design > principles of amanda was the ability to get your data off the tapes with > *no* amanda tools -- mt, dd, and tar or restore were all that was needed. > Tape spanning as implemented has already broken that, requiring > amfetchdump to reassemble spanned DLEs... > > I think. I honestly don't know how badly the principle has been broken. > Can one simply cat the 2 (or more) spanned images together (minus some > header info perhaps) and get the whole image back? > > But I do know that interspersing tape chunks from multiple DLEs would > absolutely destroy any hopes of getting your data off the tapes without > amanda's tools *and* record keeping. With live CDs so prevelant these > days, keeping copies of the amanda tools around is dead easy. But, > IMNSHO, losing the ability to get your data off the tapes if you lose your > amanda database is unacceptable. > My feelings exactly JLB. Tape spanning was an important addition. I was willing to accept the loss of easy recovery without amanda because of its importance and because it is optional on a DLE by DLE basis. Plus I feel, without confirming this, that you could fairly easily combine the tape splits (should we call them splits vs holding disk chunks?) using standard tools. 
But I would certainly hesitate to go much further and further complicate standard tool recovery. That ability saved me twice already. -- Jon H. LaBadie [EMAIL PROTECTED] JG Computing 4455 Province Line Road(609) 252-0159 Princeton, NJ 08540-4322 (609) 683-7220 (fax)
Re: Is tape spanning documented anywhere?
On 2006-06-13 15:35, Joshua Baker-LePain wrote:

> Again, let the curmudgeon step in here. One of the initial design principles of amanda was the ability to get your data off the tapes with *no* amanda tools -- mt, dd, and tar or restore were all that was needed. Tape spanning as implemented has already broken that, requiring amfetchdump to reassemble spanned DLEs...
>
> I think. I honestly don't know how badly the principle has been broken. Can one simply cat the 2 (or more) spanned images together (minus some header info perhaps) and get the whole image back?

http://wiki.zmanda.com/index.php/Restoring_files#Using_amrestore_with_split_dumps

In that explanation I used amrestore to fetch the chunks from tape to disk, but doing it with a shell script is still doable:
- read the first 32K block of the tape chunk
- get the first line and decide if this is a chunk you need (we can still keep the requirement that chunks should have been written monotonically, but they can be interspersed with other chunks)
- if not, just skip to the next tape chunk
- if yes, save the rest of the tape chunk to disk
- output that chunk to stdout when reading the next chunk header and that has a different number (because incomplete blocks are rewritten in the beginning of the next tape).

When I find some more time, I'll test that method, and add it to the webpage. Amrestore does not need any database, or amanda.conf file at all, so it does not depend on much Amanda infrastructure. The recommended way is to back up the DLEs that contain Amanda-related files with the KISS principle (no splitting, maybe not even gzipping!) and then use the more complicat-ing/ed features for those DLEs that need it.

But I agree. I like to Keep It Simple too. In general, backup programs should make it easy to restore. That's even more important than making it easy to back up.
-- Paul Bijnens, xplanation Technology Services
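The manual reassembly steps listed in the message above can be sketched as follows. This is an untested illustration, not the method from the wiki page: it works on already-opened binary streams (one per tape file) rather than on a tape device, it buffers each chunk in memory instead of saving it to disk, and the exact layout of the header's first line is not assumed. The caller supplies a hypothetical `part_of` function that returns the chunk number for a wanted chunk and `None` otherwise. `make_tape_file` exists only to build demo input.

```python
import io

HEADER_SIZE = 32 * 1024  # the thread describes a 32K header block per chunk


def make_tape_file(first_line, data):
    """Build an in-memory stand-in for one tape file (demo helper only)."""
    header = (first_line + "\n").encode("ascii").ljust(HEADER_SIZE, b"\0")
    return io.BytesIO(header + data)


def reassemble(tape_files, part_of, out):
    """Write the wanted chunks, in tape order, to the binary stream `out`.

    A chunk is emitted only once the next wanted header carries a different
    number, so a chunk cut short at end of tape is replaced by the copy
    rewritten at the start of the next tape.
    """
    pending_part, pending_data = None, b""
    for tape_file in tape_files:
        header = tape_file.read(HEADER_SIZE)
        first_line = header.split(b"\n", 1)[0].decode("ascii", "replace")
        part = part_of(first_line)
        if part is None:
            continue  # chunk of some other dump: skip to the next tape file
        if pending_part is not None and part != pending_part:
            out.write(pending_data)  # previous chunk turned out to be complete
        pending_part, pending_data = part, tape_file.read()
    if pending_part is not None:
        out.write(pending_data)
```

The output stream can then be fed to tar or restore. Interleaved chunks from other DLEs are simply skipped, which matches the observation in the thread that interspersing is workable as long as each chunk carries its own header.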
Re: Is tape spanning documented anywhere?
On Tue, 13 Jun 2006 at 3:05pm, Paul Bijnens wrote On 2006-06-13 12:55, Toralf Lund wrote: It could also help the current minor problem that taping starts only when the DLE is completely dumped to holdingdisk. The current implementation also assumes the tape chunks follow sequentially on the tape. This is not strictly necessary either. Allowing tape-chunks to be interspersed with chunks from other DLE's together with multi-run taping... Wow, that would make Amanda really one of the best free backup programs! Again, let the curmudgeon step in here. One of the initial design principles of amanda was the ability to get your data off the tapes with *no* amanda tools -- mt, dd, and tar or restore were all that was needed. Tape spanning as implemented has already broken that, requiring amfetchdump to reassemble spanned DLEs... I think. I honestly don't know how badly the principle has been broken. Can one simply cat the 2 (or more) spanned images together (minus some header info perhaps) and get the whole image back? But I do know that interspersing tape chunks from multiple DLEs would absolutely destroy any hopes of getting your data off the tapes without amanda's tools *and* record keeping. With live CDs so prevelant these days, keeping copies of the amanda tools around is dead easy. But, IMNSHO, losing the ability to get your data off the tapes if you lose your amanda database is unacceptable. Only a small step further and you can use the gnutar option --record-number (show record number within archive of a particular file) making it possible to restore from only a few tape-chunks, instead of feeding the complete 300 Gbyte image to tar, to extract only one file, which happens to be at the end of the image by murphy's law anyway. :-) This *is* something that would be nice to implement. But I'd like to see it implemented in a way that makes it optional. Amanda holds onto record numbers. 
But, again, if you lose your amanda database, the whole tarball would still be there for you to recover and feed to tar directly. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
Re: Is tape spanning documented anywhere?
On Tue, 13 Jun 2006 at 2:46pm, Toralf Lund wrote Joshua Baker-LePain wrote: To throw my $.02 in here, the situations would be very different. If one is "forced" to have all DLEs "tapeable" in one amdump run, then (theoretically), nothing will be left on the holding disk to lose should said disk die. But we're talking about a situation where the DLEs are not "tapeable". The With tape spanning as implemented, any DLE is tapeable if runtapes is big enough. :) Maybe it's just me being curmudgeonly (it wouldn't be the first time -- hell, I haven't found a WM I like more than fvwm2) and slavishly adhering to the KISS method. But I think backups *should* adhere to the KISS method. Normally I would agree, but I have to back up 3Tb of data organised as one single volume. The only "simple" option would be to have one 3Tb tape as well, but such a thing isn't available (to me at least.) Also, I think the whole tape splitting concept is inherently complex, and what I suggest here doesn't change the complexity level. The complexity was introduced already, I'm just talking about a *simple* implementation adjustment... I agree that it doesn't change the complexity level. But it does change the safety level. Suddenly you're making yourself far more vulnerable to losing parts of a backup image. On a practical level, I'm pretty sure that the setup you're proposing would require you to have a 3TB holding disk (or at least 3TB-tapelength) to hold your level 0. Looking at the amanda.conf man page, amanda *can* span tapes without using a holding disk, but doing so requires either a disk buffer (different from a holding disk in that the whole dump image isn't buffered there, just the chunks that have come from the dump disk but haven't made it to tape yet) or buffering the chunks to system RAM. Am I missing something? I don't know about you, but I'd have a hard time convincing my boss I needed a 2nd 3TB server to backup the first 3TB server. 
:) -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
Re: Is tape spanning documented anywhere?
On 2006-06-13 12:55, Toralf Lund wrote: Paul Bijnens wrote: On 2006-06-13 12:10, Toralf Lund wrote: Yes indeed. The whole DLE. A singe DLE still needs to be written in one run, possibly using many tapes. Oh no... Like I said, that's a big disappointment. I'm tempted to say that it is not correct to claim that Amanda now suppots tape spanning, if it can't span dumps across tapes written in separate runs. Shouldn't it be able to delete the corresponding data from the holding disk file as tape-chunks are successfully written - so that only the remaining chunks would be flushed on the next run? Seems like this should be easy enough to implement, especially if you interact with holding disk chunks in a constructive manner. Is there any reason why nobody has looked into this, except for lack of time? ... or maybe what's on the holding disk does not really matter and/or is a separate issue. I suppose the taper already knows how to find a certain tape-chunk within the holding disk data, so it's more a matter of being able to tell it to start writing from a certain chunk (different from 0) during flush. The flush operation would obviously have to find the correct index somewhere in the database. Does it do a lookup at all today, or just blindly tape whatever is on the holding disk? Taping one DLE is several "runs" opens a can of worms: you have to add a notion of "partial" succeeded. Restoring then needs some tapes and some holdingdisk files. What if the holdingdisk crashes or accidently rm the files before all of it is written to tape? etc. This would be a bit of an issue, of course. I'm wondering if the would the situation be that much different from the one we have today, though. Holding disk crash or file removal is always going to be a serious problem, of course. But if you have a partial tape dump of the data, you will at least be able to recover some of it... 
Maybe it would be wise to keep all data on the disk until all tape-chunks are fully written, though, and also use the "partial" status only for flush purposes, i.e. consider "partial" writes as "unsuccessful" when doing restore etc. What happens in the current version if amdump is interrupted while writing the 2nd tape, by the way? As Stefan used to say: AAPW(*). Well, yes. I was partly also asking if a P would be W in this case, though, or if someone for some good reason had decided that tape split across runs should not be supported. I think any AP would very W. :-) It could also help the current minor problem that taping starts only when the DLE is completely dumped to holdingdisk. The current implementation also assumes the tape chunks follow sequentially on the tape. This is not strictly necessary either. Allowing tape-chunks to be interspersed with chunks from other DLE's together with multi-run taping... Wow, that would make Amanda really one of the best free backup programs! Only a small step further and you can use the gnutar option --record-number (show record number within archive of a particular file) making it possible to restore from only a few tape-chunks, instead of feeding the complete 300 Gbyte image to tar, to extract only one file, which happens to be at the end of the image by murphy's law anyway. :-) -- Paul Bijnens, xplanation Technology ServicesTel +32 16 397.511 Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax +32 16 397.512 http://www.xplanation.com/ email: [EMAIL PROTECTED] *** * I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, ^^, * * F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, * * stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, * * PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, * * init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... * * ... "Are you sure?" ... YES ... Phew ... I'm out * ***
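The record-number idea above comes down to simple offset arithmetic. The sketch below is hypothetical and rests on several assumptions: that the number tar reports counts 512-byte tar blocks from the start of the archive, that every tape chunk except the last holds exactly `chunk_bytes` of the image, and that the image is stored uncompressed (compression applied before splitting would break the mapping). The function names are invented for illustration.

```python
TAR_BLOCK = 512  # tar archives are organised in 512-byte blocks


def chunk_for_block(block_number, chunk_bytes):
    """Return (zero-based chunk index, byte offset inside that chunk)."""
    offset = block_number * TAR_BLOCK
    return offset // chunk_bytes, offset % chunk_bytes


def chunks_for_member(start_block, member_bytes, chunk_bytes):
    """List the chunk indexes needed to extract one archive member.

    Counts one header block plus the data rounded up to whole blocks;
    extra blocks for extended headers (e.g. long names) are ignored here.
    """
    start = start_block * TAR_BLOCK
    data_blocks = -(-member_bytes // TAR_BLOCK)  # integer ceiling division
    end = start + (1 + data_blocks) * TAR_BLOCK  # exclusive end offset
    return list(range(start // chunk_bytes, (end - 1) // chunk_bytes + 1))
```

With 1 GiB chunks, a file whose header sits at block 4194304 starts exactly at the beginning of chunk 2, so only the chunks from there on need to be read instead of the whole image.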
Re: Is tape spanning documented anywhere?
Joshua Baker-LePain wrote: On Tue, 13 Jun 2006 at 12:55pm, Toralf Lund wrote Paul Bijnens wrote: Taping one DLE is several "runs" opens a can of worms: you have to add a notion of "partial" succeeded. Restoring then needs some tapes and some holdingdisk files. What if the holdingdisk crashes or accidently rm the files before all of it is written to tape? etc. This would be a bit of an issue, of course. I'm wondering if the would the situation be that much different from the one we have today, though. Holding disk crash or file removal is always going to be a serious problem, of course. But if you have a partial tape dump of the data, you will at least be able to recover some of it... Maybe it would be wise to keep all data on the To throw my $.02 in here, the situations would be very different. If one is "forced" to have all DLEs "tapeable" in one amdump run, then (theoretically), nothing will be left on the holding disk to lose should said disk die. But we're talking about a situation where the DLEs are not "tapeable". The alternatives are writing parts of a DLE and writing nothing at all. Or maybe, if the scheduling setup was also changed, there *would* be some additional DLEs there to loose - but these would never be taped with the current version either - they would actually not even be dumped... That's obviously not the case if single DLEs are allowed to span amdumps, and the holding disk dies between amdumps. Having the entire night's amdump run on tape at the end of the amdump gives me that warm fuzzy feeling inside. There will be nothing that prevents you from getting that *if it is possible*. But again, in the World I'm talking about, you simply won't get everything on tape. You can only choose between *something* or nothing at all... Maybe it's just me being curmudgeonly (it wouldn't be the first time -- hell, I haven't found a WM I like more than fvwm2) and slavishly adhering to the KISS method. But I think backups *should* adhere to the KISS method. 
Normally I would agree, but I have to back up 3Tb of data organised as one single volume. The only "simple" option would be to have one 3Tb tape as well, but such a thing isn't available (to me at least.) Also, I think the whole tape splitting concept is inherently complex, and what I suggest here doesn't change the complexity level. The complexity was introduced already, I'm just talking about a *simple* implementation adjustment... What happens in the current version if amdump is interrupted while writing the 2nd tape, by the way? I assume the same thing that would happen if amdump were interrupted while writing to the 1st tape. The image being written to tape would be marked FAILED TO TAPE and be left on the holding disk (along with any other images that hadn't been written yet), and the user would be encouraged to run amflush.
Re: Is tape spanning documented anywhere?
On Tue, 13 Jun 2006 at 12:55pm, Toralf Lund wrote Paul Bijnens wrote: Taping one DLE is several "runs" opens a can of worms: you have to add a notion of "partial" succeeded. Restoring then needs some tapes and some holdingdisk files. What if the holdingdisk crashes or accidently rm the files before all of it is written to tape? etc. This would be a bit of an issue, of course. I'm wondering if the would the situation be that much different from the one we have today, though. Holding disk crash or file removal is always going to be a serious problem, of course. But if you have a partial tape dump of the data, you will at least be able to recover some of it... Maybe it would be wise to keep all data on the To throw my $.02 in here, the situations would be very different. If one is "forced" to have all DLEs "tapeable" in one amdump run, then (theoretically), nothing will be left on the holding disk to lose should said disk die. That's obviously not the case if single DLEs are allowed to span amdumps, and the holding disk dies between amdumps. Having the entire night's amdump run on tape at the end of the amdump gives me that warm fuzzy feeling inside. Maybe it's just me being curmudgeonly (it wouldn't be the first time -- hell, I haven't found a WM I like more than fvwm2) and slavishly adhering to the KISS method. But I think backups *should* adhere to the KISS method. What happens in the current version if amdump is interrupted while writing the 2nd tape, by the way? I assume the same thing that would happen if amdump were interrupted while writing to the 1st tape. The image being written to tape would be marked FAILED TO TAPE and be left on the holding disk (along with any other images that hadn't been written yet), and the user would be encouraged to run amflush. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
Re: Is tape spanning documented anywhere?
Paul Bijnens wrote: On 2006-06-13 12:10, Toralf Lund wrote: Yes indeed. The whole DLE. A singe DLE still needs to be written in one run, possibly using many tapes. Oh no... Like I said, that's a big disappointment. I'm tempted to say that it is not correct to claim that Amanda now suppots tape spanning, if it can't span dumps across tapes written in separate runs. Shouldn't it be able to delete the corresponding data from the holding disk file as tape-chunks are successfully written - so that only the remaining chunks would be flushed on the next run? Seems like this should be easy enough to implement, especially if you interact with holding disk chunks in a constructive manner. Is there any reason why nobody has looked into this, except for lack of time? ... or maybe what's on the holding disk does not really matter and/or is a separate issue. I suppose the taper already knows how to find a certain tape-chunk within the holding disk data, so it's more a matter of being able to tell it to start writing from a certain chunk (different from 0) during flush. The flush operation would obviously have to find the correct index somewhere in the database. Does it do a lookup at all today, or just blindly tape whatever is on the holding disk? Taping one DLE is several "runs" opens a can of worms: you have to add a notion of "partial" succeeded. Restoring then needs some tapes and some holdingdisk files. What if the holdingdisk crashes or accidently rm the files before all of it is written to tape? etc. This would be a bit of an issue, of course. I'm wondering if the would the situation be that much different from the one we have today, though. Holding disk crash or file removal is always going to be a serious problem, of course. But if you have a partial tape dump of the data, you will at least be able to recover some of it... 
Maybe it would be wise to keep all data on the disk until all tape-chunks are fully written, though, and also use the "partial" status only for flush purposes, i.e. consider "partial" writes as "unsuccessful" when doing restore etc.

What happens in the current version if amdump is interrupted while writing the 2nd tape, by the way?

> As Stefan used to say: AAPW(*).

Well, yes. I was partly also asking if a P would be W in this case, though, or if someone for some good reason had decided that tape split across runs should not be supported.

- Toralf
Re: Is tape spanning documented anywhere?
On 2006-06-13 12:10, Toralf Lund wrote:

>>> Yes indeed. The whole DLE. A single DLE still needs to be written in one run, possibly using many tapes.
>> Oh no... Like I said, that's a big disappointment. I'm tempted to say that it is not correct to claim that Amanda now supports tape spanning, if it can't span dumps across tapes written in separate runs. Shouldn't it be able to delete the corresponding data from the holding disk file as tape-chunks are successfully written - so that only the remaining chunks would be flushed on the next run? Seems like this should be easy enough to implement, especially if you interact with holding disk chunks in a constructive manner. Is there any reason why nobody has looked into this, except for lack of time?
> ... or maybe what's on the holding disk does not really matter and/or is a separate issue. I suppose the taper already knows how to find a certain tape-chunk within the holding disk data, so it's more a matter of being able to tell it to start writing from a certain chunk (different from 0) during flush. The flush operation would obviously have to find the correct index somewhere in the database. Does it do a lookup at all today, or just blindly tape whatever is on the holding disk?

Taping one DLE in several "runs" opens a can of worms: you have to add a notion of "partial" succeeded. Restoring then needs some tapes and some holdingdisk files. What if the holdingdisk crashes or you accidentally rm the files before all of it is written to tape? etc.

As Stefan used to say: AAPW(*).
(*) http://article.gmane.org/gmane.comp.archivers.amanda.user/25064

--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^,     *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye,     *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup,       *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown,       *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ...     *
* ...  "Are you sure?"  ...  YES  ...  Phew ...  I'm out                  *
***
Re: Is tape spanning documented anywhere?
>> Yes indeed. The whole DLE. A single DLE still needs to be written in one run, possibly using many tapes.
> Oh no... Like I said, that's a big disappointment. I'm tempted to say that it is not correct to claim that Amanda now supports tape spanning, if it can't span dumps across tapes written in separate runs. Shouldn't it be able to delete the corresponding data from the holding disk file as tape-chunks are successfully written - so that only the remaining chunks would be flushed on the next run? Seems like this should be easy enough to implement, especially if you interact with holding disk chunks in a constructive manner. Is there any reason why nobody has looked into this, except for lack of time?

... or maybe what's on the holding disk does not really matter and/or is a separate issue. I suppose the taper already knows how to find a certain tape-chunk within the holding disk data, so it's more a matter of being able to tell it to start writing from a certain chunk (different from 0) during flush. The flush operation would obviously have to find the correct index somewhere in the database. Does it do a lookup at all today, or just blindly tape whatever is on the holding disk?

- Toralf
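To make the "start writing from a certain chunk" idea concrete, here is a minimal sketch of the arithmetic such a resumed flush would need. It is purely hypothetical: the thread establishes that Amanda does not do this, the function name and the numbers are invented for illustration, and real holding disk chunk files also carry headers that this ignores.

```python
def resume_position(chunks_already_on_tape, tape_splitsize, holding_chunksize):
    """Locate where a resumed flush would start reading on the holding disk.

    Hypothetical helper, not part of Amanda. All sizes use the same unit.
    Returns (index of the holding disk chunk file, offset inside that file).
    """
    # Bytes of the dump that are already safely on tape.
    offset = chunks_already_on_tape * tape_splitsize
    # Holding disk chunks are cut independently of tape-chunks, so the
    # restart point may fall in the middle of a holding disk chunk file.
    return divmod(offset, holding_chunksize)


if __name__ == "__main__":
    # Illustrative numbers only: 3 tape-chunks of 10 units already written,
    # holding disk chunks of 4 units each.
    print(resume_position(3, 10, 4))  # (7, 2)
```

Because the two chunk sizes are independent, the restart point generally does not line up with a holding disk file boundary, which is one reason deleting already-taped holding data is less trivial than it first looks.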
Re: Is tape spanning documented anywhere?
Paul Bijnens wrote:

> On 2006-06-13 10:32, Toralf Lund wrote:
>>>> 2. What happens to the holding disk file after a dump is partially written to tape? Will Amanda keep the entire file, or just what will be written next time around? And what if the holding disk data is split into "chunks"?
>>> Amanda keeps the entire dump, and it will be flushed entirely again on the next amflush or autoflush.
>> You mean entirely as in the whole DLE? That would mean that tape splitting is restricted to multiple tapes written in the same run, which would be rather disappointing.
> Yes indeed. The whole DLE. A single DLE still needs to be written in one run, possibly using many tapes.

Oh no... Like I said, that's a big disappointment. I'm tempted to say that it is not correct to claim that Amanda now supports tape spanning, if it can't span dumps across tapes written in separate runs. Shouldn't it be able to delete the corresponding data from the holding disk file as tape-chunks are successfully written - so that only the remaining chunks would be flushed on the next run? Seems like this should be easy enough to implement, especially if you interact with holding disk chunks in a constructive manner. Is there any reason why nobody has looked into this, except for lack of time?

> But you may still split a very large DLE into smaller ones using the include/exclude mechanism that has existed for a long time.

Yeah, I know, but hasn't the fact that you have to do that always been seen as the one big limitation of Amanda? I was hoping this was gone now ;-(

The thing is, setting up the exclusions/inclusions right is nearly always non-trivial on a system with dynamic data, and in a setup I'm looking at now, it will simply be impossible to specify a static config for this. Without real tape overflow support, we'll simply have to update amanda.conf every day. The actual update may perhaps be done automatically, but I was rather hoping I wouldn't have to implement a tool for that, and the dump database etc.
will obviously also get very messy.

Differently put, I don't want to set up an amanda config which has the odd DLE that's slightly larger than a tape - I want *all* DLEs to be like that. I may be able to assume that they are never larger than two tapes, so I could use "runtapes 2", but I fear that this would lead to too much waste of tape space. I suspect the only way to get the full dumps evenly split across a relatively limited number of tapes, will be to set "runtapes" large enough for all full dumps to fit into one run, but this is probably not possible in practice.

> Only now you are not limited by the capacity of a single tape anymore.
>> Or maybe you misunderstood the question. Sorry if I was a bit unclear, but I'm not sure what terminology to use, now. What do you (should we) call one piece of output from the "tape split"? And what should "dump" be taken to mean? The entire output from the backup of one DLE, or one entry from the tape splitting. And what will a holding disk file contain, anyway? Again, will it be data for the whole DLE, or one instance of output from the split operation?
>>
>> (Like I said, I'm testing a bit as I write this, but haven't been able to draw any conclusions yet, mainly because I had to re-build amanda just now...)
> A piece of a DLE on tape: a tape-chunk.
> [ And so on ]

OK. Thanks.

- Toralf
Re: Is tape spanning documented anywhere?
On 2006-06-13 10:32, Toralf Lund wrote:

>>> 2. What happens to the holding disk file after a dump is partially written to tape? Will Amanda keep the entire file, or just what will be written next time around? And what if the holding disk data is split into "chunks"?
>> Amanda keeps the entire dump, and it will be flushed entirely again on the next amflush or autoflush.
> You mean entirely as in the whole DLE? That would mean that tape splitting is restricted to multiple tapes written in the same run, which would be rather disappointing.

Yes indeed. The whole DLE. A single DLE still needs to be written in one run, possibly using many tapes.

But you may still split a very large DLE into smaller ones using the include/exclude mechanism that has existed for a long time. Only now you are not limited by the capacity of a single tape anymore.

> Or maybe you misunderstood the question. Sorry if I was a bit unclear, but I'm not sure what terminology to use, now. What do you (should we) call one piece of output from the "tape split"? And what should "dump" be taken to mean? The entire output from the backup of one DLE, or one entry from the tape splitting. And what will a holding disk file contain, anyway? Again, will it be data for the whole DLE, or one instance of output from the split operation?
>
> (Like I said, I'm testing a bit as I write this, but haven't been able to draw any conclusions yet, mainly because I had to re-build amanda just now...)

A piece of a DLE on tape: a tape-chunk.
A dump: the entire output of dump/gnutar for one DLE.
One dump may be split in chunks on the holdingdisk.
One dump may be split in chunks on a tape.
These two chunking mechanisms are completely independent.

Note also that when you bypass the holdingdisk and still want tape-chunking, there is an additional parameter "split_diskbuffer" which gives the path of a file Amanda can use to buffer one tape-chunk on disk (to be able to feed the tape as fast as possible).
And there is even a third parameter "fallback_splitsize", which gives the size of the RAM buffer to use when the split_diskbuffer cannot be used.

But even when only the last byte of a single DLE did not make it to tape, the complete DLE is considered failed (still on holdingdisk, or will be tried again on the same level the next run).

> Anyhow, when I said "holding disk file" earlier, what I meant was "the holding disk data for one DLE", and when I said "partially written to tape", what I meant was "some, but not all, split sections completely written to tape" (i.e. I was not talking about sections that are incomplete because end-of-tape is reached.)

I think I completely understood the question. I had the same terminology in my mind.

To confirm: suppose you have one DLE:

    Total DLE size (after compression):  295 Gbyte
    tape_splitsize                        10 G
    tape-capacity                        101 G
    runtapes                               3
    holdingdisk chunksize                  2 G

And you also have some smaller dumps from other DLEs resulting in 20 Gbyte, which happen to be taped first.

The large DLE will be spread over about 148 holdingdisk chunks of 2 Gbyte each.

The first tape contains all those smaller ones already occupying about 20 Gbyte, followed by 8 tape-chunks of the large DLE. While writing chunk number 9, Amanda bumps into EOT, and restarts that chunk all over again on the second tape.

The second tape contains 10 more tape-chunks, nr 9-18. While writing chunk 19 Amanda bumps into EOT again. That 19th chunk is ignored on tape 2 and restarted again on tape 3.

Tape 3 contains tape-chunks 19-28. And while writing chunk 29 it bumps into EOT again. But this was the last tape we could use. So the entire DLE, all the 295 Gbyte, is considered failed-to-tape. All the 148 holdingdisk chunks are kept on disk.
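The walk-through above can be checked with a few lines of Python. This is only a simplified model under stated assumptions, not Amanda's taper logic: sizes are whole Gbytes, tape capacity is exact, filemarks and headers cost nothing, and the function names are made up for the illustration.

```python
import math


def simulate_taping(dle_gb, splitsize_gb, tape_gb, runtapes, used_on_first_gb=0):
    """Model one DLE being written as tape-chunks within a single run.

    Returns (success, complete tape-chunks written per tape).
    Simplified model with hypothetical names, not Amanda code.
    """
    total_chunks = math.ceil(dle_gb / splitsize_gb)
    written = 0
    per_tape = []
    for tape in range(runtapes):
        # Only the first tape already holds the smaller dumps.
        free = tape_gb - (used_on_first_gb if tape == 0 else 0)
        count = 0
        while written < total_chunks:
            # The last tape-chunk may be shorter than the split size.
            size = min(splitsize_gb, dle_gb - written * splitsize_gb)
            if size > free:
                break  # EOT: this chunk is restarted on the next tape
            free -= size
            written += 1
            count += 1
        per_tape.append(count)
        if written == total_chunks:
            return True, per_tape
    # Out of tapes: the whole DLE counts as failed and stays on holding disk.
    return False, per_tape


def holding_chunks(dle_gb, chunksize_gb):
    """Number of holding disk chunk files needed for one dump."""
    return math.ceil(dle_gb / chunksize_gb)


if __name__ == "__main__":
    print(simulate_taping(295, 10, 101, 3, used_on_first_gb=20))  # (False, [8, 10, 10])
    print(holding_chunks(295, 2))  # 148
```

In this model the same dump would succeed with runtapes 4, since the remaining 15 Gbyte fit on a fourth tape as two more tape-chunks.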
Re: Is tape spanning documented anywhere?
>> 2. What happens to the holding disk file after a dump is partially written to tape? Will Amanda keep the entire file, or just what will be written next time around? And what if the holding disk data is split into "chunks"?
> Amanda keeps the entire dump, and it will be flushed entirely again on the next amflush or autoflush.

You mean entirely as in the whole DLE? That would mean that tape splitting is restricted to multiple tapes written in the same run, which would be rather disappointing.

Or maybe you misunderstood the question. Sorry if I was a bit unclear, but I'm not sure what terminology to use, now. What do you (should we) call one piece of output from the "tape split"? And what should "dump" be taken to mean? The entire output from the backup of one DLE, or one entry from the tape splitting. And what will a holding disk file contain, anyway? Again, will it be data for the whole DLE, or one instance of output from the split operation?

(Like I said, I'm testing a bit as I write this, but haven't been able to draw any conclusions yet, mainly because I had to re-build amanda just now...)

Anyhow, when I said "holding disk file" earlier, what I meant was "the holding disk data for one DLE", and when I said "partially written to tape", what I meant was "some, but not all, split sections completely written to tape" (i.e. I was not talking about sections that are incomplete because end-of-tape is reached.)

- Toralf
Re: Is tape spanning documented anywhere?
On 2006-06-13 09:41, Toralf Lund wrote:

>>> Anyhow, I'd really like to know more about how the spanning actually works. Is it documented anywhere? http://www.amanda.org/docs and FAQ still say that the option does not exist...
>> Try http://wiki.zmanda.org/index.php/Splitting_dumps_across_tapes
> Yes. Thanks. That's quite helpful. A few questions still remain, though. Like
>
> 1. Does tape splitting in any way affect the scheduling algorithm? I mean, what if Amanda wants to dump a certain amount in order to reach the "balanced" dump size, and only much larger DLEs are available? Might one of those be scheduled just so that one "split" part can be used to fill up the remaining space?

While scheduling/planning, Amanda does not take any tape-splitting into account. When a DLE is larger than the total amount of level 0 dumps divided by runspercycle (= the optimal balanced amount of level 0 dumps to schedule each run), you have an "unbalanced" configuration. Amanda will try to reschedule full dumps (= promoting the smaller ones sooner) but will never succeed. The result is that you have even more level 0 dumps in a run than in a balanced situation, using even more tape.

> 2. What happens to the holding disk file after a dump is partially written to tape? Will Amanda keep the entire file, or just what will be written next time around? And what if the holding disk data is split into "chunks"?

Amanda keeps the entire dump, and it will be flushed entirely again on the next amflush or autoflush. The holdingdisk chunking is completely independent of the tape chunking.
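The "unbalanced" condition described above is simple enough to check by hand or with a short script. The sketch below only restates that rule of thumb; it is not how Amanda's planner is implemented, and the DLE names and sizes are invented for the illustration.

```python
def balanced_full_size(level0_sizes, runspercycle):
    """Ideal amount of level 0 data per run: total full size / runspercycle."""
    return sum(level0_sizes.values()) / runspercycle


def oversized_dles(level0_sizes, runspercycle):
    """DLEs whose full dump alone exceeds the balanced per-run amount."""
    target = balanced_full_size(level0_sizes, runspercycle)
    return sorted(name for name, size in level0_sizes.items() if size > target)


if __name__ == "__main__":
    # Hypothetical level 0 sizes in Gbyte, keyed by DLE name.
    sizes = {"/home": 295, "/srv": 40, "/var": 12, "/etc": 1}
    print(balanced_full_size(sizes, 5))  # 69.6
    print(oversized_dles(sizes, 5))      # ['/home']
```

Any DLE this reports makes the configuration unbalanced in the sense used here, regardless of whether tape splitting lets it fit on the available tapes.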
Re: Is tape spanning documented anywhere?
>> Anyhow, I'd really like to know more about how the spanning actually works. Is it documented anywhere? http://www.amanda.org/docs and FAQ still say that the option does not exist...
> Try http://wiki.zmanda.org/index.php/Splitting_dumps_across_tapes

Yes. Thanks. That's quite helpful. A few questions still remain, though. Like

1. Does tape splitting in any way affect the scheduling algorithm? I mean, what if Amanda wants to dump a certain amount in order to reach the "balanced" dump size, and only much larger DLEs are available? Might one of those be scheduled just so that one "split" part can be used to fill up the remaining space?

2. What happens to the holding disk file after a dump is partially written to tape? Will Amanda keep the entire file, or just what will be written next time around? And what if the holding disk data is split into "chunks"?

But I'm testing a setup using "tape_splitsize" right now, so maybe I'll find out...

- Toralf
Re: Is tape spanning documented anywhere?
On Mon, Jun 12, 2006 at 10:18:46AM +0200, Toralf Lund enlightened us:
> I haven't been following the posts to this list too closely, or bothered
> to upgrade amanda, for some time (since our existing setup *works*...),
> so I didn't find out until right now that tape spanning is supported in
> the current release.
>
> Anyhow, I'd really like to know more about how the spanning actually
> works. Is it documented anywhere? http://www.amanda.org/docs and FAQ
> still say that the option does not exist...
>

Try http://wiki.zmanda.org/index.php/Splitting_dumps_across_tapes

Matt

--
Matt Hyclak
Department of Mathematics
Department of Social Work
Ohio University
(740) 593-1263