Re: can amanda auto-size DLE's?
On 12.03.2014 17:52, Michael Stauffer wrote:
> Thanks Stefan! I'll take a look.
>
> How did this work for you in terms of daily, or almost daily, creating new DLE's? I imagine it made for near-constant level 0 dumps? Maybe that was what you needed anyway with lots of new data?

I have to admit that I didn't use this for very long ... as you see from the date, the script is from ~2010. Back then I used it for weekly dumps of my video data, but I was far from consistent ;-)

Additionally, my amanda tape server is now another physical machine, so the created DLEs and include-lists would have to be transferred somehow ... another todo left.

I'd be happy to hear some feedback and maybe some scripting improvements ...

Stefan
Re: can amanda auto-size DLE's?
Thanks Stefan! I'll take a look.

How did this work for you in terms of daily, or almost daily, creating new DLE's? I imagine it made for near-constant level 0 dumps? Maybe that was what you needed anyway with lots of new data?

-M

On Wed, Mar 12, 2014 at 5:48 AM, Stefan G. Weichinger wrote:
> On 05.03.2014 14:10, Stefan G. Weichinger wrote:
>
> > Aside from this I back then had some other scripts that generated include-lists resulting in chunks of <= X GB (smaller than one tape) ... I wanted to dump the videos in my mythtv-config and had the problem of very dynamic data in there ;-)
> >
> > So the goal was to re-create dynamic include-lists for DLEs every day (or even at the actual time of amdump). It worked mostly. I would have to dig that up again.
>
> Dug that up and put it on github:
>
> https://github.com/stefangweichinger/am_dyn_dles
>
> feel free to use or improve.
>
> Stefan
Re: can amanda auto-size DLE's?
On 05.03.2014 14:10, Stefan G. Weichinger wrote:
> Aside from this I back then had some other scripts that generated include-lists resulting in chunks of <= X GB (smaller than one tape) ... I wanted to dump the videos in my mythtv-config and had the problem of very dynamic data in there ;-)
>
> So the goal was to re-create dynamic include-lists for DLEs every day (or even at the actual time of amdump). It worked mostly. I would have to dig that up again.

Dug that up and put it on github:

https://github.com/stefangweichinger/am_dyn_dles

feel free to use or improve.

Stefan
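[Editor's sketch] For readers who want the gist without reading the repository, the core of such a generator might look like the following. The function name, chunk-file layout, and size limit below are assumptions for illustration, not the actual am_dyn_dles code: it walks the subdirectories of a data directory, bins them into chunks of at most a given size, and writes one amgtar include-list per chunk, each of which can then back one DLE.

```shell
#!/bin/sh
# Sketch of a dynamic include-list generator (names are assumptions).
# Distributes the subdirectories of $1 into include-lists of at most
# $2 kilobytes each, written as chunk.N files into $3.

make_chunks() {
    datadir=$1      # directory whose subdirs get distributed
    max_kb=$2       # size limit per chunk, in kilobytes
    outdir=$3       # where the include-lists are written

    mkdir -p "$outdir"
    rm -f "$outdir"/chunk.*

    chunk=0
    total=0
    : > "$outdir/chunk.$chunk"

    # du -sk prints each subdirectory with its size in KB
    du -sk "$datadir"/*/ | while read -r kb dir; do
        # start a new chunk when this subdir would overflow the
        # current one (an oversized first subdir still gets a chunk)
        if [ $((total + kb)) -gt "$max_kb" ] && [ "$total" -gt 0 ]; then
            chunk=$((chunk + 1))
            total=0
            : > "$outdir/chunk.$chunk"
        fi
        total=$((total + kb))
        # amgtar include-lists expect paths relative to the DLE root
        printf './%s\n' "$(basename "$dir")" >> "$outdir/chunk.$chunk"
    done
}
```

Run from cron shortly before amdump, each chunk file could then be referenced from a DLE via an `include list "/path/to/chunk.N"` clause. Note that new chunks still mean new DLEs, with the level-0 and recovery-bookkeeping consequences discussed elsewhere in this thread.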
Re: can amanda auto-size DLE's?
Thanks again Jon - very helpful as usual.

-M

On Mon, Mar 3, 2014 at 7:01 PM, Jon LaBadie wrote:
> On Mon, Mar 03, 2014 at 02:47:53PM -0500, Michael Stauffer wrote:
>
> ...
>
> > > > Any thoughts on how I can approach this? If amanda can't do it, I thought I might try a script to create DLE's of a desired size based on disk-usage, then run the script every time I wanted to do a new level 0 dump. That of course would mean telling amanda when I wanted to do level 0's, rather than amanda controlling it.
> > >
> > > Using a scheme like that, when it comes to recovering data, which DLE was the object in last summer? Remember that when you are asked to recover some data, you will probably be under time pressure with clients and bosses looking over your shoulder. That's not the time you want to fumble around trying to determine which DLE the data is in.
> >
> > Yes, I can see the complications. That makes me think of some things:
> >
> > 1) what do people do when they need to split a DLE? Just rely on notes/memory of DLE for restoring from older dumps if needed? Or just search using something like in question 3) below?
>
> In addition to the report, amanda can also print a TOC for the tapes. This is a list of what DLE's and levels are on the tape. It's a joke today, but the original reason was to put the TOC in the plastic box with the tape. I print them out in 3-hole format (8.5x11) and file them. I also add handwritten notes for things like a DLE split.
>
> When you are splitting a DLE, in the short run you probably remember the differences when you need to recover. For archive recovery the written notes are helpful.
>
> > 2) What happens if you split or otherwise modify a DLE during a cycle when normally the DLE would be getting an incremental dump? Will amanda do a new level 0 dump for it?
>
> Splitting a DLE means there is 'at least' one new DLE. All new DLEs must get a level 0. If the original DLE is still active, possibly "excluding" some things that go into the new DLE, it will continue on its current dumpcycle. I would probably use amadmin to force it to do a level 0 though.
>
> > 3) Is there a tool for searching for a path or filename across all dump indices? Or do I just grep through all the index files in /etc/amanda/config-name// ?
>
> No am-tool that I know of. Just "zgrep" (the indexes are compressed).
>
> --
> Jon H. LaBadie              j...@jgcomp.com
> 11226 South Shore Rd.       (703) 787-0688 (H)
> Reston, VA 20190            (609) 477-8330 (C)
Re: can amanda auto-size DLE's?
Thanks Debra, this is very helpful.

On Mon, Mar 3, 2014 at 3:50 PM, Debra S Baddorf wrote:
> Comments on questions that are at the very bottom.
>
> On Mar 3, 2014, at 1:47 PM, Michael Stauffer wrote:
>
> > > > 3) I had figured that when restoring, amrestore has to read in a complete dump/tar file before it can extract even a single file. So if I have a single DLE that's ~2TB that fits (with multiple parts) on a single tape, then to restore a single file, amrestore has to read the whole tape. HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE, and the file has been restored, but the amrecover operation is still running, for quite some time after restoring the file. Why might this be happening?
> > >
> > > Most (all?) current tape formats and drives can fast forward looking for end of file marks. Amanda knows the position of the file on the tape and will have the drive go at high speed to that tape file.
> > >
> > > For formats like LTO, which have many tracks on the tape, I think it is even faster. I "think" a TOC records where (i.e. which track) each file starts. So it doesn't have to fast forward and back 50 times to get to the "tenth" file which is on the 51st track.
> >
> > Jon, Olivier and Debra - thanks for reading my long post and replying.
> >
> > OK, this makes sense about searching for eof marks from what I've read. Seems like it's a good reason to use smaller DLE's.
> >
> > > > 3a) Where is the recovered dump file written to by amrecover? I can't see space being used for it on either server or client. Is it streaming and untar'ing in memory, only writing the desired files to disk?
> > >
> > > The tar file is not written to disk by amrecover. The desired files are extracted as the tarchive streams.
> >
> > Thanks, that makes sense too from what I've seen (or not seen, actually - i.e. large temporary files).
> >
> > > > So assuming all the above is true, it'd be great if amdump could automatically break large DLE's into small DLE's to end up with smaller dump files and faster restore of individual files. Maybe it would happen only for level 0 dumps, so that incremental dumps would still use the same sub-DLE's used by the most recent level 0 dump.
> > >
> > > Sure, great idea. Then all you would need to configure is one DLE starting at "/". Amanda would break things up into sub-DLEs.
> > >
> > > Nope, sorry amanda asks the backup-admin to do that part of the config. That's why you get the big bucks ;)
> >
> > Good point! A bit of job security there. ;)
> >
> > > > Any thoughts on how I can approach this? If amanda can't do it, I thought I might try a script to create DLE's of a desired size based on disk-usage, then run the script every time I wanted to do a new level 0 dump. That of course would mean telling amanda when I wanted to do level 0's, rather than amanda controlling it.
> > >
> > > Using a scheme like that, when it comes to recovering data, which DLE was the object in last summer? Remember that when you are asked to recover some data, you will probably be under time pressure with clients and bosses looking over your shoulder. That's not the time you want to fumble around trying to determine which DLE the data is in.
> >
> > Yes, I can see the complications. That makes me think of some things:
> >
> > 1) what do people do when they need to split a DLE? Just rely on notes/memory of DLE for restoring from older dumps if needed? Or just search using something like in question 3) below?
>
> I leave the old DLE in my disk list, commented out, possibly with the date when it was removed. This helps me to remember that I need to UNcomment it before trying to restore using it. I.e. the DLE needs to be recreated (needs to be in your disklist file) when you run amrecover, in order for it to be a valid choice. So if you are looking at an older tape, you need to have those older DLEs still in place.
>
> As I understand it, anyway!
>
> > 2) What happens if you split or otherwise modify a DLE during a cycle when normally the DLE would be getting an incremental dump? Will amanda do a new level 0 dump for it?
>
> Yes. It's now a totally new DLE as far as amanda knows, so it gets a level 0 dump on the first backup.
>
> I've found "amdump myconfig --no-taper node-name [DLE-name]" useful sometimes. It will do a backup of just the requested node and DLE but won't waste a tape on this small bit of data. The data stays on my holding disk. The next amdump will autoflush it to tape with everything else (assuming "autoflush" is set to AUTO or YES -- see your amanda.conf file).
>
> I use the --no-taper when I need to test a new DLE to make sure it works, before the regular backup is due. Or perhaps, to get that new level-0 out of the way now, so it doesn't extend the runtime of the regular amdump job.
Re: can amanda auto-size DLE's?
On 28.02.2014 06:33, Jon LaBadie wrote:
> Sure, great idea. Then all you would need to configure is one DLE starting at "/". Amanda would break things up into sub-DLEs.

A bit off topic maybe ... but: what I have wished for for years now (and never take the time to sit down and script it) is a helper script for amanda that reads in the disklist and compares it with the actual filesystem(s).

Practical example: when I set up a server for a customer, I create initial DLEs like:

    garden pictures /mnt/samba/pictures {
        root-tar
        exclude "./B"
        exclude append "./F"
        exclude append "./G"
        exclude append "./H"
        [...]
    }

    garden pictures_b /mnt/samba/pictures {
        root-tar
        include "./B"
    }

    garden pictures_f /mnt/samba/pictures {
        root-tar
        include "./F"
    }

    [...]

The main pictures-folder gets caught by DLE "pictures"; that DLE excludes some (big) subdirs, which in turn are defined as separate DLEs. Over time it is sometimes necessary to add more excludes and separate DLEs (when things grow).

(2nd-level sidenote here: I would also love a warning mechanism for amanda, writing me mails like "DLE X has not been fully dumped for more than Y days now. Seems it has grown too much; check your setup or clean up a bit.")

Now it would be really nice to have a check script that reads in all this and compares it to the actual filesystem, to tell me "yes, all your subdirs are caught by at least one DLE" or "attention, subdir X gets excluded in DLE_A but is not included anywhere". Or a graphical tree with the dirs that are within DLEs in green, and the ones not included by amanda in red ...

This would make it easier in big configs with more complex DLE definitions and dynamic creation of dirs and mountpoints. Did I explain it right?

---

Aside from this, I back then had some other scripts that generated include-lists resulting in chunks of <= X GB (smaller than one tape) ... I wanted to dump the videos in my mythtv-config and had the problem of very dynamic data in there ;-)

So the goal was to re-create dynamic include-lists for DLEs every day (or even at the actual time of amdump). It worked mostly. I would have to dig that up again.

Stefan
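[Editor's sketch] The simplest form of the coverage check Stefan wishes for can be approximated with sed over the disklist: collect every `"./X"` that some DLE excludes, collect every `"./X"` that some DLE includes, and warn about excludes that no include covers. This is a rough sketch, not a real parser: the function name is made up, and it only understands the quoted-path `include`/`exclude` style shown in the example above (not `include list`, properties, or multi-line quoting).

```shell
#!/bin/sh
# Sketch of a disklist coverage check (assumption: DLEs use the
# quoted  exclude "./X"  /  include "./X"  style from the example).

check_disklist() {
    disklist=$1

    # dirs pulled out of a main DLE via exclude / exclude append
    excluded=$(sed -n 's/.*exclude[^"]*"\.\/\([^"]*\)".*/\1/p' "$disklist" | sort -u)
    # dirs that some DLE explicitly includes
    included=$(sed -n 's/.*include[^"]*"\.\/\([^"]*\)".*/\1/p' "$disklist" | sort -u)

    # every excluded dir must be picked up again by some include,
    # otherwise it is backed up nowhere
    for d in $excluded; do
        if ! printf '%s\n' "$included" | grep -qx "$d"; then
            printf 'attention: ./%s is excluded but not included by any DLE\n' "$d"
        fi
    done
}
```

The full wish (walking the real filesystem and flagging newly created subdirs that no DLE mentions at all) would additionally compare the union of includes and the non-excluded remainder against the actual directory listing; the same grep/sort machinery applies.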
Re: can amanda auto-size DLE's?
On Mon, Mar 03, 2014 at 02:47:53PM -0500, Michael Stauffer wrote:
>
> ...
>
> > > > Any thoughts on how I can approach this? If amanda can't do it, I thought I might try a script to create DLE's of a desired size based on disk-usage, then run the script every time I wanted to do a new level 0 dump. That of course would mean telling amanda when I wanted to do level 0's, rather than amanda controlling it.
> >
> > Using a scheme like that, when it comes to recovering data, which DLE was the object in last summer? Remember that when you are asked to recover some data, you will probably be under time pressure with clients and bosses looking over your shoulder. That's not the time you want to fumble around trying to determine which DLE the data is in.
>
> Yes, I can see the complications. That makes me think of some things:
>
> 1) what do people do when they need to split a DLE? Just rely on notes/memory of DLE for restoring from older dumps if needed? Or just search using something like in question 3) below?

In addition to the report, amanda can also print a TOC for the tapes. This is a list of what DLE's and levels are on the tape. It's a joke today, but the original reason was to put the TOC in the plastic box with the tape. I print them out in 3-hole format (8.5x11) and file them. I also add handwritten notes for things like a DLE split.

When you are splitting a DLE, in the short run you probably remember the differences when you need to recover. For archive recovery the written notes are helpful.

> 2) What happens if you split or otherwise modify a DLE during a cycle when normally the DLE would be getting an incremental dump? Will amanda do a new level 0 dump for it?

Splitting a DLE means there is 'at least' one new DLE. All new DLEs must get a level 0. If the original DLE is still active, possibly "excluding" some things that go into the new DLE, it will continue on its current dumpcycle. I would probably use amadmin to force it to do a level 0 though.

> 3) Is there a tool for searching for a path or filename across all dump indices? Or do I just grep through all the index files in /etc/amanda/config-name// ?

No am-tool that I know of. Just "zgrep" (the indexes are compressed).

--
Jon H. LaBadie              j...@jgcomp.com
11226 South Shore Rd.       (703) 787-0688 (H)
Reston, VA 20190            (609) 477-8330 (C)
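[Editor's sketch] Jon's zgrep suggestion can be wrapped in a tiny helper. The index location below is an assumption: check the "indexdir" setting in your amanda.conf for where the compressed index files actually live on your server.

```shell
#!/bin/sh
# Search every compressed dump index for a path or filename.
# There is no am-tool for this; plain zgrep over the index files
# does the job (the indexes are gzip-compressed text, one path
# per line).

search_indexes() {
    pattern=$1; shift
    # -l prints only the names of matching index files; each file
    # name encodes the DLE and the dump date/level, which is what
    # you need to pick the right tape.
    zgrep -l "$pattern" "$@" 2>/dev/null
}

# Typical invocation (the index path is an assumption -- see
# "indexdir" in amanda.conf):
#   search_indexes 'projects/report.doc' /var/lib/amanda/daily/index/*/*/*.gz
```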
Re: can amanda auto-size DLE's?
Comments on questions that are at the very bottom.

On Mar 3, 2014, at 1:47 PM, Michael Stauffer wrote:

> > > 3) I had figured that when restoring, amrestore has to read in a complete dump/tar file before it can extract even a single file. So if I have a single DLE that's ~2TB that fits (with multiple parts) on a single tape, then to restore a single file, amrestore has to read the whole tape. HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE, and the file has been restored, but the amrecover operation is still running, for quite some time after restoring the file. Why might this be happening?
> >
> > Most (all?) current tape formats and drives can fast forward looking for end of file marks. Amanda knows the position of the file on the tape and will have the drive go at high speed to that tape file.
> >
> > For formats like LTO, which have many tracks on the tape, I think it is even faster. I "think" a TOC records where (i.e. which track) each file starts. So it doesn't have to fast forward and back 50 times to get to the "tenth" file which is on the 51st track.
>
> Jon, Olivier and Debra - thanks for reading my long post and replying.
>
> OK, this makes sense about searching for eof marks from what I've read. Seems like it's a good reason to use smaller DLE's.
>
> > > 3a) Where is the recovered dump file written to by amrecover? I can't see space being used for it on either server or client. Is it streaming and untar'ing in memory, only writing the desired files to disk?
> >
> > The tar file is not written to disk by amrecover. The desired files are extracted as the tarchive streams.
>
> Thanks, that makes sense too from what I've seen (or not seen, actually - i.e. large temporary files).
>
> > > So assuming all the above is true, it'd be great if amdump could automatically break large DLE's into small DLE's to end up with smaller dump files and faster restore of individual files. Maybe it would happen only for level 0 dumps, so that incremental dumps would still use the same sub-DLE's used by the most recent level 0 dump.
> >
> > Sure, great idea. Then all you would need to configure is one DLE starting at "/". Amanda would break things up into sub-DLEs.
> >
> > Nope, sorry amanda asks the backup-admin to do that part of the config. That's why you get the big bucks ;)
>
> Good point! A bit of job security there. ;)
>
> > > Any thoughts on how I can approach this? If amanda can't do it, I thought I might try a script to create DLE's of a desired size based on disk-usage, then run the script every time I wanted to do a new level 0 dump. That of course would mean telling amanda when I wanted to do level 0's, rather than amanda controlling it.
> >
> > Using a scheme like that, when it comes to recovering data, which DLE was the object in last summer? Remember that when you are asked to recover some data, you will probably be under time pressure with clients and bosses looking over your shoulder. That's not the time you want to fumble around trying to determine which DLE the data is in.
>
> Yes, I can see the complications. That makes me think of some things:
>
> 1) what do people do when they need to split a DLE? Just rely on notes/memory of DLE for restoring from older dumps if needed? Or just search using something like in question 3) below?

I leave the old DLE in my disk list, commented out, possibly with the date when it was removed. This helps me to remember that I need to UNcomment it before trying to restore using it. I.e. the DLE needs to be recreated (needs to be in your disklist file) when you run amrecover, in order for it to be a valid choice. So if you are looking at an older tape, you need to have those older DLEs still in place.

As I understand it, anyway!

> 2) What happens if you split or otherwise modify a DLE during a cycle when normally the DLE would be getting an incremental dump? Will amanda do a new level 0 dump for it?

Yes. It's now a totally new DLE as far as amanda knows, so it gets a level 0 dump on the first backup.

I've found "amdump myconfig --no-taper node-name [DLE-name]" useful sometimes. It will do a backup of just the requested node and DLE but won't waste a tape on this small bit of data. The data stays on my holding disk. The next amdump will autoflush it to tape with everything else (assuming "autoflush" is set to AUTO or YES -- see your amanda.conf file).

I use the --no-taper when I need to test a new DLE to make sure it works, before the regular backup is due. Or perhaps, to get that new level-0 out of the way now, so it doesn't extend the runtime of the regular amdump job.

> 3) Is there a tool for searching for a path or filename across all dump indices? Or do I just grep through all the index files in /etc/amanda/config-name// ?
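[Editor's sketch] Debra's --no-taper workflow, written out as a command sequence. This is an untested sketch: "daily", "filesrv", and the DLE name are placeholders, and option availability varies by Amanda version (check man amdump):

```shell
# Dump only one host/DLE to the holding disk, without writing a tape.
# "daily", "filesrv", and "/mnt/samba/pictures_b" are placeholders.
amdump daily --no-taper filesrv /mnt/samba/pictures_b

# The next regular run flushes the held dump to tape along with
# everything else, provided amanda.conf contains:
#   autoflush yes
amdump daily
```

This is handy for getting the mandatory level 0 of a freshly split DLE out of the way without burning a tape or stretching the nightly run.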
Re: can amanda auto-size DLE's?
Yes thanks, this is what I do. I've had some complications running the restore from the backup server rather than the client, but I'll worry about that later.

On Fri, Feb 28, 2014 at 1:47 PM, Debra S Baddorf wrote:
> one small comment inserted below
>
> On Feb 27, 2014, at 11:33 PM, Jon LaBadie wrote:
>
> > Olivier already provided good answers, I'll just add a bit.
> >
> > On Fri, Feb 28, 2014 at 10:35:08AM +0700, Olivier Nicole wrote:
> > > Michael,
> > >
> > ...
> > >
> > > > 3) I had figured that when restoring, amrestore has to read in a complete dump/tar file before it can extract even a single file. So if I have a single DLE that's ~2TB that fits (with multiple parts) on a single tape, then to restore a single file, amrestore has to read the whole tape. HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE, and the file has been restored, but the amrecover operation is still running, for quite some time after restoring the file. Why might this be happening?
> > >
> > > You're touching the essence of tapes here: they are sequential access.
> > >
> > > So in order to access one specific DLE on the tape, the tape has to position at the very beginning of the tape and read everything until it reaches that dle (the nth file on the tape).
> >
> > Most (all?) current tape formats and drives can fast forward looking for end of file marks. Amanda knows the position of the file on the tape and will have the drive go at high speed to that tape file.
> >
> > For formats like LTO, which have many tracks on the tape, I think it is even faster. I "think" a TOC records where (i.e. which track) each file starts. So it doesn't have to fast forward and back 50 times to get to the "tenth" file which is on the 51st track.
> >
> > > Then it has to read sequentially all that file containing the backup of a dle to find the file(s) you want to restore. I am not sure about dump, but I am pretty sure that if your tar backup was a file on a disk instead of a file on a tape, it would read sequentially from the beginning of the tar file, in a similar way.
> > >
> > > Then it has to read until the end of the tar (not sure about dump) to make sure that there are no other file(s) satisfying your extraction criteria.
> > >
> > > So yes, if the file you want to extract is at the beginning of your tar, it will continue reading for a certain amount of time after the file has been extracted.
> >
> > Another reason this happens is the "append" feature of tar. It is possible that a second, later version of the same file is in the tar file. Amanda does not use this feature but tar does not know this. If you see the file you want has been recovered, you can interrupt amrecover.
> >
> > > > The recover log shows this on the client doing the recovery:
> > > >
> > > > [root@cfile amRecoverTest_Feb_27]# tail -f /var/log/amanda/client/jet1/amrecover.20140227135820.debug
> > > > Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback: data is still flowing
> > > >
> > > > 3a) Where is the recovered dump file written to by amrecover? I can't see space being used for it on either server or client. Is it streaming and untar'ing in memory, only writing the desired files to disk?
> >
> > The tar file is not written to disk by amrecover. The desired files are extracted as the tarchive streams.
> >
> > > In the directory from where you started the amrecover command. With tar, it will create the same exact hierarchy, reflecting the original DLE.
> > >
> > > try:
> > >
> > > find . -name myfilename -print
> >
> > I strongly suggest you NOT use amrecover to extract directly to the filesystem. Extract them in a temporary directory and once you are sure they are what you want, copy/move them to their correct location.
>
> To make this completely clear (i.e. "restoring guide for idiots"):
>
> - cd /tmp/something
> - amrecover ...
>
> The files will be restored into /tmp/something, which is your current directory when you typed the amrecover command.
>
> > ...
> > > > So assuming all the above is true, it'd be great if amdump could automatically break large DLE's into small DLE's to end up with smaller dump files and faster restore of individual files. Maybe it would happen only for level 0 dumps, so that incremental dumps would still use the same sub-DLE's used by the most recent level 0 dump.
> >
> > Sure, great idea. Then all you would need to configure is one DLE starting at "/". Amanda would break things up into sub-DLEs.
> >
> > Nope, sorry amanda asks the backup-admin to do that part of the config. That's why you get the big bucks ;)
> >
> > > > The issue I have is that with 30TB of data, there'd be lots of manual fragmenting of data directories to get more easily-restorable DLE's sizes of say, 500GB each.
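[Editor's sketch] Debra's "restoring guide for idiots" as a session sketch. The config name, scratch path, and the interactive amrecover commands shown in comments are placeholders for illustration, not a transcript:

```shell
# Extract into a scratch directory first, never straight onto the
# live filesystem; move files into place only after checking them.
mkdir -p /tmp/recover.scratch
cd /tmp/recover.scratch

# amrecover restores relative to the current working directory.
# Inside the interactive session you would typically run something
# like (names are placeholders):
#   sethost cfile
#   setdisk /mnt/samba/pictures
#   add path/to/lost-file
#   extract
amrecover daily

# Then inspect what came back before moving it into place:
find . -type f -exec ls -l {} +
```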
Re: can amanda auto-size DLE's?
> > > 3) I had figured that when restoring, amrestore has to read in a complete dump/tar file before it can extract even a single file. So if I have a single DLE that's ~2TB that fits (with multiple parts) on a single tape, then to restore a single file, amrestore has to read the whole tape. HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE, and the file has been restored, but the amrecover operation is still running, for quite some time after restoring the file. Why might this be happening?
>
> Most (all?) current tape formats and drives can fast forward looking for end of file marks. Amanda knows the position of the file on the tape and will have the drive go at high speed to that tape file.
>
> For formats like LTO, which have many tracks on the tape, I think it is even faster. I "think" a TOC records where (i.e. which track) each file starts. So it doesn't have to fast forward and back 50 times to get to the "tenth" file which is on the 51st track.

Jon, Olivier and Debra - thanks for reading my long post and replying.

OK, this makes sense about searching for eof marks from what I've read. Seems like it's a good reason to use smaller DLE's.

> > > 3a) Where is the recovered dump file written to by amrecover? I can't see space being used for it on either server or client. Is it streaming and untar'ing in memory, only writing the desired files to disk?
>
> The tar file is not written to disk by amrecover. The desired files are extracted as the tarchive streams.

Thanks, that makes sense too from what I've seen (or not seen, actually - i.e. large temporary files).

> > > So assuming all the above is true, it'd be great if amdump could automatically break large DLE's into small DLE's to end up with smaller dump files and faster restore of individual files. Maybe it would happen only for level 0 dumps, so that incremental dumps would still use the same sub-DLE's used by the most recent level 0 dump.
>
> Sure, great idea. Then all you would need to configure is one DLE starting at "/". Amanda would break things up into sub-DLEs.
>
> Nope, sorry amanda asks the backup-admin to do that part of the config. That's why you get the big bucks ;)

Good point! A bit of job security there. ;)

> > > Any thoughts on how I can approach this? If amanda can't do it, I thought I might try a script to create DLE's of a desired size based on disk-usage, then run the script every time I wanted to do a new level 0 dump. That of course would mean telling amanda when I wanted to do level 0's, rather than amanda controlling it.
>
> Using a scheme like that, when it comes to recovering data, which DLE was the object in last summer? Remember that when you are asked to recover some data, you will probably be under time pressure with clients and bosses looking over your shoulder. That's not the time you want to fumble around trying to determine which DLE the data is in.

Yes, I can see the complications. That makes me think of some things:

1) what do people do when they need to split a DLE? Just rely on notes/memory of DLE for restoring from older dumps if needed? Or just search using something like in question 3) below?

2) What happens if you split or otherwise modify a DLE during a cycle when normally the DLE would be getting an incremental dump? Will amanda do a new level 0 dump for it?

3) Is there a tool for searching for a path or filename across all dump indices? Or do I just grep through all the index files in /etc/amanda/config-name// ?

Thanks

-M
Re: can amanda auto-size DLE's?
one small comment inserted below

On Feb 27, 2014, at 11:33 PM, Jon LaBadie wrote:

> Olivier already provided good answers, I'll just add a bit.
>
> On Fri, Feb 28, 2014 at 10:35:08AM +0700, Olivier Nicole wrote:
>> Michael,
>>
> ...
>>
>>> 3) I had figured that when restoring, amrestore has to read in a complete dump/tar file before it can extract even a single file. So if I have a single DLE that's ~2TB that fits (with multiple parts) on a single tape, then to restore a single file, amrestore has to read the whole tape. HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE, and the file has been restored, but the amrecover operation is still running, for quite some time after restoring the file. Why might this be happening?
>>
>> You're touching the essence of tapes here: they are sequential access.
>>
>> So in order to access one specific DLE on the tape, the tape has to position at the very beginning of the tape and read everything until it reaches that dle (the nth file on the tape).
>>
> Most (all?) current tape formats and drives can fast forward looking for end of file marks. Amanda knows the position of the file on the tape and will have the drive go at high speed to that tape file.
>
> For formats like LTO, which have many tracks on the tape, I think it is even faster. I "think" a TOC records where (i.e. which track) each file starts. So it doesn't have to fast forward and back 50 times to get to the "tenth" file which is on the 51st track.
>
>> Then it has to read sequentially all that file containing the backup of a dle to find the file(s) you want to restore. I am not sure about dump, but I am pretty sure that if your tar backup was a file on a disk instead of a file on a tape, it would read sequentially from the beginning of the tar file, in a similar way.
>>
>> Then it has to read until the end of the tar (not sure about dump) to make sure that there are no other file(s) satisfying your extraction criteria.
>>
>> So yes, if the file you want to extract is at the beginning of your tar, it will continue reading for a certain amount of time after the file has been extracted.
>
> Another reason this happens is the "append" feature of tar. It is possible that a second, later version of the same file is in the tar file. Amanda does not use this feature but tar does not know this. If you see the file you want has been recovered, you can interrupt amrecover.
>
>>> The recover log shows this on the client doing the recovery:
>>>
>>> [root@cfile amRecoverTest_Feb_27]# tail -f /var/log/amanda/client/jet1/amrecover.20140227135820.debug
>>> Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback: data is still flowing
>>>
>>> 3a) Where is the recovered dump file written to by amrecover? I can't see space being used for it on either server or client. Is it streaming and untar'ing in memory, only writing the desired files to disk?
>>
> The tar file is not written to disk by amrecover. The desired files are extracted as the tarchive streams.
>
>> In the directory from where you started the amrecover command. With tar, it will create the same exact hierarchy, reflecting the original DLE.
>>
>> try:
>>
>> find . -name myfilename -print
>
> I strongly suggest you NOT use amrecover to extract directly to the filesystem. Extract them in a temporary directory and once you are sure they are what you want, copy/move them to their correct location.

To make this completely clear (i.e. "restoring guide for idiots"):

- cd /tmp/something
- amrecover …

The files will be restored into /tmp/something, which is your current directory when you typed the amrecover command.

> ...
>>> So assuming all the above is true, it'd be great if amdump could automatically break large DLE's into small DLE's to end up with smaller dump files and faster restore of individual files. Maybe it would happen only for level 0 dumps, so that incremental dumps would still use the same sub-DLE's used by the most recent level 0 dump.
>
> Sure, great idea. Then all you would need to configure is one DLE starting at "/". Amanda would break things up into sub-DLEs.
>
> Nope, sorry amanda asks the backup-admin to do that part of the config. That's why you get the big bucks ;)
>
>>
>>> The issue I have is that with 30TB of data, there'd be lots of manual fragmenting of data directories to get more easily-restorable DLE's sizes of say, 500GB each. Some top-level dirs in my main data drive have 3-6TB each, while many others have only 100GB or so. Manually breaking these into smaller DLE's once is fine, but since data gets regularly moved, added and deleted, things would quickly change and upset my smaller DLE's.
>
> I'll bet if you try you will be able to make some logical splits.
>>
>>> Any thoughts on how I can approach this? If amanda can't do it, I thought I might try a script to create DLE's of a desired size based on disk-usage.
Re: can amanda auto-size DLE's?
Olivier already provided good answers, I'll just add a bit.

On Fri, Feb 28, 2014 at 10:35:08AM +0700, Olivier Nicole wrote:
> Michael,
> ...
>
> > 3) I had figured that when restoring, amrestore has to read in a complete
> > dump/tar file before it can extract even a single file. So if I have a
> > single DLE that's ~2TB that fits (with multiple parts) on a single tape,
> > then to restore a single file, amrestore has to read the whole tape.
> > HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE,
> > and the file has been restored, but the amrecover operation is still
> > running, for quite some time after restoring the file. Why might this be
> > happening?
>
> You're touching the essence of tapes here: they are sequential access.
>
> So in order to access one specific DLE on the tape, the tape has to
> position at the very beginning of the tape and read everything until it
> reaches that DLE (the nth file on the tape).
>

Most (all?) current tape formats and drives can fast forward looking
for end-of-file marks. Amanda knows the position of the file on the
tape and will have the drive go at high speed to that tape file.

For formats like LTO, which have many tracks on the tape, I think it
is even faster. I "think" a TOC records where (i.e. which track) each
file starts. So it doesn't have to fast forward and back 50 times to
get to the "tenth" file, which is on the 51st track.

> Then it has to read sequentially all of the file containing the backup
> of a DLE to find the file(s) you want to restore. I am not sure about
> dump, but I am pretty sure that if your tar backup was a file on a disk
> instead of a file on a tape, it would read sequentially from the
> beginning of the tar file, in a similar way.
>
> Then it has to read until the end of the tar (not sure about dump) to
> make sure that there is no other file(s) satisfying your extraction
> criteria.
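The filemark fast-forward described above can be shown with the standard mt(1) tape tool. This is a dry-run sketch that only prints the commands (so no drive is needed); the non-rewinding device node /dev/nst0 and the file number are assumptions, and 32 KiB is Amanda's default block size for the dump-file header:

```shell
#!/bin/sh
# Dry-run sketch of positioning a tape at the nth file with mt(1).
# /dev/nst0 is an assumed device node; the function prints the
# commands instead of executing them.
TAPE=/dev/nst0

position_at_file() {
    n=$1    # Amanda's logs record which tape file a dump landed in
    echo "mt -f $TAPE rewind"
    echo "mt -f $TAPE fsf $n"          # fast-forward over n filemarks
    echo "dd if=$TAPE bs=32k count=1"  # peek at the 32 KiB file header
}

# e.g. position at the 10th file on the tape (files are counted from 0
# after a rewind, so 9 filemarks are skipped):
position_at_file 9
```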
>
> So yes, if the file you want to extract is at the beginning of your tar,
> it will continue reading for a certain amount of time after the file has
> been extracted.

Another reason this happens is the "append" feature of tar. It is
possible that a second, later version of the same file is in the tar
file. Amanda does not use this feature but tar does not know this.
If you see the file you want has been recovered, you can interrupt
amrecover.

> > The recover log shows this on the client doing the recovery:
> >
> > [root@cfile amRecoverTest_Feb_27]# tail -f
> > /var/log/amanda/client/jet1/amrecover.20140227135820.debug
> > Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback:
> > data is still flowing
> >
> > 3a) Where is the recovered dump file written to by amrecover? I can't see
> > space being used for it on either server or client. Is it streaming and
> > untar'ing in memory, only writing the desired files to disk?
>
The tar file is not written to disk by amrecover. The desired files are
extracted as the tar archive streams.

> In the directory from where you started the amrecover command. With tar,
> it will create the exact same hierarchy, reflecting the original DLE.
>
> try:
>
> find . -name myfilename -print

I strongly suggest you NOT use amrecover to extract directly to the
filesystem. Extract the files into a temporary directory and once you
are sure they are what you want, copy/move them to their correct
location.

...

> > So assuming all the above is true, it'd be great if amdump could
> > automatically break large DLE's into small DLE's to end up with smaller
> > dump files and faster restore of individual files. Maybe it would happen
> > only for level 0 dumps, so that incremental dumps would still use the same
> > sub-DLE's used by the most recent level 0 dump.

Sure, great idea. Then all you would need to configure is one DLE
starting at "/". Amanda would break things up into sub-DLEs.
Nope, sorry, amanda asks the backup admin to do that part of the
config. That's why you get the big bucks ;)

> > The issue I have is that with 30TB of data, there'd be lots of manual
> > fragmenting of data directories to get more easily-restorable DLE sizes
> > of say, 500GB each. Some top-level dirs in my main data drive have 3-6TB
> > each, while many others have only 100GB or so. Manually breaking these
> > into smaller DLE's once is fine, but since data gets regularly moved,
> > added and deleted, things would quickly change and upset my smaller DLE's.

I'll bet if you try you will be able to make some logical splits.

> > Any thoughts on how I can approach this? If amanda can't do it, I thought I
> > might try a script to create DLE's of a desired size based on disk usage,
> > then run the script every time I wanted to do a new level 0 dump. That of
> > course would mean telling amanda when I wanted to do level 0's, rather than
> > amanda controlling it.

Using a scheme like that, when it comes to recovering data, which DLE
was the object in last summer? Remember that when you are asked to
Re: can amanda auto-size DLE's?
Michael,

> 1) if I have multiple DLE's in my disklist, then tell amdump to perform a
> level 0 dump of the complete config, each DLE gets written to tape as a
> separate dump/tar file (possibly in parts if the tar is > part-size). Is
> that right?

Yes.

> 2) If multiple DLE's are processed in a single level 0 amdump run, with
> each DLE << tape-size, then as many as can fit will be written to a single
> tape, or possibly spanning tapes. But in any case it won't be a single DLE
> per tape. Is that right? That looks like what I've observed so far.

Yes, Amanda tries to fit as many DLEs per tape as it can in order to
fill the tape.

> 3) I had figured that when restoring, amrestore has to read in a complete
> dump/tar file before it can extract even a single file. So if I have a
> single DLE that's ~2TB that fits (with multiple parts) on a single tape,
> then to restore a single file, amrestore has to read the whole tape.
> HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE,
> and the file has been restored, but the amrecover operation is still
> running, for quite some time after restoring the file. Why might this be
> happening?

You're touching the essence of tapes here: they are sequential access.

So in order to access one specific DLE on the tape, the tape has to
position at the very beginning of the tape and read everything until it
reaches that DLE (the nth file on the tape).

Then it has to read sequentially all of the file containing the backup
of a DLE to find the file(s) you want to restore. I am not sure about
dump, but I am pretty sure that if your tar backup was a file on a disk
instead of a file on a tape, it would read sequentially from the
beginning of the tar file, in a similar way.

Then it has to read until the end of the tar (not sure about dump) to
make sure that there is no other file(s) satisfying your extraction
criteria.
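For reference, the usual way to express manually split DLEs is one disklist entry per include list. A hedged sketch; the host name, paths, and dumptype here are assumptions, not taken from the thread:

```
# disklist -- one DLE per include list (hypothetical names)
cfile /data-chunk1 /data {
    user-tar
    include list "/etc/amanda/MyConfig/includes/include.1"
}
cfile /data-chunk2 /data {
    user-tar
    include list "/etc/amanda/MyConfig/includes/include.2"
}
```

Each entry backs up the same mount point (/data) but only the paths
named in its include file, so a script can regenerate the chunking by
rewriting the include files without touching the disklist itself.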
So yes, if the file you want to extract is at the beginning of your tar,
it will continue reading for a certain amount of time after the file has
been extracted.

> The recover log shows this on the client doing the recovery:
>
> [root@cfile amRecoverTest_Feb_27]# tail -f
> /var/log/amanda/client/jet1/amrecover.20140227135820.debug
> Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback:
> data is still flowing
>
> 3a) Where is the recovered dump file written to by amrecover? I can't see
> space being used for it on either server or client. Is it streaming and
> untar'ing in memory, only writing the desired files to disk?

In the directory from where you started the amrecover command. With tar,
it will create the exact same hierarchy, reflecting the original DLE.

try:

find . -name myfilename -print

> 4) To restore from a single DLE's dump/tar file that's smaller than tape
> size, and exists on a tape with multiple other smaller DLE dump/tar files,
> amrestore can seek to the particular DLE's dump/tar file and only has to
> read that one file. Is that right?

As mentioned above, a seek on a tape is a sequential read of the tape
(unless your tape is already positioned at file x (known) and you want
to read from file y, in which case it only needs to read y - x files).

> So assuming all the above is true, it'd be great if amdump could
> automatically break large DLE's into small DLE's to end up with smaller
> dump files and faster restore of individual files. Maybe it would happen
> only for level 0 dumps, so that incremental dumps would still use the same
> sub-DLE's used by the most recent level 0 dump.

Yes, but then what happens for levels above 0? You have to do your
planning by hand and break your DLEs yourself.

> The issue I have is that with 30TB of data, there'd be lots of manual

Depending on the size of your tapes, even with many small DLEs, you
will most probably end up reading the tape from the beginning for
every restore.
If your DLE is split across many tapes, you will have to read every
tape even if the file you want was found on the first tape (I am not
100% sure about that, though).

> fragmenting of data directories to get more easily-restorable DLE sizes
> of say, 500GB each. Some top-level dirs in my main data drive have 3-6TB
> each, while many others have only 100GB or so. Manually breaking these
> into smaller DLE's once is fine, but since data gets regularly moved,
> added and deleted, things would quickly change and upset my smaller DLE's.
>
> Any thoughts on how I can approach this? If amanda can't do it, I thought I
> might try a script to create DLE's of a desired size based on disk usage,
> then run the script every time I wanted to do a new level 0 dump. That of
> course would mean telling amanda when I wanted to do level 0's, rather than
> amanda controlling it.

While the script may be a good idea, running it for each level 0 will
completely mess things up for Amanda: remember, you don't manage/know
when Amanda will do a level 0, so you don't know when to run your
script. And if you remove a DLE from your disklist (because you h
can amanda auto-size DLE's?
Amanda 3.3.4

Hi,

I'm guessing the answer is no since I haven't read about this, but
maybe... I'm hoping amanda might be able to auto-size DLE's into
sub-DLE's of an approximate size, say 500GB. My understanding is this:

1) if I have multiple DLE's in my disklist, then tell amdump to perform a
level 0 dump of the complete config, each DLE gets written to tape as a
separate dump/tar file (possibly in parts if the tar is > part-size). Is
that right?

2) If multiple DLE's are processed in a single level 0 amdump run, with
each DLE << tape-size, then as many as can fit will be written to a single
tape, or possibly spanning tapes. But in any case it won't be a single DLE
per tape. Is that right? That looks like what I've observed so far.

3) I had figured that when restoring, amrestore has to read in a complete
dump/tar file before it can extract even a single file. So if I have a
single DLE that's ~2TB that fits (with multiple parts) on a single tape,
then to restore a single file, amrestore has to read the whole tape.
HOWEVER, I'm now testing restoring a single file from a large 2.1TB DLE,
and the file has been restored, but the amrecover operation is still
running, for quite some time after restoring the file. Why might this be
happening?

The recover log shows this on the client doing the recovery:

[root@cfile amRecoverTest_Feb_27]# tail -f
/var/log/amanda/client/jet1/amrecover.20140227135820.debug
Thu Feb 27 17:23:12 2014: thd-0x25f1590: amrecover: stream_read_callback:
data is still flowing

3a) Where is the recovered dump file written to by amrecover? I can't see
space being used for it on either server or client. Is it streaming and
untar'ing in memory, only writing the desired files to disk?

4) To restore from a single DLE's dump/tar file that's smaller than tape
size, and exists on a tape with multiple other smaller DLE dump/tar files,
amrestore can seek to the particular DLE's dump/tar file and only has to
read that one file. Is that right?
So assuming all the above is true, it'd be great if amdump could
automatically break large DLE's into small DLE's to end up with smaller
dump files and faster restore of individual files. Maybe it would happen
only for level 0 dumps, so that incremental dumps would still use the same
sub-DLE's used by the most recent level 0 dump.

The issue I have is that with 30TB of data, there'd be lots of manual
fragmenting of data directories to get more easily-restorable DLE sizes
of say, 500GB each. Some top-level dirs in my main data drive have 3-6TB
each, while many others have only 100GB or so. Manually breaking these
into smaller DLE's once is fine, but since data gets regularly moved,
added and deleted, things would quickly change and upset my smaller DLE's.

Any thoughts on how I can approach this? If amanda can't do it, I thought
I might try a script to create DLE's of a desired size based on disk
usage, then run the script every time I wanted to do a new level 0 dump.
That of course would mean telling amanda when I wanted to do level 0's,
rather than amanda controlling it.

Thanks for reading this long post!

-M
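The script idea in the last paragraph can be sketched in a few lines of shell, in the spirit of Stefan's am_dyn_dles mentioned elsewhere in the thread: bin the top-level directories into include lists whose du total stays under a cap. A minimal sketch with greedy first-fit packing; the paths and the 500GB cap are assumptions, and this is not taken from the actual repository:

```shell
#!/bin/sh
# Greedily pack the children of a directory into include lists whose
# combined du size stays under a cap. Paths and cap are examples.

make_chunks() {
    top=$1       # directory whose subdirectories get distributed
    max_kb=$2    # cap per chunk, in KB (e.g. 524288000 for ~500 GB)
    out=$3       # where the include.N files are written
    mkdir -p "$out"
    rm -f "$out"/include.*
    chunk=1
    used=0
    : > "$out/include.$chunk"
    # Largest directories first, so each big one seeds its own chunk.
    du -sk "$top"/* 2>/dev/null | sort -rn | while read -r kb path; do
        if [ "$used" -gt 0 ] && [ $((used + kb)) -gt "$max_kb" ]; then
            chunk=$((chunk + 1))
            used=0
            : > "$out/include.$chunk"
        fi
        echo "$path" >> "$out/include.$chunk"
        used=$((used + kb))
    done
}

# Hypothetical usage: ~500 GB chunks out of /data, written where a
# disklist "include list" directive could pick them up:
# make_chunks /data 524288000 /etc/amanda/MyConfig/includes
```

Note this inherits the caveat raised in the replies: regenerating the lists between a level 0 and its incrementals changes what each DLE covers, so it is only safe to re-run when you also force fresh level 0 dumps.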