Gene, Stefan, Eric,

Well, that was frustrating: re-routed my network cable, hit the power switch on the strip, and lost my editing session (never edit files in /tmp; stupid, stupid, stupid).
I'm going to try to coalesce the threads here. I think I'm handling Eric's, Gene's and then Stefan's emails in order, but I wouldn't bet all that much that I haven't messed things up a little, and actually there is a good amount of cross-over between all of the emails.

Thanks for the info on how dumporder is handled.

I unfortunately don't have the resources to create new holding areas out of old disks. I'm part of the computer core and don't actually own any of the equipment that I'm working with (well, a small bit of it, but nothing related to the Samar system). This, by the way, is an SGI/IRIX Origin 300 multiprocessor (4x 500 MHz IP35 processors, for what it's worth). There is an internal PCI bus, but the internal SCSI bus is fully occupied, and the external one already has the raid array, the jukebox/SDLT, and a second SDLT in a shoe box; I'm not sure how long I can safely make the daisy chain. PCI to dual-SCSI interface cards are available (we installed one in another Origin 300 down the hall from this one) but I've been unable to rouse interest in its purchase.

If I could get additional spindle(s) it would help with contention, on the raid if not on the SCSI bus. I also need to remind myself that, with the data being stored in the amanda work area in chunksize pieces, the new spindle need not be all that large in order to be useful.

Because dump does not cross partition boundaries (at least no dump that I've worked with), I feel comfortable using the xfsdump utility to dump root without worrying about encoding exceptions in the disklist file. The other partitions do not crosslink, so they shouldn't need exceptions either, even the ones dumped with tar (gnutar specifically). Is "dumped" really the right term there? "Dumped with tar"?

Samar is an amanda client of the Samar amanda server; there are no other clients of this server.

samar 1# df -kl
Filesystem         Type      kbytes       use      avail  %use  Mounted on
/dev/root          xfs     67482780  19779524   47703256    30  /
/dev/dsk/dks0d2s0  xfs     63288988  60982928    2306060    97  /usr1
/dev/dsk/dks1d2s0  xfs    884945404 803357524   81587880    91  /usr5

We are using dump for root and /usr1. For /usr5, which is large, we are using tar to dump the partition broken down by user directory. We do not have a DLE for /usr5/dumps, which is where I've placed the amanda work area, nor for lost+found/. The /usr5/amanda directory contains the amanda config and log files. The amount of data that any given user owns or updates is highly variable and beyond my control; users do not even have quota limits.

samar 2# du -ks /usr5/*
63442848    /usr5/allengr
13440       /usr5/amanda
20078964    /usr5/bimal
132563428   /usr5/dtaylor
0           /usr5/dumps
69378168    /usr5/hxgao
161161372   /usr5/joy
121191328   /usr5/lalor
21337456    /usr5/leith
114485136   /usr5/liw
0           /usr5/lost+found
53840856    /usr5/ninggao
40176552    /usr5/skaur
2751032     /usr5/tapu

There is no assurance I will be allowed to continue using /usr5 as an amanda work area. Amanda work can consume whatever is left on /usr5, but there is no assurance that there will be any space, and I just know that the first time a user tries to save a large file and can't allocate the space there is going to be a problem.

We always like to use dump rather than tar for root partitions. It has been my understanding that tar is just a character archiver, whereas dump understands the vendor-specific special devices. Dump/restore are the recommended way to preserve and restore a bootable partition.

No, the "global" dumptype definition does not specify the program to use; it must be a compile-time default.
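For what it's worth, if we ever stop leaning on the compile-time default, here is roughly what I'd expect the explicit version to look like. This is only a sketch to show what I mean; the dumptype names, the chunksize, and the "use" value are placeholders I made up, not our live config:

    # amanda.conf (sketch) -- name the program per dumptype instead
    # of relying on the compile-time default
    define dumptype root-dump {
        global
        program "DUMP"        # vendor dump; xfsdump on this IRIX box
        comment "bootable partitions via dump/restore"
    }

    define dumptype user-gnutar {
        global
        program "GNUTAR"
        comment "per-user pieces of /usr5 via GNU tar"
    }

    # holdingdisk (sketch) -- chunksize is the reason even a modest
    # second spindle would be useful as additional amanda work space
    holdingdisk work1 {
        directory "/usr5/dumps"
        use -500 Mb           # use all but 500 MB, leaving the users headroom
        chunksize 1 Gb
    }

    # disklist (sketch) -- one DLE per user directory, and none for
    # /usr5/dumps or /usr5/lost+found
    samar  /              root-dump
    samar  /usr1          root-dump
    samar  /usr5/allengr  user-gnutar
    samar  /usr5/bimal    user-gnutar
    # ...and so on, one line per user directory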
A closer look at the two most recent amanda dump reports shows that a few partitions were promoted to level 0 and that a couple of the partitions that had encountered EOT on the tape were bumped to level 1. It is still unclear to me if the level 0 of those partitions was properly written to tape. Messages in the report include:

FAILURE AND STRANGE DUMP SUMMARY:
  samar /usr5/allengr lev 0 FAILED ["data write: Broken pipe"]
  samar /usr5/allengr lev 0 FAILED [dump to tape failed]
  samar /usr5/liw lev 0 FAILED ["data write: Broken pipe"]
  samar /usr5/liw lev 0 FAILED [dump to tape failed]
  samar /usr5/liw lev 0 STRANGE

FAILED AND STRANGE DUMP DETAILS:

/-- samar /usr5/allengr lev 0 FAILED ["data write: Broken pipe"]
sendbackup: start [samar:/usr5/allengr level 0]
sendbackup: info BACKUP=/usr/local/sbin/gnutar
sendbackup: info RECOVER_CMD=/usr/sbin/gzip -dc |/usr/local/sbin/gnutar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
\--------

taper: tape SAMAR02 kb 161967808 fm 9 writing file: No space left on device
taper: retrying samar:/usr5/allengr.0 on new tape: [writing file: No space
+left on device]

This last message indicates that amanda will try to write the data to the next tape volume. However, there is no file to dd from the amanda work area; in order to retry, amanda would have to go all the way back and tar and zip it again. Does amanda retry from the start and reassign the DLE to a dumper, or does it simply reassign the DLE to the taper? I could always remount the tapes and find out, I guess.

> I know that you as the responsible admin tend to see things
> pessimistic. It is your job to do so, and I understand this perfectly
> as I am "the responsible admin" for several sites, too.
>
> We can't afford to be too OPTIMISTIC, do we? ;-)
>
> I also know from my view as an active member of the list that your
> installation is still in the process of development.

On that note I've got to tell you that one of the other guys here insists (rightly, I should add) that the purpose of data backup is not getting the data to tape, it's getting the data back off of the tape.

>> So it is very likely that not all of your level 0 backups that have to
>> be done first for new DLEs will fit on your tapes.

BC> Given a large enough holding area I'd expect that any DD

> DLE? ;-)

I was attempting to delineate the specific step involved. A DLE can be broken into many chunksize pieces fitting into more than one amanda work area; when the chunks are reassembled they MUST fit onto a single physical tape (or, I suppose, virtual tape, but I'm not going there today, though it is on the table for next week. Yes, for real, I have this new 4 TByte drive...). So when I said DD the partition to tape I meant literally using dd internally in amanda.

>> I understand that you can't run this config every day as it seems to
>> have run for full 3 days this time.

BC> After the initial run I added /usr5/dumps (which is on the same raid
BC> partition) and run time dramatically improved.

> /usr5/dumps is a holdingdisk in your amanda.conf?

BC> ** This is also misleading as the failed level 0 from the previous run
BC> should have re-run as level zero and many of them ran at level 1.
BC> This is a "second" problem, a result of the first but a completely
BC> different part of the logic.

> phew ...

Actually the jury hasn't returned a verdict yet. I would guess that in the case of direct-to-tape, amanda would not restart the DLE from the dumper but only from the taper, which can not work since there is nothing in the work area.
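When I do get around to mounting the tapes, something along these lines should settle it. A sketch only; I'm assuming the config is named "Samar" here, substitute the real name:

    # Ask amanda which tape/file (if any) each level 0 landed on:
    samar 3# amadmin Samar find samar /usr5/allengr
    samar 4# amadmin Samar find samar /usr5/liw

    # And, in the spirit of "getting the data back off of the tape",
    # have amverify read the volume back and check each image:
    samar 5# amverify Samar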
However, I don't know that for sure; it is a guess, and I haven't mounted the tapes to run those checks (yet).

> Getting the backups right is top-priority.
> Getting them fast is secondary, at least for a beginning.

You can say that again.

> I tend to use one "program" for the whole config as it is easier to
> configure (and wrap your head around).

It would be easier, but I question the reliability of restoring root from tar, and I can't use dump on the raid, it's too big.

I think that was all of the open questions; I can't be certain anymore. Thank you all for your time on this, I appreciate it.

	Brian

---
   Brian R Cuttler                 [EMAIL PROTECTED]
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773