Re: amrestore problem, headers ok but no data
Brian Cuttler wrote:
> samar 31# /usr/local/sbin/amrestore -p /dev/sdlt2 samar /usr5/bimal | /usr/local/sbin/gnutar -tf -
> amrestore:   0: skipping start of tape: date 20041229 label SAMAR24
> amrestore:   1: skipping samar._usr5_lalor.20041229.0
> amrestore:   2: skipping samar._usr5_tapu.20041229.1
> amrestore:   3: skipping samar._.20041229.1
> amrestore:   4: skipping samar._usr5_amanda.20041229.1
> amrestore:   5: restoring samar._usr5_bimal.20041229.1
> ./
> [...]
> ./cvs_test/spider/man/CVS/
> amrestore: read error: I/O error
> ./cvs_test/spider/src/
> ./cvs_test/spider/src/CVS/

That one's a GTAR DLE, isn't it? Looks as though you're only partially out of the woods for those...

Try manually dd'ing the data from the tape (basically, doing by hand what "amrestore -r" does). This will help you determine whether the data's physically on the tape:

    mt rewind
    mt fsf N        # for some appropriate N
    dd if=/dev/sdlt2 bs=32k of=tempfile

(It looks as though the right value of N is simply the number printed by amrestore; e.g. with the tape you used to generate the above excerpt, to get samar:/usr5/tapu, use "mt fsf 2".)

First, take note of the block counts printed by dd at the end, and see if they match your expectations. Note that it's counting the physical blocks it read from the tape; when it says "X+Y", the X is the number of full-size records it read -- 32 KB, since you said "bs=32k" -- and the Y is the number of partial blocks, i.e. those that were less than 32 KB. Since your tape is configured for variable-length blocks, I *think* I'd expect to see Y=0, i.e. all blocks being 32 KB long. Ditto if you configured it for 32-KB fixed-length blocks. I'm not sure what would happen if you configured the drive for shorter fixed-length blocks -- it probably depends on the drive, and its driver, whether it'd:
- emulate 32-KB blocks, i.e. break each one up into, say, 512-byte blocks and reassemble them at read time, thus yielding "X+0" from dd
- break up the 32-KB blocks, but not reassemble them, yielding "0+Y" from dd
- write the first 512 bytes of each 32-KB block and discard the rest, yielding a garbage tape

Then look at the file you dd'ed off the tape. Its expected contents are a 32-KB Amanda per-DLE header (just like the one you've been successfully getting from amrestore), followed by the backup of that DLE (either DUMP or GTAR format; compressed iff you're using software compression on that DLE). You can check the header with:

    dd if=tempfile bs=32k count=1 | cat -evt | more

> And I'd make the suggestion that somewhere in this setup, there is
> insufficient iron to do the job. I don't know anything about an
> SDLT, but if it's taking 33 hours of real time, that drive has got to
> be doing a lot of "shoe shining" [...]

I'm not sure about that. It's only 14:34 of tape time, and Amanda reports 2469.2 KB/sec tape speed. Is that reasonable for this drive?

> There has been a suggested rule of thumb for tape drives and
> capacities put forth that says that if it takes the drive, in
> streaming mode, more than 3 hours to complete that backup by writing
> a nearly full tape, it's too small and slow if it's streaming 100% of
> the time.

That might be useful for guesstimating, until one gets better stats, but I wouldn't depend on it for diagnosing problems! It's about as scientific as Moore's "law", which seems to hold surprisingly true, but even so would better be called "Moore's Observation" -- or as Dave Tillbrook's facetious comment that "the computer you want always costs $5000" :-)

--
|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        [EMAIL PROTECTED]
|  |  /
The animal that coils in a circle is the serpent; that's why so
many cults and myths of the serpent exist, because it's hard to
represent the return of the sun by the coiling of a hippopotamus.
        - Umberto Eco, "Foucault's Pendulum"
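[Editorial note: the "X+Y" bookkeeping described above is easy to see on a regular file; the file names below are invented for the demonstration, but the stats dd prints are the same mechanism as for a tape read.]

```shell
# Build a file that is one full 32-KB block plus a 4-byte tail.
dd if=/dev/zero of=payload bs=32768 count=1 2>/dev/null
printf 'tail' >> payload

# Read it back with bs=32k; dd's closing stats count full vs. partial blocks.
dd if=payload of=/dev/null bs=32768 2>stats
cat stats    # "1+1 records in": one full 32-KB record, one partial
```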
Re: amrestore problem, headers ok but no data
On Fri, Jan 07, 2005 at 11:17:21AM -0500, Brian Cuttler wrote:
> Following Gene's model, I set the default block size on the tape
> devices (sgi command # mt -f /dev/rmt/tps1d4nrns devblksz 32768)
> and also switched from the variable-length to the fixed-length tape
> device, used amlabel to relabel the tape (not what Gene indicated).
>
> Oddly, trying to dd if=/dev/rmt/tps... read no data
>
> samar 85# mt -f /dev/rmt/tps1d4nrns rewind
> samar 86# dd if=/dev/rmt/tps1d4nrns of=scratch
> Read error: Invalid argument
> 0+0 records in
> 0+0 records out

These two things might well be related. That dd command, without a "bs=" argument, is trying to read 512-byte blocks. But the physical blocks on the tape are 32 KB -- your adjustments have seen to that. It would be appropriate for the read() call to fail in that situation, as indeed it did. On Solaris (whose man pages I have access to at the moment), the error status would be ENOMEM; perhaps on your system it's EINVAL == "Invalid argument" instead. (The place to look that up would likely be the man page for the tape driver -- st(7) is where I found the Solaris version.)

> However, I ran amdump last night. Still having problems with TAR DLEs,
> though oddly I was able to see that a DUMP DLE attempted to write.

I'm lost. "Attempted to write" what? To tape during amdump, or to disk during amrestore? If the former, do you mean to say that the tar DLE *didn't* attempt it?

> I was able to retrieve the file, using both amrestore and Eric's
> suggestion of manually issuing the dd command to get the file from
> tape. I was able to open the dump file (DLE for /usr1) and saw that
> the file "kmitra" was present. This I thought to be good news since
> the only top-level file on the partition is kmitra/ (note directory
> slash). Unfortunately xfsdump reported the file as a regular file
> and not a directory and I was unable to proceed from there.
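[Editorial note: dd's 512-byte default block size, mentioned above, can be confirmed on an ordinary disk file; names invented for the demonstration. On a disk file the short reads are harmless, unlike on a variable-block tape.]

```shell
# A 1025-byte file: two full 512-byte blocks plus 1 leftover byte.
dd if=/dev/zero of=sample bs=1025 count=1 2>/dev/null
dd if=sample of=/dev/null 2>stats    # no "bs=", so dd reads 512 bytes at a time
cat stats                            # "2+1 records in": two full reads, one partial
```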
Something else you could try: "amrestore -r" one of those DLEs, and "dd" it from tape as I described before. Then "cmp" the two files. They should be identical, of course. That'll tell you whether there are problems with amrestore.

To see whether amdump's back end (taper) is putting the data on tape correctly, try this:
- run amdump *with no tape in the drive*; it'll run in degraded mode and leave all the dumps in the holding disk
- make copies of the dumps in the holding disk
- amflush them (the originals, that is) to tape
- "amrestore -r" them, and/or "dd" them from tape
- compare what came off the tape with the holding-disk copies you made before the amflush (use "dd" to strip off and discard the first 32 KB of each file, as I described previously, because there *will* be differences between the files' Amanda headers; but the remainders of the two files should be identical)

If the holding-disk files are split into multiple chunks, you'll have to do some "dd" magic to reassemble them; don't forget to discard the first 32 KB of *every* chunk.

To see if amdump's front end (dumper et al.) is getting the data onto the holding disk correctly in the first place, try to restore from the holding-disk copy, which hasn't made the journey to and from the tape. (Strip off the header and, if necessary, reassemble as described above.)

> I've tried to retrieve several of the TAR DLEs but have been unsuccessful
> with either method.

Sorry, but I gotta ask: with or without "bs=32k"?

Hmm, it seems some of my recipes above are premature. Oh well, I'll leave them in anyway, since they might be useful further down the road. It sounds to me as though it's time to:
- send you off looking at the debug files on the clients (in /tmp/amanda unless you've configured them otherwise); I'm not sure just what you'd be looking *for* -- just anything unusual...
- ask you to show us the following from a run that demonstrates the problem:
  - the email report
  - the log.MMDD and amdump.N files
  - a description of the results of restore attempts for a couple of representative DLEs

I can't recall: how much testing have you done on the tape subsystem itself, without Amanda in the loop to confuse things?
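[Editorial note: the header-stripping and chunk-reassembly "dd magic" above can be rehearsed on ordinary files. Everything below is invented for the demonstration -- real holding-disk chunks each begin with their own 32-KB Amanda header, which is what the zero-filled prefixes simulate.]

```shell
# Build two fake holding-disk chunks: a 32-KB "header" followed by payload.
dd if=/dev/zero bs=32k count=1 2>/dev/null > chunk1
printf 'payload-part-one;' >> chunk1
dd if=/dev/zero bs=32k count=1 2>/dev/null > chunk2
printf 'payload-part-two' >> chunk2

# Strip the first 32 KB of *every* chunk and concatenate the remainders.
{ dd if=chunk1 bs=32k skip=1; dd if=chunk2 bs=32k skip=1; } 2>/dev/null > reassembled

cat reassembled   # the two payloads, back to back, headers gone
```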
Re: amrestore problem, headers ok but no data
On Fri, Jan 07, 2005 at 11:59:54AM -0500, Brian Cuttler wrote:
> samar 126# which xfsrestore
> /sbin/xfsrestore
> samar 127# which xfsdump
> /usr/sbin/xfsdump

I suppose the theory must be that anyone can do a restore, but only root can use [xfs]dump.
Re: amrestore problem, headers ok but no data
On Fri, Jan 07, 2005 at 01:20:42PM -0500, Brian Cuttler wrote:
> samar 170# dd of=/dev/sdlt2 obs=32k if=./scratch
> 64+0 records in
> 1+0 records out
>
> "bs" block size, "obs" output BS, (there is an IBS also, which I
> am afraid of developing should this not resolve soon)

Yup, this makes sense. Since you specified "obs", the input block size remained at the default of 512 bytes. "dd" read enough of those (64) to make a 32-KB output block and then wrote the latter. If the file had been longer, it would have repeated the process -- 64 512-byte reads followed by 1 32-KB write -- until done. It can also convert the other way, from a large ibs to a smaller obs.

For a disk file, the block size doesn't matter much; things speed up if you use a larger one, but you get the same result except for dd's stats at the end. For a tape, block size can be a lot more important. With a traditional tape drive that uses variable-length blocks, each write() system call produces exactly one physical tape block, whose length is simply the length specified to write(). Correspondingly, each read() call reads exactly one physical tape block; the length specified to read() must be at least as large as the block currently under the heads, or something will go wrong (on Solaris SCSI tapes, as I mentioned before, the read will fail with ENOMEM; I don't know whether other systems behave differently, but one thing that almost certainly will *not* happen is that you get the first chunk of the physical block on this read(), and the next chunk on the next read()). One physical block for each read() call is the rule.

This is why "dd" has separate "ibs" and "obs" arguments in the first place -- to let you reblock a tape, i.e. copy it to another tape while changing the block size, without making an intermediate copy on disk. When copying from one disk file to another, or between tape and disk, specifying different input and output block sizes doesn't really accomplish anything.
I don't know how tape drives with fixed-length blocks (or configured for them, as in your case) work. Perhaps the drive does the necessary block-length conversion itself, using its internal RAM buffer.
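[Editorial note: the 64-reads-to-1-write behaviour described above is easy to reproduce on a disk file; the file names are invented for the demonstration.]

```shell
dd if=/dev/zero of=./scratch bs=32768 count=1 2>/dev/null  # a 32-KB input file
dd if=./scratch of=./copy obs=32k 2>stats                  # default ibs=512, obs=32k
cat stats   # "64+0 records in" (512-byte reads) and "1+0 records out" (one 32-KB write)
```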
Re: amrestore problem, headers ok but no data
On Fri, Jan 07, 2005 at 03:40:12PM -0500, Brian Cuttler wrote:
> samar 5# /usr/local/sbin/amrestore -r /dev/sdlt2
> amrestore: 0: skipping start of tape: date 20050107 label SAMAR05
> amrestore: 1: restoring samar._usr5_amanda.20050107.1
> amrestore: read error: I/O error
> [likewise with "dd bs=32k"]

Ooo, I don't like the look of "I/O error" at all. That suggests a hardware or media problem, as opposed to anything that software has any control over. If you haven't done so (can't recall; maybe you've already described this), look in the system logs for errors from that device. If there aren't any, it couldn't hurt to take a look at the driver's man page (and the pages for the SCSI subsystem in general -- it is a SCSI drive, isn't it?) to see if more verbose error reporting can be enabled. If you come up with anything, please post it.

> samar 12# mt -f /dev/sdlt2 rewind
> samar 13# mt -f /dev/sdlt2 fsf 1
> samar 14# cat -evt /dev/sdlt2 | more
> Input error: Invalid argument (/dev/sdlt2)

This one's to be expected. "cat" uses a smaller block size -- and unlike "dd", you can't change it.

> samar 15# mt -f /dev/sdlt2 rewind
> samar 16# mt -f /dev/sdlt2 fsf 1
> samar 17# dd if=/dev/sdlt2 bs=32768 | cat -evt | more
> AMANDA: FILE 20050107 samar /usr5/amanda lev 1 comp .gz program /usr/local/sbin/gnutar$
> To restore, position tape at start of file and run:$
> ^Idd if= bs=32k skip=1 | /usr/sbin/gzip -dc | usr/local/sbin/gnutar -f...
> -$
> ^L$
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ [...]
>
> ** many null characters removed.

Hmmm, I'd have expected to see "Read error: I/O error" again here, seeing as you get it on every other 32-KB read. Odd that the message is missing while the main output is still as though the error had occurred.

> samar 18# mt -f /dev/sdlt2 rewind
> samar 19# mt -f /dev/sdlt2 fsf 1
> samar 23# dd if=/dev/sdlt2 bs=32768 skip=1 | /usr/sbin/gzip -dc |
> /usr/local/sbin/gnutar -tf - | more
> Read error: I/O error
> 0+0 records in
> 0+0 records out
>
> gzip: stdin: unexpected end of file

Yeah, more of the same... If dd can't get the bits off the tape, there's not much point trying to do different things with its nonexistent output :-(

> I had followed Gene's instructions

I've never had to use these, so my comments here are pretty tentative.

> [Commands to show that the tape drive is set for
> variable-length blocks, stash the label in ./scratch, and
> verify that the label was stashed ok.]
I don't see the command to set the drive to 32768-byte blocks, but it's presumably there, because:

> samar 32# mt -f /dev/sdlt2 blksize
>
> Recommended tape I/O size: 131072 bytes (256 512-byte blocks)
> Minimum block size: 4 byte(s)
> Maximum block size: 16777212 bytes
> Current block size: 32768 byte(s)
>
> samar 33# mt -f /dev/sdlt2 rewind
>
> samar 34# mt -f /dev/sdlt2 blksize
>
> Recommended tape I/O size: 131072 bytes (256 512-byte blocks)
> Minimum block size: 4 byte(s)
> Maximum block size: 16777212 bytes
> Current block size: 32768 byte(s)
>
> samar 35# dd bs=32768 if=scratch of=/dev/sdlt2
> 1+0 records in
> 1+0 records out

If I recall from Gene's description of the problem, it's only when you go to *read* a tape that your setting gets magically zapped. So it couldn't hurt to do a final:

    mt rewind
    dd if=/dev/sdlt2 bs=32k of=./scratch2
    mt -f /dev/sdlt2 blksize

to verify that that hasn't happened.
Re: amrestore problem, headers ok but no data
in a very long time. The only optimization that's left for dd to perform is a small one: if ibs and obs are the same, it can save a tiny amount of CPU time by not using an inner loop; it can just do something like this (omitting all the error checking and handling for clarity):

    while ((actual = read(infd, buf, bs)) > 0) {
        if (actual == bs)
            ++wholeBlocksRead;
        else
            ++partialBlocksRead;
        if (write(outfd, buf, actual) == actual)
            ++wholeBlocksWritten;
        else
            ++partialBlocksWritten;
    }

The variables being incremented are the source of the stats dd prints at the end. The optimization is so small that in practice, dd implementations might not bother; they might just fold the ibs==obs case into one of the other two cases. If ibs and obs differ, the code has to be more complicated: a bunch of small read()s to fill up a larger output buffer, or a bunch of small write()s to empty out a larger input buffer (possibly with padding and syncing and other data-munging if specified, but none of that's relevant to this thread).

> So if dd is left with a default 512 byte "ibs", input block size,
> but the device is using a larger block size, like an amanda tape
> of 32k, dd has allocated a 512 byte piece of memory to hold the
> input data. But when dd requests the first block it unexpectedly
> gets 32k of data and has "insufficient memory" (ENOMEM).

Just so. Or maybe an "invalid argument" (EINVAL) :-)

> The reverse is not really a problem. Suppose you said "ibs=128k".
> dd would simply read sufficient device blocks until the buffer
> was filled, four blocks in the above example.

Yes. As you've said, it would be dd that did this, *not* the kernel. dd would call read() enough times -- in this case four -- to fill the buffer. Each call would read one 32-KB physical tape block.

> On output dd can make its own adjustments. If the obs is larger,
> it can move multiple input buffers to the output buffer before
> doing the write.
> If the reverse is true, input block size larger
> than output, it can copy part of the input block to the output
> buffer and do multiple outputs from a single input buffer.

Yes, except that in neither case does it need to copy the data from one buffer to another. It can just have a single buffer that's max(ibs,obs) long, and do a number of read()s at appropriate offsets within that one buffer, then one write() of the whole thing; or vice versa. The only time dd needs to copy data internally is when it's doing more complex manipulations. That's what the "conversion buffer", whose size is given by the "cbs=" argument, is for; and why the man page bothers to discuss when the conversion buffer is or is not needed.
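[Editorial note: the reverse case -- a large ibs split into smaller output blocks -- also shows up directly in dd's stats; file names invented for the demonstration.]

```shell
dd if=/dev/zero of=big bs=32768 count=1 2>/dev/null  # one 32-KB input file
dd if=big of=split ibs=32k obs=512 2>stats           # one big read, many small writes
cat stats   # "1+0 records in" and "64+0 records out"
```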
Re: amrestore problem, headers ok but no data
Apologies for following up my own post.

On Sat, Jan 08, 2005 at 06:20:59PM -0500, Eric Siegerman wrote:
> [...] in neither case does [dd] need to copy the data
> from one buffer to another. It can just have a single buffer
> that's max(ibs,obs) long [...]

Oops; I'm wrong about this. It's only true if ibs is an exact multiple of obs, or vice versa. Otherwise, the data *will* need to be copied. I have no idea whether any dd implementations do in fact optimize the exact-multiple case.
Re: implausibly old time stamp 1969-12-31 17:00:00
On Mon, Jan 10, 2005 at 02:47:43PM -0700, Jason Davis wrote:
> tar: ./2005-01-07_17-00-57/ibdata01.ibz: implausibly old time stamp
> 1969-12-31 17:00:00

For some reason, tar thinks the file's timestamp is 0, or else the timestamp recorded in the tarball is in fact 0. (1969-12-31 17:00:00 is the UNIX epoch, converted to your local timezone.)

I'm not sure why it's happening. Are you sure you're using GNU tar for both the backup and the restore? If the backup used gtar but the restore used your vendor's tar, there might be an incompatibility.

Which version of gtar are you using? Some versions are known to be incompatible with Amanda; I haven't seen this particular problem before, but it's a possibility to consider. 1.13.25 is the usually recommended version for use with Amanda. Darn, I keep forgetting whether anyone has seen problems with 1.14 or not. Sorry. :-(

> Here is the output
> of stat of the original file that was backed up
> [...]
> Access: 2005-01-10 13:02:04.0 -0700
> Modify: 2005-01-07 17:07:31.0 -0700
> Change: 2005-01-07 17:07:31.0 -0700

Just out of curiosity, what does stat say about the *restored* file?
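[Editorial note: that "implausibly old" date really is just UNIX time 0 rendered in a UTC-7 zone (the poster's, judging by the stat output). With GNU date:]

```shell
TZ=UTC date -d @0       # the epoch itself: Thu Jan  1 00:00:00 UTC 1970
TZ=MST7MDT date -d @0   # the same instant in US Mountain time: Dec 31 1969, 17:00:00
```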
Re: amrestore problem, headers ok but no data
On Tue, Jan 11, 2005 at 09:11:31AM +0100, Stefan G. Weichinger wrote:
> The minimum blocksize value
> is 32 KBytes. The maximum blocksize value is 32
> KBytes.

The man pages have configure variables, which are expanded during "make". Presumably the maximum block size is one of these, while the minimum is simple, hard-coded text.
Re: amrestore problem, headers ok but no data
On Tuesday 11 January 2005 16:40, Jon LaBadie wrote:
> Also, I think that if both types of devices exist on the same bus,
> the lower performance one determines the performance of the entire
> bus.

In theory, this is *not* the case. One of the (many) selling points of SCSI over IDE is supposed to be that a SCSI bus can run each device at its own speed (though perhaps later versions of the IDE spec have caught up, as they have in some other respects; I dunno). Of course, the slower/narrower device will consume more of the SCSI bus's available bandwidth to carry the same amount of data, even if it doesn't directly affect performance of the faster/wider devices.

In practice, according to the excellent SCSI FAQ, it depends on the devices in question. See these questions in particular:
- "Can I connect a SCSI-3 disk to my SCSI-1 host adapter? [...]" (which isn't Brian's precise situation, but the answer might well apply)
- "How can I calculate the performance I'll get with mixed SCSI devices?"

The SCSI FAQ is dated, but still useful. It's at www.scsifaq.org; click on the link for "Official comp.periphs.scsi FAQ". Sorry, but the site uses too-smart-for-its-own-good navigation that makes it hard to post the actual URL.

On Tue, Jan 11, 2005 at 10:48:17PM -0500, Gene Heskett wrote:
> [A SCSI bus is]
> double handicapped because the cable is, compared to a piece of well
> built coax, pretty much a guesstimate as to its operating impedance,
> usually quoted as being in the 120 to 130 ohm territory,

This isn't supposed to be a problem either, because cable impedance isn't supposed to be a guesstimate; it's explicitly specified in the SCSI specs. But in practice, what you've said is true; there's all manner of out-of-spec junk sold as "SCSI cables".
For example, I've read that you can have problems if you put a SCSI adapter in the middle of an internal/external chain, even if all the termination is correct, because the internal ribbon cable and the external cable might have different impedances, leading to signal reflections between the two cables. For a telling, if rather ancient, anecdote told by someone from Adaptec, see the question "What is the problem with the Adaptec 1542C and external cables?" in the SCSI FAQ.

> Too many folks forget that a SCSI bus is indeed
> an rf transmission line, subject to the usual rules about vswr.

Geez, you mean we've got an actual engineer in the e-room? Awesome! (Despite my rambling about impedances and such, I'm sure no hardware guy.)

From the context it's pretty clear what "vswr" means, but what does it stand for?

In theory, there is no difference between theory and practice.
But, in practice, there is.
        - Jan van de Snepscheut
Re: GNU tar versions (was: Backups failing on Solaris 9 with GNU tar)
On Mon, Jan 17, 2005 at 04:45:18AM -0500, Gene Heskett wrote:
> So I'm puzzled as to whether I'm still doing something wrong, or this
> new tar doesn't recognize the '-f... -' for the stdin from a pipe
> option.

That "..." is a placeholder for "x or t, with whatever other options you want, e.g. v"; you're not meant to type it literally.

> /bin/gtar: ./.kde/share/apps/RecentDocuments/cdrom.desktop: time stamp
> 2020-04-27 15:36:21 is 482063157 s in the future

Do the original files have the same bad timestamps? That'll determine whether this is really a bug or version incompatibility in tar, or just a case of shooting the messenger...
Re: Dump level, behaviour change
On Fri, Jan 21, 2005 at 11:30:02AM -0500, Brian Cuttler wrote:
> 4) We did also change the crontab for bin to force level 0 dumps on
>    Fridays. This change reflecting the changes made when we renamed
>    "notes_dlt" to "wcnotes".

Are you perchance seeing these level-2 dumps disproportionately often on Sunday nights? :-) Maybe people just don't use Notes much on weekends, and now that the level-0's are synched to the work week...

> [...] we are not losing any
> data nor do I believe that a file restore (of individual files) would
> be any more difficult than previously. A full restore would of course
> entail restores from each dump level.

Agreed on all points. As you said, it's a non-problem.

> Could we be seeing more change in the file system than previously
> because of changes to the disk cluster-size? i.e., the granularity
> in the number of blocks/sectors on the RAID vs. the older single
> spindle disk drives?

I'd expect to see more level-bumping as the amount of changed data *decreases*. But either way, it's hard to imagine how changing the cluster size would have any effect that's visible up at the file level. It'd affect the amount of wasted space at the ends of files, but tar doesn't back up that wasted space, and I rather doubt that dump does either, so Amanda shouldn't even notice the change.

> We could increase both the bumpdays and bumpsize values

If it ain't broke, don't fix it!

> We could even force level 0 every day if desired [...]

See above :-) There might well be legitimate reasons to do this, but the current situation isn't one of them.

Another possibility that hasn't been raised: has Notes's behaviour changed in some way, to make it modify fewer files? Could be fewer writes overall, or more creating of new files as opposed to modifying of old ones (which would make the old files in question contribute to the bumpable side of the ledger, instead of to the unbumpable side as before). Maybe it's a year-end rollover, i.e.
Notes is now writing a new, still-small foo.2005, and the huge foo.2004 is now static.
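[Editorial note: for reference, the bump parameters discussed above live in amanda.conf. The values below are illustrative defaults from sample configurations, not a recommendation for this site:]

```
bumpsize 20 Mb  # minimum savings required to bump from level 1 to level 2
bumpdays 1      # days a DLE must sit at a level before bumping higher
bumpmult 2      # each further level must save bumpsize * bumpmult^(level-1)
```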
Re: Dumper Timeouts
On Fri, Jan 21, 2005 at 11:08:57AM -0500, Gene Heskett wrote:
> > ? sendbackup: index tee cannot write [Connection timed out]
>
> Try increasing the etimeout value in your amanda.conf

dtimeout, no? (I have no idea whether that'd help, but it's more likely to than etimeout is.)
Re: Dump level, behaviour change
On Fri, Jan 21, 2005 at 03:29:16PM -0500, Brian Cuttler wrote:
> I think Jon LaBadie hit it

Cool. I was speaking in ignorance of what the data looked like.

> There is a word that I like to use for this type of design. "Hideous"

Yup, that's a technical term :-)

> So a one block file would
> have a min size of 5 rather than 3 blocks.

Well, yes and no. It does indeed consume 5 rather than 3 disk blocks, and that shows up in "du" and "df" output and in how quickly the partition fills up -- but a program that open()s and read()s the file will still see exactly the same number of bytes, no matter what the cluster size, or indeed the hardware sector size [1]. That's why I said that tar wouldn't notice the difference.

Dump doesn't go through the file system, but straight to the disk, so theoretically, it could back up the unused bytes at the end of a file's final cluster. But there's no reason to do so, and I'd hope that dump wouldn't be that dumb.

[1] This was one of UNIX's many innovations over the then-current mainframe OS's, which saw a file as a sequence of blocks, not of bytes as UNIX does. UNIX tried to do the same thing with tapes -- to abstract the concept of "file" away from its hardware implementation -- but as I recently described in another thread, that experiment failed, which is why we have to worry about buffer lengths and short reads and all that bs= with tapes, but not with disk files.
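[Editorial note: the point that programs reading through the file system see bytes, not disk blocks, is easy to check; file name invented, and the exact du figure depends on the file system's block/cluster size.]

```shell
printf 'hello' > tiny   # a 5-byte file
wc -c < tiny            # read() through the fs sees exactly 5 bytes
du -k tiny              # but the file occupies whole fs blocks on disk
```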
Re: Fedora Core 3 - which version of tar??
On Thu, Jan 20, 2005 at 10:22:16PM +0100, Stefan G. Weichinger wrote:
> - configure and make as $AMANDAUSER

I don't believe this is necessary. One should avoid building Amanda as root, but that's not because it'll cause problems for Amanda; it's for the same reason one should avoid building *anything* as root. I've never had a problem building Amanda under my own user account, and it's hard to see why such a problem might ever occur.

> make install as root

This *is* necessary, of course.
Re: Fedora Core 3 - which version of tar??
On Fri, Jan 21, 2005 at 11:05:23PM -0500, Gene Heskett wrote:
> Most users are not that privileged, and should not be. And that's the
> main justification for a separate user to run amanda.

Agreed 100%! "erics" isn't a member of "disk". (Sorry I didn't mention that. I agree with the above so fully that the possibility never even occurred to me. :-)

The reason I mentioned building under my own account was to back up my assertion that building as the Amanda user, or with any other kind of special privilege, is unnecessary. The build shouldn't need any particular permissions at all, since in theory:
- the build doesn't modify any files outside the build (and maybe source) trees
- any user or group ids that get hard-wired at build time are taken from the --with-user, --with-owner, or --with-group config parameters, not from getuid() or the like

If the above claim is false, i.e. if building Amanda as your Amanda user works better for you than building it as a completely unprivileged user (given that both builds are installed as root), then IMO that's a bug in Amanda. In that case, continuing to build as the Amanda user might be a useful workaround, but should only remain necessary until the bug gets fixed.

Gene, on your system, if you build Amanda as a vanilla, unprivileged user -- not root, not in the "disk" group -- and then install it as root, what specifically goes wrong?
Re: Fedora Core 3 - which version of tar??
On Mon, Jan 24, 2005 at 03:51:13PM -0500, Gene Heskett wrote:
> Now become 'amanda' and do an amcheck, which works just fine.
> Back out of that and become 'gene' and the permissions are denied; the
> user gene, even though he built it, cannot run it.
> [...]
> So basicly it has to be run by whomever is set in the configuration,
> but not by who built it.

That's my understanding, and it kind of makes sense. It's certainly how the permissions are set up here:

    -rwsr-x---  1 root  operator  87183 Apr 23  2004 /usr/local/sbin/amcheck

(Our Amanda server is a FreeBSD box, on which group "operator" serves the same function as "disk" on your machines.) Amdump insists on being run by the Amanda user too; the file has read and execute permission for everyone, but the script itself checks.

> If I were to change that line in the
> configuration, then I'd assume gene could run it, but not amanda.

I'd imagine so. Of course, amcheck might then get some errors, since "gene" isn't in the "disk" group, and (hopefully) doesn't have permission to write the index, log, and tapelist files. (If amcheck didn't notice, amdump certainly would...)

> I'll leave it this way for now & see how it runs tonight.

Cool. I'm looking forward to the results :-)
Re: archiving tapes?!
On Thu, Jan 27, 2005 at 06:08:01PM +0100, Sebastian Kösters wrote:
> if I want to restore a
> tape/backup older than the last one this fails. Iam only able to restore
> the last Backup from tape. I wanted to use the same amlabel for every Sunday
> because I dont want the tapelist file become that long.

DON'T DON'T DON'T give all your tapes the same label! This will confuse Amanda, and probably you too. Your problem with restoring from old backups is merely one symptom of Amanda's confusion. A symptom of *your* confusion would be mounting the wrong tape (or forgetting to change tapes), and thus overwriting a backup that you wanted to keep.

There are some Amanda features that it can sometimes make sense to work around (e.g. whatever I assume you're doing to force full backups on Sundays). The tape-labelling/tapelist mechanism is NOT one of them; trying to subvert it is a REALLY bad idea.
Re: AW: archiving tapes?!
On Fri, Jan 28, 2005 at 01:59:32PM -0500, Eric Dantan Rzewnicki wrote:
> what are some good options for long term archival storage?

There's only one: redundancy!

I don't know the answer to the question you're actually asking. All the media I know of are either not great under typical, less-than-ideal conditions (magnetic) or too new for there to be much real-world data (optical) -- not that I've made much of a study of it recently, I admit. But whatever technology (or technologies) you choose, making multiple copies is excellent insurance.
Re: Removing unwanted dumps.
On Mon, Jan 31, 2005 at 12:17:18PM -0500, Jon LaBadie wrote:
> I can't see anything that would cause problems if you delete
> [unneeded dumps that are still in holding disk].
> Others may.

The curinfo database will still think the dumps exist, won't it? That might confuse amrecover until the dumps in question have been superseded by newer ones (i.e. at the same or a lower dump level). Then again, maybe these particular dumps have already been superseded -- maybe that's why they're unneeded in the first place.

How long would it take for this discrepancy to clear itself up? (I know the answer for dumps that *have* been taped -- until the tape gets overwritten, one tapecycle hence. How does it work for untaped dumps like these?)
Re: Removing unwanted dumps.
On Mon, Jan 31, 2005 at 01:11:48PM -0500, Jon LaBadie wrote:
> The curinfo db has very little info that would make a difference.
> As to particular dumps, things like date, size, level are only
> kept for the most recent at each level.

I misspoke. I should have said, "curinfo + indexes + any other info that Amanda keeps around". What you say applies to curinfo itself, but what about the rest? If nothing else, those index files will take up some space; will they ever be deleted, or will they hang around forever?
Re: Trouble with amrestore
On Wed, Feb 02, 2005 at 08:13:06AM -0800, Steve H wrote:
> What I am confused about, is the client thinking itself is the server.

One way to change this is "amrecover -s <indexserver> -t <tapeserver>". Maybe the default can be changed by an option to configure; I'm not sure about that.
Re: mounting/identifying a new tape drive
On Thu, Feb 03, 2005 at 12:47:34PM -0500, Jon LaBadie wrote:
> rm -f /dev/rmt/*
> devfsadm -c tape

Does that do (the tape-related subset of) the same thing as a reconfiguration boot, i.e. with "-r"?
Re: mounting/identifying a new tape drive
On Thu, Feb 03, 2005 at 03:02:36PM -0500, Gil Naveh wrote:
> but I don't think it identified the new tape drive...
> > mt -f /dev/rmt/0n
> I got:
> > /dev/rmt/0n: no tape loaded or drive offline

On the contrary, I think that means your devfsadm command *did* work. You're now getting a tape-specific error message, instead of a generic one; that means the system now understands that /dev/rmt/0n is in fact a tape drive.

So, taking the message at face value... was there a tape loaded? Was the drive online?
Re: VXA-2 packet-loader issues and AMANDA [Fwd: hard luck with the new autoloader]
On Thu, Feb 03, 2005 at 03:03:22PM -0500, James D. Freels wrote:
> Here is what [drive vendor's Tech Support said] is needed:
>
> 1) need a separate scsi chain; they said I already have too many scsi
> devices on this chain to make it reliable.

See recent threads re. SCSI cables, bus lengths, etc. (recent == the last month or two).

> 2) need to upgrade the firmware in the autoloader to the latest version;
> this may not work on an alpha machine and more likely will only work
> from an Intel machine

I sure hope you mean only that the upgrade *process* might need an Intel box. If that's the case, doing the firmware upgrades is the cheapest and probably easiest thing to try, even if you do have to do some temporary recabling (well, as long as you have an Intel box with a SCSI adapter...)

If, on the other hand, you mean that once upgraded the unit might be less Alpha-compatible than it was before... send the #&!*~ thing back for a refund! :-)
Re: VXA-2 packet-loader issues and AMANDA [Fwd: hard luck with the new autoloader]
On Fri, Feb 04, 2005 at 07:21:59PM -0500, Gene Heskett wrote:
> Aha, LVD! LVD is not compatible with the rest of the system unless
> the rest of the system is also LVD. It is two, completely seperate
> signalling methods that just happen to use the same cabling.

Yes and no. From the SCSI FAQ:

    "[ANSI] specified that if an LVD device is designed properly, it can
    switch to S.E. [single-ended, i.e. "normal", SCSI] mode and operate
    with S.E. devices on the same bus segment."
    - http://h000625f788f5.ne.client2.attbi.com/scsi_faq/scsifaq.html#Generic099

So if you mix it with S.E., you lose its LVDness -- e.g. you have to stick to a S.E. bus length -- but you shouldn't fry any hardware.

HVD (high-voltage differential, i.e. the original differential variant of SCSI) is another story completely! That is indeed flat-out incompatible with S.E. (and presumably with LVD too...)
Re: VXA-2 packet-loader issues and AMANDA [Fwd: hard luck with the new autoloader]
On Mon, Feb 07, 2005 at 04:41:17PM -0500, Jon LaBadie wrote:
> Can on look at the device connectors, or better yet, the external connectors,
> and tell if a device is LVD or SE? Or does one have to check the HW doc?

I have no idea. Sorry.
Re: Flushed, but not forgotten
On Mon, Feb 14, 2005 at 05:05:40PM +, Gaby vanhegan wrote:
> If these dumps on the holding disk get superseded by other, more recent
> backups, will they automatically be removed from the holding disk?

Yes, when they're amflushed. (Unless there's something else preventing it, although I can't think what that might be. But simply the fact that they've been superseded doesn't keep them from being deleted once they've been flushed.)
Re: VXA-2 packet-loader issues -- a scsi question added
On Tue, Feb 15, 2005 at 10:53:55AM +0100, Geert Uytterhoeven wrote:
> I thought[*] 7 was the highest priority, and 0 the lowest (on a narrow
> channel).

That's what I recall too.

> Wide devices have an even lower priority: 15 to 8.

This sounds vaguely familiar too, but I'm *far* less certain about it.
Re: forcing skipped incrementals into holding disk
On Tue, Feb 15, 2005 at 01:26:20PM -0500, Brian Cuttler wrote:
> If you are really looking to get all of the dumps, regardless of what
> will actually fit on the tape you could always lie about the tape length.

This is less desirable than some of the other options, at least if you're using any of the "whatever-fit" taperalgos. In that case, Amanda chooses which DLE to tape next based partly on how much tape it thinks is left. If you've lied about the tape length, Amanda will sometimes pick a DLE that's too big, under the mistaken impression that there's room for it. This wastes tape space, and time as well.
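To make that concrete, here's a toy sketch of the selection step a "whatever-fit" algorithm performs (this is my own illustration, not Amanda's actual code); feed it an inflated "remaining" figure and it will happily pick a dump that won't really fit:

```shell
# largest_fit REMAINING SIZE...: print the largest SIZE that still fits
# in REMAINING; print nothing if none fits.  A loose illustration of a
# "whatever-fit" taperalgo -- NOT Amanda's code.
largest_fit() {
    remaining=$1; shift
    best=
    for s in "$@"; do
        # skip candidates bigger than the (believed) remaining tape
        if [ "$s" -le "$remaining" ]; then
            if [ -z "$best" ] || [ "$s" -gt "$best" ]; then
                best=$s
            fi
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# With the honest length:      largest_fit 100 30 120 80   -> 80
# Lie that 200 KB is left:     largest_fit 200 30 120 80   -> 120,
# which will actually overrun the tape.
```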
Re: Exabyte VXA-2-Packet-Loader: new problem
On Tue, Feb 15, 2005 at 01:40:44PM -0500, Jon LaBadie wrote:
> If I issue another command to the drive before it is really
> ready, even an "mt status", I get error messages. Thus I routinely
> put in delays (sleep's) in scripts that might rewind a tape or change
> a tape to another slot. As much as 20 or 30 second delays IIRC.

Rather than a hard-coded delay (which is suboptimal if it's too long, and breaks things if it's too short), why not write a little "mtsync" script that polls the drive for readiness, by running "mt status" in a loop until one succeeds?
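Such an mtsync could be little more than a retry loop; here's a sketch (the function, its name, and the polling limits are mine -- adjust the mt invocation for your drive):

```shell
# wait_ready CMD [ARGS...]: run CMD repeatedly until it succeeds,
# polling every POLL_DELAY seconds, giving up after POLL_TRIES tries.
wait_ready() {
    tries=0
    until "$@" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "${POLL_TRIES:-60}" ]; then
            return 1        # drive never became ready
        fi
        sleep "${POLL_DELAY:-2}"
    done
    return 0
}

# e.g., in a changer script, instead of "sleep 30":
#   wait_ready mt -f /dev/nst0 status || exit 1
```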
Re: Amanda's report
On Wed, Feb 16, 2005 at 09:45:38AM -0500, Gil Naveh wrote:
> What does STRANGE means?

Amanda looks through the stderr output from dump (or gtar) and tries to classify each message as either an error or benign verbosity. "STRANGE" means that there was a message that Amanda doesn't recognize, and so can't classify. As others have pointed out, you have to look at the particular message to determine for yourself whether it indicates a serious problem. (In your temp-file case, I kind of suspect it does...)

This isn't relevant to your case, but for general reference: the messages that Amanda can recognize are in hard-coded lists in sendbackup-dump.c and sendbackup-gnutar.c. If you get a recurring STRANGE message and want Amanda to classify it for you, you can add an appropriate regex to the appropriate list.
amanda-users@amanda.org
On Wed, Feb 16, 2005 at 04:59:13PM -0300, Germán C. Basisty wrote:
> Can't determine disk and mount point from $CWD '/root'

That's normal; don't worry about it.

> EOF, check amidxtaped..debug file on bombon.

So what's in the amidxtaped..debug file on bombon?
Re: AMDUMP not working under DEBIAN 3
On Tue, Feb 15, 2005 at 07:00:04PM -0500, Gene Heskett wrote:
> [running amdump in] The
> group backup is also generally acceptable.

It depends. If you're using gtar, that runs as root, so indeed, the group shouldn't be relevant. (Avoiding important groups like root is a good idea from a security point of view, but shouldn't affect correct operation.)

If you're using dump, though, amdump usually has to run in the group that owns the relevant special files (/dev/whatever). Which group that is, is system-dependent; I've seen "disk", "operator", and "sys". (Of course you could chgrp the special files instead, but that's less wise, because something -- a system upgrade, say, or an automated file-permissions fixer -- might chgrp them back.)

I said amdump *usually* has to run in that group, because on some systems dump needs to run as root; in that case, I don't know whether the group matters -- same reasoning as for gtar.

Hmmm, maybe rundump could take care of running dump in the correct group, on those systems where that matters, instead of not being used at all on such systems...
Re: invalid compressed data--crc error and other corruption on disk files
On Fri, Feb 18, 2005 at 11:36:46AM +, Thomas Charles Robinson wrote:
> [an excellently clear, concise, and complete [1] problem report
> -- thank you! -- which included the following:]
>
> gzip: stdin: invalid compressed data--crc error

All of tar's varied complaints appear to stem from corrupt input, which in turn is adequately explained by this message. Thus, either gzip or hardware looks like the culprit.

RAM is a good place to look, especially considering that the data being backed up all resides on the Amanda server; you're giving that box quite a workout. The disk and its bus (SCSI, IDE, etc.) are possibilities too, but less likely IMO -- I'd expect the kernel to detect and report the I/O errors in that case.

Not to completely rule out problems with Amanda itself -- I've learned never to rule *anything* out where computers are concerned (or humans, for that matter :-/) -- but it seems unlikely. As for gtar, 1.13.25 is well regarded on this list. 'Nuff said, until its input is known to be good. (After all, even if, hypothetically, tar were producing complete junk, gzip should be able to compress and decompress that junk without reporting CRC errors :-)

> gzip-1.3.3-9

... is a beta. It might be worthwhile to try the latest released version, 1.2.4. From the web page, it looks as though that version can't handle files over 2 GB, so you'd have to split up any larger DLEs -- or just disable them for the duration of the test; no loss, since it's not as if you have usable backups of them now :-(

Another useful test would be to temporarily disable software compression completely. That should fairly quickly tell you whether the corruption is occurring during gzipping (whether gzip itself or hardware is the ultimate source of the problem).

> Lastly, I am currently using an nfs share for the holding disk but this
> was NOT being used previously and I was still getting the corruption
> mentioned.
Hmm, did you ever run with a local holding disk while explicitly testing holding-disk files as you're doing now? I.e. was there ever a point where neither NFS nor the tape drive was in the loop? I'm wondering about the possibility that two independent sources of data corruption -- NFS and the tape subsystem -- might be confounding your attempts to isolate "the" problem.
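One cheap way to catch corruption in the act with neither NFS nor the tape drive in the loop is to checksum the same holding-disk file twice: two reads of an unchanging file that disagree point at RAM or the bus rather than at gzip or tar. A sketch (the function name and the example path are mine):

```shell
# double_read FILE: read FILE twice and compare checksums.
# A mismatch on back-to-back reads of an unchanging file smells
# like flaky RAM or a flaky bus, not a software bug.
double_read() {
    s1=$(cksum < "$1") || return 2      # I/O error on first read
    s2=$(cksum < "$1") || return 2      # I/O error on second read
    if [ "$s1" != "$s2" ]; then
        echo "read mismatch on $1 -- suspect hardware" >&2
        return 1
    fi
    return 0
}

# e.g.  double_read /amanda/holding/20050218/somehost._usr.0
#       (the path is illustrative)
```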
Re: invalid compressed data--crc error and other corruption on disk files
On Fri, Feb 18, 2005 at 05:33:44PM +, Thomas Charles Robinson wrote:
> On Fri, 2005-02-18 at 16:30, Eric Siegerman wrote:
> > On Fri, Feb 18, 2005 at 11:36:46AM +, Thomas Charles Robinson wrote:
> > > [an excellently clear, concise, and complete [1] problem report

Oops, I forgot to type the footnote :-)

[1] "Concise and complete" might sound like an oxymoron, but it's not.

> > I'm wondering about the possibility that two independent
> > sources of data corruption -- NFS and the tape subsystem -- might
> > be confounding your attempts to isolate "the" problem.
>
> I was trying all the manual checks before I started using the nfs
> volume. Although it may be a factor I'm prepared to continue using the
> volume at this stage

Agreed. That you were experiencing the problem when explicitly checking local holding-disk files pretty much does in my hypothesis. I won't rule out NFS problems, but that's just on general principles (see my previous post) :-) NFS is very likely a red herring.
Re: invalid compressed data--crc error and other corruption ondiskfiles
On Fri, Feb 18, 2005 at 04:49:32PM +, Thomas Charles Robinson wrote:
> An interesting point is that after a second run of my test 'some' of the
> dump-files verified as good. This indicates a intermittent problem.
> Would bad memory gives this type of behaviour?

Oh yeah! That sure smells like a hardware problem of some sort...

BTW, I might have been wrong earlier about one thing, and misleading about another:

- I said that the kernel would detect SCSI- or IDE-bus errors; on second thought, I'm not so sure. It depends on the bus and its age. Any semi-recent SCSI revision has parity checking; though I know a lot less about IDE, I believe that semi-recent versions of that do CRC checking. But old IDE doesn't have any bus-error detection mechanism at all, and in truly ancient SCSI it's optional. If a bus doesn't have error detection, errors might well manifest as data corruption instead of as kernel log messages :-/

- If you do indeed have a hardware problem, removing gzip from the loop *might* remove just enough load from the machine to stop the hardware from malfunctioning; so if the problem goes away when you disable software compression, that *suggests* a gzip problem, but doesn't *confirm* it. Of course you could always run a few independent, long-running gzip's at the same time as amdump to restore the system load -- you know, something like:

    gzip </dev/zero >/dev/null &

as many times as Amanda now runs simultaneous gzip's.
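A sketch of such a load generator (the function name and the count are my own inventions); it prints the pids so the extra load can be killed off once the test run finishes:

```shell
# spawn_gzips N: start N never-ending gzip processes in the background,
# to keep CPU load comparable to Amanda's simultaneous gzips.
# Prints the pids, so the caller can "kill" them when done.
spawn_gzips() {
    n=$1
    pids=
    while [ "$n" -gt 0 ]; do
        gzip -9 </dev/zero >/dev/null &
        pids="$pids $!"
        n=$((n - 1))
    done
    echo "$pids"
}

# usage:
#   pids=$(spawn_gzips 4)    # 4 = however many gzips Amanda runs
#   ... run the amdump test ...
#   kill $pids
```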
Re: invalid compressed data--crc error and other corruption on disk files
On Fri, Feb 18, 2005 at 04:27:30PM -0500, Gene Heskett wrote:
> It might be a beta, but its a beta thats been used by the whole planet
> since back in 2002, without noticable errors.

Fair enough. I stand corrected.
Re: Issue making amanda on Solaris 8
On Wed, Feb 09, 2005 at 12:30:44PM -0800, Steve H wrote:
> killpgrp.c:90: error: too many arguments to function `getpgrp'

One common cause of weird build problems on Solaris is using the wrong tool set. I don't know about this specific error, but it sounds like a mismatch between the variant of getpgrp() that "configure" detected and the one that the C compiler subsequently tried to use (the BSD getpgrp() takes a pid argument; the POSIX/SysV one takes none).

To fix it, make sure that /usr/ccs/bin is in your PATH, and that /usr/ucb is *not*. If that doesn't work, try it the other way around :-)

Hmmm, looking at the Solaris 8 box on which I'm typing this, it seems I built Amanda with both /usr/ccs/bin and /usr/ucb in my path -- but in the order stated here; perhaps you have /usr/ucb first. Maybe it's sufficient to make sure that /usr/ccs/bin precedes /usr/ucb.

After making any such path change, it's best to "make distclean" and rerun configure; otherwise, stale feature detections from the previous setup might continue to screw things up.
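To check which directory a given tool will actually come from -- and hence whether /usr/ccs/bin really does precede /usr/ucb -- here's a small sketch (the function is mine; "which" would do too, but its output format varies between shells):

```shell
# first_in_path NAME: print the directory whose NAME executable would
# be run first, given the current $PATH; fail if NAME isn't found.
first_in_path() {
    save_ifs=$IFS
    IFS=:
    for d in $PATH; do
        if [ -x "$d/$1" ] && [ ! -d "$d/$1" ]; then
            IFS=$save_ifs
            echo "$d"
            return 0
        fi
    done
    IFS=$save_ifs
    return 1
}

# e.g.  first_in_path make    # want /usr/ccs/bin here, not /usr/ucb
```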
Re: Runtar error
On Fri, Feb 18, 2005 at 09:10:30AM -0600, Dege, Robert C. wrote:
> runtar: error [must be setuid root]

On Fri, Feb 18, 2005 at 10:49:46AM -0600, Dege, Robert C. wrote:
> -rwsr-x---  1 root  amanda  9947 Feb 16 10:43 runtar
> [plus evidence that this copy of runtar *is* the one being
> used]

Hmm, that looks like runtar complaining, so it must have been executed. That argues against the hypothesis that Amanda can't run runtar at all because it's not in the "amanda" group. And runtar clearly is setuid root.

I wonder if the file system is mounted "nosuid". You could test that by copying the "id" program into the directory where runtar lives, making it setuid root, and running it as a nonroot user to see what it says. (MAKE SURE to nuke your copy as soon as you're finished with it; "id" presumably hasn't been audited for setuid-safety!) On a Solaris box, I get (I've edited out the list of secondary groups):

    % pwd
    /home/erics/test

    % ls -ld id
    // I took away its world-execute more for security paranoia
    // than for the sake of strictly emulating runtar's perms
    -rwsr-x---  1 root  erics  8044 Feb 21 14:39 id

    // The real "id" command just says I'm me -- ho hum
    % /bin/id -a
    uid=1000(erics) gid=1000(erics) groups=...

    // My setuid-root "id" command. Still says my uid is my own,
    // but note the "euid=0(root)"; that's what we're looking
    // for. (euid==0 && uid==<you>) is the sign of a
    // setuid-root executable. (Similarly with gid's for setgid,
    // but that's not relevant here.)
    % ./id -a
    uid=1000(erics) gid=1000(erics) euid=0(root) groups=...

    // And just as a check, run it from a root shell; the "euid="
    // has gone away, since euid and ruid are now both 0.
    # ./id -a
    uid=0(root) gid=1(other) groups=...
Re: Amanda-2.4.4p4 - RedHat 7.3 - Samba-3.0
On Thu, Feb 24, 2005 at 09:19:31AM +, Tom Brown wrote:
> FAILED AND STRANGE DUMP DETAILS:

I posted about this in the last week or two. Look for "strange" in the list archives.
Re: Amanda's report
On Thu, Feb 24, 2005 at 12:52:17PM -0600, Karl W. Burkett wrote:
> mount("/dev/md/dsk/d92", "/tmp/.rlg.10aqFe/.rlg.10aqFe",
> MS_RDONLY|MS_DATA|MS_OPTIONSTR, "ufs", 0xFFBFEBBC, 4) = 0

That'd be the fundamental reason that ufsdump wants root. That it fails to create the temp directory otherwise turns out to be pretty irrelevant, since what it does with the thing requires root in the first place :-/

One can only guess what it's doing, but from Jon(?)'s observation that the problem only arises on partial-filesystem dumps, my guess would be that it's figuring out which inodes to dump by traversing the file system via readdir() like any other process -- even though it does the actual backup directly, via the special file. Well, I've seen things done in even weirder ways...
Re: CVS info on FAQ
[Cc'ed to -hackers; followups should probably go there]

On Fri, Feb 25, 2005 at 09:13:23PM +, Gavin Henry wrote:
> Could some update the details on howto checkout things from CVS?
> [...]
> Just DNS changes.

A guide to the repository would also be useful. In particular, which module does one want? There appear to be at least "amanda", "amanda-2", and "amanda-krb-2" to choose from. Then there are other modules like "cgi-support", which I guess are for www.amanda.org rather than for the package itself, but it'd be nice not to *have* to guess :-) (Easy for me, since I know Amanda doesn't do CGI, but someone new might not know that.)

This update should go on Amanda's "CVS" page on SourceForge, too.
Re: Linux Storage Options
On Mon, Feb 28, 2005 at 02:17:04PM -, Gavin Henry wrote:
> Actually, Amanda is very enterprise ready.

On Mon, Feb 28, 2005 at 03:47:31PM -0500, Brian Cuttler wrote:
> Amanda is limited to DLE (DiskList Entries) that do not exceed the
> capacity of a single tape volume.

On Wed, Feb 23, 2005 at 02:04:50PM +, Bruce S. Skinner wrote:
> There will always be a race between [disk and tape]
> for the biggest or fastest and there is no telling who is
> going to be out front next week, let alone next year. It is not
> reasonable to design a system that you know is likely to break
> whenever disk capacity pulls ahead in the race. If in that
> eventuality you are going to have to adopt another backup solution,
> the reasonable choice is to adopt that other backup solutiuon now.

Gentlemen, your timing is impeccable :-) :-)
Re: Amanda Backup
On Wed, Mar 02, 2005 at 01:10:53PM -0500, Jon LaBadie wrote:
> starttime -100 # start 1 hour before amdump is started :))

You could kind-of actually do this. It'd just mean "delay everything *else* by 1 hour". Whether there's a reason to implement it is another question entirely.
Re: Some questions about Amanda's emailing of backup reports
On Thu, Mar 03, 2005 at 12:35:03PM -0500, Jon LaBadie wrote:
> On Thu, Mar 03, 2005 at 11:23:51AM -0600, Hull, Dave wrote:
> > The amanda account has a procmail filter that saves a copy to a
> > local file and forwards another copy on to the required persons.

I just do that manually -- save the report to a particular email folder after I've looked it over. But yes, a procmail recipe would be even better.

> Good solution, particularly if the local file can be the report content
> without the email headers.

Personally, I like keeping the headers -- that way, if I'm looking for the reports that match certain criteria, I can often use my mail client to give me a filtered view of them. But if you want the headers stripped, just have procmail pipe the emails through a script that does that.
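Such a header-stripping script can be a one-liner; here's a sketch (the name is mine) that deletes everything up to and including the first blank line, i.e. the RFC 822 header block:

```shell
# strip_headers: read a mail message on stdin; emit only the body.
# The header block ends at the first empty line, so delete lines
# 1 through that empty line.
strip_headers() {
    sed '1,/^$/d'
}

# usage:  strip_headers < report.eml > report.txt
```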
Re: Unusual dump times?
On Tue, Sep 16, 2003 at 07:35:55AM -0400, Jack Baty wrote:
> With everything finally working, I'm wondering if my dump times are
> excessive or to be expected.
> [...]
> marvin.fusio /usr0        7009180 4311267 61.5 177:48 404.1 31:35 2275.2
> scooby.fusio /usr1          55450    5443  9.8   8:49  10.3  0:11  515.1
> stewie  -e/projects  1        200     200   --   0:02  96.7  0:00 9069.9
> stewie  -e/software  1       2290    2290   --   0:06 379.9  0:01 3149.8
> stewie  hda2         2      14380    2535 17.6   1:26  29.6  0:03  996.3

That depends on so many things that it's hard to give a simple answer:
client hardware, O/S, dump vs. gtar, many small files vs. fewer big
ones, network technology, network saturation, etc. etc. etc.  (And if
you set all those out for me, I'd still have difficulty saying "yes,
it's reasonable" or "no, it isn't".)

> I plan to gradually include more machines totalling about 20GB.  If
> all the hosts take as long as marvin (below), things could end up
> taking more than 12 hours to run.

Well, to do a full backup on them all, maybe.  But you won't be doing
that -- as with the run you quoted, most of the DLE's are doing
incrementals on any given night, so the one or two full backups
dominate the stats.

We run a two-configuration setup here (three actually, but two of them
are similar enough that for this discussion I'm treating them as one):
  - A daily backup to disk (i.e. file:), which is a standard Amanda
    configuration of mixed fulls and incrementals
  - A weekly full backup of everything to tape

The weekly backup is about 50 GB, and takes about 23 hours.  That's
why it runs Friday night :-)  But the last 30 dailies took between
0:37 and 4:33 each, with 80% of them under 3 hours.  Sending them to
tape would presumably slow down the total duration, but with enough
holding disk, the clients shouldn't be affected much.

(There's lots of optimization I could do, for both configurations --
I'm not at all happy with the level of parallelism I'm getting.  So
far I haven't needed to worry about it.)
So if you're using a standard configuration, where you let Amanda
schedule full and incremental backups, I'd add in the rest of your
DLEs and let Amanda run for a dumpcycle or two before worrying too
much about it.  Just add them in a few at a time, or you *will* face
some very long dump times at first, since the first dump Amanda does
of any given DLE has to be a level-0.

> Wondering if I should just stop using compression.

Again, that would depend on just what the bottleneck is.  If it's CPU
usage on the client, try reducing from --best to --fast as someone
else suggested, or try changing to server-side compression.  If it's
network bandwidth, go the other way: move compression from server to
client, and/or increase the compression level until you start maxing
out the client CPU.

You can't make meaningful optimizations until you know what to
optimize for, and you can't know that until you can find out, or
hypothesize, which resource is saturated.

--
|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        [EMAIL PROTECTED]
|  |  /
When I came back around from the dark side, there in front of me would
be the landing area where the crew was, and the Earth, all in the view
of my window.  I couldn't help but think that there in front of me was
all of humanity, except me.
        - Michael Collins, Apollo 11 Command Module Pilot
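One quick way to see whether compression is the CPU bottleneck on a
client is to time the two gzip levels directly.  This is only a
sketch: the synthetic test file below is a stand-in (highly
compressible zeros-turned-letters), so use a representative chunk of
your real backup data for numbers that mean anything.

```shell
# Compare CPU cost of gzip --fast vs. --best on a sample file.
testfile=$(mktemp)
dd if=/dev/zero bs=1k count=512 2>/dev/null \
    | tr '\0' 'a' > "$testfile"          # crude stand-in for real data

time gzip --fast < "$testfile" > /dev/null
time gzip --best < "$testfile" > /dev/null

rm -f "$testfile"
```

If --best takes several times the CPU of --fast for only a modest gain
in compressed size, dropping the level (or moving compression to the
server) is the obvious first experiment.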
Re: How to backup the firewall host itself?
This is only tangentially related to Amanda, but it seemed worth
posting to the list to get others' input.

On Thu, Sep 18, 2003 at 03:02:23PM -0300, Bruno Negrão wrote:
> I have an amanda server on my DMZ and i like it to backup my firewall
> machine (the amanda client).

Are you *really* sure you want to do this?  The security implications
are pretty frightening!

If an intruder takes over your Amanda server, they can hack Amanda to
write corrupted backups.  They might stick a trojan into the backup,
then wait for you to restore from it.  Ok, that's pretty far-fetched,
but how about this?

An intruder who takes over a machine on the DMZ can use it to stage
attacks on the firewall.  Because you've opened up ports on the
firewall to accept Amanda-related connections from the DMZ Amanda
server, you've given the intruder more ports to attack.  Worse yet,
because you have an Amanda client on the firewall, configured to
accept connections from the DMZ server, an intruder can exploit any
security problems (buffer overruns etc.) in Amanda itself!

At the very least, an intruder who takes over your Amanda server can
grab a full backup of the firewall machine -- including the firewall
rules, which they can then study to look for holes.

It seems to me to be *much* safer to put the Amanda server on your
internal network and have it reach *out* through the firewall to the
DMZ machines.  (You still weaken your firewall's security this way,
but not nearly as much, because the Amanda server itself is now much
less subject to attack.)
Re: ANSWER: pre- and post-dump script?
On Thu, Sep 18, 2003 at 09:52:32AM -0400, Kurt Yoder wrote:
> 1. compile amanda with tar=/usr/local/bin/tar
> 2. copy or symlink tar to /usr/local/bin/realtar
> 3. create a script /usr/local/bin/tar
> 4. chmod 755 /usr/local/bin/tar

The problem with this is that users typically have /usr/local/bin in
their paths, so now they'll get your script instead of a vanilla tar
command.  Breaking the environment like this is truly evil; your users
will not take it kindly!  Much better would be:

1. compile amanda with tar=/usr/local/libexec/amandatar (or whatever
   you prefer, as long as it's not both called "tar" and in peoples'
   paths)
2. leave the real tar well alone!!
3. create a script /usr/local/libexec/amandatar
4. chmod 755 /usr/local/libexec/amandatar
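A minimal sketch of what such an amandatar wrapper might look like --
the real-tar path and the pre/post hook paths are hypothetical; the
only firm point is that it lives outside users' PATHs and leaves the
real tar untouched.  It's written via a here-doc so the wrapper text
is shown verbatim, followed by a smoke test that substitutes echo for
the real tar.

```shell
# Create the wrapper (installed for real as
# /usr/local/libexec/amandatar; written to a temp file here).
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Sketch: run optional site-specific hooks around the real tar.
# REALTAR and the hook paths are assumptions -- adjust to your site.
REALTAR=${REALTAR:-/usr/local/bin/tar}
PRE=/usr/local/libexec/amanda-pre-dump
POST=/usr/local/libexec/amanda-post-dump

[ -x "$PRE" ] && "$PRE"         # e.g. quiesce a database first
"$REALTAR" "$@"
status=$?
[ -x "$POST" ] && "$POST"       # e.g. wake it up again
exit $status                    # report the real tar's exit status
EOF
chmod 755 "$wrapper"

# Smoke test: substitute echo for the real tar, so the wrapper just
# prints the arguments it would have passed through.
REALTAR=echo "$wrapper" -cf - /some/dir
```

Because Amanda was compiled with tar= pointing at the wrapper, only
Amanda's dumps get the pre/post behaviour; users' interactive tar
invocations are unaffected.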
Re: Obtaining CVS source
On Fri, Sep 19, 2003 at 11:01:04AM +0100, Stevens, Julian C wrote:
> Please can someone advise me how to obtain Amanda source via cvs?
> My only internet access is from an NT workstation(!) via a proxy
> server, but I don't know how to tell CVS about this.

I just installed TortoiseCVS (www.tortoisecvs.org) on somebody's
Windows machine here.  (I don't use it myself, since I pretty much
stick to UNIX -- but if I were a Windows type, I think it'd be my
preferred CVS client.)  I didn't need to configure proxy info for our
use of it, but I seem to recall that its Prefs dialog had a place for
it.  FWIW...
Re: Use /dev/nst0 or /dev/nrst0?
On Fri, Sep 19, 2003 at 04:00:51PM -0300, Bruno Negrão wrote:
> I'm using a redhat linux to make amanda backups based on tar.  Should
> I use the device /dev/nst0 or /dev/nrst0?
>
> What does the r letter stand for?

Not sure.  Historically, it has meant "raw", i.e. a character special
file; no prefix meant "cooked", i.e. the block special file for the
same underlying device.  But it's been a *long* time since I've seen a
tape device with both raw and cooked versions, and Linux seems to have
done away with the distinction even for disk devices.  Now, I suspect
"r" stands for redundant :-)

I'd be curious to see the output of "ls -ld /dev/*st0" on your
system...
Re: amanda skipped two runs.
On Mon, Sep 22, 2003 at 01:57:58PM -0500, Darin Dugan wrote:
> http://groups.yahoo.com/group/amanda-users/message/46310
>
> [...] Does anyone else think it odd that deleting that one line cures
> the warning without causing any problems?

Not at all.  As the referenced message says, amdump is doing something
whose effects are undefined.  The patch makes it not do that.

In case you're interested: "wait()" says to wait until one of the
caller's child processes exits, and then return the child's exit
status to the caller (in this case, the caller is amdump).
"signal(SIGCHLD, SIG_IGN)" says ... well, I won't go into what its
main purpose is, since that would take explaining all about signals.
But as a side-effect, it says, "don't keep around the child-process
information necessary to satisfy wait() calls".  In fact, I think this
side-effect is system-dependent.

So amdump is asking the kernel to do contradictory things: first,
"forget about my child processes; I won't be asking about them", then
"tell me about my child processes".  The patch simply makes it not
make the first request; thus the second request is no longer a
problem.

> I'm wondering if anyone has done a kernel upgrade to 2.4.20-20.9
> (RH9.0) as mentioned that it fixes signal delivery race condition,
> as I don't know whether to follow the Amanda patch, or the kernel
> patch.

There's no reason to assume they're mutually exclusive!  The Amanda
patch can't hurt, so do that for sure.  I don't know about the kernel
patch one way or the other.
Re: amrecover error
On Thu, Sep 25, 2003 at 12:34:57PM -0500, chris weisiger wrote:
> so how do i rewind the tape?

"mt rewind"
Re: Rewind before ejecting?
On Thu, Sep 25, 2003 at 11:57:35AM -0400, M3 Freak wrote:
> [...] should I just issue an "eject" command to the drive to spit the
> tape out, or do I have to rewind it before ejecting it?

It depends on the tape technology, I think.  DAT tapes rewind on their
own.  Some other kinds might not (though I don't really know).  What
kind of tape drive do you have?
Re: problems dumping certain filesystems
On Mon, Sep 29, 2003 at 12:06:48PM +0200, Paul Bijnens wrote:
> Marc Cuypers wrote:
> > Found the problem.  The firewall blocked communication between
> > taper and dumper.
>
> That's strange, because there is no immediate communication between
> these two, as far as I know.
>
> Driver is connected with a pipe to each dumper and to taper-reader.

I believe there is a dumper->taper connection, for direct-to-tape
dumps.  That's how I read docs/PORT.USAGE, anyway -- see the bits on
stream_server() and stream_client().  But both of those processes run
on the same host, so it's still hard to see how a firewall could get
between them.  Unless Amanda's running on the firewall machine itself
-- which I'd consider an unsafe idea anyway!

> Strange is also that some partitions on that host got backed up,
> while others did not.

It looks as though the ones that succeeded all dumped via holding
disk, so in those cases there was indeed no need for dumpers and taper
to talk directly.

> there was no error msg whatsoever in the mail report, except that it
> simply failed).

Yeah, that was the part that got my attention too!  But then, it's
2.4.2, so maybe that bug's been fixed.
Snapshot vs. -p1 (was Re: Lot's of I/O errors)
On Mon, Jul 14, 2003 at 11:07:10AM +0200, Toralf Lund wrote:
> I've been getting a lot of
>
> *** A TAPE ERROR OCCURRED: [[writing file: I/O error]].

On Mon, Jul 14, 2003 at 01:44:26PM +0200, Toralf Lund wrote:
> Note that I've now gone back to amanda-2.4.4, and successfully
> flushed some images that caused trouble in the past.  I think an
> important question here is whether something related to the holding
> disk handling or taping of images has changed since 2.4.4.

Did you ever learn anything more about this?  I'd like to upgrade from
2.4.4 to -p1 or the latest snapshot, but I'd appreciate your thoughts
first -- and those of anyone else who cares to speak up.

Thanks much.
Re: amdump - hosts report missing estimate
On Fri, Oct 03, 2003 at 06:06:55PM +0100, Steve Taylor wrote:
> planner: time 0.037: no feature set from host localhost
> error result for host localhost disk /var: missing estimate

Don't use "localhost".  Use the FQDN instead.

I'm beginning to think amcheck should print a warning if it sees
"localhost" in the disklist...
Re: Amanda not sending emails
On Fri, Oct 03, 2003 at 11:22:45AM -0400, M3 Freak wrote:
> Amanda is using "mail", and it works fine (just sent a test message
> using "mail").  Here's what I got when I typed in what you suggested:
>
> UNCOMPRESS_PATH="/usr/bin/gzip" MAILER="/usr/bin/Mail"

Note that sometimes "mail" and "Mail" are *not* (symlinks to) the same
program, so don't assume that they behave identically.

At some point in the distant past (at Berkeley, I believe), someone
came up with a spiffy new mail client (which we'd now consider as
awesome as, say, a 486 :-)  For backward compatibility, they couldn't
call it "mail", since that was the name of the old(er), ugly(er)
client.  Instead they picked a stupid name, "Mail", and for
compatibility with *them*, we've had to live with that ever since
(though Mail is sometimes known as "mailx" instead.)

> 02 4 * * * root run-parts /etc/cron.daily
> [...]
> Should I create a new file called, for example, "amanda.cron" under
> "/etc/cron.daily" instead of placing the cron entry for amanda in
> "/etc/crontab"?

NO!  Or rather, only if you want your amdump and amcheck to run:
  - as root -- which you don't, since they want to run as the amanda
    user (i.e. the value you gave for the --with-user= option to
    configure).  And for security reasons, that user should *not* be
    root!
  - at 4:02 AM, which is quite possibly too late for amdump, and is
    almost certainly too late for "amcheck -m" :-)

Either make separate entries in /etc/crontab, or use "crontab -e" to
create a crontab for the amanda user.  As far as I can tell, the two
methods are identical.  (The only difference is that the user
themselves can use "crontab -e", but only root can edit /etc/crontab.
For the amanda user, that's probably irrelevant.)

On Fri, Oct 03, 2003 at 12:29:06PM -0400, Jon LaBadie wrote:
> On Fri, Oct 03, 2003 at 11:22:45AM -0400, M3 Freak wrote:
> > [in crontab]
> > MAILTO=root
>
> The parameter in the amanda.conf file "mailto" (lowercase) is
> set correctly isn't it?
> But the uppercase form does appear in the source code.  I don't know
> if it might pick this up from the environment.

Doesn't look that way.  The only place MAILTO occurs (in the 2.4.4
source tree) is in server-src/conffile.c, where it's (a) an enum
constant, and (b) a string that's used to (case-insensitively) match
tokens from amanda.conf.

I think that the MAILTO environment variable is a red herring.
However, it might be useful in tracking down the real problem.  What
MAILTO in a crontab does is to tell cron where to email the program's
stdout and stderr, if there is any -- but of course there isn't from
"amcheck -m", since that mails its report instead.  But what if
amcheck is dying before it sends the email?  Like, because it's being
run as root instead of the amanda user perhaps...

Try removing the "-m", and setting MAILTO properly in the crontab, if
it isn't already.  Then "amcheck" will write its report to stdout, and
cron will mail you the results.  That way, if amcheck is aborting
early, you'll see the messages.

Also check the mail logs, as others have suggested; and the cron logs
too (they're in /var/log, or /var/cron, or /var/spool/cron, or
somewhere like that).  Either of those might tell you something
useful.
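To make the separate-crontab suggestion concrete, here's a sketch of a
crontab for the amanda user (created with "crontab -e" while logged in
as that user).  The times, config name, and install paths are
assumptions for illustration; the point is that amcheck runs early
enough in the evening for a human to fix problems, and amdump runs as
the amanda user rather than root:

```
# min hour dom mon dow  command
MAILTO=amanda-admin
0     16   *   *   1-5  /usr/local/sbin/amcheck -m DailySet1
45    0    *   *   2-6  /usr/local/sbin/amdump DailySet1
```

With MAILTO set, anything amdump or amcheck writes to stdout/stderr
(e.g. an early abort) gets mailed to you by cron itself, independently
of Amanda's own report mail.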
Re: Dumper Patch Problem
On Tue, Oct 07, 2003 at 10:57:50AM -0500, Jim Summers wrote:
> Reading the patch file does it mean I simply need to delete the line
> in dumper.c that has the SIGCHLD in it?

Correct.

> But I guess I should figure out the correct patch command also?

Well, you could, but for this particular patch, editing by hand is
probably easier :-)
Re: exclude question
On Tue, Oct 07, 2003 at 05:07:50PM -0400, Jean-Francois Malouin wrote:
> one subdir contains more than 100GB [i.e. tape size] but also has
> something close to 2000 subdirs.

You could write a script to generate the excludes dynamically.

Or maybe you could rearrange the directories a bit: split them up into
tape-sized subsets, and use symlinks to provide the appearance that
they're all still in one directory.  That is, if you currently have:

    /mountpoint/subdir/aardvark
    ...
    /mountpoint/subdir/zebra

turn it into something like:

    /mountpoint/subdir-storage/a-m/aardvark
    ...
    /mountpoint/subdir-storage/n-z/zebra

and:

    /mountpoint/subdir/aardvark -> ../subdir-storage/a-m/aardvark
    ...
    /mountpoint/subdir/zebra -> ../subdir-storage/n-z/zebra
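A sketch of that rearrangement in shell.  All the names (/mountpoint,
subdir, the a-m/n-z split) are the hypothetical ones from the example
above; the demo runs in a scratch directory created with mktemp, so
point MOUNT at the real mount point before doing this for real.

```shell
# Stand-in for /mountpoint, with two sample subdirectories.
MOUNT=$(mktemp -d)
mkdir -p "$MOUNT/subdir/aardvark" "$MOUNT/subdir/zebra"

cd "$MOUNT"
mkdir -p subdir-storage/a-m subdir-storage/n-z

# Move each subdirectory into its tape-sized subset, then leave a
# relative symlink behind so paths under subdir/ keep working.
for d in subdir/[a-m]*; do
    name=$(basename "$d")
    mv "$d" "subdir-storage/a-m/$name"
    ln -s "../subdir-storage/a-m/$name" "$d"
done
for d in subdir/[n-z]*; do
    name=$(basename "$d")
    mv "$d" "subdir-storage/n-z/$name"
    ln -s "../subdir-storage/n-z/$name" "$d"
done

ls -l subdir    # aardvark and zebra are now symlinks into the subsets
```

The disklist would then name /mountpoint/subdir-storage/a-m and
/mountpoint/subdir-storage/n-z as separate DLEs, each small enough for
one tape, while everything still appears under subdir/ for users.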
Re: exclude question
On Tue, Oct 07, 2003 at 11:19:37PM +0200, Paul Bijnens wrote:
> include "./[a-m]*"

That works too -- and it's a lot cleaner than either of my ideas :-/
Re: more doubts
On Fri, Oct 17, 2003 at 11:18:48AM +0200, JC Simonetti wrote:
> Just to be sure we are talking about the same thing [...]

We are indeed on the same wavelength.

> I have different values for the filemarks, measured with another
> program and not amtapetype.
> Do you know if the filemarks are application-dependant,
> tape-dependant, tape-and-taper-dependant ???

Doesn't surprise me.  A tape mark is a particular bit pattern on the
tape (well, duh!).  Whether it's a funny otherwise-illegal block,
which takes up some amount of extra space, as I gather was the case
with old mainframe 9-track tape; or a one-bit field in a block header
that would have been written anyway, as seems a reasonable hypothesis
for "filemark 0" technologies like DDS; or something else, depends on
the drive technology.  So in theory, all tape marks written by a given
drive should be the same size.

But you can't read a tape mark.  The whole idea is that it's a way to
represent an out-of-band signal, i.e. end of file.  So the drive reads
the tape mark, but all it'll tell you about that is, "I just read a
tape mark".  You don't get to see the magic bit pattern.  Thus,
programs *can't* find out a tape mark's size for sure; they have to
guess, using some heuristic or other.  Different programs will use
different heuristics, so I suppose they'll guess different values for
what is, in fact, the same quantity.  We can only hope that their
guesses are "close enough" to the real value.  Thus, the tape mark
itself is not application-dependent, but the estimate of its length
*is*.

> The IBM fms software tells that filemarks are
> tape-and-taper-dependant.  Do you know more?  Do you have any
> opinions concerning that?

By "tape-and-taper-dependent", do you mean, "dependent on the drive
and on the particular tape"?  If so, I'd agree.  In theory, as I said,
it depends only on the drive.  But in practice, this is mag-tape we're
talking about.  It's a notoriously unreliable medium.
One way of dealing with that, which was used by the old 9-track stuff,
is that writing a block consisted of:
 1. Write the block
 2. Go back and (try to) reread it
 3. If you can't read it, skip forward a bit (erasing? Not sure) and
    retry steps 1-2
 4. All of this was happening within the hardware, i.e. within one
    O/S-level I/O request.  There was a threshold (number of retries?
    length of tape consumed?  Not sure) after which the drive would
    give up and return an error status to the O/S
 5. Perhaps the O/S would retry the failed write a few times (steps
    1-4, i.e. each software retry would involve many hardware-level
    retries)
 6. Perhaps the O/S would then print a message on the console asking
    the operator what to do -- the mainframe equivalent of "Abort,
    Retry, Fail?".  If the operator said "retry", repeat steps 1-6.
    Only if the operator said "fail" would the application's I/O
    request (the local equivalent of the UNIX write() system call)
    finally return with an "I/O Error" status.

Of course the read algorithms in both hardware and O/S knew to
compensate for all of that.

So you can see that there are potentially an *awful* lot of retries
going on there, each one consuming a small chunk of tape.  Thus, a
tape record of a given length (whether a data block or a tape mark)
would always take up the same amount of tape *for the record itself*,
but the "inter-record gap" preceding the record could vary wildly in
length -- anywhere between a fraction of an inch and several feet.
Thus, the *apparent* length of a tape mark (or of any other tape
record, for that matter) would depend not only on how much tape the
bit pattern itself occupied (constant, I presume), but on whether you
happened to try to write it on a bad patch of the tape or a good one.

(I'm not sure how newer technologies deal with this error-prone-ness;
if they have better ways, the variability in apparent block lengths
might be a lot less.
But they haven't been able to reduce that variability to zero -- at
least not for DDS3, judging by my amdump reports.  And for streaming
technologies, that introduces yet another variable.)

So you can see that trying to intuit a tape mark's length is, to put
it mildly, a bit of a challenge.
Re: more doubts
On Fri, Oct 17, 2003 at 09:53:12AM -0400, Jon LaBadie wrote:
> The sermon is, unless you see really ridiculous filemark values,
> don't worry.  They have nearly zero impact on amanda planning.

Wisdom has been spoken :-)
Re: tape type for HP DDS3 C5708A
On Sat, Oct 18, 2003 at 03:29:13PM -0400, Jon LaBadie wrote:
> On Sat, Oct 18, 2003 at 11:24:05PM +0530, Rohit Peyyeti wrote:
> > FAILURE AND STRANGE DUMP SUMMARY:
> > localhost /h lev 0 FAILED [dumps too big, but cannot incremental
> > dump new disk]
> > [plus three more of the same]
>
> This suggests to me that the DLE's are each larger than your tape.

I don't think so.  In that case, the message would have been "dump
larger than tape, but cannot yada yada" (Phase 1 of delay_dumps()).
The "dumps too big" variant comes from Phase 2, when a full dump is
due, but planner wants to postpone it and do an incremental instead,
to fit all the DLEs onto the tape.  In the case of a new disk, as you
pointed out, the do-an-incremental-instead option isn't possible, so
planner's only choice is to skip the DLE entirely.

Rohit: The answer, as Jon said, is to add DLEs a few at a time -- or,
in your case, it looks like *one* at a time :-(, so as not to ask
Amanda to put more on a tape than will fit.

Or else just don't worry about it; leave all of the DLEs in there and
let them fight it out for tape space :-)  Sooner or later, they'll all
make it onto tape, as //windows/UsersG3 did this time.  I doubt that
it'll happen any faster if you follow the usual advice and only add
them slowly.  The only advantages of doing it that way are:
  - you won't get Amanda shouting at you that YOUR BACKUPS FAILED!!
  - if some DLEs are more important to back up than others, you get to
    put those into the backup system first, rather than letting Amanda
    choose randomly which one(s) to add in any given run

> > --> some NT_STATUS_OBJECT_NAME_NOT_FOUND errors <--

Someone recently posted a solution to that, I think.  Couldn't hurt to
check the archives for it.
Re: recommendation needed
On Fri, Oct 17, 2003 at 06:43:40PM -0400, Jon LaBadie wrote:
> I tried some throughput checks today.  Test one was a "cp -r" of a
> directory tree with 8.5GB (only a few large files) and test two was
> a ufsdump of a 1GB partition.  Both gave between 3 and 3.5MB/sec
> rates to the NFS device.  That certainly is higher than the 1MB/sec
> I get to tape, but quite a bit lower than the rate to a local disk.

NFS-2 write performance is lousy, since the server needs to
effectively fsync() on every write() call before it returns status to
the client.  (I suspect that one tends to notice this with large files
more than with small ones, because we expect lots-of-small-file
performance to be worse in any case.)  NFS-3 is supposed to be better,
as is FreeBSD's "nqnfs" (the "nq" stands for "not quite"), but I've
never tried either.

Here's a test, in addition to the ones Tony suggested:
  - Try copying a single large file from local disk to the NFS-mounted
    drive
  - Try copying the same large file, to the same partition on the NFS
    server, using FTP (if you use scp instead, turn off compression to
    keep it from becoming CPU-bound and confusing things)

Comparing the length of time for those two tests will tell you how
much of the problem is in NFS itself, and how much is in the lower
layers of the networking subsystem.

Also, if you can sit beside the Snap Server when doing those tests,
listen to it.  If its disks make noise when they seek, I bet you'll
hear a *lot* more of that during the NFS test than during the FTP one.
If there's a disk-activity LED, you might see a difference there too.
(I don't know how this manifests in more scientific measurements like
iostat results, if it does at all...
The completely unscientific seek-rattle is more viscerally convincing
anyway; at least I found it so :-)
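Not the FTP test itself, but a purely local illustration of why the
sync-per-write behaviour matters: the same amount of data written with
normal buffered writes versus with a sync forced on every write (which
is roughly what an NFS-2 server must do).  This is a sketch; GNU dd's
oflag=dsync is assumed, and absolute times will depend entirely on the
underlying disk.

```shell
# Buffered writes vs. per-write sync, same data volume.
f1=$(mktemp); f2=$(mktemp)

time dd if=/dev/zero of="$f1" bs=8k count=2000 2>/dev/null
time dd if=/dev/zero of="$f2" bs=8k count=2000 oflag=dsync 2>/dev/null

rm -f "$f1" "$f2"
```

On a spinning disk the dsync run is typically slower by a large
factor; that gap is the same effect the NFS-2 client is paying for on
every write() it sends to the server.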
Re: Question about new client setup
On Mon, Oct 20, 2003 at 11:53:34AM -0700, Dana Bourgeois wrote:
> alta / lev 0 FAILED [disk / offline on alta?]

I hate that message; it's extremely misleading.  Besides what it says,
it can also mean that sendsize was unable to parse the output of
whatever subcommand it ran.  That in turn can be because it used the
wrong subcommand entirely (e.g. /usr/ccs/bin/dump instead of ufsdump
on a Solaris box).  Only rarely, it seems, does "disk foo offline"
actually mean that disk foo was offline :-(

Try looking in the "amdump" file and the various debug files
(especially the sendsize* ones on the clients in question).

> I'm wondering if it's the mismatch between client version and server.

Unknown, but docs/UPGRADE doesn't mention it (and it does mention an
earlier incompatibility, so that'd be the place to look).  I'd try to
rule out other problems first.
Proposal for more run-time configuration
    --with-mmap                     force use of mmap instead of shared
                                    memory support
    --with-assertions               compile assertions into code
    --with-gnu-ld                   assume the C compiler uses GNU ld
                                    (default: no)
    --with-pic                      try to use only PIC/non-PIC objects
                                    (default: use both)
    --with-krb4-security=DIR        location of Kerberos software (default:
                                    /usr/kerberos /usr/cygnus /usr
                                    /opt/kerberos)
    --without-debugging=/debug/dir  do not record runtime debugging
                                    information in specified directory
    --with-tmpdir

These should be configurable at run time. Actually, both for backward compatibility and to make initial installation easier, it probably makes sense to keep the current configure-time options, but it should be possible to override their values in a run-time config file. This would be similar to the way Apache configuration works.

    --with-index-server=HOST        default amanda index server (default:
                                    `uname -n`)
    --with-config=CONFIG            default configuration (default: DailySet1)
    --with-tape-server=HOST         default restoring tape server is HOST
                                    (default: same as --with-index-server)
    --with-tape-device=ARG          restoring tape server HOST's no-rewinding
                                    tape drive
    --with-ftape-rawdevice=ARG      raw device on tape server HOST, if using
                                    Linux ftape >= 3.04d
    --with-changer-device=ARG       default tape changer device (default:
                                    /dev/ch0 if it exists)
    --with-gnutar-listdir=DIR       gnutar directory lists go in DIR (default:
                                    localstatedir/amanda/gnutar-lists)
    --with-maxtapeblocksize=kb      maximum size of a tape block
    --with-debug-days=NN            number of days to keep debugging files
                                    (default: 4)
    --with-dump-honor-nodump        if dump supports -h, use it for level 0s
                                    too
    --with-tmpdir=/temp/dir         area Amanda can use for temp files
                                    (default: /tmp/amanda)
    --with-testing=suffix           use alternate service names

For these, providing both configure-time and run-time configuration is especially useful, since configure "looks for one" by default:

    --with-gnutar=PROG              use PROG as GNU tar executable (default:
                                    looks for one)
    --with-smbclient=PROG           use PROG as Samba's smbclient executable
                                    (default: looks for one)

This one must be settable at configure time. For obvious bootstrapping reasons, it can't be overridden by a value in the run-time config file; but, as with Apache, it should be overridable on the command line:

    --with-configdir=DIR            runtime config files in DIR (default:
                                    sysconfdir/amanda)

These ones I'm not sure about. My arguments above point to making them run-time configurable, but there may be security-related arguments against this. (I've put the Kerberos-related ones in this class simply because I don't understand them...)

    --without-amandahosts           use .rhosts instead of .amandahosts
    --without-bsd-security          do not use BSD rsh/rlogin style security
    --with-portrange=low,high       bind unreserved TCP server sockets to
                                    ports within this range (default:
                                    unlimited)
    --with-tcpportrange=low,high    bind unreserved TCP server sockets to
                                    ports within this range (default:
                                    unlimited)
    --with-udpportrange=low,high    bind reserved UDP server sockets to ports
                                    within this range (default: unlimited)
    --with-user=USER                force execution to USER on client systems
                                    (required)
    --without-force-uid             do not force the uid to --with-user

Kerberos options:

    --with-server-principal=ARG     server host principal ("amanda")
    --with-server-instance=ARG      server host instance ("amanda")
    --with-server-keyfile=ARG       server host key file ("/.amanda")
    --with-client-principal=ARG     client host principal ("rcmd")
    --with-client-instance=ARG      client host instance (HOSTNAME_INSTANCE)
    --with-client-keyfile=ARG       client host key file (KEYFILE)
    --with-ticket-lifetime=ARG      ticket lifetime (128)

Special cases:

    --with-owner=USER               force ownership of files to USER
                                    (default: the --with-user value)

This is overloaded to mean both "set files to owner USER" and "run executables as user USER". It should be split into two; I DO NOT want Amanda running as the owner of its files -- least privilege and all that. Assuming it's split into --with-file-owner=USER to set file ownership, and --with-run-user=USER to specify who gets to run amdump, the former obviously has to be a configure-time option, and the latter goes into the "not sure because of security" list.

    --with-group=GROUP              group allowed to execute setuid-root
                                    programs (required)

As for --with-owner.

Not sure about these; I don't understand the issues well enough to have an opinion:

    --with-fqdn                     use FQDNs to back up multiple networks
    --with-buffered-dump            buffer the dumping sockets on the server
                                    for speed

These are splendid examples of what I'm advocating :-)

    --with-indexdir                 deprecated, use indexdir in amanda.conf
    --with-dbdir                    deprecated, use infofile in amanda.conf
    --with-logdir                   deprecated, use logfile in amanda.conf

--
Eric Siegerman,
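To make the proposal concrete, here is a sketch of what the run-time overrides might look like in an amanda.conf-style file. The directive names and host are invented for illustration -- none of this is existing Amanda syntax; it just mirrors the configure options above the way Apache's httpd.conf mirrors its build-time defaults:

```conf
# Hypothetical run-time overrides for what are currently configure-time
# options.  Directive names invented for illustration only.
index-server  "amandahost.example.com"   # would override --with-index-server
tape-server   "amandahost.example.com"   # would override --with-tape-server
tape-device   "/dev/nst0"                # would override --with-tape-device
gnutar        "/usr/local/bin/gtar"      # would override --with-gnutar
debug-days    4                          # would override --with-debug-days
```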
Re: NFS mount as second holding disk
On Thu, Oct 23, 2003 at 08:32:31AM -0500, Dan Willis wrote: > Has anyone successfully used an NFS mount as a secondary holding disk? Haven't tried it. > Can backups still be run through dump or should they all be tar > going this route? I don't see offhand why it would make a difference. From the Amanda server's point of view, they're both just byte streams coming in over a socket. > Or is this just not advisable at all? Perhaps this is overly alarmist, but over on the info-cvs list, the common wisdom is that people should *not* access their CVS repositories over NFS, but use the CVS client/server protocol instead. That's because interoperability problems between different O/S's have been known to knock holes out of files (whole blocks of NULs instead of the data that should be there). A recent thread discussed the problem, and the circumstances in which one can probably get away with NFS-mounting one's repo. See this message from Larry Jones, one of the main CVS maintainers at this point, and a guy who, IMO, generally seems to know what he's talking about: http://mail.gnu.org/archive/html/info-cvs/2003-10/msg00060.html and the final paragraph of this one: http://mail.gnu.org/archive/html/info-cvs/2003-10/msg00064.html (The rest of the thread is of less interest, since it deals with more CVS-specific issues.) There are circumstances in which that problem isn't as critical (e.g. CVS working directories; since everything's in the repo anyway, all you risk is your latest round of changes). But a backup isn't the kind of thing I'd want to gamble with -- especially not a compressed one, where an error would trash the rest of the file! -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / When I came back around from the dark side, there in front of me would be the landing area where the crew was, and the Earth, all in the view of my window. I couldn't help but think that there in front of me was all of humanity, except me. 
- Michael Collins, Apollo 11 Command Module Pilot
Re: log files lost
On Tue, Oct 28, 2003 at 04:23:08PM +0300, vlad f halilow wrote: > hello everyone. by different reason's log files of amanda war > deleted. i want to manual recover it from tape (amrecover do > not working, Warning: no log files found for tape serversXX for > all tapes ), but i cannot. > [...] > [EMAIL PROTECTED] # tar tvf /dev/rmt/0bn You can still use amrestore; it's only amrecover that needs index files, logs, etc. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / When I came back around from the dark side, there in front of me would be the landing area where the crew was, and the Earth, all in the view of my window. I couldn't help but think that there in front of me was all of humanity, except me. - Michael Collins, Apollo 11 Command Module Pilot
Re: Rename tape labels?
On Fri, Oct 31, 2003 at 01:25:12PM +0530, Rohit wrote:
> Is there anyway I can change tape label names after I'm well into amanda
> dumpcycle? I currently have TLS-Set-1-01.. Set-1-02.. Set-2-01 and so on.
> I want to get rid of this set concept from the label.

This is untested; treat it with caution...

1. To begin the process, change the "labelstr" regexp in amanda.conf so
   that both the old and new label styles are recognized.

2. For the next tapecycle's worth of dumps, you'll have a mix of old- and
   new-style labels. Each day that a tape with an old-style label comes up
   to be reused:
   a. Do "amrmtape <config> <label>"
   b. Relabel the tape with a new-style label. Amanda will consider it to
      be a "new tape", but that's ok.

3. After all of the old-style tapes have been relabelled, change "labelstr"
   again, to accept only the new style of labels.

WARNINGS:

- DO NOT do (2) all at once for all of the tapes; you'll clobber your
  backups. Make sure you do those steps, for each tape, just before Amanda
  would have overwritten that tape anyway.

- If you're still on your first tapecycle, i.e. you're still adding new
  tapes one at a time, you'll have to finish adding in your new tapes
  before you can even begin step (2).

- During this process, you're circumventing all (or most) of Amanda's
  protections against clobbering the wrong tape. So be careful!

--
Eric Siegerman, Toronto, Ont.  [EMAIL PROTECTED]
Re: Spindle numbers in disklist
On Thu, Nov 06, 2003 at 10:05:19PM +0100, Alexander Jolk wrote: > I thought that by giving different spindle numbers to [NFS-mounted DLEs], amanda > would back them up in parallel (barring holding disk contention of > course). By default, Amanda will only back up one DLE *per client* at a time. You can fix that with the "maxdumps" parameter in amanda.conf, either globally or per-dumptype. "inparallel", by contrast, controls overall parallelism among the different clients. Before a new dump will start, among other constraints, there have to be fewer than "inparallel" dumps already running overall, *and* fewer than "maxdumps" dumps running on the client in question. The "client-constrained" state happens when either the "maxdumps" condition or the DLEs-per-spindle condition (which you've already dealt with) fails; if the "inparallel" condition fails, you get a different state reported -- "no-dumpers" I think. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / When I came back around from the dark side, there in front of me would be the landing area where the crew was, and the Earth, all in the view of my window. I couldn't help but think that there in front of me was all of humanity, except me. - Michael Collins, Apollo 11 Command Module Pilot
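For reference, here is roughly how those two knobs sit in amanda.conf. The dumptype name below is made up for the example, but "inparallel", "maxdumps", and "spindle" are the real parameter names:

```conf
inparallel 4        # at most 4 dumps running overall, across all clients

maxdumps 2          # global default: at most 2 simultaneous dumps per client

define dumptype nfs-parallel {
    comp-user-tar   # parent dumptype, as in the sample amanda.conf
    maxdumps 3      # per-dumptype override for clients using this dumptype
    spindle -1      # -1 = no spindle constraint for this DLE
}
```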
Re: moved to new disk, now amanda wants to do level 0's on whole system
On Fri, Nov 14, 2003 at 09:20:23AM -0500, Jay Fenlason wrote:
> Also, cp/fr may not have correctly reset the modification times of the
> files when it copied them. Oh, and they may not handle links well
> either. To copy directory trees, I usually use "( cd /fromdir ; tar
> cf - . ) | ( cd /todir ; tar xpf -)", which preserves modification
> times, and permissions.

I've had problems with tar, too. Unfortunately, that was so long ago that I forget what they were. Maybe it stores only mtime in the tarball, and on extraction sets both mtime and atime to the saved mtime value. Oh, and I think it likes to (try to) copy the contents of special files, FIFOs, and the like, instead of recreating them in the destination tree.

Until recently, I used the cpio variant of your suggestion:

    cd /fromdir
    find . -depth -print0 | cpio -padmu0 /todir

(You need GNU find and cpio for the "0" part to work. -depth is to get the directories' mtimes copied properly. It makes each directory come *after* its contents in the file listing. Without -depth, the directory would come first; cpio would properly set its mtime, and then stomp on it by creating the directory's contents.)

But then I discovered rsync. Rsync rocks. "rsync -aH" copies everything the kernel lets you copy (i.e. not ctimes, and not inumbers). The only problem with rsync is the weird way it gives meaning to a trailing slash; these two are *not* equivalent:

    rsync -aH srcdir/ destdir
    rsync -aH srcdir destdir

Then again, I'm not sure whether either cpio or rsync can deal with a username that's changed its numerical userid, or similarly for groups. I think some tar's can. Or maybe it's cpio that can handle that; can't remember. And gtar probably doesn't have any of those problems -- people are using it for backups after all :-) -- but it's not always available, and even non-GNU cpio's do everything but the "0" trick. But all of those -- tar, cpio, rsync -- are kludges.
Is it just me, or do other people also find it ludicrous that 30+ years on, UNIX still doesn't have a proper copy command? -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
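A minimal, self-contained sketch of the tar-pipe copy from Jay's suggestion, run against throwaway temp directories (all the paths and file names here are made up for the demo) so you can watch the mtime survive the copy:

```shell
set -e
src=$(mktemp -d)
dst=$(mktemp -d)

mkdir -p "$src/a"
echo hello > "$src/a/file.txt"
touch -t 200301011200 "$src/a/file.txt"    # back-date the mtime

# The copy itself: tar records mtimes in the archive and restores them on
# extraction; -p preserves the permissions as well.
( cd "$src" ; tar cf - . ) | ( cd "$dst" ; tar xpf - )

cmp -s "$src/a/file.txt" "$dst/a/file.txt" && echo "contents ok"
ls -l "$dst/a/file.txt"    # mtime shows Jan 2003, not today
```

A plain "cp -r" of the same tree would have reset the mtime to the time of the copy.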
Re: degraded mode
On Tue, Nov 18, 2003 at 12:52:04PM +1100, Barry Haycock wrote: > FAIL driver /dev/rdsk/c0t1d0s7 2003112 2 [can't dump no-hold > disk in degraded mode] > [...] > What is degraded mode? It's what happens when there's no tape in the drive, or no more room on the tape, or for whatever other reason, Amanda decides that the tape has become unusable. In that case, Amanda tries to dump as much as it can to the holding disk, and leaves it there for you to amflush to tape later. A no-hold DLE is one that has been configured to be dumped straight to tape, bypassing the holding disk. If all of this run's dumps must go to the holding disk, and this DLE must not go to the holding disk, you can see why Amanda has a problem with that... > This file system just happens to be where Amanda dumps are done to. An excellent reason for it to be marked no-hold. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
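For the archives, marking such a DLE no-hold is a one-line dumptype option. The dumptype names here are examples, but "holdingdisk no" is the real amanda.conf syntax:

```conf
define dumptype comp-root-nohold {
    comp-root          # parent dumptype (example name)
    holdingdisk no     # dump this DLE straight to tape, bypassing the
                       # holding disk (which lives on this filesystem)
}
```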
Re: Permission Denied error on client
On Mon, Nov 17, 2003 at 04:59:19PM -0500, John Grover wrote:
> Amanda Backup Client Hosts Check
>
> ERROR: host.domain.edu: [could not access /dev/vx/rdsk/var (/dev/vx/rdsk/var):
> Permission denied]
> ERROR: host.domain.edu: [could not access /dev/vx/rdsk/rootvol
> (/dev/vx/rdsk/rootvol): Permission denied]
>
> Is this a read permission error on the filesystem or an execute error
> on vxdump?

Looks like the former. Check the ownership and permissions on the special files mentioned. The user/group under which vxdump is running needs read permission. I don't know about vxdump, but other dumps I've used do NOT need write permission, and so I do my best to arrange that they don't have it, even if that means deviating from the defaults for the system in question. Least Privilege, and all that. E.g.

    brw-r-----  1 root  sys  32, 8 Jun 23  2000 /dev/dsk/c0t1d0s0

Amanda was configured with "--with-group=sys", and for good measure, the "--with-user=XXX" user (which is NOT root) is a member of group "sys" in /etc/group. For FreeBSD, replace "sys" with "operator". For Linux, it probably depends on the distro, or you might have to chgrp the special files to a group you've created, as it looks as though I did here. On at least some of our systems (can't remember which ones), the original mode was 660; I had to chmod it to 640. So far, nothing's blown up as a result...

--
Eric Siegerman, Toronto, Ont.  [EMAIL PROTECTED]
Re: Testing tapes before use / bad tape
On Mon, Nov 24, 2003 at 09:46:31AM +0100, Martin Oehler wrote: > Hmm, the only option that sounds like it could speed up the [amtapetype] process > is blocksize. Does anyone know a good value for this? The same value as amdump will be using! With some tape technologies, the tape's capacity depends very much on the block size. In such a case, using a different block size for the test would give misleading results. On Sun, Nov 23, 2003 at 10:28:38AM +0100, Martin Oehler wrote: > My second problem is how to handle the "short write"? > I have to send in the tape, but the are 3-4 GB of data on this tape. > Without this data, my backup is inconsistent. The only possibility > I see (at the moment) is doing a full backup of the partitions having > some data on this tape. That's one possibility. You can use "amadmin force", staging the full backups over a few runs if necessary to fit them in. Another possibility would be to wait a tapecycle (or at the very least a dumpcycle) for the backups to expire on their own. (Don't forget to erase the tape before sending it back, if it contains anything confidential.) -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: amtapetype idea (Was: Testing tapes before use / bad tape)
On Mon, Nov 24, 2003 at 07:14:56PM +0100, Paul Bijnens wrote: > My idea was to write only one large file in the first pass, just > until [amtapetype] hits end of tape. One problem with that is that the drive's internal buffering might distort the results, by letting amtapetype think it has successfully written blocks that in fact won't make it to tape. (That's a problem anyway, of course, but sticking in a filemark every once in a while puts a known upper bound on the error.) Perhaps amtapetype could have a "test-tape" flag, that would basically tell it to suppress the second pass. Or the second pass could become a verification pass (just re-seed the random-number generator to the value from the beginning of the write pass). Or provide both options. Of course that would make "amtapetype" a rather misleading name. "amtape" would be a great choice for a new name; too bad it's taken :-/ -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: Running amdump leads to high CPU load on Linux server
On Sun, Nov 23, 2003 at 07:46:32PM -0500, Kurt Raschke wrote:
> ...when amdump runs, the load spikes to between 4.00 and
> 6.00, and the system becomes nearly unresponsive for the duration of
> the backup. The server is backing up several local partitions, and
> also two partitions on remote servers.

Are you short of RAM? If the system's paging heavily, that'd make it crawl too.

> I've tried starting amdump
> with nice and setting it to a low priority, but when gtar and gzip are
> started by amanda, the priority setting is somehow lost.

Not surprising. Recall that Amanda runs client/server even when backing up the server's own DLEs. The client-side processes are descendants of [x]inetd, not of amdump, and so don't inherit the latter's "nice" level.

> The server
> isn't even trying to back up multiple partitions in parallel,

By this do you mean, "only one DLE at a time"; or "only one DLE *from the server* at a time, along with remote backups in parallel"? If the latter, well, of course there's some amount of server-side work even for the remote DLEs. Is the compression for the remote DLEs client- or server-side? If the latter, change "some amount" to "a lot" in the previous sentence :-)

--
Eric Siegerman, Toronto, Ont.  [EMAIL PROTECTED]
Re: Memory requirements for Amanda Server
On Mon, Nov 24, 2003 at 03:26:39PM -0500, Jon LaBadie wrote:
> Anyone out there running amanda on a 386 or 486 with 16M of ram?

Not any more, but I did in 1995 or thereabouts :-) 486-DX33 (or was it a DX2/66?) with 16 whole megabytes worth of 30-pin SIMMs. It turned out that (a) that wasn't enough RAM, and (b) FreeBSD 2.0.5's low-memory robustness left a fair amount to be desired, as I discovered a few times when I came in in the morning to find the backup server down, and the /var partition, where the holding disk was, thoroughly trashed.

(It also turns out that FreeBSD still supports the *3*86 -- last month's 4.9 release contains a bug fix for it. I imagine Linux still does too. Both are equally gratifying...)

I think the Amanda version was 2.2.6 -- just saw that mentioned near the beginning of the ChangeLog, and it rings a bell. Interesting limitations (some from memory, some from the ChangeLog; some probably incorrect):

- No changer support. One tape per run; that's it, that's all.
- No indexing, no amrecover. Amrestore would pull dumps off the tape, but
  from there you were strictly on your own.
- No gnutar; it was strictly dump.
- No "reserve". Effectively, it was hard-wired to 100%.
- No "runspercycle".
- No chunked dumps on holding disk.
- No promote_hills() in the planner. If you missed a tape (e.g. for a
  holiday), causing that night's full dumps to be postponed, they'd have a
  strong tendency to stay clumped together with the next night's full dumps
  for a long time, at least on a small network like the one I was
  responsible for.
- Blair Zajac's extensive patch set hadn't yet been merged into the
  canonical sources. If you wanted them, you had to download and apply them
  yourself.

I don't believe Jean-Louis was involved yet (from the ChangeLog, runspercycle and chunked dumps were among his early patches); I'm not even sure that (the long-departed) Alexandre Oliva and John R. Jackson were around back then.
Truth be told, I think Amanda was rather moribund at the time (hence the existence of BZ's patches in the first place); ISTR that it was Alexandre who woke the project up again. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: Barcode readers: how important?
On Tue, Nov 25, 2003 at 05:28:02PM +0100, Alexander Jolk wrote: > I understand that tape changing is also a somewhat faster process if > barcode labels are used because it doesn't need to do a load-read-unload > cycle for every tape, but I'm not sure about that. If that's true (which I *can't* vouch for, having been stuck in chg-manual land till now :-), it would also mean that the barcode reader saves wear and tear on the tapes. ISTM that every load-read-unload to check the tape label should count as a "use" for purposes of deciding when to retire the tape -- 99.9% of the media wasn't touched, but if that first .1% goes bad, you've got problems. Does that make sense to people? If so, it seems like the barcode reader could pay for itself fairly quickly -- and in a far more measurable way than just convenience. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: How to fix annoying break in tape sequence?
For the benefit of the archives (I know you've solved your recent problem):

On Mon, Dec 01, 2003 at 04:49:50PM +, Dave Ewart wrote:
> On Monday, 01.12.2003 at 16:32 +, Tom Brown wrote:
> > [...] alter the config/tapelist file so that the required OurName-C-Mon
> > is at the bottom (although this is less desirable)
>
> Are you sure? I have read that altering tapelist has no effect, since
> it tapelist is an end-result file, not a "read-at-start" config file ...

This is incorrect; editing tapelist *will* affect future runs.

That said, the suggestion still has a problem. Simply moving OurName-C-Mon to the bottom (making it swap places with the "skipped" tape, OurName-B-Fri) will work for that Monday's run, but you'll have to do it again on Tuesday night, and every night until OurName-B-Fri cycles around again.

I suppose you could move the OurName-B-Fri entry up to the top, to make it look as though the tape had been used in its proper sequence, but I'd be *very* reluctant to do that without carefully thinking through the ramifications. Reducing "tapecycle" for the duration is certainly cleaner, quite possibly safer, and in the end, probably easier.

N.B.: In a tapelist record, I can't recall offhand whether it's the date field, or the record's physical position within the file, that Amanda actually cares about. Perhaps both. So to be safe, "moving" a tapelist entry should probably consist of both:

- physically moving the line to the appropriate position
- editing the line's date so that it sorts properly into its new location

--
Eric Siegerman, Toronto, Ont.  [EMAIL PROTECTED]
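For reference, tapelist is a plain text file with one record per tape: a date stamp, the label, and a reuse flag, most recently written tape first. The labels below are the ones from this thread; the dates (and the middle entry) are invented for the example:

```conf
20031201 OurName-C-Mon reuse
20031130 OurName-A-Sun reuse
20031128 OurName-B-Fri reuse
```

The tape nearest the bottom is the next candidate for overwriting (by date and/or position, per the caveat above).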
Re: Bad tape or worse news?
On Wed, Dec 03, 2003 at 01:32:11AM -0800, Jack Twilley wrote:
> *** A TAPE ERROR OCCURRED: [writing label: short write].

How much was written to the tape before this message appeared? Anything at all? Look for a line like this in the NOTES section:

    taper: tape twilley008 kb 12033408 fm 8 writing file: short write

> This is showing up in my messages file:

Here's my interpretation:

> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): WRITE FILEMARKS. CDB: 10 0 0 0 2 0

I believe this is the SCSI command that was being attempted when a SCSI error occurred.

> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): CAM Status: SCSI Status Error
> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): SCSI Status: Check Condition

There was a SCSI error.

> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): DATA PROTECT asc:27,0
> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): Write protected

This is the specific error code. ASC is "additional sense code"; there's also an ASCQ, "additional sense code qualifier". I'm guessing that 27,0 are the ASC and ASCQ resp. You can look these up at: http://www.t10.org/lists/asc-num.htm

The above is the kernel reporting the raw data from the hardware. The rest is the kernel's interpretation.

> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): Unretryable error
> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): failed to write terminating
> filemark(s)
> Dec 3 01:08:19 duchess kernel: (sa0:ahc0:0:6:0): tape is now frozen- use an
> OFFLINE, REWIND or MTEOM command to clear this state.

So what we have is that the tape drive is refusing to write a file mark, because it says that the tape is write-protected. The most likely possibility, of course, is the write-protect switch on the tape. But it could also be a problem with the drive -- maybe something as simple as a fluffball covering an optical sensor.
-- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: Update: Getting the Drive to Work...
Random thoughts: On Tue, Dec 09, 2003 at 10:09:28AM -0500, Josiah Ritchie wrote: > ># ./filltape > >dd: writing to `/dev/nst0': Input/output error > >285760+0 records in > >4464+0 records out > >dd: closing output file `/dev/nst0': Input/output error > >Command exited with non-zero status 1 > >0.28user 1.74system 18:35.34elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k > # bin/filltape > 6491712+0 records in > 101433+0 records out > 5.67user 33.95system 34:41.58elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k The second run got a lot higher data rate -- 48.7 output records per second, as opposed to 4.0 the first time. Changing the SCSI ID should *not* have done that if, as you say, the tape drive and the SCSI adapter are the only devices on the bus. (Even if there were an ID collision, I'm not sure it would have manifested as a 12-times slowdown -- but then I'm not sure it wouldn't have done, either.) > "Kernel Panic: for safety > In interrupt handler -- not syncing" > > Googling around seems to suggest that this is a problem with the card/the rest > of the hardware, That's what it smells like to me. > I've noticed that there is an aic7xxx_old driver. Maybe I should be using > that instead? Its an AHA2940/aic7870. At this point, I suspect you'll get much better answers on a mailing list for your O/S. > Is there any issue with both having > parity checking on If the card and drive both support parity, absolutely you should turn it on. Enabling it on only one of them won't work, but if both devices support it, having it enabled is a lot safer than not. > Can the SDT-9000 provide the term power to > the terminator on the end of the cable or is that just for the built-in > terminator? These are some thoughts that are floating in my head. I also noticed > that the syncing is initialized by the card in the card's bios is it possible > that this is the source of the issue? 
And these are subjects for a SCSI list, or see Gary Field's excellent SCSI FAQ: http://fieldhome.net:9080/scsi_faq/scsifaq.html There's also a SCSI FAQ at: http://scsifaq.paralan.com/ but I can't offer an opinion, having only just now discovered it. DON'T use the ones at faqs.org or rtfm.mit.edu; they're Gary Field's, but ancient versions, not updated since 1998. > >Card - Start of cable (auto terminate) Try setting the card's termination explicitly in its BIOS. I've read that sometimes auto-termination gets it wrong. > ># ./filltape > >dd: writing to `/dev/nst0': Input/output error > >[...] > >/dev/nst0: No such device or address OK, that second message suggests that the kernel somehow decided it no longer had a tape drive. Not sure about that, but it's not *completely* surprising under the circumstances. Definitely grounds for a reboot. > >Than I rebooted to make sure the thing reset... > > > ># bin/filltape > >dd: opening `/dev/nst0': Permission denied But this is just plain weird. A flaky controller should *not* have anything to do with the mode on a special file. Are you sure you ran this "filltape" as root (or as someone with write permission on /dev/nst0)? > >[...] > >/dev/nst0: No such file or directory And this is weird too. It looks as though the /dev/nst0 special file itself got deleted (as opposed to the underlying hardware appearing to go away, which is what the earlier "no such device or address" suggests.) > >The jumpers on the back look like this: > >Description setting > >--- --- --- --- --- --- --- > >Disable Compression off As an aside, for Amanda it's generally better to run with compression disabled (for details, see many threads in the archives). > >SCSI ID Settings above should be ID 6, but there are no other devices on the > >system (except maybe the card itself if that has an ID). Is it safe for me to > >set it to SCSI ID 0? I went ahead and did that and it booted up okay. The only requirement is that no two devices may have the same ID. 
The card *does* have an ID, btw -- probably 7; that's the traditional value. SCSI gives priority to the device with the higher ID, if both want the bus at once. That's why tapes are often at ID 6, so that they're less likely to be starved for data, which in turn means that they're more likely to keep streaming. See "How should I set the IDs of my devices?" in Gary Field's FAQ. But with only the two devices on the bus, priority isn't an issue, so 0 and 6 are equally good. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
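Incidentally, the "X+Y records" bookkeeping in the filltape output above can be explored without a tape drive at all; dd does the same full-record/partial-record counting against an ordinary file. A quick sketch using temp files only:

```shell
tmp=$(mktemp)

# Make a 100 KB file: at bs=32k that's three full 32 KB records plus one
# 4 KB partial record, so dd should count the reads as "3+1".
dd if=/dev/zero of="$tmp" bs=1k count=100 2>/dev/null

dd if="$tmp" of=/dev/null bs=32k 2>&1 | grep 'records in'
# prints: 3+1 records in
```

On a tape in variable-length mode, a nonzero partial count on read is the sort of thing worth investigating.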
Re: degraded mode
On Fri, Dec 12, 2003 at 10:11:51AM -0500, Joshua Baker-LePain wrote: > In your situation, I would setup the config that backs up the server > itself to not use the holding disk. On a local only config, it doesn't > really buy you much speed, so why bother? A holding disk still buys you parallelism. Sure, a local full backup can provide bytes faster than a typical tape drive can consume them. But on an incremental, dump or (especially) gtar can spend a lot of time searching for the next file to back up. During that time, bytes aren't being copied; in a direct-to-tape dump, your tape drive sits idle; with a holding disk, the drive could have been writing a different DLE. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: gnutar version -- exactly 1.13.25 or just 1.13.25 and above?
On Tue, Dec 16, 2003 at 05:51:51PM -0500, Jon LaBadie wrote: > ** there still are no [gtar] point releases listed from 1.13.25 - 1.13.89 :)) > yet 1.13.90 was followed five weeks later with 1.13.91 and a week > after that 1.13.92. I've seen this before. It seems to be a new(ish) convention for labelling betas, the idea being that they're counting up towards a 1.14 release. I'm not sure what characteristics distinguish a point release (final component starting with a low digit) from a beta (yada yada high digit) -- or what they do if they need another beta after 1.13.99. I suppose both of those vary per-project. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / Furthermore, if you do not like any of [the many standards from which to choose], you can just wait for next year's model. - Andrew Tanenbaum
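If you ever need to check where one of these betas falls relative to a release, GNU sort's version comparison (the -V option, GNU coreutils) orders them the way the counting-up convention intends:

```shell
printf '%s\n' 1.14 1.13.92 1.13.25 1.13.90 1.13.91 | sort -V
# 1.13.25
# 1.13.90
# 1.13.91
# 1.13.92
# 1.14
```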
Re: database01 /export lev 0 FAILED 20031222[could not connect to database01]
On Mon, Dec 22, 2003 at 01:22:27PM -0500, Gene Heskett wrote: > On Monday 22 December 2003 12:23, Dean Pullen wrote: > ># /bin/sh ../config/mkinstalldirs /usr/bin/man/man8 > > As root, move my script out of there, and do a make clean. If the configuration is botched, "make distclean" is better, since "make clean" does NOT clean up the files created by "configure". (Note that most of what Gene's script does is to run "configure" with the right options.) On Mon, Dec 22, 2003 at 06:00:31PM -, Dean Pullen wrote: > Ok I added --mandir=/usr/share/man to the configure params and it > successfully 'make install'. The --mandir option is provided to let you put the man pages in an unusual place. Be aware that using it to force them to go where they should have gone anyway, is patching around the underlying problem, not fixing it. -- | | /\ |-_|/ > Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / It must be said that they would have sounded better if the singer wouldn't throw his fellow band members to the ground and toss the drum kit around during songs. - Patrick Lenneau
Re: amanda server on sco 5.0.6
On Mon, Dec 22, 2003 at 04:52:27PM -0500, Kurt Yoder wrote:
> I get some warnings (I culled these from the other configure messages)
>
> configure: WARNING: *** You do not have gnuplot.  Amplot will not be
> installed.

This is a package dependency.  Amplot relies on gnuplot, so if
you want the former, you gotta install the latter.  If you don't
care about amplot, no problem.  (I have no opinion as to whether
or not you *should* care about amplot :-)

> configure: WARNING: `cc' requires `-belf' to build shared libraries

Ok, what this says about the compiler couldn't be clearer; but
as for what it means for configure, I haven't a clue :-/  Is it
telling you to add "-belf" to CFLAGS?  Why can't configure do so
itself?  Or did it add them already (in which case, why's it
bothering you with the warning?)?  If all else fails, you could
always --disable-shared.  But don't do that yet; we don't yet
know if it's needed.

> configure: WARNING: netinet/ip.h: present but cannot be compiled
> configure: WARNING: netinet/ip.h: check for missing prerequisite
> headers?
> configure: WARNING: netinet/ip.h: proceeding with the preprocessor's
> result

Ignore this.  It's not something users need to worry about --
it's more a message for the Amanda developers.  (If you're
curious, see http://groups.yahoo.com/group/amanda-users/message/45004).

> configure: WARNING: *** No readline library, no history and command
> line editing in amrecover!

As with the gnuplot message, this means that an optional
prerequisite couldn't be found, but that configure has worked
around it by suppressing some functionality.  Whether it's worth
fixing is up to you.

None of these configure warnings has anything (that I can see)
to do with tape changers.

> Can I just remove all references to changer-src from the configure
> script?  I don't need a tape changer.  Would I break amanda-server if
> I tried to compile without changer-src?

Not sure; try it and see...
Re: Dump Vs Tar tradeoffs (if any)
On Tue, Dec 23, 2003 at 11:42:14AM -0500, Henson, George Mr JMLFDC wrote:
> What are the advantages or disadvantages to using tar instead
> of dump?

(This is partially brief repetition, but also contains new points.)

In dump's favour:
- The estimate phase is faster
- It doesn't change any of the timestamps of files it's backing
  up (tar doesn't change mtime either of course, but can't avoid
  changing either atime or ctime; actually, I recently read that
  Solaris provides a way, if you're root, but I don't know
  whether GNU tar takes advantage of it)
- You can do interactive restores natively.  (amrecover gives
  you the same functionality, regardless of dump vs. tar, so
  this difference *only* applies if Amanda isn't in the loop at
  restore time, or if you don't have the index files, which
  amrecover requires.)
- Dump programs are customized to the local file system's
  idiosyncrasies.  I'm guessing (but don't know) that this means
  that dump can back up system-dependent metadata that tar has
  no clue about (ACLs, Linux ext2 "chattr" flags, FreeBSD's
  "chflags" variant thereof, and the like)

In tar's favour:
- You can exclude files
- You can split a partition into multiple DLEs.  This is
  necessary if you have partitions larger than will fit on a
  single tape, since Amanda can't split a single dump onto
  multiple tapes (not yet anyway; work is in progress, hooray!).
- Dump is reported to be undependable on Linux -- Linus says so,
  anyway.  (He has a thing against dump, so doesn't see that as
  a problem, but IMO it's because Linux has deviated from
  standard UNIX in undesirable ways.  Regardless of blame,
  though, it's an issue to be dealt with.)
- Backups are portable.  The downside of every dump being
  customized to its file system is that you very likely can't
  restore a dump from platform X using platform Y's "restore".
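To illustrate the "you can exclude files" point: here's a quick, self-contained demonstration of GNU tar's exclusion support (all paths below are made up for the demo; in Amanda you'd normally express this as an exclude list in the dumptype rather than on the command line):

```shell
# Build a throwaway tree, then archive it while excluding the cache files.
rm -rf /tmp/ex-demo /tmp/ex-demo.tar
mkdir -p /tmp/ex-demo/src /tmp/ex-demo/cache
echo 'int main(void){return 0;}' > /tmp/ex-demo/src/keep.c
echo 'scratch' > /tmp/ex-demo/cache/scratch.tmp

# The pattern is matched against member names, so cache contents are skipped.
tar -cf /tmp/ex-demo.tar -C /tmp --exclude='ex-demo/cache/*' ex-demo
tar -tf /tmp/ex-demo.tar
```

The listing shows keep.c but not scratch.tmp.  Dump has no equivalent: it works at the filesystem level and takes everything.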
I've never tried cross-file-system restores on the same box
(restoring from a Solaris VXFS dump onto a Solaris ufs
partition, for example), but I imagine that whether you can get
away with it depends on the specific combination and the
specific platform.
Re: Scheduled backup being "prevented"?
On Wed, Jan 07, 2004 at 10:22:27AM -0500, Ken D'Ambrosio wrote:
> Why the heck would we get "no estimate?"

Take a look at the debug files on that client, especially the
relevant sendsize.TIMESTAMP.debug.

> Why would it say "Preventing bump... as directed"?

Looks as though someone did "amadmin force-no-bump" on that DLE.

The two symptoms seem unrelated, but there might be some
connection I'm not aware of.
Re: extremely varied tape write rates?
On Fri, Jan 09, 2004 at 11:33:26AM -0500, Kurt Yoder wrote:
> >                                 DUMPER STATS                TAPER STATS
> > HOSTNAME     DISK       L   ORIG-KB   OUT-KB  COMP%  MMM:SS    KB/s  MMM:SS    KB/s
> > -----------------------------------------------------------------------------------
> > borneo.shc -corporate      25821120 12795031   49.6   67:34  3155.9   45:57  4641.0
> > borneo.shc -cs_shared      18595830 10108659   54.4   55:42  3024.9  145:29  1158.1
> > borneo.shc -_shared_2      12075330  8126030   67.3   40:23  3353.9   30:27  4447.4
> > britain.sh /shared01       25085110 12979245   51.7   37:29  5771.6  172:00  1257.7
> > sumatra.sh //java/c$        7488930  7488930     --   70:04  1781.5   22:30  5548.0

I can't see anything here that distinguishes those two DLEs from
the other three:
- All five seem to have used the holding disk (in each case, the
  dump time and tape time are different)
- As you remarked, some of Borneo's DLEs are fast, but one is
  slow, so it doesn't look client-specific
- They're neither the biggest nor the smallest DLEs

Are you sure there's nothing different about the slow DLEs'
dumptypes that could account for it?

Another thought: one thing that isn't shown in the report email
is the order in which the DLEs were taped.  Try digging through
the amdump or log file to find that out (and, indeed, the time
of day for each one).  I don't know what you'll find, but there
might be an interesting pattern.

What I'm wondering about is resource contention on the Amanda
server; e.g. a news expire that kicks in part-way through the
amdump run, and thrashes the drive that contains your holding
disk.  Or something that saturates the bus your tape drive's on.
Or any number of other possibilities.  For that theory to fit
your observations, the same DLEs would have to be taped at about
the same time every run, which doesn't seem very Amanda-like but
could perhaps happen under the right conditions.  It might be
interesting to look at the logs from more than one run, to see
what varies and what doesn't.
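One way to dig the taping order out of the log, sketched against a fabricated two-line excerpt (the real field layout varies a bit between Amanda versions, so adjust the awk fields to match yours):

```shell
# Fabricated sample of the server's log.YYYYMMDD.N file; only the
# taper lines matter here.
cat > /tmp/log.20040109.0 <<'EOF'
START taper datestamp 20040109 label DAILY05 tape 0
SUCCESS taper borneo.shc -corporate 20040109 1 [sec 2757 kb 12795031 kps 4641.0]
SUCCESS taper britain.sh /shared01 20040109 1 [sec 10320 kb 12979245 kps 1257.7]
EOF

# Taper lines appear in the order the DLEs hit tape; print host and disk.
grep '^SUCCESS taper' /tmp/log.20040109.0 | awk '{print $3, $4}'
```

Running that over the logs from several nights would show whether the slow DLEs land at the same point in the run every time.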
Re: Ramifications of dump cycle and number of tapes choices
On Mon, Jan 12, 2004 at 06:56:25PM -0600, Fran Fabrizio wrote:
> [Assuming dumpcycle=7, runspercycle=7, runtapes=1, tapecycle=33:]
> Are the following statements correct?
>
> 1. We will have a full dump of any given partition somewhere on the most
> recent 7 tapes

Yes.  As you've figured out, precisely where is up to Amanda.

> and that to restore a file to any given date in the past
> 7 days

27 days, actually (looking ahead to question #2).  The file
might not be in the most recent dump cycle, but whichever cycle
it's in is still only 7 tapes long...

> would take at most 7 tapes/steps but on average 3.5 tapes/steps.

In your case, I think the numbers are 4 and very roughly 2,
resp.  Amanda doesn't always "bump" a given DLE to the next dump
level; whether it does is determined by how much tape it would
save by doing so (see amanda.conf parameters "bumpsize",
"bumpmult", and "bumpdays").  With your parameters, and with the
default bumpdays value of 2, the maximum number of tapes for a
restore is 4:

    Day:   1 2 3 4 5 6 7 8
    Level: 0 1 1 2 2 3 3 0

(With bumpdays=1, you'd be right; the maximum would be 7.)

As for the average number of tapes...  The vast majority of DLEs
never get above level 1 or 2 in my experience; their dump
histories look more like "011011..." or "01101112...".  Thus, as
an estimate of the average number of tapes to be read for a
restore, (maximum_possible_dumplevel/2) is on the high side (in
your case, max/2 = 2 is pretty close, but that's purely by
accident).  (maximum_actual_dumplevel/2) is probably low; the
DLE spends a lot more days at the maximum dump level than it
does climbing up to it.  There are too many variables, I think,
to estimate the average number of tapes without looking at your
actual dump history, though the two formulas above might serve
as *very* rough bounds.
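The counting above can be checked mechanically.  A restore as of day i needs the most recent dump at each level below the current one, down to the level 0.  Here's that rule in plain awk, applied to the bumpdays=2 schedule above and to a typical DLE that never leaves level 1:

```shell
out=$(awk '
    # tapes(levels, i): level-0 dump plus the latest dump at each lower level
    function tapes(levels, i,   j, low, n) {
        n = 1; low = levels[i]
        for (j = i - 1; j >= 1 && low > 0; j--)
            if (levels[j] < low) { n++; low = levels[j] }
        return n
    }
    BEGIN {
        split("0 1 1 2 2 3 3", bump2, " ")   # bumpdays=2 schedule from the text
        split("0 1 1 1 1 1 1", flat1, " ")   # DLE that stays at level 1
        max = 0; sum = 0
        for (i = 1; i <= 7; i++) {
            t = tapes(bump2, i); if (t > max) max = t
            sum += tapes(flat1, i)
        }
        printf "max=%d avg=%.1f", max, sum / 7
    }')
echo "$out"     # -> max=4 avg=1.9
```

Which reproduces both the maximum of 4 tapes and the 1.9 average claimed below for level-1-only DLEs.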
My estimate of 2 for your average is very much
back-of-the-envelope guesswork:
- for all those DLEs that never get above dump level 1, the
  average is 1.9 ((1 + 2 + 2 + 2 + 2 + 2 + 2) / 7).
- for the few DLEs that stop at dump level 2, the average is 2.4
  ((1 + 2 + 2 + 3 + 3 + 3 + 3) / 7), but only approximately; the
  DLE might bump to level 2 later, making the average a bit lower
- for the *very* few DLEs that go to level 3 or above, the
  average will be higher still
- but the level-1 DLEs outnumber all the rest combined, so 2
  seems as good a guess as any

But then, all of this pseudomathematical blather only applies to
restores of many files (full-DLE disaster recovery, or a
user-requested restore of an entire directory).  For a
single-file restore, you should only need to read *one* tape.
Amrecover will figure out in advance which tape the desired file
lives on, so it won't need to search the rest.

> 2. We can retrieve a file as it existed on any date in the past 27 days,
> and possibly as it existed on days 28-33

This looks right to me.

> Hence, it seems you are guaranteed to be able to retrieve any
> file as it existed 27 or less days ago.

"Almost guaranteed" :-(  Be aware that in a panic situation, a
full backup can be postponed to a run after the one where it
should have been done ("delayed" is the word Amanda actually
uses).  If that happens, there will come a day or two, a month
or so hence, when you can't quite meet your 27-day guarantee.
Amanda tries hard to avoid delaying full backups, but it can
happen, due to things like tape errors, the operator failing to
mount the right tape (shouldn't be an issue for you), tape
filling up before it was expected to, and possibly other
circumstances that aren't coming to mind right now.
Re: planner schedule ?
On Wed, Jan 14, 2004 at 03:33:02PM -0500, Brian Cuttler wrote:
> I'm not sure its "planner", what component handles which order DLEs
> are handled in ?

Planner and driver both contribute, I believe:
- Planner offers a suggested ordering (e.g. it gives priority to
  a DLE that's overdue for a level 0)
- Driver determines dynamically which dump(s) it can start at
  any given time, based on your "dumporder" amanda.conf
  parameter, the ordering suggested by planner, and on available
  resources -- holding disk space, network bandwidth, number of
  dumps already running on that client, etc.

(N.B.: Once a dump has been started, it runs flat-out; Amanda
does no further throttling.)

> I've got a partition of 54Gig, of which about 36 Gig are in use.
> I've an amanda spool of 35 Gig. Fortunately for us compression
> seems to be working pretty well as we are able to use the work/spool
> area rather than going directly to tape.
>
> Got me wondering though, is amanda saving this DLE for last so that
> it can utilize all of the spool area by itself or where we just lucky ?

You're *not* just lucky.  I don't think Amanda makes a point of
specifically "saving this DLE for last" (unless your "dumporder"
tells it to); that's just how things work out in your case.  But
it does make a point of saving the DLE until there's
holding-disk budget for it -- and, once the DLE in question has
been started, Amanda holds off starting others if they'd
overbook the holding disk.  (It'd be driver making that sort of
decision, btw, not planner.)

> Since this DLE is running 'alone', are we really gaining any performance
> over running in degraded mode ?

I presume you mean "over running in direct-to-tape mode";
degraded mode is something different.  There probably is a gain.
When you dump to disk, taper is typically (these days, I'd
venture to guess almost certainly) able to provide data (from
the holding-disk file) as fast as the tape drive needs it; but
when you dump direct to tape (especially over the network), that
might not be possible.  In the latter case, the tape transport
has to stop and start as data becomes available -- some people
here call that "shoeshining the drive".  That reduces the
overall transfer rate, and depending on the tape technology,
possibly (probably?) the amount of data that will fit on the
tape.

Besides, now that you've added a bunch of spool space, the DLE
will no longer necessarily be running alone...
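For reference, the two knobs mentioned above live in amanda.conf.  This is a hypothetical fragment -- the directory, sizes, and dumporder string are examples only; see amanda(8) for the exact semantics:

```
# Hypothetical amanda.conf fragment (values are examples, not advice)
holdingdisk hd1 {
    directory "/amanda/spool"
    use 35 Gb          # budget driver may allocate before going direct-to-tape
}
# One letter per concurrent dumper; e.g. "s" = prefer smallest dumps first,
# "S" = largest first.  Check amanda(8) for your version's letter codes.
dumporder "sssS"
```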
Re: dump incremental level not greater than 1
On Thu, Feb 05, 2004 at 04:15:24PM +0100, [EMAIL PROTECTED] wrote:
> in our environment there is one linux system (SuSE 8.1 amanda-2.4.4-41)
> where amanda not increase the dump level to 2,3...

Presumably the bumpsize/bumpmult/bumpdays criteria are never
satisfied.  See the documentation of those three amanda.conf
parameters in amanda(8).
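For anyone hunting for them, the three parameters sit at the top level of amanda.conf.  The values below are hypothetical; the defaults and precise meanings are in amanda(8):

```
# Hypothetical amanda.conf fragment -- example values only
bumpsize 20 Mb    # minimum tape savings before a bump to the next level
bumpmult 2        # bumpsize is scaled by this for each further level
bumpdays 1        # days a DLE must sit at a level before bumping again
```

With a large bumpsize (or the incrementals simply staying small), the savings threshold is never met and the DLE stays at level 1 indefinitely, which matches the symptom above.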
Re: Concurrancy - dumping partitions
On Tue, Feb 03, 2004 at 10:40:45AM -0500, Brian Cuttler wrote:
> At this point we discovered "netusage" was set to a very low value
> (1200 though I'm uncertain of the units).

The units are bytes/second (*not* bits/second), adjusted of
course by whatever multiplier you specify.

> Note 1 on this, I was surprised that with server = client this was
> a factor.

Amanda has a more general way of dealing with this.  You can put
multiple "interface" sections in amanda.conf, with different
capacities; then specify for each DLE which interface its data
will travel over.  (The "interface" sections are intended to
correspond to the NICs in your server, but Amanda doesn't try to
enforce this.)  So you can just create an "interface" section
with a really high capacity, and use that for the server's DLEs.

As an aside, I recently set our Amanda server's netusage way
higher than its NIC's capacity (a factor of ten, to be specific;
that's how I discovered what the units are :-).  The Dump Time
reported in the nightly emails went through the roof, but the
interesting part is that the Run Time wasn't much affected --
maybe even a small decrease, though I don't recall for sure.
That surprised me at first, but it makes sense with a bit of
thought.
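A hypothetical sketch of what that looks like in practice -- the interface names and capacities here are made up, and the unit rules are in amanda(8):

```
# Hypothetical amanda.conf fragment: one section per real NIC, plus a
# generously-sized pseudo-interface for DLEs local to the server.
define interface eth0 {
    comment "the real NIC"
    use 10000 kbps
}
define interface local {
    comment "server backing itself up -- no real wire involved"
    use 10000000 kbps
}

# Then in the disklist, the interface goes in the optional last field:
#   hostname  diskname  dumptype  [spindle  [interface]]
# server  /home  comp-user-tar  -1  local
```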
Re: Concurrancy - dumping partitions
On Fri, Feb 06, 2004 at 04:29:36PM -0500, Eric Siegerman wrote:
> You can put
> multiple "interface" sections in amanda.conf, with different
> capacities; then specify for each DLE which interface its data
> will travel over.

I should have made clear that all this is purely descriptive,
not *pre*scriptive.  That is, whatever you do with "interface"
sections, DLEs, etc., will *not* affect which NIC the packets
actually travel through.  That's well beyond Amanda's control;
it's up to the kernel's TCP/IP stack as usual.  The only purpose
here is to tell Amanda what's going on, so that it can budget
the various NICs' bandwidth appropriately.
Re: Encryption
[CCing the author of the web page in question, in case she's no
longer subscribed]

On Tue, Jan 27, 2004 at 09:19:26AM +0100, Paul Bijnens wrote:
> Long ago, I bookmarked this page:
>
> http://security.uchicago.edu/tools/gpg-amanda/
>
> but I never tried it myself...

Long ago, it seems I had a small amount of input into it.  Funny
how that happens :-)  But I've never tried it either.

Looking at it now, I see that the basic approach is to have the
Amanda client do compression, with the "compression" program
(GZIP= environment variable to "configure") being a script that,
during backups, does essentially "gpg -e | gzip", and during
restores does the inverse.

The gzip step in this pipeline is a pointless waste of CPU time,
and may make the backup *larger*; by design, encrypted data is
supposed to resemble random data, and so it compresses very
poorly.  (If yours compresses well, I'd be *very* worried about
how secure your encryption is!)  Try removing the gzip step
entirely, and reducing it to just the gpg command.  (gpg
compresses the data internally, for the sake of better
encryption, so putting a gzip step *before* gpg is equally
pointless.)
Re: suggestions for a backup scheme?
On Fri, Jan 23, 2004 at 09:37:23AM -0500, Greg Troxel wrote:
> Then, have a cron job that copies in either amanda.conf.full or
> amanda.conf.incr to amanda.conf before the dump runs.

Our approach is to have multiple Amanda configurations with the
same disklist (using symlinks).  The crontab entry runs a script
that in turn runs amdump with the appropriate "configuration"
argument.

> With this (to answer Jon's query), you meet your imposed requirements,
> and you still get multi-machine backups, holding disk, concurrency,
> and the ability to find backups.  Making amanda do what you want is
> vastly easier than writing your own scripts from scratch.

Agreed.  There's a lot more to Amanda than just its scheduling
policy, and it does not seem to me prima facie ridiculous to
want to use it with a different policy, and to be frustrated at
how hard that can be.
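The symlink arrangement is simple enough to sketch concretely.  /tmp/etc/amanda stands in for the real config directory, and the disklist line is a made-up example:

```shell
# Two configurations, "full" and "incr", sharing one disklist.
rm -rf /tmp/etc/amanda
mkdir -p /tmp/etc/amanda/full /tmp/etc/amanda/incr
echo "client.example.com  /home  comp-user-tar" > /tmp/etc/amanda/full/disklist
ln -s ../full/disklist /tmp/etc/amanda/incr/disklist

# cron then just varies the configuration argument:
#   45 0 * * 0    /usr/local/sbin/amdump full
#   45 0 * * 1-6  /usr/local/sbin/amdump incr
```

Editing full/disklist updates both configurations at once, which is the whole point.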
Making two copies of a backup
[I'm CCing amanda-hackers because the answer to my question
might depend heavily on Amanda internals; but the discussion
doesn't belong there, so please reply to amanda-users.]

I want to make two identical copies of an Amanda backup.  This
is a one-off thing -- archival backups of a client that's about
to be wiped clean and repurposed.  If it were an ongoing need,
I'd ask for budget for a second tape drive and learn about rait.

I have enough holding-disk space to hold all of the client's
DLEs at once, so what I'm thinking is this:

1. Build a new Amanda configuration that backs up only the
   client in question (reserve=0; also record=no to be on the
   safe side)
2. Run the configuration with no tape in the drive, forcing all
   the (full) backups to holding disk
3. Hard-link the holding-disk files to another directory that
   Amanda doesn't know about
4. Run amflush
5. Hard-link the holding-disk files back to the Amanda spool
   directory (it's pure paranoia that I choose not to mv them
   instead and thus dispense with step 7)
6. Run amflush again
7. Delete the holding-disk files from the other directory

Does this look like a reasonable approach?  My main worry is
that the curinfo database and multiple amflushes of the same
data won't get along with each other.  Is that likely to be a
problem?
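The hard-link mechanics behind steps 3 and 5 can be demonstrated in isolation (the /tmp paths and file name below are stand-ins for the real holding disk): hard links cost no extra space, and removing one name leaves the other intact, which is exactly why amflush deleting its copy doesn't lose the data.

```shell
# Fake a holding-disk file, stash a second name for it, then simulate
# amflush removing the original after taping and step 5 restoring it.
rm -rf /tmp/holding /tmp/stash
mkdir -p /tmp/holding/20040214 /tmp/stash
echo "dump data" > /tmp/holding/20040214/client._usr.0

ln /tmp/holding/20040214/client._usr.0 /tmp/stash/client._usr.0   # step 3
rm /tmp/holding/20040214/client._usr.0        # what amflush does after taping
ln /tmp/stash/client._usr.0 /tmp/holding/20040214/client._usr.0   # step 5
```

After the second ln, both names again point at the same inode, ready for the second amflush.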
Re: amverify fails for large (?) samba share
On Wed, Feb 11, 2004 at 04:06:15PM +0100, Vincent GRENET wrote:
> [error from amverify]
> The amanda server runs RHS 7.3, amanda 2.4.2p2, gnu tar 1.13.19, samba
> 2.0.7.

Is that gtar old enough to be a problem, or is 1.13.19 known to
work?
Re: "Unable to create temporary directory" message
On Wed, Feb 11, 2004 at 05:16:59PM -0600, Fran Fabrizio wrote:
> ? Unable to create temporary directory in any of the directories listed
> below:
> [...]

It's ufsdump saying this.  The leading "?"s are a clue -- Amanda
prepends them to messages from the dump program that it doesn't
recognize (they ultimately appear in the report message,
labelled as "strange").  I confirmed it by finding the message
in the output of "strings /usr/sbin/ufsdump".

I don't know why it's happening.  Does ufsdump run as the Amanda
user or as root?  (Check the first couple of lines in one of the
sendbackup.* debug files.)  If the former, well, you've checked
that that should be ok.  If it's root -- about the only reasons
I can think of that root couldn't create a file are that the
filesystem's mounted readonly (totally bogus for /tmp of
course!) or NFS-mounted with permissions that don't allow
*world* write (again very weird for /tmp and probably broken,
but just barely conceivable).
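Concretely, the check looks something like this.  The debug-file contents below are fabricated (and the default location varies; /tmp/amanda is common), but the ruid/euid fields on the first line are what you're after:

```shell
# Fabricated sample of a client-side debug file.
cat > /tmp/sendbackup.20040211.debug <<'EOF'
sendbackup: debug 1 pid 12345 ruid 37 euid 37: start at Wed Feb 11 17:16:59 2004
sendbackup: got input request: DUMP /export 0 1970:1:1:0:0:0
EOF

# Pull out the real uid the process started with; look it up in /etc/passwd.
head -1 /tmp/sendbackup.20040211.debug | grep -o 'ruid [0-9]*'
```

If the uid printed is the Amanda user's rather than 0, ufsdump is not running as root, and the /tmp permissions as seen by that user are the place to look.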
Re: big filemark
On Thu, Feb 12, 2004 at 12:14:45PM -0500, Jon LaBadie wrote:
> Well I think we have a new champion filemark.
> filemark 534180 kbytes

That's only 1.9% away from 512 (binary) MB -- I suspect within
amtapetype's margin of error.  I wonder if that drive writes
fixed 512 MB physical blocks, doing the appropriate buffering
internally.  Not that it matters much, as Jon said; just idle
curiosity.
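The 1.9% figure checks out with a one-liner's worth of arithmetic (512 binary MB = 512 * 1024 KB):

```shell
half_gib_kb=$((512 * 1024))               # 524288 KB
diff_kb=$((534180 - half_gib_kb))         # 9892 KB over
pct=$(awk "BEGIN { printf \"%.1f\", 100 * $diff_kb / $half_gib_kb }")
echo "filemark is ${pct}% above 512 MB"   # -> 1.9%
```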