Re: Will retry behavior again
On Thu, Sep 01, 2005 at 02:40:14PM -0400, [EMAIL PROTECTED] wrote:
> Also, I'm positive that we had enough space on the holding disk to hold
> this particular FS. Why did it start directly to tape (the log's very
> first line)?

Have you adjusted the holding disk "reserve" parameter? By default it reserves 100% of the holding disk for incrementals in case of degraded operation (e.g. no tape available).

In my case, the holding disk can store several days' worth of normal backups, full and incremental, so I've adjusted the reserve down to 10 or 20%.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
JG Computing
4455 Province Line Road  (609) 252-0159
Princeton, NJ 08540-4322  (609) 683-7220 (fax)
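To make the effect of the reserve setting concrete, here is a minimal sketch of the arithmetic in Python. It assumes, as described in the message above, that the reserve percentage is set aside for incrementals only during degraded operation. The function name and the 200 GB size are made up for illustration and are not part of Amanda.

```python
def space_for_fulls_in_degraded_mode(holding_kb: int, reserve_pct: int) -> int:
    """Holding-disk space (KB) left for full dumps when no tape is available.

    Assumption: reserve_pct percent of the holding disk is kept back for
    incrementals, so full dumps may only use the remainder.
    """
    if not 0 <= reserve_pct <= 100:
        raise ValueError("reserve_pct must be between 0 and 100")
    return holding_kb * (100 - reserve_pct) // 100


if __name__ == "__main__":
    holding_kb = 200 * 1024 * 1024  # hypothetical 200 GB holding disk
    for pct in (100, 20, 10, 0):
        print(pct, space_for_fulls_in_degraded_mode(holding_kb, pct))
```

With the default of 100% nothing is left for full dumps in degraded mode; lowering the reserve to 10 or 20% frees most of the disk for them.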
Re: Will retry behavior again
Actually it was set to 0, since it is supposed to be an archival dump (no incrementals).

Jon LaBadie [EMAIL PROTECTED] wrote:
> On Thu, Sep 01, 2005 at 02:40:14PM -0400, [EMAIL PROTECTED] wrote:
> > Also, I'm positive that we had enough space on the holding disk to hold
> > this particular FS. Why did it start directly to tape (the log's very
> > first line)?
>
> Have you adjusted the holding disk "reserve" parameter? By default it
> reserves 100% of the holding disk for incrementals in case of degraded
> operation (e.g. no tape available).
>
> In my case, the holding disk can store several days' worth of normal
> backups, full and incremental, so I've adjusted the reserve down to
> 10 or 20%.
>
> -- 
> Jon H. LaBadie  [EMAIL PROTECTED]
> JG Computing
> 4455 Province Line Road  (609) 252-0159
> Princeton, NJ 08540-4322  (609) 683-7220 (fax)
Re: Will retry behavior again
[EMAIL PROTECTED] wrote:
> ...
> taper: wrote label `weekly3' date `20050901'
> dumper: kill index command
> driver: result time 39992.417 from dumper0: FAILED 00-00029 [data write: Connection reset by peer]
> driver: result time 39992.417 from taper: TRY-AGAIN 00-00029 [writing file: No space left on device]
> driver: error time 39992.429 serial gen mismatch
> ^^^
> -- so, AMANDA advanced to the next tape, the taper request to retry was
> actually made, but a serial gen mismatch has happened.

The driver keeps a table of which dumper is handling what. When it receives a command it also checks the table dumper-number-to-current-filesystem-it-is-handling. The 00029 is the generation number. Fine.

First it receives a FAILED from dumper0, and thus it frees the table entry, effectively setting the generation number to 0 to indicate it's doing nothing. Less than a millisecond later (both results carry the timestamp 39992.417), it receives a TRY-AGAIN from taper, which refers to the same generation number, but that entry was freed just before. So amanda says that it received a command for which the generation number did not match.

OK, that explains the error message and what it means.

The strange thing above seems to be the order of the events. When bumping into EOT, I would expect the sequence:

- First taper bumps into end of tape:
    taper: TRY-AGAIN 00-00029 [...No space left on device]
- then driver says to the port-dumper: kill whatever you're doing
    driver: ABORT 00-00029        <- This command is missing above!!!
    dumper: kill index command

But the kill index comes in first, followed by driver saying it failed here, then followed by taper saying the tape is full. From this sequence, it seems amanda made the correct decision not to retry what taper instructed, because dumper signalled a fatal error first. Why would that happen??? I don't know.

> I searched google for the phrase - and did not find anything helpful
> about this error. Does anybody know what this error means and how to
> deal with it?
> Also, I'm positive that we had enough space on the holding disk to hold
> this particular FS. Why did it start directly to tape (the log's very
> first line)?

Shot in the dark: maybe a "holdingdisk no" in the dumptype? See the output of "amadmin weekly disklist hercules".

Another shot in the dark: what version of amanda is the server? Older versions also had a notion of negative chunksize: dumps larger than the absolute value of chunksize were port-dumped too. Maybe you have a negative chunksize? The same older versions of amanda could also port-dump when chunksize was omitted.

This is all from memory; I don't even have an old man page around (except in my archive backups :-) Amanda 2.4.2 already has no more support for negative chunksizes (and warns if you use them).

-- 
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************
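The generation-number bookkeeping described in this message can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not Amanda's actual driver code: the class and function names are invented, and it assumes a serial such as 00-00029 splits into a table slot (00) and a generation number (00029), with generation 0 meaning "slot is free".

```python
class SerialTable:
    """Toy stand-in for the driver's table; generation 0 means 'free'."""

    def __init__(self):
        self.gen = {}  # slot number -> generation currently assigned

    def assign(self, slot, gen):
        self.gen[slot] = gen

    def free(self, slot):
        self.gen[slot] = 0

    def matches(self, slot, gen):
        return gen != 0 and self.gen.get(slot, 0) == gen


def handle_result(table, result, serial):
    """Process one result message and say what the toy driver does."""
    slot, gen = (int(part) for part in serial.split("-"))
    if not table.matches(slot, gen):
        return "serial gen mismatch"
    if result == "FAILED":
        table.free(slot)  # the dump is given up, so the slot is released
        return "failed"
    if result == "TRY-AGAIN":
        return "retry"
    return "ok"


if __name__ == "__main__":
    table = SerialTable()
    table.assign(0, 29)
    # Same order as in the log: FAILED from dumper0, then TRY-AGAIN from taper.
    for sender, result in (("dumper0", "FAILED"), ("taper", "TRY-AGAIN")):
        print(sender, result, "->", handle_result(table, result, "00-00029"))
```

In this model, swapping the two messages (TRY-AGAIN first, FAILED second) gives a retry instead of a mismatch, which is why the order of events in the log matters.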
Re: Will retry behavior again
Paul,

we are running amanda v.2.4.5 on FC3. These are forced full dumps (dumplevel 0). I do not use a negative chunksize, but I do use a negative "usedisk" (which sets how much of the disk must stay unoccupied by disk images). Do you think it may be somehow related?

Also, I double-checked: "reserve" is set to 0, and "holdingdisk" was at its default (meaning I did not touch it at all before this last run; I have now set it explicitly to "yes"). We also cleared some HDD space, and I'm going to create a second holding disk. Is there anything you (all) can think of that I must do before the next weekly run?

> The strange thing above seems to be the order of the events. When bumping
> into EOT, I would expect the sequence:
>
> - First taper bumps into end of tape:
>     taper: TRY-AGAIN 00-00029 [...No space left on device]
> - then driver says to the port-dumper: kill whatever you're doing
>     driver: ABORT 00-00029        <- This command is missing above!!!
>     dumper: kill index command

Here is what I found strange:

1) it was taping to tape 2 (weekly2);
2) when it received the NODEVICE, it moved on to weekly3, which is a new tape(!) with no dumps written to it so far; and then
3) kill index, FAILED, TRY-AGAIN, mismatch.

I would expect the sequence to be:

1) tape weekly2: NODEVICE(?), FAILED
2) load tape weekly3
3) TRY-AGAIN, whatever...

The same happened for another DLE, when amanda moved from weekly1 to weekly2: that DLE failed as well. The saddest part is that the biggest DLEs tend to fail.

I really do not want to split this configuration into two, and I am currently thinking about writing a script which would check whether any DLE has failed, adjust the configuration, and force the dump on the same day. I realize this sounds awkward, but I do not see any other solution. Do you?

Thanks,
Vera
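As a rough starting point for the kind of check script mentioned above, the following Python sketch scans log lines for FAILED and TRY-AGAIN results. The line format is an assumption taken only from the excerpt quoted earlier in this thread ("driver: result time ... from dumper0: FAILED 00-00029 [...]"); the function name is hypothetical, and mapping a serial back to its DLE, adjusting the configuration, and forcing the dump are deliberately left out.

```python
import re

# Assumed line shape, copied from the log excerpt quoted in this thread:
#   driver: result time 39992.417 from dumper0: FAILED 00-00029 [data write: ...]
RESULT_RE = re.compile(
    r"driver: result time (?P<time>[\d.]+) from (?P<source>\w+): "
    r"(?P<result>FAILED|TRY-AGAIN) (?P<serial>\d+-\d+) \[(?P<reason>.*)\]"
)


def find_problem_results(lines):
    """Return (source, result, serial, reason) for each matching line."""
    found = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            found.append(
                (match["source"], match["result"], match["serial"], match["reason"])
            )
    return found


if __name__ == "__main__":
    sample = [
        "taper: wrote label `weekly3' date `20050901'",
        "driver: result time 39992.417 from dumper0: FAILED 00-00029 "
        "[data write: Connection reset by peer]",
        "driver: result time 39992.417 from taper: TRY-AGAIN 00-00029 "
        "[writing file: No space left on device]",
    ]
    for entry in find_problem_results(sample):
        print(entry)
```

A real script would read the lines from the actual log file of the run and would need to be checked against that file's format before being trusted.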