Re: Amanda is crashing on [runtar invalid option: -]
On Thu, Mar 23, 2023 at 2:07 AM Olivier wrote:
> Hello,
>
> I have had Amanda running for over a decade, yesterday I had no issue at
> all but last night, my backups for Ubuntu machines started crashing
> consistently with the error:
> strange(?): runtar: error [runtar invalid option: -]

The just-released amanda package upgrade seems to have a regression for GNUTAR DLEs; see:
https://bugs.launchpad.net/debian/+source/amanda/+bug/2012536/

Nathan
Re: Degraded dump in amanda 3.5.2
On Fri, Dec 30, 2022 at 11:34:14 -0300, Pablo Venini wrote:
> amcheck doesn't report errors

Hmmm. As Stefan said, the key question is why Amanda is going into degraded mode. Normally when I have that happen it's because the target tape wasn't available at the start of the amdump run, but that doesn't seem to be the situation in your case.

But it seems like buried somewhere in your logs should be some further explanation of why Amanda is switching to degraded mode (apparently mid-run).

Nathan

--------
Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Degraded dump in amanda 3.5.2
On Thu, Dec 29, 2022 at 17:17:56 -0300, Pablo Venini wrote:
> Hi, I'm setting up a new backup server with amanda 3.5.2 on CentOS 7
> with vtapes. I've setup the vtapes directories, created the job and
> added the dlc, checked permissions, then run amcheck and it shows no
> errors. However when I run amdump, only one of the dlc gets backed
> up, the other ones give a "can't do degraded dump without holding
> disk" error. The config is:

(I take it you are not attempting to configure any holding disk?)

What does "amcheck" report?

Nathan
Re: what leads to a "new disk" ?
On Thu, Dec 01, 2022 at 10:18:03 +0100, Stefan G. Weichinger wrote:
>
> I have an installation where I didn't add or remove DLEs for a long time.
>
> But now an then amanda seems to "forget" a DLE and come up with
> something like:
>
> samba.intra rootfs lev 0 FAILED [dumps too big, 42606931 KB, but
> cannot incremental dump new disk]
>
> The DLE is NOT new. Where does that come from?

Looks like the source file server-src/planner.c generates that message if the "last_level" data element for the DLE is negative...

What does "amadmin <config> info <host> <disk>" report for that DLE (during the period when you are getting this message, i.e. before the next successful full dump takes place)?

Nathan
Re: New Amanda Community release 3.5.2 Has Arrived! -- email bounce
On Tue, Sep 13, 2022 at 16:11:15 -0400, Nathan Stratton Treadway wrote:
> On Tue, Sep 13, 2022 at 17:29:41 +0200, Stefan G. Weichinger wrote:
> > I received:
> >
> > "Your message to chris.hass...@betsol.com couldn't be delivered.
> >
> > Chris.Hassell wasn't found at betsol.com."
>
> (I just tried sending email to this email address. My message was
> accepted for delivery by the Betsol mail server, and I haven't received
> any bounce message back [after waiting a few minutes]. So hopefully the
> bounce you saw was just a temporary misconfiguration on the Betsol mail
> server...(?) )

Ack -- I just discovered that my 9/13 test message did result in a bounce message after all. (The bounce message went to my spam folder.)

I tried again just now, and his email address still bounced. So it would seem that Chris is indeed no longer at Betsol :(

Nathan
Re: New Amanda Community release 3.5.2 Has Arrived! -- email bounce
On Wed, Sep 14, 2022 at 13:32:08 +0200, Stefan G. Weichinger wrote:
> Am 13.09.22 um 22:11 schrieb Nathan Stratton Treadway:
>
> >(I just tried sending email to this email address. My message was
> >accepted for delivery by the Betsol mail server, and I haven't received
> >any bounce message back [after waiting a few minutes]. So hopefully the
> >bounce you saw was just a temporary misconfiguration on the Betsol mail
> >server...(?) )
>
> Interesting, thanks for testing.
>
> Did you get a reply already?

No, I haven't received any reply... (but also have not received any bounce message).

> Would be great to know if someone at Betsol is responding to the
> community, and when we see installable packages from them.

(Yep, agreed.)

Nathan
Re: New Amanda Community release 3.5.2 Has Arrived! -- email bounce
On Tue, Sep 13, 2022 at 17:29:41 +0200, Stefan G. Weichinger wrote:
> I received:
>
> "Your message to chris.hass...@betsol.com couldn't be delivered.
>
> Chris.Hassell wasn't found at betsol.com."

(I just tried sending email to this email address. My message was accepted for delivery by the Betsol mail server, and I haven't received any bounce message back [after waiting a few minutes]. So hopefully the bounce you saw was just a temporary misconfiguration on the Betsol mail server...(?) )

Nathan
Re: New Amanda Community release 3.5.2 Has Arrived!
On Tue, Aug 02, 2022 at 12:30:07 -0400, gene heskett wrote:
> And where do I get the debian approved versions of 3.5.2?

That's what Jose is attempting to create now... (So watch this thread for news.)

Nathan
Re: "overdue" wrong
On Thu, Jun 02, 2022 at 10:04:16 +0200, Stefan G. Weichinger wrote:
> Overdue 19140 days: server:dle007
[...]
> Dumps: lev datestmp tape file origK compK secs
> 0 19700101 vtape-007-1 14 -1 -1 -1

Sure enough, 1970/1/1 is 19145 days ago, so the two utilities are consistent :)

> The DLEs are dumped OK, though.

The amadmin info shows that this dump has no size (in addition to the "zero" date), so somehow the amanda history is not recording a successful dump.

Can you make sure that the dump was in fact written successfully all the way to tape?

Nathan
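The day count quoted above can be checked with a few lines of arithmetic. This is only a sketch of that check; the helper name `days_since_epoch` is made up for illustration and is not part of Amanda.

```python
from datetime import date

def days_since_epoch(day: date) -> int:
    """Whole days elapsed between 1970-01-01 and the given date."""
    return (day - date(1970, 1, 1)).days

# Date of the reply above (Thu, Jun 02, 2022).
print(days_since_epoch(date(2022, 6, 2)))  # 19145
```

By the same arithmetic, 19140 days after 1970-01-01 is 2022-05-28, so the five-day gap would be consistent with the "Overdue" line having been generated a few days before the reply (that reading is an assumption, not something stated in the thread).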
Re: amrecover usage with chg-robot
On Fri, May 27, 2022 at 15:54:50 +0200, Stefan G. Weichinger wrote:
>
>
> forgot pstree:
>
> -tmux: server-+-bash---amrecover-+-amandad-+-amandad
> >| | `-amindexd
> >| |-amandad-+-amandad
> >| | `-amidxtaped-+-exuvo_crypt---openssl
> >| | `-2*[{amidxtaped}]
> >| |-gzip
> >| |-tar
> >| `-{amrecover}

Ah, so exuvo_crypt is run by the amidxtaped process rather than by the amrecover process itself.

What does strace show amrecover is doing during this period?

And "ps -ef" shows that the openssl process is still alive (i.e. not defunct). What does "strace" show on that process? If you manually kill it, does the chain of processes up through amidxtaped unwind and amrecover resume normal processing?

Nathan
Re: amrecover usage with chg-robot
On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote:
> After that both tar and gzip.binary are shown as <defunct> in ps,
> whatever that means.

Okay, that's a little progress in the investigation.

"<defunct>" means that the process has exited, but the return code from the process has not been read by the parent process yet. So in this case, whatever process spawned the tar and gzip subprocesses is not "noticing" when the subprocesses finish... the question is why (and what is it stuck doing instead of cleaning up)?

Are the "openssl enc" and/or encryption-wrapper-script processes still out there at this time (and what state are they in)?

You should be able to use pstree or "ps -ef" to determine which process is the parent (PPID column) of the defunct subprocesses.

Nathan
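As a rough illustration of the PPID lookup described above, the sketch below picks the parents of defunct processes out of text in the layout printed by `ps -eo pid,ppid,stat,comm`. The function name and the sample listing are hypothetical; the PIDs are invented and are not taken from the system discussed in this thread.

```python
from typing import Dict, List

def parents_of_defunct(ps_output: str) -> Dict[int, List[str]]:
    """Map parent PID -> command names of its defunct (zombie) children.

    Expects text laid out like `ps -eo pid,ppid,stat,comm`, header included.
    """
    parents: Dict[int, List[str]] = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        pid, ppid, stat, comm = line.split(None, 3)
        if stat.startswith("Z"):  # "Z" in the STAT column marks a zombie
            parents.setdefault(int(ppid), []).append(comm.strip())
    return parents

# Invented sample listing, for illustration only.
SAMPLE = """\
  PID  PPID STAT COMMAND
  100     1 Ss   amandad
  200   100 S    amidxtaped
  300   200 Z    tar <defunct>
  301   200 Z    gzip <defunct>
  302   200 S    openssl
"""

print(parents_of_defunct(SAMPLE))
```

In the invented sample, both zombies point back at PID 200, which is the process one would then inspect with strace or lsof to see what it is waiting on.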
Re: amrecover usage with chg-robot
On Wed, May 25, 2022 at 17:09:22 +0200, Stefan G. Weichinger wrote:
>
> So to me it looks that my dumptype with both compression and
> encryption is the problem.
>
> I use the script provided by Anton "exuvo" Olsson, he shared it in
> earlier threads here.
>
> The current iteration on this server:
>
> https://dpaste.org/2YrkJ
>
> Maybe it hasn't yet been tested with amrecover from multiple tapes?
>
> Or the combination with gzip is a problem.

I haven't used encryption with Amanda so I don't have anything specific to suggest.

Off hand I don't see anything obviously incorrect with that script. (Well... in the encrypt-operation case it writes the contents of "$@" to the /tmp/encryptparams file but that file doesn't ever appear to be referenced... but the parameters don't appear to be referenced in the decrypt-operation case either, so I don't expect that aspect of the script is related to your problem.)

My next step would be to investigate the status of the subprocesses during the period where amrecover seems to be hung, i.e. using ps, lsof, and strace. I'm guessing the "openssl enc -d" process doesn't exit for some reason; can you identify what it's trying to do or waiting for?

If you see that process out there but just sleeping (i.e. using no CPU, and strace shows it's just stuck waiting in a "read" syscall or something), what happens if you manually kill the process (i.e. does amrecover then proceed to its next step)?

Nathan
Re: amrecover usage with chg-robot
On Wed, May 25, 2022 at 14:50:00 +0200, Stefan G. Weichinger wrote:
> Currently I have another amrecover running. It restored from tape1
> .. and now I only see these lines in the current debug file
> "amidxtaped.20220525123652.debug":
>
> Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb
> Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb
> Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb
>
> ... for hours now.
> Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?

When doing an extract, tar does read on to the end of the tar file before exiting, but "hours and hours" seems like a long time to wait for that...

Is tar still running (e.g. what does "top" or "ps" show)? If so, what does strace on the tar process show?

Do any other amanda (sub)processes exist on the system at this time?

Nathan
Re: Problem with amtape "hanging" when forked from Java
On Tue, Mar 08, 2022 at 18:27:47 -0500, Robert Heller wrote:
> For some unfathomably reason amtape "hangs" when forked from a Java program.
>
> I've written a Java program that goes through vaulted tapes and forks amtape
> (using Runtime.getRuntime().exec(()), and when a non-existant tape label is
> asked for, amtape "hangs". I cannot figure out why or how to get amtape to
> just exit with an error (which I can then handle).

Robert, did you ever resolve the problem you were having with amtape hanging?

Nathan
Re: Problem with amtape "hanging" when forked from Java
On Thu, Mar 10, 2022 at 17:08:36 -0500, Robert Heller wrote:
> (I have no interactivity configuration in any of my
> configurations files, so it is presumably defaulting to being empty.)

(See below...)

> Here is the diff:
>
> *** amtape-java.debug 2022-03-10 16:14:52.556321620 -0500
> --- amtape-shell.debug 2022-03-10 16:11:33.357521853 -0500
> ***
> *** 1,87
> ! Thu Mar 10 16:13:15.068034312 2022: pid 19101: thd-0x55f1f8259c00: amtape: pid 19101 ruid 34 euid 34 version 3.5.1: start at Thu Mar 10 16:13:15 2022
> ! Thu Mar 10 16:13:15.068101557 2022: pid 19101: thd-0x55f1f8259c00: amtape: Arguments: -otpchanger=vault_changer -ointeractivity= deepsoft-normal label examplevault
> ! Thu Mar 10 16:13:15.068391863 2022: pid 19101: thd-0x55f1f8259c00: amtape: config_overrides: tpchanger vault_changer
> ! Thu Mar 10 16:13:15.068403765 2022: pid 19101: thd-0x55f1f8259c00: amtape: config_overrides: interactivity
> ! Thu Mar 10 16:13:15.068513641 2022: pid 19101: thd-0x55f1f8259c00: amtape: reading config file /etc/amanda/deepsoft-normal/amanda.conf
> ! Thu Mar 10 16:13:15.068536891 2022: pid 19101: thd-0x55f1f8259c00: amtape: reading config file /etc/amanda/deepsoft-common/amanda.conf

I had in mind your paging through the two files in separate windows side-by-side, eyeballing for differences in program activity.

For "diff" to be much use you really need to strip off the front part of each line (datetime, pid, and thd), using something like:

$ sed -e's/.*00: amtape:/amtape:/' amtape-java.debug > amtape-java.debug_clean
$ sed -e's/.*00: amtape:/amtape:/' amtape-shell.debug > amtape-shell.debug_clean

(and then running the diff on the _clean versions).

However, looking through the listing you posted, I did notice something:

> --- 1,104
> ! Thu Mar 10 16:09:04.258210052 2022: pid 18689: thd-0x55c92c51d000: amtape: pid 18689 ruid 34 euid 34 version 3.5.1: start at Thu Mar 10 16:09:04 2022
> ! Thu Mar 10 16:09:04.258287566 2022: pid 18689: thd-0x55c92c51d000: amtape: Arguments: -otpchanger=vault_changer deepsoft-normal label examplevault
> ! Thu Mar 10 16:09:04.258592995 2022: pid 18689: thd-0x55c92c51d000: amtape: config_overrides: tpchanger vault_changer
> ! Thu Mar 10 16:09:04.258713649 2022: pid 18689: thd-0x55c92c51d000: amtape: reading config file /etc/amanda/deepsoft-normal/amanda.conf
> ! Thu Mar 10 16:09:04.258739621 2022: pid 18689: thd-0x55c92c51d000: amtape: reading config file /etc/amanda/deepsoft-common/amanda.conf
> ! Thu Mar 10 16:09:04.312355119 2022: pid 18689: thd-0x55c92c51d000: amtape: pid 18689 ruid 34 euid 34 version 3.5.1: rename at Thu Mar 10 16:09:04 2022
> ! Thu Mar 10 16:09:04.322326242 2022: pid 18689: thd-0x55c92c51d000: amtape: Disabling Amanda::Interactivity::stdin because STDIN is not readable

So it does appear that some Interactivity based on the stdin.pm module was used by that particular invocation of amtape.

I'm still pretty confused about the exact behavior you are seeing (i.e. why it's hanging in the Java context, and why the attempt to pass in an empty -ointeractivity option doesn't seem to make any difference, etc.)...

... but since the stdin plugin is deprecated, it seems worth investigating where that's getting configured in the first place...

I see from the above messages that amanda is reading both deepsoft-normal/amanda.conf and deepsoft-common/amanda.conf.

(If it's not clear from those files where the default interactivity is being set, it might help for you to post the output of "amadmin deepsoft-normal config"...)

Nathan
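The strip-then-diff step suggested above can also be sketched in Python. This is an illustrative alternative to the sed commands, not something from the thread: the regex, the function names, and the shortened sample lines are all assumptions, though the line layout follows the debug lines quoted above. Unlike the sed pattern, it does not depend on the thread ID ending in "00".

```python
import difflib
import re

# Leading "<timestamp>: pid N: thd-0x...: " part of an Amanda debug line.
PREFIX = re.compile(r"^.*?: pid \d+: thd-0x[0-9a-f]+: ")

def strip_prefix(line: str) -> str:
    """Drop the timestamp/pid/thread prefix so only the message remains."""
    return PREFIX.sub("", line, count=1)

def diff_debug_logs(java_lines, shell_lines):
    """Unified diff of two debug logs after removing the per-line prefixes."""
    a = [strip_prefix(line) for line in java_lines]
    b = [strip_prefix(line) for line in shell_lines]
    return list(difflib.unified_diff(a, b, "java", "shell", lineterm=""))

# Shortened sample input modelled on the lines quoted above.
java_log = [
    "Thu Mar 10 16:13:15.068391863 2022: pid 19101: thd-0x55f1f8259c00: amtape: config_overrides: tpchanger vault_changer",
    "Thu Mar 10 16:13:15.068403765 2022: pid 19101: thd-0x55f1f8259c00: amtape: config_overrides: interactivity",
]
shell_log = [
    "Thu Mar 10 16:09:04.258592995 2022: pid 18689: thd-0x55c92c51d000: amtape: config_overrides: tpchanger vault_changer",
]

for line in diff_debug_logs(java_log, shell_log):
    print(line)
```

With the prefixes removed, only lines whose message text differs show up as changes, which is the point of cleaning the files before diffing.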
Re: Problem with amtape "hanging" when forked from Java
On Thu, Mar 10, 2022 at 14:03:07 -0500, Robert Heller wrote:
> It prints an error message and returns an error status:
>
> backup@newserver:~$ amtape -otpchanger=vault_changer -ointeractivity='' wendellfreelibrary label wendellfreelibrary-vault-030
> ERROR: Source Volume 'wendellfreelibrary-vault-030' not found
>
> (and does not hang)

Hmmm. What happens from the command line if you leave off the -ointeractivity parameter?

The strange thing is that the strace does seem to show a repeated looping over all the vtape slots, presumably continually searching for the specified label, but I didn't notice any activity that is obviously interactivity-related in between the loops. So it's not clear why amtape is looping within the Java context but immediately terminating with an error message in the shell/tclsh contexts.

> I'm *guessing* that the Java Process created by exec() wants to deal with the
> printout, but can't.

In the strace you posted, amtape appeared to be looping through the slots multiple times without attempting to write that "ERROR" message anywhere, so off hand I would guess amtape is getting put into a different search mode, rather than a problem with the Java side not being able to accept the output. I'm not sure what would trigger that different mode, though.

I suppose my next suggestion would be to compare carefully the amtape .debug files between a run from the command line and a run from within Java -- any difference between the two could be a hint.

Nathan
Re: Problem with amtape "hanging" when forked from Java
On Thu, Mar 10, 2022 at 13:02:24 -0500, Robert Heller wrote:
> At Thu, 10 Mar 2022 12:46:43 -0500 Nathan Stratton Treadway wrote:
>
> > On Thu, Mar 10, 2022 at 09:55:30 -0500, Robert Heller wrote:
> > > Here is the Java fragment:
> > >
> > > public class FlushOldVaults extends BackupVault {
> > > private static final String AMTAPE = "/usr/sbin/amtape";
> > > private static final String AMTAPEOPT1 = "-otpchanger=vault_changer";
> > > private static final String AMTAPEOPT2 = "-ointeractivity=";
> >
> > You would probably be able to confirm this by looking in the amanda
> > log/debug files for the amtape process (i.e.
> > /var/log/amanda/server/<config>/amtape.*.debug on Ubuntu) , but I'm pretty
> > sure that you do actually need the empty argument in order to disable
> > the interactivity, something like
> > private static final String AMTAPEOPT2 = "-ointeractivity=''";
>
> It does not like the empty argument, amtape throws a error status and the
> Java subprocess returns a failure status. It is "happy" with what I have,
> except it hangs on the broken tape.

Okay, sounds like the argument parsing is different in the Java .exec() context than on a shell command line.

The important question is whether or not the interactivity is actually disabled; perhaps the amtape .debug file gives some confirmation?

> The timeout never times out. The amtape process goes into Sleep state and the
> Java program just hangs.

In that case hopefully strace -p/lsof -p on the amtape process (or any other Amanda processes that amtape spawns) can give some hint as to what it's waiting for.

Nathan
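The parsing difference noted above can be illustrated without Java. A POSIX shell removes the quotes in -ointeractivity='' before amtape ever sees the argument, whereas a program that starts amtape directly (no shell involved, which is how Java's Runtime.exec() behaves) hands over the two quote characters literally. The sketch below uses Python's shlex module to mimic the shell's word splitting; CONFIG and LABEL are placeholder words, and the helper name is made up for illustration.

```python
import shlex

def argv_via_shell(command_line: str):
    """Split a command line the way a POSIX shell would (quotes are removed)."""
    return shlex.split(command_line)

# Typed at a shell prompt: the shell strips the empty quotes.
shell_argv = argv_via_shell(
    "amtape -otpchanger=vault_changer -ointeractivity='' CONFIG label LABEL"
)
print(shell_argv[2])   # -ointeractivity=

# Passed as a literal string with no shell in between: the quotes survive.
direct_arg = "-ointeractivity=''"
print(direct_arg)      # -ointeractivity=''
```

So, under that assumption, the string "-ointeractivity=" (without quotes) is what reproduces the shell behaviour when no shell is involved, which would fit the report that amtape rejected the quoted form when launched from Java.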
Re: Problem with amtape "hanging" when forked from Java
On Wed, Mar 09, 2022 at 22:50:29 -0500, Robert Heller wrote:
> At Wed, 9 Mar 2022 23:50:45 +0100 Exuvo wrote:
>
> > Could you give the exact command line you give when it hangs?
>
> /usr/sbin/amtape -otpchanger=vault_changer wendellfreelibrary label wendellfreelibrary_vault-030

What does this command say/do when run from the command line (for a tape that causes a hang in the context of your Java program)?

Nathan
Re: Problem with amtape "hanging" when forked from Java
On Thu, Mar 10, 2022 at 09:55:30 -0500, Robert Heller wrote:
> Here is the Java fragment:
>
> public class FlushOldVaults extends BackupVault {
> private static final String AMTAPE = "/usr/sbin/amtape";
> private static final String AMTAPEOPT1 = "-otpchanger=vault_changer";
> private static final String AMTAPEOPT2 = "-ointeractivity=";

You would probably be able to confirm this by looking in the amanda log/debug files for the amtape process (i.e. /var/log/amanda/server/<config>/amtape.*.debug on Ubuntu), but I'm pretty sure that you do actually need the empty argument in order to disable the interactivity, something like

private static final String AMTAPEOPT2 = "-ointeractivity=''";

(I am not particularly certain that interactivity is your specific problem, but it seemed a plausible explanation and one that was fairly easy to test out...)

> if (!p.waitFor(60L, java.util.concurrent.TimeUnit.SECONDS)) {
> System.err.printf("*** FlushOldVaults.amtape(): process timeout\n");
> String kill[] = new String[2];
> kill[0] = "/bin/kill";
> Long j = new Long(p.pid());
> kill[1] = j.toString();
> Process killproc = Runtime.getRuntime().exec(kill);
> killproc.waitFor();

Note that in order to use the strace debugging, you'll probably need to disable this timeout+kill logic -- otherwise the amtape process won't hang out long enough for you to figure out what it's trying to do when it's "stuck".

Nathan
Re: Problem with amtape "hanging" when forked from Java
On Tue, Mar 08, 2022 at 18:27:47 -0500, Robert Heller wrote:
>
> I've written a Java program that goes through vaulted tapes and forks amtape
> (using Runtime.getRuntime().exec(()), and when a non-existant tape label is
> asked for, amtape "hangs". I cannot figure out why or how to get amtape to
> just exit with an error (which I can then handle).

Does it still hang if you pass an argument "-ointeractivity=''" when you exec amtape?

If it does still hang, what do "lsof -p" and "strace -p" show on the amtape process while it's stuck?

Nathan
Re: Problem with taperscan and chg-single
On Sat, Apr 03, 2021 at 18:24:59 +1100, meku wrote:
> The traditional taperscan does not support interactivity which is why I am
> trying to use lexical or oldest.

Yeah, "lexical" and "oldest" taperscans should work.

This problem seems vaguely familiar but I am not remembering off hand what would cause it.

What do the following commands report?

amtape <config> inventory
amadmin <config> retention

Nathan
Re: amgtar: defaults for NORMAL and STRANGE
On Wed, Jan 20, 2021 at 14:22:02 +1100, Tom Robinson wrote:
> I'm still seeing messages in the report that should have been squashed. It
> also doesn't matter what I have configured as 'NORMAL' for the application
> configuration.
>
> STRANGE DUMP DETAILS:
> /-- lambo.motec.com.au / lev 1 STRANGE
> sendbackup: info BACKUP=APPLICATION
> sendbackup: info APPLICATION=amgtar
> sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/lib64/amanda/application/amgtar restore [./file-to-restore]+
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> | /usr/bin/tar: ./dev: directory is on a different filesystem; not dumped
> | /usr/bin/tar: ./proc: directory is on a different filesystem; not dumped
> | /usr/bin/tar: ./run: directory is on a different filesystem; not dumped
> | /usr/bin/tar: ./sys: directory is on a different filesystem; not dumped
> | /usr/bin/tar: ./mnt/s3backup: directory is on a different filesystem; not dumped
> | /usr/bin/tar: ./var/lib/nfs/rpc_pipefs: directory is on a different filesystem; not dumped
> ? /usr/bin/tar: ./mnt/s3backup: Warning: Cannot flistxattr: Operation not supported
[...]
> property "NORMAL" ": socket ignored$"
> property append "NORMAL" ": file changed as we read it$"
> property append "NORMAL" ": directory is on a different filesystem;

Note that the man page explanation of NORMAL includes the sentence 'These output are in the "FAILED DUMP DETAILS" section of the email report if the dump result is STRANGE'.

In this case, the "Operation not supported" message is considered STRANGE... which in turn causes all the NORMAL message lines to be included in the report output as well.

So presumably once you resolve all of those error messages for a particular DLE, that DLE will not show up with a STRANGE DUMP DETAILS section at all, in which case those NORMAL-category messages will no longer be included in the report.

(You could prevent those messages from ever showing up in the report by setting them to IGNORE in the config file, but in general I'd say trying to fix the underlying cause of a STRANGE status is preferable to suppressing messages completely.)

Does that explain the behavior you were seeing?

Nathan
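The reporting behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch, not amgtar's actual code: the rule list, the first-match-wins order, the catch-all STRANGE category, and the function names are all assumptions made for illustration. Only the "|" and "?" markers and the sample tar messages come from the report quoted in this thread.

```python
import re

# Hypothetical, simplified rule set: (category, regex), first match wins.
RULES = [
    ("NORMAL", r": directory is on a different filesystem; not dumped$"),
    ("NORMAL", r": socket ignored$"),
    ("NORMAL", r": file changed as we read it$"),
]

def classify(line, rules=RULES):
    """Return the category of one tar output line (assumed catch-all: STRANGE)."""
    for category, pattern in rules:
        if re.search(pattern, line):
            return category
    return "STRANGE"

def report_lines(output, rules=RULES):
    """Lines that would appear in the details section for one DLE.

    IGNORE lines never appear; NORMAL lines appear only when at least
    one STRANGE line is present.
    """
    tagged = [(classify(line, rules), line) for line in output]
    shown = [(cat, line) for cat, line in tagged if cat != "IGNORE"]
    if not any(cat == "STRANGE" for cat, _ in shown):
        return []
    return [("? " if cat == "STRANGE" else "| ") + line for cat, line in shown]

tar_output = [
    "/usr/bin/tar: ./dev: directory is on a different filesystem; not dumped",
    "/usr/bin/tar: ./mnt/s3backup: Warning: Cannot flistxattr: Operation not supported",
]
for line in report_lines(tar_output):
    print(line)
```

In this model, removing the flistxattr line (that is, fixing the underlying cause) empties the section entirely, while adding an IGNORE rule for the "different filesystem" message would hide that line even when something else is STRANGE.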
Re: amgtar: defaults for NORMAL and STRANGE
On Tue, Jan 19, 2021 at 11:53:52 +1100, Tom Robinson wrote:
>
> Also, the man page says there are defaults for NORMAL and STRANGE but these
> 'defaults' don't seem to be included into the application definition when I
> dump the config information with amadmin daily config:
[...]
> Is the man page incorrect? Are the 'defaults' really applied or do I have
> to manually specify them in the config file?

I haven't looked closely at this functionality before, but from a quick skim of the code in application-src/amgtar.c, it looks like those default values are built directly in to the program itself.

That is, they aren't implemented as part of the config system and thus don't show in the output of "amadmin ... config", but they do indeed exist underneath the hood.

(As a corollary to that, it seems like there isn't any way to completely delete the default strings from amgtar's processing, though you can override the treatment of a particular regex by explicitly specifying it as another type in the config file.)

(Are you seeing any situations where it looks like the default strings aren't being applied as you would have expected from the man page description?)

Nathan
Re: amgtar: Operation not permitted
On Tue, Jan 19, 2021 at 11:53:52 +1100, Tom Robinson wrote:
> amanda-server 3.5.1
>
> Hi,
>
> I've recently started using amgtar instead of tar to reduce/remove the
> STRANGE output in daily backup reports.
>
> I now get a lot of permission warnings and errors. Of particular concern
> are the 'Operation not permitted' messages:

I'm not coming up with any definitive explanations off the top of my head...

What OS and/or distribution is this running on? How did you install Amanda?

Can you navigate down into the directories that generate these errors using "ls" executed manually (as root)?

Do you have apparmor or similar kernel-level security enforcement active?

Nathan
Re: "bad status on taper SHM-WRITE (dumper)" message
On Wed, Dec 23, 2020 at 08:11:25 -0500, Gene Heskett wrote:
> amstatus: bad status on taper SHM-WRITE (dumper): 20 at
> /usr/local/share/perl/5.24.1/Amanda/Status.pm line 929, <$fd>
> line 3411.
[...]
> But that log was overwritten by the flush.sh I did trying to complete
> the backup on vtape-30, so is gone forever. But vtape-31 was not

In Amanda 3.5, the "/var/log/amanda/<config>/amdump" path is actually just a symlink pointing to the currently-active amdump file among the multiple timestamped files (amdump.YYYYMMDDhhmmss), so the original log file should still be out there.

It will be interesting to compare the contents of the logs from working and non-working runs. So:

1) what's the timestamp for the run that generated this error?

2) what does
$ grep SHM-WRITE amdump.202012[12]*
show (when run from within the correct .../log/amanda/... directory)? (The idea being to grep through all the amdump.* files from the past 13 days, just as a quick way to hit both good and bad runs.)

3) is there any correlation between the runs where amstatus returns this error and other interesting messages appearing in the Amanda Mail Report for those runs?

Nathan
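For anyone who prefers to collect the matches programmatically, here is a minimal sketch of the same search as the grep in step 2. The function name and its defaults are assumptions; the glob is copied from the command above and matches file names whose date part falls on December 10 through 29, 2020.

```python
from pathlib import Path

def grep_logs(log_dir, name_glob="amdump.202012[12]*", needle="SHM-WRITE"):
    """Return (file name, line) pairs for every line containing `needle`."""
    hits = []
    for path in sorted(Path(log_dir).glob(name_glob)):
        with path.open(errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append((path.name, line.rstrip("\n")))
    return hits

# Usage (the directory is a placeholder; point it at the real log directory):
# for name, line in grep_logs("/path/to/log/amanda/dir"):
#     print(name, line)
```

Grouping the hits by file name makes it easy to see which timestamped runs contain SHM-WRITE lines and which do not.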
Re: did it again. -- crc differ
On Sat, Dec 19, 2020 at 14:43:56 -0500, Gene Heskett wrote: > On Saturday 19 December 2020 12:12:07 Nathan Stratton Treadway wrote: > > > On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote: > > > new error file, from /home on GO704:(word wrap off) > > > > > > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1 > > > > Okay, that output looks good good. > > > > for completeness, can you post the section from this Amanda Report > > covering this error? > > In the last post. In that message I see the quoted "FAILURE DUMP SUMMARY" section for the earlier failure but not the report for when GO704:/home failed... > No hits on the crc from the previous post, adcf8473:2018270728, any place > in that /var/log/amanda tree. Anything under /tmp/amanda/ (there on GO704)? Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: did it again. -- crc differ
On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote: > new error file, from /home on GO704:(word wrap off) > > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1 Okay, that output looks good. For completeness, can you post the section from this Amanda Report covering this error? > New crc's > > root@coyote:GenesAmandaHelper-0.61$ grep adcf8473:2018270728 > /usr/local/var/amanda/Daily/* GO704 is a separate Amanda client machine, right? Can you do a similar grep in the amanda debug/log files over on that machine, too? Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote: > But the problem is not fixed: > > FAILURE DUMP SUMMARY: > rpi4 /usr/lib lev 0 partial taper: source server crc (efe0c707:1538583893) > and input server crc (fa79e777:1538583893) > differ) (This is back to the CRC error, so I'll send a reply in that thread) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: did it again. -- crc differ
On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote: > But the problem is not fixed: Well, at least this time it's a one-part dump file, so that may make investigation a little easier > > FAILURE DUMP SUMMARY: > rpi4 /usr/lib lev 0 partial taper: source server crc (efe0c707:1538583893) > and input server crc (fa79e777:1538583893) > differ) > rpi4 /usr/lib lev 0 was successfully retried > > But the failed dump is still in the holding disk: > > root@coyote:config-bak$ ls -l /sdb/dumps/20201219020104/ > total 1502560 > -rw--- 1 amanda amanda 1538616661 Dec 19 02:13 rpi4._usr_lib.0 > > From the emailed report: > > driver: rpi4 /usr/lib 20201219020104 0 [Will retry dump because of holding > disk error: source server crc > (efe0c707:1538583893) and input server crc (fa79e777:1538583893) differ)] > taper: tape Dailys-24 kb 16495500 fm 79 [OK] > > and: > rpi4 /usr/lib 0 3273 1467 -- 5:23 10366.4 0:01 1502523.0 PARTIAL FLUSH > 5:11 4831.3 > > Even the sizes don't match so of course the crc's won't either. Note that the two sizes mentioned in the error message do match (1538583893), so I think the full file is getting transferred. (The file on the holding disk is 32kiB larger, i.e. the size of the Amanda header: 1538616661-1538583893=32768 .) What's the header of that holding-disk file look like? (e.g. $ sudo dd if=/sdb/dumps/20201219020104/rpi4._usr_lib.0 bs=32k count=1 ) Do you get any hits when you grep the Amanda debug and log files for those two CRC values ( efe0c707 and fa79e777 )? Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
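The size arithmetic in the message above can be checked mechanically. This is only a sketch: the 32768-byte header is the figure inferred in the message for this one-part holding file, and payload_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Size of the Amanda header block at the start of the holding-disk file,
# as inferred in the message above (32 KiB). Treat it as an assumption
# for other setups.
HEADER_BYTES=32768

# payload_bytes is a hypothetical helper: given the on-disk size of a
# one-part holding file, print the size of the dump data after the header.
payload_bytes() {
    echo $(( $1 - HEADER_BYTES ))
}
```

Feeding it the size from the ls output (payload_bytes 1538616661) gives the figure that appears after the ":" in both CRC values, which is what suggests the whole dump was transferred.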
Re: Amrecover hangs after restore
On Fri, Dec 18, 2020 at 19:31:52 -0500, Gene Heskett wrote: > That likely won't happen again if at all, as I doubled the size of a > vtape, specifically to stop that. I'm only using around half of a 2T > drive for 60 vtapes. But I see it is growing. > > /dev/sde1 1.8T 1.1T 617G 65% /amandatapes Yeah, sounds like a good idea -- generally if your vtapes are all on a single shared filesystem like this and you were going to let Amanda use two vtapes in one run, there's no particular reason not to just increase the logical size of a vtape so that each run is contained within a single vtape instead. Anyway, here's hoping your backups run un-interrupted at least through the holidays :) ... Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Fri, Dec 18, 2020 at 02:56:55 -0500, Gene Heskett wrote: > But that after amanda script worked and I have uptodate indices and > config files in that vtape now. [...] so a/o vtape Dailys-25 I have > what I think is a good backup again. Great! Since you commented out the section of the "if" that depends on PARTS_WRITTEN, the code that does the parsing of the amstatus output won't matter any more -- but if you notice the next time amdump writes to two vtapes, it would be interesting to see the output of amstatus from that run... (And also to know if amstatus still generates a perl warning in that situation.) > And it did not leave the failed crc file in the dumps assignment, I assume nothing we've done so far would have changed the crc-related behavior, so if you see that problem again, you should definitely follow up on that issue (over on that specific list thread) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Fri, Dec 18, 2020 at 16:23:47 -0500, Gene Heskett wrote: > On Friday 18 December 2020 15:06:14 Nathan Stratton Treadway wrote: > > > On Fri, Dec 18, 2020 at 14:44:07 -0500, Gene Heskett wrote: > > > On Friday 18 December 2020 14:33:03 Nathan Stratton Treadway wrote: > > > > ls -l /etc/login.defs /etc/defaults/su > > > > > > ls: cannot access '/etc/defaults/su': No such file or directory > > > -rw-r--r-- 1 root root 10496 Aug 7 2019 /etc/login.defs > > > > (Sorry, obviously I mispelled "default" in that command; see my other > > email for the new batch of commands.) > -rw-r--r-- 1 root root20 Dec 17 10:27 /etc/default/su > -rw-r--r-- 1 root root 10496 Aug 7 2019 /etc/login.defs Did you manually edit the /etc/default/su file yesterday? If so, I take it that you added the ALWAYS_SET_PATH line to it at that point? On _Stretch_, you do NOT want any ALWAYS_SET_PATH line at all. The point of that line is to make the "su" command _on Buster_ act like the Stretch version used to act. As the warning message indicates, your current version of "su" does not recognize that parameter. So, if you added that line to /etc/default/su yesterday, you can just go ahead and delete that line again. (Actually at 20 bytes I guess the file only contains that one line, so probably the file didn't exist before yesterday; if that's true, you can just delete it again) Meanwhile, the error message you have been getting for the past year would appear to be coming from the line with that parameter found in the login.defs file. I would say you could just go ahead and delete/comment out that line from that file as well -- but login.defs is shared across multiple commands in the shadow password suite, so there is a chance that some other command will be affected when you edit it. That's why I sent the commands to try to figure out where the current version of the file originally came from. 
However, off hand it seems like the warning message you were getting is an indication that the parameter is just not implemented yet in your version of the shadow utilities, so if you don't want to investigate that side of things it's probably safe for you to go ahead and comment out the line in the login.defs file. That should fix the warning messages, and you can then keep an eye out for any cron jobs or whatever that suddenly stop working because the PATH is no longer set as expected.... Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
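If you do go the comment-out route, one cautious way to do it is to write the edited text to a separate file first and compare it against the original before touching /etc/login.defs itself. The sketch below assumes a POSIX sed; comment_out_setting is a hypothetical helper name, not a shadow-utils tool.

```shell
#!/bin/sh
# comment_out_setting is a hypothetical helper: prefix every active line
# that starts with the given setting name with "# ", writing the result to
# a new file so the original is left untouched for comparison.
comment_out_setting() {
    setting=$1
    infile=$2
    outfile=$3
    # Match the setting at the start of a line (optionally indented),
    # followed by whitespace; lines that are already commented do not match.
    sed "s/^[[:space:]]*${setting}[[:space:]]/# &/" "$infile" > "$outfile"
}
```

For example, "comment_out_setting ALWAYS_SET_PATH /etc/login.defs /tmp/login.defs.new" followed by a diff of the two files shows exactly which line would change, and the new file can then be copied into place by hand as root.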
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Fri, Dec 18, 2020 at 15:03:07 -0500, Nathan Stratton Treadway wrote: > What do you get from these commands?: > $ ls -l /etc/login.defs /etc/default/su > > $ dpkg -S /etc/login.defs > > $ dpkg -S /etc/default/su > > $ apt-cache policy login > Ooops, should have included this one as well: $ ls -lc /etc/login.defs /etc/default/su Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Fri, Dec 18, 2020 at 14:44:07 -0500, Gene Heskett wrote: > On Friday 18 December 2020 14:33:03 Nathan Stratton Treadway wrote: > > > ls -l /etc/login.defs /etc/defaults/su > ls: cannot access '/etc/defaults/su': No such file or directory > -rw-r--r-- 1 root root 10496 Aug 7 2019 /etc/login.defs (Sorry, obviously I misspelled "default" in that command; see my other email for the new batch of commands.) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Fri, Dec 18, 2020 at 14:42:45 -0500, Gene Heskett wrote: > On Friday 18 December 2020 14:33:03 Nathan Stratton Treadway wrote: > > > grep ALWAYS_SET_PATH /etc/login.defs /etc/default/* > etc/login.defs:ALWAYS_SET_PATH yes > grep: /etc/default/grub.d: Is a directory > /etc/default/su:ALWAYS_SET_PATH yes Well, that explains why you are getting the warning message... Now the question is why those lines exist in the files (and in both of them, to boot)? The weird thing is that this setting seems to be needed *on Buster* to return the behavior back to previous behavior -- but since you are running Stretch, it's not clear why those lines exist in the config files... What do you get from these commands?: $ ls -l /etc/login.defs /etc/default/su $ dpkg -S /etc/login.defs $ dpkg -S /etc/default/su $ apt-cache policy login Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Fri, Dec 18, 2020 at 13:38:21 -0500, Gene Heskett wrote: > On Friday 18 December 2020 10:53:55 Nathan Stratton Treadway wrote: > > > On Fri, Dec 18, 2020 at 01:10:10 -0500, Gene Heskett wrote: > > > On Thursday 17 December 2020 23:03:59 Nathan Stratton Treadway wrote: > > > > What do > > > > $ grep ALWAYS_SET_PATH /etc/login.defs /etc/default/* > > > > $ ls -l /etc/login.defs /detc/defaults/su > > > > show? > > > > Any luck here? > > No, first, that link is for buster, this machine is stretch yet but I > have put that line an my .bashrc, in roots .bashrc, and in > amanda's .bashrc and . sourced them all without any detectable effect. I don't believe this error message is related to .bashrc at all. Instead, please run the above-quoted commands and let us know what you find... Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Fri, Dec 18, 2020 at 01:10:10 -0500, Gene Heskett wrote: > On Thursday 17 December 2020 23:03:59 Nathan Stratton Treadway wrote: > > (You can check this by running something simple via su, e.g. > > $ su amanda -c "echo test message" > > ) > Which generates the error. okay, check. > > What do > > $ grep ALWAYS_SET_PATH /etc/login.defs /etc/default/* > > $ ls -l /etc/login.defs /detc/defaults/su > > show? Any luck here? Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore -- ALWAYS_SET_PATH
On Thu, Dec 17, 2020 at 10:38:58 -0500, Gene Heskett wrote: > On Thursday 17 December 2020 09:24:58 Richard Sass wrote: > > > Gene: > > > > BUT Whats line 2 above, I've wasted a year looking for that, it does > > not grep in the whole src code tree. > > > > configuration error - unknown item 'ALWAYS_SET_PATH' (notify > > administrator) Presumably this comes from the "su" command itself rather than from any binary you've built from source. (You can check this by running something simple via su, e.g. $ su amanda -c "echo test message" ) > > > > Perhaps this will help > > > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=905564 > Its of no help on this stretch install. > gene@coyote:~$ sudo -i > [sudo] password for gene: > root@coyote:~$ su amanda -c "geany bak-indices-configs" > configuration error - unknown item 'ALWAYS_SET_PATH' (notify > administrator) > > So where should it be put, and whose perms on stretch? If you are trying to eliminate that error message, it seems like you want to delete/deactivate the corresponding line (rather than put something new anywhere). What do $ grep ALWAYS_SET_PATH /etc/login.defs /etc/default/* $ ls -l /etc/login.defs /detc/defaults/su show? Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Thu, Dec 17, 2020 at 01:58:27 -0500, Gene Heskett wrote: > Here is the completed output of that command: > root@coyote:~$ su amanda -c "/usr/local/sbin/amstatus Daily" [...] > taped : 78 13416m 13308m (100.81%) (100.81%) > tape 1 : 79 24046m 24046m ( 37.57%) Dailys-21 (79 parts) Okay, those two lines are the section in question from your output. In this case, the "taped" line *does* have statistics on that same line, so the existing logic in the script should have worked fine, at least on this run. On Thu, Dec 17, 2020 at 06:03:38 -0500, Gene Heskett wrote: > But lets fix the current problem first, it screwed up last night on > only 1 vtape. Thast means the edit I made last night, has converted it > into a full time failure. Did you get this fixed? If not, post the info on the edit you made and the output from last night's run Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Wed, Dec 16, 2020 at 16:41:39 -0500, Gene Heskett wrote: > So here is the that script-- > Okay, a couple things: > PARTS_WRITTEN=`${AM_SBIN_DIR}/amstatus $CONFIGNAME | grep taped | awk -F: > '{print $2}' | awk '{print $1}'` If you run "amstatus" manually, what does your "taped" section look like now? On my Amanda v3.5 box with dumps going to two separate storages, I get: = taped TestBackup: 3 6085m 6061m (100.39%) (100.39%) tape 1 : 3 6085m 6085m ( 2.97%) TESTBACKUP-12 (3 parts) TestOffsite : 3 6085m 6061m (100.39%) (100.39%) tape 1 : 3 6085m 6085m ( 2.97%) TESTBACKUP-103 (3 parts) = , but your script clearly expects the parts-written figure to be on the same line as the word "taped". So I'm pretty sure you need to upgrade your script to support amstatus's new formatting in v3.5... but I'm not sure exactly what changes that would require in your setup (i.e. the output may be different with only one storage in use, etc.). (Note that because of this issue I don't think adding quote characters to the -gt line will actually fix the script: the expression [ "" -gt 0 ] will fail with a different error than [ -gt 0 ] ... but neither one is valid.) > # Ok, then lets make it part of the dd.report record > echo "Parts written = $PARTS_WRITTEN >> dd.report.$TAPENAME" Okay, this is what produced the output line I found interesting in your earlier email. First thing is that this line actually has a misplaced " character so it's not doing what the comment describes. (It's writing to standard output instead of to the dd.report.$TAPENAME file.) Instead you want to say echo "Parts written = $PARTS_WRITTEN" >> dd.report.$TAPENAME on that line of the script. But in spite of that issue, the corresponding line in the log you posted earlier confirms that PARTS_WRITTEN was empty in that run, which indeed explains the syntax error you got from the "-gt" line of the script. Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. 
- Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
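As a starting point for updating the script, here is a rough sketch of one way to pull a parts count out of amstatus-style output without depending on the "taped" line itself. It assumes each per-tape line ends in "(N parts)", as in the two samples quoted in this thread; I have not checked that against every amstatus layout, and parts_written is a made-up function name.

```shell
#!/bin/sh
# parts_written is a hypothetical helper: read amstatus-style text on
# standard input and print the sum of the "(N parts)" figures found on
# lines that mention a tape. Prints 0 when nothing matches, so the result
# is always a number that is safe to use with -gt.
parts_written() {
    awk '
        /tape/ && match($0, /\([0-9]+ parts\)/) {
            s = substr($0, RSTART + 1, RLENGTH - 1)   # e.g. "79 parts)"
            total += s + 0                            # keep the leading number
        }
        END { print total + 0 }
    '
}
```

In the script this could be used as PARTS_WRITTEN=$(${AM_SBIN_DIR}/amstatus $CONFIGNAME | parts_written). Note that with two storages, as in my sample output, the figures from both storages get added together, which may or may not be what the script wants.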
Re: Amrecover hangs after restore
On Wed, Dec 16, 2020 at 15:37:32 -0500, Gene Heskett wrote: > On Wednesday 16 December 2020 12:23:28 Nathan Stratton Treadway wrote: > > > On Wed, Dec 16, 2020 at 09:42:47 -0500, Gene Heskett wrote: > > > You reminded me of that, so its now done. We'll see if that fixes > > > it. > > > > (Note that putting in the quote characters should prevent the shell > > from aborting due to the syntax error, but it won't fix the underlying > > problem that the contents of the PARTS_WRITTEN variable appear to be > > bogus at that point in time. Though if you want to debug that issue > > further, it's probably best if you reply to that branch of this thread > > directly :) ) > > > I'll see if, in my somewhat decreased mental state, I can figure out how > to echo that into the log file. Jon L. seems to think a bit of perl that Well, there was a tantalizing hint already included in the output you posted to the list, so you may not need to actually change the script for that part. Perhaps you should just go ahead and post the script; that would at least let us see what script processing matches up with the output you posted earlier. > amanda uses has been updated in the last year or so and has broken > amanda somehow. Off hand, I am thinking there is a bug in the amstatus Perl code which gets triggered when you have a two-tape run, and then also a fix needed in your shell script so that PARTS_WRITTEN is always set correctly and the script can properly deal with a two-tape run. (Because you only see the error for the double-tape runs, I'm less inclined to suspect a Perl upgrade is the issue [rather than a more general bug in amstatus, as it parses the Amanda log file], but we may need to wait until the next time it happens before we can track it down.) Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. 
- Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Wed, Dec 16, 2020 at 09:42:47 -0500, Gene Heskett wrote: > You reminded me of that, so its now done. We'll see if that fixes it. (Note that putting in the quote characters should prevent the shell from aborting due to the syntax error, but it won't fix the underlying problem that the contents of the PARTS_WRITTEN variable appear to be bogus at that point in time. Though if you want to debug that issue further, it's probably best if you reply to that branch of this thread directly :) ) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Mon, Dec 14, 2020 at 14:04:40 -0500, Gene Heskett wrote: > On Monday 14 December 2020 11:32:56 Nathan Stratton Treadway wrote: > > > On Sun, Dec 13, 2020 at 03:05:16 -0500, Gene Heskett wrote: > > > ./bak-indices-configs: line 135: [: -gt: unary operator expected > > > > There does seem to be an error message coming from the amstatus > > program which we can investigate later, but as far as your own script > > not doing the coping I think that might be explained by the above > > error message. > > > > So, what's on line 135 of the bak-indices-configs script? > > > That is a very long if else fi thing, wordwrap off: > --- > if [ $PARTS_WRITTEN -gt 0 ]; then > if [ $DUMMY -eq 1 ] ; then Seems like the only occurrence of "-gt" is in the $PARTS_WRITTEN line. The output you quoted in your earlier email mentions Parts written = >> dd.report.Dailys-17 . Does that mean that the PARTS_WRITTEN variable actually contained the value ">> dd.report.Dailys-17"? That would definitely not parse out well in the if condition... If you add quotes around the variable (i.e. you change that line to if [ "$PARTS_WRITTEN" -gt 0 ]; then ), I think that would prevent the script from erroring out at that spot (and is generally a good idea). However, this does lead to the followup question of how PARTS_WRITTEN is (supposed to be) getting set in the first place? (Given the other parts of your original email, I'll hazard a guess that the script is trying to parse the output of amstatus, but the parsing code is confused by the warning message amstatus is currently generating) Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
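Beyond the quoting, the test can be made to tolerate an empty or garbage value entirely by checking that the variable holds digits before comparing it. This is just a sketch in plain POSIX sh; has_parts is a hypothetical helper name, not something from the existing script.

```shell
#!/bin/sh
# has_parts is a hypothetical helper: succeed only when the argument is a
# non-empty string of digits with a value greater than zero.
has_parts() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as "no parts"
    esac
    [ "$1" -gt 0 ]
}
```

Line 135 could then read: if has_parts "$PARTS_WRITTEN"; then ... which falls through to the else branch quietly when amstatus produced nothing usable, instead of aborting with a syntax error.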
Re: Amrecover hangs after restore
On Sun, Dec 13, 2020 at 03:05:16 -0500, Gene Heskett wrote: > ./bak-indices-configs: line 135: [: -gt: unary operator expected There does seem to be an error message coming from the amstatus program which we can investigate later, but as far as your own script not doing the copying I think that might be explained by the above error message. So, what's on line 135 of the bak-indices-configs script? Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Thu, Dec 03, 2020 at 16:58:34 +0100, Stefan G. Weichinger wrote: > Ah, I forgot that. I have "-p4". > Will retry asap. Thanks for the reminder. You mean you currently have an explicit "-p4" on the command line contained in the wrapper script? If -p1 does work, it would be interesting to know what happens with -p2 and -p3 as well Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Thu, Dec 03, 2020 at 08:46:00 +0100, Stefan G. Weichinger wrote: > > Am 01.12.20 um 13:44 schrieb Stefan G. Weichinger: > > >With the simple wrapper in place amrecover correctly runs through > >(*and* pigz is used for the amdump and the amrecover step). > > > >Tomorrow, when the admin there will insert a specific tape for me, > >I will test that from the tape I used for the failing tests a few > >days ago. > > With the tape the same amrecover process did *not* work, same > behavior as without the wrapper. Did you try a run using the wrapper to add "-p 1" to the pigz invocation? Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda.org Website Updates -- Amanda bugfix release?
On Wed, Dec 02, 2020 at 00:26:08 +, Pavan Raj wrote: > We would like to continue the momentum with the upgrades and plan for > a new release. (I'm not sure what "upgrades" you have in mind, but in any case I haven't seen any discussion of a new release here in quite a while, nor do I see any new-release-type activity in the git repo) > As a next step, we would like to improve the website to address the > security issues, modernize, and improve the usability. I could probably get excited about a revamped amanda.org website someday, but really getting a bugfix release out the door seems much more important. v3.5.1 was released December 1, 2017. Since then a number of fixes for bugs have been identified, but all those fixes are still not available for actual use anywhere, except for the few custom patches being applied by some distribution-specific maintainers. (There was an effort in November 2019 to collect some of the patches into the Zmanda git repo, but after that brief spurt nothing else happened.) Over the past couple of months we (here on this list) have identified several pretty serious bugs and come up with some apparent workarounds, but had no activity/help from Betsol developers in identifying the underlying cause and finding correct fixes for those issues, let alone in getting those fixes tested out and then incorporated into a release so users can avoid those problems in the future. So as an urgent first step, I'd definitely be in favor of a 3.5.2 release (and preferably a plan for subsequent minor releases coming down the pike) before any effort is spent on a website refresh... Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: did it again. -- crc differ
On Mon, Nov 30, 2020 at 18:41:40 -0500, Gene Heskett wrote: > > On Mon, Nov 30, 2020 at 12:46:46 -0500, Nathan Stratton Treadway wrote: > > > I assume that the first few lines of the > > > coyote._home_gene_Pictures.0 file is an Amana header (including an > > > XML chunk); can you post that here? > > > > Hmmm, it might also be useful to see the header from the > > coyote._home_gene_Pictures.0.5 file (i.e. the last of the subparts) as > > well > > > Try this: > gene@coyote:sudo dd > if=/sdb/dumps/20201130020105/coyote._home_gene_Pictures.0.5 bs=32k count=1 > > AMANDA: CONT_FILE 20201130020105 coyote /home/gene/Pictures > lev 0 comp N program APPLICATION > APPLICATION=amgtar > DLE=<http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amrecover hangs after restore
On Mon, Nov 30, 2020 at 15:35:36 +0100, Stefan G. Weichinger wrote: > Am 30.11.20 um 00:40 schrieb Nathan Stratton Treadway: > https://github.com/madler/pigz/issues/80 As I mentioned before I'm not familiar with pigz myself, but skimming through those Github issues (76 and 80), I would guess that the problem you are having with amrecover is something unrelated to #76. In particular, 76 seems to be about changing the exit status pigz returns if there is trailing junk on the compressed input stream... but as far as I understand at this point, there's no evidence of any trailing junk in the Amanda dumps. In any case, the problem doesn't seem to be amrecover mishandling some unexpected return status, but rather that it never retrieves the return status of the subprocess in the first place (as evidenced by the Zombie status of the pigz process). Off hand I can't really say why that would be the case, but one theory that comes to mind is the fact that gzip normally doesn't spawn its own subprocesses but pigz does. A way to test that theory would be to put the shell-script wrapper around the pigz binary but just call the original binary with the same command line arguments that amrecover uses, and see if that setup ends up with processes in Zombie status as well -- and, if so, then try adding a "-p 1" parameter (for example) to the call to the real pigz binary to see if that changes the behavior any... Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
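For what it's worth, the two-step test described above could be sketched roughly like this. The location of the renamed binary is an assumption (adjust it to wherever the real pigz was moved), and pigz_wrapper, REAL_PIGZ and EXTRA_OPTS are names invented for this example.

```shell
#!/bin/sh
# Path to the renamed real binary. The "pigz-original" name follows the
# one used elsewhere in this thread; the location is an assumption.
REAL_PIGZ=${REAL_PIGZ:-/usr/bin/pigz-original}

# Extra options to put in front of the caller's arguments, e.g. "-p 1" to
# limit pigz to a single processor. Empty means: pass arguments unchanged.
EXTRA_OPTS=${EXTRA_OPTS:-}

pigz_wrapper() {
    # $EXTRA_OPTS is deliberately unquoted so "-p 1" splits into two words.
    "$REAL_PIGZ" $EXTRA_OPTS "$@"
}
```

A wrapper script installed as /usr/bin/pigz would end with a call such as pigz_wrapper "$@". The first test run would leave EXTRA_OPTS empty (same arguments amrecover passes), and the second would set it to "-p 1".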
Re: did it again. -- crc differ
On Mon, Nov 30, 2020 at 12:46:46 -0500, Nathan Stratton Treadway wrote: > I assume that the first few lines of the coyote._home_gene_Pictures.0 > file is an Amana header (including an XML chunk); can you post that > here? Hmmm, it might also be useful to see the header from the coyote._home_gene_Pictures.0.5 file (i.e. the last of the subparts) as well Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: did it again. -- crc differ
On Mon, Nov 30, 2020 at 03:12:41 -0500, Gene Heskett wrote: > Doing a level 0 on /home/gene/Pictures, it logged this in the email: > > coyote /home/gene/Pictures lev 0 partial taper: source server crc > (44cff778:11146117120) and input server crc (dfd0e83a:11146117120) > differ) > coyote /home/gene/Pictures lev 0 was successfully retried > > It did leave a 10+ Gb file in the vtape, but left the failed files in the > holding disk: > > root@coyote:~$ ls -l /sdb/dumps/20201130020105/ > total 10885096 > -rw--- 1 amanda amanda 2097152000 Nov 30 02:06 > coyote._home_gene_Pictures.0 Would /home/gene/Pictures have changed any between the two retries? If not, you might learn something by comparing the components of the successful dump with the files on the holding disk... (but off hand I'm not sure how many red-herring differences you'd have to sort through to find any hints as to the actual problem). > > Does anyone have a clue what its really trying to tell me? I only have some vague clues: * the number after the ":" is the size of the file being CRCed. In this case 11146117120 shows up for both sides of the comparison, so it seems like the full file got transferred across to whatever step is causing the error. It also seems like this error applies to the entire 11GB dump rather than the individual 2GB parts. * The message "source server crc ([...]) and input server crc" appears to be generated in Amanda/Taper/Worker.pm:result_cb() in cases where $self->{'server_crc'} and $self->{'source_server_crc'} differ. $self->{'server_crc'} seems to be read out of the header of the dump file itself. $self->{'source_server_crc'} seems to be computed as part of transferring the file to the taper process, or something like that. 
So I guess the next question is where in the multiple stages of the life of the dump file the CRC missmatch gets introduced I assume that the first few lines of the coyote._home_gene_Pictures.0 file is an Amana header (including an XML chunk); can you post that here? Also, what do you find when you grep the Amanda debug/log files for those two CRC values ( 44cff778 and dfd0e83a )? One other thought: have the reported CRC errors in the past also been for the dump of the /home/gene/Pictures DLE, or are multiple different DLEs affected? Is it always level 0 dumps? Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
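To make the "crc:size" reading concrete, here is a small sketch that splits the two values from the report and compares them piece by piece. The helper names (crc_of, size_of, compare_crc_pairs) are made up for this illustration and are not part of Amanda; only the two value pairs come from the message above.

```shell
# Split "crc:size" pairs as they appear in the taper message.
# The two values below are copied from the report quoted above;
# the function names are hypothetical helpers for this sketch.
source_crc="44cff778:11146117120"
input_crc="dfd0e83a:11146117120"

crc_of()  { printf '%s\n' "${1%%:*}"; }   # part before the colon
size_of() { printf '%s\n' "${1##*:}"; }   # part after the colon

compare_crc_pairs() {
    if [ "$(size_of "$1")" != "$(size_of "$2")" ]; then
        echo "sizes differ"
    elif [ "$(crc_of "$1")" != "$(crc_of "$2")" ]; then
        echo "same size, different crc"
    else
        echo "match"
    fi
}

compare_crc_pairs "$source_crc" "$input_crc"
```

For the reported values this prints "same size, different crc", which is the situation described in the first clue: the byte count agrees on both sides, so the difference is in the content rather than in a truncated transfer.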
Re: Amrecover hangs after restore
On Thu, Nov 26, 2020 at 11:12:45 +0100, Stefan G. Weichinger wrote:
> On 25.11.20 at 22:39, Debra S Baddorf wrote:
>
> >>the theory is good, sure. I will test the restore aside from amrecover
> >>tomorrow.
> >
> >
> >If so, remember to "throw out" the first block of the file, which will
> >choke the zip program.
> >dd-skip=1 etc
>
> I was able to amrestore correctly .. no "-r", no skipping.

(Did you try this while the real pigz binary was in place, or only after
you replaced it with "gzip"?)

On Thu, Nov 26, 2020 at 08:48:02 +0100, Stefan G. Weichinger wrote:
> But guess what I did already ...
>
> # cp /usr/bin/pigz /usr/bin/pigz-original
>
> # cp /usr/bin/gzip /usr/bin/pigz
>
> # amrecover
>
> works :-P

Since you have a workaround now I don't know how much more effort you
want to spend on this, but if you do want to investigate further you
could try replacing /usr/bin/pigz with a shell script wrapper which
calls pigz-original but writes some debugging messages, etc. to a log
file before and after invoking pigz-original, and exits with a "success"
status.

Basically just trying to see if there is any fussing you can do to how
pigz is invoked or how the exit status is processed which changes the
overall behavior of amrestore-calling-pigz.

Nathan
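A minimal sketch of the kind of wrapper described above, assuming the real binary was saved as /usr/bin/pigz-original as in the quoted commands. The log path, the PIGZ_REAL and PIGZ_WRAP_LOG override variables, and the function name are all hypothetical choices made for this example (the overrides exist only so the wrapper can be tried out without touching /usr/bin).

```shell
#!/bin/sh
# Sketch of a logging wrapper to install in place of /usr/bin/pigz.
# Assumptions: the log path and the PIGZ_REAL / PIGZ_WRAP_LOG overrides
# are hypothetical; only /usr/bin/pigz-original comes from the thread.

pigz_wrapper() {
    real=${PIGZ_REAL:-/usr/bin/pigz-original}
    log=${PIGZ_WRAP_LOG:-/tmp/pigz-wrapper.log}

    echo "$(date '+%F %T') pid $$ start: $real $*" >>"$log"

    # stdin and stdout pass straight through to the real binary.
    status=0
    "$real" "$@" || status=$?

    echo "$(date '+%F %T') pid $$ end: exit status $status" >>"$log"

    # Always report success, to see whether that changes what the caller does.
    return 0
}

# Run the wrapper only when this file is invoked under the name "pigz".
case ${0##*/} in
    pigz) pigz_wrapper "$@" ;;
esac
```

Saved as /usr/bin/pigz and made executable, this would record every invocation and the real exit status while always returning 0 to the caller. Comparing the log against the point where amrecover hangs should show whether pigz-original had already finished by then.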
Re: Amrecover hangs after restore
On Thu, Nov 26, 2020 at 11:25:40 +0100, Stefan G. Weichinger wrote:
> On 25.11.20 at 20:57, Nathan Stratton Treadway wrote:
> >On Wed, Nov 25, 2020 at 14:34:17 -0500, Nathan Stratton Treadway wrote:
> >>Also, do you see the same defunct pigz process that Jason reported in
> >>his original post?
> >
> >On 30.05.18 at 20:21, Jason L Tibbitts III wrote:
> >>root 2690 9.1 0.0 317692 11020 pts/0 S+ 12:38 1:43 |
> >> \_ amrecover math -s backup2 -t backup2
> >>root 2996 32.5 0.0 0 0 pts/0 Z+ 12:48 2:52 |
> >> \_ [pigz]
> >>root 2998 3.3 0.0 0 0 pts/0 Z+ 12:48 0:17 |
> >> \_ [xfsrestore]
> >
> >Assuming you are seeing this same behavior: one theory that comes to
> >mind is that pigz could be spawning subprocesses which then somehow
> >confuse amrecover such that it doesn't properly detect when pigz
> >terminates (and just keeps waiting for that to happen, even though it
> >already has happened).
> >
> >I don't know enough about how amrecover spawns the pipes to know how
> >likely that is, but one thing you could try is to kill the amrecover
> >process with a SIGCHLD signal (once it reaches the above "everything is
> >hung" situation) and see if one or both of those defunct processes go
> >away, and if the amrecover process starts doing work again
> >afterwards
>
> Not sure how to show the process tree as shown above ...

(I think Jason's output was generated using the "--forest" option to ps,
but really all that matters is the "Z" process state for the two
subprocesses).

>
> "kill -s SIGCHLD" .. ran it against the PIDs of amrecover and pigz,
> no effect.
>
> pigz isn't even killed by a "-9"
>

The fact that the pigz process is in defunct/"Z"ombie status means it's
already dead and only still exists in the process listing because the
parent process hasn't read the exit code yet. So even a -9 won't help
(since that process is already dead).

I was hoping SIGCHLD on the amrecover process would trick it into
exiting whatever wait-loop it is in and checking for subprocesses that
have already terminated (both pigz and xfsrestore in the above
listing)... but sounds like that didn't work.

Nathan
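For anyone wanting to check for the "Z" state without the process tree, a small sketch follows. It assumes a ps that accepts "-eo pid,ppid,stat,comm" (procps on Linux does); the function names are hypothetical helpers for this example.

```shell
# Keep only lines whose third column (process state) starts with "Z".
zombies_only() { awk '$3 ~ /^Z/'; }

# Print "pid ppid stat comm" for every defunct process.
list_zombies() { ps -eo pid,ppid,stat,comm | zombies_only; }
```

Running list_zombies while the restore is hung should show pigz and xfsrestore with the amrecover PID in the second column. That parent PID is the one to target with "kill -s CHLD", since signals sent to the zombies themselves have nothing left to act on.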
Re: Amrecover hangs after restore
On Wed, Nov 25, 2020 at 14:34:17 -0500, Nathan Stratton Treadway wrote:
> Also, do you see the same defunct pigz process that Jason reported in
> his original post?

On 30.05.18 at 20:21, Jason L Tibbitts III wrote:
> root 2690 9.1 0.0 317692 11020 pts/0 S+ 12:38 1:43 |
> \_ amrecover math -s backup2 -t backup2
> root 2996 32.5 0.0 0 0 pts/0 Z+ 12:48 2:52 |
> \_ [pigz]
> root 2998 3.3 0.0 0 0 pts/0 Z+ 12:48 0:17 |
> \_ [xfsrestore]

Assuming you are seeing this same behavior: one theory that comes to
mind is that pigz could be spawning subprocesses which then somehow
confuse amrecover such that it doesn't properly detect when pigz
terminates (and just keeps waiting for that to happen, even though it
already has happened).

I don't know enough about how amrecover spawns the pipes to know how
likely that is, but one thing you could try is to kill the amrecover
process with a SIGCHLD signal (once it reaches the above "everything is
hung" situation) and see if one or both of those defunct processes go
away, and if the amrecover process starts doing work again afterwards.

Nathan
Re: Amrecover hangs after restore
On Wed, Nov 25, 2020 at 20:07:23 +0100, Stefan G. Weichinger wrote:
> So maybe pigz needs some additional option at decompression, or some
> fix, or amanda needs some patch to correctly handle the behavior of pigz
> in the process.
>
> I *know* I could use amrestore. But amrecover should work, and the very

I don't have any experience with pigz myself, but just from reading this
thread it seems like it would be useful for you to test the amrestore
approach in order to find out whether pigz run manually (outside of an
Amanda pipeline) raises a failure exit status or gives any sort of hint
of a problem processing either of those particular dump files.

Also, do you see the same defunct pigz process that Jason reported in
his original post?

Does anything interesting show up in the Amanda debug logs for the
amrecover process?

Nathan
Re: Backing up remote server
On Tue, Oct 20, 2020 at 16:36:52 -0500, Robert Wolfe wrote:
> Amanda Backup Client Hosts Check
>
> WARNING: wolfe2.wolfe.local: selfcheck request failed: error sending
> REQ: write error to: Connection refused
> Client check: 2 hosts checked in 0.069 seconds. 1 problem found.
>
> (brought to you by Amanda 3.3.3)
>
> Not sure what I need to do to get this to work. I have the firewall on
> the remote server disabled, but not sure what else I need to do.

The details depend on the authentication you have specified for that
host on the Amanda server and on the distribution/release of Linux
running on your Amanda client... but off hand it sounds like you are
missing the definition for the amandad service in /etc/inetd.conf or
/etc/xinetd.d/* on the client system.

Nathan
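A quick way to check for that definition on the client is sketched below. The function name is a hypothetical helper; the two default locations are the ones named above, and other setups may keep the service definition somewhere else entirely.

```shell
# Print any inetd/xinetd lines that mention amandad.
# With no arguments it looks in the two locations named above.
find_amandad_service() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/inetd.conf /etc/xinetd.d
    fi
    # -r: descend into directories; -s: stay quiet about missing files
    grep -rs 'amandad' "$@" || echo "no amandad service definition found"
}
```

If this prints the "no amandad service definition found" line on the client, nothing is configured to answer the server's request in those two places, which would fit the "Connection refused" error.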
Re: restore from vtapes written by amvault
On Sun, Oct 18, 2020 at 14:56:04 +0200, Bernhard Erdmann wrote:
> I ended up patching the logfiles written by amvault, e.g. for
>
> $ fgrep 20020908 ../tapelist
> 20020908 BE-full-43 reuse
> $ amvault --dest-storage vtape be-full \* \* 20020908
>
> I get log.20201018130312.0 afterwards. Then I do
>
> $ cp -p log.20201018130312.0 ../log_backup
> $ perl -p -i -e 's/ 2002090800 / 20020908 /g' log.20201018130312.0
>
> and then amvaulted dump images written to vtapes can be located by
> amadmin find:

Great, thanks for posting this final wrap-up message.

So, when you ran your perl patching on the log files, were there any
lines other than the "DONE taper" lines that got changed? (That info
will help confirm the proper fix for the original problem in the
vaulting code...)

Nathan
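One way to answer that question from the saved copy is sketched below. The function name is hypothetical, and it assumes the unpatched log is still available as a separate file (for example under ../log_backup, if that is a directory).

```shell
# Show lines that differ between the saved log and the patched log,
# leaving out the "DONE taper" lines that were expected to change.
# Arguments: <original log> <patched log>
changed_other_than_done_taper() {
    diff "$1" "$2" | grep '^[<>]' | grep -v 'DONE taper'
}
```

Called with the saved copy first and the patched log.20201018130312.0 second, empty output would mean only "DONE taper" lines were rewritten. Anything printed is a line of some other type that also carried the ten-digit datestamp.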
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Wed, Oct 14, 2020 at 13:20:38 -0400, Chris Hoogendyk wrote:
> I've been doing all the debugging stuff on one server. On my other
> server, I had simply set it to traditional and it's been working.
> Just now I went and applied ScanInventory.pm.patch_20201013C and
> changed the amanda.conf back to using oldest. An amcheck daily told
> me there were no acceptable volumes found! So I switched back to

Hmmm... Well, clearly the exact failure mode is different between the
two servers :(. (You did previously see the "terminated with signal 11"
error message on the eclogite server, right?)

Did the dump that ran yesterday/last night actually write to the
geo-daily-065 tape successfully, or was there some sort of changer error
at run time?

I guess the next thing to try would be to install
ScanInventory.pm.patch_20201013B to enable some debugging, then run
"amcheck -s" and "amtape ... taper" again and post the results. Those
tests should be done using the oldest taperscan, but you can leave the
amanda.conf as-is and test with -otaperscan on the command lines if you
prefer.

Also, since I assume the statefile has changed since you last posted it,
I guess you should include it again (i.e. the version that is out there
at the moment you are running those tests).

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Wed, Oct 14, 2020 at 12:20:33 -0400, Chris Hoogendyk wrote:
> Both lexical.pm and ScanInventory.pm restored to original. New fix
> only patch applied to ScanInventory.pm. amanda.conf restored to use
> oldest.
>
>amanda@marlin:~/daily$ amcheck daily
>
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 139784192 KB disk space available,
> using 34926592 KB
>NOTE: Holding disk '/amanda4': 170082304 KB disk space available,
> using 65224704 KB
>NOTE: Holding disk '/amanda5': 240713728 KB disk space available,
> using 135856128 KB
> * Authorized Use Only *
>
>snapper
>slot 25: volume 'Bio-Research-028'
>Will write to volume 'Bio-Research-028' in slot 25.
>NOTE: skipping tape-writable test
>Server check took 50.059 seconds
>Amanda Backup Client Hosts Check
>
>Client check: 4 hosts checked in 6.692 seconds. 0 problems found.
>(brought to you by Amanda 3.5.1)
>
>amanda@marlin:~/daily$
[...]
>
> Launched a flush on that. Then the following seems to set up a tape on the
> second tape drive.
>
>amanda@marlin:~/daily$ amtape daily taper
>
>slot 31: volume 'Bio-Research-032'
>Will write to volume 'Bio-Research-032' in slot 31.
>
>amanda@marlin:~/daily$

Okay, sounds like things are back to working "normally" on that server,
right?

So, do you still have a second server which is getting coredumps (at
least with the oldest taperscan)?

Based on the investigation so far, it seems like the crash is caused by
tape-inventory records which have no label text along with some specific
other data field values. If you post the
/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi* changer state file
from that other server, we can double check that such entries exist over
there, too.

(Assuming they do, then I guess the question will be whether you want to
apply the same ScanInventory.pm patch there, or if you instead want to
try clearing that/those bad inventory record(s) without changing the
installed code on that box.)

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Tue, Oct 13, 2020 at 21:02:34 -0400, Chris Hoogendyk wrote:
> Patched.
>
>amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily
>
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 449998848 KB disk space available,
> using 345141248 KB
>NOTE: Holding disk '/amanda4': 3026923520 KB disk space available,
> using 2922065920 KB
>NOTE: Holding disk '/amanda5': 104857600 KB disk space available,
> using 0 KB
>slot 19: volume 'Bio-Research-007'
>Will write to volume 'Bio-Research-007' in slot 19.
>NOTE: skipping tape-writable test
>Server check took 20.422 seconds
>(brought to you by Amanda 3.5.1)

Okay, great.

As I said before I'm not confident this is a completely correct fix, but
attached here is a patch file (against the original version of the file)
containing just the "fix" line (i.e. no debugging statements).

You should be able to swap your lexical.pm back to the original version
and put ScanInventory.pm back to the original with just this one patch,
and then go ahead and switch back to "oldest" taperscan again in your
amanda.conf -- hopefully it will all "just work" again.

I'm curious to see the log file from an "amtape daily taper" run with
that setup in place (I assume that will run to successful completion,
too...).

Nathan

--- ScanInventory.pm_orig_v3.5.1  2017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm  2020-10-13 22:43:25.148507391 -0400
@@ -723,6 +723,7 @@
         return 0;
     } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
              $f_type == $Amanda::Header::F_TAPESTART) {
+        $label='' if !defined $label;
         if (!match_labelstr($self->{'labelstr'}, $autolabel, $label, $barcode, $meta,
                             $self->{'chg'}->{'storage'}->{'storage_name'})) {
             if (!$autolabel->{'other_config'}) {
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Tue, Oct 13, 2020 at 15:08:43 -0400, Chris Hoogendyk wrote:
>
> End of /tmp/amanda/server/daily/amcheck-device.20201013150303.debug
[...]
>Tue Oct 13 15:03:04.326400287 2020: pid 10777: thd-0x2688800:
> amcheck-device: warning: Use of
>uninitialized value $label in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm line 687.
>Tue Oct 13 15:03:04.326421243 2020: pid 10777: thd-0x2688800:
> amcheck-device:
>volume_is_labelable start: label: barcode: 29L7
>Tue Oct 13 15:03:04.326444683 2020: pid 10777: thd-0x2688800:
> amcheck-device:
>volume_is_labelable pre-matchlabel call

Okay, great, this would seem to confirm the theory that passing an
uninitialized $label value into the match_labelstr() function is what's
triggering the crash.

Here's a new patch to try. I am not sure that it's really the
long-term-correct fix, but with some luck it will at least prevent the
crash you are currently seeing and let you switch back to the oldest
taperscan.

You can either apply this patch file against the *original* version of
ScanInventory.pm, or just manually edit the previously-patched version
of the file to add the

  $label='' if !defined $label;

just below the

  debug("volume_is_labelable pre-matchlabel call");

line that appears in there now.

Give that a shot (and send the log file lines as usual)...

Nathan

--- ScanInventory.pm_orig_v3.5.1  2017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm  2020-10-13 15:47:36.242054712 -0400
@@ -684,6 +684,7 @@
     my $chg = $self->{'chg'};
     my $autolabel = $chg->{'autolabel'};
 
+debug("volume_is_labelable start: label: $label barcode: $barcode");
     if (!defined $dev_status) {
         return 0;
     } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,11 @@
         return 0;
     } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
              $f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");
+$label='' if !defined $label;
         if (!match_labelstr($self->{'labelstr'}, $autolabel, $label, $barcode, $meta,
                             $self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");
             if (!$autolabel->{'other_config'}) {
 #               $self->_user_msg(slot_result => 1,
 #                                label=> $label,
@@ -734,7 +738,9 @@
                 return 0;
             }
         } else {
+debug("volume_is_labelable pre-lookup_tapelabel call");
             my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel call");
             if (!$vol_tle) {
 #               $self->_user_msg(slot_result => 1,
 #                                label => $label,
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 22:22:45 -0400, Chris Hoogendyk wrote:
>Mon Oct 12 22:16:21.857347044 2020: pid 23996: thd-0x25c0800:
> amcheck-device: slot: 9 label:
>Bio-Research-011ds:
>Mon Oct 12 22:16:21.857380544 2020: pid 23996: thd-0x25c0800:
> amcheck-device: slot: 10 label:
>Bio-Research-012ds: 0
>Mon Oct 12 22:16:21.857471810 2020: pid 23996: thd-0x25c0800:
> amcheck-device: slot: 11 label:
>Bio-Research-028ds: 0
>Mon Oct 12 22:16:21.857580226 2020: pid 23996: thd-0x25c0800:
> amcheck-device: warning: Use of
>uninitialized value in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
>Mon Oct 12 22:16:21.857607369 2020: pid 23996: thd-0x25c0800:
> amcheck-device: slot: 12 label: ds: 0

Progress! This shows that the crash happens during processing of slot
12. (Looking back through the output of "amtape inventory" you sent, it
appears that this slot contains a tape with barcode 29L7.)

An interesting thing to note is that the *label* variable for that slot
is uninitialized -- perhaps that's what is causing the crash?

To test that theory a bit, I've attached another patch to try.
Unfortunately this one is in a file used by the oldest.pm algorithm,
too, so you'll probably want to revert the file back to the original as
soon as you've finished testing, to make sure that the patched version
doesn't affect an actual amanda run.

So, basically save a copy of the original
/usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm file, then apply
the patch attached to this email in-place to that file, and run the

  amcheck -s -otaperscan=taper_lexical daily

test again.

(I don't expect this patch to prevent the crash, but hopefully the new
log messages will narrow down exactly where it is crashing.)

Nathan

--- ScanInventory.pm_orig_v3.5.1  2017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm  2020-10-13 00:53:04.662721210 -0400
@@ -684,6 +684,7 @@
     my $chg = $self->{'chg'};
     my $autolabel = $chg->{'autolabel'};
 
+debug("volume_is_labelable start: label: $label barcode: $barcode");
     if (!defined $dev_status) {
         return 0;
     } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,10 @@
         return 0;
     } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
              $f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");
         if (!match_labelstr($self->{'labelstr'}, $autolabel, $label, $barcode, $meta,
                             $self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");
             if (!$autolabel->{'other_config'}) {
 #               $self->_user_msg(slot_result => 1,
 #                                label=> $label,
@@ -734,7 +737,9 @@
                 return 0;
             }
         } else {
+debug("volume_is_labelable pre-lookup_tapelabel call");
             my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel call");
             if (!$vol_tle) {
 #               $self->_user_msg(slot_result => 1,
 #                                label => $label,
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 14:53:03 -0400, Chris Hoogendyk wrote:
> There should be no difference in the tapes. I did them all by just
> doing an `amtape daily slot nn` followed by an `amlabel daily
> Bio-Research-nnn`. The first 20 or so were all done in sequence in
> one session, and that would include the four you mention. I didn't
> even retype the commands. I used the up arrow twice to pull up the
> previous command, backspaced the number and typed a new number for
> the slot or for the tape label as appropriate.

Based on your description of when the problems started, I'm guessing the
issue is not in how the tapes were originally labeled but some fluke of
how they were used after that.

> The new tapes that I put in with the new magazines were labeled in
> the same way. Those are now out of the library, and the tapes that
> had originally been in the library were returned. That is when the
> problem occurred.

So, what I'm wondering is if there is any pattern to which tape labels
tie to the tapes used in the "new magazine" (and thus now no longer
actually in the library) and which were in the "removed and later
returned" category?

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 15:27:44 -0400, Chris Hoogendyk wrote:
>amcheck-device: Not a SCALAR reference at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.

Ah, drat, this was a typo in my patch.

Please edit line 102 of that file and remove the doubled "$" character,
i.e.

  $$sl->{device_status}

should be

  $sl->{device_status}

Then retry the amcheck test and see if it gets any farther along.

Nathan

p.s. if there are any actual Perl programmers left on this list, feel
free to jump in and point us in the right direction here
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 18:19:21 +, Debra S Baddorf wrote:
> Is it worth trying to just remove the "state" file (rename it to
> .save for instance)
> and letting amanda recreate it?

Hmmm, that's an interesting idea... though it seems (from reading the
man pages) that it might be possible to do this using the "amtape reset"
command rather than deleting the state file directly.

Chris, in the mean time, what do you get from the commands:

  amtape daily inventory
  amtape daily taper

(For now we'll just let it default to oldest.pm, until we fix the patch
for lexical.pm.)

Both the output to the terminal session and the log files will probably
be useful.

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote:
> Text file.
>
> attached.

Excellent, perfect.

If you are able to run the patched version of lexical.pm that should
give more explicit info, but meanwhile just looking through the
statefile: one thing that jumps out at me is that four of the slot
entries have Math::BigInt device_status fields, rather than simple
integers:

  Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014

Do those four volumes ring a bell with you as being special in some way?

(I wonder if the segfault might be related to the program trying to do
some operation against a BigInt object when an integer is expected, or
something.)

More generally, is there any pattern to the labels you used for your
"normal" tapes vs. the short-term ones you wrote and then sent to Iron
Mountain?

Nathan
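For checking another server's statefile for the same kind of entries, a one-line search is probably enough. This is only a sketch: the function name is hypothetical, and the statefile path has to be supplied by whoever runs it.

```shell
# List (with line numbers) the statefile lines that mention Math::BigInt.
# Argument: path to the changer statefile.
bigint_lines() {
    grep -n 'Math::BigInt' "$1"
}
```

Each line number printed can then be matched against the surrounding slot entry in the file to see which label, if any, it belongs to.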
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
> In any case, I currently have both systems flushing to tape using
> the traditional taper scan. That may work for now, but it would be
> good to track this down. It's also puzzling that it just turned up

Re. tracking this down: my thought is to see if we can track down the
problem by tweaking the code for lexical.pm -- that way, you can trigger
test runs using "amcheck -otaperscan=taper_lexical" without running the
risk of having a normal cron job attempt to run the code you are in the
middle of modifying.

I've attached a patch which, hopefully, will both fix the
"uninitialized" warning messages that have been appearing in the logs
and also print some debugging info as it loops through the tape
inventory so we can see if it dies in the middle of that loop.

So, when you are ready to investigate further, save a copy of the
original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then
apply the patch to the "live" lexical.pm file, and then try running
amcheck -s (with lexical scan) again.

You may have to fix unbalanced quotes or whatever typos in the patched
lines, but hopefully you'll soon get a log file which lists the tapes in
the inventory (but doesn't have the warning lines any more)... at which
point, send me those log lines.

Nathan

--- lexical.pm_orig_v3.5.1  2020-10-11 18:23:37.650643033 -0400
+++ lexical.pm  2020-10-11 18:47:47.097749555 -0400
@@ -97,6 +97,10 @@
 
     for my $i (0..(scalar(@$inventory)-1)) {
         my $sl = $inventory->[$i];
+
+        # tracing segfault:
+debug("slot: $i label: " . $sl->{'label'} . "ds: " . $$sl->{device_status});
+
         next if $seen->{$sl->{slot}};
 
         if (!defined $sl->{'state'} ||
@@ -104,6 +108,7 @@
             push @unknown, $sl
         } elsif ($sl->{'state'} == Amanda::Changer::SLOT_EMPTY) {
         } elsif (defined $sl->{'label'} &&
+defined $sl->{device_status} &&
                  $sl->{device_status} == $DEVICE_STATUS_SUCCESS) {
             if ($self->is_reusable_volume(label => $sl->{'label'})) {
                 if ($last_label && $sl->{'label'} gt $last_label) {
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 15:51:34 -0400, Nathan Stratton Treadway wrote:
> My theory is that the driver for the changer keeps the inventory written
> in a file somewhere specific to the changer, but since I don't have a

Okay, looking back through the log file you posted a couple days ago, it
looks like the file in question is found at

  /usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0

If I'm reading the driver correctly, that should be a text file
(perl-formatted data definitions)... so can you post the contents here?
(Or, if it seems too large, send it to me off-list.)

(It's actually called "statefile", but the robot changer seems to
include the inventory information as part of the changer state.)

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
> I'm not sure what I should be looking for.
>
> I don't see anything in amanda's home directory that seems likely,
> nor in /tmp/amanda, and there is no /etc/amanda/. The tape library
> has a web interface that I use to interact with it. Amanda is
> configured to use mtx. I can also use mtx by hand to check on the
> library's status, and I have a script amchanger that I wrote that
> does that for me.
>
> So, aside from amanda keeping an inventory, and the tapelist that it
> has, I'm not sure where else anything would be.

My theory is that the driver for the changer keeps the inventory written
in a file somewhere specific to the changer, but since I don't have a
tape changer myself I am not familiar with that driver and don't know
where to direct you to look, off hand.

Can you post the changer and tape-drive related parameters/sections from
your amanda.conf file?

(I assume the config sections on your two different servers are
essentially the same, right?)

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 12:24:45 -0400, Chris Hoogendyk wrote:
> /tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical)
> Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device:
> NEO200x48: updating state
> Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm
> line 106.

Excellent, this points towards the "tape inventory" part of the code.

Can you take a look around your system to see if you can find where the
tape changer stores inventory information, internally? (If you don't
find it immediately I can look in the source to try to figure out the
path it would use, but hopefully it's easy enough to figure out just
looking through amanda-related directories on your system.)

If we can find that file, we may be able to see some "weird" data that
could be causing a crash by looking at the file directly.

Nathan
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 23:50:17 -0400, Chris Hoogendyk wrote: > Wow! > >amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily >Amanda Tape Server Host Check >- >NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using > 808656896 KB >NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using > 53370880 KB >NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using > 1531760640 KB >Searching for label 'Bio-Research-002':label 'Bio-Research-002' not > recognized or not found >slot 13:slot 13 not in use-slots (14-36) >slot 14: volume 'Bio-Research-013' is still active and cannot be > overwritten >slot 15: volume 'Bio-Research-003' >Will write to volume 'Bio-Research-003' in slot 15. >NOTE: skipping tape-writable test >Server check took 175.512 seconds >(brought to you by Amanda 3.5.1) >amanda@marlin:~/daily$ > > That worked! Interestingly, doing an `amcheck -s daily` after that > fails just as before. The amanda.conf uses taper_oldest. Okay, this lends support to the theory that the crash is actually happening in "scan" operation, rather than in some later part of the amcheck-driver/taper process. (Were there any error/warning messages written to the amcheck-device log file for that run?) > So, maybe if I temporarily go to the different algorithm, it will > work. Right now the backups are already running and dumping to Yeah, it might well work, and if so -- and if you don't care which tape(s) are used next -- then simply switching to taper_traditional would probably be the easiest approach to getting new dumps actually written to tape If the order the tapes are used does matter to you, I think it should probably be possible to fix the bug in taper_oldest (oldest.pm) to get it working (but I'm not really sure how much debugging effort it will involve...). 
If you are interested in attempting that, the next thing I would check is what happens with -otaperscan=taper_lexical (assuming that is also defined in your amanda.conf). The "lexical" and "oldest" algorithms both use the tape-drive inventory (while "traditional" does not), so that test will help narrow the problem down to just "oldest" or to the tape-inventory part of the code. Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 19:20:25 -0400, Chris Hoogendyk wrote: > The taper log from last night looks exactly like the amcheck log, > showing the same inventory followed by the updating state and the > same three repetitions of the warning at line 102. So you are saying that the taper log from last night goes up through the lines that look like >Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx: > Storage Element 48:Full :VolumeTag=CLN002CU >Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: > NEO200x48: updating state , then switches to the warning lines from oldest.pm line 102, and then aborts suddenly? It's not getting you an actual solution, but I'm curious whether changing to a different taperscan algorithm in the amanda.conf file (or using -otaperscan= , if you have any other ones defined already) allows "amcheck -s" to complete successfully (and in any case what the amcheck log file looks like with the other taperscan algorithm). Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
On Fri, Oct 09, 2020 at 08:48:41 +0200, Bernhard Erdmann wrote: > There is another logfile: > /tmp/amanda/server/be-full/amfetchdump.20201009083605.debug > > $ cat /tmp/amanda/server/be-full/amfetchdump.20201009083605.debug > Fr Okt 09 08:36:05.594766041 2020: pid 2215: thd-0x1bab4f0: amfetchdump: > pid 2215 ruid 33 euid 33 version 3.5.1: start at Fri Oct 9 08:36:05 2020 Yeah, that's the one to look at. Unfortunately, I don't see anything in there that tells us more than we already knew (i.e. that the program wasn't choosing the correct storage before searching for the requested dump). However, in my quick tests on a somewhat-similar setup here, I found that I could in fact get amfetchdump to request the correct volume by doing a manual override of the storage on the command line. So, does the following command work any better? $ amfetchdump -ostorage=vtape -d vtape be-full svr '^/$' 2119 Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote: > Also > > 8 lines like this in the kern.log: > >Oct 9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: > segfault at 0 ip >7f94159617c6 sp 7fff61039da8 error 4 in > libc-2.23.so[7f94158d6000+1c] > Well, that doesn't really tell us much more about what is going wrong, other than the slight hint that the problem is all the way down in the libc library somehow. I don't know how much you are trying to investigate further at this point in your furlough schedule... but I still feel that comparing the log from a successful run to this aborted run has the best chance of generating a hint as to exactly what operation is underway at the point of failure. Also, the taper/changer logs from last night's run should give some hint as to what it was attempting, and perhaps those logs will be different enough from the amcheck-device logs that they'll give some new information. (The last operation that appears to be happening in your quoted amcheck-device log lines is a scan through the tape-changer inventory. I don't have a physical tape-drive changer myself so I don't have any guesses as to what could be wrong, but based on the history of the situation you described it does seem plausible that the inventory database it's working from could contain some "unexpected" data of some sort...) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 00:29:04 -0400, Nathan Stratton Treadway wrote: > Can you post the output of > > $ sed -n '99,$p;105q' > /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > ? > > (In other words, what's line 102 in that file on your system, with a > few lines of context?) (Hmm, perhaps $ cat -n /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^ *102" -C3 would be better -- that way, the file line numbers are included in output...) Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
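For anyone following along, the "show me line 102 with its line numbers" request can also be sketched as a small shell function. This is only an illustration: show_context is a made-up helper name, not an Amanda tool, and the default of 3 lines of context is an arbitrary choice. Unlike grep "^ *102" on "cat -n" output, it selects by line number, so in a longer file it would not also pick up lines 1020-1029.

```sh
# Hypothetical helper (not part of Amanda): print the lines around a given
# line number, each prefixed with its line number.
# Usage: show_context FILE LINE [CONTEXT]
show_context() {
    awk -v target="$2" -v ctx="${3:-3}" '
        # Print only the window [target-ctx, target+ctx], numbered.
        NR >= target - ctx && NR <= target + ctx { printf "%6d\t%s\n", NR, $0 }
        # Stop reading once the window has been passed.
        NR > target + ctx { exit }
    ' "$1"
}
```

With the path quoted in this thread, that would be called as: show_context /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm 102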
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Fri, Oct 09, 2020 at 22:22:47 -0400, Chris Hoogendyk wrote: > both servers, I'm getting this error (ERROR: amcheck-device > terminated with signal 11). When I ran the amcheck before swapping amcheck-device is a Perl program, so it's a little bit impressive to be triggering a SEGV of the process :( . Do you get any coredump-related kernel messages in your syslog file when the process crashes? > Fri Oct 09 21:57:09.795469021 2020: pid 24239: thd-0xc2e600: amcheck-device: > NEO200x48: updating state > Fri Oct 09 21:57:09.802340532 2020: pid 24239: thd-0xc2e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > line 102. > Fri Oct 09 21:57:09.802537511 2020: pid 24239: thd-0xc2e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > line 102. > Fri Oct 09 21:57:09.802622523 2020: pid 24239: thd-0xc2e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > line 102. I am guessing this "uninitialized value" warning is not directly causing the crash, but those log messages might hint at where the program execution had reached just prior to the crash. Can you post the output of $ sed -n '99,$p;105q' /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm ? (In other words, what's line 102 in that file on your system, with a few lines of context?) Do you see those warning lines in the log files from a successful amcheck run (i.e. from a few days ago)? What do those logs show after the warning lines (or the "updating state" line)? Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. 
- Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
On Thu, Oct 08, 2020 at 14:29:48 -0400, Nathan Stratton Treadway wrote: > On Thu, Oct 08, 2020 at 20:00:18 +0200, Bernhard Erdmann wrote: > > $ amadmin be-full find svr / > > > > date host disk lv storage pooltape or file file part status > > 2000-01-19 svr / 0 vtape vtape vBE-full-0012 1/1 OK > > 2000-01-19 svr / 0 be-full be-full BE-full-00 2 1/-1 OK > > Okay, that's good news. > > > > > But amfetchdump still does not know about tape vBE-full-001: > > > > $ amfetchdump -d vtape be-full svr '^/$' 2119 > > 1 volume(s) needed for restoration > > The following volumes are needed: BE-full-00 > > Have you looked into explicitly specifying the storage that amfetchdump > is looking for/at? That would be my first thing to investigate at this > point. If I get a chance later this evening I'll try to look back > through my notes to see if I can remember the details of how that > works Well, looking through the source code for amfetchdump, it seems like the program does not pull storage names from the amanda.conf file after all, but instead it seems to assume that it can determine the storage to use based on the changer specified by the "-d vtape" parameter. Based on the fact that it's prompting for tape BE-full-00, though, that doesn't seem to be working as expected. It looks like amfetchdump should be creating a "$logdir/fetchdump.$timestamp" log file. If so, does that include any mention of opening the vtape changer and/or detecting storage names? (Assuming it's not super long, you could post that log file here for us to take a look at...) Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
On Thu, Oct 08, 2020 at 14:29:48 -0400, Nathan Stratton Treadway wrote: > Have you looked into explicitly specifying the storage that amfetchdump > is looking for/at? That would be my first thing to investigate at this > point. If I get a chance later this evening I'll try to look back > through my notes to see if I can remember the details of how that > works (What does $ amadmin be-full config | grep -i storage show right now?) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
On Thu, Oct 08, 2020 at 20:00:18 +0200, Bernhard Erdmann wrote: > $ amadmin be-full find svr / > > date host disk lv storage pooltape or file file part status > 2000-01-19 svr / 0 vtape vtape vBE-full-0012 1/1 OK > 2000-01-19 svr / 0 be-full be-full BE-full-00 2 1/-1 OK Okay, that's good news. > But amfetchdump still does not know about tape vBE-full-001: > > $ amfetchdump -d vtape be-full svr '^/$' 2119 > 1 volume(s) needed for restoration > The following volumes are needed: BE-full-00 Have you looked into explicitly specifying the storage that amfetchdump is looking for/at? That would be my first thing to investigate at this point. If I get a chance later this evening I'll try to look back through my notes to see if I can remember the details of how that works Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
On Wed, Oct 07, 2020 at 17:50:22 -0400, Nathan Stratton Treadway wrote: > That is, I would make a copy of the original log.20201004123343.0 file > into some other directory, then used a txt editor to edit that > particular DONE line to remove the "00" at the end of the > datetimestamp field... and then run the "amadmin ... find" command again > to see if that edit allowed it to start finding the vaulted copies of the > dumps. > (In case it's not clear, what I was trying to say was that I would save a copy of the file off in another directory for safe keeping, then edit the original file for my testing.) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
> Am 07.10.20 um 00:05 schrieb Nathan Stratton Treadway: > > $ grep 20201004123343 ../tapelist > 20201004123343 vBE-full-001 reuse BLOCKSIZE:32 POOL:vtape STORAGE:vtape > CONFIG:be-full Okay, that looks correct as far as I can tell. > > Also, what do you get when you grep log.20201004123343.0 for "srv /"? > > (That should give you all the taper lines related to writing the > > "missing" dump for srv / .) > > $ grep "svr / " log.20201004123343.0 > PART taper "ST:vtape" vBE-full-001 2 svr / 2119 1/-1 0 [sec 43.408733 > bytes 13631488 kps 306.666403] > DONE taper "ST:vtape" svr / 211900 1 0 [sec 44.00 bytes 13631488 > kps 302.545455 orig-kb 0] Hmmm, the one thing that seems a little strange is the extra zeros at the end of the datetimestamp string on the DONE line. I don't know how well this situation of mixing dumps made in the date-only datestamp era with vault-copies made in the datetimestamp era has actually been tested... so if it were me that's probably what I'd play with next. That is, I would make a copy of the original log.20201004123343.0 file into some other directory, then use a text editor to edit that particular DONE line to remove the "00" at the end of the datetimestamp field... and then run the "amadmin ... find" command again to see if that edit allowed it to start finding the vaulted copies of the dumps. Let us know how it goes... Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
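The "remove the trailing 00 from the DONE line" experiment can also be done non-interactively. The following is only a sketch under assumptions: strip_done_padding is a made-up name, STAMP stands for whatever datestamp appears on the matching PART line, and the field layout is taken purely from the two log lines quoted in this message, not from any Amanda format documentation. It reads the log on stdin and writes the edited text to stdout, so the original file stays untouched until you decide to swap it in.

```sh
# Hypothetical helper (not part of Amanda): on taper DONE lines only,
# turn " <STAMP>00 " back into " <STAMP> "; every other line passes through.
# Usage: strip_done_padding STAMP < log.original > log.edited
strip_done_padding() {
    awk -v stamp="$1" '
        # STAMP is expected to be digits only, so it is safe inside the regex.
        $1 == "DONE" && $2 == "taper" { sub(" " stamp "00 ", " " stamp " ") }
        { print }
    '
}
```

A cautious sequence would be: copy the original log somewhere safe, run the function on that copy, compare the two with diff, and only then replace the file in the logdir before re-running "amadmin ... find".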
Re: driver general protection fault on amanda 3.5.1
On Wed, Oct 07, 2020 at 14:54:30 -0400, Steve Ryan wrote: > I'm trying to debug an issue we've been having in our amanda > 3.5.1 setup. Currently backups are failing every night due to (I > believe) the driver faulting. Relevant logs: > > amdump mail report: > FAILURE DUMP SUMMARY: > chunker: FATAL Broken pipe at > /usr/lib64/perl5/vendor_perl/Amanda/IPC/LineProtocol.pm line 429. > chunker: FATAL Connection reset by peer at > /usr/lib64/perl5/vendor_perl/Amanda/IPC/LineProtocol.pm line 579. > > dmesg: > 2020-10-07T01:06:08.770127-04:00 vacuum.cs.umd.edu kernel: traps: > driver[25995] general protection ip:7f2a9ffe50ec sp:7ffc61f8b040 > error:0 in libamanda-3.5.1.so[7f2a9ffaa000+81000] > > > The environment is about ~80ish nodes total, running mostly RHEL7 > with some RHEL8 and ~3-5 Ubuntu/Debian machines. Everything is > running 3.5.1. straight from the official sources. I don't think > it's being caused by a client machine anyway, and some machines get > backed up each night. I don't remember seeing this particular problem reported here before and don't have any silver bullet... Which distribution is the Amanda server running on? Was this setup of Amanda-server-and-~80ish-clients ever working properly at some point before this crashing started? > Has anyone seen this issue before/know what debug info I should be > looking for in the logs? If the driver process is indeed core dumping, you should see evidence of that in /var/log/amanda/server//driver..debug for that run. At the very least the log should end abruptly; if you are lucky you might find a stack trace or something giving a clue as to what is happening just before the crash. If you can go back through the runs from various nights and correlate the crashes to e.g. a particular client kicking off just beforehand, or something, that might be a useful clue. 
You can also look at the chunker..debug files in that same directory to see if they give any additional hints, but offhand I'd guess that they are just going to report that the chunker processes are aborting due to the fact that the far side of the socket/pipe disappeared, which presumably is caused by the driver process crashing. Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: restore from vtapes written by amvault
On Tue, Oct 06, 2020 at 18:49:55 +0200, Bernhard Erdmann wrote: > "amadmin CONF find" does not locate the dumps on vtape (labeled > vBE-full-001), only on original tapes (labeled BE-full-00 to BE-full-04). > > $ amadmin be-full find svr / > > date host disk lv storage pooltape or file file part status > 2000-01-19 svr / 0 be-full be-full BE-full-00 2 1/-1 OK > [...] > The logdir contains the original logfiles of 19 Jan 2000 as well as the > logfile log.20201004123343.0 describing the amvaulting to vBE-full-001. > (I assume you are running Amanda v3.5, right?) I use dump-time vaulting rather than "amvault" and am not exactly certain what details are different between the two approaches, but offhand I'm guessing that if you can fix the "find" problem that will also allow the other commands to start working... When I use "find", it does find both the "primary" and "vault" copies: = # su backup -c "amadmin TestBackup find TestServer" datehostdisk lv storage pooltape or file file part status 2019-09-19 23:17:28 TestServer /data 0 TestOffsite TestOffsite TESTBACKUP-1032 1/1 OK 2019-09-19 23:17:28 TestServer /data 0 TestBackup TestBackup TESTBACKUP-12 2 1/1 OK = Looking at the source code for the "find" command, it seems that Amanda looks through the log.* files based on the datestamps pulled out of the tapelist file... so in your case, what does grep 20201004123343 tapelist show (for the /etc/amanda/be-full/tapelist file)? Also, what do you get when you grep log.20201004123343.0 for "svr /"? (That should give you all the taper lines related to writing the "missing" dump for svr / .) Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Is amanda busy
On Wed, Sep 23, 2020 at 10:10:21 +0700, Olivier wrote: > Does it exist a command that can be used to check whether amanda is busy > or not? > > For example, do not launch the daily backup if the previous one is still > running, or do not reboot Amanda server (network stability issue) if a > backup is being done. I don't know of a straightforward Amanda-provided command to do exactly this. (Note that in older versions, Amanda would abort a new amdump run if an old run was still underway, but in v3.5 there is support for concurrent runs so it specifically doesn't abort automatically any more.) In general, you can look to see if /var/log/amanda//amdump exists. That symlink is created when amdump starts, and renamed to "amdump.1" as amdump finishes, so if the "amdump" symlink still exists then the job is still running (... or it died without cleaning up). If you are programming a script to check for this, you might also check for an "amflush" symlink at the same time -- that symlink exists while amflush is running, and depending on your configuration you might not want to start a new amdump job while amflush is running, either. For a manual check, you can run the "amstatus" command to see the status of either the current in-progress run (if it shows the amdump or amflush file) or last-completed run (amdump.1/amflush.1). Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
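The symlink check described above could be wrapped in a script along these lines. This is a minimal sketch, not an Amanda-provided command: amanda_run_in_progress is a made-up name, and LOGDIR is whatever log directory your configuration actually uses (passed in as an argument rather than guessed here). As noted in the message, a leftover "amdump" entry can also mean a run died without cleaning up, so a positive result is a hint rather than proof.

```sh
# Hypothetical helper (not part of Amanda): succeed if LOGDIR contains an
# "amdump" or "amflush" entry, and print which one was found.
# Usage: amanda_run_in_progress LOGDIR
amanda_run_in_progress() {
    for name in amdump amflush; do
        # -L also catches a symlink whose target has already gone away.
        if [ -e "$1/$name" ] || [ -L "$1/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
```

In a cron wrapper this might be used as "amanda_run_in_progress "$LOGDIR" >/dev/null || amdump CONFIG", with both LOGDIR and CONFIG filled in for your own site.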
Re: Incrementals becoming fulls
On Tue, Sep 15, 2020 at 20:14:34 -0700, Jim Kusznir wrote: > I've read the website on the amanda site about changing disk IDs, but I did > a fairly thorough checking of that, and I don't think that's it. First, > there is literally NO device on the mount point: > > root@AmandaBackup:/usr/local # mount > cys-bkup/iocage/jails/AmandaBackup/root on / (zfs, local, nfsv4acls) > root@AmandaBackup:/usr/local # > > And that stays exactly the same. The important factor for Amanda (or, more precisely, GNU tar), is not the mount points per se, but rather the Device field of each particular file's inode. Does FreeNAS have a "stat" command? If so, it would be interesting to see the output of that command for a few of the files in question, and perhaps for various top-level directories of the jail above the cross-mounted data directories, in hopes that gives some hint of what is going on. > > I've found and ran the tar-snapshot-edit perl script in read mode to view > the device IDs. I am seeing a few different device IDs show up, but the > level 0, level 1 and leve1.new for any given share always have the same > device ID. Here's a snippit of the output: > > File: amclient-tdriveCYS-2018-Session-A-B_0 > Detected snapshot file version: 2 > > Device 0x2900ff0b occurs 6305 times. > > File: amclient-tdriveCYS-2018-Session-A-B_1 > Detected snapshot file version: 2 > > Device 0x2900ff0b occurs 6306 times. > > File: amclient-tdriveCYS-2018-Session-A-B_1.new > Detected snapshot file version: 2 > > Device 0x2900ff0b occurs 6305 times. but I agree that since these all match, it seems something else beyond the usual device-id-change is going on. > > I've noticed that all affected devices will end up with a .new. I believe this means that the backup didn't finish -- there shouldn't be any .new files left between runs. Do you see any errors in your mail report, or in your log files? What does "ls -l" show for the directory containing the snapshot files? 
Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
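Both checks mentioned above (the device number that stat reports for a file, and leftover ".new" snapshot files) can be sketched as small shell functions. These are illustrations only: file_device and list_leftover_new are made-up names, and the snapshot directory is whatever location your client actually uses. The stat call tries the GNU form first and falls back to the BSD form, which is presumably what a FreeNAS jail provides. Note that stat prints the device number in decimal, while tar-snapshot-edit shows it in hex.

```sh
# Hypothetical helpers (not part of Amanda or GNU tar).

# Print the device number of the filesystem holding FILE.
# Usage: file_device FILE
file_device() {
    # GNU stat uses -c FORMAT; BSD stat uses -f FORMAT.
    stat -c '%d' "$1" 2>/dev/null || stat -f '%d' "$1"
}

# List snapshot files still carrying a ".new" suffix under DIR.
# Usage: list_leftover_new DIR
list_leftover_new() {
    find "$1" -type f -name '*.new'
}
```

Running file_device on the same file before and after a reboot (or a jail restart) would show whether the device number really stays constant.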
Re: How to "unlable" a tape
On Fri, Sep 18, 2020 at 10:53:44 +0700, Olivier wrote: > I know there is amadmin no-reuse, but suppose the tape had been > completely destroyed and is not readable anymore, there should be a way > to tell Amanda it should completely forget about that tape, remove any > index it can have aboutt he tape. > > What would be the command then? You're looking for "amrmtape" . Nathan ---- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: amsamba errors
On Mon, Jul 06, 2020 at 19:39:49 +0200, Stefan G. Weichinger wrote: > I see stuff like this in my amanda-backups when I use the > amsamba-application (amanda-3.5.1, Debian 10.4, samba-4.10.15): > > sendbackup: info BACKUP=APPLICATION > sendbackup: info APPLICATION=amsamba > sendbackup: info RECOVER_CMD=/bin/gzip -dc > |/usr/lib/amanda/application/amsamba restore [./file-to-restore]+ > sendbackup: info COMPRESS_SUFFIX=.gz > sendbackup: info end > ? smbclient: cli_setatr failed: NT_STATUS_ACCESS_DENIED > ? smbclient: cli_setatr failed: NT_STATUS_ACCESS_DENIED > > [..] > > Unfortunately even the log files in > /var/log/amanda/log.error don't show more details. A little bit of additional information should be found in the /var/log/amanda/client//Amsamba. The user in /etc/amandapass has admin rights on the files in the DLE (at > least I am told so). > > Maybe it's "only" setting the archive bit that fails? If you count the number of NT_STATUS_ACCESS_DENIED errors in a particular /var/log/amanda/log.error/errout file and it exactly matches the number of files backed up in that run (e.g. as determined by a count of lines in the corresponding /var/lib/amanda//index//___/_-unsorted.gz file ), then it seems likely to be the archive bit permissions. (Another symptom pointing in that direction is if incremental dumps for that Samba DLE are the same size as full dumps even when you know that most files would not have been changed since the previous full dump.) To test explicitly, you can do a manual smbclient test, something like this: $ USER= smbclient //path-to-share -E -W -c "cd ; tarmod full reset hidden system quiet; tar c -" > smbclient_test.tar to try to backup the remote test directory into the local smbclient_test.tar file using the same smbclient "tar" settings that Amanda uses. (Also, the smbclient "ls" command shows you a flag of "A" or "N" on a file, so you can use that to confirm that the smbclient tar command is successfully clearing the Archive bit.) 
Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
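The count comparison described above might be scripted roughly as follows. It is a sketch under assumptions: the three function names are made up, and both file paths are passed in as arguments because the exact errout and index locations depend on your configuration. The comparison itself (number of NT_STATUS_ACCESS_DENIED lines versus number of lines in the gzipped index file) is the heuristic from the message, not an official diagnostic.

```sh
# Hypothetical helpers (not part of Amanda).

# Count NT_STATUS_ACCESS_DENIED lines in an errout file.
count_access_denied() {
    # grep -c exits non-zero when the count is 0; the count is still printed.
    grep -c 'NT_STATUS_ACCESS_DENIED' "$1" || true
}

# Count the entries (lines) in a gzip-compressed index file.
count_index_entries() {
    gzip -dc "$1" | wc -l | tr -d ' '
}

# Succeed when there is at least one error and the two counts match exactly.
# Usage: archive_bit_suspected ERROUT_FILE INDEX_FILE.gz
archive_bit_suspected() {
    denied=$(count_access_denied "$1")
    entries=$(count_index_entries "$2")
    [ "$denied" -gt 0 ] && [ "$denied" -eq "$entries" ]
}
```

If the function succeeds, the archive-bit explanation looks plausible and the manual smbclient test is the next step. If the counts differ, some other permission problem may be involved.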
Re: Timeout during estimate
On Mon, Jun 15, 2020 at 00:20:25 -0400, Nathan Stratton Treadway wrote: > On Mon, Jun 15, 2020 at 10:41:58 +0700, Olivier wrote: > > I have an Amanda client that takes more than 4 hours to do the > > estimate. The estimate is computed correctly, but when amandad on the [...] > Sounds like you are looking for the "etimeout" parameter in amanda.conf > on the Amanda server. (You don't need to recompile anything to change > this setting.) I meant to add that another approach is to change the "estimate" option for the disktype used in the DLE(s) for that client machine to something other than "client". Depending on your situation, one of the other two options may given you a good-enough estimate of the size of the dumps in a lot less time... which would probably have the additional effect of letting the overall dump complete more quickly. Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Timeout during estimate
On Mon, Jun 15, 2020 at 10:41:58 +0700, Olivier wrote: > I have an Amanda client that takes more than 4 hours to do the > estimate. The estimate is computed correctly, but when amandad on the > client tries to send back the estimate to the server, the packet times > out. > > I kind of remember that there is a timeout parameter that I need to > tweak before recompiling Amanda, but I can't remember if it is on the > client or on te server. I tend to think it is on the server. But > definitive answer is welcome. Sounds like you are looking for the "etimeout" parameter in amanda.conf on the Amanda server. (You don't need to recompile anything to change this setting.) Nathan -------- Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: runspercycle with value 5
On Mon, Jun 01, 2020 at 15:09:54 +0200, zoi...@medialab.sissa.it wrote: > documentation and in messages in this mailing list runspercycle set with > a value of 5 and immediately next to it (like a golden rule) the mention > that amdump will run 5 times per cycle excluding weekends? > > My question is: why excluding weekends? Is not possible to run amdump > 5 times per week excluding Wednesday and Friday? Is there any > correlation between the value 5 and the weekends? These types of examples probably date back to the era when Amanda was generally used to send backups to a no-changer tape drive... so you needed someone physically present to change the tape before each new run (and thus couldn't do new backups if the office was empty on the weekends). Thus, these examples target getting a full dump once a week, but assume that only 5 tapes will be used over the course of that week. But indeed there's no particular reason to only do backups on weekdays (especially when using tape changers or vtapes, etc. which don't need daily manual intervention). In fact, at my site the current setup (which uses vtapes) runs amdump every night, and has dumpcycle set to 3 and runspercycle is left unset (which means "same as dumpcycle") -- so we're both running the dump all seven days a week and also getting more than two full-dump cycles within each week. > > Another related question: In a configuration with > >dumpcycle 7 days >runspercycle 5 > > during a dumpcycle (a week) do I have to run amdump in the same days > as in the > previous dumpcycle or in the first dumpcycle I can run amdump during > the days > > 1 2 4 5 6 > > and in the second dumpcycle amdump can be run during the days > > 2 3 4 6 7? Amanda simply uses the runspercycle number to (try to) figure out how to space out full v.s. incremental dumps over the course of a cycle -- but this is done each run at a time, rather than there existing some sort of overarching schedule which later runs then have to follow. 
Put another way, each time Amanda runs, it gathers up the estimate statistics for all the DLEs in the disklist, then tries to calculate an answer to the question "Which DLEs should I pick to full-dump today, so that no DLE is overdue for a full dump and so that the total size of the full dumps today is close to the daily average size of full dumps?". (You can get some insight into the calculations involved using the "amadmin balance" command.) Since each Amanda run makes this calculation separately, if you change the run schedule it will simply re-calculate using the current numbers and do the best it can. There are lots of nuances to how that plays out, but the short version is that assuming you do keep to the same number of actual runs over the course of the dumpcycle, you should be fine -- with the caveat that at least when your filesystems have a fairly constant rate of changed/added files, if your runs aren't spaced evenly then the runs that happen after a longer delay will have more data than the runs after a shorter delay. That is often not really a big deal, but depending on your situation it can be undesirable (which is why many of the examples out there are structured to try to keep the daily backup size as constant as possible). Nathan Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
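To make the "daily average size of full dumps" idea concrete, here is a deliberately simplified bit of arithmetic. It is an assumption on my part that the target is roughly the total size of all full dumps divided by runspercycle. The real planner weighs more factors than this, and full_dump_target_kb is a made-up name, not an Amanda command.

```sh
# Hypothetical helper (not part of Amanda): rough per-run target for full
# dumps, assuming target = total full-dump size / runspercycle.
# Usage: full_dump_target_kb TOTAL_FULL_KB RUNSPERCYCLE
full_dump_target_kb() {
    echo $(( $1 / $2 ))
}
```

With made-up numbers, a site whose full dumps total 700000000 KB and which has runspercycle 5 would aim for about 140000000 KB of full dumps per run, regardless of which weekdays those five runs fall on.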
Re: amanda-3.4.5 does not fill one tape
On Fri, May 15, 2020 at 10:59:49 +0200, Stefan G. Weichinger wrote:
> I am gonna restore as well just to check if there are hidden write
> errors (doesn't look like that to me so far ...)

That reminded me that (at least on our Ubuntu Linux system) the smartmontools package's "smartctl" let us read error statistics information from the SCSI tape drive.

I put

  smartctl -l error -H $TAPEDEV

in the cron script which ran Amanda, and it would produce output like this:

==
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

TapeAlert: OK

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read: 00 0 0 0 0.000 0
write: 63040 6304 6304 6304 0.000 0
==

With that output at the end of the Amanda-running script I could keep an eye on the numbers for each particular tape as it came through the rotation cycle. (A few thousand seemed normal, at least by the time I implemented this monitoring [since by then the tapes were several years old]; when tapes were starting to really go bad I saw error counts in the tens of thousands, etc.)

When the drive detected that the tape heads needed to be cleaned, the output looked like

==
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

TapeAlert Errors (C=Critical, W=Warning, I=Informational):
[0x14] C: The tape drive needs cleaning:
1. If the operation has stopped, eject the tape and clean the drive.
2. If the operation has not stopped, wait for it to finish and then clean the drive.
Check the tape drive users manual for device specific cleaning instructions.

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read: 00 0 0 0 0.000 0
write: 33901 3392 3389 3390 0.000 1
==

(You should double-check the behavior if you are going to rely on it, but as I recall the stats are cleared when you swap a tape and when you run the above smartctl command.)

Nathan
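If you would rather have the script flag a suspicious tape than eyeball the numbers, something along these lines might work. This is only a sketch: it assumes the "read:" and "write:" rows each carry seven numeric columns in the order of the header shown above, the field names are my own labels, and the 10000 threshold is a made-up value loosely based on the "few thousand is normal, tens of thousands is bad" observation.

```python
import re

# My own labels for the seven numeric columns, in header order (assumption).
FIELDS = ("ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
          "algorithm_invocations", "gigabytes_processed", "total_uncorrected")

ROW = re.compile(r"^(read|write):\s+(.*)$")


def parse_error_counters(text):
    """Extract the read/write rows of an 'Error counter log' section."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        values = m.group(2).split()
        if len(values) != len(FIELDS):
            continue  # skip rows that do not have the expected shape
        rows[m.group(1)] = {k: float(v) for k, v in zip(FIELDS, values)}
    return rows


def needs_attention(rows, corrected_limit=10000):
    """Hypothetical rule: many corrected or any uncorrected write errors."""
    w = rows.get("write", {})
    return (w.get("total_corrected", 0) >= corrected_limit
            or w.get("total_uncorrected", 0) > 0)
```

The text passed in would be the captured output of the smartctl command from the cron script; as noted above, check how your drive resets these counters before relying on any threshold.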
Re: amanda-3.4.5 does not fill one tape
On Thu, May 14, 2020 at 09:14:17 +0200, Stefan G. Weichinger wrote:
> Interesting, how can a "dirty" drive trigger this behavior?
>
> I'd expect failures all along and not after ~200 or 300 GB written.
>
> I don't see any interrupted writing or so (until that End Of Tape).

(We switched to disk-drive vtapes a long time ago so when I was last looking into the details of backup-tape-drive behavior it was probably for pre-LTO technology, but I would assume that for this discussion LTO is similar.)

For "modern" error-correcting tape drives, when the computer sends data out to the tape drive to be written to tape, the drive actually then uses the read head to immediately read back in the data it just wrote. If that read fails, the drive will automatically/transparently try the write again... repeating the process until it is able to achieve a successful confirmation read of that block of data.

Normally this just happens once in a while, when there's a bad spot on the tape or some fluke of writing makes the data unreadable, and one doesn't even notice it's happening.

However, if the drive head is dirty or the tape media in general is wearing out, then what happens is that many many many of the data blocks either will be written badly or will fail to read back in [depending on what exactly is dirty or failing], and the drive will have to re-write data multiple times before a successful write/read cycle.

When that happens, then lots of the linear space on the tape is used by all the repeated writes -- thus making the tape appear to have a lower capacity than you would expect -- and also all that re-writing means the data throughput from the server's point of view is much reduced.

(Note that in this scenario the drive just keeps trying to write a block of data until it succeeds... or until it hits the end of the tape. So that's why you don't get "interrupted writing" in the sense of having mid-tape write errors returned by the tape device to the computer. [But it is "interrupted" in the sense that a block takes much longer to write than it should, so the computer has to wait a long time before it can send the next block of data down to the drive.])

Hope that makes sense.

Nathan
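As a rough back-of-the-envelope illustration of the capacity effect described above, here is a small Python sketch. It uses a deliberately simplified model that is my own assumption, not anything taken from a drive specification: every write attempt fails its read-back check independently with probability p, and every attempt consumes one block's worth of tape. Under that model a block needs 1 / (1 - p) attempts on average, so the usable capacity shrinks by the same factor.

```python
def expected_attempts(p_fail):
    """Average number of write attempts per block (geometric model)."""
    if not 0 <= p_fail < 1:
        raise ValueError("p_fail must be in [0, 1)")
    return 1.0 / (1.0 - p_fail)


def effective_capacity_gb(native_gb, p_fail):
    """Tape capacity as seen by the host when rewrites eat linear space."""
    return native_gb / expected_attempts(p_fail)
```

For example, with a hypothetical 800 GB tape and half of all attempts failing verification, the host would see the tape fill up after roughly 400 GB, and each block would take about twice as long to land on tape, without any write error ever being reported.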
Re: smbclient
On Thu, Apr 23, 2020 at 10:40:25 +0100, Nuno Dias wrote:
> The "estimate server" solved the problem, definitely the amsamba don't
> like in the estimate the NT_STATUS_ACCESS_DENIED even when is not in
> the PATH.

So, with "estimate server" the backup was able to complete the full dump phase successfully?

It would be interesting to know if there are any NT_STATUS errors in the Amsamba logs for that successful run, and also to hear if the second day's (level 1) backup is actually smaller than the full dump, as expected.

Nathan
Re: smbclient
On Tue, Apr 21, 2020 at 13:06:13 +0100, Nuno Dias wrote:
> I tried everything, I'm using the Administrator user in windows 10, I
> check and he has the rights to do everything, nevertheless in smbclient
> I have several ERRORS saying "NT_STATUS_ACCESS_DENIED listing" this is
> system files or system directories.
>
> The backup I want to do is the users directories, and I can read that
> directories and files, it seems amanda fails because of the previous
> ERRORS.
>
> Even if I put something like this //pcwindows/c$/Users/user/files and
> I'm positive that I can read these dir and files, I still have error
> in //pcwindows/c$/Progranas
> NT_STATUS_ACCESS_DENIED listing \Programas\*
>
> And this is not in the PATH of the backup :(

Yeah, that does seem weird :(

It may be that the estimate phase inadvertently tries to access files that are outside the specified directory tree, or something. (In my case we are only backing up "data file" shares on the Windows PC, so the system directories are not included anywhere on the shares in question.)

Windows clients are definitely not my strong point, but if you want you could go ahead and send me (off-list) the Amsamba.*debug file that corresponds to the failed estimate (and probably your amanda.conf and disklist files too) and I can see if I can determine anything by comparing that with the logs from my working system.

The other thing that just occurred to me is that if it seems like it's the estimate phase that is failing, you could try adding "estimate server" to the amsamba dumptype to see if that at least allows Amanda to proceed to the dumping phase.

Nathan
Re: smbclient
On Mon, Apr 20, 2020 at 18:29:13 +0100, Nuno Dias wrote:
> So, after countless hours with this issue without success, let me ask
> the community about smbtar.
>
> Can I use smbtar with amanda? if yes how? do I need to write a new
> plugin?
>

(At least here on my Ubuntu system, smbtar is just a wrapper shell script around smbclient, so I doubt that adding that level of indirection into the pipeline Amanda is trying to execute would make things work better.

However, it does remind me that when I was trying to get this working I did extract the actual "smbclient" command from the .debug file and run it manually from the command line in order to experiment.

For example, the .debug file for the estimate phase shows the command

  /usr/bin/smbclient -d 0 -U backup -E -W -c archive 0;recurse;du

, so then I set the USER and PASSWD environment variables in my shell session [running on the Amanda server machine], and did experimentation using

  /usr/bin/smbclient -d 0 -E -W -c "archive 0;recurse;du"

... and similarly for the command taken from the sendbackup runs, where the command was something more along the lines of "cd TestDirectory/; tarmod full reset hidden system quiet; tar c -" etc.

This allowed me to try out permissions changes on the Share volumes in real time, rather than having to kick off a full Amanda run to see the results of my changes.)

Nathan
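For anyone who wants to script that kind of experiment, here is a minimal Python sketch of the same idea. The share and workgroup arguments are placeholders you would fill in from your own .debug file (they are not shown in the commands above), and placing the share as the first positional argument is my assumption about how the full command is laid out. The credentials are passed through the USER and PASSWD environment variables, as described above, rather than on the command line.

```python
import os
import subprocess


def build_du_command(share, workgroup, smbclient="/usr/bin/smbclient"):
    """Return the argv list for an estimate-style 'du' run."""
    return [smbclient, share, "-d", "0", "-E", "-W", workgroup,
            "-c", "archive 0;recurse;du"]


def run_du(share, workgroup, user, password):
    """Run the command with credentials supplied via the environment."""
    env = dict(os.environ, USER=user, PASSWD=password)
    return subprocess.run(build_du_command(share, workgroup), env=env,
                          capture_output=True, text=True, check=False)
```

Calling run_du() after each permissions change on the Windows side and looking at the returned stderr for NT_STATUS lines gives the same quick feedback loop without waiting for a full Amanda run.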
Re: smbclient
On Sat, Apr 18, 2020 at 11:44:18 +0100, Nuno Dias wrote:
> Hi,
>
> The OS is Fedora 30 with amanda-3.5.1-19.fc30.x86_64 and samba-
> client-4.10.14-0.fc30.x86_64 (this are the packages from the OS).
>
> The error that I have is accessing some files,
>
> ERROR smbclient: NT_STATUS_ACCESS_DENIED listing
> Windows\\System32\\config\\systemprofile\\AppData\\Local\\Microsoft\\Wi
> ndows\\INetCache\\Content.IE5\\*"
>
> Can this be the reason all the estimates fails?

Yes, it could be (though I would expect to see a more explicit failure message from the estimate phase later on in the log file if that were the case).

You will certainly need to connect to your Windows client using a username which has sufficient permissions to read all files, and also to update the "archive bit" on the backed-up files so that level 1 dumps work as expected.

I haven't delved into the nuances of NTFS permissions and have only gone through this on a single Windows client machine, but in my experimentation I found that setting the following permissions using Server Manager for the particular shares in question produced a working level-1 backup with no error messages in the log, while also at least preventing obvious editing of the share by the backup user:

permissions (for the smbclient user) enabled:
  "Read & execute", "List folder contents", "Read"

Advanced Attributes enabled:
  "Traverse folder/execute file", "List folder/read data", "Read attributes", "Read extended attributes", "Read permissions", "Write attributes"

(On this particular client box, the Sharing tab already had "Full Control" granted to Everyone, so I did not have to fine-tune the settings there to get Amanda working.)

Hope that is at least somewhat helpful...

Nathan
Re: smbclient
On Fri, Apr 17, 2020 at 19:51:47 +0100, Nuno Dias wrote:
> Hi,
>
> I'm trying to use the amsamba plugin, put fails everytime in the
> estimate.

I have a vague recollection of running into an unexpected warning message from smbclient (when I first set up Amanda 3.5.x with Samba 4.7.x), but it sounds like a different problem from yours.

What error messages do you find in the .../client//Amsamba.2020*.debug log file?

(Knowing your OS and Amanda versions [and exact version of Samba] might be helpful, too.)

Nathan
Re: Zmanda/Amanda: current support list and some new work ....
On Wed, Feb 26, 2020 at 17:05:22 +, Chris Hassell wrote:
> Just to keep up-to-date after a long effort I'm working
> currently on these areas and have finally gotten out of config-mgmt /
> packaging.

I don't see any commits in git://github.com/zmanda/amanda later than December 2019 or so. Has the public amanda work been moved to another repo?

Nathan
Re: separate spindle for dump use -- Amanda planner
On Sun, Feb 09, 2020 at 15:29:04 -0500, Gene Heskett wrote:
> It has perms to use 2 vtapes, so there is nothing left in either of the
> holding disks, ever. And I am still trying to make it use higher levels,
> its entirely too often it will advance half the disklist from level 2 to
> level 0, doing a jump of 9 days!, and that then runs onto the 2d vtape
> by half a vtape. Doesn't make any sense either when it says going to do
> a level 3, then in the final report in the same email it actually does a
> level 0, maybe 20 times out of a 75+ entry disklist.
>
> So basically, I am tired of amanda lying to me. It should do what the

Gene,

When you say Amanda is lying to you, are you referring to the section of the Amanda mail report that looks like:

  planner: Incremental of bumped to level 6.
  planner: Incremental of bumped to level 2.
  planner: Incremental of bumped to level 2.
  [...]
  planner: Full dump of promoted from 6 days ahead.
  planner: Full dump of promoted from 6 days ahead.
  planner: Full dump of promoted from 6 days ahead.
  [...]

... and in particular the fact that sometimes the same DLE shows up in both of those lists?

Assuming so: keep in mind that the "bumped to level" messages actually come from an early pass through the DLE list, during which Amanda simply decides if the incremental level (for each DLE) should be bumped up, based on the various bump* amanda.conf dumptype parameters. (Presumably the calculations from this step will match the results printed by the "amadmin ... bumpsize" command.)

At this point Amanda hasn't tried to do any "balancing" yet, so really these "Incremental of bumped to level X" messages should be interpreted as meaning "if I happen to do an incremental dump for , that dump will be at level X" (rather than as, say, "I am about to do a level X dump of ").

> planner says its going to and it might eventually hit a balanced
> schedule, but with something overriding the planner, there is no way in
> hell it will ever get it done.

The "promoted from N days ahead" lines are the ones from the logic that attempts to balance the runs. I think it's fair to say that they are _from_ the planner, so the issue is not really that something is "overriding" the planner, but rather that the planner's calculations aren't working out very well in your context.

It looks like we had some discussion here on the list back in November 2018 about the unbalanced runs you were seeing, which got as far as my theorizing that perhaps you had a few DLEs that were so much larger than all the others that Amanda's normal algorithm for adjusting the balance is unable to achieve that goal.

I don't know if there is anyone left on this list who understands the inner workings of those algorithms, but if you really want to get to the bottom of the problem you'll probably have to spend some time combing through the balance calculation section of the planner log files.

Or, you can try one or both of these and see if they happen to improve the situation:

* break up your largest DLEs (or exclude some of the data from Amanda backup in order to make them smaller)

* make your dumpcycle either longer or shorter, so that the expected average daily size changes. (Depending on the relative sizes of your DLEs, you may need to make the daily size either larger or smaller to get things to work better...)

Anyway, if you decide you want to get to the bottom of your balance issue, it probably makes sense to go back to the 'following the "balance" report' thread from 2018 and pick up from there...

> Back at about 2.4.2 I had zero trouble filling to within 50 megs, a 4GB
> DS2 tape. And that was by telling amanda the tape was 3.5Gigs, and

(I have no idea how much the planner algorithm changed between 2.4 and 3.5, but keep in mind that your DLE list changed a lot since then as well, so it's not really an apples-to-apples comparison...)

Nathan
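If it helps to see which DLEs land in both of the lists discussed above, a small Python sketch like the following could pull them out of a saved mail report. The regular expressions assume each planner line carries a single whitespace-free DLE identifier (something like host:disk) in the position where the examples above show none; that layout, and the sample names used to try it out, are assumptions on my part.

```python
import re

# Assumed line layout: the DLE name sits between "of" and the verb.
BUMPED = re.compile(
    r"planner: Incremental of (\S+) bumped to level (\d+)\.")
PROMOTED = re.compile(
    r"planner: Full dump of (\S+) promoted from (\d+) days? ahead\.")


def bumped_and_promoted(report_text):
    """Map each DLE found in both lists to (bump level, days ahead)."""
    bumped = {m.group(1): int(m.group(2))
              for m in BUMPED.finditer(report_text)}
    promoted = {m.group(1): int(m.group(2))
                for m in PROMOTED.finditer(report_text)}
    return {dle: (bumped[dle], promoted[dle])
            for dle in bumped if dle in promoted}
```

Any DLE this returns is one where the early "if I do an incremental, it will be level X" note was later superseded by the balancing step choosing a full dump instead, which matches the interpretation given above.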
Re: separate spindle for dump use
On Sun, Feb 09, 2020 at 05:47:46 -0500, Gene Heskett wrote:
> amcheck is again happy, and since a df now shows that now empty disk as
> haveing 896702760 1k blocks free, it ought to have all the room it
> needs, and I'm reasoning the original /usr/dumps holding disk can be
> commented out.
>
> However, since I'm also backing up 4 other machines whose "spindles" are
> out on the net and can and do run in-parallel, is the removal of the old
> default a good idea? That would, depending on the scheduling, have 5
> dumpers writing to the same spindle again.

I am not sure we can give a definitive answer to that question...

Offhand I would suspect that the single disk drive is fast enough to handle all the traffic as it comes in over the network, especially if it's a decently modern drive attached via direct SATA link and you have a reasonable amount of RAM available for buffering/caching. If so, your total backup-run time would be about the same with just the one drive.

On the other hand, it's certainly true that if you have two drives in use for holding disk files then the total amount of drive-head movement is reduced (since each drive doesn't have to move back and forth between as many different files), compared to having just one drive (assuming spinning-platter drives, obviously). So if you are worried about your sdb drive's lifespan, it makes sense to keep /usr/dumps in the mix.

(On the other hand, if sdb is only used for the holding disk while sda is important because it contains the active root filesystem, you might want to push the extra wear onto sdb by commenting out the /usr/dumps side.)

But without some specific unusual factor in your situation, I'd guess that from the drive hardware side of things it probably won't matter in the end, and you should just go ahead and configure whichever way seems easiest for you as system administrator.

Nathan