from:"Nathan Stratton Treadway"

Re: Amanda is crashing on [runtar invalid option: -]

2023-03-23 Thread Nathan Stratton Treadway

On Thu, Mar 23, 2023 at 2:07 AM Olivier  wrote:

> Hello,
>
> I have had Amanda running for over a decade, yesterday I had no issue at
> all but last night, my backups for Ubuntu machines started crashing
> consistently with the error:
> strange(?): runtar: error [runtar invalid option: -]
>

The just-released amanda package upgrade seems to have a regression for
GNUTAR DLEs; see:
https://bugs.launchpad.net/debian/+source/amanda/+bug/2012536/

Nathan

Re: Degraded dump in amanda 3.5.2

2023-01-07 Thread Nathan Stratton Treadway

On Fri, Dec 30, 2022 at 11:34:14 -0300, Pablo Venini wrote:
> amcheck doesn't report errors

Hmmm.

As Stefan said, the key question is why Amanda is going into degraded
mode.  Normally when I have that happen it's because the target tape
wasn't available at the start of the amdump run, but that doesn't seem
to be the situation in your case.

But it seems like buried somewhere in your logs should be some further
explanation of why Amanda is switching to degraded mode (apparently
mid-run).

Nathan



--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Degraded dump in amanda 3.5.2

2022-12-29 Thread Nathan Stratton Treadway

On Thu, Dec 29, 2022 at 17:17:56 -0300, Pablo Venini wrote:
> Hi, I'm setting up a new backup server with amanda 3.5.2 on CentOS 7
> with vtapes. I've setup the vtapes directories, created the job and
> added the dlc, checked permissions, then run amcheck and it shows no
> errors. However when I run amdump, only one of the dlc gets backed
> up, the other ones give a "can't do degraded dump without holding
> disk" error. The config is:

(I take it you are not attempting to configure any holding disk?)

What does "amcheck" report?


Nathan


Re: what leads to a "new disk" ?

2022-12-04 Thread Nathan Stratton Treadway

On Thu, Dec 01, 2022 at 10:18:03 +0100, Stefan G. Weichinger wrote:
> 
> I have an installation where I didn't add or remove DLEs for a long time.
> 
> But now and then amanda seems to "forget" a DLE and come up with
> something like:
> 
> samba.intra rootfs lev 0  FAILED [dumps too big, 42606931 KB, but
> cannot incremental dump new disk]
> 
> The DLE is NOT new. Where does that come from?

Looks like the source file server-src/planner.c generates that message
if the "last_level" data element for the DLE is negative...

What does "amadmin <config> info <host> <disk>" report for that DLE (during the
period when you are getting this message, i.e. before the next
successful full dump takes place)?

Nathan


Re: New Amanda Community release 3.5.2 Has Arrived! -- email bounce

2022-09-28 Thread Nathan Stratton Treadway

On Tue, Sep 13, 2022 at 16:11:15 -0400, Nathan Stratton Treadway wrote:
> On Tue, Sep 13, 2022 at 17:29:41 +0200, Stefan G. Weichinger wrote:
> > I received:
> > 
> > "Your message to chris.hass...@betsol.com couldn't be delivered.
> > 
> > Chris.Hassell wasn't found at betsol.com."
> 
> (I just tried sending email to this email address.  My message was
> accepted for delivery by the Betsol mail server, and I haven't received
> any bounce message back [after waiting a few minutes].  So hopefully the
> bounce you saw was just a temporary misconfiguration on the Betsol mail
> server...(?) )

Ack -- I just discovered that my 9/13 test message did result in a
bounce message after all.  (The bounce message went to my spam
folder.)

I tried again just now, and his email address still bounced.

So it would seem that Chris is indeed no longer at Betsol  :(

Nathan



Re: New Amanda Community release 3.5.2 Has Arrived! -- email bounce

2022-09-14 Thread Nathan Stratton Treadway

On Wed, Sep 14, 2022 at 13:32:08 +0200, Stefan G. Weichinger wrote:
> Am 13.09.22 um 22:11 schrieb Nathan Stratton Treadway:
> 
> >(I just tried sending email to this email address.  My message was
> >accepted for delivery by the Betsol mail server, and I haven't received
> >any bounce message back [after waiting a few minutes].  So hopefully the
> >bounce you saw was just a temporary misconfiguration on the Betsol mail
> >server...(?) )
> 
> Interesting, thanks for testing.
> 
> Did you get a reply already?

No, I haven't received any reply... (but also have not received any
bounce message).

> Would be great to know if someone at Betsol is responding to the
> community, and when we see installable packages from them.

(Yep, agreed.)

Nathan

Re: New Amanda Community release 3.5.2 Has Arrived! -- email bounce

2022-09-13 Thread Nathan Stratton Treadway

On Tue, Sep 13, 2022 at 17:29:41 +0200, Stefan G. Weichinger wrote:
> I received:
> 
> "Your message to chris.hass...@betsol.com couldn't be delivered.
> 
> Chris.Hassell wasn't found at betsol.com."

(I just tried sending email to this email address.  My message was
accepted for delivery by the Betsol mail server, and I haven't received
any bounce message back [after waiting a few minutes].  So hopefully the
bounce you saw was just a temporary misconfiguration on the Betsol mail
server...(?) )

Nathan 


Re: New Amanda Community release 3.5.2 Has Arrived!

2022-08-02 Thread Nathan Stratton Treadway

On Tue, Aug 02, 2022 at 12:30:07 -0400, gene heskett wrote:
> And where do I get the debian approved versions of 3.5.2? 

That's what Jose is attempting to create now... (So watch this thread
for news.)

Nathan



Re: "overdue" wrong

2022-06-02 Thread Nathan Stratton Treadway

On Thu, Jun 02, 2022 at 10:04:16 +0200, Stefan G. Weichinger wrote:
> Overdue 19140 days: server:dle007

[...]
>   Dumps: lev datestmp  tape file   origK   compK secs
>   0  19700101  vtape-007-1  14 -1 -1 -1

Sure enough, 1970/1/1 is 19145 days ago, so the two utilities are
consistent :)
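
The day count can be checked with a short shell sketch.  It assumes GNU
date (for the -d option), and days_since_epoch is just a made-up helper
name for this example, not an Amanda tool.

```shell
#!/bin/sh
# Days between the Unix epoch (1970-01-01) and a given date.
# Assumes GNU date; days_since_epoch is a hypothetical helper.
days_since_epoch() {
    # %s is seconds since the epoch; -u avoids time-zone offsets.
    secs=$(date -u -d "$1" +%s) || return 1
    echo $(( secs / 86400 ))
}

# The message above was written on 2022-06-02.
days_since_epoch 2022-06-02   # prints 19145
```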


> The DLEs are dumped OK, though. 

The amadmin info shows that this dump has no size (in addition to the
"zero" date), so somehow the amanda history is not recording a
successful dump.  Can you make sure that the dump was in fact
written successfully all the way to tape?


Nathan



Re: amrecover usage with chg-robot

2022-05-27 Thread Nathan Stratton Treadway

On Fri, May 27, 2022 at 15:54:50 +0200, Stefan G. Weichinger wrote:
> 
> 
> forgot pstree:
> 
> -tmux: server-+-bash---amrecover-+-amandad-+-amandad
>               |                  |         `-amindexd
>               |                  |-amandad-+-amandad
>               |                  |         `-amidxtaped-+-exuvo_crypt---openssl
>               |                  |                      `-2*[{amidxtaped}]
>               |                  |-gzip
>               |                  |-tar
>               |                  `-{amrecover}

Ah, so exuvo_crypt is run by the amidxtaped process rather than by the
amrecover process itself.

What does strace show amrecover is doing during this period?

And "ps -ef" shows that the openssl process is still alive (i.e. not
defunct).  What does "strace" show on that process?  If you manually
kill it, does the chain of processes up through amidxtaped unwind and
amrecover resume normal processing?

Nathan


Re: amrecover usage with chg-robot

2022-05-27 Thread Nathan Stratton Treadway

On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote:
> After that both tar and gzip.binary are shown as <defunct> in ps,
> whatever that means.

Okay, that's a little progress in the investigation.

"<defunct>" means that the process has exited, but the return code from
the process has not been read by the parent process yet.  So in this
case, whatever process spawned the tar and gzip subprocesses is not
"noticing" when the subprocesses finish... the question is why (and what
is it stuck doing instead of cleaning up)?

Are the "openssl enc" and/or encryption-wrapper-script processes still
out there at this time (and what state are they in)?

You should be able to use pstree or "ps -ef" to determine which process
is the parent (PPID column) of the defunct subprocesses.
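
A small sketch of that lookup follows.  It picks out rows whose state
column starts with Z (which is how ps marks defunct processes) and prints
each one's parent PID.  The list_defunct name is made up for this
example, and only the first word of the command name is shown.

```shell
#!/bin/sh
# List zombie ("<defunct>") processes together with their parent PIDs.
# Reads "PID PPID STAT COMMAND" rows on stdin; a STAT beginning with Z
# marks a zombie.  list_defunct is a hypothetical helper name.
list_defunct() {
    awk '$3 ~ /^Z/ { print "pid=" $1, "ppid=" $2, "cmd=" $4 }'
}

# Live use: ask ps for just those four columns (the trailing "=" on each
# column suppresses the header line).
ps -e -o pid= -o ppid= -o stat= -o comm= | list_defunct
```

The ppid value is the process that has not yet collected the exit status,
so that is the one worth inspecting next.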

Nathan


Re: amrecover usage with chg-robot

2022-05-25 Thread Nathan Stratton Treadway

On Wed, May 25, 2022 at 17:09:22 +0200, Stefan G. Weichinger wrote:
> 
> So to me it looks that my dumptype with both compression and
> encryption is the problem.
> 
> I use the script provided by Anton "exuvo" Olsson, he shared it in
> earlier threads here.
> 
> The current iteration on this server:
> 
> https://dpaste.org/2YrkJ
> 
> Maybe it hasn't yet been tested with amrecover from multiple tapes?
> 
> Or the combination with gzip is a problem.

I haven't used encryption with Amanda so I don't have anything
specific to suggest.

Off hand I don't see anything obviously incorrect with that script.

(Well... in the encrypt-operation case it writes the contents of "$@" to
the /tmp/encryptparams file but that file doesn't ever appear to be
referenced... but the parameters don't appear to be referenced in the
decrypt-operation case either, so I don't expect that aspect of the
script is related to your problem.)

My next step would be to investigate the status of the subprocesses
during the period where amrecover seems to be hung, i.e. using ps, lsof,
and strace.  

I'm guessing the "openssl enc -d" process doesn't exit for some reason;
can you identify what it's trying to do or waiting for?  If you see that
process out there but just sleeping (i.e. using no CPU, and strace shows
it's just stuck waiting in a "read" syscall or something), what happens
if you manually kill the process (i.e. does amrecover then proceed to
its next step)?

    Nathan


Re: amrecover usage with chg-robot

2022-05-25 Thread Nathan Stratton Treadway

On Wed, May 25, 2022 at 14:50:00 +0200, Stefan G. Weichinger wrote:
> Currently I have another amrecover running. It restored from tape1
> .. and now I only see these lines in the current debug file
> "amidxtaped.20220525123652.debug":
> 
> Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
> 12472320 kb
> Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
> 12472320 kb
> Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
> 12472320 kb
> 
> ... for hours now.

> Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?

When doing an extract, tar does read on to the end of the tar file before
exiting, but "hours and hours" seems like a long time to wait for
that...

Is tar still running (e.g. what does "top" or "ps" show)?  If so, what
does strace on the tar process show?  Do any other amanda (sub)processes
exist on the system at this time?


        Nathan




Re: Problem with amtape "hanging" when forked from Java

2022-04-09 Thread Nathan Stratton Treadway

On Tue, Mar 08, 2022 at 18:27:47 -0500, Robert Heller wrote:
> For some unfathomable reason amtape "hangs" when forked from a Java program.
> 
> I've written a Java program that goes through vaulted tapes and forks amtape
> (using Runtime.getRuntime().exec()), and when a nonexistent tape label is
> asked for, amtape "hangs".  I cannot figure out why or how to get amtape to 
> just exit with an error (which I can then handle).

Robert, did you ever resolve the problem you were having with amtape
hanging?

Nathan


Re: Problem with amtape "hanging" when forked from Java

2022-03-10 Thread Nathan Stratton Treadway

On Thu, Mar 10, 2022 at 17:08:36 -0500, Robert Heller wrote:

> (I have no interactivity configuration in any of my
> configurations files, so it is presumably defaulting to being empty.)

(See below...)


> Here is the diff:
> 
> *** amtape-java.debug 2022-03-10 16:14:52.556321620 -0500
> --- amtape-shell.debug 2022-03-10 16:11:33.357521853 -0500
> ***************
> *** 1,87 ****
> ! Thu Mar 10 16:13:15.068034312 2022: pid 19101: thd-0x55f1f8259c00: amtape: pid 19101 ruid 34 euid 34 version 3.5.1: start at Thu Mar 10 16:13:15 2022
> ! Thu Mar 10 16:13:15.068101557 2022: pid 19101: thd-0x55f1f8259c00: amtape: Arguments: -otpchanger=vault_changer -ointeractivity= deepsoft-normal label examplevault
> ! Thu Mar 10 16:13:15.068391863 2022: pid 19101: thd-0x55f1f8259c00: amtape: config_overrides: tpchanger vault_changer
> ! Thu Mar 10 16:13:15.068403765 2022: pid 19101: thd-0x55f1f8259c00: amtape: config_overrides: interactivity 
> ! Thu Mar 10 16:13:15.068513641 2022: pid 19101: thd-0x55f1f8259c00: amtape: reading config file /etc/amanda/deepsoft-normal/amanda.conf
> ! Thu Mar 10 16:13:15.068536891 2022: pid 19101: thd-0x55f1f8259c00: amtape: reading config file /etc/amanda/deepsoft-common/amanda.conf


I had in mind your paging through the two files in separate windows
side-by-side, eyeballing for differences in program activity.

For "diff" to be much use you really need to strip off the front part of
each line (datetime, pid, and thd), using something like:

  $ sed -e's/.*00: amtape:/amtape:/' amtape-java.debug > amtape-java.debug_clean
  $ sed -e's/.*00: amtape:/amtape:/' amtape-shell.debug > amtape-shell.debug_clean

(and then running the diff on the _clean versions).
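
A slightly more general version of the same idea, as a sketch: rather than
relying on the thread id ending in "00", it matches the whole
"pid N: thd-0x...:" prefix.  The strip_debug_prefix name is made up for
this example, and the file names are the ones used in this thread.

```shell
#!/bin/sh
# Strip the per-line prefix (date/time, pid, thread id) from an Amanda
# debug file so that two runs can be compared with diff.
# strip_debug_prefix is a hypothetical helper name.
strip_debug_prefix() {
    # Drop "<timestamp>: pid N: thd-0x...: " and keep the rest of the line.
    sed -e 's/^.*: pid [0-9]*: thd-0x[0-9a-f]*: //' "$1"
}

# Compare the two runs when both files are present.
if [ -f amtape-java.debug ] && [ -f amtape-shell.debug ]; then
    strip_debug_prefix amtape-java.debug  > amtape-java.debug_clean
    strip_debug_prefix amtape-shell.debug > amtape-shell.debug_clean
    diff -u amtape-java.debug_clean amtape-shell.debug_clean
fi
```

The "start at" and "rename at" lines still embed the pid and time in the
message text itself, so those will always show up as differences.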


However, looking through the listing you posted, I did notice something:


> --- 1,104 ----
> ! Thu Mar 10 16:09:04.258210052 2022: pid 18689: thd-0x55c92c51d000: amtape: pid 18689 ruid 34 euid 34 version 3.5.1: start at Thu Mar 10 16:09:04 2022
> ! Thu Mar 10 16:09:04.258287566 2022: pid 18689: thd-0x55c92c51d000: amtape: Arguments: -otpchanger=vault_changer deepsoft-normal label examplevault
> ! Thu Mar 10 16:09:04.258592995 2022: pid 18689: thd-0x55c92c51d000: amtape: config_overrides: tpchanger vault_changer
> ! Thu Mar 10 16:09:04.258713649 2022: pid 18689: thd-0x55c92c51d000: amtape: reading config file /etc/amanda/deepsoft-normal/amanda.conf
> ! Thu Mar 10 16:09:04.258739621 2022: pid 18689: thd-0x55c92c51d000: amtape: reading config file /etc/amanda/deepsoft-common/amanda.conf
> ! Thu Mar 10 16:09:04.312355119 2022: pid 18689: thd-0x55c92c51d000: amtape: pid 18689 ruid 34 euid 34 version 3.5.1: rename at Thu Mar 10 16:09:04 2022
> ! Thu Mar 10 16:09:04.322326242 2022: pid 18689: thd-0x55c92c51d000: amtape: Disabling Amanda::Interactivity::stdin because STDIN is not readable

So it does appear that some Interactivity based on the stdin.pm module
was used by that particular invocation of amtape.

I'm still pretty confused about the exact behavior you are seeing (i.e.
why it's hanging in the Java context, and why the attempt to pass in an
empty -ointeractivity option doesn't seem to make any difference,
etc.)...

... but since the stdin plugin is deprecated, it seems worth
investigating where that's getting configured in the first place... 

I see from the above messages that amanda is reading both 
deepsoft-normal/amanda.conf and deepsoft-common/amanda.conf .

(If it's not clear from those files where the default interactivity is
being set, it might help for you to post the output of "amadmin
deepsoft-normal config"...)


Nathan







Re: Problem with amtape "hanging" when forked from Java

2022-03-10 Thread Nathan Stratton Treadway

On Thu, Mar 10, 2022 at 14:03:07 -0500, Robert Heller wrote:
> It prints an error message and returns an error status:
> 
> backup@newserver:~$ amtape -otpchanger=vault_changer -ointeractivity='' wendellfreelibrary label wendellfreelibrary-vault-030
> ERROR: Source Volume 'wendellfreelibrary-vault-030' not found
> 
> (and does not hang)

Hmmm.

What happens from the command line if you leave off the -ointeractivity
parameter?

The strange thing is that the strace does seem to show a repeated
looping over all the vtape slots, presumably continually searching for
the specified label, but I didn't notice any activity that is
obviously interactivity-related in between the loops.  So it's not clear
why amtape is looping within the Java context but immediately
terminating with an error message in the shell/tclsh contexts.

> I'm *guessing* that the Java Process created by exec() wants to deal with the
> printout, but can't.

In the strace you posted, amtape appeared to be looping through the
slots multiple times without attempting to write that "ERROR" message
anywhere, so off hand I would guess amtape is getting put into a
different search mode, rather than a problem with the Java side not
being able to accept the output.  I'm not sure what would trigger that
different mode, though.

I suppose my next suggestion would be to compare carefully the amtape
.debug files between a run from the command line and a run from within
Java -- any difference between the two could be a hint.

    Nathan


Re: Problem with amtape "hanging" when forked from Java

2022-03-10 Thread Nathan Stratton Treadway

On Thu, Mar 10, 2022 at 13:02:24 -0500, Robert Heller wrote:
> At Thu, 10 Mar 2022 12:46:43 -0500 Nathan Stratton Treadway 
>  wrote:
> 
> > 
> > On Thu, Mar 10, 2022 at 09:55:30 -0500, Robert Heller wrote:
> > > Here is the Java fragment:
> > > 
> > > public class FlushOldVaults extends BackupVault {
> > > private static final String AMTAPE = "/usr/sbin/amtape";
> > > private static final String AMTAPEOPT1 = "-otpchanger=vault_changer";
> > > private static final String AMTAPEOPT2 = "-ointeractivity=";
> > 
> > You would probably be able to confirm this by looking in the amanda
> > log/debug files for the amtape process (i.e.
> > /var/log/amanda/server/<config>/amtape.*.debug on Ubuntu), but I'm pretty
> > sure that you do actually need the empty argument in order to disable
> > the interactivity, something like
> >  private static final String AMTAPEOPT2 = "-ointeractivity=''";
> 
> It does not like the empty argument,  amtape throws a error status and the 
> Java subprocess returns a failure status.  It is "happy" with what I have, 
> except it hangs on the broken tape.

Okay, sounds like the argument parsing is different in the Java .exec()
context than on a shell command line.  The important question is whether
or not the interactivity is actually disabled... perhaps the amtape .debug
file gives some confirmation?
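
To illustrate the quoting difference (only that, not the cause of the
hang): a shell removes the empty quotes before the program ever sees its
arguments, whereas an exec call that does not go through a shell hands the
quote characters over literally.  In this sketch show_args is a made-up
stand-in for amtape; no Amanda command is run.

```shell
#!/bin/sh
# show_args prints each argument it receives in brackets, one per line.
# It is a hypothetical stand-in for amtape.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

# Typed at a shell prompt: the shell strips the empty quotes.
show_args -ointeractivity=''      # prints [-ointeractivity=]

# What a program receives when the two quote characters are passed
# literally, as happens when no shell is involved in the exec.
show_args "-ointeractivity=''"    # prints [-ointeractivity='']
```

So a literal "-ointeractivity=" string passed without a shell is probably
the closest equivalent of the command-line form.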


> The timeout never times out.  The amtape process goes into Sleep state and
> the Java program just hangs.

In that case hopefully strace -p/lsof -p on the amtape process (or any
other Amanda processes that amtape spawns) can give some hint as to what
it's waiting for.

Nathan







Re: Problem with amtape "hanging" when forked from Java

2022-03-10 Thread Nathan Stratton Treadway

On Wed, Mar 09, 2022 at 22:50:29 -0500, Robert Heller wrote:
> At Wed, 9 Mar 2022 23:50:45 +0100 Exuvo  wrote:
> 
> > 
> > Could you give the exact command line you give when it hangs?
> 
> /usr/sbin/amtape -otpchanger=vault_changer wendellfreelibrary label 
> wendellfreelibrary_vault-030
> 

What does this command say/do when run from the command line (for a tape
that causes a hang in the context of your Java program)?


Nathan


Re: Problem with amtape "hanging" when forked from Java

2022-03-10 Thread Nathan Stratton Treadway

On Thu, Mar 10, 2022 at 09:55:30 -0500, Robert Heller wrote:
> Here is the Java fragment:
> 
> public class FlushOldVaults extends BackupVault {
> private static final String AMTAPE = "/usr/sbin/amtape";
> private static final String AMTAPEOPT1 = "-otpchanger=vault_changer";
> private static final String AMTAPEOPT2 = "-ointeractivity=";

You would probably be able to confirm this by looking in the amanda
log/debug files for the amtape process (i.e.
/var/log/amanda/server/<config>/amtape.*.debug on Ubuntu), but I'm pretty
sure that you do actually need the empty argument in order to disable
the interactivity, something like
 private static final String AMTAPEOPT2 = "-ointeractivity=''";

(I am not particularly certain that interactivity is your specific
problem, but it seemed a plausible explanation and one that was
fairly easy to test out...)


> if (!p.waitFor(60L, java.util.concurrent.TimeUnit.SECONDS)) {
> System.err.printf("*** FlushOldVaults.amtape(): process timeout\n");
> String kill[] = new String[2];
> kill[0] = "/bin/kill";
> Long j = new Long(p.pid());
> kill[1] = j.toString();
> Process killproc = Runtime.getRuntime().exec(kill);
> killproc.waitFor();

Note that in order to use the strace debugging, you'll probably need to
disable this timeout+kill logic -- otherwise the amtape process won't
hang out long enough for you to figure out what it's trying to do when
it's "stuck".


    Nathan






Re: Problem with amtape "hanging" when forked from Java

2022-03-09 Thread Nathan Stratton Treadway

On Tue, Mar 08, 2022 at 18:27:47 -0500, Robert Heller wrote:
> 
> I've written a Java program that goes through vaulted tapes and forks amtape 
> (using Runtime.getRuntime().exec()), and when a nonexistent tape label is
> asked for, amtape "hangs".  I cannot figure out why or how to get amtape to 
> just exit with an error (which I can then handle).

Does it still hang if you pass an argument "-ointeractivity=''" when you
exec amtape?

If it does still hang, what do "lsof -p" and "strace -p" show on the
amtape process while it's stuck?

Nathan


Re: Problem with taperscan and chg-single

2021-04-05 Thread Nathan Stratton Treadway

On Sat, Apr 03, 2021 at 18:24:59 +1100, meku wrote:
> The traditional taperscan does not support interactivity which is why I am
> trying to use lexical or oldest.

Yeah, "lexical" and "oldest" taperscans should work.

This problem seems vaguely familiar but I am not remembering off hand
what would cause it.

What do the following commands report?

  amtape <config> inventory


  amadmin <config> retention



Nathan


Re: amgtar: defaults for NORMAL and STRANGE

2021-01-19 Thread Nathan Stratton Treadway

On Wed, Jan 20, 2021 at 14:22:02 +1100, Tom Robinson wrote:
> I'm still seeing messages in the report that should have been squashed. It
> also doesn't matter what I have configured as 'NORMAL' for the application
> configuration.
> 
> STRANGE DUMP DETAILS:
>   /-- lambo.motec.com.au / lev 1 STRANGE
>   sendbackup: info BACKUP=APPLICATION
>   sendbackup: info APPLICATION=amgtar
>   sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc
> |/usr/lib64/amanda/application/amgtar restore [./file-to-restore]+
>   sendbackup: info COMPRESS_SUFFIX=.gz
>   sendbackup: info end
>   | /usr/bin/tar: ./dev: directory is on a different filesystem; not dumped
>   | /usr/bin/tar: ./proc: directory is on a different filesystem; not dumped
>   | /usr/bin/tar: ./run: directory is on a different filesystem; not dumped
>   | /usr/bin/tar: ./sys: directory is on a different filesystem; not dumped
>   | /usr/bin/tar: ./mnt/s3backup: directory is on a different filesystem; not dumped
>   | /usr/bin/tar: ./var/lib/nfs/rpc_pipefs: directory is on a different filesystem; not dumped
>   ? /usr/bin/tar: ./mnt/s3backup: Warning: Cannot flistxattr: Operation not supported

[...]
> property "NORMAL" ": socket ignored$"
> property append "NORMAL" ": file changed as we read it$"
> property append "NORMAL" ": directory is on a different filesystem;

Note that the man page explanation of NORMAL includes the sentence
'These output are in the "FAILED DUMP DETAILS" section of the email
report if the dump result is STRANGE'.

In this case, the "Operation not supported" message is considered
STRANGE... which in turn causes all the NORMAL message lines to be
included in the report output as well.
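
As a rough illustration of how the quoted patterns sort tar's output lines,
here is a sketch that uses grep -E as a stand-in for amgtar's own matching
(which may differ in detail).  It assumes, as the report above suggests,
that a line matching none of the patterns ends up as STRANGE.  The
NORMAL_RE variable and classify function are made up for this example, and
the third pattern is completed with "not dumped$" as a guess, since the
quoted config line is cut off.

```shell
#!/bin/sh
# Patterns from the quoted config, joined into one extended regex.
# The ending of the third pattern is an assumption.
NORMAL_RE=': socket ignored$|: file changed as we read it$|: directory is on a different filesystem; not dumped$'

# classify marks each input line the way the report does:
# "|" for a line matching a NORMAL pattern, "?" for anything else.
classify() {
    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -Eq "$NORMAL_RE"; then
            printf '| %s\n' "$line"
        else
            printf '? %s\n' "$line"
        fi
    done
}

classify <<'EOF'
/usr/bin/tar: ./dev: directory is on a different filesystem; not dumped
/usr/bin/tar: ./mnt/s3backup: Warning: Cannot flistxattr: Operation not supported
EOF
```

The single "?" line is enough to make the whole dump STRANGE, which is why
the "|" lines then appear in the report alongside it.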

So presumably once you resolve all of those error messages for a
particular DLE, that DLE will no longer show up with a STRANGE DUMP DETAILS
section at all, in which case those NORMAL-category messages will no
longer be included in the report.

(You could prevent those messages from ever showing up in the report by
setting them to IGNORE in the config file, but in general I'd say trying
to fix the underlying cause of a STRANGE status is preferable to
suppressing messages completely.)


Does that explain the behavior you were seeing?


Nathan




Re: amgtar: defaults for NORMAL and STRANGE

2021-01-18 Thread Nathan Stratton Treadway

On Tue, Jan 19, 2021 at 11:53:52 +1100, Tom Robinson wrote:
> 
> Also, the man page says there are defaults for NORMAL and STRANGE but these
> 'defaults' don't seem to be included into the application definition when I
> dump the config information with amadmin daily config:

[...]
> Is the man page incorrect? Are the 'defaults' really applied or do I have
> to manually specify them in the config file?

I haven't looked closely at this functionality before, but from a quick
skim of the code in application-src/amgtar.c, it looks like those
default values are built directly in to the program itself.  

That is, they aren't implemented as part of the config system and thus
don't show in the output of "amadmin ... config", but they do indeed
exist underneath the hood.  

(As a corollary to that, it seems like there isn't any way to completely
delete the default strings from amgtar's processing, though you can
override the treatment of a particular regex by explicitly specifying
it as another type in the config file.)


(Are you seeing any situations where it looks like the default strings
aren't being applied as you would have expected from the man page
description?)

Nathan



Re: amgtar: Operation not permitted

2021-01-18 Thread Nathan Stratton Treadway

On Tue, Jan 19, 2021 at 11:53:52 +1100, Tom Robinson wrote:
> amanda-server 3.5.1
> 
> Hi,
> 
> I've recently started using amgtar instead of tar to reduce/remove the
> STRANGE output in daily backup reports.
> 
> I now get a lot of permission warnings and errors. Of particular concern
> are the 'Operation not permitted' messages:

I'm not coming up with any definitive explanations off the top of my
head...

What OS and/or distribution is this running on?  How did you install
Amanda?

Can you navigate down into the directories that generate these errors
using "ls" executed manually (as root)?

Do you have apparmor or similar kernel-level security enforcement
active?

Nathan


Re: "bad status on taper SHM-WRITE (dumper)" message

2020-12-23 Thread Nathan Stratton Treadway

On Wed, Dec 23, 2020 at 08:11:25 -0500, Gene Heskett wrote:
> amstatus: bad status on taper SHM-WRITE (dumper): 20 at 
> /usr/local/share/perl/5.24.1/Amanda/Status.pm line 929, <$fd> 
> line 3411.
[...] 
> But that log was overwritten by the flush.sh I did trying to complete 
> the backup on vtape-30, so is gone forever. But vtape-31 was not 

In Amanda 3.5, the "/var/log/amanda//amdump" path is actually
just a symlink pointing to the currently-active amdump file among the multiple
timestamped files (amdump.MMDDhhmmss), so the original log file should
still be out there.
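
A quick way to check this could look like the sketch below. It assumes the log directory follows the layout described above; the directory argument is a placeholder for your own per-config log directory, and the function name is made up for illustration.

```shell
# list_amdump_logs DIR: show where the "amdump" symlink points and list
# the timestamped amdump.* files sitting next to it.
# DIR is a placeholder for the per-config Amanda log directory.
list_amdump_logs() {
    logdir=$1
    if [ -L "$logdir/amdump" ]; then
        printf 'current: %s\n' "$(readlink "$logdir/amdump")"
    fi
    # Timestamped names sort chronologically, so ls order is oldest first.
    ls -1 "$logdir" | grep '^amdump\.[0-9][0-9]*$'
}
```

If the file from the failed run shows up in that listing, it can be read directly even though the symlink has since moved on.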

It will be interesting to compare the contents of the logs from working
and non-working runs.  So:

1) what's the timestamp for the run that generated this error?

2) what does 
 $ grep SHM-WRITE amdump.202012[12]*
   show (when run from within the correct .../log/amanda/... directory)?  

   (The idea being to grep through all the amdump.* files from the past
   13 days, just as a quick way to hit both good and bad runs.)

3) is there any correlation between the runs where amstatus returns this
   error and other interesting messages appearing in the Amanda Mail
   Report for those runs?
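
For question 2, a per-file count can make the comparison between good and bad runs easier to scan. This is only a sketch: the function name is hypothetical, and the glob is the same one used above, so it covers files whose day-of-month starts with 1 or 2 in December 2020.

```shell
# count_in_amdump_logs DIR PATTERN: for each amdump.202012[12]* file in DIR,
# print "filename:count" where count is the number of lines matching PATTERN.
count_in_amdump_logs() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && grep -c -- "$2" amdump.202012[12]* )
}
```

Something like `count_in_amdump_logs /path/to/log/dir SHM-WRITE` (path is a placeholder) would then show at a glance which runs mention SHM-WRITE and how often.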

Nathan


Re: did it again. -- crc differ

2020-12-19 Thread Nathan Stratton Treadway

On Sat, Dec 19, 2020 at 14:43:56 -0500, Gene Heskett wrote:
> On Saturday 19 December 2020 12:12:07 Nathan Stratton Treadway wrote:
> 
> > On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> > > new error file, from /home on GO704:(word wrap off)
> > >
> > > dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1
> >
> > Okay, that output looks good good.
> >
> > for completeness, can you post the section from this Amanda Report
> > covering this error?
> 
> In the last post.

In that message I see the quoted "FAILURE DUMP SUMMARY" section for the
earlier failure but not the report for when GO704:/home failed...



> No hits on the crc from the previous post, adcf8473:2018270728, any place 
> in that /var/log/amanda tree.

Anything under /tmp/amanda/ (there on GO704)?

    Nathan


Re: did it again. -- crc differ

2020-12-19 Thread Nathan Stratton Treadway

On Sat, Dec 19, 2020 at 10:43:42 -0500, Gene Heskett wrote:
> new error file, from /home on GO704:(word wrap off)
> 
> dd if=/sdb/dumps/20201219085654/GO704._home.0 bs=32k count=1

Okay, that output looks good.

for completeness, can you post the section from this Amanda Report
covering this error?


> New crc's
> 
> root@coyote:GenesAmandaHelper-0.61$ grep adcf8473:2018270728 
> /usr/local/var/amanda/Daily/*

G0704 is a separate Amanda client machine, right?  Can you do a similar
grep in the amanda debug/log files over on that machine, too?


Nathan


Re: Amrecover hangs after restore

2020-12-19 Thread Nathan Stratton Treadway

On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote:
> But the problem is not fixed:
> 
> FAILURE DUMP SUMMARY:
>   rpi4 /usr/lib lev 0  partial taper: source server crc (efe0c707:1538583893) 
> and input server crc (fa79e777:1538583893) 
> differ)

(This is back to the CRC error, so I'll send a reply in that thread)

Nathan


Re: did it again. -- crc differ

2020-12-19 Thread Nathan Stratton Treadway

On Sat, Dec 19, 2020 at 03:32:07 -0500, Gene Heskett wrote:
> But the problem is not fixed:

Well, at least this time it's a one-part dump file, so that may make
investigation a little easier.

> 
> FAILURE DUMP SUMMARY:
>   rpi4 /usr/lib lev 0  partial taper: source server crc (efe0c707:1538583893) 
> and input server crc (fa79e777:1538583893) 
> differ)
>   rpi4 /usr/lib lev 0  was successfully retried
> 
> But the failed dump is still in the holding disk:
> 
> root@coyote:config-bak$ ls -l /sdb/dumps/20201219020104/
> total 1502560
> -rw--- 1 amanda amanda 1538616661 Dec 19 02:13 rpi4._usr_lib.0
> 
> From the emailed report:
> 
>   driver: rpi4 /usr/lib 20201219020104 0 [Will retry dump because of holding 
> disk error: source server crc 
> (efe0c707:1538583893) and input server crc (fa79e777:1538583893) differ)]
>   taper: tape Dailys-24 kb 16495500 fm 79 [OK]
> 
> and:
> rpi4  /usr/lib 0 3273 1467  -- 5:23 10366.4  0:01 1502523.0 PARTIAL FLUSH  
> 5:11  4831.3
> 
> Even the sizes don't match so of course the crc's won't either.

Note that the two sizes mentioned in the error message do match
(1538583893), so I think the full file is getting transferred.

(The file on the holding disk is 32kiB larger, i.e. the size of the
Amanda header:  1538616661-1538583893=32768 .)
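
The same arithmetic can be checked in the shell. The two sizes are the ones quoted above; the helper function is a hypothetical convenience that assumes a single 32 KiB header block at the start of the file.

```shell
# Sizes taken from the report quoted above.
holding_file_size=1538616661   # size shown by ls -l on the holding disk
crc_reported_size=1538583893   # size after the colon in both CRC values

header_bytes=$((holding_file_size - crc_reported_size))
echo "difference: $header_bytes bytes ($((header_bytes / 1024)) KiB)"
# prints: difference: 32768 bytes (32 KiB)

# payload_size FILE: size of FILE minus one 32 KiB header block
# (assumes the file starts with exactly one such block).
payload_size() {
    total=$(wc -c < "$1")
    echo $((total - 32768))
}
```

Running `payload_size` on the holding-disk file should give the size that appears in the CRC messages if the header really accounts for the whole difference.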


What's the header of that holding-disk file look like? (e.g.
  $ sudo dd if=/sdb/dumps/20201219020104/rpi4._usr_lib.0 bs=32k count=1
)

Do you get any hits when you grep the Amanda debug and log files for
those two CRC values ( efe0c707 and fa79e777 )?


        Nathan





Re: Amrecover hangs after restore

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 19:31:52 -0500, Gene Heskett wrote:
> That likely won't happen again if at all, as I doubled the size of a 
> vtape, specifically to stop that. I'm only using around half of a 2T 
> drive for 60 vtapes. But I see it is growing.
> 
> /dev/sde1   1.8T  1.1T  617G  65% /amandatapes

Yeah, sounds like a good idea -- generally if your vtapes are all on a
single shared filesystem like this and you were going to let Amanda use
two vtapes in one run, there's no particular reason not to just increase
the logical size of a vtape so that each run is contained within a
single vtape instead.

Anyway, here's hoping your backups run un-interrupted at least through
the holidays :) ...

Nathan


Re: Amrecover hangs after restore

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 02:56:55 -0500, Gene Heskett wrote:

> But that after amanda script worked and I have uptodate indices and
> config files in that vtape now.  [...] so a/o vtape Dailys-25 I have
> what I think is a good backup again.

Great!

Since you commented out the section of the "if" that depends on
PARTS_WRITTEN, the code that does the parsing of the amstatus output
won't matter any more -- but if you notice the next time amdump writes
to two vtapes, it would be interesting to see the output of amstatus
from that run...  (And also to know if amstatus still generates a perl
warning in that situation.)

> And it did not leave the failed crc file in the dumps assignment,

I assume nothing we've done so far would have changed the crc-related
behavior, so if you see that problem again, you should definitely follow
up on that issue (over on that specific list thread)

Nathan


Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 16:23:47 -0500, Gene Heskett wrote:
> On Friday 18 December 2020 15:06:14 Nathan Stratton Treadway wrote:
> 
> > On Fri, Dec 18, 2020 at 14:44:07 -0500, Gene Heskett wrote:
> > > On Friday 18 December 2020 14:33:03 Nathan Stratton Treadway wrote:
> > > > ls -l /etc/login.defs /etc/defaults/su
> > >
> > > ls: cannot access '/etc/defaults/su': No such file or directory
> > > -rw-r--r-- 1 root root 10496 Aug  7  2019 /etc/login.defs
> >
> > (Sorry, obviously I mispelled "default" in that command; see my other
> > email for the new batch of commands.)
> -rw-r--r-- 1 root root20 Dec 17 10:27 /etc/default/su
> -rw-r--r-- 1 root root 10496 Aug  7  2019 /etc/login.defs

Did you manually edit the /etc/default/su file yesterday?

If so, I take it that you added the ALWAYS_SET_PATH line to it at that
point?

On _Stretch_, you do NOT want any ALWAYS_SET_PATH line at all.  The
point of that line is to make the "su" command _on Buster_ act like the
Stretch version used to act.  As the warning message indicates, your
current version of "su" does not recognize that parameter.

So, if you added that line to /etc/default/su yesterday, you can just go
ahead and delete that line again.  (Actually at 20 bytes I guess the
file only contains that one line, so probably the file didn't exist
before yesterday; if that's true, you can just delete it again)

Meanwhile, the error message you have been getting for the past year
would appear to be coming from the line with that parameter found in
the login.defs file.

I would say you could just go ahead and delete/comment out that line
from that file as well -- but login.defs is shared across multiple
commands in the shadow password suite, so there is a chance that some
other command will be affected when you edit it.  That's why I sent the
commands to try to figure out where the current version of the file
originally came from.

However, off hand it seems like the warning message you were getting is
an indication that the parameter is just not implemented yet in your
version of the shadow utilities, so if you don't want to investigate that
side of things it's probably safe for you to go ahead and comment out
the line in the login.defs file.  That should fix the warning messages,
and you can then keep an eye out for any cron jobs or whatever that
suddenly stop working because the PATH is no longer set as expected....
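
One cautious way to prepare that edit is to generate a commented-out copy first and review it before touching the real file. The function below is a sketch with a made-up name; it only writes to standard output and never modifies its input.

```shell
# comment_out_setting FILE NAME: print a copy of FILE in which every
# uncommented line that starts with the setting NAME is prefixed with "# ".
# Lines that are already comments are left alone.
comment_out_setting() {
    sed -E "s/^[[:space:]]*$2([[:space:]]|\$)/# &/" "$1"
}
```

For example, `comment_out_setting /etc/login.defs ALWAYS_SET_PATH > /tmp/login.defs.new` followed by a `diff` against the original shows exactly what would change; copying the result into place (as root, after saving a backup) is then a separate, deliberate step.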

    Nathan


Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 15:03:07 -0500, Nathan Stratton Treadway wrote:
> What do you get from these commands?:
>   $ ls -l  /etc/login.defs /etc/default/su
> 
>   $ dpkg -S /etc/login.defs
> 
>   $ dpkg -S /etc/default/su
>   
>   $ apt-cache policy login
> 

Ooops, should have included this one as well:
   $ ls -lc  /etc/login.defs /etc/default/su


Nathan
  



Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 14:44:07 -0500, Gene Heskett wrote:
> On Friday 18 December 2020 14:33:03 Nathan Stratton Treadway wrote:
> 
> > ls -l /etc/login.defs /etc/defaults/su
> ls: cannot access '/etc/defaults/su': No such file or directory
> -rw-r--r-- 1 root root 10496 Aug  7  2019 /etc/login.defs

(Sorry, obviously I misspelled "default" in that command; see my other
email for the new batch of commands.)

Nathan


Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 14:42:45 -0500, Gene Heskett wrote:
> On Friday 18 December 2020 14:33:03 Nathan Stratton Treadway wrote:
> 
> > grep ALWAYS_SET_PATH /etc/login.defs /etc/default/*
> etc/login.defs:ALWAYS_SET_PATH yes
> grep: /etc/default/grub.d: Is a directory
> /etc/default/su:ALWAYS_SET_PATH yes

Well, that explains why you are getting the warning message...

Now the question is why those lines exist in the files (and in both of
them, to boot)?

The weird thing is that this setting seems to be needed *on Buster* to
return the behavior back to previous behavior -- but since you are
running Stretch, it's not clear why those lines exist in the config
files...

What do you get from these commands?:
  $ ls -l  /etc/login.defs /etc/default/su

  $ dpkg -S /etc/login.defs

  $ dpkg -S /etc/default/su

  $ apt-cache policy login

Nathan

Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 13:38:21 -0500, Gene Heskett wrote:
> On Friday 18 December 2020 10:53:55 Nathan Stratton Treadway wrote:
> 
> > On Fri, Dec 18, 2020 at 01:10:10 -0500, Gene Heskett wrote:
> > > On Thursday 17 December 2020 23:03:59 Nathan Stratton Treadway wrote:
> > > > What do
> > > >   $ grep ALWAYS_SET_PATH /etc/login.defs /etc/default/*
> > > >   $ ls -l /etc/login.defs /detc/defaults/su
> > > > show?
> >
> > Any luck here?
> 
> No, first, that link is for buster, this  machine is stretch yet but I 
> have put that line an my .bashrc, in roots .bashrc, and in 
> amanda's .bashrc and . sourced them all without any detectable effect.

I don't believe this error message is related to .bashrc at all.

Instead, please run the above-quoted commands and let us know what you
find...

        Nathan




Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-18 Thread Nathan Stratton Treadway

On Fri, Dec 18, 2020 at 01:10:10 -0500, Gene Heskett wrote:
> On Thursday 17 December 2020 23:03:59 Nathan Stratton Treadway wrote:
> > (You can check this by running something simple via su, e.g.
> >   $ su amanda -c "echo test message"
> > )
> Which generates the error.

okay, check.


> > What do
> >   $ grep ALWAYS_SET_PATH /etc/login.defs /etc/default/*
> >   $ ls -l /etc/login.defs /detc/defaults/su
> > show?

Any luck here?

Nathan


Re: Amrecover hangs after restore -- ALWAYS_SET_PATH

2020-12-17 Thread Nathan Stratton Treadway

On Thu, Dec 17, 2020 at 10:38:58 -0500, Gene Heskett wrote:
> On Thursday 17 December 2020 09:24:58 Richard Sass wrote:
> 
> > Gene:
> >
> > BUT Whats line 2 above, I've wasted a year looking for that, it does
> > not grep in the whole src code tree.
> >
> > configuration error - unknown item 'ALWAYS_SET_PATH' (notify
> > administrator)

Presumably this comes from the "su" command itself rather than from any
binary you've built from source.

(You can check this by running something simple via su, e.g.
  $ su amanda -c "echo test message"
)

> >
> > Perhaps this will help
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=905564
> Its of no help on this stretch install.
> gene@coyote:~$ sudo -i
> [sudo] password for gene:
> root@coyote:~$ su amanda -c "geany bak-indices-configs"
> configuration error - unknown item 'ALWAYS_SET_PATH' (notify 
> administrator)
> 
> So where should it be put, and whose perms on stretch?

If you are trying to eliminate that error message, it seems like you
want to delete/deactivate the corresponding line (rather than put
something new anywhere).

What do 
  $ grep ALWAYS_SET_PATH /etc/login.defs /etc/default/*
  $ ls -l /etc/login.defs /detc/defaults/su
show?

Nathan


Re: Amrecover hangs after restore

2020-12-17 Thread Nathan Stratton Treadway

On Thu, Dec 17, 2020 at 01:58:27 -0500, Gene Heskett wrote:
> Here is the completed output of that command:
>  root@coyote:~$ su amanda -c "/usr/local/sbin/amstatus Daily"

[...]
> taped   :  78 13416m 13308m (100.81%) (100.81%)
> tape 1  :  79 24046m 24046m ( 37.57%) Dailys-21 (79 parts)

Okay, those two lines are the section in question from your output.  In
this case, the "taped" line *does* have statistics on that same line, so
the existing logic in the script should have worked fine, at least on
this run.



On Thu, Dec 17, 2020 at 06:03:38 -0500, Gene Heskett wrote:
> But lets fix the current problem first, it screwed up last night on
> only 1 vtape. Thast means the edit I made last night, has converted it
> into a full time failure.

Did you get this fixed?  If not, post the info on the edit you made and
the output from last night's run


Nathan



Re: Amrecover hangs after restore

2020-12-16 Thread Nathan Stratton Treadway

On Wed, Dec 16, 2020 at 16:41:39 -0500, Gene Heskett wrote:
> So here is the that script--
> 

Okay, a couple things:

> PARTS_WRITTEN=`${AM_SBIN_DIR}/amstatus $CONFIGNAME | grep taped | awk -F: 
> '{print $2}' | awk '{print $1}'`

If you run "amstatus" manually, what does your "taped" section look like
now?

On my Amanda v3.5 box with dumps going to two separate storages, I get:

=
taped
  TestBackup:   3  6085m  6061m (100.39%) (100.39%)
tape 1  :   3  6085m  6085m (  2.97%) TESTBACKUP-12 (3 parts)
  TestOffsite   :   3  6085m  6061m (100.39%) (100.39%)
tape 1  :   3  6085m  6085m (  2.97%) TESTBACKUP-103 (3 parts)
=

, but your script clearly expects the parts-written figure to be on the
same line as the word "taped".  

So I'm pretty sure you need to upgrade your script to support amstatus's
new formatting in v3.5... but I'm not sure exactly what changes that
would require in your setup (i.e. the output may be different with only
one storage in use, etc.).
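
As a starting point only, here is a sketch of a parser that copes with both layouts seen in this thread: the count on the "taped" line itself, and a bare "taped" line followed by per-storage lines. The function name is made up, and taking the count from the first storage line is an assumption; other amstatus layouts may need different handling.

```shell
# parts_written: read amstatus output on stdin and print the first count
# found in the "taped" section, or 0 if no count is found.
parts_written() {
    awk -F: '
        # Single-line layout: the count follows the colon on the taped line.
        /^taped[ \t]*:/ { split($2, f, " "); print f[1]; found = 1; exit }
        # Multi-storage layout: a bare "taped" line opens the section.
        /^taped[ \t]*$/ { in_taped = 1; next }
        # Skip the per-tape detail lines inside the section.
        in_taped && /^[ \t]*tape [0-9]+[ \t]*:/ { next }
        # First storage line in the section: take its count.
        in_taped && /:/ { split($2, f, " "); print f[1]; found = 1; exit }
        END { if (!found) print 0 }
    '
}
```

In the script this would replace the grep/awk pipeline, along the lines of `PARTS_WRITTEN=$(${AM_SBIN_DIR}/amstatus $CONFIGNAME | parts_written)`, so that the variable always holds a number.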

(Note that because of this issue I don't think adding quote characters to
the -gt line will actually fix the script: the expression
   [ "" -gt 0 ] 
will fail with a different error than 
   [ -gt 0 ]
... but neither one is valid.)
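
If the script should simply skip the copy step when the value is missing or malformed, one option is to validate the variable before the numeric comparison. This is a sketch; the helper name is hypothetical.

```shell
# is_positive_int VALUE: succeed only if VALUE is a non-empty string of
# digits and is greater than zero. Empty or non-numeric values fail
# quietly instead of producing a shell error.
is_positive_int() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}
```

The condition in the script would then read `if is_positive_int "$PARTS_WRITTEN"; then`, which behaves sensibly whether the variable is empty, garbage, zero, or a real count.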

> # Ok, then lets make it part of the dd.report record
> echo "Parts written = $PARTS_WRITTEN >> dd.report.$TAPENAME"

Okay, this is what produced the output line I found interesting in your
earlier email.

First thing is that this line actually has a misplaced " character so
it's not doing what the comment describes.  (It's writing to
standard output instead of to the dd.report.$TAPENAME file.)  Instead
you want to say
  echo "Parts written = $PARTS_WRITTEN" >> dd.report.$TAPENAME
on that line of the script.
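
The difference is easy to see side by side. The values below are placeholders chosen for the demonstration, and the report file is written to a scratch directory rather than to wherever the real script keeps it.

```shell
PARTS_WRITTEN=3
TAPENAME=Dailys-17
report_dir=$(mktemp -d)   # scratch directory for the demonstration

# Quote closed too late: ">>" is part of the string, so the whole text
# goes to standard output and no file is written.
echo "Parts written = $PARTS_WRITTEN >> dd.report.$TAPENAME"

# Quote closed before ">>": the shell performs the redirection and
# appends the line to the report file.
echo "Parts written = $PARTS_WRITTEN" >> "$report_dir/dd.report.$TAPENAME"
```

After running this, only the second form has produced a dd.report file, and it contains just the "Parts written = 3" line.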

But in spite of that issue, the corresponding line in the log you posted
earlier confirms that PARTS_WRITTEN was empty in that run, which indeed
explains the syntax error you got from the "-gt" line of the script.

    Nathan


Re: Amrecover hangs after restore

2020-12-16 Thread Nathan Stratton Treadway

On Wed, Dec 16, 2020 at 15:37:32 -0500, Gene Heskett wrote:
> On Wednesday 16 December 2020 12:23:28 Nathan Stratton Treadway wrote:
> 
> > On Wed, Dec 16, 2020 at 09:42:47 -0500, Gene Heskett wrote:
> > > You reminded me of that, so its now done. We'll see if that fixes
> > > it.
> >
> > (Note that putting in the quote characters should prevent the shell
> > from aborting due to the syntax error, but it won't fix the underlying
> > problem that the contents of the PARTS_WRITTEN variable appear to be
> > bogus at that point in time.  Though if you want to debug that issue
> > further, it's probably best if you reply to that branch of this thread
> > directly :) )
> >
> I'll see if, in my somewhat decreased mental state, I can figure out how 
> to echo that into the log file. Jon L. seems to think a bit of perl that 

Well, there was a tantalizing hint already included in the output you
posted to the list, so you may not need to actually change the script for
that part.

Perhaps you should just go ahead and post the script; that would at least
let us see what script processing matches up with the output you
posted earlier.


> amanda uses has been updated in the last year or so and has broken 
> amanda somehow. 

Off hand, I am thinking there is a bug in the amstatus Perl code which
gets triggered when you have a two-tape run, and then also a fix needed
in your shell script so that PARTS_WRITTEN is always set correctly and
the script can properly deal with a two-tape run.

(Because you only see the error for the double-tape runs, I'm less
inclined to suspect a Perl upgrade is the issue [rather than a more
general bug in amstatus, as it parses the Amanda log file], but we may
need to wait until the next time it happens before we can track it
down.)

        Nathan



Re: Amrecover hangs after restore

2020-12-16 Thread Nathan Stratton Treadway

On Wed, Dec 16, 2020 at 09:42:47 -0500, Gene Heskett wrote:
> You reminded me of that, so its now done. We'll see if that fixes it.

(Note that putting in the quote characters should prevent the shell from
aborting due to the syntax error, but it won't fix the underlying
problem that the contents of the PARTS_WRITTEN variable appear to be
bogus at that point in time.  Though if you want to debug that issue
further, it's probably best if you reply to that branch of this thread
directly :) )


Nathan


Re: Amrecover hangs after restore

2020-12-14 Thread Nathan Stratton Treadway

On Mon, Dec 14, 2020 at 14:04:40 -0500, Gene Heskett wrote:
> On Monday 14 December 2020 11:32:56 Nathan Stratton Treadway wrote:
> 
> > On Sun, Dec 13, 2020 at 03:05:16 -0500, Gene Heskett wrote:
> > > ./bak-indices-configs: line 135: [: -gt: unary operator expected
> >
> > There does seem to be an error message coming from the amstatus
> > program which we can investigate later, but as far as your own script
> > not doing the coping I think that might be explained by the above
> > error message.
> >
> > So, what's on line 135 of the bak-indices-configs script?
> >
> That is a very long if else fi thing, wordwrap off:
> ---
> if [ $PARTS_WRITTEN -gt 0 ]; then
>   if [ $DUMMY -eq 1 ] ; then

Seems like the only occurrence of "-gt" is in the $PARTS_WRITTEN line.

The output you quoted in your earlier email mentions
   Parts written =  >> dd.report.Dailys-17
.  Does that mean that the PARTS_WRITTEN variable actually contained the value
">> dd.report.Dailys-17"?  That would definitely not parse out well in
the if condition...

If you add quotes around the variable (i.e. you change that line to 
  if [ "$PARTS_WRITTEN" -gt 0 ]; then
), I think that would prevent the script from erroring out at that
spot (and is generally a good idea).

However, this does lead to the followup question of how PARTS_WRITTEN is
(supposed to be) getting set in the first place?

(Given the other parts of your original email, I'll hazard a guess that
the script is trying to parse the output of amstatus, but the parsing
code is confused by the warning message amstatus is currently
generating)

    Nathan


Re: Amrecover hangs after restore

2020-12-14 Thread Nathan Stratton Treadway

On Sun, Dec 13, 2020 at 03:05:16 -0500, Gene Heskett wrote:
> ./bak-indices-configs: line 135: [: -gt: unary operator expected

There does seem to be an error message coming from the amstatus program
which we can investigate later, but as far as your own script not doing
the copying I think that might be explained by the above error message.

So, what's on line 135 of the bak-indices-configs script?


Nathan 


Re: Amrecover hangs after restore

2020-12-03 Thread Nathan Stratton Treadway

On Thu, Dec 03, 2020 at 16:58:34 +0100, Stefan G. Weichinger wrote:
> Ah, I forgot that. I have "-p4".
> Will retry asap. Thanks for the reminder.

You mean you currently have an explicit "-p4" on the command line
contained in the wrapper script?

If -p1 does work, it would be interesting to know what happens with -p2
and -p3 as well

Nathan


Re: Amrecover hangs after restore

2020-12-03 Thread Nathan Stratton Treadway

On Thu, Dec 03, 2020 at 08:46:00 +0100, Stefan G. Weichinger wrote:
> 
> Am 01.12.20 um 13:44 schrieb Stefan G. Weichinger:
> 
> >With the simple wrapper in place amrecover correctly runs through
> >(*and* pigz is used for the amdump and the amrecover step).
> >
> >Tomorrow, when the admin there will insert a specific tape for me,
> >I will test that from the tape I used for the failing tests a few
> >days ago.
> 
> With the tape the same amrecover process did *not* work, same
> behavior as without the wrapper.

Did you try a run using the wrapper to add "-p 1" to the pigz
invocation?
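
For reference, a wrapper of that kind might look roughly like the sketch below. The location of the renamed real binary is purely an assumption, as are the variable and function names; "-p" is pigz's option for the number of compression threads.

```shell
# Hypothetical location of the real pigz binary after renaming it out of
# the way; adjust to match your system.
REAL_PIGZ=${REAL_PIGZ:-/usr/bin/pigz.real}
# Number of threads to force; 1 makes pigz run single-threaded.
PIGZ_THREADS=${PIGZ_THREADS:-1}

# run_pigz ARGS...: call the real binary with "-p N" placed in front of
# whatever arguments the caller (e.g. amrecover) supplied.
run_pigz() {
    "$REAL_PIGZ" -p "$PIGZ_THREADS" "$@"
}
```

In an actual wrapper script installed under the name pigz, the last line would be `run_pigz "$@"`. Changing PIGZ_THREADS to 2, 3 or 4 would then cover the other cases worth testing.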

Nathan


Re: Amanda.org Website Updates -- Amanda bugfix release?

2020-12-02 Thread Nathan Stratton Treadway

On Wed, Dec 02, 2020 at 00:26:08 +, Pavan Raj wrote:
> We would like to continue the momentum with the upgrades and plan for
> a new release.

(I'm not sure what "upgrades" you have in mind, but in any case I
haven't seen any discussion of a new release here in quite a while, nor
do I see any new-release-type activity in the git repo)

> As a next step, we would like to improve the website to address the
> security issues, modernize, and improve the usability.

I could probably get excited about a revamped amanda.org website
someday, but really getting a bugfix release out the door seems much
more important.

v3.5.1 was released December 1, 2017.  Since then a number of fixes for
bugs have been identified, but all those fixes are still not available
for actual use anywhere, except for the few custom patches being applied
by some distribution-specific maintainers.  (There was an effort in
November 2019 to collect some of the patches into the Zmanda git repo,
but after that brief spurt nothing else happened.)

Over the past couple of months we (here on this list) have identified
several pretty serious bugs and come up with some apparent workarounds,
but had no activity/help from Betsol developers in identifying the
underlying cause and finding correct fixes for those issues, let alone
in getting those fixes tested out and then incorporated into a release
so users can avoid those problems in the future.

So as an urgent first step, I'd definitely be in favor of a 3.5.2
release (and preferably a plan for subsequent minor releases coming down
the pike) before any effort is spent on a website refresh...

Nathan


Re: did it again. -- crc differ

2020-11-30 Thread Nathan Stratton Treadway

On Mon, Nov 30, 2020 at 18:41:40 -0500, Gene Heskett wrote:
> > On Mon, Nov 30, 2020 at 12:46:46 -0500, Nathan Stratton Treadway wrote:
> > > I assume that the first few lines of the
> > > coyote._home_gene_Pictures.0 file is an Amanda header (including an
> > > XML chunk); can you post that here?
> >
> > Hmmm, it might also be useful to see the header from the
> > coyote._home_gene_Pictures.0.5 file (i.e. the last of the subparts) as
> > well
> >
> Try this:
> gene@coyote:sudo dd 
> if=/sdb/dumps/20201130020105/coyote._home_gene_Pictures.0.5 bs=32k count=1
> 
> AMANDA: CONT_FILE 20201130020105 coyote /home/gene/Pictures  
> lev 0 comp N program APPLICATION
> APPLICATION=amgtar
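> DLE=<[...]

[...]

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/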
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amrecover hangs after restore

2020-11-30 Thread Nathan Stratton Treadway

On Mon, Nov 30, 2020 at 15:35:36 +0100, Stefan G. Weichinger wrote:
> Am 30.11.20 um 00:40 schrieb Nathan Stratton Treadway:
> https://github.com/madler/pigz/issues/80

As I mentioned before I'm not familiar with pigz myself, but skimming
through those Github issues (76 and 80), I would guess that the problem
you are having with amrecover is something unrelated to #76.

In particular, 76 seems to be about changing the exit status pigz
returns if there is trailing junk on the compressed input stream... but
as far as I understand at this point, there's no evidence of any
trailing junk in the Amanda dumps.

In any case, the problem doesn't seem to be amrecover mishandling some
unexpected return status, but rather that it never retrieves the return
status of the subprocess in the first place (as evidenced by the Zombie
status of the pigz process).

Off hand I can't really say why that would be the case, but one theory that
comes to mind is the fact that gzip normally doesn't spawn its own
subprocesses but pigz does.  A way to test that theory would be to put the
shell-script wrapper around the pigz binary but just call the original
binary with the same command line arguments that amrecover uses, and see
if that setup ends up with processes in Zombie status as well -- and, if
so, then try adding a "-p 1" parameter (for example) to the call to the
real pigz binary to see if that changes the behavior any...

Nathan
--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: did it again. -- crc differ

2020-11-30 Thread Nathan Stratton Treadway

On Mon, Nov 30, 2020 at 12:46:46 -0500, Nathan Stratton Treadway wrote:
> I assume that the first few lines of the coyote._home_gene_Pictures.0
> file is an Amanda header (including an XML chunk); can you post that
> here?


Hmmm, it might also be useful to see the header from the
coyote._home_gene_Pictures.0.5 file (i.e. the last of the subparts) as
well

Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: did it again. -- crc differ

2020-11-30 Thread Nathan Stratton Treadway

On Mon, Nov 30, 2020 at 03:12:41 -0500, Gene Heskett wrote:
> Doing a level 0 on /home/gene/Pictures, it logged this in the email:
> 
>   coyote /home/gene/Pictures lev 0  partial taper: source server crc 
> (44cff778:11146117120) and input server crc (dfd0e83a:11146117120) 
> differ)
>   coyote /home/gene/Pictures lev 0  was successfully retried
> 
> It did leave a 10+ Gb file in the vtape, but left the failed files in the 
> holding disk:
> 
> root@coyote:~$ ls -l /sdb/dumps/20201130020105/
> total 10885096
> -rw--- 1 amanda amanda 2097152000 Nov 30 02:06  
> coyote._home_gene_Pictures.0

Would /home/gene/Pictures have changed any between the two retries?  If
not, you might learn something by comparing the components of the
successful dump with the files on the holding disk... (but off hand I'm
not sure how many red-herring differences you'd have to sort through to
find any hints as to the actual problem).

> 
> Does anyone have a clue what its really trying to tell me?

I only have some vague clues:

* the number after the ":" is the size of the file being CRCed.  In this
  case 11146117120 shows up for both sides of the comparison, so it
  seems like the full file got transferred across to whatever step is
  causing the error.  It also seems like this error applies to the
  entire 11GB dump rather than the individual 2GB parts.

* The message "source server crc ([...]) and input server crc" appears 
  to be generated in Amanda/Taper/Worker.pm:result_cb() in cases where
  $self->{'server_crc'} and $self->{'source_server_crc'} differ.

  $self->{'server_crc'} seems to be read out of the header of the dump
  file itself.

  $self->{'source_server_crc'} seems to be computed as part of
  transferring the file to the taper process, or something like that.
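
As a rough illustration of how those "crc:size" pairs read (a sketch
only; compare_crc is a made-up helper name, not an Amanda command), the
two values from the report can be split and compared like this:

```sh
# compare_crc is a hypothetical helper, not part of Amanda.
# Each argument has the form "crc:size", as printed in the taper message.
compare_crc() {
    a_crc=${1%%:*}; a_size=${1#*:}   # text before / after the first ":"
    b_crc=${2%%:*}; b_size=${2#*:}
    if [ "$a_size" != "$b_size" ]; then
        echo "sizes differ: $a_size vs $b_size"
    elif [ "$a_crc" != "$b_crc" ]; then
        echo "same size ($a_size), checksums differ"
    else
        echo "match"
    fi
}

# The two values from the report above:
compare_crc 44cff778:11146117120 dfd0e83a:11146117120
```

For the values in this report the sizes agree and only the checksums
differ, which fits the reading that the whole file was transferred.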

So I guess the next question is where in the multiple stages of the life
of the dump file the CRC mismatch gets introduced.

I assume that the first few lines of the coyote._home_gene_Pictures.0
file is an Amanda header (including an XML chunk); can you post that
here?

Also, what do you find when you grep the Amanda debug/log files for
those two CRC values ( 44cff778 and dfd0e83a )?
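
A minimal sketch of that search, assuming GNU or BSD grep (for -r);
grep_crcs is a made-up helper name and the directory in the usage
comment is only a guess at where the debug/log files live on a given
install:

```sh
# grep_crcs is a hypothetical helper: search a directory tree for each
# CRC value in turn and say so explicitly when a value never appears.
grep_crcs() {
    dir=$1
    shift
    for crc in "$@"; do
        echo "== $crc =="
        # -r: recurse into subdirectories, -n: show line numbers
        grep -rn -- "$crc" "$dir" || echo "(no matches)"
    done
}

# Example (the directory is an assumption; use your own log location):
#   grep_crcs /tmp/amanda 44cff778 dfd0e83a
```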

One other thought: have the reported CRC errors in the past also been
for the dump of the /home/gene/Pictures DLE, or are multiple different
DLEs affected?  Is it always level 0 dumps?

    Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amrecover hangs after restore

2020-11-29 Thread Nathan Stratton Treadway

On Thu, Nov 26, 2020 at 11:12:45 +0100, Stefan G. Weichinger wrote:
> Am 25.11.20 um 22:39 schrieb Debra S Baddorf:
> 
> >>the theory is good, sure. I will test the restore aside from amrecover
> >>tomorrow.
> >
> >
> >If so, remember to "throw out" the first block of the file, which will 
> >choke the zip program.
> >dd-skip=1 etc
> 
> I was able to amrestore correctly .. no "-r", no skipping.

(Did you try this while the real pigz binary was in place, or only after
you replaced it with "gzip"?)

On Thu, Nov 26, 2020 at 08:48:02 +0100, Stefan G. Weichinger wrote:
> But guess what I did already ...
> 
> # cp /usr/bin/pigz /usr/bin/pigz-original
> 
> # cp /usr/bin/gzip /usr/bin/pigz
> 
> # amrecover 
> 
> works :-P

Since you have a workaround now I don't know how much more effort you
want to spend on this, but if you do want to investigate further you
could try replacing /usr/bin/pigz with a shell script wrapper which
calls pigz-original but writes some debugging messages, etc. to a log
file before and after invoking pigz-original, and exits with a "success"
status. Basically just trying to see if there is any fussing you can do
to how pigz is invoked or how the exit status is processed which changes
the overall behavior of amrestore-calling-pigz.
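
A minimal sketch of such a wrapper follows.  It assumes the real binary
was saved as /usr/bin/pigz-original (as in the commands quoted above);
the function name, the log location and the PIGZ_REAL /
PIGZ_WRAPPER_LOG override variables are made up for illustration.

```sh
#!/bin/sh
# Hypothetical wrapper, to be installed as /usr/bin/pigz once the real
# binary has been copied to /usr/bin/pigz-original.
# PIGZ_REAL and PIGZ_WRAPPER_LOG are invented override variables.

pigz_wrapper() {
    real=${PIGZ_REAL:-/usr/bin/pigz-original}
    log=${PIGZ_WRAPPER_LOG:-/tmp/pigz-wrapper.log}

    echo "$(date) pid=$$ start: $real $*" >> "$log"

    # stdin and stdout are inherited, so the data stream passes through
    # the real binary untouched.
    status=0
    "$real" "$@" || status=$?

    echo "$(date) pid=$$ end: real exit status $status" >> "$log"

    # Always report success, whatever the real binary returned.
    return 0
}

# When installed as the wrapper, the last line of the file would be:
#   pigz_wrapper "$@"
```

Comparing the logged exit status with what the caller does next should
show whether the real binary ever returned at all.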

    Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amrecover hangs after restore

2020-11-29 Thread Nathan Stratton Treadway

On Thu, Nov 26, 2020 at 11:25:40 +0100, Stefan G. Weichinger wrote:
> Am 25.11.20 um 20:57 schrieb Nathan Stratton Treadway:
> >On Wed, Nov 25, 2020 at 14:34:17 -0500, Nathan Stratton Treadway wrote:
> >>Also, do you see the same defunct pigz process that Jason reported in
> >>his original post?
> >
> >Am 30.05.18 um 20:21 schrieb Jason L Tibbitts III:
> >>root  2690  9.1  0.0 317692 11020 pts/0S+   12:38   1:43  | 
> >>  \_ amrecover math -s backup2 -t backup2
> >>root  2996 32.5  0.0  0 0 pts/0Z+   12:48   2:52  | 
> >>  \_ [pigz] 
> >>root  2998  3.3  0.0  0 0 pts/0Z+   12:48   0:17  | 
> >>  \_ [xfsrestore] 
> >
> >Assuming you are seeing this same behavior: one theory that comes to
> >mind is that pigz could be spawning subprocesses which then somehow
> >confuse amrecover such that it doesn't properly detect when pigz
> >terminates (and just keeps waiting for that to happen, even though it
> >already has happened).
> >
> >I don't know enough about how amrecover spawns the pipes to know how
> >likely that is, but one thing you could try is to kill the amrecover
> >process with a SIGCHLD signal (once it reaches the above "everything is
> >hung" situation) and see if one or both of those defunct processes go
> >away, and if the amrecover process starts doing work again
> >afterwards
> 
> Not sure how to show the process tree as shown above ...

(I think Jason's output was generated using the "--forest" option to ps,
but really all that matters is the "Z" process state for the two
subprocesses).

> 
> "kill -s SIGCHLD" .. ran it against the PIDs of amrecover and pigz,
> no effect.
>
> pigz isn't even killed by a "-9"
>

The fact that the pigz process is in defunct/"Z"ombie status means it's
already dead and only still exists in the process listing because the
parent process hasn't read the exit code yet.  So even a -9 won't help
(since that process is already dead).
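
For anyone wanting to check for this state on their own system, here is
a minimal sketch that filters ps output down to defunct processes.  It
assumes a Linux procps-style ps; zombies_only is just an illustrative
name.

```sh
# zombies_only is a hypothetical helper.  It expects input in the
# column order produced by "ps -eo pid,ppid,stat,comm" and keeps the
# header line plus every process whose state starts with "Z".
zombies_only() {
    awk 'NR == 1 || $3 ~ /^Z/'
}

# Typical use on Linux:
#   ps -eo pid,ppid,stat,comm | zombies_only
```

The PPID column of any line it prints identifies the parent that has
not yet collected the exit status.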
 

I was hoping SIGCHLD on the amrecover process would trick it into
exiting whatever wait-loop it is in and checking for subprocesses that
have already terminated (both pigz and xfsrestore in the above
listing)... but sounds like that didn't work.

Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amrecover hangs after restore

2020-11-25 Thread Nathan Stratton Treadway

On Wed, Nov 25, 2020 at 14:34:17 -0500, Nathan Stratton Treadway wrote:
> Also, do you see the same defunct pigz process that Jason reported in
> his original post?

Am 30.05.18 um 20:21 schrieb Jason L Tibbitts III:
> root  2690  9.1  0.0 317692 11020 pts/0S+   12:38   1:43  |   
> \_ amrecover math -s backup2 -t backup2 
> root  2996 32.5  0.0  0 0 pts/0Z+   12:48   2:52  |   
> \_ [pigz]  
> root  2998  3.3  0.0  0 0 pts/0Z+   12:48   0:17  |   
> \_ [xfsrestore]

Assuming you are seeing this same behavior: one theory that comes to
mind is that pigz could be spawning subprocesses which then somehow
confuse amrecover such that it doesn't properly detect when pigz
terminates (and just keeps waiting for that to happen, even though it
already has happened).

I don't know enough about how amrecover spawns the pipes to know how
likely that is, but one thing you could try is to kill the amrecover
process with a SIGCHLD signal (once it reaches the above "everything is
hung" situation) and see if one or both of those defunct processes go
away, and if the amrecover process starts doing work again
afterwards


Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amrecover hangs after restore

2020-11-25 Thread Nathan Stratton Treadway

On Wed, Nov 25, 2020 at 20:07:23 +0100, Stefan G. Weichinger wrote:
> So maybe pigz needs some additional option at decompression, or some
> fix, or amanda needs some patch to correctly handle the behavior of pigz
> in the process.
> 
> I *know* I could use amrestore. But amrecover should work, and the very

I don't have any experience with pigz myself, but just from reading this
thread it seems like it would be useful for you to test the amrestore
approach in order to find out whether pigz run manually (outside of an
Amanda pipeline) raises a failure exit status or gives any sort of hint
of a problem processing either of those particular dump files.

Also, do you see the same defunct pigz process that Jason reported in
his original post?

Does anything interesting show up in the Amanda debug logs for the
amrecover process?

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Backing up remote server

2020-10-20 Thread Nathan Stratton Treadway

On Tue, Oct 20, 2020 at 16:36:52 -0500, Robert Wolfe wrote:
> Amanda Backup Client Hosts Check
> 
> WARNING: wolfe2.wolfe.local: selfcheck request failed: error sending
> REQ: write error to: Connection refused
> Client check: 2 hosts checked in 0.069 seconds.  1 problem found.
> 
> (brought to you by Amanda 3.3.3)
> 
> Not sure what I need to do to get this to work.  I have the firewall on
> the remote server is disabled, but not sure what else I need to do.

The details depend on the authentication you have specified for that
host on the Amanda server and on the distribution/release of Linux
running on your Amanda client... but off hand it sounds like you are
missing the definition for the amandad service in /etc/inetd.conf or
/etc/xinetd.d/* on the client system.
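
A quick way to check for such a definition on the client, as a sketch
only (find_amandad_defs is a made-up helper name, and a client that is
started some other way will not have these files at all):

```sh
# find_amandad_defs is a hypothetical helper: print each existing file
# among the arguments that mentions "amandad".  Returns 0 if at least
# one file matched, 1 otherwise.
find_amandad_defs() {
    status=1
    for f in "$@"; do
        [ -f "$f" ] || continue
        if grep -q 'amandad' "$f"; then
            echo "$f"
            status=0
        fi
    done
    return $status
}

# Typical use on the client:
#   find_amandad_defs /etc/inetd.conf /etc/xinetd.d/* || echo "none found"
```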

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: restore from vtapes written by amvault

2020-10-18 Thread Nathan Stratton Treadway

On Sun, Oct 18, 2020 at 14:56:04 +0200, Bernhard Erdmann wrote:
> I ended up patching the logfiles written by amvault, e.g. for
> 
> $ fgrep 20020908 ../tapelist
> 20020908 BE-full-43 reuse
> $ amvault --dest-storage vtape be-full \* \* 20020908
> 
> I get log.20201018130312.0 afterwards. Then I do
> 
> $ cp -p log.20201018130312.0 ../log_backup
> $ perl -p -i -e 's/ 2002090800 / 20020908 /g' log.20201018130312.0
> 
> and then amvaulted dump images written to vtapes can be located by
> amadmin find:

Great, thanks for posting this final wrap-up message.

So, when you ran your perl patching on the log files, were there any
lines other than the "DONE taper" lines that got changed?  

(That info will help confirm the proper fix for the original problem in
the vaulting code...)
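
One way to answer that from the saved copy, sketched under the
assumption that each log line starts with a record type and a program
name (as in "DONE taper"); changed_line_types is a made-up helper name:

```sh
# changed_line_types is a hypothetical helper.
# $1 = untouched copy of the log, $2 = patched log.
# Prints a count of changed lines grouped by their first two words.
changed_line_types() {
    diff "$1" "$2" |
        sed -n 's/^> //p' |       # keep only the new side of each change
        awk '{print $1, $2}' |    # first two words, e.g. "DONE taper"
        sort | uniq -c
}

# Example, using the file names from the quoted commands:
#   changed_line_types ../log_backup/log.20201018130312.0 log.20201018130312.0
```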


Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-15 Thread Nathan Stratton Treadway

On Wed, Oct 14, 2020 at 13:20:38 -0400, Chris Hoogendyk wrote:
> I've been doing all the debugging stuff one one server. On my other
> server, I had simply set it to traditional and it's been working.
> Just now I went and applied ScanInventory.pm.patch_20201013C and
> changed the amanda.conf back to using oldest. An amcheck daily told
> me there were no acceptable volumes found! So I switched back to

Hmmm...  Well, clearly the exact failure mode is different between the two
servers :(.

(You did previously see the "terminated with signal 11" error message on
the eclogite server, right?)

Did the dump that ran yesterday/last night actually write to the
geo-daily-065 tape successfully, or was there some sort of changer error
at run time?

I guess the next thing to try would be to install
ScanInventory.pm.patch_20201013B to enable some debugging, then run
"amcheck -s" and "amtape ... taper" again and post the results.  Those
tests should be done using the oldest taperscan, but you can leave the
amanda.conf as-is and test with -otaperscan on the command lines if you
prefer.

Also, since I assume the statefile has changed since you last posted it,
I guess you should include it again (i.e. the version that is out there
at the moment you are running those tests).

Nathan 

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-14 Thread Nathan Stratton Treadway

On Wed, Oct 14, 2020 at 12:20:33 -0400, Chris Hoogendyk wrote:
> Both lexical.pm and ScanInventory.pm restored to original. New fix
> only patch applied to ScanInventory.pm. amanda.conf restored to use
> oldest.
> 
>amanda@marlin:~/daily$ amcheck daily
> 
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 139784192 KB disk space available, 
> using 34926592 KB
>NOTE: Holding disk '/amanda4': 170082304 KB disk space available, 
> using 65224704 KB
>NOTE: Holding disk '/amanda5': 240713728 KB disk space available, 
> using 135856128 KB
>  * Authorized Use Only *
> 
>snapper
>slot 25: volume 'Bio-Research-028'
>Will write to volume 'Bio-Research-028' in slot 25.
>NOTE: skipping tape-writable test
>Server check took 50.059 seconds
>Amanda Backup Client Hosts Check
>
>Client check: 4 hosts checked in 6.692 seconds.  0 problems found.
>(brought to you by Amanda 3.5.1)
> 
>amanda@marlin:~/daily$
[...]> 
> 
> Launched a flush on that. Then the following seems to set up a tape on the 
> second tape drive.
> 
>amanda@marlin:~/daily$ amtape daily taper
> 
>slot 31: volume 'Bio-Research-032'
>Will write to volume 'Bio-Research-032' in slot 31.
> 
>amanda@marlin:~/daily$

Okay, sounds like things are back to working "normally" on that server,
right?

So, do you still have a second server which is getting coredumps (at
least with the oldest taperscan)?

Based on the investigation so far, it seems like the crash is caused by
tape-inventory records which have no label text along with some specific
other data field values.

If you post the /usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi*
changer state file from that other server, we can double check that such
entries exist over there, too.

(Assuming they do, then I guess the question will be whether you want to
apply the same ScanInventory.pm patch there, or if you instead want to
try clearing that/those bad inventory record(s) without changing the
installed code on that box)

Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-13 Thread Nathan Stratton Treadway

On Tue, Oct 13, 2020 at 21:02:34 -0400, Chris Hoogendyk wrote:
> Patched.
> 
>amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily
> 
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 449998848 KB disk space available, 
> using 345141248 KB
>NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, 
> using 2922065920 KB
>NOTE: Holding disk '/amanda5': 104857600 KB disk space available, 
> using 0 KB
>slot 19: volume 'Bio-Research-007'
>Will write to volume 'Bio-Research-007' in slot 19.
>NOTE: skipping tape-writable test
>Server check took 20.422 seconds
>(brought to you by Amanda 3.5.1)

Okay, great.

As I said before I'm not confident this is a completely correct fix, but
attached here is a patch file (against the original version of the file)
containing just the "fix" line (i.e. no debugging statements).

You should be able to swap your lexical.pm back to the original version
and put ScanInventory.pm back to the original with just this one patch,
and then go ahead and switch back to "oldest" taperscan again in your
amanda.conf -- hopefully it will all "just work" again.

I'm curious to see the log file from an "amtape daily taper" run with
that setup in place (I assume that will run to successful completion,
too...).

    Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.1	2017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm	2020-10-13 22:43:25.148507391 -0400
@@ -723,6 +723,7 @@
return 0;
 } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
 $f_type == $Amanda::Header::F_TAPESTART) {
+   $label='' if !defined $label;
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta, $self->{'chg'}->{'storage'}->{'storage_name'})) {
if (!$autolabel->{'other_config'}) {

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-13 Thread Nathan Stratton Treadway

On Tue, Oct 13, 2020 at 15:08:43 -0400, Chris Hoogendyk wrote:
> 
> End of /tmp/amanda/server/daily/amcheck-device.20201013150303.debug
[...]
>Tue Oct 13 15:03:04.326400287 2020: pid 10777: thd-0x2688800: 
> amcheck-device: warning: Use of
>uninitialized value $label in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm line 687.
>Tue Oct 13 15:03:04.326421243 2020: pid 10777: thd-0x2688800: 
> amcheck-device:
>volume_is_labelable start: label:  barcode: 29L7
>Tue Oct 13 15:03:04.326444683 2020: pid 10777: thd-0x2688800: 
> amcheck-device:
>volume_is_labelable pre-matchlabel call

Okay, great, this would seem to confirm the theory that passing an
uninitialized $label value into the match_labelstr() function is what's
triggering the crash.

Here's a new patch to try.  I am not sure that it's really the
long-term-correct fix, but with some luck it will at least prevent the
crash you are currently seeing and let you switch back to the oldest
taperscan.

You can either apply this patch file against the *original* version of
ScanInventory.pm, or just manually edit the previously-patched version
of the file to add the 
   $label='' if !defined $label;
just below the 
   debug("volume_is_labelable pre-matchlabel call");
line that appears in there now.

Give that a shot (and send the log file lines as usual)...


Nathan


--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.1	2017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm	2020-10-13 15:47:36.242054712 -0400
@@ -684,6 +684,7 @@
 my $chg = $self->{'chg'};
 my $autolabel = $chg->{'autolabel'};
 
+debug("volume_is_labelable start: label: $label barcode: $barcode");   
 if (!defined $dev_status) {
return 0;
 } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,11 @@
return 0;
 } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
 $f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");   
+$label='' if !defined $label;
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta, $self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");   
if (!$autolabel->{'other_config'}) {
 #  $self->_user_msg(slot_result  => 1,
 #   label=> $label,
@@ -734,7 +738,9 @@
return 0;
}
} else {
+debug("volume_is_labelable pre-lookup_tapelabel  call");   
my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel  call");   
if (!$vol_tle) {
 #  $self->_user_msg(slot_result => 1,
 #   label   => $label,

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway

On Mon, Oct 12, 2020 at 22:22:45 -0400, Chris Hoogendyk wrote:
>Mon Oct 12 22:16:21.857347044 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 9  label:
>Bio-Research-011ds:
>Mon Oct 12 22:16:21.857380544 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 10  label:
>Bio-Research-012ds: 0
>Mon Oct 12 22:16:21.857471810 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 11  label:
>Bio-Research-028ds: 0
>Mon Oct 12 22:16:21.857580226 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: warning: Use of
>uninitialized value in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
>Mon Oct 12 22:16:21.857607369 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 12  label: ds: 0

Progress!

This shows that the crash happens during processing of slot 12.

(Looking back through the output of "amtape inventory" you sent, it
appears that this slot contains a tape with barcode 29L7.)


An interesting thing to note is that the *label* variable for that slot is
uninitialized -- perhaps that's what is causing the crash?

To test that theory a bit, I've attached another patch to try. 
Unfortunately this one is in a file used by the oldest.pm algorithm,
too, so you'll probably want to revert the file back to the original as
soon as you've finished testing, to make sure that the patched version
doesn't affect an actual amanda run.

So, basically save a copy of the original
   /usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm 
file, then apply the patch attached to this email in-place to that file,
and run the 
  amcheck -s -otaperscan=taper_lexical daily
test again.  (I don't expect this patch to prevent the crash, but
hopefully the new log messages will narrow down exactly where it is crashing.)

Nathan


----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.1	2017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm	2020-10-13 00:53:04.662721210 -0400
@@ -684,6 +684,7 @@
 my $chg = $self->{'chg'};
 my $autolabel = $chg->{'autolabel'};
 
+debug("volume_is_labelable start: label: $label barcode: $barcode");   
 if (!defined $dev_status) {
return 0;
 } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,10 @@
return 0;
 } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
 $f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");   
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta, $self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");   
if (!$autolabel->{'other_config'}) {
 #  $self->_user_msg(slot_result  => 1,
 #   label=> $label,
@@ -734,7 +737,9 @@
return 0;
}
} else {
+debug("volume_is_labelable pre-lookup_tapelabel  call");   
my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel  call");   
if (!$vol_tle) {
 #  $self->_user_msg(slot_result => 1,
 #   label   => $label,

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway

On Mon, Oct 12, 2020 at 14:53:03 -0400, Chris Hoogendyk wrote:
> There should be no difference in the tapes. I did them all by just
> doing an `amtape daily slot nn` followed by an `amlabel daily
> Bio-Research-nnn`. The first 20 or so were all done in sequence in
> one session, and that would include the four you mention. I didn't
> even retype the commands. I used the up arrow twice to pull up the
> previous command, backspaced the number and typed a new number for
> the slot or for the tape label as appropriate.

Based on your description of when the problems started, I'm guessing the
issue is not in how the tapes were originally labeled but some fluke of
how they were used after that.

> The new tapes that I put in with the new magazines were labeled in
> the same way. Those are now out of the library, and the tapes that
> had originally been in the library were returned. That is when the
> problem occurred.

So, what I'm wondering is if there is any pattern to which tape labels
tie to the tapes used in the "new magazine" (and thus now no longer
actually in the library) and which were in the "removed and later
returned" category?

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway

On Mon, Oct 12, 2020 at 15:27:44 -0400, Chris Hoogendyk wrote:
>amcheck-device: Not a SCALAR reference at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.

Ah, drat, this was a typo in my patch.  Please edit line 102 of that
file and remove the doubled "$" character, i.e. 
   $$sl->{device_status}
should be
   $sl->{device_status}

Then retry the amcheck test and see if it gets any farther along.

Nathan

p.s. if there are any actual Perl programmers left on this list, feel
free to jump in and point us in the right direction here

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway

On Mon, Oct 12, 2020 at 18:19:21 +, Debra S Baddorf wrote:
> Is it worth trying to just remove the "state"  file  (rename it to  
> .save  for instance)
> and letting amanda recreate it?

Hmmm, that's an interesting idea... though it seems (from reading the man
pages) that it might be possible to do this using the "amtape reset"
command rather than deleting the state file directly.

Chris, in the mean time, what do you get from the commands:

  amtape daily inventory

  amtape daily taper 

(For now we'll just let it default to oldest.pm, until we fix the patch
for lexical.pm.)

Both the output to the terminal session and the log files will probably
be useful.

Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway

On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote:
> Text file.
> 
> attached.

Excellent, perfect.

If you are able to run the patched version of lexical.pm that should
give more explicit info, but meanwhile just looking through the
statefile: one thing that jumps out at me is that four of the slot
entries have Math::BigInt device_status fields, rather than simple
integers:

  Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014

Do those four volumes ring a bell with you as being special in some way?

(I wonder if the segfault might be related to the program trying to do
some operation against a  BigInt object when an integer is expected, or
something)
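
One quick way to list which statefile entries carry such objects is to
search the file for the class name.  This is only a minimal sketch: it
assumes the statefile is plain text and that the objects are serialized
with the literal string Math::BigInt in them.  The function name and the
path in the usage comment are made up for illustration.

```shell
# Hypothetical helper: print each statefile line that mentions Math::BigInt,
# prefixed with its line number, so the affected entries can be located.
bigint_lines() {
    statefile=$1
    # -n: show line numbers; -F: treat the pattern as a fixed string
    grep -nF 'Math::BigInt' "$statefile"
}

# Usage (path is a placeholder for your changer's statefile):
#   bigint_lines /path/to/statefile
```

The line numbers can then be matched up by eye with the surrounding slot
entries in the file.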

More generally, is there any pattern to the labels you used for your
"normal" tapes vs. the short-term ones you wrote and then sent to Iron
Mountain?

Nathan



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway

On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
> In any case, I currently have both systems flushing to tape using
> the traditional taper scan. That may work for now, but it would be
> good to track this down. It's also puzzling that it just turned up

R.e. tracking this down: my thought is to see if we can track down the
problem by tweaking the code for lexical.pm -- that way, you can trigger
test runs using "amcheck -otaperscan=taper_lexical" without running the
risk of having a normal cron job attempt to run the code you are in the
middle of modifying.

I've attached a patch which, hopefully, will both fix the
"uninitialized" warning messages that have been appearing in the logs
and also print some debugging info as it loops through the tape inventory
so we can see if it dies in the middle of that loop.

So, when you are ready to investigate further, save a copy of the
original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then
apply the patch to the "live" lexical.pm file, and then try running
amcheck -s (with lexical scan) again.
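
The save-a-copy-then-patch step could be sketched roughly as follows.
The function name, the ".pristine" backup suffix and the patch file name
in the usage comment are all assumptions for illustration, and the
--dry-run option assumes a patch implementation that supports it (GNU
patch does).

```shell
# Sketch: keep a pristine copy of a file, then apply a patch to the live copy.
apply_debug_patch() {
    target=$1       # file to modify, e.g. the live lexical.pm
    patchfile=$2    # the saved patch attachment
    backup="$target.pristine"

    # Never overwrite an existing backup: it may be the only pristine copy.
    if [ ! -e "$backup" ]; then
        cp -p "$target" "$backup" || return 1
    fi

    # Check that the patch applies cleanly before touching the live file.
    # -N skips a patch that looks already applied instead of prompting.
    patch -N --dry-run "$target" < "$patchfile" >/dev/null || return 1
    patch -N "$target" < "$patchfile"
}

# Usage (patch file name is a placeholder):
#   apply_debug_patch /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm lexical-debug.patch
```

To undo the change later, copy the ".pristine" file back over the live
one.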

You may have to fix unbalanced quotes or whatever typos in the patched
lines, but hopefully you'll soon get a log file which lists the tapes in
the inventory (but doesn't have the warning lines any more)... at which
point, send me those log lines.

Nathan

--- lexical.pm_orig_v3.5.1  2020-10-11 18:23:37.650643033 -0400
+++ lexical.pm  2020-10-11 18:47:47.097749555 -0400
@@ -97,6 +97,10 @@

 for my $i (0..(scalar(@$inventory)-1)) {
my $sl = $inventory->[$i];
+
+   # tracing segfault:
+debug("slot: $i  label: " .  $sl->{'label'} . "  ds: " . $sl->{device_status});
+
next if $seen->{$sl->{slot}};

if (!defined $sl->{'state'} ||
@@ -104,6 +108,7 @@
push @unknown, $sl
} elsif ($sl->{'state'} == Amanda::Changer::SLOT_EMPTY) {
} elsif (defined $sl->{'label'} &&
+defined $sl->{device_status} &&
 $sl->{device_status} == $DEVICE_STATUS_SUCCESS) {
if ($self->is_reusable_volume(label => $sl->{'label'})) {
if ($last_label && $sl->{'label'} gt $last_label) {

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway

On Sun, Oct 11, 2020 at 15:51:34 -0400, Nathan Stratton Treadway wrote:
> My theory is that the driver for the changer keeps the inventory written
> in a file somewhere specific to the changer, but since I don't have a

Okay, looking back through the log file you posted a couple days ago,
it looks like the file in question is found at 

/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0

If I'm reading the driver correctly, that should be a text file
(Perl-formatted data definitions)... so can you post the contents here?
(Or, if it seems too large, send it to me off-list.)

(It's actually called "statefile", but the robot changer seems to
include the inventory information as part of the changer state.)

Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway

On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
> I'm not sure what I should be looking for.
> 
> I don't see anything in amanda's home directory that seems likely,
> nor in /tmp/amanda, and there is no /etc/amanda/. The tape library
> has a web interface that I use to interact with it. Amanda is
> configured to use mtx. I can also use mtx by hand to check on the
> library's status, and I have a script amchanger that I wrote that
> does that for me.
> 
> So, aside from amanda keeping an inventory, and the tapelist that it
> has, I'm not sure where else anything would be.

My theory is that the driver for the changer keeps the inventory written
in a file somewhere specific to the changer, but since I don't have a
tape changer myself I am not familiar with that driver and don't know
where to direct you to look, offhand.

Can you post the changer and tape-drive related parameters/sections from
your amanda.conf file?  (I assume the config sections on your two
different servers are essentially the same, right?)

Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway

On Sun, Oct 11, 2020 at 12:24:45 -0400, Chris Hoogendyk wrote:
> /tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical)

> Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: 
> NEO200x48: updating state
> Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm
> line 106.

Excellent, this points towards the "tape inventory" part of the code.

Can you take a look around your system to see if you can find where the
tape changer stores inventory information, internally?  (If you don't
find it immediately I can look in the source to try to figure out the
path it would use, but hopefully it's easy enough to figure out just by
looking through Amanda-related directories on your system.)  If we can
find that file, we may be able to see some "weird" data that could be
causing a crash by looking at the file directly.

Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Nathan Stratton Treadway

On Sat, Oct 10, 2020 at 23:50:17 -0400, Chris Hoogendyk wrote:
> Wow!
> 
>amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using 
> 808656896 KB
>NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using 
> 53370880 KB
>NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using 
> 1531760640 KB
>Searching for label 'Bio-Research-002':label 'Bio-Research-002' not 
> recognized or not found
>slot 13:slot 13 not in use-slots (14-36)
>slot 14: volume 'Bio-Research-013' is still active and cannot be 
> overwritten
>slot 15: volume 'Bio-Research-003'
>Will write to volume 'Bio-Research-003' in slot 15.
>NOTE: skipping tape-writable test
>Server check took 175.512 seconds
>(brought to you by Amanda 3.5.1)
>amanda@marlin:~/daily$
> 
> That worked! Interestingly, doing an `amcheck -s daily` after that
> fails just as before. The amanda.conf uses taper_oldest.

Okay, this lends support to the theory that the crash is actually
happening in the "scan" operation, rather than in some later part of the
amcheck-driver/taper process.

(Were there any error/warning messages written to the amcheck-device log
file for that run?)

> So, maybe if I temporarily go to the different algorithm, it will
> work. Right now the backups are already running and dumping to

Yeah, it might well work, and if so -- and if you don't care which
tape(s) are used next -- then simply switching to taper_traditional
would probably be the easiest approach to getting new dumps actually
written to tape.

If the order the tapes are used does matter to you, I think it should
probably be possible to fix the bug in taper_oldest (oldest.pm) to get
it working (but I'm not really sure how much debugging effort it will
involve...).

If you are interested in attempting that, the next thing I would check is
to see what happens with -otaperscan=taper_lexical (assuming that is also
defined in your amanda.conf).  The "lexical" and "oldest" algorithms
both use the tape-drive inventory (while "traditional" does not), so
that test will help narrow the problem down to just "oldest" or to the
tape-inventory part of the code.

Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Nathan Stratton Treadway

On Sat, Oct 10, 2020 at 19:20:25 -0400, Chris Hoogendyk wrote:
> The taper log from last night looks exactly like the amcheck log,
> showing the same inventory followed by the updating state and the
> same three repetitions of the warning at line 102.

So you are saying that the taper log from last night goes up through the
lines that look like

>Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx:  
>  Storage Element 48:Full :VolumeTag=CLN002CU
>Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: 
> NEO200x48: updating state

, then switches to the warning lines from oldest.pm line 102, and then
aborts suddenly?

It's not getting you an actual solution, but I'm curious whether changing to
a different taperscan algorithm in the amanda.conf file (or using
-otaperscan= , if you have any other ones defined already) allows
"amcheck -s" to complete successfully (and in any case what the amcheck
log file looks like with another taperscan algorithm).

Nathan


Re: restore from vtapes written by amvault

2020-10-10 Thread Nathan Stratton Treadway

On Fri, Oct 09, 2020 at 08:48:41 +0200, Bernhard Erdmann wrote:
> There is another logfile:
> /tmp/amanda/server/be-full/amfetchdump.20201009083605.debug
> 
> $ cat /tmp/amanda/server/be-full/amfetchdump.20201009083605.debug
> Fr Okt 09 08:36:05.594766041 2020: pid 2215: thd-0x1bab4f0: amfetchdump:
> pid 2215 ruid 33 euid 33 version 3.5.1: start at Fri Oct  9 08:36:05 2020

Yeah, that's the one to look at.

Unfortunately, I don't see anything in there that tells us more than we
already knew (i.e. that the program wasn't choosing the correct storage
before searching for the requested dump).

However, in my quick tests on a somewhat-similar setup here, I found
that I could in fact get amfetchdump to request the correct volume by
doing a manual override of the storage on the command line.

So, does the following command work any better?
  $ amfetchdump -ostorage=vtape -d vtape be-full svr '^/$' 2119



Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Nathan Stratton Treadway

On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote:
> Also
> 
> 8 lines like this in the kern.log:
> 
>Oct  9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: 
> segfault at 0 ip
>7f94159617c6 sp 7fff61039da8 error 4 in 
> libc-2.23.so[7f94158d6000+1c]
> 

Well, that doesn't really tell us much more about what is going wrong,
other than the slight hint that the problem is all the way down in the
libc library somehow.

I don't know how much you are trying to investigate further at this
point in your furlough schedule... but I still feel that comparing the
log from a successful run to this aborted run has the best chance of
generating a hint as to exactly what operation is underway at the point
of failure.

Also, the taper/changer logs from last night's run should give some hint
as to what it was attempting, and perhaps those logs will be different
enough from the amcheck-device logs that they'll give some new
information.

(The last operation that appears to be happening in your quoted
amcheck-device log lines is a scan through the tape-changer inventory. 
I don't have a physical tape-drive changer myself so I don't have any
guesses as to what could be wrong, but based on the history of the
situation you described it does seem plausible that the inventory
database it's working from could contain some "unexpected" data of some
sort...)

Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-09 Thread Nathan Stratton Treadway

On Sat, Oct 10, 2020 at 00:29:04 -0400, Nathan Stratton Treadway wrote:
> Can you post the output of 
> 
>   $ sed -n '99,$p;105q' /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> ?
> 
> (In other words, what's line 102 in that file on your system, with a
> few lines of context?)

(Hmm, perhaps 

  $ cat -n /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^ *102" -C3

would be better -- that way, the file line numbers are included in
output...)
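
If it is handy to have this as something reusable, here is a small
sketch that prints an arbitrary range of lines with their line numbers.
The function name is made up; it only relies on standard awk.

```shell
# Print lines START..END of FILE, each prefixed with its line number.
show_lines() {
    file=$1; start=$2; end=$3
    awk -v s="$start" -v e="$end" '
        NR > e  { exit }                        # stop reading once past the range
        NR >= s { printf "%6d  %s\n", NR, $0 }  # number and print lines in range
    ' "$file"
}

# Usage (path as quoted in this thread):
#   show_lines /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm 99 105
```

Unlike the grep approach, this cannot accidentally match other line
numbers that merely start with the same digits.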



Nathan


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-09 Thread Nathan Stratton Treadway

On Fri, Oct 09, 2020 at 22:22:47 -0400, Chris Hoogendyk wrote:
> both servers, I'm getting this error (ERROR: amcheck-device
> terminated with signal 11). When I ran the amcheck before swapping

amcheck-device is a Perl program, so it's a little bit impressive to be
triggering a SEGV of the process :( .

Do you get any coredump-related kernel messages in your syslog file when
the process crashes?


> Fri Oct 09 21:57:09.795469021 2020: pid 24239: thd-0xc2e600: amcheck-device: 
> NEO200x48: updating state
> Fri Oct 09 21:57:09.802340532 2020: pid 24239: thd-0xc2e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> line 102.
> Fri Oct 09 21:57:09.802537511 2020: pid 24239: thd-0xc2e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> line 102.
> Fri Oct 09 21:57:09.802622523 2020: pid 24239: thd-0xc2e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> line 102.

I am guessing this "uninitialized value" warning is not directly causing
the crash, but those log messages might possibly hint as to where the
program execution had reached just prior to the crash.

Can you post the output of 

  $ sed -n '99,$p;105q' /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
?

(In other words, what's line 102 in that file on your system, with a
few lines of context?)


Do you see those warning lines in the log files from a successful
amcheck run (i.e. from a few days ago)?  What do those logs show after
the warning lines (or the "updating state" line)?


    Nathan



Re: restore from vtapes written by amvault

2020-10-08 Thread Nathan Stratton Treadway

On Thu, Oct 08, 2020 at 14:29:48 -0400, Nathan Stratton Treadway wrote:
> On Thu, Oct 08, 2020 at 20:00:18 +0200, Bernhard Erdmann wrote:
> > $ amadmin be-full find svr /
> > 
> > date       host disk lv storage pool    tape or file file part status
> > 2000-01-19 svr  /     0 vtape   vtape   vBE-full-001    2  1/1 OK
> > 2000-01-19 svr  /     0 be-full be-full BE-full-00      2 1/-1 OK
> 
> Okay, that's good news.
> 
> 
> 
> > But amfetchdump still does not know about tape vBE-full-001:
> > 
> > $ amfetchdump -d vtape be-full svr '^/$' 2119
> > 1 volume(s) needed for restoration
> > The following volumes are needed: BE-full-00
> 
> Have you looked into explicitly specifying the storage that amfetchdump
> is looking for/at?  That would be my first thing to investigate at this
> point.  If I get a chance later this evening I'll try to look back
> through my notes to see if I can remember the details of how that
> works

Well, looking through the source code for amfetchdump, it seems like
the program does not pull storage names from the amanda.conf file after
all, but instead it seems to assume that it can determine the storage to
use based on the changer specified by the "-d vtape" parameter.  Based on
the fact that it's prompting for tape BE-full-00, though, that doesn't
seem to be working as expected.

It looks like amfetchdump should be creating a
"$logdir/fetchdump.$timestamp" log file.  If so, does that include any
mention of opening the vtape changer and/or detecting storage names?

(Assuming it's not super long, you could post that log file here for us
to take a look at...)

    Nathan


Re: restore from vtapes written by amvault

2020-10-08 Thread Nathan Stratton Treadway

On Thu, Oct 08, 2020 at 14:29:48 -0400, Nathan Stratton Treadway wrote:
> Have you looked into explicitly specifying the storage that amfetchdump
> is looking for/at?  That would be my first thing to investigate at this
> point.  If I get a chance later this evening I'll try to look back
> through my notes to see if I can remember the details of how that
> works

(What does
  $ amadmin be-full config | grep -i storage
show right now?)

Nathan


Re: restore from vtapes written by amvault

2020-10-08 Thread Nathan Stratton Treadway

On Thu, Oct 08, 2020 at 20:00:18 +0200, Bernhard Erdmann wrote:
> $ amadmin be-full find svr /
> 
> date       host disk lv storage pool    tape or file file part status
> 2000-01-19 svr  /     0 vtape   vtape   vBE-full-001    2  1/1 OK
> 2000-01-19 svr  /     0 be-full be-full BE-full-00      2 1/-1 OK

Okay, that's good news.



> But amfetchdump still does not know about tape vBE-full-001:
> 
> $ amfetchdump -d vtape be-full svr '^/$' 2119
> 1 volume(s) needed for restoration
> The following volumes are needed: BE-full-00

Have you looked into explicitly specifying the storage that amfetchdump
is looking for/at?  That would be my first thing to investigate at this
point.  If I get a chance later this evening I'll try to look back
through my notes to see if I can remember the details of how that
works


Nathan


Re: restore from vtapes written by amvault

2020-10-07 Thread Nathan Stratton Treadway

On Wed, Oct 07, 2020 at 17:50:22 -0400, Nathan Stratton Treadway wrote:
> That is, I would make a copy of the original log.20201004123343.0 file
> into some other directory, then use a text editor to edit that
> particular DONE line to remove the "00" at the end of the
> datetimestamp field... and then run the "amadmin ... find" command again
> to see if that edit allowed it to start finding the vaulted copies of the
> dumps.
> 

(In case it's not clear, what I was trying to say was that I would save
a copy of the file off in another directory for safe keeping, then edit
the original file for my testing.)

Nathan


Re: restore from vtapes written by amvault

2020-10-07 Thread Nathan Stratton Treadway

> Am 07.10.20 um 00:05 schrieb Nathan Stratton Treadway:
> 
> $ grep 20201004123343 ../tapelist
> 20201004123343 vBE-full-001 reuse BLOCKSIZE:32 POOL:vtape STORAGE:vtape 
> CONFIG:be-full

Okay, that looks correct as far as I can tell.

> > Also, what do you get when you grep log.20201004123343.0 for "svr /"?
> > (That should give you all the taper lines related to writing the
> > "missing" dump for svr / .)
> 
> $ grep "svr / " log.20201004123343.0
> PART taper "ST:vtape" vBE-full-001 2 svr / 2119 1/-1 0 [sec 43.408733 
> bytes 13631488 kps 306.666403]
> DONE taper "ST:vtape" svr / 211900 1 0 [sec 44.00 bytes 13631488 
> kps 302.545455 orig-kb 0]

Hmmm, the one thing that seems a little strange is the extra zeros at
the end of the datetimestamp string on the DONE line.

I don't know how well this situation of mixing dumps made in the
date-only datestamp era with vault-copies made in the datetimestamp era
has actually been tested... so if it were me that's probably what I'd
play with next.

That is, I would make a copy of the original log.20201004123343.0 file
into some other directory, then use a text editor to edit that
particular DONE line to remove the "00" at the end of the
datetimestamp field... and then run the "amadmin ... find" command again
to see if that edit allowed it to start finding the vaulted copies of the
dumps.

Let us know how it goes...

        Nathan



Re: driver general protection fault on amanda 3.5.1

2020-10-07 Thread Nathan Stratton Treadway

On Wed, Oct 07, 2020 at 14:54:30 -0400, Steve Ryan wrote:
> I'm trying to debug an issue we've been having in our amanda
> 3.5.1 setup. Currently backups are failing every night due to (I
> believe) the driver faulting. Relevant logs:
> 
> amdump mail report:
> FAILURE DUMP SUMMARY:
>   chunker: FATAL Broken pipe at
> /usr/lib64/perl5/vendor_perl/Amanda/IPC/LineProtocol.pm line 429.
>   chunker: FATAL Connection reset by peer at
> /usr/lib64/perl5/vendor_perl/Amanda/IPC/LineProtocol.pm line 579.
> 
> dmesg:
> 2020-10-07T01:06:08.770127-04:00 vacuum.cs.umd.edu kernel: traps:
> driver[25995] general protection ip:7f2a9ffe50ec sp:7ffc61f8b040
> error:0 in libamanda-3.5.1.so[7f2a9ffaa000+81000]
> 
> 
> The environment is about ~80ish nodes total, running mostly RHEL7
> with some RHEL8 and ~3-5 Ubuntu/Debian machines. Everything is
> running 3.5.1. straight from the official sources. I don't think
> it's being caused by a client machine anyway, and some machines get
> backed up each night.

I don't remember seeing this particular problem reported here before and
don't have any silver bullet...

Which distribution is the Amanda server running on?

Was this setup of Amanda-server-and-~80ish-clients ever working
properly at some point before this crashing started??

> Has anyone seen this issue before/know what debug info I should be
> looking for in the logs?

If the driver process is indeed core dumping, you should see evidence
of that in /var/log/amanda/server//driver..debug for
that run.  At the very least the log should end abruptly; if you are
lucky you might find a stack trace or something there giving a clue as
to what is happening just before the crash.
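
To check how a given night's log ends, something along these lines might
help.  It is only a sketch: the function name is made up, the directory
is whatever your installation uses for server debug logs, and it assumes
the debug files are named PROGRAM.TIMESTAMP.debug, as the file names
quoted elsewhere in this archive suggest.

```shell
# Show the last lines of the newest "PROG.TIMESTAMP.debug" file in DIR.
# The timestamps are fixed-width digit strings, so the shell's sorted glob
# expansion puts the newest file last.
latest_debug_tail() {
    dir=$1; prog=$2; count=${3:-20}
    latest=
    for f in "$dir/$prog".*.debug; do
        [ -e "$f" ] && latest=$f    # skips the unexpanded pattern if no match
    done
    [ -n "$latest" ] || { echo "no $prog debug files in $dir" >&2; return 1; }
    echo "== $latest"
    tail -n "$count" "$latest"
}

# Usage (directory is a placeholder for your debug log directory):
#   latest_debug_tail /path/to/debug/dir driver 40
```

A log whose final lines stop in the middle of normal activity, with no
shutdown message, would be consistent with the process having crashed.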

If you can go back through the runs from various nights and correlate
the crashes to e.g. a particular client kicking off just beforehand, or
something, that might be a useful clue.

You can also look at the chunker..debug files in that same
directory to see if they give any additional hints, but offhand I'd
guess that they are just going to report that the chunker processes are
aborting due to the fact that the far side of the socket/pipe
disappeared, which presumably is caused by the driver process
crashing.

        Nathan


Re: restore from vtapes written by amvault

2020-10-06 Thread Nathan Stratton Treadway

On Tue, Oct 06, 2020 at 18:49:55 +0200, Bernhard Erdmann wrote:
> "amadmin CONF find" does not locate the dumps on vtape (labeled
> vBE-full-001), only on original tapes (labeled BE-full-00 to BE-full-04).
> 
> $ amadmin be-full find svr /
> 
> date       host disk lv storage pool    tape or file file part status
> 2000-01-19 svr  /     0 be-full be-full BE-full-00      2 1/-1 OK
> 
[...]

> The logdir contains the original logfiles of 19 Jan 2000 as well as the
> logfile log.20201004123343.0 describing the amvaulting to vBE-full-001.
> 

(I assume you are running Amanda v3.5, right?)

I use dump-time vaulting rather than "amvault" and am not exactly
certain what details are different between the two approaches, but off
hand I'm guessing that if you can fix the "find" problem that will also
allow the other commands to start working...

When I use "find", it does find both the "primary" and "vault" copies:

=
# su backup -c "amadmin TestBackup find TestServer"

date                host        disk  lv storage     pool        tape or file file part status
2019-09-19 23:17:28 TestServer  /data  0 TestOffsite TestOffsite TESTBACKUP-1032  1/1 OK
2019-09-19 23:17:28 TestServer  /data  0 TestBackup  TestBackup  TESTBACKUP-12 2  1/1 OK
=

Looking at the source code for the "find" command, it seems that Amanda
looks through the log.* files based on the datestamps pulled out of the
tapelist file...  so in your case, what does

  grep 20201004123343 tapelist

show (for the /etc/amanda/be-full/tapelist file)?

Also, what do you get when you grep log.20201004123343.0 for "svr /"?
(That should give you all the taper lines related to writing the
"missing" dump for svr / .)

    Nathan


Re: Is amanda busy

2020-09-23 Thread Nathan Stratton Treadway

On Wed, Sep 23, 2020 at 10:10:21 +0700, Olivier wrote:
> Does it exist a command that can be used to check whether amanda is busy
> or not?
> 
> For example, do not launch the daily backup if the previous one is still
> running, or do not reboot Amanda server (network stability issue) if a
> backup is being done.

I don't know of a straightforward Amanda-provided command to do exactly
this.

(Note that in older versions, Amanda would abort a new amdump run if an
old run was still underway, but in v3.5 there is support for concurrent
runs so it specifically doesn't abort automatically any more.)

In general, you can look to see if /var/log/amanda//amdump
exists.  That symlink is created when amdump starts, and renamed to
"amdump.1" as amdump finishes, so if the "amdump" symlink still exists
then the job is still running (... or it died without cleaning up).

If you are programming a script to check for this, you might also check
for an "amflush" symlink at the same time -- that symlink exists while
amflush is running, and depending on your configuration you might not
want to start a new amdump job while amflush is running, either.
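
A check along those lines could be sketched as below.  The function name
is made up, and the log directory is passed in as an argument because it
depends on your configuration; the sketch only tests for the "amdump"
and "amflush" entries described above.

```shell
# Report whether an Amanda run appears to be in progress.
# Prints the name of the entry that was found and returns 0 if so;
# returns 1 if neither "amdump" nor "amflush" exists in LOGDIR.
amanda_run_in_progress() {
    logdir=$1
    for name in amdump amflush; do
        # -e follows symlinks; -L also catches a dangling symlink
        if [ -e "$logdir/$name" ] || [ -L "$logdir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Usage (directory is a placeholder for your config's log directory):
#   if amanda_run_in_progress /path/to/logdir >/dev/null; then
#       echo "Amanda looks busy, not starting a new run"
#   fi
```

Keep in mind the caveat above: a run that died without cleaning up
leaves the entry behind, so a positive result is a hint rather than
proof.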

For a manual check, you can run the "amstatus" command to see the status
of either the current in-progress run (if it shows the amdump or amflush
file) or the last-completed run (amdump.1/amflush.1).

Nathan


Re: Incrementals becoming fulls

2020-09-20 Thread Nathan Stratton Treadway

On Tue, Sep 15, 2020 at 20:14:34 -0700, Jim Kusznir wrote:
> I've read the website on the amanda site about changing disk IDs, but I did
> a fairly thorough checking of that, and I don't think that's it.  First,
> there is literally NO device on the mount point:
> 
> root@AmandaBackup:/usr/local # mount
> cys-bkup/iocage/jails/AmandaBackup/root on / (zfs, local, nfsv4acls)
> root@AmandaBackup:/usr/local #
> 
> And that stays exactly the same.

The important factor for Amanda (or, more precisely, GNU tar), is not
the mount points per se, but rather the Device field of each particular
file's inode.

Does FreeNAS have a "stat" command?  If so, it would be interesting to
see the output of that command for a few of the files in question, and
perhaps for various top-level directories of the jail above the
cross-mounted data directories, in hopes that gives some hint of what is
going on.
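
For comparing several paths at once, a small wrapper like the following
might be convenient.  It is a sketch under assumptions: the function
name is made up, GNU stat is detected by trying its -c option, and the
BSD branch (which is what a FreeBSD-based jail would use) relies on the
-f format string '%Xd %N' as I understand it from the FreeBSD man page.

```shell
# Print "DEVICE-ID-IN-HEX PATH" for each path given.
# GNU stat takes -c FORMAT; BSD stat (FreeBSD and derivatives) takes -f FORMAT.
show_device_ids() {
    if stat -c '%D' / >/dev/null 2>&1; then
        stat -c '%D %n' "$@"          # GNU coreutils
    else
        stat -f '%Xd %N' "$@"         # BSD; format string is an assumption
    fi
}

# Usage (paths are placeholders):
#   show_device_ids / /path/to/share /path/to/share/some-file
```

Running it now and again after the next backup would show whether the
device number reported for the same file changes between runs.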

> 
> I've found and ran the tar-snapshot-edit perl script in read mode to view
> the device IDs.  I am seeing a few different device IDs show up, but the
> level 0, level 1 and leve1.new for any given share always have the same
> device ID.  Here's a snippit of the output:
> 
> File: amclient-tdriveCYS-2018-Session-A-B_0
>   Detected snapshot file version: 2
> 
>   Device 0x2900ff0b occurs 6305 times.
> 
> File: amclient-tdriveCYS-2018-Session-A-B_1
>   Detected snapshot file version: 2
> 
>   Device 0x2900ff0b occurs 6306 times.
> 
> File: amclient-tdriveCYS-2018-Session-A-B_1.new
>   Detected snapshot file version: 2
> 
>   Device 0x2900ff0b occurs 6305 times.

 but I agree that since these all match, it seems something else
beyond the usual device-id-change is going on.

> 
> I've noticed that all affected devices will end up with a .new.  I think

I believe this means that the backup didn't finish -- there
shouldn't be any .new files left between runs.

Do you see any errors in your mail report, or in your log files?

What does "ls -l" show for the directory containing the snapshot files?
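
A small sketch for spotting leftover files, assuming only that the
unfinished snapshot files end in ".new" as in the listing quoted above
(snapshot_dir is a placeholder for wherever your client keeps them):

```python
import os


def leftover_new_files(names):
    """Return the names that still carry the temporary '.new' suffix."""
    return sorted(n for n in names if n.endswith(".new"))


def scan(snapshot_dir):
    """snapshot_dir is hypothetical: the directory holding the snapshot files."""
    return leftover_new_files(os.listdir(snapshot_dir))
```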

    Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: How to "unlable" a tape

2020-09-20 Thread Nathan Stratton Treadway

On Fri, Sep 18, 2020 at 10:53:44 +0700, Olivier wrote:
> I know there is amadmin no-reuse, but suppose the tape had been
> completely destroyed and is not readable anymore, there should be a way
> to tell Amanda it should completely forget about that tape, remove any
> index it can have about the tape.
> 
> What would be the command then?

You're looking for "amrmtape".

Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: amsamba errors

2020-07-06 Thread Nathan Stratton Treadway

On Mon, Jul 06, 2020 at 19:39:49 +0200, Stefan G. Weichinger wrote:
> I see stuff like this in my amanda-backups when I use the
> amsamba-application (amanda-3.5.1, Debian 10.4, samba-4.10.15):
> 
> sendbackup: info BACKUP=APPLICATION
> sendbackup: info APPLICATION=amsamba
> sendbackup: info RECOVER_CMD=/bin/gzip -dc
> |/usr/lib/amanda/application/amsamba restore [./file-to-restore]+
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> ? smbclient: cli_setatr failed: NT_STATUS_ACCESS_DENIED
> ? smbclient: cli_setatr failed: NT_STATUS_ACCESS_DENIED
> 
> [..]
> 
> Unfortunately even the log files in
> /var/log/amanda/log.error don't show more details.

A little bit of additional information should be found in the 
  /var/log/amanda/client//Amsamba.

> The user in /etc/amandapass has admin rights on the files in the DLE (at
> least I am told so).
> 
> Maybe it's "only" setting the archive bit that fails?

If you count the number of NT_STATUS_ACCESS_DENIED errors in a
particular 
   /var/log/amanda/log.error/errout
file and it exactly matches the number of files backed up in that run
(e.g. as determined by a count of lines in the corresponding
   /var/lib/amanda//index//___/_-unsorted.gz
file), then it seems likely to be the archive bit permissions.
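
The comparison can be scripted; here is a minimal sketch. The function
names are hypothetical, and you would pass in the actual errout and index
file paths from your own system:

```python
import gzip


def count_access_denied(errout_path):
    """Count lines mentioning NT_STATUS_ACCESS_DENIED in an errout file."""
    with open(errout_path, "r", errors="replace") as fh:
        return sum(1 for line in fh if "NT_STATUS_ACCESS_DENIED" in line)


def count_index_entries(index_gz_path):
    """Count non-empty lines in a gzip-compressed index file."""
    with gzip.open(index_gz_path, "rt", errors="replace") as fh:
        return sum(1 for line in fh if line.strip())


def looks_like_archive_bit_problem(denied, indexed):
    """True when every backed-up file produced exactly one error."""
    return denied > 0 and denied == indexed
```

An exact match is only a hint, of course; a count that is close but not
equal would need a closer look at the individual error lines.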

(Another symptom pointing in that direction is if incremental dumps for
that Samba DLE are the same size as full dumps even when you know that
most files would not have been changed since the previous full dump.)


To test explicitly, you can do a manual smbclient test, something like
this:

  $ USER= smbclient //path-to-share -E -W  -c "cd ; tarmod full reset hidden system quiet;  tar c -" > smbclient_test.tar

to try to back up the remote test directory into the local
smbclient_test.tar file using the same smbclient "tar" settings that
Amanda uses.

(Also, the smbclient "ls" command shows you a flag of "A" or "N" on a
file, so you can use that to confirm that the smbclient tar command is
successfully clearing the Archive bit.)

    Nathan



Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Timeout during estimate

2020-06-14 Thread Nathan Stratton Treadway

On Mon, Jun 15, 2020 at 00:20:25 -0400, Nathan Stratton Treadway wrote:
> On Mon, Jun 15, 2020 at 10:41:58 +0700, Olivier wrote:
> > I have an Amanda client that takes more than 4 hours to do the
> > estimate. The estimate is computed correctly, but when amandad on the

[...] 
> Sounds like you are looking for the "etimeout" parameter in amanda.conf
> on the Amanda server.  (You don't need to recompile anything to change
> this setting.)

I meant to add that another approach is to change the "estimate" option
for the disktype used in the DLE(s) for that client machine to something
other than "client".  Depending on your situation, one of the other two
options may give you a good-enough estimate of the size of the dumps
in a lot less time... which would probably have the additional effect of
letting the overall dump complete more quickly.

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Timeout during estimate

2020-06-14 Thread Nathan Stratton Treadway

On Mon, Jun 15, 2020 at 10:41:58 +0700, Olivier wrote:
> I have an Amanda client that takes more than 4 hours to do the
> estimate. The estimate is computed correctly, but when amandad on the
> client tries to send back the estimate to the server, the packet times
> out.
> 
> I kind of remember that there is a timeout parameter that I need to
> tweak before recompiling Amanda, but I can't remember if it is on the
> client or on the server. I tend to think it is on the server. But
> definitive answer is welcome.

Sounds like you are looking for the "etimeout" parameter in amanda.conf
on the Amanda server.  (You don't need to recompile anything to change
this setting.)

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: runspercycle with value 5

2020-06-01 Thread Nathan Stratton Treadway

On Mon, Jun 01, 2020 at 15:09:54 +0200, zoi...@medialab.sissa.it wrote:
> documentation and in messages in this mailing list runspercycle set with
> a value of 5 and immediately next to it (like a golden rule) the mention
> that amdump will run 5 times per cycle excluding weekends?
>
> My question is:  why excluding weekends? Is not possible to run amdump
> 5 times per week excluding Wednesday and Friday? Is there any
> correlation between the value 5 and the weekends?

These types of examples probably date back to the era when Amanda was
generally used to send backups to a no-changer tape drive... so you
needed someone physically present to change the tape before each new run
(and thus couldn't do new backups if the office was empty on the
weekends).  Thus, these examples target getting a full dump once a week,
but assume that only 5 tapes will be used over the course of that week.

But indeed there's no particular reason to only do backups on weekdays
(especially when using tape changers or vtapes, etc. which don't need
daily manual intervention).

In fact, at my site the current setup (which uses vtapes) runs amdump
every night, and has dumpcycle set to 3 and runspercycle is left unset
(which means "same as dumpcycle") -- so we're both running the dump all
seven days a week and also getting more than two full-dump cycles within
each week.

> 
> Another related question: In a configuration with
> 
>dumpcycle 7 days
>runspercycle 5
> 
> during a dumpcycle (a week) do I have to run amdump in the same days
> as in the
> previous dumpcycle or in the first dumpcycle I can run amdump during
> the days
> 
> 1 2 4 5 6
> 
> and in the second dumpcycle amdump can be run during the days
> 
> 2 3 4 6 7?

Amanda simply uses the runspercycle number to (try to) figure out how to
space out full vs. incremental dumps over the course of a cycle -- but
this is done each run at a time, rather than there existing some sort of
overarching schedule which later runs then have to follow.

Put another way, each time Amanda runs, it gathers up the estimate
statistics for all the DLEs in the disklist, then tries to calculate an
answer to the question "Which DLEs should I pick to full-dump today, so
that no DLE is overdue for a full dump and so that the total size of the
full dumps today is close to the daily average size of full dumps?".
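
To make that question concrete, here is a toy sketch of the idea. This is
NOT Amanda's planner code: the greedy selection and the dictionary fields
(name, full_size, days_since_full) are my own assumptions, chosen only to
illustrate "nothing overdue, and close to the daily average":

```python
def pick_full_dumps(dles, dumpcycle_days, runspercycle):
    """Toy model: choose DLEs to full-dump in today's run.

    dles is a list of dicts with hypothetical keys
    'name', 'full_size' and 'days_since_full'.
    """
    # Daily average size of full dumps if they were spread evenly.
    target = sum(d["full_size"] for d in dles) / runspercycle

    # Anything a whole dumpcycle old has to be dumped today regardless of size.
    due = [d for d in dles if d["days_since_full"] >= dumpcycle_days]
    chosen = list(due)
    used = sum(d["full_size"] for d in chosen)

    # Fill any remaining room with the DLEs that are closest to being due.
    rest = sorted((d for d in dles if d not in due),
                  key=lambda d: -d["days_since_full"])
    for d in rest:
        if used + d["full_size"] <= target:
            chosen.append(d)
            used += d["full_size"]
    return target, [d["name"] for d in chosen]
```

Because the function looks only at today's numbers, skipping or moving a
run simply changes the inputs to the next call, which is the point made
above about there being no overarching schedule.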

(You can get some insight into the calculations involved using the
"amadmin  balance" command.)

Since each Amanda run makes this calculation separately, if you change
the run schedule it will simply re-calculate using the current numbers
and do the best it can.

There are lots of nuances to how that plays out, but the short version
is that assuming you do keep to the same number of actual runs over the
course of the dumpcycle, you should be fine -- with the caveat that at
least when your filesystems have a fairly constant rate of changed/added
files, if your runs aren't spaced evenly then the runs that happen after
a longer delay will have more data than the runs after a shorter delay. 
That is often not really a big deal, but depending on your situation it
can be undesirable (which is why many of the examples out there are
structured to try to keep the daily backup size as constant as
possible).

        Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: amanda-3.4.5 does not fill one tape

2020-05-15 Thread Nathan Stratton Treadway

On Fri, May 15, 2020 at 10:59:49 +0200, Stefan G. Weichinger wrote:
> I am gonna restore as well just to check if there are hidden write
> errors (doesn't look like that to me so far ...)

That reminded me that (at least on our Ubuntu Linux system) the
smartmontools package's "smartctl" let us read error statistics
information from the SCSI tape drive.  I put

   smartctl -l error -H $TAPEDEV

in the cron script which ran Amanda, and it would produce output like
this:

==
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen  
Home page is http://smartmontools.sourceforge.net/  

TapeAlert: OK   

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:      6304        0      6304      6304       6304          0.000           0
==

With that output at the end of the Amanda-running script I could keep an
eye on the numbers for each particular tape as it came through the
rotation cycle.  (A few thousand seemed normal, at least by the time I
implemented this monitoring [since by then the tapes were several years
old]; when tapes were starting to really go bad I saw error counts in
the tens of thousands, etc.)
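
A sketch of how that eyeballing could be automated. I am assuming the
usual smartctl layout of seven whitespace-separated columns after the
"read:"/"write:" label, that "total errors corrected" is the number worth
watching, and a cutoff of 10000 (my own reading of "tens of thousands");
the sample figures in the usage note are invented:

```python
def parse_error_counter_row(line):
    """Parse one 'read:' or 'write:' row of the error counter log."""
    label, _, rest = line.partition(":")
    fields = rest.split()
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got %d" % len(fields))
    return {
        "op": label.strip(),
        "ecc_fast": int(fields[0]),
        "ecc_delayed": int(fields[1]),
        "rereads_rewrites": int(fields[2]),
        "total_corrected": int(fields[3]),
        "algorithm_invocations": int(fields[4]),
        "gigabytes_processed": float(fields[5]),
        "total_uncorrected": int(fields[6]),
    }


def tape_health(row, suspect_threshold=10000):
    """Classify a row; the threshold is an assumption, tune it per site."""
    if row["total_uncorrected"] > 0:
        return "uncorrected errors"
    if row["total_corrected"] >= suspect_threshold:
        return "suspect"
    return "ok"
```

For example, parse_error_counter_row("write: 4200 0 0 4200 4200 0.000 0")
yields a row that tape_health() reports as "ok".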

When the drive detected that the tape heads needed to be cleaned, the
output looked like
==
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen  
Home page is http://smartmontools.sourceforge.net/  

TapeAlert Errors (C=Critical, W=Warning, I=Informational):  
[0x14] C: The tape drive needs cleaning:
  1. If the operation has stopped, eject the tape and clean the drive.  
  2. If the operation has not stopped, wait for it to finish and then   
  clean the drive.  
  Check the tape drive users manual for device specific cleaning instructions.  

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:      3390        1      3392      3389       3390          0.000           1
==
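
A minimal sketch for picking the TapeAlert entries out of that output,
assuming each alert line has the "[0xNN] X: message" shape shown above
(the function names are hypothetical):

```python
import re

# Matches lines such as "[0x14] C: The tape drive needs cleaning:"
TAPEALERT_LINE = re.compile(r"^\[0x([0-9a-fA-F]+)\]\s+([CWI]):\s+(.*)$")


def tape_alerts(smartctl_output):
    """Return (code, severity, message) for each TapeAlert line found."""
    alerts = []
    for line in smartctl_output.splitlines():
        m = TAPEALERT_LINE.match(line.strip())
        if m:
            alerts.append((int(m.group(1), 16), m.group(2), m.group(3)))
    return alerts


def needs_cleaning(smartctl_output):
    """True if any alert message mentions that the drive needs cleaning."""
    return any("needs cleaning" in msg
               for _, _, msg in tape_alerts(smartctl_output))
```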

(You should double-check the behavior if you are going to rely on it,
but as I recall the stats are cleared when you swap a tape and when you
run the above smartctl command.)

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: amanda-3.4.5 does not fill one tape

2020-05-14 Thread Nathan Stratton Treadway

On Thu, May 14, 2020 at 09:14:17 +0200, Stefan G. Weichinger wrote:
> Interesting, how can a "dirty" drive trigger this behavior?
> 
> I'd expect failures all along and not after ~200 or 300 GB written.
> 
> I don't see any interrupted writing or so (until that End Of Tape).

(We switched to disk-drive vtapes a long time ago so when I was last
looking into the details of backup-tape-drive behavior it was probably
for pre-LTO technology, but I would assume that for this discussion LTO
is similar)

For "modern" error-correcting tape drives, when the computer sends data
out to the tape drive to be written to tape, the drive actually then
uses the read head to immediately read back in the data it just wrote. 
If that read fails, the drive will automatically/transparently try the
write again... repeating the process until it is able to achieve a
successful confirmation read of that block of data.

Normally this just happens once in a while, when there's a bad spot on
the tape or some fluke of writing makes the data unreadable, and one
doesn't even notice it's happening.  

However, if the drive head is dirty or the tape media in general is
wearing out, then what happens is that many many many of the data blocks
either will be written badly or will fail to read back in [depending on
what exactly is dirty or failing], and the drive will have to re-write
data multiple times before a successful write/read cycle.

When that happens, then lots of the linear space on the tape is used by
all the repeated writes -- thus making the tape appear to have a lower
capacity than you would expect -- and also all that re-writing means the
data throughput from the server's point of view is much reduced.
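
As a back-of-the-envelope illustration (a toy model of my own, not drive
firmware behavior): suppose each write attempt independently fails its
read-back check with some fixed probability. Then each block needs
1 / (1 - p) attempts on average, and the usable capacity shrinks by the
same factor:

```python
def effective_capacity(native_gb, rewrite_fraction):
    """Toy model of capacity lost to rewrites.

    rewrite_fraction is the assumed probability that a single write
    attempt fails verification and has to be repeated.
    """
    if not 0 <= rewrite_fraction < 1:
        raise ValueError("rewrite_fraction must be in [0, 1)")
    # Expected number of attempts per block (geometric distribution).
    expected_attempts = 1 / (1 - rewrite_fraction)
    return native_gb / expected_attempts
```

Under this model a tape with a nominal 800 GB and half of all attempts
failing would hold about 400 GB, and throughput would drop by the same
factor of two.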

(Note that in this scenario the drive just keeps retrying to write a
block of data until it succeeds... or until it hits the end of the tape. 
So that's why you don't get "interrupted writing" in the sense of having
mid-tape write errors returned by the tape device to the computer.  [But it
is "interrupted" in the sense that a block takes much longer to write
than it should, so the computer has to wait a long time before it can
send the next block of data down to the drive.])

Hope that makes sense.

        Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: smbclient

2020-04-23 Thread Nathan Stratton Treadway

On Thu, Apr 23, 2020 at 10:40:25 +0100, Nuno Dias wrote:
>  The "estimate server" solved the problem, definitely the amsamba don't
> like in the estimate the NT_STATUS_ACCESS_DENIED even when is not in
> the PATH.

So, with "estimate server" the backup was able to complete the full dump
phase successfully?

It would be interesting to know if there are any NT_STATUS errors in the
Amsamba logs for that successful run and also to hear if the
second day's (level 1) backup is actually smaller than the full dump, as
expected.

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: smbclient

2020-04-21 Thread Nathan Stratton Treadway

On Tue, Apr 21, 2020 at 13:06:13 +0100, Nuno Dias wrote:
>  I tried everything, I'm using the Administrator user in windows 10, I
> check and he has the rights to do everything, nevertheless in smbclient
> I have several ERRORS saying "NT_STATUS_ACCESS_DENIED listing" this is
> system files or system directories.
> 
>  The backup I want to do is the users directories, and I can read that
> directories and files, it seems amanda fails because of the previous
> ERRORS.
> 
>  Even if I put something like this //pcwindows/c$/Users/user/files and
> I'm positive that I can read these dir and files, I still have error
> in //pcwindows/c$/Progranas
> NT_STATUS_ACCESS_DENIED listing \Programas\*
> 
>  And this is not in the PATH of the backup :(

Yeah, that does seem weird :(

It may be that the estimate phase inadvertently tries to access files
that are outside the specified directory tree, or something.  (In my
case we are only backing up "data file" shares on the Windows PC, so the
system directories are not included anywhere on the shares in question.)

Windows clients are definitely not my strong point, but if you want you
could go ahead and send me (off-list) the Amsamba.*debug file that
corresponds to the failed estimate (and probably your amanda.conf and
disklist files too) and I can see if I can determine anything by
comparing that with the logs from my working system.

The other thing that just occurred to me is that if it seems like it's
the estimate phase that is failing, you could try adding "estimate
server" to the amsamba dumptype to see if that at least allows Amanda to
proceed to the dumping phase.


Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: smbclient

2020-04-20 Thread Nathan Stratton Treadway

On Mon, Apr 20, 2020 at 18:29:13 +0100, Nuno Dias wrote:
>  So, after countless hours with this issue without success, let me ask
> the community about smbtar.
> 
>  Can I use smbtar with amanda? if yes how? do I need to write a new
> plugin? 
> 

(At least here on my Ubuntu system, smbtar is just a wrapper shell
script around smbclient, so I doubt that adding that level of
indirection into the pipeline Amanda is trying to execute would make
things work better.

However, it does remind me that when I was trying to get this working I
did extract the actual "smbclient" command from the .debug file and run
it manually from the command line in order to experiment.  For example,
the .debug file for the estimate phase shows the command 
  /usr/bin/smbclient  -d 0 -U backup -E -W  -c archive 0;recurse;du

, so then I set the USER and PASSWD environment variables in my shell
session [running on the Amanda server machine], and did experimentation
using
  /usr/bin/smbclient  -d 0 -E -W  -c "archive 0;recurse;du"

... and similarly for the command taken from the sendbackup runs where
the command was something more along the lines of "cd TestDirectory/;
tarmod full reset hidden system quiet; tar c -" etc.

This allowed me to try out permissions changes on the Share volumes in
real time, rather than having to kick off a full Amanda run to see the
results of my changes.)
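
If you would rather script the experiment than retype it, a rough sketch
follows. The share and workgroup arguments are placeholders you must
supply yourself, their position on the command line is my assumption, and
nothing here runs smbclient on its own:

```python
import os


def estimate_command(share, workgroup):
    """Build the estimate-phase smbclient argument list seen in the .debug file."""
    return ["/usr/bin/smbclient", share, "-d", "0", "-E", "-W", workgroup,
            "-c", "archive 0;recurse;du"]


def smbclient_env(user, password):
    """Copy the current environment and add the USER and PASSWD variables."""
    env = dict(os.environ)
    env["USER"] = user
    env["PASSWD"] = password
    return env

# Possible use (not executed here):
#   import subprocess
#   subprocess.run(estimate_command("//pc/share", "WORKGROUP"),
#                  env=smbclient_env("backup", "secret"))
```

Passing the command as a list avoids the shell quoting issue around the
semicolons in the "-c" argument.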

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: smbclient

2020-04-20 Thread Nathan Stratton Treadway

On Sat, Apr 18, 2020 at 11:44:18 +0100, Nuno Dias wrote:
>  Hi,
> 
>  The OS is Fedora 30 with  amanda-3.5.1-19.fc30.x86_64 and samba-
> client-4.10.14-0.fc30.x86_64 (this are the packages from the OS).
> 
>  The error that I have is accessing some files,
> 
> ERROR smbclient: NT_STATUS_ACCESS_DENIED listing
> Windows\\System32\\config\\systemprofile\\AppData\\Local\\Microsoft\\Wi
> ndows\\INetCache\\Content.IE5\\*"
> 
>  Can this be the reason all the estimates fails?

Yes, it could be (though I would expect to see a more explicit failure
message from the estimate phase later on in the log file if that were
the case).

You will certainly need to connect to your Windows client using a
username which has sufficient permissions to read all files, and also to
update the "archive bit" on the backed-up files so that level 1 dumps
work as expected.

I haven't delved into the nuances of NTFS permissions and have only
gone through this on a single windows client machine, but in my
experimentation I found that setting the following permissions using
Server Manager for the particular shares in question produced a working
level-1 backup with no error messages in the log, while also at least
preventing obvious editing of the share by the backup user:
  permissions (for the smbclient user) enabled: 
     "Read & execute",
     "List folder contents",
     "Read"

  Advanced Attributes enabled: 
     "Traverse folder/execute file",
     "List folder/read data",
     "Read attributes",
     "Read extended attributes",
     "Read permissions",
     "Write attributes"

(On this particular client box, the Sharing tab already had "Full
Control" granted to Everyone, so I did not have to fine-tune the
settings there to get Amanda working.)

Hope that is at least somewhat helpful...


Nathan


Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: smbclient

2020-04-17 Thread Nathan Stratton Treadway

On Fri, Apr 17, 2020 at 19:51:47 +0100, Nuno Dias wrote:
>  Hi,
> 
>  I'm trying to use the amsamba plugin, put fails everytime in the
> estimate.

I have a vague recollection of running into an unexpected warning
message from smbclient (when I first set up Amanda 3.5.x with Samba
4.7.x), but it sounds like a different problem from yours.

What error messages do you find in the
.../client//Amsamba.2020*.debug log file?

(Knowing your OS and Amanda versions [and exact version of Samba] might
be helpful, too.)

Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: Zmanda/Amanda: current support list and some new work ....

2020-02-26 Thread Nathan Stratton Treadway

On Wed, Feb 26, 2020 at 17:05:22 +, Chris Hassell wrote:
> Just to keep up-to-date after a long effort  I'm working
> currently on these areas and have finally gotten out of config-mgmt /
> packaging.

I don't see any commits in git://github.com/zmanda/amanda later than
December 2019 or so.  Has the public amanda work been moved to another
repo?

Nathan

--------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: separate spindle for dump use -- Amanda planner

2020-02-12 Thread Nathan Stratton Treadway

On Sun, Feb 09, 2020 at 15:29:04 -0500, Gene Heskett wrote:
> It has perms to use 2 vtapes, so there is nothing left in either of the 
> holding disks, ever. And I am still trying to make it use higher levels, 
> its entirely too often it will advance half the disklist from level 2 to 
> level 0, doing a jump of 9 days!, and that then runs onto the 2d vtape 
> by half a vtape. Doesn't make any sense either when it says going to do 
> a level 3, then in the final report in the same email it actually does a 
> level 0, maybe 20 times out of a 75+ entry disklist. 
> 
> So basically, I am tired of amanda lying to me. It should do what the 

Gene,

When you say Amanda is lying to you, are you referring to the section
of the Amanda mail report that looks like:

  planner: Incremental of  bumped to level 6.
  planner: Incremental of  bumped to level 2.
  planner: Incremental of  bumped to level 2.
  [...]
  planner: Full dump of  promoted from 6 days ahead.
  planner: Full dump of  promoted from 6 days ahead.
  planner: Full dump of  promoted from 6 days ahead.
  [...]

... and in particular the fact that sometimes the same DLE shows up in
both of those lists?

Assuming so: keep in mind that the "bumped to level" messages actually
come from an early pass through the DLE list, during which Amanda simply
decides if the incremental level (for each DLE) should be bumped up,
based on the various bump* amanda.conf dumptype parameters.  (Presumably
the calculations from this step will match the results printed by the
"amadmin ... bumpsize" command.)

At this point Amanda hasn't tried to do any "balancing" yet, so really
these "Incremental of  bumped to level X" messages should be
interpreted as meaning "if I happen to do an incremental dump for ,
that dump will be at level X" (rather than as, say, "I am about to do a
level X dump of ").
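
To see which DLEs land in both lists of a given mail report, a small
sketch like this may help. I am assuming each DLE appears as a single
whitespace-free token (e.g. host:/path) in the two message formats shown
above; the function name is made up:

```python
import re

BUMPED = re.compile(
    r"planner: Incremental of (\S+) bumped to level (\d+)\.")
PROMOTED = re.compile(
    r"planner: Full dump of (\S+) promoted from (\d+) days ahead\.")


def bumped_and_promoted(report_text):
    """Return the DLEs that were both bumped and promoted in one report."""
    bumped = {m.group(1) for m in BUMPED.finditer(report_text)}
    promoted = {m.group(1) for m in PROMOTED.finditer(report_text)}
    return sorted(bumped & promoted)
```

For any DLE this returns, the "bumped to level" line can be ignored for
that run, since the full dump takes precedence over the incremental.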

> planner says its going to and it might eventually hit a balanced 
> schedule, but with something overriding the planner, there is no way in 
> hell it will ever get it done.

The "promoted from N days ahead" lines are the ones from the logic that
attempts to balance the runs.  I think it's fair to say that they are
_from_ the planner, so the issue is not really that something is
"overriding" the planner, but rather that the planner's calculations
aren't working out very well in your context.

Looks like we had some discussion here on the list back in
November 2018 about the unbalanced runs you were seeing, which got as
far as my theorizing that perhaps you had a few DLEs that were so much
larger than all the others that Amanda's normal algorithm for adjusting
the balance is unable to achieve that goal.

I don't know if there is anyone left on this list who understands the
inner workings of those algorithms, but if you really want to get to the
bottom of the problem you'll probably have to spend some time combing
through the balance calculation section of the planner log files.

Or, you can try one or both of these and see if they happen to improve
the situation:

  * break up your largest DLEs (or exclude some of the data from Amanda
backup in order to make them smaller)

  * make your dumpcycle either longer or shorter, so that the expected
average daily size changes.  (Depending on the relative sizes of
your DLEs, you may need to make the daily size either larger or
smaller to get things to work better...)

Anyway, if you decide you want to get to the bottom of your balance
issue, it probably makes sense to go back to the 'following the
"balance" report' thread from 2018 and pick up from there...

> Back at about 2.4.2 I had zero trouble filling to within 50 megs, a 4GB
> DS2 tape.  And that was by telling amanda the tape was 3.5Gigs, and

(I have no idea how much the planner algorithm changed between 2.4 and
3.5, but keep in mind that your DLE list changed a lot since then as
well, so it's not really an apples-to-apples comparison...)

Nathan

Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

Re: separate spindle for dump use

2020-02-11 Thread Nathan Stratton Treadway

On Sun, Feb 09, 2020 at 05:47:46 -0500, Gene Heskett wrote:
> amcheck is again happy, and since a df now shows that now empty disk as 
> haveing 896702760 1k blocks free, it ought to have all the room it 
> needs, and I'm reasoning the original /usr/dumps holding disk can be 
> commented out.
> 
> However, since I'm also backing up 4 other machines whose "spindles" are 
> out on the net and can and do run in-parallel, is the removal of the old 
> default a good idea?  That would, depending on the scheduling, have 5 
> dumpers writing to the same spindle again.

I am not sure we can give a definitive answer to that question...

Off hand I would suspect that the single disk drive is fast enough to
handle all the traffic as it comes in over the network, especially if
it's a decently modern drive attached via direct SATA link and you have
a reasonable amount of RAM available for buffering/caching.  If so, your
total backup-run time would be about the same with just the one drive.

On the other hand, it's certainly true that if you have two drives in
use for holding disk files then the total amount of drive-head movement
is reduced (since each drive doesn't have to move back and forth
between as many different files), compared to having just one drive
(assuming spinning-platter drives, obviously).  So if you are worried
about your sdb drive's lifespan, it makes sense to keep /usr/dumps in
the mix.  (On the other hand, if sdb is only used for the holding disk
while sda is important because it contains the active root filesystem,
you might want to push the extra wear onto sdb by commenting out the
/usr/dumps side.)

But without some specific unusual factor in your situation, I'd guess
that from the drive hardware side of things it probably won't matter
in the end, and you should just go ahead and configure whichever way
seems easiest for you as system administrator.

Nathan

----
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
