Odd non-fatal errors in amdump reports.

2017-11-07 Thread Austin S. Hemmelgarn
Where I work, we recently switched from manually triggered vaulting to 
automatic vaulting using the vault-storage, vault, and dump-selection 
options.  Things appear to be working correctly, but we keep getting 
some odd non-fatal error messages (that might be bogus as well, since 
I've verified the dumps mentioned restore correctly) in the amdump 
e-mails.  I've been trying to figure out these 'errors' for the past
few weeks now, and I'm hoping someone on the list might have some advice
(or better yet, might recognize the symptoms and know how to fix them).

In our configuration, we have three different backup sets (each is on 
its own schedule).  Of these, two are consistently showing the following
error in the amdump e-mail report (I've redacted hostnames and exact paths;
note that the second path listed is a parent directory of the first):

taper: FATAL Header of dumpfile does not match command from driver 0 XXX 
/home/X 20171031074642 -- 0 XXX /home/XX 
20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm 
line 1168

For a given backup set, the particular hostname and paths are always the 
same, but the backup appears to get taped correctly, and restores 
correctly as well.

With the third backup set, we're regularly seeing things like the 
following in the dump summary section, but no other visible error 
messages:

                           DUMPER STATS                   TAPER STATS
HOSTNAME  DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
--------------------------------------------------------------------------
XX        /boot  0       --  FAILED
XX        /boot  1       10      10     --    0:00  168.8    0:00   0.0

In this case, the particular DLEs affected are always the same,
and the first line that claims a failure always shows dump level
zero, even when the backup is supposed to be at another level.
Just like the other error, the affected dumps always restore
correctly when tested, and get correctly vaulted as well.  The
affected DLEs are only on Linux systems, but the problem does not
seem to care what distro or Amanda version is being used (it has
affected Debian, Gentoo, and Fedora systems, covering 5 different
Amanda client versions).  They are invariably small (sub-gigabyte)
filesystems, but I've not found any other commonality among them.

All three sets use essentially the same amanda.conf file (the 
differences are literally just in when they get run), which
I've attached in-line at the end of this e-mail with
sensitive data redacted.  The thing I find particularly odd is
that this config is essentially identical to what I use on my
personal systems, which are not exhibiting either problem.

8<

org  "X"
mailto   "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit

displayunit "k"
netusage  800 Kbps

dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes

bumppercent 20
bumpdays 2

etimeout 900
dtimeout 1800
ctimeout 30

device_output_buffer_size 256M

compress-index no

flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes

runtapes 16

define changer vtl {
tapedev "chg-disk:/net/XX/amanda/X"
changerfile "/etc/amanda/X/changer"
property "num-slot" "128"
property "auto-create-slot" "yes"
}

define changer aws {
tapedev "chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
changerfile "/etc/amanda/X/s3-changer"
device-property "S3_SSL" "YES"
device-property "S3_ACCESS_KEY" ""
device-property "S3_SECRET_KEY" ""
device-property "S3_MULTI_PART_UPLOAD" "YES"
device-property "CREATE_BUCKET" "NO"
device-property "S3_BUCKET_LOCATION" "X"
device-property "STORAGE_API" "AWS4"
}

define storage local-vtl {
tpchanger "vtl"
tapepool "$r"
tapetype "V64G"
labelstr "^-[0-9][0-9]*$"
autolabel "-%%%" any
erase-on-full YES
erase-on-failure YES
vault cloud 0
}

define storage cloud {
tpchanger "aws"
tapepool "$r"
tapetype "S3TAPE"
labelstr "^Vault--[0-9][0-9]*$"
autolabel "Vault--%%%" any
erase-on-full YES
erase-on-failure YES
 

Re: Odd non-fatal errors in amdump reports.

2017-11-07 Thread Jean-Louis Martineau

Austin,

It's hard to say anything with only the error message.

Can you post the amdump. and log..0 files for the 2 
backup sets that fail?


The tapedev of the aws changer can be written like:

   tapedev "chg-multi:s3:/slot-{0..127}"
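
As a rough illustration of why the two spellings are interchangeable, the sketch below expands a single brace group in either form and compares the results. It is written in Python purely for demonstration; `expand_braces` is a hypothetical helper and is not Amanda's own parser, whose exact rules may differ.

```python
import re

def expand_braces(spec):
    """Expand the first {...} group in spec, accepting A..B or a,b,c forms."""
    group = re.search(r"\{([^{}]*)\}", spec)
    if group is None:
        return [spec]
    body = group.group(1)
    numeric_range = re.fullmatch(r"(\d+)\.\.(\d+)", body)
    if numeric_range:
        first, last = int(numeric_range.group(1)), int(numeric_range.group(2))
        items = [str(n) for n in range(first, last + 1)]
    else:
        items = body.split(",")
    # Re-attach whatever surrounded the brace group to every item.
    return [spec[:group.start()] + item + spec[group.end():] for item in items]

range_form = expand_braces("s3:/slot-{0..127}")
comma_form = expand_braces("s3:/slot-{" + ",".join(str(n) for n in range(128)) + "}")
print(len(range_form), range_form[0], range_form[-1])
print(range_form == comma_form)
```

Both forms name the same 128 slots, which is the only point the sketch is meant to make.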


Jean-Louis


Re: Odd non-fatal errors in amdump reports.

2017-11-07 Thread Austin S. Hemmelgarn

On 2017-11-07 10:22, Jean-Louis Martineau wrote:

 > Austin,
 >
 > It's hard to say something with only the error message.
 >
 > Can you post the amdump. and log..0 for the 2
 > backup set that fail.

Yes, though it may take me a while since our policy is pretty strict 
about scrubbing hostnames and usernames from any internal files we make 
visible publicly.


Just to clarify, it will end up being 3 total pairs of files, two from 
backup sets that show the first issue I mentioned (the complaint about a 
header mismatch), and one from the backup set showing the second issue I 
mentioned (the apparently bogus dump failures listed in the dump summary).


 > The tapedev of the aws changer can be written like:
 >
 > tapedev "chg-multi:s3:/slot-{0..127}"

Thanks, I hadn't known that the configuration file syntax supported 
sequences like this; that makes it look so much nicer!




Re: Odd non-fatal errors in amdump reports.

2017-11-08 Thread Jean-Louis Martineau

On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:

 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the sets
 > has over 100 DLEs, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody wants to try and wade
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that claims
 > failures in the dump summary.



I looked at set3: the errors in the 'DUMP SUMMARY' are related to the 
errors in the 'FAILURE DUMP SUMMARY':


  client2 /boot lev 0  FLUSH [File 0 not found]
  client3 /boot lev 0  FLUSH [File 0 not found]
  client7 /boot lev 0  FLUSH [File 0 not found]
  client8 /boot lev 0  FLUSH [File 0 not found]
  client0 /boot lev 0  FLUSH [File 0 not found]
  client9 /boot lev 0  FLUSH [File 0 not found]
  client9 /srv lev 0  FLUSH [File 0 not found]
  client9 /var lev 0  FLUSH [File 0 not found]
  server0 /boot lev 0  FLUSH [File 0 not found]
  client10 /boot lev 0  FLUSH [File 0 not found]
  client11 /boot lev 0  FLUSH [File 0 not found]
  client12 /boot lev 0  FLUSH [File 0 not found]

These are VAULT attempts, not FLUSH.  Looking only at the first entry, it 
tries to vault 'client2 /boot 0 20171024084159', which it expects to find on 
tape Server-01.  It is an older dump.


Is Server-01 still there?  Does it still contain the dump?
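
For anyone wanting to pull the affected DLEs out of a report like the one above, here is a minimal sketch in Python. It assumes the failure lines keep the `host disk lev N TAG [message]` shape shown in this thread; `parse_failures` is a hypothetical helper, not part of Amanda.

```python
import re

# One failure line, e.g. "client2 /boot lev 0  FLUSH [File 0 not found]"
LINE_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s+lev\s+(\d+)\s+(\w+)\s+\[(.*)\]\s*$")

def parse_failures(report_text):
    """Return (host, disk, level, tag, message) tuples for matching lines."""
    failures = []
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            host, disk, level, tag, message = match.groups()
            failures.append((host, disk, int(level), tag, message))
    return failures

sample = """\
  client2 /boot lev 0  FLUSH [File 0 not found]
  client9 /srv lev 0  FLUSH [File 0 not found]
"""
failures = parse_failures(sample)
# Keep only the host/disk pairs whose source file could not be found.
missing = [(host, disk) for host, disk, _, _, msg in failures
           if msg == "File 0 not found"]
print(missing)
```

Each resulting host/disk pair could then be compared against what `amadmin CONFIG find` reports for that DLE, to see which tape the catalog believes holds the dump.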

Jean-Louis
This message is the property of CARBONITE, INC. and may contain confidential or 
privileged information.
If this message has been delivered to you by mistake, then do not copy or 
deliver this message to anyone.  Instead, destroy it and notify me by reply 
e-mail


Re: Odd non-fatal errors in amdump reports.

2017-11-08 Thread Austin S. Hemmelgarn

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

 > Is Server-01 still there?  Does it still contain the dump?

Hmm, looks like that's a leftover from changing our labeling format 
shortly after switching to this new configuration.  I thought I purged 
all the stuff with the old label scheme, but I guess not.


It somewhat surprises me that this doesn't give any kind of error 
indication in the e-mail report beyond the 'FAILED' line in the dump 
summary.


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

 > Is Server-01 still there?  Does it still contain the dump?

OK, I've done some further investigation by tweaking the labeling a bit 
(which actually fixed a purely cosmetic issue we were having), but I'm 
still seeing the same problem that prompted this thread.  I can 
confirm that the dumps are where Amanda is trying to look for them; it's 
just not seeing them for some reason.  I hadn't thought of this before, 
but could it have something to do with the virtual tape library being 
auto-mounted over NFS on the backup server?


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Jean-Louis Martineau


Austin,

Can you try to see if amfetchdump can restore it?

 * amfetchdump CONFIG client2 /boot 20171024084159

Jean-Louis


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-10 08:27, Jean-Louis Martineau wrote:

 > Austin,
 >
 > Can you try to see if amfetchdump can restore it?
 >
 >   * amfetchdump CONFIG client2 /boot 20171024084159

At the moment, I'm re-testing things after tweaking some NFS parameters 
for the virtual tape library (apparently the FreeNAS server that's 
actually storing the data didn't have NFSv4 turned on, so it was mounted 
with NFSv3, which we've had issues with before on our network), so I 
can't exactly check immediately, but assuming the problem repeats, I'll 
do that first thing once the test dump is done.
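
One quick way to confirm which NFS version a mount actually negotiated is to read the `vers=` option from the mount table. The sketch below is Python and assumes Linux `/proc/mounts`-style lines; the server name and paths in the sample are made up for illustration, and `nfs_mounts` is a hypothetical helper.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, fstype, vers) for NFS entries in mount-table text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        vers = next((opt.split("=", 1)[1] for opt in options
                     if opt.startswith("vers=")), None)
        result.append((fields[1], fields[2], vers))
    return result

# Hypothetical sample; on a real Linux host, read /proc/mounts instead.
sample = (
    "freenas:/mnt/tank/amanda /net/freenas/amanda nfs rw,relatime,vers=3,proto=tcp 0 0\n"
    "/dev/sda1 /boot ext4 rw,relatime 0 0\n"
)
print(nfs_mounts(sample))
```

On the backup server itself, passing the contents of `/proc/mounts` to the function would show whether the virtual tape library is still coming up as version 3 after the change.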


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Jean-Louis Martineau

Austin,

Can you try the attached patch?  I think it could fix the set1 and set2 
errors.


Jean-Louis

diff --git a/perl/Amanda/Recovery/Planner.pm b/perl/Amanda/Recovery/Planner.pm
index 7bf09c7..ecb8cc2 100644
--- a/perl/Amanda/Recovery/Planner.pm
+++ b/perl/Amanda/Recovery/Planner.pm
@@ -235,16 +235,24 @@ sub make_plan {
 my $self = shift;
 my %params = @_;
 
-for my $rq_param (qw(plan_cb dumpspecs)) {
+for my $rq_param (qw(plan_cb )) {
 	croak "required parameter '$rq_param' missing"
 	unless exists $params{$rq_param};
 }
 my $status = $params{'status'};
 my $dumpspecs = $params{'dumpspecs'};
+my $hostname = $params{'hostname'};
+my $diskname = $params{'diskname'};
+my $dump_timestamp = $params{'dump_timestamp'};
+my $level = $params{'level'};
 my $src_labelstr = $params{'src_labelstr'};
 
 # first, get the set of dumps that match these dumpspecs
 my @dumps = Amanda::DB::Catalog::get_dumps(dumpspecs => $dumpspecs,
+	   hostname => $hostname,
+	   diskname => $diskname,
+	   dump_timestamp => $dump_timestamp,
+	   level => $level,
 	   status => $status,
 	   labelstr  => $src_labelstr);
 
diff --git a/perl/Amanda/Taper/Worker.pm b/perl/Amanda/Taper/Worker.pm
index 7f205be..c501a20 100644
--- a/perl/Amanda/Taper/Worker.pm
+++ b/perl/Amanda/Taper/Worker.pm
@@ -944,7 +944,10 @@ sub setup_and_start_dump {
 	undef);
 	my @storage_list = ( $self->{'src_storage'} );
 	Amanda::Recovery::Planner::make_plan(
-			dumpspecs => \@dumpspecs,
+			hostname => $self->{'hostname'},
+			diskname => $self->{'diskname'},
+			dump_timestamp => $self->{'datestamp'},
+			level => $self->{'level'},
 			changer => $chg,
 			storage_list => \@storage_list,
 			only_in_storage => 1,
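
One possible reading of the patch above, offered only as an interpretation since the thread does not spell it out: the taper previously asked the planner for dumps matching a dumpspec pattern, and now asks for one exact hostname, diskname, timestamp and level. The Python sketch below shows how a loose disk match can return a DLE nested under another one, while an exact match cannot. The catalog entries are invented, and the prefix rule is a stand-in, not Amanda's real dumpspec matching.

```python
# Hypothetical catalog entries: (hostname, diskname, dump_timestamp, level)
catalog = [
    ("hostA", "/home/projects", "20171031074642", 0),
    ("hostA", "/home/projects/archive", "20171031074642", 0),
]

def find_by_prefix(entries, host, disk):
    """Loose lookup: the disk itself plus any path beneath it is accepted."""
    below = disk.rstrip("/") + "/"
    return [e for e in entries
            if e[0] == host and (e[1] == disk or e[1].startswith(below))]

def find_exact(entries, host, disk, timestamp, level):
    """Strict lookup: every field must be identical."""
    return [e for e in entries if e == (host, disk, timestamp, level)]

loose = find_by_prefix(catalog, "hostA", "/home/projects")
exact = find_exact(catalog, "hostA", "/home/projects", "20171031074642", 0)
print(len(loose), len(exact))
```

With the loose lookup, the parent DLE's request also picks up the child DLE's dump, which would be consistent with a header naming one path and a driver command naming its parent directory.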


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-10 10:00, Jean-Louis Martineau wrote:

 > Austin,
 >
 > Can you try the attached patch, I think it could fix the set1 and set2
 > errors.

Yes, but I won't be able to log in this weekend to revert it if it 
doesn't work, so I won't be able to test it until Monday.


Am I correct in assuming that it only needs to be applied on the server 
and not the clients?




Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Jean-Louis Martineau

On 10/11/17 10:10 AM, Austin S. Hemmelgarn wrote:

 > Am I correct in assuming that it only needs to be applied on the
 > server and not the clients?

Yes, only on the server

Jean-Louis





Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-10 08:27, Jean-Louis Martineau wrote:

Austin,

Can you try to see if amfetchdump can restore it?

  * amfetchdump CONFIG client2 /boot 20171024084159

amfetchdump doesn't see it, and neither does amrecover, but the files 
for the given parts are definitely there (I know for a fact that the 
dump in question has exactly one part, and the file for that does exist 
on the virtual tape mentioned in the log file).


I'm probably not going to be able to check more on this today, but I'll 
likely be checking if amrestore and amadmin find can see them.
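
For anyone wanting to tabulate which DLEs are affected, the FAILURE DUMP SUMMARY entries quoted above all share one shape: host, disk, `lev N`, an operation word, and a bracketed reason. A minimal Python sketch that groups them per host follows. The regular expression and the `group_failures` name are my own illustration inferred from those sample lines, not anything Amanda provides, and lines of any other shape are simply skipped.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample lines in this thread (an assumption,
# not an official Amanda format): HOST DISK lev N OPERATION [reason]
FAILURE_RE = re.compile(
    r"^\s*(?P<host>\S+)\s+(?P<disk>\S+)\s+lev\s+(?P<level>\d+)\s+"
    r"(?P<op>\S+)\s+\[(?P<reason>[^\]]*)\]\s*$"
)


def group_failures(report_text):
    """Return {host: [(disk, level, reason), ...]} for matching lines."""
    by_host = defaultdict(list)
    for line in report_text.splitlines():
        match = FAILURE_RE.match(line)
        if match:  # lines that do not look like failure entries are ignored
            by_host[match.group("host")].append(
                (match.group("disk"), int(match.group("level")), match.group("reason"))
            )
    return dict(by_host)


# A few of the entries from the report, copied verbatim.
SAMPLE = """\
client2 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
"""

if __name__ == "__main__":
    for host, entries in sorted(group_failures(SAMPLE).items()):
        print(host, [disk for disk, _level, _reason in entries])
```

Grouping by host makes it easier to see whether the failures cluster on particular clients or on particular disks such as /boot.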


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Jean-Louis Martineau

The previous patch broke something.
Try this new set2-r2.diff patch

Jean-Louis

diff --git a/perl/Amanda/DB/Catalog.pm b/perl/Amanda/DB/Catalog.pm
index 56f7d70..44d2242 100644
--- a/perl/Amanda/DB/Catalog.pm
+++ b/perl/Amanda/DB/Catalog.pm
@@ -468,7 +468,7 @@ sub get_latest_write_timestamp {
 
 if (@timestamps) {
 	# if we're not looking for a particular type, then this is easy
-	if (!exists $params{'types'}) {
+	if (!defined $params{'types'}) {
 	return $timestamps[-1];
 	}
 
@@ -524,20 +524,20 @@ sub get_parts_and_dumps {
 
 # pre-process params by appending all of the "singular" parameters to the "plurals"
 push @{$params{'write_timestamps'}}, map { zeropad($_) } $params{'write_timestamp'} 
-	if exists($params{'write_timestamp'});
+	if defined($params{'write_timestamp'});
 push @{$params{'dump_timestamps'}}, map { zeropad($_) } $params{'dump_timestamp'} 
-	if exists($params{'dump_timestamp'});
+	if defined($params{'dump_timestamp'});
 push @{$params{'hostnames'}}, $params{'hostname'} 
-	if exists($params{'hostname'});
+	if defined($params{'hostname'});
 push @{$params{'disknames'}}, $params{'diskname'} 
-	if exists($params{'diskname'});
+	if defined($params{'diskname'});
 push @{$params{'levels'}}, $params{'level'} 
-	if exists($params{'level'});
+	if defined($params{'level'});
 push @{$params{'storages'}}, $params{'storage'}
 	if defined($params{'storage'});
 if ($get_what eq 'parts') {
 	push @{$params{'labels'}}, $params{'label'}
-	if exists($params{'label'});
+	if defined($params{'label'});
 } else {
 	delete $params{'labels'};
 }
@@ -562,7 +562,7 @@ sub get_parts_and_dumps {
 my @logfiles;
 if ($params{'holding'}) {
 	@logfiles = ( 'holding', );
-} elsif (exists($params{'write_timestamps'})) {
+} elsif (defined($params{'write_timestamps'})) {
 	# if we have specific write_timestamps, the job is pretty easy.
 	my %timestamps_hash = map { ($_, und

Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-10 08:45, Austin S. Hemmelgarn wrote:

On 2017-11-10 08:27, Jean-Louis Martineau wrote:

 >> Austin,
 >>
 >> Can you try to see if amfetchdump can restore it?
 >>
 >>   * amfetchdump CONFIG client2 /boot 20171024084159
 >
 > At the moment, I'm re-testing things after tweaking some NFS parameters
 > for the virtual tape library (apparently the FreeNAS server that's
 > actually storing the data didn't have NFSv4 turned on, so it was mounted
 > with NFSv3, which we've had issues with before on our network), so I
 > can't exactly check immediately, but assuming the problem repeats, I'll
 > do that first thing once the test dump is done.


It looks like the combination of fixing the incorrect labeling in the 
config and switching to NFSv4 fixed this particular case.


Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-10 12:52, Jean-Louis Martineau wrote:

 > The previous patch broke something.
 > Try this new set2-r2.diff patch

Given that the switch to NFSv4 combined with a change to the labeling
scheme fixed the other issue, I'm going to re-test these two sets with 
the same changes before I test the patch just so I've got something 
current to compare against.  I should have results from that later 
today, and will likely be testing this patch tomorrow if things aren't 
resolved by the other changes (and based on what you've said and what 
I've seen, I don't think the switch to NFSv4 or the labeling change will 
fix this one).




Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-10 12:52, Jean-Louis Martineau wrote:

 > The previous patch broke something.
 > Try this new set2-r2.diff patch


Unfortunately, that doesn't appear to have fixed it, though the errors 
look different now.  I'll try and get the log scrubbed by the end of the 
day and post it here.




Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Jean-Louis Martineau

On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:

 > driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0
 >
 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found"


Does that dump still exist on tape Home-0001? Find it with amfetchdump.

If yes, send me the taper debug file.


Jean-Louis
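
To pull such failures out of a log in bulk, the FAIL line quoted above can be split with shell-style quoting rules. The sketch below is only an illustration: the field names are my reading of that one sample line (storage and pool tags, host, disk, timestamp, level, then a quoted message), not a documented Amanda log format, and `parse_fail_line` is a hypothetical helper.

```python
import shlex


def _strip_tag(token, tag):
    """Drop a leading tag such as 'ST:' if present."""
    return token[len(tag):] if token.startswith(tag) else token


def parse_fail_line(line):
    """Split one 'FAIL <program> ...' log line into named fields.

    The field order is assumed from the single sample line in this thread.
    """
    tokens = shlex.split(line)  # honours the double-quoted fields
    if len(tokens) < 10 or tokens[0] != "FAIL":
        raise ValueError("not a FAIL line of the expected shape: %r" % line)
    return {
        "program": tokens[1],
        "storage": _strip_tag(tokens[2], "ST:"),
        "pool": _strip_tag(tokens[3], "POOL:"),
        "host": tokens[4],
        "disk": tokens[5],
        "timestamp": tokens[6],
        "level": int(tokens[7]),
        "kind": tokens[8],
        "message": " ".join(tokens[9:]),
    }


# The FAIL line from the log excerpt, joined onto one logical line.
SAMPLE = ('FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D '
          '20171113073255 0 error "File 0 not found"')

if __name__ == "__main__":
    record = parse_fail_line(SAMPLE)
    print(record["host"], record["disk"], record["message"])
```

The host, disk, and timestamp fields are the same three values that the amfetchdump invocation earlier in this thread takes after the config name, so they are the natural starting point for checking whether a dump can still be found.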


Re: Odd non-fatal errors in amdump reports.

2017-11-14 Thread Austin S. Hemmelgarn

On 2017-11-13 16:42, Jean-Louis Martineau wrote:

 > Does that dump still exist on tape Home-0001? Find it with amfetchdump.
 >
 > If yes, send me the taper debug file.

amfetchdump does not see it, but looking directly at the virtual tape
directories, I can see it there.




Re: Odd non-fatal errors in amdump reports.

2017-11-14 Thread Austin S. Hemmelgarn

On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:

 > amfetchdump does not see it, but looking directly at the virtual tape
 > directories, I can see it there.


Just tried an amcheckdump on everything, it looks like some of the dump 
files are corrupted, but I can't for the life of me figure out why (I 
test our network regularly and it has no problems, and any problems with 
a particular system should show up as more than just corrupted tar 
files).  I'm going to try disabling compression and see if that helps at 
all, as that's the only processing other than the default that we're 
doing on the dumps (long term, it's not really a viable option, but if 
it fixes things at least we know what's broken).


Re: Odd non-fatal errors in amdump reports.

2017-11-14 Thread Austin S. Hemmelgarn

On 2017-11-14 07:43, Austin S. Hemmelgarn wrote:

 > Just tried an amcheckdump on everything, it looks like some of the dump
 > files are corrupted, but I can't for the life of me figure out why (I
 > test our network regularly and it has no problems, and any problems with
 > a particular system should show up as more than just corrupted tar
 > files).  I'm going to try disabling compression and see if that helps at
 > all, as that's the only processing other than the default that we're
 > doing on the dumps (long term, it's not really a viable option, but if
 > it fixes things at least we know what's broken).

No luck changing compression.  I would suspect some issue with NFS, but
I've started seeing the same symptoms on my laptop as well now (which is 
completely unrelated to any of the sets at work other than having an 
almost identical configuration other than paths and the total number of 
tapes).


Re: Odd non-fatal errors in amdump reports.

2017-11-16 Thread Austin S. Hemmelgarn

On 2017-11-14 14:37, Austin S. Hemmelgarn wrote:

 > No luck changing compression.  I would suspect some issue with NFS, but
 > I've started seeing the same symptoms on my laptop as well now (which is
 > completely unrelated to any of the sets at work other than having an
 > almost identical configuration other than paths and the total number of
 > tapes).


So, I finally got things working by switching from:

storage "local-vtl"
vault-storage "cloud"

To:

storage "local-vtl" "cloud"

And removing the "vault" option from the local-vtl storage definition. 
Strictly speaking, this is working around the issue instead of fixing 
it, but it fits within what we need for our usage, and actually makes 
the amdump runs complete faster (since dumps get taped to S3 in parallel 
with getting taped to the local vtapes).


Based on this, and on the corrupted dumps that amcheckdump was 
reporting, I think the issue is probably an interaction between the 
vaulting code and the regular taping code, but I'm not certain.


Thanks for the help.