Re: Only increasing incrementals
On 2018-12-12 19:11, Debra S Baddorf wrote:
> Oh, that’s right — Chris DID tell us what he was trying to do, last week: (in case it helps with any further answers)

OK, given that what you want appears to be frequent snapshots, Amanda is almost certainly _not_ the correct tool for this job, for three reasons:

* It cannot take atomic snapshots of the filesystem state, so either you need to freeze all write I/O to the DLE, or you won't have a coherent copy of the state of the DLE from when you ran the backup. This doesn't matter most of the time for regular backup usage, because you just run Amanda during off-hours when nobody's doing anything and the system is idle, but for this it's going to be a potential issue.

* It takes a _lot_ of system resources to do a backup with Amanda. This is mitigated by your proposed approach of constantly increasing incremental levels, but even then Amanda has to call `stat()` on _everything_ in the DLE.

* It's not trivial (as you have found out) to get this type of thing to work reliably.

I would suggest looking at the following alternative approaches:

* Use ZFS, BTRFS, or another filesystem that supports native snapshotting, and just take snapshots regularly. This is likely to be your best approach. In some cases, depending on the platform and filesystem, you may not even need to do anything (for example, NILFS2 on Linux has implicit snapshots built in because it's a log-structured filesystem).

* Store all the data on a NAS device that can do snapshots (for example, something running FreeNAS), and have it take regular snapshots. This largely reduces to the above, just indirected over the network.

* Use a filesystem that supports automatic native file versioning. The classic example is Files-11 from OpenVMS. Other options include GitFS [1], CopyFS [2], and Plan 9's Fossil filesystem.

* Store all the data on a NAS device that does automatic native file versioning.
* If all else fails, you can technically do this with Amanda by using `amadmin force` to force the level 0 dump, and `amadmin force-bump` for each backup _after the second_ (the first backup after a level 0 will always be a level 1) to get the increasing incrementals.

[1] https://www.presslabs.com/code/gitfs/
[2] https://boklm.eu/copyfs/

On Dec 7, 2018, at 12:04 PM, Chris Miller wrote:
> Hi Folks,
>
> I'm about to start a project during which I want to be able to:
>
> • Request a backup at any moment and have that backup be either an incremental backup (Level N+1), meaning everything that has changed since the last backup (Level N), or a differential backup, meaning everything that has changed since the last full backup (Level 1). The second provision, "differential backup", is pretty straightforward, but I have no idea how to configure a constantly increasing dump level.
>
> • The first backup of the day, meaning the first backup after midnight, will be a full filesystem backup.
>
> Discussion on point 1: The provision is for capturing changes that occur during a given period of time, and not so much for "backup" per se, so AMANDA may not be the best tool, but it is what I have, so I'm trying to make it fit. I know how to request a backup, so that's not my problem, but I don't know how to force a given level. In particular, I don't know how to force a Level N+1 backup. I could replace the Level N+1 requirement with a forced Level 1, run my experiment, and force a Level 2, and this would meet my requirement of capturing all the changes during a particular interval. But, again, that requires forcing AMANDA to take direction about backup levels, and I don't know how to do that.
> Before anybody reminds me that this is why god invented git, I would like to add that the scope of git is typically only known parts of the project, and I want to capture log files and other files that are sometimes created in temporary locations with temporary names, which are not known a priori and therefore can't be "managed" with git.
>
> Discussion on point 2: The "first backup of the day" will run as a cron job, but it must be a level 0, full filesystem backup so no work for the day is lost. This is more forcing AMANDA to take direction, and I don't know exactly how to do it. I don't think I like the idea of forcing AMANDA (#MeToo) to do things, but I'm not above payment in kind. (-:
>
> Thanks for the help,
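To sketch what the `amadmin force`/`force-bump` dance from above looks like in practice (the configuration name "snap" and the DLE "client /data" are hypothetical):

```shell
amadmin snap force client /data       # next run will be a level 0
amdump snap                           # runs the level 0
amdump snap                           # first run after a full is always a level 1
amadmin snap force-bump client /data  # request a bump before each later run
amdump snap                           # level 2
amadmin snap force-bump client /data
amdump snap                           # level 3, and so on
```

The `force-bump` has to be repeated before every run past the second, which is part of why this is the "if all else fails" option.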
Re: Only increasing incrementals
On 2018-12-12 02:18, Olivier wrote:
> Nathan Stratton Treadway writes:
>> On Thu, Nov 22, 2018 at 11:18:25 +0700, Olivier wrote:
>>> Hello,
>>>
>>> I am wondering if there is a way to define a DLE that would allow incrementals, but only with increasing levels:
>>> - full (0)
>>> - incremental 1
>>> - incremental 2
>>> - incremental 3
>>> - etc.
>>>
>>> But never: 0, 1, 1, 1, 2. Each backup level must be above the previous one, or be a full backup.
>>
>> I am not sure what you are trying to accomplish,
>
> I am trying to back up something that can only have incrementals with increasing levels: it cannot do two level 1s in a row; levels must be 1, then 2, then 3, etc. (think some successive snapshots).
>
> According to the amanda.conf(5) man page:
>
>     bumpdays int
>         Default: 2 days. To insure redundancy in the dumps, Amanda keeps
>         filesystems at the same incremental level for at least bumpdays
>         days, even if the other bump threshold criteria are met.
>
> I want to absolutely cancel that feature: each incremental must have a level greater than the previous dump, and an incremental level can not be bumped (only level 0 can be bumped).

OK, I'm actually curious what your exact reasoning for requiring this is, because I'm seeing exactly zero circumstances where this makes sense at all, and can think of multiple ways it's a bad thing (for example, losing your level 1 incremental makes all of your backups for that cycle useless).
Re: Dumping and taping in parallel
On 2018-11-28 13:58, Chris Nighswonger wrote:
> On Wed, Nov 28, 2018 at 11:17 AM Austin S. Hemmelgarn wrote:
>> Based on your configuration, your tapes are configured to store just short of 800GB of data. The relevant lines then are these two:
>>
>>     flush-threshold-scheduled 50
>>     flush-threshold-dumped 50
>
> I misunderstood the man pages there and for some reason thought that volume referred to the holding disk. Probably because I was reading way too fast.
>
>> In your case, I'd suggest figuring out the average amount of data you dump each run, and then configuring things to start flushing when about half that much data has been dumped. That will still have the taping run in parallel with the dumping, but will give you enough of a buffer that the taper should never have to wait for dumps to finish.
>
> So over the last 13 runs, the:
> -- smallest volume size has been 152G (19% tape capacity)
> -- average volume size has been 254G (32% tape capacity)
> -- largest volume size has been 612G (76% tape capacity)
>
> So do the following values look "reasonable" based on those numbers:
>
>     flush-threshold-scheduled 25
>     flush-threshold-dumped 0
>
> That should target the larger sizes, which are the ones which tend to lap into the next business day.

Probably. The extent of the experimentation I've done with these is determining for certain that I got no performance benefit from not just taping backups as they finished (all of my setups use vtapes on fast storage, so there's no benefit to me not just taping dumps as they're done).
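Working from those numbers, the "start flushing at about half an average run" rule of thumb can be sanity-checked with a little shell arithmetic (the capacity and average are the round figures from this thread):

```shell
#!/bin/sh
# Round figures from the thread, in GB.
capacity_gb=800   # usable tape capacity (just short of 800GB)
average_gb=254    # average dump volume over the last 13 runs

# Half the average run, as a whole-number percentage of tape capacity;
# a starting point for the flush-threshold settings.
threshold=$(( average_gb * 100 / 2 / capacity_gb ))
echo "$threshold"
```

which prints 15, so the proposed flush-threshold-scheduled of 25 is in the same ballpark once you skew it toward the larger runs that lap into the next day.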
Re: Dumping and taping in parallel
On 2018-11-28 10:58, Chris Nighswonger wrote:
> On Wed, Nov 28, 2018 at 10:49 AM Austin S. Hemmelgarn wrote:
>> On 2018-11-28 09:53, Stefan G. Weichinger wrote:
>>> Am 28.11.18 um 15:47 schrieb Chris Nighswonger:
>>>> So why won't amanda dump and tape at the same time?
>>>
>>> It does normally, that is what the holding disk is for.
>>
>> Really? I was under the impression that it was for making sure you can finish dumps if something goes wrong with taping, and to cache dumps so they can be written to tape in one pass. Without a holding disk, Amanda dumps straight to tape, which is technically dumping and taping in parallel.
>>
>>> More details might lead to better suggestions. Show your amanda.conf etc
>>
>> Indeed, though I suspect it's something regarding the flushing configuration.
>
>     inparallel 10
>     maxdumps 1
>     netusage 1073741824  # (equivalent to one Tbit/s)
>     dumporder "STSTStstst"
>     dumpcycle 5 days
>     runspercycle 5
>     tapecycle 13 tapes
>     runtapes 1
>     flush-threshold-scheduled 50
>     flush-threshold-dumped 50
>     bumpsize 10 Mbytes
>     bumppercent 0
>     bumpmult 1.5
>     bumpdays 2
>     ctimeout 60
>     dtimeout 1800
>     etimeout 300
>     dumpuser "backup"
>     tapedev "Quantum-Superloader3-LTO-V4"
>     autolabel "$c-$b" EMPTY
>     labelstr "campus-.*"
>     tapetype LTO-4
>     logdir "/var/backups/campus/log"
>     infofile "/var/backups/campus/curinfo"
>     indexdir "/var/backups/campus/index"
>     tapelist "/var/backups/campus/tapelist"
>     autoflush all
>
>     holdingdisk hd1 {
>         comment "Local striped raid array"
>         directory "/storage/campus"
>         use 0 Gb
>         chunksize 1 Gb
>     }
>
>     define changer Quantum-Superloader3-LTO-V4 {
>         tapedev "chg-robot:/dev/sg3"
>         property "use-slots" "1-13"
>         property "tape-device" "0=tape:/dev/nst0"
>         device-property "LEOM" "TRUE"
>     }
>
>     define tapetype LTO-4 {
>         comment "Created by amtapetype; compression enabled"
>         length 794405408 kbytes
>         filemark 1385 kbytes
>         speed 77291 kps
>         blocksize 512 kbytes
>     }

Based on your configuration, your tapes are configured to store just short of 800GB of data.
The relevant lines then are these two:

    flush-threshold-scheduled 50
    flush-threshold-dumped 50

The first one tells Amanda not to try flushing anything early unless you're using at least half a tape based on dump size estimates, and the second one says that at least half a tape's worth of data must already be dumped before flushing will start. Together, this means Amanda won't flush anything to tape until all dumps are done, unless you're dumping more than half a tape's worth of data each run.

If you set those both to zero, Amanda will start flushing dumps to tape as they finish. Doing so has two disadvantages for you, because you're using real tapes and not vtapes:

* You can't have Amanda intelligently pack the dumps onto the tape. This probably doesn't matter, as you appear to have things configured so that each run only uses one tape, and you haven't explicitly defined a `tapealgo` (the default `tapealgo` is a simple dumb FIFO queue, so it behaves the same as immediately flushing dumps as they finish).

* You run the risk of having to stop and restart the tape drive multiple times while writing dumps. Put simply, by flushing at the end like things are currently, you can guarantee 100% utilization of the tape drive while flushing dumps. If you flush them as they're done, the taper will almost certainly have to wait for some dumps to finish after it initially starts writing data.

In your case, I'd suggest figuring out the average amount of data you dump each run, and then configuring things to start flushing when about half that much data has been dumped. That will still have the taping run in parallel with the dumping, but will give you enough of a buffer that the taper should never have to wait for dumps to finish.
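As a rough sketch of that suggestion: with ~800GB tapes and, say, a 250GB average run, half the average works out to roughly 15% of a tape, so the amanda.conf change would look something like this (the percentages here are illustrative assumptions, not measured values):

```
flush-threshold-scheduled 15   # don't start the taper early unless ~15% of a tape is scheduled
flush-threshold-dumped 15      # ...and that much has actually landed on the holding disk
```

Note that `flush-threshold-scheduled` must be at least as large as `flush-threshold-dumped`, so setting them equal is the simplest consistent choice.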
Re: Dumping and taping in parallel
On 2018-11-28 09:53, Stefan G. Weichinger wrote:
> Am 28.11.18 um 15:47 schrieb Chris Nighswonger:
>> So why won't amanda dump and tape at the same time?
>
> It does normally, that is what the holding disk is for.

Really? I was under the impression that it was for making sure you can finish dumps if something goes wrong with taping, and to cache dumps so they can be written to tape in one pass. Without a holding disk, Amanda dumps straight to tape, which is technically dumping and taping in parallel.

> More details might lead to better suggestions. Show your amanda.conf etc

Indeed, though I suspect it's something regarding the flushing configuration.
Re: Another dumper question
On 2018-11-26 15:13, Chris Nighswonger wrote:
> On Mon, Nov 26, 2018 at 2:32 PM Nathan Stratton Treadway wrote:
>> On Mon, Nov 26, 2018 at 13:56:52 -0500, Austin S. Hemmelgarn wrote:
>>> On 2018-11-26 13:34, Chris Nighswonger wrote:
>>> The other possibility that comes to mind is that your bandwidth settings are making Amanda decide to limit to one dumper at a time.
>>
>> Chris, this is certainly the first thing to look at: note in your amstatus output the line "network free kps: 0":
>>
>>     9 dumpers idle : 0
>>     taper status: Idle
>>     taper qlen: 1
>>     network free kps: 0
>>     holding space : 436635431k ( 50.26%)
>
> Hmm... I missed that completely. I'll set it arbitrarily high as Austin suggested and test it overnight.

Don't feel bad, it's not something that gets actively used by a lot of people, so most people don't really think about it. If used right, though, it provides the rather neat ability to have Amanda limit its network utilization while running backups, which is really helpful if you have to run backups during production hours for some reason.
Re: Another dumper question
On 2018-11-26 13:34, Chris Nighswonger wrote:
> So in one particular configuration I have the following lines:
>
>     inparallel 10
>     dumporder "STSTSTSTST"
>
> I would assume that Amanda would spawn 10 dumpers in parallel and execute them, giving priority to largest size and largest time, alternating. I would assume that Amanda would do some sort of sorting of the DLEs based on size and time, set them in descending order, and then run the first 10 based on the list, thereby utilizing all 10 permitted dumpers in parallel.
>
> However, based on the amstatus excerpt below, it looks like Amanda simply starts with the largest size and runs the DLEs one at a time, not making efficient use of parallel dumpers at all. This has the unhappy result at times of causing amdump to still be running when the next backup is executed.
>
> I have changed the dumporder to STSTStstst for tonight's run to see if that makes any difference. But I don't have much hope it will. Any thoughts?

Is this all for one host? If so, that's probably your issue. By default, Amanda will only run at most one DLE per host at a time. You can change this in the dump settings, but I forget what the exact configuration parameter is. The other possibility that comes to mind is that your bandwidth settings are making Amanda decide to limit to one dumper at a time. You can easily test that by just setting the `netusage` parameter to an absurdly large value like 1073741824 (equivalent to one Tbit/s).
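For what it's worth, a sketch of the two settings involved, in amanda.conf terms. The per-host parallelism parameter I couldn't recall above is, as far as I remember, `maxdumps` in the dumptype (treat that name as unverified against your Amanda version):

```
netusage 1073741824   # effectively unlimited bandwidth (~1 Tbit/s)

define dumptype global {
    # ... existing dumptype options ...
    maxdumps 2   # allow up to two parallel dumps from a single host
}
```

Raise `maxdumps` cautiously; multiple simultaneous dumps from one host compete for that host's disk and network bandwidth.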
> Kind regards,
> Chris
>
>     From Mon Nov 26 01:00:01 EST 2018
>
>     1   4054117k waiting for dumping
>     1      6671k waiting for dumping
>     1       222k waiting for dumping
>     1      2568k waiting for dumping
>     1      6846k waiting for dumping
>     1    125447k waiting for dumping
>     1     91372k waiting for dumping
>     1        92k waiting for dumping
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1    290840k waiting for dumping
>     1     76601k waiting for dumping
>     1        86k waiting for dumping
>     1     71414k waiting for dumping
>     0  44184811k waiting for dumping
>     1       281k waiting for dumping
>     1      6981k waiting for dumping
>     1        50k waiting for dumping
>     1     86968k waiting for dumping
>     1     81649k waiting for dumping
>     1    359952k waiting for dumping
>     0 198961004k dumping 159842848k ( 80.34%) (7:23:39)
>     1     73966k waiting for dumping
>     1    821398k waiting for dumping
>     1    674198k waiting for dumping
>     0 233106841k dump done (7:23:37), waiting for writing to tape
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1    166876k waiting for dumping
>     1        32k waiting for dumping
>     1    170895k waiting for dumping
>     1    162817k waiting for dumping
>     0 failed: planner: [Request to client failed: Connection timed out]
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     0        53k waiting for dumping
>     0  77134628k waiting for dumping
>     1      2911k waiting for dumping
>     1        36k waiting for dumping
>     1        32k waiting for dumping
>     1     84935k waiting for dumping
>
>     SUMMARY           part      real   estimated
>                                 size        size
>     partition       :   43
>     estimated       :   42             559069311k
>     flush           :    0         0k
>     failed          :    1                    0k            (  0.00%)
>     wait for dumping:   40             128740001k            ( 23.03%)
>     dumping to tape :    0                    0k            (  0.00%)
>     dumping         :    1 159842848k  198961004k ( 80.34%) ( 28.59%)
>     dumped          :    1 233106841k  231368306k (100.75%) ( 41.70%)
>     wait for writing:    1 233106841k  231368306k (100.75%) ( 41.70%)
>     wait to flush   :    0         0k          0k (100.00%) (  0.00%)
>     writing to tape :    0         0k          0k (  0.00%) (  0.00%)
>     failed to tape  :    0         0k          0k (  0.00%) (  0.00%)
>     taped           :    0         0k          0k (  0.00%) (  0.00%)
>     9 dumpers idle  : 0
>     taper status: Idle
>     taper qlen: 1
>     network free kps: 0
>     holding space   : 436635431k ( 50.26%)
>     chunker0 busy   :  6:17:03  ( 98.28%)
>     dumper0 busy    :  6:17:03  ( 98.28%)
>     0 dumpers busy  :  0:06:34  (  1.72%)    0:  0:06:34  (100.00%)
>     1 dumper busy   :  6:17:03  ( 98.28%)    0:  6:17:03  (100.00%)
Re: Flushing the Holding Disk
On 2018-11-16 12:27, Chris Miller wrote:
> Hi Folks,
>
> I'm unclear on the timing of the flush from holding disk to vtape. Suppose I run two backup jobs, and each uses the holding disk. When will the second job start? Obviously, after the client has sent everything... Before the holding disk flush starts, or after the holding disk flush has completed?

If by 'jobs' you mean 'amanda configurations', the second one starts when you start it. Note that `amdump` does not return until everything is finished dumping (and, optionally, taping if anything would be taped), so you can literally just run each one sequentially in a shell script and they won't run in parallel.

If by 'jobs' you mean DLE's, they run as concurrently as you tell Amanda to run them. If you've got things serialized (`inparallel` is set to 1 in your config), then the next DLE will start dumping once the previous one is finished dumping to the holding disk. Otherwise, however many you've said can run in parallel run (within per-host limits), and DLE's start when the previous one in sequence for that dumper finishes. Taping can (by default) run in parallel with dumping if you're using a holding disk, which is generally a good thing, though you can also easily configure it to wait for some amount of data to be buffered on the holding disk before it starts taping.

> Is there any way to defer the holding disk flush until all backup jobs for a given night have completed?

Generically, set `autoflush no` in each configuration, and then run `amflush` for each configuration once all the dumps are done. However, unless you've got an odd arrangement where every system saturates the network link while actually dumping and you are sharing a single link on the Amanda server for both dumping and taping, this probably won't do anything for your performance. You can easily configure Amanda to flush backups from each DLE as soon as they are done, and it will wait to exit until everything is actually flushed.
Building from that, if you just want to ensure the `amdump` instances don't run in parallel, just use a tool to fire them off sequentially in the foreground. Stuff like Ansible is great for this (especially because you can easily conditionally back up your index and tapelist when the dump finishes). As long as the next `amdump` command isn't started until the previous one returns, you won't have to worry about them fighting each other for bandwidth.
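A minimal sketch of that pattern in plain shell (the configuration names are hypothetical; `amdump` blocks until its run is complete, and `amflush -b` runs in batch mode without prompting):

```shell
#!/bin/sh
# Run each Amanda configuration in sequence; amdump does not return until
# dumping (and any taping) for that configuration has finished, so the
# runs never overlap or fight each other for bandwidth.
for config in set1 set2 set3; do
    amdump "$config"
done

# If each configuration sets 'autoflush no', everything is still sitting
# on the holding disk at this point; flush each one in batch mode.
for config in set1 set2 set3; do
    amflush -b "$config"
done
```

Anything more elaborate (conditional index/tapelist backups, notifications) layers naturally on top of this loop.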
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 18:18, Gene Heskett wrote:
> On Thursday 15 November 2018 14:17:29 Austin S. Hemmelgarn wrote:
>> On 2018-11-15 13:36, Gene Heskett wrote:
>>> On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:
>>>> On 2018-11-15 11:53, Gene Heskett wrote:
>>>>> On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
>>>>>> On 2018-11-15 06:16, Gene Heskett wrote:
>>>>>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's. But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help. Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>>>>>
>>>>>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>>>>>
>>>>> Even if you told it to use tar for the estimate phase? That has enough legs to be called a bug, IMO anyway.
>>>>
>>>> As mentioned in one of my other responses, I can kind of see the value in this not bothering the client systems. Keep in mind that server estimates cost nothing on the client, while calcsize or client estimates may use a significant amount of resources.
>>>
>>> My default has been calcsize for three or 4 years, changed because tar was changed & was screwing up the estimates. I can remember 15+ years ago, when I was using real tar estimates on a much smaller machine, and it could come within 50 megabytes of filling a DDS-2 tape (4 GB compressed) for weeks at a time. So that part of amanda worked a lot better than it does today. And it's slowly gone to the dogs as my system grew in complexity. And went in a handbasket when I had to change to calcsize during the tar churn.
>>
>> I've not been using AMANDA anywhere near as long as you have, but I've actually not seen any issues with the accuracy of 'estimate client' mode estimates with current versions of GNU tar, except when the estimate ran while data in the DLE was being modified (and in that case, it makes sense that it would be bogus). I generally don't use 'estimate client' on my own systems, though, because it consistently takes far longer than 'estimate calcsize', and I'm not picky about the estimates being perfect.
>>
>>>> In this case, I do think the documentation should be a bit clearer,
>>>
>>> Yes, but who is to rewrite it? He should know a heck of a lot more about the amanda innards than I do even after 2 decades, and better defined words here and there too. "diskdevice" is a very poor substitute for the far more common slanguage of "/path/to/".
>>
>>>> and it would be useful to be able to get regular (calcsize and/or client) estimates on-demand, but I do think that the default is reasonably sane.
>>>
>>> It may well be sane, we'll see how it works in the morning. AIUI, calcsize runs only on old history, so that should not impose a load on the client, even when the client is itself.
>>
>> Unless I'm mistaken:
>>
>> * 'estimate server' runs only on historical data, and doesn't even talk to the client systems. It's good at limiting the impact the estimate has on the client, but reliably gives bogus estimates if your DLEs don't show consistent behavior (that is, each backup of a given level is roughly the same size as every other backup at that level).
>>
>> * 'estimate client' relies on the backup program being used to give it info about how big the backup will be. It gives estimates that are close to 100% accurate, but currently essentially requires running the backup process twice (once for the estimate, once for the actual backup) and imposes a non-negligible amount of load on the client.
>
> That depends on the client's instant duties. I have backed up a milling machine while it was running a 90-line program, 3 days to finish while sharpening a saw blade, with no apparent interaction, on a dual-core Atom powered box. One core was locked away for LCNC (isolcpus at work), the other was free to do the backup client. Didn't bother it a bit. :)
>
>> * 'estimate calcsize' does something kind of in-between. AIUI, it looks at some historical data, and also looks at the on-disk size of the data,
>
> That would take time to access the dle's, and the answer is effectively instant, ergo it is not questioning the client(s); it has to be working only from the history in its own logs.

Except that it actually runs on the client systems. I've actually looked at this: the calcsize program is running on the clients and not the server. It may be looking at the logs there, _but_ it's still running on the client. It may also be _really_ fast in your setup, but that doesn't inherently mean it's running locally (Amanda is smart e
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 13:36, Gene Heskett wrote:
> On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:
>> On 2018-11-15 11:53, Gene Heskett wrote:
>>> On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
>>>> On 2018-11-15 06:16, Gene Heskett wrote:
>>>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's. But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help. Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>>>
>>>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>>>
>>> Even if you told it to use tar for the estimate phase? That has enough legs to be called a bug, IMO anyway.
>>
>> As mentioned in one of my other responses, I can kind of see the value in this not bothering the client systems. Keep in mind that server estimates cost nothing on the client, while calcsize or client estimates may use a significant amount of resources.
>
> My default has been calcsize for three or 4 years, changed because tar was changed & was screwing up the estimates. I can remember 15+ years ago, when I was using real tar estimates on a much smaller machine, and it could come within 50 megabytes of filling a DDS-2 tape (4 GB compressed) for weeks at a time. So that part of amanda worked a lot better than it does today. And it's slowly gone to the dogs as my system grew in complexity. And went in a handbasket when I had to change to calcsize during the tar churn.

I've not been using AMANDA anywhere near as long as you have, but I've actually not seen any issues with the accuracy of 'estimate client' mode estimates with current versions of GNU tar, except when the estimate ran while data in the DLE was being modified (and in that case, it makes sense that it would be bogus). I generally don't use 'estimate client' on my own systems, though, because it consistently takes far longer than 'estimate calcsize', and I'm not picky about the estimates being perfect.

>> In this case, I do think the documentation should be a bit clearer,
>
> Yes, but who is to rewrite it? He should know a heck of a lot more about the amanda innards than I do even after 2 decades, and better defined words here and there too. "diskdevice" is a very poor substitute for the far more common slanguage of "/path/to/".
>
>> and it would be useful to be able to get regular (calcsize and/or client) estimates on-demand, but I do think that the default is reasonably sane.
>
> It may well be sane, we'll see how it works in the morning. AIUI, calcsize runs only on old history, so that should not impose a load on the client, even when the client is itself.

Unless I'm mistaken:

* 'estimate server' runs only on historical data, and doesn't even talk to the client systems. It's good at limiting the impact the estimate has on the client, but reliably gives bogus estimates if your DLEs don't show consistent behavior (that is, each backup of a given level is roughly the same size as every other backup at that level).

* 'estimate client' relies on the backup program being used to give it info about how big the backup will be. It gives estimates that are close to 100% accurate, but currently essentially requires running the backup process twice (once for the estimate, once for the actual backup) and imposes a non-negligible amount of load on the client.

* 'estimate calcsize' does something kind of in-between. AIUI, it looks at some historical data, and also looks at the on-disk size of the data, then factors in compression ratios and such to give an estimate that's usually reasonably accurate, without needing the DLEs to be consistent or imposing significant load on the clients.
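For reference, the estimate mode is selected per dumptype with the `estimate` keyword in amanda.conf; as far as I know, more than one method may be listed, and Amanda tries them in order. A sketch (the dumptype name is hypothetical, and the multi-method syntax should be checked against your version's man page):

```
define dumptype normal-est {
    # Try a client estimate first, fall back to calcsize, and finally to
    # server history if the others are unavailable.
    estimate client calcsize server
}
```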
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 11:53, Gene Heskett wrote:
> On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
>> On 2018-11-15 06:16, Gene Heskett wrote:
>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's. But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help. Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>
>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>
> Even if you told it to use tar for the estimate phase? That has enough legs to be called a bug, IMO anyway.

As mentioned in one of my other responses, I can kind of see the value in this not bothering the client systems. Keep in mind that server estimates cost nothing on the client, while calcsize or client estimates may use a significant amount of resources. In this case, I do think the documentation should be a bit clearer, and it would be useful to be able to get regular (calcsize and/or client) estimates on-demand, but I do think that the default is reasonably sane.
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 11:21, Chris Nighswonger wrote:
> On Thu, Nov 15, 2018 at 7:40 AM Austin S. Hemmelgarn wrote:
>> On 2018-11-15 06:16, Gene Heskett wrote:
>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's.
>>>
>>> But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help.
>>>
>>> Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>
>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>
> What would be the downside to having the amanda client execute 'du -s' or some such on the DLE and return the results when amcheck and friends realize there is no reliable size estimate? This would seem to be a much more accurate estimate than a non-existent server estimate.

My guess is that it's intentionally limited to server estimates to avoid putting load on the client systems. Both calcsize and client estimates require reading a nontrivial amount of data on the client side, and client estimates also involve a nontrivial amount of processing. That said, it would be nice to be able to explicitly run any of the three types of estimate.
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 06:16, Gene Heskett wrote:
> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's.
>
> But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help.
>
> Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.

I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
Re: Monitor and Manage
On 2018-11-14 10:44, Chris Miller wrote: Hi Folks, I now have three working configs, meaning that I can back up three clients. There is not much difference among the configs, but that is a topic for a different thread. My question is how I manage what AMANDA is doing? So, let's suppose I fire up all three amdumps at once: * How do I know if I'm getting level 0 or higher? * How do I know the backups are running and have not silently failed? * How do I know when they complete? * How do I know what has been accomplished? These are all the sort of questions that might be answered by some sort of dashboard, but I haven't heard of any such thing, nor do I expect to, though I am equally sure that all the answers exist. I just don't know where. In short, how do I monitor and manage AMANDA? Well, for generic monitoring, make sure the system can deliver email and you have the aliases set up appropriately, and then configure Amanda to email you a report when the dump completes. The reports themselves are actually rather thorough, covering aggregate timing and performance information as well as the useful generic stuff like what dump level everything ran at and what tapes got used. You can get similar details for the last dump (or the current one if one is in progress) using the `amstatus` command, which will also show progress info for individual DLE's if a dump is currently running. For more in-depth management, take a look at the `amadmin` and `amtape` commands; they both provide useful functionality for general management that doesn't involve actually running the backups, including: * Forcing a level 0 or level 1 dump for any or all of the DLE's for the next run. Don't get in the habit of doing this regularly; overriding the planner will usually not get you good results. * Forcing Amanda to bump to a new dump level for a given DLE. Again, don't do this regularly. * Querying when the next level 0 dump is due for a given DLE. 
This gives you an upper limit on when the DLE will get a level 0 dump assuming you stick to the schedule you told Amanda about. * Querying details about all currently stored backups, including dates, location, and dump status. * Querying the state of all the tapes/vtapes Amanda is managing.
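The queries above map onto commands roughly like these (the config name `daily` and the host/DLE arguments are placeholders):

```
amstatus daily                         # progress of the current or last run
amadmin daily due                      # when each DLE's next level 0 is due
amadmin daily find                     # stored backups: dates, tapes, status
amadmin daily force myhost /home       # force a level 0 next run (sparingly)
amadmin daily force-bump myhost /home  # force a bump to the next level
amtape daily show                      # state of all tapes/vtapes
```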
Re: dumporder
On 11/5/2018 1:31 PM, Chris Nighswonger wrote: Is there any wisdom available on optimization of dumporder? This is personal experience only, but I find that in general, if all your dumps run at about the same speed and you don't have to worry about bandwidth, using something like the following generally gets reasonably good behavior: 'ssSS' In essence, it ensures that the smallest dumps complete fast, while still making sure the big ones get started early. Where I work, we've got a couple of slow systems, and I find that this works a bit better under those circumstances: 'ssST' Similar to the above, except it makes sure that the long backups get started early too (I would use 'ssTT', except that we only run one DLE at a time for any given host).
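In amanda.conf terms, a setting like the slow-system variant above would look something like this (the letter count should match the number of dumpers, so four dumpers are assumed here):

```
inparallel 4
dumporder "ssST"   # two smallest-first dumpers, one largest-size, one longest-time
```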
Re: Can AMANDA be configured "per client"?
On 11/5/2018 1:05 PM, Chris Miller wrote: Hi Folks, I have four servers, henceforth AMANDA clients, that I need to back up and a lot of NAS space available. I'd like to configure AMANDA to treat each of the four AMANDA clients as if it were the only client, meaning each client should have its own configuration which includes location for backup storage. I have a 3 TB staging disk on the AMANDA server. I have reasons for the individual treatment of clients that include off-site storage requirements and differing data sensitivity, so the simple solution is to be able to configure AMANDA to treat each client as a single case, so I can provide for proper security and custody of the backups. Can this be done? Yes, just create a separate configuration for each client on the server (that is, a separate amanda.conf and disklist for each client, with each pair being in its own sub-folder of your main amanda configuration directory). This is actually a pretty common practice in a lot of places (for example, the company I work for has 3 separate configurations that run at different times overnight and have slightly different parameters for the actual backups). The only caveat is that you have to explicitly run dumps for each configuration, but that's really not that hard. Please refer to the small table below. I have some basic questions, but the volume of documentation is difficult to grasp all at once, so please forgive what might seem like trivial questions; they are not yet trivial to me. Using 10.1.1.10 from the table below as an example: 1. I think I define the length of my tapes to be the maximum for a given client backup, which is the size of a level 0 dump, which is 135 GB for the example of 10.1.1.10. Since I want to configure AMANDA to treat each client as an individual and not part of a collection of backup tasks, I assume AMANDA will use one vtape per client per night. Can this be done? How do I qualify configuration settings per client? 
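The separate-configuration-per-client layout suggested above might look roughly like this on disk (the client names and scratch root are illustrative; real configs usually live under /etc/amanda):

```shell
# Sketch of a per-client Amanda config layout: one sub-folder per client,
# each with its own amanda.conf and disklist. CONF_ROOT defaults to a
# scratch directory so this sketch is safe to run anywhere.
CONF_ROOT=${CONF_ROOT:-$(mktemp -d)}
for client in client-a client-b client-c client-d; do
    mkdir -p "$CONF_ROOT/$client"
    : > "$CONF_ROOT/$client/amanda.conf"   # per-client settings
    : > "$CONF_ROOT/$client/disklist"      # only that client's DLE's
done
ls "$CONF_ROOT"
# Each config is then run explicitly, e.g. from cron:
#   amdump client-a
#   amdump client-b
```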
The main part of this should be answered by my comment above (if you have separate configurations, it's trivial to specify different settings for each client). That said, you probably want the vtape size to be _larger_ than your current theoretical max backup size, because it's very hard to change the vtape 'size' after the fact, and if you run out of space the whole backup may fail. Keep in mind that vtapes only take up whatever space is necessary for the data being stored on them (plus a bit extra for the virtual label), so you can set this to an arbitrarily large value. As an example, the vtape configuration where I work specifies 2TB vtapes, because that's an amount I know for certain we will never hit in one run, even if everything is a level 0 backup. 2. I have planned for one level 0 and five level 1 backups per week. Do I call this "a cycle"? I think I need 185 GB storage per "cycle" and this tells me how many "cycles", in this case, weeks, I can store before I have to re-use tapes. Does this mean I can plan on a 43 cycle (week) retention of my backups? Will AMANDA append to tapes, meaning can I put a full week on one vtape? This doesn't _quite_ line up with what Amanda calls a cycle; see my comments on your next question for more info on that. Also, as mentioned above, assume you will need more space than you calculated; failed backups are a pain to deal with. As far as taping, Amanda _never_ appends to a tape; it only ever rewrites whole tapes. While it's technically possible to get Amanda to pack all the data it can onto one tape across multiple runs, it's generally only a good idea to do this if you need to store backups on a very limited number of physical tapes because: * It means that some of your backup data may sit around on the Amanda server for an extended period of time before being taped (if you're doing a full week's worth of backups on one tape, that level zero backup won't get taped until the end of the week). * Amanda rewrites whole tapes. 
This means you will lose all backups on a tape when it gets reused. Because you don't have any wasted storage space using vtapes, it's better to just plan on one vtape per run, specify a number appropriate for your retention requirements (plus a few extra to allow recovery from errors), and then just let Amanda run. Such a configuration is more reliable and significantly more predictable. As another concrete example from the configuration where I work: We do 4 week cycles (so we have at least one level zero backup every 28 days for each disk list entry), do daily backups, and retain backups for 16 weeks. For our vtape configuration, this translates to requiring 112 tapes for all the current backups. We need to be able to access the oldest backups during the current cycle, so we have an additional cycle's worth of tapes as well (bringing the total up to 130). We also need to guarantee
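The tape-count arithmetic behind a plan like the one above is simple once you commit to one vtape per run (the numbers here are the ones quoted: daily runs, 16 weeks of retention):

```shell
# One vtape per run: tapes needed = runs per week * weeks of retention,
# plus whatever spares you want for error recovery.
runs_per_week=7
retention_weeks=16
echo $(( runs_per_week * retention_weeks ))
```

This prints 112, matching the figure above; spares and any extra cycle's worth of tapes get added on top of that.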
Re: Zmanda acquired from Carbonite by BETSOL -- future of Amanda development?
On 2018-10-02 13:29, Gene Heskett wrote: On Tuesday 02 October 2018 12:34:40 Ashwin Krishna wrote: Hi All, We propose to have the call on Oct 8th at 11 AM Mountain Time. Agenda: * Zmanda's Acquisition by BETSOL * Attendee Introductions * Existing Governance Model of Amanda Community * Suggested changes to the Governance Model * BETSOL's Commitment to Open Source Community We have taken note of all the suggestions received on the mailing list and we will go through the same on the call. Meeting Details: Amanda Open Source Community Discussion Mon, Oct 8, 2018 11:00 AM - 12:00 PM MDT Please join my meeting from your computer, tablet or smartphone. https://global.gotomeeting.com/join/438069045 You can also dial in using your phone. United States: +1 (786) 535-3211 Access Code: 438-069-045 First GoToMeeting? Let's do a quick system check: https://link.gotomeeting.com/system-check Regards, Ashwin Krishna -Original Message- From: Nathan Stratton Treadway Sent: Thursday, September 27, 2018 9:01 PM To: Ashwin Krishna Cc: amanda-users@amanda.org Subject: Re: Zmanda acquired from Carbonite by BETSOL -- future of Amanda development? Ashwin, thanks very much for getting in contact with the Amanda mailing list. On Thu, Sep 27, 2018 at 06:16:02 +, Ashwin Krishna wrote: We are 100% committed to the open source community and will be contributing to the code base to the best of our abilities. [...] I want to assure you that we are actively investing in growing Amanda and we have young enthusiastic engineers in the team. You can expect the next Amanda releases to include support for newer versions of operating systems, defect fixes, security enhancements etc. [...] We have retained the team members of the previous Zmanda team that we could. I can tell you that it's not easy without support from the community members. We encourage the community members to guide and contribute as much as you can. If you need commit access to the code base, please don't hesitate to reach out to us. 
You can expect our commitment and support to you. On Thu, Sep 27, 2018 at 22:54:02 +, Ashwin Krishna wrote: We are planning to host a conference call and would like all the active admins and community members to join to have a discussion with the Zmanda team at BETSOL regarding future collaborations. Will be sending out the meeting details (US time) with the agenda later. It sounds like getting the new BETSOL team in direct contact with the admins for the mailing list and other amanda.org-related resources is an important step at this point. However, I would say that for many of us here on the list, the most notable change in the past 7 months is not related to those things (which have continued to chug along as before), but rather the lack of "a developer" to move things along here on the public lists and in the public source repo. A decade or two ago it sounds like there were a number of developers involved, but more recently it's just been one or two Zmanda people who have served that role. Obviously this could be a good time to reconsider this arrangement if there are in fact other people ready to jump in, but off hand I'm guessing that what's likely to work going forward is for there to be a small number of BETSOL developers back in that role. As an Amanda user who has tried to contribute back a few improvements to the code line, I'm not really looking to have direct commit access myself, but rather hope to get back to having someone (hanging out here on the mailing lists) who can take the patches I came up with while hacking around on my own system and understand whether or not they will really work for everyone, and who will know which branches should have that change pushed onto them, and what tweaks are needed to make the patch apply to some older branch, etc. So, here's hoping you all at BETSOL are soon able to identify someone/a few people to take over that function, and patches and discussions can start flowing again. Nathan P.S. 
Personally I'd say that, rather than a new major release with support for newer versions of operating systems and whatnot, more urgent would be a minor release to gather up the handful of bugfixes which have already been discussed since 3.5.1 came out and get them published as part of an official release. +10; the 3.3.7p1 planner in particular is in serious need of help. It refuses to adjust the schedule of the 3 largest members of my disklist, choosing instead to do all three level 0's on the same run, so a 30 gig average backup has become 24 gigs for many nights, followed by a 60+ gig run using 3 vtapes, 5 or 6 tapelist cycles in a row now. I'd build this mythical 3.5.1 but it's been hidden someplace my browsing has not found. You should be able to get 3.5.1 here: https://sourceforge.net/projects/amanda/files/amanda%20-%20stable/3.5.1/ That said, 3.5.1 doesn't seem to be much
Re: Weird amdump behavior
On 2018-07-30 00:38, Kamil Jońca wrote: Gene Heskett writes: On Saturday 28 July 2018 08:30:27 Kamil Jońca wrote: Gene Heskett writes: [..] Too many dumps per spindle, drive seeks take time=timeout? As I can see in gdb/strace, planner hangs on "futex". 'futex' is short for 'Fast Userspace muTEX'; it's a synchronization primitive. Based on personal experience (not with Amanda, but just debugging software hangs in general), this usually means it's either a threading issue, or that you've ended up with a deadlock somewhere between processes. Regardless, it's probably an issue on the local system, and most likely only happens when backing up more than one client because you have more processes/threads involved and actually doing things in that case. This is probably going to sound stupid, but try updating/rebuilding/reinstalling Perl, whatever Perl packages Amanda depends on (I don't remember which packages they are), and Amanda itself. Most of the time when I see this kind of issue, it ends up being a case of at-rest data corruption in the executables or libraries, and reinstalling the problem software typically fixes things. 1. I do not configure spindle at all. So it's possible to have multiple dumps from the same spindle at the same time. No. There is another parameter: "maxdumps int — Default: 1. The maximum number of backups from a single host that Amanda will attempt to run in parallel. See also the inparallel option." And I use the default value, so I have at most one dump per host at once (and I am quite happy with this). Of course I can change spindles for testing, but, to be honest, I do not understand how that should help. Please, give every disk in each machine its own unique spindle number. Your backups should be done much faster. I do not want faster dumps. I want working dumps. KJ
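For reference, spindle numbers are assigned in the disklist as the column after the dumptype; a hypothetical example (host name, paths, and dumptype invented):

```
# hostname  diskname   dumptype       spindle
kjclient    /home      comp-user-tar  1
kjclient    /var/mail  comp-user-tar  2
```

DLE's sharing a spindle number on the same host won't be dumped concurrently, which is what avoids seek thrashing on a shared physical disk.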
Re: taper should wait until all dumps are done
On 2018-07-27 14:15, Stefan G. Weichinger wrote: On 27.07.2018 at 19:37, Austin S. Hemmelgarn wrote: Perhaps I can help with that. Great stuff, thanks for your informative reply, that's exactly the information I would like to have in the docs, etc. Will consult that in detail ASAP. A quick note on what I try to solve here: I have servers with only one big RAID-array consisting of maybe 4 or 6 physical disks, and based on that (software-)RAID there is one LVM volume group. So the logical volumes containing the data to be backed up (DLEs) are on the same array as the other LV providing the amanda holding disk. Yes, I know, that's not optimal, though I can't easily change that (I would have to add separate disks for holding disk purposes ... cost and space/controller issues). Don't worry, I've got to deal with similarly sub-optimal stuff where I work (our backup server has to multiplex all the dumps _and_ taping over a single GbE connection, so our backups are _always_ network-bound, even when we do really aggressive compression), so I entirely understand. So I want to avoid too much parallel activity of dumper and taper processes because that lets the throughput drop down massively (not to mention the additional stress on the hardware). So it would be great to be able to tell amanda "the DLEs coming from the amanda client which is the amanda server (~localhost) should be dumped to holdingdisk while no taper processes run". Or something in that direction. I will consider reducing maxdumps to 4 as well and test "" for tonight's run. And yes, I also test "holdingdisk no" for some DLEs already: I have big chunks of VM backups where it doesn't make sense to copy them within the RAID array ... I tape them directly. If you're taping to vtapes, you might actually be able to set things up to not need a holding disk at all. I'm a bit fuzzy on how to configure it, but I know it's possible to set up vtapes to tape things in parallel. 
If you do that, you could (probably, again not 100% certain) get rid of the holding disk, dump direct to the vtapes, and still have the dumps run in parallel. That would avoid having to worry about the taper processes competing with the dumper processes. The only caveat is that failure to tape would mean failure to dump too, but the number of situations where you would fail to tape but still be able to dump to the same array as a holding disk is near zero, and the only one I can think of off the top of my head is completely avoided by not having a holding disk.
Re: taper should wait until all dumps are done
On 2018-07-19 09:41, Stefan G. Weichinger wrote: I know about the 2 parameters flush-threshold-dumped flush-threshold-scheduled but how to make sure that *all* the planned dumps are done before writing to tape? Some kind of "taper-wait" ... Or just by trial-and-error with the 2 mentioned parameters? You can do this by figuring out the upper limit of how much space all your backups will need, figuring out what percentage of your tape size that translates to, and then setting both of the flush-threshold values to that percentage, taperflush to 0 (to flush everything), and autoflush to 'yes' (so that it actually flushes the data). However, keep in mind that for this to work, your holding disk has to be able to hold all of your dumps for a single run simultaneously.
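For example, if a full run is expected to fill about 90% of one (v)tape (that percentage is an assumption — compute your own from total dump size divided by tape length), the amanda.conf settings described above would be:

```
flush-threshold-dumped    90
flush-threshold-scheduled 90
taperflush 0    # flush everything, leave nothing behind for the next run
autoflush yes   # actually flush any leftover dumps
```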
Re: taper should wait until all dumps are done
On 2018-07-27 12:23, Stefan G. Weichinger wrote: On 27.07.2018 at 17:02, Jean-Francois Malouin wrote: You should also consider playing with dumporder. I have it set to '' and that makes the longest (time wise) dumps go first so that the fast ones get pushed to the end. In one config I have: dumporder "" flush-threshold-dumped 100 flush-threshold-scheduled 100 taperflush 100 autoflush yes so that all the dumps will wait until the longest ones are done. It also won't go until it can fill one volume (100%). You can obviously go further than that if you have enough holding disk. Or at least it's my understanding... (the ML was down for a while, so that's the reason for my delayed response, it should work now) I checked "dumporder" in that config, it was "BTBT...", I changed it to "TTT..." now for a test. Although I am not 100% convinced that this will do the trick ;-) We will see. I never fully understood that parameter and its influence so far; to me it's a bit "unintuitive". Perhaps I can help with that. Part of what Amanda's scheduling does is figure out the size that each backup will be on each run (based on the estimate process), how much bandwidth it will need while dumping (based on the bandwidth settings for that particular dump type), and the amount of time it will take (predicted based on the size, prior timing data, and possibly the bandwidth). That information is then used together with the 'dumporder' setting to control how each dumper chooses what dump to do next when it finishes dumping. Each letter in the value corresponds to exactly one dumper, and controls only that dumper's selection. The size-based selection is generally the easiest to explain: it just says to pick the largest (for 'S') or smallest (for 's') dump out of the set and run that next. The bandwidth-based selection is only relevant if you have bandwidth settings configured. 
Without them, it treats all dumps as equal, and picks the next dump based solely on the order that amanda has them sorted (which, IIRC, matches the order found in the disk list). With them, it uses a similar selection method to the size-based selection, just looking at bandwidth instead of size. The time-based selection is where things get tricky, but they get tricky because of how complicated it is to predict how long a dump will take, not because the selection is complicated (it works just like size-based selection, just looking at estimated runtime instead of size). Pretty much, the timing data is extrapolated by looking at previous dumps of the DLE, correlating size and actual run-time. I'm not sure what fitting method it uses for the extrapolation (my first guess would be simple linear extrapolation, because that's easy and should work most of the time), and I'm also not sure what, if any, impact bandwidth has on the calculation. So, in short you have: * 'S' and 's': Simple deterministic selection based on the predicted size of the dump. * 'B' and 'b': Simple deterministic selection based on bandwidth settings if they are defined, otherwise trivial FIFO selection. * 'T' and 't': Not quite deterministic selection based on predicted execution time of the dump process. So, for a couple of examples: * The default setting 'BTBTBTBT': This will have half the dumpers select dumps that will take the largest amount of time, and the other half select the ones that will take the largest amount of bandwidth. This works reasonably well if you have bandwidth settings configured and wide variance in dump size. * What you're looking at testing '': This is a trivial case of all dumpers selecting the dumps that will take the longest time. If you're dumping almost all similar hosts, this will be essentially equivalent to just selecting the largest. 
If you're dumping a wide variety of different hosts, it will be equivalent to selecting the largest on the first dump, but after that will select based on which system takes the longest. * What I use on my own systems 'SSss' (I only run four dumpers, not eight): This is a reasonably simple option that gives a good balance between getting dumps done as quickly as possible, and not wasting time waiting on the big ones. Two of the dumpers select whatever dump is the largest, so that some of the big ones get started right away, while the other two select the smallest dumps, so that those get backed up immediately. I've done some really simple testing that indicates that this actually gets all the dumps done faster on average than the default for the case of all your systems being able to dump data at the same rate. * What we use where I work 'TTss': This is one where things get a bit complicated. There are three different ways things get selected here. First, two of the eight dumpers will select dumps that are going to take the longest amount of time. Then, you have four that will pull the largest ones, and two that
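The 'S'-versus-'s' selection described above boils down to a sort on the estimated sizes; a toy illustration (DLE names and sizes invented):

```shell
# Toy model of per-dumper selection: given "name size" estimates, an 'S'
# dumper takes the largest remaining dump, an 's' dumper the smallest.
# Real Amanda also weighs bandwidth and scheduling constraints.
printf '%s\n' 'home 400' 'mail 5' 'tmp 1' > estimates.txt
sort -k2 -rn estimates.txt | head -n1   # what an 'S' dumper picks next
sort -k2 -n  estimates.txt | head -n1   # what an 's' dumper picks next
```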
Re: custom_compress with zstd
On 2018-04-04 06:01, Stefan G. Weichinger wrote: On 2018-04-03 at 20:52, Austin S. Hemmelgarn wrote: On 2018-04-03 14:25, Stefan G. Weichinger wrote: Does anyone already use zstd https://en.wikipedia.org/wiki/Zstandard with amanda? I will try to define an initial dumptype and play around although I wonder if the standard behavior leads to any problems. zstd does not remove the source file after de/compression per default (only with "--rm") ... but as it is used within a pipe (?) with amanda I assume that won't hurt. The "-d" for decompression is there so that should work. I've been using it for a few months now both at home and at work. It works just fine as-is and gets pretty good performance. In both cases though, I actually use a wrapper script. The one for backups at work just adds `-T2` to the zstd command line as our backup server has lots of CPU (and CPU time), but the backups are network-limited. At home, I also bump the compression level as high as I can without needing special decompression options (so the full command line at home that the wrapper passes is `-19 --long --zstd=hlog=26 -T2`). I've done numerous restores from both sets of backups both with and without the wrapper script (I initially set both up to just use zstd directly), and it all appears to work just fine. Would this work as well? That's essentially what I used initially, and I had no issues with it at all either backing things up or restoring. ->

define dumptype client-zstd-tar {
    global
    program "GNUTAR"
    comment "custom client compression dumped with tar"
    compress client custom
    client_custom_compress "/usr/bin/zstd"
}
Re: custom_compress with zstd
On 2018-04-03 14:25, Stefan G. Weichinger wrote: Does anyone already use zstd https://en.wikipedia.org/wiki/Zstandard with amanda? I will try to define an initial dumptype and play around although I wonder if the standard behavior leads to any problems. zstd does not remove the source file after de/compression per default (only with "--rm") ... but as it is used within a pipe (?) with amanda I assume that won't hurt. The "-d" for decompression is there so that should work. I've been using it for a few months now both at home and at work. It works just fine as-is and gets pretty good performance. In both cases though, I actually use a wrapper script. The one for backups at work just adds `-T2` to the zstd command line as our backup server has lots of CPU (and CPU time), but the backups are network-limited. At home, I also bump the compression level as high as I can without needing special decompression options (so the full command line at home that the wrapper passes is `-19 --long --zstd=hlog=26 -T2`). I've done numerous restores from both sets of backups both with and without the wrapper script (I initially set both up to just use zstd directly), and it all appears to work just fine.
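A minimal wrapper in the spirit of the work setup described above might look like this (the file name and the flags other than `-T2` are assumptions; the sketch writes the wrapper into the current directory, while a real install would put it somewhere like /usr/local/sbin):

```shell
# Create a hypothetical zstd wrapper: "$@" forwards whatever flags Amanda
# passes (including -d for restores), and -T2 adds two worker threads.
cat > ./zstd-wrapper <<'EOF'
#!/bin/sh
exec zstd -T2 "$@"
EOF
chmod +x ./zstd-wrapper
```

The dumptype then points `client_custom_compress` at the wrapper's installed path instead of /usr/bin/zstd.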
Re: Amanda clients running Docker
On 2018-03-27 11:12, Joi L. Ellis wrote: I'm looking for information about how best to manage Amanda clients on which our Devs are running docker containers. Some of the production hosts are also running containers. Does anyone have suggestions regarding best practices for backing up docker containers in an Amanda environment? (I don't use docker and I haven't found anything online discussing containers on Amanda clients.) Any pointers, suggestions, or online references would be very welcome. I don't use Docker myself, but I do use LXC and know a lot of people who use a wide variety of container platforms including Docker, and the general principles are pretty much the same regardless of platform. You have 5 options for handling container backups with Amanda: 1. Back up the containers as part of the regular host-system backup, and do all the containers together as one DLE. 2. Back up the containers as part of the regular host system backup with each container being its own DLE (or DLE's). 3. Back up the containers in a separate backup set from the host system, with one DLE per host system. 4. Back up the containers in a separate backup set from the host system, with one DLE per container. 5. Back up the containers from the containers themselves. Of these, most people I know use option 2 or 4 (I use approach 2 with locally written integration with LXC to get the list of containers to back up). Option 1 is probably the easiest, but can have performance issues if you have lots of containers (and requires a bit of effort to make sure you don't back up transient things like CI build containers). Option 3 suffers from the same issues that option 1 does, but takes more effort to set up. Option 5 violates principles of minimalism, and is only really practical if your containers are full-system images instead of just bare-bones micro-services.
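For option 2, the disklist entries might look something like this (the host name, volume paths, and dumptype are hypothetical; Docker volumes commonly live under /var/lib/docker/volumes):

```
dockerhost  /var/lib/docker/volumes/app-db   comp-user-tar
dockerhost  /var/lib/docker/volumes/app-web  comp-user-tar
```

With a per-container DLE like this, transient containers simply never get an entry, and each container's data can be restored independently.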
Re: some suggested config parameters for backups to local disk
On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote: "Ryan, Lyle (US)" writes: The server has an 11TB filesystem to store the backups in. I should probably be fancier and split this up more, but not now. So I've got my holding, state, and vtapes directories all in there. In this scenario, I would think there's no point to a "holding" disk. I use a holding disk because my actual backup disk is external-USB and (comparatively) slow. So I backup to a holding disk on my internal SSD, releasing the client and the network as soon as possible, and then copy the backup to the backup drive afterwards. But in your case, I don't see any benefit. There are two other benefits to having a holding disk: 1. It lets you run dumps in parallel. Without a holding disk (or some somewhat complicated setup of the vtapes to allow parallel taping), you can only dump one DLE at a time because it dumps directly to tape. 2. It lets you defer taping until you have some minimum amount of data ready to be taped. This may sound kind of useless when working with vtapes, but if the holding disk is on the same device as the final vtape library, deferring until the dumps are all done (or at least, almost all done) can help improve dumping performance, because the dump processes won't be competing with the taper process for disk bandwidth.
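A holding disk is declared in amanda.conf roughly like this (the directory and sizes are placeholders):

```
holdingdisk hd1 {
    directory "/backup/holding"
    use 500 gbytes     # cap it so it can't crowd out the vtapes
    chunksize 1 gbyte
}
```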
Re: some suggested config parameters for backups to local disk
On 2018-03-22 19:03, Ryan, Lyle (US) wrote: I've got an Amanda 3.4.5 server running on Centos 7 now, and am able to do rudimentary backups of a remote client. But in spite of reading man pages, HowTo's, etc, I need help choosing config params. I don't mind continuing to read and experiment, but if someone could get me at least in the ballpark, I'd really appreciate it. The server has an 11TB filesystem to store the backups in. I should probably be fancier and split this up more, but not now. So I've got my holding, state, and vtapes directories all in there. The main client I want to back up has 4TB I want to back up. It's almost all in one filesystem, but the HowTo for splitting DLE's with exclude lists is clear, so it should be easy to split this into (say) 10 smaller individual dumps. The bulk of the data is pretty static, maybe 10%/month changes. It's hard to imagine 20%/month changing. For a start, I'd like to get a full done every 2 weeks, and incrementals/differentials on the intervening days. If I have room to keep 2 fulls (2 complete dumpcycles) that would be great. Given what you've said, you should have enough room to do so, but only if you use compression. Assuming the rate of change you quote above is approximately constant and doesn't result in bumping to a level higher than 1, then without compression you will need roughly 4.015TB per cycle (4TB for the full backup, ~15.38GB for the incrementals (roughly 0.38% change per day for 13 days)), plus 4TB of space for the holding disk (because you have to have room for a full backup _there_ prior to taping anything). With compression and assuming you get a compression ratio of about 50%, you should actually be able to fit four complete cycles (you would need about 2.0075TB per cycle), though if you decide you want that I would bump the tapecycle to 60 and the number of slots to 60. 
So I'm thinking: - dumpcycle = 14 - runspercycle = 0 (default) - tapecycle = 30 - runtapes = 1 (default) I'd break the filesystem into 10 pieces, so 400GB each. and make the vtapes 400GB each (with tapetype length) relying on server-side compression to make it fit. The HowTo "Use pigz to speed compression" looks clear, and the DL380 G7 isn't doing anything else, so server-side compression sounds good. Any advice on this or better ideas? Maybe I'm off in left-field. And one bonus question: I'm assuming Amanda will just make vtapes as necessary, but is there any guidance as to how many vtape slots I should create ahead of time? If my dumpcycle=14, maybe create 14 slots just to make tapes easier to find? Debra covered the requirements for vtapes, slots, and everything very well in her reply, so I won't repeat any of that here. I do however have some other more generic advice I can give based on my own experience: * Make your vtapes as large as possible. They won't take up any space beyond what's stored on them (in storage terminology, they're thinly provisioned), so their total 'virtual' size can be far more than your actual storage capacity, but if you can make it so that you can always fit a full backup on a single vtape, it will make figuring out how many vtapes you need easier, and additionally give a slight boost to taping performance (because the taper never has to stop to switch to a new vtape). In your case, I'd say setting 5TB for your vtape size is reasonable, that would give you some extra room if you suddenly have more data without being insanely over-sized. * Make sure to set a reasonable part_size for your vtapes. While you wouldn't have to worry about splitting dumps if you take my above advice about vtape size, using parts has some other performance related advantages. I normally use 1G, but all of my dumps are less than 100G in size. In your case, if you'll have 10 400G dumps, I'd probably go for 4G for the part size. 
* Match your holding disk chunk size to your vtape's part_size. I have no hard number to back this up, but it appears to provide a slight performance improvement while dumping data. * Don't worry right now about parallelizing the taping process. It's somewhat complicated to get it working right, significantly changes how you have to calculate vtape slots and sizes, and will probably not provide much benefit unless you're taping to a really fast RAID array that does a very good job of handling parallel writes. * There's essentially zero performance benefit to having your holding disk on a separate partition from your final storage unless you have it on a completely separate disk. There are some benefits in terms of reliability, but realizing them requires some significant planning (you have to figure out exactly what amount of space your holding disk will need). * If you're indexing the backups, store the working index directory (the one Amanda actually reads and writes to) on a separate drive from the holding disk and final backup storage.
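The chunk-size matching from the first bullet is a one-line setting in the holding-disk definition (a sketch; the directory path is illustrative):

```
# Holding disk whose chunksize matches the vtape part_size.
holdingdisk hd1 {
    directory "/amanda/holding"
    chunksize 4 gbytes   # same value as part_size in the tapetype
}
```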
Re: installing on Centos 7 - some newbee questions
On 2018-03-07 21:30, Ryan, Lyle (US) wrote: Hello all. I’m getting my first Amanda server running on CentOS 7 and have a few questions: - CentOS is packaged with 3.3.3. Is that good enough or should I build 3.5? Provided it's not missing any features you need and doesn't have any bugs that affect you, yeah it should be fine (and assuming of course you're not exposing it to the internet). This applies even if you've got other versions on the network too (provided the protocols match up, it's perfectly possible to run differing versions of Amanda throughout the network). - the server will use only disks, no tapes. 10TB, mostly all devoted to /home (though I could repartition) - I believe I still use vtapes and a holding disk, even though they’ll all just be directories on the main partition. sound right? Yes. The holding disk is actually pretty important even when using vtapes for two reasons: 1. It allows you to back up DLEs that are larger than the size you've specified for your vtapes. 2. It lets you run multiple backups in parallel without having to jump through hoops to allow Amanda to write to multiple vtapes in parallel. One quick tip regarding this type of configuration: Try to match the part-size tapetype option and the chunksize option for the holding disk. As stupid as it sounds, matching these actually improves performance by a measurable amount in most cases. If you've got a bunch of big backups, 1GB is generally a reasonable size for both. - I follow the instructions at https://wiki.zmanda.com/index.php/GSWA/Build_a_Basic_Configuration but when running amcheck get the error: can not stat /var/lib/amanda/gnutar-lists - indeed there is no file present there. any ideas? Just create it and set the correct permissions. Strictly speaking, the package should create this when installed, but it seems a number of distributions' packages don't do so.
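"Just create it and set the correct permissions" amounts to something like the following sketch. The demo uses a scratch prefix so it is safe to run anywhere; on a real server the prefix would be empty, and you would also chown the directory to the Amanda user (often 'amandabackup' or 'backup'; check what your package uses).

```shell
# Create the missing gnutar-lists state directory (demo under a scratch prefix).
PREFIX=${PREFIX:-$(mktemp -d)}
dir="$PREFIX/var/lib/amanda/gnutar-lists"
mkdir -p "$dir"
chmod 770 "$dir"
# On a real system, additionally (as root): chown amandabackup:disk "$dir"
ls -ld "$dir"
```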
Re: keep a backup forever
On 2018-01-30 17:18, ghe wrote: On 01/30/2018 12:29 PM, hy...@lactose.homelinux.net wrote: I feel like I've asked this before, but I can't find any emails. I can't believe this isn't an FAQ. Or rather, there is an FAQ, but the answer is (a) very sparse and (b) doesn't really answer the question. I had a machine. That machine was getting regular backups. The machine died. I have replaced it with a new machine. So having had this emergency, I now want to keep, in perpetuity, my last full backup of the now-dead machine. How big was the dead disk? Do you have space to store the whole thing? Did amanda do a level 0 of the whole dead disk to 17? If not, there are very likely pieces of that disk on several of your virtual tapes. amrestore deals with all that. The backup in question is on (virtual) tape number 17. So let's say I take the appropriate files that are in my /storage/amanda/vtapes/slot17 directory and copy them somewhere safe. Six months go by, my real slot17 gets reused, and I take those old files and copy them into slot44. What is my next step? How do I get those backups back into my amanda index so that I can amrecover from them? Is that what amreindex does? Is that what amrestore does? What I'd do is recover the last files amanda backed up from that disk, using amrestore. I'd restore to a disk, consider that the perpetual backup, and not try to get that old disk data anywhere in amanda's database -- amanda is very much oriented to reusing things in a cycle, and trying to get her to change her ways can be difficult. amrestore's a pleasant piece of software to use. You just tell it the date you want to restore, the disk, the files, and some other things (I use it infrequently, and I have to read the man page every time). amrestore figures out which tapes you need, and restores the data. Then you can do what you want with them -- burn to optical, buy a new disk, whatever. I would suggest the same approach myself.
In fact, that's pretty much what we do where I work. Whenever we permanently decommission a system, it gets pulled from the backup rotation, and we image the disk and store the disk image in archival storage that's separate from the storage we use for regular backups. Our procedure is similar for a failed disk we don't plan to replace, except instead of imaging it as-is, we rebuild it from backups and then image it (the imaging procedure was the norm before we switched to amanda, so it's just kind of stuck around).
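If you do want to squirrel away the raw slot files before the slot gets reused, bundling the whole slot directory into one archive keeps the parts together. A sketch (all paths and file names here are illustrative stand-ins for your real vtape layout, so the demo creates its own scratch copies):

```shell
# Archive one vtape slot directory as a single file for archival storage.
VTAPES=${VTAPES:-$(mktemp -d)}      # stands in for /storage/amanda/vtapes
ARCHIVE=${ARCHIVE:-$(mktemp -d)}    # stands in for your archival storage
mkdir -p "$VTAPES/slot17"
printf 'demo' > "$VTAPES/slot17/00001.DailySet1-17"   # placeholder part file
tar -C "$VTAPES" -czf "$ARCHIVE/dead-host-final-full.tgz" slot17
tar -tzf "$ARCHIVE/dead-host-final-full.tgz"
```

Note this only preserves the bits; as the reply says, restoring the data with amrestore and archiving the result avoids ever having to coax the old slot back into Amanda's database.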
Re: application amgtar ignore messages
On 2017-12-07 22:26, Jon LaBadie wrote: If I want amgtar to ignore certain messages, is it sufficient to list the message on the amanda server or must the ignored message also be listed in amanda-client.conf? I've done it several times, only on the server, and it seemed to work fine. But I'm now trying to ignore one message that appears on only one client and I'm having no success. Do I need to set up an "application amgtar" stanza on the client? Doesn't affect the question, but the problem is caused by the "gnome virtual file system directory", /home/user/.config/.gvfs. This is a FUSE mountpoint not accessible by root. So it generates a "can not stat" error message from amgtar. The better approach to this is to add that to the exclude file for that particular disk. It's a well-known path, so nothing else should be using it, and it's an area that shouldn't be dumped anyway, for a lot of the same reasons you shouldn't be dumping /sys or /dev/shm (and in fact, it isn't getting dumped, because amgtar can't see inside it).
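The exclude approach can go straight into the dumptype for that DLE. A sketch (the dumptype name and glob are illustrative; it assumes a user-tar style dumptype is already defined, and the pattern is relative to the DLE root, so adjust it to where .gvfs actually sits under that mountpoint):

```
# Dumptype fragment: skip the per-user gvfs FUSE mountpoint under /home.
define dumptype home-tar {
    user-tar
    exclude append "./*/.config/.gvfs"
}
```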
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 14:37, Austin S. Hemmelgarn wrote: On 2017-11-14 07:43, Austin S. Hemmelgarn wrote: On 2017-11-14 07:34, Austin S. Hemmelgarn wrote: On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there. Just tried an amcheckdump on everything, it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken). No luck changing compression. I would suspect some issue with NFS, but I've started seeing the same symptoms on my laptop as well now (which is completely unrelated to any of the sets at work other than having an almost identical configuration other than paths and the total number of tapes). So, I finally got things working by switching from: storage "local-vtl" vault-storage "cloud" To: storage: "local-vtl" "cloud" And removing the "vault" option from the local-vtl storage definition. 
Strictly speaking, this is working around the issue instead of fixing it, but it fits within what we need for our usage, and actually makes the amdump runs complete faster (since dumps get taped to S3 in parallel with getting taped to the local vtapes). Based on this, and the fact that amcheckdump was reporting corrupted dumps, I think the issue is probably an interaction between the vaulting code and the regular taping code, but I'm not certain. Thanks for the help.
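For reference, the amanda.conf change described above amounts to the following (storage names from the thread; the commented lines show the vaulting form that was failing):

```
# Failing form: tape to local-vtl, then vault from it to cloud.
#   storage "local-vtl"
#   (with a vault option in the local-vtl storage definition pointing at "cloud")

# Working form: tape to both storages in parallel, no vaulting.
storage "local-vtl" "cloud"
```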
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 07:43, Austin S. Hemmelgarn wrote: On 2017-11-14 07:34, Austin S. Hemmelgarn wrote: On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there. Just tried an amcheckdump on everything, it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken). No luck changing compression. I would suspect some issue with NFS, but I've started seeing the same symptoms on my laptop as well now (which is completely unrelated to any of the sets at work other than having an almost identical configuration other than paths and the total number of tapes).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 07:34, Austin S. Hemmelgarn wrote: On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there. Just tried an amcheckdump on everything, it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there.
Re: power down hard drives
On 2017-11-13 14:51, Jon LaBadie wrote: On Mon, Nov 13, 2017 at 02:04:42PM -0500, Gene Heskett wrote: On Monday 13 November 2017 13:42:13 Jon LaBadie wrote: On Mon, Nov 13, 2017 at 11:40:17AM -0500, Austin S. Hemmelgarn wrote: On 2017-11-13 11:11, Gene Heskett wrote: On Monday 13 November 2017 10:12:47 Austin S. Hemmelgarn wrote: On 2017-11-13 09:56, Gene Heskett wrote: On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote: On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down then when needed, powered up and mounted again? I'm not talking about system hibernation, the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about 5-10 seconds delay while the drive spun up and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout, check the man page for hdparm as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them). ... But if I allow the 2TB to be unmounted and self-powered down, once daily, what shortening of its life would I be subjected to? In other words, how many start-stop cycles can it survive? It's hard to be certain. For what it's worth though, you might want to test this to be certain that it's actually going to save you energy.
It takes a lot of power to get the platters up to speed, but it doesn't take much to keep them running at that speed. It might be more advantageous to just configure the device to idle (that is, park the heads) after some time out and leave the platters spinning instead of spinning down completely (and it should result in less wear on the spindle motor). In my situation, each of the six data drives is only needed for a 2 week period out of each 12 weeks. Once shutdown, it could be down for 10 weeks. Jon Which is more than enough time for stiction to appear if the heads are not parked off disk. Don't today's drives automatically park heads? I don't think there were ever any (at least, not ATA or SAS) that didn't when they went into standby. In fact, I've never seen a modern style hard disk with 'voice coil' style actuators that didn't automatically park the heads (and part of my job is tearing apart old hard drives prior to physical media destruction, so I've seen my fair share of them).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 12:52, Jean-Louis Martineau wrote: The previous patch broke something. Try this new set2-r2.diff patch Unfortunately, that doesn't appear to have fixed it, though the errors look different now. I'll try and get the log scrubbed by the end of the day and post it here. On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote: > On 2017-11-10 08:27, Jean-Louis Martineau wrote: >> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote: >>> On 2017-11-08 08:03, Jean-Louis Martineau wrote: >>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: >>>> > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >>>> >> Austin, >>>> >> >>>> >> It's hard to say something with only the error message. >>>> >> >>>> >> Can you post the amdump. and log..0 for >>>> the 2 >>>> >> backup set that fail. >>>> >> >>>> > I've attached the files (I would put them inline, but one of the >>>> sets >>>> > has over 100 DLE's, so the amdump file is huge, and the others are >>>> > still over 100k each, and I figured nobody want's to try and wad >>>> > through those in-line). >>>> > >>>> > The set1 and set2 files are for the two backup sets that show the >>>> > header mismatch error, and the set3 files are for the one that >>>> claims >>>> > failures in the dump summary. 
>>>> >>>> >>>> I looked at set3, the error in the 'DUMP SUMMARY' are related to the >>>> error in the 'FAILURE DUMP SUMMARY' >>>> >>>> client2 /boot lev 0 FLUSH [File 0 not found] >>>> client3 /boot lev 0 FLUSH [File 0 not found] >>>> client7 /boot lev 0 FLUSH [File 0 not found] >>>> client8 /boot lev 0 FLUSH [File 0 not found] >>>> client0 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /srv lev 0 FLUSH [File 0 not found] >>>> client9 /var lev 0 FLUSH [File 0 not found] >>>> server0 /boot lev 0 FLUSH [File 0 not found] >>>> client10 /boot lev 0 FLUSH [File 0 not found] >>>> client11 /boot lev 0 FLUSH [File 0 not found] >>>> client12 /boot lev 0 FLUSH [File 0 not found] >>>> >>>> They are VAULT attemp, not FLUSH, looking only at the first entry, it >>>> try to vault 'client2 /boot 0 20171024084159' which it expect to >>>> find on >>>> tape Server-01. It is an older dump. >>>> >>>> Do Server-01 is still there? Did it still contains the dump? >>>> >>> OK, I've done some further investigation by tweaking the labeling a >>> bit (which actually fixed a purely cosmetic issue we were having), >>> but I'm still seeing the same problem that prompted this thread, and >>> I can confirm that the dumps are where Amanda is trying to look for >>> them, it's just not seeing them for some reason. I hadn't thought >>> of this before, but could it have something to do with the virtual >>> tape library being auto-mounted over NFS on the backup server? >>> >> Austin, >> >> Can you try to see if amfetchdump can restore it? >> >> * amfetchdump CONFIG client2 /boot 20171024084159 >> > amfetchdump doesn't see it, and neither does amrecover, but the files > for the given parts are definitely there (I know for a fact that the > dump in question has exactly one part, and the file for that does > exist on the virtual tape mentioned in the log file). 
> > I'm probably not going to be able to check more on this today, but > I'll likely be checking if amrestore and amadmin find can see them. >
Re: power down hard drives
On 2017-11-13 11:11, Gene Heskett wrote: On Monday 13 November 2017 10:12:47 Austin S. Hemmelgarn wrote: On 2017-11-13 09:56, Gene Heskett wrote: On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote: On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down then when needed, powered up and mounted again? I'm not talking about system hibernation, the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about 5-10 seconds delay while the drive spun up and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout, check the man page for hdparm as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them). I've investigated that, and I have amanda wrapped up in a script that could do that, but ran into a showstopper I've long since forgotten about. All this was back in the time I was writing that wrapper, years ago now. One of the show stoppers AIR was the fact that only root can mount and unmount a drive, and my script runs as amanda.
While such a wrapper might work if you use sudo inside it (you can configure sudo to allow root to run things as the amanda user without needing a password, then run the wrapper as root), what I was trying to refer to in a system-agnostic manner (since the exact mechanism is different between different UNIX derivatives) was on-demand auto-mounting, as provided by autofs on Linux or the auto-mount daemon (amd) on BSD. When doing on-demand auto-mounting, you don't need a wrapper at all, as the access attempt will trigger the mount, and then the mount will time out after some period of inactivity and be unmounted again. It's mostly used for network resources (possibly with special auto-lookup mechanisms), as certain protocols (NFS in particular) tend to have issues if the server goes down while a share is mounted remotely, even if nothing is happening on that share, but it works just as well for auto-mounting of local fixed or removable volumes that aren't needed all the time (I use it for a handful of things on my personal systems to minimize idle resource usage). Sounds good perhaps. I am currently up to my eyeballs in an unrelated problem, and I won't get to this again until that project is completed and I have brought the 2TB drive in and configured it for amanda's usage. That will tend to enforce my one thing at a time but do it right bent. :) What I have is working for a loose definition of working... Yeah, I know what that's like. Prior to switching to amanda where I worked, we had a home-grown backup system that had all kinds of odd edge cases I had to make sure never happened. I'm extremely glad we decided to stop using that, since it means I can now focus on more interesting problems (in theory at least, we're having an issue with our Amanda config right now too, but thankfully it's not a huge one). But if I allow the 2TB to be unmounted and self-powered down, once daily, what shortening of its life would I be subjected to? 
In other words, how many start-stop cycles can it survive? It's hard to be certain. For what it's worth though, you might want to test this to be certain that it's actually going to save you energy. It takes a lot of power to get the platters up to speed, but it doesn't take much to keep them running at that speed. It might be more advantageous to just configure the device to idle (that is, park the heads) after some timeout and leave the platters spinning instead of spinning down completely (and it should result in less wear on the spindle motor). Interesting, I had started a long time test yesterday, and the reported hours have wrapped in the report, apparently at 65536 hours. Somebody apparently didn't expect a drive to last that long? ;-) The drive? Healthy as can be. That's about 7.48 years, so I can actually somewhat understand not going past 16-bits for that since most people don't use a given disk for more than about 5 years worth of power-on time before replacing it. However, what matters is really not how long the device has been powered on, but how much abuse the drive has taken. Running 24/7 for 5 years with no movement of the system (including nothing like earthquakes), in a temperature, humidity, and pressure controlled room will get
Re: power down hard drives
On 2017-11-13 09:56, Gene Heskett wrote: On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote: On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down then when needed, powered up and mounted again? I'm not talking about system hibernation, the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about 5-10 seconds delay while the drive spun up and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout, check the man page for hdparm as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them). I've investigated that, and I have amanda wrapped up in a script that could do that, but ran into a showstopper I've long since forgotten about. All this was back in the time I was writing that wrapper, years ago now. One of the show stoppers AIR was the fact that only root can mount and unmount a drive, and my script runs as amanda.
While such a wrapper might work if you use sudo inside it (you can configure sudo to allow root to run things as the amanda user without needing a password, then run the wrapper as root), what I was trying to refer to in a system-agnostic manner (since the exact mechanism is different between different UNIX derivatives) was on-demand auto-mounting, as provided by autofs on Linux or the auto-mount daemon (amd) on BSD. When doing on-demand auto-mounting, you don't need a wrapper at all, as the access attempt will trigger the mount, and then the mount will time out after some period of inactivity and be unmounted again. It's mostly used for network resources (possibly with special auto-lookup mechanisms), as certain protocols (NFS in particular) tend to have issues if the server goes down while a share is mounted remotely, even if nothing is happening on that share, but it works just as well for auto-mounting of local fixed or removable volumes that aren't needed all the time (I use it for a handful of things on my personal systems to minimize idle resource usage).
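The on-demand auto-mounting described above needs no wrapper at all. On Linux with autofs, the setup is two small map files along these lines (a sketch; the mount point, map file name, device label, and timeout are all illustrative):

```
# /etc/auto.master line: mount things under /mnt/backup on demand,
# unmount after 600 seconds of inactivity.
/mnt/backup  /etc/auto.backup  --timeout=600

# /etc/auto.backup map entry: accessing /mnt/backup/vtapes triggers the mount.
vtapes  -fstype=ext4  :/dev/disk/by-label/amanda-vtapes
```

With that in place, Amanda's own access to the vtape directory spins the disk up and mounts it, and idle timeouts take care of the rest.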
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 12:52, Jean-Louis Martineau wrote: The previous patch broke something. Try this new set2-r2.diff patch Given that the switch to NFSv4 combined with a change to the labeling scheme fixed the other issue, I'm going to re-test these two sets with the same changes before I test the patch just so I've got something current to compare against. I should have results from that later today, and will likely be testing this patch tomorrow if things aren't resolved by the other changes (and based on what you've said and what I've seen, I don't think the switch to NFSv4 or the labeling change will fix this one). Jean-Louis On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote: > On 2017-11-10 08:27, Jean-Louis Martineau wrote: >> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote: >>> On 2017-11-08 08:03, Jean-Louis Martineau wrote: >>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: >>>> > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >>>> >> Austin, >>>> >> >>>> >> It's hard to say something with only the error message. >>>> >> >>>> >> Can you post the amdump. and log..0 for >>>> the 2 >>>> >> backup set that fail. >>>> >> >>>> > I've attached the files (I would put them inline, but one of the >>>> sets >>>> > has over 100 DLE's, so the amdump file is huge, and the others are >>>> > still over 100k each, and I figured nobody want's to try and wad >>>> > through those in-line). >>>> > >>>> > The set1 and set2 files are for the two backup sets that show the >>>> > header mismatch error, and the set3 files are for the one that >>>> claims >>>> > failures in the dump summary. 
>>>> >>>> >>>> I looked at set3, the error in the 'DUMP SUMMARY' are related to the >>>> error in the 'FAILURE DUMP SUMMARY' >>>> >>>> client2 /boot lev 0 FLUSH [File 0 not found] >>>> client3 /boot lev 0 FLUSH [File 0 not found] >>>> client7 /boot lev 0 FLUSH [File 0 not found] >>>> client8 /boot lev 0 FLUSH [File 0 not found] >>>> client0 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /srv lev 0 FLUSH [File 0 not found] >>>> client9 /var lev 0 FLUSH [File 0 not found] >>>> server0 /boot lev 0 FLUSH [File 0 not found] >>>> client10 /boot lev 0 FLUSH [File 0 not found] >>>> client11 /boot lev 0 FLUSH [File 0 not found] >>>> client12 /boot lev 0 FLUSH [File 0 not found] >>>> >>>> They are VAULT attemp, not FLUSH, looking only at the first entry, it >>>> try to vault 'client2 /boot 0 20171024084159' which it expect to >>>> find on >>>> tape Server-01. It is an older dump. >>>> >>>> Do Server-01 is still there? Did it still contains the dump? >>>> >>> OK, I've done some further investigation by tweaking the labeling a >>> bit (which actually fixed a purely cosmetic issue we were having), >>> but I'm still seeing the same problem that prompted this thread, and >>> I can confirm that the dumps are where Amanda is trying to look for >>> them, it's just not seeing them for some reason. I hadn't thought >>> of this before, but could it have something to do with the virtual >>> tape library being auto-mounted over NFS on the backup server? >>> >> Austin, >> >> Can you try to see if amfetchdump can restore it? >> >> * amfetchdump CONFIG client2 /boot 20171024084159 >> > amfetchdump doesn't see it, and neither does amrecover, but the files > for the given parts are definitely there (I know for a fact that the > dump in question has exactly one part, and the file for that does > exist on the virtual tape mentioned in the log file). 
> > I'm probably not going to be able to check more on this today, but > I'll likely be checking if amrestore and amadmin find can see them. >
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:45, Austin S. Hemmelgarn wrote: On 2017-11-10 08:27, Jean-Louis Martineau wrote: On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote: On 2017-11-08 08:03, Jean-Louis Martineau wrote: On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >> Austin, >> >> It's hard to say something with only the error message. >> >> Can you post the amdump. and log..0 for the 2 >> backup set that fail. >> > I've attached the files (I would put them inline, but one of the sets > has over 100 DLE's, so the amdump file is huge, and the others are > still over 100k each, and I figured nobody want's to try and wad > through those in-line). > > The set1 and set2 files are for the two backup sets that show the > header mismatch error, and the set3 files are for the one that claims > failures in the dump summary. I looked at set3, the error in the 'DUMP SUMMARY' are related to the error in the 'FAILURE DUMP SUMMARY' client2 /boot lev 0 FLUSH [File 0 not found] client3 /boot lev 0 FLUSH [File 0 not found] client7 /boot lev 0 FLUSH [File 0 not found] client8 /boot lev 0 FLUSH [File 0 not found] client0 /boot lev 0 FLUSH [File 0 not found] client9 /boot lev 0 FLUSH [File 0 not found] client9 /srv lev 0 FLUSH [File 0 not found] client9 /var lev 0 FLUSH [File 0 not found] server0 /boot lev 0 FLUSH [File 0 not found] client10 /boot lev 0 FLUSH [File 0 not found] client11 /boot lev 0 FLUSH [File 0 not found] client12 /boot lev 0 FLUSH [File 0 not found] They are VAULT attemp, not FLUSH, looking only at the first entry, it try to vault 'client2 /boot 0 20171024084159' which it expect to find on tape Server-01. It is an older dump. Do Server-01 is still there? Did it still contains the dump? 
OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them, it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server? Austin, Can you try to see if amfetchdump can restore it? * amfetchdump CONFIG client2 /boot 20171024084159 At the moment, I'm re-testing things after tweaking some NFS parameters for the virtual tape library (apparently the FreeNAS server that's actually storing the data didn't have NFSv4 turned on, so it was mounted with NFSv3, which we've had issues with before on our network), so I can't exactly check immediately, but assuming the problem repeats, I'll do that first thing once the test dump is done. It looks like the combination of fixing the incorrect labeling in the config and switching to NFSv4 fixed this particular case.
Re: power down hard drives
On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down, then when needed, powered up and mounted again? I'm not talking about system hibernation; the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about a 5-10 second delay while the drive spun up, and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout; check the man page for hdparm, as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them).
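For reference, the "not exactly sensible" numbers mentioned above refer to the encoding hdparm's -S option uses (values 1-240 count in 5-second units, 241-251 in 30-minute units). A small sketch of turning a timeout in minutes into that value; the standby_value helper here is mine, not part of hdparm:

```shell
#!/bin/sh
# Encode a standby timeout (in minutes) into the value hdparm -S expects:
#   1..240   -> timeout = value * 5 seconds
#   241..251 -> timeout = (value - 240) * 30 minutes
standby_value() {
    mins=$1
    if [ "$mins" -le 20 ]; then
        # up to 20 minutes: 5-second units
        echo $(( mins * 60 / 5 ))
    else
        # longer timeouts: 30-minute units, rounded up
        echo $(( 240 + (mins + 29) / 30 ))
    fi
}

# e.g. spin a vtape drive down after 30 idle minutes:
# hdparm -S "$(standby_value 30)" /dev/sdX
```

So -S 120 means 10 minutes, while -S 242 means a full hour; check hdparm(8) before relying on a specific value, since some drives ignore or round these settings.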
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:27, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump?
OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server? Austin, Can you try to see if amfetchdump can restore it? * amfetchdump CONFIG client2 /boot 20171024084159 amfetchdump doesn't see it, and neither does amrecover, but the files for the given parts are definitely there (I know for a fact that the dump in question has exactly one part, and the file for that does exist on the virtual tape mentioned in the log file). I'm probably not going to be able to check more on this today, but I'll likely be checking whether amrestore and amadmin find can see them.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 10:00, Jean-Louis Martineau wrote: Austin, Can you try the attached patch, I think it could fix the set1 and set2 errors. Yes, but I won't be able to log in this weekend to revert it if it doesn't work, so I won't be able to test it until Monday. Am I correct in assuming that it only needs to be applied on the server and not the clients? On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >> Austin, >> >> It's hard to say something with only the error message. >> >> Can you post the amdump. and log..0 for the 2 >> backup set that fail. >> > I've attached the files (I would put them inline, but one of the sets > has over 100 DLE's, so the amdump file is huge, and the others are > still over 100k each, and I figured nobody want's to try and wad > through those in-line). > > The set1 and set2 files are for the two backup sets that show the > header mismatch error, and the set3 files are for the one that claims > failures in the dump summary.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:27, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump?
OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server? Austin, Can you try to see if amfetchdump can restore it? * amfetchdump CONFIG client2 /boot 20171024084159 At the moment, I'm re-testing things after tweaking some NFS parameters for the virtual tape library (apparently the FreeNAS server that actually stores the data didn't have NFSv4 turned on, so it was mounted with NFSv3, which we've had issues with before on our network), so I can't check immediately, but assuming the problem repeats, I'll do that first thing once the test dump is done.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-08 08:03, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump? OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason.
I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
Re: Odd non-fatal errors in amdump reports.
On 2017-11-08 08:03, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump? Hmm, looks like that's a leftover from changing our labeling format shortly after switching to this new configuration. I thought I purged all the stuff with the old label scheme, but I guess not. It somewhat surprises me that this doesn't give any kind of error indication in the e-mail report beyond the 'FAILED' line in the dump summary.
Re: amvault with dropbox
On 2017-11-07 13:36, Ned Danieley wrote: On Tue, Nov 07, 2017 at 01:29:34PM -0500, Austin S. Hemmelgarn wrote: OK, so you're talking about functionally permanent archiving instead of keeping old stuff around for a fixed multiple of the dump cycle. If that's the case, you may be better off pulling the dumps off the tapes using amfetchdump, and then uploading them from there. That use case could in theory be handled better with some extra code in Amanda, but I don't know how well the lack of deletion would be handled on Amanda's side. yeah, I need to upload monthly full dumps to dropbox and keep them forever. the monthly dumps are to vtapes, and I thought it would be neat if I could then just transfer the vtapes to dropbox using amvault. Strictly speaking, amvault doesn't transfer vtapes; it retapes the dumps on the vtapes to a new location. While this sounds like a somewhat pointless distinction, it's actually pretty significant, because it means you can use a different type of tape for your secondary storage, with almost every tapetype option set differently (which is extremely useful for multiple reasons). That's actually part of the reason that it's a preferred alternative to mirroring tapes with Amanda's RAIT device. The issue here, though, is the 'keep it forever' bit. If Amanda is given an automated tape changer (a library of vtapes is an automated changer), it assumes it can reuse the tapes as it sees fit. I think there's a config option that lets you change that, but once you do that, you need to keep adding tapes (or vtapes) to the library, which can get out of hand really quickly (especially if you don't plan ahead when deciding on how things will get labeled).
One option for this, though, if you can afford to use something other than Dropbox, would be to use the Amazon S3 support to store your data in Amazon Glacier storage (which is remarkably cheap, on the order of a few USD per TB per month), and enable versioning, so that when a 'tape' gets overwritten the old version is kept around forever. If you're interested in doing this, I can write up instructions for how to get things set up with Amazon. (We actually do something very similar for off-site backups where I work, just without Glacier or versioning, but those are easy to set up.)
Re: amvault with dropbox
On 2017-11-07 13:19, Ned Danieley wrote: On Tue, Nov 07, 2017 at 01:11:43PM -0500, Austin S. Hemmelgarn wrote: On 2017-11-07 11:55, Ned Danieley wrote: we use a dropbox business account to archive our data, and I was interested in trying to use amvault to transfer my amanda backups there. however, it seems that there is a fair amount of work that would have to be done to the code base to make that happen, work that is probably beyond my ability. are there any plans to include dropbox access in future versions? You can do this already without needing any new code. Just configure a virtual tape library inside a Dropbox synced directory, set that as a vaulting location, and recursively add the necessary read permissions to the directory after each amvault run. I guess that would work, although I'd have to set up selective sync so I could remove the files locally without removing them from dropbox. thanks for the suggestion; I'll give it a try. OK, so you're talking about functionally permanent archiving instead of keeping old stuff around for a fixed multiple of the dump cycle. If that's the case, you may be better off pulling the dumps off the tapes using amfetchdump, and then uploading them from there. That use case could in theory be handled better with some extra code in Amanda, but I don't know how well the lack of deletion would be handled on Amanda's side.
Re: amvault with dropbox
On 2017-11-07 11:55, Ned Danieley wrote: we use a dropbox business account to archive our data, and I was interested in trying to use amvault to transfer my amanda backups there. however, it seems that there is a fair amount of work that would have to be done to the code base to make that happen, work that is probably beyond my ability. are there any plans to include dropbox access in future versions? You can do this already without needing any new code. Just configure a virtual tape library inside a Dropbox synced directory, set that as a vaulting location, and recursively add the necessary read permissions to the directory after each amvault run.
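A minimal sketch of that suggestion as amanda.conf fragments; every path, name, slot count, and the tapetype below are placeholders I made up for illustration, not anything from the thread:

```
# hypothetical vtape library living inside a Dropbox-synced directory
define changer dropbox-vtl {
    tpchanger "chg-disk:/home/amandabackup/Dropbox/amanda-vtapes"
    property "num-slot" "16"
    property "auto-create-slot" "yes"
}

define storage dropbox {
    tpchanger "dropbox-vtl"
    tapepool "dropbox"
    tapetype "VTAPE"                 # placeholder; define to taste
    labelstr "^DROPBOX-[0-9][0-9]*$"
    autolabel "DROPBOX-%%%" any
}
```

The Dropbox client then syncs whatever amvault writes into that directory; the permission fix-up mentioned above would still have to happen outside Amanda (e.g. a chmod in a wrapper script).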
Re: Odd non-fatal errors in amdump reports.
On 2017-11-07 10:22, Jean-Louis Martineau wrote: Austin, It's hard to say anything with only the error message. Can you post the amdump. and log..0 files for the 2 backup sets that fail? Yes, though it may take me a while, since our policy is pretty strict about scrubbing hostnames and usernames from any internal files we make visible publicly. Just to clarify, it will end up being 3 total pairs of files: two from the backup sets that show the first issue I mentioned (the complaint about a header mismatch), and one from the backup set showing the second issue I mentioned (the apparently bogus dump failures listed in the dump summary). The tapedev of the aws changer can be written like: tapedev "chg-multi:s3:/slot-{0..127}" Thanks, I hadn't known that the configuration file syntax supported sequences like this; that makes it look so much nicer! Jean-Louis On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
> Where I work, we recently switched from manually triggered vaulting to
> automatic vaulting using the vault-storage, vault, and dump-selection
> options. Things appear to be working correctly, but we keep getting
> some odd non-fatal error messages (that might be bogus as well, since
> I've verified the dumps mentioned restore correctly) in the amdump
> e-mails. I've been trying to figure out these 'errors' for the past
> few weeks now, and I'm hoping someone on the list might have some advice
> (or better yet, might recognize the symptoms and know how to fix them).
>
> In our configuration, we have three different backup sets (each is on
> its own schedule).
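For comparison, the compact range form Jean-Louis suggests replaces the fully enumerated slot list from the original config; the changerfile path below is a placeholder:

```
# equivalent chg-multi definitions; per the note above, the config parser
# expands {0..127} the same way as the enumerated {0,1,2,...,127} list
define changer aws {
    tapedev "chg-multi:s3:/slot-{0..127}"
    changerfile "/etc/amanda/CONFIG/s3-changer"
}
```

The device-property lines from the original definition (S3 keys, bucket location, and so on) would carry over unchanged.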
Of these, two are consistently showing the following
> error in the amdump e-mail report (I've redacted hostnames and exact paths;
> the second path listed, though, is a parent directory of the first):
>
> taper: FATAL Header of dumpfile does not match command from driver 0 XXX /home/X 20171031074642 -- 0 XXX /home/XX 20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168
>
> For a given backup set, the particular hostname and paths are always the
> same, but the backup appears to get taped correctly, and restores
> correctly as well.
>
> With the third backup set, we're regularly seeing things like the
> following in the dump summary section, but no other visible error
> messages:
>
>                       DUMPER STATS                 TAPER STATS
> HOSTNAME DISK  L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
> XX       /boot 0       --                         FAILED
> XX       /boot 1       10      10     --    0:00  168.8    0:00   0.0
>
> In this case, the particular DLEs affected are always the same,
> and the first line that claims a failure always shows dump level
> zero, even when the backup is supposed to be at another level.
> Just like the other error, the affected dumps always restore
> correctly when tested, and get correctly vaulted as well. The
> affected DLEs are only on Linux systems, but it seems not to
> care what distro or amanda version is being used (it's affected
> Debian, Gentoo, and Fedora systems, and covers 5 different
> Amanda client versions), and they are invariably small (sub-gigabyte)
> filesystems, but I've not found any other commonality among them.
>
> All three sets use essentially the same amanda.conf file (the
> differences are literally just in when they get run), which
> I've attached in-line at the end of this e-mail with
> sensitive data redacted. The thing I find particularly odd is
> that this config is essentially identical to what I use on my
> personal systems, which are not exhibiting either problem.
> > 8< [amanda.conf elided; it is quoted in full in the original message]
Odd non-fatal errors in amdump reports.
Where I work, we recently switched from manually triggered vaulting to automatic vaulting using the vault-storage, vault, and dump-selection options. Things appear to be working correctly, but we keep getting some odd non-fatal error messages (that might be bogus as well, since I've verified the dumps mentioned restore correctly) in the amdump e-mails. I've been trying to figure out these 'errors' for the past few weeks now, and I'm hoping someone on the list might have some advice (or better yet, might recognize the symptoms and know how to fix them). In our configuration, we have three different backup sets (each is on its own schedule). Of these, two are consistently showing the following error in the amdump e-mail report (I've redacted hostnames and exact paths; the second path listed, though, is a parent directory of the first): taper: FATAL Header of dumpfile does not match command from driver 0 XXX /home/X 20171031074642 -- 0 XXX /home/XX 20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168 For a given backup set, the particular hostname and paths are always the same, but the backup appears to get taped correctly, and restores correctly as well. With the third backup set, we're regularly seeing things like the following in the dump summary section, but no other visible error messages:

                      DUMPER STATS                 TAPER STATS
HOSTNAME DISK  L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
XX       /boot 0       --                         FAILED
XX       /boot 1       10      10     --    0:00  168.8    0:00   0.0

In this case, the particular DLEs affected are always the same, and the first line that claims a failure always shows dump level zero, even when the backup is supposed to be at another level. Just like the other error, the affected dumps always restore correctly when tested, and get correctly vaulted as well.
The affected DLEs are only on Linux systems, but it seems not to care what distro or amanda version is being used (it's affected Debian, Gentoo, and Fedora systems, and covers 5 different Amanda client versions), and they are invariably small (sub-gigabyte) filesystems, but I've not found any other commonality among them. All three sets use essentially the same amanda.conf file (the differences are literally just in when they get run), which I've attached in-line at the end of this e-mail with sensitive data redacted. The thing I find particularly odd is that this config is essentially identical to what I use on my personal systems, which are not exhibiting either problem.

8<

org "X"
mailto "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit
displayunit "k"
netusage 800 Kbps
dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes
bumppercent 20
bumpdays 2
etimeout 900
dtimeout 1800
ctimeout 30
device_output_buffer_size 256M
compress-index no
flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes
runtapes 16

define changer vtl {
    tapedev "chg-disk:/net/XX/amanda/X"
    changerfile "/etc/amanda/X/changer"
    property "num-slot" "128"
    property "auto-create-slot" "yes"
}

define changer aws {
    tapedev "chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
    changerfile "/etc/amanda/X/s3-changer"
    device-property "S3_SSL" "YES"
    device-property "S3_ACCESS_KEY" ""
    device-property "S3_SECRET_KEY" ""
    device-property "S3_MULTI_PART_UPLOAD" "YES"
    device-property "CREATE_BUCKET" "NO"
    device-property "S3_BUCKET_LOCATION" "X"
    device-property "STORAGE_API" "AWS4"
}

define storage
local-vtl {
    tpchanger "vtl"
    tapepool "$r"
    tapetype "V64G"
    labelstr "^-[0-9][0-9]*$"
    autolabel "-%%%" any
    erase-on-full YES
    erase-on-failure YES
    vault cloud 0
}

define storage cloud {
    tpchanger "aws"
    tapepool "$r"
    tapetype "S3TAPE"
    labelstr "^Vault--[0-9][0-9]*$"
    autolabel "Vault--%%%" any
    erase-on-full YES
    erase-on-failure YES
Re: approaches to Amanda vaulting?
On 2017-10-24 12:28, Stefan G. Weichinger wrote: Am 2017-10-24 um 13:38 schrieb Austin S. Hemmelgarn: On 2017-10-22 13:38, Stefan G. Weichinger wrote: After or before I additionally can do something like: amvault myconf --dest-storage --latest-fulls archive correct? I think so, but I'm not 100% certain. oh ;-) An additional hurdle is that the customer wants to use WORM tapes for archive, so I should get that right at the first run to not waste any tapes. Perhaps create a temporary virtual tape library for testing that the archiving schedule works as expected? This is what I generally do when testing changes at work (although I usually do it using a copy of the main configuration, so that I don't confuse the planner for the production backups with half a dozen runs in one day). Sure, that would be good, but I don't have that much disk space available. I am currently trying to wrap my head around the tuning of these parameters (and understand their exact meaning by reading the man page): flush-threshold-dumped, flush-threshold-scheduled, taperflush. I had lev0 of all DLEs in the holding disk and both flush-threshold values at 400 -> I thought this would keep data for 4 tapes inside the disk, but no, some lev0 backups were flushed to primary storage already. Maybe I'll set up a VM with 2 vtape changers and play around there to learn and understand. Based on what you're saying you want, I think you want the following in your config:

flush-threshold-dumped 400
flush-threshold-scheduled 400
taperflush 400
autoflush yes

The first two control flushing during a run, while taperflush controls flushing at the end of a run. To get the flushing to actually happen, you then need autoflush set to yes (and amanda will complain if it's not set to yes while taperflush is more than zero).
Now, I'm not 100% certain that will work, as I've not done this type of thing myself (at work, we just use the holding disk as a cache so that we can finish dumps as quickly as possible without our (slow, parity-raid backed) persistent storage being the bottleneck, and at home I don't use it since I don't need parallelization and I don't have any disks that are faster than any others), but based on what I understand from the documentation, I'm pretty sure this should do it.
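As a reading aid for the parameters being discussed, here is my interpretation of the amanda.conf(5) man page (not something tested in this thread): all three values are percentages of a single tape's capacity, so 400 means roughly "four tapes' worth of data":

```
# values are percent of one tape's capacity (my reading of amanda.conf(5))
flush-threshold-dumped    400  # don't start taping until holding has 4 tapes' worth dumped
flush-threshold-scheduled 400  # ...or 4 tapes' worth dumped-plus-still-scheduled
taperflush                400  # at end of run, up to 4 tapes' worth may stay in holding
autoflush yes                  # allow later runs to flush the dumps left behind
```

If that reading is right, it would also explain Stefan's observation: with only the two flush-threshold values set to 400 but taperflush left low, dumps can still be flushed at the end of the run even though they were held back during it.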
Re: approaches to Amanda vaulting?
On 2017-10-22 13:38, Stefan G. Weichinger wrote: Am 2017-10-16 um 19:22 schrieb Stefan G. Weichinger: Am 2017-10-16 um 15:20 schrieb Jean-Louis Martineau: Amanda 3.5 can do everything you want just by running the amdump command. Using a holding disk:
* You configure two storages
* All dumps go to the holding disk
* All dumps are copied to each storage, not necessarily at the same time or in the same run
* The dumps stay in holding until they are copied to both storages
* You can tell amanda that everything must go to both storages, or only some DLEs full/incr
I now have set up a config like this:

define changer robot {
    tpchanger "chg-robot:/dev/sg3"
    property "tape-device" "0=tape:/dev/nst0"
    property "eject-before-unload" "no"
    property "use-slots" "1-8"
}

define tapetype LTO6 {
    #comment "Created by amtapetype; compression enabled"
    length 2442818848 kbytes  # about 2.4 TB (sgw)
    filemark 1806 kbytes
    speed 74006 kps
    blocksize 32 kbytes
    part_size 200G
}

define storage myconf {
    tapepool "myconf"
    tapetype "LTO6"
    tpchanger "robot"
    labelstr "^CMR[0-9][0-9]*$"
    autoflush yes
    # flush-threshold-dumped 100
    # flush-threshold-scheduled 100
    #
    # keep everything in the holding disk
    flush-threshold-dumped 400     # (or more)
    flush-threshold-scheduled 400  # (or more)
    taperflush 400
    runtapes 4
}

define storage archive {
    tapepool "archive"
    tapetype "LTO6"
    tpchanger "robot"
    labelstr "^ARC[0-9][0-9]*$"
    autoflush yes
    flush-threshold-dumped 100
    flush-threshold-scheduled 100
    runtapes 4
    dump-selection ALL FULL
}

storage "myconf"
maxdumpsize -1
amrecover_changer "robot"

8<

my goal: I have to create a set of archive tapes for that customer, every 3 months or so. With the above setup I now ran "amdump --no-taper myconf", which collected all dumps on the holding disk (I did an "amadmin myconf force *" beforehand to force FULLs now). As I understand it, I could now do a plain amflush, which should (a) write to the tapes of tapepool "myconf" and (b) leave the holding-disk tarballs where they are, right?
(I am not yet sure about that "400" above; I want to keep data for all 4 tapes in the holding disk now, and may reduce that to 100 for normal daily runs without "--no-taper" or so) After or before I additionally can do something like: amvault myconf --dest-storage --latest-fulls archive correct? I think so, but I'm not 100% certain. An additional hurdle is that the customer wants to use WORM tapes for archive, so I should get that right at the first run to not waste any tapes. Perhaps create a temporary virtual tape library for testing that the archiving schedule works as expected? This is what I generally do when testing changes at work (although I usually do it using a copy of the main configuration, so that I don't confuse the planner for the production backups with half a dozen runs in one day).
Re: approaches to Amanda vaulting?
On 2017-10-19 11:06, Jean-Louis Martineau wrote: On 19/10/17 08:48 AM, Austin S. Hemmelgarn wrote:
> On 2017-10-18 15:45, Stefan G. Weichinger wrote:
>> Am 2017-10-16 um 20:47 schrieb Austin S. Hemmelgarn:
>>
>>> While it's not official documentation, I've got a working
>>> configuration with Amanda 3.5.0 on my personal systems, using
>>> locally accessible storage for primary backups, and S3 for vaulting
>>> (though I vault everything; the local storage is for getting old
>>> files back, S3 is for disaster recovery). I've put a copy of the
>>> relevant config fragment at the end of this reply, with various
>>> private data replaced, and some bits that aren't really relevant
>>> (like labeling options) elided.
>>
>> A quick thank you at this point:
>>
>> thanks for providing this config plus explanations, I will try to set up
>> a similar config soon and take your example as a template.
>>
>> And maybe come back with some additional questions ;-)
>>
>> for example: what do you run as cronjobs, what do you do via manual
>> commands? amdump in cron, amvault now and then?
> Well, there are two options for how to handle it.
>
> Where I work, we use a very similar configuration to what I posted,
> and run amdump and amvault independently, both through cron (though we
> only vault full backups to S3, since we have a reasonably good level of
> trust in the reliability of our local storage). This gives very good
> control of exactly what and exactly when things get vaulted, and
> allows for scheduling vaulting separately from dumps (we prefer to
> only copy things out to S3 once a month and need to make sure the
> network isn't bogged down with backups during work hours, so this is a
> big plus for us).
The problem with the amvault command is that it does only what the command line specifies, which can be difficult to get right. If amvault fails, it's hard to find the correct arguments to vault what was not yet vaulted.
With wrong arguments, some dumps might never be vaulted, or some dumps might be vaulted multiple times (on different amvault invocations). To be entirely honest, I wouldn't exactly call `--latest-fulls`, `--fulls-only`, or `--incrs-only` hard to get right. It's only really tricky if you want to vault only subsets of the config. Add to that that it's pretty easy to see what got vaulted if you have e-mail set up right, and it really isn't too bad for most use cases. Since you want to vault all fulls, I would set 'vault' in the local storage and set 'dump-selection' in the cloud storage, but not set 'vault-storage'. That way the vaults are scheduled but are not executed, because vault-storage is not set. Amanda knows they must be vaulted. Every month, you can run: amdump CONF BADHOST -ovault-storage="cloud" to do the vaulting. We've actually been discussing migrating things to operate like I have them set up on my home systems (albeit only vaulting fulls), as the 'once a month' part of vaulting is largely a hold-over from our old (pre-Amanda) backup system, which did fulls on the first of the month and archived them off-site the day afterwards.
> On my home systems, I also use a similar config, but I instead have a
> 'vault' option specified in the 'local' storage block that points to
> the 'cloud' and says to vault immediately after dump generation (so the
> line is 'vault cloud 0'). With this setup, amdump will run the
> vaulting operation itself after finishing everything else for the dump
> (and you actually don't need the 'vault-storage' line at the end, I
> think), and you either end up vaulting everything, or have to limit
> things through the config with a 'dump-selection' line in your 'cloud'
> storage definition.
vault-storage is required, otherwise the vaults are not executed. Good to know; that could probably be better explained in the documentation.
Re: approaches to Amanda vaulting?
On 2017-10-18 15:45, Stefan G. Weichinger wrote:

On 2017-10-16 at 20:47, Austin S. Hemmelgarn wrote:

While it's not official documentation, I've got a working configuration with Amanda 3.5.0 on my personal systems, using locally accessible storage for primary backups, and S3 for vaulting (though I vault everything, the local storage is for getting old files back, S3 is for disaster recovery). I've put a copy of the relevant config fragment at the end of this reply, with various private data replaced, and some bits that aren't really relevant (like labeling options) elided.

A quick thank you at this point:

thanks for providing this config plus explanations, I will try to set up a similar config soon and take your example as a template.

And maybe come back with some additional questions ;-)

for example: what do you run as cronjobs, what do you do via manual commands? amdump in cron, amvault now and then?

Well, there's two options for how to handle it.

Where I work, we use a very similar configuration to what I posted, and run amdump and amvault independently, both through cron (though we only vault full backups to S3 since we have a reasonably good level of trust in the reliability of our local storage). This gives very good control of exactly what and exactly when things get vaulted, and allows for scheduling vaulting separately from dumps (we prefer to only copy things out to S3 once a month and need to make sure the network isn't bogged down with backups during work hours, so this is a big plus for us).

On my home systems, I also use a similar config, but I instead have a 'vault' option specified in the 'local' storage block that points to the 'cloud' and says to vault immediately after dump generation (so the line is 'vault cloud 0').
With this setup, amdump will run the vaulting operation itself after finishing everything else for the dump (and you actually don't need the 'vault-storage' line at the end, I think), and you either end up vaulting everything, or have to limit things through the config with a 'dump-selection' line in your 'cloud' storage definition.
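The "amdump and amvault independently, both through cron" setup described above might be scheduled something like this. The config name "daily", the paths, the user, and the times are all assumptions, and the flags for selecting a destination vary between amvault versions, so check amvault(8) before copying this.

8<---
# /etc/cron.d/amanda (hypothetical)

# nightly dumps at 01:00
0 1 * * *    amandabackup  /usr/sbin/amdump daily

# vault the most recent full dumps on the 1st of each month;
# destination selection depends on your amvault version, see amvault(8)
0 6 1 * *    amandabackup  /usr/sbin/amvault --latest-fulls daily
8<---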
Re: What are the correct permissions for lib binaries for amanda 3.5
On 2017-10-16 14:58, Jon LaBadie wrote:

On Mon, Oct 16, 2017 at 02:05:05PM -0400, Jean-Louis Martineau wrote:

On 16/10/17 01:48 PM, Jon LaBadie wrote:

On Mon, Oct 16, 2017 at 08:12:43AM -0400, Jean-Louis Martineau wrote:

On 14/10/17 12:12 PM, Jose M Calhariz wrote:

On Sat, Oct 14, 2017 at 11:36:09AM -0400, Jean-Louis Martineau wrote:

On 14/10/17 11:14 AM, Jose M Calhariz wrote:

-rwsr-xr-- 1 root backup 10232 Oct 13 17:23 ambind

ambind must not be readable by all:

-rwsr-x--- 1 root backup 10232 Oct 13 17:23 ambind

Thank you for the quick reply. May I ask why "ambind must not be readable by all"?

All suid programs in amanda are always installed like this.

Why are all amanda suid programs installed this way?

It's from before I was born — maybe not, but before I started to work on the amanda software. It's a kind of security by hiding: it's harder to find a vulnerability in a suid binary if you can't read it.

I guessed it was security by obscurity.

It is, but it's common-practice security by obscurity dating back almost to SVR4.

It makes sense when you build it yourself, but not when doing a package where everyone can read the files in the package. For the same reason, I felt that would be "false" security. The group probably does not need the 'r' bit either.

Do you think amcheck should not check whether the suid binaries are readable by all?

My gut reaction is that such a check is superfluous. But I'm not a security expert. Do we have any security specialists (or others) on the list who would care to comment?

I won't claim to be a security expert, but I've been a sysadmin for more than a decade and can tell you two things based on my own experience:

1. Amanda is the only software I've ever encountered that does this kind of check, or more accurately, it's the only software I've ever encountered where this type of check is a fatal error.
Some other software will ignore files if their ownership is wrong, but it's treated as a warning, and it's only configuration files (stuff like ~/.ssh/authorized_keys for example).

2. The checks are a serious pain in the arse, mostly because the error messages are so vague (OK, so file XYZ has the wrong permissions; does that mean the directory it's in has the wrong permissions, or the file itself, and which permissions are wrong?). This particular check isn't as bad in that respect as, for example, the ones checking /etc/amanda-security.conf, but it's still a pain to deal with.

Aside from that though, it's a case where the benefit to security depends on things that just aren't true for most systems amanda is likely to run on, namely that an attacker is:

1. Unable to determine what type of system you're running on. (This is a patently false assumption on any publicly available distro, as well as most paid ones like OEL, RHEL, and SLES.)

2. Unable to access the packages directly.

In most cases, both are false. There are a few odd cases like source-based distros (Gentoo for example) where the package gets built locally, but even then the builds are pretty reproducible, and the code for Amanda itself is trivially available for review through other sources.

In a way, it's kind of like making the contents of /boot inaccessible to regular users, but not preventing `uname -v` and `uname -r` from being executed by them. It makes things a bit more complicated for attackers, but in a rather trivial way that doesn't provide anything but a false sense of security.

Does amcheck do any checks for amanda programs that are [sg]uid that should not be?

I'm not sure, though it does check ownership on many files, and I think it checks that things that are supposed to be suid or sgid are (I'm pretty sure it complains if amgtar or amstar aren't suid root).
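The fix Jean-Louis shows above (-rwsr-x--- instead of -rwsr-xr--) corresponds to numeric mode 4750. As a concrete illustration (the install path is an assumption; ambind's real location depends on your build's install prefix):

```shell
# On the real binary you would run, as root (path is hypothetical):
#   chown root:backup /usr/libexec/amanda/ambind
#   chmod 4750 /usr/libexec/amanda/ambind     # -rwsr-x---

# Demonstration of the mode bits on a scratch file:
f=$(mktemp)
chmod 4750 "$f"       # 4 = setuid bit, 750 = rwxr-x--- (no world read)
stat -c '%a' "$f"     # prints: 4750
rm -f "$f"
```

The setuid bit (4) is what makes the program run as root; dropping the final world-read bit is the "security by hiding" being discussed.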
Re: approaches to Amanda vaulting?
On 2017-10-16 13:22, Stefan G. Weichinger wrote:

> On 2017-10-16 at 15:20, Jean-Louis Martineau wrote:
>> Amanda 3.5 can do everything you want only by running the amdump command.
>>
>> Using a holding disk:
>>
>> * You configure two storages
>> * All dumps go to the holding disk
>> * All dumps are copied to each storage, not necessarily at the same time or in the same run.
>> * The dumps stay in holding until they are copied to both storages
>> * You can tell amanda that everything must go to both storages, or only some DLEs / fulls / incrementals
>
> So it is possible to set up a mix of "normal" daily backups with incrementals/fulls and "archive"/vault backups with only the full backups of a specific day?
>
> I have requests to do so for a customer; until now we used amanda-3.3.9 and 2 configs sharing most of the config and disklist ...
>
> Nathan, the OP of this thread, and others (including me) would like to see actual examples of configuration, a howto or something.
>
> The man page https://wiki.zmanda.com/man/amvault.8.html is a bit minimal ...
>
> Is there anything additional to that manpage and maybe:
>
> http://wiki.zmanda.com/index.php/How_To:Copy_Data_from_Volume_to_Volume
>
> ?

While it's not official documentation, I've got a working configuration with Amanda 3.5.0 on my personal systems, using locally accessible storage for primary backups, and S3 for vaulting (though I vault everything, the local storage is for getting old files back, S3 is for disaster recovery). I've put a copy of the relevant config fragment at the end of this reply, with various private data replaced, and some bits that aren't really relevant (like labeling options) elided. For this to work reliably, you need to define a holding disk (although it can be on the same storage as the local vtape library).
I personally start flushing from the holding disk the moment any dump completes, as all the data fits on one tape and the S3 upload takes longer than creating the backups in the first place, but it should work just fine if you buffer things on the holding disk instead. The given S3 configuration assumes you have already created the destination bucket (I pre-create them since I do lifecycle stuff and cross-region replication, both of which are easier to set up if you create the bucket by hand). I also use a dedicated IAM user for the S3 side of things for both security and accounting reasons, but that shouldn't impact things. Additionally, I've found that the S3 uploads work much more reliably if you set a reasonable part size and enable part caching; 1 GB seems to give a good balance between performance and reliability.

8<---
define tapetype vtape {
    length 16 GB
    part-size 1 GB
    part-cache-type memory
}

define changer local-vtl {
    tapedev "chg-disk:/path/to/local/vtapes"
}

define changer aws {
    tapedev "chg-multi:s3:example-bucket/slot{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}"
    device-property "S3_SSL" "YES"
    device-property "S3_ACCESS_KEY" "IAM_ACCESS_KEY"
    device-property "S3_SECRET_KEY" "IAM_SECRET_KEY"
    device-property "S3_MULTI_PART_UPLOAD" "YES"
    device-property "CREATE_BUCKET" "NO"
    device-property "S3_BUCKET_LOCATION" "us-east-1"
    device-property "STORAGE_API" "AWS4"
}

define storage local {
    tapepool "local"
    tapetype "vtape"
    tpchanger "local-vtl"
}

define storage cloud {
    tapepool "s3"
    tapetype "vtape"
    tpchanger "aws"
}

storage "local"
vault-storage "cloud"
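The fragment above relies on a holding disk but doesn't show one. A minimal definition might look like this (the directory, size, and chunk size are assumptions; see the holdingdisk section of amanda.conf(5)):

8<---
holdingdisk hd1 {
    comment "buffer for dumps before flushing/vaulting"
    directory "/var/amanda/holding"
    use 50 GB
    chunksize 1 GB
}
8<---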