Re: Only increasing incrementals

2018-12-13 Thread Austin S. Hemmelgarn

On 2018-12-12 19:11, Debra S Baddorf wrote:

Oh, that’s right — Chris DID tell us what he was trying to do,  last week:
(in case it helps with any further answers)


OK, given that what you want appears to be frequent snapshots, Amanda is 
almost certainly _not_ the correct tool for this job for three reasons:


* It cannot do atomic snapshots of the filesystem state, so either you 
need to freeze all write I/O to the DLE, or you won't have a coherent copy 
of the state of the DLE from when you ran the backup.  This doesn't 
matter most of the time for regular backup usage, because you just run 
Amanda during off-hours when nobody's doing anything and the system is 
idle, but for this it's going to be a potential issue.


* It takes a _lot_ of system resources to do a backup with Amanda.  This 
is mitigated by your proposed approach of doing constantly increasing 
incremental levels, but even for that Amanda has to call `stat()` on 
_everything_ in the DLE.


* It's not trivial (as you have found out) to get this type of thing to 
work reliably.


I would suggest looking at the following alternative approaches:

* Use ZFS, BTRFS, or another filesystem that supports native 
snapshotting, and just do snapshots regularly.  This is likely to be 
your best approach.  In some cases, depending on the platform and 
filesystem, you may not even need to do anything (for example, NILFS2 on 
Linux has implicit snapshots built in because it's a log structured 
filesystem).
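
For illustration, a minimal sketch of the snapshot route, assuming a ZFS 
dataset named tank/data and a BTRFS subvolume mounted at /data (both names 
are placeholders); run it from cron or a systemd timer as often as you need 
point-in-time copies:

# ZFS: create a timestamped snapshot of the dataset
zfs snapshot tank/data@$(date +%Y%m%d-%H%M%S)

# BTRFS: create a read-only snapshot of the subvolume
btrfs subvolume snapshot -r /data /data/.snapshots/$(date +%Y%m%d-%H%M%S)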


* Store all the data on a NAS device that can do snapshots (for example, 
something running FreeNAS), and have it do regular snapshots.  This 
largely reduces to the above, just indirected over the network.


* Use a filesystem that supports automatic native file versioning.  The 
classic example is Files-11 from OpenVMS.  Other options for this 
include GitFS [1], CopyFS [2], and Plan 9's Fossil filesystem.


* Store all the data on a NAS device that does automatic native file 
versioning.


* If all else fails, you can technically do this with Amanda by using 
`amadmin force` to force the level 0 dump, and `amadmin force-bump` for 
each backup _after the second_ (the first backup after a level 0 will 
always be a level 1) to get the increasing incrementals.
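
As a rough sketch of that last option (the configuration name "daily" and 
the DLE "client.example.com /data" are placeholders):

amadmin daily force client.example.com /data       # next run of this DLE is a level 0
amdump daily                                       # level 0
amdump daily                                       # level 1 (first incremental after a level 0)
amadmin daily force-bump client.example.com /data  # request a bump before each later run
amdump daily                                       # level 2, and so on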


[1] https://www.presslabs.com/code/gitfs/
[2] https://boklm.eu/copyfs/




On Dec 7, 2018, at 12:04 PM, Chris Miller  wrote:

Hi Folks,

I'm about to start a project during which I want to be able to:
• Request a backup at any moment and have that backup be either an incremental 
backup (Level N+1), meaning everything that has changed since the last backup (Level N), 
or a differential backup, meaning everything that has changed since the last full backup 
(Level 1). The second provision, "differential backup", is pretty 
straightforward, but I have no idea how to configure a constantly increasing dump level.
• The first backup of the day, meaning the first backup after midnight, 
will be a full filesystem backup.

Discussion on point 1:
The provision is for capturing changes that occur during a given period of time, and not 
so much for "backup" per se, so AMANDA may not be the best tool, but it is what 
I have, so I'm trying to make it fit. I know how to request a backup, so that's not my 
problem, but I don't know how to force a given level. In particular, I don't know how to 
force a Level N+1 backup. I could replace the Level N+1 requirement with a forced Level 
1, run my experiment, and force a level 2, and this would meet my requirement of 
capturing all the changes during a particular interval. But, again, that requires forcing 
AMANDA to take direction about backup levels and I don't know how to do that.

Before anybody reminds me that this is why god invented git, I would like to add, that 
the scope of git is typically only known parts of the project, and I want to capture log 
files and other files that are sometimes created in temporary locations with temporary 
names, which are not known a priori and therefore can't be "managed" with git.

Discussion on point 2:
The "first backup of the day", will run as a cron job, but it must be a level 
0, full filesystem backup so no work for the day is lost. It is more forcing AMANDA to 
take direction, I don't know exactly how to do this.

I don't think I like the idea of forcing AMANDA (#MeToo) to do things, but I'm 
not above payment in kind. (-:

Thanks for the help,


Re: Only increasing incrementals

2018-12-12 Thread Austin S. Hemmelgarn

On 2018-12-12 02:18, Olivier wrote:

Nathan Stratton Treadway  writes:


On Thu, Nov 22, 2018 at 11:18:25 +0700, Olivier wrote:

Hello,

I am wondering if there is a way to define a DLE that would allow
incrementals but only with increasing levels:

- full (0)
- incremental 1
- incremental 2
- incremental 3
- etc.

But never: 0, 1, 1, 1, 2

Each back-up level must be above the previous one or be a full back-up.


I am not sure what you are trying to accomplish,


I am trying to backup something that can only have incremental with
increasing levels: it cannot do two level 1 in a row, levels must be 1,
then 2, then 3, etc. (think some successive snapshots).

According to amanda.conf(5) man page:

bumpdays int
Default: 2 days. To insure redundancy in the dumps, Amanda keeps
filesystems at the same incremental level for at least bumpdays
days, even if the other bump threshold criteria are met.

I want to absolutely cancel that feature: each incremental must have a
level greater than the previous dump, and an incremental level cannot be
bumped (only level 0 can be bumped).
OK, I"m actually curious what your exact reasoning for requiring this 
is, because I'm seeing exactly zero circumstances where this makes sense 
at all, and can think of multiple ways it's a bad thing (for example, 
losing your level one incremental makes all of your backups for that 
cycle useless).


Re: Dumping and taping in parallel

2018-11-28 Thread Austin S. Hemmelgarn

On 2018-11-28 13:58, Chris Nighswonger wrote:
On Wed, Nov 28, 2018 at 11:17 AM Austin S. Hemmelgarn wrote:


Based on your configuration, your tapes are configured to store just
short of 800GB of data.

The relevant lines then are these two:

flush-threshold-scheduled 50
flush-threshold-dumped 50


I misunderstood the man pages there and for some reason thought that 
volume referred to the holding disk. Probably because I was reading way 
too fast.


In your case, I'd suggest figuring out the average amount of data you
dump each run, and then configuring things to start flushing when about
half that much data has been dumped.  That will still have the taping
run in parallel with the dumping, but will give you enough of a buffer
that the taper should never have to wait for dumps to finish.


So over the last 13 runs, the:

-- smallest volume size has been 152G (19% tape capacity)
-- average volume size has been 254G (32% tape capacity)
-- largest volume size has been 612G (76% tape capacity)

So do the following values look "reasonable" based on those numbers:

flush-threshold-scheduled 25
flush-threshold-dumped 0

That should target the larger runs, which are the ones that tend to lap 
into the next business day.
Probably.  The extent of the experimentation I've done with these is 
determining for certain that I got no performance benefit from anything other 
than taping backups as they finished (all of my setups use vtapes on fast 
storage, so there's no benefit to me in delaying taping).


Re: Dumping and taping in parallel

2018-11-28 Thread Austin S. Hemmelgarn

On 2018-11-28 10:58, Chris Nighswonger wrote:
On Wed, Nov 28, 2018 at 10:49 AM Austin S. Hemmelgarn wrote:


On 2018-11-28 09:53, Stefan G. Weichinger wrote:
 > On 2018-11-28 at 15:47, Chris Nighswonger wrote:
 >> So why won't amanda dump and tape at the same time?
 >
 > It does normally, that is what the holding disk is for.
Really?  I was under the impression that it was for making sure you can
finish dumps if something goes wrong with taping, and cache dumps so
they can be written to tape in one pass.  Without a holding disk,
Amanda
dumps straight to tape, which is technically dumping and taping in
parallel.
 >
 > More details might lead to better suggestions.
 >
 > Show your amanda.conf etc
Indeed, though I suspect it's something regarding the flushing
configuration.


inparallel 10
maxdumps 1
# (equivalent to one Tbit/s).
netusage 1073741824
dumporder "STSTStstst"
dumpcycle 5 days
runspercycle 5
tapecycle 13 tapes
runtapes 1
flush-threshold-scheduled 50
flush-threshold-dumped 50
bumpsize 10 Mbytes
bumppercent 0
bumpmult 1.5
bumpdays 2
ctimeout 60
dtimeout 1800
etimeout 300
dumpuser "backup"
tapedev "Quantum-Superloader3-LTO-V4"
autolabel "$c-$b" EMPTY
labelstr "campus-.*"
tapetype LTO-4
logdir "/var/backups/campus/log"
infofile "/var/backups/campus/curinfo"
indexdir "/var/backups/campus/index"
tapelist "/var/backups/campus/tapelist"
autoflush all

holdingdisk hd1 {
     comment "Local striped raid array"
     directory "/storage/campus"
     use 0 Gb
     chunksize 1 Gb
}

define changer Quantum-Superloader3-LTO-V4 {
     tapedev "chg-robot:/dev/sg3"
     property "use-slots" "1-13"
     property "tape-device" "0=tape:/dev/nst0"
     device-property "LEOM" "TRUE"
}

define tapetype LTO-4 {
     comment "Created by amtapetype; compression enabled"
     length 794405408 kbytes
     filemark 1385 kbytes
     speed 77291 kps
     blocksize 512 kbytes
}

Based on your configuration, your tapes are configured to store just 
short of 800GB of data.


The relevant lines then are these two:

flush-threshold-scheduled 50
flush-threshold-dumped 50

The first one tells Amanda to not try flushing anything early if you 
aren't using at least half a tape based on dump size estimates, and the 
second one says that at least half a tape's worth of data must already 
be dumped before flushing will start.  Together, this means Amanda won't 
flush anything to tape until all dumps are done unless you're dumping 
more than half a tape's worth of data each run.


If you set those both to zero, Amanda will start flushing dumps to tape 
as they finish.  Doing so has two disadvantages for you because you're 
using real tapes and not vtapes:


* You can't have Amanda intelligently pack the dumps onto the tape. 
This probably doesn't matter as you appear to have things configured so 
that each run only uses one tape and you haven't explicitly defined a 
`tapealgo` (the default `tapealgo` is a simple dumb FIFO queue, so it 
behaves the same as immediately flushing dumps as they finish).
* You run the risk of having to stop and restart the tape drive multiple 
times while writing dumps.  Put simply, by flushing at the end like 
things are currently, you can guarantee 100% utilization of the tape 
drive while flushing dumps.  If you flush them as they're done, the 
taper will almost certainly have to wait for some dumps to finish after 
it initially starts writing data.


In your case, I'd suggest figuring out the average amount of data you 
dump each run, and then configuring things to start flushing when about 
half that much data has been dumped.  That will still have the taping 
run in parallel with the dumping, but will give you enough of a buffer 
that the taper should never have to wait for dumps to finish.
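
As a hedged sketch of that suggestion using the numbers above (roughly 
254GB dumped per run on average against ~794GB tapes, so half the average 
is about 16% of a tape):

flush-threshold-scheduled 16
flush-threshold-dumped 16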


Re: Dumping and taping in parallel

2018-11-28 Thread Austin S. Hemmelgarn

On 2018-11-28 09:53, Stefan G. Weichinger wrote:

On 2018-11-28 at 15:47, Chris Nighswonger wrote:

So why won't amanda dump and tape at the same time?


It does normally, that is what the holding disk is for.
Really?  I was under the impression that it was for making sure you can 
finish dumps if something goes wrong with taping, and cache dumps so 
they can be written to tape in one pass.  Without a holding disk, Amanda 
dumps straight to tape, which is technically dumping and taping in parallel.


More details might lead to better suggestions.

Show your amanda.conf etc
Indeed, though I suspect it's something regarding the flushing 
configuration.


Re: Another dumper question

2018-11-26 Thread Austin S. Hemmelgarn

On 2018-11-26 15:13, Chris Nighswonger wrote:

On Mon, Nov 26, 2018 at 2:32 PM Nathan Stratton Treadway
 wrote:


On Mon, Nov 26, 2018 at 13:56:52 -0500, Austin S. Hemmelgarn wrote:

On 2018-11-26 13:34, Chris Nighswonger wrote:
The other possibility that comes to mind is that your bandwidth
settings are making Amanda decide to limit to one dumper at a time.


Chris, this is certainly the first thing to look at: note in your
amstatus output the line "network free kps: 0":



9 dumpers idle  : 0
taper status: Idle
taper qlen: 1
network free kps: 0
holding space   : 436635431k ( 50.26%)


Hmm... I missed that completely. I'll set it arbitrarily high as
Austin suggested and test it overnight.

Don't feel bad, it's not something that gets actively used by a lot of 
people, so most people don't really think about it.  If used right 
though, it provides the rather neat ability to have Amanda limit its 
network utilization while running backups, which is really helpful if 
you have to run backups during production hours for some reason.


Re: Another dumper question

2018-11-26 Thread Austin S. Hemmelgarn

On 2018-11-26 13:34, Chris Nighswonger wrote:

So in one particular configuration I have the following lines:

inparallel 10
dumporder "STSTSTSTST"

I would assume that amanda would spawn 10 dumpers in parallel and
execute them giving priority to largest size and largest time
alternating. I would assume that amanda would do some sort of sorting
of the DLEs based on size and time, set them in descending order, and
then run the first 10 based on the list, thereby utilizing all 10
permitted dumpers in parallel.

However, based on the amstatus excerpt below, it looks like amanda
simply starts with the largest size and runs the DLEs one at a time,
not making efficient use of parallel dumpers at all. This has the
unhappy results at times of causing amdump to be running when the next
backup is executed.

I have changed the dumporder to STSTStstst for tonight's run to see if
that makes any  difference. But I don't have much hope it will.

Any thoughts?
Is this all for one host?  If so, that's probably your issue.  By 
default, Amanda will only run at most one DLE per host at a time.  You 
can change this in the dump settings, but I forget what the exact 
configuration parameter is.


The other possibility that comes to mind is that your bandwidth settings 
are making Amanda decide to limit to one dumper at a time.  You can 
easily test that by just setting the `netusage` parameter to an absurdly 
large value like 1073741824 (equivalent to one Tbit/s).


Kind regards,
Chris




 From Mon Nov 26 01:00:01 EST 2018

1   4054117k waiting for dumping
1  6671k waiting for dumping
1   222k waiting for dumping
1  2568k waiting for dumping
1  6846k waiting for dumping
1125447k waiting for dumping
1 91372k waiting for dumping
192k waiting for dumping
132k waiting for dumping
132k waiting for dumping
132k waiting for dumping
132k waiting for dumping
1290840k waiting for dumping
1 76601k waiting for dumping
186k waiting for dumping
1 71414k waiting for dumping
0  44184811k waiting for dumping
1   281k waiting for dumping
1  6981k waiting for dumping
150k waiting for dumping
1 86968k waiting for dumping
1 81649k waiting for dumping
1359952k waiting for dumping
0 198961004k dumping 159842848k ( 80.34%) (7:23:39)
1 73966k waiting for dumping
1821398k waiting for dumping
1674198k waiting for dumping
0 233106841k dump done (7:23:37), waiting for writing to tape
132k waiting for dumping
132k waiting for dumping
1166876k waiting for dumping
132k waiting for dumping
1170895k waiting for dumping
1162817k waiting for dumping
0 failed: planner: [Request to client failed: Connection timed out]
132k waiting for dumping
132k waiting for dumping
053k waiting for dumping
0  77134628k waiting for dumping
1  2911k waiting for dumping
136k waiting for dumping
132k waiting for dumping
1 84935k waiting for dumping

SUMMARY  part  real  estimated
size   size
partition   :  43
estimated   :  42559069311k
flush   :   0 0k
failed  :   10k   (  0.00%)
wait for dumping:  40128740001k   ( 23.03%)
dumping to tape :   00k   (  0.00%)
dumping :   1 159842848k 198961004k ( 80.34%) ( 28.59%)
dumped  :   1 233106841k 231368306k (100.75%) ( 41.70%)
wait for writing:   1 233106841k 231368306k (100.75%) ( 41.70%)
wait to flush   :   0 0k 0k (100.00%) (  0.00%)
writing to tape :   0 0k 0k (  0.00%) (  0.00%)
failed to tape  :   0 0k 0k (  0.00%) (  0.00%)
taped   :   0 0k 0k (  0.00%) (  0.00%)
9 dumpers idle  : 0
taper status: Idle
taper qlen: 1
network free kps: 0
holding space   : 436635431k ( 50.26%)
chunker0 busy   :  6:17:03  ( 98.28%)
  dumper0 busy   :  6:17:03  ( 98.28%)
  0 dumpers busy :  0:06:34  (  1.72%)   0:  0:06:34  (100.00%)
  1 dumper busy  :  6:17:03  ( 98.28%)   0:  6:17:03  (100.00%)



Re: Flushing the Holding Disk

2018-11-16 Thread Austin S. Hemmelgarn

On 2018-11-16 12:27, Chris Miller wrote:

Hi Folks,

I'm unclear on the timing of the flush from holding disk to vtape. 
Suppose I run two backup jobs,and each uses the holding disk. When will 
the second job start? Obviously, after the client has sent everything... 
Before the holding disk flush starts, or after the holding disk flush 
has completed?
If by 'jobs' you mean 'amanda configurations', the second one starts 
when you start it.  Note that `amdump` does not return until everything 
is finished dumping and optionally taping if anything would be taped, so 
you can literally just run each one sequentially in a shell script and 
they won't run in parallel.
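
A minimal sketch of that ("config-a" and "config-b" are placeholder 
configuration names):

#!/bin/sh
# amdump blocks until dumping (and any taping) for a configuration is
# finished, so these two runs never overlap.
amdump config-a
amdump config-b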


If by 'jobs' you mean DLE's, they run as concurrently as you tell Amanda 
to run them.  If you've got things serialized (`inparallel` is set to 1 
in your config), then the next DLE will start dumping once the previous 
one is finished dumping to the holding disk.  Otherwise, however many 
you've said can run in parallel run (within per-host limits), and DLE's 
start when the previous one in sequence for that dumper finishes. 
Taping can (by default) run in parallel with dumping if you're using a 
holding disk, which is generally a good thing, though you can also 
easily configure it to wait for some amount of data to be buffered on 
the holding disk before it starts taping.


Is there any way to defer the holding disk flush until all backup jobs 
for a given night have completed?
Generically, set `autoflush no` in each configuration, and then run 
`amflush` for each configuration once all the dumps are done.
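
A hedged sketch of that approach, again with placeholder configuration 
names and assuming `autoflush no` is set in both amanda.conf files:

amdump config-a         # dumps land on the holding disk, nothing is taped yet
amdump config-b
amflush -b -f config-a  # -b: don't prompt, -f: run in the foreground
amflush -b -f config-b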


However, unless you've got an odd arrangement where every system 
saturates the network link while actually dumping and you are sharing a 
single link on the Amanda server for both dumping and taping, this 
actually probably won't do anything for your performance.  You can 
easily configure amanda to flush backups from each DLE as soon as they 
are done, and it will wait to exit until everything is actually flushed.


Building from that, if you just want to ensure the `amdump` instances 
don't run in parallel, just use a tool to fire them off sequentially in 
the foreground.  Stuff like Ansible is great for this (especially 
because you can easily conditionally back up your index and tapelist 
when the dump finishes).  As long as the next `amdump` command isn't 
started until the previous one returns, you won't have to worry about 
them fighting each other for bandwidth.


Re: Does anyone know how to make an amadmin $config estimate work for new dle's?

2018-11-16 Thread Austin S. Hemmelgarn

On 2018-11-15 18:18, Gene Heskett wrote:

On Thursday 15 November 2018 14:17:29 Austin S. Hemmelgarn wrote:


On 2018-11-15 13:36, Gene Heskett wrote:

On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:

On 2018-11-15 11:53, Gene Heskett wrote:

On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:

On 2018-11-15 06:16, Gene Heskett wrote:

I ask because after last nights run it showed one huge and 3
teeny level 0's for the 4 new dle's.  So I just re-adjusted the
locations of some categories and broke the big one up into 2
pieces. "./[A-P]*" and ./[Q-Z]*", so the next run will have 5
new dle's.

But an estimate does not show the new names that results in.
I've even took the estimate assignment calcsize back out of the
global dumptype, which ack the manpage, forces the estimates to
be derived from a dummy run of tar, didn't help.

Clues? Having this info from an estimate query might take a
couple hours, but it sure would be helpful when redesigning
one's dle's.

I'm fairly certain you can't, because it specifically shows server-side


estimates, which have no data to work from if there has never
been a dump run for the DLE.


Even if you told it to use tar for the estimate phase? That has
enough legs to be called a bug. IMO anyway.


As mentioned in one of my other responses, I can kind of see the
value in this not bothering the client systems.  Keep in mind that
server estimates cost nothing on the client, while calcsize or
client estimates may use a significant amount of resources.


My default has been calcsize for three or 4 years, changed because
tar was changed & was screwing up the estimates. I can remember 15+
years ago when I was using real tar estimates, on a much smaller
machine, and it could come within 50 megabytes of filling a DDS-2
tape (4 GB compressed) for weeks at a time. So that part of amanda
worked a lot better than it does today. And its slowly gone to the
dogs as my system grew in complexity.  And went in a handbasket when
I had to change to calcsize during the tar churn.


I've not been using AMANDA anywhere near as long as you have, but I've
actually not seen any issues with accuracy of 'estimate client' mode
estimates with current versions of GNU tar, except when the estimate
ran while data in the DLE was being modified (and in that case, it
makes sense that it would be bogus).  I generally don't 'estimate
client' on my own systems though because it consistently takes far
longer than 'estimate calcsize', and I'm not picky about the estimates
being perfect.


In this case, I do think the documentation should be a bit clearer,


Yes, but who is to rewrite it?  He should know a heck of a lot more
about the amanda innards than I do, even after 2 decades, and better
defined words here and there too. diskdevice is a very poor substitute
for the far more common slanguage of "/path/to/"


and it would be useful to be able to get regular (calcsize and/or
client) estimates on-demand, but I do think that the default is
reasonably sane.


It may well be sane, we'll see how it works in the morning. AIUI,
calcsize runs only on old history, so that should not impose a load
on the client, even when the client is the server itself.


Unless I'm mistaken:

* 'estimate server' runs only on historical data, and doesn't even
talk to the client systems.  It's good at limiting the impact the
estimate has on the client, but reliably gives bogus estimates if your
DLEs don't show consistent behavior (that is, each backup of a given
level is roughly the same size as every other backup at that level).
* 'estimate client' relies on the backup program being used to give it
info about how big it will be.  It gives estimates that are close to
100% accurate, but currently essentially requires running the backup
process twice (once for the estimate, once for the actual backup) and
imposes a non-negligible amount of load on the client.


That depends on the client's instant duties. I have backed up a milling
machine while it was running a 90-line program, 3 days to finish while
sharpening a saw blade, with no apparent interaction, on a dual core atom
powered box. One core was locked away for LCNC (isolcpus at work), the
other was free to do the backup client. Didn't bother it a bit. :)


* 'estimate calcsize' does something kind of in-between.  AIUI, it
looks at some historical data, and also looks at the on-disk size of
the data,


That would take time to access the dle's, and the answer is effectively
instant, ergo it is not questioning the client(s), it has to be working
only from the history in its own logs.
Except that it actually runs on the client systems.  I've actually 
looked at this, the calcsize program is running on the clients and not 
the server.  It may be looking at the logs there, _but_ it's still 
running on the client.  It may also be _really_ fast in your setup, but 
that doesn't inherently mean it's running locally (Amanda is smart 
e

Re: Does anyone know how to make an amadmin $config estimate work for new dle's?

2018-11-15 Thread Austin S. Hemmelgarn

On 2018-11-15 13:36, Gene Heskett wrote:

On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:


On 2018-11-15 11:53, Gene Heskett wrote:

On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:

On 2018-11-15 06:16, Gene Heskett wrote:

I ask because after last nights run it showed one huge and 3 teeny
level 0's for the 4 new dle's.  So I just re-adjusted the
locations of some categories and broke the big one up into 2
pieces. "./[A-P]*" and ./[Q-Z]*", so the next run will have 5 new
dle's.

But an estimate does not show the new names that results in. I've
even took the estimate assignment calcsize back out of the global
dumptype, which ack the manpage, forces the estimates to be
derived from a dummy run of tar, didn't help.

Clues? Having this info from an estimate query might take a couple
hours, but it sure would be helpful when redesigning one's dle's.

I'm fairly certain you can't, because it specifically shows
server-side


estimates, which have no data to work from if there has never been
a dump run for the DLE.


Even if you told it to use tar for the estimate phase? That has
enough legs to be called a bug. IMO anyway.


As mentioned in one of my other responses, I can kind of see the value
in this not bothering the client systems.  Keep in mind that server
estimates cost nothing on the client, while calcsize or client
estimates may use a significant amount of resources.


My default has been calcsize for three or 4 years, changed because tar
was changed & was screwing up the estimates. I can remember 15+ years
ago when I was using real tar estimates, on a much smaller machine, and
it could come within 50 megabytes of filling a DDS-2 tape (4 GB
compressed) for weeks at a time. So that part of amanda worked a lot
better than it does today. And its slowly gone to the dogs as my system
grew in complexity.  And went in a handbasket when I had to change to
calcsize during the tar churn.
I've not been using AMANDA anywhere near as long as you have, but I've 
actually not seen any issues with accuracy of 'estimate client' mode 
estimates with current versions of GNU tar, except when the estimate ran 
while data in the DLE was being modified (and in that case, it makes 
sense that it would be bogus).  I generally don't 'estimate client' on 
my own systems though because it consistently takes far longer than 
'estimate calcsize', and I'm not picky about the estimates being perfect.
  

In this case, I do think the documentation should be a bit clearer,


Yes, but who is to rewrite it?  He should know a heck of a lot more
about the amanda innards than I do, even after 2 decades, and better
defined words here and there too. diskdevice is a very poor substitute
for the far more common slanguage of "/path/to/"

and it would be useful to be able to get regular (calcsize and/or
client) estimates on-demand, but I do think that the default is
reasonably sane.


It may well be sane, we'll see how it works in the morning. AIUI,
calcsize runs only on old history, so that should not impose a load on
the client, even when the client is the server itself.

Unless I'm mistaken:

* 'estimate server' runs only on historical data, and doesn't even talk 
to the client systems.  It's good at limiting the impact the estimate 
has on the client, but reliably gives bogus estimates if your DLEs don't 
show consistent behavior (that is, each backup of a given level is 
roughly the same size as every other backup at that level).
* 'estimate client' relies on the backup program being used to give it 
info about how big it will be.  It gives estimates that are close to 
100% accurate, but currently essentially requires running the backup 
process twice (once for the estimate, once for the actual backup) and 
imposes a non-negligible amount of load on the client.
* 'estimate calcsize' does something kind of in-between.  AIUI, it looks 
at some historical data, and also looks at the on-disk size of the data, 
then factors in compression ratios and such to give an estimate that's 
usually reasonably accurate without needing the DLEs to be consistent or 
imposing significant load on the clients.
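
For reference, the estimate type is set per dumptype; a minimal sketch 
(the dumptype name is a placeholder and 'global' is assumed to be an 
existing parent dumptype):

define dumptype user-tar-calcsize {
    global
    program "GNUTAR"
    estimate calcsize    # or "client" / "server", per the trade-offs above
}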


Re: Does anyone know how to make an amadmin $config estimate work for new dle's?

2018-11-15 Thread Austin S. Hemmelgarn

On 2018-11-15 11:53, Gene Heskett wrote:

On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:


On 2018-11-15 06:16, Gene Heskett wrote:

I ask because after last nights run it showed one huge and 3 teeny
level 0's for the 4 new dle's.  So I just re-adjusted the locations
of some categories and broke the big one up into 2 pieces.
"./[A-P]*" and ./[Q-Z]*", so the next run will have 5 new dle's.

But an estimate does not show the new names that results in. I've
even took the estimate assignment calcsize back out of the global
dumptype, which ack the manpage, forces the estimates to be derived
from a dummy run of tar, didn't help.

Clues? Having this info from an estimate query might take a couple
hours, but it sure would be helpful when redesigning one's dle's.

I'm fairly certain you can't, because it specifically shows server-side


estimates, which have no data to work from if there has never been a
dump run for the DLE.


Even if you told it to use tar for the estimate phase? That has enough
legs to be called a bug. IMO anyway.
As mentioned in one of my other responses, I can kind of see the value 
in this not bothering the client systems.  Keep in mind that server 
estimates cost nothing on the client, while calcsize or client estimates 
may use a significant amount of resources.


In this case, I do think the documentation should be a bit clearer, and 
it would be useful to be able to get regular (calcsize and/or client) 
estimates on-demand, but I do think that the default is reasonably sane.


Re: Does anyone know how to make an amadmin $config estimate work for new dle's?

2018-11-15 Thread Austin S. Hemmelgarn

On 2018-11-15 11:21, Chris Nighswonger wrote:
On Thu, Nov 15, 2018 at 7:40 AM Austin S. Hemmelgarn wrote:


On 2018-11-15 06:16, Gene Heskett wrote:
 > I ask because after last nights run it showed one huge and 3
teeny level
 > 0's for the 4 new dle's.  So I just re-adjusted the locations of some
 > categories and broke the big one up into 2 pieces. "./[A-P]*"
 > and ./[Q-Z]*", so the next run will have 5 new dle's.
 >
 > But an estimate does not show the new names that results in. I've
even
 > took the estimate assignment calcsize back out of the global
dumptype,
 > which ack the manpage, forces the estimates to be derived from a
dummy
 > run of tar, didn't help.
 >
 > Clues? Having this info from an estimate query might take a
couple hours,
 > but it sure would be helpfull when redesigning ones dle's.

I'm fairly certain you can't, because it specifically shows server-side
estimates, which have no data to work from if there has never been a
dump run for the DLE.


What would be the downside to having the amanda client execute 'du -s' 
or some such on the DLE and return the results when amcheck and friends 
realize there is no reliable size estimate? This would seem to be a much 
more accurate estimate than a non-existent server estimate.


My guess is that it's intentionally limited to server estimates to avoid 
putting load on the client systems.  Both calcsize and client estimates 
require reading a nontrivial amount of data on the client side, and 
client estimates also involve a nontrivial amount of processing.


That said, it would be nice to be able to explicitly run any of the 
three types of estimate.


Re: Does anyone know how to make an amadmin $config estimate work for new dle's?

2018-11-15 Thread Austin S. Hemmelgarn

On 2018-11-15 06:16, Gene Heskett wrote:

I ask because after last nights run it showed one huge and 3 teeny level
0's for the 4 new dle's.  So I just re-adjusted the locations of some
categories and broke the big one up into 2 pieces. "./[A-P]*"
and ./[Q-Z]*", so the next run will have 5 new dle's.

But an estimate does not show the new names that results in. I've even
took the estimate assignment calcsize back out of the global dumptype,
which ack the manpage, forces the estimates to be derived from a dummy
run of tar, didn't help.

Clues? Having this info from an estimate query might take a couple hours,
but it sure would be helpful when redesigning one's dle's.

I'm fairly certain you can't, because it specifically shows server-side 
estimates, which have no data to work from if there has never been a 
dump run for the DLE.


Re: Monitor and Manage

2018-11-14 Thread Austin S. Hemmelgarn

On 2018-11-14 10:44, Chris Miller wrote:

Hi Folks,

I now have three working configs, meaning that I can back up three 
clients. There is not much difference among the configs, but that is a 
topic for a different thread. My question is: how do I manage what AMANDA is 
doing?


So, let's suppose I fire up all three amdumps at once:

  * How do I know if I'm getting level 0 or higher?
  * How do I know the backups are running and have not silently failed?
  * How do I know when they complete?
  * How do I know what has been accomplished?
  * :

These are all the sort of questions that might be answered by some sort 
of dashboard, but I haven't heard of any such thing, nor do I expect to 
hear of any such thing, but I am also equally sure that all the answers 
exist. I just don't know where.


In short, how do I monitor and manage AMANDA?
Well, for generic monitoring, make sure the system can deliver email and 
you have the aliases set up appropriately, and then configure Amanda to 
email you a report when the dump completes.


The reports themselves are actually rather thorough, going over both 
aggregate timing and performance information as well as the useful 
generic stuff like knowing what dump level everything ran at and what 
tapes got used.


You can get similar details for the last dump (or the current one if one 
is in-progress) using the `amstatus` command, which will also show 
progress info for individual DLE's if there is a dump running currently.


For more in-depth management, take a look at the `amadmin` and `amtape` 
commands, they both provide useful functionality for general management 
that doesn't involve actually running the backups, including:


* Forcing a level 0 or level 1 dump for any or all of the DLE's for the 
next run.  Don't get in the habit of doing this regularly; overriding 
the planner will usually not get you good results.
* Forcing Amanda to bump to a new dump level for a given DLE.  Again, 
don't do this regularly.
* Querying when the next level 0 dump is due for a given DLE.  This 
gives you an upper limit on when the DLE will get a level 0 dump 
assuming you stick to the schedule you told Amanda about.
* Querying details about all currently stored backups, including dates, 
location, and dump status.

* Querying the state of all the tapes/vtapes Amanda is managing.
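
Hedged command-line examples of those operations (the configuration name 
"daily" and the DLE "host /home" are placeholders):

amadmin daily force host /home       # force a level 0 for that DLE on the next run
amadmin daily force-bump host /home  # force a bump to the next level on the next run
amadmin daily due host /home         # report when the next level 0 is due
amadmin daily find host /home        # list all stored backups for that DLE
amtape daily show                    # show what is loaded in the changer's slots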


Re: dumporder

2018-11-05 Thread Austin S. Hemmelgarn

On 11/5/2018 1:31 PM, Chris Nighswonger wrote:

Is there any wisdom available on optimization of dumporder?


This is personal experience only, but I find that in general, if all 
your dumps run at about the same speed and you don't have to worry about 
bandwidth, using something like the following generally gets reasonably 
good behavior:


'ssSS'

In essence, it ensures that the smallest dumps complete fast, while 
still making sure the big ones get started early.


Where I work, we've got a couple of slow systems, and I find that this 
works a bit better under those circumstances:


'ssST'

Similar to the above, except it makes sure that the long backups get 
started early too (I would use 'ssTT', except that we only run one DLE 
at a time for any given host).


Re: Can AMANDA be configured "per client"?

2018-11-05 Thread Austin S. Hemmelgarn

On 11/5/2018 1:05 PM, Chris Miller wrote:

Hi Folks,

I have four servers, henceforth AMANDA clients, that I need to backup 
and a lot of NAS space available. I'd like to configure AMANDA to treat 
each of the four AMANDA clients as if it were the only client, meaning 
each client should have its own configuration which includes a location 
for backup storage. I have a 3 TB staging disk on the AMANDA server. I 
have reasons for the individual treatment of clients that include 
off-site storage requirements and differing data sensitivity, so the 
simple solution is to be able to configure AMANDA to treat each client 
as a single case, so I can provide for proper security and custody of 
the backups. Can this be done?
Yes, just create a separate configuration for each client on the server 
(that is, a separate amanda.conf and disklist for each client, with each 
pair being in its own sub-folder of your main amanda configuration 
directory).  This is actually a pretty common practice in a lot of places 
(for example, the company I work for has 3 separate configurations that 
run at different times overnight and have slightly different parameters 
for the actual backups).  The only caveat is that you have to explicitly 
run dumps for each configuration, but that's really not that hard.
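
A sketch of such a layout (directory and configuration names are 
placeholders; the configuration root depends on how Amanda was packaged):

/etc/amanda/client-a/amanda.conf    # settings and vtape location for client-a only
/etc/amanda/client-a/disklist       # DLEs for client-a only
/etc/amanda/client-b/amanda.conf
/etc/amanda/client-b/disklist

Each configuration is then run by name, e.g. `amdump client-a`.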


Please refer to the small table below. I have some basic questions, but 
the volume of documentation is difficult to grasp all at once, so please 
forgive what might seem like trivial questions; they are not yet trivial 
to me.


Using 10.1.1.10 from the table below as an example:

 1. I think I define the length of my tapes to be the maximum for a
given client backup, which is the size of a level 0 dump, which is
135 GB for the example of 10.1.1.10. Since I want to configure
AMANDA to treat each client as an individual and not part of a
collection of backup tasks, I assume AMANDA will use one vtape per
client per night. Can this be done? How do I qualify configuration
settings per client?
The main part of this should be answered by my comment above (if you 
have separate configurations, it's trivial to specify different settings 
for each client).


That said, you probably want the vtape size to be _larger_ than your 
current theoretical max backup size, because it's very hard to change 
the vtape 'size' after the fact, and if you run out of space the whole 
backup may fail.  Keep in mind that vtapes only take up whatever space 
is necessary for the data being stored on them (plus a bit extra for the 
virtual label), so you can set this to an arbitrarily large value.  As 
an example, the vtape configuration where I work specifies 2TB vtapes, 
because that's an amount I know for certain we will never hit in one 
run, even if everything is a level 0 backup.
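
For instance, a hedged tapetype sketch along those lines (the name and 
length are arbitrary):

define tapetype VTAPE {
    comment "oversized vtape; only the space actually used is consumed on disk"
    length 2048 gbytes
}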



 2. I have planned for one level 0 and five level 1 backups per week. Do
I call this "a cycle"? I think I need 185 GB storage per "cycle" and
this tells me how many "cycles", in this case, weeks, I can store
before I have to re-use tapes. Does this mean I can plan on a 43
cycle (week) retention of my backups? Will AMANDA append to tapes,
meaning can I put a full week on one vtape?

This doesn't _quite_ line up with what Amanda calls a cycle, see my 
comments on your next question for more info on that.  Also, as 
mentioned above, assume you will need more space than you calculated, 
failed backups are a pain to deal with.


As far as taping, Amanda _never_ appends to a tape, it only ever 
rewrites whole tapes.  While it's technically possible to get Amanda to 
pack all the data it can onto one tape across multiple runs, it's 
generally only a good idea to do this if you need to store backups on a 
very limited number of physical tapes because:


* It means that some of your backup data may sit around on the Amanda 
server for an extended period of time before being taped (if you're 
doing a full week's worth of backups on one tape, that level zero backup 
won't get taped until the end of the week).


* Amanda rewrites whole tapes.  This means you will lose all backups on 
a tape when it gets reused.


Because you don't have any wasted storage space using vtapes, it's 
better to just plan on one vtape per run, specify a number appropriate 
for your retention requirements (plus a few extra to allow recovery from 
errors), and then just let Amanda run.  Such a configuration is more 
reliable and significantly more predictable.


As another concrete example from the configuration where I work:  We do 
4 week cycles (so we have at least one level zero backup every 28 days 
for each disk list entry), do daily backups, and retain backups for 16 
weeks.  For our vtape configuration, this translates to requiring 112 
tapes for all the current backups.  We need to be able to access the 
oldest backups during the current cycle, so we have an additional 
cycle's worth of tapes as well (bringing the total up to 130).  We also 
need to guarantee 

Re: Zmanda acquired from Carbonite by BETSOL -- future of Amanda development?

2018-10-02 Thread Austin S. Hemmelgarn

On 2018-10-02 13:29, Gene Heskett wrote:

On Tuesday 02 October 2018 12:34:40 Ashwin Krishna wrote:


Hi All,

We propose to have the call on Oct 8th at 11 AM Mountain Time.

Agenda:
*   Zmanda's Acquisition by BETSOL
*   Attendee Introductions
*   Existing Governance Model of Amanda Community
*   Suggested changes to the Governance Model
*   BETSOL's Commitment to Open Source Community

We have taken a note of all the suggestions received on the mailing
list and we will go through the same on the call.

Meeting Details:
Amanda Open Source Community Discussion
Mon, Oct 8, 2018 11:00 AM - 12:00 PM MDT
Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/438069045
You can also dial in using your phone.
United States: +1 (786) 535-3211
Access Code: 438-069-045
First GoToMeeting? Let's do a quick system check:
https://link.gotomeeting.com/system-check

Regards,
Ashwin Krishna

-Original Message-
From: Nathan Stratton Treadway 
Sent: Thursday, September 27, 2018 9:01 PM
To: Ashwin Krishna 
Cc: amanda-users@amanda.org
Subject: Re: Zmanda acquired from Carbonite by BETSOL -- future of
Amanda development?

Ashwin, thanks very much for getting in contact with the Amanda
mailing list.

On Thu, Sep 27, 2018 at 06:16:02 +, Ashwin Krishna wrote:

We are 100% committed to the open source community and will be
contributing to the code base to the best of our abilities.


[...]


I want to assure you that we are actively investing in growing
Amanda and we have young enthusiastic engineers in the team.

You can expect the next Amanda releases to include support for newer
versions of operating systems, defect fixes, security enhancements
etc.


[...]


We have retained the team members that we could of previous Zmanda
team. I can tell you that it's not easy without support from the
community members. We encourage the community members to guide and
contribute as much as you can. If you need commit access to the code
base, please don't hesitate to reach out to us. You can expect our
commitment and support to you.


On Thu, Sep 27, 2018 at 22:54:02 +, Ashwin Krishna wrote:

We are planning to host a conference call and would like all the
active admins and community members to join to have a discussion
with the Zmanda team at BETSOL regarding future collaborations.

Will be sending out the meeting details (US time) with the agenda
later.


It sounds like getting the new BETSOL team in direct contact with the
admins for the mailing list and other amanda.org-related resources is
an important step at this point.


However, I would say that for many of us here on the list, the most
notable change in the past 7 months is not related to those things (which
have continued to chug along as before), but rather the lack of "a
developer" to move things along here on the public lists and in the
public source repo.

A decade or two ago it sounds like there were a number of developers
involved, but more recently it's just been one or two Zmanda people
who have served that role.

Obviously this could be a good time to reconsider this arrangement if
there are in fact other people ready to jump in, but off hand I'm
guessing that what's likely to work going forward is for there to be a
small number of BETSOL developers back in that role.

As an Amanda user who has tried to contribute back a few improvements
to the code line, I'm not really looking to have direct commit access
myself, but rather hope to get back to someone (hanging out here on
the mailing lists) who can take the patches I came up with hacking
around on my own system and understand whether or not they will really
work for everyone, and who will know which branches should have that
change pushed onto them, and what tweaks are needed to make the patch
apply to some older branch, etc.

So, here's hoping you all at BETSOL are soon able to identify
someone/a few people to take over that function, and patches and
discussions can start flowing again

Nathan

p.s. Personally I'd say that, rather than a new major release
with support for newer versions of operating systems and whatnot,
more urgent would be a minor release to gather up the handful of
bugfixes which have already been discussed since 3.5.1 came out and
get them published as part of an official release.


+10, the 3.3.7p1 planner in particular is in serious need of help. It
refuses to adjust the schedule of the 3 largest members of my disklist,
choosing instead to do all three level 0's on the same run, so a 30 gig
average backup has become 24 gigs for many nights, followed by a 60+
gig run using 3 vtapes. 5 or 6 tapelist cycles in a row now. I'd build
this mythical 3.5.1 but it's been hidden someplace my browsing has not
found.

You should be able to get 3.5.1 here:
https://sourceforge.net/projects/amanda/files/amanda%20-%20stable/3.5.1/

That said, 3.5.1 doesn't seem to be much 

Re: Weird amdump behavior

2018-07-30 Thread Austin S. Hemmelgarn

On 2018-07-30 00:38, Kamil Jońca wrote:

Gene Heskett  writes:


On Saturday 28 July 2018 08:30:27 Kamil Jońca wrote:


Gene Heskett  writes:

[..]


Too many dumps per spindle, drive seeks take time=timeout?


As I can see in gdb/strace planner hangs on "futex"
'futex' is short for 'Fast Userspace muTEX', it's a synchronization 
primitive.  Based on personal experience (not with Amanda, but just 
debugging software hangs in general), this usually means it's either a 
threading issue, or that you've ended up with a deadlock somewhere 
between processes.  Regardless, it's probably an issue on the local 
system, and most likely only happens when backing up more than one 
client because you have more processes/threads involved and actually 
doing things in that case.


This is probably going to sound stupid, but try 
updating/rebuilding/reinstalling Perl, whatever Perl packages Amanda 
depends on (I don't remember which packages they are), and Amanda 
itself.  Most of the time when I see this kind of issue, it ends up 
being a case of at-rest data corruption in the executables or libraries, 
and reinstalling the problem software typically fixes things.




1. I do not configure spindle at all.


So it's possible to have multiple dumps from the same spindle at the same
time.


No. There is another parameter,
--8<---cut here---start->8---
  maxdumps int
Default: 1. The maximum number of backups from a single host
that Amanda will attempt to run in parallel. See also the
inparallel option.
--8<---cut here---end--->8---

And I use default value, so I have at most one dump per host at once
(and I am quite happy with this)

Of course I can change spindles for testing, but, to be honest, I do
not understand, how should that help.




Please, give every disk in each machine its own unique spindle number.
Your backups should be done much faster.


I do not want faster dumps . I want working dumps.

KJ





Re: taper should wait until all dumps are done

2018-07-27 Thread Austin S. Hemmelgarn

On 2018-07-27 14:15, Stefan G. Weichinger wrote:

On 2018-07-27 at 19:37, Austin S. Hemmelgarn wrote:


Perhaps I can help with that.


Great stuff, thanks for your informative reply, that's exactly the 
information I would like to have in the docs etc


Will consult that in detail asap.

A quick note on what I try to solve here:

I have servers with only one big RAID-array consisting of maybe 4 or 6 
physical disks, and based on that (software-)RAID there is one LVM 
volume group. So the logical volumes containing the data to be backed up 
(DLEs) are on the same array as the other LV providing the amanda 
holding disk.


Yes, I know, that's not optimal, though I can't easily change that (I 
would have to add separate disks for holding disk purpose ... cost and 
space/controller issues)
Don't worry, I've got to deal with similarly sub-optimal stuff where I 
work (our backup server has to multiplex all the dumps _and_ taping over 
a single GbE connection, so our backups are _always_ network-bound, even 
when we do really aggressive compression), so I entirely understand.


So I want to avoid too much parallel activity of dumper and taper 
processes because that lets the throughput drop down massively (not to 
mention the additional stress on the hardware).


So it would be great to be able to tell amanda "the DLEs coming from the 
amanda client which is the amanda server (~localhost) should be dumped 
to holdingdisk while no taper processes run"


Or something in that direction.

I will consider reducing maxdumps to 4 as well and test "" for 
tonight's run.


And yes, I also test "holdingdisk no" for some DLEs already: I have big 
chunks of VM backups where it doesn't make sense to copy them within the 
RAID array ... I tape them directly.
If you're taping to vtapes, you might actually be able to set things up 
to not need a holding disk at all.  I'm a bit fuzzy on how to configure 
it, but I know it's possible to set up vtapes to tape things in 
parallel.  If you do that, you could (probably, again not 100% certain) 
get rid of the holding disk, dump direct to the vtapes, and still have 
the dumps run in parallel.  That would avoid having to worry about the 
taper processes competing with the dumper processes.  The only caveat is 
that failure to tape would mean failure to dump too, but the number of 
situations where you would fail to tape but still be able to dump to the 
same array as a holding disk is near zero, and the only one I can think 
of off the top of my head is completely avoided by not having a holding 
disk.


Re: taper should wait until all dumps are done

2018-07-27 Thread Austin S. Hemmelgarn

On 2018-07-19 09:41, Stefan G. Weichinger wrote:


I know about the 2 parameters

flush-threshold-dumped
flush-threshold-scheduled

but how to make sure that *all* the planned dumps are done before
writing to tape?

Some kind of "taper-wait" ...

Or just by trial-and-error with the 2 mentioned parameters?

You can do this by figuring out the upper limit of how much space all 
your backups will need, figuring out what percentage of your tape size 
that translates to, and then setting both of the flush-threshold values 
to that percentage, taperflush to 0 (to flush everything), and autoflush 
to 'yes' (so that it actually flushes the data).


However, keep in mind that for this to work, your holding disk has to be 
able to hold all of your dumps for a single run simultaneously.
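
A hedged amanda.conf sketch of that arrangement, assuming one run's dumps 
add up to no more than about 90% of one tape:

flush-threshold-dumped 90
flush-threshold-scheduled 90
taperflush 0
autoflush yes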


Re: taper should wait until all dumps are done

2018-07-27 Thread Austin S. Hemmelgarn

On 2018-07-27 12:23, Stefan G. Weichinger wrote:

On 2018-07-27 at 17:02, Jean-Francois Malouin wrote:


You should also consider playing with dumporder.
I have it set to '' and that makes the longest (time wise)
dumps go first so that the fast ones get pushed to the end.
In one config I have:

dumporder ""
flush-threshold-dumped 100
flush-threshold-scheduled 100
taperflush 100
autoflush yes

so that all the dumps will wait until the longest one are done.
It also won't go until it can fill one volume (100%). You can
obviously go further than that if you have enough hold disk.

Or at least it's my understanding...


(the ML was down for a while, so that's the reason for my delayed 
response, it should work now)


I checked "dumporder" in that config, it was "BTBT...", I changed it to 
"TTT..." now for a test.


Although I am not 100% convinced that this will do the trick ;-)

We will see.

I never fully understood that parameter and its influence so far, to me 
it's a bit "unintuitive".

Perhaps I can help with that.

Part of what Amanda's scheduling does is figure out the size that each 
backup will be on each run (based on the estimate process), how much 
bandwidth it will need while dumping (based on the bandwidth settings 
for that particular dump type), and the amount of time it will take 
(predicted based on the size, prior timing data, and possibly the 
bandwidth).  That information is then used together with the 'dumporder' 
setting to control how each dumper chooses what dump to do next when it 
finishes dumping.  Each letter in the value corresponds to exactly one 
dumper, and controls only that dumper's selection.


The size-based selection is generally the easiest to explain, it just 
says to pick the largest (for 'S') or smallest (for 's') dump out of the 
set and run that next.


The bandwidth-based selection is only relevant if you have bandwidth 
settings configured.  Without them, it treats all dumps as equal, and 
picks the next dump based solely on the order that amanda has them 
sorted (which, IIRC, matches the order found in the disk list).  With 
them, it uses a similar selection method to the size-based selection, 
just looking at bandwidth instead of size.


The time-based selection is where things get tricky, but they get tricky 
because of how complicated it is to predict how long a dump will take, 
not because the selection is complicated (it works just like size-based 
selection, just looking at estimated runtime instead of size).  Pretty 
much, the timing data is extrapolated by looking at previous dumps of 
the DLE, correlating size and actual run-time.  I'm not sure what 
fitting method it uses for the extrapolation (my first guess would be 
simple linear extrapolation, because that's easy and should work most of 
the time), and I'm also not sure what, if any, impact bandwidth has on 
the calculation.


So, in short you have:

* 'S' and 's': Simple deterministic selection based on the predicted 
size of the dump.
* 'B' and 'b': Simple deterministic selection based on bandwidth 
settings if they are defined, otherwise trivial FIFO selection.
* 'T' and 't': Not quite deterministic selection based on predicted 
execution time of the dump process.
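
Since each letter maps to one dumper, the string should be as long as 
`inparallel`; a minimal sketch:

inparallel 4
dumporder "SSss"    # two dumpers chase the largest dumps, two the smallest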


So, for a couple of examples:

* The default setting 'BTBTBTBT': This will have half the dumpers select 
dumps that will take the largest amount of time, and the other half select 
the ones that will take the largest amount of bandwidth.  This works 
reasonably well if you have bandwidth settings configured and wide 
variance in dump size.


* What you're looking at testing '': This is a trivial case of 
all dumpers selecting the dumps that will take the longest time.  If 
you're dumping almost all similar hosts, this will be essentially 
equivalent to just selecting the largest.  If you're dumping a wide 
variety of different hosts, it will be equivalent to selecting the 
largest on the first dump, but after that will select based on which 
system takes the longest.


* What I use on my own systems 'SSss' (I only run four dumpers, not 
eight):  This is a reasonably simple option that gives a good balance 
between getting dumps done as quickly as possible, and not wasting time 
waiting on the big ones.  Two of the dumpers select whatever dump is the 
largest, so that some of the big ones get started right away, while the 
other two select the smallest dumps, so that those get backed up 
immediately.  I've done some really simple testing that indicates this 
actually gets all the dumps done faster on average than the default, at 
least when all of your systems can dump data at the same rate.
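
For reference, that boils down to just two lines in amanda.conf (one 
dumporder letter per dumper, so 'inparallel' needs to match):

inparallel 4          # number of dumpers; one dumporder letter per dumper
dumporder "SSss"      # two pick the largest estimated dump, two the smallest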


* What we use where I work 'TTss': This is one where things get a 
bit complicated.  There are three different ways things get selected 
here.  First, two of the eight dumpers will select dumps that are going 
to take the longest amount of time.  Then, you have four that will pull 
the largest ones, and two that 

Re: custom_compress with zstd

2018-04-04 Thread Austin S. Hemmelgarn

On 2018-04-04 06:01, Stefan G. Weichinger wrote:

On 2018-04-03 at 20:52, Austin S. Hemmelgarn wrote:

On 2018-04-03 14:25, Stefan G. Weichinger wrote:


Does anyone already use zstd  https://en.wikipedia.org/wiki/Zstandard
with amanda?

I will try to define an initial dumptype and play around although I
wonder if the standard behavior leads to any problems.

zstd does not remove the source file after de/compression per default
(only with "--rm") ... but as it is used within a pipe (?) with amanda I
assume that won't hurt.

The "-d" for decompression is there so that should work.



I've been using it for a few months now both at home and at work.  It
works just fine as-is and gets pretty good performance.

In both cases though, I actually use a wrapper script.  The one for
backups at work just adds `-T2` to the zstd command line as our backup
server has lots of CPU (and CPU time), but the backups are
network-limited.  At home, I also bump the compression level as high as
I can without needing special decompression options (so the full command
line at home that the wrapper passes is `-19 --long --zstd=hlog=26 -T2`).

I've done numerous restores from both sets of backups both with and
without the wrapper script (I initially set both up to just use zstd
directly), and it all appears to work just fine.


Would this work as well?
That's essentially what I used initially, and I had no issues with it at 
all either backing things up or restoring.


->

define dumptype client-zstd-tar {
    global
    program "GNUTAR"
    comment "custom client compression dumped with tar"
    compress client custom
    client_custom_compress "/usr/bin/zstd"
}





Re: custom_compress with zstd

2018-04-03 Thread Austin S. Hemmelgarn

On 2018-04-03 14:25, Stefan G. Weichinger wrote:


Does anyone already use zstd  https://en.wikipedia.org/wiki/Zstandard
with amanda?

I will try to define an initial dumptype and play around although I
wonder if the standard behavior leads to any problems.

zstd does not remove the source file after de/compression per default
(only with "--rm") ... but as it is used within a pipe (?) with amanda I
assume that won't hurt.

The "-d" for decompression is there so that should work.



I've been using it for a few months now both at home and at work.  It 
works just fine as-is and gets pretty good performance.


In both cases though, I actually use a wrapper script.  The one for 
backups at work just adds `-T2` to the zstd command line as our backup 
server has lots of CPU (and CPU time), but the backups are 
network-limited.  At home, I also bump the compression level as high as 
I can without needing special decompression options (so the full command 
line at home that the wrapper passes is `-19 --long --zstd=hlog=26 -T2`).
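
For what it's worth, the work wrapper is nothing fancy; roughly this 
(the install path is up to you, and since Amanda passes `-d` when 
restoring, forwarding the arguments covers both directions):

#!/bin/sh
# forward whatever Amanda passes (it adds -d for decompression) and
# tack on -T2 so zstd uses two worker threads
exec /usr/bin/zstd -T2 "$@"

The dumptype then just points client_custom_compress at the wrapper 
instead of at zstd itself.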


I've done numerous restores from both sets of backups both with and 
without the wrapper script (I initially set both up to just use zstd 
directly), and it all appears to work just fine.


Re: Amanda clients running Docker

2018-03-29 Thread Austin S. Hemmelgarn

On 2018-03-27 11:12, Joi L. Ellis wrote:
I’m looking for information about how best to manage Amanda clients upon 
which our devs are running docker containers.  Some of the production 
hosts are also running containers.  Does anyone have suggestions 
regarding best practices for backing up docker containers in an Amanda 
environment?  (I don’t use docker and I haven’t found anything online 
discussing containers on Amanda clients.)


Any pointers, suggestions, or online references would be very welcome.
I don't use Docker myself, but I do use LXC and know a lot of people who 
use a wide variety of container platforms including Docker, and the 
general principals are pretty much the same regardless of platform.


You have 5 options for handling container backups with Amanda:

1. Back up the containers as part of the regular host-system backup, and 
do all the containers together as one DLE.
2. Back up the containers as part of the regular host system backup with 
each container being its own DLE (or DLE's).
3. Back up the containers in a separate backup set from the host system, 
with one DLE per host system.
4. Back up the containers in a separate backup set from the host system, 
with one DLE per container.

5. Back up the containers from the containers themselves.

Of these, most people I know use option 2 or 4 (I use approach 2 with 
locally written integration with LXC to get the list of containers 
to back up).  Option 1 is probably the easiest, but can have performance 
issues if you have lots of containers (and requires a bit of effort to 
make sure you don't back up transient things like CI build containers). 
Option 3 suffers from the same issues that option 1 does, but takes more 
effort to set up.  Option 5 violates principles of minimalism, and is 
only really practical if your containers are full-system images instead 
of just bare-bones micro-services.
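
As a rough illustration of option 2, the disklist entries end up looking 
something like this (the host name, paths, and dumptype here are all 
made up; for Docker you'd point them at wherever the persistent volumes 
actually live):

# one DLE per container's persistent data
dockerhost1.example.com  /srv/docker/volumes/webapp  comp-user-tar
dockerhost1.example.com  /srv/docker/volumes/db      comp-user-tar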


Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Austin S. Hemmelgarn

On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote:

"Ryan, Lyle (US)" writes:


The server has an 11TB filesystem to store the backups in.  I should
probably be fancier and split this up more, but not now.   So I've got my
holding, state, and vtapes directories all in there.


In this scenario, I would think there's no point to a "holding" disk.

I use a holding disk because my actual backup disk is external-USB and
(comparatively) slow.  So I backup to a holding disk on my internal
SSD, releasing the client and the network as soon as possible, and then
copy the backup to the backup drive afterwards.  But in your case, I
don't see any benefit.

There are two other benefits to having a holding disk:

1. It lets you run dumps in parallel.  Without a holding disk (or some 
somewhat complicated setup of the vtapes to allow parallel taping), you 
can only dump one DLE at a time because it dumps directly to tape.


2. It lets you defer taping until you have some minimum amount of data 
ready to be taped.  This may sound kind of useless when working with 
vtapes, but if the holding disk is on the same device as the final vtape 
library, deferring until the dumps are all done (or at least, almost all 
done) can help improve dumping performance, because the dump processes 
won't be competing with the taper process for disk bandwidth.
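
For completeness, a minimal holding disk definition looks something like 
this (the path and numbers are placeholders; a negative 'use' value 
means 'leave that much space free'):

holdingdisk hd1 {
    directory "/backup/holding"
    use -500 Gb
    chunksize 1 Gb
}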


Re: some suggested config parameters for backups to local disk

2018-03-23 Thread Austin S. Hemmelgarn

On 2018-03-22 19:03, Ryan, Lyle (US) wrote:
I've got an Amanda 3.4.5 server running on Centos 7 now, and am able to 
do rudimentary backups of a remote client.


But in spite of reading man pages, HowTo's, etc, I need help choosing 
config params.  I don't mind continuing to read and experiment, but if 
someone could get me at least in the ballpark, I'd really appreciate it.


The server has an 11TB filesystem to store the backups in.  I should 
probably be fancier and split this up more, but not now.   So I've got 
my holding, state, and vtapes directories all in there.


The main client I want to back up has 4TB I want to backup.  It's almost 
all in one filesystem, but the HowTo for splitting DLE's with exclude 
lists is clear, so it should be easy to split this into (say) 10 smaller 
individual dumps.  The bulk of the data is pretty static, maybe 
10%/month changes.  It's hard to imagine 20%/month changing.


For a start, I'd like to get a full done every 2 weeks, and 
incrementals/differentials on the intervening days.   If I have room to 
keep 2 fulls (2 complete dumpcycles) that would be great.
Given what you've said, you should have enough room to do so, but only 
if you use compression.  Assuming the rate of change you quote above is 
approximately constant and doesn't result in bumping to a level higher 
than 1, then without compression you will need roughly 4.015TB per cycle 
(4TB for the full backup, ~15.38GB for the incrementals (roughly 0.38% 
change per day for 13 days)), plus 4TB of space for the holding disk 
(because you have to have room for a full backup _there_ prior to taping 
anything).  With compression and assuming you get a compression ratio of 
about 50%, you should actually be able to fit four complete cycles (you 
would need about 2.0075TB per cycle), though if you decide you want that 
I would bump the tapecycle to 60 and the number of slots to 60.


So I'm thinking:

- dumpcycle = 14

- runspercycle = 0 (default)

- tapecycle = 30

- runtapes = 1 (default)

I'd break the filesystem into 10 pieces, so 400GB each. and make the 
vtapes 400GB each (with tapetype length) relying on server-side 
compression to make it fit.


The HowTo "Use pigz to speed compression" looks clear, and the DL380 G7 
isn't doing anything else, so server-side compression sounds good.


Any advice on this or better ideas?  Maybe I'm off in left-field.

And one bonus question:  I'm assuming Amanda will just make vtapes as 
necessary, but is there any guidance as to how many vtape slots I should 
create ahead of time?  If my dumpcycle=14, maybe create 14 slots just to 
make tapes easier to find?


Debra covered the requirements for vtapes, slots, and everything very 
well in her reply, so I won't repeat any of that here.  I do however 
have some other more generic advice I can give based on my own experience:


* Make your vtapes as large as possible.  They won't take up any space 
beyond what's stored on them (in storage terminology, they're thinly 
provisioned), so their total 'virtual' size can be far more than your 
actual storage capacity, but if you can make it so that you can always 
fit a full backup on a single vtape, it will make figuring out how many 
vtapes you need easier, and additionally give a slight boost to taping 
performance (because the taper never has to stop to switch to a new 
vtape).  In your case, I'd say setting 5TB for your vtape size is 
reasonable; that would give you some extra room if you suddenly have 
more data without being insanely over-sized.


* Make sure to set a reasonable part_size for your vtapes.  While you 
wouldn't have to worry about splitting dumps if you take my above advice 
about vtape size, using parts has some other performance related 
advantages.  I normally use 1G, but all of my dumps are less than 100G 
in size.  In your case, if you'll have 10 400G dumps, I'd probably go 
for 4G for the part size.


* Match your holding disk chunk size to your vtape's part_size.  I have 
no hard number to back this up, but it appears to provide a slight 
performance improvement while dumping data.
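
Put together, the tapetype for the vtapes would look roughly like this 
(the name is arbitrary, and the numbers are just the ones suggested 
above), with a matching 'chunksize 4 Gb' in the holdingdisk definition:

define tapetype DISK-5TB {
    comment "thin-provisioned vtapes on the 11TB backup filesystem"
    length 5120 gbytes
    part_size 4 gbytes
}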


* Don't worry right now about parallelizing the taping process.  It's 
somewhat complicated to get it working right, significantly changes how 
you have to calculate vtape slots and sizes, and will probably not 
provide much benefit unless you're taping to a really fast RAID array 
that does a very good job of handling parallel writes.


* There's essentially zero performance benefit to having your holding 
disk on a separate partition from your final storage unless you have it 
on a completely separate disk.  There are some benefits in terms of 
reliability, but realizing them requires some significant planning (you 
have to figure out exactly what amount of space your holding disk will 
need).


* If you're indexing the backups, store the working index directory (the 
one Amanda actually reads and writes to) on a separate drive from the 
holding disk and final backup 

Re: installing on Centos 7 - some newbee questions

2018-03-09 Thread Austin S. Hemmelgarn

On 2018-03-07 21:30, Ryan, Lyle (US) wrote:
Hello all.  I’m getting my first Amanda server running on Centos 7 and 
have a few questions:


- Centos is packaged with 3.3.3   Is that good enough or should I build 3.5?
Provided it's not missing any features you need and doesn't have any 
bugs that affect you, yeah it should be fine (and assuming of course 
you're not exposing it to the internet).  This applies even if you've 
got other versions on the network too (provided the protocols match up, 
it's perfectly possible to run differing versions of Amanda throughout 
the network).


- the server will use only disks, no tapes.   10TB, mostly all devoted 
to /home (though I could repartition)


- I believe I still use vtapes and a holding disk, even though they’ll 
all just be directories on the main partition.  sound right?
Yes.  The holding disk is actually pretty important even when using 
vtapes for two reasons:


1. It allows you to back up DLE's that are larger than the size you've 
specified for your vtapes.
2. It lets you run multiple backups in parallel without having to jump 
through hoops to allow Amanda to write to multiple vtapes in parallel.


One quick tip regarding this type of configuration:  Try to match the 
part-size tapetype option and the chunksize option for the holding disk. 
 As stupid as it sounds, matching these actually improves performance 
by a measurable amount in most cases.  If you've got a bunch of big 
backups, 1GB is generally a reasonable size for both.


- I follow the instructions at 
https://wiki.zmanda.com/index.php/GSWA/Build_a_Basic_Configuration but 
when running amcheck get the error:


    can not stat /var/lib/Amanda/gnutar-lists

- indeed there is no file present there.  any ideas?
Just create it and set the correct permissions.  Strictly speaking, the 
package should create this when installed, but it seems a number of 
distributions' packages don't do so.
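
Assuming the backup user is the usual 'amandabackup' in group 'disk' 
(adjust the user, group, and path to whatever your package and the 
amcheck output actually use), that's just:

mkdir -p /var/lib/amanda/gnutar-lists
chown amandabackup:disk /var/lib/amanda/gnutar-lists
chmod 770 /var/lib/amanda/gnutar-lists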


Re: keep a backup forever

2018-01-31 Thread Austin S. Hemmelgarn

On 2018-01-30 17:18, ghe wrote:

On 01/30/2018 12:29 PM, hy...@lactose.homelinux.net wrote:

I feel like I've asked this before, but I can't find any emails.
I can't believe this isn't an FAQ.  Or rather, there is an FAQ, but the
answer is (a) very sparse and (b) doesn't really answer the question.

I had a machine.  That machine was getting regular backups.  The machine
died.  I have replaced it with a new machine.  So having had this
emergency, I now want to keep, in perpetuity, my last full backup of
the now-dead machine.


How big was the dead disk? Do you have space to store the whole thing?

Did amanda do a level 0 of the whole dead disk to 17? If not, there are
very likely pieces of that disk on several of your virtual tapes.
amrestore deals with all that.


The backup in question is on (virtual) tape number 17.  So let's say
I take the approparite files that are in my /storage/amanda/vtapes/slot17
directory and copy them somewhere safe.  Six months go by, my real
slot17 gets reused, and I take those old files and copy them into slot44.

What is my next step?  How do I get those backups back into my amanda
index so that I can amrecover from them?  Is that what amreindex does?
Is that what amrestore does?


What I'd do is recover the last files amanda backed up from that disk,
using amrestore. I'd restore to a disk, consider that the perpetual
backup, and not try to get that old disk data anywhere in amanmda's
database -- amanda is very much oriented to reusing things in a cycle,
and trying to get her to change her ways can be difficult.

amrestore's a pleasant piece of software to use. You just tell it the
date you want to restore, the disk, the files, and some other things (I
use it infrequently, and I have to read the man page every time).
amrestore figures out which tapes you need, and restores the data.

Then you can do what you want with them -- burn to optical, buy a new
disk, whatever.

I would suggest the same approach myself.  In fact, that's pretty much 
what we do where I work.  Whenever we permanently decommission a system, 
it gets pulled from the backup rotation, and we image the disk and store 
the disk image in archival storage that's separate from the storage we 
use for regular backups.  Our procedure is similar for a failed disk we 
don't plan to replace, except instead of imaging it as-is, we rebuild it 
from backups and then image it (the imaging procedure was the norm 
before we switched to amanda, so it's just kind of stuck around).
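
If you go the rebuild-from-backups route, pulling the last level 0 back 
out of Amanda is basically a one-liner; something like this, where the 
config name, host name, and DLE are all placeholders:

# pipe the most recent matching dump to stdout and save it to a file
amfetchdump -p daily deadhost / > deadhost-root.dump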


Re: application amgtar ignore messages

2017-12-08 Thread Austin S. Hemmelgarn

On 2017-12-07 22:26, Jon LaBadie wrote:

If I want amgtar to ignore certain messages, is it
sufficient to list the message on the amanda server
or must the ignored message also be listed in
amanda-client.conf?

I've done it several times, only on the server, and
it seemed to work fine.  But I'm now trying to ignore
one message that appears on only one client and I'm
having no success.

Do I need to set up an "application amgtar" stanza
on the client?

Doesn't affect the question, but the problem is
caused by the "gnome virtual file system directory",
/home/user/.config/.gvfs.  This is a fuser mountpoint
not accessible by root.  So it generates a "can not
stat" error message from amgtar.

The better approach to this is to add that to the exclude file for that 
particular disk.  It's a well known path, so nothing else should be 
using it, and it's an area that shouldn't be dumped anyway, for a lot of 
the same reasons you shouldn't be dumping /sys or /dev/shm (and in fact, 
it isn't getting dumped, because amgtar can't see inside it).
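
As a sketch, assuming the DLE in question is /home (exclude patterns are 
relative to the top of the DLE) and with made-up dumptype names, that's 
just one extra line:

define dumptype home-tar-no-gvfs {
    user-tar            # or whatever dumptype you already use for /home
    exclude append "./*/.config/.gvfs"
}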


Re: Odd non-fatal errors in amdump reports.

2017-11-16 Thread Austin S. Hemmelgarn

On 2017-11-14 14:37, Austin S. Hemmelgarn wrote:

On 2017-11-14 07:43, Austin S. Hemmelgarn wrote:

On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:

On 2017-11-13 16:42, Jean-Louis Martineau wrote:

On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:

driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 
00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 
20171113073255 "" "" "" "" 1073741824 memory "" "" 0


 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 
20171113073255 0 error "File 0 not found"


Do that dump still exists on tape Home-0001? Find it with amfetchdump.

If yes, send me the taper debug file.
amfetchdump does not see it, but looking directly at the virtual tape 
directories, I can see it there.


Just tried an amcheckdump on everything, it looks like some of the 
dump files are corrupted, but I can't for the life of me figure out 
why (I test our network regularly and it has no problems, and any 
problems with a particular system should show up as more than just 
corrupted tar files).  I'm going to try disabling compression and see 
if that helps at all, as that's the only processing other than the 
default that we're doing on the dumps (long term, it's not really a 
viable option, but if it fixes things at least we know what's broken).
No luck changing compression.  I would suspect some issue with NFS, but 
I've started seeing the same symptoms on my laptop as well now (which is 
completely unrelated to any of the sets at work other than having an 
almost identical configuration other than paths and the total number of 
tapes).


So, I finally got things working by switching from:

storage "local-vtl"
vault-storage "cloud"

To:

storage "local-vtl" "cloud"

And removing the "vault" option from the local-vtl storage definition. 
Strictly speaking, this is working around the issue instead of fixing 
it, but it fits within what we need for our usage, and actually makes 
the amdump runs complete faster (since dumps get taped to S3 in parallel 
with getting taped to the local vtapes).


Based on this, and the issues I was seeing with corrupted dumps being 
reported by amcheckdump, I think the problem is probably an interaction 
between the vaulting code and the regular taping code, but I'm not 
certain.


Thanks for the help.


Re: Odd non-fatal errors in amdump reports.

2017-11-14 Thread Austin S. Hemmelgarn

On 2017-11-14 07:43, Austin S. Hemmelgarn wrote:

On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:

On 2017-11-13 16:42, Jean-Louis Martineau wrote:

On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:

driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 
00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 
20171113073255 "" "" "" "" 1073741824 memory "" "" 0


 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 
0 error "File 0 not found"


Do that dump still exists on tape Home-0001? Find it with amfetchdump.

If yes, send me the taper debug file.
amfetchdump does not see it, but looking directly at the virtual tape 
directories, I can see it there.


Just tried an amcheckdump on everything, it looks like some of the dump 
files are corrupted, but I can't for the life of me figure out why (I 
test our network regularly and it has no problems, and any problems with 
a particular system should show up as more than just corrupted tar 
files).  I'm going to try disabling compression and see if that helps at 
all, as that's the only processing other than the default that we're 
doing on the dumps (long term, it's not really a viable option, but if 
it fixes things at least we know what's broken).
No luck changing compression.  I would suspect some issue with NFS, but 
I've started seeing the same symptoms on my laptop as well now (which is 
completely unrelated to any of the sets at work other than having an 
almost identical configuration other than paths and the total number of 
tapes).


Re: Odd non-fatal errors in amdump reports.

2017-11-14 Thread Austin S. Hemmelgarn

On 2017-11-14 07:34, Austin S. Hemmelgarn wrote:

On 2017-11-13 16:42, Jean-Louis Martineau wrote:

On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:

driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 
00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 
20171113073255 "" "" "" "" 1073741824 memory "" "" 0


 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 
0 error "File 0 not found"


Do that dump still exists on tape Home-0001? Find it with amfetchdump.

If yes, send me the taper debug file.
amfetchdump does not see it, but looking directly at the virtual tape 
directories, I can see it there.


Just tried an amcheckdump on everything, it looks like some of the dump 
files are corrupted, but I can't for the life of me figure out why (I 
test our network regularly and it has no problems, and any problems with 
a particular system should show up as more than just corrupted tar 
files).  I'm going to try disabling compression and see if that helps at 
all, as that's the only processing other than the default that we're 
doing on the dumps (long term, it's not really a viable option, but if 
it fixes things at least we know what's broken).


Re: Odd non-fatal errors in amdump reports.

2017-11-14 Thread Austin S. Hemmelgarn

On 2017-11-13 16:42, Jean-Louis Martineau wrote:

On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote:

driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 
local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" 
"" 1073741824 memory "" "" 0


 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 
error "File 0 not found"


Do that dump still exists on tape Home-0001? Find it with amfetchdump.

If yes, send me the taper debug file.
amfetchdump does not see it, but looking directly at the virtual tape 
directories, I can see it there.




Re: power down hard drives

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-13 14:51, Jon LaBadie wrote:

On Mon, Nov 13, 2017 at 02:04:42PM -0500, Gene Heskett wrote:

On Monday 13 November 2017 13:42:13 Jon LaBadie wrote:


On Mon, Nov 13, 2017 at 11:40:17AM -0500, Austin S. Hemmelgarn wrote:

On 2017-11-13 11:11, Gene Heskett wrote:

On Monday 13 November 2017 10:12:47 Austin S. Hemmelgarn wrote:

On 2017-11-13 09:56, Gene Heskett wrote:

On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote:

On 2017-11-11 01:49, Jon LaBadie wrote:

Just a thought.  My amanda server has seven hard drives
dedicated to saving amanda data.  Only 2 are typically
used (holding and one vtape drive) during an amdump run.
Even then, the usage is only for about 3 hours.

So there is a lot of electricity and disk drive wear for
inactive drives.

Can todays drives be unmounted and powered down then
when needed, powered up and mounted again?

I'm not talking about system hibernation, the system
and its other drives still need to be active.

Back when 300GB was a big drive I had 2 of them in
external USB housings.  They shut themselves down
on inactivity.  When later accessed, there would
be about 5-10 seconds delay while the drive spun
up and things proceeded normally.

That would be a fine arrangement now if it could
be mimiced.


Aside from what Stefan mentioned (using hdparm to set the
standby timeout, check the man page for hdparm as the
numbers are not exactly sensible), you may consider looking
into auto-mounting each of the drives, as that can help
eliminate things that would keep the drives on-line (or make
it more obvious that something is still using them).


...


But if I allow the 2TB to be  unmounted and self-powered down,
once daily, what shortening of its life would I be subjected to?
In other words, how many start-stop cycles can it survive?


It's hard to be certain.  For what it's worth though, you might want
to test this to be certain that it's actually going to save you
energy.  It takes a lot of power to get the platters up to speed,
but it doesn't take much to keep them running at that speed.  It
might be more advantageous to just configure the device to idle
(that is, park the heads) after some time out and leave the platters
spinning instead of spinning down completely (and it should result
in less wear on the spindle motor).


In my situation, each of the six data drives is only
needed for a 2 week period out of each 12 weeks.  Once
shutdown, it could be down for 10 weeks.

Jon


Which is more than enough time for stiction to appear if the heads are
not parked off disk.


Don't today's drives automatically park heads?
I don't think there were ever any (at least, not ATA or SAS) that didn't 
when they went into standby.  In fact, I've never seen a modern style 
hard disk with 'voice coil' style actuators that didn't automatically 
park the heads (and part of my job is tearing apart old hard drives 
prior to physical media destruction, so I've seen my fair share of them).


Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-10 12:52, Jean-Louis Martineau wrote:

The previous patch broke something.
Try this new set2-r2.diff patch


Unfortunately, that doesn't appear to have fixed it, though the errors 
look different now.  I'll try and get the log scrubbed by the end of the 
day and post it here.


On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote:
 > On 2017-11-10 08:27, Jean-Louis Martineau wrote:
 >> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
 >>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
 >>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 >>>> > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >>>> >> Austin,
 >>>> >>
 >>>> >> It's hard to say something with only the error message.
 >>>> >>
 >>>> >> Can you post the amdump. and log..0 for
 >>>> the 2
 >>>> >> backup set that fail.
 >>>> >>
 >>>> > I've attached the files (I would put them inline, but one of the
 >>>> sets
 >>>> > has over 100 DLE's, so the amdump file is huge, and the others are
 >>>> > still over 100k each, and I figured nobody want's to try and wad
 >>>> > through those in-line).
 >>>> >
 >>>> > The set1 and set2 files are for the two backup sets that show the
 >>>> > header mismatch error, and the set3 files are for the one that
 >>>> claims
 >>>> > failures in the dump summary.
 >>>>
 >>>>
 >>>> I looked at set3, the error in the 'DUMP SUMMARY' are related to the
 >>>> error in the 'FAILURE DUMP SUMMARY'
 >>>>
 >>>> client2 /boot lev 0 FLUSH [File 0 not found]
 >>>> client3 /boot lev 0 FLUSH [File 0 not found]
 >>>> client7 /boot lev 0 FLUSH [File 0 not found]
 >>>> client8 /boot lev 0 FLUSH [File 0 not found]
 >>>> client0 /boot lev 0 FLUSH [File 0 not found]
 >>>> client9 /boot lev 0 FLUSH [File 0 not found]
 >>>> client9 /srv lev 0 FLUSH [File 0 not found]
 >>>> client9 /var lev 0 FLUSH [File 0 not found]
 >>>> server0 /boot lev 0 FLUSH [File 0 not found]
 >>>> client10 /boot lev 0 FLUSH [File 0 not found]
 >>>> client11 /boot lev 0 FLUSH [File 0 not found]
 >>>> client12 /boot lev 0 FLUSH [File 0 not found]
 >>>>
 >>>> They are VAULT attemp, not FLUSH, looking only at the first entry, it
 >>>> try to vault 'client2 /boot 0 20171024084159' which it expect to
 >>>> find on
 >>>> tape Server-01. It is an older dump.
 >>>>
 >>>> Do Server-01 is still there? Did it still contains the dump?
 >>>>
 >>> OK, I've done some further investigation by tweaking the labeling a
 >>> bit (which actually fixed a purely cosmetic issue we were having),
 >>> but I'm still seeing the same problem that prompted this thread, and
 >>> I can confirm that the dumps are where Amanda is trying to look for
 >>> them, it's just not seeing them for some reason. I hadn't thought
 >>> of this before, but could it have something to do with the virtual
 >>> tape library being auto-mounted over NFS on the backup server?
 >>>
 >> Austin,
 >>
 >> Can you try to see if amfetchdump can restore it?
 >>
 >> * amfetchdump CONFIG client2 /boot 20171024084159
 >>
 > amfetchdump doesn't see it, and neither does amrecover, but the files
 > for the given parts are definitely there (I know for a fact that the
 > dump in question has exactly one part, and the file for that does
 > exist on the virtual tape mentioned in the log file).
 >
 > I'm probably not going to be able to check more on this today, but
 > I'll likely be checking if amrestore and amadmin find can see them.
 >


Re: power down hard drives

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-13 11:11, Gene Heskett wrote:

On Monday 13 November 2017 10:12:47 Austin S. Hemmelgarn wrote:


On 2017-11-13 09:56, Gene Heskett wrote:

On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote:

On 2017-11-11 01:49, Jon LaBadie wrote:

Just a thought.  My amanda server has seven hard drives
dedicated to saving amanda data.  Only 2 are typically
used (holding and one vtape drive) during an amdump run.
Even then, the usage is only for about 3 hours.

So there is a lot of electricity and disk drive wear for
inactive drives.

Can todays drives be unmounted and powered down then
when needed, powered up and mounted again?

I'm not talking about system hibernation, the system
and its other drives still need to be active.

Back when 300GB was a big drive I had 2 of them in
external USB housings.  They shut themselves down
on inactivity.  When later accessed, there would
be about 5-10 seconds delay while the drive spun
up and things proceeded normally.

That would be a fine arrangement now if it could
be mimiced.


Aside from what Stefan mentioned (using hdparm to set the standby
timeout, check the man page for hdparm as the numbers are not
exactly sensible), you may consider looking into auto-mounting each
of the drives, as that can help eliminate things that would keep
the drives on-line (or make it more obvious that something is still
using them).


I've investigated that, and I have amanda wrapped up in a script
that could do that, but ran into a showstopper I've long since
forgotten about.  Al this was back in the time I was writing that
wrapper, years ago now. One of the show stoppers AIR was the fact
that only root can mount and unmount a drive, and my script runs as
amanda.


While such a wrapper might work if you use sudo inside it (you can
configure sudo to allow root to run things as the amanda user without
needing a password, then run the wrapper as root), what I was trying
to refer to in a system-agnostic manner (since the exact mechanism is
different between different UNIX derivatives) was on-demand
auto-mounting, as provided by autofs on Linux or the auto-mount daemon
(amd) on BSD.  When doing on-demand auto-mounting, you don't need a
wrapper at all, as the access attempt will trigger the mount, and then
the mount will time out after some period of inactivity and be
unmounted again.  It's mostly used for network resources (possibly
with special auto-lookup mechanisms), as certain protocols (NFS in
particular) tend to have issues if the server goes down while a share
is mounted remotely, even if nothing is happening on that share, but
it works just as well for auto-mounting of local fixed or removable
volumes that aren't needed all the time (I use it for a handful of
things on my personal systems to minimize idle resource usage).


Sounds good perhaps. I am currently up to my eyeballs in an unrelated
problem, and I won't get to this again until that project is completed
and I have brought the 2TB drive in and configured it for amanda's
usage. That will tend to enforce my one thing at a time but do it right
bent. :)  What I have is working for a loose definition of working...
Yeah, I know what that's like.  Prior to switching to amanda where I 
worked, we had a home-grown backup system that had all kinds of odd edge 
cases I had to make sure never happened.  I'm extremely glad we decided 
to stop using that, since it means I can now focus on more interesting 
problems (in theory at least, we're having an issue with our Amanda 
config right now too, but thankfully it's not a huge one).


But if I allow the 2TB to be  unmounted and self-powered down, once
daily, what shortening of its life would I be subjected to?  In other
words, how many start-stop cycles can it survive?
It's hard to be certain.  For what it's worth though, you might want to 
test this to be certain that it's actually going to save you energy.  It 
takes a lot of power to get the platters up to speed, but it doesn't 
take much to keep them running at that speed.  It might be more 
advantageous to just configure the device to idle (that is, park the 
heads) after some time out and leave the platters spinning instead of 
spinning down completely (and it should result in less wear on the 
spindle motor).


Interesting, I had started a long time test yesterday, and the reported
hours has wrapped in the report, apparently at 65636 hours. Somebody
apparently didn't expect a drive to last that long? ;-)  The drive?
Healthy as can be.
That's about 7.48 years, so I can actually somewhat understand not going 
past 16-bits for that since most people don't use a given disk for more 
than about 5 years worth of power-on time before replacing it.  However, 
what matters is really not how long the device has been powered on, but 
how much abuse the drive has taken.  Running 24/7 for 5 years with no 
movement of the system (including nothing like earthquakes), in a 
temperature, humidity, and pressure controlled room will get

Re: power down hard drives

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-13 09:56, Gene Heskett wrote:

On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote:


On 2017-11-11 01:49, Jon LaBadie wrote:

Just a thought.  My amanda server has seven hard drives
dedicated to saving amanda data.  Only 2 are typically
used (holding and one vtape drive) during an amdump run.
Even then, the usage is only for about 3 hours.

So there is a lot of electricity and disk drive wear for
inactive drives.

Can todays drives be unmounted and powered down then
when needed, powered up and mounted again?

I'm not talking about system hibernation, the system
and its other drives still need to be active.

Back when 300GB was a big drive I had 2 of them in
external USB housings.  They shut themselves down
on inactivity.  When later accessed, there would
be about 5-10 seconds delay while the drive spun
up and things proceeded normally.

That would be a fine arrangement now if it could
be mimiced.


Aside from what Stefan mentioned (using hdparm to set the standby
timeout, check the man page for hdparm as the numbers are not exactly
sensible), you may consider looking into auto-mounting each of the
drives, as that can help eliminate things that would keep the drives
on-line (or make it more obvious that something is still using them).


I've investigated that, and I have amanda wrapped up in a script that
could do that, but ran into a showstopper I've long since forgotten
about.  Al this was back in the time I was writing that wrapper, years
ago now. One of the show stoppers AIR was the fact that only root can
mount and unmount a drive, and my script runs as amanda.

While such a wrapper might work if you use sudo inside it (you can 
configure sudo to allow root to run things as the amanda user without 
needing a password, then run the wrapper as root), what I was trying to 
refer to in a system-agnostic manner (since the exact mechanism is 
different between different UNIX derivatives) was on-demand 
auto-mounting, as provided by autofs on Linux or the auto-mount daemon 
(amd) on BSD.  When doing on-demand auto-mounting, you don't need a 
wrapper at all, as the access attempt will trigger the mount, and then 
the mount will time out after some period of inactivity and be unmounted 
again.  It's mostly used for network resources (possibly with special 
auto-lookup mechanisms), as certain protocols (NFS in particular) tend 
to have issues if the server goes down while a share is mounted 
remotely, even if nothing is happening on that share, but it works just 
as well for auto-mounting of local fixed or removable volumes that 
aren't needed all the time (I use it for a handful of things on my 
personal systems to minimize idle resource usage).
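
As a concrete (if hypothetical) sketch of the autofs variant on Linux, 
with the mount point, device, and timeout as placeholders:

# /etc/auto.master: mount vtape volumes under /mnt/amanda on demand,
# unmount after 10 minutes of inactivity
/mnt/amanda  /etc/auto.amanda  --timeout=600

# /etc/auto.amanda: one entry per volume, keyed by directory name
vtapes  -fstype=ext4  :/dev/disk/by-label/amanda-vtapes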


Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-10 12:52, Jean-Louis Martineau wrote:

The previous patch broke something.
Try this new set2-r2.diff patch
Given that the switch to NFSv4 combined with a change to the labeling 
scheme fixed the other issue, I'm going to re-test these two sets with 
the same changes before I test the patch just so I've got something 
current to compare against.  I should have results from that later 
today, and will likely be testing this patch tomorrow if things aren't 
resolved by the other changes (and based on what you've said and what 
I've seen, I don't think the switch to NFSv4 or the labeling change will 
fix this one).


Jean-Louis

On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote:
 > On 2017-11-10 08:27, Jean-Louis Martineau wrote:
 >> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:
 >>> On 2017-11-08 08:03, Jean-Louis Martineau wrote:
 >>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 >>>> > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >>>> >> Austin,
 >>>> >>
 >>>> >> It's hard to say something with only the error message.
 >>>> >>
 >>>> >> Can you post the amdump. and log..0 for
 >>>> the 2
 >>>> >> backup set that fail.
 >>>> >>
 >>>> > I've attached the files (I would put them inline, but one of the
 >>>> sets
 >>>> > has over 100 DLE's, so the amdump file is huge, and the others are
 >>>> > still over 100k each, and I figured nobody want's to try and wad
 >>>> > through those in-line).
 >>>> >
 >>>> > The set1 and set2 files are for the two backup sets that show the
 >>>> > header mismatch error, and the set3 files are for the one that
 >>>> claims
 >>>> > failures in the dump summary.
 >>>>
 >>>>
 >>>> I looked at set3, the error in the 'DUMP SUMMARY' are related to the
 >>>> error in the 'FAILURE DUMP SUMMARY'
 >>>>
 >>>> client2 /boot lev 0 FLUSH [File 0 not found]
 >>>> client3 /boot lev 0 FLUSH [File 0 not found]
 >>>> client7 /boot lev 0 FLUSH [File 0 not found]
 >>>> client8 /boot lev 0 FLUSH [File 0 not found]
 >>>> client0 /boot lev 0 FLUSH [File 0 not found]
 >>>> client9 /boot lev 0 FLUSH [File 0 not found]
 >>>> client9 /srv lev 0 FLUSH [File 0 not found]
 >>>> client9 /var lev 0 FLUSH [File 0 not found]
 >>>> server0 /boot lev 0 FLUSH [File 0 not found]
 >>>> client10 /boot lev 0 FLUSH [File 0 not found]
 >>>> client11 /boot lev 0 FLUSH [File 0 not found]
 >>>> client12 /boot lev 0 FLUSH [File 0 not found]
 >>>>
 >>>> They are VAULT attemp, not FLUSH, looking only at the first entry, it
 >>>> try to vault 'client2 /boot 0 20171024084159' which it expect to
 >>>> find on
 >>>> tape Server-01. It is an older dump.
 >>>>
 >>>> Do Server-01 is still there? Did it still contains the dump?
 >>>>
 >>> OK, I've done some further investigation by tweaking the labeling a
 >>> bit (which actually fixed a purely cosmetic issue we were having),
 >>> but I'm still seeing the same problem that prompted this thread, and
 >>> I can confirm that the dumps are where Amanda is trying to look for
 >>> them, it's just not seeing them for some reason. I hadn't thought
 >>> of this before, but could it have something to do with the virtual
 >>> tape library being auto-mounted over NFS on the backup server?
 >>>
 >> Austin,
 >>
 >> Can you try to see if amfetchdump can restore it?
 >>
 >> * amfetchdump CONFIG client2 /boot 20171024084159
 >>
 > amfetchdump doesn't see it, and neither does amrecover, but the files
 > for the given parts are definitely there (I know for a fact that the
 > dump in question has exactly one part, and the file for that does
 > exist on the virtual tape mentioned in the log file).
 >
 > I'm probably not going to be able to check more on this today, but
 > I'll likely be checking if amrestore and amadmin find can see them.
 >


Re: Odd non-fatal errors in amdump reports.

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-10 08:45, Austin S. Hemmelgarn wrote:

On 2017-11-10 08:27, Jean-Louis Martineau wrote:

On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the 
sets

 > has over 100 DLE's, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody want's to try and wad
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that 
claims

 > failures in the dump summary.


I looked at set3, the error in the 'DUMP SUMMARY' are related to the
error in the 'FAILURE DUMP SUMMARY'

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

They are VAULT attemp, not FLUSH, looking only at the first entry, it
try to vault 'client2 /boot 0 20171024084159' which it expect to 
find on

tape Server-01. It is an older dump.

Do Server-01 is still there? Did it still contains the dump?

OK, I've done some further investigation by tweaking the labeling a 
bit (which actually fixed a purely cosmetic issue we were having), 
but I'm still seeing the same problem that prompted this thread, and 
I can confirm that the dumps are where Amanda is trying to look for 
them, it's just not seeing them for some reason.  I hadn't thought of 
this before, but could it have something to do with the virtual tape 
library being auto-mounted over NFS on the backup server?



Austin,

Can you try to see if amfetchdump can restore it?

  * amfetchdump CONFIG client2 /boot 20171024084159
At the moment, I'm re-testing things after tweaking some NFS parameters 
for the virtual tape library (apparently the FreeNAS server that's 
actually storing the data didn't have NFSv4 turned on, so it was mounted 
with NFSv3, which we've had issues with before on our network), so I 
can't exactly check immediately, but assuming the problem repeats, I'll 
do that first thing once the test dump is done.


It looks like the combination of fixing the incorrect labeling in the 
config and switching to NFSv4 fixed this particular case.


Re: power down hard drives

2017-11-13 Thread Austin S. Hemmelgarn

On 2017-11-11 01:49, Jon LaBadie wrote:

Just a thought.  My amanda server has seven hard drives
dedicated to saving amanda data.  Only 2 are typically
used (holding and one vtape drive) during an amdump run.
Even then, the usage is only for about 3 hours.

So there is a lot of electricity and disk drive wear for
inactive drives.

Can todays drives be unmounted and powered down then
when needed, powered up and mounted again?

I'm not talking about system hibernation, the system
and its other drives still need to be active.

Back when 300GB was a big drive I had 2 of them in
external USB housings.  They shut themselves down
on inactivity.  When later accessed, there would
be about 5-10 seconds delay while the drive spun
up and things proceeded normally.

That would be a fine arrangement now if it could
be mimiced.
Aside from what Stefan mentioned (using hdparm to set the standby 
timeout, check the man page for hdparm as the numbers are not exactly 
sensible), you may consider looking into auto-mounting each of the 
drives, as that can help eliminate things that would keep the drives 
on-line (or make it more obvious that something is still using them).
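
For reference, the sort of hdparm invocation Stefan mentioned looks 
something like this (the device name is a placeholder, and the -S 
encoding is the non-obvious part, so double-check the man page):

# spin down after 30 minutes of inactivity (241 maps to 30 minutes;
# values 1-240 count in units of 5 seconds instead)
hdparm -S 241 /dev/sdX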




Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-10 08:27, Jean-Louis Martineau wrote:

On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the sets
 > has over 100 DLE's, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody want's to try and wad
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that claims
 > failures in the dump summary.


I looked at set3, the error in the 'DUMP SUMMARY' are related to the
error in the 'FAILURE DUMP SUMMARY'

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

They are VAULT attemp, not FLUSH, looking only at the first entry, it
try to vault 'client2 /boot 0 20171024084159' which it expect to find on
tape Server-01. It is an older dump.

Do Server-01 is still there? Did it still contains the dump?

OK, I've done some further investigation by tweaking the labeling a 
bit (which actually fixed a purely cosmetic issue we were having), but 
I'm still seeing the same problem that prompted this thread, and I can 
confirm that the dumps are where Amanda is trying to look for them, 
it's just not seeing them for some reason.  I hadn't thought of this 
before, but could it have something to do with the virtual tape 
library being auto-mounted over NFS on the backup server?



Austin,

Can you try to see if amfetchdump can restore it?

  * amfetchdump CONFIG client2 /boot 20171024084159

amfetchdump doesn't see it, and neither does amrecover, but the files 
for the given parts are definitely there (I know for a fact that the 
dump in question has exactly one part, and the file for that does exist 
on the virtual tape mentioned in the log file).


I'm probably not going to be able to check more on this today, but I'll 
likely be checking if amrestore and amadmin find can see them.


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-10 10:00, Jean-Louis Martineau wrote:

Austin,

Can you try the attached patch, I think it could fix the set1 and set2
errors.

Yes, but I won't be able to log in this weekend to revert it if it 
doesn't work, so I won't be able to test it until Monday.


Am I correct in assuming that it only needs to be applied on the server 
and not the clients?


On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the sets
 > has over 100 DLE's, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody want's to try and wad
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that claims
 > failures in the dump summary.




Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-10 08:27, Jean-Louis Martineau wrote:

On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote:

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the sets
 > has over 100 DLE's, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody want's to try and wad
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that claims
 > failures in the dump summary.


I looked at set3, the error in the 'DUMP SUMMARY' are related to the
error in the 'FAILURE DUMP SUMMARY'

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

They are VAULT attemp, not FLUSH, looking only at the first entry, it
try to vault 'client2 /boot 0 20171024084159' which it expect to find on
tape Server-01. It is an older dump.

Do Server-01 is still there? Did it still contains the dump?

OK, I've done some further investigation by tweaking the labeling a 
bit (which actually fixed a purely cosmetic issue we were having), but 
I'm still seeing the same problem that prompted this thread, and I can 
confirm that the dumps are where Amanda is trying to look for them, 
it's just not seeing them for some reason.  I hadn't thought of this 
before, but could it have something to do with the virtual tape 
library being auto-mounted over NFS on the backup server?



Austin,

Can you try to see if amfetchdump can restore it?

  * amfetchdump CONFIG client2 /boot 20171024084159
At the moment, I'm re-testing things after tweaking some NFS parameters 
for the virtual tape library (apparently the FreeNAS server that's 
actually storing the data didn't have NFSv4 turned on, so it was mounted 
with NFSv3, which we've had issues with before on our network), so I 
can't exactly check immediately, but assuming the problem repeats, I'll 
do that first thing once the test dump is done.


Re: Odd non-fatal errors in amdump reports.

2017-11-10 Thread Austin S. Hemmelgarn

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the sets
 > has over 100 DLE's, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody want's to try and wad
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that claims
 > failures in the dump summary.


I looked at set3, the error in the 'DUMP SUMMARY' are related to the
error in the 'FAILURE DUMP SUMMARY'

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

They are VAULT attemp, not FLUSH, looking only at the first entry, it
try to vault 'client2 /boot 0 20171024084159' which it expect to find on
tape Server-01. It is an older dump.

Do Server-01 is still there? Did it still contains the dump?

OK, I've done some further investigation by tweaking the labeling a bit 
(which actually fixed a purely cosmetic issue we were having), but I'm 
still seeing the same problem that prompted this thread, and I can 
confirm that the dumps are where Amanda is trying to look for them, it's 
just not seeing them for some reason.  I hadn't thought of this before, 
but could it have something to do with the virtual tape library being 
auto-mounted over NFS on the backup server?


Re: Odd non-fatal errors in amdump reports.

2017-11-08 Thread Austin S. Hemmelgarn

On 2017-11-08 08:03, Jean-Louis Martineau wrote:

On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote:
 > On 2017-11-07 10:22, Jean-Louis Martineau wrote:
 >> Austin,
 >>
 >> It's hard to say something with only the error message.
 >>
 >> Can you post the amdump. and log..0 for the 2
 >> backup set that fail.
 >>
 > I've attached the files (I would put them inline, but one of the sets
 > has over 100 DLE's, so the amdump file is huge, and the others are
 > still over 100k each, and I figured nobody want's to try and wad
 > through those in-line).
 >
 > The set1 and set2 files are for the two backup sets that show the
 > header mismatch error, and the set3 files are for the one that claims
 > failures in the dump summary.


I looked at set3, the error in the 'DUMP SUMMARY' are related to the
error in the 'FAILURE DUMP SUMMARY'

client2 /boot lev 0 FLUSH [File 0 not found]
client3 /boot lev 0 FLUSH [File 0 not found]
client7 /boot lev 0 FLUSH [File 0 not found]
client8 /boot lev 0 FLUSH [File 0 not found]
client0 /boot lev 0 FLUSH [File 0 not found]
client9 /boot lev 0 FLUSH [File 0 not found]
client9 /srv lev 0 FLUSH [File 0 not found]
client9 /var lev 0 FLUSH [File 0 not found]
server0 /boot lev 0 FLUSH [File 0 not found]
client10 /boot lev 0 FLUSH [File 0 not found]
client11 /boot lev 0 FLUSH [File 0 not found]
client12 /boot lev 0 FLUSH [File 0 not found]

They are VAULT attemp, not FLUSH, looking only at the first entry, it
try to vault 'client2 /boot 0 20171024084159' which it expect to find on
tape Server-01. It is an older dump.

Do Server-01 is still there? Did it still contains the dump?

Hmm, looks like that's a leftover from changing our labeling format 
shortly after switching to this new configuration.  I thought I purged 
all the stuff with the old label scheme, but I guess not.


It somewhat surprises me that this doesn't give any kind of error 
indication in the e-mail report beyond the 'FAILED' line in the dump 
summary.


Re: amvault with dropbox

2017-11-07 Thread Austin S. Hemmelgarn

On 2017-11-07 13:36, Ned Danieley wrote:

On Tue, Nov 07, 2017 at 01:29:34PM -0500, Austin S. Hemmelgarn wrote:

OK, so you're talking about functionally permanent archiving instead
of keeping old stuff around for a fixed multiple of the dump cycle.
If that's the case, you may be better off pulling the dumps off the
tapes using amfetchdump, and then uploading them from there.  That
use case could in theory be handled better with some extra code in
Amanda, but I don't know how well the lack of deletion would be
handled on Amanda's side.


yeah, I need to upload monthly full dumps to dropbox and keep them forever.
the monthly dumps are to vtapes, and I thought it would be neat if I could
then just transfer the vtapes to dropbox using amvault.

Strictly speaking, amvault doesn't transfer vtapes; it retapes the dumps 
on the vtapes to a new location.  While this sounds like a somewhat 
pointless distinction, it's actually pretty significant, because it means 
you can use a different type of tape for your secondary storage, with 
almost every single tapetype option different (which is extremely useful 
for multiple reasons).  That's actually part of the reason that it's a 
preferred alternative to mirroring tapes with Amanda's RAIT device.


The issue here, though, is the 'keep it forever' bit.  If Amanda is given 
an automated tape changer (a library of vtapes is an automated changer), 
it assumes it can reuse the tapes as it sees fit.  I think there's a 
config option that lets you change that, but once you do, you need 
to keep adding tapes (or vtapes) to the library, which can get out of 
hand really quickly (especially if you don't plan ahead when deciding 
how things will get labeled).


One option for this, though, if you can afford to use something other 
than Dropbox, would be to use the Amazon S3 support to store your data 
in Amazon Glacier storage (which is insanely cheap at about 0.07 USD per 
TB of storage), and enable versioning (so that when a 'tape' gets 
overwritten, the old version gets kept around) and keep old versions 
forever.  If you're interested in doing this, I can write up 
instructions for how to get things set up with Amazon (we actually do 
something very similar for off-site backups where I work, just without 
Glacier or versioning, but those are easy to set up).
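
To give a rough idea of the Amazon side (the bucket name is made up, and 
you'd want to review the lifecycle rule before applying it), enabling 
versioning and a Glacier transition is just a couple of CLI calls:

# keep old versions around when a 'tape' gets overwritten
aws s3api put-bucket-versioning --bucket example-amanda-vault \
    --versioning-configuration Status=Enabled

# add a lifecycle rule that moves data (and old versions) to Glacier; the
# JSON file holds a rule whose transitions use "StorageClass": "GLACIER"
aws s3api put-bucket-lifecycle-configuration --bucket example-amanda-vault \
    --lifecycle-configuration file://glacier-lifecycle.json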


Re: amvault with dropbox

2017-11-07 Thread Austin S. Hemmelgarn

On 2017-11-07 13:19, Ned Danieley wrote:

On Tue, Nov 07, 2017 at 01:11:43PM -0500, Austin S. Hemmelgarn wrote:

On 2017-11-07 11:55, Ned Danieley wrote:


we use a dropbox business account to archive our data, and I was interested
in trying to use amvault to transfer my amanda backups there. however, it
seems that there is a fair amount of work that would have to be done to the
code base to make that happen, work that is probably beyond my ability.

are there any plans to include dropbox access in future versions?


You can do this already without needing any new code.  Just
configure a virtual tape library inside a Dropbox synced directory,
set that as a vaulting location, and recursively add the necessary
read permissions to the directory after each amvault run.


I guess that would work, although I'd have to set up selective sync so I
could remove the files locally without removing them from dropbox. thanks
for the suggestion; I'll give it a try.

OK, so you're talking about functionally permanent archiving instead of 
keeping old stuff around for a fixed multiple of the dump cycle.  If 
that's the case, you may be better off pulling the dumps off the tapes 
using amfetchdump, and then uploading them from there.  That use case 
could in theory be handled better with some extra code in Amanda, but I 
don't know how well the lack of deletion would be handled on Amanda's side.
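
As a rough sketch of that approach (the config name, host, DLE, and paths 
are all made up here):

# extract the most recent dump of one DLE into an empty scratch directory
mkdir -p /var/tmp/extract && cd /var/tmp/extract
amfetchdump MyConfig client1.example.com /home

# amfetchdump leaves the dump as one or more plain files in the current
# directory, which can then be copied into a Dropbox-synced directory (or
# pushed with whatever upload tool you prefer)
cp ./* ~/Dropbox/amanda-archive/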


Re: amvault with dropbox

2017-11-07 Thread Austin S. Hemmelgarn

On 2017-11-07 11:55, Ned Danieley wrote:


we use a dropbox business account to archive our data, and I was interested
in trying to use amvault to transfer my amanda backups there. however, it
seems that there is a fair amount of work that would have to be done to the
code base to make that happen, work that is probably beyond my ability.

are there any plans to include dropbox access in future versions?

You can do this already without needing any new code.  Just configure a 
virtual tape library inside a Dropbox synced directory, set that as a 
vaulting location, and recursively add the necessary read permissions to 
the directory after each amvault run.
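
A minimal sketch of what that could look like (the paths, slot count, and 
the storage/tapetype names are all assumptions on my part):

define changer dropbox-vtl {
    tapedev "chg-disk:/home/amanda/Dropbox/amanda-vault"
    property "num-slot" "16"
    property "auto-create-slot" "yes"
}

define storage dropbox {
    tpchanger "dropbox-vtl"
    tapetype "vtape"        # whatever tapetype matches your vtape sizing
    tapepool "dropbox"
}

# Point vaulting at it, either with 'vault-storage "dropbox"' or a
# 'vault dropbox N' clause on your primary storage definition, and after
# each run make sure the sync daemon can read what Amanda wrote, e.g.:
#     chmod -R a+rX /home/amanda/Dropbox/amanda-vault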


Re: Odd non-fatal errors in amdump reports.

2017-11-07 Thread Austin S. Hemmelgarn

On 2017-11-07 10:22, Jean-Louis Martineau wrote:

Austin,

It's hard to say something with only the error message.

Can you post the amdump. and log..0 for the 2
backup set that fail.
Yes, though it may take me a while since our policy is pretty strict 
about scrubbing hostnames and usernames from any internal files we make 
visible publicly.


Just to clarify, it will end up being 3 total pairs of files, two from 
backup sets that show the first issue I mentioned (the complaint about a 
header mismatch), and one from the backup set showing the second issue I 
mentioned (the apparently bogus dump failures listed in the dump summary).


The tapedev of the aws changer can be written like:

tapedev "chg-multi:s3:/slot-{0..127}
Thanks, I hadn't known that the configuration file syntax supported 
sequences like this; that makes it look so much nicer!
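
For reference, that would let the aws changer from the config below collapse 
to something like this (a sketch only; everything except the tapedev line 
stays as it is now):

define changer aws {
    tapedev "chg-multi:s3:/slot-{0..127}"   # expands to slot-0 through slot-127
    changerfile "/etc/amanda/X/s3-changer"
    # device-property lines unchanged
}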



Jean-Louis

On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
 > Where I work, we recently switched from manually triggered vaulting to
 > automatic vaulting using the vault-storage, vault, and dump-selection
 > options. Things appear to be working correctly, but we keep getting
 > some odd non-fatal error messages (that might be bogus as well, since
 > I've verified the dumps mentioned restore correctly) in the amdump
 > e-mails. I've been trying to figure out these 'errors' for the past
 > few weeks now, and I'm hoping someone on the list might have some advice
 > (or better yet, might recognize the symptoms and know how to fix them).
 >
 > In our configuration, we have three different backup sets (each is on
 > its own schedule). Of these, two are consistently showing the following
 > error in the amdump e-mail report (I've redacted hostnames and exact 
paths,

 > the second path listed though is a parent directory of the first):
 >
 > taper: FATAL Header of dumpfile does not match command from driver 0 
XXX /home/X 20171031074642 -- 0 XXX 
/home/XX 20171031074642 at 
/usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168

 >
 > For a given backup set, the particular hostname and paths are always the
 > same, but the backup appears to get taped correctly, and restores
 > correctly as well.
 >
 > With the third backup set, we're regularly seeing things like the
 > following in the dump summary section, but no other visible error
 > messages:
 >
 >                                   DUMPER STATS              TAPER STATS
 > HOSTNAME     DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS    KB/s  MMM:SS   KB/s
 > ----------------------------------------------------------------------------
 > XX           /boot  0       --                                 FAILED
 > XX           /boot  1       10      10     --    0:00   168.8    0:00    0.0
 >
 > In this case, the particular DLE's affected are always the same,
 > and the first line that claims a failure always shows dump level
 > zero, even when the backup is supposed to be at another level.
 > Just like the other error, the affected dumps always restore
 > correctly when tested, and get correctly vaulted as well. The
 > affected DLE's are only on Linux systems, but it seems to not
 > care what distro or amanda version is being used (it's affected
 > Debian, Gentoo, and Fedora systems, and covers 5 different
 > Amanda client versions), and are invariably small (sub-gigabyte)
 > filesystems, but I've not found any other commonality among them.
 >
 > All three sets use essentially the same amanda.conf file (the
 > differences are literally just in when they get run), which
 > I've attached in-line at the end of this e-mail with
 > sensitive data redacted. The thing I find particularly odd is
 > that this config is essentially identical to what I use on my
 > personal systems, which are not exhibiting either problem.
 >
 > 8<
 >
 > org "X"
 > mailto "admin"
 > dumpuser "amanda"
 > inparallel 2
 > dumporder "Ss"
 > taperalgo largestfit
 >
 > displayunit "k"
 > netusage 800 Kbps
 >
 > dumpcycle 4 weeks
 > runspercycle 28
 > tapecycle 128 tapes
 >
 > bumppercent 20
 > bumpdays 2
 >
 > etimeout 900
 > dtimeout 1800
 > ctimeout 30
 >
 > device_output_buffer_size 256M
 >
 > compress-index no
 >
 > flush-threshold-dumped 0
 > flush-threshold-scheduled 0
 > taperflush 0
 > autoflush yes
 >
 > runtapes 16
 >
 > define changer vtl {
 > tapedev "chg-disk:/net/XX/amanda/X"
 > changerfile "/etc/amanda/X/changer"
 > property "num-slot" "128"
 > property "auto-create-slot" "yes"
 > }
 >
 > define changer aws {
 > tapedev 
"chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,1

Odd non-fatal errors in amdump reports.

2017-11-07 Thread Austin S. Hemmelgarn
Where I work, we recently switched from manually triggered vaulting to 
automatic vaulting using the vault-storage, vault, and dump-selection 
options.  Things appear to be working correctly, but we keep getting 
some odd non-fatal error messages (that might be bogus as well, since 
I've verified the dumps mentioned restore correctly) in the amdump 
e-mails.  I've been trying to figure out these 'errors' for the past
few weeks now, and I'm hoping someone on the list might have some advice
(or better yet, might recognize the symptoms and know how to fix them).

In our configuration, we have three different backup sets (each is on 
its own schedule).  Of these, two are consistently showing the following
error in the amdump e-mail report (I've redacted hostnames and exact paths,
the second path listed though is a parent directory of the first):

taper: FATAL Header of dumpfile does not match command from driver 0 XXX 
/home/X 20171031074642 -- 0 XXX /home/XX 
20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm 
line 1168

For a given backup set, the particular hostname and paths are always the 
same, but the backup appears to get taped correctly, and restores 
correctly as well.

With the third backup set, we're regularly seeing things like the 
following in the dump summary section, but no other visible error 
messages:

                                     DUMPER STATS              TAPER STATS
HOSTNAME     DISK   L  ORIG-KB  OUT-KB  COMP%  MMM:SS    KB/s  MMM:SS   KB/s
----------------------------------------------------------------------------
XX           /boot  0       --                                 FAILED
XX           /boot  1       10      10     --    0:00   168.8    0:00    0.0

In this case, the particular DLE's affected are always the same,
and the first line that claims a failure always shows dump level
zero, even when the backup is supposed to be at another level.
Just like the other error, the affected dumps always restore
correctly when tested, and get correctly vaulted as well.  The
affected DLE's are only on Linux systems, but it seems to not
care what distro or amanda version is being used (it's affected
Debian, Gentoo, and Fedora systems, and covers 5 different
Amanda client versions), and are invariably small (sub-gigabyte)
filesystems, but I've not found any other commonality among them.

All three sets use essentially the same amanda.conf file (the 
differences are literally just in when they get run), which
I've attached in-line at the end of this e-mail with
sensitive data redacted.  The thing I find particularly odd is
that this config is essentially identical to what I use on my
personal systems, which are not exhibiting either problem.

8<

org  "X"
mailto   "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit

displayunit "k"
netusage  800 Kbps

dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes

bumppercent 20
bumpdays 2

etimeout 900
dtimeout 1800
ctimeout 30

device_output_buffer_size 256M

compress-index no

flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes

runtapes 16

define changer vtl {
tapedev "chg-disk:/net/XX/amanda/X"
changerfile "/etc/amanda/X/changer"
property "num-slot" "128"
property "auto-create-slot" "yes"
}

define changer aws {
tapedev 
"chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
changerfile "/etc/amanda/X/s3-changer"
device-property "S3_SSL" "YES"
device-property "S3_ACCESS_KEY" ""
device-property "S3_SECRET_KEY" 
""
device-property "S3_MULTI_PART_UPLOAD" "YES"
device-property "CREATE_BUCKET" "NO"
device-property "S3_BUCKET_LOCATION" "X"
device-property "STORAGE_API" "AWS4"
}

define storage local-vtl {
tpchanger "vtl"
tapepool "$r"
tapetype "V64G"
labelstr "^-[0-9][0-9]*$"
autolabel "-%%%" any
erase-on-full YES
erase-on-failure YES
vault cloud 0
}

define storage cloud {
tpchanger "aws"
tapepool "$r"
tapetype "S3TAPE"
labelstr "^Vault--[0-9][0-9]*$"
autolabel "Vault--%%%" any
erase-on-full YES
erase-on-failure YES
 

Re: approaches to Amanda vaulting?

2017-10-24 Thread Austin S. Hemmelgarn

On 2017-10-24 12:28, Stefan G. Weichinger wrote:

On 2017-10-24 at 13:38, Austin S. Hemmelgarn wrote:

On 2017-10-22 13:38, Stefan G. Weichinger wrote:

After or before I additionally can do something like:

amvault myconf --dest-storage --latest-fulls archive

correct?

I think so, but I'm not 100% certain.


oh ;-)


An additional hurdle is that the customer wants to use WORM tapes for
archive, so I should get that right at the first run to not waste any 
tapes


Perhaps create a temporary virtual tape library for testing that the 
archiving schedule works as expected?  This is what I generally do 
when testing changes at work (although I usually do it using a copy of 
the main configuration so that I don't confuse the planner for the 
production backups with half a dozen runs in one day).
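
For the record, the throwaway library itself is only a few lines of config 
(the path and slot count are arbitrary):

define changer test-vtl {
    tapedev "chg-disk:/var/tmp/amanda-worm-test"
    property "num-slot" "4"
    property "auto-create-slot" "yes"
}

Point the copied config's archive storage at that, run the amdump/amvault 
sequence once, and only switch back to the real robot once the reports look 
right.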


Sure, that would be good, but I don't have that much disk space available.

I am currently trying to wrap my head around the tuning of these 
parameters (and understand the exact meaning by reading the man page):


flush-threshold-dumped
flush-threshold-scheduled
taperflush

I had lev0 of all DLEs in the holding disk and both flush-threshold 
values on 400 -> I thought this would keep data for 4 tapes inside the 
disk, but no, some lev0 backups were flushed to primary storage already.


Maybe I set up a VM with 2 vtape changers and play around there to learn 
and understand.
Based on what you're saying you want, I think you want the following in 
your config:


flush-threshold-dumped 400
flush-threshold-scheduled 400
taperflush 400
autoflush yes

The first two control flushing during a run, while taperflush controls 
flushing at the end of a run.  The values are percentages of a single 
tape's capacity, so 400 means roughly four tapes' worth of data has to be 
sitting in the holding disk before anything gets flushed.  To get the 
flushing to actually happen, you then need autoflush set to yes (and 
amanda will complain if it's not set to yes while taperflush is more than 
zero).


Now, I'm not 100% certain that will work, as I've not done this type of 
thing myself.  At work, we just use the holding disk as a cache so that 
we can finish dumps as quickly as possible without our (slow, 
parity-raid backed) persistent storage being the bottleneck, and at home 
I don't use it since I don't need parallelization and I don't have any 
disks that are faster than any others.  But based on what I understand 
from the documentation, I'm pretty sure this should do it.


Re: approaches to Amanda vaulting?

2017-10-24 Thread Austin S. Hemmelgarn

On 2017-10-22 13:38, Stefan G. Weichinger wrote:

On 2017-10-16 at 19:22, Stefan G. Weichinger wrote:

On 2017-10-16 at 15:20, Jean-Louis Martineau wrote:

Amanda 3.5 can do everything you want just by running the amdump command.

Using a holding disk:

* You configure two storages
* All dumps go to the holding disk
* All dumps are copied to each storage, not necessarily at the same
time or in the same run.
* The dumps stay in holding until they are copied to both storages
* You can tell amanda that everything must go to both storages, or only
some DLEs (full/incr)



I now have set up a config like this:


define changer robot {
 tpchanger "chg-robot:/dev/sg3"
 property "tape-device" "0=tape:/dev/nst0"
 property "eject-before-unload" "no"
 property "use-slots" "1-8"
}

define tapetype LTO6 {
#comment "Created by amtapetype; compression enabled"
length 2442818848 kbytes # about 2.4 TB (sgw)
filemark 1806 kbytes
speed 74006 kps
blocksize 32 kbytes
part_size 200G
}

define storage myconf {
 tapepool "myconf"
 tapetype "LTO6"
 tpchanger "robot"
 labelstr "^CMR[0-9][0-9]*$"
 autoflush   yes
#   flush-threshold-dumped 100
#   flush-threshold-scheduled 100
#
#   keep everything in the holding disk
  flush-threshold-dumped 400 # (or more)
  flush-threshold-scheduled  400 # (or more)
  taperflush 400
 runtapes 4
}

define storage archive {
 tapepool "archive"
 tapetype "LTO6"
 tpchanger "robot"
 labelstr "^ARC[0-9][0-9]*$"
 autoflush   yes
 flush-threshold-dumped 100
 flush-threshold-scheduled 100
 runtapes 4
 dump-selection ALL FULL
}

storage "myconf"
maxdumpsize -1
amrecover_changer "robot"


> 8<

my goal:

I have to create a set of archive tapes for that customer, every 3
months or so.

With the above setup I now ran "amdump --no-taper myconf" which
collected all dumps on holdingdisk (did an "amadmin myconf force *"
before to force FULLs now).

As I understand that I could now do a plain amflush which should

(a) write to the tapes of tapepool "myconf" and

(b) leave the holdingdisk tarballs where they are, right?

(I am not yet sure about that "400" above, I want to keep data for all 4
tapes in the holdingdisk now and may reduce that to 100 for normal daily
runs without "--no-taper" or so)

After or before I additionally can do something like:

amvault myconf --dest-storage --latest-fulls archive

correct?

I think so, but I'm not 100% certain.


An additional hurdle is that the customer wants to use WORM tapes for
archive, so I should get that right at the first run to not waste any tapes

Perhaps create a temporary virtual tape library for testing that the 
archiving schedule works as expected?  This is what I generally do when 
testing changes at work (although I usually do it using a copy of the 
main configuration so that I don't confuse the planner for the 
production backups with half a dozen runs in one day).


Re: approaches to Amanda vaulting?

2017-10-19 Thread Austin S. Hemmelgarn

On 2017-10-19 11:06, Jean-Louis Martineau wrote:

On 19/10/17 08:48 AM, Austin S. Hemmelgarn wrote:
 > On 2017-10-18 15:45, Stefan G. Weichinger wrote:
 >> On 2017-10-16 at 20:47, Austin S. Hemmelgarn wrote:
 >>
 >>> While it's not official documentation, I've got a working
 >>> configuration with
 >>> Amanda 3.5.0 on my personal systems, using locally accessible
 >>> storage for
 >>> primary backups, and S3 for vaulting (though I vault everything, the
 >>> local
 >>> storage is for getting old files back, S3 is for disaster
 >>> recovery). I've
 >>> put a copy of the relevant config fragment at the end of this reply,
 >>> with
 >>> various private data replaced, and some bits that aren't really
 >>> relevant
 >>> (like labeling options) elided.
 >>
 >> A quick thank you at this point:
 >>
 >> thanks for providing this config plus explanations, I will try to set up
 >> a similar config soon and take your example as a template.
 >>
 >> And maybe come back with some additional questions ;-)
 >>
 >> for example: what do you run as cronjobs, what do you do via manual
 >> commands? amdump in cron, amvault now and then?
 > Well, there's two options for how to handle it.
 >
 > Where I work, we use a very similar configuration to what I posted,
 > and run amdump and amvault independently, both through cron (though we
 > only vault full backups to S3 since we have a reasonably good level of
 > trust in the reliability of our local storage). This gives very good
 > control of exactly what and exactly when things get vaulted, and
 > allows for scheduling vaulting separately from dumps (we prefer to
 > only copy things out to S3 once a month and need to make sure the
 > network isn't bogged down with backups during work hours, so this is a
 > big plus for us).
The problem with the amvault command is that it does only what the command
line tells it to, which can be difficult to get right.
If amvault fails, it's hard to find the correct arguments to vault what
was not yet vaulted.
With the wrong arguments, some dumps might never be vaulted, or some dumps
might be vaulted multiple times (on different amvault invocations).
To be entirely honest, I wouldn't exactly call `--latest-fulls`, 
`--fulls-only`, or `--incrs-only` hard to get right.  It's only really 
tricky if you want to only vault subsets of the config.  Add to that 
that it's pretty easy to see what got vaulted if you have e-mail set up 
right, and it really isn't too bad for most use cases.


Since you want to vault all fulls, I would set 'vault' in the local
storage and set 'dump-selection' in the cloud storage, but would not set
'vault-storage'.
That way the vaults are scheduled but are not executed, because
vault-storage is not set.  Amanda knows they must be vaulted.
Every month, you can run: amdump CONF BADHOST -ovault-storage="cloud"
to do the vaulting.
We've actually been discussing migrating things to operate like I have 
them set up on my home systems (albeit only vaulting fulls), as the 
'once a month' part of vaulting is largely a hold-over from our old 
(pre-Amanda) backup system which did fulls on the first of the month, 
and archived them off-site the day afterwards.


 >
 > On my home systems, I also use a similar config, but I instead have a
 > 'vault' option specified in the 'local' storage block that points to
 > the 'cloud' and says to vault immediately after dump generation(so the
 > line is 'vault cloud 0'). With this setup, amdump will run the
 > vaulting operation itself after finishing everything else for the dump
 > (and you actually don't need the 'vault-storage' line at the end I
 > think), and you either end up vaulting everything, or have to limit
 > things through the config with a 'dump-selection' line in your 'cloud'
 > storage definition.
vault-storage is required; otherwise the vaults are not executed.

Good to know, that could probably be better explained in the documentation.


Re: approaches to Amanda vaulting?

2017-10-19 Thread Austin S. Hemmelgarn

On 2017-10-18 15:45, Stefan G. Weichinger wrote:

On 2017-10-16 at 20:47, Austin S. Hemmelgarn wrote:


While it's not official documentation, I've got a working configuration with
Amanda 3.5.0 on my personal systems, using locally accessible storage for
primary backups, and S3 for vaulting (though I vault everything, the local
storage is for getting old files back, S3 is for disaster recovery).  I've
put a copy of the relevant config fragment at the end of this reply, with
various private data replaced, and some bits that aren't really relevant
(like labeling options) elided.


A quick thank you at this point:

thanks for providing this config plus explanations, I will try to set up
a similar config soon and take your example as a template.

And maybe come back with some additional questions ;-)

for example: what do you run as cronjobs, what do you do via manual
commands? amdump in cron, amvault now and then?

Well, there's two options for how to handle it.

Where I work, we use a very similar configuration to what I posted, and 
run amdump and amvault independently, both through cron (though we only 
vault full backups to S3 since we have a reasonably good level of trust 
in the reliability of our local storage).  This gives very good control 
of exactly what and exactly when things get vaulted, and allows for 
scheduling vaulting separately from dumps (we prefer to only copy things 
out to S3 once a month and need to make sure the network isn't bogged 
down with backups during work hours, so this is a big plus for us).
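
As a concrete (but made-up) sketch of that split, in /etc/cron.d form with 
the user field included; adjust the paths, flags, and timing to taste and 
check them against amdump(8)/amvault(8) for your version:

# nightly dumps
0 1 * * *   amanda  /usr/sbin/amdump MyConfig
# monthly vaulting of the fulls to the cloud storage, 1st of the month
0 6 1 * *   amanda  /usr/sbin/amvault --fulls-only --dest-storage cloud MyConfig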


On my home systems, I also use a similar config, but I instead have a 
'vault' option specified in the 'local' storage block that points to the 
'cloud' and says to vault immediately after dump generation(so the line 
is 'vault cloud 0').  With this setup, amdump will run the vaulting 
operation itself after finishing everything else for the dump (and you 
actually don't need the 'vault-storage' line at the end I think), and 
you either end up vaulting everything, or have to limit things through 
the config with a 'dump-selection' line in your 'cloud' storage definition.
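
In config terms, that home setup boils down to something like the following 
sketch (changer definitions, labeling, and the 'cloud' storage block are 
left out, and, per Jean-Louis's correction elsewhere in this thread, 
vault-storage does still need to be set):

define storage local {
    tapepool  "local"
    tapetype  "vtape"
    tpchanger "local-vtl"
    vault cloud 0        # copy each dump to the 'cloud' storage right after taping
}

storage "local"
vault-storage "cloud"    # required for the vaults to actually be executed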


Re: What are the correct permissions for lib binaries for amanda 3.5

2017-10-16 Thread Austin S. Hemmelgarn

On 2017-10-16 14:58, Jon LaBadie wrote:

On Mon, Oct 16, 2017 at 02:05:05PM -0400, Jean-Louis Martineau wrote:

On 16/10/17 01:48 PM, Jon LaBadie wrote:

On Mon, Oct 16, 2017 at 08:12:43AM -0400, Jean-Louis Martineau wrote:

On 14/10/17 12:12 PM, Jose M Calhariz wrote:

On Sat, Oct 14, 2017 at 11:36:09AM -0400, Jean-Louis Martineau wrote:

On 14/10/17 11:14 AM, Jose M Calhariz wrote:

-rwsr-xr-- 1 root backup 10232 Oct 13 17:23 ambind

ambind must not be readable by all

-rwsr-x--- 1 root backup 10232 Oct 13 17:23 ambind

Thank you for the quick reply.  May I ask why "ambind must not be
readable by all" ?

All suid programs in amanda are always installed like this.


Why are all amanda suid programs installed this way?

It's from before I was born (well, maybe not, but it's from before I
started to work on the Amanda software).
It's a kind of security by hiding: it's harder to find a vulnerability in
a suid binary if you can't read it.


I guessed it was security by obscurity.
It is, but it's a common bit of security by obscurity dating back almost 
to SVR4.



It makes sense when you build it yourself, but not when building a package,
where everyone can read the files in the package.


For the same reason I felt that would be "false" security.


The group probably does not need the 'r' bit either.

Do you think amcheck should not check whether the suid binaries are readable
by all?


My gut reaction is such a check is superfluous.  But I'm not a
security expert.  Do we have any security specialist (or others)
on the list who would care to comment?
I won't claim to be a security expert, but I've been a sysadmin for more 
than a decade and can tell you two things based on my own experience:


1. Amanda is the only software I've ever encountered that does this kind 
of check, or more accurately, it's the only software I've ever 
encountered where this type of check is a fatal error.  Some other 
software will ignore files if their ownership is wrong, but that's treated 
as a warning, and it only applies to configuration files (stuff like 
~/.ssh/authorized_keys, for example).


2. The checks are a serious pain in the arse, mostly because error 
messages are so vague (OK, so file XYZ has the wrong permissions, does 
that mean the directory it's in has the wrong permissions, or the file 
itself, and which permissions are wrong?).  This particular check isn't 
as bad in that respect as, for example, the ones checking 
/etc/amanda-security.conf, but it's still a pain to deal with.


Aside from that though, it's a case where the benefit to security is 
dependent on things that just aren't true for most systems amanda is 
likely to run on, namely that an attacker is:


1. Unable to determine what type of system you're running on. (This is a 
patently false assumption on any publicly available distro, as well as 
most paid ones like OEL, RHEL, and SLES).

and
2. Unable to access the packages directly.

In most cases, both are false.  There are a few odd cases like 
source-based distros (Gentoo for example) where the package gets built 
locally, but even then the builds are pretty reproducible, and the code 
for Amanda itself is trivially available for review through other sources.


In a way, it's kind of like making the contents of /boot inaccessible to 
regular users, but not preventing `uname -v` and `uname -r` from being 
executed by them.  It makes things a bit more complicated for attackers, 
but in a rather trivial way that doesn't provide anything but a false 
sense of security.


Does amcheck do any checks for amanda programs that are [sg]uid
that should not be?
I'm not sure, though it does check ownership on many files, and I think 
it checks that things that are supposed to be suid or sgid are (I'm 
pretty sure it complains if amgtar or amstar aren't suid root).
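
If you'd rather audit that by hand than rely on amcheck, something along 
these lines will list anything setuid or setgid under the Amanda 
directories (the paths vary by distro; these are just the common ones):

find /usr/libexec/amanda /usr/lib64/amanda /usr/lib/amanda \
    -perm /6000 -ls 2>/dev/null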


Re: approaches to Amanda vaulting?

2017-10-16 Thread Austin S. Hemmelgarn
On 2017-10-16 13:22, Stefan G. Weichinger wrote:
> On 2017-10-16 at 15:20, Jean-Louis Martineau wrote:
>> Amanda 3.5 can do everything you want only by running the amdump command.
>>
>> Using an holding disk:
>>
>> * You configure two storages
>> * All dumps go to the holding disk
>> * All dumps are copied to each storages, not necessarily at the same
>> time or in the same run.
>> * The dumps stay in holding until they are copied to both storages
>> * You can tell amanda that everything must go to both storage or only
>> some dle full/incr
> 
> 
> So it is possible to set up a mix of "normal" daily backups with
> incrementals/fulls and "archive"/vault backups with only the full
> backups of a specific day ?
> 
> I have requests to do so for a customer, until now we used amanda-3.3.9
> and 2 configs sharing most of config and disklist ...
> 
> Nathan, the OP of this thread and others (including me) would like to
> see actual examples of configuration, a howto or something.
> 
> The man page https://wiki.zmanda.com/man/amvault.8.html is a bit minimal
> ...
> 
> Is there anything additional to that manpage and maybe:
> 
> http://wiki.zmanda.com/index.php/How_To:Copy_Data_from_Volume_to_Volume
> 
> ?
While it's not official documentation, I've got a working configuration with
Amanda 3.5.0 on my personal systems, using locally accessible storage for
primary backups, and S3 for vaulting (though I vault everything, the local
storage is for getting old files back, S3 is for disaster recovery).  I've
put a copy of the relevant config fragment at the end of this reply, with
various private data replaced, and some bits that aren't really relevant
(like labeling options) elided.

For this to work reliably, you need to define a holding disk (although it
can be on the same storage as the local vtape library).  I personally start
flushing from the holding disk the moment any dump is complete, as all the
data fits on one tape and the S3 upload takes longer than creating the
backups in the first place, but it should also work just fine when
buffering things on the holding disk.
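
For completeness, the holding disk side is just a definition along these 
lines (the path and size are placeholders); with the flush thresholds left 
at their default of 0, flushing starts as soon as a dump finishes, which is 
the behaviour described above:

define holdingdisk hd1 {
    directory "/var/lib/amanda/holding"
    use 50 Gb
}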

The given S3 configuration assumes you already created the destination bucket
(I per-create them since I do life cycle stuff and cross-region replication,
both of which are easier to set up if you create the bucket by hand).  I also
use a dedicated IAM user for the S3 side of things for both security and
accounting reasons, but that shouldn't impact things.  Additionally, I've found
that the S3 uploads work much more reliably if you set a reasonable part size
and have part caching.  1 GB seems to give a good balance between performance
and reliability.

8<---

define tapetype vtape {
length 16 GB
part-size 1 GB
part-cache-type memory
}

define changer local-vtl {
tapedev "chg-disk:/path/to/local/vtapes"
}

define changer aws {
tapedev 
"chg-multi:s3:example-bucket/slot{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}"
device-property "S3_SSL""YES"
device-property "S3_ACCESS_KEY" "IAM_ACCESS_KEY"
device-property "S3_SECRET_KEY" "IAM_SECRET_KEY"
device-property "S3_MULTI_PART_UPLOAD"  "YES"
device-property "CREATE_BUCKET" "NO"
device-property "S3_BUCKET_LOCATION""us-east-1"
device-property "STORAGE_API"   "AWS4"
}

define storage local {
tapepool "local"
tapetype "vtape"
tpchanger "local-vtl"
}

define storage cloud {
tapepool "s3"
tapetype "vtape"
tpchanger "aws"
}

storage "local"
vault-storage "cloud"