Re: Only increasing incrementals
On 2018-12-12 19:11, Debra S Baddorf wrote:
> Oh, that’s right — Chris DID tell us what he was trying to do, last week: (in case it helps with any further answers)

OK, given that what you want appears to be frequent snapshots, Amanda is almost certainly _not_ the correct tool for this job, for three reasons:

* It cannot take atomic snapshots of the filesystem state, so either you need to freeze all write I/O to the DLE, or you won't have a coherent copy of the state of the DLE from when you ran the backup. This doesn't matter most of the time for regular backup usage, because you just run Amanda during off-hours when nobody's doing anything and the system is idle, but for this it's going to be a potential issue.

* It takes a _lot_ of system resources to do a backup with Amanda. This is mitigated by your proposed approach of constantly increasing incremental levels, but even then Amanda has to call `stat()` on _everything_ in the DLE.

* It's not trivial (as you have found out) to get this type of thing to work reliably.

I would suggest looking at the following alternative approaches:

* Use ZFS, BTRFS, or another filesystem that supports native snapshotting, and just take snapshots regularly. This is likely to be your best approach. In some cases, depending on the platform and filesystem, you may not even need to do anything (for example, NILFS2 on Linux has implicit snapshots built in because it's a log-structured filesystem).

* Store all the data on a NAS device that can do snapshots (for example, something running FreeNAS), and have it take regular snapshots. This largely reduces to the above, just indirected over the network.

* Use a filesystem that supports automatic native file versioning. The classic example is Files-11 from OpenVMS. Other options include GitFS [1], CopyFS [2], and Plan 9's Fossil filesystem.

* Store all the data on a NAS device that does automatic native file versioning.
* If all else fails, you can technically do this with Amanda by using `amadmin force` to force the level 0 dump, and `amadmin force-bump` for each backup _after the second_ (the first backup after a level 0 will always be a level 1) to get the increasing incrementals.

[1] https://www.presslabs.com/code/gitfs/
[2] https://boklm.eu/copyfs/

On Dec 7, 2018, at 12:04 PM, Chris Miller wrote:
> Hi Folks,
>
> I'm about to start a project during which I want to be able to:
>
> • Request a backup at any moment and have that backup be either an incremental backup (Level N+1), meaning everything that has changed since the last backup (Level N), or a differential backup, meaning everything that has changed since the last full backup (Level 1). The second provision, "differential backup", is pretty straightforward, but I have no idea how to configure a constantly increasing dump level.
>
> • The first backup of the day, meaning the first backup after midnight, will be a full filesystem backup.
>
> Discussion on point 1: The provision is for capturing changes that occur during a given period of time, and not so much for "backup" per se, so AMANDA may not be the best tool, but it is what I have, so I'm trying to make it fit. I know how to request a backup, so that's not my problem, but I don't know how to force a given level. In particular, I don't know how to force a Level N+1 backup. I could replace the Level N+1 requirement with a forced Level 1, run my experiment, and force a Level 2, and this would meet my requirement of capturing all the changes during a particular interval. But, again, that requires forcing AMANDA to take direction about backup levels, and I don't know how to do that.
> Before anybody reminds me that this is why god invented git, I would like to add that the scope of git is typically only known parts of the project, and I want to capture log files and other files that are sometimes created in temporary locations with temporary names, which are not known a priori and therefore can't be "managed" with git.
>
> Discussion on point 2: The "first backup of the day" will run as a cron job, but it must be a level 0, full filesystem backup so no work for the day is lost. This is more forcing AMANDA to take direction, and I don't know exactly how to do it. I don't think I like the idea of forcing AMANDA (#MeToo) to do things, but I'm not above payment in kind. (-:
>
> Thanks for the help,
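To sketch what the `amadmin force`/`force-bump` dance from above looks like in practice (the configuration name "snap" and the DLE "client /data" are hypothetical):

```shell
amadmin snap force client /data       # next run will be a level 0
amdump snap                           # runs the level 0
amdump snap                           # first run after a full is always a level 1
amadmin snap force-bump client /data  # request a bump before each later run
amdump snap                           # level 2
amadmin snap force-bump client /data
amdump snap                           # level 3, and so on
```

The `force-bump` has to be repeated before every run past the second, which is part of why this is the "if all else fails" option.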
Re: Only increasing incrementals
On 2018-12-12 02:18, Olivier wrote:
> Nathan Stratton Treadway writes:
>> On Thu, Nov 22, 2018 at 11:18:25 +0700, Olivier wrote:
>>> Hello,
>>>
>>> I am wondering if there is a way to define a DLE that would allow incrementals, but only with increasing levels:
>>> - full (0)
>>> - incremental 1
>>> - incremental 2
>>> - incremental 3
>>> - etc.
>>>
>>> But never: 0, 1, 1, 1, 2. Each backup level must be above the previous one, or be a full backup.
>>
>> I am not sure what you are trying to accomplish,
>
> I am trying to back up something that can only have incrementals with increasing levels: it cannot do two level 1s in a row; levels must be 1, then 2, then 3, etc. (think some successive snapshots).
>
> According to the amanda.conf(5) man page:
>
>     bumpdays int
>         Default: 2 days. To insure redundancy in the dumps, Amanda keeps
>         filesystems at the same incremental level for at least bumpdays
>         days, even if the other bump threshold criteria are met.
>
> I want to absolutely cancel that feature: each incremental must have a level greater than the previous dump, and an incremental level can not be bumped (only level 0 can be bumped).

OK, I'm actually curious what your exact reasoning for requiring this is, because I'm seeing exactly zero circumstances where this makes sense at all, and can think of multiple ways it's a bad thing (for example, losing your level 1 incremental makes all of your backups for that cycle useless).
Re: Dumping and taping in parallel
On 2018-11-28 13:58, Chris Nighswonger wrote:
> On Wed, Nov 28, 2018 at 11:17 AM Austin S. Hemmelgarn wrote:
>> Based on your configuration, your tapes are configured to store just short of 800GB of data. The relevant lines then are these two:
>>
>>     flush-threshold-scheduled 50
>>     flush-threshold-dumped 50
>
> I misunderstood the man pages there and for some reason thought that volume referred to the holding disk. Probably because I was reading way too fast.
>
>> In your case, I'd suggest figuring out the average amount of data you dump each run, and then configuring things to start flushing when about half that much data has been dumped. That will still have the taping run in parallel with the dumping, but will give you enough of a buffer that the taper should never have to wait for dumps to finish.
>
> So over the last 13 runs, the:
> -- smallest volume size has been 152G (19% tape capacity)
> -- average volume size has been 254G (32% tape capacity)
> -- largest volume size has been 612G (76% tape capacity)
>
> So do the following values look "reasonable" based on those numbers:
>
>     flush-threshold-scheduled 25
>     flush-threshold-dumped 0
>
> That should target the larger sizes, which are the ones which tend to lap into the next business day.

Probably. The extent of the experimentation I've done with these is determining for certain that I got no performance benefit from not just taping backups as they finished (all of my setups use vtapes on fast storage, so there's no benefit to me not just taping dumps as they're done).
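Working from those numbers, the "start flushing at about half an average run" rule of thumb can be sanity-checked with a little shell arithmetic (the capacity and average are the round figures from this thread):

```shell
#!/bin/sh
# Round figures from the thread, in GB.
capacity_gb=800   # usable tape capacity (just short of 800GB)
average_gb=254    # average dump volume over the last 13 runs

# Half the average run, as a whole-number percentage of tape capacity;
# a starting point for the flush-threshold settings.
threshold=$(( average_gb * 100 / 2 / capacity_gb ))
echo "$threshold"
```

which prints 15, so the proposed flush-threshold-scheduled of 25 is in the same ballpark once you skew it toward the larger runs that lap into the next day.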
Re: Dumping and taping in parallel
On 2018-11-28 10:58, Chris Nighswonger wrote:
> On Wed, Nov 28, 2018 at 10:49 AM Austin S. Hemmelgarn wrote:
>> On 2018-11-28 09:53, Stefan G. Weichinger wrote:
>>> Am 28.11.18 um 15:47 schrieb Chris Nighswonger:
>>>> So why won't amanda dump and tape at the same time?
>>>
>>> It does normally, that is what the holding disk is for.
>>
>> Really? I was under the impression that it was for making sure you can finish dumps if something goes wrong with taping, and to cache dumps so they can be written to tape in one pass. Without a holding disk, Amanda dumps straight to tape, which is technically dumping and taping in parallel.
>>
>>> More details might lead to better suggestions. Show your amanda.conf etc
>>
>> Indeed, though I suspect it's something regarding the flushing configuration.
>
>     inparallel 10
>     maxdumps 1
>     netusage 1073741824  # (equivalent to one Tbit/s)
>     dumporder "STSTStstst"
>     dumpcycle 5 days
>     runspercycle 5
>     tapecycle 13 tapes
>     runtapes 1
>     flush-threshold-scheduled 50
>     flush-threshold-dumped 50
>     bumpsize 10 Mbytes
>     bumppercent 0
>     bumpmult 1.5
>     bumpdays 2
>     ctimeout 60
>     dtimeout 1800
>     etimeout 300
>     dumpuser "backup"
>     tapedev "Quantum-Superloader3-LTO-V4"
>     autolabel "$c-$b" EMPTY
>     labelstr "campus-.*"
>     tapetype LTO-4
>     logdir "/var/backups/campus/log"
>     infofile "/var/backups/campus/curinfo"
>     indexdir "/var/backups/campus/index"
>     tapelist "/var/backups/campus/tapelist"
>     autoflush all
>
>     holdingdisk hd1 {
>         comment "Local striped raid array"
>         directory "/storage/campus"
>         use 0 Gb
>         chunksize 1 Gb
>     }
>
>     define changer Quantum-Superloader3-LTO-V4 {
>         tapedev "chg-robot:/dev/sg3"
>         property "use-slots" "1-13"
>         property "tape-device" "0=tape:/dev/nst0"
>         device-property "LEOM" "TRUE"
>     }
>
>     define tapetype LTO-4 {
>         comment "Created by amtapetype; compression enabled"
>         length 794405408 kbytes
>         filemark 1385 kbytes
>         speed 77291 kps
>         blocksize 512 kbytes
>     }

Based on your configuration, your tapes are configured to store just short of 800GB of data.
The relevant lines then are these two:

    flush-threshold-scheduled 50
    flush-threshold-dumped 50

The first one tells Amanda not to try flushing anything early unless you're using at least half a tape based on dump size estimates, and the second one says that at least half a tape's worth of data must already be dumped before flushing will start. Together, this means Amanda won't flush anything to tape until all dumps are done, unless you're dumping more than half a tape's worth of data each run.

If you set those both to zero, Amanda will start flushing dumps to tape as they finish. Doing so has two disadvantages for you, because you're using real tapes and not vtapes:

* You can't have Amanda intelligently pack the dumps onto the tape. This probably doesn't matter, as you appear to have things configured so that each run only uses one tape, and you haven't explicitly defined a `tapealgo` (the default `tapealgo` is a simple dumb FIFO queue, so it behaves the same as immediately flushing dumps as they finish).

* You run the risk of having to stop and restart the tape drive multiple times while writing dumps. Put simply, by flushing at the end like things are currently, you can guarantee 100% utilization of the tape drive while flushing dumps. If you flush them as they're done, the taper will almost certainly have to wait for some dumps to finish after it initially starts writing data.

In your case, I'd suggest figuring out the average amount of data you dump each run, and then configuring things to start flushing when about half that much data has been dumped. That will still have the taping run in parallel with the dumping, but will give you enough of a buffer that the taper should never have to wait for dumps to finish.
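As a rough sketch of that suggestion: with ~800GB tapes and, say, a 250GB average run, half the average works out to roughly 15% of a tape, so the amanda.conf change would look something like this (the percentages here are illustrative assumptions, not measured values):

```
flush-threshold-scheduled 15   # don't start the taper early unless ~15% of a tape is scheduled
flush-threshold-dumped 15      # ...and that much has actually landed on the holding disk
```

Note that `flush-threshold-scheduled` must be at least as large as `flush-threshold-dumped`, so setting them equal is the simplest consistent choice.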
Re: Dumping and taping in parallel
On 2018-11-28 09:53, Stefan G. Weichinger wrote:
> Am 28.11.18 um 15:47 schrieb Chris Nighswonger:
>> So why won't amanda dump and tape at the same time?
>
> It does normally, that is what the holding disk is for.

Really? I was under the impression that it was for making sure you can finish dumps if something goes wrong with taping, and to cache dumps so they can be written to tape in one pass. Without a holding disk, Amanda dumps straight to tape, which is technically dumping and taping in parallel.

> More details might lead to better suggestions. Show your amanda.conf etc

Indeed, though I suspect it's something regarding the flushing configuration.
Re: Another dumper question
On 2018-11-26 15:13, Chris Nighswonger wrote:
> On Mon, Nov 26, 2018 at 2:32 PM Nathan Stratton Treadway wrote:
>> On Mon, Nov 26, 2018 at 13:56:52 -0500, Austin S. Hemmelgarn wrote:
>>> On 2018-11-26 13:34, Chris Nighswonger wrote:
>>> The other possibility that comes to mind is that your bandwidth settings are making Amanda decide to limit to one dumper at a time.
>>
>> Chris, this is certainly the first thing to look at: note in your amstatus output the line "network free kps: 0":
>>
>>     9 dumpers idle : 0
>>     taper status: Idle
>>     taper qlen: 1
>>     network free kps: 0
>>     holding space : 436635431k ( 50.26%)
>
> Hmm... I missed that completely. I'll set it arbitrarily high as Austin suggested and test it overnight.

Don't feel bad, it's not something that gets actively used by a lot of people, so most people don't really think about it. If used right, though, it provides the rather neat ability to have Amanda limit its network utilization while running backups, which is really helpful if you have to run backups during production hours for some reason.
Re: Another dumper question
On 2018-11-26 13:34, Chris Nighswonger wrote:
> So in one particular configuration I have the following lines:
>
>     inparallel 10
>     dumporder "STSTSTSTST"
>
> I would assume that Amanda would spawn 10 dumpers in parallel and execute them, giving priority to largest size and largest time, alternating. I would assume that Amanda would do some sort of sorting of the DLEs based on size and time, set them in descending order, and then run the first 10 based on the list, thereby utilizing all 10 permitted dumpers in parallel.
>
> However, based on the amstatus excerpt below, it looks like Amanda simply starts with the largest size and runs the DLEs one at a time, not making efficient use of parallel dumpers at all. This has the unhappy result at times of causing amdump to still be running when the next backup is executed.
>
> I have changed the dumporder to STSTStstst for tonight's run to see if that makes any difference. But I don't have much hope it will. Any thoughts?

Is this all for one host? If so, that's probably your issue. By default, Amanda will only run at most one DLE per host at a time. You can change this in the dump settings, but I forget what the exact configuration parameter is. The other possibility that comes to mind is that your bandwidth settings are making Amanda decide to limit to one dumper at a time. You can easily test that by just setting the `netusage` parameter to an absurdly large value like 1073741824 (equivalent to one Tbit/s).
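For what it's worth, a sketch of the two settings involved, in amanda.conf terms. The per-host parallelism parameter I couldn't recall above is, as far as I remember, `maxdumps` in the dumptype (treat that name as unverified against your Amanda version):

```
netusage 1073741824   # effectively unlimited bandwidth (~1 Tbit/s)

define dumptype global {
    # ... existing dumptype options ...
    maxdumps 2   # allow up to two parallel dumps from a single host
}
```

Raise `maxdumps` cautiously; multiple simultaneous dumps from one host compete for that host's disk and network bandwidth.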
> Kind regards,
> Chris
>
>     From Mon Nov 26 01:00:01 EST 2018
>
>     1   4054117k waiting for dumping
>     1      6671k waiting for dumping
>     1       222k waiting for dumping
>     1      2568k waiting for dumping
>     1      6846k waiting for dumping
>     1    125447k waiting for dumping
>     1     91372k waiting for dumping
>     1        92k waiting for dumping
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1    290840k waiting for dumping
>     1     76601k waiting for dumping
>     1        86k waiting for dumping
>     1     71414k waiting for dumping
>     0  44184811k waiting for dumping
>     1       281k waiting for dumping
>     1      6981k waiting for dumping
>     1        50k waiting for dumping
>     1     86968k waiting for dumping
>     1     81649k waiting for dumping
>     1    359952k waiting for dumping
>     0 198961004k dumping 159842848k ( 80.34%) (7:23:39)
>     1     73966k waiting for dumping
>     1    821398k waiting for dumping
>     1    674198k waiting for dumping
>     0 233106841k dump done (7:23:37), waiting for writing to tape
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     1    166876k waiting for dumping
>     1        32k waiting for dumping
>     1    170895k waiting for dumping
>     1    162817k waiting for dumping
>     0 failed: planner: [Request to client failed: Connection timed out]
>     1        32k waiting for dumping
>     1        32k waiting for dumping
>     0        53k waiting for dumping
>     0  77134628k waiting for dumping
>     1      2911k waiting for dumping
>     1        36k waiting for dumping
>     1        32k waiting for dumping
>     1     84935k waiting for dumping
>
>     SUMMARY           part      real   estimated
>                                 size        size
>     partition       :   43
>     estimated       :   42             559069311k
>     flush           :    0         0k
>     failed          :    1                    0k            (  0.00%)
>     wait for dumping:   40             128740001k            ( 23.03%)
>     dumping to tape :    0                    0k            (  0.00%)
>     dumping         :    1 159842848k  198961004k ( 80.34%) ( 28.59%)
>     dumped          :    1 233106841k  231368306k (100.75%) ( 41.70%)
>     wait for writing:    1 233106841k  231368306k (100.75%) ( 41.70%)
>     wait to flush   :    0         0k          0k (100.00%) (  0.00%)
>     writing to tape :    0         0k          0k (  0.00%) (  0.00%)
>     failed to tape  :    0         0k          0k (  0.00%) (  0.00%)
>     taped           :    0         0k          0k (  0.00%) (  0.00%)
>     9 dumpers idle  : 0
>     taper status: Idle
>     taper qlen: 1
>     network free kps: 0
>     holding space   : 436635431k ( 50.26%)
>     chunker0 busy   :  6:17:03  ( 98.28%)
>     dumper0 busy    :  6:17:03  ( 98.28%)
>     0 dumpers busy  :  0:06:34  (  1.72%)    0:  0:06:34  (100.00%)
>     1 dumper busy   :  6:17:03  ( 98.28%)    0:  6:17:03  (100.00%)
Re: Flushing the Holding Disk
On 2018-11-16 12:27, Chris Miller wrote:
> Hi Folks,
>
> I'm unclear on the timing of the flush from holding disk to vtape. Suppose I run two backup jobs, and each uses the holding disk. When will the second job start? Obviously, after the client has sent everything... Before the holding disk flush starts, or after the holding disk flush has completed?

If by 'jobs' you mean 'amanda configurations', the second one starts when you start it. Note that `amdump` does not return until everything is finished dumping (and, optionally, taping if anything would be taped), so you can literally just run each one sequentially in a shell script and they won't run in parallel.

If by 'jobs' you mean DLE's, they run as concurrently as you tell Amanda to run them. If you've got things serialized (`inparallel` is set to 1 in your config), then the next DLE will start dumping once the previous one is finished dumping to the holding disk. Otherwise, however many you've said can run in parallel run (within per-host limits), and DLE's start when the previous one in sequence for that dumper finishes. Taping can (by default) run in parallel with dumping if you're using a holding disk, which is generally a good thing, though you can also easily configure it to wait for some amount of data to be buffered on the holding disk before it starts taping.

> Is there any way to defer the holding disk flush until all backup jobs for a given night have completed?

Generically, set `autoflush no` in each configuration, and then run `amflush` for each configuration once all the dumps are done. However, unless you've got an odd arrangement where every system saturates the network link while actually dumping and you are sharing a single link on the Amanda server for both dumping and taping, this probably won't do anything for your performance. You can easily configure Amanda to flush backups from each DLE as soon as they are done, and it will wait to exit until everything is actually flushed.
Building from that, if you just want to ensure the `amdump` instances don't run in parallel, just use a tool to fire them off sequentially in the foreground. Stuff like Ansible is great for this (especially because you can easily conditionally back up your index and tapelist when the dump finishes). As long as the next `amdump` command isn't started until the previous one returns, you won't have to worry about them fighting each other for bandwidth.
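A minimal sketch of that pattern in plain shell (the configuration names are hypothetical; `amdump` blocks until its run is complete, and `amflush -b` runs in batch mode without prompting):

```shell
#!/bin/sh
# Run each Amanda configuration in sequence; amdump does not return until
# dumping (and any taping) for that configuration has finished, so the
# runs never overlap or fight each other for bandwidth.
for config in set1 set2 set3; do
    amdump "$config"
done

# If each configuration sets 'autoflush no', everything is still sitting
# on the holding disk at this point; flush each one in batch mode.
for config in set1 set2 set3; do
    amflush -b "$config"
done
```

Anything more elaborate (conditional index/tapelist backups, notifications) layers naturally on top of this loop.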
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 18:18, Gene Heskett wrote:
> On Thursday 15 November 2018 14:17:29 Austin S. Hemmelgarn wrote:
>> On 2018-11-15 13:36, Gene Heskett wrote:
>>> On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:
>>>> On 2018-11-15 11:53, Gene Heskett wrote:
>>>>> On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
>>>>>> On 2018-11-15 06:16, Gene Heskett wrote:
>>>>>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's. But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help. Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>>>>>
>>>>>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>>>>>
>>>>> Even if you told it to use tar for the estimate phase? That has enough legs to be called a bug, IMO anyway.
>>>>
>>>> As mentioned in one of my other responses, I can kind of see the value in this not bothering the client systems. Keep in mind that server estimates cost nothing on the client, while calcsize or client estimates may use a significant amount of resources.
>>>
>>> My default has been calcsize for three or 4 years, changed because tar was changed & was screwing up the estimates. I can remember 15+ years ago, when I was using real tar estimates on a much smaller machine, and it could come within 50 megabytes of filling a DDS-2 tape (4 GB compressed) for weeks at a time. So that part of amanda worked a lot better than it does today. And it's slowly gone to the dogs as my system grew in complexity. And went in a handbasket when I had to change to calcsize during the tar churn.
>>
>> I've not been using AMANDA anywhere near as long as you have, but I've actually not seen any issues with the accuracy of 'estimate client' mode estimates with current versions of GNU tar, except when the estimate ran while data in the DLE was being modified (and in that case, it makes sense that it would be bogus). I generally don't use 'estimate client' on my own systems, though, because it consistently takes far longer than 'estimate calcsize', and I'm not picky about the estimates being perfect.
>>
>>>> In this case, I do think the documentation should be a bit clearer,
>>>
>>> Yes, but who is to rewrite it? He should know a heck of a lot more about the amanda innards than I do even after 2 decades, and better defined words here and there too. "diskdevice" is a very poor substitute for the far more common slanguage of "/path/to/".
>>
>>>> and it would be useful to be able to get regular (calcsize and/or client) estimates on-demand, but I do think that the default is reasonably sane.
>>>
>>> It may well be sane, we'll see how it works in the morning. AIUI, calcsize runs only on old history, so that should not impose a load on the client, even when the client is itself.
>>
>> Unless I'm mistaken:
>>
>> * 'estimate server' runs only on historical data, and doesn't even talk to the client systems. It's good at limiting the impact the estimate has on the client, but reliably gives bogus estimates if your DLEs don't show consistent behavior (that is, each backup of a given level is roughly the same size as every other backup at that level).
>>
>> * 'estimate client' relies on the backup program being used to give it info about how big the backup will be. It gives estimates that are close to 100% accurate, but currently essentially requires running the backup process twice (once for the estimate, once for the actual backup) and imposes a non-negligible amount of load on the client.
>
> That depends on the client's instant duties. I have backed up a milling machine while it was running a 90-line program, 3 days to finish while sharpening a saw blade, with no apparent interaction, on a dual-core Atom powered box. One core was locked away for LCNC (isolcpus at work), the other was free to do the backup client. Didn't bother it a bit. :)
>
>> * 'estimate calcsize' does something kind of in-between. AIUI, it looks at some historical data, and also looks at the on-disk size of the data,
>
> That would take time to access the dle's, and the answer is effectively instant, ergo it is not questioning the client(s); it has to be working only from the history in its own logs.

Except that it actually runs on the client systems. I've actually looked at this: the calcsize program is running on the clients and not the server. It may be looking at the logs there, _but_ it's still running on the client. It may also be _really_ fast in your setup, but that doesn't inherently mean it's running locally (Amanda is smart e
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 13:36, Gene Heskett wrote:
> On Thursday 15 November 2018 12:57:54 Austin S. Hemmelgarn wrote:
>> On 2018-11-15 11:53, Gene Heskett wrote:
>>> On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
>>>> On 2018-11-15 06:16, Gene Heskett wrote:
>>>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's. But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help. Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>>>
>>>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>>>
>>> Even if you told it to use tar for the estimate phase? That has enough legs to be called a bug, IMO anyway.
>>
>> As mentioned in one of my other responses, I can kind of see the value in this not bothering the client systems. Keep in mind that server estimates cost nothing on the client, while calcsize or client estimates may use a significant amount of resources.
>
> My default has been calcsize for three or 4 years, changed because tar was changed & was screwing up the estimates. I can remember 15+ years ago, when I was using real tar estimates on a much smaller machine, and it could come within 50 megabytes of filling a DDS-2 tape (4 GB compressed) for weeks at a time. So that part of amanda worked a lot better than it does today. And it's slowly gone to the dogs as my system grew in complexity. And went in a handbasket when I had to change to calcsize during the tar churn.

I've not been using AMANDA anywhere near as long as you have, but I've actually not seen any issues with the accuracy of 'estimate client' mode estimates with current versions of GNU tar, except when the estimate ran while data in the DLE was being modified (and in that case, it makes sense that it would be bogus). I generally don't use 'estimate client' on my own systems, though, because it consistently takes far longer than 'estimate calcsize', and I'm not picky about the estimates being perfect.

>> In this case, I do think the documentation should be a bit clearer,
>
> Yes, but who is to rewrite it? He should know a heck of a lot more about the amanda innards than I do even after 2 decades, and better defined words here and there too. "diskdevice" is a very poor substitute for the far more common slanguage of "/path/to/".
>
>> and it would be useful to be able to get regular (calcsize and/or client) estimates on-demand, but I do think that the default is reasonably sane.
>
> It may well be sane, we'll see how it works in the morning. AIUI, calcsize runs only on old history, so that should not impose a load on the client, even when the client is itself.

Unless I'm mistaken:

* 'estimate server' runs only on historical data, and doesn't even talk to the client systems. It's good at limiting the impact the estimate has on the client, but reliably gives bogus estimates if your DLEs don't show consistent behavior (that is, each backup of a given level is roughly the same size as every other backup at that level).

* 'estimate client' relies on the backup program being used to give it info about how big the backup will be. It gives estimates that are close to 100% accurate, but currently essentially requires running the backup process twice (once for the estimate, once for the actual backup) and imposes a non-negligible amount of load on the client.

* 'estimate calcsize' does something kind of in-between. AIUI, it looks at some historical data, and also looks at the on-disk size of the data, then factors in compression ratios and such to give an estimate that's usually reasonably accurate, without needing the DLEs to be consistent or imposing significant load on the clients.
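For reference, the estimate mode is selected per dumptype with the `estimate` keyword in amanda.conf; as far as I know, more than one method may be listed, and Amanda tries them in order. A sketch (the dumptype name is hypothetical, and the multi-method syntax should be checked against your version's man page):

```
define dumptype normal-est {
    # Try a client estimate first, fall back to calcsize, and finally to
    # server history if the others are unavailable.
    estimate client calcsize server
}
```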
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 11:53, Gene Heskett wrote:
> On Thursday 15 November 2018 07:36:37 Austin S. Hemmelgarn wrote:
>> On 2018-11-15 06:16, Gene Heskett wrote:
>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's. But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help. Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>
>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>
> Even if you told it to use tar for the estimate phase? That has enough legs to be called a bug, IMO anyway.

As mentioned in one of my other responses, I can kind of see the value in this not bothering the client systems. Keep in mind that server estimates cost nothing on the client, while calcsize or client estimates may use a significant amount of resources. In this case, I do think the documentation should be a bit clearer, and it would be useful to be able to get regular (calcsize and/or client) estimates on-demand, but I do think that the default is reasonably sane.
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 11:21, Chris Nighswonger wrote:
> On Thu, Nov 15, 2018 at 7:40 AM Austin S. Hemmelgarn wrote:
>> On 2018-11-15 06:16, Gene Heskett wrote:
>>> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's.
>>>
>>> But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help.
>>>
>>> Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.
>>
>> I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
>
> What would be the downside to having the amanda client execute 'du -s' or some such on the DLE and return the results when amcheck and friends realize there is no reliable size estimate? This would seem to be a much more accurate estimate than a non-existent server estimate.

My guess is that it's intentionally limited to server estimates to avoid putting load on the client systems. Both calcsize and client estimates require reading a nontrivial amount of data on the client side, and client estimates also involve a nontrivial amount of processing. That said, it would be nice to be able to explicitly run any of the three types of estimate.
Re: Does anyone know how to make an amadmin $config estimate work for new dle's?
On 2018-11-15 06:16, Gene Heskett wrote:
> I ask because after last night's run it showed one huge and 3 teeny level 0's for the 4 new dle's. So I just re-adjusted the locations of some categories and broke the big one up into 2 pieces, "./[A-P]*" and "./[Q-Z]*", so the next run will have 5 new dle's.
>
> But an estimate does not show the new names that results in. I've even taken the estimate assignment calcsize back out of the global dumptype, which, according to the manpage, forces the estimates to be derived from a dummy run of tar; that didn't help.
>
> Clues? Having this info from an estimate query might take a couple hours, but it sure would be helpful when redesigning one's dle's.

I'm fairly certain you can't, because it specifically shows server-side estimates, which have no data to work from if there has never been a dump run for the DLE.
Re: Monitor and Manage
On 2018-11-14 10:44, Chris Miller wrote: Hi Folks, I now have three working configs, meaning that I can back up three clients. There is not much difference among the configs, but that is a topic for a different thread. My question is how I manage what AMANDA is doing? So, let's suppose I fire up all three amdumps at once: * How do I know if I'm getting level 0 or higher? * How do I know the backups are running and have not silently failed? * How do I know when they complete? * How do I know what has been accomplished? These are all the sort of questions that might be answered by some sort of dashboard, but I haven't heard of any such thing, nor do I expect to, though I am equally sure that all the answers exist. I just don't know where. In short, how do I monitor and manage AMANDA? Well, for generic monitoring, make sure the system can deliver email and you have the aliases set up appropriately, and then configure Amanda to email you a report when the dump completes. The reports themselves are actually rather thorough, covering aggregate timing and performance information as well as the useful generic stuff like what dump level everything ran at and what tapes got used. You can get similar details for the last dump (or the current one if one is in progress) using the `amstatus` command, which will also show progress info for individual DLE's if a dump is currently running. For more in-depth management, take a look at the `amadmin` and `amtape` commands; they both provide useful functionality for general management that doesn't involve actually running the backups, including: * Forcing a level 0 or level 1 dump for any or all of the DLE's for the next run. Don't get in the habit of doing this regularly; overriding the planner will usually not get you good results. * Forcing Amanda to bump to a new dump level for a given DLE. Again, don't do this regularly. * Querying when the next level 0 dump is due for a given DLE. 
This gives you an upper limit on when the DLE will get a level 0 dump assuming you stick to the schedule you told Amanda about. * Querying details about all currently stored backups, including dates, location, and dump status. * Querying the state of all the tapes/vtapes Amanda is managing.
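The queries above map onto commands roughly like these (the config name `daily` and the host/DLE arguments are placeholders):

```
amstatus daily                         # progress of the current or last run
amadmin daily due                      # when each DLE's next level 0 is due
amadmin daily find                     # stored backups: dates, tapes, status
amadmin daily force myhost /home       # force a level 0 next run (sparingly)
amadmin daily force-bump myhost /home  # force a bump to the next level
amtape daily show                      # state of all tapes/vtapes
```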
Re: dumporder
On 11/5/2018 1:31 PM, Chris Nighswonger wrote: Is there any wisdom available on optimization of dumporder? This is personal experience only, but I find that in general, if all your dumps run at about the same speed and you don't have to worry about bandwidth, using something like the following generally gets reasonably good behavior: 'ssSS' In essence, it ensures that the smallest dumps complete fast, while still making sure the big ones get started early. Where I work, we've got a couple of slow systems, and I find that this works a bit better under those circumstances: 'ssST' Similar to the above, except it makes sure that the long backups get started early too (I would use 'ssTT', except that we only run one DLE at a time for any given host).
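In amanda.conf terms, a setting like the slow-system variant above would look something like this (the letter count should match the number of dumpers, so four dumpers are assumed here):

```
inparallel 4
dumporder "ssST"   # two smallest-first dumpers, one largest-size, one longest-time
```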
Re: Can AMANDA be configured "per client"?
On 11/5/2018 1:05 PM, Chris Miller wrote: Hi Folks, I have four servers, henceforth AMANDA clients, that I need to back up and a lot of NAS space available. I'd like to configure AMANDA to treat each of the four AMANDA clients as if it were the only client, meaning each client should have its own configuration which includes location for backup storage. I have a 3 TB staging disk on the AMANDA server. I have reasons for the individual treatment of clients that include off-site storage requirements and differing data sensitivity, so the simple solution is to be able to configure AMANDA to treat each client as a single case, so I can provide for proper security and custody of the backups. Can this be done? Yes, just create a separate configuration for each client on the server (that is, a separate amanda.conf and disklist for each client, with each pair being in its own sub-folder of your main amanda configuration directory). This is actually a pretty common practice in a lot of places (for example, the company I work for has 3 separate configurations that run at different times overnight and have slightly different parameters for the actual backups). The only caveat is that you have to explicitly run dumps for each configuration, but that's really not that hard. Please refer to the small table below. I have some basic questions, but the volume of documentation is difficult to grasp all at once, so please forgive what might seem like trivial questions; they are not yet trivial to me. Using 10.1.1.10 from the table below as an example: 1. I think I define the length of my tapes to be the maximum for a given client backup, which is the size of a level 0 dump, which is 135 GB for the example of 10.1.1.10. Since I want to configure AMANDA to treat each client as an individual and not part of a collection of backup tasks, I assume AMANDA will use one vtape per client per night. Can this be done? How do I qualify configuration settings per client? 
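The separate-configuration-per-client layout suggested above might look roughly like this on disk (the client names and scratch root are illustrative; real configs usually live under /etc/amanda):

```shell
# Sketch of a per-client Amanda config layout: one sub-folder per client,
# each with its own amanda.conf and disklist. CONF_ROOT defaults to a
# scratch directory so this sketch is safe to run anywhere.
CONF_ROOT=${CONF_ROOT:-$(mktemp -d)}
for client in client-a client-b client-c client-d; do
    mkdir -p "$CONF_ROOT/$client"
    : > "$CONF_ROOT/$client/amanda.conf"   # per-client settings
    : > "$CONF_ROOT/$client/disklist"      # only that client's DLE's
done
ls "$CONF_ROOT"
# Each config is then run explicitly, e.g. from cron:
#   amdump client-a
#   amdump client-b
```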
The main part of this should be answered by my comment above (if you have separate configurations, it's trivial to specify different settings for each client). That said, you probably want the vtape size to be _larger_ than your current theoretical max backup size, because it's very hard to change the vtape 'size' after the fact, and if you run out of space the whole backup may fail. Keep in mind that vtapes only take up whatever space is necessary for the data being stored on them (plus a bit extra for the virtual label), so you can set this to an arbitrarily large value. As an example, the vtape configuration where I work specifies 2TB vtapes, because that's an amount I know for certain we will never hit in one run, even if everything is a level 0 backup. 2. I have planned for one level 0 and five level 1 backups per week. Do I call this "a cycle"? I think I need 185 GB storage per "cycle" and this tells me how many "cycles", in this case, weeks, I can store before I have to re-use tapes. Does this mean I can plan on a 43 cycle (week) retention of my backups? Will AMANDA append to tapes, meaning can I put a full week on one vtape? This doesn't _quite_ line up with what Amanda calls a cycle; see my comments on your next question for more info on that. Also, as mentioned above, assume you will need more space than you calculated; failed backups are a pain to deal with. As far as taping, Amanda _never_ appends to a tape; it only ever rewrites whole tapes. While it's technically possible to get Amanda to pack all the data it can onto one tape across multiple runs, it's generally only a good idea to do this if you need to store backups on a very limited number of physical tapes because: * It means that some of your backup data may sit around on the Amanda server for an extended period of time before being taped (if you're doing a full week's worth of backups on one tape, that level zero backup won't get taped until the end of the week). * Amanda rewrites whole tapes. 
This means you will lose all backups on a tape when it gets reused. Because you don't have any wasted storage space using vtapes, it's better to just plan on one vtape per run, specify a number appropriate for your retention requirements (plus a few extra to allow recovery from errors), and then just let Amanda run. Such a configuration is more reliable and significantly more predictable. As another concrete example from the configuration where I work: We do 4 week cycles (so we have at least one level zero backup every 28 days for each disk list entry), do daily backups, and retain backups for 16 weeks. For our vtape configuration, this translates to requiring 112 tapes for all the current backups. We need to be able to access the oldest backups during the current cycle, so we have an additional cycle's worth of tapes as well (bringing the total up to 130). We also need to guarantee
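The tape-count arithmetic behind a plan like the one above is simple once you commit to one vtape per run (the numbers here are the ones quoted: daily runs, 16 weeks of retention):

```shell
# One vtape per run: tapes needed = runs per week * weeks of retention,
# plus whatever spares you want for error recovery.
runs_per_week=7
retention_weeks=16
echo $(( runs_per_week * retention_weeks ))
```

This prints 112, matching the figure above; spares and any extra cycle's worth of tapes get added on top of that.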
Re: Zmanda acquired from Carbonite by BETSOL -- future of Amanda development?
On 2018-10-02 13:29, Gene Heskett wrote: On Tuesday 02 October 2018 12:34:40 Ashwin Krishna wrote: Hi All, We propose to have the call on Oct 8th at 11 AM Mountain Time. Agenda: * Zmanda's Acquisition by BETSOL * Attendee Introductions * Existing Governance Model of Amanda Community * Suggested changes to the Governance Model * BETSOL's Commitment to Open Source Community We have taken note of all the suggestions received on the mailing list and we will go through the same on the call. Meeting Details: Amanda Open Source Community Discussion Mon, Oct 8, 2018 11:00 AM - 12:00 PM MDT Please join my meeting from your computer, tablet or smartphone. https://global.gotomeeting.com/join/438069045 You can also dial in using your phone. United States: +1 (786) 535-3211 Access Code: 438-069-045 First GoToMeeting? Let's do a quick system check: https://link.gotomeeting.com/system-check Regards, Ashwin Krishna -Original Message- From: Nathan Stratton Treadway Sent: Thursday, September 27, 2018 9:01 PM To: Ashwin Krishna Cc: amanda-users@amanda.org Subject: Re: Zmanda acquired from Carbonite by BETSOL -- future of Amanda development? Ashwin, thanks very much for getting in contact with the Amanda mailing list. On Thu, Sep 27, 2018 at 06:16:02 +, Ashwin Krishna wrote: We are 100% committed to the open source community and will be contributing to the code base to the best of our abilities. [...] I want to assure you that we are actively investing in growing Amanda and we have young enthusiastic engineers in the team. You can expect the next Amanda releases to include support for newer versions of operating systems, defect fixes, security enhancements etc. [...] We have retained the team members of the previous Zmanda team that we could. I can tell you that it's not easy without support from the community members. We encourage the community members to guide and contribute as much as you can. If you need commit access to the code base, please don't hesitate to reach out to us. 
You can expect our commitment and support to you. On Thu, Sep 27, 2018 at 22:54:02 +, Ashwin Krishna wrote: We are planning to host a conference call and would like all the active admins and community members to join to have a discussion with the Zmanda team at BETSOL regarding future collaborations. Will be sending out the meeting details (US time) with the agenda later. It sounds like getting the new BETSOL team in direct contact with the admins for the mailing list and other amanda.org-related resources is an important step at this point. However, I would say that for many of us here on the list, the most notable change in the past 7 months is not related to those things (which have continued to chug along as before), but rather the lack of "a developer" to move things along here on the public lists and in the public source repo. A decade or two ago it sounds like there were a number of developers involved, but more recently it's just been one or two Zmanda people who have served that role. Obviously this could be a good time to reconsider this arrangement if there are in fact other people ready to jump in, but off hand I'm guessing that what's likely to work going forward is for there to be a small number of BETSOL developers back in that role. As an Amanda user who has tried to contribute back a few improvements to the code line, I'm not really looking to have direct commit access myself, but rather hope to get back to having someone (hanging out here on the mailing lists) who can take the patches I came up with while hacking around on my own system and understand whether or not they will really work for everyone, and who will know which branches should have that change pushed onto them, and what tweaks are needed to make the patch apply to some older branch, etc. So, here's hoping you all at BETSOL are soon able to identify someone/a few people to take over that function, and patches and discussions can start flowing again. Nathan P.S. 
Personally I'd say that, rather than a new major release with support for newer versions of operating systems and whatnot, more urgent would be a minor release to gather up the handful of bugfixes which have already been discussed since 3.5.1 came out and get them published as part of an official release. +10; the 3.3.7p1 planner in particular is in serious need of help. It refuses to adjust the schedule of the 3 largest members of my disklist, choosing instead to do all three level 0's on the same run, so a 30 gig average backup has become 24 gigs for many nights, followed by a 60+ gig run using 3 vtapes, 5 or 6 tapelist cycles in a row now. I'd build this mythical 3.5.1 but it's been hidden someplace my browsing has not found. You should be able to get 3.5.1 here: https://sourceforge.net/projects/amanda/files/amanda%20-%20stable/3.5.1/ That said, 3.5.1 doesn't seem to be much
Re: Weird amdump behavior
On 2018-07-30 00:38, Kamil Jońca wrote: Gene Heskett writes: On Saturday 28 July 2018 08:30:27 Kamil Jońca wrote: Gene Heskett writes: [..] Too many dumps per spindle, drive seeks take time=timeout? As I can see in gdb/strace, planner hangs on "futex". 'futex' is short for 'Fast Userspace muTEX'; it's a synchronization primitive. Based on personal experience (not with Amanda, but just debugging software hangs in general), this usually means it's either a threading issue, or that you've ended up with a deadlock somewhere between processes. Regardless, it's probably an issue on the local system, and most likely only happens when backing up more than one client because you have more processes/threads involved and actually doing things in that case. This is probably going to sound stupid, but try updating/rebuilding/reinstalling Perl, whatever Perl packages Amanda depends on (I don't remember which packages they are), and Amanda itself. Most of the time when I see this kind of issue, it ends up being a case of at-rest data corruption in the executables or libraries, and reinstalling the problem software typically fixes things. 1. I do not configure spindle at all. So it's possible to have multiple dumps from the same spindle at the same time. No. There is another parameter: "maxdumps int — Default: 1. The maximum number of backups from a single host that Amanda will attempt to run in parallel. See also the inparallel option." And I use the default value, so I have at most one dump per host at once (and I am quite happy with this). Of course I can change spindles for testing, but, to be honest, I do not understand how that should help. Please, give every disk in each machine its own unique spindle number. Your backups should be done much faster. I do not want faster dumps. I want working dumps. KJ
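For reference, spindle numbers are assigned in the disklist as the column after the dumptype; a hypothetical example (host name, paths, and dumptype invented):

```
# hostname  diskname   dumptype       spindle
kjclient    /home      comp-user-tar  1
kjclient    /var/mail  comp-user-tar  2
```

DLE's sharing a spindle number on the same host won't be dumped concurrently, which is what avoids seek thrashing on a shared physical disk.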
Re: taper should wait until all dumps are done
On 2018-07-27 14:15, Stefan G. Weichinger wrote: On 27.07.2018 at 19:37, Austin S. Hemmelgarn wrote: Perhaps I can help with that. Great stuff, thanks for your informative reply, that's exactly the information I would like to have in the docs, etc. Will consult that in detail ASAP. A quick note on what I try to solve here: I have servers with only one big RAID-array consisting of maybe 4 or 6 physical disks, and based on that (software-)RAID there is one LVM volume group. So the logical volumes containing the data to be backed up (DLEs) are on the same array as the other LV providing the amanda holding disk. Yes, I know, that's not optimal, though I can't easily change that (I would have to add separate disks for holding disk purposes ... cost and space/controller issues). Don't worry, I've got to deal with similarly sub-optimal stuff where I work (our backup server has to multiplex all the dumps _and_ taping over a single GbE connection, so our backups are _always_ network-bound, even when we do really aggressive compression), so I entirely understand. So I want to avoid too much parallel activity of dumper and taper processes because that lets the throughput drop down massively (not to mention the additional stress on the hardware). So it would be great to be able to tell amanda "the DLEs coming from the amanda client which is the amanda server (~localhost) should be dumped to holdingdisk while no taper processes run". Or something in that direction. I will consider reducing maxdumps to 4 as well and test "" for tonight's run. And yes, I also test "holdingdisk no" for some DLEs already: I have big chunks of VM backups where it doesn't make sense to copy them within the RAID array ... I tape them directly. If you're taping to vtapes, you might actually be able to set things up to not need a holding disk at all. I'm a bit fuzzy on how to configure it, but I know it's possible to set up vtapes to tape things in parallel. 
If you do that, you could (probably, again not 100% certain) get rid of the holding disk, dump direct to the vtapes, and still have the dumps run in parallel. That would avoid having to worry about the taper processes competing with the dumper processes. The only caveat is that failure to tape would mean failure to dump too, but the number of situations where you would fail to tape but still be able to dump to the same array as a holding disk is near zero, and the only one I can think of off the top of my head is completely avoided by not having a holding disk.
Re: taper should wait until all dumps are done
On 2018-07-19 09:41, Stefan G. Weichinger wrote: I know about the 2 parameters flush-threshold-dumped flush-threshold-scheduled but how to make sure that *all* the planned dumps are done before writing to tape? Some kind of "taper-wait" ... Or just by trial-and-error with the 2 mentioned parameters? You can do this by figuring out the upper limit of how much space all your backups will need, figuring out what percentage of your tape size that translates to, and then setting both of the flush-threshold values to that percentage, taperflush to 0 (to flush everything), and autoflush to 'yes' (so that it actually flushes the data). However, keep in mind that for this to work, your holding disk has to be able to hold all of your dumps for a single run simultaneously.
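For example, if a full run is expected to fill about 90% of one (v)tape (that percentage is an assumption — compute your own from total dump size divided by tape length), the amanda.conf settings described above would be:

```
flush-threshold-dumped    90
flush-threshold-scheduled 90
taperflush 0    # flush everything, leave nothing behind for the next run
autoflush yes   # actually flush any leftover dumps
```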
Re: taper should wait until all dumps are done
On 2018-07-27 12:23, Stefan G. Weichinger wrote: On 27.07.2018 at 17:02, Jean-Francois Malouin wrote: You should also consider playing with dumporder. I have it set to '' and that makes the longest (time wise) dumps go first so that the fast ones get pushed to the end. In one config I have: dumporder "" flush-threshold-dumped 100 flush-threshold-scheduled 100 taperflush 100 autoflush yes so that all the dumps will wait until the longest ones are done. It also won't go until it can fill one volume (100%). You can obviously go further than that if you have enough holding disk. Or at least it's my understanding... (the ML was down for a while, so that's the reason for my delayed response, it should work now) I checked "dumporder" in that config, it was "BTBT...", I changed it to "TTT..." now for a test. Although I am not 100% convinced that this will do the trick ;-) We will see. I never fully understood that parameter and its influence so far; to me it's a bit "unintuitive". Perhaps I can help with that. Part of what Amanda's scheduling does is figure out the size that each backup will be on each run (based on the estimate process), how much bandwidth it will need while dumping (based on the bandwidth settings for that particular dump type), and the amount of time it will take (predicted based on the size, prior timing data, and possibly the bandwidth). That information is then used together with the 'dumporder' setting to control how each dumper chooses what dump to do next when it finishes dumping. Each letter in the value corresponds to exactly one dumper, and controls only that dumper's selection. The size-based selection is generally the easiest to explain: it just says to pick the largest (for 'S') or smallest (for 's') dump out of the set and run that next. The bandwidth-based selection is only relevant if you have bandwidth settings configured. 
Without them, it treats all dumps as equal, and picks the next dump based solely on the order that amanda has them sorted (which, IIRC, matches the order found in the disk list). With them, it uses a similar selection method to the size-based selection, just looking at bandwidth instead of size. The time-based selection is where things get tricky, but they get tricky because of how complicated it is to predict how long a dump will take, not because the selection is complicated (it works just like size-based selection, just looking at estimated runtime instead of size). Pretty much, the timing data is extrapolated by looking at previous dumps of the DLE, correlating size and actual run-time. I'm not sure what fitting method it uses for the extrapolation (my first guess would be simple linear extrapolation, because that's easy and should work most of the time), and I'm also not sure what, if any, impact bandwidth has on the calculation. So, in short you have: * 'S' and 's': Simple deterministic selection based on the predicted size of the dump. * 'B' and 'b': Simple deterministic selection based on bandwidth settings if they are defined, otherwise trivial FIFO selection. * 'T' and 't': Not quite deterministic selection based on predicted execution time of the dump process. So, for a couple of examples: * The default setting 'BTBTBTBT': This will have half the dumpers select dumps that will take the largest amount of time, and the other half select the ones that will take the largest amount of bandwidth. This works reasonably well if you have bandwidth settings configured and wide variance in dump size. * What you're looking at testing '': This is a trivial case of all dumpers selecting the dumps that will take the longest time. If you're dumping almost all similar hosts, this will be essentially equivalent to just selecting the largest. 
If you're dumping a wide variety of different hosts, it will be equivalent to selecting the largest on the first dump, but after that will select based on which system takes the longest. * What I use on my own systems 'SSss' (I only run four dumpers, not eight): This is a reasonably simple option that gives a good balance between getting dumps done as quickly as possible, and not wasting time waiting on the big ones. Two of the dumpers select whatever dump is the largest, so that some of the big ones get started right away, while the other two select the smallest dumps, so that those get backed up immediately. I've done some really simple testing that indicates that this actually gets all the dumps done faster on average than the default for the case of all your systems being able to dump data at the same rate. * What we use where I work 'TTss': This is one where things get a bit complicated. There are three different ways things get selected here. First, two of the eight dumpers will select dumps that are going to take the longest amount of time. Then, you have four that will pull the largest ones, and two that
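The 'S'-versus-'s' selection described above boils down to a sort on the estimated sizes; a toy illustration (DLE names and sizes invented):

```shell
# Toy model of per-dumper selection: given "name size" estimates, an 'S'
# dumper takes the largest remaining dump, an 's' dumper the smallest.
# Real Amanda also weighs bandwidth and scheduling constraints.
printf '%s\n' 'home 400' 'mail 5' 'tmp 1' > estimates.txt
sort -k2 -rn estimates.txt | head -n1   # what an 'S' dumper picks next
sort -k2 -n  estimates.txt | head -n1   # what an 's' dumper picks next
```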
Re: custom_compress with zstd
On 2018-04-04 06:01, Stefan G. Weichinger wrote: On 2018-04-03 at 20:52, Austin S. Hemmelgarn wrote: On 2018-04-03 14:25, Stefan G. Weichinger wrote: Does anyone already use zstd https://en.wikipedia.org/wiki/Zstandard with amanda? I will try to define an initial dumptype and play around although I wonder if the standard behavior leads to any problems. zstd does not remove the source file after de/compression per default (only with "--rm") ... but as it is used within a pipe (?) with amanda I assume that won't hurt. The "-d" for decompression is there so that should work. I've been using it for a few months now both at home and at work. It works just fine as-is and gets pretty good performance. In both cases though, I actually use a wrapper script. The one for backups at work just adds `-T2` to the zstd command line as our backup server has lots of CPU (and CPU time), but the backups are network-limited. At home, I also bump the compression level as high as I can without needing special decompression options (so the full command line at home that the wrapper passes is `-19 --long --zstd=hlog=26 -T2`). I've done numerous restores from both sets of backups both with and without the wrapper script (I initially set both up to just use zstd directly), and it all appears to work just fine. Would this work as well? That's essentially what I used initially, and I had no issues with it at all either backing things up or restoring. ->

define dumptype client-zstd-tar {
    global
    program "GNUTAR"
    comment "custom client compression dumped with tar"
    compress client custom
    client_custom_compress "/usr/bin/zstd"
}
Re: custom_compress with zstd
On 2018-04-03 14:25, Stefan G. Weichinger wrote: Does anyone already use zstd https://en.wikipedia.org/wiki/Zstandard with amanda? I will try to define an initial dumptype and play around although I wonder if the standard behavior leads to any problems. zstd does not remove the source file after de/compression per default (only with "--rm") ... but as it is used within a pipe (?) with amanda I assume that won't hurt. The "-d" for decompression is there so that should work. I've been using it for a few months now both at home and at work. It works just fine as-is and gets pretty good performance. In both cases though, I actually use a wrapper script. The one for backups at work just adds `-T2` to the zstd command line as our backup server has lots of CPU (and CPU time), but the backups are network-limited. At home, I also bump the compression level as high as I can without needing special decompression options (so the full command line at home that the wrapper passes is `-19 --long --zstd=hlog=26 -T2`). I've done numerous restores from both sets of backups both with and without the wrapper script (I initially set both up to just use zstd directly), and it all appears to work just fine.
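A minimal wrapper in the spirit of the work setup described above might look like this (the file name and the flags other than `-T2` are assumptions; the sketch writes the wrapper into the current directory, while a real install would put it somewhere like /usr/local/sbin):

```shell
# Create a hypothetical zstd wrapper: "$@" forwards whatever flags Amanda
# passes (including -d for restores), and -T2 adds two worker threads.
cat > ./zstd-wrapper <<'EOF'
#!/bin/sh
exec zstd -T2 "$@"
EOF
chmod +x ./zstd-wrapper
```

The dumptype then points `client_custom_compress` at the wrapper's installed path instead of /usr/bin/zstd.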
Re: Amanda clients running Docker
On 2018-03-27 11:12, Joi L. Ellis wrote: I'm looking for information about how best to manage Amanda clients on which our Devs are running docker containers. Some of the production hosts are also running containers. Does anyone have suggestions regarding best practices for backing up docker containers in an Amanda environment? (I don't use docker and I haven't found anything online discussing containers on Amanda clients.) Any pointers, suggestions, or online references would be very welcome. I don't use Docker myself, but I do use LXC and know a lot of people who use a wide variety of container platforms including Docker, and the general principles are pretty much the same regardless of platform. You have 5 options for handling container backups with Amanda: 1. Back up the containers as part of the regular host-system backup, and do all the containers together as one DLE. 2. Back up the containers as part of the regular host system backup with each container being its own DLE (or DLE's). 3. Back up the containers in a separate backup set from the host system, with one DLE per host system. 4. Back up the containers in a separate backup set from the host system, with one DLE per container. 5. Back up the containers from the containers themselves. Of these, most people I know use option 2 or 4 (I use approach 2 with locally written integration with LXC to get the list of containers to back up). Option 1 is probably the easiest, but can have performance issues if you have lots of containers (and requires a bit of effort to make sure you don't back up transient things like CI build containers). Option 3 suffers from the same issues that option 1 does, but takes more effort to set up. Option 5 violates principles of minimalism, and is only really practical if your containers are full-system images instead of just bare-bones micro-services.
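For option 2, the disklist entries might look something like this (the host name, volume paths, and dumptype are hypothetical; Docker volumes commonly live under /var/lib/docker/volumes):

```
dockerhost  /var/lib/docker/volumes/app-db   comp-user-tar
dockerhost  /var/lib/docker/volumes/app-web  comp-user-tar
```

With a per-container DLE like this, transient containers simply never get an entry, and each container's data can be restored independently.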
Re: some suggested config parameters for backups to local disk
On 2018-03-23 08:25, hy...@lactose.homelinux.net wrote: "Ryan, Lyle (US)" writes: The server has an 11TB filesystem to store the backups in. I should probably be fancier and split this up more, but not now. So I've got my holding, state, and vtapes directories all in there. In this scenario, I would think there's no point to a "holding" disk. I use a holding disk because my actual backup disk is external-USB and (comparatively) slow. So I backup to a holding disk on my internal SSD, releasing the client and the network as soon as possible, and then copy the backup to the backup drive afterwards. But in your case, I don't see any benefit. There are two other benefits to having a holding disk: 1. It lets you run dumps in parallel. Without a holding disk (or some somewhat complicated setup of the vtapes to allow parallel taping), you can only dump one DLE at a time because it dumps directly to tape. 2. It lets you defer taping until you have some minimum amount of data ready to be taped. This may sound kind of useless when working with vtapes, but if the holding disk is on the same device as the final vtape library, deferring until the dumps are all done (or at least, almost all done) can help improve dumping performance, because the dump processes won't be competing with the taper process for disk bandwidth.
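A holding disk is declared in amanda.conf roughly like this (the directory and sizes are placeholders):

```
holdingdisk hd1 {
    directory "/backup/holding"
    use 500 gbytes     # cap it so it can't crowd out the vtapes
    chunksize 1 gbyte
}
```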
Re: some suggested config parameters for backups to local disk
On 2018-03-22 19:03, Ryan, Lyle (US) wrote: I've got an Amanda 3.4.5 server running on Centos 7 now, and am able to do rudimentary backups of a remote client. But in spite of reading man pages, HowTo's, etc, I need help choosing config params. I don't mind continuing to read and experiment, but if someone could get me at least in the ballpark, I'd really appreciate it. The server has an 11TB filesystem to store the backups in. I should probably be fancier and split this up more, but not now. So I've got my holding, state, and vtapes directories all in there. The main client I want to back up has 4TB I want to back up. It's almost all in one filesystem, but the HowTo for splitting DLE's with exclude lists is clear, so it should be easy to split this into (say) 10 smaller individual dumps. The bulk of the data is pretty static, maybe 10%/month changes. It's hard to imagine 20%/month changing. For a start, I'd like to get a full done every 2 weeks, and incrementals/differentials on the intervening days. If I have room to keep 2 fulls (2 complete dumpcycles) that would be great. Given what you've said, you should have enough room to do so, but only if you use compression. Assuming the rate of change you quote above is approximately constant and doesn't result in bumping to a level higher than 1, then without compression you will need roughly 4.015TB per cycle (4TB for the full backup, ~15.38GB for the incrementals (roughly 0.38% change per day for 13 days)), plus 4TB of space for the holding disk (because you have to have room for a full backup _there_ prior to taping anything). With compression and assuming you get a compression ratio of about 50%, you should actually be able to fit four complete cycles (you would need about 2.0075TB per cycle), though if you decide you want that I would bump the tapecycle to 60 and the number of slots to 60. 
So I'm thinking: - dumpcycle = 14 - runspercycle = 0 (default) - tapecycle = 30 - runtapes = 1 (default) I'd break the filesystem into 10 pieces, so 400GB each. and make the vtapes 400GB each (with tapetype length) relying on server-side compression to make it fit. The HowTo "Use pigz to speed compression" looks clear, and the DL380 G7 isn't doing anything else, so server-side compression sounds good. Any advice on this or better ideas? Maybe I'm off in left-field. And one bonus question: I'm assuming Amanda will just make vtapes as necessary, but is there any guidance as to how many vtape slots I should create ahead of time? If my dumpcycle=14, maybe create 14 slots just to make tapes easier to find? Debra covered the requirements for vtapes, slots, and everything very well in her reply, so I won't repeat any of that here. I do however have some other more generic advice I can give based on my own experience: * Make your vtapes as large as possible. They won't take up any space beyond what's stored on them (in storage terminology, they're thinly provisioned), so their total 'virtual' size can be far more than your actual storage capacity, but if you can make it so that you can always fit a full backup on a single vtape, it will make figuring out how many vtapes you need easier, and additionally give a slight boost to taping performance (because the taper never has to stop to switch to a new vtape). In your case, I'd say setting 5TB for your vtape size is reasonable, that would give you some extra room if you suddenly have more data without being insanely over-sized. * Make sure to set a reasonable part_size for your vtapes. While you wouldn't have to worry about splitting dumps if you take my above advice about vtape size, using parts has some other performance related advantages. I normally use 1G, but all of my dumps are less than 100G in size. In your case, if you'll have 10 400G dumps, I'd probably go for 4G for the part size. 
* Match your holding disk chunk size to your vtape's part_size. I have no hard number to back this up, but it appears to provide a slight performance improvement while dumping data. * Don't worry right now about parallelizing the taping process. It's somewhat complicated to get it working right, significantly changes how you have to calculate vtape slots and sizes, and will probably not provide much benefit unless you're taping to a really fast RAID array that does a very good job of handling parallel writes. * There's essentially zero performance benefit to having your holding disk on a separate partition from your final storage unless you have it on a completely separate disk. There are some benefits in terms of reliability, but realizing them requires some significant planning (you have to figure out exactly what amount of space your holding disk will need). * If you're indexing the backups, store the working index directory (the one Amanda actually reads and writes to) on a separate drive from the holding disk and final backup storage.
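The chunk-size matching from the first bullet is a one-line setting in the holding-disk definition (a sketch; the directory path is illustrative):

```
# Holding disk whose chunksize matches the vtape part_size.
holdingdisk hd1 {
    directory "/amanda/holding"
    chunksize 4 gbytes   # same value as part_size in the tapetype
}
```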
Re: installing on Centos 7 - some newbee questions
On 2018-03-07 21:30, Ryan, Lyle (US) wrote: Hello all. I’m getting my first Amanda server running on CentOS 7 and have a few questions: - CentOS is packaged with 3.3.3. Is that good enough or should I build 3.5? Provided it's not missing any features you need and doesn't have any bugs that affect you, yeah it should be fine (and assuming of course you're not exposing it to the internet). This applies even if you've got other versions on the network too (provided the protocols match up, it's perfectly possible to run differing versions of Amanda throughout the network). - the server will use only disks, no tapes. 10TB, mostly all devoted to /home (though I could repartition) - I believe I still use vtapes and a holding disk, even though they’ll all just be directories on the main partition. sound right? Yes. The holding disk is actually pretty important even when using vtapes for two reasons: 1. It allows you to back up DLEs that are larger than the size you've specified for your vtapes. 2. It lets you run multiple backups in parallel without having to jump through hoops to allow Amanda to write to multiple vtapes in parallel. One quick tip regarding this type of configuration: Try to match the part-size tapetype option and the chunksize option for the holding disk. As stupid as it sounds, matching these actually improves performance by a measurable amount in most cases. If you've got a bunch of big backups, 1GB is generally a reasonable size for both. - I follow the instructions at https://wiki.zmanda.com/index.php/GSWA/Build_a_Basic_Configuration but when running amcheck get the error: can not stat /var/lib/amanda/gnutar-lists - indeed there is no file present there. any ideas? Just create it and set the correct permissions. Strictly speaking, the package should create this when installed, but it seems a number of distributions' packages don't do so.
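"Just create it and set the correct permissions" amounts to something like the following sketch. The demo uses a scratch prefix so it is safe to run anywhere; on a real server the prefix would be empty, and you would also chown the directory to the Amanda user (often 'amandabackup' or 'backup'; check what your package uses).

```shell
# Create the missing gnutar-lists state directory (demo under a scratch prefix).
PREFIX=${PREFIX:-$(mktemp -d)}
dir="$PREFIX/var/lib/amanda/gnutar-lists"
mkdir -p "$dir"
chmod 770 "$dir"
# On a real system, additionally (as root): chown amandabackup:disk "$dir"
ls -ld "$dir"
```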
Re: keep a backup forever
On 2018-01-30 17:18, ghe wrote: On 01/30/2018 12:29 PM, hy...@lactose.homelinux.net wrote: I feel like I've asked this before, but I can't find any emails. I can't believe this isn't an FAQ. Or rather, there is an FAQ, but the answer is (a) very sparse and (b) doesn't really answer the question. I had a machine. That machine was getting regular backups. The machine died. I have replaced it with a new machine. So having had this emergency, I now want to keep, in perpetuity, my last full backup of the now-dead machine. How big was the dead disk? Do you have space to store the whole thing? Did amanda do a level 0 of the whole dead disk to 17? If not, there are very likely pieces of that disk on several of your virtual tapes. amrestore deals with all that. The backup in question is on (virtual) tape number 17. So let's say I take the appropriate files that are in my /storage/amanda/vtapes/slot17 directory and copy them somewhere safe. Six months go by, my real slot17 gets reused, and I take those old files and copy them into slot44. What is my next step? How do I get those backups back into my amanda index so that I can amrecover from them? Is that what amreindex does? Is that what amrestore does? What I'd do is recover the last files amanda backed up from that disk, using amrestore. I'd restore to a disk, consider that the perpetual backup, and not try to get that old disk data anywhere in amanda's database -- amanda is very much oriented to reusing things in a cycle, and trying to get her to change her ways can be difficult. amrestore's a pleasant piece of software to use. You just tell it the date you want to restore, the disk, the files, and some other things (I use it infrequently, and I have to read the man page every time). amrestore figures out which tapes you need, and restores the data. Then you can do what you want with them -- burn to optical, buy a new disk, whatever. I would suggest the same approach myself.
In fact, that's pretty much what we do where I work. Whenever we permanently decommission a system, it gets pulled from the backup rotation, and we image the disk and store the disk image in archival storage that's separate from the storage we use for regular backups. Our procedure is similar for a failed disk we don't plan to replace, except instead of imaging it as-is, we rebuild it from backups and then image it (the imaging procedure was the norm before we switched to amanda, so it's just kind of stuck around).
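If you do want to squirrel away the raw slot files before the slot gets reused, bundling the whole slot directory into one archive keeps the parts together. A sketch (all paths and file names here are illustrative stand-ins for your real vtape layout, so the demo creates its own scratch copies):

```shell
# Archive one vtape slot directory as a single file for archival storage.
VTAPES=${VTAPES:-$(mktemp -d)}      # stands in for /storage/amanda/vtapes
ARCHIVE=${ARCHIVE:-$(mktemp -d)}    # stands in for your archival storage
mkdir -p "$VTAPES/slot17"
printf 'demo' > "$VTAPES/slot17/00001.DailySet1-17"   # placeholder part file
tar -C "$VTAPES" -czf "$ARCHIVE/dead-host-final-full.tgz" slot17
tar -tzf "$ARCHIVE/dead-host-final-full.tgz"
```

Note this only preserves the bits; as the reply says, restoring the data with amrestore and archiving the result avoids ever having to coax the old slot back into Amanda's database.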
Re: application amgtar ignore messages
On 2017-12-07 22:26, Jon LaBadie wrote: If I want amgtar to ignore certain messages, is it sufficient to list the message on the amanda server or must the ignored message also be listed in amanda-client.conf? I've done it several times, only on the server, and it seemed to work fine. But I'm now trying to ignore one message that appears on only one client and I'm having no success. Do I need to set up an "application amgtar" stanza on the client? Doesn't affect the question, but the problem is caused by the "gnome virtual file system directory", /home/user/.config/.gvfs. This is a FUSE mountpoint not accessible by root. So it generates a "can not stat" error message from amgtar. The better approach to this is to add that to the exclude file for that particular disk. It's a well-known path, so nothing else should be using it, and it's an area that shouldn't be dumped anyway, for a lot of the same reasons you shouldn't be dumping /sys or /dev/shm (and in fact, it isn't getting dumped, because amgtar can't see inside it).
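The exclude approach can go straight into the dumptype for that DLE. A sketch (the dumptype name and glob are illustrative; it assumes a user-tar style dumptype is already defined, and the pattern is relative to the DLE root, so adjust it to where .gvfs actually sits under that mountpoint):

```
# Dumptype fragment: skip the per-user gvfs FUSE mountpoint under /home.
define dumptype home-tar {
    user-tar
    exclude append "./*/.config/.gvfs"
}
```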
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 14:37, Austin S. Hemmelgarn wrote: On 2017-11-14 07:43, Austin S. Hemmelgarn wrote: On 2017-11-14 07:34, Austin S. Hemmelgarn wrote: On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there. Just tried an amcheckdump on everything, it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken). No luck changing compression. I would suspect some issue with NFS, but I've started seeing the same symptoms on my laptop as well now (which is completely unrelated to any of the sets at work other than having an almost identical configuration other than paths and the total number of tapes). So, I finally got things working by switching from: storage "local-vtl" vault-storage "cloud" To: storage: "local-vtl" "cloud" And removing the "vault" option from the local-vtl storage definition. 
Strictly speaking, this is working around the issue instead of fixing it, but it fits within what we need for our usage, and actually makes the amdump runs complete faster (since dumps get taped to S3 in parallel with getting taped to the local vtapes). Based on this, and the fact that amcheckdump was reporting corrupted dumps, I think the issue is probably an interaction between the vaulting code and the regular taping code, but I'm not certain. Thanks for the help.
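For reference, the amanda.conf change described above amounts to the following (storage names from the thread; the commented lines show the vaulting form that was failing):

```
# Failing form: tape to local-vtl, then vault from it to cloud.
#   storage "local-vtl"
#   (with a vault option in the local-vtl storage definition pointing at "cloud")

# Working form: tape to both storages in parallel, no vaulting.
storage "local-vtl" "cloud"
```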
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 07:43, Austin S. Hemmelgarn wrote: On 2017-11-14 07:34, Austin S. Hemmelgarn wrote: On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there. Just tried an amcheckdump on everything, it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken). No luck changing compression. I would suspect some issue with NFS, but I've started seeing the same symptoms on my laptop as well now (which is completely unrelated to any of the sets at work other than having an almost identical configuration other than paths and the total number of tapes).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-14 07:34, Austin S. Hemmelgarn wrote: On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there. Just tried an amcheckdump on everything, it looks like some of the dump files are corrupted, but I can't for the life of me figure out why (I test our network regularly and it has no problems, and any problems with a particular system should show up as more than just corrupted tar files). I'm going to try disabling compression and see if that helps at all, as that's the only processing other than the default that we're doing on the dumps (long term, it's not really a viable option, but if it fixes things at least we know what's broken).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-13 16:42, Jean-Louis Martineau wrote: On 13/11/17 02:53 PM, Austin S. Hemmelgarn wrote: driver: send-cmd time 9300.326 to taper1: VAULT-WRITE worker1-0 00-00120 local-vtl local-vtl Home-0001 client0 /home/1D 0 20171113073255 "" "" "" "" 1073741824 memory "" "" 0 > FAIL taper "ST:cloud" "POOL:cloud" client0 /home/1D 20171113073255 0 error "File 0 not found" Do that dump still exists on tape Home-0001? Find it with amfetchdump. If yes, send me the taper debug file. amfetchdump does not see it, but looking directly at the virtual tape directories, I can see it there.
Re: power down hard drives
On 2017-11-13 14:51, Jon LaBadie wrote: On Mon, Nov 13, 2017 at 02:04:42PM -0500, Gene Heskett wrote: On Monday 13 November 2017 13:42:13 Jon LaBadie wrote: On Mon, Nov 13, 2017 at 11:40:17AM -0500, Austin S. Hemmelgarn wrote: On 2017-11-13 11:11, Gene Heskett wrote: On Monday 13 November 2017 10:12:47 Austin S. Hemmelgarn wrote: On 2017-11-13 09:56, Gene Heskett wrote: On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote: On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down then when needed, powered up and mounted again? I'm not talking about system hibernation, the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about 5-10 seconds delay while the drive spun up and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout, check the man page for hdparm as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them). ... But if I allow the 2TB to be unmounted and self-powered down, once daily, what shortening of its life would I be subjected to? In other words, how many start-stop cycles can it survive? It's hard to be certain. For what it's worth though, you might want to test this to be certain that it's actually going to save you energy.
It takes a lot of power to get the platters up to speed, but it doesn't take much to keep them running at that speed. It might be more advantageous to just configure the device to idle (that is, park the heads) after some time out and leave the platters spinning instead of spinning down completely (and it should result in less wear on the spindle motor). In my situation, each of the six data drives is only needed for a 2 week period out of each 12 weeks. Once shutdown, it could be down for 10 weeks. Jon Which is more than enough time for stiction to appear if the heads are not parked off disk. Don't today's drives automatically park heads? I don't think there were ever any (at least, not ATA or SAS) that didn't when they went into standby. In fact, I've never seen a modern style hard disk with 'voice coil' style actuators that didn't automatically park the heads (and part of my job is tearing apart old hard drives prior to physical media destruction, so I've seen my fair share of them).
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 12:52, Jean-Louis Martineau wrote: The previous patch broke something. Try this new set2-r2.diff patch Unfortunately, that doesn't appear to have fixed it, though the errors look different now. I'll try and get the log scrubbed by the end of the day and post it here. On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote: > On 2017-11-10 08:27, Jean-Louis Martineau wrote: >> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote: >>> On 2017-11-08 08:03, Jean-Louis Martineau wrote: >>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: >>>> > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >>>> >> Austin, >>>> >> >>>> >> It's hard to say something with only the error message. >>>> >> >>>> >> Can you post the amdump. and log..0 for >>>> the 2 >>>> >> backup set that fail. >>>> >> >>>> > I've attached the files (I would put them inline, but one of the >>>> sets >>>> > has over 100 DLE's, so the amdump file is huge, and the others are >>>> > still over 100k each, and I figured nobody want's to try and wad >>>> > through those in-line). >>>> > >>>> > The set1 and set2 files are for the two backup sets that show the >>>> > header mismatch error, and the set3 files are for the one that >>>> claims >>>> > failures in the dump summary. 
>>>> >>>> >>>> I looked at set3, the error in the 'DUMP SUMMARY' are related to the >>>> error in the 'FAILURE DUMP SUMMARY' >>>> >>>> client2 /boot lev 0 FLUSH [File 0 not found] >>>> client3 /boot lev 0 FLUSH [File 0 not found] >>>> client7 /boot lev 0 FLUSH [File 0 not found] >>>> client8 /boot lev 0 FLUSH [File 0 not found] >>>> client0 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /srv lev 0 FLUSH [File 0 not found] >>>> client9 /var lev 0 FLUSH [File 0 not found] >>>> server0 /boot lev 0 FLUSH [File 0 not found] >>>> client10 /boot lev 0 FLUSH [File 0 not found] >>>> client11 /boot lev 0 FLUSH [File 0 not found] >>>> client12 /boot lev 0 FLUSH [File 0 not found] >>>> >>>> They are VAULT attemp, not FLUSH, looking only at the first entry, it >>>> try to vault 'client2 /boot 0 20171024084159' which it expect to >>>> find on >>>> tape Server-01. It is an older dump. >>>> >>>> Do Server-01 is still there? Did it still contains the dump? >>>> >>> OK, I've done some further investigation by tweaking the labeling a >>> bit (which actually fixed a purely cosmetic issue we were having), >>> but I'm still seeing the same problem that prompted this thread, and >>> I can confirm that the dumps are where Amanda is trying to look for >>> them, it's just not seeing them for some reason. I hadn't thought >>> of this before, but could it have something to do with the virtual >>> tape library being auto-mounted over NFS on the backup server? >>> >> Austin, >> >> Can you try to see if amfetchdump can restore it? >> >> * amfetchdump CONFIG client2 /boot 20171024084159 >> > amfetchdump doesn't see it, and neither does amrecover, but the files > for the given parts are definitely there (I know for a fact that the > dump in question has exactly one part, and the file for that does > exist on the virtual tape mentioned in the log file). 
> > I'm probably not going to be able to check more on this today, but > I'll likely be checking if amrestore and amadmin find can see them. >
Re: power down hard drives
On 2017-11-13 11:11, Gene Heskett wrote: On Monday 13 November 2017 10:12:47 Austin S. Hemmelgarn wrote: On 2017-11-13 09:56, Gene Heskett wrote: On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote: On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down then when needed, powered up and mounted again? I'm not talking about system hibernation, the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about 5-10 seconds delay while the drive spun up and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout, check the man page for hdparm as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them). I've investigated that, and I have amanda wrapped up in a script that could do that, but ran into a showstopper I've long since forgotten about. All this was back in the time I was writing that wrapper, years ago now. One of the show stoppers AIR was the fact that only root can mount and unmount a drive, and my script runs as amanda.
While such a wrapper might work if you use sudo inside it (you can configure sudo to allow root to run things as the amanda user without needing a password, then run the wrapper as root), what I was trying to refer to in a system-agnostic manner (since the exact mechanism is different between different UNIX derivatives) was on-demand auto-mounting, as provided by autofs on Linux or the auto-mount daemon (amd) on BSD. When doing on-demand auto-mounting, you don't need a wrapper at all, as the access attempt will trigger the mount, and then the mount will time out after some period of inactivity and be unmounted again. It's mostly used for network resources (possibly with special auto-lookup mechanisms), as certain protocols (NFS in particular) tend to have issues if the server goes down while a share is mounted remotely, even if nothing is happening on that share, but it works just as well for auto-mounting of local fixed or removable volumes that aren't needed all the time (I use it for a handful of things on my personal systems to minimize idle resource usage). Sounds good perhaps. I am currently up to my eyeballs in an unrelated problem, and I won't get to this again until that project is completed and I have brought the 2TB drive in and configured it for amanda's usage. That will tend to enforce my one thing at a time but do it right bent. :) What I have is working for a loose definition of working... Yeah, I know what that's like. Prior to switching to amanda where I worked, we had a home-grown backup system that had all kinds of odd edge cases I had to make sure never happened. I'm extremely glad we decided to stop using that, since it means I can now focus on more interesting problems (in theory at least, we're having an issue with our Amanda config right now too, but thankfully it's not a huge one). But if I allow the 2TB to be unmounted and self-powered down, once daily, what shortening of its life would I be subjected to? 
In other words, how many start-stop cycles can it survive? It's hard to be certain. For what it's worth though, you might want to test this to be certain that it's actually going to save you energy. It takes a lot of power to get the platters up to speed, but it doesn't take much to keep them running at that speed. It might be more advantageous to just configure the device to idle (that is, park the heads) after some timeout and leave the platters spinning instead of spinning down completely (and it should result in less wear on the spindle motor). Interesting, I had started a long time test yesterday, and the reported hours have wrapped in the report, apparently at 65536 hours. Somebody apparently didn't expect a drive to last that long? ;-) The drive? Healthy as can be. That's about 7.48 years, so I can actually somewhat understand not going past 16-bits for that since most people don't use a given disk for more than about 5 years worth of power-on time before replacing it. However, what matters is really not how long the device has been powered on, but how much abuse the drive has taken. Running 24/7 for 5 years with no movement of the system (including nothing like earthquakes), in a temperature, humidity, and pressure controlled room will get
Re: power down hard drives
On 2017-11-13 09:56, Gene Heskett wrote: On Monday 13 November 2017 07:19:45 Austin S. Hemmelgarn wrote: On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down then when needed, powered up and mounted again? I'm not talking about system hibernation, the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about 5-10 seconds delay while the drive spun up and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout, check the man page for hdparm as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them). I've investigated that, and I have amanda wrapped up in a script that could do that, but ran into a showstopper I've long since forgotten about. All this was back in the time I was writing that wrapper, years ago now. One of the show stoppers AIR was the fact that only root can mount and unmount a drive, and my script runs as amanda.
While such a wrapper might work if you use sudo inside it (you can configure sudo to allow root to run things as the amanda user without needing a password, then run the wrapper as root), what I was trying to refer to in a system-agnostic manner (since the exact mechanism is different between different UNIX derivatives) was on-demand auto-mounting, as provided by autofs on Linux or the auto-mount daemon (amd) on BSD. When doing on-demand auto-mounting, you don't need a wrapper at all, as the access attempt will trigger the mount, and then the mount will time out after some period of inactivity and be unmounted again. It's mostly used for network resources (possibly with special auto-lookup mechanisms), as certain protocols (NFS in particular) tend to have issues if the server goes down while a share is mounted remotely, even if nothing is happening on that share, but it works just as well for auto-mounting of local fixed or removable volumes that aren't needed all the time (I use it for a handful of things on my personal systems to minimize idle resource usage).
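The on-demand auto-mounting described above needs no wrapper at all. On Linux with autofs, the setup is two small map files along these lines (a sketch; the mount point, map file name, device label, and timeout are all illustrative):

```
# /etc/auto.master line: mount things under /mnt/backup on demand,
# unmount after 600 seconds of inactivity.
/mnt/backup  /etc/auto.backup  --timeout=600

# /etc/auto.backup map entry: accessing /mnt/backup/vtapes triggers the mount.
vtapes  -fstype=ext4  :/dev/disk/by-label/amanda-vtapes
```

With that in place, Amanda's own access to the vtape directory spins the disk up and mounts it, and idle timeouts take care of the rest.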
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 12:52, Jean-Louis Martineau wrote: The previous patch broke something. Try this new set2-r2.diff patch Given that the switch to NFSv4 combined with a change to the labeling scheme fixed the other issue, I'm going to re-test these two sets with the same changes before I test the patch just so I've got something current to compare against. I should have results from that later today, and will likely be testing this patch tomorrow if things aren't resolved by the other changes (and based on what you've said and what I've seen, I don't think the switch to NFSv4 or the labeling change will fix this one). Jean-Louis On 10/11/17 10:40 AM, Austin S. Hemmelgarn wrote: > On 2017-11-10 08:27, Jean-Louis Martineau wrote: >> On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote: >>> On 2017-11-08 08:03, Jean-Louis Martineau wrote: >>>> On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: >>>> > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >>>> >> Austin, >>>> >> >>>> >> It's hard to say something with only the error message. >>>> >> >>>> >> Can you post the amdump. and log..0 for >>>> the 2 >>>> >> backup set that fail. >>>> >> >>>> > I've attached the files (I would put them inline, but one of the >>>> sets >>>> > has over 100 DLE's, so the amdump file is huge, and the others are >>>> > still over 100k each, and I figured nobody want's to try and wad >>>> > through those in-line). >>>> > >>>> > The set1 and set2 files are for the two backup sets that show the >>>> > header mismatch error, and the set3 files are for the one that >>>> claims >>>> > failures in the dump summary. 
>>>> >>>> >>>> I looked at set3, the error in the 'DUMP SUMMARY' are related to the >>>> error in the 'FAILURE DUMP SUMMARY' >>>> >>>> client2 /boot lev 0 FLUSH [File 0 not found] >>>> client3 /boot lev 0 FLUSH [File 0 not found] >>>> client7 /boot lev 0 FLUSH [File 0 not found] >>>> client8 /boot lev 0 FLUSH [File 0 not found] >>>> client0 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /boot lev 0 FLUSH [File 0 not found] >>>> client9 /srv lev 0 FLUSH [File 0 not found] >>>> client9 /var lev 0 FLUSH [File 0 not found] >>>> server0 /boot lev 0 FLUSH [File 0 not found] >>>> client10 /boot lev 0 FLUSH [File 0 not found] >>>> client11 /boot lev 0 FLUSH [File 0 not found] >>>> client12 /boot lev 0 FLUSH [File 0 not found] >>>> >>>> They are VAULT attemp, not FLUSH, looking only at the first entry, it >>>> try to vault 'client2 /boot 0 20171024084159' which it expect to >>>> find on >>>> tape Server-01. It is an older dump. >>>> >>>> Do Server-01 is still there? Did it still contains the dump? >>>> >>> OK, I've done some further investigation by tweaking the labeling a >>> bit (which actually fixed a purely cosmetic issue we were having), >>> but I'm still seeing the same problem that prompted this thread, and >>> I can confirm that the dumps are where Amanda is trying to look for >>> them, it's just not seeing them for some reason. I hadn't thought >>> of this before, but could it have something to do with the virtual >>> tape library being auto-mounted over NFS on the backup server? >>> >> Austin, >> >> Can you try to see if amfetchdump can restore it? >> >> * amfetchdump CONFIG client2 /boot 20171024084159 >> > amfetchdump doesn't see it, and neither does amrecover, but the files > for the given parts are definitely there (I know for a fact that the > dump in question has exactly one part, and the file for that does > exist on the virtual tape mentioned in the log file). 
> > I'm probably not going to be able to check more on this today, but > I'll likely be checking if amrestore and amadmin find can see them. >
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:45, Austin S. Hemmelgarn wrote: On 2017-11-10 08:27, Jean-Louis Martineau wrote: On 10/11/17 07:57 AM, Austin S. Hemmelgarn wrote: On 2017-11-08 08:03, Jean-Louis Martineau wrote: On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >> Austin, >> >> It's hard to say something with only the error message. >> >> Can you post the amdump. and log..0 for the 2 >> backup set that fail. >> > I've attached the files (I would put them inline, but one of the sets > has over 100 DLE's, so the amdump file is huge, and the others are > still over 100k each, and I figured nobody want's to try and wad > through those in-line). > > The set1 and set2 files are for the two backup sets that show the > header mismatch error, and the set3 files are for the one that claims > failures in the dump summary. I looked at set3, the error in the 'DUMP SUMMARY' are related to the error in the 'FAILURE DUMP SUMMARY' client2 /boot lev 0 FLUSH [File 0 not found] client3 /boot lev 0 FLUSH [File 0 not found] client7 /boot lev 0 FLUSH [File 0 not found] client8 /boot lev 0 FLUSH [File 0 not found] client0 /boot lev 0 FLUSH [File 0 not found] client9 /boot lev 0 FLUSH [File 0 not found] client9 /srv lev 0 FLUSH [File 0 not found] client9 /var lev 0 FLUSH [File 0 not found] server0 /boot lev 0 FLUSH [File 0 not found] client10 /boot lev 0 FLUSH [File 0 not found] client11 /boot lev 0 FLUSH [File 0 not found] client12 /boot lev 0 FLUSH [File 0 not found] They are VAULT attemp, not FLUSH, looking only at the first entry, it try to vault 'client2 /boot 0 20171024084159' which it expect to find on tape Server-01. It is an older dump. Do Server-01 is still there? Did it still contains the dump? 
OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them, it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server? Austin, Can you try to see if amfetchdump can restore it? * amfetchdump CONFIG client2 /boot 20171024084159 At the moment, I'm re-testing things after tweaking some NFS parameters for the virtual tape library (apparently the FreeNAS server that's actually storing the data didn't have NFSv4 turned on, so it was mounted with NFSv3, which we've had issues with before on our network), so I can't exactly check immediately, but assuming the problem repeats, I'll do that first thing once the test dump is done. It looks like the combination of fixing the incorrect labeling in the config and switching to NFSv4 fixed this particular case.
Re: power down hard drives
On 2017-11-11 01:49, Jon LaBadie wrote: Just a thought. My amanda server has seven hard drives dedicated to saving amanda data. Only 2 are typically used (holding and one vtape drive) during an amdump run. Even then, the usage is only for about 3 hours. So there is a lot of electricity and disk drive wear for inactive drives. Can today's drives be unmounted and powered down, then when needed, powered up and mounted again? I'm not talking about system hibernation; the system and its other drives still need to be active. Back when 300GB was a big drive I had 2 of them in external USB housings. They shut themselves down on inactivity. When later accessed, there would be about a 5-10 second delay while the drive spun up, and things proceeded normally. That would be a fine arrangement now if it could be mimicked. Aside from what Stefan mentioned (using hdparm to set the standby timeout; check the man page for hdparm, as the numbers are not exactly sensible), you may consider looking into auto-mounting each of the drives, as that can help eliminate things that would keep the drives on-line (or make it more obvious that something is still using them).
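For reference, the "not exactly sensible" numbers mentioned above refer to the encoding hdparm's -S option uses (values 1-240 count in 5-second units, 241-251 in 30-minute units). A small sketch of turning a timeout in minutes into that value; the standby_value helper here is mine, not part of hdparm:

```shell
#!/bin/sh
# Encode a standby timeout (in minutes) into the value hdparm -S expects:
#   1..240   -> timeout = value * 5 seconds
#   241..251 -> timeout = (value - 240) * 30 minutes
standby_value() {
    mins=$1
    if [ "$mins" -le 20 ]; then
        # up to 20 minutes: 5-second units
        echo $(( mins * 60 / 5 ))
    else
        # longer timeouts: 30-minute units, rounded up
        echo $(( 240 + (mins + 29) / 30 ))
    fi
}

# e.g. spin a vtape drive down after 30 idle minutes:
# hdparm -S "$(standby_value 30)" /dev/sdX
```

So -S 120 means 10 minutes, while -S 242 means a full hour; check hdparm(8) before relying on a specific value, since some drives ignore or round these settings.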
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:27, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump?
OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server? Austin, Can you try to see if amfetchdump can restore it? * amfetchdump CONFIG client2 /boot 20171024084159 amfetchdump doesn't see it, and neither does amrecover, but the files for the given parts are definitely there (I know for a fact that the dump in question has exactly one part, and the file for that does exist on the virtual tape mentioned in the log file). I'm probably not going to be able to check more on this today, but I'll likely be checking whether amrestore and amadmin find can see them.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 10:00, Jean-Louis Martineau wrote: Austin, Can you try the attached patch, I think it could fix the set1 and set2 errors. Yes, but I won't be able to log in this weekend to revert it if it doesn't work, so I won't be able to test it until Monday. Am I correct in assuming that it only needs to be applied on the server and not the clients? On 07/11/17 02:58 PM, Austin S. Hemmelgarn wrote: > On 2017-11-07 10:22, Jean-Louis Martineau wrote: >> Austin, >> >> It's hard to say something with only the error message. >> >> Can you post the amdump. and log..0 for the 2 >> backup set that fail. >> > I've attached the files (I would put them inline, but one of the sets > has over 100 DLE's, so the amdump file is huge, and the others are > still over 100k each, and I figured nobody want's to try and wad > through those in-line). > > The set1 and set2 files are for the two backup sets that show the > header mismatch error, and the set3 files are for the one that claims > failures in the dump summary.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-10 08:27, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump?
OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason. I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server? Austin, Can you try to see if amfetchdump can restore it? * amfetchdump CONFIG client2 /boot 20171024084159 At the moment, I'm re-testing things after tweaking some NFS parameters for the virtual tape library (apparently the FreeNAS server that actually stores the data didn't have NFSv4 turned on, so it was mounted with NFSv3, which we've had issues with before on our network), so I can't check immediately, but assuming the problem repeats, I'll do that first thing once the test dump is done.
Re: Odd non-fatal errors in amdump reports.
On 2017-11-08 08:03, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump? OK, I've done some further investigation by tweaking the labeling a bit (which actually fixed a purely cosmetic issue we were having), but I'm still seeing the same problem that prompted this thread, and I can confirm that the dumps are where Amanda is trying to look for them; it's just not seeing them for some reason.
I hadn't thought of this before, but could it have something to do with the virtual tape library being auto-mounted over NFS on the backup server?
Re: Odd non-fatal errors in amdump reports.
On 2017-11-08 08:03, Jean-Louis Martineau wrote: [...] I looked at set3; the errors in the 'DUMP SUMMARY' are related to the 'File 0 not found' errors in the 'FAILURE DUMP SUMMARY'. They are VAULT attempts, not FLUSH. Looking only at the first entry, it tries to vault 'client2 /boot 0 20171024084159', which it expects to find on tape Server-01. It is an older dump. Is Server-01 still there? Does it still contain the dump? Hmm, looks like that's a leftover from changing our labeling format shortly after switching to this new configuration. I thought I purged all the stuff with the old label scheme, but I guess not. It somewhat surprises me that this doesn't give any kind of error indication in the e-mail report beyond the 'FAILED' line in the dump summary.
Re: amvault with dropbox
On 2017-11-07 13:36, Ned Danieley wrote: On Tue, Nov 07, 2017 at 01:29:34PM -0500, Austin S. Hemmelgarn wrote: OK, so you're talking about functionally permanent archiving instead of keeping old stuff around for a fixed multiple of the dump cycle. If that's the case, you may be better off pulling the dumps off the tapes using amfetchdump, and then uploading them from there. That use case could in theory be handled better with some extra code in Amanda, but I don't know how well the lack of deletion would be handled on Amanda's side. yeah, I need to upload monthly full dumps to dropbox and keep them forever. the monthly dumps are to vtapes, and I thought it would be neat if I could then just transfer the vtapes to dropbox using amvault. Strictly speaking, amvault doesn't transfer vtapes; it retapes the dumps on the vtapes to a new location. While this sounds like a somewhat pointless distinction, it's actually pretty significant, because it means you can use a different type of tape for your secondary storage, with almost every tapetype option set differently (which is extremely useful for multiple reasons). That's actually part of the reason that it's a preferred alternative to mirroring tapes with Amanda's RAIT device. The issue here, though, is the 'keep it forever' bit. If Amanda is given an automated tape changer (a library of vtapes is an automated changer), it assumes it can reuse the tapes as it sees fit. I think there's a config option that lets you change that, but once you do that, you need to keep adding tapes (or vtapes) to the library, which can get out of hand really quickly (especially if you don't plan ahead when deciding on how things will get labeled).
One option for this, though, if you can afford to use something other than Dropbox, would be to use the Amazon S3 support to store your data in Amazon Glacier storage (which is remarkably cheap, on the order of a few USD per TB per month), and enable versioning, so that when a 'tape' gets overwritten the old version is kept around forever. If you're interested in doing this, I can write up instructions for how to get things set up with Amazon. (We actually do something very similar for off-site backups where I work, just without Glacier or versioning, but those are easy to set up.)
Re: amvault with dropbox
On 2017-11-07 13:19, Ned Danieley wrote: On Tue, Nov 07, 2017 at 01:11:43PM -0500, Austin S. Hemmelgarn wrote: On 2017-11-07 11:55, Ned Danieley wrote: we use a dropbox business account to archive our data, and I was interested in trying to use amvault to transfer my amanda backups there. however, it seems that there is a fair amount of work that would have to be done to the code base to make that happen, work that is probably beyond my ability. are there any plans to include dropbox access in future versions? You can do this already without needing any new code. Just configure a virtual tape library inside a Dropbox synced directory, set that as a vaulting location, and recursively add the necessary read permissions to the directory after each amvault run. I guess that would work, although I'd have to set up selective sync so I could remove the files locally without removing them from dropbox. thanks for the suggestion; I'll give it a try. OK, so you're talking about functionally permanent archiving instead of keeping old stuff around for a fixed multiple of the dump cycle. If that's the case, you may be better off pulling the dumps off the tapes using amfetchdump, and then uploading them from there. That use case could in theory be handled better with some extra code in Amanda, but I don't know how well the lack of deletion would be handled on Amanda's side.
Re: amvault with dropbox
On 2017-11-07 11:55, Ned Danieley wrote: we use a dropbox business account to archive our data, and I was interested in trying to use amvault to transfer my amanda backups there. however, it seems that there is a fair amount of work that would have to be done to the code base to make that happen, work that is probably beyond my ability. are there any plans to include dropbox access in future versions? You can do this already without needing any new code. Just configure a virtual tape library inside a Dropbox synced directory, set that as a vaulting location, and recursively add the necessary read permissions to the directory after each amvault run.
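A minimal sketch of that suggestion as amanda.conf fragments; every path, name, slot count, and the tapetype below are placeholders I made up for illustration, not anything from the thread:

```
# hypothetical vtape library living inside a Dropbox-synced directory
define changer dropbox-vtl {
    tpchanger "chg-disk:/home/amandabackup/Dropbox/amanda-vtapes"
    property "num-slot" "16"
    property "auto-create-slot" "yes"
}

define storage dropbox {
    tpchanger "dropbox-vtl"
    tapepool "dropbox"
    tapetype "VTAPE"                 # placeholder; define to taste
    labelstr "^DROPBOX-[0-9][0-9]*$"
    autolabel "DROPBOX-%%%" any
}
```

The Dropbox client then syncs whatever amvault writes into that directory; the permission fix-up mentioned above would still have to happen outside Amanda (e.g. a chmod in a wrapper script).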
Re: Odd non-fatal errors in amdump reports.
On 2017-11-07 10:22, Jean-Louis Martineau wrote: Austin, It's hard to say anything with only the error message. Can you post the amdump. and log..0 files for the 2 backup sets that fail? Yes, though it may take me a while, since our policy is pretty strict about scrubbing hostnames and usernames from any internal files we make visible publicly. Just to clarify, it will end up being 3 total pairs of files: two from the backup sets that show the first issue I mentioned (the complaint about a header mismatch), and one from the backup set showing the second issue I mentioned (the apparently bogus dump failures listed in the dump summary). The tapedev of the aws changer can be written like: tapedev "chg-multi:s3:/slot-{0..127}" Thanks, I hadn't known that the configuration file syntax supported sequences like this; that makes it look so much nicer! Jean-Louis On 07/11/17 09:17 AM, Austin S. Hemmelgarn wrote:
> Where I work, we recently switched from manually triggered vaulting to
> automatic vaulting using the vault-storage, vault, and dump-selection
> options. Things appear to be working correctly, but we keep getting
> some odd non-fatal error messages (that might be bogus as well, since
> I've verified the dumps mentioned restore correctly) in the amdump
> e-mails. I've been trying to figure out these 'errors' for the past
> few weeks now, and I'm hoping someone on the list might have some advice
> (or better yet, might recognize the symptoms and know how to fix them).
>
> In our configuration, we have three different backup sets (each is on
> its own schedule).
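For comparison, the compact range form Jean-Louis suggests replaces the fully enumerated slot list from the original config; the changerfile path below is a placeholder:

```
# equivalent chg-multi definitions; per the note above, the config parser
# expands {0..127} the same way as the enumerated {0,1,2,...,127} list
define changer aws {
    tapedev "chg-multi:s3:/slot-{0..127}"
    changerfile "/etc/amanda/CONFIG/s3-changer"
}
```

The device-property lines from the original definition (S3 keys, bucket location, and so on) would carry over unchanged.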
Of these, two are consistently showing the following
> error in the amdump e-mail report (I've redacted hostnames and exact paths;
> the second path listed, though, is a parent directory of the first):
>
> taper: FATAL Header of dumpfile does not match command from driver 0 XXX /home/X 20171031074642 -- 0 XXX /home/XX 20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168
>
> For a given backup set, the particular hostname and paths are always the
> same, but the backup appears to get taped correctly, and restores
> correctly as well.
>
> With the third backup set, we're regularly seeing things like the
> following in the dump summary section, but no other visible error
> messages:
>
>                       DUMPER STATS                 TAPER STATS
> HOSTNAME DISK  L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
> XX       /boot 0       --                         FAILED
> XX       /boot 1       10      10     --    0:00  168.8    0:00   0.0
>
> In this case, the particular DLEs affected are always the same,
> and the first line that claims a failure always shows dump level
> zero, even when the backup is supposed to be at another level.
> Just like the other error, the affected dumps always restore
> correctly when tested, and get correctly vaulted as well. The
> affected DLEs are only on Linux systems, but it seems not to
> care what distro or amanda version is being used (it's affected
> Debian, Gentoo, and Fedora systems, and covers 5 different
> Amanda client versions), and they are invariably small (sub-gigabyte)
> filesystems, but I've not found any other commonality among them.
>
> All three sets use essentially the same amanda.conf file (the
> differences are literally just in when they get run), which
> I've attached in-line at the end of this e-mail with
> sensitive data redacted. The thing I find particularly odd is
> that this config is essentially identical to what I use on my
> personal systems, which are not exhibiting either problem.
> > 8< [amanda.conf elided; it is quoted in full in the original message]
Odd non-fatal errors in amdump reports.
Where I work, we recently switched from manually triggered vaulting to automatic vaulting using the vault-storage, vault, and dump-selection options. Things appear to be working correctly, but we keep getting some odd non-fatal error messages (that might be bogus as well, since I've verified the dumps mentioned restore correctly) in the amdump e-mails. I've been trying to figure out these 'errors' for the past few weeks now, and I'm hoping someone on the list might have some advice (or better yet, might recognize the symptoms and know how to fix them). In our configuration, we have three different backup sets (each is on its own schedule). Of these, two are consistently showing the following error in the amdump e-mail report (I've redacted hostnames and exact paths; the second path listed, though, is a parent directory of the first): taper: FATAL Header of dumpfile does not match command from driver 0 XXX /home/X 20171031074642 -- 0 XXX /home/XX 20171031074642 at /usr/lib64/perl5/vendor_perl/5.24.1/Amanda/Taper/Worker.pm line 1168 For a given backup set, the particular hostname and paths are always the same, but the backup appears to get taped correctly, and restores correctly as well. With the third backup set, we're regularly seeing things like the following in the dump summary section, but no other visible error messages:

                      DUMPER STATS                 TAPER STATS
HOSTNAME DISK  L  ORIG-KB  OUT-KB  COMP%  MMM:SS   KB/s  MMM:SS  KB/s
XX       /boot 0       --                         FAILED
XX       /boot 1       10      10     --    0:00  168.8    0:00   0.0

In this case, the particular DLEs affected are always the same, and the first line that claims a failure always shows dump level zero, even when the backup is supposed to be at another level. Just like the other error, the affected dumps always restore correctly when tested, and get correctly vaulted as well.
The affected DLEs are only on Linux systems, but it seems not to care what distro or amanda version is being used (it's affected Debian, Gentoo, and Fedora systems, and covers 5 different Amanda client versions), and they are invariably small (sub-gigabyte) filesystems, but I've not found any other commonality among them. All three sets use essentially the same amanda.conf file (the differences are literally just in when they get run), which I've attached in-line at the end of this e-mail with sensitive data redacted. The thing I find particularly odd is that this config is essentially identical to what I use on my personal systems, which are not exhibiting either problem.

8<

org "X"
mailto "admin"
dumpuser "amanda"
inparallel 2
dumporder "Ss"
taperalgo largestfit
displayunit "k"
netusage 800 Kbps
dumpcycle 4 weeks
runspercycle 28
tapecycle 128 tapes
bumppercent 20
bumpdays 2
etimeout 900
dtimeout 1800
ctimeout 30
device_output_buffer_size 256M
compress-index no
flush-threshold-dumped 0
flush-threshold-scheduled 0
taperflush 0
autoflush yes
runtapes 16

define changer vtl {
    tapedev "chg-disk:/net/XX/amanda/X"
    changerfile "/etc/amanda/X/changer"
    property "num-slot" "128"
    property "auto-create-slot" "yes"
}

define changer aws {
    tapedev "chg-multi:s3:/slot-{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}"
    changerfile "/etc/amanda/X/s3-changer"
    device-property "S3_SSL" "YES"
    device-property "S3_ACCESS_KEY" ""
    device-property "S3_SECRET_KEY" ""
    device-property "S3_MULTI_PART_UPLOAD" "YES"
    device-property "CREATE_BUCKET" "NO"
    device-property "S3_BUCKET_LOCATION" "X"
    device-property "STORAGE_API" "AWS4"
}

define storage
local-vtl {
    tpchanger "vtl"
    tapepool "$r"
    tapetype "V64G"
    labelstr "^-[0-9][0-9]*$"
    autolabel "-%%%" any
    erase-on-full YES
    erase-on-failure YES
    vault cloud 0
}

define storage cloud {
    tpchanger "aws"
    tapepool "$r"
    tapetype "S3TAPE"
    labelstr "^Vault--[0-9][0-9]*$"
    autolabel "Vault--%%%" any
    erase-on-full YES
    erase-on-failure YES
Re: approaches to Amanda vaulting?
On 2017-10-24 12:28, Stefan G. Weichinger wrote: Am 2017-10-24 um 13:38 schrieb Austin S. Hemmelgarn: On 2017-10-22 13:38, Stefan G. Weichinger wrote: After or before I additionally can do something like: amvault myconf --dest-storage --latest-fulls archive correct? I think so, but I'm not 100% certain. oh ;-) An additional hurdle is that the customer wants to use WORM tapes for archive, so I should get that right at the first run to not waste any tapes. Perhaps create a temporary virtual tape library for testing that the archiving schedule works as expected? This is what I generally do when testing changes at work (although I usually do it using a copy of the main configuration, so that I don't confuse the planner for the production backups with half a dozen runs in one day). Sure, that would be good, but I don't have that much disk space available. I am currently trying to wrap my head around the tuning of these parameters (and understand their exact meaning by reading the man page): flush-threshold-dumped, flush-threshold-scheduled, taperflush. I had lev0 of all DLEs in the holding disk and both flush-threshold values at 400 -> I thought this would keep data for 4 tapes inside the disk, but no, some lev0 backups were flushed to primary storage already. Maybe I'll set up a VM with 2 vtape changers and play around there to learn and understand. Based on what you're saying you want, I think you want the following in your config:

flush-threshold-dumped 400
flush-threshold-scheduled 400
taperflush 400
autoflush yes

The first two control flushing during a run, while taperflush controls flushing at the end of a run. To get the flushing to actually happen, you then need autoflush set to yes (and amanda will complain if it's not set to yes while taperflush is more than zero).
Now, I'm not 100% certain that will work, as I've not done this type of thing myself (at work, we just use the holding disk as a cache so that we can finish dumps as quickly as possible without our (slow, parity-raid backed) persistent storage being the bottleneck, and at home I don't use it since I don't need parallelization and I don't have any disks that are faster than any others), but based on what I understand from the documentation, I'm pretty sure this should do it.
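As a reading aid for the parameters being discussed, here is my interpretation of the amanda.conf(5) man page (not something tested in this thread): all three values are percentages of a single tape's capacity, so 400 means roughly "four tapes' worth of data":

```
# values are percent of one tape's capacity (my reading of amanda.conf(5))
flush-threshold-dumped    400  # don't start taping until holding has 4 tapes' worth dumped
flush-threshold-scheduled 400  # ...or 4 tapes' worth dumped-plus-still-scheduled
taperflush                400  # at end of run, up to 4 tapes' worth may stay in holding
autoflush yes                  # allow later runs to flush the dumps left behind
```

If that reading is right, it would also explain Stefan's observation: with only the two flush-threshold values set to 400 but taperflush left low, dumps can still be flushed at the end of the run even though they were held back during it.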
Re: approaches to Amanda vaulting?
On 2017-10-22 13:38, Stefan G. Weichinger wrote: Am 2017-10-16 um 19:22 schrieb Stefan G. Weichinger: Am 2017-10-16 um 15:20 schrieb Jean-Louis Martineau: Amanda 3.5 can do everything you want just by running the amdump command. Using a holding disk:
* You configure two storages
* All dumps go to the holding disk
* All dumps are copied to each storage, not necessarily at the same time or in the same run
* The dumps stay in holding until they are copied to both storages
* You can tell amanda that everything must go to both storages, or only some DLEs full/incr
I now have set up a config like this:

define changer robot {
    tpchanger "chg-robot:/dev/sg3"
    property "tape-device" "0=tape:/dev/nst0"
    property "eject-before-unload" "no"
    property "use-slots" "1-8"
}

define tapetype LTO6 {
    #comment "Created by amtapetype; compression enabled"
    length 2442818848 kbytes  # about 2.4 TB (sgw)
    filemark 1806 kbytes
    speed 74006 kps
    blocksize 32 kbytes
    part_size 200G
}

define storage myconf {
    tapepool "myconf"
    tapetype "LTO6"
    tpchanger "robot"
    labelstr "^CMR[0-9][0-9]*$"
    autoflush yes
    # flush-threshold-dumped 100
    # flush-threshold-scheduled 100
    #
    # keep everything in the holding disk
    flush-threshold-dumped 400     # (or more)
    flush-threshold-scheduled 400  # (or more)
    taperflush 400
    runtapes 4
}

define storage archive {
    tapepool "archive"
    tapetype "LTO6"
    tpchanger "robot"
    labelstr "^ARC[0-9][0-9]*$"
    autoflush yes
    flush-threshold-dumped 100
    flush-threshold-scheduled 100
    runtapes 4
    dump-selection ALL FULL
}

storage "myconf"
maxdumpsize -1
amrecover_changer "robot"

8<

my goal: I have to create a set of archive tapes for that customer, every 3 months or so. With the above setup I now ran "amdump --no-taper myconf", which collected all dumps on the holding disk (I did an "amadmin myconf force *" beforehand to force FULLs now). As I understand it, I could now do a plain amflush, which should (a) write to the tapes of tapepool "myconf" and (b) leave the holding-disk tarballs where they are, right?
(I am not yet sure about that "400" above; I want to keep data for all 4 tapes in the holding disk now, and may reduce that to 100 for normal daily runs without "--no-taper" or so) After or before I additionally can do something like: amvault myconf --dest-storage --latest-fulls archive correct? I think so, but I'm not 100% certain. An additional hurdle is that the customer wants to use WORM tapes for archive, so I should get that right at the first run to not waste any tapes. Perhaps create a temporary virtual tape library for testing that the archiving schedule works as expected? This is what I generally do when testing changes at work (although I usually do it using a copy of the main configuration, so that I don't confuse the planner for the production backups with half a dozen runs in one day).
Re: approaches to Amanda vaulting?
On 2017-10-19 11:06, Jean-Louis Martineau wrote: On 19/10/17 08:48 AM, Austin S. Hemmelgarn wrote:
> On 2017-10-18 15:45, Stefan G. Weichinger wrote:
>> Am 2017-10-16 um 20:47 schrieb Austin S. Hemmelgarn:
>>
>>> While it's not official documentation, I've got a working
>>> configuration with Amanda 3.5.0 on my personal systems, using
>>> locally accessible storage for primary backups, and S3 for vaulting
>>> (though I vault everything; the local storage is for getting old
>>> files back, S3 is for disaster recovery). I've put a copy of the
>>> relevant config fragment at the end of this reply, with various
>>> private data replaced, and some bits that aren't really relevant
>>> (like labeling options) elided.
>>
>> A quick thank you at this point:
>>
>> thanks for providing this config plus explanations, I will try to set up
>> a similar config soon and take your example as a template.
>>
>> And maybe come back with some additional questions ;-)
>>
>> for example: what do you run as cronjobs, what do you do via manual
>> commands? amdump in cron, amvault now and then?
> Well, there are two options for how to handle it.
>
> Where I work, we use a very similar configuration to what I posted,
> and run amdump and amvault independently, both through cron (though we
> only vault full backups to S3, since we have a reasonably good level of
> trust in the reliability of our local storage). This gives very good
> control of exactly what and exactly when things get vaulted, and
> allows for scheduling vaulting separately from dumps (we prefer to
> only copy things out to S3 once a month and need to make sure the
> network isn't bogged down with backups during work hours, so this is a
> big plus for us).
The problem with the amvault command is that it does only what the command line specifies, which can be difficult to get right. If amvault fails, it's hard to find the correct arguments to vault what was not yet vaulted.
With wrong arguments, some dumps might never be vaulted, or some dumps might be vaulted multiple times (on different amvault invocations). To be entirely honest, I wouldn't exactly call `--latest-fulls`, `--fulls-only`, or `--incrs-only` hard to get right. It's only really tricky if you want to vault only subsets of the config. Add to that that it's pretty easy to see what got vaulted if you have e-mail set up right, and it really isn't too bad for most use cases. Since you want to vault all fulls, I would set 'vault' in the local storage and set 'dump-selection' in the cloud storage, but not set 'vault-storage'. That way the vaults are scheduled but are not executed, because vault-storage is not set. Amanda knows they must be vaulted. Every month, you can run: amdump CONF BADHOST -ovault-storage="cloud" to do the vaulting. We've actually been discussing migrating things to operate like I have them set up on my home systems (albeit only vaulting fulls), as the 'once a month' part of vaulting is largely a hold-over from our old (pre-Amanda) backup system, which did fulls on the first of the month and archived them off-site the day afterwards.
> On my home systems, I also use a similar config, but I instead have a
> 'vault' option specified in the 'local' storage block that points to
> the 'cloud' and says to vault immediately after dump generation (so the
> line is 'vault cloud 0'). With this setup, amdump will run the
> vaulting operation itself after finishing everything else for the dump
> (and you actually don't need the 'vault-storage' line at the end, I
> think), and you either end up vaulting everything, or have to limit
> things through the config with a 'dump-selection' line in your 'cloud'
> storage definition.
vault-storage is required, otherwise the vaults are not executed. Good to know; that could probably be better explained in the documentation.
Re: approaches to Amanda vaulting?
On 2017-10-18 15:45, Stefan G. Weichinger wrote:

On 2017-10-16 at 20:47, Austin S. Hemmelgarn wrote:

While it's not official documentation, I've got a working configuration with Amanda 3.5.0 on my personal systems, using locally accessible storage for primary backups, and S3 for vaulting (though I vault everything, the local storage is for getting old files back, S3 is for disaster recovery). I've put a copy of the relevant config fragment at the end of this reply, with various private data replaced, and some bits that aren't really relevant (like labeling options) elided.

A quick thank you at this point:

thanks for providing this config plus explanations, I will try to set up a similar config soon and take your example as a template.

And maybe come back with some additional questions ;-)

for example: what do you run as cronjobs, what do you do via manual commands? amdump in cron, amvault now and then?

Well, there's two options for how to handle it.

Where I work, we use a very similar configuration to what I posted, and run amdump and amvault independently, both through cron (though we only vault full backups to S3 since we have a reasonably good level of trust in the reliability of our local storage). This gives very good control of exactly what and exactly when things get vaulted, and allows for scheduling vaulting separately from dumps (we prefer to only copy things out to S3 once a month and need to make sure the network isn't bogged down with backups during work hours, so this is a big plus for us).

On my home systems, I also use a similar config, but I instead have a 'vault' option specified in the 'local' storage block that points to the 'cloud' and says to vault immediately after dump generation (so the line is 'vault cloud 0').
With this setup, amdump will run the vaulting operation itself after finishing everything else for the dump (and you actually don't need the 'vault-storage' line at the end, I think), and you either end up vaulting everything, or have to limit things through the config with a 'dump-selection' line in your 'cloud' storage definition.
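The "amdump and amvault independently, both through cron" setup described above might be scheduled something like this. The config name "daily", the paths, the user, and the times are all assumptions, and the flags for selecting a destination vary between amvault versions, so check amvault(8) before copying this.

8<---
# /etc/cron.d/amanda (hypothetical)

# nightly dumps at 01:00
0 1 * * *    amandabackup  /usr/sbin/amdump daily

# vault the most recent full dumps on the 1st of each month;
# destination selection depends on your amvault version, see amvault(8)
0 6 1 * *    amandabackup  /usr/sbin/amvault --latest-fulls daily
8<---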
Re: What are the correct permissions for lib binaries for amanda 3.5
On 2017-10-16 14:58, Jon LaBadie wrote:

On Mon, Oct 16, 2017 at 02:05:05PM -0400, Jean-Louis Martineau wrote:

On 16/10/17 01:48 PM, Jon LaBadie wrote:

On Mon, Oct 16, 2017 at 08:12:43AM -0400, Jean-Louis Martineau wrote:

On 14/10/17 12:12 PM, Jose M Calhariz wrote:

On Sat, Oct 14, 2017 at 11:36:09AM -0400, Jean-Louis Martineau wrote:

On 14/10/17 11:14 AM, Jose M Calhariz wrote:

-rwsr-xr-- 1 root backup 10232 Oct 13 17:23 ambind

ambind must not be readable by all:

-rwsr-x--- 1 root backup 10232 Oct 13 17:23 ambind

Thank you for the quick reply. May I ask why "ambind must not be readable by all"?

All suid programs in amanda are always installed like this.

Why are all amanda suid programs installed this way?

It's from before I was born — maybe not, but before I started to work on the amanda software. It's a kind of security by hiding: it's harder to find a vulnerability in a suid binary if you can't read it.

I guessed it was security by obscurity.

It is, but it's common-practice security by obscurity dating back almost to SVR4.

It makes sense when you build it yourself, but not when doing a package where everyone can read the files in the package. For the same reason, I felt that would be "false" security. The group probably does not need the 'r' bit either.

Do you think amcheck should not check whether the suid binaries are readable by all?

My gut reaction is that such a check is superfluous. But I'm not a security expert. Do we have any security specialists (or others) on the list who would care to comment?

I won't claim to be a security expert, but I've been a sysadmin for more than a decade and can tell you two things based on my own experience:

1. Amanda is the only software I've ever encountered that does this kind of check, or more accurately, it's the only software I've ever encountered where this type of check is a fatal error.
Some other software will ignore files if their ownership is wrong, but it's treated as a warning, and it's only configuration files (stuff like ~/.ssh/authorized_keys for example).

2. The checks are a serious pain in the arse, mostly because the error messages are so vague (OK, so file XYZ has the wrong permissions; does that mean the directory it's in has the wrong permissions, or the file itself, and which permissions are wrong?). This particular check isn't as bad in that respect as, for example, the ones checking /etc/amanda-security.conf, but it's still a pain to deal with.

Aside from that though, it's a case where the benefit to security depends on things that just aren't true for most systems amanda is likely to run on, namely that an attacker is:

1. Unable to determine what type of system you're running on. (This is a patently false assumption on any publicly available distro, as well as most paid ones like OEL, RHEL, and SLES.)

2. Unable to access the packages directly.

In most cases, both are false. There are a few odd cases like source-based distros (Gentoo for example) where the package gets built locally, but even then the builds are pretty reproducible, and the code for Amanda itself is trivially available for review through other sources.

In a way, it's kind of like making the contents of /boot inaccessible to regular users, but not preventing `uname -v` and `uname -r` from being executed by them. It makes things a bit more complicated for attackers, but in a rather trivial way that doesn't provide anything but a false sense of security.

Does amcheck do any checks for amanda programs that are [sg]uid that should not be?

I'm not sure, though it does check ownership on many files, and I think it checks that things that are supposed to be suid or sgid are (I'm pretty sure it complains if amgtar or amstar aren't suid root).
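The fix Jean-Louis shows above (-rwsr-x--- instead of -rwsr-xr--) corresponds to numeric mode 4750. As a concrete illustration (the install path is an assumption; ambind's real location depends on your build's install prefix):

```shell
# On the real binary you would run, as root (path is hypothetical):
#   chown root:backup /usr/libexec/amanda/ambind
#   chmod 4750 /usr/libexec/amanda/ambind     # -rwsr-x---

# Demonstration of the mode bits on a scratch file:
f=$(mktemp)
chmod 4750 "$f"       # 4 = setuid bit, 750 = rwxr-x--- (no world read)
stat -c '%a' "$f"     # prints: 4750
rm -f "$f"
```

The setuid bit (4) is what makes the program run as root; dropping the final world-read bit is the "security by hiding" being discussed.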
Re: approaches to Amanda vaulting?
On 2017-10-16 13:22, Stefan G. Weichinger wrote:

> On 2017-10-16 at 15:20, Jean-Louis Martineau wrote:
>> Amanda 3.5 can do everything you want only by running the amdump command.
>>
>> Using a holding disk:
>>
>> * You configure two storages
>> * All dumps go to the holding disk
>> * All dumps are copied to each storage, not necessarily at the same time or in the same run.
>> * The dumps stay in holding until they are copied to both storages
>> * You can tell amanda that everything must go to both storages, or only some DLEs / fulls / incrementals
>
> So it is possible to set up a mix of "normal" daily backups with incrementals/fulls and "archive"/vault backups with only the full backups of a specific day?
>
> I have requests to do so for a customer; until now we used amanda-3.3.9 and 2 configs sharing most of the config and disklist ...
>
> Nathan, the OP of this thread, and others (including me) would like to see actual examples of configuration, a howto or something.
>
> The man page https://wiki.zmanda.com/man/amvault.8.html is a bit minimal ...
>
> Is there anything additional to that manpage and maybe:
>
> http://wiki.zmanda.com/index.php/How_To:Copy_Data_from_Volume_to_Volume
>
> ?

While it's not official documentation, I've got a working configuration with Amanda 3.5.0 on my personal systems, using locally accessible storage for primary backups, and S3 for vaulting (though I vault everything, the local storage is for getting old files back, S3 is for disaster recovery). I've put a copy of the relevant config fragment at the end of this reply, with various private data replaced, and some bits that aren't really relevant (like labeling options) elided. For this to work reliably, you need to define a holding disk (although it can be on the same storage as the local vtape library).
I personally start flushing from the holding disk the moment any dump completes, as all the data fits on one tape and the S3 upload takes longer than creating the backups in the first place, but it should work just fine if you buffer things on the holding disk instead. The given S3 configuration assumes you have already created the destination bucket (I pre-create them since I do lifecycle stuff and cross-region replication, both of which are easier to set up if you create the bucket by hand). I also use a dedicated IAM user for the S3 side of things for both security and accounting reasons, but that shouldn't impact things. Additionally, I've found that the S3 uploads work much more reliably if you set a reasonable part size and enable part caching; 1 GB seems to give a good balance between performance and reliability.

8<---
define tapetype vtape {
    length 16 GB
    part-size 1 GB
    part-cache-type memory
}

define changer local-vtl {
    tapedev "chg-disk:/path/to/local/vtapes"
}

define changer aws {
    tapedev "chg-multi:s3:example-bucket/slot{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}"
    device-property "S3_SSL" "YES"
    device-property "S3_ACCESS_KEY" "IAM_ACCESS_KEY"
    device-property "S3_SECRET_KEY" "IAM_SECRET_KEY"
    device-property "S3_MULTI_PART_UPLOAD" "YES"
    device-property "CREATE_BUCKET" "NO"
    device-property "S3_BUCKET_LOCATION" "us-east-1"
    device-property "STORAGE_API" "AWS4"
}

define storage local {
    tapepool "local"
    tapetype "vtape"
    tpchanger "local-vtl"
}

define storage cloud {
    tapepool "s3"
    tapetype "vtape"
    tpchanger "aws"
}

storage "local"
vault-storage "cloud"
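The fragment above relies on a holding disk but doesn't show one. A minimal definition might look like this (the directory, size, and chunk size are assumptions; see the holdingdisk section of amanda.conf(5)):

8<---
holdingdisk hd1 {
    comment "buffer for dumps before flushing/vaulting"
    directory "/var/amanda/holding"
    use 50 GB
    chunksize 1 GB
}
8<---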