Re: amrecover failing
Trever L. Adams wrote at 05:49 -0600 on Jul 19, 2014: > Hello everyone, > > So, I am not quite sure what is going on. When I try to do an amrecover, > I get the following: > > Load tape normal149 now > Continue [?/Y/n/s/d]? Y > Got no header and data from server, check in amidxtaped.*.debug and > amandad.*.debug files on server > > The logs hold such things as: > > Sat Jul 19 05:45:39 2014: thd-0x1a16a00: amidxtaped: warning: Can't exec > "-eo": No such file or directory at > /usr/lib64/perl5/vendor_perl/Amanda/Process.pm line 176. > > Sat Jul 19 05:45:39 2014: thd-0x1a16a00: amidxtaped: critical (fatal): > -eo pid,ppid,command: No such file or directory at > /usr/lib64/perl5/vendor_perl/Amanda/Process.pm line 176. > > amidxtaped: -eo pid,ppid,command: No such file or directory at > /usr/lib64/perl5/vendor_perl/Amanda/Process.pm line 176. > > /lib64/libamanda-3.3.3.so(+0x2b727)[0x7f3e7d9d0727] > /lib64/libglib-2.0.so.0(g_logv+0x209)[0x7f3e7d6c9429] > /lib64/libglib-2.0.so.0(g_log+0x8f)[0x7f3e7d6c963f] > /usr/lib64/perl5/vendor_perl/auto/Amanda/MainLoop/libMainLoop.so(+0x4925)[0x7f3e7c5ad925] > /lib64/libglib-2.0.so.0(+0x3f89449e43)[0x7f3e7d6c2e43] > /lib64/libglib-2.0.so.0(g_main_context_dispatch+0x166)[0x7f3e7d6c22a6] > /lib64/libglib-2.0.so.0(+0x3f89449628)[0x7f3e7d6c2628] > /lib64/libglib-2.0.so.0(g_main_loop_run+0x6a)[0x7f3e7d6c2a3a] > /usr/lib64/perl5/vendor_perl/auto/Amanda/MainLoop/libMainLoop.so(_wrap_run_c+0x50)[0x7f3e7c5adca0] > /lib64/libperl.so.5.18(Perl_pp_entersub+0x5c6)[0x3008ec33e6] > /lib64/libperl.so.5.18(Perl_runops_standard+0x2e)[0x3008ebb81e] > /lib64/libperl.so.5.18(perl_run+0x300)[0x3008e52d40] > /usr/bin/perl[0x400d29] > /lib64/libc.so.6(__libc_start_main+0xf5)[0x3f86c21d65] > /usr/bin/perl[0x400d61] > > > This is also not 100% consistent. I got it to do a restore once. Doing > the same steps with the same disk and date will not restore, it is all > the above. > > Any ideas? 
Looks like there was some build problem, perhaps, and $PS is not defined in Constants.pm. Check with:

  locate Constants.pm | grep Amanda | xargs grep -i ps

On a healthy install the grep output includes:

  =head1 SYNOPSIS
  $PS = "/bin/ps";
  $PS_ARGUMENT = "-eo pid,ppid,command";
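To make that check concrete, here is a minimal sketch. The file path and contents below are stand-ins, not your real Constants.pm; substitute whatever `locate` finds on your system:

```shell
# Stand-in Constants.pm holding the two variables that Process.pm
# relies on; on a broken build these lines are missing or empty,
# and amidxtaped ends up trying to exec "-eo" as if it were a program.
cat > /tmp/Constants.pm <<'EOF'
$PS = "/bin/ps";
$PS_ARGUMENT = "-eo pid,ppid,command";
EOF

# Both definitions present -> grep counts 2 matching lines.
grep -c '^\$PS' /tmp/Constants.pm
```

If the count is less than 2 on the real file, the build didn't detect ps(1) and the package needs rebuilding.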
Re: amrecover works, normal amanda backup, logging connection refused
Gene Heskett gheskett-at-wdtv.com |amusersj-ml0| wrote at 15:07 -0400 on Jul 18, 2014: > On Friday 18 July 2014 14:22:48 John Hein did opine > And Gene did reply: > > Gene Heskett wrote at 12:25 -0400 on Jul 18, 2014: > > > 14/7/18@12:09:37: ERROR: 3859 {activate_normal} bind failed (Address > > > already in use (errno = 98)). service = amanda > > > > More than one xinetd or inetd running? > > > > Maybe some basic background is in order. The basic operation of > > *inetd is pretty simple, and if you understand the basics, you can > > really solve many of the common issues yourself. > > > > *inetd runs forever "listening" on the sockets you tell it to > > listen on (as configured by the xinetd or inetd config files). > > When requests (any activity) on that socket come in, it tries > > to run the service that is specified in its configuration. > > > > If something else "owns" that socket, *inetd can't do its job > > (i.e., can't start the service corresponds to that socket). > > > > If not, then *inetd will spawn off the configured service (amandad > > in amanda's case). > > > > Technically, you don't need *inetd. You can kick off amandad to run > > on the client some other way (e.g., daemontools, ssh). But the server > > expects something to be listening on the client when it comes time to > > do the dump. > > > > As others have mentioned, you have to configure things for the right > > type of socket - the configuration of the amanda server (primarily > > in amanda.conf / disklist) and client (typically inetd config and > > amanda-client.conf) should match (see amanda-auth(7) and > > amanda-client.conf(5)). > > > > Here's some other good info so you can maybe help yourself and > > understand better how things work: > > > > http://wiki.zmanda.com/index.php/Quick_start_%28old%29 > > I just discovered that the failing box did NOT have an /etc/amanda- > client.conf, so I copied the one from examples and edited it. 
> But the working machine doesn't have one either, so I nuked it.
> amcheck didn't care.

You got that out of my email? What about the most important bits: two inetds running, and the bind failure? And the hint to use the background info to try digging on your own a little.

You're doing lots of things and it seems you don't know why - just guessing. That's never a good recipe.

Your xinetd got a bind failure. That has nothing to do with amanda. Fix that first.
Re: amrecover works, normal amanda backup, logging connection refused
Gene Heskett wrote at 12:25 -0400 on Jul 18, 2014:
 > 14/7/18@12:09:37: ERROR: 3859 {activate_normal} bind failed (Address
 > already in use (errno = 98)). service = amanda

More than one xinetd or inetd running?

Maybe some basic background is in order. The basic operation of *inetd is pretty simple, and if you understand the basics, you can really solve many of the common issues yourself.

*inetd runs forever "listening" on the sockets you tell it to listen on (as configured by the xinetd or inetd config files). When requests (any activity) on that socket come in, it tries to run the service that is specified in its configuration.

If something else "owns" that socket, *inetd can't do its job (i.e., can't start the service that corresponds to that socket). If not, then *inetd will spawn off the configured service (amandad in amanda's case).

Technically, you don't need *inetd. You can kick off amandad to run on the client some other way (e.g., daemontools, ssh). But the server expects something to be listening on the client when it comes time to do the dump.

As others have mentioned, you have to configure things for the right type of socket - the configuration of the amanda server (primarily in amanda.conf / disklist) and client (typically inetd config and amanda-client.conf) should match (see amanda-auth(7) and amanda-client.conf(5)).

Here's some other good info so you can maybe help yourself and understand better how things work:

http://wiki.zmanda.com/index.php/Quick_start_%28old%29
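The "Address already in use" symptom is easy to hunt down. On the real client you would run something like `netstat -ulnp | grep 10080` (or `lsof -i :10080`) as root and see who holds amanda's port. As a hedged sketch, here is the same triage run against canned netstat-style output; the PIDs and the duplicate listener are made up:

```shell
# Fabricated netstat output showing two daemons bound to amanda's
# default UDP port (10080) - exactly the condition that makes a second
# bind() fail with EADDRINUSE (errno 98).
cat > /tmp/netstat.sample <<'EOF'
udp  0  0 0.0.0.0:10080  0.0.0.0:*  1234/xinetd
udp  0  0 0.0.0.0:10080  0.0.0.0:*  5678/inetd
EOF

# More than one listener on the port means two supervisors are
# fighting over it; kill or deconfigure one of them.
grep -c ':10080 ' /tmp/netstat.sample
```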
Re: amrecover works, normal amanda backup, logging connection refused
Gene Heskett wrote at 10:26 -0400 on Jul 18, 2014: > Trying to figure out why amanda can't backup this machine, one of the > things I noticed in /etc, is that on the shop box, which works, there is > not an /etc/xinetd.d but it has an old-xinetd.d with a single stanza > amanda file in it. > > An ls -lau shows that file, /etc/old-xinetd.d/amanda was apparently > accessed a few minutes ago by my amcheck from the server. > > However, on the new install on the machine that is failing to allow the > connection, there is an /etc/xinet.d, with an amanda file in it with an > old last access date/time, was not 'touched' when I ran the amcheck. Its > last access date/time is I believe, the date/time of the installation > itself. > > That amanda-common is 2.6.1p1 IIRC. > > amcheck says: > WARNING: lathe: selfcheck request failed: Connection refused Try running xinetd -d (then amcheck) to see if (or why not) xinetd is running amandad.
Re: pre/post scripting
Stefan G. Weichinger sgw-at-amanda.org |amusersj-ml0| wrote at 16:38 +0200 on Jul 9, 2014: > Am 09.07.2014 16:17, schrieb Stefan G. Weichinger: > > > > Would anyone mind sharing some real world scripts he uses with amanda? > > > > I think of stopping/starting DBs or something like that. > > > > I would appreciate some good templates ;-) > > I started playing with the email examples from the docs but they fail > straight away: > > > define script-tool sc-email { > comment "email me before this DLE is backed up" > plugin "script-email" > execute-on pre-dle-backup > execute-where server > property "mailto" "l...@xunil.at" > } > > > > gives me > > Jul 09 16:37:11 amanda Script_email[20663]: Use of uninitialized value > in concatenation (.) or string at > /usr/libexec/amanda/application/script-email line 181. > Jul 09 16:37:11 amanda Script_email[20663]: Use of uninitialized value > $args[2] in join or string at > /usr/libexec/amanda/application/script-email line 182. > Jul 09 16:37:11 amanda Script_email[20664]: Use of uninitialized value > $args[2] in open at /usr/libexec/amanda/application/script-email line 185. > Jul 09 16:37:11 amanda Script_email[20663]: Use of uninitialized value > in concatenation (.) or string at > /usr/libexec/amanda/application/script-email line 186. > > > Does that work for anyone else? > Does it need anymore properties set? > > Thanks, Stefan I'm not sure about the exact cause of the errors you're seeing, but it looks like the mailto check will not accept '@' or '.' (or dashes or underscores or numbers). 
To address that, maybe try this patch:

--- libexec/amanda/application/script-email.orig  2009-11-06 10:27:46.0 -0700
+++ libexec/amanda/application/script-email       2014-07-09 10:02:06.0 -0600
@@ -154,7 +154,7 @@
     my $dest;
     if ($self->{mailto}) {
         my $destcheck = join ',', @{$self->{mailto}};
-        $destcheck =~ /^([a-zA-Z,]*)$/;
+        $destcheck =~ /^([-_[:alnum:],@.]*)$/;
         $dest = $1;
     } else {
         $dest = "root";

Or don't try to do the mailer's job and just skip the whole destcheck part - let the mailer catch any errors:

--- libexec/amanda/application/script-email.orig  2009-11-06 10:27:46.0 -0700
+++ libexec/amanda/application/script-email       2014-07-09 11:02:18.0 -0600
@@ -153,9 +153,7 @@
     my($function) = @_;
     my $dest;
     if ($self->{mailto}) {
-        my $destcheck = join ',', @{$self->{mailto}};
-        $destcheck =~ /^([a-zA-Z,]*)$/;
-        $dest = $1;
+        $dest = join ',', @{$self->{mailto}};
     } else {
         $dest = "root";
     }
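A quick way to see the difference between the stock character class and the patched one (the address below is a made-up placeholder, not the one from the original post):

```shell
addr='backup-admin_1@mail.example.org'   # hypothetical address

# Stock check: only letters and commas allowed, so any real-world
# address (with '@', '.', digits, dashes...) fails the match.
echo "$addr" | grep -Eq '^[a-zA-Z,]*$' \
    && echo stock-accepts || echo stock-rejects

# Patched check: also allows dash, underscore, digits, '@' and '.'.
echo "$addr" | grep -Eq '^[-_[:alnum:],@.]*$' \
    && echo patched-accepts || echo patched-rejects
```

With the stock regex the captured `$1` ends up undefined or mangled, which is what produces the "uninitialized value" warnings at lines 181-186.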
Re: A handshake from amanda?
Jon LaBadie wrote at 09:43 -0400 on Jul 1, 2014: > On Tue, Jul 01, 2014 at 09:19:19AM -0400, Gene Heskett wrote: > > Greetings all; > > > > Pursuant to a conversation on the dovecot list about the relatively long > > times involved in rebuilding the dovecot.index file when it gets out of > > sync. > > > > It strikes me that if the backup program could be co-erced into sending a > > signal when it starts to backup a named directory, a signal that holds it > > until the processing of incoming mail has been stopped and the ack signal > > that it has been stopped sent back to amanda, effectively freezing the > > contents of what would normally be an active directory, so that the email > > corpus AND all its indexes would then be in sync when the backup is done. > > > > This would make any recovery efforts later into a considerable smoother > > action. > > > > I can see where such a feature could also be useful for a database of most > > any sort, mail being only an example. > > > > How feasible would it be to add this capability to amanda? > > > I suspect a difficult problem would be how to get the multiple programs > that modify the named directory to honor flag. There already is support for performing operations before and after the dump (among other things): http://wiki.zmanda.com/index.php/Script_API For older amanda versions that don't have the script API, the classic method (which you can still use with newer amanda) is to configure your amanda client to use a wrapper script instead of gtar or dump. Then your wrapper script can determine if the DLE is one for which you want to run some command to suspend normal operations (e.g., quiesce a database or mail server) during the backup.
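The wrapper idea in the last paragraph can be sketched like this. Everything here is an assumption for illustration: the gtar path, the DLE pattern, and the quiesce helper (`/usr/local/sbin/mailctl` is hypothetical) - swap in whatever actually stops and starts your mail or database service:

```shell
# Wrapper that the amanda client config points at instead of gtar.
# GTAR and QUIESCE are overridable so the sketch can be exercised
# without a real tar binary or service manager present.
GTAR=${GTAR:-/bin/gtar}
QUIESCE=${QUIESCE:-/usr/local/sbin/mailctl}   # hypothetical helper

wrapper() {
    case "$*" in
    *--directory=/var/mail*)        # the DLE whose contents must be frozen
        $QUIESCE stop               # suspend deliveries / index updates
        $GTAR "$@"; status=$?
        $QUIESCE start              # resume as soon as the dump is done
        return $status
        ;;
    *)                              # every other DLE: plain pass-through
        $GTAR "$@"
        ;;
    esac
}
```

The important detail is capturing gtar's exit status before restarting the service, so amanda still sees dump failures.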
Re: conflicting types for 'g_queue_free_full'
Subscriptions wrote at 02:21 + on Jun 16, 2012: > As there is no Amanda 3.3.1 binary build for 64 bit Ubuntu 12.04, I've > downloaded the latest stable version, but when I run the make I get the > following error > > > libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../config -I../gnulib > -fno-strict-aliasing -D_GNU_SOURCE -pthread -I/usr/include/glib-2.0 > -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -Wall -Wextra -Wparentheses > -Wdeclaration-after-statement -Wmissing-prototypes -Wstrict-prototypes > -Wmissing-declarations -Wformat -Wformat-security -Wsign-compare > -Wfloat-equal -Wold-style-definition -Wno-strict-aliasing > -Wno-unknown-pragmas -g -O2 -fno-strict-aliasing -MT amxml.lo -MD -MP > -MF .deps/amxml.Tpo -c amxml.c -fPIC -DPIC -o .libs/amxml.o > In file included from util.h:39:0, > from amxml.c:34: > glib-util.h:75:6: error: conflicting types for 'g_queue_free_full' > /usr/include/glib-2.0/glib/gqueue.h:76:10: note: previous declaration of > 'g_queue_free_full' was here > make[3]: *** [amxml.lo] Error 1 > > I ran ./configure with the defaults prior to make Just remove g_queue_free_full from common-src/glib-util.[ch] ... http://amanda.svn.sourceforge.net/viewvc/amanda?view=revision&revision=4592 I don't think the version in amanda was ever(?) used.
RE: aclocal fails since AMANDA_INIT_VERSION call in configure.in
I see no reference to m4_divert_diversion in AMANDA_INIT_VERSION ... AC_DEFUN([AMANDA_INIT_VERSION], [ m4_syscmd([test -f FULL_VERSION]) m4_if(m4_sysval, [0], [ m4_define([AMANDA_F_VERSION], m4_chomp(m4_include([FULL_VERSION]))) ], [ m4_define([AMANDA_F_VERSION], m4_chomp(m4_include([VERSION]))) ]) VERSION=AMANDA_F_VERSION ]) And those all look like pretty standard macros (defined by the autoconf package) that are referenced in AMANDA_INIT_VERSION. Ah... from the autoconf info pages... . . Unfortunately older versions of Automake (e.g., Automake 1.4) did not quote the names of these macros. Therefore, when `m4' finds something like `AC_DEFUN(AM_TYPE_PTRDIFF_T, ...)' in `aclocal.m4', `AM_TYPE_PTRDIFF_T' is expanded, replaced with its Autoconf definition. Fortunately Autoconf catches pre-`AC_INIT' expansions, and complains, in its own words: $ cat configure.ac AC_INIT([Example], [1.0], [bug-exam...@example.org]) AM_TYPE_PTRDIFF_T $ aclocal-1.4 $ autoconf aclocal.m4:17: error: m4_defn: undefined macro: _m4_divert_diversion aclocal.m4:17: the top level autom4te: m4 failed with exit status: 1 $ Modern versions of Automake no longer define most of these macros, and properly quote the names of the remaining macros. If you must use an old Automake, do not depend upon macros from Automake as it is simply not its job to provide macros (but the one it requires itself): . . Investigate your automake installation. Kervin L. Pierre wrote at 13:19 + on Jun 11, 2012: > Hello Jean-Louis, > > The error is from autogen. I believe the error is at least related > to the new AMANDA_INIT_VERSION macro in configure.in. If I remove > that first line in configure.in the then the error goes away and > Amanda builds. 
> > # ./autogen > See DEVELOPING for instructions on updating: > * gettext macros > * gnulib > * libtool files > ..creating file lists > ..aclocal > configure.in:1: error: m4_defn: undefined macro: _m4_divert_diversion > configure.in:1: the top level > autom4te: /usr/bin/m4 failed with exit status: 1 > aclocal: autom4te failed with exit status: 1 > aclocal failed > > # aclocal --version > aclocal (GNU automake) 1.11.1 > > Best regards, > Kervin > > > Adevsoft Inc > Business Software Development > http://adevsoft.com/ > > > > -Original Message- > > From: Jean-Louis Martineau [mailto:martin...@zmanda.com] > > Sent: Monday, June 11, 2012 8:24 AM > > To: Kervin L. Pierre > > Cc: amanda-users@amanda.org > > Subject: Re: aclocal fails since AMANDA_INIT_VERSION call in > > configure.in > > > > On 06/09/2012 12:08 PM, Kervin L. Pierre wrote: > > > I'm building on a stock Amazon Linux server with all available > > patches. > > > > > > But it seems that since the new AMANDA_INIT_VERSION macro call was > > added a few weeks ago to configure.in, I haven't been able to run > > autogen.sh without error. > > > > > > Removing the AMANDA_INIT_VERSION call before AC_INIT seems to be the > > only work around I've found. > > > > > > Best regards, > > > Kervin > > > > > > > Kevin, > > > > What error do you get? > > > > There is no autogen.sh in amanda, the program is autogen > > > > Jean-Louis > > >
Re: Need help with new architecture for NAS/NFS setup
Brendon Martino wrote at 11:53 -0400 on Nov 3, 2011: > I'm running Amanda version 2.6.0p2-14 on Fedora 10. My current > architecture for Amanda is as follows: . . > How do I > implement my architecture to only keep about a week (or even a day) of > backups in the holding disk (locally on the system) but use the nfs > storage space for archiving the rest of the old backups? The idea is > that we keep 30 to 60 days worth of old backups on the NAS, but only the > last day or few days locally on the backup server. > > How do I do that? Is it possible? What would be the general idea/layout? > What directives would I need to change? Would I need to use multiple > DailySets?? I'm totally stumped. Any advice would be greatly appreciated. In addition to other suggestions on this thread, there's also amvault. But not with amanda-2.6.
Re: possible issues with upgrade
Jon LaBadie wrote at 18:49 -0400 on Jul 15, 2011: > On Fri, Jul 15, 2011 at 05:08:24PM -0400, Chris Hoogendyk wrote: > > Thanks, Brian. So, basically, the upgrade in general is pretty > > straightforward. > > > > The key point, though, is that I am promoting an important Amanda > > client to be the new Amanda server. During that process, there may > > be a period of time when the old server (running 2.5.1p3) will be > > talking to this important client that has been updated to 3.3.0, but > > not yet taken over as server. I don't want to miss backing it up > > during that time. > > > > So, briefly, will a 2.5.1p3 server have trouble with a 3.3.0 client > > (just until I get things completely swapped around)? > > I could fall back on the claim "amanda is generally compatible with > old releases", but that is not a lot of comfort without specific > experience. You may not need to run that gamut if you can briefly > run your backups with two servers. > > Upgrade your new server (NS), comment it out of the disklist of the > old server (OS). Let NS back itself up as a client. > > Upgrade one or a few of the other OS clients and make them clients > of NS. Eventually all OS will be backing up is itself and you > will have lots of experience upgrading clients and adding to NS. > That is when you finally do OS and again have a single server. I'll add that I usually configure amanda with --prefix=/some/place/amanda- so I can have multiple versions of amanda around at the same time (and accessible via shared NFS). That way it's easy to try different versions and still be able to go back (and forth).
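A hedged sketch of that versioned-prefix layout (all paths and version numbers are illustrative). Each real build would be configured with `./configure --prefix=/some/place/amanda-<version>` and installed; switching versions is then just repointing one symlink:

```shell
# Simulate two installed trees (stand-ins for real make-install output).
mkdir -p /tmp/amroot/amanda-2.5.1p3/sbin /tmp/amroot/amanda-3.3.0/sbin

# "Current" amanda is whatever the symlink points at; going back and
# forth between versions is a single ln invocation.
ln -sfn /tmp/amroot/amanda-3.3.0 /tmp/amroot/amanda
readlink /tmp/amroot/amanda

ln -sfn /tmp/amroot/amanda-2.5.1p3 /tmp/amroot/amanda   # roll back
readlink /tmp/amroot/amanda
```

Put the symlinked bin/sbin directories on PATH (or export the tree over NFS) and every host sees the version flip at once.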
Re: Setting Up SSH transport. SOLVED
Charles Curley wrote at 12:46 -0600 on Jun 3, 2011:
 > Problem solved. As often happens, user error.
 >
 > When you log in over SSH the first time, you get the usual "The
 > authenticity of host 'foo' can't be established." message. To avoid
 > that, you log in manually, accept the fingerprint entry in
 > known_hosts. After that, no prompts from SSH on login. Amanda doesn't
 > handle the first login well: it silently sits there. To avoid that,
 > you do the first login by hand, as noted in the HOWTO.

Did you try using StrictHostKeyChecking=no (at least when initially adding a new host)? When you update the wiki, maybe that would be a useful hint as well.
Re: strategies for Mac desktops
As you surmised, these are mostly gtar questions. If your DLE is not a filesystem, then the other dump-ish choices are out. But gtar (and star as well as the various flavors of dump) does _try_ to save space when it encounters hard links. For instance (gtar 1.26):

  mkdir xx
  dd if=/dev/zero of=xx/z count=1000
  gtar cf a.tar xx
  ln xx/z xx/l
  gtar cf b.tar xx

a.tar & b.tar should be the same or similar size. However, for an incremental dump on a DLE which has a new hard link, I don't think gtar will get you the savings you want (if I'm reading your reasoning correctly). It's only when the original is in the same tar image that you get the savings (i.e., not when the original is in a different tarball).

  mkdir xx
  dd if=/dev/zero of=xx/z count=1000
  /bin/rm l0
  gtar cf 0.tar --listed-incremental=l0 xx
  ln xx/z xx/l
  cp -p l0 l1
  gtar cf 1.tar --listed-incremental=l1 xx
  touch xx/foo
  cp -p l1 l2
  gtar cf 2.tar --listed-incremental=l2 xx

In theory, tar's incremental mode might be able to realize that 'l' points to 'z' and 'z' hasn't changed, so just archive the hard link "meta-data" (i.e., not the contents). But I don't think gtar rolls like that - I'm not sure where/if the aforementioned theory may have holes, but it seems it's not implemented that way at this time.

  tar tvf 1.tar
  drwxr-xr-x jhein/jhein        7 2011-06-02 14:41 xx/
  -rw-r--r-- jhein/jhein   512000 2011-06-02 14:41 xx/l
  hrw-r--r-- jhein/jhein        0 2011-06-02 14:41 xx/z link to xx/l

In this simple test 1.tar is just as big as 0.tar. 2.tar is smaller, of course. And if you just touch xx/l (or xx/z), then a 3.tar will be "big" again. Testing for dumps (ufs, zfs) is left as an exercise for the reader ;). If you find out, let us know. Doing a quick test with star seems to show it behaves the same as gtar.

Chris Hoogendyk wrote at 15:18 -0400 on Jun 2, 2011:
 > OK, so maybe I shot myself in the foot by asking too much (no replies from
 > anyone in over 24 hours) ;-).
> > Let me pare this down to one simple question -- Will Amanda efficiently do > server side incremental > backups of hard link intensive rsync based backups being stored on the > server from a workstation > (Mac or otherwise)? In other words, if the workstation creates a new folder > on the server and > populates it with hard links before running an rsync against it, will Amanda > see that as all being > new and backing up essentially a full of the user's files? > > I understand Amanda uses native tools and there is a possibility that this > will vary depending on > whether the server is using gnutar on a zfs volume, or ufsdump on a ufs > volume, etc. I'm just hoping > that someone has some specific experience they can relate, especially since > Zmanda is working with > BackupPC now. > > I'm guessing from Dustin's April 12, 2010 blog at http://code.v.igoro.us/ > (cyrus imap under list of > possible projects), that gnutar probably still doesn't deal well with the > hard links. I saw some > references while I was digging that imply that ufsdump should be alright. > But, I'd still like to > hear from anyone who has first hand experience or definitive knowledge. > > TIA, > > Chris Hoogendyk > > > On 6/1/11 11:20 AM, Chris Hoogendyk wrote: > > I haven't tried this yet, but I'm hoping to get some comments and guidance > > from others who may be > > doing it. One particular question is set off in all caps below, but, more > > generally, I'm open to > > comments and advice on any of this. > > > > I have a number of Amanda installations in different departments and > > buildings that backup all my > > servers. They've all got tape libraries now and typically run a 6 week or > > better daily rotation > > with a weekly dump cycle. > > > > In the past I have punted on desktops, providing share space on the server > > and advising people to > > put what they want backed up on the server. 
Now we have converted most of > > our office staff to > > Macs, and I want to take a more integrated and automated approach for > > critical staff machines. I > > have a few options I'm looking at. One would be to automate Time Machine > > to a share on the server > > and back that up with Amanda. Another would be to script rsync to a server > > share and back that up > > with Amanda (we're using Samba for shares). The third would be to > > implement Amanda Client on Mac > > OS X for the staff and back that up from the server. Each of these > > approaches has advantages and > > disadvantages. > > > > If you have seen W. Curtis Preston's analysis of Time Machine, that > > provides some background to my > > questions. He wrote two blog posts. One breaks down time machine and > > expresses some complaints > > about it. The second replicates what time machine is doing using scripting > > with rsync. > > > > http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/280-time-machi
Re: Bacula --> Amanda migration
Gour wrote at 16:23 +0100 on Mar 3, 2011: > Is there any concern when migrating Amanda from Linux to FreeBSD? Amanda should not have problems. If you hit a snag, ask the list. You may hit issues like the mt(1) syntax is different (use mt for things like setting blocksize and disabling hardware compression before amanda starts up).
long running amandad on client
I have a client with an amandad that has been running since Sep 23...

  backup 97592 0.0 0.1 26780 7016 ?? Ss 23Sep10 40:30.43 amandad

Most of the backups on that client still work fine. But two DLEs fail nightly. On the server, you get:

  1286525084.841860: chunker: getcmd: START 20101007210002
  1286525084.841877: chunker: getcmd: PORT-WRITE 00-00195 /holding/20101007210002/someclient._somedle.0 someclient 9ffe7f /somedle 0 1970:1:1:0:0:0 51 2000 APPLICATION 36 |;auth=BSD;compress-fast;index;exclude-file=.no-amanda-backup;exclude-file=.nobak;exclude-file=.noback;exclude-file=.nodump;
  1286525084.842069: chunker: stream_server opening socket with family 2 (requested family was 2)
  1286525084.842086: chunker: try_socksize: receive buffer size is 65536
  1286525084.844115: chunker: bind_portrange2: Try port 11017: Available - Success
  1286525084.844135: chunker: stream_server: waiting for connection: 0.0.0.0.11017
  1286525084.844142: chunker: putresult: 23 PORT
  1286525084.847225: chunker: stream_accept: connection from 127.0.0.1.11002
  1286525084.847233: chunker: try_socksize: receive buffer size is 65536
  1286525264.872340: chunker: putresult: 10 FAILED
  1286525264.872462: chunker: pid 18935 finish time Fri Oct 8 02:07:44 2010

The amandad log on the client shows nothing at the 1286525084 timestamp (yes, the hosts in question have good time sync). It does show sendbackup entries after the 3 minute timeout on the server above (1286525264 timestamp). So the client amandad seems to just be slow in responding.

It's not clear why this long running amandad is slow in responding for a couple DLEs, but it's definitely abnormal to have such a long running amandad to begin with. lsof shows lots of open file descriptors like so:

  amandad 97592 backup 609u PIPE 0xff01dccfa00016384 ->0xff01dccfa158

There may be a descriptor leak bug, but that's sort of unimportant since _usually_ amandad runs only briefly. The real question is: why is amandad not exiting? Has anyone seen this before?
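One way to tell whether the descriptor count is actually growing is to sample it periodically. This is a sketch: on Linux you can read /proc as below, while on BSDs (where /proc may not be mounted) you would use fstat(1) or lsof against the pid from ps:

```shell
# Count open descriptors for a pid via /proc (Linux). Run it twice a
# few hours apart: a monotonically growing count points at a leak.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

count_fds $$    # this shell's own descriptors, as a smoke test
```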
I plan to kill amandad on the client, but I'll leave it running for a bit longer in case there might be something that can be learned. Unfortunately, this is amanda-2.6.1p1 on the client, so interest in learning about this anomaly in that code is likely low.
Re: Nitpick, enhancement
Brian Cuttler wrote at 09:20 -0400 on Sep 14, 2010:
 > Not sure which part of amanda is the driver, amdump itself ?

driver & planner are executed in amdump. You can see for yourself - it's just a script.
vDLE (was: [not a?] Nitpicks - rename DLE)
Dustin J. Mitchell wrote at 14:56 -0500 on Sep 13, 2010: > On Mon, Sep 13, 2010 at 2:43 PM, Brian Cuttler wrote: > > Virtual DLEs !?! > > > > That is EXACTLY what we need ! > > > > I know you warned us, but I'm REALLY Excited about this ! > > > > That would _so_ fix my Terrabyte sized DLE problem... > > Yes, yes it would. It would fix a lot of problems! > > I don't think it's the right solution to the problem, though. It > takes as fundamental Amanda's funny notion of DLEs and exclusion > lists, and tries to build a working system around that. If it could > work in a way that makes any sense to the user, I might be convinced, > but as it stands there are some *very* significant unanswered > questions, and probably a lot of more subtle problems, too. > > Instead, we should look at how other backup software handles similar > problems, and consider throwing out some long-standing Amanda > weakne^Wfeatures. You can see why this becomes a contentious issue > very quickly. One big weakness I always decry is that amanda can't automatically balance (even possibly with a hint from the admin) DLE sizes. Consider how many times you have had to manually break up a DLE because it takes too long to back up (hits a timeout) or exceeds certain system capacities - holding disk, tape, etc. Well, with split dumps, tape size is less of a problem for this issue these days. This is a separate issue from renaming DLEs. Well, at least, I wasn't thinking of it being the same. However, I can see how a similar mechanism could be leveraged for various uses (including meta-DLEs, DLE groups, balanced DLE dump sizing). [Subject changed to reflect the thread hijack^W^Wchange in scope] This has a lot of possibilities and could quickly get hard to bite off a piece to implement. I suppose it would be good to take a little time to implement a good solid base for a few potential flavors of feature candy in this area.
Re: [not a?] Nitpicks - rename DLE
Chris Lee wrote at 22:23 +0100 on Sep 13, 2010: > The way I see it the DLE is for the admin to make sub divisions in > the backup to make backing up easier to manage. The software should > store the file object with path and hostname, with the DLE as an > after thought; that way for the other direction of recovery I only > need the Date, file and hostname, the backup software should do the > gruntwork to find the tape required and DLE if it needs to know to > get my file back. What if the hostname changes - let's say just slightly (like it moves to a different subdomain)? I was looking for a way to have amanda (or tell amanda) about the history change and have operations (restore, incremental backup, etc.) be able to cross the history boundary seamlessly.
Re: [not a?] Nitpicks - rename DLE
Dustin J. Mitchell wrote at 14:09 -0500 on Sep 13, 2010: > On Sun, Sep 12, 2010 at 9:36 PM, John Hein wrote: > > This may not be considered a nitpick but more of a feature request. > > > > If I move a disk or rename a host or move the host to a different > > domain, it'd be nice to be able to rename the disklist entry (DLE) and > > have history tracking, incremental planning, and most importantly > > recover/restore operations off tape know to follow the rename. > > > > Maybe it's as simple as allowing one or more "alternate DLE name" or > > "alias" (if you will) entries in a DLE (note the casual insertion of > > the word "simple" does not imply I have a patch, sorry). > > > > Going back and doing a rename on log files, index files, dump files, > > etc., is, of course, not practical and not really desired in terms of > > representing history of a name change. > > This is an interesting idea, both for the purpose you describe, and > for the very futuristic and don't-get-excited-yet idea of "virtual > DLEs", where Amanda automatically splits DLEs based on size of > subdirectories. The main problem with virtual DLEs has been recovery: > if Amanda is backing up a particular file in a different DLE every > day, then it's going to be difficult to find it when it comes to > running amrecover. Incrementals are also a problem: changing the > boundaries between DLEs obviously requires doing a full backup of all > of the affected DLEs on the next run. At least, unless we're going to > become gnutar-specific and start futzing with the data in the backups > on the server side. > > As you can see, complicated. But a consistent approach to storing the > DLE and path of a particular "user object" over time would be a useful > first step. Do you have any thoughts on how that might be > implemented? It may be useful to consider having amanda store the disklist it uses with each backup (in the index db) [1]. 
And that there is some way to correlate the "same" DLE in one disklist that may just have a different hostname (or I suppose filesystem mount point - let's call it a host/filesystem tuple because that sounds fun) to the renamed DLE in another disklist.

I suppose we could take a page from revision control design and have a DLE ID (SHA checksum, perhaps, of the "important" identifying contents of the DLE - that is, not things like maxdumps). And the hefty work for this feature would be adding a way to link DLE IDs together and have amanda understand the potential equality between more than one DLE ID for the various history-traversing operations she does.

[1] I keep the disklist in revision control anyway since it would need to be consulted to properly restore some data from a year ago that may have moved around. Having amanda handle that tracking would be nice.
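The DLE-ID idea can be sketched in a few lines of shell. The choice of fields and the truncation to 12 hex digits are assumptions for illustration, not anything amanda actually implements:

```shell
# Hash only the identifying fields of a DLE (host, disk, dumptype),
# deliberately leaving out tunables like maxdumps so retuning a DLE
# doesn't change its identity.
dle_id() {
    printf '%s\n' "$1" "$2" "$3" | sha256sum | cut -c1-12
}

dle_id lathe.example.org /home comp-user-tar

# Renaming the host yields a different ID; the "hefty work" above is a
# link table recording that both IDs name the same backup history.
dle_id lathe.newdomain.org /home comp-user-tar
```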
[not a?] Nitpicks - rename DLE
Dustin J. Mitchell wrote at 10:55 -0500 on Sep 9, 2010: > I bet most of you have some small nitpick with Amanda that you've > never felt warranted an email. Well, now's your chance! I'd like to > put some polish on Amanda, and it's hard for me to see the areas that > need burnishing, since I work on Amanda all day, every day. This may not be considered a nitpick but more of a feature request. If I move a disk or rename a host or move the host to a different domain, it'd be nice to be able to rename the disklist entry (DLE) and have history tracking, incremental planning, and most importantly recover/restore operations off tape know to follow the rename. Maybe it's as simple as allowing one or more "alternate DLE name" or "alias" (if you will) entries in a DLE (note the casual insertion of the word "simple" does not imply I have a patch, sorry). Going back and doing a rename on log files, index files, dump files, etc., is, of course, not practical and not really desired in terms of representing history of a name change.
Re: script help?
Gene Heskett wrote at 12:04 -0400 on Jul 21, 2010: > Greetings all; > > My catchup script seems to be working with 2 exceptions, first being that I > am not getting any emails from it, so I installed dnsmasq to see if that > fixes that. > > 2nd, each pass through my catchup script loop elicits an error warning from > the main script: > > ./flush.sh: line 173: [: !=: unary operator expected > > from that script: > # or are we running as flush.sh > > if [ $0 == "./flush.sh" ] || [ $0 == "${MYDIR}/flush.sh" ] || [ $0 == > "flush.sh" ]; then == is a bash-ism. Get in the habit of using = for posix compliance. You'll thank yourself when you might have to run a script on a system that doesn't use bash for sh(1) (bsd's, solaris). "Always" put quotes around vars ("$foo", "$0", etc.) since vars can contain strings with spaces confusing test(1) (aka [). You don't need quotes around literals that you have written ("./flush.sh" in your case above). So in '[ $0 == "./flush.sh" ]', you have your quoting backwards (with respect to defensive script writing). > # we don't want amflush to disconnect or ask questions > if [ `/bin/ls /dumps` != "" ] ; then <---line 173 > echo "Backup script running amflush -bf $CONFIGNAME " |tee - > a >> $LOG Same deal (use quotes) for back-tick expressions (`cmd` or $(cmd)). And that's your problem. Looks like `ls /dumps` is providing empty output. example: if [ `echo 1 2` = 1 ]; then echo y; else echo n; fi [: 1: unexpected operator or bash -c 'if [ `echo -n` != "" ]; then echo y; else echo n; fi' bash: [: !=: unary operator expected if [ "`echo 1 2`" = 1 ]; then echo y; else echo n; fi n
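The quoting rules above can be demonstrated in a few lines of portable sh (no Amanda involvement; paths are illustrative):

```shell
#!/bin/sh
# One word vs. several: an unquoted expansion containing spaces becomes
# multiple arguments to test(1), producing errors like
# "[: !=: unary operator expected". Quoting keeps it a single argument.
val="two words"
if [ "$val" = "two words" ]; then echo match; fi   # '=' is the POSIX form

# Command substitution needs quotes too; unquoted empty output vanishes
# entirely, leaving test(1) with a malformed expression.
listing=$(ls /nonexistent 2>/dev/null)
if [ "$listing" != "" ]; then echo nonempty; else echo empty; fi
```

Remove the quotes around `$listing` and the second test reproduces exactly the "unary operator expected" error from line 173 of the original script.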
Re: ZWC and exclude/include lists
Sorry to hijack your thread, but I tried 'exclude file' a few months ago and they didn't work either. Chris Nighswonger wrote at 11:00 -0400 on Jun 9, 2010: > Does ZWC honor exclude/include lists? > > I have a DLE like: > > foo.bar.com "C:\\Documents\ and\ Settings" { > exclude list "C:\\.exclude > zwc-compress > estimate server > } > > c:\.exclude includes several entries like: > > .\user1 > .\user2 > > Looking at a tcpdump of the resulting transactions, I see the exclude > list passed to the client, but the client still dumps the entirety of > c:\documents and settings. > > Am I missing some syntax error here? > > Or perhaps this is the purpose of templates in the ZWC? > > Kind Regards, > Chris
RE: runtar error that I do not understand
As a workaround, perhaps you could "unhide" the snapshot directory. man zfs. McGraw, Robert P wrote at 11:23 -0400 on Jun 2, 2010: > I am not sure if this got sent to the group so I an fordwarding. > > This explains what is going on in the gtar code to cause gtar to seg fault. > All the accolades should go to my SA partner Chapman Flack. He is in the > process of sending this to bug-tar as suggested. > > Robert > > > > "Apparently under only some circumstances, gtar tries to use the old > algorithm for finding the cwd. That fails beneath .zfs which is a sort of > magical name that isn't there unless you're looking for it exactly. > > But it must just be a special combination of options that makes gtar try to > do that, because in simple invocations it doesn't: > > hertz /homes % ls .z*# nobody here but us chickens... > ls: No match. > hertz /homes % cd .zfs # oh, you mean ME? > hertz .zfs % cd snapshot/TODAY/jflack > % gtar --create --file /dev/null bar # works fine > > Aha, it's the --listed-incremental option. This makes gtar want to create > the snar file with names and metadata of files it sees: > > % gtar --create --file /dev/null --listed-incremental=/tmp/snar bar > Segmentation fault > > The funny thing is, if I cd to ~ where it works... > > % cd ~ > % gtar --create --file /dev/null --listed-incremental=/tmp/snar bar % > > ...and then look in /tmp/snar, it only contains relative paths... > ...so it STILL doesn't explain why gtar even wants to get the cwd in that > case, but it's clear from the truss that's what it's doing. > > One workaround might be to do a loopback mount of the desired snapshot onto > some other path that's not beneath .zfs, and do the backup from there. 
> > -Chap" > > > -Original Message- > > From: owner-amanda-us...@amanda.org [mailto:owner-amanda- > > us...@amanda.org] On Behalf Of Nathan Stratton Treadway > > Sent: Tuesday, May 04, 2010 2:45 PM > > To: amanda-users@amanda.org > > Subject: Re: runtar error that I do not understand > > > > > > On Tue, May 04, 2010 at 13:03:09 -0500, Dustin J. Mitchell wrote: > > > On Tue, May 4, 2010 at 12:27 PM, McGraw, Robert P > > wrote: > > > > I setup a lookback for the snapshot and now gtar seem to be > > working. > > > > > > It sounds like you have a fairly good understanding of this problem > > > now. Could you write up either a troubleshooting or How-To article > > on > > > the wiki? > > > > Also, you might want to send a bug report about this to the > > bug-...@gnu.org list -- even if the underlying problem is that .zfs > > doesn't behave normally, I suspect they'd be interested in knowing > > about > > the issue so that they can at least avoid having an abort-with-segfault > > in that situation... > > > > http://www.gnu.org/software/tar/#maillists > > > > > >Nathan > > > > --- > > - > > Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region > > Ray Ontko & Co. - Software consulting services - > > http://www.ontko.com/ > > GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: > > 1023D/ECFB6239 > > Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 > >
Re: no-reuse tape was written to
Dustin J. Mitchell wrote at 12:22 -0600 on Mar 10, 2010: > On Wed, Mar 10, 2010 at 11:44 AM, John Hein wrote: > > Has anyone seen a tape that was marked no-reused get _used_? > > It happened last night here (amanda-2.6.1p2). > > To take a guess: label_new_tapes is set and the tape drive encountered > an error when try to read the label, thus not recognizing it as an > already-labeled tape. > > Just a guess.. label_new_tapes is not set in amanda.conf A changer (chg-zd-mtx) with barcodes is involved, and the debug logs show it specifically using "-slot next" to unload the current tape that was in the drive when amdump started and load the no-reuse tape. I haven't looked in the code to see if & where it cares to look at no-reuse and decide not to use the tape. I was wrong, BTW. This is 2.6.1p1 (not p2).
no-reuse tape was written to
Has anyone seen a tape that was marked no-reuse get _used_? It happened last night here (amanda-2.6.1p2).
Re: Backup issues with OpenBSD 4.5 machines
stan wrote at 16:59 -0400 on Aug 24, 2009: > The first thing I notice when comparing this function in 2.5.0 vs 2.5.2 is > that 2.5.0 does: > > tv.tv_usec = 0; > > and 2.5.2 does not. Could this make a difference? Both do > > tv.tv_sec = timeout; In 2.5.2, the memset sets the entire struct to 0. 2.5.0 is slightly more efficient, but otherwise the results wind up being the same. Nothing to see there.
Re: Error redirecting stderr to fd 52
Jean-Louis Martineau wrote at 11:48 -0400 on Aug 24, 2009: > John Hein wrote: > > On a 2.6.1b1 client ... > > > Hmm, beta software ... > > It's not fixed in 2.6.1 neither in 2.6.1p1. > > You must use the latest 2.6.1p1 snapshot from > http://www.zmanda.com/community-builds.php Building a new version now. Another interesting note. On the client in question, amandad is still running, but it shouldn't be. It's got a couple unreaped zombies and is waiting in select. ps awwx -o user,pid,ppid,start,stat,wchan,command | grep backup backup 37628 60010 4:43AM Z- backup 37629 60010 4:43AM Z- backup 60010 23993 7:01PM Ss select amandad lsof shows quite a few file descriptors still open. COMMAND PID USER FD TYPE DEVICE SIZE/OFFNODE NAME amandad 60010 backup0u IPv4 0xc87f5438 0t0 UDP *:amanda amandad 60010 backup1u IPv4 0xc87f5438 0t0 UDP *:amanda amandad 60010 backup2u IPv4 0xc87f5438 0t0 UDP *:amanda amandad 60010 backup4u IPv4 0xcdd09bf4 0t0 UDP *:58068 amandad 60010 backup 10w VREG 0,88 107863 331113 / -- amandad/amandad.20090823190102.debug amandad 60010 backup 21u PIPE 0xcb0064c816384 ->0xcb006580 amandad 60010 backup 22u PIPE 0xc916a99016384 ->0xc916aa48 amandad 60010 backup 23u PIPE 0xcd62999016384 ->0xcd629a48 amandad 60010 backup 24u PIPE 0xce8a899016384 ->0xce8a8a48 amandad 60010 backup 28u PIPE 0xc86b47f816384 ->0xc86b48b0 amandad 60010 backup 29u IPv4 0xce6f63a0 0t0 TCP *:6108 (LISTEN) amandad 60010 backup 30u PIPE 0xc8d2f33016384 ->0xc8d2f3e8 amandad 60010 backup 31u PIPE 0xc946619816384 ->0xc9466250 amandad 60010 backup 33u PIPE 0xca15c33016384 ->0xca15c3e8 amandad 60010 backup 37u IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE amandad 60010 backup 39u PIPE 0xcdeb4cc016384 ->0xcdeb4d78 amandad 60010 backup 40u PIPE 0xc9c307f816384 ->0xc9c308b0 amandad 60010 backup 42u PIPE 0xc96317f816384 ->0xc96318b0 amandad 60010 backup 44u PIPE 0xc9c3066016384 ->0xc9c30718 amandad 60010 backup 46u PIPE 0xc9c1d33016384 ->0xc9c1d3e8 amandad 60010 backup 47u PIPE 0xc914e7180 
->0xc914e660 amandad 60010 backup 48u PIPE 0xcc4dd4c816384 ->0xcc4dd580 amandad 60010 backup 49u PIPE 0xc96be7f816384 ->0xc96be8b0 amandad 60010 backup 50u PIPE 0xce8a833016384 ->0xce8a83e8 amandad 60010 backup 51u PIPE 0xc914e5800 ->0xc914e4c8 amandad 60010 backup 52u PIPE 0xc914ecc016384 ->0xc914ed78 amandad 60010 backup 60u PIPE 0xc8e1033016384 ->0xc8e103e8 amandad 60010 backup 61u PIPE 0xcc37fb2816384 ->0xcc37fbe0 amandad 60010 backup 62u PIPE 0xce64966016384 ->0xce649718 amandad 60010 backup 63u PIPE 0xcaa0099016384 ->0xcaa00a48 amandad 60010 backup 64u PIPE 0xc912f66016384 ->0xc912f718 amandad 60010 backup 65u PIPE 0xc90f1cc016384 ->0xc90f1d78 A few seconds of tracing the process shows.. 60010 amandad RET poll 0 60010 amandad CALL gettimeofday(0xbfbfea88,0) 60010 amandad RET gettimeofday 0 60010 amandad CALL gettimeofday(0xbfbfea88,0) 60010 amandad RET gettimeofday 0 60010 amandad CALL poll(0x8052600,0x1,0x7530)
Error redirecting stderr to fd 52
On a 2.6.1b1 client ... 1251090802.506210: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: start at Sun Aug 23 23:13:22 2009 1251090802.506278: sendbackup: Version 2.6.1b1 1251090802.511032: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: rename at Sun Aug 23 23:13:22 2009 1251090802.511055: sendbackup: sendbackup req: 1251090802.511102: sendbackup: Parsed request as: program `GNUTAR' 1251090802.511109: sendbackup: disk `/data' 1251090802.53: sendbackup: device `/data' 1251090802.58: sendbackup: level 1 1251090802.511123: sendbackup: since 2009:8:21:6:55:50 1251090802.511128: sendbackup: options `|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;' 1251090802.511203: sendbackup: Error redirecting stderr to fd 52: Bad file descriptor 1251090802.511215: sendbackup: pid 61161 finish time Sun Aug 23 23:13:22 2009 Has anyone ever seen that? This is on a client with about a dozen DLEs with possibly 3 dumping in parallel at a time. The DLE in question is not small - certainly not so small to complete in 5 ms. This DLE sometimes works. Sometimes a different one fails the same way. It looks like the mesgfd in client-src/sendbackup.c is getting closed before dup2(2) runs. Perhaps a race. amandad log (below) shows no obvious trouble other than it is continuing to do work after the child sendbackup has failed (as seen in the log output above). The two security_stream_close messages seem to be different than the log messages associated with DLEs that worked. The working ones have three security_stream_close messages. But, oddly, the DLE that worked (and was small) right _before_ the failed DLE did _not_ have any security_stream_close messages. Possibly a clue. In fact, it seems all the failures are happening right after a small (< 10 MB), and thus quick, dump. Could just be a coincidence. 
1251090802.228997: amandad: dgram_recv(dgram=0x280c2a04, timeout=0, fromaddr=0x280d29f0) 1251090802.229037: amandad: (sockaddr_in *)0x280d29f0 = { 2, 703, 206.168.13.161 } 1251090802.229055: amandad: security_handleinit(handle=0x8052600, driver=0x280bc520 (BSD)) 1251090802.235787: amandad: accept recv REQ pkt: < SERVICE sendbackup OPTIONS features=9ffe00;hostname=bunny;config=test; GNUTAR /data 1 2009:8:21:6:55:50 OPTIONS |;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar; > 1251090802.237043: amandad: creating new service: sendbackup OPTIONS features=9ffe00;hostname=bunny;config=test; GNUTAR /data 1 2009:8:21:6:55:50 OPTIONS |;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar; 1251090802.237710: amandad: sending ACK pkt: < > 1251090802.237764: amandad: dgram_send_addr(addr=0x8052620, dgram=0x280c2a04) 1251090802.237772: amandad: (sockaddr_in *)0x8052620 = { 2, 703, 206.168.13.161 } 1251090802.237779: amandad: dgram_send_addr: 0x280c2a04->socket = 0 1251090802.511364: amandad: security_streaminit(stream=0x81dd000, driver=0x280bc520 (BSD)) 1251090802.511719: amandad: stream_server opening socket with family 2 (requested family was 2) 1251090802.511736: amandad: try_socksize: send buffer size is 65536 1251090802.511743: amandad: try_socksize: receive buffer size is 65536 1251090802.512604: amandad: bind_portrange2: Try port 6108: Available - Success 1251090802.512617: amandad: stream_server: waiting for connection: 0.0.0.0.6108 1251090802.512643: amandad: security_streaminit(stream=0x81e6000, driver=0x280bc520 (BSD)) 1251090802.512658: amandad: stream_server opening socket with family 2 (requested family was 2) 1251090802.512669: amandad: try_socksize: send buffer size is 65536 1251090802.512677: amandad: try_socksize: receive buffer size is 65536 1251090802.513496: amandad: bind_portrange2: Try port 6108: Available - Address already in use 1251090802.514300: amandad: bind_portrange2: Try port 6109: Available - Success 
1251090802.514311: amandad: stream_server: waiting for connection: 0.0.0.0.6109 1251090802.514319: amandad: security_streaminit(stream=0x81ef000, driver=0x280bc520 (BSD)) 1251090802.514333: amandad: stream_server opening socket with family 2 (requested family was 2) 1251090802.514344: amandad: try_socksize: send buffer size is 65536 1251090802.514351: amandad: try_socksize: receive buffer size is 65536 1251090802.515189: amandad: bind_portrange2: Try port 6108: Available - Address already in use 1251090802.515991: amandad: bind_portrange2: Try port 6109: Available - Address already in use 1251090802.516772: amandad: bind_portrange2: Skip port 6110: Owned by softcm. 1251090802.517540: amandad: bind_portrange2: Skip port 6111: Owned by spc. 1251090802.518336: amandad: bind_portrange2: Try port 6112: Available - Success 1251090802.518348: amandad: stream_server: waiting for connection: 0.0.0.0.6112 1251090802.518355: amandad: sen
Re: Backup issues with OpenBSD 4.5 machines
stan wrote at 13:56 -0400 on Aug 21, 2009: > OK, I reproduced the failure with only a crossover cable between the test > client and the Amanda Master: A crossover cable doesn't rule out a firewall or other such socket-level interference on either host. I'm not saying that's your problem, just that it isn't ruled out. > 192.168.1.2:wd0f 0 dumper: [could not connect DATA stream: can't connect > stream to 192.168.1.2 port 24376: Connection refused] (13:48:23) > > Note the 192.168.1.2 address :-) > > This is with a 2.5.2p1 client on OpenBSD 4.5; 2.5.0p1 works on this same > machine/OS/network configuration. > > So, it appears to me that this must be because of something that changed > between 2.5.0p1 and 2.5.2p1. And we have a pretty good idea where in the > code this is failing. So can anyone enlighten me as to what changed in this > area between those 2 versions? I haven't looked to see what changed between 2.5.0 and 2.5.2. It's all pretty basic socket stuff. I wouldn't be surprised if that is when the additional auth mechanisms (bsdudp, bsdtcp) were added. However, if no one chimes in, it's not that hard to look yourself. If you can narrow it down a bit to where there seems to be a problem in the code, the amanda-hackers@ list might be able to help more.
Re: Backup issues with OpenBSD 4.5 machines
stan wrote at 10:56 -0400 on Aug 21, 2009: > OK here is the latest on this saga :-) > > On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to > back this machine up successfully (using classic UDP based authentication) > > On another of them, I built 2.5.2p1. The first attempt to back this machine > up failed. I checked the log files, and found they were having issues > because /etc/amdates was missing. I corrected that, and started a 2nd > backup run. (Remember amcheck reports all is well with this machine). I > got the following from amstatus when I attempted to back up this machine. > Also remember, one of the tests I ran with a 2.6.1 client was to connect a > test machine directly to the client, using a crossover cable to eliminate > any firewall or router type issues. > > I am attaching what I think is the amandad debug file associated with this > failure. > > Can anyone suggest what I can do to further troubleshoot this? > > pb48:wd0f 1 dumper: [could not connect DATA stream: > can't connect stream to pb48.meadwestvaco.com port 11996: Connection > refused] (10:37:27) > . . . 
> amandad: time 30.019: stream_accept: timeout after 30 seconds > amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new > stream connection: No such file or directory) > amandad: time 30.019: stream 0 accept failed: unknown protocol error > amandad: time 30.019: security_stream_close(0x86b67000) > amandad: time 60.027: stream_accept: timeout after 30 seconds > amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new > stream connection: No such file or directory) > amandad: time 60.027: stream 1 accept failed: unknown protocol error > amandad: time 60.027: security_stream_close(0x81212000) > amandad: time 90.035: stream_accept: timeout after 30 seconds > amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new > stream connection: No such file or directory) > amandad: time 90.036: stream 2 accept failed: unknown protocol error > amandad: time 90.036: security_stream_close(0x84877000) > amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 > (BSD)) > amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009 For some reason the socket is not getting marked ready for read. select(2) is timing out waiting. Firewall setup perhaps? This bit of code in 2.5.2p1's common-src/stream.c is where the failure is happening for you... 
int
stream_accept(
    int server_socket,
    int timeout,
    size_t sendsize,
    size_t recvsize)
{
    SELECT_ARG_TYPE readset;
    struct timeval tv;
    int nfound, connected_socket;
    int save_errno;
    int ntries = 0;
    in_port_t port;

    assert(server_socket >= 0);

    do {
        ntries++;
        memset(&tv, 0, SIZEOF(tv));
        tv.tv_sec = timeout;
        memset(&readset, 0, SIZEOF(readset));
        FD_ZERO(&readset);
        FD_SET(server_socket, &readset);
        nfound = select(server_socket+1, &readset, NULL, NULL, &tv);
        if(nfound <= 0 || !FD_ISSET(server_socket, &readset)) {
            save_errno = errno;
            if(nfound < 0) {
                dbprintf(("%s: stream_accept: select() failed: %s\n",
                          debug_prefix_time(NULL),
                          strerror(save_errno)));
            } else if(nfound == 0) {
                dbprintf(("%s: stream_accept: timeout after %d second%s\n",
                          debug_prefix_time(NULL),
                          timeout,
                          (timeout == 1) ? "" : "s"));
                errno = ENOENT; /* ??? */
                return -1;
RE: Amanda and dual tape libraries.
Onotsky, Steve x55328 wrote at 14:23 -0400 on May 14, 2009: > I agree, but the caveat is that the planner will do its darndest to > make full use of the extended capacity of the LTO4 cartridge. > > In my case, our backups went from between 5 and 8 hours with LTO2 > tapes to well over 24 hours in some cases with the LTO4s - same > DLEs. It took some fancy footwork to get it to a reasonable window > (about the same length of time as with the 2s, but some of the > larger DLEs are forced to incremental on weekdays). This is so we > can get the cartridges ready for pickup by our offsite storage > provider. You can lie about your "tape" size in the tapetype, of course. You can even have different lies for different configurations. I've always wanted a knob to tell the scheduler to "shoot" for a smaller percentage of the total tape size, but to go ahead and use more if needed. Kind of an average target total size for the dumps. Maybe there is such a knob these days. I hope someone will say if there is. Lying about the tape size usually works fine. And it will go over that declared size if needed - if, for instance, some unexpected increase in size to a DLE happens after the estimate completes (or for whatever reason, the estimate is too low).
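For reference, understating the tape size is just a tapetype declaration in amanda.conf; the name and numbers below are illustrative (a sketch, not a recommendation - real LTO-4 native capacity is roughly 800 GB):

```
define tapetype LTO4-conservative {
    comment "LTO-4, length deliberately understated for planning"
    length 500 gbytes    # declared smaller than the physical capacity
    filemark 0 kbytes
    speed 80000 kbytes
}
```

The planner schedules against the declared `length`, so a smaller value shrinks the nightly total; as noted above, a run can still spill past it if estimates come in low.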
Re: amrecover stuck on "set owner/mode for '.' "
Dustin J. Mitchell wrote at 11:11 -0400 on May 1, 2009: > On Fri, May 1, 2009 at 10:20 AM, Joe Konecny wrote: > > I just ran amrecover and successfully put a file into /tmp. > > amrecover prompted me "set owner/mode for '.'? [yn]" to > > which I answered "n" and hit enter. Now it just sits there > > for over 30 minutes doing nothing. What could be going on? > > Hmm, I can only find that in the manpage, not the source. Presumably > it's some old message which has since been removed. This is tangential to the OP's problem, but it depends on what source you're looking at. That message is probably coming from the 'restore' OS utility, not amanda.
Re: The Coyote Den AMANDA MAIL REPORT FOR April 9, 2009
John Hein wrote at 08:47 -0600 on Apr 9, 2009: > This (snippet below from the gtar NEWS file) was added in 1.21 Whoops, sorry - --no-check-device was added in 1.20. I've never tested it, however. If you can prove a device number change and use of this option is causing the "big incremental" dump problem, let us know (start a separate thread).
Re: The Coyote Den AMANDA MAIL REPORT FOR April 9, 2009
Gene Heskett wrote at 10:05 -0400 on Apr 9, 2009: > On Thursday 09 April 2009, Dustin J. Mitchell wrote: > >On Thu, Apr 9, 2009 at 8:53 AM, Gene Heskett > wrote: > >> Uptime is about 5 days now, but this may be the beginning of the end. > >> Something made it think all data was new from the looks of this. This was > >> the first run of 20090323, 20090321 works fine. Another device mapper > >> screwup? It was updated by yum yesterday. > > > >That seems the most likely cause. > > > >Dustin > > Yeas, but I thought we had worked out something that made amanda immune to > those little annoyances? Or was tar changed and the fix didn't work now? > > [ama...@coyote GenesAmandaHelper-0.6]$ tar --version > tar (GNU tar) 1.20 [snip] > Here is the amgtar stanza from amanda.conf: > > define application-tool app_amgtar { > comment "amgtar" > plugin "amgtar" > #property "GNUTAR-PATH" "/path/to/gtar" > #property "GNUTAR-LISTDIR" "/path/to/gnutar_list_dir" > #default from gnutar_list_dir setting in amanda-client.conf > #property "ONE-FILE-SYSTEM" "yes" #use '--one-file-system' option > #property "SPARSE" "yes" #use '--sparse' option > #property "ATIME-PRESERVE" "yes" #use '--atime-preserve=system' option > property "CHECK-DEVICE" "no" #use '--no-check-device' if set to > "no" > } > Do I need to set additional options? This (snippet below from the gtar NEWS file) was added in 1.21 == * New options --no-check-device, --check-device. The `--no-check-device' option disables comparing device numbers during preparatory stage of an incremental dump. This allows to avoid creating full dumps if the device numbers change (e.g. when using an LVM snapshot). The `--check-device' option enables comparing device numbers. This is the default. This option is provided to undo the effect of the previous `--no-check-device' option, e.g. if it was set in TAR_OPTIONS environment variable. ==
Re: Fwd: The Coyote Den AMANDA MAIL REPORT FOR April 9, 2009
Gene Heskett wrote at 08:53 -0400 on Apr 9, 2009: > Uptime is about 5 days now, but this may be the beginning of the > end. Something made it think all data was new from the looks of > this. This was the first run of 20090323, 20090321 works fine. > Another device mapper screwup? It was updated by yum yesterday. It probably was a device numbering change - if you are in the habit of rebooting after a yum update. It's probably worthwhile noting the disk device numbering before / after a reboot. Then if it changes, you can repair the gnutar-list files to point at the new device to avoid the "everything has changed" incremental dump (or 'amadmin force' a level 0, but that has problems with scheduling balance not to mention potentially running out of room depending on tape / vtape / holding disk size). I think there was a script floating around to change the dev # in the gnutar listed incremental files - possibly Dustin's.
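One quick way to note device numbers around a reboot is a stat(1) comparison; the sketch below uses GNU stat syntax, and the paths are illustrative:

```shell
#!/bin/sh
# Record filesystem device numbers so a post-reboot change is easy to spot.
# A changed dev number is what makes gtar's listed-incremental files think
# "everything is new".
snapshot_devs() {
    for fs in "$@"; do
        stat -c '%n dev=%d' "$fs"
    done
}

before=$(snapshot_devs /)
# ...in real use, the reboot happens between the two snapshots...
after=$(snapshot_devs /)

if [ "$before" = "$after" ]; then
    echo "device numbers unchanged"
else
    echo "device numbers changed - gnutar-list files may need repair"
fi
```

Saving the "before" output to a file under /root before a yum update, then diffing after the reboot, shows exactly which filesystems moved.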
Re: amgetconf: not found
Dustin J. Mitchell wrote at 10:11 -0400 on Mar 17, 2009: > 2009/3/17 Zbigniew Szalbot : > > Ever since the upgrade, I am not able to perform the backup. > > > > % /usr/local/sbin/amdump Backup2Disk > > amgetconf: not found > > > > I use FreeBSD 7.1-RELEASE if that matters. > > > > Would you help me find out how I can solve this problem and resume > > backups? Thank you! > > Is amgetconf installed? Is it in the same location as intended when > it was compiled? amdump is a shell script, so this should be pretty > easy for you to track down. That can be a misleading error message; it often really means the perl interpreter on amgetconf's '#!' line is not available. If you do have amgetconf, and it is in the path, then run 'head amgetconf' and make sure the perl it's looking for is installed and working.
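That shebang check can be scripted; `check_shebang` below is a hypothetical helper, not an Amanda tool - in practice you would point it at your installed amgetconf:

```shell
#!/bin/sh
# check_shebang: report whether the interpreter named on a script's "#!"
# line actually exists and is executable.
check_shebang() {
    # Take line 1, strip the leading "#!", keep the first word.
    interp=$(sed -n '1s/^#![[:space:]]*//p' "$1" | awk '{print $1}')
    if [ -n "$interp" ] && [ -x "$interp" ]; then
        echo "ok: $1 uses $interp"
    else
        echo "broken: $1 wants missing interpreter '$interp'"
    fi
}

# Demonstration on a throwaway script whose perl path does not exist:
tmp=$(mktemp)
printf '#!/nonexistent/bin/perl\nprint "hi";\n' > "$tmp"
check_shebang "$tmp"
rm -f "$tmp"
```

A "broken" result here matches the symptom in the thread: the shell reports "amgetconf: not found" even though the file itself is present.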
Re: More ranting about issues compiling 2.6.1 on older machines
stan wrote at 08:30 -0500 on Mar 2, 2009: > I really think we need to come up with a plan that results in it being > easier to compile clients on older machines. I have expressed my opinion > that this needs to be a fork of a 2.5 branch, but I did not seem to get much > in the way of buy-in by others on this list for that. Does anyone have a > better plan? You never really said why you need to fork 2.5 as opposed to just running 2.5.2 (or 2.4.5) on older clients. Security fixes? Specific features? I think that putting security fixes onto a branch of 2.5 might be a reasonable task. Backporting some of the newer APIs would likely be a good bit more work, and, depending on your point of view, possibly not worth it. That said, it's possible committers would be willing to entertain committing patches to a 2.5.2 branch. I can't speak for them, but if the work is made minimal (by submitting well-documented patches), they might be reasonable about it. You could test the waters with a patch to fix some buffer overflow and ask (on amanda-hackers) if they would be willing to commit it. Cutting a new release is probably beyond the scope, but making commits to a legacy branch for a while seems reasonable. And if they don't, then you could, as you seem to be hinting, start a fork yourself. I can't say how popular it would be. Personally, I've had reasonable success getting the newer code to compile / run on older machines, certainly for clients if not the server code. It may be less work than a fork (and patches possibly more acceptable to the current maintainers). But if you publish a fork (whether it be a patchset or a public repository), there's likely a greater than zero chance that someone will use it - I just can't say how much greater than zero ;).
Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd
Charles Curley wrote at 18:54 -0700 on Mar 1, 2009: > On Sun, Mar 01, 2009 at 05:49:20PM -0700, John Hein wrote: > > man amrecover (see -s & -t). I don't know if there is a run-time > > configuration option for these (I didn't see one after a quick read > > of the man pages) - if so, -o would be of no help. > > Those set the index and tape servers, respectively. Supposedly, you > can also do that with environmental variables. I tried environmental > variables, and they didn't work. Indeed, and the code seems to agree with the man page... recover-src/amrecover.c:server_name = getenv("AMANDA_SERVER"); recover-src/amrecover.c:tape_server_name = getenv("AMANDA_TAPE_SERVER"); I believe it has worked for me in the past. > I mean that amrecover should work on the client. Yes, it does work on the client. > > If you are asking if most people configure amanda that way, I'd say > > probably not, but who knows - I can say that I don't. If you want, > > you can take it up with the debian/ubuntu packager. FWIW, the default > > in the configure script if you don't specify --with-index-server is > > `uname -n`. > > Which in a precompiled package would give you the host name of the > build machine, rather useless for the rest of the universe. Which is perhaps why they might override that with 'localhost'. If I were the packager, I'd probably pick some host name that you could define as a good CNAME (or additional A record) in DNS, like 'backup', but there's a risk of picking something that will clash for someone out there. Having it overridable in amanda.conf would be good for this issue. If it really is not, then it might make a simple project for someone. > How about having it call the OS to enquire, and providing an option > to override? But that has its own security problems. Inquire what? DNS? Some LDAP map? I think -s & -t should work as a way to override - not sure why they didn't for you. 
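For reference, the two override mechanisms discussed above look like this on the command line (the server name is illustrative):

```shell
# Command-line flags from the amrecover man page:
amrecover DailySet1 -s backuphost.example.com -t backuphost.example.com

# Environment variables read in recover-src/amrecover.c:
AMANDA_SERVER=backuphost.example.com \
AMANDA_TAPE_SERVER=backuphost.example.com \
  amrecover DailySet1
```

Both name an index server (-s / AMANDA_SERVER) and a tape server (-t / AMANDA_TAPE_SERVER); they are often, but not necessarily, the same host.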
I don't see a security issue since the server can decide which client hosts to allow (except perhaps for spoofing issues, but if you have problems with that, amanda may be the least of your worries).
Re: Timeout waiting for ack after adding dle
Toomas Aas wrote at 10:13 +0200 on Mar 1, 2009: > On Sunday, 01 March 2009 04:59:54, you wrote: > > Is this new DLE big? Lots of files? > > The new DLE is not that big. Its 'raw capacity' is 21 GB, ca 25000 files, > but > most of it is MySQL and PostgreSQL database files which are excluded from > the DLE. > > > It's also possible you're hitting a udp datagram size limit. This can > > be improved with a sysctl tweak, or a source patch or using tcp > > (sorry - don't recall if amanda 2.5.1 supports the latter). > > Thanks for the idea, I'll increase the net.inet.udp.maxdgram sysctl. Long pathnames can exacerbate the udp issue, too. > I also looked at sendbackup debug files on the client, but the only error > there is the same 'index tee cannot write [Broken pipe]': . . > sendbackup: time 0.014: started index creator: "/usr/local/bin/gtar -tf - > 2>/dev/null | sed -e 's/^\.//'" > sendbackup: time 469.114: index tee cannot write [Broken pipe] > sendbackup: time 469.114: pid 11511 finish time Sat Feb 28 04:14:12 2009 It dies after 469 seconds. That doesn't seem to be a data timeout (dtimeout defaults to 1800 seconds). But often, a "biggish" DLE with lots of excludes can cause tar to be silent for extended periods of time and trigger a timeout.
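The sysctl tweak mentioned is the FreeBSD knob below; the value is illustrative, and other OSes name this limit differently:

```shell
sysctl net.inet.udp.maxdgram          # show the current limit
sysctl net.inet.udp.maxdgram=65535    # raise it so large REQ packets fit
```

This matters because the whole amandad REQ (including long pathnames and exclude lists) must fit in a single UDP datagram under the classic bsd auth.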
Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd
John Hein wrote at 17:49 -0700 on Mar 1, 2009: > man amrecover (see -s & -t). I don't know if there is a run-time > configuration option for these (I didn't see one after a quick read > of the man pages) - if so, -o would be of no help. ^^^ s/so/not/
Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd
Charles Curley wrote at 16:53 -0700 on Mar 1, 2009: > On Sun, Mar 01, 2009 at 02:18:27PM -0700, Charles Curley wrote: > > On Sun, Mar 01, 2009 at 12:50:50PM -0700, John Hein wrote: > > > Do you know how your stock ubuntu build of amanda was configured (the > > > args to configure)? > > > > > > I just noticed that the request came from 'localhost' which does > > > not match your .amandahosts entry. > > > > Short of pulling in the source package and looking at that, I have no > > idea. I don't even know how to find out, other than ask on another > > list. Yep, this is an example of one disadvantage of using prebuilt packages. > > I also don't see any way to override the host name. -o host and -o > > hostname are rejected. man amrecover (see -s & -t). I don't know if there is a run-time configuration option for these (I didn't see one after a quick read of the man pages) - if so, -o would be of no help. > For what it's worth, I came up with a work-around. On the server, I > added localhost.localdomain to .amandahosts, ran amrecover, and that > worked. Yes, that is what I was getting at. Good to hear it worked for you. > -- > chaffee.localdomain backup amdump > chaffee.localdomain root amindexd amidxtaped > localhost.localdomain root amindexd amidxtaped > -- > > r...@chaffee:~/test# amrecover > AMRECOVER Version 2.5.2p1. Contacting server on localhost ... > 220 chaffee AMANDA index server (2.5.2p1) ready. > Setting restore date to today (2009-03-01) > 200 Working date set to 2009-03-01. > 200 Config set to DailySet1. > 501 Host chaffee is not in your disklist. > Trying host chaffee.localdomain ... > 200 Dump host set to chaffee.localdomain. > Use the setdisk command to choose dump disk to recover > amrecover> help > > From there, sethost, setdisk, setdate, and it looks like I'm on my > way. > > But this is not The Way It's Supposed To Work, is it? Not sure what you mean. 
If someone configured the build of amanda (specifically the amrecover part of amanda in this case) with --with-index-server=localhost, then, yes, what you experienced is the expected behavior. If you are asking if most people configure amanda that way, I'd say probably not, but who knows - I can say that I don't. If you want, you can take it up with the debian/ubuntu packager. FWIW, the default in the configure script if you don't specify --with-index-server is `uname -n`. If you want better control, you can build amanda from source yourself.
Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd
Charles Curley wrote at 11:48 -0700 on Mar 1, 2009: > On Sun, Mar 01, 2009 at 10:11:38AM -0700, John Hein wrote: > > Charles Curley wrote at 08:29 -0700 on Mar 1, 2009: > > > r...@dragon:/home/ccurley/projects/ror# amrecover -C > > /etc/amanda/DailySet1 > > > AMRECOVER Version 2.5.2p1. Contacting server on localhost ... > > > NAK: user root from localhost is not allowed to execute the service > > amindexd: Please add "amindexd amidxtaped" to the line in > > /var/backups/.amandahosts on the client > > > > ^^^ Note, "user root" > > > > > > > However, I have already added the quoted text to the .amandahosts > > > file, on both client and server, like so: > > > > > > chaffee.localdomain backup amindexd amidxtaped > > > > You have the "backup" user here. > > > > http://wiki.zmanda.com/index.php/How_To:Migrate_from_older_amanda_versions#Problems_with_amrecover_from_amanda_2.5 > > Right. I did: > > -- > chaffee.localdomain backup amdump > chaffee.localdomain root amindexd amidxtaped > -- > > and various permutations thereof on the client and the server. No go. > > I also commented out server_args in the xinetd.d/amanda file. Also no > go. Do you know how your stock ubuntu build of amanda was configured (the args to configure)? I just noticed that the request came from 'localhost' which does not match your .amandahosts entry.
Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd
Charles Curley wrote at 08:29 -0700 on Mar 1, 2009: > r...@dragon:/home/ccurley/projects/ror# amrecover -C /etc/amanda/DailySet1 > AMRECOVER Version 2.5.2p1. Contacting server on localhost ... > NAK: user root from localhost is not allowed to execute the service > amindexd: Please add "amindexd amidxtaped" to the line in > /var/backups/.amandahosts on the client ^^^ Note, "user root" > However, I have already added the quoted text to the .amandahosts > file, on both client and server, like so: > > chaffee.localdomain backup amindexd amidxtaped You have the "backup" user here. http://wiki.zmanda.com/index.php/How_To:Migrate_from_older_amanda_versions#Problems_with_amrecover_from_amanda_2.5
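Pulling the thread's working configuration into one place, a sketch of the client-side .amandahosts (host names and users mirror the poster's setup; adjust for your own):

```
# ~backup/.amandahosts -- one line per host/user pair, followed by the
# services that user may invoke. amrecover runs as root, so root needs
# amindexd and amidxtaped; the dump user only needs amdump.
chaffee.localdomain   backup  amdump
chaffee.localdomain   root    amindexd amidxtaped
localhost.localdomain root    amindexd amidxtaped
```

The localhost.localdomain line is what made the difference here, because the compiled-in index server of the packaged build was 'localhost'.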
Re: Timeout waiting for ack after adding dle
Toomas Aas wrote at 11:04 +0200 on Feb 28, 2009: > I have a single-machine (client==server) setup which has been working > well for quite a long time. It's running Amanda 2.5.1p3 on FreeBSD 6.4. > > Yesterday I added a new disk to the machine, mounted it under /db and > added corresponding entry to the disklist. On tonights backup run, > Amanda backed up first two small DLEs but all the rest (including the > newly added one) failed with: > > host.domain.ee /usr lev 1 FAILED [cannot read header: got 0 instead > of 32768] > host.domain.ee /usr lev 1 FAILED [cannot read header: got 0 instead > of 32768] > host.domain.ee /usr lev 1 FAILED [too many dumper retry: "[request > failed: timeout waiting for ACK]"] > > This shouldn't be a firewall problem, since the firewall on the > machine is set to unconditionally pass all traffic on loopback > interface and I couldn't find any relevant dropped packets in the > firewall log. Also amcheck -c passes with no errors. > > I looked at the amdump.1 file, and the first indication of any problem > is on the 3rd DLE (which is the newly added one - coincidence?): > > driver: result time 2761.656 from chunker0: FAILED 00-5 "[cannot > read header: got 0 instead of 32768]" > > (2761 seconds is approximately 04:06 local time) > > Couldn't see anything wrong before that. In the server's general error > log there are just these messages tonight: > > Feb 28 04:14:12 host sendbackup[11511]: index tee cannot write [Broken pipe] > Feb 28 04:15:02 host sendbackup[11632]: index tee cannot write [Broken pipe] sendbackup is dying early - possible your timeouts are set too low in amanda.conf. Is this new DLE big? Lots of files? It's also possible you're hitting a udp datagram size limit. This can be improved with a sysctl tweak, or a source patch or using tcp (sorry - don't recall if amanda 2.5.1 supports the latter). The client debug files might tell more. You didn't say you looked at those.
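The timeouts mentioned above live in amanda.conf on the server; a sketch with the stock defaults (values are illustrative starting points, not recommendations):

```
# amanda.conf -- server-side timeouts relevant to ACK/read failures
ctimeout 30      # seconds amcheck waits per client
etimeout 300     # seconds per DLE for the planner's estimate phase
dtimeout 1800    # seconds a dumper waits for data before giving up
```

Raising etimeout in particular helps when tar spends a long time silently walking a DLE with many excludes before producing any estimate.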
Re: Amanda and older clients
stan wrote at 16:13 -0500 on Feb 25, 2009: > It appears that the mainstream development of Amanda has taken off in a > direction that has/will result in making it impossible to compile on many > existing platforms that have been historically supported by Amanda. > > While there are good reasons for this change, it represents a major loss of > functionality for us, and I suspect many other long term Amanda users who > depend on being able to use this package to backup their older clients. > > I have been discussing this issue at length, off list, with one of the > developers of the project. His recommendation is that we create a "client > only" version of Amanda that is a fork off of the 2.5.2.x branch of the > tree. This version, as I understand it predates the need for glibc, which > as I have just discovered is unsupported on many hardware/software > architectures. I think it also predates the need for pkg-configure, which > does not seem to have the same portability issues as glib, but is IMHO an > unnecessary build time dependency, given that configure was designed for, > what I believe to be, the same need. > > I am thinking about volunteering to lead this effort, as we are in the > middle of upgrading a fairly large Amanda installation at my work, and i > have, at least, 3 OS/hardware pairs that are not supported by glib. > > I would like to hear from other users of Amanda how they feel about this. i > hope the collective wisdom of the list may help to provide some direction > for my thoughts. I am sympathetic to the needs of running old platforms. But if you need to do so, at a certain point, it becomes an exercise of self-maintenance. It's like maintaining a 50 year old car. You can't just go to Napa and get a part sometimes. Developing for a project like amanda is, to some extent, a juggling exercise. They (the developers) have to deal with a variety of OS's of various ages. 
I can understand the decision to depend on glib (not glibc, BTW) from a portability aspect. (I'm less convinced about perl, but that's another matter). glib was partly chosen _because_ it's more portable (again not glibc), but it can sometimes have edge cases when using it on older systems. This is a much more general question that applies to more than just amanda. But, that said, there is some effort expended to ensure that newer amanda servers can speak to older clients (going the other way, new client - old server, is another matter, but that works to a certain extent, too). So for older platforms, you _can_ (as others have mentioned) just freeze the amanda version on the client. Most, but not all, of the new features one would be interested in are on the server. Answering your particular notion of forking amanda, it's also another possibility to expend some effort to build the latest amanda on an old system. If you don't have to build the server code, it's a simpler task. And if you have a set of patches to, say, build on old HP-UX, sometimes they can be applied in the current code (submit to amanda-hackers). At the least, you can put the patches up on the wiki. Anyway, that's another possibility for you to consider.
Re: Weird compression results for DLE using 'compress NONE' (nocomp-root)
Tom Robinson wrote at 12:30 +1100 on Jan 22, 2009: > I've got several disks that are showing weird compression results in the > amanda report. Here's one of them: > >DUMPER STATS > TAPER STATS > HOSTNAME DISK L ORIG-KB OUT-KB COMP% MMM:SS > KB/s MMM:SS KB/s > > -- --- > host /disk 1 2031690 4063380 200.0 36:34 > 1852.3 6:27 10487.2 > > > Note the ORIG-KB blows out to twice the size! COMP% is 200.0... > > This happens on more than one disk actually. I chose this disk as it's > the biggest disk that I dump, it shows the most expansive blowout and I > noticed it first. This disk uses 'compress NONE' (dumptype is > nocomp-root). Some of the other disks showing compression weirdness are > using 'compress client fast' in their DLE's. Smells like a factor of two error somewhere (512 byte blocks vs. 1024?). What does 'env -i du -ks /disk' say?
Re: Weird compression results for DLE using 'compress NONE' (nocomp-root)
John Hein wrote at 21:38 -0700 on Jan 21, 2009: > Tom Robinson wrote at 12:30 +1100 on Jan 22, 2009: > > I've got several disks that are showing weird compression results in the > > amanda report. Here's one of them: > > > >DUMPER STATS > TAPER STATS > > HOSTNAME DISK L ORIG-KB OUT-KB COMP% MMM:SS > KB/s MMM:SS KB/s > > > -- --- > > host /disk 1 2031690 4063380 200.0 36:34 > 1852.3 6:27 10487.2 > > > > > > Note the ORIG-KB blows out to twice the size! COMP% is 200.0... > > > > This happens on more than one disk actually. I chose this disk as it's > > the biggest disk that I dump, it shows the most expansive blowout and I > > noticed it first. This disk uses 'compress NONE' (dumptype is > > nocomp-root). Some of the other disks showing compression weirdness are > > using 'compress client fast' in their DLE's. > > Smells like a factor of two error somewhere (512 byte blocks vs. 1024?). > What does 'env -i du -ks /disk' say? Never mind that last request... your report above shows a level 1, not 0. So du output won't be a useful comparison to the numbers above. Does it behave the same (x2) for level 0 dumps, too?
Re: perl errors in taper debug files
Jean-Francois Malouin wrote at 14:00 -0500 on Dec 19, 2008: > * Dustin J. Mitchell [20081219 13:51]: > > On Fri, Dec 19, 2008 at 1:28 PM, Jean-Francois Malouin > > wrote: > > > for module Amanda::Types: > > > /usr/local/share/perl/5.8.8/auto/Amanda/Types/libTypes.so: cannot open > > > shared object file: No such file or > > > directory at /usr/lib/perl/5.8/DynaLoader.pm line 225. > > > > > Should I be worried? I guess this is part of the new API > > > and all the swig stuff which I'm totally clueless about. > > > Is something not quite kosher in my local perl setup? > > > > Yep. Does that .so file exist? Sometimes libdl gives "No such file > > or directory" when it can't find the *dependencies* of a shared > > object. Assuming libTypes.so exists, what does 'ldd' tell you about > > it? What platform is this? Was it installed from a package or from > > source? > > Dustin, > > Yes, it exists: see my other post about the output of ldd. > > I think the problem might be related to having libglib dso's in > /opt/lib64 (installed from source) and not having this in > LD_LIBRARY_PATH. > > This is on Debian/Etch running 2.6.26.5-i686-64-smp. Did the ldd you ran have a different env than amanda (i.e., did your ldd run have LD_LIBRARY_PATH set?)? If it is just an LD_LIBRARY_PATH issue, you may want to consider putting /opt/lib64 into your ldconfig config (/etc/ld.*conf* under linux). Since you're on linux, you can run 'ldd -v ...' to chase down missing dependencies in the dependencies.
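One way to make the /opt/lib64 glib libraries visible without LD_LIBRARY_PATH, as suggested above; a sketch for a Debian-style system:

```
# /etc/ld.so.conf (or a file under /etc/ld.so.conf.d/) -- add the
# directory holding the glib DSOs that were installed from source
/opt/lib64
```

After editing, run `ldconfig` as root so the runtime-linker cache picks it up; `ldd -v .../libTypes.so` should then resolve the glib dependencies regardless of the calling process's environment.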
Re: perl errors in taper debug files
Jean-Francois Malouin wrote at 13:28 -0500 on Dec 19, 2008: > I get this in taper debug files on a server running 2.6.0p2: > > 1229661061.852610: taper: pid 15074 ruid 107 euid 107: start at Thu Dec 18 > 23:31:01 2008 > 1229661061.898373: taper: taper: pid 15074 executable taper version 2.6.0p2 > 1229661061.900205: taper: pid 15074 ruid 107 euid 107: rename at Thu Dec 18 > 23:31:01 2008 > 1229661061.900655: taper: getcmd: START-TAPER 20081218233101 > 1229661063.220627: taper: changer_query: changer return was 12 1 1 > 1229661063.220682: taper: changer_query: searchable = 1 > 1229661063.220692: taper: changer_find: looking for av24-1_left1_S00041L3 > changer is searchable = 1 > 1229661063.220703: taper: changer_search: av24-1_left1_S00041L3 > Can't load '/usr/local/share/perl/5.8.8/auto/Amanda/Types/libTypes.so' > for module Amanda::Types: > /usr/local/share/perl/5.8.8/auto/Amanda/Types/libTypes.so: cannot open > shared object file: No such file or > directory at /usr/lib/perl/5.8/DynaLoader.pm line 225. > at /usr/local/share/perl/5.8.8/Amanda/Types.pm line 11 > Compilation failed in require at > /usr/local/share/perl/5.8.8/Amanda/Device.pm line 10. > Compilation failed in require at /opt/amanda-2.6.0p2/sbin/amdevcheck > line 4. > BEGIN failed--compilation aborted at /opt/amanda-2.6.0p2/sbin/amdevcheck > line 4. > 1229661188.511546: taper: device_read_label; mode = 0 > ... > > Should I be worried? I guess this is part of the new API > and all the swig stuff which I'm totally clueless about. > Is something not quite kosher in my local perl setup? What's the output of ldd? ldd /usr/local/share/perl/5.8.8/auto/Amanda/Types/libTypes.so And if it doesn't exist in that location, then that's your problem. If so, you can try to convince amanda/perl to look for it elsewhere or install it to the expected location.
Re: Tape changer question
Tim Bunnell wrote at 17:03 -0500 on Dec 13, 2008: > Folks, > > We're running Amanda (version 2.5.1p1) on a Debian Linux system with an > 8-tape AIT-2 library. We have around 314GB spread over two file systems > that we are attempting to backup in one run across as many tapes as > necessary. We're using gzip compression and expect it will take 5-6 > tapes to complete (there's a fair amount of audio and image data that > doesn't compress too well). > > I think we have the config files set up correctly, but it seems like no > matter what we do, the run stops (after about 16 hours) and reports that > it's out of tape. I don't think it has ever succeeded in spanning more > than 4 tapes before giving us the error. I see nothing in the .debug > output for the changer that looks different for any tapes it changes. > > I'm sort of at a loss for where to start looking for the problem, and > what to look for. Any suggestions from the list? Do you really know the tape capacity for your tapes? Some AIT-2 flavors are 36 GB, some are 50 GB, it seems. 36*8 < 314 Have you run amtapetype to verify? (see http://wiki.zmanda.com/index.php/Tapetype_definitions) Do you have hardware compression off?
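The capacity arithmetic above can be checked directly; this assumes the smaller 36 GB AIT-2 flavor and ignores compression entirely:

```shell
# 8 tapes at 36 GB (native) each, vs. the 314 GB to be backed up
tapes=8
per_tape_gb=36
total=$((tapes * per_tape_gb))
echo "library capacity: ${total} GB"   # 288 GB, short of 314 GB
```

If the tapes turn out to be the 50 GB flavor, the same arithmetic gives 400 GB, which would comfortably fit; that is exactly why verifying real capacity with amtapetype matters.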
Re: Tape changer question
John Hein wrote at 16:00 -0700 on Dec 13, 2008: > Tim Bunnell wrote at 17:03 -0500 on Dec 13, 2008: > > Folks, > > > > We're running Amanda (version 2.5.1p1) on a Debian Linux system with an > > 8-tape AIT-2 library. We have around 314GB spread over two file systems > > that we are attempting to backup in one run across as many tapes as > > necessary. We're using gzip compression and expect it will take 5-6 > > tapes to complete (there's a fair amount of audio and image data that > > doesn't compress too well). > > > > I think we have the config files set up correctly, but it seems like no > > matter what we do, the run stops (after about 16 hours) and reports that > > it's out of tape. I don't think it has ever succeeded in spanning more > > than 4 tapes before giving us the error. I see nothing in the .debug > > output for the changer that looks different for any tapes it changes. > > > > I'm sort of at a loss for where to start looking for the problem, and > > what to look for. Any suggestions from the list? > > Do you really know the tape capacity for your tapes? > Some AIT-2 flavors are 36 GB, some are 50 GB, it seems. > 36*8 < 314 > > Have you run amtapetype to verify? > (see http://wiki.zmanda.com/index.php/Tapetype_definitions) > Do you have hardware compression off? Sorry, I just re-read and saw that it only used 4 tapes. What is runtapes set to? Somewhere in the logs, it should explain why it's not going past 4.
Re: Stranded on waitq failure (planner: Message too long)
Leon Meßner wrote at 12:22 +0100 on Nov 6, 2008: > Hi, > > On Wed, Nov 05, 2008 at 06:46:52PM -0500, Jean-Louis Martineau wrote: > > Ian Turner wrote: > >> I don't know if 2.5.1 is old enough to qualify for this issue, but it used > >> to be the case that the entire set of disklists for a client had to fit in > >> a single packet. What that meant is that if you had more than a few dozen > >> disks on one client (depending on disklist options), you would run into > >> this issue. > > On this backupset i have 28 dle's. > > >> > >> The solution is to upgrade, but a workaround is to create a second IP > >> address and DNS name on the same physical client, and move some of the > >> disklist entries to the latter. > > I'm running the latest Amanda from the Ports. Perhaps i should ask the > maintainer about updating to 2.6.x. I don't know why the port uses this > old version. The maintainer seems to be quite active. > > > Or change to the 'bsdtcp' auth. > > > Thanks for your solutions, changing net.inet.udp.maxdgram to 65535 > helped also (FreeBSD's default is 9k ;). I've been applying the following patch to bump up the max datagram size since amanda 2.4.1 (minor differences per version mostly due to changes in dbprintf) when I started seeing packet size limit problems even with a modest number of DLEs (it was a particular DLE that had a lot of files that first caused the problem). It works on all OS's, not just BSDs. At one point I submitted it on -hackers, but it never got committed. The maxdgram sysctl is global to the system. This patch gives you a little finer control. Against 2.5.1p1 - 2.5.x ... 
--- common-src/dgram.c.orig Wed Sep 20 06:48:54 2006 +++ common-src/dgram.c Wed Sep 27 13:43:07 2006 @@ -57,6 +57,7 @@ dgram_bind( socklen_t len; struct sockaddr_in name; int save_errno; +int sndbufsize = MAX_DGRAM; *portp = (in_port_t)0; if((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1) { @@ -75,6 +76,10 @@ dgram_bind( errno = EMFILE; /* out of range */ return -1; } +if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *) &sndbufsize, + sizeof(sndbufsize)) < 0) + dbprintf(("%s: dgram_bind: could not set udp send buffer to %d\n", + debug_prefix(NULL), sndbufsize)); memset(&name, 0, SIZEOF(name)); name.sin_family = (sa_family_t)AF_INET; Against 2.6.x ... --- common-src/dgram.c.orig Fri May 30 11:44:36 2008 +++ common-src/dgram.c Fri Aug 22 13:19:56 2008 @@ -250,6 +250,7 @@ socklen_t_equiv addrlen; ssize_t nfound; int save_errno; +int sndbufsize = MAX_DGRAM; sock = dgram->socket; @@ -286,6 +287,10 @@ errno = save_errno; return nfound; } +if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *) &sndbufsize, + sizeof(sndbufsize)) < 0) + dbprintf("%s: dgram_bind: could not set udp send buffer to %d\n", + strerror(save_errno), sndbufsize); addrlen = (socklen_t_equiv)sizeof(sockaddr_union); size = recvfrom(sock, dgram->data, (size_t)MAX_DGRAM, 0,
Re: amrecover ignores --prefix
Steven Backus wrote at 17:50 -0700 on Nov 6, 2008: > I'm doing a trial installation of amanda and don't want to mess > up my regular install, so I compiled with --prefix=/local. > Regardless of this, amrecover is still looking for my config files > in /usr/local/etc/amanda/ instead of /local/etc/amanda/. > Is this expected behavior, a bug or ? I'm using amanda-2.6.0p2. What does 'amadmin version' say for CONFIG_DIR? FWIW, --prefix works for me.
Re: DUMP: You can't update the dumpdates file when dumping a subdirectory
Dustin J. Mitchell wrote at 00:41 -0400 on Oct 3, 2008: > On Thu, Oct 2, 2008 at 9:03 PM, Aaron J. Grier <[EMAIL PROTECTED]> wrote: > > I beg your pardon, but /sbin/dump is perfectly capable of dumping > > subdirectories on most unixes. it just won't record (or read) the date > > of the dump in /etc/dumpdates. > > I'm happy to be proven wrong (I've not used dump myself), but it was > my understanding that dump, in general, worked at the filesystem > level, against a block device. For example, the OSX manpage (which is > just inherited from the BSD manpages) says: > > dump [-0123456789cnu] [-B records] [-b blocksize] [-d density] > [-f file] [-h level] [-s feet] [-T date] filesystem > > where the use of the term "filesystem" means, to my understanding, a > filesystem and not an arbitrary subdirectory. Now, you may have a > filesystem mounted at /usr/local, in which case you can use dump to > back up /usr/local, but I don't think that's what you meant. > > Can you point to some documentation to support your assertion? This was touched on as part of a larger thread in April. http://www.mail-archive.com/amanda-users@amanda.org/msg40187.html In short, some OS's support a dump with files, but it's not clear how well that is supported (or will work at all) in amanda, particularly with incremental dumps. I think it'd be great if we could use the native filesystem's dump (or dump-like) tool rather than gtar. Then you can backup filesystem specific attributes that gtar may not handle. (and you don't touch atime, which has always been an annoying quibble with having to use tar). Having to dump the whole filesystem, of course, is really the big hurdle with using a dump or dump-like tool.
Re: amandad args in inetd.conf
[EMAIL PROTECTED] wrote at 13:09 -0400 on Sep 26, 2008: > Hmm, thanks for mentioning. If I condense down my browser window, > the text goes beyond the bounding box but I can still scroll over > to see the full line of text. Are you not getting the same? Exactly the same. I was wondering if there is a wiki markup directive to have the bounding box extend to the end of the text rather than the border of the window. > > And somehow between 2.5 & 2.6, the .amandahost docs in amanda(8) lost > > the docs for the 'service(s)' field. > > > > I'm seeing the following in 2.6.0p2's amanda(8): Woops. My fault... bad zgrep for the strings in the various flavors of amanda I have installed (man pages in the 2.6* flavors moved into /share/man instead of /man).
Re: amandad args in inetd.conf
Paul Yeatman wrote at 16:59 -0700 on Sep 24, 2008: > Online Amanda documentation for inetd.conf configuration for both > server and client are found on the Amanda wiki site here > > server: > http://wiki.zmanda.com/index.php/Configuring_bsd/bsdudp/bsdtcp_authentication#xinetd.2Finetd_configuration_file_changes Thanks. Perhaps someone should add this info to the man pages. I wonder how one might go about fixing the wikimedia markup to extend the bounding box in the 'inetd.conf' example (assuming your browser window is not really wide). And somehow between 2.5 & 2.6, the .amandahost docs in amanda(8) lost the docs for the 'service(s)' field. > client: > http://wiki.zmanda.com/index.php/Quick_start#Configuring_inetd_on_the_client Thanks again.
Re: dumpcycle
Richard Stockton wrote at 16:42 -0700 on Sep 23, 2008: > At 04:29 PM 9/23/2008, you wrote: > > > How do I force a full dump under ALL circumstances? > > > (I do have my disklist set to "always-full"). > > > >Is your holding disk sufficiently large? > > Yes. 260 Gigs to hold about 240 Gigs of backup in 2 pieces (100+140) > > >Have you changed the "reserve" for incrementals parameter > >from its default 100%? > > I have now. I have also set (in the "always-full" dumptype): > strategy noinc > skip-incr > > Do I need all three set as above? Or would just the "reserve 0" > take care of it? I _think_ noinc is sufficient, but I'm not sure. I've always added 'reserve 0' in our "everything level 0" config. And I think that's a good idea (if you definitely don't want amanda to try dumping incrementals in degraded mode) in case of running out of tapes, etc. I don't think skip-incr is necessary, but we do have it. I don't know about dumpcycle 0, but I don't use it. I've always considered that amanda treats dumpcycle as advisory. If it can meet the dumpcycle number, it will, but between promotions (not really applicable for dumpcycle 0) and delays (usually due to space), it's not guaranteed.
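Pulling the pieces of this thread together, a sketch of an always-full configuration (the dumptype name and priority are illustrative; strategy noinc, skip-incr, and reserve 0 are the settings discussed above):

```
# amanda.conf -- force full dumps; leave no holding space for incrementals
reserve 0            # in degraded mode, still allow fulls on the holding disk

define dumptype always-full {
    global
    comment "level 0 every run"
    compress none
    strategy noinc   # never schedule incrementals
    skip-incr        # belt and suspenders, per the thread
    priority high
}
```

With reserve 0 set, amanda won't hold back holding-disk space for incrementals when it runs out of tape, which is the failure mode being guarded against here.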
Re: dumpcycle
John Hein wrote at 17:22 -0600 on Sep 23, 2008: > Richard Stockton wrote at 14:28 -0700 on Sep 23, 2008: > > How do I force a full dump under ALL circumstances? > > (I do have my disklist set to "always-full"). > > Put these in your global dumptype settings... > strategy noinc > skip-incr You probably also want to set 'reserve 0'.
Re: dumpcycle
Richard Stockton wrote at 14:28 -0700 on Sep 23, 2008: > How do I force a full dump under ALL circumstances? > (I do have my disklist set to "always-full"). Put these in your global dumptype settings... strategy noinc skip-incr
Re: amandad args in inetd.conf
Olivier Cherrier wrote at 18:46 -0400 on Sep 23, 2008: > On Tue, Sep 23, 2008 at 10:27:26AM -0600, [EMAIL PROTECTED] wrote: > > Where are the docs for what args need to be added to amandad in > > inetd.conf? > > > > I added amindexd and amidxtaped on the backup server in order to do > > amrecover, but then amcheck failed (needed noop, then selfcheck). > > Then amdump failed (needed sendsize, ...). > > I think you have to populate your ~amandauser/.amandahosts with > something like that (amandauser = operator for me) : > yourHost operator amdump > > >From amanda(8): "amdump is a shortcut for "noop selfcheck sendsize > sendbackup" Ah... difference between amanda-2.5 and amanda-2.6. At least a documentation difference. 2.5.* seems to be missing that documentation in the context of inetd.conf and 2.6.* doesn't have that tidbit at all (in fact it doesn't seem to support the service entry in .amandahosts in the 2.6 man page). I'm not sure if that's an oversight or deliberate. I also wonder if that alias is valid in inetd.conf. > > I see the full list in amandad.c, and I think I understand why clients > > don't need the addition. It defaults to all services except amindexd > > and amidxtaped being active. But when you activate those two on the > > server, it seems it _de_activates the others. > > I myself configured inetd.conf like that: > $ grep amandad /etc/inetd.conf > amanda dgram udp wait operator /usr/local/libexec/amanda/amandad > amandad amindexd amidxtaped So you don't backup your server? Or you use .amandahosts to specify the services that amandad can run on the server? I guess I don't see a big difference between editing .amandahosts vs. inetd.conf
amandad args in inetd.conf
Where are the docs for what args need to be added to amandad in inetd.conf? I added amindexd and amidxtaped on the backup server in order to do amrecover, but then amcheck failed (needed noop, then selfcheck). Then amdump failed (needed sendsize, ...). I see the full list in amandad.c, and I think I understand why clients don't need the addition. It defaults to all services except amindexd and amidxtaped being active. But when you activate those two on the server, it seems it _de_activates the others. I didn't see it in the docs in the tarball and the wiki search leaves a bit to be desired. Where is it documented for the non-source diving crowd? Also, why would anyone want noop, selfcheck, sendbackup, etc., disabled? Is it a use case that some people don't back up their backup servers? I've been using a 2.4.5 server for so long, I've missed some details.
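For the archives, a sketch of a server-side inetd.conf entry that names every service explicitly, so enabling amindexd/amidxtaped does not implicitly drop the client-side ones; the install path and the 'backup' user are assumptions that vary per build:

```
# /etc/inetd.conf -- amandad with all services spelled out (one line);
# 'amanda' must resolve to 10080/udp in /etc/services
amanda dgram udp wait backup /usr/local/libexec/amandad amandad noop selfcheck sendsize sendbackup amindexd amidxtaped
```

The explicit list sidesteps the question raised in the thread of whether the 'amdump' shortcut alias is honored in inetd.conf as well as in .amandahosts.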
Re: Single tape changer process
Dustin J. Mitchell wrote at 09:32 -0400 on Sep 12, 2008: > On Thu, Sep 11, 2008 at 11:53 PM, Olivier Nicole <[EMAIL PROTECTED]> wrote: > > Is there a mechanism in Amanda to ensure that only a single tape > > changer process is running at any given time? > > No -- and this poses a problem for processes that want to move data > between devices. I'm working on such a process, and for that reason > I'm working on an overhaul of the changer API at the moment. The key > problem with the existing API is that it has no way to indicate that a > process is finished with a device, and that the changer can load a new > volume into that device. I use lockf(1) in a wrapper script to protect accesses to a resource where amanda (currently) does not.
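A sketch of the wrapper idea described above, using flock(1) where available (lockf(1) is the FreeBSD tool mentioned in the post); the changer path is a placeholder:

```shell
# run_locked: serialize access to the tape changer across concurrent
# amanda processes by holding an exclusive lock for the command's duration
run_locked() {
    lockfile=$1; shift
    flock "$lockfile" "$@"    # FreeBSD equivalent: lockf -k "$lockfile" "$@"
}

# e.g. inside a chg-* wrapper script (path is hypothetical):
# run_locked /var/lock/amanda-changer /usr/local/libexec/amanda/chg-zd-mtx "$@"
run_locked /tmp/demo.lock echo "changer op done"
```

A second invocation started while the first holds the lock simply blocks until the lock is released, which is exactly the single-changer-process guarantee amanda itself does not provide here.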
Re: planner: could not lock log file
Dustin J. Mitchell wrote at 09:40 -0400 on Sep 12, 2008: > > creating amflock-test > > make: don't know how to make amgpgcrypt. Stop > > *** Error code 2 > > Stop in /root/amanda-2.6.0p2/common-src. > > *** Error code 1 > > Stop in /root/amanda-2.6.0p2. > > web# > > Amanda requires GNU make now (gmake). Indeed. I'm so used to visually transforming make to gmake (since gmake is spelled make in linux) that I missed that detail of the problem report. I think amanda has required gmake for quite a while, hasn't it?
Re: planner: could not lock log file
Taalaibek Ashirov wrote at 14:58 +0300 on Sep 12, 2008: > > Let me recant, it may be amanda at fault or a combination. If you add > > the following to the initialization of 'lock' in the test, does your > > problem go away? > > > > lock.l_start = 0; > > lock.l_len = 0; > > > > If the short test passes, add those lines to amflock() in amanda's > > common-src/amflock.c, rebuild amanda and give it another shot. > > Yes, in the test the problem goes away. Actually, I've installed amanda > from ports. Now I deinstalled amanda and got latest source version from > amandas website, added those lines to amflock(). When I did make it gave > me error like "lock is not declared". I declared lock same as test > (struct flock lock;). This time I got this: > > ... > creating amflock-test > make: don't know how to make amgpgcrypt. Stop > *** Error code 2 > Stop in /root/amanda-2.6.0p2/common-src. > *** Error code 1 > Stop in /root/amanda-2.6.0p2. > web# > > Is there any thing else I missed? That's likely a 'configure' problem (to investigate, you could look at differences in the configure stage between your build from source and the ports build from source - the config.log file shows what was passed to configure). Maybe just go back to building from the port... cd ports/misc/amanda-server make patch add the lines to amflock.c make sudo make install
Re: planner: could not lock log file
John Hein wrote at 07:19 -0600 on Sep 11, 2008:
> Taalaibek Ashirov wrote at 10:31 +0300 on Sep 11, 2008:
> > On Wed, 2008-09-10 at 09:56 -0600, John Hein wrote:
> > > What happens when you compile and run this (as the backup user)?
> > >
> > > #include <fcntl.h>
> > > #include <err.h>
> > > #include <unistd.h>
> > >
> > > int
> > > main()
> > > {
> > >     struct flock lock;
> > >     int fd = open("/var/log/amanda/dotProject/foo", O_RDWR | O_CREAT);
> > >     if (fd < 0) err(1, "open");
> > >
> > >     lock.l_type = F_WRLCK;
> > >     lock.l_whence = SEEK_SET;
> > >     int r = fcntl(fd, F_SETLKW, &lock);
> > >     if (r < 0) err(1, "fnctl");
> > >     return 0;
> > > }
> >
> > Hi John! Thank you for your efforts. I got the same error:
> >
> > $ ./test
> > test: fnctl: Invalid argument
>
> Then it's an issue with your system somehow, not amanda.
>
> Looking at src/sys/kern/kern_descrip.c, you can get EINVAL if you
> pass an l_type that is not F_RDLCK, F_WRLCK or F_UNLCK.
>
> Try adding the printf below and rebuilding your kernel. Then run the
> above test. Look for the printf in dmesg (or /var/log/messages if you
> are using a default syslog.conf).
>
> Index: kern_descrip.c
> ===
> RCS file: /base/FreeBSD-CVS/src/sys/kern/kern_descrip.c,v
> retrieving revision 1.279.2.15.2.1
> diff -u -p -r1.279.2.15.2.1 kern_descrip.c
> --- kern_descrip.c 14 Feb 2008 11:46:40 - 1.279.2.15.2.1
> +++ kern_descrip.c 11 Sep 2008 13:17:25 -
> @@ -533,6 +533,7 @@ kern_fcntl(struct thread *td, int fd, in
>  			    flp, F_POSIX);
>  			break;
>  		default:
> +			printf("invalid l_type: %#x\n", flp->l_type);
>  			error = EINVAL;
>  			break;
>  		}

Let me recant; it may be amanda at fault, or a combination. If you add
the following to the initialization of 'lock' in the test, does your
problem go away?

  lock.l_start = 0;
  lock.l_len = 0;

If the short test passes, add those lines to amflock() in amanda's
common-src/amflock.c, rebuild amanda and give it another shot.
Re: planner: could not lock log file
Taalaibek Ashirov wrote at 10:31 +0300 on Sep 11, 2008:
> On Wed, 2008-09-10 at 09:56 -0600, John Hein wrote:
> > What happens when you compile and run this (as the backup user)?
> >
> > #include <fcntl.h>
> > #include <err.h>
> > #include <unistd.h>
> >
> > int
> > main()
> > {
> >     struct flock lock;
> >     int fd = open("/var/log/amanda/dotProject/foo", O_RDWR | O_CREAT);
> >     if (fd < 0) err(1, "open");
> >
> >     lock.l_type = F_WRLCK;
> >     lock.l_whence = SEEK_SET;
> >     int r = fcntl(fd, F_SETLKW, &lock);
> >     if (r < 0) err(1, "fnctl");
> >     return 0;
> > }
>
> Hi John! Thank you for your efforts. I got the same error:
>
> $ ./test
> test: fnctl: Invalid argument

Then it's an issue with your system somehow, not amanda.

Looking at src/sys/kern/kern_descrip.c, you can get EINVAL if you pass
an l_type that is not F_RDLCK, F_WRLCK or F_UNLCK.

Try adding the printf below and rebuilding your kernel. Then run the
above test. Look for the printf in dmesg (or /var/log/messages if you
are using a default syslog.conf).

Index: kern_descrip.c
===
RCS file: /base/FreeBSD-CVS/src/sys/kern/kern_descrip.c,v
retrieving revision 1.279.2.15.2.1
diff -u -p -r1.279.2.15.2.1 kern_descrip.c
--- kern_descrip.c 14 Feb 2008 11:46:40 - 1.279.2.15.2.1
+++ kern_descrip.c 11 Sep 2008 13:17:25 -
@@ -533,6 +533,7 @@ kern_fcntl(struct thread *td, int fd, in
 			    flp, F_POSIX);
 			break;
 		default:
+			printf("invalid l_type: %#x\n", flp->l_type);
 			error = EINVAL;
 			break;
 		}
Re: planner: could not lock log file
Taalaibek Ashirov wrote at 18:40 +0300 on Sep 10, 2008:
> On Wed, 2008-09-10 at 09:20 -0600, John Hein wrote:
> > John Hein wrote at 08:23 -0600 on Sep 10, 2008:
> > > Out of curiosity, what is the output of 'df /var/log/amanda/dotProject'?
> >
> > And the output of mount.
>
> web# df -h /var/log/amanda/dotProject/
> Filesystem     Size    Used   Avail  Capacity  Mounted on
> /dev/da0s1a     30G     12G     16G       43%  /
> web# mount
> /dev/da0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/da1s1d on /home (ufs, local, soft-updates)
> /dev/da0s1d on /tmp (ufs, local, soft-updates)
> linprocfs on /usr/compat/linux/proc (linprocfs, local)

Since that's plain old ufs, that rules out issues with wonderful and
exotic filesystems.

What happens when you compile and run this (as the backup user)?

#include <fcntl.h>
#include <err.h>
#include <unistd.h>

int
main()
{
    struct flock lock;
    int fd = open("/var/log/amanda/dotProject/foo", O_RDWR | O_CREAT);
    if (fd < 0) err(1, "open");

    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    int r = fcntl(fd, F_SETLKW, &lock);
    if (r < 0) err(1, "fnctl");
    return 0;
}
Re: planner: could not lock log file
Taalaibek Ashirov wrote at 15:16 +0300 on Sep 10, 2008:
> I am using FreeBSD 6.3 and amanda 2.5.1p3. Running amcheck reports no
> errors. All the debug files show no errors.
>
> But I got "Result Missing" errors back in the amreport after amdump.
> Below is the error in the amdump.1 file under the log location. The
> log.error.1 file is created without anything in it.
>
> What can I do to fix this error? Please help. Thank you very much.
>
> READING CONF FILES...
> driver: pid 67085 executable /usr/local/libexec/amanda/driver version 2.5.1p3
> planner: could not lock log file /var/log/amanda/dotProject/log: Invalid argument
> driver: could not lock log file /var/log/amanda/dotProject/log: Invalid argument
> -
>
> $ amadmin x version | grep LOCKING
> LOCKING=POSIX_FCNTL DEBUG_CODE AMANDA_DEBUG_DAYS=4

From the fcntl(3) man page...

  [EINVAL]  The cmd argument is F_DUPFD and arg is negative or greater
            than the maximum allowable number (see getdtablesize(2)).
            The argument cmd is F_GETLK, F_SETLK or F_SETLKW and the
            data to which arg points is not valid.

And the amanda code to do the lock...

  int
  amflock(
      int fd,
      char *resource)
  {
      int r;
  #ifdef USE_POSIX_FCNTL
      (void)resource; /* Quiet unused paramater warning */
      lock.l_type = F_WRLCK;
      lock.l_whence = SEEK_SET;
      r = fcntl(fd, F_SETLKW, &lock);
  #else

It's not immediately obvious why this would cause EINVAL.

Out of curiosity, what is the output of 'df /var/log/amanda/dotProject'?
Re: caution: gtar 1.20 & amanda < 2.5.1
Jon LaBadie wrote at 01:57 -0400 on Sep 7, 2008:
> On Sun, Sep 07, 2008 at 01:46:50AM -0400, Jon LaBadie wrote:
> > On Sat, Sep 06, 2008 at 10:44:11PM -0600, John Hein wrote:
> > > Someone may already know about this, but using gtar > 1.15.1 and
> > > amanda < 2.5.1 will not work very well.
> > >
> > > The format of the "listed incremental" file has changed. Among
> > > other things, the entries are now separated by '\0' "null" bytes
> > > rather than newlines. [I'm not exactly sure why since it doesn't
> > > save any space and I don't think '\n' is a valid character in a
> > > posix file name].
> >
> > After a quick search I did not find a reference for this, but I'd
> > be surprised if posix did not allow \n as a valid file name char.
> > For the multiple decades I've used unix, it has always been valid.
> > If not specifically allowed, it may be one of those undefined
> > things that leaves it to the locale or character set.
>
> I just missed it.
>
> The only characters not allowed are slash '/' and the null byte '\0'.

Indeed. I just created a file with a \n. Finding different ways of
accessing it via the shell can provide hours of fun.

So let me rephrase: I can't think of any reason anyone would want to
put '\n' in a file name... except to make access to it harder. ;)
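For anyone who wants to reproduce the fun, a quick sketch (the directory and file name below are throwaway examples):

```shell
# '\n' is a legal file-name byte on POSIX systems; only '/' and NUL
# are forbidden -- which is exactly why tools increasingly use
# NUL-separated name lists.
dir=$(mktemp -d)
name="$dir/$(printf 'evil\nname')"

: > "$name"                          # create the awkwardly named file

# Line-oriented tools get confused: one file, two lines of ls output
ls "$dir" | wc -l

# A NUL-separated listing keeps it as a single record
find "$dir" -type f -print0 | tr -dc '\000' | wc -c
```

The first count comes out as 2 for a single file, which is the "hours of fun" in a nutshell.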
Re: caution: gtar 1.20 & amanda < 2.5.1
Dustin J. Mitchell wrote at 10:01 -0400 on Sep 7, 2008:
> On Sun, Sep 7, 2008 at 12:44 AM, John Hein <[EMAIL PROTECTED]> wrote:
> > There was a zmanda wiki page that described issues with various
> > gnutar and amanda version combinations, but I can't seem to find it
> > at the moment (the search doesn't turn it up that I saw).
> >
> > If someone finds it, let me know and I'll try to see that this info
> > gets added.
>
> The link is:
>
> http://wiki.zmanda.com/index.php/FAQ:What_versions_of_GNU_Tar_are_Amanda-compatible%3F
>
> which does specify that those versions will not get along, but not
> for the reason you described, so it's definitely worth an edit.

Thanks. I'll update it, including patches to fix both issues.

Just curious... why doesn't a wiki search for gnutar or gnu tar find
this?
caution: gtar 1.20 & amanda < 2.5.1
Someone may already know about this, but using gtar > 1.15.1 and
amanda < 2.5.1 will not work very well.

The format of the "listed incremental" file has changed. Among other
things, the entries are now separated by '\0' "null" bytes rather than
newlines. [I'm not exactly sure why, since it doesn't save any space
and I don't think '\n' is a valid character in a posix file name.]

This causes trouble for amanda < 2.5.1, which tries to read in the
"old" snapshot file and copy it to a new one in an fgets/fputs loop
that explicitly appends a newline to the copy whether there was one in
the original or not. With the old format (gnutar <= 1.15.1), this was
not a problem since it used newlines.

Not only does amanda add newlines (which gtar 1.20 chokes on because
it explicitly looks for '\0' characters and dies a fatal death on
seeing '\n' instead - see read_unsigned_num() in src/incremen.c), but
it also truncates the file, because amanda stops processing the file
early due to the null bytes.

As a result, you will see things like...

  yoyoma / lev 0 FAILED [dumps too big, 1 KB, but no incremental estimate]

and/or

  planner: disk furble:/usr, estimate of level 1 failed: -1.

There are some other oddities that I haven't fully figured out yet
(I'm not sure they are fatal), but without patching
client-src/sendbackup-gnutar.c to do what 2.5.1 and later does (or
updating amanda), this problem is a showstopper.

This could be particularly troublesome for clients that you can't
update to a newer amanda (but more than likely in that case, you won't
be updating them to gnutar 1.20 either).

There was a zmanda wiki page that described issues with various gnutar
and amanda version combinations, but I can't seem to find it at the
moment (the search doesn't turn it up that I saw). If someone finds
it, let me know and I'll try to see that this info gets added.

Separate issue, but worth a mention...
Note also that changing from gnutar <= 1.15.1 to a later version (if you already have incremental dump files in your gnutar-lists directories) will cause some very large incremental dumps to happen because of some details of the format change that I won't go into here.
dumps way too big, must skip incremental dumps
Yes, it's the classic problem. I understand the cause, but I have a
question.

A little background for those who don't know about this one...
=

Last night I got a bunch of these...

  elmer /hr lev 3 FAILED [dumps way too big, 9065 KB, must skip incremental dumps]

As a result, lots of DLEs were just skipped. This can happen if, for
instance, one DLE had lots of changes and its incremental dump is much
larger than normal. Or you have one DLE that is extremely large and,
when it does its level 0 dump, it squeezes out all the other DLEs. So
the planner decides it can't fit all the dumps in the configured tape
size.

The question...
=

It's been a while since I've looked at all the shiny new knobs that
amanda has grown (generally we just leave amanda as is), but have we
grown anything that allows amanda to at least _try_ to dump such
'skipped' incrementals to holding disk?

[note... our server is still amanda-2.4.2p2, so maybe this has been
fixed in a different way in 2.5 or later and I just haven't noticed]
incremental with gnutar bogusly dumping old files
The other night, a number of incremental dumps included a lot of files
that should not have been dumped. As a result, I got a number of
'dumps way too big' failure messages, causing a number of DLEs to not
get dumped since the planner decided there was no room.

For example, I have an old file, foo, somewhere under directory /xxx...

  stat -x foo
    File: "foo"
    Size: 296422       FileType: Regular File
    Mode: (0444/-r--r--r--)  Uid: ( 631/ nrg)  Gid: ( 2005/ web)
  Device: 0,86   Inode: 28006660   Links: 1
  Access: Tue Sep 25 06:29:20 2007
  Modify: Mon Feb 23 14:34:18 2004
  Change: Mon Sep 17 15:04:20 2007

  zgrep foo index/host/_xxx_4.gz
  foo

and from /tmp/amanda/sendbackup.20070925052340.debug on the client...

  sendbackup-gnutar: time 0.433: doing level 4 dump as
  listed-incremental from /local/backup/amanda/gnutar-lists/xxx_3 to
  /local/backup/amanda/gnutar-lists/xxx_4.new
  sendbackup-gnutar: time 0.477: doing level 4 dump from date: 2007-09-18 9:31:24 GMT
  sendbackup: time 0.530: spawning /site/dist/amanda-2.4.5/libexec/runtar in pipeline
  sendbackup: argument list: gtar --create --file - --directory /xxx
  --one-file-system --listed-incremental
  /local/backup/amanda/gnutar-lists/xxx_4.new --sparse
  --ignore-failed-read --totals --exclude-from
  /tmp/amanda/sendbackup._xxx.20070925052341.exclude .

  grep xxx.3 /etc/amandates
  /r/cvs 3 1190107884

  env TZ=GMT date -r 1190107884
  Tue Sep 18 09:31:24 GMT 2007

This matches the debug output above, and is well beyond the mtime of
the file in question: Feb 23, 2004.

  head -1 gnutar-lists/xxx_3
  1190113536

gtar 1.15.1, amanda 2.4.5.

From the above debug, it looks like amanda is doing the right thing.
The only thing I can think of is an obscure gtar bug that doesn't work
with certain dates, or that the date at the top of xxx_4.new (used by
--listed-incremental, I believe) was somehow wrong after it got copied
from xxx_3.

There were no 'filesystem full' problems. Before last night, it did 7
nights at level 4 with no such problems.
Last night, it pretty much did a level 0 as far as I can see (the
index file looks like it has every file under /xxx), even though it
claimed to be doing a level 4. Some files did change under /xxx, but
the vast majority did not and still got dumped by gtar.

The estimate was way too big, too...

  planner: time 12274.025: got result for host yiff disk /xxx: 0 -> 9389110K, 4 -> 9389100K, -1 -> -2K

Has anyone else ever seen this behavior?