Re: amrecover failing

2014-07-19 Thread John Hein
Trever L. Adams wrote at 05:49 -0600 on Jul 19, 2014:
  Hello everyone,
 
  So, I am not quite sure what is going on. When I try to do an amrecover,
  I get the following:
 
  Load tape normal149 now
  Continue [?/Y/n/s/d]? Y
  Got no header and data from server, check in amidxtaped.*.debug and
  amandad.*.debug files on server
 
  The logs hold such things as:
 
  Sat Jul 19 05:45:39 2014: thd-0x1a16a00: amidxtaped: warning: Can't exec
  "-eo": No such file or directory at
  /usr/lib64/perl5/vendor_perl/Amanda/Process.pm line 176.
 
  Sat Jul 19 05:45:39 2014: thd-0x1a16a00: amidxtaped: critical (fatal):
  -eo pid,ppid,command: No such file or directory at
  /usr/lib64/perl5/vendor_perl/Amanda/Process.pm line 176.
 
  amidxtaped:  -eo pid,ppid,command: No such file or directory at
  /usr/lib64/perl5/vendor_perl/Amanda/Process.pm line 176.
 
  /lib64/libamanda-3.3.3.so(+0x2b727)[0x7f3e7d9d0727]
  /lib64/libglib-2.0.so.0(g_logv+0x209)[0x7f3e7d6c9429]
  /lib64/libglib-2.0.so.0(g_log+0x8f)[0x7f3e7d6c963f]
  /usr/lib64/perl5/vendor_perl/auto/Amanda/MainLoop/libMainLoop.so(+0x4925)[0x7f3e7c5ad925]
  /lib64/libglib-2.0.so.0(+0x3f89449e43)[0x7f3e7d6c2e43]
  /lib64/libglib-2.0.so.0(g_main_context_dispatch+0x166)[0x7f3e7d6c22a6]
  /lib64/libglib-2.0.so.0(+0x3f89449628)[0x7f3e7d6c2628]
  /lib64/libglib-2.0.so.0(g_main_loop_run+0x6a)[0x7f3e7d6c2a3a]
  /usr/lib64/perl5/vendor_perl/auto/Amanda/MainLoop/libMainLoop.so(_wrap_run_c+0x50)[0x7f3e7c5adca0]
  /lib64/libperl.so.5.18(Perl_pp_entersub+0x5c6)[0x3008ec33e6]
  /lib64/libperl.so.5.18(Perl_runops_standard+0x2e)[0x3008ebb81e]
  /lib64/libperl.so.5.18(perl_run+0x300)[0x3008e52d40]
  /usr/bin/perl[0x400d29]
  /lib64/libc.so.6(__libc_start_main+0xf5)[0x3f86c21d65]
  /usr/bin/perl[0x400d61]
 
 
  This is also not 100% consistent. I got it to do a restore once. Doing
  the same steps with the same disk and date will not restore; all I get
  is the above.
 
  Any ideas?

Looks like there was some build problem perhaps and $PS is not defined
in Constants.pm

locate Constants.pm | grep Amanda | xargs grep -i ps
=head1 SYNOPSIS
$PS = "/bin/ps";
$PS_ARGUMENT = "-eo pid,ppid,command";
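
A quick way to check what your build actually generated (module name
taken from the vendor_perl paths in the log above; adjust if yours
differs):

perl -MAmanda::Constants -e 'print "$Amanda::Constants::PS\n"'

If that prints an empty line, Constants.pm really is missing $PS and
the package (or build) is broken.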



Re: amrecover works, normal amanda backup, logging connection refused

2014-07-18 Thread John Hein
Gene Heskett wrote at 10:26 -0400 on Jul 18, 2014:
  Trying to figure out why amanda can't backup this machine, one of the
  things I noticed in /etc, is that on the shop box, which works, there is
  not an /etc/xinetd.d but it has an old-xinetd.d with a single stanza
  amanda file in it.
 
  An ls -lau shows that file, /etc/old-xinetd.d/amanda was apparently
  accessed a few minutes ago by my amcheck from the server.
 
  However, on the new install on the machine that is failing to allow the
  connection, there is an /etc/xinetd.d, with an amanda file in it with an
  old last access date/time, was not 'touched' when I ran the amcheck.  Its
  last access date/time is I believe, the date/time of the installation
  itself.
 
  That amanda-common is 2.6.1p1 IIRC.
 
  amcheck says:
  WARNING: lathe: selfcheck request failed: Connection refused

Try running xinetd -d (then amcheck) to see if (or why not) xinetd is
running amandad.
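
Something like this (init script path varies by distro):

/etc/init.d/xinetd stop          # free the port first
xinetd -d -f /etc/xinetd.conf    # run in the foreground with debug output
# ...then run amcheck from the server and watch this terminal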



Re: amrecover works, normal amanda backup, logging connection refused

2014-07-18 Thread John Hein
Gene Heskett wrote at 12:25 -0400 on Jul 18, 2014:
  14/7/18@12:09:37: ERROR: 3859 {activate_normal} bind failed (Address 
  already in use (errno = 98)). service = amanda

More than one xinetd or inetd running?

Maybe some basic background is in order.  The basic operation of
*inetd is pretty simple, and if you understand the basics, you can
really solve many of the common issues yourself.

*inetd runs forever listening on the sockets you tell it to
listen on (as configured by the xinetd or inetd config files).
When requests (any activity) on that socket come in, it tries
to run the service that is specified in its configuration.

If something else owns that socket, *inetd can't do its job
(i.e., it can't start the service that corresponds to that socket).

Otherwise, *inetd will spawn off the configured service (amandad
in amanda's case).

Technically, you don't need *inetd.  You can kick off amandad to run
on the client some other way (e.g., daemontools, ssh).  But the server
expects something to be listening on the client when it comes time to
do the dump.

As others have mentioned, you have to configure things for the right
type of socket - the configuration of the amanda server (primarily
in amanda.conf / disklist) and client (typically inetd config and
amanda-client.conf) should match (see amanda-auth(7) and
amanda-client.conf(5)).
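
For reference, a typical xinetd stanza for a bsdtcp-auth client looks
something like this (user, group, and the amandad path are
placeholders - match them to your install):

service amanda
{
    socket_type     = stream
    protocol        = tcp
    wait            = no
    user            = amandabackup
    group           = disk
    server          = /usr/lib/amanda/amandad
    server_args     = -auth=bsdtcp amdump amindexd amidxtaped
    disable         = no
}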

Here's some other good info so you can maybe help yourself and
understand better how things work:

http://wiki.zmanda.com/index.php/Quick_start_%28old%29


Re: amrecover works, normal amanda backup, logging connection refused

2014-07-18 Thread John Hein
Gene Heskett wrote at 15:07 -0400 on Jul 18, 2014:
  On Friday 18 July 2014 14:22:48 John Hein did opine
  And Gene did reply:
   Gene Heskett wrote at 12:25 -0400 on Jul 18, 2014:
 14/7/18@12:09:37: ERROR: 3859 {activate_normal} bind failed (Address
 already in use (errno = 98)). service = amanda
  
   More than one xinetd or inetd running?
  
   Maybe some basic background is in order.  The basic operation of
   *inetd is pretty simple, and if you understand the basics, you can
   really solve many of the common issues yourself.
  
   *inetd runs forever listening on the sockets you tell it to
   listen on (as configured by the xinetd or inetd config files).
   When requests (any activity) on that socket come in, it tries
   to run the service that is specified in its configuration.
  
   If something else owns that socket, *inetd can't do its job
   (i.e., can't start the service corresponds to that socket).
  
   If not, then *inetd will spawn off the configured service (amandad
   in amanda's case).
  
   Technically, you don't need *inetd.  You can kick off amandad to run
   on the client some other way (e.g., daemontools, ssh).  But the server
   expects something to be listening on the client when it comes time to
   do the dump.
  
   As others have mentioned, you have to configure things for the right
   type of socket - the configuration of the amanda server (primarily
   in amanda.conf / disklist) and client (typically inetd config and
   amanda-client.conf) should match (see amanda-auth(7) and
   amanda-client.conf(5)).
  
   Here's some other good info so you can maybe help yourself and
   understand better how things work:
  
   http://wiki.zmanda.com/index.php/Quick_start_%28old%29
 
  I just discovered that the failing box did NOT have an /etc/amanda-
  client.conf, so I copied the one from examples and edited it.  But the
  working machine doesn't have one either, so I nuked it. amcheck didn't
  care.

You got that out of my email?

What about the most important bits:

two inetd's running?
and the bind failure?

And what about the hint to use the background info to try digging on
your own a little?  You're doing lots of things, and it seems you don't
know why - just guessing.  That's never a good recipe.

Your xinetd got a bind failure.  That has nothing to do with amanda.
Fix that first.
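
A couple of quick checks (Linux-flavored commands assumed):

ps ax | egrep '[i]netd'       # more than one (x)inetd running?
netstat -ulnp | grep 10080    # who already owns the amanda udp port?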


Re: pre/post scripting

2014-07-09 Thread John Hein
Stefan G. Weichinger wrote at 16:38 +0200 on Jul 9, 2014:
  Am 09.07.2014 16:17, schrieb Stefan G. Weichinger:
  
   Would anyone mind sharing some real world scripts he uses with amanda?
  
   I think of stopping/starting DBs or something like that.
  
   I would appreciate some good templates ;-)
 
  I started playing with the email examples from the docs but they fail
  straight away:
 
 
  define script-tool sc-email {
      comment "email me before this DLE is backed up"
      plugin  "script-email"
      execute-on pre-dle-backup
      execute-where server
      property "mailto" "l...@xunil.at"
  }
 
 
 
   gives me
 
  Jul 09 16:37:11 amanda Script_email[20663]: Use of uninitialized value
  in concatenation (.) or string at
  /usr/libexec/amanda/application/script-email line 181.
  Jul 09 16:37:11 amanda Script_email[20663]: Use of uninitialized value
  $args[2] in join or string at
  /usr/libexec/amanda/application/script-email line 182.
  Jul 09 16:37:11 amanda Script_email[20664]: Use of uninitialized value
  $args[2] in open at /usr/libexec/amanda/application/script-email line 185.
  Jul 09 16:37:11 amanda Script_email[20663]: Use of uninitialized value
  in concatenation (.) or string at
  /usr/libexec/amanda/application/script-email line 186.
 
 
  Does that work for anyone else?
  Does it need anymore properties set?
 
  Thanks, Stefan

I'm not sure about the exact cause of the errors you're seeing, but it
looks like the mailto check will not accept '@' or '.' (or dashes or
underscores or numbers).

To address that, maybe try this patch:

--- libexec/amanda/application/script-email.orig   2009-11-06 10:27:46.000000000 -0700
+++ libexec/amanda/application/script-email        2014-07-09 10:02:06.000000000 -0600
@@ -154,7 +154,7 @@
 my $dest;
 if ($self->{mailto}) {
    my $destcheck = join ',', @{$self->{mailto}};
-  $destcheck =~ /^([a-zA-Z,]*)$/;
+  $destcheck =~ /^([-_[:alnum:],@.]*)$/;
    $dest = $1;
 } else {
    $dest = "root";


Or don't try to do the mailer's job and just skip the whole destcheck
part - let the mailer catch any errors:


--- libexec/amanda/application/script-email.orig   2009-11-06 10:27:46.000000000 -0700
+++ libexec/amanda/application/script-email        2014-07-09 11:02:18.000000000 -0600
@@ -153,9 +153,7 @@
 my($function) = @_;
 my $dest;
 if ($self->{mailto}) {
-  my $destcheck = join ',', @{$self->{mailto}};
-  $destcheck =~ /^([a-zA-Z,]*)$/;
-  $dest = $1;
+  $dest = join ',', @{$self->{mailto}};
 } else {
    $dest = "root";
 }
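
To apply either hunk (assuming amanda is installed under /usr, per the
paths in the headers, and the diff is saved as, say, script-email.diff):

cd /usr
patch -p0 --backup < script-email.diff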


Re: A handshake from amanda?

2014-07-01 Thread John Hein
Jon LaBadie wrote at 09:43 -0400 on Jul  1, 2014:
  On Tue, Jul 01, 2014 at 09:19:19AM -0400, Gene Heskett wrote:
   Greetings all;
  
   Pursuant to a conversation on the dovecot list about the relatively long
   times involved in rebuilding the dovecot.index file when it gets out of
   sync.
  
   It strikes me that if the backup program could be coerced into sending a
   signal when it starts to back up a named directory - a signal that holds it
   until the processing of incoming mail has been stopped and the ack signal
   that it has been stopped is sent back to amanda - it would effectively freeze
   the contents of what would normally be an active directory, so that the email
   corpus AND all its indexes would then be in sync when the backup is done.
  
   This would make any recovery efforts later into a considerable smoother
   action.
  
   I can see where such a feature could also be useful for a database of most
   any sort, mail being only an example.
  
   How feasible would it be to add this capability to amanda?
  
  I suspect a difficult problem would be how to get the multiple programs
  that modify the named directory to honor the flag.

There already is support for performing operations before and after
the dump (among other things):

http://wiki.zmanda.com/index.php/Script_API

For older amanda versions that don't have the script API, the classic
method (which you can still use with newer amanda) is to configure
your amanda client to use a wrapper script instead of gtar or dump.
Then your wrapper script can determine if the DLE is one for which you
want to run some command to suspend normal operations (e.g., quiesce a
database or mail server) during the backup.
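
A minimal sketch of such a wrapper (real gtar path, rc script, and the
DLE directory are all placeholders - adjust to taste):

#!/bin/sh
# stand-in for gtar on the client; quiesce a service while one
# particular DLE is dumped, pass everything else straight through
REAL_GTAR=/usr/bin/gtar
QUIESCE_DIR=/var/spool/mail

quiesce=no
for arg in "$@"; do
    case $arg in
        "$QUIESCE_DIR"*) quiesce=yes ;;
    esac
done

[ "$quiesce" = yes ] && /etc/init.d/dovecot stop
"$REAL_GTAR" "$@"
rc=$?
[ "$quiesce" = yes ] && /etc/init.d/dovecot start
exit $rc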


Re: conflicting types for 'g_queue_free_full'

2012-06-15 Thread John Hein
Subscriptions wrote at 02:21 +0000 on Jun 16, 2012:
  As there is no Amanda 3.3.1 binary build for 64 bit Ubuntu 12.04, I've
  downloaded the latest stable version, but when I run the make I get the
  following error
 
 
  libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../config -I../gnulib
  -fno-strict-aliasing -D_GNU_SOURCE -pthread -I/usr/include/glib-2.0
  -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -Wall -Wextra -Wparentheses
  -Wdeclaration-after-statement -Wmissing-prototypes -Wstrict-prototypes
  -Wmissing-declarations -Wformat -Wformat-security -Wsign-compare
  -Wfloat-equal -Wold-style-definition -Wno-strict-aliasing
  -Wno-unknown-pragmas -g -O2 -fno-strict-aliasing -MT amxml.lo -MD -MP
  -MF .deps/amxml.Tpo -c amxml.c  -fPIC -DPIC -o .libs/amxml.o
  In file included from util.h:39:0,
   from amxml.c:34:
  glib-util.h:75:6: error: conflicting types for 'g_queue_free_full'
  /usr/include/glib-2.0/glib/gqueue.h:76:10: note: previous declaration of
  'g_queue_free_full' was here
  make[3]: *** [amxml.lo] Error 1
 
  I ran ./configure with the defaults prior to make

Just remove g_queue_free_full from common-src/glib-util.[ch] ...

http://amanda.svn.sourceforge.net/viewvc/amanda?view=revision&revision=4592

I don't think the version in amanda was ever(?) used.


RE: aclocal fails since AMANDA_INIT_VERSION call in configure.in

2012-06-11 Thread John Hein
I see no reference to m4_divert_diversion in AMANDA_INIT_VERSION ...

AC_DEFUN([AMANDA_INIT_VERSION],
[
m4_syscmd([test -f FULL_VERSION])
m4_if(m4_sysval, [0],
[
m4_define([AMANDA_F_VERSION], m4_chomp(m4_include([FULL_VERSION])))
],
[
m4_define([AMANDA_F_VERSION], m4_chomp(m4_include([VERSION])))

])
VERSION=AMANDA_F_VERSION
])

And those all look like pretty standard macros (defined by the
autoconf package) that are referenced in AMANDA_INIT_VERSION.

Ah... from the autoconf info pages...


 .
 .
   Unfortunately older versions of Automake (e.g., Automake 1.4) did
not quote the names of these macros.  Therefore, when `m4' finds
something like `AC_DEFUN(AM_TYPE_PTRDIFF_T, ...)' in `aclocal.m4',
`AM_TYPE_PTRDIFF_T' is expanded, replaced with its Autoconf definition.

   Fortunately Autoconf catches pre-`AC_INIT' expansions, and
complains, in its own words:

 $ cat configure.ac
 AC_INIT([Example], [1.0], [bug-exam...@example.org])
 AM_TYPE_PTRDIFF_T
 $ aclocal-1.4
 $ autoconf
 aclocal.m4:17: error: m4_defn: undefined macro: _m4_divert_diversion
 aclocal.m4:17: the top level
 autom4te: m4 failed with exit status: 1
 $

   Modern versions of Automake no longer define most of these macros,
and properly quote the names of the remaining macros.  If you must use
an old Automake, do not depend upon macros from Automake as it is
simply not its job to provide macros (but the one it requires itself):
 .
 .


Investigate your automake installation.


Kervin L. Pierre wrote at 13:19 +0000 on Jun 11, 2012:
  Hello Jean-Louis,
  
  The error is from autogen.  I believe the error is at least related
  to the new AMANDA_INIT_VERSION macro in configure.in.  If I remove
  that first line in configure.in the then the error goes away and
  Amanda builds.
  
  # ./autogen 
  See DEVELOPING for instructions on updating:
   * gettext macros
   * gnulib
   * libtool files
  ..creating file lists
  ..aclocal
  configure.in:1: error: m4_defn: undefined macro: _m4_divert_diversion
  configure.in:1: the top level
  autom4te: /usr/bin/m4 failed with exit status: 1
  aclocal: autom4te failed with exit status: 1
  aclocal failed
  
  # aclocal --version
  aclocal (GNU automake) 1.11.1
  
  Best regards,
  Kervin
  
  
  Adevsoft Inc
  Business Software Development
  http://adevsoft.com/
  
  
   -Original Message-
   From: Jean-Louis Martineau [mailto:martin...@zmanda.com]
   Sent: Monday, June 11, 2012 8:24 AM
   To: Kervin L. Pierre
   Cc: amanda-users@amanda.org
   Subject: Re: aclocal fails since AMANDA_INIT_VERSION call in
   configure.in
   
   On 06/09/2012 12:08 PM, Kervin L. Pierre wrote:
I'm building on a stock Amazon Linux server with all available
   patches.
   
But it seems that since the new AMANDA_INIT_VERSION macro call was
   added a few weeks ago to configure.in, I haven't been able to run
   autogen.sh without error.
   
Removing the AMANDA_INIT_VERSION call before AC_INIT seems to be the
   only work around I've found.
   
Best regards,
Kervin
   
   
   Kevin,
   
   What error do you get?
   
   There is no autogen.sh in amanda, the program is autogen
   
   Jean-Louis
  
  
  


Re: Need help with new architecture for NAS/NFS setup

2011-11-03 Thread John Hein
Brendon Martino wrote at 11:53 -0400 on Nov  3, 2011:
  I'm running Amanda version 2.6.0p2-14 on Fedora 10. My current
  architecture for Amanda is as follows:
.
.
  How do I implement my architecture to only keep about a week (or even a
  day) of backups in the holding disk (locally on the system) but use the
  nfs storage space for archiving the rest of the old backups? The idea is
  that we keep 30 to 60 days worth of old backups on the NAS, but only the
  last day or few days locally on the backup server.
  
  How do I do that? Is it possible? What would be the general idea/layout?
  What directives would I need to change? Would I need to use multiple
  DailySets?? I'm totally stumped. Any advice would be greatly appreciated.

In addition to other suggestions on this thread, there's also amvault.
But not with amanda-2.6.


Re: possible issues with upgrade

2011-07-19 Thread John Hein
Jon LaBadie wrote at 18:49 -0400 on Jul 15, 2011:
  On Fri, Jul 15, 2011 at 05:08:24PM -0400, Chris Hoogendyk wrote:
   Thanks, Brian. So, basically, the upgrade in general is pretty 
   straightforward.
   
   The key point, though, is that I am promoting an important Amanda
   client to be the new Amanda server. During that process, there may
   be a period of time when the old server (running 2.5.1p3) will be
   talking to this important client that has been updated to 3.3.0, but
   not yet taken over as server. I don't want to miss backing it up
   during that time.
   
   So, briefly, will a 2.5.1p3 server have trouble with a 3.3.0 client
   (just until I get things completely swapped around)?
  
  I could fall back on the claim amanda is generally compatible with
  old releases, but that is not a lot of comfort without specific
  experience.  You may not need to run that gamut if you can briefly
  run your backups with two servers.
  
  Upgrade your new server (NS), comment it out of the disklist of the
  old server (OS).  Let NS back itself up as a client.
  
  Upgrade one or a few of the other OS clients and make them clients
  of NS.  Eventually all OS will be backing up is itself and you
  will have lots of experience upgrading clients and adding to NS.
  That is when you finally do OS and again have a single server.

I'll add that I usually configure amanda with
--prefix=/some/place/amanda-ver so I can have multiple versions of
amanda around at the same time (and accessible via shared NFS).  That
way it's easy to try different versions and still be able to go back
(and forth).
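
For example (prefix and options are only illustrations):

./configure --prefix=/net/tools/amanda-3.3.0 \
    --with-user=amandabackup --with-group=disk
make && make install
# switching versions is then just repointing a symlink (or PATH):
ln -sfn /net/tools/amanda-3.3.0 /net/tools/amanda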



Re: Setting Up SSH transport. SOLVED

2011-06-03 Thread John Hein
Charles Curley wrote at 12:46 -0600 on Jun  3, 2011:
  Problem solved. As often happens, user error.
  
  When you log in over SSH the first time, you get the usual "The
  authenticity of host 'foo' can't be established." message. To avoid
  that, you log in manually and accept the fingerprint entry into
  known_hosts. After that, no prompts from SSH on login. Amanda doesn't
  handle the first login well: it silently sits there. To avoid that,
  you do the first login by hand, as noted in the HOWTO.

Did you try using StrictHostKeyChecking=no (at least when initially
adding a new host)?  When you update the wiki, maybe that would be
a useful hint as well.
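
E.g., for the one-time seeding (user and host are examples):

su - amandabackup -c \
    'ssh -o StrictHostKeyChecking=no amandabackup@newclient true'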


Re: strategies for Mac desktops

2011-06-02 Thread John Hein
As you surmised, these are mostly gtar questions.
If your DLE is not a filesystem, then the other dump-ish choices are out.

But gtar (and star as well as the various flavors of dump) does _try_
to save space when it encounters hard links.

For instance (gtar 1.26),
mkdir xx
dd if=/dev/zero of=xx/z count=1000   # a 512000-byte file
gtar cf a.tar xx                     # one copy of the data
ln xx/z xx/l                         # hard link to the same data
gtar cf b.tar xx                     # archive again with the link

a.tar and b.tar should be the same or similar size.

However, for an incremental dump on a DLE which has a new hard link, I
don't think gtar will get you the savings you want (if I'm reading
your reasoning correctly).  It's only when the original is in the same
tar image that you get the savings (i.e., not when the original is in
a different tarball).

mkdir xx
dd if=/dev/zero of=xx/z count=1000
/bin/rm l0                                  # start with no snapshot file
gtar cf 0.tar --listed-incremental=l0 xx    # "level 0"
ln xx/z xx/l                                # new hard link since level 0
cp -p l0 l1
gtar cf 1.tar --listed-incremental=l1 xx    # "level 1" picks up the link
touch xx/foo
cp -p l1 l2
gtar cf 2.tar --listed-incremental=l2 xx    # "level 2" only the new file

In theory, tar's incremental mode might be able to realize that 'l'
points to 'z' and 'z' hasn't changed, so just archive the hard link
meta-data (i.e., not the contents).  But I don't think gtar rolls
like that - I'm not sure where/if the aforementioned theory may have
holes, but it seems it's not implemented that way at this time.

tar tvf 1.tar
drwxr-xr-x jhein/jhein   7 2011-06-02 14:41 xx/
-rw-r--r-- jhein/jhein  512000 2011-06-02 14:41 xx/l
hrw-r--r-- jhein/jhein   0 2011-06-02 14:41 xx/z link to xx/l

In this simple test 1.tar is just as big as 0.tar.  2.tar is
smaller, of course.

And if you just touch xx/l (or xx/z), then a 3.tar will be big again.

Testing for dumps (ufs, zfs) is left as an exercise for the reader ;).
If you find out, let us know.  Doing a quick test with star seems to
show it behaves the same as gtar.

Chris Hoogendyk wrote at 15:18 -0400 on Jun  2, 2011:
  OK, so maybe I shot myself in the foot by asking too much (no replies from
  anyone in over 24 hours) ;-).
  
  Let me pare this down to one simple question -- Will Amanda efficiently do
  server side incremental backups of hard link intensive rsync based backups
  being stored on the server from a workstation (Mac or otherwise)? In other
  words, if the workstation creates a new folder on the server and populates
  it with hard links before running an rsync against it, will Amanda see that
  as all being new and back up essentially a full dump of the user's files?
  
  I understand Amanda uses native tools and there is a possibility that this
  will vary depending on whether the server is using gnutar on a zfs volume,
  or ufsdump on a ufs volume, etc. I'm just hoping that someone has some
  specific experience they can relate, especially since Zmanda is working
  with BackupPC now.
  
  I'm guessing from Dustin's April 12, 2010 blog at http://code.v.igoro.us/
  (cyrus imap under list of possible projects), that gnutar probably still
  doesn't deal well with the hard links. I saw some references while I was
  digging that imply that ufsdump should be alright. But, I'd still like to
  hear from anyone who has first hand experience or definitive knowledge.
  
  TIA,
  
  Chris Hoogendyk
  
  
  On 6/1/11 11:20 AM, Chris Hoogendyk wrote:
   I haven't tried this yet, but I'm hoping to get some comments and guidance
   from others who may be doing it. One particular question is set off in all
   caps below, but, more generally, I'm open to comments and advice on any of
   this.
  
   I have a number of Amanda installations in different departments and
   buildings that back up all my servers. They've all got tape libraries now
   and typically run a 6 week or better daily rotation with a weekly dump
   cycle.
  
   In the past I have punted on desktops, providing share space on the server
   and advising people to put what they want backed up on the server. Now we
   have converted most of our office staff to Macs, and I want to take a more
   integrated and automated approach for critical staff machines. I have a
   few options I'm looking at. One would be to automate Time Machine to a
   share on the server and back that up with Amanda. Another would be to
   script rsync to a server share and back that up with Amanda (we're using
   Samba for shares). The third would be to implement Amanda Client on Mac
   OS X for the staff and back that up from the server. Each of these
   approaches has advantages and disadvantages.
  
   If you have seen W. Curtis Preston's analysis of Time Machine, that
   provides some background to my questions. He wrote two blog posts. One
   breaks down time machine and expresses some complaints about it. The
   second replicates what time machine is doing using scripting with rsync.
  
   http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/280-time-machine.html/

Re: Bacula -- Amanda migration

2011-03-03 Thread John Hein
Gour wrote at 16:23 +0100 on Mar  3, 2011:
  Is there any concern when migrating Amanda from Linux to FreeBSD?

Amanda should not have problems.  If you hit a snag, ask the list.
You may hit issues like mt(1) syntax being different
(use mt for things like setting blocksize and disabling hardware
compression before amanda starts up).
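
For example, on FreeBSD (device name assumed; Linux mt-st spells these
differently):

mt -f /dev/nrsa0 blocksize 0   # variable block size
mt -f /dev/nrsa0 comp off      # disable hardware compression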


long running amandad on client

2010-10-08 Thread John Hein
I have a client with an amandad that has been running since Sep 23...

backup 97592  0.0  0.1 26780  7016  ??  Ss   23Sep10  40:30.43 amandad

Most of the backups on that client still work fine.  But two DLEs
fail nightly.  On the server, you get:

1286525084.841860: chunker: getcmd: START 20101007210002
1286525084.841877: chunker: getcmd: PORT-WRITE 00-00195 /holding/20101007210002/someclient._somedle.0 someclient 9ffe7f /somedle 0 1970:1:1:0:0:0 512000 APPLICATION 36 |;auth=BSD;compress-fast;index;exclude-file=.no-amanda-backup;exclude-file=.nobak;exclude-file=.noback;exclude-file=.nodump;
1286525084.842069: chunker: stream_server opening socket with family 2 
(requested family was 2)
1286525084.842086: chunker: try_socksize: receive buffer size is 65536
1286525084.844115: chunker: bind_portrange2: Try  port 11017: Available - 
Success
1286525084.844135: chunker: stream_server: waiting for connection: 0.0.0.0.11017
1286525084.844142: chunker: putresult: 23 PORT
1286525084.847225: chunker: stream_accept: connection from 127.0.0.1.11002
1286525084.847233: chunker: try_socksize: receive buffer size is 65536
1286525264.872340: chunker: putresult: 10 FAILED
1286525264.872462: chunker: pid 18935 finish time Fri Oct  8 02:07:44 2010


The amandad log on the client shows nothing at the 1286525084
timestamp (yes, the hosts in question have good time sync).

It does show sendbackup entries after the 3 minute timeout on the
server above (1286525264 timestamp).  So the client amandad seems to
just be slow in responding.


It's not clear why this long running amandad is slow in responding for
a couple DLEs, but it's definitely abnormal to have such a long
running amandad to begin with.


lsof shows lots of open file descriptors like so:

amandad 97592 backup  609u  PIPE 0xff01dccfa000 16384 ->0xff01dccfa158

There may be a descriptor leak bug, but that's sort of unimportant since
_usually_ amandad runs only briefly.

The real question is: why is amandad not exiting?

Has anyone seen this before?

I plan to kill amandad on the client, but I'll leave it running for a
bit longer in case there might be something that can be learned.
Unfortunately, this is amanda-2.6.1p1 on the client, so interest in
learning about this anomaly in that code is likely low.


Re: Nitpick, enhancement

2010-09-14 Thread John Hein
Brian Cuttler wrote at 09:20 -0400 on Sep 14, 2010:
  Not sure which part of amanda is the driver, amdump itself ?

driver & planner are executed in amdump.  You can see yourself - it's
just a script.


Re: [not a?] Nitpicks - rename DLE

2010-09-13 Thread John Hein
Dustin J. Mitchell wrote at 14:09 -0500 on Sep 13, 2010:
  On Sun, Sep 12, 2010 at 9:36 PM, John Hein jh...@symmetricom.com wrote:
   This may not be considered a nitpick but more of a feature request.
  
   If I move a disk or rename a host or move the host to a different
   domain, it'd be nice to be able to rename the disklist entry (DLE) and
   have history tracking, incremental planning, and most importantly
   recover/restore operations off tape know to follow the rename.
  
   Maybe it's as simple as allowing one or more alternate DLE name or
   alias (if you will) entries in a DLE (note the casual insertion of
   the word simple does not imply I have a patch, sorry).
  
   Going back and doing a rename on log files, index files, dump files,
   etc., is, of course, not practical and not really desired in terms of
   representing history of a name change.
  
  This is an interesting idea, both for the purpose you describe, and
  for the very futuristic and don't-get-excited-yet idea of virtual
  DLEs, where Amanda automatically splits DLEs based on size of
  subdirectories.  The main problem with virtual DLEs has been recovery:
  if Amanda is backing up a particular file in a different DLE every
  day, then it's going to be difficult to find it when it comes to
  running amrecover.  Incrementals are also a problem: changing the
  boundaries between DLEs obviously requires doing a full backup of all
  of the affected DLEs on the next run.  At least, unless we're going to
  become gnutar-specific and start futzing with the data in the backups
  on the server side.
  
  As you can see, complicated.  But a consistent approach to storing the
  DLE and path of a particular user object over time would be a useful
  first step.  Do you have any thoughts on how that might be
  implemented?

It may be useful to consider having amanda store the disklist it uses
with each backup (in the index db) [1], along with some way to
correlate a DLE in one disklist that may just have a different
hostname (or, I suppose, filesystem mount point - let's call it a
host/filesystem tuple because that sounds fun) to the renamed DLE
in another disklist.

I suppose we could take a page from revision control design and have a
DLE ID (an SHA checksum, perhaps, of the important identifying contents
of the DLE - that is, not things like maxdumps).  And the hefty work
for this feature would be adding a way to link DLE IDs together and
have amanda understand the potential equality between more than one
DLE ID for the various history-traversing operations she does.


[1] I keep the disklist in revision control anyway since it would
need to be consulted to properly restore some data from a year
ago that may have moved around.  Having amanda handle that
tracking would be nice.
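
To make the ID idea concrete, a minimal sketch (the field set is
illustrative only - deciding which fields identify a DLE is the real
design work):

printf '%s|%s|%s' "$host" "$diskname" "$device" | sha1sum | awk '{print $1}'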


vDLE (was: [not a?] Nitpicks - rename DLE)

2010-09-13 Thread John Hein
Dustin J. Mitchell wrote at 14:56 -0500 on Sep 13, 2010:
  On Mon, Sep 13, 2010 at 2:43 PM, Brian Cuttler br...@wadsworth.org wrote:
   Virtual DLEs !?!
  
   That is EXACTLY what we need !
  
   I know you warned us, but I'm REALLY Excited about this !
  
   That would _so_ fix my Terrabyte sized DLE problem...
  
  Yes, yes it would.  It would fix a lot of problems!
  
  I don't think it's the right solution to the problem, though.  It
  takes as fundamental Amanda's funny notion of DLEs and exclusion
  lists, and tries to build a working system around that.  If it could
  work in a way that makes any sense to the user, I might be convinced,
  but as it stands there are some *very* significant unanswered
  questions, and probably a lot of more subtle problems, too.
  
  Instead, we should look at how other backup software handles similar
  problems, and consider throwing out some long-standing Amanda
  weakne^Wfeatures.  You can see why this becomes a contentious issue
  very quickly.

One big weakness I always decry is that amanda can't automatically
balance (even possibly with a hint from the admin) DLE sizes.
Consider how many times you have had to manually break up a DLE
because it takes too long to back up (hits a timeout) or exceeds
certain system capacities - holding disk, tape, etc.  Well, with split
dumps, tape size is less of a problem for this issue these days.

This is a separate issue from renaming DLEs.  Well, at least,
I wasn't thinking of it being the same.  However, I can see how
a similar mechanism could be leveraged for various uses (including
meta-DLEs, DLE groups, balanced DLE dump sizing).

[Subject changed to reflect the thread hijack^W^Wchange in scope]

This has a lot of possibilities and could quickly get hard to bite off
a piece to implement.  I suppose it would be good to take a little
time to implement a good solid base for a few potential flavors of
feature candy in this area.


[not a?] Nitpicks - rename DLE

2010-09-12 Thread John Hein
Dustin J. Mitchell wrote at 10:55 -0500 on Sep  9, 2010:
  I bet most of you have some small nitpick with Amanda that you've
  never felt warranted an email.  Well, now's your chance!  I'd like to
  put some polish on Amanda, and it's hard for me to see the areas that
  need burnishing, since I work on Amanda all day, every day.

This may not be considered a nitpick but more of a feature request.

If I move a disk or rename a host or move the host to a different
domain, it'd be nice to be able to rename the disklist entry (DLE) and
have history tracking, incremental planning, and most importantly
recover/restore operations off tape know to follow the rename.

Maybe it's as simple as allowing one or more alternate DLE name or
alias (if you will) entries in a DLE (note the casual insertion of
the word simple does not imply I have a patch, sorry).

Going back and doing a rename on log files, index files, dump files,
etc., is, of course, not practical and not really desired in terms of
representing history of a name change.




Re: script help?

2010-07-21 Thread John Hein
Gene Heskett wrote at 12:04 -0400 on Jul 21, 2010:
  Greetings all;
  
  My catchup script seems to be working with 2 exceptions, first being that I 
  am not getting any emails from it, so I installed dnsmasq to see if that 
  fixes that.
  
  2nd, each pass through my catchup script loop elicits an error warning from 
  the main script:
  
  ./flush.sh: line 173: [: !=: unary operator expected
  
  from that script:
  # or are we running as flush.sh
  
  if [ $0 == "./flush.sh" ] || [ $0 == "${MYDIR}/flush.sh" ] || [ $0 == "flush.sh" ]; then

'==' is a bash-ism.  Get in the habit of using '=' for posix compliance.
You'll thank yourself when you might have to run a script on a system
that doesn't use bash for sh(1) (bsd's, solaris).

Always put quotes around vars ("$foo", "$0", etc.) since vars
can contain strings with spaces, confusing test(1) (aka '[').

You don't need quotes around literals that you have written
("./flush.sh" in your case above).  So in '[ $0 == "./flush.sh" ]',
you have your quoting backwards (with respect to defensive script
writing).


  # we don't want amflush to disconnect or ask questions
  if [ `/bin/ls /dumps` != "" ] ; then   <---line 173
  echo "Backup script running amflush -bf $CONFIGNAME" | tee -a $LOG

Same deal (use quotes) for back-tick expressions (`cmd` or $(cmd)).
And that's your problem.  Looks like `ls /dumps` is providing empty
output.

example:

if [ `echo 1 2` = 1 ]; then echo y; else echo n; fi
[: 1: unexpected operator

 or

bash -c 'if [ `echo -n` != "" ]; then echo y; else echo n; fi'
bash: [: !=: unary operator expected


if [ "`echo 1 2`" = 1 ]; then echo y; else echo n; fi
n




Re: ZWC and exclude/include lists

2010-06-09 Thread John Hein
Sorry to hijack your thread, but I tried 'exclude file' a few months
ago and it didn't work either.

Chris Nighswonger wrote at 11:00 -0400 on Jun  9, 2010:
  Does ZWC honor exclude/include lists?
  
  I have a DLE like:
  
  foo.bar.com C:\\Documents\ and\ Settings {
exclude list C:\\.exclude
zwc-compress
estimate server
  }
  
  c:\.exclude includes several entries like:
  
  .\user1
  .\user2
  
  Looking at a tcpdump of the resulting transactions, I see the exclude
  list passed to the client, but the client still dumps the entirety of
  c:\documents and settings.
  
  Am I missing some syntax error here?
  
  Or perhaps this is the purpose of templates in the ZWC?
  
  Kind Regards,
  Chris


RE: runtar error that I do not understand

2010-06-02 Thread John Hein
As a workaround, perhaps you could unhide the snapshot directory.
man zfs.
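
E.g. (dataset name is an example; see the zfs man page):

zfs set snapdir=visible tank/homes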

McGraw, Robert P wrote at 11:23 -0400 on Jun  2, 2010:
  I am not sure if this got sent to the group so I am forwarding.
  
  This explains what is going on in the gtar code to cause gtar to seg fault. 
  All the accolades should go to my SA partner Chapman Flack. He is in the 
  process of sending this to bug-tar as suggested.
  
  Robert
  
  
  
  Apparently under only some circumstances, gtar tries to use the old 
  algorithm for finding the cwd. That fails beneath .zfs which is a sort of 
  magical name that isn't there unless you're looking for it exactly.
  
  But it must just be a special combination of options that makes gtar try to 
  do that, because in simple invocations it doesn't:
  
  hertz /homes % ls .z*# nobody here but us chickens...
  ls: No match.
  hertz /homes % cd .zfs   # oh, you mean ME?
  hertz .zfs % cd snapshot/TODAY/jflack
  % gtar --create --file /dev/null bar # works fine
  
  Aha, it's the --listed-incremental option. This makes gtar want to create 
  the snar file with names and metadata of files it sees:
  
  % gtar --create --file /dev/null --listed-incremental=/tmp/snar bar 
  Segmentation fault
  
  The funny thing is, if I cd to ~ where it works...
  
  % cd ~
  % gtar --create --file /dev/null --listed-incremental=/tmp/snar bar
  %
  
  ...and then look in /tmp/snar, it only contains relative paths...
  ...so it STILL doesn't explain why gtar even wants to get the cwd in that 
  case, but it's clear from the truss that's what it's doing.
  
  One workaround might be to do a loopback mount of the desired snapshot onto 
  some other path that's not beneath .zfs, and do the backup from there.
  
  -Chap
  
   -Original Message-
   From: owner-amanda-us...@amanda.org [mailto:owner-amanda-
   us...@amanda.org] On Behalf Of Nathan Stratton Treadway
   Sent: Tuesday, May 04, 2010 2:45 PM
   To: amanda-users@amanda.org
   Subject: Re: runtar error that I do not understand
   
   
   On Tue, May 04, 2010 at 13:03:09 -0500, Dustin J. Mitchell wrote:
On Tue, May 4, 2010 at 12:27 PM, McGraw, Robert P
   rmcg...@purdue.edu wrote:
 I set up a loopback for the snapshot and now gtar seems to be
    working.
   
It sounds like you have a fairly good understanding of this problem
now.  Could you write up either a troubleshooting or How-To article
   on
the wiki?
   
   Also, you might want to send a bug report about this to the
   bug-...@gnu.org list -- even if the underlying problem is that .zfs
   doesn't behave normally, I suspect they'd be interested in knowing
   about
   the issue so that they can at least avoid having an abort-with-segfault
   in that situation...
   
 http://www.gnu.org/software/tar/#maillists
   
   
  Nathan
   
   ---
   -
   Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
   Ray Ontko  Co.  -  Software consulting services  -
   http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID:
   1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
  
  


no-reuse tape was written to

2010-03-10 Thread John Hein
Has anyone seen a tape that was marked no-reuse get _used_?
It happened last night here (amanda-2.6.1p2).


Re: no-reuse tape was written to

2010-03-10 Thread John Hein
Dustin J. Mitchell wrote at 12:22 -0600 on Mar 10, 2010:
  On Wed, Mar 10, 2010 at 11:44 AM, John Hein jh...@symmetricom.com wrote:
   Has anyone seen a tape that was marked no-reuse get _used_?
   It happened last night here (amanda-2.6.1p2).
  
  To take a guess: label_new_tapes is set and the tape drive encountered
  an error when trying to read the label, thus not recognizing it as an
  already-labeled tape.
  
  Just a guess..

label_new_tapes is not set in amanda.conf

A changer (chg-zd-mtx) with barcodes is involved, and the debug logs
show it specifically using "-slot next" to unload the current tape
that was in the drive when amdump started and load the no-reuse tape.

I haven't looked in the code to see if & where it cares to look
at no-reuse and decide not to use the tape.

I was wrong, BTW.  This is 2.6.1p1 (not p2).


Error redirecting stderr to fd 52

2009-08-24 Thread John Hein
On a 2.6.1b1 client ...

1251090802.506210: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: 
start at Sun Aug 23 23:13:22 2009
1251090802.506278: sendbackup: Version 2.6.1b1
1251090802.511032: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: 
rename at Sun Aug 23 23:13:22 2009
1251090802.511055: sendbackup:   sendbackup req: GNUTAR /data  1 
2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;
1251090802.511102: sendbackup:   Parsed request as: program `GNUTAR'
1251090802.511109: sendbackup:  disk `/data'
1251090802.53: sendbackup:  device `/data'
1251090802.58: sendbackup:  level 1
1251090802.511123: sendbackup:  since 2009:8:21:6:55:50
1251090802.511128: sendbackup:  options 
`|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;'
1251090802.511203: sendbackup: Error redirecting stderr to fd 52: Bad file 
descriptor
1251090802.511215: sendbackup: pid 61161 finish time Sun Aug 23 23:13:22 2009


Has anyone ever seen that?

This is on a client with about a dozen DLEs, with possibly 3
dumping in parallel at a time.  The DLE in question is
not small - certainly not so small as to complete in 5 ms.

This DLE sometimes works.  Sometimes a different one fails the same
way.

It looks like the mesgfd in client-src/sendbackup.c is getting
closed before dup2(2) runs.  Perhaps a race.

amandad log (below) shows no obvious trouble other than it is continuing to do
work after the child sendbackup has failed (as seen in the log output above).

The two security_stream_close messages seem to be different than the
log messages associated with DLEs that worked.  The working ones have
three security_stream_close messages.  But, oddly, the DLE that worked
(and was small) right _before_ the failed DLE did _not_ have any
security_stream_close messages.  Possibly a clue.  In fact, it
seems all the failures are happening right after a small (< 10 MB),
and thus quick, dump.  Could just be a coincidence.

1251090802.228997: amandad: dgram_recv(dgram=0x280c2a04, timeout=0, 
fromaddr=0x280d29f0)
1251090802.229037: amandad: (sockaddr_in *)0x280d29f0 = { 2, 703, 
206.168.13.161 }
1251090802.229055: amandad: security_handleinit(handle=0x8052600, 
driver=0x280bc520 (BSD))
1251090802.235787: amandad: accept recv REQ pkt:

SERVICE sendbackup
OPTIONS features=9ffe00;hostname=bunny;config=test;
GNUTAR /data  1 2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;

1251090802.237043: amandad: creating new service: sendbackup
OPTIONS features=9ffe00;hostname=bunny;config=test;
GNUTAR /data  1 2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;

1251090802.237710: amandad: sending ACK pkt:


1251090802.237764: amandad: dgram_send_addr(addr=0x8052620, dgram=0x280c2a04)
1251090802.237772: amandad: (sockaddr_in *)0x8052620 = { 2, 703, 206.168.13.161 
}
1251090802.237779: amandad: dgram_send_addr: 0x280c2a04->socket = 0
1251090802.511364: amandad: security_streaminit(stream=0x81dd000, 
driver=0x280bc520 (BSD))
1251090802.511719: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.511736: amandad: try_socksize: send buffer size is 65536
1251090802.511743: amandad: try_socksize: receive buffer size is 65536
1251090802.512604: amandad: bind_portrange2: Try  port 6108: Available - Success
1251090802.512617: amandad: stream_server: waiting for connection: 0.0.0.0.6108
1251090802.512643: amandad: security_streaminit(stream=0x81e6000, 
driver=0x280bc520 (BSD))
1251090802.512658: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.512669: amandad: try_socksize: send buffer size is 65536
1251090802.512677: amandad: try_socksize: receive buffer size is 65536
1251090802.513496: amandad: bind_portrange2: Try  port 6108: Available - 
Address already in use
1251090802.514300: amandad: bind_portrange2: Try  port 6109: Available - Success
1251090802.514311: amandad: stream_server: waiting for connection: 0.0.0.0.6109
1251090802.514319: amandad: security_streaminit(stream=0x81ef000, 
driver=0x280bc520 (BSD))
1251090802.514333: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.514344: amandad: try_socksize: send buffer size is 65536
1251090802.514351: amandad: try_socksize: receive buffer size is 65536
1251090802.515189: amandad: bind_portrange2: Try  port 6108: Available - 
Address already in use
1251090802.515991: amandad: bind_portrange2: Try  port 6109: Available - 
Address already in use
1251090802.516772: amandad: bind_portrange2: Skip port 6110: Owned by softcm.
1251090802.517540: amandad: bind_portrange2: Skip port 6111: Owned by spc.
1251090802.518336: amandad: bind_portrange2: Try  port 6112: Available - Success

Re: Error redirecting stderr to fd 52

2009-08-24 Thread John Hein
Jean-Louis Martineau wrote at 11:48 -0400 on Aug 24, 2009:
  John Hein wrote:
   On a 2.6.1b1 client ...
 
  Hmm, beta software ...
  
  It's not fixed in 2.6.1 neither in 2.6.1p1.
  
  You must use the latest 2.6.1p1 snapshot from 
  http://www.zmanda.com/community-builds.php

Building a new version now.

Another interesting note.  On the client in question, amandad is still
running, but it shouldn't be.  It's got a couple unreaped zombies and
is waiting in select.

ps awwx -o user,pid,ppid,start,stat,wchan,command | grep backup
backup   37628 60010  4:43AM Z-  defunct
backup   37629 60010  4:43AM Z-  defunct
backup   60010 23993  7:01PM Ss   select amandad


lsof shows quite a few file descriptors still open.

COMMAND   PID   USER   FD   TYPE   DEVICE SIZE/OFFNODE NAME
amandad 60010 backup0u  IPv4   0xc87f5438  0t0 UDP *:amanda
amandad 60010 backup1u  IPv4   0xc87f5438  0t0 UDP *:amanda
amandad 60010 backup2u  IPv4   0xc87f5438  0t0 UDP *:amanda
amandad 60010 backup4u  IPv4   0xcdd09bf4  0t0 UDP *:58068
amandad 60010 backup   10w  VREG 0,88   107863  331113 / -- 
amandad/amandad.20090823190102.debug
amandad 60010 backup   21u  PIPE 0xcb0064c8 16384 ->0xcb006580
amandad 60010 backup   22u  PIPE 0xc916a990 16384 ->0xc916aa48
amandad 60010 backup   23u  PIPE 0xcd629990 16384 ->0xcd629a48
amandad 60010 backup   24u  PIPE 0xce8a8990 16384 ->0xce8a8a48
amandad 60010 backup   28u  PIPE 0xc86b47f8 16384 ->0xc86b48b0
amandad 60010 backup   29u  IPv4 0xce6f63a0  0t0 TCP *:6108 (LISTEN)
amandad 60010 backup   30u  PIPE 0xc8d2f330 16384 ->0xc8d2f3e8
amandad 60010 backup   31u  PIPE 0xc9466198 16384 ->0xc9466250
amandad 60010 backup   33u  PIPE 0xca15c330 16384 ->0xca15c3e8
amandad 60010 backup   37u  IPv4   0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
amandad 60010 backup   39u  PIPE 0xcdeb4cc0 16384 ->0xcdeb4d78
amandad 60010 backup   40u  PIPE 0xc9c307f8 16384 ->0xc9c308b0
amandad 60010 backup   42u  PIPE 0xc96317f8 16384 ->0xc96318b0
amandad 60010 backup   44u  PIPE 0xc9c30660 16384 ->0xc9c30718
amandad 60010 backup   46u  PIPE 0xc9c1d330 16384 ->0xc9c1d3e8
amandad 60010 backup   47u  PIPE 0xc914e718 0 ->0xc914e660
amandad 60010 backup   48u  PIPE 0xcc4dd4c8 16384 ->0xcc4dd580
amandad 60010 backup   49u  PIPE 0xc96be7f8 16384 ->0xc96be8b0
amandad 60010 backup   50u  PIPE 0xce8a8330 16384 ->0xce8a83e8
amandad 60010 backup   51u  PIPE 0xc914e580 0 ->0xc914e4c8
amandad 60010 backup   52u  PIPE 0xc914ecc0 16384 ->0xc914ed78
amandad 60010 backup   60u  PIPE 0xc8e10330 16384 ->0xc8e103e8
amandad 60010 backup   61u  PIPE 0xcc37fb28 16384 ->0xcc37fbe0
amandad 60010 backup   62u  PIPE 0xce649660 16384 ->0xce649718
amandad 60010 backup   63u  PIPE 0xcaa00990 16384 ->0xcaa00a48
amandad 60010 backup   64u  PIPE 0xc912f660 16384 ->0xc912f718
amandad 60010 backup   65u  PIPE 0xc90f1cc0 16384 ->0xc90f1d78

A few seconds of tracing the process shows..

 60010 amandad  RET   poll 0
 60010 amandad  CALL  gettimeofday(0xbfbfea88,0)
 60010 amandad  RET   gettimeofday 0
 60010 amandad  CALL  gettimeofday(0xbfbfea88,0)
 60010 amandad  RET   gettimeofday 0
 60010 amandad  CALL  poll(0x8052600,0x1,0x7530)



Re: Backup issues with OpenBSD 4.5 machines

2009-08-24 Thread John Hein
stan wrote at 16:59 -0400 on Aug 24, 2009:
  The first thing I notice when comparing this function in 2.5.0 vs 2.5.2 is
  that 2.5.0 does:
  
  tv.tv_usec = 0;
  
  and 2.5.2 does not. Could this make a difference? Both do 
  
  tv.tv_sec = timeout;

In 2.5.2, the memset sets the entire struct to 0.
2.5.0 is slightly more efficient, but otherwise the results
wind up being the same.

Nothing to see there.


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread John Hein
stan wrote at 10:56 -0400 on Aug 21, 2009:
  OK here is the latest on this saga :-)
  
  On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
  back this machine up successfully (using classic UDP based authentication)
  
  On another of them, I built 2.5.2p1. The first attempt to back this machine
  up failed. I checked the log files, and found they were having issues
  because /etc/amdates was missing. I corrected that, and started a 2nd
  backup run. (Remember amcheck reports all is well with this machine). I 
  got the following from amstatus when I attempted to back up this machine.
  Also remember, one of the test I ran with a 2.6.1 client was to connect a
  test machine directly to the client, using a crossover cable to eliminate
  any firewall, or router type issues.
  
  I am attaching what I think is the amandad debug file associated with this
  failure.
  
  Can anyone suggest what I can do to further troubleshoot this?
  
  pb48:wd0f 1  dumper: [could not connect DATA stream:
  can't connect stream to pb48.meadwestvaco.com port 11996: Connection
  refused] (10:37:27)
  
   .
   .
   .
  amandad: time 30.019: stream_accept: timeout after 30 seconds
  amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new 
  stream connection: No such file or directory)
  amandad: time 30.019: stream 0 accept failed: unknown protocol error
  amandad: time 30.019: security_stream_close(0x86b67000)
  amandad: time 60.027: stream_accept: timeout after 30 seconds
  amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new 
  stream connection: No such file or directory)
  amandad: time 60.027: stream 1 accept failed: unknown protocol error
  amandad: time 60.027: security_stream_close(0x81212000)
  amandad: time 90.035: stream_accept: timeout after 30 seconds
  amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new 
  stream connection: No such file or directory)
  amandad: time 90.036: stream 2 accept failed: unknown protocol error
  amandad: time 90.036: security_stream_close(0x84877000)
  amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 
  (BSD))
  amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009

For some reason the socket is not getting marked ready for read.
select(2) is timing out waiting.  Firewall setup perhaps?

This bit of code in 2.5.2p1's common-src/stream.c is where
the failure is happening for you...

int
stream_accept(
    int server_socket,
    int timeout,
    size_t sendsize,
    size_t recvsize)
{
    SELECT_ARG_TYPE readset;
    struct timeval tv;
    int nfound, connected_socket;
    int save_errno;
    int ntries = 0;
    in_port_t port;

    assert(server_socket >= 0);

    do {
        ntries++;
        memset(&tv, 0, SIZEOF(tv));
        tv.tv_sec = timeout;
        memset(&readset, 0, SIZEOF(readset));
        FD_ZERO(&readset);
        FD_SET(server_socket, &readset);
        nfound = select(server_socket+1, &readset, NULL, NULL, &tv);
        if(nfound <= 0 || !FD_ISSET(server_socket, &readset)) {
            save_errno = errno;
            if(nfound < 0) {
                dbprintf(("%s: stream_accept: select() failed: %s\n",
                          debug_prefix_time(NULL),
                          strerror(save_errno)));
            } else if(nfound == 0) {
                dbprintf(("%s: stream_accept: timeout after %d second%s\n",
                          debug_prefix_time(NULL),
                          timeout,
                          (timeout == 1) ? "" : "s"));
                errno = ENOENT;     /* ??? */
                return -1;


Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 Thread John Hein
stan wrote at 13:56 -0400 on Aug 21, 2009:
  OK, I reproduced the failure with only a crossover cable between the test
  client and the Amanda Master:

Just because you're using a crossover cable doesn't rule out firewall
or other such socket level interference.  I'm not saying that's your
problem, but using a crossover cable doesn't rule it out.


  192.168.1.2:wd0f 0  dumper: [could not connect DATA stream: can't connect
  stream to 192.168.1.2 port 24376: Connection refused] (13:48:23)
  
  Note the 192.168.1.2 address :-)
  
  This is with a 2.5.2p1 client on OpenBSD 4.5; 2.5.0p1 works on this same
  machine/OS/network configuration.
  
  So, it appears to me that this must be because of something that changed
  between 2.5.0p1 and 2.5.2p1. And we have a pretty good idea where in the
  code this is failing. So can anyone enlighten me as to what changed in this
  area between those 2 versions?

I haven't looked to see what changed between 2.5.0 and 2.5.2.  It's
all pretty basic socket stuff.  I wouldn't be surprised if that is
when the additional auth mechanisms (bsdudp, bsdtcp) were added.
However, if no one chimes in, it's not that hard to look yourself.  If
you can narrow it down a bit to where there seems to be a problem in
the code, the amanda-hackers@ list might be able to help more.


RE: Amanda and dual tape libraries.

2009-05-14 Thread John Hein
Onotsky, Steve x55328 wrote at 14:23 -0400 on May 14, 2009:
  I agree, but the caveat is that the planner will do its darndest to
  make full use of the extended capacity of the LTO4 cartridge.
  
  In my case, our backups went from between 5 and 8 hours with LTO2
  tapes to well over 24 hours in some cases with the LTO4s - same
  DLEs.  It took some fancy footwork to get it to a reasonable window
  (about the same length of time as with the 2s but some of the
  larger DLEs are forced to incremental on weekdays).  This is so we
  can get the cartridges ready for pickup by our offsite storage
  provider.

You can lie about your tape size in the tapetype, of course.
You can even have different lies for different configurations.

I've always wanted a knob to tell the scheduler to shoot for a
smaller percentage of the total tape size, but to go ahead and use
more if needed.  Kind of an average target total size for the dumps.

Maybe there is such a knob these days.  I hope someone will
say if there is.

Lying about the tape size usually works fine.  And it will go over
that declared size if needed - if, for instance, some unexpected
increase in size to a DLE happens after the estimate completes (or for
whatever reason, the estimate is too low).
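
E.g., a deliberately understated tapetype (numbers are illustrative):

define tapetype LTO4-SHORT {
    comment "LTO4, length understated so the planner schedules less per run"
    length 400 gbytes    # physical capacity is roughly double this
    filemark 0 kbytes
    speed 80 mbytes
}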


Re: amrecover stuck on set owner/mode for '.'

2009-05-01 Thread John Hein
Dustin J. Mitchell wrote at 11:11 -0400 on May  1, 2009:
  On Fri, May 1, 2009 at 10:20 AM, Joe Konecny jkone...@rmtohio.com wrote:
   I just ran amrecover and successfully put a file into /tmp.
   amrecover prompted me set owner/mode for '.'? [yn] to
   which I answered n and hit enter.  Now it just sits there
   for over 30 minutes doing nothing.  What could be going on?
  
  Hmm, I can only find that in the manpage, not the source.  Presumably
  it's some old message which has since been removed.

This is tangential to the OP's problem, but it depends on what source
you're looking at.  That message is probably coming from the 'restore'
OS utility, not amanda.


Re: Fwd: The Coyote Den AMANDA MAIL REPORT FOR April 9, 2009

2009-04-09 Thread John Hein
Gene Heskett wrote at 08:53 -0400 on Apr  9, 2009:

  Uptime is about 5 days now, but this may be the beginning of the
  end.  Something made it think all data was new from the looks of
  this.  This was the first run of 20090323, 20090321 works fine.
  Another device mapper screwup?  It was updated by yum yesterday.

It probably was a device numbering change - if you are in the habit of
rebooting after a yum update.  It's probably worthwhile noting the
disk device numbering before / after a reboot.  Then if it changes,
you can repair the gnutar-list files to point at the new device to
avoid the "everything has changed" incremental dump (or 'amadmin
force' a level 0, but that has problems with scheduling balance, not to
mention potentially running out of room depending on tape / vtape /
holding disk size).
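
E.g., with GNU stat, before and after the reboot (mount points are
examples):

stat -c '%n: dev=%d' /home /var /usr    # the st_dev gtar compares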

I think there was a script floating around to change the dev # in the
gnutar listed incremental files - possibly Dustin's.


Re: The Coyote Den AMANDA MAIL REPORT FOR April 9, 2009

2009-04-09 Thread John Hein
Gene Heskett wrote at 10:05 -0400 on Apr  9, 2009:
  On Thursday 09 April 2009, Dustin J. Mitchell wrote:
  On Thu, Apr 9, 2009 at 8:53 AM, Gene Heskett gene.hesk...@verizon.net 
  wrote:
   Uptime is about 5 days now, but this may be the beginning of the end.
   Something made it think all data was new from the looks of this.  This was
   the first run of 20090323, 20090321 works fine.  Another device mapper
   screwup? It was updated by yum yesterday.
  
  That seems the most likely cause.
  
  Dustin
  
  Yeah, but I thought we had worked out something that made amanda immune to 
  those little annoyances?  Or was tar changed and the fix didn't work now?
  
  [ama...@coyote GenesAmandaHelper-0.6]$ tar --version
  tar (GNU tar) 1.20
[snip]
  Here is the amgtar stanza from amanda.conf:
  
  define application-tool app_amgtar {
   comment "amgtar"
   plugin  "amgtar"
   #property "GNUTAR-PATH" "/path/to/gtar"
   #property "GNUTAR-LISTDIR" "/path/to/gnutar_list_dir"
   #default from gnutar_list_dir setting in amanda-client.conf
   #property "ONE-FILE-SYSTEM" "yes"  #use '--one-file-system' option
   #property "SPARSE" "yes"           #use '--sparse' option
   #property "ATIME-PRESERVE" "yes"   #use '--atime-preserve=system' option
   property "CHECK-DEVICE" "no"       #use '--no-check-device' if set to "no"
  }
  Do I need to set additional options?

This (snippet below from the gtar NEWS file) was added in 1.21

==
* New options --no-check-device, --check-device.

The `--no-check-device' option disables comparing device numbers during
preparatory stage of an incremental dump.  This allows to avoid
creating full dumps if the device numbers change (e.g. when using an
LVM snapshot).

The `--check-device' option enables comparing device numbers.  This is
the default.  This option is provided to undo the effect of the previous
`--no-check-device' option, e.g. if it was set in TAR_OPTIONS
environment variable.
==


Re: The Coyote Den AMANDA MAIL REPORT FOR April 9, 2009

2009-04-09 Thread John Hein
John Hein wrote at 08:47 -0600 on Apr  9, 2009:
  This (snippet below from the gtar NEWS file) was added in 1.21

Woops.  Sorry - --no-check-device was added in 1.20.  I've never tested
it, however.  If you can prove a device number change and use of this
option is causing the big incremental dump problem, let us know
(start a separate thread).


Re: amgetconf: not found

2009-03-17 Thread John Hein
Dustin J. Mitchell wrote at 10:11 -0400 on Mar 17, 2009:
  2009/3/17 Zbigniew Szalbot z.szal...@lcwords.com:
   Ever since the upgrade, I am not able to perform the backup.
  
   % /usr/local/sbin/amdump Backup2Disk
   amgetconf: not found
  
   I use FreeBSD 7.1-RELEASE if that matters.
  
   Would you help me find out how I can solve this problem and resume
   backups? Thank you!
  
  Is amgetconf installed?  Is it in the same location as intended when
  it was compiled?  amdump is a shell script, so this should be pretty
  easy for you to track down.

That error message can be misleading - it often means the perl
interpreter named in amgetconf's shebang line is missing, not
amgetconf itself.  If you do have amgetconf, and it is in the path,
run 'head amgetconf' and make sure the perl it points at is installed
and working.
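For example (paths here are illustrative - use whatever 'head' shows):

$ head -1 /usr/local/sbin/amgetconf
#!/usr/local/bin/perl
$ /usr/local/bin/perl -e 'print "perl ok\n"'   # does that interpreter run?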


Re: More ranting about issues compiling 2.6.1 on older machines

2009-03-02 Thread John Hein
stan wrote at 08:30 -0500 on Mar  2, 2009:
  I really think we need to come up with a plan that results in it being
  easier to compile clients on older machines. I have expressed my opinion
  that this needs to be a fork of the 2.5 branch, but I did not seem to get
  much in the way of buy-in by others on this list for that. Does anyone
  have a better plan?

You never really said why you need to fork 2.5 as opposed
to just run 2.5.2 (or 2.4.5) on older clients.

Security fixes?
Specific features?

I think that putting security fixes onto a branch of 2.5 might be a
reasonable task.  Backporting some of the newer APIs would likely
be a good bit more work, and, depending on your point of view,
possibly not worth it.

That said, it's possible committers would be willing to entertain
committing patches to a 2.5.2 branch.  I can't speak for them, but if
the work is made minimal (by submitting well-documented patches), they
might be reasonable about it.  You could test the waters with a patch
to fix some buffer overflow and ask (on amanda-hackers) if they would
be willing to commit it.  Cutting a new release is probably beyond the
scope, but making commits to a legacy branch for a while seems
reasonable.

And if they don't, then you could, as you seem to be hinting, start a
fork yourself.  I can't say how popular it would be.  Personally, I've
had reasonable success getting the newer code to compile / run on
older machines, certainly for clients if not the server code.  It may
be less work than a fork (and patches possibly more acceptable to the
current maintainers).

But if you publish a fork (whether it be a patchset or a public
repository), there's likely a greater than zero chance that someone
will use it - I just can't say how much greater than zero ;).


Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd

2009-03-01 Thread John Hein
Charles Curley wrote at 08:29 -0700 on Mar  1, 2009:
  r...@dragon:/home/ccurley/projects/ror# amrecover -C /etc/amanda/DailySet1
  AMRECOVER Version 2.5.2p1. Contacting server on localhost ...
  NAK: user root from localhost is not allowed to execute the service 
  amindexd: Please add amindexd amidxtaped to the line in 
  /var/backups/.amandahosts on the client

^^^ Note, user root


  However, I have already added the quoted text to the .amandahosts
  file, on both client and server, like so:
  
  chaffee.localdomain backup amindexd amidxtaped

You have the backup user here.

http://wiki.zmanda.com/index.php/How_To:Migrate_from_older_amanda_versions#Problems_with_amrecover_from_amanda_2.5


Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd

2009-03-01 Thread John Hein
Charles Curley wrote at 11:48 -0700 on Mar  1, 2009:
  On Sun, Mar 01, 2009 at 10:11:38AM -0700, John Hein wrote:
   Charles Curley wrote at 08:29 -0700 on Mar  1, 2009:
 r...@dragon:/home/ccurley/projects/ror# amrecover -C 
   /etc/amanda/DailySet1
 AMRECOVER Version 2.5.2p1. Contacting server on localhost ...
 NAK: user root from localhost is not allowed to execute the service 
   amindexd: Please add amindexd amidxtaped to the line in 
   /var/backups/.amandahosts on the client
   
   ^^^ Note, user root
   
   
 However, I have already added the quoted text to the .amandahosts
 file, on both client and server, like so:
 
 chaffee.localdomain backup amindexd amidxtaped
   
   You have the backup user here.
   
   http://wiki.zmanda.com/index.php/How_To:Migrate_from_older_amanda_versions#Problems_with_amrecover_from_amanda_2.5
  
  Right. I did:
  
  --
  chaffee.localdomain backup amdump
  chaffee.localdomain root amindexd amidxtaped
  --
  
  and various permutations thereof on the client and the server. No go.
  
  I also commented out server_args in the xinetd.d/amanda file. Also no
  go.

Do you know how your stock ubuntu build of amanda was configured (the
args to configure)?

I just noticed that the request came from 'localhost' which does
not match your .amandahosts entry.


Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd

2009-03-01 Thread John Hein
Charles Curley wrote at 16:53 -0700 on Mar  1, 2009:
  On Sun, Mar 01, 2009 at 02:18:27PM -0700, Charles Curley wrote:
   On Sun, Mar 01, 2009 at 12:50:50PM -0700, John Hein wrote:
Do you know how your stock ubuntu build of amanda was configured (the
args to configure)?

I just noticed that the request came from 'localhost' which does
not match your .amandahosts entry.
   
   Short of pulling in the source package and looking at that, I have no
   idea. I don't even know how to find out, other than ask on another
   list.

Yep, this is an example of one disadvantage of using prebuilt packages.


   I also don't see any way to override the host name. -o host and -o
   hostname are rejected.

man amrecover (see -s & -t).  I don't know if there is a run-time
configuration option for these (I didn't see one after a quick read
of the man pages) - if so, -o would be of no help.


  For what it's worth, I came up with a work-around. On the server, I
  added localhost.localdomain to .amandahosts, ran amrecover, and that
  worked.

Yes, that is what I was getting at.  Good to hear it worked for you.


  --
  chaffee.localdomain backup amdump
  chaffee.localdomain root amindexd amidxtaped
  localhost.localdomain root amindexd amidxtaped
  --
  
  r...@chaffee:~/test# amrecover 
  AMRECOVER Version 2.5.2p1. Contacting server on localhost ...
  220 chaffee AMANDA index server (2.5.2p1) ready.
  Setting restore date to today (2009-03-01)
  200 Working date set to 2009-03-01.
  200 Config set to DailySet1.
  501 Host chaffee is not in your disklist.
  Trying host chaffee.localdomain ...
  200 Dump host set to chaffee.localdomain.
  Use the setdisk command to choose dump disk to recover
  amrecover help
  
  From there, sethost, setdisk, setdate, and it looks like I'm on my
  way.
  
  But this is not The Way It's Supposed To Work, is it?

Not sure what you mean.  If someone configured the build of amanda
(specifically the amrecover part of amanda in this case) with
--with-index-server=localhost, then, yes, what you experienced is the
expected behavior.

If you are asking if most people configure amanda that way, I'd say
probably not, but who knows - I can say that I don't.  If you want,
you can take it up with the debian/ubuntu packager.  FWIW, the default
in the configure script if you don't specify --with-index-server is
`uname -n`.

If you want better control, you can build amanda from source yourself.


Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd

2009-03-01 Thread John Hein
John Hein wrote at 17:49 -0700 on Mar  1, 2009:
  man amrecover (see -s & -t).  I don't know if there is a run-time
  configuration option for these (I didn't see one after a quick read
  of the man pages) - if so, -o would be of no help.
  ^^^ s/so/not/


Re: Timeout waiting for ack after adding dle

2009-03-01 Thread John Hein
Toomas Aas wrote at 10:13 +0200 on Mar  1, 2009:
  Sunday 01 March 2009 04:59:54 you wrote:
   Is this new DLE big?  Lots of files?
  
  The new DLE is not that big. Its 'raw capacity' is 21 GB, ca 25000 files,
  but most of it are MySQL and PostgreSQL database files which are excluded
  from the DLE.
  
   It's also possible you're hitting a udp datagram size limit.  This can
   be improved with a sysctl tweak, or a source patch or using tcp
   (sorry - don't recall if amanda 2.5.1 supports the latter).
  
  Thanks for the idea, I'll increase the net.inet.udp.maxdgram sysctl. 

Long pathnames can exacerbate the udp issue, too.


  I also looked at sendbackup debug files on the client, but the only error 
  there is the same 'index tee cannot write [Broken pipe]':
.
.
  sendbackup: time 0.014: started index creator: /usr/local/bin/gtar -tf - 
  2>/dev/null | sed -e 's/^\.//'
  sendbackup: time 469.114: index tee cannot write [Broken pipe]
  sendbackup: time 469.114: pid 11511 finish time Sat Feb 28 04:14:12 2009

It dies after 469 seconds.  That doesn't seem to be a data timeout
(dtimeout defaults to 1800 seconds).
But often when you have a biggish DLE with lots of excludes, it
can cause tar to be silent for extended periods of time and
trigger a timeout.
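If tar silence is the problem, raising the data timeout on the server
is the usual workaround, e.g. in amanda.conf (value illustrative):

dtimeout 3600   # seconds dumper waits for client data; default is 1800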


Re: amrestore: NAK: user root from localhost is not allowed to execute the service amindexd

2009-03-01 Thread John Hein
Charles Curley wrote at 18:54 -0700 on Mar  1, 2009:
  On Sun, Mar 01, 2009 at 05:49:20PM -0700, John Hein wrote:
   man amrecover (see -s & -t).  I don't know if there is a run-time
   configuration option for these (I didn't see one after a quick read
   of the man pages) - if so, -o would be of no help.
  
  Those set the index and tape servers, respectively. Supposedly, you
  can also do that with environmental variables. I tried environmental
  variables, and they didn't work.

Indeed, and the code seems to agree with the man page...

recover-src/amrecover.c:server_name = getenv("AMANDA_SERVER");
recover-src/amrecover.c:tape_server_name = getenv("AMANDA_TAPE_SERVER");

I believe it has worked for me in the past.
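In a Bourne-style shell, that usage looks like (server name
illustrative):

AMANDA_SERVER=chaffee.localdomain AMANDA_TAPE_SERVER=chaffee.localdomain \
    amrecover -C DailySet1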


  I mean that amrecover should work on the client.

Yes, it does work on the client.


   If you are asking if most people configure amanda that way, I'd say
   probably not, but who knows - I can say that I don't.  If you want,
   you can take it up with the debian/ubuntu packager.  FWIW, the default
   in the configure script if you don't specify --with-index-server is
   `uname -n`.
  
  Which in a precompiled package would give you the host name of the
  build machine, rather useless for the rest of the universe.

Which is perhaps why they might override that with 'localhost'.  If I
were the packager, I'd probably pick some host name that you could
define as a good CNAME (or additional A record) in DNS, like 'backup',
but there's a risk of picking something that will clash for someone
out there.
Having it overridable in amanda.conf would be good for this
issue.  If it really is not, then it might make a simple project
for someone.


  How about having it call the OS to enquire, and providing an option
  to override?  But that has its own security problems.

Inquire what?  DNS?  Some LDAP map?  I think -s & -t should work as
a way to override - not sure why they didn't for you.  I don't
see a security issue since the server can decide which client
hosts to allow (except perhaps for spoofing issues, but if
you have problems with that, amanda may be the least of
your worries).


Re: Timeout waiting for ack after adding dle

2009-02-28 Thread John Hein
Toomas Aas wrote at 11:04 +0200 on Feb 28, 2009:
  I have a single-machine (client==server) setup which has been working  
  well for quite a long time. It's running Amanda 2.5.1p3 on FreeBSD 6.4.
  
  Yesterday I added a new disk to the machine, mounted it under /db and  
  added corresponding entry to the disklist. On tonights backup run,  
  Amanda backed up  first two small DLEs but all the rest (including the  
  newly added one) failed with:
  
  host.domain.ee  /usr lev 1  FAILED [cannot read header: got 0 instead  
  of 32768]
  host.domain.ee  /usr lev 1  FAILED [cannot read header: got 0 instead  
  of 32768]
  host.domain.ee  /usr lev 1  FAILED [too many dumper retry: [request  
  failed: timeout waiting for ACK]]
  
  This shouldn't be a firewall problem, since the firewall on the  
  machine is set to unconditionally pass all traffic on loopback  
  interface and I couldn't find any relevant dropped packets in the  
  firewall log. Also amcheck -c passes with no errors.
  
  I looked at the amdump.1 file, and the first indication of any problem  
  is on the 3rd DLE (which is the newly added one - coincidence?):
  
  driver: result time 2761.656 from chunker0: FAILED 00-5 [cannot  
  read header: got 0 instead of 32768]
  
  (2761 seconds is approximately 04:06 local time)
  
  Couldn't see anything wrong before that. In the server's general error  
  log there are just these messages tonight:
  
  Feb 28 04:14:12 host sendbackup[11511]: index tee cannot write [Broken pipe]
  Feb 28 04:15:02 host sendbackup[11632]: index tee cannot write [Broken pipe]

sendbackup is dying early - possibly your timeouts are set too low
in amanda.conf.

Is this new DLE big?  Lots of files?

It's also possible you're hitting a udp datagram size limit.  This can
be improved with a sysctl tweak, or a source patch or using tcp
(sorry - don't recall if amanda 2.5.1 supports the latter).
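On FreeBSD the sysctl tweak looks like this (65535 is the UDP
maximum; put it in /etc/sysctl.conf to make it stick across reboots):

sysctl net.inet.udp.maxdgram=65535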

The client debug files might tell more.  You didn't say you looked
at those.


Re: Amanda and older clients

2009-02-25 Thread John Hein
stan wrote at 16:13 -0500 on Feb 25, 2009:
  It appears that the mainstream development of Amanda has taken off in a
  direction that has/will result in making it impossible to compile on many
  existing platforms that have been historically supported by Amanda.
  
  While there are good reasons for this change, it represents a major loss of
  functionality for us, and I suspect many other long term Amanda users who
  depend on being able to use this package to backup their older clients.
  
  I have been discussing this issue at length, off list, with one of the
  developers of the project. His recommendation is that we create a client
  only version of Amanda that is a fork off of the 2.5.2.x branch of the
  tree. This version, as I understand it, predates the need for glibc, which
  as I have just discovered is unsupported on many hardware/software
  architectures. I think it also predates the need for pkg-config, which
  does not seem to have the same portability issues as glib, but is IMHO an
  unnecessary build time dependency, given that configure was designed for,
  what I believe to be, the same need.
  
  I am thinking about volunteering to lead this effort, as we are in the
  middle of upgrading a fairly large Amanda installation at my work, and I
  have, at least, 3 OS/hardware pairs that are not supported by glib.
  
  I would like to hear from other users of Amanda how they feel about this.
  I hope the collective wisdom of the list may help to provide some
  direction for my thoughts.

I am sympathetic to the needs of running old platforms.

But if you need to do so, at a certain point, it becomes an exercise of
self-maintenance.  It's like maintaining a 50 year old car.  You can't
just go to Napa and get a part sometimes.

Developing for a project like amanda is, to some extent, a juggling
exercise.  They (the developers) have to deal with a variety of OS's
of various ages.  I can understand the decision to depend on glib (not
glibc, BTW) from a portability aspect.  (I'm less convinced about
perl, but that's another matter).

glib was partly chosen _because_ it's more portable (again not glibc),
but it can sometimes have edge cases when using it on older systems.

This is a much more general question that applies to more than
just amanda.

But, that said, there is some effort expended to ensure that newer
amanda servers can speak to older clients (going the other way, new
client -> old server, is another matter, but that works to a certain
extent, too).  So for older platforms, you _can_ (as others have
mentioned) just freeze the amanda version on the client.  Most, but
not all, of the new features one would be interested in are on the
server.

Answering your particular notion of forking amanda, it's also another
possibility to expend some effort to build the latest amanda on an old
system.  If you don't have to build the server code, it's a more
simplified task.  And if you have a set of patches to say, build on
old HP-UX, sometimes they can be applied in the current code (submit
to amanda-hackers).  At the least, you can put the patches up on the
wiki.  Anyway, that's another possibility for you to consider.



Re: Weird compression results for DLE using 'compress NONE' (nocomp-root)

2009-01-21 Thread John Hein
Tom Robinson wrote at 12:30 +1100 on Jan 22, 2009:
  I've got several disks that are showing weird compression results in the 
  amanda report. Here's one of them:
  
                          DUMPER STATS                 TAPER STATS
  HOSTNAME DISK  L  ORIG-KB  OUT-KB   COMP%  MMM:SS    KB/s  MMM:SS     KB/s
  -------- ----- -- -------- -------- ------ ------- ------- ------- --------
  host     /disk 1  2031690  4063380  200.0    36:34  1852.3    6:27  10487.2
  
  
  Note the ORIG-KB blows out to twice the size! COMP% is 200.0...
  
  This happens on more than one disk actually. I chose this disk as it's 
  the biggest disk that I dump, it shows the most expansive blowout and I 
  noticed it first. This disk uses 'compress NONE' (dumptype is 
  nocomp-root). Some of the other disks showing compression weirdness are 
  using 'compress client fast' in their DLE's.

Smells like a factor of two error somewhere (512 byte blocks vs. 1024?).
What does 'env -i du -ks /disk' say?


Re: Weird compression results for DLE using 'compress NONE' (nocomp-root)

2009-01-21 Thread John Hein
John Hein wrote at 21:38 -0700 on Jan 21, 2009:
  Tom Robinson wrote at 12:30 +1100 on Jan 22, 2009:
I've got several disks that are showing weird compression results in the 
amanda report. Here's one of them:

                            DUMPER STATS                 TAPER STATS
    HOSTNAME DISK  L  ORIG-KB  OUT-KB   COMP%  MMM:SS    KB/s  MMM:SS     KB/s
    -------- ----- -- -------- -------- ------ ------- ------- ------- --------
    host     /disk 1  2031690  4063380  200.0    36:34  1852.3    6:27  10487.2


Note the ORIG-KB blows out to twice the size! COMP% is 200.0...

This happens on more than one disk actually. I chose this disk as it's 
the biggest disk that I dump, it shows the most expansive blowout and I 
noticed it first. This disk uses 'compress NONE' (dumptype is 
nocomp-root). Some of the other disks showing compression weirdness are 
using 'compress client fast' in their DLE's.
  
  Smells like a factor of two error somewhere (512 byte blocks vs. 1024?).
  What does 'env -i du -ks /disk' say?

Never mind that last request... your report above shows a level 1, not
0.  So du output won't be a useful comparison to the numbers above.
Does it behave the same (x2) for level 0 dumps, too?


Re: perl errors in taper debug files

2008-12-19 Thread John Hein
Jean-Francois Malouin wrote at 14:00 -0500 on Dec 19, 2008:
  * Dustin J. Mitchell dus...@zmanda.com [20081219 13:51]:
   On Fri, Dec 19, 2008 at 1:28 PM, Jean-Francois Malouin
   jean-francois.malo...@bic.mni.mcgill.ca wrote:
for module Amanda::Types: 
/usr/local/share/perl/5.8.8/auto/Amanda/Types/libTypes.so: cannot open 
shared object file: No such file or
directory at /usr/lib/perl/5.8/DynaLoader.pm line 225.
   
Should I be worried? I guess this is part of the new API
and all the swig stuff which I'm totally clueless about.
Is something not quite kosher in my local perl setup?
   
   Yep.  Does that .so file exist?  Sometimes libdl gives No such file
   or directory when it can't find the *dependencies* of a shared
   object.  Assuming libTypes.so exists, what does 'ldd' tell you about
   it?  What platform is this?  Was it installed from a package or from
   source?
  
  Dustin, 
  
  Yes, it exists: see my other post about the output of ldd.
  
  I think the problem might be related to having libglib dso's in
  /opt/lib64 (installed from source) and not having this in
  LD_LIBRARY_PATH.
  
  This is on Debian/Etch running 2.6.26.5-i686-64-smp.

Did the ldd you ran have a different env than amanda (i.e., did your
ldd run have LD_LIBRARY_PATH set?)?

If it is just an LD_LIBRARY_PATH issue, you may want to consider
putting /opt/lib64 into your ldconfig config (/etc/ld.*conf* under
linux).
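A sketch of that on a Debian-ish system - the exact config file name
varies by distribution:

echo /opt/lib64 >> /etc/ld.so.conf    # or a file under /etc/ld.so.conf.d/
ldconfig                              # rebuild the runtime linker cache
ldd /usr/local/share/perl/5.8.8/auto/Amanda/Types/libTypes.so | grep 'not found'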

Since you're on linux, you can run 'ldd -v ...' to chase down
missing dependencies in the dependencies.


Re: Tape changer question

2008-12-13 Thread John Hein
John Hein wrote at 16:00 -0700 on Dec 13, 2008:
  Tim Bunnell wrote at 17:03 -0500 on Dec 13, 2008:
Folks,

We're running Amanda (version 2.5.1p1) on a Debian Linux system with an 
8-tape AIT-2 library. We have around 314GB spread over two file systems 
that we are attempting to backup in one run across as many tapes as 
necessary. We're using gzip compression and expect it will take 5-6 
tapes to complete (there's a fair amount of audio and image data that 
doesn't compress too well).

I think we have the config files set up correctly, but it seems like no 
matter what we do, the run stops (after about 16 hours) and reports that 
it's out of tape. I don't think it has ever succeeded in spanning more 
than 4 tapes before giving us the error. I see nothing in the .debug 
output for the changer that looks different for any tapes it changes.

I'm sort of at a loss for where to start looking for the problem, and 
what to look for. Any suggestions from the list?
  
  Do you really know the tape capacity for your tapes?
  Some AIT-2 flavors are 36 GB, some are 50 GB, it seems.
  36*8 < 314
  
  Have you run amtapetype to verify?
   (see http://wiki.zmanda.com/index.php/Tapetype_definitions)
  Do you have hardware compression off?

Sorry, I just re-read and saw that it only used 4 tapes.
What is runtapes set to?
Somewhere in the logs, it should explain why it's not
going past 4.


Re: Tape changer question

2008-12-13 Thread John Hein
Tim Bunnell wrote at 17:03 -0500 on Dec 13, 2008:
  Folks,
  
  We're running Amanda (version 2.5.1p1) on a Debian Linux system with an 
  8-tape AIT-2 library. We have around 314GB spread over two file systems 
  that we are attempting to backup in one run across as many tapes as 
  necessary. We're using gzip compression and expect it will take 5-6 
  tapes to complete (there's a fair amount of audio and image data that 
  doesn't compress too well).
  
  I think we have the config files set up correctly, but it seems like no 
  matter what we do, the run stops (after about 16 hours) and reports that 
  it's out of tape. I don't think it has ever succeeded in spanning more 
  than 4 tapes before giving us the error. I see nothing in the .debug 
  output for the changer that looks different for any tapes it changes.
  
  I'm sort of at a loss for where to start looking for the problem, and 
  what to look for. Any suggestions from the list?

Do you really know the tape capacity for your tapes?
Some AIT-2 flavors are 36 GB, some are 50 GB, it seems.
36*8 < 314

Have you run amtapetype to verify?
 (see http://wiki.zmanda.com/index.php/Tapetype_definitions)
Do you have hardware compression off?
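A typical 2.5-era invocation looks something like this - the device
name and estimate are illustrative, so check your man page:

amtapetype -f /dev/nst0 -e 36g -t AIT2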


Re: amrecover ignores --prefix

2008-11-07 Thread John Hein
Steven Backus wrote at 17:50 -0700 on Nov  6, 2008:
I'm doing a trial installation of amanda and don't want to mess
  up my regular install, so I compiled with --prefix=/local.
  Regardless of this, amrecover is still looking for my config files
  in /usr/local/etc/amanda/config instead of /local/etc/amanda/config.
  Is this expected behavior, a bug or ?  I'm using amanda-2.6.0p2.

What does 'amadmin yourconfig version' say for CONFIG_DIR?

FWIW, --prefix works for me.


Re: Stranded on waitq failure (planner: Message too long)

2008-11-07 Thread John Hein
Leon Meßner wrote at 12:22 +0100 on Nov  6, 2008:
  Hi,

  On Wed, Nov 05, 2008 at 06:46:52PM -0500, Jean-Louis Martineau wrote:
   Ian Turner wrote:
   I don't know if 2.5.1 is old enough to qualify for this issue, but it
   used to be the case that the entire set of disklists for a client had
   to fit in a single packet. What that meant is that if you had more
   than a few dozen disks on one client (depending on disklist options),
   you would run into this issue.

  On this backupset i have 28 dle's.

   The solution is to upgrade, but a workaround is to create a second IP
   address and DNS name on the same physical client, and move some of the
   disklist entries to the latter.

  I'm running the latest Amanda from the Ports. Perhaps i should ask the
  maintainer about updating to 2.6.x. I don't know why the port uses this
  old version. The maintainer seems to be quite active.

   Or change to the 'bsdtcp' auth.

  Thanks for your solutions, changing net.inet.udp.maxdgram to 65535
  helped also (FreeBSD's default is 9k ;).

I've been applying the following patch to bump up the max datagram
size since amanda 2.4.1 (minor differences per version mostly due to
changes in dbprintf) when I started seeing packet size limit problems
even with a modest number of DLEs (it was a particular DLE that had a
lot of files that first caused the problem).

It works on all OS's, not just BSDs.  At one point I submitted it on
-hackers, but it never got committed.  The maxdgram sysctl is global
to the system.  This patch gives you a little finer control.

Against 2.5.1p1 - 2.5.x ...

--- common-src/dgram.c.orig Wed Sep 20 06:48:54 2006
+++ common-src/dgram.c  Wed Sep 27 13:43:07 2006
@@ -57,6 +57,7 @@ dgram_bind(
     socklen_t len;
     struct sockaddr_in name;
     int save_errno;
+    int sndbufsize = MAX_DGRAM;
 
     *portp = (in_port_t)0;
     if((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
@@ -75,6 +76,10 @@ dgram_bind(
        errno = EMFILE;         /* out of range */
        return -1;
     }
+    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, (void *)&sndbufsize,
+                   sizeof(sndbufsize)) < 0)
+       dbprintf(("%s: dgram_bind: could not set udp send buffer to %d\n",
+                 debug_prefix(NULL), sndbufsize));
 
     memset(&name, 0, SIZEOF(name));
     name.sin_family = (sa_family_t)AF_INET;


Against 2.6.x ...

--- common-src/dgram.c.orig Fri May 30 11:44:36 2008
+++ common-src/dgram.c  Fri Aug 22 13:19:56 2008
@@ -250,6 +250,7 @@
     socklen_t_equiv addrlen;
     ssize_t nfound;
     int save_errno;
+    int sndbufsize = MAX_DGRAM;
 
     sock = dgram->socket;
 
@@ -286,6 +287,10 @@
        errno = save_errno;
        return nfound;
     }
+    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *)&sndbufsize,
+                   sizeof(sndbufsize)) < 0)
+       dbprintf("%s: dgram_bind: could not set udp send buffer to %d\n",
+                strerror(save_errno), sndbufsize);
 
     addrlen = (socklen_t_equiv)sizeof(sockaddr_union);
     size = recvfrom(sock, dgram->data, (size_t)MAX_DGRAM, 0,


Re: DUMP: You can't update the dumpdates file when dumping a subdirectory

2008-10-03 Thread John Hein
Dustin J. Mitchell wrote at 00:41 -0400 on Oct  3, 2008:
  On Thu, Oct 2, 2008 at 9:03 PM, Aaron J. Grier [EMAIL PROTECTED] wrote:
   I beg your pardon, but /sbin/dump is perfectly capable of dumping
   subdirectories on most unixes.  it just won't record (or read) the date
   of the dump in /etc/dumpdates.
  
  I'm happy to be proven wrong (I've not used dump myself), but it was
  my understanding that dump, in general, worked at the filesystem
  level, against a block device.  For example, the OSX manpage (which is
  just inherited from the BSD manpages) says:
  
   dump [-0123456789cnu] [-B records] [-b blocksize] [-d density]
[-f file] [-h level] [-s feet] [-T date] filesystem
  
  where the use of the term filesystem means, to my understanding, a
  filesystem and not an arbitrary subdirectory.  Now, you may have a
  filesystem mounted at /usr/local, in which case you can use dump to
  back up /usr/local, but I don't think that's what you meant.
  
  Can you point to some documentation to support your assertion?

This was touched on as part of a larger thread in April.

http://www.mail-archive.com/amanda-users@amanda.org/msg40187.html

In short, some OS's dump utilities can dump individual files or
subdirectories, but it's not clear how well that works (or will work
at all) in amanda, particularly with incremental dumps.

I think it'd be great if we could use the native filesystem's dump (or
dump-like) tool rather than gtar.  Then you can backup filesystem
specific attributes that gtar may not handle.  (and you don't touch
atime, which has always been an annoying quibble with having to use
tar).  Having to dump the whole filesystem, of course, is really the
big hurdle with using a dump or dump-like tool.


Re: amandad args in inetd.conf

2008-09-26 Thread John Hein
[EMAIL PROTECTED] wrote at 13:09 -0400 on Sep 26, 2008:
  Hmm, thanks for mentioning.  If I condense down my browser window,
  the text goes beyond the bounding box but I can still scroll over
  to see the full line of text.  Are you not getting the same?

Exactly the same.  I was wondering if there is a wiki markup directive
to have the bounding box extend to the end of the text rather than the
border of the window.


   And somehow between 2.5 & 2.6, the .amandahosts docs in amanda(8) lost
   the docs for the 'service(s)' field.
  
  
  I'm seeing the following in 2.6.0p2's amanda(8):

Woops.  My fault... bad zgrep for the strings in the various flavors
of amanda I have installed (man pages in the 2.6* flavors moved into
prefix/share/man instead of prefix/man).


Re: amandad args in inetd.conf

2008-09-24 Thread John Hein
Paul Yeatman wrote at 16:59 -0700 on Sep 24, 2008:
  Online Amanda documentation for inetd.conf configuration for both
  server and client are found on the Amanda wiki site here
  
  server:
  http://wiki.zmanda.com/index.php/Configuring_bsd/bsdudp/bsdtcp_authentication#xinetd.2Finetd_configuration_file_changes

Thanks.  Perhaps someone should add this info to the man pages.

I wonder how one might go about fixing the wikimedia markup to
extend the bounding box in the 'inetd.conf' example (assuming
your browser window is not really wide).

And somehow between 2.5 & 2.6, the .amandahosts docs in amanda(8) lost
the docs for the 'service(s)' field.


  client:
  http://wiki.zmanda.com/index.php/Quick_start#Configuring_inetd_on_the_client

Thanks again.


amandad args in inetd.conf

2008-09-23 Thread John Hein
Where are the docs for what args need to be added to amandad in
inetd.conf?

I added amindexd and amidxtaped on the backup server in order to do
amrecover, but then amcheck failed (needed noop, then selfcheck).
Then amdump failed (needed sendsize, ...).

I see the full list in amandad.c, and I think I understand why clients
don't need the addition.  It defaults to all services except amindexd
and amidxtaped being active.  But when you activate those two on the
server, it seems it _de_activates the others.

I didn't see it in the docs in the tarball and the wiki search leaves
a bit to be desired.  Where is it documented for the non-source diving
crowd?

Also, why would anyone want noop, selfcheck, sendbackup, etc.,
disabled?  Is it a use case that some people don't back up their
backup servers?

I've been using a 2.4.5 server for so long, I've missed some details.


Re: amandad args in inetd.conf

2008-09-23 Thread John Hein
Olivier Cherrier wrote at 18:46 -0400 on Sep 23, 2008:
  On Tue, Sep 23, 2008 at 10:27:26AM -0600, [EMAIL PROTECTED] wrote:
   Where are the docs for what args need to be added to amandad in
   inetd.conf?
   
   I added amindexd and amidxtaped on the backup server in order to do
   amrecover, but then amcheck failed (needed noop, then selfcheck).
   Then amdump failed (needed sendsize, ...).
   
  I think you have to populate your ~amandauser/.amandahosts with
  something like that (amandauser = operator for me):
  yourHost   operator   amdump
  
  From amanda(8): amdump is a shortcut for noop selfcheck sendsize
  sendbackup

Ah... difference between amanda-2.5 and amanda-2.6.  At least a
documentation difference.  2.5.* seems to be missing that
documentation in the context of inetd.conf and 2.6.* doesn't have that
tidbit at all (in fact it doesn't seem to support the service entry in
.amandahosts in the 2.6 man page).  I'm not sure if that's an
oversight or deliberate.

I also wonder if that alias is valid in inetd.conf.


   I see the full list in amandad.c, and I think I understand why clients
   don't need the addition.  It defaults to all services except amindexd
   and amidxtaped being active.  But when you activate those two on the
   server, it seems it _de_activates the others.
   
  I myself configured inetd.conf like that:
  $ grep amandad /etc/inetd.conf 
  amanda  dgram  udp  wait  operator  /usr/local/libexec/amanda/amandad amandad amindexd amidxtaped

So you don't backup your server?  Or you use .amandahosts to specify
the services that amandad can run on the server?  I guess I don't
see a big difference between editing .amandahosts vs. inetd.conf


Re: dumpcycle

2008-09-23 Thread John Hein
John Hein wrote at 17:22 -0600 on Sep 23, 2008:
  Richard Stockton wrote at 14:28 -0700 on Sep 23, 2008:
How do I force a full dump under ALL circumstances?
(I do have my disklist set to always-full).
  
  Put these in your global dumptype settings...
  strategy noinc
  skip-incr

You probably also want to set 'reserve 0'.
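Putting the pieces together, a sketch of the relevant amanda.conf
bits (the dumptype name is illustrative):

reserve 0    # don't hold back holding disk for degraded-mode incrementals

define dumptype always-full {
    comment "force a level 0 every run"
    strategy noinc
    skip-incr
}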


Re: dumpcycle

2008-09-23 Thread John Hein
Richard Stockton wrote at 14:28 -0700 on Sep 23, 2008:
  How do I force a full dump under ALL circumstances?
  (I do have my disklist set to always-full).

Put these in your global dumptype settings...
strategy noinc
skip-incr


Re: dumpcycle

2008-09-23 Thread John Hein
Richard Stockton wrote at 16:42 -0700 on Sep 23, 2008:
  At 04:29 PM 9/23/2008, you wrote:
How do I force a full dump under ALL circumstances?
(I do have my disklist set to always-full).
  
  Is your holding disk sufficiently large?
  
  Yes.  260 Gigs to hold about 240 Gigs of backup in 2 pieces (100+140)
  
  Have you changed the reserve for incrementals parameter
  from its default 100%?
  
  I have now.  I have also set (in the always-full dumptype):
   strategy noinc
   skip-incr
  
  Do I need all three set as above?  Or would just the reserve 0
  take care of it?

I _think_ noinc is sufficient, but I'm not sure.  I've always added
'reserve 0' in our everything level 0 config.  And I think that's a
good idea (if you definitely don't want amanda to try dumping
incrementals in degraded mode) in case of running out of tapes, etc.

I don't think skip-incr is necessary, but we do have it.

I don't know about dumpcycle 0, but I don't use it.

I've always considered that amanda treats dumpcycle as advisory.  If
it can meet the dumpcycle number, it will, but between promotions
(not really applicable for dumpcycle 0) and delays (usually due to
space), it's not guaranteed.


Re: planner: could not lock log file

2008-09-12 Thread John Hein
Taalaibek Ashirov wrote at 14:58 +0300 on Sep 12, 2008:
   Let me recant, it may be amanda at fault or a combination.  If you add
   the following to the initialization of 'lock' in the test, does your
   problem go away?
   
   lock.l_start = 0;
   lock.l_len = 0;
   
   If the short test passes, add those lines to amflock() in amanda's
   common-src/amflock.c, rebuild amanda and give it another shot.
  
  Yes, in the test the problem goes away. Actually, I've installed amanda
  from ports. Now I deinstalled amanda and got the latest source version from
  amanda's website, added those lines to amflock(). When I ran make it gave
  me an error like 'lock is not declared'. I declared lock the same as in the
  test (struct flock lock;). This time I got this:
  
  ...
  creating amflock-test
  make: don't know how to make amgpgcrypt. Stop
  *** Error code 2
  Stop in /root/amanda-2.6.0p2/common-src.
  *** Error code 1
  Stop in /root/amanda-2.6.0p2.
  web# 
  
  Is there any thing else I missed?

That's likely a 'configure' problem (to investigate, you could look at
differences in the configure stage between your build from source and
the ports build from source - the config.log file shows what was
passed to configure).

Maybe just go back to building from the port...

cd ports/misc/amanda-server
make patch
add the lines to amflock.c
make
sudo make install


Re: planner: could not lock log file

2008-09-12 Thread John Hein
Dustin J. Mitchell wrote at 09:40 -0400 on Sep 12, 2008:
   creating amflock-test
   make: don't know how to make amgpgcrypt. Stop
   *** Error code 2
   Stop in /root/amanda-2.6.0p2/common-src.
   *** Error code 1
   Stop in /root/amanda-2.6.0p2.
   web#
  
  Amanda requires GNU make now (gmake).

Indeed.  I'm so used to visually transforming make to gmake (since
gmake is spelled "make" on linux) that I missed that detail of the
problem report.

I think amanda has required gmake for quite a while, hasn't it?


Re: Single tape changer process

2008-09-12 Thread John Hein
Dustin J. Mitchell wrote at 09:32 -0400 on Sep 12, 2008:
  On Thu, Sep 11, 2008 at 11:53 PM, Olivier Nicole [EMAIL PROTECTED] wrote:
   Is there a mechanism in Amanda to ensure that only a single tape
   changer process is running at any given time?
  
  No -- and this poses a problem for processes that want to move data
  between devices.  I'm working on such a process, and for that reason
  I'm working on an overhaul of the changer API at the moment.  The key
  problem with the existing API is that it has no way to indicate that a
  process is finished with a device, and that the changer can load a new
  volume into that device.

I use lockf(1) in a wrapper script to protect accesses to a resource
where amanda (currently) does not.
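A minimal sketch of such a wrapper on FreeBSD - the lock file path is
made up, and chg-zd-mtx stands in for whatever your tpchanger runs:

#!/bin/sh
# serialize all changer access through one lock; -k keeps the lock file
exec lockf -k /var/run/amanda/changer.lock \
    /usr/local/libexec/amanda/chg-zd-mtx "$@"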


Re: planner: could not lock log file

2008-09-11 Thread John Hein
Taalaibek Ashirov wrote at 10:31 +0300 on Sep 11, 2008:
  On Wed, 2008-09-10 at 09:56 -0600, John Hein wrote:
   What happens when you compile and run this (as the backup user)?
   
   #include <err.h>
   #include <fcntl.h>
   #include <stdio.h>
   int
   main()
   {
       struct flock lock;
       int fd = open("/var/log/amanda/dotProject/foo", O_RDWR | O_CREAT, 0644);
       if (fd < 0) err(1, "open");
   
       lock.l_type = F_WRLCK;
       lock.l_whence = SEEK_SET;
       int r = fcntl(fd, F_SETLKW, &lock);
       if (r < 0) err(1, "fnctl");
       return 0;
   }
  
  Hi John! Thank you for your efforts. I got the same error:
  
  $ ./test
  test: fnctl: Invalid argument

Then it's an issue with your system somehow, not amanda.

Looking at src/sys/kern/kern_descrip.c, you can get EINVAL if you
pass in l_type that is not F_RDLCK, F_WRLCK or F_UNLCK.

Try adding the printf below and rebuilding your kernel.  Then run the
above test.  Look for the printf in dmesg (or /var/log/messages if you
are using a default syslog.conf).

Index: kern_descrip.c
===
RCS file: /base/FreeBSD-CVS/src/sys/kern/kern_descrip.c,v
retrieving revision 1.279.2.15.2.1
diff -u -p -r1.279.2.15.2.1 kern_descrip.c
--- kern_descrip.c  14 Feb 2008 11:46:40 -  1.279.2.15.2.1
+++ kern_descrip.c  11 Sep 2008 13:17:25 -
@@ -533,6 +533,7 @@ kern_fcntl(struct thread *td, int fd, in
flp, F_POSIX);
break;
default:
+printf("invalid l_type: %#x\n", flp->l_type);
error = EINVAL;
break;
}



Re: planner: could not lock log file

2008-09-11 Thread John Hein
John Hein wrote at 07:19 -0600 on Sep 11, 2008:
  Taalaibek Ashirov wrote at 10:31 +0300 on Sep 11, 2008:
On Wed, 2008-09-10 at 09:56 -0600, John Hein wrote:
 What happens when you compile and run this (as the backup user)?
 
 #include <err.h>
 #include <fcntl.h>
 #include <stdio.h>
 int
 main()
 {
     struct flock lock;
     int fd = open("/var/log/amanda/dotProject/foo", O_RDWR | O_CREAT, 0644);
     if (fd < 0) err(1, "open");
 
     lock.l_type = F_WRLCK;
     lock.l_whence = SEEK_SET;
     int r = fcntl(fd, F_SETLKW, &lock);
     if (r < 0) err(1, "fnctl");
     return 0;
 }

Hi John! Thank you for your efforts. I got the same error:

$ ./test
test: fnctl: Invalid argument
  
  Then it's an issue with your system somehow, not amanda.
  
  Looking at src/sys/kern/kern_descrip.c, you can get EINVAL if you
  pass in l_type that is not F_RDLCK, F_WRLCK or F_UNLCK.
  
  Try adding the printf below and rebuilding your kernel.  Then run the
  above test.  Look for the printf in dmesg (or /var/log/messages if you
  are using a default syslog.conf).
  
  Index: kern_descrip.c
  ===
  RCS file: /base/FreeBSD-CVS/src/sys/kern/kern_descrip.c,v
  retrieving revision 1.279.2.15.2.1
  diff -u -p -r1.279.2.15.2.1 kern_descrip.c
  --- kern_descrip.c   14 Feb 2008 11:46:40 -  1.279.2.15.2.1
  +++ kern_descrip.c   11 Sep 2008 13:17:25 -
  @@ -533,6 +533,7 @@ kern_fcntl(struct thread *td, int fd, in
   flp, F_POSIX);
   break;
   default:
  +printf("invalid l_type: %#x\n", flp->l_type);
   error = EINVAL;
   break;
   }

Let me recant, it may be amanda at fault or a combination.  If you add
the following to the initialization of 'lock' in the test, does your
problem go away?

lock.l_start = 0;
lock.l_len = 0;

If the short test passes, add those lines to amflock() in amanda's
common-src/amflock.c, rebuild amanda and give it another shot.


Re: planner: could not lock log file

2008-09-10 Thread John Hein
Taalaibek Ashirov wrote at 15:16 +0300 on Sep 10, 2008:
  I am using FreeBSD 6.3 and amanda 2.5.1p3. Running amcheck results no
  errors. All the debug files result no
  errors. 
  
  But I got Result Missing errors back in the amreport after 
  amdump. Below is the error in amdump.1 file under the log location. The
  log.error.1 file is created without anything in it.
  
  What can I do to fix this error? Please help. Thank you very much. 
  
  
  READING CONF FILES...
  driver: pid 67085 executable /usr/local/libexec/amanda/driver version
  2.5.1p3
  planner: could not lock log file /var/log/amanda/dotProject/log: Invalid
  argument
  driver: could not lock log file /var/log/amanda/dotProject/log: Invalid
  argument
  -
  
  $amadmin x version | grep LOCKING
  LOCKING=POSIX_FCNTL DEBUG_CODE AMANDA_DEBUG_DAYS=4


From the fcntl(3) man page...

 [EINVAL]   The cmd argument is F_DUPFD and arg is negative or
greater than the maximum allowable number (see
getdtablesize(2)).

The argument cmd is F_GETLK, F_SETLK or F_SETLKW and
the data to which arg points is not valid.


And the amanda code to do the lock...

int
amflock(
int fd,
char *  resource)
{
int r;

#ifdef USE_POSIX_FCNTL
    (void)resource; /* Quiet unused parameter warning */
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    r = fcntl(fd, F_SETLKW, &lock);
#else

It's not immediately obvious why this would cause EINVAL.

Out of curiosity, what is the output of 'df /var/log/amanda/dotProject'?


Re: caution: gtar 1.20 amanda 2.5.1

2008-09-07 Thread John Hein
Dustin J. Mitchell wrote at 10:01 -0400 on Sep  7, 2008:
  On Sun, Sep 7, 2008 at 12:44 AM, John Hein [EMAIL PROTECTED] wrote:
   There was a zmanda wiki page that described issues with various gnutar
   and amanda version combinations, but I can't seem to find it at the
   moment (the search doesn't turn it up that I saw).
  
   If someone finds it, let me know and I'll try to see that this info
   gets added.
  
  The link is:

  http://wiki.zmanda.com/index.php/FAQ:What_versions_of_GNU_Tar_are_Amanda-compatible%3F
  which does specify that those versions will not get along, but not for
  the reason you described, so it's definitely worth an edit.

Thanks.  I'll update it, including patches to fix both issues.

Just curious... Why doesn't a wiki search for gnutar or gnu tar find this?


Re: caution: gtar 1.20 amanda 2.5.1

2008-09-07 Thread John Hein
Jon LaBadie wrote at 01:57 -0400 on Sep  7, 2008:
  On Sun, Sep 07, 2008 at 01:46:50AM -0400, Jon LaBadie wrote:
   On Sat, Sep 06, 2008 at 10:44:11PM -0600, John Hein wrote:
Someone may already know about this, but using gtar > 1.15.1 and
amanda < 2.5.1 will not work very well.

The format of the listed incremental file has changed.  Among other
things, the entries are now separated by '\0' null bytes rather than
newlines.  [I'm not exactly sure why since it doesn't save any space
and I don't think '\n' is a valid character in a posix file name].
   
   After a quick search I did not find a reference for this, but I'd
   be surprised if posix did not allow \n as a valid file name char.
   For the multiple decades I've used unix, it has always been valid.
   If not specifically allowed, it may be one of those undefined
   things that leaves it to the locale or character set.
  
  I just missed it.
  
  Only characters not allowed are slash '/' and null byte '\0'.

Indeed.  I just created a file with a \n.  Finding different ways to
access it via shell can provide hours of fun.

So let me rephrase:

I can't think of any reason anyone would want to put '\n' in a file
name... except to make access to it harder.  ;)


caution: gtar 1.20 amanda 2.5.1

2008-09-06 Thread John Hein
Someone may already know about this, but using gtar > 1.15.1 and
amanda < 2.5.1 will not work very well.

The format of the listed incremental file has changed.  Among other
things, the entries are now separated by '\0' null bytes rather than
newlines.  [I'm not exactly sure why since it doesn't save any space
and I don't think '\n' is a valid character in a posix file name].

This causes trouble for amanda < 2.5.1 which tries to read in the
old snapshot file and copy it to a new one in a fgets/fputs loop
which explicitly appends a newline to the copy whether there was one
in the original or not.  With the old format (gnutar <= 1.15.1), this
was not a problem since it used newlines.

Not only does amanda add newlines (which gtar 1.20 chokes on because
it explicitly looks for '\0' characters and dies a fatal death on
seeing '\n' instead - see read_unsigned_num() in src/incremen.c), but
it also truncates the file because amanda stops processing the file
early due to the null bytes.


As a result, you will see things like so...

  yoyoma  / lev 0 FAILED [dumps too big, 1 KB, but no incremental estimate]

and/or

  planner: disk furble:/usr, estimate of level 1 failed: -1.


There are some other oddities that I haven't fully figured out yet
(I'm not sure they are fatal), but without patching
client-src/sendbackup-gnutar.c to do what 2.5.1 and later does (or
updating amanda), this problem is a showstopper.

This could be particularly troublesome for clients that you can't
update to a newer amanda (but more than likely in that case, you won't
be updating them to gnutar 1.20 either).

There was a zmanda wiki page that described issues with various gnutar
and amanda version combinations, but I can't seem to find it at the
moment (the search doesn't turn it up that I saw).

If someone finds it, let me know and I'll try to see that this info
gets added.


Separate issue, but worth a mention...

Note also that changing from gnutar <= 1.15.1 to a later version (if
you already have incremental dump files in your gnutar-lists
directories) will cause some very large incremental dumps to happen
because of some details of the format change that I won't go into
here.


dumps way too big, must skip incremental dumps

2008-03-06 Thread John Hein
Yes, it's the classic problem.  I understand the cause, but I have a
question.

a little background for those who don't know about this one...
=
Last night I got a bunch of these...

  elmer  /hr lev 3 FAILED [dumps way too big, 9065 KB, must skip 
incremental dumps]

As a result, lots of DLEs were just skipped.  This can happen if, for
instance, one DLE had lots of changes and its incremental dump is much
larger than normal.  Or you have one DLE that is extremely large and
when it does its level 0 dump, it squeezes out all the other DLEs.  So
the planner decides it can't fit all the dumps for the configured tape
size.


The question...
=
It's been a while since I've looked at all the shiny new knobs that
amanda has grown (generally we just leave amanda as is), but have we
grown anything that allows amanda to at least _try_ to dump such
'skipped' incrementals to holding disk?

[note... our server is still amanda-2.4.2p2, so maybe this has been
 fixed in a different way in 2.5 or later and I just haven't noticed]


incremental with gnutar bogusly dumping old files

2007-09-26 Thread John Hein
The other night, a number of incremental dumps included a lot of files that
should not have been dumped.

As a result, I got a number of 'dumps way too big' failure messages
causing a number of DLEs to not get dumped since the planner decided
there was no room.

For example, I have an old file, foo, somewhere under directory /xxx...

stat -x foo
  File: foo
  Size: 296422   FileType: Regular File
  Mode: (0444/-r--r--r--) Uid: (  631/ nrg)  Gid: ( 2005/ web)
Device: 0,86   Inode: 28006660Links: 1
Access: Tue Sep 25 06:29:20 2007
Modify: Mon Feb 23 14:34:18 2004
Change: Mon Sep 17 15:04:20 2007

zgrep foo index/host/_xxx_4.gz
foo

and from /tmp/amanda/sendbackup.20070925052340.debug on the client...

sendbackup-gnutar: time 0.433: doing level 4 dump as listed-incremental from 
/local/backup/amanda/gnutar-lists/xxx_3 to 
/local/backup/amanda/gnutar-lists/xxx_4.new
sendbackup-gnutar: time 0.477: doing level 4 dump from date: 2007-09-18  
9:31:24 GMT
sendbackup: time 0.530: spawning /site/dist/amanda-2.4.5/libexec/runtar in 
pipeline
sendbackup: argument list: gtar --create --file - --directory /xxx 
--one-file-system --listed-incremental 
/local/backup/amanda/gnutar-lists/xxx_4.new --sparse --ignore-failed-read 
--totals --exclude-from /tmp/amanda/sendbackup._xxx.20070925052341.exclude .

grep xxx.3 /etc/amandates
/r/cvs 3 1190107884

env TZ=GMT date -r 1190107884
Tue Sep 18 09:31:24 GMT 2007

This matches debug output above, and is well beyond the mtime of the
file in question: Feb 23, 2004.

head -1 gnutar-lists/xxx_3
1190113536

gtar 1.15.1
amanda 2.4.5

From the above debug, it looks like amanda is doing the right thing.

The only thing I can think of is an obscure gtar bug that doesn't work
with certain dates, or the date at the top of xxx_4.new (used by
--listed-incremental I believe) was somehow wrong after it got copied
from xxx_3.  There were no 'filesystem full' problems.

Before last night, it did 7 nights at level 4 with no such problems.
Last night, it pretty much did a level 0 as far as I can see (the
index file looks like it has every file under /xxx) even though it
claimed to be doing a level 4.

Some files did change under /xxx, but the vast majority did not and
still got dumped by gtar.

The estimate was way too big, too...
planner: time 12274.025: got result for host yiff disk /xxx: 0 - 9389110K, 4 
- 9389100K, -1 - -2K

Has anyone else ever seen this behavior?