Re: estimate timeout and dump failure

2006-10-06 Thread Gene Heskett

On Friday 06 October 2006 10:32, Mike Galvez wrote:

Mike, I think this is a known bug; you need to update to one of the 2.5.1p1 
versions.  It bit lots of us.

See 

I ran the 20061004 snapshot last night, and it worked just fine.

>I am using version 2.5.0p2 on my dump host. One of my clients (same
> version) has a filesystem that consistently fails the estimate and dump
> phases. The same host has two other filesystems (smaller) that complete
> without problem. The amandad and selfcheck debug from this host shows no
> indication of problems.
[...]

If the client is a slower client, you might have to enlarge the 'etimeout' 
and 'dtimeout' settings too, but I believe that won't fix the bug I 
referred to.  Really big estimates and dumps will exceed those defaults, 
and you didn't say how big they might be.  My largest DLE is about 9GiB, 
and ISTR I've had those doubled for years.  My one client is a little 
slow, it's only a 500 MHz K6 with 320 MB of RAM.  I have things divided up 
into usually not more than 2GiB DLEs, some considerably smaller, so 
amanda can have a ball balancing things.  About 55GiB total.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

Re: Estimate timeout from localhost / driver: WARNING: got empty schedule from planner

2006-04-26 Thread Thomas Grieder

Joshua Baker-LePain wrote:
> On Tue, 25 Apr 2006 at 9:22am, Thomas Grieder wrote
> 
>> For a few days now I have been getting these two error messages:
>>
>> Estimate timeout from localhost
>> driver: WARNING: got empty schedule from planner
> 
> 1) Using localhost in the disklist is generally frowned upon.  It's not a
>    unique name, and will likely come back to bite you someday.
> 
> 2) Look in /tmp/amanda at the *debug files -- most likely sendsize*debug
>    and/or amandad*debug will have more info as to what's going wrong.
> 

Thanks, backup is working well now.

Thomas

Re: Estimate timeout from localhost / driver: WARNING: got empty schedule from planner

2006-04-25 Thread Joshua Baker-LePain


On Tue, 25 Apr 2006 at 9:22am, Thomas Grieder wrote


> For a few days now I have been getting these two error messages:
>
> Estimate timeout from localhost
> driver: WARNING: got empty schedule from planner


1) Using localhost in the disklist is generally frowned upon.  It's not a
   unique name, and will likely come back to bite you someday.

2) Look in /tmp/amanda at the *debug files -- most likely sendsize*debug
   and/or amandad*debug will have more info as to what's going wrong.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate Timeout Issue - Dump runs fine

2005-11-03 Thread Tom Brown



> OK thanks - I have increased the etimeout to 2400 seconds and also 
> changed the UDP timeout within checkpoint to also be 2400 seconds, so 
> I'll see how the run goes tonight

Everything was fine today - no estimate timeout.

thanks for the pointer

Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Tom Brown



> Yep.  So you can just increase etimeout and/or figure out why 
> "/sbin/dump 1Ssf 1048576 - /dev/sda5" is taking so long.

OK thanks - I have increased the etimeout to 2400 seconds and also 
changed the UDP timeout within checkpoint to also be 2400 seconds, so 
I'll see how the run goes tonight.


thanks

Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Joshua Baker-LePain


On Wed, 2 Nov 2005 at 2:31pm, Tom Brown wrote

>> Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long the 
>> estimate is actually taking.  Also, what do your iptables rules look like 
>> on the server?
>
> thanks - iptables are not being used, local firewall is off
>
> one of my amanda.debugs does have this at the bottom of it
>
> amandad: time 2193.716: dgram_recv: timeout after 10 seconds
> amandad: time 2193.716: waiting for ack: timeout, retrying
> amandad: time 2203.716: dgram_recv: timeout after 10 seconds
> amandad: time 2203.716: waiting for ack: timeout, retrying
> amandad: time 2213.717: dgram_recv: timeout after 10 seconds
> amandad: time 2213.717: waiting for ack: timeout, retrying
> amandad: time 2223.717: dgram_recv: timeout after 10 seconds
> amandad: time 2223.717: waiting for ack: timeout, retrying
> amandad: time 2233.718: dgram_recv: timeout after 10 seconds
> amandad: time 2233.718: waiting for ack: timeout, giving up!
> amandad: time 2233.718: pid 12319 finish time Wed Nov  2 01:07:14 2005
>
> is that time figure a time in seconds ?

Yep.  So you can just increase etimeout and/or figure out why "/sbin/dump 
1Ssf 1048576 - /dev/sda5" is taking so long.


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Tom Brown




> Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long 
> the estimate is actually taking.  Also, what do your iptables rules look 
> like on the server?

thanks - iptables are not being used, local firewall is off

sendsize debug is below and looks OK

# more /tmp/amanda/sendsize.20051102003001.debug
sendsize: debug 1 pid 12320 ruid 11 euid 11: start at Wed Nov  2 00:30:01 2005
sendsize: version 2.4.5p1
sendsize[12322]: time 0.002: calculating for amname '/dev/sda2', dirname '/', spindle -1
sendsize[12322]: time 0.002: getting size via dump for /dev/sda2 level 0
sendsize[12322]: time 0.002: calculating for device '/dev/sda2' with 'ext3'
sendsize[12322]: time 0.002: running "/sbin/dump 0Ssf 1048576 - /dev/sda2"
sendsize[12322]: time 0.003: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12320]: time 0.003: waiting for any estimate child: 1 running
sendsize[12322]: time 21.884: 1447269376
sendsize[12322]: time 21.885: .
sendsize[12322]: estimate time for /dev/sda2 level 0: 21.882
sendsize[12322]: estimate size for /dev/sda2 level 0: 1413349 KB
sendsize[12322]: time 21.885: asking killpgrp to terminate
sendsize[12322]: time 22.886: getting size via dump for /dev/sda2 level 1
sendsize[12322]: time 22.887: calculating for device '/dev/sda2' with 'ext3'
sendsize[12322]: time 22.887: running "/sbin/dump 1Ssf 1048576 - /dev/sda2"
sendsize[12322]: time 22.888: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12322]: time 195.606: 4647936
sendsize[12322]: time 195.606: .
sendsize[12322]: estimate time for /dev/sda2 level 1: 172.718
sendsize[12322]: estimate size for /dev/sda2 level 1: 4539 KB
sendsize[12322]: time 195.606: asking killpgrp to terminate
sendsize[12322]: time 196.608: done with amname '/dev/sda2', dirname '/', spindle -1
sendsize[12320]: time 196.608: child 12322 terminated normally
sendsize[12334]: time 196.609: calculating for amname '/dev/sda1', dirname '/boot', spindle -1
sendsize[12334]: time 196.609: getting size via dump for /dev/sda1 level 0
sendsize[12334]: time 196.609: calculating for device '/dev/sda1' with 'ext3'
sendsize[12334]: time 196.609: running "/sbin/dump 0Ssf 1048576 - /dev/sda1"
sendsize[12320]: time 196.609: waiting for any estimate child: 1 running
sendsize[12334]: time 196.610: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12334]: time 197.239: 5737472
sendsize[12334]: time 197.239: .
sendsize[12334]: estimate time for /dev/sda1 level 0: 0.630
sendsize[12334]: estimate size for /dev/sda1 level 0: 5603 KB
sendsize[12334]: time 197.239: asking killpgrp to terminate
sendsize[12334]: time 198.242: getting size via dump for /dev/sda1 level 1
sendsize[12334]: time 198.243: calculating for device '/dev/sda1' with 'ext3'
sendsize[12334]: time 198.243: running "/sbin/dump 1Ssf 1048576 - /dev/sda1"
sendsize[12334]: time 198.243: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12334]: time 198.684: 27648
sendsize[12334]: time 198.684: .
sendsize[12334]: estimate time for /dev/sda1 level 1: 0.441
sendsize[12334]: estimate size for /dev/sda1 level 1: 27 KB
sendsize[12334]: time 198.684: asking killpgrp to terminate
sendsize[12334]: time 199.687: done with amname '/dev/sda1', dirname '/boot', spindle -1
sendsize[12320]: time 199.687: child 12334 terminated normally
sendsize[12339]: time 199.687: calculating for amname '/dev/sda5', dirname '/export/disk1', spindle -1
sendsize[12339]: time 199.688: getting size via dump for /dev/sda5 level 0
sendsize[12320]: time 199.688: waiting for any estimate child: 1 running
sendsize[12339]: time 199.688: calculating for device '/dev/sda5' with 'ext3'
sendsize[12339]: time 199.688: running "/sbin/dump 0Ssf 1048576 - /dev/sda5"
sendsize[12339]: time 199.689: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12339]: time 545.606: 88973312
sendsize[12339]: time 545.617: .
sendsize[12339]: estimate time for /dev/sda5 level 0: 345.928
sendsize[12339]: estimate size for /dev/sda5 level 0: 86888 KB
sendsize[12339]: time 545.617: asking killpgrp to terminate
sendsize[12339]: time 546.619: getting size via dump for /dev/sda5 level 1
sendsize[12339]: time 546.646: calculating for device '/dev/sda5' with 'ext3'
sendsize[12339]: time 546.646: running "/sbin/dump 1Ssf 1048576 - /dev/sda5"
sendsize[12339]: time 546.647: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12339]: time 2182.684: 25811968
sendsize[12339]: time 2182.696: .
sendsize[12339]: estimate time for /dev/sda5 level 1: 1636.054
sendsize[12339]: estimate size for /dev/sda5 level 1: 25207 KB
sendsize[12339]: time 2182.701: asking killpgrp to terminate
sendsize[12339]: time 2183.703: done with amname '/dev/sda5', dirname '/export/disk1', spindle -1
sendsize[12320]: time 2183.704: child 12339 terminated normally
sendsize: time 2183.704: pid 12320 finish time Wed Nov  2 01:06:24 2005
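
To see at a glance which estimate is eating the time in a log like the one above, the "estimate time for" lines can be pulled out with a short script. This is only a sketch in Python, assuming the line format shown in this excerpt; the function name `estimate_times` and the trimmed `sample` text are made up for illustration and are not part of Amanda.

```python
import re

# Matches lines such as:
#   sendsize[12339]: estimate time for /dev/sda5 level 1: 1636.054
# The pattern is inferred from the excerpt above, not from Amanda's source.
ESTIMATE_RE = re.compile(
    r"estimate time for (?P<disk>\S+) level (?P<level>\d+): (?P<secs>\d+(?:\.\d+)?)"
)


def estimate_times(log_text):
    """Return {(disk, level): seconds} for every estimate line in log_text."""
    times = {}
    for line in log_text.splitlines():
        match = ESTIMATE_RE.search(line)
        if match:
            key = (match.group("disk"), int(match.group("level")))
            times[key] = float(match.group("secs"))
    return times


# A few lines copied from the sendsize debug output above.
sample = """\
sendsize[12322]: estimate time for /dev/sda2 level 0: 21.882
sendsize[12322]: estimate time for /dev/sda2 level 1: 172.718
sendsize[12339]: estimate time for /dev/sda5 level 0: 345.928
sendsize[12339]: estimate time for /dev/sda5 level 1: 1636.054
"""

times = estimate_times(sample)
slowest = max(times, key=times.get)  # the (disk, level) pair that took longest
print("slowest estimate:", slowest, times[slowest], "seconds")
print("total seconds:", round(sum(times.values()), 3))
```

Run against the full file (for example the text of /tmp/amanda/sendsize.20051102003001.debug), this would single out the level 1 estimate of /dev/sda5 as the one to look at.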

one of my amanda.debugs does have this at the bottom of it

amandad: time 2193.716: dgram_recv: timeout after 10 seconds
amanda

Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Joshua Baker-LePain


On Wed, 2 Nov 2005 at 11:32am, Tom Brown wrote

> But I'm getting a slightly strange error with a large partition. The 
> partition in question is around 900gig in size although only a few hundred 
> meg are currently used. When the estimate runs it returns
>
> FAILURE AND STRANGE DUMP SUMMARY:
>  planner: ERROR Estimate timeout from "servername"
>
> Thing is though the actual dump of this filesystem runs fine - I have 
> increased my etimeout to 20mins but this still occurs - Any ideas on this 
> one?

Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long 
the estimate is actually taking.  Also, what do your iptables rules look 
like on the server?


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: estimate timeout

2005-10-10 Thread Gene Heskett

On Monday 10 October 2005 03:20, Shai Ayal wrote:
>Hi all,
>
>I have searched the archives but none of the emails with similar
> subjects helped me.
>
>I have a FC2 amanda 2.4.4 server with 2 linux clients. The server is
> using vtapes for daily backups. It all ran very nicely for many months
> until we ran out of disk space in the server. After a few days of bad
> backups due to full disk, we installed an additional disk, moved some
> of the virtual tapes to it using symlinks, flushed the old backups
> etc... and sat back to enjoy amanda at work.
>
>However:
>
>While one client is being backed up perfectly well, the other keeps
> getting estimates timeout. On this client, everything seem ok except
> for showing 2 amandad processes during estimates, one of them defunct
> -- I attach the 2 amandad debug reports.

It is possible that the defunct amandad has open locks on files, thereby
blocking the estimate.  2 things might help, first I'd reboot the machine
the failure is on to remove them, and then I think I'd install a newer
amanda, 2.4.4 is getting a bit long in the tooth these days.  I can't
recall the exact version I was running when that happened on my firewall
box, mainly because I wasn't doing virtual tapes yet and was having so
many other tape related issues back then that a stuck amandad just wasn't
an event to record at length in my wetram.

If you still have the same scripts you used to build the 2.4.4 on each
box, then 2.4.5-20051006 should install and run exactly the same.

However, I just checked the /home/amanda directory on my single linux
client, and it's equally elderly, at 2.4.4-20030529, and it's working fine
other than a 10 second delay in checking clients when amcheck is run,
about 80% of the time.

But this is as good a time to bring it up to date as any, so it's building on
that box now, using the same script I built the older version with.
Oops, forgot to run ldconfig after the install, done now.

Humm, I note that there is no longer a 10 second delay in checking the
clients now, more like .35 seconds.  (In the past the delay was random,
showing up about 80% of the time.)  At least that holds for the several
iterations of it I've done.  Maybe that's fixed now?

>On the server I have set an etimeout of 300 which should be enough, but
>even bumping this to 7200 did not help.
>
>I have no firewall on client and server

I do, but it's not between the client and server, it's between the client
and the rest of the planet.  That box is the gateway.

>tar version is tar (GNU tar) 1.13.25 on the client

That's a good one, although I'm running 1.15-1 on the server.  But the
client box is rh7.3, and the glib version won't let me build, or install
1.15.1.

>This is really frustrating since this setup used to work !
>
>Thanks in advance
>Shai

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.

Re: estimate timeout

2005-10-10 Thread Joshua Baker-LePain


On Mon, 10 Oct 2005 at 9:20am, Shai Ayal wrote

> I have a FC2 amanda 2.4.4 server with 2 linux clients. The server is using 
> vtapes for daily backups. It all ran very nicely for many months until we ran 
> out of disk space in the server. After a few days of bad backups due to full 
> disk, we installed an additional disk, moved some of the virtual tapes to it 
> using symlinks, flushed the old backups etc... and sat back to enjoy amanda 
> at work.
>
> However:
>
> While one client is being backed up perfectly well, the other keeps getting 
> estimates timeout. On this client, everything seem ok except for showing 2 
> amandad processes during estimates, one of them defunct -- I attach the 2 
> amandad debug reports.
>
> On the server I have set an etimeout of 300 which should be enough, but even 
> bumping this to 7200 did not help.
>
> I have no firewall on client and server


Are you sure about that?  /etc/sysconfig/iptables is empty and/or 
'chkconfig --list iptables' says "off" for all runlevels?  That's a very 
non-standard setup.  I've seen behavior like this:


amandad: time 0.025: running service "/usr/lib/amanda/sendsize"
amandad: time 349.398: sending REP packet:
*snip*
amandad: time 359.415: dgram_recv: timeout after 10 seconds
amandad: time 359.415: waiting for ack: timeout, retrying
amandad: time 369.413: dgram_recv: timeout after 10 seconds
amandad: time 369.413: waiting for ack: timeout, retrying
amandad: time 379.412: dgram_recv: timeout after 10 seconds
amandad: time 379.412: waiting for ack: timeout, retrying
amandad: time 389.410: dgram_recv: timeout after 10 seconds
amandad: time 389.410: waiting for ack: timeout, retrying
amandad: time 399.409: dgram_recv: timeout after 10 seconds
amandad: time 399.409: waiting for ack: timeout, giving up!

on my systems where iptables allows established connections, but anything 
idle for more than 300 seconds is no longer considered "established".
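
A quick way to spot this pattern in an amandad debug file is to note when the REP packet was first sent and whether the retries ended in "giving up!". The following is just a sketch in Python, assuming the line format in the excerpt above; `rep_summary` is a made-up helper name, not anything shipped with Amanda.

```python
import re

# Line format assumed from the excerpt: "amandad: time <seconds>: <message>"
LINE_RE = re.compile(r"^amandad: time (\d+(?:\.\d+)?): (.*)$")


def rep_summary(lines):
    """Return (time REP was first sent, number of retries, gave_up flag)."""
    rep_time = None
    retries = 0
    gave_up = False
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "*snip*", blank lines, packet bodies, etc.
        seconds, message = float(match.group(1)), match.group(2)
        if message.startswith("sending REP packet") and rep_time is None:
            rep_time = seconds
        elif message.startswith("waiting for ack: timeout"):
            if message.endswith("giving up!"):
                gave_up = True
            else:
                retries += 1
    return rep_time, retries, gave_up


# The "waiting for ack" lines and REP line from the excerpt above.
excerpt = """\
amandad: time 349.398: sending REP packet:
amandad: time 359.415: waiting for ack: timeout, retrying
amandad: time 369.413: waiting for ack: timeout, retrying
amandad: time 379.412: waiting for ack: timeout, retrying
amandad: time 389.410: waiting for ack: timeout, retrying
amandad: time 399.409: waiting for ack: timeout, giving up!
""".splitlines()

rep_time, retries, gave_up = rep_summary(excerpt)
print(rep_time, retries, gave_up)
# A REP sent after the firewall's idle limit (300 s in the case described
# above) and never acknowledged points at the firewall rather than at etimeout.
print(gave_up and rep_time > 300)
```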


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2005-09-21 Thread John R. Jackson

>In amandad.debug on the client I get:
>...
>amandad: time 2951.811: sending REP packet:
>
>Amanda 2.4 REP HANDLE 00F-80930508 SEQ 1126652514
>OPTIONS features=feff9ffe7f;
>ar0s1a 0 SIZE 30567356
>ar0s1a 1 SIZE 17959269
>
>
>amandad: time 2961.819: dgram_recv: timeout after 10 seconds
>amandad: time 2961.819: waiting for ack: timeout, retrying
>...
>In sendsize.debug:
>...
>sendsize[20121]: estimate time for ar0s1a level 0: 320.045
>...
>sendsize[20121]: estimate time for ar0s1a level 1: 2629.405
>sendsize[20113]: time 2951.660: child 20121 terminated normally

It took ~2950 seconds to do the two estimates, based on the various
log messages.  When amandad tried to send back the response/reply (REP)
packet, it never got an acknowledgement (ack) that amdump/planner had
received it.

The default etimeout is 300 seconds.  Amanda multiplies that by the
number of estimates it asks the client to do, so, at best, planner on
the server side gave up after 600 seconds, which is why there wasn't
anyone around to receive the reply and answer it.  If you look at the
amdump.NN file that matches the above you'll probably see planner getting
done well before 2950 seconds.

I'm not sure why this disk is now taking so much longer to do estimates,
but the simplest solution is to just crank up etimeout in your amanda.conf
(or disklist) to compensate.  At least backups will start working again,
and then you can look into possible hardware or file system performance
problems.
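
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the rule as described in this message (planner waits etimeout times the number of estimates requested); the function names are hypothetical and other etimeout behaviours are not modelled.

```python
import math


def planner_wait(etimeout, num_estimates):
    """Seconds planner waits for a client: etimeout per requested estimate."""
    return etimeout * num_estimates


def smallest_etimeout(estimate_seconds):
    """Smallest whole-second etimeout whose total budget covers the estimates.

    This is a bare minimum with no headroom for startup or network delays.
    """
    total = sum(estimate_seconds)
    return math.ceil(total / len(estimate_seconds))


# Estimate times for ar0s1a level 0 and level 1 from the sendsize.debug above.
observed = [320.045, 2629.405]

budget = planner_wait(300, len(observed))  # default etimeout of 300 seconds
print("planner waits", budget, "seconds")
print("estimates fit:", sum(observed) <= budget)
print("bare minimum etimeout:", smallest_etimeout(observed))
```

The bare minimum leaves no slack (sendsize as a whole ran to about 2951 seconds here), so in practice one would set etimeout comfortably above that figure.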

>Tommy Eriksen - Chief Technical Officer

JJ

RE: Estimate timeout

2005-08-31 Thread LaValley, Brian E

Well, the tar command by itself is still running, but the backup with the
new version of tar is complete, so my "estimate timeout" problem is fixed
with an updated tar executable. Thank you all.

-Original Message-
From: Joshua Baker-LePain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:21 AM
To: LaValley, Brian E
Cc: Amanda (E-mail)
Subject: Re: Estimate timeout


On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote

> sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29 18:00:02
> 2005
> sendsize: version 2.4.4p2
> sendsize[12359]: time 0.034: waiting for any estimate child: 1 running
> sendsize[12361]: time 0.035: calculating for amname
> '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1
> sendsize[12361]: time 0.035: getting size via gnutar for
> /dev/vx/dsk/homedg/homevol level 0
> sendsize[12361]: time 0.092: spawning /home/backup/amanda_sun/libexec/runtar
> in pipeline
> sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file /dev/null
> --directory /home --one-file-system --listed-incremental
> /home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_hom
> evol_0.new --sparse --ignore-failed-read --totals --exclude-from
> /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude .

Run this command yourself on the command line (as root) and see how long 
it takes to complete.  Also, what version of tar are you running?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2005-08-30 Thread Joshua Baker-LePain

On Tue, 30 Aug 2005 at 1:19pm, Jon LaBadie wrote

> On Tue, Aug 30, 2005 at 11:01:52AM -0400, LaValley, Brian E wrote:
> > Can someone please help me get to the bottom of this issue?  I have Amanda
> > 2.4.4p2 on a Fedora Core 3 machine which is the tape server.  It has no
> > trouble backing up itself and other Linux machines.  The trouble comes with
> > a Sun Solaris 8 client which never completes its estimate. I tried to keep
> > increasing the etimeout value, but I am at 29600 and am wondering how far I
> > should go?  Is there some other part I should be looking at? Thank you.
> > 
> Does gnutar follow, and back up, symbolic links?  I wonder if some of these
> monster estimates might be due to circular references.

I'm fairly certain it backs them up as links, not as the targets 
themselves.  It'd be easy to test, though...

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2005-08-30 Thread Jon LaBadie

On Tue, Aug 30, 2005 at 11:01:52AM -0400, LaValley, Brian E wrote:
> Can someone please help me get to the bottom of this issue?  I have Amanda
> 2.4.4p2 on a Fedora Core 3 machine which is the tape server.  It has no
> trouble backing up itself and other Linux machines.  The trouble comes with
> a Sun Solaris 8 client which never completes its estimate. I tried to keep
> increasing the etimeout value, but I am at 29600 and am wondering how far I
> should go?  Is there some other part I should be looking at? Thank you.
> 

Does gnutar follow, and back up, symbolic links?  I wonder if some of these
monster estimates might be due to circular references.


-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)

RE: Estimate timeout

2005-08-30 Thread LaValley, Brian E

Ok, I'll try a new version of tar after my test of the tar command on its
own.

-Original Message-
From: Joshua Baker-LePain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:33 AM
To: LaValley, Brian E
Cc: Amanda (E-mail)
Subject: RE: Estimate timeout


On Tue, 30 Aug 2005 at 11:38am, LaValley, Brian E wrote

> I'll have to get back to you on running the command by itself. My tar
> version is: tar (GNU tar) 1.13

Bad, bad, bad.

http://www.amanda.org/docs/faq.html#id2554919

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

RE: Estimate timeout

2005-08-30 Thread Joshua Baker-LePain

On Tue, 30 Aug 2005 at 11:38am, LaValley, Brian E wrote

> I'll have to get back to you on running the command by itself. My tar
> version is: tar (GNU tar) 1.13

Bad, bad, bad.

http://www.amanda.org/docs/faq.html#id2554919

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

RE: Estimate timeout

2005-08-30 Thread LaValley, Brian E

I'll have to get back to you on running the command by itself. My tar
version is: tar (GNU tar) 1.13

-Original Message-
From: Joshua Baker-LePain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:21 AM
To: LaValley, Brian E
Cc: Amanda (E-mail)
Subject: Re: Estimate timeout


On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote

> sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29 18:00:02
> 2005
> sendsize: version 2.4.4p2
> sendsize[12359]: time 0.034: waiting for any estimate child: 1 running
> sendsize[12361]: time 0.035: calculating for amname
> '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1
> sendsize[12361]: time 0.035: getting size via gnutar for
> /dev/vx/dsk/homedg/homevol level 0
> sendsize[12361]: time 0.092: spawning /home/backup/amanda_sun/libexec/runtar
> in pipeline
> sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file /dev/null
> --directory /home --one-file-system --listed-incremental
> /home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_hom
> evol_0.new --sparse --ignore-failed-read --totals --exclude-from
> /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude .

Run this command yourself on the command line (as root) and see how long 
it takes to complete.  Also, what version of tar are you running?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2005-08-30 Thread Joshua Baker-LePain

On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote

> sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29 18:00:02
> 2005
> sendsize: version 2.4.4p2
> sendsize[12359]: time 0.034: waiting for any estimate child: 1 running
> sendsize[12361]: time 0.035: calculating for amname
> '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1
> sendsize[12361]: time 0.035: getting size via gnutar for
> /dev/vx/dsk/homedg/homevol level 0
> sendsize[12361]: time 0.092: spawning /home/backup/amanda_sun/libexec/runtar
> in pipeline
> sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file /dev/null
> --directory /home --one-file-system --listed-incremental
> /home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_hom
> evol_0.new --sparse --ignore-failed-read --totals --exclude-from
> /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude .

Run this command yourself on the command line (as root) and see how long 
it takes to complete.  Also, what version of tar are you running?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

RE: Estimate timeout

2005-08-30 Thread LaValley, Brian E

There is no firewall.

-Original Message-
From: Guy Dallaire [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:02 AM
To: LaValley, Brian E
Cc: Amanda (E-mail)
Subject: Re: Estimate timeout


2005/8/30, LaValley, Brian E <[EMAIL PROTECTED]>:
> Can someone please help me get to the bottom of this issue?  I have Amanda
> 2.4.4p2 on a Fedora Core 3 machine which is the tape server.  It has no
> trouble backing up itself and other Linux machines.  The trouble comes with
> a Sun Solaris 8 client which never completes its estimate. I tried to keep
> increasing the etimeout value, but I am at 29600 and am wondering how far I
> should go?  Is there some other part I should be looking at? Thank you.
> 

Is the Sun box behind a firewall? If so, there may be issues with the
firewall. You have to increase some retention time parameters for UDP
packets.

Re: Estimate timeout

2005-08-30 Thread Guy Dallaire

2005/8/30, LaValley, Brian E <[EMAIL PROTECTED]>:
> Can someone please help me get to the bottom of this issue?  I have Amanda
> 2.4.4p2 on a Fedora Core 3 machine which is the tape server.  It has no
> trouble backing up itself and other Linux machines.  The trouble comes with
> a Sun Solaris 8 client which never completes its estimate. I tried to keep
> increasing the etimeout value, but I am at 29600 and am wondering how far I
> should go?  Is there some other part I should be looking at? Thank you.
> 

Is the Sun box behind a firewall? If so, there may be issues with the
firewall. You have to increase some retention time parameters for UDP
packets.

Re: Estimate timeout

2005-07-23 Thread Jon LaBadie

On Sat, Jul 23, 2005 at 09:56:30AM -0400, LaValley, Brian E wrote:
> My dumps aren't completing. One fishy thing I am seeing in the logs is two
> of the same partition, "/home 1" and "/home 0"
> What does this mean?
> 
>
As part of its planning amanda may do an estimate of
the size of a potential dump of more than one level.

The thinking is, if the size of a level X
is only slightly more than level X+1,
might as well do the level X.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)

Re: estimate timeout

2005-05-12 Thread Jim Summers

McDonagh, Joe wrote:
> I have an estimate timeout of three hours, is there any way to skip the
> estimate or what? It's getting estimate timeout, the fs is fine, it can
> be read from and everything, it just has loads of small files.

You might want to consider upgrading the server and client involved to 
the current stable.  It has provisions for doing server estimates.  That 
should help.

The server estimates seem to be conservative, to be on the safe side.  I 
have some data to look through for level 0, level 1, level 2 estimates 
and I should be able to post it by the end of this week.

Hope this helps.
--
Jim Summers
School of Computer Science-University of Oklahoma
-

Re: estimate timeout

2005-05-11 Thread Paul Bijnens

McDonagh, Joe wrote:
> I have an estimate timeout of three hours, is there any way to skip the
> estimate or what? It's getting estimate timeout, the fs is fine, it can
> be read from and everything, it just has loads of small files.

If you have amanda 2.4.5, you can use alternate methods for the
estimate.  From the NEWS file:

* new 'estimate' dumptype option to select estimate type:
    CLIENT: estimate by the dumping program.
    CALCSIZE: estimate by the calcsize program, a lot faster but less
              accurate.
    SERVER: estimate based on statistics from previous runs, takes only
            seconds but can be wrong on the estimate size.

I've not yet tried it myself.
--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***

Re: Estimate timeout error

2004-12-07 Thread Paul Bijnens

Nick Danger wrote:
> Paul Bijnens wrote:
>
>> But the reply packet never got acknowledged by the server.
>> Somehow it got lost or corrupted.
>> Default route for reverse path not correct?  Wrong subnetmask?
>>
>> Try to get a network trace at the client and server, and in between
>> (don't know how to accomplish that on a PIX firewall):
>>
>> Solaris:
>>     snoop -x42 host x.x.x.x proto udp port 10080
>> using open source (linux and others):
>>     tcpdump -X host x.x.x.x and udp and port 10080
>>
>> Or other programs that have the same capabilities (ethereal etc).
>> Before guessing how to fix it, we must know where the problem is.
>> Is the packet lost?  or is it broken?
>
> Quick recap: Server grolsch tries to back up client dominion. It works 
> for the partitions of /, /usr and /var. As soon as I tell grolsch to back 
> up dominion's /u00 partition (a 45G partition, but presently only 177M 
> full w/approx 2000 files) it will fail. I have since removed /u00 from 
> backups to at least keep things working in the meantime but I would like 
> that data backed up :-)
>
> I have moved the amanda server to public IP space. It is still behind a 
> PIX firewall, I just got rid of the private IP to public IP mappings.  
> This didn't fix it :-) Not that I thought it would, I just got annoyed at 
> some of the routing.
>
> I ran tcpdump on client and server, the dumps are on the following page, 
> lined up as best I could to show the flow. It seems when doing the 
> partition that makes it fail, a bunch of packets do not get from the 
> client to the server.  Since I am no expert in tcpdump or interpreting 
> its results, I hope this helps figure out the problem.
>
> tcpdump results on http://www.hackermonkey.com/amanda-error.html

Very good to find that already.
You forgot "-s 1500", so all packets are cut off at 256 bytes...
But I believe I have enough information to conclude that the PIX
firewall times out too soon for the UDP reply.

Usually a dialog goes like:
 server sends some REQuest to client
 client answers with ACKnowledge to confirm receipt of request
 client sends REPly to the server
 server answers with ACKnowledge to confirm receipt of reply

The details of the REQ or REP packet are cut off by omitting the -s
option to tcpdump, but you can see the strings REQ/ACK/REP in each packet.

The first exchange is a NOOP request, to which the client answers
with its list of capabilities.  This takes only a few milliseconds.

The second exchange is the request to estimate the list of
DLEs.  The client sends the REPly when all DLEs are estimated.
This takes more time:  09:51:45.208525 until 09:54:25.687229, or about
2 minutes 40 seconds.

However, the packet is not received at the server.  The client just
resends the packet at an interval of 10 seconds, but never receives
the ACK, and gives up.

For a TCP connection, a firewall has a notion of a connection and
keeps it open until one side closes the connection.
UDP is stateless, and a firewall has no indication that
the third step (REPly) is related to the REQuest/ACK some minutes
before.  A firewall usually uses a timer to decide when to stop
passing packets.

It seems that the timer for UDP packets in the PIX firewall is
less than 2 minutes 40 seconds.  I have no experience with a PIX
firewall.  Is there any possibility to increase the UDP timeout?

Another possibility is to allow UDP packets to port 10080 from
client to server without timeouts.  (That's what stateless firewalls
have to do anyway.)
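
As a quick check of the arithmetic above, here is a small Python sketch that computes the gap between the two tcpdump timestamps and compares it with a firewall idle timer. The helper name and the 120-second timer are assumptions made up for illustration; the real PIX setting is not given anywhere in this thread.

```python
from datetime import datetime

def gap_seconds(start: str, end: str) -> float:
    """Seconds between two tcpdump-style HH:MM:SS.ffffff timestamps (same day)."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

# Timestamps quoted from the trace discussed above.
elapsed = gap_seconds("09:51:45.208525", "09:54:25.687229")

# Hypothetical idle timer for the firewall's UDP entry (an assumption).
ASSUMED_UDP_IDLE_TIMEOUT = 120.0

print(f"REQ/ACK to REP gap: {elapsed:.1f} s")
if elapsed > ASSUMED_UDP_IDLE_TIMEOUT:
    print("the REPly arrives after the timer expired and would be dropped")
else:
    print("the REPly arrives in time and would pass")
```

Any timer shorter than the computed gap would produce the behaviour seen in the trace, which is why raising the UDP timeout or adding a static rule are the two options.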

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***

Re: Estimate timeout error

2004-12-07 Thread Nick Danger

Paul Bijnens wrote:
> But the reply packet never got acknowledged by the server.
> Somehow it got lost or corrupted.
> Default route for reverse path not correct?  Wrong subnet mask?
> Try to get a network trace at the client and server, and in between
> (don't know how to accomplish that on a PIX firewall):
> Solaris:
> snoop -x42 host x.x.x.x proto udp port 10080
> using open source (linux and others):
> tcpdump -X host x.x.x.x and udp and port 10080
> Or other programs that have the same capabilities (ethereal etc).
> Before guessing how to fix it, we must know where the problem is.
> Is the packet lost?  Or is it broken?

Quick recap: Server grolsch tries to back up client dominion. It works
for the partitions of /, /usr and /var. As soon as I tell grolsch to back
up dominion's /u00 partition (a 45G partition, but presently only 177M
full w/approx 2000 files) it will fail. I have since removed /u00 from
backups to at least keep things working in the meantime but I would like
that data backed up :-)

I have moved the amanda server to public IP space. It is still behind a
PIX firewall, I just got rid of the private IP to public IP mappings.
This didn't fix it :-) Not that I thought it would, I just got annoyed at
some of the routing.

I ran tcpdump on client and server, the dumps are on the following page,
lined up as best I could to show the flow. It seems when doing the
partition that makes it fail, a bunch of packets do not get from the
client to the server.  Since I am no expert in tcpdump or interpreting
its results, I hope this helps figure out the problem.

tcpdump results on http://www.hackermonkey.com/amanda-error.html
-Nick

Re: Estimate timeout error

2004-12-03 Thread Paul Bijnens

Nick Danger wrote:
> There is a PIX between the two, but I'm backing up a bunch (10?) of linux
> and solaris servers in the same areas of the network, to this same
> amanda server without any issues so I don't believe it to be a firewall
> issue. There are no iptables running on either host (both linux in this
> case)
>
> In the amandad.XXX.debug log I have the following lines, which I'm
> assuming are the error report of the problem? Now, the question is, how
> to fix it :-)
>
> -Nick
> amandad: time 0.010: amandahosts security check passed
> amandad: time 0.010: running service "/usr/lib/amanda/sendsize"
> amandad: time 182.436: sending REP packet:

The above shows that sendsize needs about 3 minutes, and that it
finished without errors, because the reply has all the info below.
It could still be that 179 seconds works and 181 seconds is too late...

> Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
> OPTIONS features=feff9ffe0f;
> / 0 SIZE 301197
> / 1 SIZE 100
> /u00 0 SIZE 143930
> /u00 1 SIZE 41411
> /usr 0 SIZE 880958
> /usr 1 SIZE 79
> /usr/local 0 SIZE 174
> /usr/local 1 SIZE 47
> /var 0 SIZE 299300
> /var 1 SIZE 2857

The above lines are the reply packet, less than 300 bytes,
so I guess it's not a UDP packet overflow.

> amandad: time 192.437: dgram_recv: timeout after 10 seconds
> amandad: time 192.437: waiting for ack: timeout, retrying
> amandad: time 202.439: dgram_recv: timeout after 10 seconds
> amandad: time 202.439: waiting for ack: timeout, retrying
> amandad: time 212.441: dgram_recv: timeout after 10 seconds
> amandad: time 212.442: waiting for ack: timeout, retrying
> amandad: time 222.444: dgram_recv: timeout after 10 seconds
> amandad: time 222.444: waiting for ack: timeout, retrying
> amandad: time 232.446: dgram_recv: timeout after 10 seconds
> amandad: time 232.446: waiting for ack: timeout, giving up!
> amandad: time 232.446: pid 21896 finish time Fri Dec  3 09:01:32 2004

But the reply packet never got acknowledged by the server.
Somehow it got lost or corrupted.
Default route for reverse path not correct?  Wrong subnet mask?

Try to get a network trace at the client and server, and in between
(don't know how to accomplish that on a PIX firewall):

Solaris:
  snoop -x42 host x.x.x.x proto udp port 10080
using open source (linux and others):
  tcpdump -X host x.x.x.x and udp and port 10080

Or other programs that have the same capabilities (ethereal etc).
Before guessing how to fix it, we must know where the problem is.
Is the packet lost?  Or is it broken?
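
To read the amandad debug lines above at a glance, a rough Python sketch like the following can pull out when the REP was sent, how many times the client retried, and when it gave up. The function name and the regular expression are my own and only match the line shapes quoted in this message; other Amanda versions may log differently.

```python
import re

# The amandad debug lines quoted above, from the REP onwards.
LOG = """\
amandad: time 182.436: sending REP packet:
amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
"""

LINE = re.compile(r"^amandad: time (\d+\.\d+): (.*)$")

def summarize(log: str) -> dict:
    """Collect the REP send time, the retry count and the give-up time."""
    rep_sent = gave_up = None
    retries = 0
    for raw in log.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        when, msg = float(m.group(1)), m.group(2)
        if msg.startswith("sending REP packet"):
            rep_sent = when
        elif msg.endswith("timeout, retrying"):
            retries += 1
        elif msg.endswith("giving up!"):
            gave_up = when
    return {"rep_sent": rep_sent, "retries": retries, "gave_up": gave_up}

summary = summarize(LOG)
print(summary)
```

A REP send time followed only by ack timeouts means the client did its work and the loss is somewhere on the way back to the server.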

Re: Estimate timeout error

2004-12-03 Thread Nick Danger

There is a PIX between the two, but I'm backing up a bunch (10?) of linux
and solaris servers in the same areas of the network, to this same
amanda server without any issues so I don't believe it to be a firewall
issue. There are no iptables running on either host (both linux in this
case)

In the amandad.XXX.debug log I have the following lines, which I'm
assuming are the error report of the problem? Now, the question is, how
to fix it :-)

-Nick
amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service "/usr/lib/amanda/sendsize"
amandad: time 182.436: sending REP packet:

Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
OPTIONS features=feff9ffe0f;
/ 0 SIZE 301197
/ 1 SIZE 100
/u00 0 SIZE 143930
/u00 1 SIZE 41411
/usr 0 SIZE 880958
/usr 1 SIZE 79
/usr/local 0 SIZE 174
/usr/local 1 SIZE 47
/var 0 SIZE 299300
/var 1 SIZE 2857

amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
amandad: time 232.446: pid 21896 finish time Fri Dec  3 09:01:32 2004

Re: Estimate timeout error

2004-12-02 Thread Paul Bijnens

Nick Danger wrote:
> Nope - still a problem. The error is still as below:
> FAILURE AND STRANGE DUMP SUMMARY:
>  dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx]
>  dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx]
>  dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx]
>  dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx]
>  dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx]
> I have the timeout in amanda.conf set to an ungodly high number of
> etimeout -12000 # total number of seconds for estimates.
> [...]
> sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec  2
> 11:25:07 2004
> sendsize: version 2.4.4p1
> [...]
> sendsize: time 172.473: pid 26242 finish time Thu Dec  2 11:27:59 2004

The estimate really takes only 173 seconds.  That means that etimeout
is plenty (better lower it again to normal values).
The problem seems to be in the reply packet.

I've already seen problems with a UDP packet overflow, but that's
unlikely here.  That problem happened with older versions where the UDP
size was only 8 KB or so.  Currently it is 64 KB, but it could be
limited by the OS too, of course.  The reply packet is usually larger
than the request packet, because it contains 1 to 3 lines for each
DLE (level 0, current level, current plus 1).
In amandad.DATETIME.debug, you can find the request packet and the
reply packet.
Any weird limitation on UDP packet size on one of the hosts (or
intermediate routers/firewalls)?

Another problem could be in the iptables modules for amanda, where
a bug has already been introduced twice.  I don't know the exact
current status of that bug.  If not needed, do not use the amanda iptables
modules.  Try "lsmod | grep amanda".  (Or on intermediate firewalls!)

Maybe try a network traffic dump (with tcpdump or a similar program)
on client *and* host?

Re: Estimate timeout error

2004-12-02 Thread Nick Danger

Matt Hyclak wrote:
> The sendsize.DATETIME.debug log file on dominion should tell you how long
> the estimates are taking. A simple calculation should tell you how big
> etimeout should be.
>
> (NUM_PARTITIONS * ETIMEOUT) = total time amanda waits for estimates.
> Matt
 

Nope - still a problem. The error is still as below:
FAILURE AND STRANGE DUMP SUMMARY:
 dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx]

I have the timeout in amanda.conf set to an ungodly high number of
etimeout -12000 # total number of seconds for estimates.
The lines from the sendsize in /var/log are listed below. All the file
systems are short, except for /u00, which lists 77. Is the 77 in MINUTES? Or
seconds? Either way, 12000 in amanda.conf should be plenty, shouldn't it?
I could be just doing my math wrong. Which is always a possibility. There
are no units listed other than in the config file, so I'm guessing at
some parts here.

Thanks all
-Nick
---
sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec  2 
11:25:07 2004
sendsize: version 2.4.4p1
sendsize[26244]: time 0.007: calculating for amname '/', dirname '/', 
spindle -1
sendsize[26244]: time 0.007: getting size via dump for / level 0
sendsize[26242]: time 0.008: waiting for any estimate child
sendsize[26244]: time 0.008: calculating for device '/dev/sda1' with 'ext3'
sendsize[26244]: time 0.008: running "/sbin/dump 0Ssf 1048576 - /dev/sda1"
sendsize[26244]: time 0.011: running /usr/lib/amanda/killpgrp
sendsize[26244]: time 0.071:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26244]: time 0.072:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26244]: time 0.410: 308423680
sendsize[26244]: time 0.411: .
sendsize[26244]: estimate time for / level 0: 0.402
sendsize[26244]: estimate size for / level 0: 301195 KB
sendsize[26244]: time 0.411: asking killpgrp to terminate
sendsize[26244]: time 1.415: getting size via dump for / level 1
sendsize[26244]: time 1.416: calculating for device '/dev/sda1' with 'ext3'
sendsize[26244]: time 1.416: running "/sbin/dump 1Ssf 1048576 - /dev/sda1"
sendsize[26244]: time 1.419: running /usr/lib/amanda/killpgrp
sendsize[26244]: time 1.449:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26244]: time 1.451:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26244]: time 1.887: 1104896
sendsize[26244]: time 1.889: .
sendsize[26244]: estimate time for / level 1: 0.472
sendsize[26244]: estimate size for / level 1: 1079 KB
sendsize[26244]: time 1.889: asking killpgrp to terminate
sendsize[26244]: time 2.895: done with amname '/', dirname '/', spindle -1
sendsize[26242]: time 2.895: child 26244 terminated normally
sendsize[26249]: time 2.896: calculating for amname '/u00', dirname 
'/u00', spindle -1
sendsize[26249]: time 2.896: getting size via dump for /u00 level 0
sendsize[26249]: time 2.897: calculating for device '/dev/sda9' with 'ext3'
sendsize[26249]: time 2.897: running "/sbin/dump 0Ssf 1048576 - /dev/sda9"
sendsize[26249]: time 2.900: running /usr/lib/amanda/killpgrp
sendsize[26242]: time 2.905: waiting for any estimate child
sendsize[26249]: time 2.942:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26249]: time 2.943:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26249]: time 80.109: 147388416
sendsize[26249]: time 80.111: .
sendsize[26249]: estimate time for /u00 level 0: 77.213
sendsize[26249]: estimate size for /u00 level 0: 143934 KB
sendsize[26249]: time 80.111: asking killpgrp to terminate
sendsize[26249]: time 81.112: getting size via dump for /u00 level 1
sendsize[26249]: time 81.113: calculating for device '/dev/sda9' with 'ext3'
sendsize[26249]: time 81.113: running "/sbin/dump 1Ssf 1048576 - /dev/sda9"
sendsize[26249]: time 81.116: running /usr/lib/amanda/killpgrp
sendsize[26249]: time 81.401:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26249]: time 81.403:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26249]: time 159.069: 42408960
sendsize[26249]: time 159.070: .
sendsize[26249]: estimate time for /u00 level 1: 77.957
sendsize[26249]: estimate size for /u00 level 1: 41415 KB
sendsize[26249]: time 159.071: asking killpgrp to terminate
sendsize[26249]: time 160.080: done with amname '/u00', dirname '/u00', 
spindle -1
sendsize[26242]: time 160.080: child 26249 terminated normally
sendsize[26484]: time 160.081: calculating for amname '/usr', dirname 
'/usr', spindle -1
sendsize[26484]: time 160.081: getting size via dump for /usr level 0
sendsize[26484]: time 160.082: calculating for device '/dev/sda2' with 
'ext3'
sendsize[26484]: time 160.082: running "/sbin/dump 0Ssf 1048576 - /dev/sda2"
sendsize[2648

Re: Estimate timeout error

2004-11-29 Thread Paul Bijnens


On Monday 29 November 2004 09:00, Nick Danger wrote:
> Is there any way to properly calculate what your timeout estimate
> value should be other than trial and error? I have a partition on a
> machine that gives this error. "
> dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]

Have a look on the client in the file
/tmp/amanda/sendsize.DATETIMESTAMP.debug.
The first and the last line contain a date.
Even when the server times out, the client still continues.
If the client did not run to completion, there is probably an error
message in that file.
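
For illustration, the elapsed time can be worked out from those two dates with a few lines of Python. This is only a sketch: the function name is made up, the date strings are the ones from the sendsize debug log quoted elsewhere in this thread, and the parsing assumes an English/C locale for the day and month names.

```python
from datetime import datetime

def parse_debug_date(text: str) -> datetime:
    """Parse a date such as 'Thu Dec  2 11:25:07 2004' from a debug log line."""
    # Collapse the padding spaces before a single-digit day first.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")

# From the "start at ..." and "finish time ..." lines of the sendsize log.
start = parse_debug_date("Thu Dec  2 11:25:07 2004")
finish = parse_debug_date("Thu Dec  2 11:27:59 2004")

elapsed = (finish - start).total_seconds()
print(f"sendsize ran for {elapsed:.0f} seconds")
```

The result can then be compared directly with the etimeout value in amanda.conf.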


Re: Estimate timeout error

2004-11-29 Thread Gene Heskett

On Monday 29 November 2004 09:00, Nick Danger wrote:
>Is there any way to properly calculate what your timeout estimate
> value should be other than trial and error? I have a partition on a
> machine that gives this error. "
>
>dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]
>
>
>If I remove that partition from disklist, all other partitions on
> that server back up just fine.
>It's a 45G partition on a SCSI raid set. It's hardly 1% full, holding
> maybe 1000 files. I have upped the timeout to 1200, and still it
> failed.
>
>Suggestions?

No, other than your etimeout value s/b more than sufficient.  Do you 
have other dirs on that client that do work ok and amanda backs them 
up ok?
>
>-Nick

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

Re: Estimate timeout error

2004-11-29 Thread Matt Hyclak

On Mon, Nov 29, 2004 at 09:00:44AM -0500, Nick Danger enlightened us:
> Is there any way to properly calculate what your timeout estimate value
> should be other than trial and error? I have a partition on a machine
> that gives this error. "
>
> dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]
>
>
> If I remove that partition from disklist, all other partitions on that
> server back up just fine.
> It's a 45G partition on a SCSI raid set. It's hardly 1% full, holding maybe
> 1000 files. I have upped the timeout to 1200, and still it failed.
> 

The sendsize.DATETIME.debug log file on dominion should tell you how long
the estimates are taking. A simple calculation should tell you how big
etimeout should be. 

(NUM_PARTITIONS * ETIMEOUT) = total time amanda waits for estimates.
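
As a worked example of that calculation, here is a minimal Python sketch. It assumes the rule described in the amanda.conf documentation: a positive etimeout is applied per disk, while a negative value is taken as a total per client. The function name is invented for illustration, and the count of five DLEs is taken from the failure report for dominion elsewhere in this thread.

```python
def estimate_wait_seconds(etimeout: int, num_dles: int) -> int:
    """Total time the planner waits for one client's estimates.

    Assumption: positive etimeout = seconds per DLE,
    negative etimeout = total seconds for the whole client.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * num_dles

# dominion has five DLEs: /, /u00, /usr, /usr/local, /var
print(estimate_wait_seconds(1200, 5))    # per-DLE setting
print(estimate_wait_seconds(-12000, 5))  # total-per-client setting
```

Under that assumption, either setting is far more than the roughly three minutes the sendsize log reports, which points away from etimeout as the cause.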

Matt

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263



Re: Estimate timeout

2004-06-11 Thread Steven Schoch

Joshua Baker-LePain wrote:
> "admin prohibited" is definitely a result of iptables filtering.
> Have a close look in homer.  Execute "iptables -L".
>
> Maybe the solution is loading the amanda iptables module,
> if that is available on the machine.

> I'd be interested to see if that fixes it.

The following line was added to /etc/sysconfig/iptables:

-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp -s XX.XX.XX.0/24 --sport 10080 -j ACCEPT

...where XX.XX.XX is the IP address of our local 'external' network, on 
which both homer and marge are located.

The problem has been solved.
--
Steve

Re: Estimate timeout

2004-06-10 Thread Gene Heskett

On Thursday 10 June 2004 07:59, Joshua Baker-LePain wrote:
>On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote
>
>> I have been thinking about this problem, and, without any real
>> testing to back up my hypothesis, I believe the problem lies in the
>> default timeout in iptables for UDP traffic, as you decided too.
>>
>> For TCP traffic, once a packet is replied, the timeout becomes
>> very large (5 days or so I believe).  But for UDP, which is a
>> connectionless protocol, the timeout is 180 seconds (I believe).
>> After this timeout the connection tracking drops the rule.
>
>Is this true even with ip_conntrack_amanda loaded?

I wasn't even aware of such a module, and got surprised by the output 
of a locate!

It's part of the kernel's netfilter options since back in 2.4.22 or
earlier days, so if he doesn't have the executable module, he may
have to rebuild his kernel to get it.

I hadn't worried about it here since everything I back up with amanda
is inside the firewall, or on the firewall itself, but iptables sits
between the 2 NICs in the firewall that separate inside from outside
stuff.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

Re: Estimate timeout

2004-06-10 Thread Paul Bijnens

Joshua Baker-LePain wrote:
> It seems to be tuneable.  From the header of the source code:
>
>  *  Module load syntax:
>  *  insmod ip_conntrack_amanda.o [master_timeout=n]
>  *
>  *  Where master_timeout is the timeout (in seconds) of the master
>  *  connection (port 10080).  This defaults to 5 minutes but if
>  *  your clients take longer than 5 minutes to do their work
>  *  before getting back to the Amanda server, you can increase
>  *  this value.
>
> I should test it one of these nights...

Wow!  Learning something new every day!

Re: Estimate timeout

2004-06-10 Thread Joshua Baker-LePain

On Thu, 10 Jun 2004 at 2:11pm, Paul Bijnens wrote

> > Is this true even with ip_conntrack_amanda loaded?
> 
> 
> I should have a look at the source code, or find a detailed doc that
> explains it, to find out.
> 
> Anyway that module should somehow know the etimeout parameter
> of amanda.conf, which of course it does not know, or otherwise allow
> a really really large timeout, like a few hours.  Or should be tuneable
> somehow (in the amanda-tradition that could be hardcoded at compile time).

It seems to be tuneable.  From the header of the source code:

*   Module load syntax:
 *  insmod ip_conntrack_amanda.o [master_timeout=n]
 *  
 *  Where master_timeout is the timeout (in seconds) of the master
 *  connection (port 10080).  This defaults to 5 minutes but if
 *  your clients take longer than 5 minutes to do their work
 *  before getting back to the Amanda server, you can increase
 *  this value.

I should test it one of these nights...

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2004-06-10 Thread Paul Bijnens

Joshua Baker-LePain wrote:
> On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote
>
>> I have been thinking about this problem, and, without any real testing
>> to back up my hypothesis, I believe the problem lies in the default
>> timeout in iptables for UDP traffic, as you decided too.
>> For TCP traffic, once a packet is replied, the timeout becomes very
>> large (5 days or so I believe).  But for UDP, which is a connectionless
>> protocol, the timeout is 180 seconds (I believe).
>> After this timeout the connection tracking drops the rule.
>
> Is this true even with ip_conntrack_amanda loaded?

I should have a look at the source code, or find a detailed doc that
explains it, to find out.

Anyway, that module should somehow know the etimeout parameter
of amanda.conf, which of course it does not know, or otherwise allow
a really really large timeout, like a few hours.  Or it should be tuneable
somehow (in the amanda tradition that could be hardcoded at compile time).

Re: Estimate timeout

2004-06-10 Thread Joshua Baker-LePain

On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote

> I have been thinking about this problem, and, without any real testing
> to back up my hypothesis, I believe the problem lies in the default
> timeout in iptables for UDP traffic, as you decided too.
>
> For TCP traffic, once a packet is replied, the timeout becomes very
> large (5 days or so I believe).  But for UDP, which is a connectionless
> protocol, the timeout is 180 seconds (I believe).
> After this timeout the connection tracking drops the rule.

Is this true even with ip_conntrack_amanda loaded?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2004-06-10 Thread Paul Bijnens

Joshua Baker-LePain wrote:
> On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote
>> Steven Schoch wrote:
>>> Now we're getting somewhere.  The tcpdump shows this:
>>> 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin
>>> prohibited [tos 0xc0]
>>>
>>> My guess is that ICMP message is something to do with a firewall.
>>
>> "admin prohibited" is definitely a result of iptables filtering.
>> Have a close look in homer.  Execute "iptables -L".
>> Maybe the solution is loading the amanda iptables module,
>> if that is available on the machine.
>
> I'd be interested to see if that fixes it.  My amanda server which runs
> the nightlies of the (small) home partitions has been at RH9 for a while,
> and has this as the only rule it needed to get amdump working:
>
> # If we've an established session, well, okay
> -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
>
> I recently moved my other amanda server (which backs up my 4.5TB of RAID
> space) to RH9.  The first few nights, most of the clients were failing
> with estimate timeouts.  But when I tested during the day (with small
> partitions), everything worked.  I finally decided that the estimates on
> the big partitions were taking long enough that the above rule was timing
> out.  I couldn't afford another night of the backups failing, so I didn't
> try loading the amanda module -- I just added rules to allow incoming
> UDP traffic on privileged ports from the clients.

I have been thinking about this problem, and, without any real testing
to back up my hypothesis, I believe the problem lies in the default
timeout in iptables for UDP traffic, as you decided too.

For TCP traffic, once a packet is replied, the timeout becomes very
large (5 days or so I believe).  But for UDP, which is a connectionless
protocol, the timeout is 180 seconds (I believe).
After this timeout the connection tracking drops the rule.

In my config, the estimates of the clients in the DMZ all take less than
2 minutes.  And this works fine.

That means that the real solution is to compile amanda with a dedicated
UDP port range, and add that range to the firewall iptables.
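
To make the hypothesis concrete, here is a toy Python model of a connection-tracking entry with an idle timer. It is not how netfilter is implemented; the class name is invented, the 180-second value is the unverified figure mentioned above, and the 447-second case is the REP send time from the amandad log quoted in this thread.

```python
from dataclasses import dataclass

@dataclass
class UdpTrackEntry:
    """Toy model of one connection-tracking entry for a UDP flow."""
    last_seen: float     # time of the last packet on the flow, in seconds
    idle_timeout: float  # the entry expires after this much silence

    def admits(self, now: float) -> bool:
        """True if a packet arriving at `now` still matches the entry."""
        if now - self.last_seen > self.idle_timeout:
            return False          # entry expired, packet looks unsolicited
        self.last_seen = now      # traffic refreshes the timer
        return True

ASSUMED_TIMEOUT = 180.0  # unverified default, as stated in the message

# Estimates that finish in under 2 minutes: the reply still matches.
fast = UdpTrackEntry(last_seen=0.0, idle_timeout=ASSUMED_TIMEOUT)
print(fast.admits(110.0))

# An estimate that takes about 447 seconds: the entry is long gone.
slow = UdpTrackEntry(last_seen=0.0, idle_timeout=ASSUMED_TIMEOUT)
print(slow.admits(447.0))
```

A static rule for a fixed UDP port range avoids the timer altogether, which is the point of compiling amanda with a dedicated range.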

Re: Estimate timeout

2004-06-10 Thread Joshua Baker-LePain

On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote

> Steven Schoch wrote:
>
> > Now we're getting somewhere.  The tcpdump shows this:
> >
> > 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin
> > prohibited [tos 0xc0]
> >
> > My guess is that ICMP message is something to do with a firewall.
>
>
> "admin prohibited" is definitely a result of iptables filtering.
> Have a close look in homer.  Execute "iptables -L".
>
> Maybe the solution is loading the amanda iptables module,
> if that is available on the machine.

I'd be interested to see if that fixes it.  My amanda server which runs 
the nightlies of the (small) home partitions has been at RH9 for a while, 
and has this as the only rule it needed to get amdump working:

# If we've an established session, well, okay
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 

I recently moved my other amanda server (which backs up my 4.5TB of RAID 
space) to RH9.  The first few nights, most of the clients were failing 
with estimate timeouts.  But when I tested during the day (with small 
partitions), everything worked.  I finally decided that the estimates on 
the big partitions were taking long enough that the above rule was timing 
out.  I couldn't afford another night of the backups failing, so I didn't 
try loading the amanda module -- I just added rules to allow incoming 
UDP traffic on privileged ports from the clients.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

Re: Estimate timeout

2004-06-10 Thread Paul Bijnens

Steven Schoch wrote:
> on Wed, 09 Jun 2004 Paul Bijnens wrote:
>> Try to find out where the UDP packet got dropped, using tcpdump or
>> ethereal or other network analyzer on homer and marge.
>
> Now we're getting somewhere.  The tcpdump shows this:
> 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin
> prohibited [tos 0xc0]
>
> My guess is that ICMP message is something to do with a firewall.

"admin prohibited" is definitely a result of iptables filtering.
Have a close look in homer.  Execute "iptables -L".
Maybe the solution is loading the amanda iptables module,
if that is available on the machine.
--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***

Re: Estimate timeout

2004-06-09 Thread Steven Schoch

on Wed, 09 Jun 2004 Paul Bijnens wrote:
> Try to find out where the UDP packet got dropped, using tcpdump or 
> Ethereal or other network analyzer on homer and marge.

Now we're getting somewhere.  The tcpdump shows this:
14:54:28.697197 homer.858 > marge.amanda: udp 117 (DF)
14:54:29.176236 marge.amanda > homer.858: udp 50
14:54:29.444159 marge.amanda > homer.858: udp 83
14:54:29.444563 homer.858 > marge.amanda: udp 50 (DF)
14:54:29.445650 homer.858 > marge.amanda: udp 531 (DF)
14:54:29.525614 marge.amanda > homer.858: udp 50
15:01:56.739172 marge.amanda > homer.858: udp 184
15:01:56.739818 homer > marge: icmp: host homer unreachable - admin 
prohibited [tos 0xc0]
15:02:06.743312 marge.amanda > homer.858: udp 184
15:02:06.743992 homer > marge: icmp: host homer unreachable - admin 
prohibited [tos 0xc0]

My guess is that ICMP message is something to do with a firewall.
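
A saved capture like the one above could be scanned for these rejections with a sketch along the following lines. It assumes the tcpdump output was written one packet per line (the mail wrapping above splits the ICMP lines in two); the function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count "admin prohibited" ICMP lines in tcpdump
# text output read from stdin.
count_admin_prohibited() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when the count is zero.
    grep -c 'icmp:.*admin prohibited' || true
}

# Example with two lines taken from the capture above (unwrapped)
printf '%s\n' \
    '15:01:56.739172 marge.amanda > homer.858: udp 184' \
    '15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]' \
    | count_admin_prohibited
```

A non-zero count suggests that a packet filter on the sending host is rejecting the replies.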
--
Steve

Re: Estimate timeout

2004-06-09 Thread Paul Bijnens

Steven Schoch wrote:
> It was working for several days, then all of a sudden it stopped and 
> hasn't worked since.

First thing to ask is: what changed since then?
Installed something?  Reconfigured something?  Rebooted system?

> amandad: time 447.906: sending REP packet:

It took less than 550 seconds to estimate all of it.

> planner: time 10801.886: error result for host marge disk /: Estimate 

and server timed out after 3 DLE's * 2 lvls * 1800 sec = 10800 seconds

> It looks like homer was waiting a sufficient time for marge to reply, but 
> the reply was dropped.

Yes, indeed.

> Marge and homer are on the same switch.

Are there other clients besides marge?
Is there a local firewall activated on homer?
Try to find out where the UDP packet got dropped, using tcpdump or 
Ethereal or other network analyzer on homer and marge.
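
The wait quoted above (3 DLE's * 2 lvls * 1800 sec) can be reproduced with a one-line calculation. This is only a sketch of the arithmetic as stated in this message; the function name and argument order are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: total time the planner waits for one client,
# using the formula quoted in this thread:
#   number of DLEs * number of levels estimated * etimeout (seconds)
planner_wait() {
    dles=$1
    levels=$2
    etimeout=$3
    echo $(( dles * levels * etimeout ))
}

# The case from this thread: 3 DLEs, 2 levels, etimeout 1800
planner_wait 3 2 1800
```

With the numbers from the log this gives 10800 seconds, which matches the planner timestamp of 10801.886.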

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]

Re: "Estimate timeout" on Mac OS X

2004-04-05 Thread Gene Heskett

On Monday 05 April 2004 04:07, David Chin wrote:
>On 5 Apr 2004, at 03:02, Gene Heskett wrote:
>> Here is the first potential problem.  Even with all the warnings
>> plastered all over the FAQ and Docs, folks still insist on using
>> a universal name instead of the FQDN.
>
>Yes, I knew about the problems with a universal name. I just wanted
>to get something up quickly as a test. My machine sits NATted at
> home and doesn't have a real FQDN.
>
>But anyway, I changed it:
>
>1. make my wireless AP give my machine a fixed address
>2. add an entry to /etc/hosts --
>
>   192.168.0.111 myhostname
>
>> Second, did you build amanda as the user amanda, then become root
>> to do the make install?
>
>I decided to avoid all permission stuff by running everything as
> root. Yes, I know the dangers, and I am willing to live with the
> risk for now.

amanda checks to see who she is, and amdump will not run as root.  
Tear it all back out and reinstall according to the instructions.  
This requirement is a security related requirement, and really isn't 
open for discussion.  Where amanda needs root perms, she will do an 
suid root to gain the perms she needs.  Make a normal user 'amanda' 
and make this user a member of the group 'disk' or 'backup'.  As 
root, do a "chown -R amanda:disk amanda-2.4.5b1-20040326" (if that's 
the name of the src tree) before starting the build.  I maintain 
these src trees in /home/amanda here.  You'll also need to change the 
perms on the tarball itself because lately the tarballs are not owned 
by amanda if root does the download.  Minor detail.

I also use a script to do the configuration and initial make because 
it's consistent and repeatable from snapshot to snapshot without 
relying on my aged, occasionally fading memory. I copy this script 
into the new src tree when a new snapshot comes out, and run it from 
the top level directory of the src.

The script:
-----gh.cf-----
#!/bin/sh
# since I'm always forgetting to su amanda...
if [ "`whoami`" != 'amanda' ]; then
    echo
    echo " Warning "
    echo "Amanda needs to be configured and built by the user amanda,"
    echo "but must be installed by the user root."
    echo
    exit 1
fi
# start from a clean tree so stale configure results are not reused
make clean
rm -f config.status config.cache
# adjust the device names, group and server name below for your site
./configure --with-user=amanda \
    --with-group=disk \
    --with-owner=amanda \
    --with-tape-device=/dev/nst0 \
    --with-changer-device=/dev/sg1 \
    --with-gnu-ld --prefix=/usr/local \
    --with-debugging=/tmp/amanda-dbg/ \
    --with-tape-server=FQDN.of.the.server \
    --with-amandahosts \
    --with-configdir=/usr/local/etc/amanda

make
-----end of script-----

Remove the changer device line if you don't have a robotic changer.  
The Fully Qualified Domain Name (FQDN) of the tape server (or its ip 
address) must be used.

Adjust the device name to be whatever the NON-rewinding on file close 
device is on your system.  Set the x bit (chmod +x script.name).  
Become amanda and execute it with "./script.name".  Then become root 
and do a "make install".

I doubt you'll need to do it, but the estimate timeout value 
('etimeout' in your amanda.conf) which is defaulted to 5 minutes 
(300 seconds) per disklist entry might have to be increased.  I did 
that early on when it was running on a much slower machine, but now 
on this box a 44 member disklist typically takes 22 minutes to 
estimate.  The backup will in any event commence when all estimates 
have been obtained, or have timed out, unlikely on today's hardware 
such as your G5.
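
To see what a given config currently sets, something like the following sketch could be used. It simply reads the first 'etimeout' line from an amanda.conf file and prints nothing if the parameter is not set there; the function name and the example path are assumptions. Amanda's own amgetconf utility should report the effective value on a real installation.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "etimeout" line in
# the amanda.conf file given as $1.  Prints nothing if it is not set.
get_etimeout() {
    awk '$1 == "etimeout" { print $2; exit }' "$1"
}

# Example (path is an assumption; adjust to your config directory):
#   get_etimeout /usr/local/etc/amanda/daily/amanda.conf
```

Raising the value is then a one-line edit in amanda.conf, for example "etimeout 1200" for 20 minutes per disklist entry.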

[...]

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

Re: "Estimate timeout" on Mac OS X

2004-04-05 Thread David Chin

On 5 Apr 2004, at 03:02, Gene Heskett wrote:
> Here is the first potential problem.  Even with all the warnings
> plastered all over the FAQ and Docs, folks still insist on using a
> universal name instead of the FQDN.

Yes, I knew about the problems with a universal name. I just wanted
to get something up quickly as a test. My machine sits NATted at home
and doesn't have a real FQDN.

But anyway, I changed it:

1. make my wireless AP give my machine a fixed address
2. add an entry to /etc/hosts --

   192.168.0.111 myhostname

> Second, did you build amanda as the user amanda, then become root to
> do the make install?

I decided to avoid all permission stuff by running everything as root.
Yes, I know the dangers, and I am willing to live with the risk for now.

> While it should run ok, 2.4.4p2 is beginning to get a bit long in the
> tooth. We're up to 2.4.5beta something or other, and I've not found
> anything beta about it.  It just works.

Are you running it on OS X? I had to edit some of the source to get it
to compile. I have 2.4.4p2 running in my lab - RH7.3 server, mix of RH,
Fedora, and HP-UX clients - so I figure I'd stick with something I
knew was already working.

No dice, still. I'll try setting up a separate amanda user first, and
then go on and try the beta code. I'll dig around in the code as last
resort since Google didn't find me any interesting links for a search
on '"estimate timeout" amanda'

--Dave

These dumps were to tape daily12.
The next tape Amanda expects to use is: a new tape.
The next new tape already labelled is: daily01.

FAILURE AND STRANGE DUMP SUMMARY:
  Ginger /Users/drauh lev 0 FAILED [Estimate timeout from Ginger]


STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:15
Run Time (hrs:min)         0:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)         --         --         --

Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)     --         --         --

USAGE BY TAPE:
  Label          Time      Size      %    Nb
  daily12        0:00       0.0    0.0     0


NOTES:
  planner: Adding new disk Ginger:/Users/drauh.
  driver: WARNING: got empty schedule from planner
  taper: tape daily12 kb 0 fm 0 [OK]


DUMP SUMMARY:
                                  DUMPER STATS           TAPER STATS
HOSTNAME DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
-------------------- -------------------------------- -------------
Ginger   -sers/drauh 0 FAILED ---------------------------------------

(brought to you by Amanda version 2.4.4p2)

Re: "Estimate timeout" on Mac OS X

2004-04-05 Thread Gene Heskett

On Monday 05 April 2004 01:25, David Chin wrote:
>Hi,
>
>I've almost got amanda to run on a PowerBook G4 with Mac OS X.3.3.
> Right now, I have it set up with "virtual tapes" on a separate
> disk. Everything goes well, including the amcheck, but when I run
> amdump, the backup doesn't go. The mailed log of the run is below.
>
>Can someone point me in the right direction?
>
>Thanks in advance,
>--Dave
>
>
>These dumps were to tape daily11.
>The next tape Amanda expects to use is: a new tape.
>The next new tape already labelled is: daily01.
>
>FAILURE AND STRANGE DUMP SUMMARY:
>  localhost /Users/drauh lev 0 FAILED [Estimate timeout from localhost]
>
>
>STATISTICS:
>                          Total       Full      Daily
>                        --------   --------   --------
>Estimate Time (hrs:min)    0:15
>Run Time (hrs:min)         0:15
>Dump Time (hrs:min)        0:00       0:00       0:00
>Output Size (meg)           0.0        0.0        0.0
>Original Size (meg)         0.0        0.0        0.0
>Avg Compressed Size (%)     --         --         --
>Filesystems Dumped            0          0          0
>Avg Dump Rate (k/s)         --         --         --
>
>Tape Time (hrs:min)        0:00       0:00       0:00
>Tape Size (meg)             0.0        0.0        0.0
>Tape Used (%)               0.0        0.0        0.0
>Filesystems Taped             0          0          0
>Avg Tp Write Rate (k/s)     --         --         --
>
>USAGE BY TAPE:
>  Label          Time      Size      %    Nb
>  daily11        0:00       0.0    0.0     0
>
>
>NOTES:
>  planner: Adding new disk localhost:/Users/drauh.
>  driver: WARNING: got empty schedule from planner
>  taper: tape daily11 kb 0 fm 0 [OK]
>
>
>DUMP SUMMARY:
>                                   DUMPER STATS           TAPER STATS
>HOSTNAME  DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
>--------------------- -------------------------------- -------------
>localhost -sers/drauh 0 FAILED

Here is the first potential problem.  Even with all the warnings 
plastered all over the FAQ and Docs, folks still insist on using a 
universal name instead of the FQDN.

Second, did you build amanda as the user amanda, then become root to 
do the make install?  I'm thinking, just a hunch because you've not 
posted enough info, that there is either a permissions problem, or 
an .amandahosts problem, but in the latter case it will usually tell 
you about it quite plainly.

>---
>
>(brought to you by Amanda version 2.4.4p2)

While it should run ok, 2.4.4p2 is beginning to get a bit long in the 
tooth. We're up to 2.4.5beta something or other, and I've not found 
anything beta about it.  It just works.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

Re: estimate timeout

2003-12-04 Thread Jon LaBadie

On Thu, Dec 04, 2003 at 11:46:52PM +0100, Mats Blomstrand wrote:
> > > Any ideas what i can do about it?
> > 
> > Uhh, increase the estimate timeout in amanda.conf?
> 
> Sounds like a good idea. Wonder why i didnt think of that :)
> 
> Since i mailed the above bright question i have been digging in my
> amanda mailbox for similar questions and learned about "etimeout".
> 
> What do you think i should set it to on a SUN Ultra-30 with 340k files?

A large number and then look at the reports and back it off.
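
As a rough aid for the "look at the reports" step, the sketch below pulls the "Estimate Time (hrs:min)" figure out of an amdump mail report read from stdin and prints it in minutes. It assumes the report layout shown elsewhere in this thread, and note that the figure is the total estimate time for the whole run, not a per-disk value, so it is only a guide when backing 'etimeout' off. The function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the total estimate time, in minutes, from
# an Amanda mail report on stdin.  Expects a line such as:
#   Estimate Time (hrs:min)    0:15
estimate_minutes() {
    awk '/^Estimate Time/ { split($NF, t, ":"); print t[1] * 60 + t[2]; exit }'
}

# Example with the statistics lines from a report in this thread
printf '%s\n' \
    'Estimate Time (hrs:min)    0:15' \
    'Run Time (hrs:min)         0:15' \
    | estimate_minutes
```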

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)

Re: estimate timeout

2003-12-04 Thread Mats Blomstrand

> > Any ideas what i can do about it?
> 
> Uhh, increase the estimate timeout in amanda.conf?

Sounds like a good idea. Wonder why i didnt think of that :)

Since i mailed the above bright question i have been digging in my
amanda mailbox for similar questions and learned about "etimeout".

What do you think i should set it to on a SUN Ultra-30 with 340k files?
//Mats

Re: estimate timeout

2003-12-04 Thread Jon LaBadie

On Thu, Dec 04, 2003 at 11:09:17PM +0100, Mats Blomstrand wrote:
> Hi all
> 'planner' just told me "estimate timeout ..." on my new archive-conf I'm
> testing. The normal-backup works ok even on level 0 dumps from the same host.
> 
> Any ideas what i can do about it?

Uhh, increase the estimate timeout in amanda.conf?


-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)
