Estimate timeout

2009-09-02 Thread Brian Cuttler

Amanda 2.6.1 on Solaris 10/SPARC
Amanda 2.6.1 on Solaris 10/x86

Server has 21 clients with a total of 109 DLEs.
One of the client systems has 51 DLEs, 1 ufs and 50 zfs partitions.

The partitions/DLEs are all part of the same ZFS pool, and I
believe (from listening to another discussion earlier this week)
that they are checked sequentially.

We seem to be exceeding a timeout limit.

etimeout is for size estimates - so I don't think it applies.

We have switched to server estimate for zfs-dump.

Is there a per-client amcheck timeout that is not based on the
number of client DLEs?


Amanda Backup Client Hosts Check

WARNING: finsen: selfcheck request failed: timeout waiting for REP
Client check: 21 hosts checked in 91.125 seconds.  1 problem found.

thank you,

Brian
---
   Brian R Cuttler                 brian.cutt...@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773







Re: AW: Estimate timeout

2007-02-07 Thread Yogesh Hasabnis
Thanks for the reply. I was able to resolve the
problem by changing the disklist entry to something
like this:

hostname volumename {
root-tar
estimate calcsize
}

It worked for quite a few test backups. But after
adding some more DLEs to the disklist, I started
getting the errors below when I run the command
amcheck config_name:

/etc/amanda/fullback/disklist, line 3: dump type parameter expected
/etc/amanda/fullback/disklist, line 3: end of line expected

My version of Amanda is 2.4.4. I suspect that the
estimate parameter I used in the disklist is not
supported in amanda-2.4.4, but then I wonder why it
worked for some time. It seems that I will have to
upgrade Amanda. Kindly let me know if there is any
other way of resolving this issue.

Thanks

Yogesh


--- Dipl.Ing.Trompler Wilhelm [EMAIL PROTECTED]
wrote:

 Try different values for etimeout.
 
 Regards W.Trompler 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On
 Behalf Of Yogesh Hasabnis
 Sent: Wednesday, 07 February 2007 06:48
 To: amanda-users@amanda.org
 Subject: Estimate timeout
 
 
 
 
  


 


Estimate timeout

2007-02-06 Thread Yogesh Hasabnis
Hi,

Yesterday I created a configuration named FULLBACK that always
does full backups. The 2 volumes to be backed up had a size of
roughly 150GB. But the backup performed yesterday seems to have
failed. The Amanda log for this backup run reads as follows
(I have edited the hostname and the volume names):

START driver date 20070206
DISK planner host_name volume_name_1
DISK planner host_name volume_name_2
START planner date 20070206
INFO planner Adding new disk host_name:volume_name_1.
INFO planner Adding new disk host_name:volume_name_2.
START taper datestamp 20070206 label FULLBACK1 tape 0
FAIL planner host_name volume_name_2 20070206 0 [Estimate timeout from host_name]
FAIL planner host_name volume_name_1 20070206 0 [Estimate timeout from host_name]
FINISH planner date 20070206
WARNING driver WARNING: got empty schedule from planner
STATS driver startup time 1810.114
INFO taper tape FULLBACK1 kb 0 fm 0 [OK]

FINISH driver date 20070206 time 1815.145

The backup server and client are the same in my case.

Kindly give me suggestions about what may have gone
wrong.

Thanks

Yogesh



 



estimate timeout and dump failure

2006-10-06 Thread Mike Galvez
Hi,

I am using version 2.5.0p2 on my dump host. One of my clients (same version)
has a filesystem that consistently fails the estimate and dump phases. The
same host has two other (smaller) filesystems that complete without problems.
The amandad and selfcheck debug files from this host show no indication of
problems.

The sendsize debug does show a warning, but I can't find enough information to
correct the problem or to be sure that it is the problem.

Error/Warning message: From backup report 

FAILURE AND STRANGE DUMP SUMMARY:
  fa1  amrd0s1f  lev 0  FAILED [disk amrd0s1f, all estimate timed out]
  planner: ERROR Request to fa1 failed: timeout waiting for REP
-

Error/Warning message: From sendsize.debug  

sendsize[67866]: time 5.776: getting size via dump for amrd0s1f level 0
sendsize[67866]: time 5.777: calculating for device '/dev/amrd0s1f' with 'ufs'
sendsize[67866]: time 5.777: running /sbin/dump 0Shsf 0 1048576 - /dev/amrd0s1f
sendsize[67866]: time 5.778: running /usr/local/libexec/amanda/killpgrp
sendsize[67866]: time 5.781:   DUMP: WARNING: should use -L when dumping live read-write filesystems!
sendsize[67866]: time 5.782:   DUMP: Date of this level 0 dump: Thu Oct  5 19:36:58 2006
sendsize[67866]: time 5.783:   DUMP: Date of last level 0 dump: the epoch
sendsize[67866]: time 5.784:   DUMP: Dumping /dev/amrd0s1f (/usr) to standard output
sendsize[67866]: time 5.857:   DUMP: mapping (Pass I) [regular files]
sendsize[67866]: time 17.022:   DUMP: mapping (Pass II) [directories]
sendsize[67866]: time 17.022:   DUMP: estimated 5457824 tape blocks.
sendsize[67866]: time 17.027: .
sendsize[67866]: estimate time for amrd0s1f level 0: 11.249
sendsize[67866]: estimate size for amrd0s1f level 0: 5457824 KB
sendsize[67866]: time 17.027: asking killpgrp to terminate
sendsize[67866]: time 18.035: done with amname 'amrd0s1f', dirname '/usr', spindle -1
sendsize[67854]: time 18.036: child 67866 terminated normally
-

I compiled the client with amanda_snapshot, and I can see a .snap directory in
the filesystem noted above.

One question I have: how and where do you specify dump -L?

I get this same warning on one of the other filesystems on this host, but the
estimate and dump finish with no problems.

Backups on this host completed when both host and server were running
Amanda 2.4.5. I appreciate any help you can provide in solving this.

Thanks

-Mike
 
-- 
Michael Galvez http://www.people.virginia.edu/~mrg8n
Information Technology Specialist University of Virginia


Re: estimate timeout and dump failure

2006-10-06 Thread Gene Heskett
On Friday 06 October 2006 10:32, Mike Galvez wrote:

Mike, I think this is a known bug; you need to update to one of the 2.5.1p1 
versions.  It bit lots of us.

See http://www.iro.umontreal.ca/~martinea/amanda/

I ran the 20061004 snapshot last night, and it worked just fine.

I am using version 2.5.0p2. on my dump host. One of my clients (same
 version) has a filesystem that consistently fails the estimate and dump
 phases. The same host has two other filesystems (smaller) that complete
 without problem. The amandad and selfcheck debug from this host shows no
 indication of problems.
[...]

If the client is slow, you might have to enlarge the 'etimeout' and 
'dtimeout' settings too, but I believe that won't fix the bug I referred 
to.  Really big estimates and dumps will exceed the defaults, and you 
didn't say how big yours might be.  My largest DLE is about 9GiB, and ISTR 
I've had those settings doubled for years.  My one client is a little slow; 
it's only a 500MHz K6 with 320MB of RAM.  I have things divided up into 
DLEs of usually not more than 2GiB, some considerably smaller, so Amanda 
can have a ball balancing things.  About 55GiB total.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.


Re: Estimate timeout from localhost / driver: WARNING: got empty schedule from planner

2006-04-26 Thread Thomas Grieder
Joshua Baker-LePain wrote:
 On Tue, 25 Apr 2006 at 9:22am, Thomas Grieder wrote
 
 For a few days now I have been getting these two error messages:

 Estimate timeout from localhost
 driver: WARNING: got empty schedule from planner
 
 1) Using localhost in the disklist is generally frowned upon.  It's not a
 unique name, and will likely come back to bite you someday.
 
 2) Look in /tmp/amanda at the *debug files -- most likely sendsize*debug
 and/or amandad*debug will have more info as to what's going wrong.
 

Thanks, backup is working well now.

Thomas


Estimate timeout from localhost / driver: WARNING: got empty schedule from planner

2006-04-25 Thread Thomas Grieder
Hi

For a few days now I have been getting these two error messages:

Estimate timeout from localhost
driver: WARNING: got empty schedule from planner

I could not find any hints by searching Google. Any ideas?

Below are the amcheck and amdump mails:

amcheck:

Amanda Tape Server Host Check
-
Holding disk /srv/amanda/: 155902968 KB disk space available, using
145417208 KB
amcheck-server: slot 3: not an amanda tape
amcheck-server: slot 3: not an amanda tape
amcheck-server: slot 4: date 20060419 label 000341 (active tape)
amcheck-server: slot 5: date 20060420 label 000123 (active tape)
amcheck-server: slot 6: date 20060421 label 000214 (active tape)
amcheck-server: slot 7: date 20060409 label 000348 (first labelstr match)
amcheck-server: slot 8: date 20060410 label 000282 (active tape)
amcheck-server: slot 9: date 20060410 label 000279 (active tape)
amcheck-server: slot 10: date 20060410 label 000280 (active tape)
amcheck-server: slot 11: date 20060410 label 000347 (active tape)
amcheck-server: slot 12: date 20060411 label 000349 (active tape)
amcheck-server: slot 13: date 20060412 label 000281 (active tape)
amcheck-server: slot 14: date 20060411 label 74 (active tape)
amcheck-server: slot 15: date 20060414 label 67 (active tape)
amcheck-server: slot 16: date 20060417 label 63 (active tape)
amcheck-server: slot 17: date 20060418 label 03 (active tape)
amcheck-server: slot 18: date 20060413 label 14 (active tape)
amcheck-server: slot 19: date X        label 25 (labelstr match)
amcheck-server: slot 20: date X        label 29 (labelstr match)
amcheck-server: slot 21: date X        label 24 (labelstr match)
amcheck-server: slot 22: date 20060330 label 000106 (labelstr match)
amcheck-server: slot 1: date 20060330 label 000350 (labelstr match)
amcheck-server: slot 2: date 20060405 label 000345 (labelstr match)
NOTE: skipping tape-writable test
Tape 000348 label ok
Server check took 2060.760 seconds

Amanda Backup Client Hosts Check

Client check: 1 host checked in 0.338 seconds, 0 problems found

(brought to you by Amanda 2.4.4p3)

amdump:
These dumps were to tape 000348.
The next tape Amanda expects to use is: 000344.
The next new tape already labelled is: 25.

FAILURE AND STRANGE DUMP SUMMARY:
  localhost  /srv/backup/moonsmile.ch lev 0 FAILED [Estimate timeout from localhost]
  localhost  /srv/backup/Tapes lev 0 FAILED [Estimate timeout from localhost]
  localhost  /srv/svn lev 0 FAILED [Estimate timeout from localhost]
  localhost  /var lev 0 FAILED [Estimate timeout from localhost]
  localhost  /home lev 0 FAILED [Estimate timeout from localhost]
  localhost  /usr lev 0 FAILED [Estimate timeout from localhost]
  localhost  /etc lev 0 FAILED [Estimate timeout from localhost]


STATISTICS:
                           Total       Full      Daily

Estimate Time (hrs:min)     1:15
Run Time (hrs:min)          1:15
Dump Time (hrs:min)         0:00       0:00       0:00
Output Size (meg)            0.0        0.0        0.0
Original Size (meg)          0.0        0.0        0.0
Avg Compressed Size (%)      --         --         --
Filesystems Dumped             0          0          0
Avg Dump Rate (k/s)          --         --         --

Tape Time (hrs:min)         0:00       0:00       0:00
Tape Size (meg)              0.0        0.0        0.0
Tape Used (%)                0.0        0.0        0.0
Filesystems Taped              0          0          0
Avg Tp Write Rate (k/s)      --         --         --

USAGE BY TAPE:
  Label        Time      Size      %    Nb
  000348       0:00       0.0    0.0     0


NOTES:
  driver: WARNING: got empty schedule from planner
  taper: tape 000348 kb 0 fm 0 [OK]


DUMP SUMMARY:
                                     DUMPER STATS            TAPER STATS
HOSTNAME  DISK         L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
--------- ------------ - ------------------------------------ ------------
localhost /etc         0 FAILED ---
localhost /home        0 FAILED ---
localhost -ckup/Tapes  0 FAILED ---
localhost -onsmile.ch  0 FAILED ---
localhost /srv/svn     0 FAILED ---
localhost /usr         0 FAILED ---
localhost /var         0 FAILED ---

(brought to you by Amanda version 2.4.4p3)

Thomas


Re: Estimate timeout from localhost / driver: WARNING: got empty schedule from planner

2006-04-25 Thread Joshua Baker-LePain

On Tue, 25 Apr 2006 at 9:22am, Thomas Grieder wrote


For a few days now I have been getting these two error messages:

Estimate timeout from localhost
driver: WARNING: got empty schedule from planner


1) Using localhost in the disklist is generally frowned upon.  It's not a
   unique name, and will likely come back to bite you someday.

2) Look in /tmp/amanda at the *debug files -- most likely sendsize*debug
   and/or amandad*debug will have more info as to what's going wrong.
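
For anyone following this advice, here is a minimal sketch of how the most recent debug files could be listed. It assumes the debug directory is /tmp/amanda, as stated above, and that the file names begin with "sendsize" or "amandad" and end in "debug". The function name and its parameters are made up for this illustration and are not part of Amanda.

```python
import glob
import os


def newest_debug_files(debug_dir="/tmp/amanda",
                       patterns=("sendsize*debug", "amandad*debug"),
                       limit=5):
    """Return up to `limit` matching debug files, newest first."""
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(os.path.join(debug_dir, pattern)))
    # Sort by modification time so the most recent run comes first.
    matches.sort(key=os.path.getmtime, reverse=True)
    return matches[:limit]


if __name__ == "__main__":
    for path in newest_debug_files():
        print(path)
```

The debug directory may differ on other installations, so pass it explicitly if yours is elsewhere.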

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)

2006-01-05 Thread Paul Bijnens

Michael Loftis wrote:



Paul asked for the logs, it seems like there's an amanda bug.  The units 


Yes, indeed, there is a bug in Amanda!
You have 236 DLE's for that host, and from my reading of the code
the REQuest UDP packet is limited to 32K instead of 64K (see planner.c
lines 1377-1383)  (Need to update the documentation!)

It seems that the planner splits up the REQuest packet into separate
UDP-packets when exceeding MAX_DGRAM/2, i.e. 32K.
Your first request was 32580 bytes.  Adding the next string to that
request would have exceeded the 32768 limit.
The reason for the division by 2 seems to be to reserve space for error
replies on each of those.
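
To make the splitting behaviour concrete, here is a rough sketch of greedy packing under a 32768-byte limit. It is not the code from planner.c. The function name, the header argument, and the value 65536 for MAX_DGRAM (inferred from "64K" and "MAX_DGRAM/2, i.e. 32K" above) are assumptions made for illustration.

```python
MAX_DGRAM = 65536            # assumed from the "64K" figure in this message
REQ_LIMIT = MAX_DGRAM // 2   # 32768, the split threshold described above


def split_requests(header, dle_lines, limit=REQ_LIMIT):
    """Greedily pack per-DLE request lines into packets of at most `limit` bytes."""
    packets = []
    current = header
    for line in dle_lines:
        # Start a new packet when adding this line would exceed the limit
        # (unless the packet holds nothing but the header yet).
        if len(current) + len(line) > limit and current != header:
            packets.append(current)
            current = header
        current += line
    packets.append(current)
    return packets


if __name__ == "__main__":
    lines = [b"x" * 99 + b"\n" for _ in range(400)]   # 400 made-up 100-byte lines
    packets = split_requests(b"H\n", lines)
    print([len(p) for p in packets])
```

With enough DLE lines the request ends up in more than one packet, which is the situation the rest of this message is about.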

However, the amandad client only expects one and only one REQuest packet.
Any other REQuest packet coming from the same connection (5-tuple:
protocol, remotehost, remoteport, localhost, localport) and having
a type REQ is considered a duplicate.
It should actually test for the handle and sequence to be identical
too. It does not.
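
The difference between the two checks can be sketched as follows. This is an illustration only and not amandad's code. The Req type, its field names, and the sample handle values are invented for the example.

```python
from typing import NamedTuple


class Req(NamedTuple):
    conn: tuple      # (protocol, remotehost, remoteport, localhost, localport)
    handle: str      # hypothetical handle value
    sequence: int


def is_duplicate_connection_only(seen, req):
    """Behaviour described above: any REQ on a known connection is a duplicate."""
    return any(req.conn == old.conn for old in seen)


def is_duplicate_strict(seen, req):
    """Suggested check: handle and sequence must be identical as well."""
    return req in seen


if __name__ == "__main__":
    conn = ("udp", "server", 850, "client", 10080)
    first = Req(conn, "000-A", 1)
    second = Req(conn, "000-B", 2)   # second half of a split request
    print(is_duplicate_connection_only([first], second))
    print(is_duplicate_strict([first], second))
```

Under the connection-only check the second, different request is discarded as a duplicate. Under the stricter check it would be recognised as new, although, as explained next, that alone does not solve the problem.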

It's not fixed quickly either:  when receiving the first REQ packet,
the amandad client forks and execs the request program (sendsize in
this case) and reads the results from a pipe.

By the time the second, non-identical request comes in (with different
handle, sequence -- which is currently not checked), sendsize is already
started and cannot be given additional DLE's to estimate.


As a temporary workaround, you could shorten the exclude-list string for 
that host by creating a symlink:


   ln -s /etc/amanda/exclude.gtar /.excl

and use that as exclude-list: this shortens each line by 20 bytes, which
would shrink the packet to fit again. (236 DLE's * 20  = 4720 bytes
less in a REQuest UDP for that host!)
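
The arithmetic behind that estimate, written out with the figures quoted in this message (the variable names are only for illustration):

```python
LIMIT = 32768            # MAX_DGRAM/2, from this message
first_request = 32580    # size of the first REQ packet, from this message
dles = 236               # DLEs on the affected host
saved_per_line = 20      # bytes saved per DLE line, as quoted above

headroom_before = LIMIT - first_request   # room left in the first packet
total_saved = dles * saved_per_line       # bytes saved across all DLE lines

print(headroom_before, total_saved)
```

So the first packet had only 188 bytes to spare, while the shorter exclude path frees roughly 4.7K across all DLE lines.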


Anyway... I'm getting a headache thinking about it :)  all my other DLEs 
seem ok for that host, and the ones that it misses are not always 
exactly the same, but all seem to be non-calcsize estimated.


Just bad luck for those entries that happen to go in the end of the
queue.  On the other hand, when really unlucky, you could have up to 
three estimates for each DLE, overflowing even the 4K we saved by 
shrinking the exclude string...



--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/                          email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***




Re: BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)

2006-01-05 Thread Michael Loftis



--On January 5, 2006 4:49:53 PM +0100 Paul Bijnens 
[EMAIL PROTECTED] wrote:



Michael Loftis wrote:



Paul asked for the logs, it seems like there's an amanda bug.  The units


Yes, indeed, there is a bug in Amanda!
You have 236 DLE's for that host, and from my reading of the code
the REQuest UDP packet is limited to 32K instead of 64K (see planner.c
lines 1377-1383)  (Need to update the documentation!)


Woot, I'm NOT crazy! :D

...did I just say woot?  My apologies.


It seems that the planner splits up the REQuest packet into separate
UDP-packets when exceeding MAX_DGRAM/2, i.e. 32K.
Your first request was 32580 bytes.  Adding the next string to that
request would have exceeded the 32768 limit.
The reason for the division by 2 seems to be to reserve space for error
replies on each of those.


I knew it was size related but that my packets were significantly less than 
the MAX_DGRAM.  This definitely explains it.



However, the amandad client only expects one and only one REQuest packet.
Any other REQuest packet coming from the same connection (5-tuple:
protocol, remotehost, remoteport, localhost, localport) and having
a type REQ is considered a duplicate.
It should actually test for the handle and sequence to be identical
too. It does not.

It's not fixed quickly either:  when receiving the first REQ packet,
the amandad client forks and execs the request program (sendsize in
this case) and reads the results from a pipe.

By the time the second, non-identical request comes in (with different
handle, sequence -- which is currently not checked), sendsize is already
started and cannot be given additional DLE's to estimate.


As a temporary workaround, you could shorten the exclude-list string for
that host by creating a symlink:

ln -s /etc/amanda/exclude.gtar /.excl


Yeah...This will help for a time.  Hopefully long enough for a patch to fix 
amandad.  I'll have to create a separate type for this server, since we've 
got well over a hundred now and they all share that main backup type.  I 
figured shortening the UDP packets somehow would help, I knew it was just 
odd that it wasn't quite right and I seemed to be running into the problem 
way too early :)



and use that as exclude-list: this shortens each line by 20 bytes, which
would shrink the packet to fit again. (236 DLE's * 20  = 4720 bytes
less in a REQuest UDP for that host!)



Anyway... I'm getting a headache thinking about it :)  all my other DLEs
seem ok for that host, and the ones that it misses are not always
exactly the same, but all seem to be non-calcsize estimated.


Just bad luck for those entries that happen to go in the end of the
queue.  On the other hand, when really unlucky, you could have up to
three estimates for each DLE, overflowing even the 4K we saved by
shrinking the exclude string...


Like I said, hopefully by then either the hackers (or myself) will have put 
together a patch.  ...  I see a few ways to fix this.  One, which I don't 
know will work, is turning wait=yes into wait=no in my xinetd.conf; I'm 
not sure what that would break.  The others involve code: multiple 
sendsize's, *or* a protocol change to wait for a 'final start' packet, or 
an amandad change to wait a few extra seconds before starting the actual 
sendsize, coalescing the results.


And you're right, the other ways aren't easy...one involves possibly 
breaking the protocol too.






Handitarded....odd (partial) estimate timeout errors.

2006-01-04 Thread Michael Loftis
I added about half a dozen or so DLEs (splitting an existing one) and since 
that time I get estimate timeout errors for some other DLEs on this host 
(daily run snippet attached)  ... I suspect I'm hitting a UDP packet limit 
maybe, but...I'm really drawing a blank.  I've turned up etimeout quite a 
bit, to no effect.


Maybe someone can jog my memory, but are the estimates returned in a single 
UDP packet and therefore subject to the MTU?  If so...how to get around it? 
Or maybe I'm missing something more obvious.  Amanda 2.4.5 server and 
client; the client is Debian woody, the server is Debian sarge.  The client 
DLEs all use the 'calcsize' estimate setting except for the affected DLEs, 
but not all non-calcsize DLEs are affected...  If you need anything else, 
let me know.



 planner: ERROR Request to nfs0.msomt timed out.
 nfs0.msomt /var/spool/cron lev 0 FAILED [missing result for /var/spool/cron in nfs0.msomt response]
 nfs0.msomt /usr/local lev 0 FAILED [missing result for /usr/local in nfs0.msomt response]
 nfs0.msomt /root lev 0 FAILED [missing result for /root in nfs0.msomt response]




--
Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds.
-- Samuel Butler


Re: Handitarded....odd (partial) estimate timeout errors.

2006-01-04 Thread Jon LaBadie
You can find comments on the problem here:

http://tinyurl.com/ca7pv

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: Handitarded....odd (partial) estimate timeout errors.

2006-01-04 Thread Michael Loftis



--On January 4, 2006 4:30:53 PM -0500 Jon LaBadie [EMAIL PROTECTED] wrote:



You can find comments on the problem here:

http://tinyurl.com/ca7pv



OK, hmm... something REALLY odd is happening.  For the DLEs that failed 
there are multiple sendsize requests: one in the main/first REQ, which it 
acks, then another request (a second or two later) that is just for the 
DLEs that never make it.  amandad claims this to be a dup P_REQ packet, 
acks it anyway, but apparently doesn't do any estimates for it.  I'm wary 
of sending the entire debug to the list, but if anyone is interested I'll 
send it directly to the developer(s).  I'm thinking maybe something funny 
is going on?







Re: Handitarded....odd (partial) estimate timeout errors.

2006-01-04 Thread Jon LaBadie
Asking questions about things I know nothing ...  :)

Are you using iptables?
If so, have you installed and configured the ??conntrack?? module?


-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: Handitarded....odd (partial) estimate timeout errors.

2006-01-04 Thread Michael Loftis



--On January 4, 2006 7:20:50 PM -0500 Jon LaBadie [EMAIL PROTECTED] wrote:


Asking questions about things I know nothing ...  :)

Are you using iptables?
If so, have you installed and configured the ??conntrack?? module?


Paul asked for the logs; it seems like there's an Amanda bug.  The units in 
question are attached to the same broadcast domain/VLAN and are in the same 
subnet, so they are talking directly to each other.  There's no obvious 
network or switch problem going on.  I thought maybe an MTU limit of 1500 
bytes, but apparently Amanda is set to fragment UDP packets up to 64k, so 
that should be fine, and other drives are making it.  Anyway, thanks Jon 
:)  I think we've hit some sort of bug in amandad, or planner (I think it 
sends the SERVICE sendsize packets), or both.


Network wise BTW the backup server is connected to a switch here in the 
office, which is trunked further to a switch upstairs, then to another 
switch in the blade chassis, then to the untrunked connection to the 
(amanda backup client) nfs server which is the one having issues.  It 
seemed maybe some sort of odd packet size limit or some other 'max number 
of' limit in planner, since planner is sending duplicate requests sorta for 
the affected DLEs.



Anyway... I'm getting a headache thinking about it :)  all my other DLEs 
seem ok for that host, and the ones that it misses are not always exactly 
the same, but all seem to be non-calcsize estimated.


Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-04 Thread Alexander Jolk

Sebastian Kösters wrote:

Thats all in sendsize:

sendsize: debug 1 pid 5132 ruid 33 euid 33: start at Fri Nov  4 02:30:02 2005
sendsize: version 2.4.3
sendsize[5132]: time 0.007: waiting for any estimate child
sendsize[5134]: time 0.007: calculating for amname '/pst', dirname '/pst', spindle -1
sendsize[5134]: time 0.007: getting size via gnutar for /pst level 0
sendsize[5134]: time 0.034: spawning /usr/lib/amanda/runtar in pipeline
sendsize[5134]: argument list: /bin/tar --create --file /dev/null --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._pst.20051104023002.exclude .
sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB, 28MB/s)
sendsize[5134]: time 2938.237: .
sendsize[5134]: estimate time for /pst level 0: 2938.203
sendsize[5134]: estimate size for /pst level 0: 84947620 KB
sendsize[5134]: time 2938.237: waiting for /bin/tar /pst child
sendsize[5134]: time 2938.237: after /bin/tar /pst wait
sendsize[5134]: time 2938.238: done with amname '/pst', dirname '/pst', spindle -1
sendsize[5132]: time 2938.238: child 5134 terminated normally
sendsize: time 2938.238: pid 5132 finish time Fri Nov  4 03:19:00 2005


[...]


Seems to work but why not directly with amanda?!


Remind me, did you change etimeout?  Your sendsize debug file seems 
normal to me, and in fact we see that the estimate size is correctly 
identified.  Could you show the corresponding amandad debug file please?
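
One way to check whether etimeout is the likely culprit is to pull the "estimate time" lines out of a sendsize debug file and compare them against the configured value. The sketch below does that. The function name is made up, the 300-second value is only an assumed timeout for illustration, and comparing one estimate against one threshold is a simplification, since how planner budgets etimeout across several DLEs is not covered in this thread.

```python
import re

# Matches lines such as:
#   sendsize[5134]: estimate time for /pst level 0: 2938.203
ESTIMATE_TIME = re.compile(
    r"estimate time for (?P<disk>\S+) level (?P<level>\d+): (?P<secs>[\d.]+)"
)


def slow_estimates(debug_text, etimeout):
    """Return (disk, level, seconds) for estimates that took longer than etimeout."""
    slow = []
    for match in ESTIMATE_TIME.finditer(debug_text):
        secs = float(match.group("secs"))
        if secs > etimeout:
            slow.append((match.group("disk"), int(match.group("level")), secs))
    return slow


sample = "sendsize[5134]: estimate time for /pst level 0: 2938.203\n"
print(slow_estimates(sample, 300))    # assumed timeout: the estimate is too slow
print(slow_estimates(sample, 4000))   # with a timeout of 4000 it fits
```

For the log above, an estimate that takes about 2938 seconds only fits once the timeout is raised well beyond that.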


Alex


--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


AW: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-04 Thread Sebastian Kösters
The last attempt, whose output I posted, was run by hand. Now I am trying
again with Amanda (timeout 4000).








AW: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-04 Thread Sebastian Kösters
As I said, I started amdump with etimeout = 4000, but it has now been
running since 8:36 and it is now 10:53.

And it is still running.

I don't understand.

On the client there are the following Amanda processes:

 8243 ?  S   0:00 /usr/lib/amanda/sendbackup
 8245 ?  S  52:15 /usr/bin/gzip --fast
 8246 ?  S   1:19 /usr/lib/amanda/sendbackup
 8247 ?  S   0:00 sh -c /bin/tar -tf - 2>/dev/null | sed -e 's/^\.//'
 8248 ?  S   0:49 /bin/tar -tf -
 8249 ?  S   0:00 sed -e s/^\.//
 8250 ?  R   2:30 gtar --create --file - --directory /pst --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse
 







Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-04 Thread Paul Bijnens

Sebastian Kösters wrote:

Hi!

Thats all in sendsize:

sendsize: debug 1 pid 5132 ruid 33 euid 33: start at Fri Nov  4 02:30:02
2005
sendsize: version 2.4.3
sendsize[5132]: time 0.007: waiting for any estimate child
sendsize[5134]: time 0.007: calculating for amname '/pst', dirname '/pst',
spindle -1
sendsize[5134]: time 0.007: getting size via gnutar for /pst level 0
sendsize[5134]: time 0.034: spawning /usr/lib/amanda/runtar in pipeline
sendsize[5134]: argument list: /bin/tar --create --file /dev/null
--directory /pst --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_ps
t_0.new --sparse --ignore-failed-read --totals --exclude-from
/tmp/amanda/sendsize._pst.20051104023002.exclude .
sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB,
28MB/s)
sendsize[5134]: time 2938.237: .
sendsize[5134]: estimate time for /pst level 0: 2938.203
sendsize[5134]: estimate size for /pst level 0: 84947620 KB
sendsize[5134]: time 2938.237: waiting for /bin/tar /pst child
sendsize[5134]: time 2938.237: after /bin/tar /pst wait
sendsize[5134]: time 2938.238: done with amname '/pst', dirname '/pst',
spindle -1
sendsize[5132]: time 2938.238: child 5134 terminated normally
sendsize: time 2938.238: pid 5132 finish time Fri Nov  4 03:19:00 2005


This sendsize is completely normal.
The previous one you sent, did have the error about not finding the size 
line. But that size line is there in this run!

Notice the line Total bytes written: ...




[EMAIL PROTECTED] gnutar-lists]# /bin/tar --create --file /dev/null
--directory /pst --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read
--totals .

After +1 hour

Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s) (English: Total
number of written bytes )


It seems you have a mixed English-German environment.  Amanda 
specifically looks for the English string and does not recognize the
German words.  Could it be that some amdump runs somehow use the German 
environment?
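
The locale mismatch described above can be illustrated with a small sketch. It assumes, hypothetically, that the estimate code matches tar's --totals line with an English-only pattern of the shape below (the exact pattern used by Amanda 2.4.3 is not shown in this thread), and that forcing the C locale for the tar child makes GNU tar print the English line. TOTALS_RE and c_locale_env are names invented for this illustration, not Amanda code.

```python
import re

# Hypothetical English-only pattern for GNU tar's --totals line.
TOTALS_RE = re.compile(r"Total bytes written: (\d+)")


def c_locale_env(base_env):
    """Return a copy of base_env that forces untranslated (C locale) messages."""
    env = dict(base_env)
    env.pop("LANGUAGE", None)  # drop any message-language override
    env["LC_ALL"] = "C"
    env["LANG"] = "C"
    return env


# The two totals lines quoted in this thread.
english = "Total bytes written: 86986362880 (81GB, 28MB/s)"
german = "Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s)"

print(bool(TOTALS_RE.search(english)))  # the English line matches
print(bool(TOTALS_RE.search(german)))   # the German line does not
print(c_locale_env({"LANG": "de_DE", "LANGUAGE": "de"}))
```

An environment built this way could be passed as the env argument of subprocess.run when repeating the tar test by hand, so that the manual run and the Amanda run are compared under the same locale.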





Seems to work but why not directly with amanda?!



I cannot conclude that from the evidence you give here:  the sendsize is
perfect.




--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/                          email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***


AW: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-04 Thread Sebastian Kösters
Hi!

We have several other servers with (for example) a German tar, and there are no
problems with the backups there. Amanda is installed in English.
 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Paul Bijnens
Sent: Friday, 4 November 2005 11:26
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 Hi!
 
 Thats all in sendsize:
 
 sendsize: debug 1 pid 5132 ruid 33 euid 33: start at Fri Nov  4 02:30:02
 2005
 sendsize: version 2.4.3
 sendsize[5132]: time 0.007: waiting for any estimate child
 sendsize[5134]: time 0.007: calculating for amname '/pst', dirname '/pst',
 spindle -1
 sendsize[5134]: time 0.007: getting size via gnutar for /pst level 0
 sendsize[5134]: time 0.034: spawning /usr/lib/amanda/runtar in pipeline
 sendsize[5134]: argument list: /bin/tar --create --file /dev/null
 --directory /pst --one-file-system --listed-incremental
 /var/lib/amanda/gnutar-lists/pst_ps
 t_0.new --sparse --ignore-failed-read --totals --exclude-from
 /tmp/amanda/sendsize._pst.20051104023002.exclude .
 sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB,
 28MB/s)
 sendsize[5134]: time 2938.237: .
 sendsize[5134]: estimate time for /pst level 0: 2938.203
 sendsize[5134]: estimate size for /pst level 0: 84947620 KB
 sendsize[5134]: time 2938.237: waiting for /bin/tar /pst child
 sendsize[5134]: time 2938.237: after /bin/tar /pst wait
 sendsize[5134]: time 2938.238: done with amname '/pst', dirname '/pst',
 spindle -1
 sendsize[5132]: time 2938.238: child 5134 terminated normally
 sendsize: time 2938.238: pid 5132 finish time Fri Nov  4 03:19:00 2005

This sendsize is completely normal.
The previous one you sent, did have the error about not finding the size 
line. But that size line is there in this run!
Notice the line Total bytes written: ...



 [EMAIL PROTECTED] gnutar-lists]# /bin/tar --create --file /dev/null
 --directory /pst --one-file-system --listed-incremental
 /var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read
 --totals .
 
 After +1 hour
 
 Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s) (English: Total
 number of written bytes )

Seems you have some mixed English-German environment.  Amanda 
specifically looks for the English string, and does not recognize the
german words.  Could it be that some amdump runs somehow use the german 
environment?


 
 Seems to work but why not directly with amanda?!


I cannot conclude that from the evidence you give here:  the sendsize is
perfect.










Re: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Alexander Jolk

Sebastian Kösters wrote:

It does not work.


Which means?  Do you get an error message?  Please cite it.  What do the 
relevant log files say?  You showed us yesterday that the estimate phase 
took almost an hour on this machine, have you waited for that time?



On the client amanda starts the following command:

/bin/tar --create --file /dev/null --directory /pst --one-file-system
--listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new


What happens if you run that command by hand?  (And wait until 
completion, obviously.)


Alex


--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


Re: Estimate Timeout Issue - Dump runs fine

2005-11-03 Thread Tom Brown


OK thanks - I have increased the etimeout to 2400 seconds and also 
changed the udp timeout within checkpoint to also be 2400 seconds so 
i'll see how the run goes tonight


everything was fine today - no estimate timeout

thanks for the pointer



AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Sebastian Kösters
When I run it by hand I get this error (translated from German to English):

/bin/tar: No empty Archive created.

Everything is set up as on the other machines.
 
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Alexander Jolk
Sent: Thursday, 3 November 2005 09:58
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 It does not work.

Which means?  Do you get an error message?  Please cite it.  What do the 
relevant log files say?  You showed us yesterday that the estimate phase 
took almost an hour on this machine, have you waited for that time?

 On the client amanda starts the following command:
 
 /bin/tar --create --file /dev/null --directory /pst --one-file-system
 --listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new

What happens if you run that command by hand?  (And wait until 
completion, obviously.)

Alex


-- 
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29






Re: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Alexander Jolk

Sebastian Kösters wrote:

On the client amanda starts the following command:

/bin/tar --create --file /dev/null --directory /pst --one-file-system
--listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new


When run it by hand i get this error (translated from german to english)

/bin/tar: No empty Archive created.


You forgot the last dot `.' on the command line as given in the debug file.

Alex


--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Sebastian Kösters
You are right.

When I try it by hand, nothing happens. It runs and runs and runs, and the
file stays at 0 KB.

I found no error messages.

I also changed the permissions of the Amanda files and directories to 777.
 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Alexander Jolk
Sent: Thursday, 3 November 2005 14:21
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
On the client amanda starts the following command:

/bin/tar --create --file /dev/null --directory /pst --one-file-system
--listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new
 
 When run it by hand i get this error (translated from german to english)
 
 /bin/tar: No empty Archive created.

You forgot the last dot `.' on the command line as given in the debug file.

Alex


-- 
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29






AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Sebastian Kösters
I found this in /tmp/amanda on the client

sendsize: debug 1 pid 25485 ruid 33 euid 33: start at Thu Nov  3 14:00:25
2005
sendsize: version 2.4.3
sendsize[25487]: time 0.002: calculating for amname '/pst', dirname '/pst',
spindle -1
sendsize[25487]: time 0.002: getting size via gnutar for /pst level 0
sendsize[25487]: time 0.003: spawning /usr/lib/amanda/runtar in pipeline
sendsize[25487]: argument list: /bin/tar --create --file /dev/null
--directory /pst --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read
--totals .
sendsize[25485]: time 0.023: waiting for any estimate child
sendsize[25487]: time 1519.255: .
sendsize[25487]: estimate time for /pst level 0: 1519.252
sendsize[25487]: no size line match in /bin/tar output for /pst
sendsize[25487]: .
sendsize[25487]: estimate size for /pst level 0: -1 KB
sendsize[25487]: time 1519.255: waiting for /bin/tar /pst child
sendsize[25487]: time 1519.256: after /bin/tar /pst wait
sendsize[25485]: time 1519.256: child 25487 terminated with signal 13
sendsize: time 1519.257: pid 25485 finish time Thu Nov  3 14:25:44 2005
 

What does this mean: no size line match in /bin/tar output for /pst

And this: size for /pst level 0: -1 KB ??

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Sebastian Kösters
Sent: Thursday, 3 November 2005 14:33
To: 'Alexander Jolk'
Cc: amanda-users@amanda.org
Subject: AW: AW: AW: AW: AW: AW: Estimate timeout from server

You are right.

When i try it by hand nothing happens. It runs and runs and runs and the
file stays at 0kb.

I found no error messages.

I also changed the persmissions of the amanda files / directorys to 777.  
 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Alexander Jolk
Sent: Thursday, 3 November 2005 14:21
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
On the client amanda starts the following command:

/bin/tar --create --file /dev/null --directory /pst --one-file-system
--listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new
 
 When run it by hand i get this error (translated from german to english)
 
 /bin/tar: No empty Archive created.

You forgot the last dot `.' on the command line as given in the debug file.

Alex


-- 
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29










Re: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Matt Hyclak
On Thu, Nov 03, 2005 at 02:38:04PM +0100, Sebastian Kösters enlightened us:
 I found this in /tmp/amanda on the client
 
 sendsize: debug 1 pid 25485 ruid 33 euid 33: start at Thu Nov  3 14:00:25
 2005
 sendsize: version 2.4.3
 sendsize[25487]: time 0.002: calculating for amname '/pst', dirname '/pst',
 spindle -1
 sendsize[25487]: time 0.002: getting size via gnutar for /pst level 0
 sendsize[25487]: time 0.003: spawning /usr/lib/amanda/runtar in pipeline
 sendsize[25487]: argument list: /bin/tar --create --file /dev/null
 --directory /pst --one-file-system --listed-incremental
 /var/lib/amanda/gnutar-lists/pst_p
 st_0.new --sparse --ignore-failed-read --totals .
 sendsize[25485]: time 0.023: waiting for any estimate child
 sendsize[25487]: time 1519.255: .
 sendsize[25487]: estimate time for /pst level 0: 1519.252
 sendsize[25487]: no size line match in /bin/tar output for /pst
 sendsize[25487]: .
 sendsize[25487]: estimate size for /pst level 0: -1 KB
 sendsize[25487]: time 1519.255: waiting for /bin/tar /pst child
 sendsize[25487]: time 1519.256: after /bin/tar /pst wait
 sendsize[25485]: time 1519.256: child 25487 terminated with signal 13
 sendsize: time 1519.257: pid 25485 finish time Thu Nov  3 14:25:44 2005
  
 
 whats that: no size line match in /bin/tar output for /pst

 and that: size for /pst level 0: -1 KB ??


What version of tar is this?

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263


AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Sebastian Kösters
I tried it by hand with /opt and it worked.

-rw-r--r--1 root root  256  3. Nov 14:52 pst_opt_0.new

tar (GNU tar) 1.13.25
 
I don't know why it has a problem with /pst. OK, it's big (80 GB), but not too
big, I think.

-----Original Message-----
From: Paul Bijnens [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 3 November 2005 14:46
To: Sebastian Kösters
Subject: Re: AW: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 I found this in /tmp/amanda on the client
 
...
 sendsize[25487]: argument list: /bin/tar --create --file /dev/null
 --directory /pst --one-file-system --listed-incremental
 /var/lib/amanda/gnutar-lists/pst_p
 st_0.new --sparse --ignore-failed-read --totals .
 sendsize[25485]: time 0.023: waiting for any estimate child
 sendsize[25487]: time 1519.255: .
 sendsize[25487]: estimate time for /pst level 0: 1519.252
 sendsize[25487]: no size line match in /bin/tar output for /pst
 sendsize[25487]: .
 sendsize[25487]: estimate size for /pst level 0: -1 KB
 sendsize[25487]: time 1519.255: waiting for /bin/tar /pst child
 sendsize[25487]: time 1519.256: after /bin/tar /pst wait
 sendsize[25485]: time 1519.256: child 25487 terminated with signal 13
 sendsize: time 1519.257: pid 25485 finish time Thu Nov  3 14:25:44 2005
  
 
 whats that: no size line match in /bin/tar output for /pst

Amanda looks in the tar output for a line like:

Total bytes written: 33955840 (32MB, 938MB/s)

But it does not find one.

 
 and that: size for /pst level 0: -1 KB ??

The -1 means it failed.

Does the tar command work for another directory instead of /pst?
e.g. /var/log
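
As a rough sketch of the behaviour described here: look for the totals line, convert the byte count to KB, and report -1 when no line is found. The pattern and the division by 1024 are assumptions inferred from the numbers in this thread (86986362880 bytes was logged as 84947620 KB), not taken from the Amanda source; estimate_kb is a hypothetical name.

```python
import re

# Assumed shape of the size line that is searched for in tar's output.
TOTALS_RE = re.compile(r"Total bytes written: (\d+)")


def estimate_kb(tar_output):
    """Return the estimate in KB, or -1 if no size line is found."""
    match = TOTALS_RE.search(tar_output)
    if match is None:
        return -1  # corresponds to "no size line match" in the debug file
    return int(match.group(1)) // 1024


print(estimate_kb("Total bytes written: 33955840 (32MB, 938MB/s)"))    # 33160
print(estimate_kb("Total bytes written: 86986362880 (81GB, 28MB/s)"))  # 84947620
print(estimate_kb("Gesamtzahl geschriebener Bytes: 86991011840"))      # -1
```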



-- 
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***








Re: AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Alexander Jolk

Sebastian Kösters wrote:

I tried it by hand with /opt and it worked.

-rw-r--r--1 root root  256  3. Nov 14:52 pst_opt_0.new

tar (GNU tar) 1.13.25
 
i dont know why it has a Problem with /pst?! Ok its big (80GB) but not to
big i think. 


You have failed to understand that we don't care about that 
listed-incremental file something.new; what we are looking for is what 
tar gives on its stdout at the end of its run.  Could you report the 
exact output of the tar command, both when running on /pst and on /opt? 
 We know from your earlier debug files that it takes almost half an hour 
on /pst.


Alex


--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


Re: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Alexander Jolk

Sebastian Kösters wrote:

I found this in /tmp/amanda on the client

[...]

sendsize[25487]: time 1519.255: .


Did you cut something here?  That would have been what was interesting.


sendsize[25487]: estimate time for /pst level 0: 1519.252



Alex


--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


AW: AW: AW: AW: AW: AW: AW: Estimate timeout from server

2005-11-03 Thread Sebastian Kösters
Hi!

That's everything in sendsize:

sendsize: debug 1 pid 5132 ruid 33 euid 33: start at Fri Nov  4 02:30:02
2005
sendsize: version 2.4.3
sendsize[5132]: time 0.007: waiting for any estimate child
sendsize[5134]: time 0.007: calculating for amname '/pst', dirname '/pst',
spindle -1
sendsize[5134]: time 0.007: getting size via gnutar for /pst level 0
sendsize[5134]: time 0.034: spawning /usr/lib/amanda/runtar in pipeline
sendsize[5134]: argument list: /bin/tar --create --file /dev/null
--directory /pst --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read
--totals --exclude-from /tmp/amanda/sendsize._pst.20051104023002.exclude .
sendsize[5134]: time 2938.236: Total bytes written: 86986362880 (81GB,
28MB/s)
sendsize[5134]: time 2938.237: .
sendsize[5134]: estimate time for /pst level 0: 2938.203
sendsize[5134]: estimate size for /pst level 0: 84947620 KB
sendsize[5134]: time 2938.237: waiting for /bin/tar /pst child
sendsize[5134]: time 2938.237: after /bin/tar /pst wait
sendsize[5134]: time 2938.238: done with amname '/pst', dirname '/pst',
spindle -1
sendsize[5132]: time 2938.238: child 5134 terminated normally
sendsize: time 2938.238: pid 5132 finish time Fri Nov  4 03:19:00 2005

Outputs:

/opt

[EMAIL PROTECTED] amanda]# /bin/tar --create --file /dev/null --directory /opt
--one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_opt_0.new --sparse --ignore-failed-read
--totals .

Gesamtzahl geschriebener Bytes: 798720 (780kB, ?B/s) (English: Total number
of written bytes )

/pst

[EMAIL PROTECTED] gnutar-lists]# /bin/tar --create --file /dev/null
--directory /pst --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read
--totals .

After +1 hour

Gesamtzahl geschriebener Bytes: 86991011840 (81GB, 27MB/s) (English: Total
number of written bytes )

It seems to work by hand, but why not directly through Amanda?!


-----Original Message-----
From: Alexander Jolk [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 3 November 2005 15:20
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: AW: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 I found this in /tmp/amanda on the client
[...]
 sendsize[25487]: time 1519.255: .

Did you cut something here?  That would have been what was interesting.

 sendsize[25487]: estimate time for /pst level 0: 1519.252


Alex


-- 
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29






Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Tom Brown

Hi

Server is 2.4.5 and client is now 2.4.5p1 both on CentOS

I use Amanda and have done for years with no issues setting up etc - I 
can pretty much set up with my eyes closed now!! Amanda rocks...


But i'm getting a slightly strange error with a large partition. The 
partition in question is around 900gig in size although only a few 
hundred meg are currently used. When the estimate runs it returns


FAILURE AND STRANGE DUMP SUMMARY:
  planner: ERROR Estimate timeout from servername

The thing is, though, the actual dump of this filesystem runs fine. I have 
increased my etimeout to 20 minutes but this still occurs. Any ideas on 
this one?


thanks



AW: Estimate timeout from server

2005-11-02 Thread Sebastian Kösters
Can no one help?
 
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Sebastian Kösters
Sent: Wednesday, 2 November 2005 07:50
To: amanda-users@amanda.org
Subject: Estimate timeout from server

Hi,

I get this error (Estimate timeout from server) on a newly installed system
(RedHat 9).

I installed Amanda as on every other machine, but it does not work (there was no
reboot or anything like that during the backup).

Which log files from which machine do you need to help me find the error?

Thank you very much!

Regards
Sebastian








Re: AW: Estimate timeout from server

2005-11-02 Thread Alexander Jolk

Sebastian Kösters wrote:

Which Log-Files from which Machine do you need to help me finding the error?


The output of `amcheck your-conf' on the amanda server.
If there's one particular client that fails, the files from that 
client's /tmp/amanda/ corresponding to above amcheck, or double 
confirmation that there are no files.

The version number of amanda, and the relevant platforms.

Alex


--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


AW: AW: Estimate timeout from server

2005-11-02 Thread Sebastian Kösters
sendsize[29469]: time 0.005: getting size via gnutar for /pst level 0
sendsize[29469]: time 0.006: spawning /usr/lib/amanda/runtar in pipeline
sendsize[29469]: argument list: /bin/tar --create --file /dev/null
--directory /pst --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/pst_pst_0.new --sparse --ignore-failed-read
--totals .
sendsize[29467]: time 0.006: waiting for any estimate child
sendsize[29469]: time 3428.548: Total bytes written: 86429921280 (80GB,
24MB/s)
sendsize[29469]: time 3428.562: .
sendsize[29469]: estimate time for /pst level 0: 3428.556
sendsize[29469]: estimate size for /pst level 0: 84404220 KB
sendsize[29469]: time 3428.562: waiting for /bin/tar /pst child
sendsize[29469]: time 3428.562: after /bin/tar /pst wait
sendsize[29469]: time 3428.562: done with amname '/pst', dirname '/pst',
spindle -1
sendsize[29467]: time 3428.563: child 29469 terminated normally
sendsize: time 3428.563: pid 29467 finish time Wed Nov  2 08:25:20 2005
 

Server is Fedora 3 and Client is RH9

Thanks for the Help! 

-----Original Message-----
From: Alexander Jolk [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 2 November 2005 13:23
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: Estimate timeout from server

Sebastian Kösters wrote:
 Which Log-Files from which Machine do you need to help me finding the
error?

The output of `amcheck your-conf' on the amanda server.
If there's one particular client that fails, the files from that 
client's /tmp/amanda/ corresponding to above amcheck, or double 
confirmation that there are no files.
The version number of amanda, and the relevant platforms.

Alex


-- 
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29






Re: AW: AW: Estimate timeout from server

2005-11-02 Thread Alexander Jolk

Sebastian Kösters wrote:

It fails only when doing amdump config


Would you happen to have a relevant error message to show from your 
amdump report?



Amandad on client

[...]

amandad: time 0.000: got packet:

Amanda 2.4 REQ HANDLE 000-E0426409 SEQ 1130912894
SECURITY USER amanda
SERVICE sendsize
OPTIONS features=feff9ffe0f;maxdumps=1;hostname=pst;
GNUTAR /pst 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;compress-fast;index;

[...]

amandad: time 3428.573: sending REP packet:

Amanda 2.4 REP HANDLE 000-E0426409 SEQ 1130912894
OPTIONS features=feff9f00;
/pst 0 SIZE 84404220


amandad: time 3438.570: dgram_recv: timeout after 10 seconds
amandad: time 3438.570: waiting for ack: timeout, retrying


Sounds like a simple estimate timeout to me, 3500s for one partition is 
quite long.  Bump up your etimeout in amanda.conf to something like 
5000s and see whether that works.  Or try to find out why estimate on 
the pst:/pst takes so long, and do something about that.  You might for 
instance split it up in smaller chunks.  Or switch to server side 
estimates which are instant.
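
To make the etimeout advice concrete, here is a minimal sketch of the arithmetic. It assumes the semantics described in the amanda.conf man page for this era (a positive etimeout is a per-disk allowance multiplied by the number of disks on the client, a negative value is a total per client); details may differ between Amanda versions. planner_wait_seconds is a hypothetical helper, not an Amanda function.

```python
def planner_wait_seconds(etimeout, dle_count):
    """Seconds the planner would wait for one client, under the assumed rules."""
    if etimeout < 0:
        return -etimeout           # negative: total allowance for the client
    return etimeout * dle_count    # positive: per-disk allowance


# The estimate for pst:/pst finished after about 3428.563 s in the debug file.
estimate_seconds = 3428.563

for etimeout in (300, 3500, 5000):
    wait = planner_wait_seconds(etimeout, dle_count=1)
    print(etimeout, wait, estimate_seconds <= wait)
```

With a single disk on the client, only an allowance above roughly 3430 seconds covers the logged estimate, which is in line with the suggestion of 5000s.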


You should have got a clear message in amdump's report though, saying 
`estimate timeout', which is english for `estimate timeout'.


Alex

--
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29


AW: AW: AW: Estimate timeout from server

2005-11-02 Thread Sebastian Kösters
My report looks like this:

These dumps were to tape DailySet150.
The next tape Amanda expects to use is: DailySet150.

FAILURE AND STRANGE DUMP SUMMARY:
  pst/pst lev 0 FAILED [Estimate timeout from pst]


STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:15
Run Time (hrs:min)         0:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)         --         --         --

Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)     --         --         --

USAGE BY TAPE:
  Label          Time      Size      %    Nb
  DailySet150    0:00       0.0    0.0     0



NOTES:
  planner: tapecycle (1) <= runspercycle (10)
  planner: Adding new disk pst:/pst.
  driver: WARNING: got empty schedule from planner
  taper: tape DailySet150 kb 0 fm 0 [OK]



DUMP SUMMARY:
                          DUMPER STATS            TAPER STATS
HOSTNAME DISK  L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
-------------- ---------------------------------- ------------
pst      /pst  0 FAILED ---------------------------------------

(brought to you by Amanda version 2.4.4p4)

That's all. I tested with a separate config. 
 

-----Original Message-----
From: Alexander Jolk [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 2 November 2005 13:36
To: Sebastian Kösters
Cc: amanda-users@amanda.org
Subject: Re: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 It failes only doing amdump config

Would you happen to have a relevant error message to show from your 
amdump report?

 Amandad on client
[...]
 amandad: time 0.000: got packet:
 
 Amanda 2.4 REQ HANDLE 000-E0426409 SEQ 1130912894
 SECURITY USER amanda
 SERVICE sendsize
 OPTIONS features=feff9ffe0f;maxdumps=1;hostname=pst;
 GNUTAR /pst 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;compress-fast;index;
[...]
 amandad: time 3428.573: sending REP packet:
 
 Amanda 2.4 REP HANDLE 000-E0426409 SEQ 1130912894
 OPTIONS features=feff9f00;
 /pst 0 SIZE 84404220
 
 
 amandad: time 3438.570: dgram_recv: timeout after 10 seconds
 amandad: time 3438.570: waiting for ack: timeout, retrying

Sounds like a simple estimate timeout to me, 3500s for one partition is 
quite long.  Bump up your etimeout in amanda.conf to something like 
5000s and see whether that works.  Or try to find out why estimate on 
the pst:/pst takes so long, and do something about that.  You might for 
instance split it up in smaller chunks.  Or switch to server side 
estimates which are instant.

You should have got a clear message in amdump's report though, saying 
`estimate timeout', which is english for `estimate timeout'.

Alex

-- 
Alexander Jolk  * BUF Compagnie * [EMAIL PROTECTED]
Tel +33-1 42 68 18 28  *  Fax +33-1 42 68 18 29






Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Joshua Baker-LePain

On Wed, 2 Nov 2005 at 11:32am, Tom Brown wrote

But i'm getting a slightly strange error with a large partition. The 
partition in question is around 900gig in size although only a few hundred 
meg are currently used. When the estimate runs it returns


FAILURE AND STRANGE DUMP SUMMARY:
 planner: ERROR Estimate timeout from servername

Thing is though the actual dump of this filesystem runs fine - I have 
increased my eTimeout to 20mins but this still occurs - Any ideas on this 
one?


Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long 
the estimate is actually taking.  Also, what do your iptables rules look 
like on the server?
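
One way to read the estimate durations out of a sendsize debug file is sketched below. The line format is taken from the debug excerpts posted in this thread; ESTIMATE_RE and estimate_times are hypothetical names, and the sample text stands in for the contents of a real /tmp/amanda/sendsize*debug file.

```python
import re

# Matches lines such as:
#   sendsize[12322]: estimate time for /dev/sda2 level 0: 21.882
ESTIMATE_RE = re.compile(
    r"estimate time for (?P<disk>\S+) level (?P<level>\d+): (?P<secs>[\d.]+)"
)


def estimate_times(debug_text):
    """Return a list of (disk, level, seconds) tuples found in debug_text."""
    return [
        (m.group("disk"), int(m.group("level")), float(m.group("secs")))
        for m in ESTIMATE_RE.finditer(debug_text)
    ]


sample = """\
sendsize[12322]: estimate time for /dev/sda2 level 0: 21.882
sendsize[12322]: estimate time for /dev/sda2 level 1: 172.718
sendsize[12334]: estimate time for /dev/sda1 level 0: 0.630
"""

for disk, level, secs in estimate_times(sample):
    print(f"{disk} level {level}: {secs:.3f} s")
```

Summing the seconds per client and comparing the total with the configured etimeout gives a quick idea of whether the estimate phase itself is the slow part.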


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Tom Brown



Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long 
the estimate is actually taking.  Also, what do your iptables rules look 
like on the server?


thanks - iptables are not being used, local firewall is off

sendsize debug is below and looks OK

# more /tmp/amanda/sendsize.20051102003001.debug
sendsize: debug 1 pid 12320 ruid 11 euid 11: start at Wed Nov  2 
00:30:01 2005

sendsize: version 2.4.5p1
sendsize[12322]: time 0.002: calculating for amname '/dev/sda2', dirname 
'/', spindle -1

sendsize[12322]: time 0.002: getting size via dump for /dev/sda2 level 0
sendsize[12322]: time 0.002: calculating for device '/dev/sda2' with 'ext3'
sendsize[12322]: time 0.002: running /sbin/dump 0Ssf 1048576 - /dev/sda2
sendsize[12322]: time 0.003: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12320]: time 0.003: waiting for any estimate child: 1 running
sendsize[12322]: time 21.884: 1447269376
sendsize[12322]: time 21.885: .
sendsize[12322]: estimate time for /dev/sda2 level 0: 21.882
sendsize[12322]: estimate size for /dev/sda2 level 0: 1413349 KB
sendsize[12322]: time 21.885: asking killpgrp to terminate
sendsize[12322]: time 22.886: getting size via dump for /dev/sda2 level 1
sendsize[12322]: time 22.887: calculating for device '/dev/sda2' with 'ext3'
sendsize[12322]: time 22.887: running /sbin/dump 1Ssf 1048576 - /dev/sda2
sendsize[12322]: time 22.888: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12322]: time 195.606: 4647936
sendsize[12322]: time 195.606: .
sendsize[12322]: estimate time for /dev/sda2 level 1: 172.718
sendsize[12322]: estimate size for /dev/sda2 level 1: 4539 KB
sendsize[12322]: time 195.606: asking killpgrp to terminate
sendsize[12322]: time 196.608: done with amname '/dev/sda2', dirname 
'/', spindle -1

sendsize[12320]: time 196.608: child 12322 terminated normally
sendsize[12334]: time 196.609: calculating for amname '/dev/sda1', 
dirname '/boot', spindle -1

sendsize[12334]: time 196.609: getting size via dump for /dev/sda1 level 0
sendsize[12334]: time 196.609: calculating for device '/dev/sda1' with 
'ext3'

sendsize[12334]: time 196.609: running /sbin/dump 0Ssf 1048576 - /dev/sda1
sendsize[12320]: time 196.609: waiting for any estimate child: 1 running
sendsize[12334]: time 196.610: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12334]: time 197.239: 5737472
sendsize[12334]: time 197.239: .
sendsize[12334]: estimate time for /dev/sda1 level 0: 0.630
sendsize[12334]: estimate size for /dev/sda1 level 0: 5603 KB
sendsize[12334]: time 197.239: asking killpgrp to terminate
sendsize[12334]: time 198.242: getting size via dump for /dev/sda1 level 1
sendsize[12334]: time 198.243: calculating for device '/dev/sda1' with 
'ext3'

sendsize[12334]: time 198.243: running /sbin/dump 1Ssf 1048576 - /dev/sda1
sendsize[12334]: time 198.243: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12334]: time 198.684: 27648
sendsize[12334]: time 198.684: .
sendsize[12334]: estimate time for /dev/sda1 level 1: 0.441
sendsize[12334]: estimate size for /dev/sda1 level 1: 27 KB
sendsize[12334]: time 198.684: asking killpgrp to terminate
sendsize[12334]: time 199.687: done with amname '/dev/sda1', dirname 
'/boot', spindle -1

sendsize[12320]: time 199.687: child 12334 terminated normally
sendsize[12339]: time 199.687: calculating for amname '/dev/sda5', 
dirname '/export/disk1', spindle -1

sendsize[12339]: time 199.688: getting size via dump for /dev/sda5 level 0
sendsize[12320]: time 199.688: waiting for any estimate child: 1 running
sendsize[12339]: time 199.688: calculating for device '/dev/sda5' with 
'ext3'

sendsize[12339]: time 199.688: running /sbin/dump 0Ssf 1048576 - /dev/sda5
sendsize[12339]: time 199.689: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12339]: time 545.606: 88973312
sendsize[12339]: time 545.617: .
sendsize[12339]: estimate time for /dev/sda5 level 0: 345.928
sendsize[12339]: estimate size for /dev/sda5 level 0: 86888 KB
sendsize[12339]: time 545.617: asking killpgrp to terminate
sendsize[12339]: time 546.619: getting size via dump for /dev/sda5 level 1
sendsize[12339]: time 546.646: calculating for device '/dev/sda5' with 
'ext3'

sendsize[12339]: time 546.646: running /sbin/dump 1Ssf 1048576 - /dev/sda5
sendsize[12339]: time 546.647: running /opt/amanda-2.4.5p1/libexec/killpgrp
sendsize[12339]: time 2182.684: 25811968
sendsize[12339]: time 2182.696: .
sendsize[12339]: estimate time for /dev/sda5 level 1: 1636.054
sendsize[12339]: estimate size for /dev/sda5 level 1: 25207 KB
sendsize[12339]: time 2182.701: asking killpgrp to terminate
sendsize[12339]: time 2183.703: done with amname '/dev/sda5', dirname 
'/export/disk1', spindle -1

sendsize[12320]: time 2183.704: child 12339 terminated normally
sendsize: time 2183.704: pid 12320 finish time Wed Nov  2 01:06:24 2005

one of my amanda.debugs does have this at the bottom of it

amandad: time 2193.716: dgram_recv: timeout after 10 seconds
amandad: time 

Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Joshua Baker-LePain

On Wed, 2 Nov 2005 at 2:31pm, Tom Brown wrote

Look in /tmp/amanda/sendsize*debug and/or amandad*debug to see how long the 
estimate is actually taking.  Also, what do your iptables rules look like 
on the server?


thanks - iptables are not being used, local firewall is off



one of my amanda.debugs does have this at the bottom of it

amandad: time 2193.716: dgram_recv: timeout after 10 seconds
amandad: time 2193.716: waiting for ack: timeout, retrying
amandad: time 2203.716: dgram_recv: timeout after 10 seconds
amandad: time 2203.716: waiting for ack: timeout, retrying
amandad: time 2213.717: dgram_recv: timeout after 10 seconds
amandad: time 2213.717: waiting for ack: timeout, retrying
amandad: time 2223.717: dgram_recv: timeout after 10 seconds
amandad: time 2223.717: waiting for ack: timeout, retrying
amandad: time 2233.718: dgram_recv: timeout after 10 seconds
amandad: time 2233.718: waiting for ack: timeout, giving up!
amandad: time 2233.718: pid 12319 finish time Wed Nov  2 01:07:14 2005

Is that time figure a time in seconds?


Yep.  So you can just increase etimeout and/or figure out why /sbin/dump 
1Ssf 1048576 - /dev/sda5 is taking so long.
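
A small script can pull the same picture out of an amandad debug file: when the reply went out, how often the ack was retried, and when amandad gave up. This is a sketch that assumes the message texts are exactly those shown in the excerpt above ("sending REP packet", "waiting for ack: timeout, retrying", "waiting for ack: timeout, giving up!"); the function name is hypothetical.

```python
import re

# Matches lines such as:
#   amandad: time 2233.718: waiting for ack: timeout, giving up!
TIME_RE = re.compile(r"^amandad: time (?P<t>[\d.]+): (?P<msg>.*)$")


def ack_summary(lines):
    """Summarise the REP/ack exchange found in amandad debug lines.

    Returns a dict with the time the first REP packet was sent (or None),
    the number of ack retries, and the time amandad gave up (or None).
    All times are seconds since amandad started, as logged.
    """
    rep_sent = None
    retries = 0
    gave_up = None
    for line in lines:
        match = TIME_RE.match(line.strip())
        if not match:
            continue
        t = float(match.group("t"))
        msg = match.group("msg")
        if msg.startswith("sending REP packet") and rep_sent is None:
            rep_sent = t
        elif msg == "waiting for ack: timeout, retrying":
            retries += 1
        elif msg == "waiting for ack: timeout, giving up!":
            gave_up = t
    return {"rep_sent": rep_sent, "retries": retries, "gave_up": gave_up}
```

If rep_sent is set and gave_up is set too, the client finished its estimates but nobody on the server side acknowledged the reply, which points at etimeout or at something in the network path rather than at the estimate itself.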


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Estimate Timeout Issue - Dump runs fine

2005-11-02 Thread Tom Brown


Yep.  So you can just increase etimeout and/or figure out why 
/sbin/dump 1Ssf 1048576 - /dev/sda5 is taking so long.


OK thanks - I have increased the etimeout to 2400 seconds and also
changed the UDP timeout within checkpoint to 2400 seconds as well, so
I'll see how the run goes tonight


thanks



Re: AW: AW: AW: Estimate timeout from server

2005-11-02 Thread Stefan G. Weichinger
Sebastian Kösters wrote:
 My report looks like this:
 
 NOTES:
   planner: tapecycle (1) <= runspercycle (10)

Please get that sorted out, I assume that this is not what you want,
although it has nothing to do with your timeouts.
 
 That's all. I tested with a separate config.

Have you increased etimeout already, as Alexander suggested?

Stefan.



AW: AW: AW: AW: Estimate timeout from server

2005-11-02 Thread Sebastian Kösters
I am testing right now with a timeout of 5000 s.

The partition I want to back up is 80 GB.


 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Stefan G. Weichinger
Sent: Wednesday, November 2, 2005 22:01
To: amanda-users@amanda.org
Subject: Re: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 My report looks like this:
 
 NOTES:
   planner: tapecycle (1) <= runspercycle (10)

Please get that sorted out, I assume that this is not what you want,
although it has nothing to do with your timeouts.
 
 That's all. I tested with a separate config.

Have you increased etimeout already, as Alexander suggested?

Stefan.







AW: AW: AW: AW: Estimate timeout from server

2005-11-02 Thread Sebastian Kösters
It does not work.

On the client amanda starts the following command:

/bin/tar --create --file /dev/null --directory /pst --one-file-system
--listed-incremental /var/lib/amanda/gnutar-lists/pst_pst_0.new

But pst_pst_0.new is always 0 KB.

And amcheck always tells me that everything is OK?!
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Stefan G. Weichinger
Sent: Wednesday, November 2, 2005 22:01
To: amanda-users@amanda.org
Subject: Re: AW: AW: AW: Estimate timeout from server

Sebastian Kösters wrote:
 My report looks like this:
 
 NOTES:
   planner: tapecycle (1) <= runspercycle (10)

Please get that sorted out, I assume that this is not what you want,
although it has nothing to do with your timeouts.
 
 That's all. I tested with a separate config.

Have you increased etimeout already, as Alexander suggested?

Stefan.







Estimate timeout from server

2005-11-01 Thread Sebastian Kösters
Hi,

I get this error (Estimate timeout from server) on a newly installed system
(RedHat 9).

I installed Amanda the same way as on every other machine, but it does not
work (there was no reboot or anything like that during the backup).

Which log files from which machine do you need to help me find the error?

Thank you very much!

Regards
Sebastian




estimate timeout

2005-10-10 Thread Shai Ayal

Hi all,

I have searched the archives but none of the emails with similar subjects 
helped me.


I have a FC2 amanda 2.4.4 server with 2 linux clients. The server is using 
vtapes for daily backups. It all ran very nicely for many months until we 
ran out of disk space in the server. After a few days of bad backups due to 
full disk, we installed an additional disk, moved some of the virtual tapes 
to it using symlinks, flushed the old backups etc... and sat back to enjoy 
amanda at work.


However:

While one client is being backed up perfectly well, the other keeps getting
estimate timeouts. On this client, everything seems OK except that it shows 2
amandad processes during estimates, one of them defunct. I attach the 2
amandad debug reports.


On the server I have set an etimeout of 300 which should be enough, but 
even bumping this to 7200 did not help.


I have no firewall on client and server

tar version is tar (GNU tar) 1.13.25 on the client

This is really frustrating since this setup used to work!

Thanks in advance
Shai

amandad: debug 1 pid 25071 ruid 33 euid 33: start at Mon Oct 10 08:38:58 2005
amandad: version 2.4.4p2
amandad: build: VERSION=Amanda-2.4.4p2
amandad:BUILT_DATE=Mon Mar 22 12:27:54 EST 2004
amandad:BUILT_MACH=Linux bugs.devel.redhat.com 2.4.21-9.ELsmp #1 SMP 
Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux
amandad:CC=i386-redhat-linux-gcc
amandad:CONFIGURE_COMMAND='./configure' '--host=i386-redhat-linux' 
'--build=i386-redhat-linux' '--target=i386-redhat-linux-gnu' 
'--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' 
'--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' 
'--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/lib/amanda' 
'--localstatedir=/var/lib' '--sharedstatedir=/usr/com' 
'--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-shared' 
'--with-index-server=localhost' 
'--with-gnutar-listdir=/var/lib/amanda/gnutar-lists' 
'--with-smbclient=/usr/bin/smbclient' '--with-amandahosts' '--with-user=amanda' 
'--with-group=disk' '--with-tmpdir=/var/log/amanda' '--with-gnutar=/bin/tar'
amandad: paths: bindir=/usr/bin sbindir=/usr/sbin
amandad:libexecdir=/usr/lib/amanda mandir=/usr/share/man
amandad:AMANDA_TMPDIR=/var/log/amanda
amandad:AMANDA_DBGDIR=/var/log/amanda CONFIG_DIR=/etc/amanda
amandad:DEV_PREFIX=/dev/ RDEV_PREFIX=/dev/r
amandad:DUMP=/sbin/dump RESTORE=/sbin/restore VDUMP=UNDEF
amandad:VRESTORE=UNDEF XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF
amandad:VXRESTORE=UNDEF SAMBA_CLIENT=/usr/bin/smbclient
amandad:GNUTAR=/bin/tar COMPRESS_PATH=/usr/bin/gzip
amandad:UNCOMPRESS_PATH=/usr/bin/gzip LPRCMD=/usr/bin/lpr
amandad:MAILER=/usr/bin/Mail
amandad:listed_incr_dir=/var/lib/amanda/gnutar-lists
amandad: defs:  DEFAULT_SERVER=localhost DEFAULT_CONFIG=DailySet1
amandad:DEFAULT_TAPE_SERVER=localhost
amandad:DEFAULT_TAPE_DEVICE=/dev/null HAVE_MMAP HAVE_SYSVSHM
amandad:LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE
amandad:AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
amandad:CLIENT_LOGIN=amanda FORCE_USERID HAVE_GZIP
amandad:COMPRESS_SUFFIX=.gz COMPRESS_FAST_OPT=--fast
amandad:COMPRESS_BEST_OPT=--best UNCOMPRESS_OPT=-dc
amandad: time 0.000: got packet:

Amanda 2.4 REQ HANDLE 000-00443709 SEQ 1128926338
SECURITY USER amanda
SERVICE noop
OPTIONS features=feff9ffe0f;


amandad: time 0.000: sending ack:

Amanda 2.4 ACK HANDLE 000-00443709 SEQ 1128926338


amandad: time 0.000: bsd security: remote host betacentauri.bioc user amanda 
local user amanda
amandad: time 0.015: amandahosts security check passed
amandad: time 0.015: running service noop
amandad: time 0.015: sending REP packet:

Amanda 2.4 REP HANDLE 000-00443709 SEQ 1128926338
OPTIONS features=feff9ffe0f;


amandad: time 0.015: got packet:

Amanda 2.4 ACK HANDLE 000-00443709 SEQ 1128926338


amandad: time 0.016: pid 25071 finish time Mon Oct 10 08:38:58 2005
amandad: debug 1 pid 25072 ruid 33 euid 33: start at Mon Oct 10 08:38:58 2005
amandad: version 2.4.4p2
amandad: build: VERSION=Amanda-2.4.4p2
amandad:BUILT_DATE=Mon Mar 22 12:27:54 EST 2004
amandad:BUILT_MACH=Linux bugs.devel.redhat.com 2.4.21-9.ELsmp #1 SMP 
Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux
amandad:CC=i386-redhat-linux-gcc
amandad:CONFIGURE_COMMAND='./configure' '--host=i386-redhat-linux' 
'--build=i386-redhat-linux' '--target=i386-redhat-linux-gnu' 
'--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' 
'--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' 
'--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/lib/amanda' 
'--localstatedir=/var/lib' '--sharedstatedir=/usr/com' 
'--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-shared' 

Re: estimate timeout

2005-10-10 Thread Joshua Baker-LePain

On Mon, 10 Oct 2005 at 9:20am, Shai Ayal wrote

I have a FC2 amanda 2.4.4 server with 2 linux clients. The server is using 
vtapes for daily backups. It all ran very nicely for many months until we ran 
out of disk space in the server. After a few days of bad backups due to full 
disk, we installed an additional disk, moved some of the virtual tapes to it 
using symlinks, flushed the old backups etc... and sat back to enjoy amanda 
at work.


However:

While one client is being backed up perfectly well, the other keeps getting
estimate timeouts. On this client, everything seems OK except that it shows 2
amandad processes during estimates, one of them defunct. I attach the 2
amandad debug reports.


On the server I have set an etimeout of 300 which should be enough, but even 
bumping this to 7200 did not help.


I have no firewall on client and server


Are you sure about that?  /etc/sysconfig/iptables is empty and/or 
'chkconfig --list iptables' says off for all runlevels?  That's a very 
non-standard setup.  I've seen behavior like this:


amandad: time 0.025: running service /usr/lib/amanda/sendsize
amandad: time 349.398: sending REP packet:
*snip*
amandad: time 359.415: dgram_recv: timeout after 10 seconds
amandad: time 359.415: waiting for ack: timeout, retrying
amandad: time 369.413: dgram_recv: timeout after 10 seconds
amandad: time 369.413: waiting for ack: timeout, retrying
amandad: time 379.412: dgram_recv: timeout after 10 seconds
amandad: time 379.412: waiting for ack: timeout, retrying
amandad: time 389.410: dgram_recv: timeout after 10 seconds
amandad: time 389.410: waiting for ack: timeout, retrying
amandad: time 399.409: dgram_recv: timeout after 10 seconds
amandad: time 399.409: waiting for ack: timeout, giving up!

on my systems where iptables allows established connections, but waiting
more than 300 seconds timed out what was considered established.
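
The reasoning can be written down as a quick check. This is only a sketch: the 300 second default is the figure quoted above, not a value read from any firewall, and the function name is made up for illustration.

```python
def reply_outlives_state(request_time, reply_time, idle_timeout=300.0):
    """Return True if the quiet gap between request and reply is longer
    than the firewall's idle timeout (all values in seconds).

    idle_timeout defaults to the 300 seconds mentioned in this thread;
    the real value depends on the firewall configuration.
    """
    return (reply_time - request_time) > idle_timeout
```

With the excerpt above, the service started at 0.025 and the REP packet was sent at 349.398, so reply_outlives_state(0.025, 349.398) is true: the reply arrives after the state entry would already have expired.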


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: estimate timeout

2005-10-10 Thread Gene Heskett
On Monday 10 October 2005 03:20, Shai Ayal wrote:
Hi all,

I have searched the archives but none of the emails with similar
 subjects helped me.

I have a FC2 amanda 2.4.4 server with 2 linux clients. The server is
 using vtapes for daily backups. It all ran very nicely for many months
 until we ran out of disk space in the server. After a few days of bad
 backups due to full disk, we installed an additional disk, moved some
 of the virtual tapes to it using symlinks, flushed the old backups
 etc... and sat back to enjoy amanda at work.

However:

While one client is being backed up perfectly well, the other keeps
 getting estimate timeouts. On this client, everything seems OK except
 that it shows 2 amandad processes during estimates, one of them defunct.
 I attach the 2 amandad debug reports.

It is possible that the defunct amandad has open locks on files, thereby
blocking the estimate.  2 things might help, first I'd reboot the machine
the failure is on to remove them, and then I think I'd install a newer
amanda, 2.4.4 is getting a bit long in the tooth these days.  I can't
recall the exact version I was running when that happened on my firewall
box, mainly because I wasn't doing virtual tapes yet and was having so
many other tape related issues back then that a stuck amandad just wasn't
an event to record at length in my wetram.

If you still have the same scripts you used to build the 2.4.4 on each
box, then 2.4.5-20051006 should install and run exactly the same.

However, I just checked the /home/amanda directory on my single linux
client, and it's equally elderly, at 2.4.4-20030529, and it's working fine
other than a 10 second delay in checking clients when amcheck is run,
about 80% of the time.

But this is as good a time to bring it up to date as any, so it's building
on that box now, using the same script I built the older version with.
Oops, forgot to run ldconfig after the install, done now.

Hmm, I note that there is no longer a 10 second delay in checking the
clients now, more like .35 seconds (the delay has been random in the past,
showing up about 80% of the time).  At least for the several iterations
of it I've done.  Maybe that's fixed now?

On the server I have set an etimeout of 300 which should be enough, but
even bumping this to 7200 did not help.

I have no firewall on client and server
 
I do, but it's not between the client and server, it's between the client
and the rest of the planet.  That box is the gateway.

tar version is tar (GNU tar) 1.13.25 on the client

That's a good one, although I'm running 1.15-1 on the server.  But the
client box is rh7.3, and the glib version won't let me build, or install
1.15.1.

This is really frustrating since this setup used to work !

Thanks in advance
Shai

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.



Re: Estimate timeout

2005-09-21 Thread John R. Jackson
In amandad.debug on the client I get:
...
amandad: time 2951.811: sending REP packet:

Amanda 2.4 REP HANDLE 00F-80930508 SEQ 1126652514
OPTIONS features=feff9ffe7f;
ar0s1a 0 SIZE 30567356
ar0s1a 1 SIZE 17959269


amandad: time 2961.819: dgram_recv: timeout after 10 seconds
amandad: time 2961.819: waiting for ack: timeout, retrying
...
In sendsize.debug:
...
sendsize[20121]: estimate time for ar0s1a level 0: 320.045
...
sendsize[20121]: estimate time for ar0s1a level 1: 2629.405
sendsize[20113]: time 2951.660: child 20121 terminated normally

It took ~2950 seconds to do the two estimates, based on the various
log messages.  When amandad tried to send back the response/reply (REP)
packet, it never got an acknowledgement (ack) that amdump/planner had
received it.

The default etimeout is 300 seconds.  Amanda multiplies that by the
number of estimates it asks the client to do, so, at best, planner on
the server side gave up after 600 seconds, which is why there wasn't
anyone around to receive the reply and answer it.  If you look at the
amdump.NN file that matches the above you'll probably see planner getting
done well before 2950 seconds.

I'm not sure why this disk is now taking so much longer to do estimates,
but the simplest solution is to just crank up etimeout in your amanda.conf
(or disklist) to compensate.  At least backups will start working again,
and then you can look into possible hardware or file system performance
problems.
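
The arithmetic can be sketched as follows. It follows the description above (etimeout multiplied by the number of estimates requested from the client); the function names and the 25% safety margin are assumptions made for illustration, and only positive etimeout values are modelled.

```python
import math


def planner_wait_budget(etimeout, estimates_requested):
    """Seconds planner waits for one client, per the description above:
    etimeout multiplied by the number of estimates requested."""
    if etimeout <= 0 or estimates_requested <= 0:
        raise ValueError("this sketch only models positive values")
    return etimeout * estimates_requested


def etimeout_needed(observed_seconds, estimates_requested, margin=1.25):
    """Smallest whole-second etimeout whose budget covers observed_seconds
    with a safety margin (margin=1.25 is an arbitrary choice)."""
    if observed_seconds <= 0 or estimates_requested <= 0:
        raise ValueError("this sketch only models positive values")
    return math.ceil(observed_seconds * margin / estimates_requested)
```

For the log above, planner_wait_budget(300, 2) gives 600 seconds, well short of the roughly 2950 seconds the two estimates took. etimeout_needed(2951.811, 2) suggests 1845, which gives planner a budget of 3690 seconds for this client.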

Tommy Eriksen - Chief Technical Officer

JJ


Estimate timeout

2005-09-20 Thread Tommy Eriksen
Hey,

I have a rather strange problem.
I had to restore a complete server from backup recently, no problem, everything 
went smoothly.
However, after this, I can't seem to get a fresh backup of it. I've tried
everything from reinstalling amanda to changing the machine's hostname (both
the machine's real hostname and the one in amanda), but still I get this in my
daily report:
  dc104  ar0s1a lev 0 FAILED [Estimate timeout from dc104]

I've got 115 entries in my disklist (on some 60 hosts) and this is the only one 
I can't get to work.
There don't seem to be any network problems between the amanda server and the
client either.

This does look like a networking problem, but the machines can communicate 
freely and without any problems.

In amandad.debug on the client I get:
-bash-2.05b# cat amandad.20050914010102000.debug
amandad: debug 1 pid 20112 ruid 2 euid 2: start at Wed Sep 14 01:01:02 2005
amandad: version 2.4.5
amandad: build: VERSION=Amanda-2.4.5
amandad:BUILT_DATE=Thu Aug 25 17:46:51 CEST 2005
amandad:BUILT_MACH=FreeBSD tlnordic.moduleweb.net 4.9-STABLE FreeBSD 
4.9-STABLE #0: Mon Jan 5 23:35:10 CET 2004 [EMAIL 
PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386
amandad:CC=cc
amandad:CONFIGURE_COMMAND='./configure' 
'--libexecdir=/usr/local/libexec/amanda' '--with-amandahosts' '--with-fqdn' 
'--with-dump-honor-nodump' '--with-buffered-dump' '--without-server' 
'--disable-libtool' '--prefix=/usr/local' '--with-user=operator' 
'--with-group=operator' 
'--with-gnutar-listdir=/usr/local/var/amanda/gnutar-lists' 
'--with-index-server=eclipse.rackhosting.com' 
'--with-tape-server=eclipse.rackhosting.com' '--with-config=ModuleWeb' 
'--prefix=/usr/local' '--build=i386-portbld-freebsd4.9'
amandad: paths: bindir=/usr/local/bin sbindir=/usr/local/sbin
amandad:libexecdir=/usr/local/libexec/amanda
amandad:mandir=/usr/local/man AMANDA_TMPDIR=/tmp/amanda
amandad:AMANDA_DBGDIR=/tmp/amanda
amandad:CONFIG_DIR=/usr/local/etc/amanda DEV_PREFIX=/dev/
amandad:RDEV_PREFIX=/dev/r DUMP=/sbin/dump
amandad:RESTORE=/sbin/restore VDUMP=UNDEF VRESTORE=UNDEF
amandad:XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF VXRESTORE=UNDEF
amandad:SAMBA_CLIENT=UNDEF GNUTAR=/usr/bin/tar
amandad:COMPRESS_PATH=/usr/bin/gzip
amandad:UNCOMPRESS_PATH=/usr/bin/gzip LPRCMD=/usr/bin/lpr
amandad:MAILER=/usr/bin/Mail
amandad:listed_incr_dir=/usr/local/var/amanda/gnutar-lists
amandad: defs:  DEFAULT_SERVER=eclipse.rackhosting.com
amandad:DEFAULT_CONFIG=ModuleWeb
amandad:DEFAULT_TAPE_SERVER=eclipse.rackhosting.com
amandad:DEFAULT_TAPE_DEVICE=/dev/null HAVE_MMAP HAVE_SYSVSHM
amandad:LOCKING=POSIX_FCNTL DEBUG_CODE AMANDA_DEBUG_DAYS=4
amandad:BSD_SECURITY USE_AMANDAHOSTS CLIENT_LOGIN=operator
amandad:FORCE_USERID HAVE_GZIP COMPRESS_SUFFIX=.gz
amandad:COMPRESS_FAST_OPT=--fast COMPRESS_BEST_OPT=--best
amandad:UNCOMPRESS_OPT=-dc
amandad: time 0.000: got packet:

Amanda 2.4 REQ HANDLE 00F-80930508 SEQ 1126652514
SECURITY USER operator
SERVICE sendsize
OPTIONS features=feff9ffe0f;maxdumps=1;hostname=dc104;
DUMP ar0s1a 0 1970:1:1:0:0:0 -1 OPTIONS |;auth=bsd;compress-fast;index;
DUMP ar0s1a 1 2005:8:10:12:30:11 -1 OPTIONS |;auth=bsd;compress-fast;index;


amandad: time 0.000: sending ack:

Amanda 2.4 ACK HANDLE 00F-80930508 SEQ 1126652514


amandad: time 0.001: bsd security: remote host eclipse.rackhosting.com user 
operator local user operator
amandad: time 0.001: amandahosts security check passed
amandad: time 0.001: running service /usr/local/libexec/amanda/sendsize
amandad: time 599.378: got packet:

Amanda 2.4 REQ HANDLE 00F-80930508 SEQ 1126652514
SECURITY USER operator
SERVICE sendsize
OPTIONS features=feff9ffe0f;maxdumps=1;hostname=dc104;
DUMP ar0s1a 0 1970:1:1:0:0:0 -1 OPTIONS |;auth=bsd;compress-fast;index;
DUMP ar0s1a 1 2005:8:10:12:30:11 -1 OPTIONS |;auth=bsd;compress-fast;index;


amandad: time 599.378: received dup P_REQ packet, ACKing it
amandad: time 599.378: sending ack:

Amanda 2.4 ACK HANDLE 00F-80930508 SEQ 1126652514


amandad: time 1199.936: got packet:

Amanda 2.4 REQ HANDLE 00F-80930508 SEQ 1126652514
SECURITY USER operator
SERVICE sendsize
OPTIONS features=feff9ffe0f;maxdumps=1;hostname=dc104;
DUMP ar0s1a 0 1970:1:1:0:0:0 -1 OPTIONS |;auth=bsd;compress-fast;index;
DUMP ar0s1a 1 2005:8:10:12:30:11 -1 OPTIONS |;auth=bsd;compress-fast;index;


amandad: time 1199.936: received dup P_REQ packet, ACKing it
amandad: time 1199.936: sending ack:

Amanda 2.4 ACK HANDLE 00F-80930508 SEQ 1126652514


amandad: time 2951.811: sending REP packet:

Amanda 2.4 REP HANDLE 00F-80930508 SEQ 1126652514
OPTIONS features=feff9ffe7f;
ar0s1a 0 SIZE 30567356
ar0s1a 1 SIZE 17959269


amandad: time 2961.819: dgram_recv: timeout after 10 seconds
amandad: time 2961.819: waiting for ack: timeout

RE: Estimate timeout

2005-08-31 Thread LaValley, Brian E
Well, the tar command by itself is still running, but the backup with the
new version of tar is complete, so my estimate timeout problem is fixed
with an updated tar executable. Thank you all.

-Original Message-
From: Joshua Baker-LePain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:21 AM
To: LaValley, Brian E
Cc: Amanda (E-mail)
Subject: Re: Estimate timeout


On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote

 sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29
18:00:02
 2005
 sendsize: version 2.4.4p2
 sendsize[12359]: time 0.034: waiting for any estimate child: 1 running
 sendsize[12361]: time 0.035: calculating for amname
 '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1
 sendsize[12361]: time 0.035: getting size via gnutar for
 /dev/vx/dsk/homedg/homevol level 0
 sendsize[12361]: time 0.092: spawning
/home/backup/amanda_sun/libexec/runtar
 in pipeline
 sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file
/dev/null
 --directory /home --one-file-system --listed-incremental

/home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_hom
 evol_0.new --sparse --ignore-failed-read --totals --exclude-from
 /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude .

Run this command yourself on the command line (as root) and see how long
it takes to complete.  Also, what version of tar are you running?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Estimate timeout

2005-08-30 Thread LaValley, Brian E
setup_estimate: napier:/common: command 0, options:
last_level -1 next_level0 -13025 level_days 0
getting estimates 0 (0) -1 (-1) -1 (-1)
planner: time 0.209: setting up estimates took 0.153 secs

GETTING ESTIMATES...
driver: started dumper0 pid 7556
driver: started dumper1 pid 7557
driver: started dumper2 pid 7558
driver: started dumper3 pid 7559
dumper: dgram_bind: socket bound to 0.0.0.0.742
dumper: pid 7557 executable dumper1 version 2.4.4p2, using port 742
dumper: dgram_bind: socket bound to 0.0.0.0.741
dumper: pid 7556 executable dumper0 version 2.4.4p2, using port 741
dumper: dgram_bind: socket bound to 0.0.0.0.743
dumper: pid 7558 executable dumper2 version 2.4.4p2, using port 743
dumper: dgram_bind: socket bound to 0.0.0.0.744
dumper: pid 7559 executable dumper3 version 2.4.4p2, using port 744
changer: got exit: 0 str: 6 15 1 1
changer: opening pipe to: /opt/amanda/libexec/chg-scsi -slot current
changer: got exit: 0 str: 6 /dev/nst0
taper: slot 6: date Xlabel DailySet1-A06 (new tape)
taper: read label `DailySet1-A06' date `X'
taper: wrote label `DailySet1-A06' date `20050829'
planner: time 10735.451: got result for host napier disk /common: 0 -
294080K, -1 - -1K, -1 - -1K
planner: time 10735.481: got result for host napier disk /home: 0 -
81027540K, 1 - 3635110K, -1 - -1K
planner: time 10735.481: got result for host napier disk /: 0 - 3193020K, 3
- 281800K, -1 - -1K
planner: time 29598.524: error result for host coneng disk
/dev/vx/dsk/homedg/homevol: Estimate timeout from coneng
planner: time 29598.552: getting estimates took 29598.342 secs
FAILED QUEUE:
  0: coneng /dev/vx/dsk/homedg/homevol
DONE QUEUE:
  0: napier /common
  1: napier /home
  2: napier /

GENERATING SCHEDULE:

ENDFLUSH
DUMP napier feff9ffe0f /common 20050829 13027 0 1970:1:1:0:0:0 147040
4901
DUMP napier feff9ffe0f / 20050829 13 0 1970:1:1:0:0:0 952854 11210 3
2005:8:16:5:23:44 94810 162
DUMP napier feff9ffe0f /home 20050829 2 1 2005:8:26:1:10:29 1817555
60585

driver: adding holding disk 0 dir /backup/amanda/rdumps/dump3 size 83886080
driver: adding holding disk 1 dir /backup/amanda/rdumps/dump2 size 83886080
driver: adding holding disk 2 dir /backup/amanda/ldump size 14698880
reserving 182471040 out of 182471040 for degraded-mode dumps
driver: flush size 0
driver: start time 29598.652 inparallel 4 bandwidth 2200 diskspace 182471040
dir OBSOLETE datestamp 20050829 driver: drain-ends tapeq FIRST big-dumpers
ttt
driver: result time 29598.652 from taper: TAPER-OK
driver: send-cmd time 29598.668 to dumper0: FILE-DUMP 00-1
/backup/amanda/rdumps/dump3/20050829/napier._common.0 napier feff9ffe0f
/common NODEVICE 0 1970:1:1:0:0:0 256000 GNUTAR 147104
|;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/exclude.gtar;
driver: state time 29598.669 free kps: 2170 space: 182323936 taper: idle
idle-dumpers: 3 qlen tapeq: 0 runq: 2 roomq: 0 wakeup: 15 driver-idle:
start-wait
driver: interface-state time 29598.669 if : free 570 if ETH0: free 600 if
LOCAL: free 1000
driver: hdisk-state time 29598.669 hdisk 0: free 83738976 dumpers 1 hdisk 1:
free 83886080 dumpers 0 hdisk 2: free 14698880 dumpers 0
dumper: stream_client: connected to 198.151.154.11.33238
dumper: stream_client: our side is 0.0.0.0.33241
dumper: stream_client: connected to 198.151.154.11.33239
dumper: stream_client: our side is 0.0.0.0.33242
dumper: stream_client: connected to 198.151.154.11.33240
dumper: stream_client: our side is 0.0.0.0.33243
driver: state time 29613.666 free kps: 2170 space: 182323936 taper: idle
idle-dumpers: 3 qlen tapeq: 0 runq: 2 roomq: 0 wakeup: 86400 driver-idle:
client-constrained
driver: interface-state time 29613.666 if : free 570 if ETH0: free 600 if
LOCAL: free 1000
driver: hdisk-state time 29613.666 hdisk 0: free 83738976 dumpers 1 hdisk 1:
free 83886080 dumpers 0 hdisk 2: free 14698880 dumpers 0
driver: result time 29782.170 from dumper0: DONE 00-1 294080 80960 183
[sec 182.929 kb 80960 kps 442.6 orig-kb 294080]
driver: finished-cmd time 30379.942 dumper0 dumped napier:/common
driver: send-cmd time 30379.942 to taper: FILE-WRITE 00-2
/backup/amanda/rdumps/dump3/20050829/napier._common.0 napier feff9ffe0f
/common 0 20050829
driver: startaflush: FIRST napier /common 80993 52433920
driver: send-cmd time 30379.942 to dumper0: FILE-DUMP 01-3
/backup/amanda/rdumps/dump2/20050829/napier._.0 napier feff9ffe0f /
NODEVICE 0 1970:1:1:0:0:0 256000 GNUTAR 953024
|;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/exclude.gtar;
driver: state time 30379.942 free kps: 2115 space: 181437023 taper: writing
idle-dumpers: 3 qlen tapeq: 0 runq: 1 roomq: 0 wakeup: 15 driver-idle:
start-wait
driver: interface-state time 30379.942 if : free 515 if ETH0: free 600 if
LOCAL: free 1000
driver: hdisk-state time 30379.942 hdisk 0: free 83805087 dumpers 0 hdisk 1:
free 82933056 dumpers 1 hdisk 2: free 14698880 dumpers 0
dumper: stream_client: connected to 198.151.154.11.33248
dumper

Re: Estimate timeout

2005-08-30 Thread Joshua Baker-LePain
On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote

 sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29 18:00:02
 2005
 sendsize: version 2.4.4p2
 sendsize[12359]: time 0.034: waiting for any estimate child: 1 running
 sendsize[12361]: time 0.035: calculating for amname
 '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1
 sendsize[12361]: time 0.035: getting size via gnutar for
 /dev/vx/dsk/homedg/homevol level 0
 sendsize[12361]: time 0.092: spawning /home/backup/amanda_sun/libexec/runtar
 in pipeline
 sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file /dev/null
 --directory /home --one-file-system --listed-incremental
 /home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_hom
 evol_0.new --sparse --ignore-failed-read --totals --exclude-from
 /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude .

Run this command yourself on the command line (as root) and see how long
it takes to complete.  Also, what version of tar are you running?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


RE: Estimate timeout

2005-08-30 Thread LaValley, Brian E
I'll have to get back to you on running the command by itself. My tar
version is: tar (GNU tar) 1.13



RE: Estimate timeout

2005-08-30 Thread Joshua Baker-LePain
On Tue, 30 Aug 2005 at 11:38am, LaValley, Brian E wrote

 I'll have to get back to you on running the command by itself. My tar
 version is: tar (GNU tar) 1.13

Bad, bad, bad.

http://www.amanda.org/docs/faq.html#id2554919

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


RE: Estimate timeout

2005-08-30 Thread LaValley, Brian E
Ok, I'll try a new version of tar after my test of the tar command on its
own.



Re: Estimate timeout

2005-08-30 Thread Jon LaBadie
On Tue, Aug 30, 2005 at 11:01:52AM -0400, LaValley, Brian E wrote:
 Can someone please help me get to the bottom of this issue?  I have Amanda
 2.4.4p2 on a Fedora Core 3 machine which is the tape server.  It has no
 trouble backing up itself and other Linux machines.  The trouble comes with
 a Sun Solaris 8 client which never completes its estimate. I tried to keep
 increasing the etimeout value, but I am at 29600 and am wondering how far I
 should go?  Is there some other part I should be looking at? Thank you.
 

Does gnutar follow, and back up, symbolic links?  I wonder if some of these
monster estimates might be due to circular references.


-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: Estimate timeout

2005-08-30 Thread Joshua Baker-LePain
On Tue, 30 Aug 2005 at 1:19pm, Jon LaBadie wrote

 On Tue, Aug 30, 2005 at 11:01:52AM -0400, LaValley, Brian E wrote:
  Can someone please help me get to the bottom of this issue?  I have Amanda
  2.4.4p2 on a Fedora Core 3 machine which is the tape server.  It has no
  trouble backing up itself and other Linux machines.  The trouble comes with
  a Sun Solaris 8 client which never completes its estimate. I tried to keep
  increasing the etimeout value, but I am at 29600 and am wondering how far I
  should go?  Is there some other part I should be looking at? Thank you.
  
 Does gnutar follow, and backup symbolic links.  I wonder if some of these
 monster estimates might be due to circular references.

I'm fairly certain it backs them up as links, not as the targets
themselves.  It'd be easy to test, though...

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Estimate timeout issue

2005-08-16 Thread LaValley, Brian E
I have an AMANDA client machine with Solaris 8 and logical volumes on a
disk.  The AMANDA server's config has etimeout=29600 so it waits 59202
seconds and fails.

planner: time 59202.106: error result for host coneng disk /dev/vx/dsk/opt:
Estimate timeout from coneng
planner: time 59202.108: error result for host coneng disk
/dev/vx/dsk/homedg/homevol: Estimate timeout from coneng
planner: time 59202.108: getting estimates took 59192.001 secs
FAILED QUEUE:
  0: coneng /dev/vx/dsk/opt
  1: coneng /dev/vx/dsk/homedg/homevol

Any ideas how I can fix this?  What other information do you need?
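
The 59202-second wait reported above is consistent with how the amanda.conf man page describes etimeout: a positive value is an allowance per disk, multiplied by the number of disks on the client, while a negative value is taken as a total for the whole client. A minimal Python sketch of that arithmetic follows. The function name planner_wait_seconds is made up for illustration and is not part of Amanda, and the rule it encodes is an assumption taken from the man page rather than from the planner source.

```python
def planner_wait_seconds(etimeout: int, disks_on_client: int) -> int:
    """Upper bound on how long planner waits for one client's estimates.

    Assumption (from the amanda.conf man page, not from Amanda's source):
    a positive etimeout is per disk, a negative one is a per-client total.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * disks_on_client


if __name__ == "__main__":
    # Two disks on the client, each allowed 29600 seconds.
    print(planner_wait_seconds(29600, 2))
    # A negative value caps the whole client, whatever the disk count.
    print(planner_wait_seconds(-12000, 5))
```

With the two disks listed in the failed queue this gives 59200 seconds, which lines up with the 59192 seconds the planner says it spent getting estimates. Raising etimeout further would only lengthen the wait; it does not explain why the estimate never comes back.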


Estimate timeout

2005-07-23 Thread LaValley, Brian E
My dumps aren't completing. One fishy thing I am seeing in the logs is two
of the same partition, /home 1 and /home 0
What does this mean?

Amanda 2.4 REQ HANDLE 000-B0F0E609 SEQ 1122119536
SECURITY USER backup
SERVICE sendsize
OPTIONS features=feff9ffe0f;maxdumps=1;hostname=blavalley-l;
GNUTAR /home 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /home 1 2005:7:20:18:21:45 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /common 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /native 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;
GNUTAR /opt 0 1970:1:1:0:0:0 -1 OPTIONS |;bsd-auth;srvcomp-best;index;exclude-list=/opt/amanda/etc/amanda/daily/exclude.gtar;


Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Tom Brown
Hi
Clients are all RH 7.3 or WhiteBox respin 2.
Server is 2.4.4p4 running on WhiteBox respin 2.

My amchecks run fine and without issue; however, I have come in on two
mornings now to find that some of the clients failed. The actual failures
have occurred on different clients, i.e. some that failed two nights ago
worked last night without changes, and I can't figure out why. All
clients were working fine at a different site; we moved IDC over the
weekend, so we have a new network architecture.

Failure errors are
hostname /dev/rd/c0d0p3 lev 0 FAILED [Estimate timeout from hostname]
hostname /dev/rd/c0d0p1 lev 0 FAILED [Estimate timeout from hostname]
anotherhostname /dev/rd/c0d0p2 lev 0 FAILED [Estimate timeout from anotherhostname]
anotherhostname /dev/rd/c0d0p5 lev 0 FAILED [Estimate timeout from anotherhostname]

There is nothing in the firewall log to indicate dropped packets. As a
test, I have allowed all ports between these networks, but
it has not helped.

Does anyone know how to debug these timeout issues? I have been
using Amanda for about 3 years now and have not encountered this before.

thanks


Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Paul Bijnens
Tom Brown wrote:
Failure errors are
hostname /dev/rd/c0d0p3 lev 0 FAILED [Estimate timeout from hostname]
...
Does anyone know how to debug these timeout type issues as i have been
using amanda for about 3 years now and have not encountered this before.

Have a look on that client in /tmp/amanda: look for the files
sendsize.DATETIME.debug and see how long the estimate actually took.
The first line of the file is the start time and the last line is the
finish time.  How long did it really take?  You may have to
change the etimeout parameter in amanda.conf.

If there is no finish time line, then the estimate crashed, and
there is probably an error message in that file too.
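
The check described above can be sketched in a few lines of Python. This is only an illustration: the function name estimate_duration is hypothetical, and the timestamp layout is assumed to be the "Wed May 18 00:29:27 2005" form that sendsize debug files in this thread use (parsing the day and month names relies on an English or C locale).

```python
import re
from datetime import datetime
from typing import Optional

# Timestamp layout assumed from the debug files quoted in this thread,
# e.g. "Wed May 18 00:29:27 2005".
STAMP = "%a %b %d %H:%M:%S %Y"
STAMP_RE = r"(\w{3} \w{3} +\d+ [\d:]+ \d{4})"


def estimate_duration(debug_text: str) -> Optional[float]:
    """Return seconds between the start and finish lines, or None.

    None means no start or finish line was found; a missing finish line
    suggests the estimate crashed or was still running.
    """
    start = re.search(r"start at " + STAMP_RE, debug_text)
    finish = re.findall(r"finish time " + STAMP_RE, debug_text)
    if not start or not finish:
        return None
    began = datetime.strptime(start.group(1), STAMP)
    ended = datetime.strptime(finish[-1], STAMP)
    return (ended - began).total_seconds()
```

Feed it the contents of one sendsize.DATETIME.debug file and compare the result with the wait that etimeout allows. If it returns None, look in the same file for an error message.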
--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***



Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Tom Brown
Have a look on that client in /tmp/amanda, look for the files
sendsize.DATETIME.debug and see how long the estimate did take.
The first line of the file is the start time and the last line is the
finish time.  How long did it really take?  You may have to
change the etimeout parameter in amanda.conf.
If there is no finish time line, then the estimate crashed, and
probably there is an error message in that file too.

Hi

Pasted below are the 3 entries that apply to last night. Everything
appears OK in there, it seems. If sendsize is not crashing, what else
could cause this?

thanks
sendsize: debug 1 pid 16820 ruid 11 euid 11: start at Wed May 18 00:29:27 2005
sendsize: version 2.4.4p1
snip
sendsize[16820]: time 118.545: child 16874 terminated normally
sendsize: time 118.545: pid 16820 finish time Wed May 18 00:31:25 2005

sendsize: debug 1 pid 17384 ruid 11 euid 11: start at Wed May 18 00:54:26 2005
sendsize: version 2.4.4p1
snip
sendsize[17384]: time 101.841: child 17418 terminated normally
sendsize: time 101.841: pid 17384 finish time Wed May 18 00:56:08 2005

sendsize: debug 1 pid 17795 ruid 11 euid 11: start at Wed May 18 01:19:26 2005
sendsize: version 2.4.4p1
snip
sendsize[17795]: time 106.417: child 17842 terminated normally
sendsize: time 106.417: pid 17795 finish time Wed May 18 01:21:12 2005




Re: Dumps Fail - Estimate Timeout errors

2005-05-18 Thread Paul Bijnens
Tom Brown wrote:
Actually, digging around in /tmp/amanda I have come across files named
amandad.DATE.debug, and a few of them appear to end like this:

amandad: time 111.846: dgram_recv: timeout after 10 seconds
amandad: time 111.846: waiting for ack: timeout, retrying
amandad: time 121.846: dgram_recv: timeout after 10 seconds
amandad: time 121.847: waiting for ack: timeout, retrying
amandad: time 131.847: dgram_recv: timeout after 10 seconds
amandad: time 131.847: waiting for ack: timeout, retrying
amandad: time 141.847: dgram_recv: timeout after 10 seconds
amandad: time 141.848: waiting for ack: timeout, retrying
amandad: time 151.848: dgram_recv: timeout after 10 seconds
amandad: time 151.848: waiting for ack: timeout, giving up!
amandad: time 151.848: pid 17383 finish time Wed May 18 00:56:58 2005
What would cause that? I presume this could be the cause of the failure.

Maybe this is a problem in some firewall setting that forbids reverse
traffic, or expires the UDP reply after less than 101 seconds.
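
The ack-wait loop in an amandad debug file can be summarised mechanically. The Python sketch below assumes only the two line formats quoted in this thread ("dgram_recv: timeout after N seconds" and "waiting for ack: timeout, giving up!"); the function name ack_wait_summary is hypothetical and not part of Amanda.

```python
import re
from typing import Optional, Tuple

TIMEOUT_RE = re.compile(r"time ([\d.]+): dgram_recv: timeout after (\d+) seconds")


def ack_wait_summary(debug_text: str) -> Optional[Tuple[float, int, bool]]:
    """Summarise the ack-wait loop in an amandad debug log.

    Returns (approximate time the reply was sent, number of timeouts,
    whether amandad gave up), or None if the log shows no ack timeouts.
    """
    hits = TIMEOUT_RE.findall(debug_text)
    if not hits:
        return None
    first_time, wait = float(hits[0][0]), int(hits[0][1])
    # The first timeout fires one wait period after the reply went out.
    sent_at = first_time - wait
    gave_up = "waiting for ack: timeout, giving up!" in debug_text
    return sent_at, len(hits), gave_up
```

For a log whose first timeout is recorded at 111.846 after a 10 second wait, the reply left the client at roughly 101.8 seconds, which is the interval a firewall would have had to keep the return path open.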


Re: estimate timeout

2005-05-12 Thread Jim Summers
McDonagh, Joe wrote:
I have an estimate timeout of three hours, is there any way to skip the
estimate or what? It's getting estimate timeout, the fs is fine, it can
be read from and everything, it just has loads of small files.

You might want to consider upgrading the server and client involved to
the current stable.  It has provisions for doing server estimates.  That
should help.

The server estimates seem to be conservative, to be on the safe side.  I 
have some data to look through for level 0, level 1, level 2 estimates 
and I should be able to post it by the end of this week.

Hope this helps.
--
Jim Summers
School of Computer Science-University of Oklahoma
-


estimate timeout

2005-05-11 Thread McDonagh, Joe
I have an estimate timeout of three hours. Is there any way to skip the
estimate? It keeps hitting the estimate timeout; the filesystem is fine and
can be read without problems, it just has loads of small files.


Re: estimate timeout

2005-05-11 Thread Paul Bijnens
McDonagh, Joe wrote:
I have an estimate timeout of three hours, is there any way to skip the
estimate or what? It's getting estimate timeout, the fs is fine, it can
be read from and everything, it just has loads of small files.

If you have Amanda 2.4.5, you can use alternate methods for the
estimate.  From the NEWS file:

* new 'estimate' dumptype option to select estimate type:
    CLIENT: estimate by the dumping program.
    CALCSIZE: estimate by the calcsize program, a lot faster but less
              accurate.
    SERVER: estimate based on statistics from previous runs, takes seconds
            but can be wrong on the estimate size.

I've not yet tried it myself.


Estimate Timeout through iptables firewall

2005-01-22 Thread Matt Hyclak
This is mostly just for the archives.

I had problems with some clients timing out on estimates when running
through a linux firewall (2.6.9 patched and 2.6.10). The problem was that
ip_conntrack_amanda was closing the return path before the clients could
finish, so the estimate results were getting dropped on the floor. There are
three solutions:

1. Open a hole in the firewall allowing clients to send from port 10080 to
your amanda server.

2. Change the UDP stream timeout which defaults to 180 seconds to something
larger. WARNING! This will change it for ALL UDP connections:

sysctl -w net.ipv4.netfilter.ip_conntrack_udp_timeout_stream=1800

3. Extend the amount of time that ip_conntrack_amanda allows the connection
to remain open. According to the source it is currently 300 seconds. You can
change this by loading the module with the master_timeout option set to
something bigger. This can be done in /etc/modprobe.conf:

options ip_conntrack_amanda master_timeout=1800

Obviously I prefer 3.
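
To see which of these limits a given client would run into, the estimate time can be compared against them. The Python sketch below is a rough check only: the function name expired_timeouts is hypothetical, the two default values are the ones quoted in this post (180 and 300 seconds), and which timeout actually governs the return path depends on the firewall setup.

```python
from typing import Dict, List


def expired_timeouts(estimate_seconds: float, timeouts: Dict[str, float]) -> List[str]:
    """Return the names of firewall timeouts shorter than the estimate."""
    return [name for name, limit in timeouts.items() if estimate_seconds > limit]


# Defaults quoted in the post above; check the values on your own firewall.
DEFAULTS = {
    "ip_conntrack_udp_timeout_stream": 180,
    "ip_conntrack_amanda master_timeout": 300,
}

if __name__ == "__main__":
    for secs in (120, 240, 600):
        print(secs, expired_timeouts(secs, DEFAULTS))
```

The estimate time to plug in is the gap between the start and finish lines of the client's sendsize debug file.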

Hope this helps someone down the road...

Matt

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263






Re: Estimate timeout error

2004-12-07 Thread Nick Danger
Paul Bijnens wrote:
But the reply packet never got acknowledged by the server.
Somehow it got lost or corrupted.
Default route for reverse path not correct?  Wrong subnetmask?
Try to get a network trace at the client and server, and in between
(don't know how to accomplish that on a PIX firewall):
Solaris:
snoop -x42 host x.x.x.x proto udp port 10080
using open source (linux and others):
tcpdump -X host x.x.x.x and udp and port 10080
Or other programs that have the same capabilities (ethereal etc).
Before guessing how to fix it, we must know where the problem is.
Is the packet lost?  or is it broken?

Quick recap: Server grolsch tries to back up client dominion. It works
for the partitions /, /usr and /var. As soon as I tell grolsch to back
up dominion's /u00 partition (a 45G partition, but presently only 177M
full with approx. 2000 files) it fails. I have since removed /u00 from
backups to at least keep things working in the meantime, but I would like
that data backed up :-)

I have moved the Amanda server to public IP space. It is still behind a
PIX firewall; I just got rid of the private IP to public IP mappings.
This didn't fix it :-) Not that I thought it would, I just got annoyed at
some of the routing.

I ran tcpdump on client and server; the dumps are on the following page,
lined up as best I could to show the flow. It seems that when doing the
partition that makes it fail, a bunch of packets do not get from the
client to the server.  Since I am no expert in tcpdump or in interpreting
its results, I hope this helps figure out the problem.

tcpdump results on http://www.hackermonkey.com/amanda-error.html
-Nick


Re: lev 0 FAILED [Estimate timeout from ******]

2004-12-03 Thread Nuno Dias
I don't think it is a firewall problem because, if I put only one
filesystem in the disklist, it works.

ND

On Thu, 2004-12-02 at 18:21 +0100, Christoph Scheeder wrote:
 Hi,
 
 Completely sure?
 many modern linux distros (AFAIK at least suse and redhat) come up
 with default firewall-installations blocking many things if you do
 not explicitly disable these firewalls.
 So there might be a firewall on the linux-box even if you didn't
 configure it.
 
 Christoph
 
-- 
Nuno Dias [EMAIL PROTECTED]
LIP


Re: lev 0 FAILED [Estimate timeout from ******]

2004-12-03 Thread Paul Bijnens
Nuno Dias wrote:
I think is not a problem of firewall because, if i put only one
filesystem in disklist it work.

Not sure.  Is there a firewall involved?
If the connection tracking times out after 5 minutes, then
maybe 1 filesystem can reply within that timeframe, but more
filesystems exceed the time.

What is the UDP-timer value?
Is it any better if client and server disable firewall rules
completely?
Can you see the reply packet leave the client, and
arrive at the server? (Capture all network traffic for UDP
port 10080 at client and server to verify.)
Is there any other message in the client file
/tmp/amanda/sendsize.DATETIME.debug?


ND
On Thu, 2004-12-02 at 18:21 +0100, Christoph Scheeder wrote:
Hi,
Completely sure?
many modern linux distros (AFAIK at least suse and redhat) come up
with default firewall-installations blocking many things if you do
not explicitly disable these firewalls.
So there might be a firewall on the linux-box even if you didn't
configure it.
Christoph




Re: Estimate timeout error

2004-12-03 Thread Nick Danger
There is a PIX between the two, but I'm backing up a bunch (10?) of Linux
and Solaris servers in the same areas of the network to this same
Amanda server without any issues, so I don't believe it to be a firewall
issue. There is no iptables running on either host (both Linux in this
case).

In the amandad.XXX.debug log I have the following lines, which I'm
assuming are the error report of the problem? Now, the question is how
to fix it :-)

-Nick
amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service /usr/lib/amanda/sendsize
amandad: time 182.436: sending REP packet:

Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
OPTIONS features=feff9ffe0f;
/ 0 SIZE 301197
/ 1 SIZE 100
/u00 0 SIZE 143930
/u00 1 SIZE 41411
/usr 0 SIZE 880958
/usr 1 SIZE 79
/usr/local 0 SIZE 174
/usr/local 1 SIZE 47
/var 0 SIZE 299300
/var 1 SIZE 2857

amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
amandad: time 232.446: pid 21896 finish time Fri Dec  3 09:01:32 2004
Paul Bijnens wrote:
Nick Danger wrote:
Nope - still a problem. The error is still as below:

FAILURE AND STRANGE DUMP SUMMARY:
 dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx]

I have the timeout in amanda.conf set to an ungodly high number of
etimeout -12000 # total number of seconds for estimates.
[...]
sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec  2 11:25:07 2004
sendsize: version 2.4.4p1
[...]
sendsize: time 172.473: pid 26242 finish time Thu Dec  2 11:27:59 2004

The estimate really takes only 173 seconds.  That means that etimeout
is plenty (better lower it again to normal values).
The problem seems to be in the reply packet.

I've already seen problems with a UDP-packet overflow, but that's
unlikely.  That problem happened with older versions where the UDP
size was only 8Kbyte or so. Currently it is 64K, but it could be
limited by the OS too, of course.  The reply packet is usually larger
than the request packet, because it contains 1 to 3 lines for each
DLE (level 0, current level, current plus 1).
In amandad.DATETIME.debug, you can find the request packet and the
reply packet.
Any weird limitation on UDP packet size on one of the hosts (or
intermediate routers/firewalls)?

Another problem could be in the iptables modules for amanda, where
a bug has already been introduced twice.  I don't know the current
status of that bug.  If not needed, do not use the amanda iptables
modules.  Try lsmod | grep amanda.  (Also on intermediate firewalls!)

Maybe try a network traffic dump (with tcpdump or similar program)
on client *and* host?



Re: Estimate timeout error

2004-12-03 Thread Paul Bijnens
Nick Danger wrote:
There is a PIX between the two, but Im backing up a bunch (10?) linux 
and solaris servers in the same areas of the network, to this same 
amanda server without any issues so I dont believe it to be a firewall 
issue. There are no iptables running on either host (both linux in this 
case)

In the amandad.XXX.debug log I have the following lines, which Im 
assuming are the error report of the problem? Now, the question is, how 
to fix it :-)

-Nick
amandad: time 0.010: amandahosts security check passed
amandad: time 0.010: running service /usr/lib/amanda/sendsize
amandad: time 182.436: sending REP packet:
The above shows that about 3 minutes are needed for the sendsize,
and that it completed without errors, because it has all the
info below.  It could still be that 179 seconds works and 181 seconds
is too late...


Amanda 2.4 REP HANDLE 005-40813308 SEQ 1102082216
OPTIONS features=feff9ffe0f;
/ 0 SIZE 301197
/ 1 SIZE 100
/u00 0 SIZE 143930
/u00 1 SIZE 41411
/usr 0 SIZE 880958
/usr 1 SIZE 79
/usr/local 0 SIZE 174
/usr/local 1 SIZE 47
/var 0 SIZE 299300
/var 1 SIZE 2857

The above lines are the reply packet, less than 300 bytes,
so I guess it's not a UDP packet overflow.
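
The same check can be done on any reply packet copied out of amandad.DATETIME.debug. The Python sketch below assumes the SIZE line layout shown above ("disk level SIZE number") and plain ASCII text; the function names parse_rep_sizes and packet_bytes are hypothetical, not part of Amanda.

```python
from typing import List, Tuple


def parse_rep_sizes(packet: str) -> List[Tuple[str, int, int]]:
    """Extract (disk, level, size) from the SIZE lines of a REP packet."""
    rows = []
    for line in packet.splitlines():
        parts = line.split()
        # SIZE lines look like "<disk> <level> SIZE <number>".
        if len(parts) == 4 and parts[2] == "SIZE":
            rows.append((parts[0], int(parts[1]), int(parts[3])))
    return rows


def packet_bytes(packet: str) -> int:
    """Payload length in bytes, to compare against a UDP size limit."""
    return len(packet.encode("ascii"))
```

Comparing packet_bytes() with the 8K or 64K limits mentioned earlier in the thread shows quickly whether an overflow is even possible, and parse_rep_sizes() confirms that every DLE in the request got an estimate back.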

amandad: time 192.437: dgram_recv: timeout after 10 seconds
amandad: time 192.437: waiting for ack: timeout, retrying
amandad: time 202.439: dgram_recv: timeout after 10 seconds
amandad: time 202.439: waiting for ack: timeout, retrying
amandad: time 212.441: dgram_recv: timeout after 10 seconds
amandad: time 212.442: waiting for ack: timeout, retrying
amandad: time 222.444: dgram_recv: timeout after 10 seconds
amandad: time 222.444: waiting for ack: timeout, retrying
amandad: time 232.446: dgram_recv: timeout after 10 seconds
amandad: time 232.446: waiting for ack: timeout, giving up!
amandad: time 232.446: pid 21896 finish time Fri Dec  3 09:01:32 2004

But the reply packet never got acknowledged by the server.
Somehow it got lost or corrupted.
Default route for reverse path not correct?  Wrong subnetmask?
Try to get a network trace at the client and server, and in between
(don't know how to accomplish that on a PIX firewall):
Solaris:
snoop -x42 host x.x.x.x proto udp port 10080
using open source (linux and others):
tcpdump -X host x.x.x.x and udp and port 10080
Or other programs that have the same capabilities (ethereal etc).
Before guessing how to fix it, we must know where the problem is.
Is the packet lost?  or is it broken?



lev 0 FAILED [Estimate timeout from ******]

2004-12-02 Thread Nuno Dias
 Hi,

 I have a Digital Unix machine that gives me some strange results when I
try to use Amanda.
 If I configure the disklist with 2 or more disks of the Digital Unix
machine, the Amanda report tells me this:

xxx/ lev 0 FAILED [Estimate timeout from xx]
xxx/usr lev 0 FAILED [Estimate timeout from xx]
xxx/ lev 0 FAILED [Estimate timeout from xx]

 The amandad.20041202142753000.debug file on the Digital Unix machine has this
error:

amandad: time 200.266: dgram_recv: timeout after 10 seconds
amandad: time 200.266: waiting for ack: timeout, retrying
amandad: time 210.267: dgram_recv: timeout after 10 seconds
amandad: time 210.267: waiting for ack: timeout, retrying
amandad: time 220.267: dgram_recv: timeout after 10 seconds
amandad: time 220.267: waiting for ack: timeout, retrying
amandad: time 230.267: dgram_recv: timeout after 10 seconds
amandad: time 230.267: waiting for ack: timeout, retrying
amandad: time 240.267: dgram_recv: timeout after 10 seconds
amandad: time 240.267: waiting for ack: timeout, giving up!
amandad: time 240.267: pid 22594 finish time Thu Dec  2 14:31:54 2004

 The strange thing is, if I configure only one disk in the disklist, the
backup runs OK, and no problem is reported in the Amanda report.
 I increased etimeout/ctimeout to a big number ... and it did not work.

 I have a Linux machine that is the master and the Digital Unix machine
is the client; the version of Amanda is 2.4.4p4.

Thanks for any help.

ND
-- 
Nuno Dias [EMAIL PROTECTED]
LIP


Re: lev 0 FAILED [Estimate timeout from ******]

2004-12-02 Thread Christoph Scheeder
Nuno Dias schrieb:
 Hi,
 I have a Digital Unix machine that give me some strange results when i
try to use amanda.
 If i configure disklist with 2 or more disks of the Digital Unix
machine, the amanda report tell me this:
xxx/ lev 0 FAILED [Estimate timeout from xx]
xxx/usr lev 0 FAILED [Estimate timeout from xx]
xxx/ lev 0 FAILED [Estimate timeout from xx]
 The amandad.20041202142753000.debug file in Digital Machine have this
error:
amandad: time 200.266: dgram_recv: timeout after 10 seconds
amandad: time 200.266: waiting for ack: timeout, retrying
amandad: time 210.267: dgram_recv: timeout after 10 seconds
amandad: time 210.267: waiting for ack: timeout, retrying
amandad: time 220.267: dgram_recv: timeout after 10 seconds
amandad: time 220.267: waiting for ack: timeout, retrying
amandad: time 230.267: dgram_recv: timeout after 10 seconds
amandad: time 230.267: waiting for ack: timeout, retrying
amandad: time 240.267: dgram_recv: timeout after 10 seconds
amandad: time 240.267: waiting for ack: timeout, giving up!
amandad: time 240.267: pid 22594 finish time Thu Dec  2 14:31:54 2004
 The strange thing is that if I configure only one disk in the disklist, the
backup runs OK and no problem is reported in the Amanda report.
 I increased etimeout/ctimeout to a large number, and it still did not work.
 The master is a Linux machine and the client is the Digital Unix machine;
the Amanda version is 2.4.4p4.
Thanks for any help.
ND
Hi,
could this be a firewall-timeout on the linux-machine?
Christoph


Re: lev 0 FAILED [Estimate timeout from ******]

2004-12-02 Thread Nuno Dias
No, the two machines are in the same network, no firewall.

ND

On Thu, 2004-12-02 at 17:55 +0100, Christoph Scheeder wrote:
 Hi,
 could this be a firewall-timeout on the linux-machine?
 Christoph
-- 
Nuno Dias [EMAIL PROTECTED]
LIP


Re: lev 0 FAILED [Estimate timeout from ******]

2004-12-02 Thread Christoph Scheeder
Hi,
Completely sure?
Many modern Linux distros (AFAIK at least SUSE and Red Hat) come
with default firewall installations that block many things unless you
explicitly disable them.
So there might be a firewall on the Linux box even if you didn't
configure it.
Christoph
Nuno Dias wrote:
No, the two machines are in the same network, no firewall.
ND
On Thu, 2004-12-02 at 17:55 +0100, Christoph Scheeder wrote:
Hi,
could this be a firewall-timeout on the linux-machine?
Christoph



Re: Estimate timeout error

2004-12-02 Thread Nick Danger
Matt Hyclak wrote:
The sendsize.DATETIME.debug log file on dominion should tell you how long
the estimates are taking. A simple calculation should tell you how big
etimeout should be. 

(NUM_PARTITIONS * ETIMEOUT) = total time amanda waits for estimates.
Matt
 

Nope - still a problem. The error is still as below:
FAILURE AND STRANGE DUMP SUMMARY:
 dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx]

I have the timeout in amanda.conf set to an ungodly high number of
etimeout -12000 # total number of seconds for estimates.
The lines from the sendsize log in /var/log are listed below. All the file
systems are quick, except for /u00, which lists 77. Is the 77 in MINUTES? Or
seconds? Either way, 12000 in amanda.conf should be plenty, shouldn't it?
I could just be doing my math wrong, which is always a possibility. There
are no units listed other than in the config file, so I'm guessing at
some parts here.

Thanks all
-Nick
---
sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec  2 
11:25:07 2004
sendsize: version 2.4.4p1
sendsize[26244]: time 0.007: calculating for amname '/', dirname '/', 
spindle -1
sendsize[26244]: time 0.007: getting size via dump for / level 0
sendsize[26242]: time 0.008: waiting for any estimate child
sendsize[26244]: time 0.008: calculating for device '/dev/sda1' with 'ext3'
sendsize[26244]: time 0.008: running /sbin/dump 0Ssf 1048576 - /dev/sda1
sendsize[26244]: time 0.011: running /usr/lib/amanda/killpgrp
sendsize[26244]: time 0.071:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26244]: time 0.072:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26244]: time 0.410: 308423680
sendsize[26244]: time 0.411: .
sendsize[26244]: estimate time for / level 0: 0.402
sendsize[26244]: estimate size for / level 0: 301195 KB
sendsize[26244]: time 0.411: asking killpgrp to terminate
sendsize[26244]: time 1.415: getting size via dump for / level 1
sendsize[26244]: time 1.416: calculating for device '/dev/sda1' with 'ext3'
sendsize[26244]: time 1.416: running /sbin/dump 1Ssf 1048576 - /dev/sda1
sendsize[26244]: time 1.419: running /usr/lib/amanda/killpgrp
sendsize[26244]: time 1.449:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26244]: time 1.451:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26244]: time 1.887: 1104896
sendsize[26244]: time 1.889: .
sendsize[26244]: estimate time for / level 1: 0.472
sendsize[26244]: estimate size for / level 1: 1079 KB
sendsize[26244]: time 1.889: asking killpgrp to terminate
sendsize[26244]: time 2.895: done with amname '/', dirname '/', spindle -1
sendsize[26242]: time 2.895: child 26244 terminated normally
sendsize[26249]: time 2.896: calculating for amname '/u00', dirname 
'/u00', spindle -1
sendsize[26249]: time 2.896: getting size via dump for /u00 level 0
sendsize[26249]: time 2.897: calculating for device '/dev/sda9' with 'ext3'
sendsize[26249]: time 2.897: running /sbin/dump 0Ssf 1048576 - /dev/sda9
sendsize[26249]: time 2.900: running /usr/lib/amanda/killpgrp
sendsize[26242]: time 2.905: waiting for any estimate child
sendsize[26249]: time 2.942:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26249]: time 2.943:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26249]: time 80.109: 147388416
sendsize[26249]: time 80.111: .
sendsize[26249]: estimate time for /u00 level 0: 77.213
sendsize[26249]: estimate size for /u00 level 0: 143934 KB
sendsize[26249]: time 80.111: asking killpgrp to terminate
sendsize[26249]: time 81.112: getting size via dump for /u00 level 1
sendsize[26249]: time 81.113: calculating for device '/dev/sda9' with 'ext3'
sendsize[26249]: time 81.113: running /sbin/dump 1Ssf 1048576 - /dev/sda9
sendsize[26249]: time 81.116: running /usr/lib/amanda/killpgrp
sendsize[26249]: time 81.401:   DUMP: Excluding inode 8 (journal inode) 
from dump
sendsize[26249]: time 81.403:   DUMP: Excluding inode 7 (resize inode) 
from dump
sendsize[26249]: time 159.069: 42408960
sendsize[26249]: time 159.070: .
sendsize[26249]: estimate time for /u00 level 1: 77.957
sendsize[26249]: estimate size for /u00 level 1: 41415 KB
sendsize[26249]: time 159.071: asking killpgrp to terminate
sendsize[26249]: time 160.080: done with amname '/u00', dirname '/u00', 
spindle -1
sendsize[26242]: time 160.080: child 26249 terminated normally
sendsize[26484]: time 160.081: calculating for amname '/usr', dirname 
'/usr', spindle -1
sendsize[26484]: time 160.081: getting size via dump for /usr level 0
sendsize[26484]: time 160.082: calculating for device '/dev/sda2' with 
'ext3'
sendsize[26484]: time 160.082: running /sbin/dump 0Ssf 1048576 - /dev/sda2
sendsize[26484]: time

Re: Estimate timeout error

2004-12-02 Thread Paul Bijnens
Nick Danger wrote:
Nope - still a problem. The error is still as below:
FAILURE AND STRANGE DUMP SUMMARY:
 dominion.h /var lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr/local lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /usr lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion.xxx]
 dominion.h / lev 0 FAILED [Estimate timeout from dominion.xxx]
I have the timeout in amanda.conf set to an ungodly high number of
etimeout -12000 # total number of seconds for estimates.
[...]
sendsize: debug 1 pid 26242 ruid 33 euid 33: start at Thu Dec  2 
11:25:07 2004
sendsize: version 2.4.4p1
[...]
sendsize: time 172.473: pid 26242 finish time Thu Dec  2 11:27:59 2004
The estimate really takes only 173 seconds.  That means that etimeout
is plenty (better lower it again to normal values).
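For reference, the per-filesystem figures can be read out of a sendsize debug file mechanically. The sketch below is only an illustration in Python: it assumes the "estimate time for <disk> level <n>: <seconds>" line format seen in the log quoted in this thread, and the function name and sample text are made up for the example. The values are in seconds, which agrees with the start and finish timestamps of the log.

```python
import re

# Matches lines such as:
#   sendsize[26249]: estimate time for /u00 level 0: 77.213
ESTIMATE_RE = re.compile(
    r"estimate time for (?P<disk>\S+) level (?P<level>\d+): (?P<secs>\d+(?:\.\d+)?)"
)


def estimate_times(log_text):
    """Return {(disk, level): seconds} for every estimate found in the text."""
    times = {}
    for match in ESTIMATE_RE.finditer(log_text):
        key = (match.group("disk"), int(match.group("level")))
        times[key] = float(match.group("secs"))
    return times


# Sample lines copied from the sendsize log in this thread.
sample = """\
sendsize[26244]: estimate time for / level 0: 0.402
sendsize[26244]: estimate time for / level 1: 0.472
sendsize[26249]: estimate time for /u00 level 0: 77.213
sendsize[26249]: estimate time for /u00 level 1: 77.957
"""

times = estimate_times(sample)
slowest = max(times, key=times.get)   # (disk, level) with the longest estimate
total = sum(times.values())           # seconds spent in these estimates
print(f"slowest: {slowest}, total: {total:.3f} s")
```

Comparing the total with the configured etimeout shows quickly whether the timeout itself is the limiting factor.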
The problem seems to be in the reply packet.
I've already seen problems with a UDP-packet overflow, but that's
unlikely.  That problem happened with older versions where the UDP
size was only 8Kbyte or so. Currently it is 64K, but it could be
limited by the OS too, of course.  The reply packet is usually larger
than the request packet, because it contains 1 to 3 lines for each
DLE (level 0, current level, current plus 1).
In amandad.DATETIME.debug, you can find the request packet, and the
reply packet.
Any weird limitation on UDP packet size on one of the hosts (or
intermediate routers/firewalls)?
Another problem could be the iptables modules for Amanda, in which a bug
has already been introduced twice.  I don't know the current status of
that bug.  If they are not needed, do not use the Amanda iptables
modules.  Try lsmod | grep amanda.  (Also on intermediate firewalls!)
Maybe try a network traffic dump (with tcpdump or a similar program)
on client *and* host?
--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***


Estimate timeout error

2004-11-29 Thread Nick Danger
Is there any way to properly calculate what your timeout estimate value 
should be other than trial and error? I have a partition on a machine 
that gives this error. 

dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]
If I remove that partition from disklist, all other partitions on that 
server back up just fine.
It's a 45G partition on a SCSI RAID set. It's hardly 1% full, holding maybe 
1000 files. I have upped the timeout to 1200, and still it failed.

Suggestions?
-Nick



Re: Estimate timeout error

2004-11-29 Thread Matt Hyclak
On Mon, Nov 29, 2004 at 09:00:44AM -0500, Nick Danger enlightened us:
 Is there any way to properly calculate what your timeout estimate value 
 should be other than trial and error? I have a partition on a machine 
 that gives this error. 
 
 dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]
 
 
 If I remove that partition from disklist, all other partitions on that 
 server back up just fine.
 It's a 45G partition on a SCSI RAID set. It's hardly 1% full, holding maybe 
 1000 files. I have upped the timeout to 1200, and still it failed.
 

The sendsize.DATETIME.debug log file on dominion should tell you how long
the estimates are taking. A simple calculation should tell you how big
etimeout should be. 

(NUM_PARTITIONS * ETIMEOUT) = total time amanda waits for estimates.
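
As a rough sketch of that calculation (Python, illustrative only): the comment in the stock amanda.conf says a positive etimeout is multiplied by the number of filesystems on each host, while a negative value is taken as an absolute total. The function name below is hypothetical, and the partition count of 5 is just an example.

```python
def total_estimate_wait(etimeout, num_filesystems):
    """Seconds the server waits for all estimates from one host.

    Follows the amanda.conf comment: a positive etimeout is per
    filesystem, a negative etimeout is an absolute total for the host.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * num_filesystems


# Example host with 5 DLEs (an assumed count, for illustration).
per_fs = total_estimate_wait(1200, 5)      # positive: multiplied by DLE count
absolute = total_estimate_wait(-12000, 5)  # negative: used as-is
print(per_fs, absolute)
```

If the estimate times in the sendsize debug file add up to much less than this total, raising etimeout further is unlikely to help.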

Matt

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263




Re: Estimate timeout error

2004-11-29 Thread Gene Heskett
On Monday 29 November 2004 09:00, Nick Danger wrote:
Is there any way to properly calculate what your timeout estimate
 value should be other than trial and error? I have a partition on a
 machine that gives this error. 

dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]


If I remove that partition from disklist, all other partitions on
 that server back up just fine.
It's a 45G partition on a SCSI RAID set. It's hardly 1% full, holding
 maybe 1000 files. I have upped the timeout to 1200, and still it
 failed.

Suggestions?

No, other than that your etimeout value should be more than sufficient.  Do you 
have other directories on that client that Amanda backs up OK?

-Nick

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
99.29% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


Re: Estimate timeout error

2004-11-29 Thread Paul Bijnens

On Monday 29 November 2004 09:00, Nick Danger wrote:
Is there any way to properly calculate what your timeout estimate
value should be other than trial and error? I have a partition on a
machine that gives this error. 
dominion.h /u00 lev 0 FAILED [Estimate timeout from dominion]
Have a look on the client at the file
/tmp/amanda/sendsize.DATETIMESTAMP.debug.
The first and the last lines contain a date.
Even when the server times out, the client still continues.
If it did not finish, then there is probably an error message.

--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]


Re: Estimate timeout

2004-06-11 Thread Steven Schoch
Joshua Baker-LePain wrote:
 "admin prohibited" is definitely a result of iptables filtering.
 Have a close look at homer.  Execute iptables -L.

 Maybe the solution is loading the amanda iptables module,
 if that is available on the machine.
I'd be interested to see if that fixes it.
The following line was added to /etc/sysconfig/iptables:
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp -s XX.XX.XX.0/24 --sport 10080 -j ACCEPT

...where XX.XX.XX is the IP address of our local 'external' network, on 
which both homer and marge are located.

The problem has been solved.
--
Steve



Re: Estimate timeout

2004-06-10 Thread Paul Bijnens
Steven Schoch wrote:
on Wed, 09 Jun 2004 Paul Bijnens wrote:
Try to find out where the UDP packet got dropped, using tcpdump or 
Ethereal or another network analyzer on homer and marge.

Now we're getting somewhere.  The tcpdump shows this:
15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]

My guess is that ICMP message is something to do with a firewall.

"admin prohibited" is definitely a result of iptables filtering.
Have a close look at homer.  Execute iptables -L.
Maybe the solution is loading the amanda iptables module,
if that is available on the machine.
--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]



Re: Estimate timeout

2004-06-10 Thread Joshua Baker-LePain
On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote

 Steven Schoch wrote:
 
  Now we're getting somewhere.  The tcpdump shows this:
  
  15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
  
  My guess is that ICMP message is something to do with a firewall.
 
 
 "admin prohibited" is definitely a result of iptables filtering.
 Have a close look at homer.  Execute iptables -L.
 
 Maybe the solution is loading the amanda iptables module,
 if that is available on the machine.

I'd be interested to see if that fixes it.  My amanda server which runs 
the nightlies of the (small) home partitions has been at RH9 for a while, 
and has this as the only rule it needed to get amdump working:

# If we've an established session, well, okay
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 

I recently moved my other amanda server (which backs up my 4.5TB of RAID 
space) to RH9.  The first few nights, most of the clients were failing 
with estimate timeouts.  But when I tested during the day (with small 
partitions), everything worked.  I finally decided that the estimates on 
the big partitions were taking long enough that the above rule was timing 
out.  I couldn't afford another night of the backups failing, so I didn't 
try loading the amanda module -- I just added rules to allow incoming 
UDP traffic on privileged ports from the clients.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Estimate timeout

2004-06-10 Thread Paul Bijnens
Joshua Baker-LePain wrote:
On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote
Steven Schoch wrote:
Now we're getting somewhere.  The tcpdump shows this:
15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]

My guess is that ICMP message is something to do with a firewall.

"admin prohibited" is definitely a result of iptables filtering.
Have a close look at homer.  Execute iptables -L.
Maybe the solution is loading the amanda iptables module,
if that is available on the machine.

I'd be interested to see if that fixes it.  My amanda server which runs 
the nightlies of the (small) home partitions has been at RH9 for a while, 
and has this as the only rule it needed to get amdump working:

# If we've an established session, well, okay
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 

I recently moved my other amanda server (which backs up my 4.5TB of RAID 
space) to RH9.  The first few nights, most of the clients were failing 
with estimate timeouts.  But when I tested during the day (with small 
partitions), everything worked.  I finally decided that the estimates on 
the big partitions were taking long enough that the above rule was timing 
out.  I couldn't afford another night of the backups failing, so I didn't 
try loading the amanda module -- I just added rules to allow incoming 
UDP traffic on privileged ports from the clients.

I have been thinking about this problem and, without any real testing
to back up my hypothesis, I believe the problem lies in the default
iptables timeout for UDP traffic, as you concluded too.
For TCP traffic, once a packet has been answered, the timeout becomes very
large (5 days or so, I believe).  But for UDP, which is a connectionless
protocol, the timeout is 180 seconds (I believe).
After this timeout the connection tracking drops the rule.
In my config, the estimates of the clients in the DMZ all take less than
2 minutes, and this works fine.
That means that the real solution is to compile Amanda with a dedicated
UDP port range, and add that range to the firewall's iptables rules.
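One way to check this hypothesis against a client's amandad debug file is to measure how long the client stayed silent between acknowledging the request and sending its reply. A minimal Python sketch, assuming the "amandad: time N: ..." line format shown elsewhere in this thread; the 180-second figure is the unverified value mentioned above, and the function and constant names are invented for the example.

```python
import re

# Matches e.g. "amandad: time 447.906: sending REP packet:"
TIME_RE = re.compile(r"^amandad: time (?P<t>\d+(?:\.\d+)?): (?P<event>.+)$")


def reply_delay(log_text):
    """Seconds between the first 'sending ack' and the first 'sending REP packet'.

    Returns None if either event is missing from the log text.
    """
    ack_time = rep_time = None
    for line in log_text.splitlines():
        m = TIME_RE.match(line.strip())
        if m is None:
            continue
        t = float(m.group("t"))
        event = m.group("event")
        if ack_time is None and event.startswith("sending ack"):
            ack_time = t
        elif rep_time is None and event.startswith("sending REP packet"):
            rep_time = t
    if ack_time is None or rep_time is None:
        return None
    return rep_time - ack_time


# Lines copied from the amandad debug file posted in this thread.
sample = """\
amandad: time 0.003: got packet:
amandad: time 0.004: sending ack:
amandad: time 0.009: running service /usr/local/libexec/sendsize
amandad: time 447.906: sending REP packet:
"""

ASSUMED_UDP_CONNTRACK_TIMEOUT = 180.0  # seconds; an assumption, not verified

delay = reply_delay(sample)
print(f"client was silent for {delay:.3f} s")
if delay > ASSUMED_UDP_CONNTRACK_TIMEOUT:
    print("longer than the assumed conntrack timeout; the reply may be dropped")
```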
--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]



Re: Estimate timeout

2004-06-10 Thread Joshua Baker-LePain
On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote

 I have been thinking about this problem, and, without any real testing
 to back up my hypothesis, I believe the problem lies in the default
 timeout in iptables for UDP traffic, as you decided too.
 
 For TCP traffic, once a packet is replied, the timeout becomes very
 large (5 days or so I believe).  But for UDP, which is a connectionless
 protocol the timeout is 180 seconds (I believe).
 After this timeout the connection tracking drops the rule.

Is this true even with ip_conntrack_amanda loaded?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Estimate timeout

2004-06-10 Thread Paul Bijnens
Joshua Baker-LePain wrote:
On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote

I have been thinking about this problem, and, without any real testing
to back up my hypothesis, I believe the problem lies in the default
timeout in iptables for UDP traffic, as you decided too.
For TCP traffic, once a packet is replied, the timeout becomes very
large (5 days or so I believe).  But for UDP, which is a connectionless
protocol the timeout is 180 seconds (I believe).
After this timeout the connection tracking drops the rule.

Is this true even with ip_conntrack_amanda loaded?

I should have a look at the source code, or find a detailed doc that
explains it, to find out.
Anyway that module should somehow know the etimeout parameter
of amanda.conf, which of course it does not know, or otherwise allow
a really really large timeout, like a few hours.  Or should be tuneable
somehow (in the amanda-tradition that could be hardcoded at compile time).
--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]



Re: Estimate timeout

2004-06-10 Thread Joshua Baker-LePain
On Thu, 10 Jun 2004 at 2:11pm, Paul Bijnens wrote

  Is this true even with ip_conntrack_amanda loaded?
 
 
 I should have a look at the source code, or find a detailed doc that
 explains it, to find out.
 
 Anyway that module should somehow know the etimeout parameter
 of amanda.conf, which of course it does not know, or otherwise allow
 a really really large timeout, like a few hours.  Or should be tuneable
 somehow (in the amanda-tradition that could be hardcoded at compile time).

It seems to be tuneable.  From the header of the source code:

*   Module load syntax:
 *  insmod ip_conntrack_amanda.o [master_timeout=n]
 *  
 *  Where master_timeout is the timeout (in seconds) of the master
 *  connection (port 10080).  This defaults to 5 minutes but if
 *  your clients take longer than 5 minutes to do their work
 *  before getting back to the Amanda server, you can increase
 *  this value.

I should test it one of these nights...

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Estimate timeout

2004-06-10 Thread Paul Bijnens
Joshua Baker-LePain wrote:
It seems to be tuneable.  From the header of the source code:
*   Module load syntax:
 *  insmod ip_conntrack_amanda.o [master_timeout=n]
 *  
 *  Where master_timeout is the timeout (in seconds) of the master
 *  connection (port 10080).  This defaults to 5 minutes but if
 *  your clients take longer than 5 minutes to do their work
 *  before getting back to the Amanda server, you can increase
 *  this value.
I should test it one of these nights...
Wow!  Learning something new every day!
--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]



Re: Estimate timeout

2004-06-10 Thread Gene Heskett
On Thursday 10 June 2004 07:59, Joshua Baker-LePain wrote:
On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote

 I have been thinking about this problem, and, without any real
 testing to back up my hypothesis, I believe the problem lies in the
 default timeout in iptables for UDP traffic, as you decided too.

 For TCP traffic, once a packet is replied, the timeout becomes
 very large (5 days or so I believe).  But for UDP, which is a
 connectionless protocol the timeout is 180 seconds (I believe).
 After this timeout the connection tracking drops the rule.

Is this true even with ip_conntrack_amanda loaded?

I wasn't even aware of such a module, and was surprised by the output 
of a locate!

It's been part of the kernel's netfilter options since back in 2.4.22 or 
earlier, so if he doesn't have the module, he may 
have to rebuild his kernel to get it.

I hadn't worried about it here since everything I back up with Amanda 
is inside the firewall, or on the firewall itself, but iptables sits 
between the 2 NICs in the firewall that separate inside from outside 
stuff.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


Estimate timeout

2004-06-09 Thread Steven Schoch
It was working for several days, then all of a sudden it stopped and hasn't 
worked since.

Amcheck works fine, but amdump doesn't.
Amdump is run on homer, the system with the tape drive.  Homer is a RedHat 
Enterprise Linux system with amanda version 2.4.4p1.  The system that fails 
to dump is marge, a FreeBSD system with amanda version 2.4.4p2.

The important lines from amanda.conf:

etimeout 1800    # number of seconds per filesystem for estimates.
#etimeout -600   # total number of seconds for estimates.
# a positive number will be multiplied by the number of filesystems on
# each host; a negative number will be taken as an absolute total time-out.
# The default is 5 minutes per filesystem.

From disklist:

marge /var comp-user
marge /usr comp-root
marge / comp-root

From crontab:

45 0 * * 2-6 /usr/sbin/amdump OurDump

In /tmp/amanda on marge, these lines appear in 
amandad.20040609004501000.debug:

amandad: debug 1 pid 22611 ruid 1001 euid 1001: start at Wed Jun  9 00:45:01 2004
amandad: version 2.4.4p2
amandad: build: VERSION=Amanda-2.4.4p2
...
amandad: time 0.003: got packet:

Amanda 2.4 REQ HANDLE 001-389B0608 SEQ 1086767104
SECURITY USER amanda
SERVICE sendsize
...
amandad: time 0.004: sending ack:

Amanda 2.4 ACK HANDLE 001-389B0608 SEQ 1086767104
...
amandad: time 0.009: amandahosts security check passed
amandad: time 0.009: running service /usr/local/libexec/sendsize
amandad: time 447.906: sending REP packet:

Amanda 2.4 REP HANDLE 001-389B0608 SEQ 1086767104
OPTIONS features=feff9ffe0f;
/var 0 SIZE 11520
/var 1 SIZE 1580
/usr 0 SIZE 1166599
/usr 1 SIZE 18710
/ 0 SIZE 39571
/ 1 SIZE 381


amandad: time 457.910: dgram_recv: timeout after 10 seconds
amandad: time 457.910: waiting for ack: timeout, retrying
amandad: time 467.920: dgram_recv: timeout after 10 seconds
amandad: time 467.920: waiting for ack: timeout, retrying
amandad: time 477.930: dgram_recv: timeout after 10 seconds
amandad: time 477.930: waiting for ack: timeout, retrying
amandad: time 487.940: dgram_recv: timeout after 10 seconds
amandad: time 487.941: waiting for ack: timeout, retrying
amandad: time 497.950: dgram_recv: timeout after 10 seconds
amandad: time 497.951: waiting for ack: timeout, giving up!
amandad: time 497.951: pid 22611 finish time Wed Jun  9 00:53:19 2004
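
The pattern in this log (reply sent, then every wait for the ACK times out) can be spotted mechanically. A small Python sketch, assuming the message texts are exactly as in the excerpt above; the function name is hypothetical.

```python
def ack_wait_summary(log_text):
    """Summarise the ACK wait in an amandad debug log.

    Returns (rep_sent, retries, gave_up):
      rep_sent - True if a REP packet was sent
      retries  - number of "timeout, retrying" messages
      gave_up  - True if amandad finally gave up waiting for the ACK
    """
    lines = log_text.splitlines()
    rep_sent = any("sending REP packet" in line for line in lines)
    retries = sum("waiting for ack: timeout, retrying" in line for line in lines)
    gave_up = any("waiting for ack: timeout, giving up!" in line for line in lines)
    return rep_sent, retries, gave_up


# Lines copied from the excerpt above.
sample = """\
amandad: time 447.906: sending REP packet:
amandad: time 457.910: waiting for ack: timeout, retrying
amandad: time 467.920: waiting for ack: timeout, retrying
amandad: time 477.930: waiting for ack: timeout, retrying
amandad: time 487.941: waiting for ack: timeout, retrying
amandad: time 497.951: waiting for ack: timeout, giving up!
"""

rep_sent, retries, gave_up = ack_wait_summary(sample)
if rep_sent and gave_up:
    print(f"reply sent but never acknowledged after {retries} retries")
```

A reply that is sent but never acknowledged points at the network path back to the server rather than at the estimate itself.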

On homer, in amdump.1 these lines:

amdump: start at Wed Jun  9 00:45:01 PDT 2004
amdump: datestamp 20040609
planner: pid 9813 executable /usr/lib/amanda/planner version 2.4.4p1
planner: build: VERSION=Amanda-2.4.4p1
...
setup_estimate: marge:/var: command 0, options:
   last_level 0 next_level0 21 level_days 0
   getting estimates 0 (11503) 1 (0) -1 (-1)
planner: time 0.125: setting up estimates for marge:/usr
setup_estimate: marge:/usr: command 0, options:
   last_level 0 next_level0 21 level_days 0
   getting estimates 0 (1163201) 1 (0) -1 (-1)
planner: time 0.135: setting up estimates for marge:/
setup_estimate: marge:/: command 0, options:
   last_level 0 next_level0 21 level_days 0
   getting estimates 0 (39486) 1 (0) -1 (-1)
...
planner: time 223.483: got result for host homer disk /home: 0 - 4642543K, 4 - 899568K, -1 - -1K
planner: time 10801.886: error result for host marge disk /: Estimate timeout from marge
planner: time 10801.886: error result for host marge disk /usr: Estimate timeout from marge
planner: time 10801.886: error result for host marge disk /var: Estimate timeout from marge
planner: time 10801.886: getting estimates took 10801.690 secs


It looks like homer was waiting a sufficient time for marge to reply, but the 
reply was dropped.
Marge and homer are on the same switch.
--
Steve




Re: Estimate timeout

2004-06-09 Thread Paul Bijnens
Steven Schoch wrote:
> It was working for several days, then all of a sudden it stopped and
> hasn't worked since.

The first thing to ask is: what has changed since then?
Installed something?  Reconfigured something?  Rebooted the system?

> amandad: time 447.906: sending REP packet:

It took less than 550 seconds to estimate all of it.

> planner: time 10801.886: error result for host marge disk /: Estimate

and the server timed out after 3 DLEs * 2 levels * 1800 sec = 10800 seconds.

> It looks like homer was waiting a sufficient time for marge to reply, but
> the reply was dropped.

Yes, indeed.

> Marge and homer are on the same switch.

Are there other clients besides marge?
Is there a local firewall activated on homer?
Try to find out where the UDP packet got dropped, using tcpdump,
Ethereal, or another network analyzer on homer and marge.
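
The timeout arithmetic above can be sketched as a small shell helper. This is only an illustration of the formula quoted in this message (DLEs * estimate levels * per-estimate timeout); the function name estimate_budget is hypothetical and is not part of Amanda.

```shell
#!/bin/sh
# Hypothetical helper (not an Amanda command): total seconds the server
# waits for one client's estimates, using the formula quoted above.
#   $1 = number of DLEs on the client
#   $2 = number of estimate levels requested per DLE
#   $3 = per-estimate timeout in seconds
estimate_budget() {
    echo $(( $1 * $2 * $3 ))
}

# marge: 3 DLEs, 2 levels each, 1800 seconds per estimate
estimate_budget 3 2 1800
```

With marge's numbers this gives 10800 seconds, which matches the planner giving up at time 10801.886 in the log.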

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***



Re: Estimate timeout

2004-06-09 Thread Steven Schoch
On Wed, 09 Jun 2004, Paul Bijnens wrote:
> Try to find out where the UDP packet got dropped, using tcpdump or
> Ethereal or other network analyzer on homer and marge.

Now we're getting somewhere.  The tcpdump shows this:

14:54:28.697197 homer.858 > marge.amanda: udp 117 (DF)
14:54:29.176236 marge.amanda > homer.858: udp 50
14:54:29.444159 marge.amanda > homer.858: udp 83
14:54:29.444563 homer.858 > marge.amanda: udp 50 (DF)
14:54:29.445650 homer.858 > marge.amanda: udp 531 (DF)
14:54:29.525614 marge.amanda > homer.858: udp 50
15:01:56.739172 marge.amanda > homer.858: udp 184
15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
15:02:06.743312 marge.amanda > homer.858: udp 184
15:02:06.743992 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]

My guess is that the ICMP message has something to do with a firewall.
--
Steve



Re: Estimate timeout on Mac OS X

2004-04-05 Thread Gene Heskett
On Monday 05 April 2004 01:25, David Chin wrote:
> Hi,
>
> I've almost got amanda to run on a PowerBook G4 with Mac OS X.3.3.
> Right now, I have it set up with virtual tapes on a separate
> disk. Everything goes well, including the amcheck, but when I run
> amdump, the backup doesn't go. The mailed log of the run is below.
>
> Can someone point me in the right direction?
>
> Thanks in advance,
> --Dave
>
> These dumps were to tape daily11.
> The next tape Amanda expects to use is: a new tape.
> The next new tape already labelled is: daily01.
>
> FAILURE AND STRANGE DUMP SUMMARY:
>   localhost  /Users/drauh lev 0 FAILED [Estimate timeout from localhost]
>
> STATISTICS:
>                           Total       Full      Daily
>                         --------   --------   --------
> Estimate Time (hrs:min)    0:15
> Run Time (hrs:min)         0:15
> Dump Time (hrs:min)        0:00       0:00       0:00
> Output Size (meg)           0.0        0.0        0.0
> Original Size (meg)         0.0        0.0        0.0
> Avg Compressed Size (%)     --         --         --
> Filesystems Dumped            0          0          0
> Avg Dump Rate (k/s)         --         --         --
>
> Tape Time (hrs:min)        0:00       0:00       0:00
> Tape Size (meg)             0.0        0.0        0.0
> Tape Used (%)               0.0        0.0        0.0
> Filesystems Taped             0          0          0
> Avg Tp Write Rate (k/s)     --         --         --
>
> USAGE BY TAPE:
>   Label          Time      Size      %    Nb
>   daily11        0:00       0.0    0.0     0
>
> NOTES:
>   planner: Adding new disk localhost:/Users/drauh.
>   driver: WARNING: got empty schedule from planner
>   taper: tape daily11 kb 0 fm 0 [OK]
>
> DUMP SUMMARY:
>                                 DUMPER STATS            TAPER STATS
> HOSTNAME  DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
> ------------------------- ---------------------------------- -------------
> localhost -sers/drauh 0 FAILED

Here is the first potential problem.  Even with all the warnings
plastered all over the FAQ and Docs, folks still insist on using a
universal name instead of the FQDN.

Second, did you build amanda as the user amanda, then become root to
do the make install?  I'm thinking, just a hunch because you've not
posted enough info, that there is either a permissions problem or
an .amandahosts problem, but in the latter case it will usually tell
you about it quite plainly.

> ---
>
> (brought to you by Amanda version 2.4.4p2)

While it should run ok, 2.4.4p2 is beginning to get a bit long in the 
tooth. We're up to 2.4.5beta something or other, and I've not found 
anything beta about it.  It just works.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


Re: Estimate timeout on Mac OS X

2004-04-05 Thread David Chin
On 5 Apr 2004, at 03:02, Gene Heskett wrote:
> Here is the first potential problem.  Even with all the warnings
> plastered all over the FAQ and Docs, folks still insist on using a
> universal name instead of the FQDN.

Yes, I knew about the problems with a universal name. I just wanted
to get something up quickly as a test. My machine sits NATted at home
and doesn't have a real FQDN.

But anyway, I changed it:

1. make my wireless AP give my machine a fixed address
2. add an entry to /etc/hosts --
  192.168.0.111 myhostname

> Second, did you build amanda as the user amanda, then become root to
> do the make install?

I decided to avoid all permission stuff by running everything as root.
Yes, I know the dangers, and I am willing to live with the risk for now.

> While it should run ok, 2.4.4p2 is beginning to get a bit long in the
> tooth. We're up to 2.4.5beta something or other, and I've not found
> anything beta about it.  It just works.

Are you running it on OS X? I had to edit some of the source to get it
to compile. I have 2.4.4p2 running in my lab - RH7.3 server, mix of RH,
Fedora, and HP-UX clients - so I figured I'd stick with something I
knew was already working.

No dice, still. I'll try setting up a separate amanda user first, and
then go on and try the beta code. I'll dig around in the code as a last
resort, since Google didn't find me any interesting links for a search
on 'estimate timeout amanda'.
--Dave

These dumps were to tape daily12.
The next tape Amanda expects to use is: a new tape.
The next new tape already labelled is: daily01.

FAILURE AND STRANGE DUMP SUMMARY:
  Ginger     /Users/drauh lev 0 FAILED [Estimate timeout from Ginger]

STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:15
Run Time (hrs:min)         0:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)         --         --         --

Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)     --         --         --

USAGE BY TAPE:
  Label          Time      Size      %    Nb
  daily12        0:00       0.0    0.0     0


NOTES:
  planner: Adding new disk Ginger:/Users/drauh.
  driver: WARNING: got empty schedule from planner
  taper: tape daily12 kb 0 fm 0 [OK]


DUMP SUMMARY:
                                DUMPER STATS            TAPER STATS
HOSTNAME  DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
------------------------- ---------------------------------- -------------
Ginger    -sers/drauh 0 FAILED ---------------------------------------

(brought to you by Amanda version 2.4.4p2)



Re: Estimate timeout on Mac OS X

2004-04-05 Thread Gene Heskett
On Monday 05 April 2004 04:07, David Chin wrote:
> On 5 Apr 2004, at 03:02, Gene Heskett wrote:
>> Here is the first potential problem.  Even with all the warnings
>> plastered all over the FAQ and Docs, folks still insist on using
>> a universal name instead of the FQDN.
>
> Yes, I knew about the problems with a universal name. I just wanted
> to get something up quickly as a test. My machine sits NATted at
> home and doesn't have a real FQDN.
>
> But anyway, I changed it:
>
> 1. make my wireless AP give my machine a fixed address
> 2. add an entry to /etc/hosts --
>
>    192.168.0.111 myhostname
>
>> Second, did you build amanda as the user amanda, then become root
>> to do the make install?
>
> I decided to avoid all permission stuff by running everything as
> root. Yes, I know the dangers, and I am willing to live with the
> risk for now.

amanda checks to see who she is, and amdump will not run as root.
Tear it all back out and reinstall according to the instructions.
This requirement is a security related requirement, and really isn't
open for discussion.  Where amanda needs root perms, she will do a
suid root to gain the perms she needs.  Make a normal user 'amanda'
and make this user a member of the group 'disk' or 'backup'.  As
root, do a chown -R amanda:disk amanda-2.4.5b1-20040326 (if that's
the name of the src tree) before starting the build.  I maintain
these src trees in /home/amanda here.  You'll also need to change the
perms on the tarball itself because lately the tarballs are not owned
by amanda if root does the download.  Minor detail.

I also use a script to do the configuration and initial make because
it's consistent and repeatable from snapshot to snapshot without
relying on my aged, occasionally fading memory. I copy this script
into the new src tree when a new snapshot comes out, and run it from
the top level directory of the src.

The script:
---------------- gh.cf ----------------
#!/bin/sh
# Configure and build Amanda from the top of the source tree.
# since I'm always forgetting to su amanda...
if [ "$(whoami)" != 'amanda' ]; then
    echo
    echo "Warning"
    echo "Amanda needs to be configured and built by the user amanda,"
    echo "but must be installed by the user root."
    echo
    exit 1
fi
# Start from a clean tree so stale configure results are not reused.
make clean
rm -f config.status config.cache
./configure --with-user=amanda \
    --with-group=disk \
    --with-owner=amanda \
    --with-tape-device=/dev/nst0 \
    --with-changer-device=/dev/sg1 \
    --with-gnu-ld --prefix=/usr/local \
    --with-debugging=/tmp/amanda-dbg/ \
    --with-tape-server=FQDN.of.the.server \
    --with-amandahosts \
    --with-configdir=/usr/local/etc/amanda

make
------------- end of script -------------

Remove the changer device line if you don't have a robotic changer.
The Fully Qualified Domain Name (FQDN) of the tape server (or its IP
address) must be used.

Adjust the device name to be whatever the NON-rewinding on file close
device is on your system.  Set the x bit (chmod +x script.name).
Become amanda and execute it with ./script.name.  Then become root
and do a make install.

I doubt you'll need to do it, but the estimate timeout value
('etimeout' in your amanda.conf), which defaults to 10 minutes
(600 seconds) per disklist entry, might have to be increased.  I did
that early on when it was running on a much slower machine, but now
on this box a 44 member disklist typically takes 22 minutes to
estimate.  The backup will in any event commence when all estimates
have been obtained, or have timed out, which is unlikely on today's
hardware such as your G4.
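
To put those numbers side by side, here is a rough sketch that compares an observed estimate time with the budget implied by a per-entry timeout. It assumes the simple model described above (etimeout multiplied by the number of disklist entries) and uses the 600-second figure quoted in this message; the function name fits_budget is hypothetical and is not an Amanda tool.

```shell
#!/bin/sh
# Hypothetical check (not part of Amanda): does an observed estimate run
# fit inside entries * etimeout?  Assumes the simple per-entry model.
#   $1 = number of disklist entries
#   $2 = etimeout in seconds
#   $3 = observed estimate time in minutes
fits_budget() {
    budget_min=$(( $1 * $2 / 60 ))
    if [ "$3" -le "$budget_min" ]; then
        echo "ok: $3 min of $budget_min min budget"
    else
        echo "over: $3 min exceeds $budget_min min budget"
    fi
}

# 44 disklist entries, 600 seconds each, 22 minutes observed
fits_budget 44 600 22
```

Under these assumptions the 44-entry disklist has a 440 minute budget, so a 22 minute estimate run is nowhere near the limit.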

[...]

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


Estimate timeout on Mac OS X

2004-04-04 Thread David Chin
Hi,

I've almost got amanda to run on a PowerBook G4 with Mac OS X.3.3. Right
now, I have it set up with virtual tapes on a separate disk.
Everything goes well, including the amcheck, but when I run amdump,
the backup doesn't go. The mailed log of the run is below.

Can someone point me in the right direction?

Thanks in advance,
--Dave

These dumps were to tape daily11.
The next tape Amanda expects to use is: a new tape.
The next new tape already labelled is: daily01.

FAILURE AND STRANGE DUMP SUMMARY:
  localhost  /Users/drauh lev 0 FAILED [Estimate timeout from localhost]

STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:15
Run Time (hrs:min)         0:15
Dump Time (hrs:min)        0:00       0:00       0:00
Output Size (meg)           0.0        0.0        0.0
Original Size (meg)         0.0        0.0        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped            0          0          0
Avg Dump Rate (k/s)         --         --         --

Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Avg Tp Write Rate (k/s)     --         --         --

USAGE BY TAPE:
  Label          Time      Size      %    Nb
  daily11        0:00       0.0    0.0     0


NOTES:
  planner: Adding new disk localhost:/Users/drauh.
  driver: WARNING: got empty schedule from planner
  taper: tape daily11 kb 0 fm 0 [OK]


DUMP SUMMARY:
                                DUMPER STATS            TAPER STATS
HOSTNAME  DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
------------------------- ---------------------------------- -------------
localhost -sers/drauh 0 FAILED ---------------------------------------

(brought to you by Amanda version 2.4.4p2)




estimate timeout

2003-12-04 Thread Mats Blomstrand
Hi all
'planner' just told me 'estimate timeout' ... on the new archive config
I'm testing. The normal backup works OK, even on level 0 dumps from the
same host.

Any ideas what I can do about it?
//Mats


