Re: Why Oh Why only THIS DLE is giving me those timeout problems ?

2005-08-31 Thread Geert Uytterhoeven
On Wed, 31 Aug 2005, Steve Wray wrote:
 Geert Uytterhoeven wrote:
  On Tue, 30 Aug 2005, Graeme Humphries wrote:
  
 Guy Dallaire wrote:
 
 Yes, thanks. I know about hard links. But how would it impact the size
 or performance of my backups ?
 
 
 Well, if a file is hard linked multiple times, it'll be backed up multiple
 times. Therefore, a filesystem with tons of hard links will take a really
 long time to back up. :)
  
  Fortunately tar is sufficiently smart to back it up only once.
  
  Usually the problem with lots of hard links is not the data timeout value,
  but the estimate timeout value, as I found out the hard way[*].
 
 We've been having similar problems with estimates timing out. I just
 ran the 'find' command given in an earlier email and found a grand total
 of 607 hard links on the entire filesystem.
 
 What I'm wondering is, does 607 count as 'lots' WRT amanda estimate
 timeouts?

Not really, given I have many files with more than 600 hard links.
I seem to have 1582186 of them in my cluster of Linux kernel source trees.
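
For anyone who wants to reproduce this kind of count without the 'find'
command from the earlier mail, here is a minimal sketch. It counts file
names (not inodes) whose link count is above 1, which is roughly what a
find for regular files with more than one link would report. The function
name and the tiny temporary tree are made up for illustration only.

```python
import os
import stat
import tempfile

def count_hard_linked_names(root):
    """Count regular-file names under root whose inode has more than one link."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                count += 1
    return count

# Small demonstration tree: "a" and "b" are two names for one inode,
# "c" is an ordinary file with a single link.
with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "a")
    with open(original, "w") as f:
        f.write("data")
    os.link(original, os.path.join(root, "b"))
    with open(os.path.join(root, "c"), "w") as f:
        f.write("other")
    demo_count = count_hard_linked_names(root)

print(demo_count)
```

On a real filesystem you would call count_hard_linked_names("/") or the DLE's
mount point; the walk does not stop at filesystem boundaries, so treat the
result as an approximation.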

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds


Re: This is retarded.

2005-08-31 Thread Joe Rhett
  On 8/30/05, Joe Rhett [EMAIL PROTECTED] wrote:
  tracking ability, but let's use 3 tapes and write not a single byte to
  them?

On Tue, Aug 30, 2005 at 06:35:12PM -0400, Jon LaBadie wrote:
 And not using samba, right Joe :))
 
Right.  Straight amanda clients, some Solaris, some Linux, lots of Freebsd,
and some Windows.  But all samba native clients 2.4.4p2 or later.

 The dumps were flushed to tapes svk17, svk18, svk19.
 The next 7 tapes Amanda expects to used are: svk20, svk21, svk01,
 svk02, svk03, svk04, svk05.
 
 Looks like runtapes is 7.
 
 Output Size (meg)   0.0   0.0   0.0
 
 Nothing made it to any holding disk.
 
This was a flush.  It's already on the holding disk.

   taper: tape svk17 kb 34286880 fm 1 writing file: short write
   taper: retrying customer-plat1:/.1 on new tape: [writing file: short write]
   taper: tape svk18 kb 34227328 fm 1 writing file: short write
   taper: retrying customer-plat1:/.1 on new tape: [writing file: short write]
   taper: tape svk19 kb 0 fm 0 [OK]
 
 This looks like the dump was going directly to tape and was too large to
 fit your 35GB tape.  So it was retried and of course was still too big.
   customer- / lev 1 FAILED 20050825 [too many taper retries]
 
Yes.  The item it is trying to flush is 53 1gb tar block files.

du -ks /amandadump/20050825
53364786    20050825

 This seems to be an improvement over what I would have expected.
 I recall amanda continuing through any and all runtapes tapes.
 It would have done 7 attempts in your config.  At least now it
 stops after a couple of attempts.
 
 As to whether amanda's behavior is reasonable, well ...
 
 It is nearly impossible, perhaps totally impossible, to tell if the taping
 failed because of reaching the end of the tape or a tape or hardware error.
 Is having a backup important enough to continue and try again or should the
 first failure, possibly a bad/worn out tape terminate all the remaining
 backups?
 
 I don't think there is a simple answer.  What would be your recommendation?
 
I don't think this is the answer.  The real answer is that the filesystem
it is trying to flush was 53gb in size when you put all the chunks
together.  That won't fit on a 33gb tape.

1. Why aren't we backing up chunks to different tapes yet?  Amanda is the
only backup software which doesn't handle this.

-and more importantly-

2. Why is it trying to back up 53gb to a 33gb tape definition?

#2 is clearly a bug.  #1 is a feature request long overdue, but #2 is
clearly the bug.

-- 
Joe Rhett
senior geek
meer.net


Re: This is retarded.

2005-08-31 Thread Joe Rhett
On Tue, Aug 30, 2005 at 06:01:07PM -0500, Frank Smith wrote:
 In order to tell if Amanda is deficient in some way, we need some more
 information.  According to the original post, this was an amflush run, so
 this DLE must be sitting on a holdingdisk.  Since normally Amanda refuses
 to back up a DLE that won't fit on a tape and gives a warning, I see two
 possibilities:
 1. Your tapelength is set incorrectly in your config, so Amanda thinks a
    dump will fit when it won't

Tapelength is actually less than real tape length.  We adjusted this well 
below the real length of DLT tapes some time ago to avoid amanda 
miscalculation errors.

 2. For some reason the dump ended up larger than the estimate, perhaps due
    to a changing filesystem or using H/W compression on already compressed
    data.
 
We don't use compression anywhere.  Never.  None of our backup definitions
have compression enabled.

 Is your tapelength set to something less than 34.2GB?

Yes, it's set to 30gb.  Much less than the real capacity.

 How big is the dump in the holdingdisk (not how big the chunks are, if
 there are more than one, but the total of all the chunks of that DLE)?
 
53 1gb chunks.

 With that information we could find where the problem lies and maybe find
 a solution.
 
Some part of the code mistakenly decided to store 53gb to the holding disk,
and then tried to flush it to tape, regardless of the fairly basic math
involved.

This works fine without holding disks, or when the holding disk is too
small for the backup.  The logic flaw must be somewhere in the
"tape not available, store to holding disk for later" logic and/or the
"flush this to tape" logic not checking the sizes involved.

I am still strongly of the opinion that amanda's handling of DLEs is still
its strongest failing point.  We're hacking vainly at something which needs
to be redesigned to work properly.

-- 
Joe Rhett
senior geek
meer.net


dump larger than tape

2005-08-31 Thread Geert Uytterhoeven
Hi,

I just got this in my daily report from Amanda (2.4.5-1, Debian etch/testing):

| FAILURE AND STRANGE DUMP SUMMARY:
| 
| [...]
| 
|   host dle lev 5 FAILED [dump larger than tape, -1 KB, skipping incremental]
| 
| [...]
| 
| DUMP SUMMARY:
| 
| [...]
| 
| host dle 5 FAILED
|
| [...]

The funny thing is that this DLE is only about 5 GiB large (according to
du), while the tapetype length is 2 mbytes.

Anyone ever seen this before?

Thx!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds


extend chg-mtx

2005-08-31 Thread rangzen
Hello,
The documented difference between chg-mtx and chg-zd-mtx seems to be only
this: "The mtx program must support commands such as `-s', `-l' and `-u'.
If the one you've got requires `status', `load' and `unload', you should
use chg-zd-mtx instead."
If that really is the main difference, would it be possible to just add a
variable like UseFullOption = yes or no, set StatusOption, LoadOption and
UnloadOption accordingly (-s or status, -l or load, -u or unload), and use
them wherever the script calls mtx?
Or is that too simple?
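
A minimal sketch of the idea, written in Python rather than in the changer
script's own shell so it can be tried in isolation. UseFullOption,
StatusOption, LoadOption and UnloadOption are the names proposed above;
the function names and the extra_args parameter are hypothetical, and no
claim is made about which other arguments a given mtx build expects.

```python
# Short-form options from the chg-mtx documentation quoted above.
SHORT_OPTIONS = {"status": "-s", "load": "-l", "unload": "-u"}

def mtx_options(use_full_option):
    """Return the StatusOption/LoadOption/UnloadOption values for the chosen style."""
    if use_full_option:
        # Full-word style: the option is the action name itself.
        return {action: action for action in SHORT_OPTIONS}
    return dict(SHORT_OPTIONS)

def build_mtx_call(action, use_full_option, extra_args=()):
    """Build the argument list the script would run for one changer action."""
    options = mtx_options(use_full_option)
    return ["mtx", options[action], *extra_args]

print(build_mtx_call("status", use_full_option=False))
print(build_mtx_call("load", use_full_option=True, extra_args=["3"]))
```

Whether this is enough depends on whether the two scripts differ only in
option spelling; if they also parse the status output differently, a single
switch would not cover it.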
-- 
freedom - share - respect



Re: Help restoring

2005-08-31 Thread David Golden
On 2005-08-30 13:25:09 -0400, Jon LaBadie wrote:
  As to the other suggestions, my /dev/tape is symlinked to /dev/nst0, so is
  not rewinding.
 
 Typically 'tape' references st0 and 'ntape' references nst0.

OTOH, on my install of CentOS 4, something (udev, which manages /dev
on most new Linux-kernel-based distros) has automagically
made the symlink from tape to nst0, not st0. Not saying it's right,
or conventional in the wider *ixoid world, just that that's what
happened and that's what a fresh install of CentOS 4 (and
therefore presumably RHEL4) will do.

[EMAIL PROTECTED] ~]# grep -R tape /etc/udev/rules.d/
/etc/udev/rules.d/50-udev.rules:KERNEL=qft0,  SYMLINK=ftape
/etc/udev/rules.d/50-udev.rules:KERNEL=nst[0-9]*, SYMLINK=tape%e
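
Since the symlink target evidently differs between systems, it may be worth
checking rather than assuming. A small sketch: resolve the link and look at
the device name. It assumes the Linux naming discussed above, where nst*
is the non-rewinding node and st* the rewinding one; the function name is
made up, and the demonstration uses a throwaway directory in place of /dev.

```python
import os
import tempfile

def describe_tape_link(device_path):
    """Resolve symlinks; report the target and whether it looks non-rewinding."""
    target = os.path.realpath(device_path)
    name = os.path.basename(target)
    return target, name.startswith("nst")

# Stand-in for /dev: a fake nst0 node with a "tape" symlink pointing at it.
with tempfile.TemporaryDirectory() as dev:
    open(os.path.join(dev, "nst0"), "w").close()
    os.symlink(os.path.join(dev, "nst0"), os.path.join(dev, "tape"))
    target, non_rewinding = describe_tape_link(os.path.join(dev, "tape"))

print(os.path.basename(target), non_rewinding)
```

On a real machine the call would be describe_tape_link("/dev/tape").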



Re: This is retarded.

2005-08-31 Thread Joe Rhett
On Wed, Aug 31, 2005 at 12:41:43AM -0700, Joe Rhett wrote:
 Some part of the code mistakenly decided to store 53gb to the holding disk,
 and then tried to flush it to tape, regardless of the fairly basic math
 involved.
 
And well, tonight, it does it again.

USAGE BY TAPE:
  Label   Time      Size      %    Nb
  svk17   0:00       0.0    0.0     0
  svk18   0:00       0.0    0.0     0
  svk19   1:32   15181.6   45.1    42
  svk20   0:00       0.0    0.0     0

Yeah, I didn't need those tapes for anything else.  Let's just waste 12
hours and 3 tapes doing absolutely NOTHING.

-- 
Joe Rhett
senior geek
meer.net


estimate server initial value?

2005-08-31 Thread Graeme Humphries
This may be a RTFM question, but I can't see the answer in the 
amanda.conf man page (http://www.amanda.org/docs/amanda.conf.5.html):


When using "estimate server", is there a way to configure what the
initial estimate of a disk entry is before there's any historical data?
It looks like Amanda's defaulting to about 5MB, and I'd rather it
defaulted to closer to 1GB.


Graeme

--
Graeme Humphries ([EMAIL PROTECTED])
(306) 955-7075 ext. 485

My views are not the views of my employers.



Re: This is retarded.

2005-08-31 Thread Frank Smith
--On Wednesday, August 31, 2005 15:46:36 -0700 Joe Rhett [EMAIL PROTECTED] wrote:

 On Wed, Aug 31, 2005 at 12:41:43AM -0700, Joe Rhett wrote:
 Some part of the code mistakenly decided to store 53gb to the holding disk,
 and then tried to flush it to tape, regardless of the fairly basic math
 involved.
  
 And well, tonight, it does it again.
 
 USAGE BY TAPE:
   Label   Time      Size      %    Nb
   svk17   0:00       0.0    0.0     0
   svk18   0:00       0.0    0.0     0
   svk19   1:32   15181.6   45.1    42
   svk20   0:00       0.0    0.0     0
 
 Yeah, I didn't need those tapes for anything else.  Let's just waste 12
 hours and 3 tapes doing absolutely NOTHING.

I'm still curious how a 53gb dump was even done if your tapelength was
set to 30ish GB.  Was your tapelength set to something longer at the
time the dump was originally done?

In the meantime, to quit burning tapes, set autoflush off (or is it false?)
to let the old dump stay on disk while continuing your current backups
and researching the problem.  Or are you saying it did a new 53gb dump?

Yes, I agree that Amanda shouldn't mark a tape as used if nothing was
successfully written to it. You could edit your tapelist to 'unuse' a
tape, just be aware you can screw things up if you get it wrong.

Frank

 
 -- 
 Joe Rhett
 senior geek
 meer.net



-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501



Re: planner timeouts

2005-08-31 Thread Alexander Jolk

Charles Sprickman wrote:
h13 (client) debug logs.  Note that there is two-way communication, and 
everything seems to go correctly.  In the debug dir, there are only 
amandad debug logs, nothing else.


That doesn't sound right to me.  There should be a sendbackup log file 
as well, a runtar one, and so on.  Can you verify your inetd config on 
that particular client, to see whether there's something afoul?  Have a 
look at the system logs as well, while you're at it.  amanda might be 
unable to run any secondary programs, for instance.



GETTING ESTIMATES...
planner: time 30.956: error result for host h13.blah.com disk /spool: Request to h13.blah.com timed out.
planner: time 30.956: error result for host h13.blah.com disk /var/qmail/bin: Request to h13.blah.com timed out.
planner: time 30.956: error result for host h13.blah.com disk /var/qmail/control: Request to h13.blah.com timed out.
planner: time 30.956: error result for host h13.blah.com disk /var/db/pkg: Request to h13.blah.com timed out.
planner: time 30.956: error result for host h13.blah.com disk /usr/local/: Request to h13.blah.com timed out.
planner: time 30.956: error result for host h13.blah.com disk /home: Request to h13.blah.com timed out.
planner: time 30.956: error result for host h13.blah.com disk /: Request to h13.blah.com timed out.

planner: time 30.956: getting estimates took 30.811 secs


Does that spell a 30s timeout somewhere?  amanda.conf not taken into 
account, perhaps?  And the obligatory question, did you double-check 
that there's no firewall between that particular client and server?  (If 
you did, triple-check. :-) )
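
When many clients are involved it can help to pull the timed-out hosts and
disks out of the planner output mechanically. A rough sketch, assuming the
line layout shown in the log excerpt above and that each log entry sits on
a single line (mail-wrapped lines would need rejoining first); the regular
expression and function name are my own, not anything shipped with Amanda.

```python
import re

# Matches lines such as:
# planner: time 30.956: error result for host H disk D: Request to H timed out.
ERROR_LINE = re.compile(
    r"planner: time (?P<time>[\d.]+): error result for host (?P<host>\S+) "
    r"disk (?P<disk>\S+): (?P<msg>.*)"
)

def timed_out_disks(log_text):
    """Map each host to the list of disks whose estimate request timed out."""
    result = {}
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match and "timed out" in match.group("msg"):
            result.setdefault(match.group("host"), []).append(match.group("disk"))
    return result

sample = (
    "planner: time 30.956: error result for host h13.blah.com disk /spool: "
    "Request to h13.blah.com timed out.\n"
    "planner: time 30.956: error result for host h13.blah.com disk /home: "
    "Request to h13.blah.com timed out.\n"
    "planner: time 30.956: getting estimates took 30.811 secs\n"
)
timeouts = timed_out_disks(sample)
print(timeouts)
```

If every disk on one host times out at the same instant, as here, that
points at the host as a whole (inetd, firewall) rather than at one slow
filesystem.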


Alex


--
Alexander Jolk / BUF Compagnie
tel +33-1 42 68 18 28 /  fax +33-1 42 68 18 29



AIT-2 length specifier, et. al.

2005-08-31 Thread Mason Loring Bliss
Hey, all.

I've got an inherited Amanda system I'm managing, and we're using an AIT-2
tape drive / changer. I see the following:

define tapetype AIT-2 {
  comment Generic AIT 2 Drive -- real world numbers
  length 41000 mbytes
  filemark 1000 kbytes
  speed 2920 kps

}

Now, we've got data compression turned on on the tape drive, which should
theoretically make more space, and we also have clients doing compression.

Looking at the Faq-O-Matic, I see this:

http://amanda.sourceforge.net/fom-serve/cache/439.html

In short, the person entering that data got less space and speed with
hardware compression enabled. They were using the same drive we use here.

My questions are profligate, but I'll limit them to these:

1. Why would the length parameter end up being so much shorter with
hardware compression enabled on that drive? This confuses me.

2. Am I to take it from the derived speed parameter that dumps will go
quicker without hardware compression enabled? It seems strange that
hardware compression would create that sort of impact, even when it's
dealing with already-compressed data.

I'll try to schedule some time to run amtapetype at work, but we're in the
middle of a move, so I might not get the chance for a while, and if I can
safely get more speed and space out of my tapes with such a small change,
I'd like to do so.

Thanks kindly and heart-feltedly in advance!

-- 
Mason Loring Bliss [EMAIL PROTECTED]   They also surf who
awake ? sleep : dream; http://blisses.org/ only stand on waves.


Re: This is retarded.

2005-08-31 Thread Jon LaBadie
On Wed, Aug 31, 2005 at 03:46:36PM -0700, Joe Rhett wrote:
 On Wed, Aug 31, 2005 at 12:41:43AM -0700, Joe Rhett wrote:
  Some part of the code mistakenly decided to store 53gb to the holding disk,
  and then tried to flush it to tape, regardless of the fairly basic math
  involved.
  
 And well, tonight, it does it again.
 
 USAGE BY TAPE:
   Label   Time      Size      %    Nb
   svk17   0:00       0.0    0.0     0
   svk18   0:00       0.0    0.0     0
   svk19   1:32   15181.6   45.1    42
   svk20   0:00       0.0    0.0     0
 
 Yeah, I didn't need those tapes for anything else.  Let's just waste 12
 hours and 3 tapes doing absolutely NOTHING.

What's the saying?
If you keep doing what you've been doing
 you'll keep getting what you've been getting!

You know you have a DLE that is too big to tape and that amanda
does not handle it well.  Isn't it time to stop wasting time
and tape and split the DLE?

jl
-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


RE: This is retarded.

2005-08-31 Thread Lengyel, Florian
On Wed, Aug 31, 2005 at 03:46:36PM -0700, Joe Rhett wrote:

 And well, tonight, it does it again.

 USAGE BY TAPE:
 Label Time Size % Nb
 svk17 0:00 0.0 0.0 0
 svk18 0:00 0.0 0.0 0
 svk19 1:32 15181.6 45.1 42
 svk20 0:00 0.0 0.0 0

 Yeah, I didn't need those tapes for anything else. Let's just waste 12
 hours and 3 tapes doing absolutely NOTHING.

What's the saying?
 If you keep doing what you've been doing
 you'll keep getting what you've been getting!

You know you have a DLE that is too big to tape and that amanda
does not handle it well. Isn't it time to stop wasting time
and tape and split the DLE?

jl

Ah, not splitting the DLE is retarded. The whining is a most productive use
of bandwidth, however. I'd like to see a discussion of the code with the
purported error. That would have some intellectual (as opposed to emotional)
content.





Re: This is retarded.

2005-08-31 Thread Jamie Wilkinson
This one time, at band camp, Joe Rhett wrote:
1. Why aren't we backing up chunks to different tapes yet?  Amanda is the
only backup software which doesn't handle this.

-and more importantly-

2. Why is it trying to back up 53gb to a 33gb tape definition?

#2 is clearly a bug.  #1 is a feature request long overdue, but #2 is
clearly the bug.

There's a patch from John Stange that I don't believe has been committed to
CVS, but it takes care of splitting dumps and spanning tapes.  Grep for it
on the amanda-hackers list.  Doesn't fix #2, but should take care of #1 for
you.


Re: AIT-2 length specifier, et. al.

2005-08-31 Thread Frank Smith
--On Wednesday, August 31, 2005 10:17:54 -0400 Mason Loring Bliss [EMAIL PROTECTED] wrote:

 Hey, all.
 
 I've got an inherited Amanda system I'm managing, and we're using an AIT-2
 tape drive / changer. I see the following:
 
 define tapetype AIT-2 {
   comment Generic AIT 2 Drive -- real world numbers
   length 41000 mbytes
   filemark 1000 kbytes
   speed 2920 kps
 
 }
 
 Now, we've got data compression turned on on the tape drive, which should
 theoretically make more space, and we also have clients doing compression.

Generally not good to do both.

 Looking at the Faq-O-Matic, I see this:
 
   http://amanda.sourceforge.net/fom-serve/cache/439.html
 
 In short, the person entering that data got less space and speed with
 hardware compression enabled. They were using the same drive we use here.
 
 My questions are profligate, but I'll limit them to these:
 
 1. Why would the length parameter end up being so much shorter with
 hardware compression enabled on that drive? This confuses me.

Trying to compress already compressed data makes it larger.  Some
tape drives are smart enough to not compress the data again but AIT
isn't one of them (at least up through AIT-3, not certain about -4).
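
The effect is easy to see in software. The sketch below is only an
illustration under stated assumptions: zlib stands in for the drive's
hardware compressor (which uses a different algorithm), and seeded
pseudo-random bytes stand in for already-compressed data, since well
compressed data looks essentially random. The function name, the
100,000-byte size and the seed are arbitrary choices.

```python
import random
import zlib

def recompression_growth(n_bytes, seed=0):
    """Compress n_bytes of incompressible data; return (input size, output size)."""
    rng = random.Random(seed)
    already_compressed = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    recompressed = zlib.compress(already_compressed)
    return len(already_compressed), len(recompressed)

size_in, size_out = recompression_growth(100_000)
print(f"in={size_in} bytes, out={size_out} bytes, growth={size_out - size_in} bytes")
```

The output comes out slightly larger than the input because the compressor
finds nothing to squeeze and still has to add its own framing. How much a
particular tape drive expands such data depends on its algorithm.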

 2. Am I to take it from the derived speed parameter that dumps will go
 quicker without hardware compression enabled? It seems strange that
 hardware compression would create that sort of impact, even when it's
 dealing with already-compressed data.

A tape drive can only write so many MB/sec.  If the data is getting
larger it has to write more (i.e. your 41GB is turning into maybe 46GB).

 I'll try to schedule some time to run amtapetype at work, but we're in the
 middle of a move, so I might not get the chance for a while, and if I can
 safely get more speed and space out of my tapes with such a small change,
 I'd like to do so.

Pick either H/W or S/W compression, but don't do both.  If you decide
to disable H/W compression on your drive, you probably need to look
for Gene's recurring postings on the subject in the archives, as many
drives detect that a tape was previously used compressed and will
re-enable compression even if you think you've disabled it.  With
H/W compression off, you should be able to get close to 50GB on your
AIT-2 drive and not 41GB.

Frank

 Thanks kindly and heart-feltedly in advance!
 
 -- 
 Mason Loring Bliss [EMAIL PROTECTED]   They also surf who
 awake ? sleep : dream; http://blisses.org/ only stand on waves.



-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501



RE: Estimate timeout

2005-08-31 Thread LaValley, Brian E
Well, the tar command by itself is still running, but the backup with the
new version of tar is complete, so my estimate timeout problem is fixed
with an updated tar executable. Thank you all.

-Original Message-
From: Joshua Baker-LePain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:21 AM
To: LaValley, Brian E
Cc: Amanda (E-mail)
Subject: Re: Estimate timeout


On Tue, 30 Aug 2005 at 11:01am, LaValley, Brian E wrote

 sendsize: debug 1 pid 12359 ruid 548 euid 548: start at Mon Aug 29 18:00:02 2005
 sendsize: version 2.4.4p2
 sendsize[12359]: time 0.034: waiting for any estimate child: 1 running
 sendsize[12361]: time 0.035: calculating for amname '/dev/vx/dsk/homedg/homevol', dirname '/home', spindle -1
 sendsize[12361]: time 0.035: getting size via gnutar for /dev/vx/dsk/homedg/homevol level 0
 sendsize[12361]: time 0.092: spawning /home/backup/amanda_sun/libexec/runtar in pipeline
 sendsize[12361]: argument list: /opt/sfw/bin/gtar --create --file /dev/null --directory /home --one-file-system --listed-incremental /home/backup/amanda_sun/var/amanda/gnutar-lists/coneng_dev_vx_dsk_homedg_homevol_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._dev_vx_dsk_homedg_homevol.20050829180002.exclude .

Run this command yourself on the command line (as root) and see how long 
it takes to complete.  Also, what version of tar are you running?
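
If you would rather script the timing than watch a clock, here is a small
sketch. The helper name is made up, and the command it runs below is just a
harmless stand-in so the example works anywhere; for the real test you would
pass the gtar argument list from the sendsize debug log and compare the
elapsed time with the estimate timeout configured in amanda.conf.

```python
import subprocess
import sys
import time

def time_command(argv):
    """Run argv, discard its stdout, and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(argv, stdout=subprocess.DEVNULL)
    return completed.returncode, time.monotonic() - start

# Stand-in command: a Python interpreter that exits immediately.
code, seconds = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, took {seconds:.1f} s")
```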

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: samba backups

2005-08-31 Thread Gregor Ibic

Here is a process list and strace output for smbclient and tar. You can see
that it is stalled while opening one of the files.
I modified the share name and file name for security reasons.

regards,
gregor

19432 ?        S      0:00 /bin/sh /usr/sbin/amdump tape
19442 ?        S      0:20 /usr/libexec/amanda/driver tape
19443 ?        S     21:31 taper tape
19444 ?        S     69:20 dumper0 tape
19445 ?        S      5:36 dumper1 tape
19446 ?        S      3:26 dumper2 tape
19447 ?        S     17:10 taper tape
24836 ?        S      0:00 /usr/libexec/amanda/sendbackup
24838 ?        S      0:00 /bin/gzip --best
26743 ?        S      0:01 /usr/libexec/amanda/sendbackup
26761 ?        S      0:00 pickup -l -t fifo -u
26747 ?        S      0:00 sed -e s/^\.//
26746 ?        S      0:00 /bin/tar -tf -
26745 ?        S      0:00 sh -c /bin/tar -tf - 2>/dev/null | sed -e 's/^\.//'
26744 ?        S      0:07 smbclient \\server\sharename -U username -E -d0 -Tqca
26778 pts/15   R      0:00 ps ax


[EMAIL PROTECTED] root]# strace -p 26744
Process 26744 attached - interrupt to quit
write(2, NT_STATUS_ACCESS_DENIED opening ..., 134 unfinished ...
Process 26744 detached


[EMAIL PROTECTED] root]# strace -p 26745
Process 26745 attached - interrupt to quit
wait4(-1,
Process 26745 detached

[EMAIL PROTECTED] root]# strace -p 26746
Process 26746 attached - interrupt to quit
read(0,  unfinished ...
Process 26746 detached

[EMAIL PROTECTED] root]# strace -p 26747
Process 26747 attached - interrupt to quit
read(0,  unfinished ...
Process 26747 detached

[EMAIL PROTECTED] root]# strace -p 26743
Process 26743 attached - interrupt to quit
read(0,  unfinished ...
Process 26743 detached

[EMAIL PROTECTED] root]# strace -p 24836
Process 24836 attached - interrupt to quit
write(2, filename.txt (\\Data\\..., 50 unfinished ...
Process 24836 detached