Update on Index Tees - Data Timeouts

2002-08-15 Thread Jim Summers

Test #1:

I commented out all filesystems except one of the failing ones and
simply ran amdump.  RESULT:  Worked perfectly.

Test #2:

Added in the final two failing filesystems and ran amdump.
RESULT:  Worked perfectly.

Any suggestions on what it could be?

I have the following set in amanda.conf:

inparallel 16 
netusage  75000 Kbps 
tapebufs 40
reserve 25
for my holdingdisk I have set the chunksize to 1Gb

Which side is most likely to cause a data timeout: the client or the
server?

Are there any suggestions on where to do the compression?

Jon, thank you for your response regarding the ufsdumps.  I had
reverted to ufsdump during that particular run to see if I could ease
some contention somewhere, but to no avail.

Thanks again,
Jim

On Wed, 2002-08-14 at 14:37, Joshua Baker-LePain wrote:
 On 14 Aug 2002 at 2:19pm, Jim Summers wrote
 
  In an effort to debug this problem, is there a way I can interactively
  run the command(s) that amanda would run to see if anything is dumped to
  stdout?  If so, are these the commands in runtar and sendbackup?  Or
  would it be better to comment out all filesystems except one of the ones
  having problems and run amdump?
 
 The exact commands are in both runtar*debug and sendbackup*debug.  It 
 would be interesting to comment out some filesystems to see if it's a 
 function of having too much going on on one host, or if it's something 
 particular to those filesystems.
 
 -- 
 Joshua Baker-LePain
 Department of Biomedical Engineering
 Duke University
 





Re: Update on Index Tees - Data Timeouts

2002-08-15 Thread Jon LaBadie

On Thu, Aug 15, 2002 at 11:08:51AM -0500, Jim Summers wrote:
 Test #1:
 
 I commented out all filesystems except one of the failing ones and
 simply ran amdump.  RESULT:  Worked perfectly.
 
 Test #2:
 
 Added in the final two failing filesystems and ran amdump.
 RESULT:  Worked perfectly.
 
 Any suggestions on what it could be?
 

Perhaps during the problem runs all the file systems were
doing level 0s and the dumps were running in parallel.
That could generate enough activity to slow things down
severely: perhaps network contention, perhaps multiple
dumps from a single host and high CPU usage, perhaps
multiple dumps from a single disk drive and lots of disk
head movement.

Now some are doing incrementals while others are doing level 0s.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road    (609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)



Re: Update on Index Tees - Data Timeouts

2002-08-15 Thread Joshua Baker-LePain

On 15 Aug 2002 at 11:08am, Jim Summers wrote

 Any suggestions on what it could be?
 
 I have the following set in amanda.conf:
 
 inparallel 16 
 netusage  75000 Kbps 
 tapebufs 40
 reserve 25
 for my holdingdisk I have set the chunksize to 1Gb

Maybe you're hitting some network contention problems -- try cranking down 
inparallel?

 Are there any suggestions on where to do the compression?

On the hosts that can handle it.  I have a fast amanda server, and mostly 
fast clients.  For the slower clients, I do compression on the server.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University




Index Tees - Data Timeouts

2002-08-14 Thread Jim Summers

I am running Amanda 2.4.2p2 on Redhat Linux 7.3 as my Amanda server.
The clients are mostly Solaris.  I have been backing up the server
and adding clients one at a time.  Everything was working well with one
server and two clients; then I added a third client.  Now I am getting
data timeouts and index tee broken messages in my Amanda reports and in
the system log files.

I have perused the docs, FAQ, and the various log files that Amanda
generates, and I am not finding a clue that points me to the problem.
All I keep seeing are the index tee messages, but not what is causing
them to be generated.

Suggestions or ideas?  If more info is needed, amanda.conf entries
etc... let me know and I can provide.

Thanks in advance,
Jim Summers
[EMAIL PROTECTED]
 






Re: Index Tees - Data Timeouts

2002-08-14 Thread Joshua Baker-LePain

On 14 Aug 2002 at 8:09am, Jim Summers wrote

 I am running Amanda 2.4.2p2 on Redhat Linux 7.3 as my Amanda server.
 The clients are mostly Solaris.  I have been backing up the server
 and adding clients one at a time.  Everything was working well with one
 server and two clients; then I added a third client.  Now I am getting
 data timeouts and index tee broken messages in my Amanda reports and in
 the system log files.

From which systems?  The actual error messages would be most helpful.

 I have perused the docs, FAQ, and the various log files that Amanda
 generates, and I am not finding a clue that points me to the problem.
 All I keep seeing are the index tee messages, but not what is causing
 them to be generated.

Also send along the contents of /tmp/amanda/sendbackup*debug from the 
failing clients.
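
A minimal sketch of one way to pull the interesting lines out of those
debug files before posting them.  The directory /tmp/amanda and the
sendbackup*debug pattern come from this thread; the helper name
grep_debug_files and the search word "error" are only illustrative
assumptions, not part of Amanda.

```python
import glob
import os


def grep_debug_files(directory, pattern, needle):
    """Return (filename, line) pairs for every line containing `needle`.

    `directory` and `pattern` select the debug files; nothing here is
    Amanda-specific beyond the values passed in by the caller.
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # errors="replace" keeps odd bytes in a log from aborting the scan
        with open(path, errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((os.path.basename(path), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Paths as mentioned in the thread; run this on a failing client.
    for name, line in grep_debug_files("/tmp/amanda", "sendbackup*debug", "error"):
        print(f"{name}: {line}")
```

The same call with "runtar*debug" as the pattern would cover the tar
side, assuming those files live in the same directory.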

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University




Re: Index Tees - Data Timeouts

2002-08-14 Thread Joshua Baker-LePain

On 14 Aug 2002 at 9:26am, Jim Summers wrote

 On Wed, 2002-08-14 at 08:23, Joshua Baker-LePain wrote:
  On 14 Aug 2002 at 8:09am, Jim Summers wrote
  
   I am running Amanda 2.4.2p2 on Redhat Linux 7.3 as my Amanda server.
   The clients are mostly Solaris.  I have been backing up the server
   and adding clients one at a time.  Everything was working well with one
   server and two clients; then I added a third client.  Now I am getting
   data timeouts and index tee broken messages in my Amanda reports and in
   the system log files.
  
  From which systems?  The actual error messages would be most helpful.
 From one of the working systems, a Sun E250 running Solaris 8, and from
 the newly added system, a Sun Ultra10 running Solaris 8.  I will send the
 amanda report when I get the next one.

You said you had messages in the system log files -- what are those?  You 
could also try increasing dtimeout...

How do your dumprates look?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University




Re: Index Tees - Data Timeouts

2002-08-14 Thread Jim Summers

On Wed, 2002-08-14 at 09:44, Joshua Baker-LePain wrote:
 On 14 Aug 2002 at 9:26am, Jim Summers wrote
 
  On Wed, 2002-08-14 at 08:23, Joshua Baker-LePain wrote:
   On 14 Aug 2002 at 8:09am, Jim Summers wrote
   
I am running Amanda 2.4.2p2 on Redhat Linux 7.3 as my Amanda server.
The clients are mostly Solaris.  I have been backing up the server
and adding clients one at a time.  Everything was working well with one
server and two clients; then I added a third client.  Now I am getting
data timeouts and index tee broken messages in my Amanda reports and in
the system log files.
   
   From which systems?  The actual error messages would be most helpful.
  From one of the working systems, a Sun E250 running Solaris 8, and from
  the newly added system, a Sun Ultra10 running Solaris 8.  I will send the
  amanda report when I get the next one.
 
 You said you had messages in the system log files -- what are those?  You 
Here are the messages in my system log file:

Aug 14 01:14:14 turing sendbackup[17657]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
Aug 14 01:14:14 turing sendbackup[17655]: [ID 702911 auth.notice] error [/usr/local/bin/tar got signal 13, compress returned 1]
Aug 14 02:06:34 turing sendbackup[17740]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
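
For reference, "signal 13" in the tar message can be looked up by
number.  A small sketch follows; signal numbering is platform-specific,
so this reports whatever the local platform calls that number (on Linux,
signal 13 is SIGPIPE, which matches the "Broken pipe" text).  The helper
name describe_signal is only illustrative.

```python
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(number).name


if __name__ == "__main__":
    # 13 is the number reported for /usr/local/bin/tar in the syslog entry.
    print(describe_signal(13))
```

In other words, tar was killed because the process reading its output
went away, so the tar failure looks like a symptom of the timeout
rather than its cause.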




 could also try increasing dtimeout...
I have twiddled with that one: I went from 1800 to 3600, then back to
1800, and I am currently at 2400.

 
 How do your dumprates look?
Here is the last amanda report received.  Note that I mistakenly used
the wrong dump type on the /usr/oracle filesystem.

 
These dumps were to tapes daily09, daily10.
The next 2 tapes Amanda expects to used are: daily11, daily12.

FAILURE AND STRANGE DUMP SUMMARY:
  tarjan /opt lev 0 FAILED [data timeout]
  turing /cs/turing/home2 lev 1 FAILED [data timeout]
  turing /opt lev 0 FAILED [data timeout]
  tarjan /usr/oracle lev 0 FAILED [data timeout]
  turing /usr lev 0 STRANGE
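
To see at a glance which hosts the failures come from, the summary lines
can be grouped by host.  This is only a sketch: the regular expression
is an assumption based on the five lines shown in this report, not a
documented Amanda format, and group_failures is a made-up helper name.

```python
import re
from collections import defaultdict

# Assumed shape: "<host> <disk> lev <n> FAILED|STRANGE [optional reason]"
SUMMARY_RE = re.compile(
    r"^\s*(?P<host>\S+)\s+(?P<disk>\S+)\s+lev\s+(?P<level>\d+)\s+"
    r"(?P<status>FAILED|STRANGE)(?:\s+\[(?P<reason>[^\]]*)\])?\s*$"
)


def group_failures(report_text):
    """Map host -> list of (disk, level, status, reason) tuples."""
    by_host = defaultdict(list)
    for line in report_text.splitlines():
        m = SUMMARY_RE.match(line)
        if m:
            by_host[m["host"]].append(
                (m["disk"], int(m["level"]), m["status"], m["reason"])
            )
    return dict(by_host)


# The summary lines from the report in this message.
SAMPLE = """\
  tarjan /opt lev 0 FAILED [data timeout]
  turing /cs/turing/home2 lev 1 FAILED [data timeout]
  turing /opt lev 0 FAILED [data timeout]
  tarjan /usr/oracle lev 0 FAILED [data timeout]
  turing /usr lev 0 STRANGE
"""

if __name__ == "__main__":
    for host, entries in sorted(group_failures(SAMPLE).items()):
        print(host, entries)
```

Grouped this way, both tarjan and turing show data timeouts, which fits
the idea that the problem is not tied to a single client.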


STATISTICS:
                          Total       Full      Daily

Estimate Time (hrs:min)    0:50
Run Time (hrs:min)        10:21
Dump Time (hrs:min)        4:26       4:08       0:17
Output Size (meg)       13542.1    11766.4     1775.7
Original Size (meg)     26361.5    24555.9     1805.6
Avg Compressed Size (%)    47.9       47.9        7.3   (level:#disks ...)
Filesystems Dumped           13          4          9   (1:9)
Avg Dump Rate (k/s)       870.0      808.9     1742.9

Tape Time (hrs:min)        3:23       2:56       0:27
Tape Size (meg)         13542.5    11766.5     1776.0
Tape Used (%)             116.7      101.4       15.3   (level:#disks ...)
Filesystems Taped            13          4          9   (1:9)
Avg Tp Write Rate (k/s)  1137.5     1141.5         .7
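
As a rough cross-check of the figures above, the rates can be recomputed
from the sizes and times.  This sketch assumes that one "meg" is 1024 k
and that each rate is simply size divided by elapsed time; the results
land within about 1 k/s of the reported values, and the small gaps are
presumably because the report shows times rounded to the minute.  The
function names are illustrative only.

```python
def hhmm_to_seconds(hhmm):
    """Convert an 'H:MM' string from the report into seconds."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def rate_kps(size_meg, hhmm):
    """Average rate in k/s, assuming 1 meg = 1024 k."""
    return size_meg * 1024 / hhmm_to_seconds(hhmm)


def compressed_pct(output_meg, original_meg):
    """Output size as a percentage of the original size."""
    return 100.0 * output_meg / original_meg


if __name__ == "__main__":
    # Figures copied from the Full and Total columns of the report.
    print(f"full dump rate  : {rate_kps(11766.4, '4:08'):.1f} k/s")
    print(f"total tape rate : {rate_kps(13542.5, '3:23'):.1f} k/s")
    print(f"full compressed : {compressed_pct(11766.4, 24555.9):.1f} %")
```

On these assumptions the dump side averages well under 1 MB/s, which is
slower than the tape write rate.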


FAILED AND STRANGE DUMP DETAILS:

/-- tarjan /opt lev 0 FAILED [data timeout]
sendbackup: start [tarjan:/opt level 0]
sendbackup: info BACKUP=/usr/local/bin/tar
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? 
\

/-- turing /cs/turing/home2 lev 1 FAILED [data timeout]
sendbackup: start [turing:/cs/turing/home2 level 1]
sendbackup: info BACKUP=/usr/local/bin/tar
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? 
\

/-- turing /opt lev 0 FAILED [data timeout]
sendbackup: start [turing:/opt level 0]
sendbackup: info BACKUP=/usr/sbin/ufsdump
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
|   DUMP: Writing 32 Kilobyte records
|   DUMP: Date of this level 0 dump: Wed Aug 14 01:54:45 2002
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/rdsk/c0t0d0s5 (turing:/opt) to standard output.
|   DUMP: Mapping (Pass I) [regular files]
|   DUMP: Mapping (Pass II) [directories]
|   DUMP: Estimated 6949216 blocks (3393.17MB) on 0.05 tapes.
|   DUMP: Dumping (Pass III) [directories]
|   DUMP: Dumping (Pass IV) [regular files]
| 
? gzip: stdout: Broken pipe
? sendbackup: index tee cannot write [Broken pipe]
|   DUMP: Broken pipe
|   DUMP: The ENTIRE dump is aborted.
? index returned 1
sendbackup: error [/usr/sbin/ufsdump returned 3, compress returned 1]
\
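
A quick sanity check on the ufsdump estimate above: the block count and
the megabyte figure agree if the blocks are 512 bytes each.  That block
size is an assumption inferred from the two numbers in the log, and
blocks_to_mb is an illustrative name.

```python
BLOCK_BYTES = 512  # assumption: the estimate counts 512-byte blocks


def blocks_to_mb(blocks, block_bytes=BLOCK_BYTES):
    """Convert a block count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * block_bytes / (1024 * 1024)


if __name__ == "__main__":
    # 6949216 is the block count from the "Estimated ..." line for turing:/opt.
    print(f"{blocks_to_mb(6949216):.2f} MB")
```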

/-- tarjan /usr/oracle lev 0 FAILED [data timeout]
sendbackup: start [tarjan:/usr/oracle level 0]
sendbackup: info BACKUP=/usr/sbin/ufsdump
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
|   DUMP: Writing 32 Kilobyte records
|   DUMP: Date of 

Re: Index Tees - Data Timeouts

2002-08-14 Thread Jim Summers

In an effort to debug this problem, is there a way I can interactively
run the command(s) that amanda would run to see if anything is dumped to
stdout?  If so, are these the commands in runtar and sendbackup?  Or
would it be better to comment out all filesystems except one of the ones
having problems and run amdump?

Thanks again,
Jim



Re: Index Tees - Data Timeouts

2002-08-14 Thread Joshua Baker-LePain

On 14 Aug 2002 at 2:19pm, Jim Summers wrote

 In an effort to debug this problem, is there a way I can interactively
 run the command(s) that amanda would run to see if anything is dumped to
 stdout?  If so, are these the commands in runtar and sendbackup?  Or
 would it be better to comment out all filesystems except one of the ones
 having problems and run amdump?

The exact commands are in both runtar*debug and sendbackup*debug.  It 
would be interesting to comment out some filesystems to see if it's a 
function of having too much going on on one host, or if it's something 
particular to those filesystems.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University




Re: Index Tees - Data Timeouts

2002-08-14 Thread Jon LaBadie

On Wed, Aug 14, 2002 at 02:19:16PM -0500, Jim Summers wrote:
 In an effort to debug this problem, is there a way I can interactively
 run the command(s) that amanda would run to see if anything is dumped to
 stdout?  If so, are these the commands in runtar and sendbackup?  Or
 would it be better to comment out all filesystems except one of the ones
 having problems and run amdump?
 

Two of the 4 failures (1 was only strange) were using ufsdump.
Shouldn't those commands be in rundump debug files?
-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road    (609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)