Update on Index Tees - Data Timeouts
Test #1: I commented out all filesystems except one of the failing ones and simply ran amdump.
RESULT: Worked perfectly.

Test #2: Added in the final two failing filesystems and ran amdump.
RESULT: Worked perfectly.

Any suggestions on what it could be? I have the following set in amanda.conf:

inparallel 16
netusage 75000 Kbps
tapebufs 40
reserve 25

For my holdingdisk I have set the chunksize to 1Gb.

Which side is most likely to cause a data timeout, client or server?

Are there any suggestions on where to do the compression?

Jon, thank you for your response regarding the ufsdumps. I had reverted back to that during that particular run to see if I could ease some contention somewhere, but to no avail.

Thanks again,
Jim

On Wed, 2002-08-14 at 14:37, Joshua Baker-LePain wrote:
> On 14 Aug 2002 at 2:19pm, Jim Summers wrote
> > In an effort to debug this problem, is there a way I can interactively run the command(s) that amanda would run to see if anything is dumped to stdout? If so, are these the commands in runtar and sendbackup? Or would it be better to comment out all filesystems except one of the ones having problems and run amdump?
>
> The exact commands are in both runtar*debug and sendbackup*debug. It would be interesting to comment out some filesystems to see if it's a function of having too much going on on one host, or if it's something particular to those filesystems.
>
> --
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University
Re: Update on Index Tees - Data Timeouts
On Thu, Aug 15, 2002 at 11:08:51AM -0500, Jim Summers wrote:
> Test #1: I commented out all filesystems except one of the failing ones and simply ran amdump.
> RESULT: Worked perfectly.
>
> Test #2: Added in the final two failing filesystems and ran amdump.
> RESULT: Worked perfectly.
>
> Any suggestions on what it could be?

Perhaps during the problem times all the file systems were doing level 0's and the dumps were running in parallel. This could cause high levels of activity that could cause great slowdowns. Perhaps net contention. Perhaps multiple dumps from a single host and high CPU usage. Perhaps multiple dumps from a single disk drive and lots of disk head movement slowing things down.

Now some are doing incrementals while others are level 0s.

--
Jon H. LaBadie [EMAIL PROTECTED]
JG Computing
4455 Province Line Road, Princeton, NJ 08540-4322
(609) 252-0159
(609) 683-7220 (fax)
Re: Update on Index Tees - Data Timeouts
On 15 Aug 2002 at 11:08am, Jim Summers wrote
> Any suggestions on what it could be? I have the following set in amanda.conf:
>
> inparallel 16
> netusage 75000 Kbps
> tapebufs 40
> reserve 25
>
> For my holdingdisk I have set the chunksize to 1Gb.

Maybe you're hitting some network contention problems -- try cranking down inparallel?

> Are there any suggestions on where to do the compression?

On the hosts that can handle it. I have a fast amanda server, and mostly fast clients. For the not fast clients, I do compression on the server.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
Index Tees - Data Timeouts
I am running Amanda 2.4.2p2 on Red Hat Linux 7.3 as my Amanda server. The clients are mostly Solaris. I have been backing up the server and adding clients one at a time. Everything was working well with one server and two clients; then I added a third client. Now I am getting data timeouts and index tee broken messages in my Amanda reports and in the system log files.

I have perused the docs, the FAQ, and the various log files that Amanda generates, and I am not finding the clue to point me to the problem. All I keep seeing are the index tee messages, but not what is causing the index tee messages to be generated.

Suggestions or ideas? If more info is needed (amanda.conf entries, etc.), let me know and I can provide it.

Thanks in advance,
Jim Summers
[EMAIL PROTECTED]
Re: Index Tees - Data Timeouts
On 14 Aug 2002 at 8:09am, Jim Summers wrote
> I am running Amanda 2.4.2p2 on Red Hat Linux 7.3 as my Amanda server. The clients are mostly Solaris. I have been backing up the server and adding clients one at a time. Everything was working well with one server and two clients; then I added a third client. Now I am getting data timeouts and index tee broken messages in my Amanda reports and in the system log files.

From which systems? The actual error messages would be most helpful.

> I have perused the docs, the FAQ, and the various log files that Amanda generates, and I am not finding the clue to point me to the problem. All I keep seeing are the index tee messages, but not what is causing the index tee messages to be generated.

Also send along the contents of /tmp/amanda/sendbackup*debug from the failing clients.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
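Before mailing the debug files, it can help to pull out just the lines that look like failures. The following is a minimal Python sketch, not part of Amanda: the function name `scan_debug_files` and the list of marker strings are assumptions made for illustration, while the directory and file pattern are the ones named in the reply above.

```python
import glob
import os

# Substrings taken from the error messages quoted in this thread (an assumption
# about what is worth flagging, not an exhaustive list).
SUSPECT_MARKERS = ("error", "broken pipe", "timeout")


def scan_debug_files(debug_dir="/tmp/amanda", pattern="sendbackup*debug"):
    """Return (path, line_number, text) for each suspicious line found."""
    hits = []
    for path in sorted(glob.glob(os.path.join(debug_dir, pattern))):
        # errors="replace" keeps the scan going if a file holds odd bytes.
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(marker in lowered for marker in SUSPECT_MARKERS):
                    hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, number, text in scan_debug_files():
        print(f"{path}:{number}: {text}")
```

Run on each failing client, this prints only the matching lines with their file names, which is usually short enough to paste into a reply.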
Re: Index Tees - Data Timeouts
On 14 Aug 2002 at 9:26am, Jim Summers wrote
> On Wed, 2002-08-14 at 08:23, Joshua Baker-LePain wrote:
> > On 14 Aug 2002 at 8:09am, Jim Summers wrote
> > > I am running Amanda 2.4.2p2 on Red Hat Linux 7.3 as my Amanda server. The clients are mostly Solaris. I have been backing up the server and adding clients one at a time. Everything was working well with one server and two clients; then I added a third client. Now I am getting data timeouts and index tee broken messages in my Amanda reports and in the system log files.
> >
> > From which systems? The actual error messages would be most helpful.
>
> From one of the working systems, a Sun E250 running Solaris 8, and from the newly added system, a Sun Ultra10 running Solaris 8. I will send the amanda report when I get the next one.

You said you had messages in the system log files -- what are those? You could also try increasing dtimeout... How do your dumprates look?

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
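To make the dtimeout suggestion concrete, here is a small Python sketch of an idle timer. It assumes that dtimeout limits the time the server waits without receiving any data, rather than the total length of a dump; that reading should be checked against the amanda.conf documentation for the version in use. The function name and the sample arrival times are hypothetical.

```python
def first_data_timeout(arrival_times, start_time, dtimeout=1800):
    """Return the time at which an idle timer would fire, or None.

    arrival_times: sorted times (seconds) at which data chunks arrived.
    Only gaps before an arrival are checked; silence after the last
    chunk is not modelled in this sketch.
    """
    last_activity = start_time
    for arrival in arrival_times:
        if arrival - last_activity > dtimeout:
            # The timer expired before this chunk showed up.
            return last_activity + dtimeout
        last_activity = arrival
    return None


if __name__ == "__main__":
    # Hypothetical client that stalls for 2400 s between two chunks.
    arrivals = [10, 600, 3000]
    print(first_data_timeout(arrivals, start_time=0, dtimeout=1800))
    print(first_data_timeout(arrivals, start_time=0, dtimeout=3600))
```

Under this model a slow but steady dump never times out, while a client that goes quiet for longer than dtimeout does, which is why raising the value can only help if the stalls are shorter than the new limit.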
Re: Index Tees - Data Timeouts
On Wed, 2002-08-14 at 09:44, Joshua Baker-LePain wrote:
> On 14 Aug 2002 at 9:26am, Jim Summers wrote
> > On Wed, 2002-08-14 at 08:23, Joshua Baker-LePain wrote:
> > > On 14 Aug 2002 at 8:09am, Jim Summers wrote
> > > > I am running Amanda 2.4.2p2 on Red Hat Linux 7.3 as my Amanda server. The clients are mostly Solaris. I have been backing up the server and adding clients one at a time. Everything was working well with one server and two clients; then I added a third client. Now I am getting data timeouts and index tee broken messages in my Amanda reports and in the system log files.
> > >
> > > From which systems? The actual error messages would be most helpful.
> >
> > From one of the working systems, a Sun E250 running Solaris 8, and from the newly added system, a Sun Ultra10 running Solaris 8. I will send the amanda report when I get the next one.
>
> You said you had messages in the system log files -- what are those?

Here are the messages in my system log file:

Aug 14 01:14:14 turing sendbackup[17657]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
Aug 14 01:14:14 turing sendbackup[17655]: [ID 702911 auth.notice] error [/usr/local/bin/tar got signal 13, compress returned 1]
Aug 14 02:06:34 turing sendbackup[17740]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]

> You could also try increasing dtimeout...

I have twiddled with that one: I went from 1800 to 3600, then back to 1800, and I am currently at 2400.

> How do your dumprates look?

Here is the last amanda report received. I incorrectly used the wrong dump type on the /usr/oracle fs.

These dumps were to tapes daily09, daily10.
The next 2 tapes Amanda expects to used are: daily11, daily12.

FAILURE AND STRANGE DUMP SUMMARY:
  tarjan /opt lev 0 FAILED [data timeout]
  turing /cs/turing/home2 lev 1 FAILED [data timeout]
  turing /opt lev 0 FAILED [data timeout]
  tarjan /usr/oracle lev 0 FAILED [data timeout]
  turing /usr lev 0 STRANGE

STATISTICS:
                            Total       Full      Daily
Estimate Time (hrs:min)      0:50
Run Time (hrs:min)          10:21
Dump Time (hrs:min)          4:26       4:08       0:17
Output Size (meg)         13542.1    11766.4     1775.7
Original Size (meg)       26361.5    24555.9     1805.6
Avg Compressed Size (%)      47.9       47.97.3            (level:#disks ...)
Filesystems Dumped             13          4          9    (1:9)
Avg Dump Rate (k/s)         870.0      808.9     1742.9
Tape Time (hrs:min)          3:23       2:56       0:27
Tape Size (meg)           13542.5    11766.5     1776.0
Tape Used (%)               116.7      101.4       15.3    (level:#disks ...)
Filesystems Taped              13          4          9    (1:9)
Avg Tp Write Rate (k/s)    1137.5     1141.5         .7

FAILED AND STRANGE DUMP DETAILS:

/-- tarjan /opt lev 0 FAILED [data timeout]
sendbackup: start [tarjan:/opt level 0]
sendbackup: info BACKUP=/usr/local/bin/tar
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
?
\

/-- turing /cs/turing/home2 lev 1 FAILED [data timeout]
sendbackup: start [turing:/cs/turing/home2 level 1]
sendbackup: info BACKUP=/usr/local/bin/tar
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
?
\

/-- turing /opt lev 0 FAILED [data timeout]
sendbackup: start [turing:/opt level 0]
sendbackup: info BACKUP=/usr/sbin/ufsdump
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
| DUMP: Writing 32 Kilobyte records
| DUMP: Date of this level 0 dump: Wed Aug 14 01:54:45 2002
| DUMP: Date of last level 0 dump: the epoch
| DUMP: Dumping /dev/rdsk/c0t0d0s5 (turing:/opt) to standard output.
| DUMP: Mapping (Pass I) [regular files]
| DUMP: Mapping (Pass II) [directories]
| DUMP: Estimated 6949216 blocks (3393.17MB) on 0.05 tapes.
| DUMP: Dumping (Pass III) [directories]
| DUMP: Dumping (Pass IV) [regular files]
|
? gzip: stdout: Broken pipe
? sendbackup: index tee cannot write [Broken pipe]
| DUMP: Broken pipe
| DUMP: The ENTIRE dump is aborted.
? index returned 1
sendbackup: error [/usr/sbin/ufsdump returned 3, compress returned 1]
\

/-- tarjan /usr/oracle lev 0 FAILED [data timeout]
sendbackup: start [tarjan:/usr/oracle level 0]
sendbackup: info BACKUP=/usr/sbin/ufsdump
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
| DUMP: Writing 32 Kilobyte records
| DUMP: Date of
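As a rough answer to the dump-rate question, the rates in the STATISTICS section can be cross-checked from the sizes and times in the same report. This Python sketch rests on two assumptions that are not confirmed in the thread: that the average dump rate is output size divided by dump time, and that the run-together figure `13542.111766.4` in the archived report stands for the two values 13542.1 and 11766.4.

```python
def hhmm_to_seconds(text):
    """Convert an 'h:mm' duration from the report into seconds."""
    hours, minutes = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def avg_rate_kps(size_meg, duration):
    """Average rate in k/s, assuming rate = output size / dump time."""
    return size_meg * 1024 / hhmm_to_seconds(duration)


# (output size in meg, dump time, rate printed in the report)
REPORT = {
    "Total": (13542.1, "4:26", 870.0),
    "Full": (11766.4, "4:08", 808.9),
    "Daily": (1775.7, "0:17", 1742.9),
}

if __name__ == "__main__":
    for column, (size, duration, reported) in REPORT.items():
        computed = avg_rate_kps(size, duration)
        print(f"{column}: computed {computed:.1f} k/s, reported {reported} k/s")
```

The computed values land within a few percent of the printed ones (the times are rounded to whole minutes), so the full dumps really are averaging well under 1 MB/s.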
Re: Index Tees - Data Timeouts
In an effort to debug this problem, is there a way I can interactively run the command(s) that amanda would run to see if anything is dumped to stdout? If so, are these the commands in runtar and sendbackup?

Or would it be better to comment out all filesystems except one of the ones having problems and run amdump?

Thanks again,
Jim

On Wed, 2002-08-14 at 10:41, Jim Summers wrote:
> On Wed, 2002-08-14 at 09:44, Joshua Baker-LePain wrote:
> > You said you had messages in the system log files -- what are those?
>
> Here are the messages in my system log file:
>
> Aug 14 01:14:14 turing sendbackup[17657]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
> Aug 14 01:14:14 turing sendbackup[17655]: [ID 702911 auth.notice] error [/usr/local/bin/tar got signal 13, compress returned 1]
> Aug 14 02:06:34 turing sendbackup[17740]: [ID 702911 auth.notice] index tee cannot write [Broken pipe]
>
> > You could also try increasing dtimeout...
>
> I have twiddled with that one: I went from 1800 to 3600, then back to 1800, and I am currently at 2400.
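The "Broken pipe" and "got signal 13" messages in the logs describe the same event seen from different processes: signal 13 is SIGPIPE, and a write to a pipe or socket whose reader has gone away fails with EPIPE ("Broken pipe"). The Python sketch below reproduces that with a plain pipe. One plausible reading, which nobody in the thread confirms, is that these client-side errors are a consequence of the server dropping the connection after its data timeout rather than the cause of it.

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Write into a pipe whose read end is closed; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away, like a consumer that exits
    try:
        # CPython ignores SIGPIPE by default, so the failure surfaces as
        # BrokenPipeError (EPIPE) instead of killing the process.
        os.write(write_fd, b"dump data")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(int(signal.SIGPIPE))  # 13 on Linux and Solaris
    print(write_to_closed_pipe() == errno.EPIPE)
```

A program such as tar that leaves SIGPIPE at its default is killed by the signal instead, which matches the "tar got signal 13" line, while gzip and the index tee report the failed write.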
Re: Index Tees - Data Timeouts
On 14 Aug 2002 at 2:19pm, Jim Summers wrote
> In an effort to debug this problem, is there a way I can interactively run the command(s) that amanda would run to see if anything is dumped to stdout? If so, are these the commands in runtar and sendbackup?
>
> Or would it be better to comment out all filesystems except one of the ones having problems and run amdump?

The exact commands are in both runtar*debug and sendbackup*debug.

It would be interesting to comment out some filesystems to see if it's a function of having too much going on on one host, or if it's something particular to those filesystems.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
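One quick way to look at the "too much going on on one host" question is to group the FAILED lines of the report summary by host. This is a Python sketch only: the regular expression is an assumption fitted to the four FAILED lines quoted earlier in the thread, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# FAILED lines copied from the FAILURE AND STRANGE DUMP SUMMARY in this thread.
SUMMARY = """\
tarjan /opt lev 0 FAILED [data timeout]
turing /cs/turing/home2 lev 1 FAILED [data timeout]
turing /opt lev 0 FAILED [data timeout]
tarjan /usr/oracle lev 0 FAILED [data timeout]
"""

# host, disk, dump level, and the bracketed reason.
LINE = re.compile(r"^(\S+)\s+(\S+)\s+lev\s+(\d+)\s+FAILED\s+\[(.+)\]$")


def failures_by_host(summary_text):
    """Map each host to a list of (disk, level, reason) tuples."""
    grouped = defaultdict(list)
    for line in summary_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            host, disk, level, reason = match.groups()
            grouped[host].append((disk, int(level), reason))
    return dict(grouped)


if __name__ == "__main__":
    for host, entries in failures_by_host(SUMMARY).items():
        print(host, entries)
```

For this report both tarjan and turing show two failed filesystems each, and three of the four failures are level 0 dumps, which fits the idea of several heavy dumps competing on the same host.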
Re: Index Tees - Data Timeouts
On Wed, Aug 14, 2002 at 02:19:16PM -0500, Jim Summers wrote:
> In an effort to debug this problem, is there a way I can interactively run the command(s) that amanda would run to see if anything is dumped to stdout? If so, are these the commands in runtar and sendbackup?
>
> Or would it be better to comment out all filesystems except one of the ones having problems and run amdump?

Two of the 4 failures (1 was only strange) were using ufsdump. Shouldn't those commands be in rundump debug files?

--
Jon H. LaBadie [EMAIL PROTECTED]
JG Computing
4455 Province Line Road, Princeton, NJ 08540-4322
(609) 252-0159
(609) 683-7220 (fax)