Re: level 0 of huge filesystem not working (tar returned 2, and the backup fails)
On Wed, 12 Mar 2003 at 3:54pm, wab wrote

> One filesystem I'm trying to back up with AMANDA is really huge and I'm
> encountering errors:
>
> This filesystem is so huge, a level 0 is taking longer than 24 hours.
> Any ideas on what could be going wrong? My best guesses:
>
> 1. The filesystem is just too big for TAR.
> 2. The filesystem is so big, its contents are changing during the tar
> process and confusing it or amanda.

I backup several DLEs with tar that are rather large -- I think the
biggest one is nearly 80GB. That one takes about 3 hours (no
compression). Of course, that Linux server is rather fast.

> /-- /usr lev 0 FAILED [/usr/local/bin/tar returned 2]
> sendbackup: start [:/usr level 0]
> sendbackup: info BACKUP=/usr/local/bin/tar
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar
> -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> ? gtar: Read error at byte 53808128, reading 10240 bytes, in file
> ./archive/www/access.0203.gz: I/O error

An I/O error is bad. Look in your system logs for more info on that.

> ./opt/freeware/apache/share/htdocs/Library/easmenu.lbi.LCK: No such file
> or directory

The rest of the stuff, yes, has to do with tarring an active filesystem.

> Any ideas as to what might be causing this?

Look into that I/O error. Also, is this Solaris? For whatever reason,
tar seems rather slow on Solaris (at least a lot of questions on this
list seem to point that way). If that's a filesystem, could you try
dump?

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
Re: level 0 of huge filesystem not working (tar returned 2, and the backup fails)
wab wrote:

> One filesystem I'm trying to back up with AMANDA is really huge and I'm
> encountering errors:
>
> This filesystem is so huge, a level 0 is taking longer than 24 hours.
> Any ideas on what could be going wrong? My best guesses:
>
> 1. The filesystem is just too big for TAR.

No. At the end of the tar output, you see its summary line:

> | Total bytes written: 30747043840

All fine here.

> 2. The filesystem is so big, its contents are changing during the tar
> process and confusing it or amanda.

But there are some I/O errors:

> ? gtar: Read error at byte 53808128, reading 10240 bytes, in file
> ./archive/www/access.0203.gz: I/O error

These trigger the "error code 2" message at the end. It means that this
file is probably corrupted on tape. But the rest of the archive is still
usable.

> ? gtar: Cannot add file
> ./opt/freeware/apache/share/htdocs/Library/easmenu.lbi.LCK: No such file
> or directory
> [...etc...]

These messages are the result of tarring an active filesystem. It's up
to you to decide for each file whether it is important or not. For
example, a missing lockfile or other temporary file is harmless, a
growing logfile is harmless, and a growing mailbox (with simple
sequential access) is also harmless, but a changing Berkeley DB file is
probably more dangerous.

--
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/          email: [EMAIL PROTECTED]
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye,     *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup,   *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown,   *
* kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ...         *
* ... "Are you sure?" ... YES ... Phew ... I'm out                    *
***********************************************************************
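[Editorial sketch, not part of the original message.] The triage described above (an I/O read error is serious, a vanished file on an active filesystem is usually harmless) could be roughly automated when skimming a long report. The following is only a minimal sketch: the patterns are taken from the two gtar messages quoted in this thread, and the labels `serious`, `active-fs`, and `unknown`, as well as the function name `classify`, are hypothetical names chosen for illustration.

```python
import re

# Hypothetical triage rules, built only from the gtar messages quoted in
# this thread. Real reports may contain other messages that need rules.
RULES = [
    (re.compile(r"Read error at byte \d+.*I/O error"), "serious"),
    (re.compile(r"No such file or directory\s*$"), "active-fs"),
]


def classify(line):
    """Return a rough severity label for one line of tar output."""
    for pattern, label in RULES:
        if pattern.search(line):
            return label
    return "unknown"


if __name__ == "__main__":
    sample = [
        "? gtar: Read error at byte 53808128, reading 10240 bytes, "
        "in file ./archive/www/access.0203.gz: I/O error",
        "? gtar: Cannot add file ./opt/freeware/apache/share/htdocs/"
        "Library/easmenu.lbi.LCK: No such file or directory",
        "| Total bytes written: 30747043840",
    ]
    for line in sample:
        print(classify(line), "<-", line)
```

Anything labelled `active-fs` still needs the per-file judgment described above; the sketch only separates the disk read errors from the rest.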
RE: level 0 of huge filesystem not working (tar returned 2, and the backup fails)
On Thu, 13 Mar 2003 at 9:47am, wab wrote

> It's AIX.
>
> So is this I/O error referring to writing to tape, or reading the file
> from disk? I'd much rather use tar than dump...

Reading from disk -- check the client's system logs.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
Re: level 0 of huge filesystem not working (tar returned 2, and the backup fails)
wab wrote:

> It's obviously this I/O error that's causing the problem... the
> filesystem is 67 gig (df -k says 67108864 1048-K blocks). The other
> filesystems being backed up to the tape only are using 3-4% of tape
> capacity... and it's a DLT 40/80. The compression ratio seems like all
> this should fit on 1 tape:
>
> STATISTICS:
>                            Total      Full     Daily
> Dump Time (hrs:min)        28:15     25:04      0:08   (1:18 start, 1:46 idle)
> Output Size (meg)         1205.4       0.0    1205.4
> Original Size (meg)       3534.7       0.0    3534.7
> Avg Compressed Size (%)     33.9       --       33.9
> Tape Used (%)                3.4       0.0       3.4   (level:#disks ...)
> Filesystems Dumped            36         0        36   (1:35 2:1)
> Avg Dump Rate (k/s)        178.1       --      178.1
> Avg Tp Write Rate (k/s)     13.6       0.0    2660.9
>
> but maybe it's possible this 67-gig filesystem is filling my DLT tape
> up, it reaches the end of the tape, and it I/O errors? If so I need to
> do some math (blech) to determine how much data we can get rid of on
> this big filesystem...

A little further in the NOTES section of the amanda report, you can
find out where/if Amanda hit into EndOfTape. Here is a snippet of mine:

NOTES:
  taper: tape ARCHIVE-032 kb 33920672 fm 14 writing file: No space left on device

It says that while writing file nr 14 it bumped into end of tape after
writing almost 34 GByte to that tape. (The file is taped again
completely on the next tape.)

What does your report indicate? Is it what you expect for that tape
capacity?

--
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/          email: [EMAIL PROTECTED]
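[Editorial sketch, not part of the original message.] To pull the end-of-tape figure out of a report without reading it by eye, something like the following might do. It is a minimal sketch that assumes the taper note always has the shape of the single line quoted above (`taper: tape <label> kb <n> fm <n> <message>`); other Amanda versions may format the note differently. The function name `parse_taper_note` and the `approx_gb` field are hypothetical, and the conversion simply divides the kb figure by one million, which matches the "almost 34 GByte" reading in the reply.

```python
import re

# Assumed layout, taken from the one NOTES line quoted in this thread.
TAPER_NOTE = re.compile(
    r"taper: tape (?P<label>\S+) kb (?P<kb>\d+) fm (?P<fm>\d+) (?P<msg>.*)"
)


def parse_taper_note(line):
    """Parse a taper NOTES line; return None if it does not match."""
    m = TAPER_NOTE.search(line)
    if m is None:
        return None
    kb = int(m.group("kb"))
    return {
        "label": m.group("label"),
        "kb": kb,
        "file_number": int(m.group("fm")),
        "message": m.group("msg").strip(),
        # Rough figure only: kb / 1e6, as in "almost 34 GByte" above.
        "approx_gb": kb / 1_000_000,
    }


if __name__ == "__main__":
    note = ("taper: tape ARCHIVE-032 kb 33920672 fm 14 "
            "writing file: No space left on device")
    print(parse_taper_note(note))
```

If no such note appears in the report at all, the tape was most likely not filled, which points back at the disk read error rather than at tape capacity.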
RE: level 0 of huge filesystem not working (tar returned 2, and the backup fails)
On Thu, 13 Mar 2003 at 9:53am, wab wrote

> It's obviously this I/O error that's causing the problem... the
> filesystem is 67 gig (df -k says 67108864 1048-K blocks). The other
> filesystems being backed up to the tape only are using 3-4% of tape
> capacity... and it's a DLT 40/80. The compression ratio seems like all
> this should fit on 1 tape:
>
> STATISTICS:
>                            Total      Full     Daily
> Dump Time (hrs:min)        28:15     25:04      0:08   (1:18 start, 1:46 idle)
> Output Size (meg)         1205.4       0.0    1205.4
> Original Size (meg)       3534.7       0.0    3534.7
> Avg Compressed Size (%)     33.9       --       33.9

That compression ratio is only from the filesystems that were
successfully backed up. The ratio can change *drastically* based on the
fs contents.

> but maybe it's possible this 67-gig filesystem is filling my DLT tape
> up, it reaches the end of the tape, and it I/O errors? If so I need to
> do some math (blech) to determine how much data we can get rid of on
> this big filesystem...

Again, the I/O errors were reported by tar, and so come from reading
from disk, not writing to tape (which tar isn't doing).

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
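[Editorial sketch, not part of the original message.] The "math (blech)" mentioned in the quoted text is small enough to sketch. The numbers below rest on assumptions that should be checked against the real setup: the tape is taken to hold 40,000,000 KB (the native figure of a DLT 40/80, on the assumption that data already gzipped by Amanda will not shrink further in the drive), and the `df -k` total of 67108864 KB is used as an upper bound for the data, although `df` reports the size of the filesystem rather than the space in use. The function names are hypothetical.

```python
# Assumed native capacity of a DLT 40/80 in KB (an assumption, see above).
TAPE_CAPACITY_KB = 40_000_000

# Filesystem size from the quoted "df -k" output, used as an upper bound.
FILESYSTEM_KB = 67_108_864


def compressed_kb(original_kb, ratio):
    """Estimated size on tape for a given compressed/original ratio."""
    return original_kb * ratio


def fits_on_tape(original_kb, ratio, capacity_kb=TAPE_CAPACITY_KB):
    """True if the estimated compressed dump fits on a single tape."""
    return compressed_kb(original_kb, ratio) <= capacity_kb


def break_even_ratio(original_kb, capacity_kb=TAPE_CAPACITY_KB):
    """Worst compression ratio that still fits on a single tape."""
    return capacity_kb / original_kb


if __name__ == "__main__":
    # 0.339 is the average from the report; 1.0 means incompressible data.
    for ratio in (0.339, 1.0):
        print(ratio, fits_on_tape(FILESYSTEM_KB, ratio))
    print("break-even ratio:", break_even_ratio(FILESYSTEM_KB))
```

As noted in the reply, the 33.9% average comes only from the filesystems that were dumped successfully, so the pessimistic case (ratio near 1.0, for example a tree full of `.gz` logs) is the one to plan for.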