Re: level 0 of huge filesystem not working (tar returned 2, and the backup fails)

2003-03-12 Thread Joshua Baker-LePain
On Wed, 12 Mar 2003 at 3:54pm, wab wrote

> One filesystem I'm trying to back up with AMANDA is really huge and I'm
> encountering errors:
> 
> This filesystem is so huge, a level 0 is taking longer than 24 hours.
> Any ideas on what could be going wrong? My best guesses:
> 
> 1. The filesystem is just too big for TAR.
> 2. The filesystem is so big, its contents are changing during the tar
> process and confusing it or amanda.

I back up several DLEs with tar that are rather large -- I think the 
biggest one is nearly 80GB.  That one takes about 3 hours (no 
compression).  Of course, that Linux server is rather fast.

> /--  /usr lev 0 FAILED [/usr/local/bin/tar returned 2]
> sendbackup: start [:/usr level 0]
> sendbackup: info BACKUP=/usr/local/bin/tar
> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/local/bin/tar
> -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> ? gtar: Read error at byte 53808128, reading 10240 bytes, in file
> ./archive/www/access.0203.gz: I/O error

An I/O error is bad.  Look in your system logs for more info on that.
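
One way to confirm that the failure is on the disk side, independent of tar and Amanda, is to read the suspect file end to end and note where reads fail. The following is only a sketch: the function name `find_read_errors` is made up for illustration, and the 10240-byte block size is simply copied from the gtar message quoted above.

```python
import os

def find_read_errors(path, block_size=10240):
    """Read `path` block by block and return the byte offsets where a read failed.

    Hypothetical helper for illustration; block_size mirrors the
    "reading 10240 bytes" figure in the gtar error message.
    """
    bad_offsets = []
    offset = 0
    size = os.path.getsize(path)
    # buffering=0 so that each read() is passed straight to the OS
    with open(path, "rb", buffering=0) as f:
        while offset < size:
            try:
                f.seek(offset)
                chunk = f.read(block_size)
            except OSError:
                # Remember the failing offset and skip past this block
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break
            offset += len(chunk)
    return bad_offsets
```

If this returns a non-empty list for ./archive/www/access.0203.gz, the problem is reproducible without tar, which points at the disk or filesystem rather than at the backup software.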

> ./opt/freeware/apache/share/htdocs/Library/easmenu.lbi.LCK: No such file
> or directory

The rest of the stuff, yes, has to do with tarring an active filesystem.

> Any ideas as to what might be causing this?

Look into that I/O error.  Also, is this Solaris?  For whatever reason, 
tar seems rather slow on Solaris (at least a lot of questions on this 
list seem to point that way).  If that's a filesystem, could you try 
dump?

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University






Re: level 0 of huge filesystem not working (tar returned 2, and the backup fails)

2003-03-13 Thread Paul Bijnens
wab wrote:

> One filesystem I'm trying to back up with AMANDA is really huge and I'm
> encountering errors:
>
> This filesystem is so huge, a level 0 is taking longer than 24 hours.
> Any ideas on what could be going wrong? My best guesses:
>
> 1. The filesystem is just too big for TAR.

No, at the end of the tar output, you see its summary line:

> | Total bytes written: 30747043840

All fine here.
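
As a quick sanity check on that summary line, the byte count can be converted to tape-sized units. This is a small worked example, not part of the original report; the 10240-byte record size is GNU tar's default (20 blocks of 512 bytes) and matches the "reading 10240 bytes" in the error message.

```python
total_bytes = 30747043840   # "Total bytes written" from the tar summary line
record_size = 10240         # GNU tar default record: 20 blocks of 512 bytes

records, leftover = divmod(total_bytes, record_size)
gb = total_bytes / 10**9    # decimal gigabytes, the unit tape capacities are quoted in
gib = total_bytes / 2**30   # binary gibibytes

print(f"{records} full records, {leftover} bytes left over")
print(f"{gb:.1f} GB ({gib:.1f} GiB)")
```

The total is a whole number of tar records and comes to roughly 30.7 GB, which is consistent with tar having run through to its summary line rather than stopping at some size limit.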

> 2. The filesystem is so big, its contents are changing during the tar
> process and confusing it or amanda.

But there are some I/O errors:

> ? gtar: Read error at byte 53808128, reading 10240 bytes, in file
> ./archive/www/access.0203.gz: I/O error

These trigger the "error code 2" message at the end.
It means that this file is probably corrupted on tape.
But the rest of the archive is still usable.

> ? gtar: Cannot add file
> ./opt/freeware/apache/share/htdocs/Library/easmenu.lbi.LCK: No such file
> or directory
> [...etc...]

These messages are the result of tarring an active filesystem.
It's up to you to decide for each file if it is important or not.
e.g. a missing lockfile or other temporary file is harmless, a growing 
logfile is harmless, a growing mailbox (with simple sequential access) 
is also harmless, but a changing Berkeley DB file is probably more 
dangerous.
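
The triage described above can be sketched as a small filter over the "?" lines of the report. This is an illustration only: the patterns cover just the two message types quoted in this thread, other tar versions may word them differently, and the verdict strings are invented labels.

```python
import re

# Patterns taken only from the two gtar messages quoted in this thread.
RULES = [
    (re.compile(r"Read error at byte \d+.*I/O error"),
     "investigate: read failure on disk"),
    (re.compile(r"Cannot add file .*No such file or directory"),
     "usually harmless: file vanished during the run"),
]

def classify(message):
    """Return a rough verdict for one tar message (hypothetical helper)."""
    # The report wraps long messages, so collapse all whitespace first.
    text = " ".join(message.split())
    for pattern, verdict in RULES:
        if pattern.search(text):
            return verdict
    return "unknown: check by hand"
```

Anything that falls through to "unknown" still needs the per-file judgment described above, for example whether the file is a lockfile, a growing log, or a database.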

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***



RE: level 0 of huge filesystem not working (tar returned 2, and the backup fails)

2003-03-13 Thread Joshua Baker-LePain
On Thu, 13 Mar 2003 at 9:47am, wab wrote

> It's AIX.
> 
> So is this I/O error referring to writing to tape, or reading the file
> from disk? I'd much rather use tar than dump... 

Reading from disk -- check the client's system logs.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



Re: level 0 of huge filesystem not working (tar returned 2, and the backup fails)

2003-03-13 Thread Paul Bijnens
wab wrote:

> It's obviously this I/O error that's causing the problem... the
> filesystem is 67 gig (df -k says 67108864 1048-K blocks). The other
> filesystems being backed up to the tape only are using 3-4% of tape
> capacity... and it's a DLT 40/80. The compression ratio seems like all
> this should fit on 1 tape:
>
> STATISTICS:
>                            Total       Full      Daily
>                         --------   --------   --------
> Dump Time (hrs:min)        28:15      25:04       0:08   (1:18 start, 1:46 idle)
> Output Size (meg)         1205.4        0.0     1205.4
> Original Size (meg)       3534.7        0.0     3534.7
> Avg Compressed Size (%)     33.9         --       33.9
> Tape Used (%)                3.4        0.0        3.4   (level:#disks ...)
> Filesystems Dumped            36          0         36   (1:35 2:1)
> Avg Dump Rate (k/s)        178.1         --      178.1
> Avg Tp Write Rate (k/s)     13.6        0.0     2660.9
>
> but maybe it's possible this 67-gig filesystem is filling my DLT tape
> up, it reaches the end of the tape, and it I/O errors? If so I need to
> do some math (blech) to determine how much data we can get rid of on
> this big filesystem...

A little further on, in the NOTES section of the Amanda report,
you can see whether and where Amanda hit end of tape.
Here is a snippet of mine:

NOTES:
  taper: tape ARCHIVE-032 kb 33920672 fm 14 writing file: No space left on device

It says that while writing file number 14, the taper hit end of tape
after writing almost 34 GByte to that tape.  (That file is then written
again in full on the next tape.)

What does your report indicate?  Is it what you expect for that tape 
capacity?
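
For anyone who wants to pull those numbers out of a report automatically, a NOTES line of this shape can be parsed as sketched below. The regular expression is based solely on the one line quoted above, so other Amanda versions may format it differently; `parse_taper_note` and the `gb_approx` field are illustrative names, not anything Amanda provides.

```python
import re

# Shape assumed from the single NOTES line quoted in this message.
NOTE = re.compile(
    r"taper: tape (?P<label>\S+) kb (?P<kb>\d+) fm (?P<fm>\d+) (?P<msg>.+)"
)

def parse_taper_note(line):
    """Extract tape label, kb written, file mark and message (hypothetical helper)."""
    # Mail clients may wrap the line, so collapse whitespace before matching.
    m = NOTE.search(" ".join(line.split()))
    if m is None:
        return None
    kb = int(m["kb"])
    return {
        "label": m["label"],
        "kb": kb,
        # Rough figure only: treats 1 kb as 1000 bytes, like the
        # "almost 34 GByte" estimate in the text.
        "gb_approx": kb / 10**6,
        "filemark": int(m["fm"]),
        "message": m["msg"],
    }
```

Comparing the parsed kb figure against the nominal capacity of the tape shows quickly whether end of tape arrived about where it should.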

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  [EMAIL PROTECTED]



RE: level 0 of huge filesystem not working (tar returned 2, and the backup fails)

2003-03-13 Thread Joshua Baker-LePain
On Thu, 13 Mar 2003 at 9:53am, wab wrote

> It's obviously this I/O error that's causing the problem... the
> filesystem is 67 gig (df -k says 67108864 1048-K blocks). The other
> filesystems being backed up to the tape only are using 3-4% of tape
> capacity... and it's a DLT 40/80. The compression ratio seems like all
> this should fit on 1 tape:
> 
> STATISTICS:
>                            Total       Full      Daily
>                         --------   --------   --------
> Dump Time (hrs:min)        28:15      25:04       0:08   (1:18 start, 1:46 idle)
> Output Size (meg)         1205.4        0.0     1205.4
> Original Size (meg)       3534.7        0.0     3534.7
> Avg Compressed Size (%)     33.9         --       33.9

That compression ratio is only from the filesystems that were successfully 
backed up.  The ratio can change *drastically* based on the fs contents.  
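
To see how much the ratio matters, here is a rough calculation. It is a sketch under stated assumptions: 40 GB is the native capacity of a DLT 40/80, the 30.7 GB figure is tar's "Total bytes written" for this filesystem as quoted elsewhere in the thread, and the 60% ratio is a made-up midpoint for illustration. `tape_fraction` is a hypothetical helper.

```python
def tape_fraction(data_gb, ratio, tape_gb=40.0):
    """Fraction of one tape used after compressing data_gb to `ratio` of its size.

    tape_gb defaults to 40, the native capacity of a DLT 40/80.
    """
    return data_gb * ratio / tape_gb

data_gb = 30747043840 / 10**9   # tar's "Total bytes written", in decimal GB

# 0.339 is the report's average; 0.60 is an invented midpoint;
# 1.00 models data that does not compress at all (e.g. files already gzipped).
for ratio in (0.339, 0.60, 1.00):
    used = tape_fraction(data_gb, ratio)
    print(f"compressed to {ratio:.0%} of original: {used:.0%} of one tape")
```

The same data can take anywhere from about a quarter to about three quarters of a tape depending on how well it compresses, so the 33.9% average from the small filesystems says little about the big one.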

> but maybe it's possible this 67-gig filesystem is filling my DLT tape
> up, it reaches the end of the tape, and it I/O errors? If so I need to
> do some math (blech) to determine how much data we can get rid of on
> this big filesystem...

Again, the I/O errors were reported by tar, and so come from reading from 
disk, not writing to tape (which tar isn't doing).

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University