Hi,

08.08.2007 12:30, Ralf Gross wrote:
> Arno Lehmann wrote:
>>> while looking for my verify problems, I noticed that my full backup had
>>> errors at the end of the first tape but terminated with status OK.
>>> Therefore I noticed this error very late.
>> But you actually kept and read the system log! Congratulations ;-)
> 
> Yeah, but only after opening a bug report regarding the verify errors.
>  
>>> /var/log/kern.log
>>> Aug  5 06:00:29 VU0EM005 kernel: st0: Current: sense key: Aborted Command
>>> Aug  5 06:00:29 VU0EM005 kernel: Additional sense: Data phase error
>>>
>>>
>>> 05-Aug 02:13 VU0EM005: Start Backup JobId 416, 
>>> Job=VU0EM003.2007-08-05_00.05.01
>>> 05-Aug 02:13 VU0EM005: Spooling data ...
>>> VU0EM003:      /home is a different filesystem. Will not descend from / 
>>> into /home
>>> VU0EM003:      /public is a different filesystem. Will not descend from / 
>>> into /public
>>> VU0EM003:      /lib/init/rw is a different filesystem. Will not descend 
>>> from / into /lib/init/rw
>>> VU0EM003:      /server is a different filesystem. Will not descend from / 
>>> into /server
>>> 05-Aug 04:44 VU0EM005: User specified spool size reached.
>>> 05-Aug 04:44 VU0EM005: Writing spooled data to Volume. Despooling 
>>> 450,971,582,471 bytes ...
>>> 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569 
>>> Write error at 405:11595 on device "LTO3" (/dev/ULTRIUM-TD3). 
>>> ERR=Input/output error.
>>> 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:317 
>>> Volume data error at 405:4294967295!
>>> Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 
>>> blk=20b379e5
>>> 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: Re-read last 
>>> block at EOT failed. ERR=block.c:317 Volume data error at 405:4294967295!
>>> Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 
>>> blk=20b379e5
>>> 05-Aug 06:00 VU0EM005: End of medium on Volume "06D128L3" 
>>> Bytes=404,723,386,368 Blocks=6,273,613 at 05-Aug-2007 06:00.
>> Ok, this is unverified by me, but I *believe* that in these cases 
>> Bacula does the right thing, i.e. it writes the block in question to 
>> the next tape again.
>>
>> At least I hope this is the point of doing the re-read of the block 
>> where an error occurred...
>>
>> And it would explain why this is not an error from the job's point of 
>> view.
> 
> Hm, I still *hope* something was wrong with the backup, because if
> not, I don't know why the verify and restore jobs both failed.

Good point... I admit I haven't followed this thread in detail as it 
started during my vacation :-)

...
>>> I've rerun the btape test (test/fill). No problems. At the moment I'm 
>>> running
>>> the full backup to the same tapes again. The end of the first tape was 
>>> detected
>>> without Block checksum mismatch.
>> You could have tried a restore of the job with the (possible) error...
> 
> I did. First I restored only a few directories, without errors (70 GB).
> I was still thinking the verify job had problems, not the real backup.
> 
> There were many different errors with the verify jobs. But the last
> verify job I tried also had a block mismatch at the end of tape #1.
> 
> 07-Aug 19:54 VU0EM005: End of file 405 on device "LTO3" (/dev/ULTRIUM-TD3), 
> Volume "06D128L3"
> 07-Aug 19:54 VU0EM005: VerifyVU0EM003.2007-08-07_18.18.29 Error: block.c:317 
> Volume data error at 405:11594!
> 
> 
> Then I did a full restore. This time I got a completely different
> error at a different volfile.
> 
> 07-Aug 21:56 VU0EM005: End of file 203 on device "LTO3" (/dev/ULTRIUM-TD3), 
> Volume "06D128L3"
> 07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Fatal error: 
> read.c:139 Error sending to File daemon. ERR=Connection timed out
> 07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Error: bnet.c:439 
> Write error sending 65536 bytes to client:10.60.1.252:36643: 
> ERR=Connection timed out
> 07-Aug 22:35 VU0EM003: RestoreFiles.2007-08-07_20.56.07 Fatal error: 
> restore.c:252 Data record error. ERR=Interrupted system call

Ok, that does look more like a problem.

There might be some sort of network timeout involved, probably 
(partly) caused by the SD waiting for some tape activity. I wouldn't 
say this is normal, though.

Anyway, I see two different problems: A tape problem, perhaps a 
one-time problem that you won't be able to reproduce.

Then some network timeout, which is often related to a firewall/router 
with non-standard behaviour in between SD and FD. Setting the 
heartbeat interval directive can help fix those issues.
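
As a rough sketch of what that could look like (untested on this setup; the 
resource names are placeholders, and 60 seconds is only an assumed value 
that should be shorter than whatever idle timeout the firewall/router 
applies):

```
# bacula-sd.conf on the storage server (resource name is a placeholder)
Storage {
  Name = vu0em005-sd
  # ... existing directives unchanged ...
  Heartbeat Interval = 60    # seconds; assumed value, tune as needed
}

# bacula-fd.conf on the client (resource name is a placeholder)
FileDaemon {
  Name = vu0em003-fd
  # ... existing directives unchanged ...
  Heartbeat Interval = 60    # seconds; assumed value, tune as needed
}
```

Both daemons would need a restart to pick up the change. Please check the 
manual for your Bacula version to see where the directive is accepted.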

> 
>> Regarding the tape write/read error, I believe these happen from time 
>> to time, especially when related to end of tape detection (400GB on 
>> LTO3 seems like the tape was actually full, but for some reason the 
>> drive didn't report the approaching EOT to the OS). I would worry if 
>> this happened regularly, but if it's a one-time glitch, I think 
>> there's no problem.
> 
> That was the first backup with two tapes since I tested the drive 6
> months ago...

Not so good. Well, you're running more tests now, which seems 
advisable. It might be worth trying to do a tar backup and restore 
spanning two tapes, just for reference. That takes a while with LTO, 
though...

Arno

> Ralf
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >>  http://get.splunk.com/
> _______________________________________________
> Bacula-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bacula-users

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de
