Hi,

08.08.2007 12:30, Ralf Gross wrote:
> Arno Lehmann wrote:
>>> while looking for my verify problems, I noticed that my full backup had
>>> errors at the end of the first tape but terminated with status OK.
>>> Therefore I noticed this error very late.
>> But you actually kept and read the system log! Congratulations ;-)
>
> Yeah, but only after opening a bug report regarding the verify errors.
>
>>> /var/log/kern.log
>>> Aug 5 06:00:29 VU0EM005 kernel: st0: Current: sense key: Aborted Command
>>> Aug 5 06:00:29 VU0EM005 kernel: Additional sense: Data phase error
>>>
>>>
>>> 05-Aug 02:13 VU0EM005: Start Backup JobId 416,
>>> Job=VU0EM003.2007-08-05_00.05.01
>>> 05-Aug 02:13 VU0EM005: Spooling data ...
>>> VU0EM003: /home is a different filesystem. Will not descend from /
>>> into /home
>>> VU0EM003: /public is a different filesystem. Will not descend from /
>>> into /public
>>> VU0EM003: /lib/init/rw is a different filesystem. Will not descend
>>> from / into /lib/init/rw
>>> VU0EM003: /server is a different filesystem. Will not descend from /
>>> into /server
>>> 05-Aug 04:44 VU0EM005: User specified spool size reached.
>>> 05-Aug 04:44 VU0EM005: Writing spooled data to Volume. Despooling
>>> 450,971,582,471 bytes ...
>>> 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569
>>> Write error at 405:11595 on device "LTO3" (/dev/ULTRIUM-TD3).
>>> ERR=Eingabe-/Ausgabefehler.
>>> 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:317
>>> Volume data error at 405:4294967295!
>>> Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7
>>> blk=20b379e5
>>> 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: Re-read last
>>> block at EOT failed. ERR=block.c:317 Volume data error at 405:4294967295!
>>> Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7
>>> blk=20b379e5
>>> 05-Aug 06:00 VU0EM005: End of medium on Volume "06D128L3"
>>> Bytes=404,723,386,368 Blocks=6,273,613 at 05-Aug-2007 06:00.
>> Ok, this is unverified by me, but I *believe* that in these cases
>> Bacula does the right thing, i.e. it writes the block in question to
>> the next tape again.
>>
>> At least I hope this is the point of doing the re-read of the block
>> where an error occurred...
>>
>> And it would explain why this is not an error from the job's point of
>> view.
>
> Hm, I still *hope* something was wrong with the backup, because if
> not, I don't know why the verify and restore jobs both failed.

Good point... I admit I haven't followed this thread in detail as it
started during my vacation :-)

...
>>> I've rerun the btape test (test/fill). No problems. At the moment I'm
>>> running the full backup to the same tapes again. The end of the first
>>> tape was detected without Block checksum mismatch.
>> You could have tried a restore of the job with the (possible) error...
>
> I did. First I restored only a few directories, without errors (70 GB).
> I was still thinking the verify job had problems, not the real backup.
>
> There were many different errors with the verify jobs. But the last
> verify job I tried also had a block mismatch at the end of tape #1.
>
> 07-Aug 19:54 VU0EM005: End of file 405 on device "LTO3" (/dev/ULTRIUM-TD3),
> Volume "06D128L3"
> 07-Aug 19:54 VU0EM005: VerifyVU0EM003.2007-08-07_18.18.29 Error: block.c:317
> Volume data error at 405:11594!
>
>
> Then I did a full restore. This time I got a completely different
> error at a different volfile.
>
> 07-Aug 21:56 VU0EM005: End of file 203 on device "LTO3" (/dev/ULTRIUM-TD3),
> Volume "06D128L3"
> 07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Fatal error:
> read.c:139 Error sending to File daemon. ERR=Die Wartezeit für die Verbindung
> ist abgelaufen
> 07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Error: bnet.c:439
> Write error sending 65536 bytes to client:10.60.1.252:36643: ERR=Die
> Wartezeit für die Verbindung ist abgelaufen
> 07-Aug 22:35 VU0EM003: RestoreFiles.2007-08-07_20.56.07 Fatal error:
> restore.c:252 Data record error. ERR=Unterbrechung während des
> Betriebssystemaufrufs

Ok, that does look more like a problem. There might be some sort of
network timeout involved, probably (partly) caused by the SD waiting for
some tape activity. I wouldn't say this is normal, though.

Anyway, I see two different problems: A tape problem, perhaps a one-time
problem that you won't be able to reproduce.
Then some network timeout, which is often related to a firewall/router
with non-standard behaviour in between SD and FD. Setting the
heartbeat interval directive can help fix those issues.

>
>> Regarding the tape write/read error, I believe these happen from time
>> to time, especially when related to end of tape detection (400GB on
>> LTO3 seems like the tape was actually full, but for some reason the
>> drive didn't report the approaching EOT to the OS). I would worry if
>> this happened regularly, but if it's a one-time glitch, I think
>> there's no problem.
>
> That was the first backup with two tapes since I tested the drive 6
> months ago... Not so good.

Well, you're running more tests now, which seems advisable. It might be
worth trying to do a tar backup and restore spanning two tapes, just for
reference. That takes a while with LTO, though...

Arno

> Ralf

--
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
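
Since the backup job in this thread ended with status OK despite write errors in its log, it may help to scan job reports for error lines rather than rely on the termination status alone. The following is only a minimal sketch in Python, not anything Bacula ships: the line pattern is inferred from the messages quoted in this thread (timestamp, daemon name, job name, then "Error:" or "Fatal error:"), and the names `ERROR_LINE` and `find_job_errors` are hypothetical. It assumes each message starts on its own line, as in an unwrapped job report.

```python
import re

# Pattern inferred from the log lines quoted in this thread, e.g.
# "05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569"
# This is an assumption about the report layout, not a documented format.
ERROR_LINE = re.compile(
    r"^(?P<when>\d{2}-\w{3} \d{2}:\d{2}) (?P<daemon>\S+): "
    r"(?P<job>\S+) (?P<severity>Fatal error|Error):"
)


def find_job_errors(log_text):
    """Return a (when, job, severity, line) tuple for each error line."""
    hits = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = ERROR_LINE.match(line)
        if match:
            hits.append((match["when"], match["job"], match["severity"], line))
    return hits


if __name__ == "__main__":
    # Sample lines copied from the messages quoted above.
    sample = "\n".join([
        "05-Aug 04:44 VU0EM005: User specified spool size reached.",
        "05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569",
        '05-Aug 06:00 VU0EM005: End of medium on Volume "06D128L3"',
    ])
    for when, job, severity, line in find_job_errors(sample):
        print(f"{when} {job} [{severity}] {line}")
```

Run against a saved job report, any hit would be a reason to look at the job more closely even when it terminated OK. Lines that merely mention volumes or spooling are ignored because they lack the job name followed by an error keyword.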
