Hi,

I've configured verify jobs for my backups. This was working ok for the last 2
weeks with inc and diff backups. But now I get errors for the full backup
verify jobs.


Terminated Jobs:
 JobId Level Files Bytes Status Finished Name
====================================================================
   415 Full 370,351 66.86 G OK 05-Aug-07 02:13 SMTCZB0003
   416 Full 1,631,258 706.3 G OK 05-Aug-07 09:37 VU0EM003
   417 Full 1 882.1 M OK 05-Aug-07 09:39 BackupCatalog
   418 Volu 61,991 0 Error 05-Aug-07 12:15 VerifySMTCZB0003
   419 Volu 449,341 0 Diffs 05-Aug-07 15:10 VerifyVU0EM003
   420 Volu 1 0 OK 05-Aug-07 15:15 VerifyCATALOG



I ran both verify jobs again this moring with the same results.


client SMTCZB0003:
(client is behind a firewall, Heartbeat Interval = 60 + SDConnectTimeout = 4200 
fd settings):

06-Aug 08:19 VU0EM005: Verifying against JobId=415 
Job=SMTCZB0003.2007-08-05_00.05.00
06-Aug 08:19 VU0EM005: Bootstrap records written to 
/opt/bacula/var/bacula/working/VU0EM005.restore.10.bsr
06-Aug 08:19 VU0EM005:
06-Aug 08:19 VU0EM005: The job will require the following
   Volume(s) Storage(s) SD Device(s)
===========================================================================
06-Aug 08:19 VU0EM005:
06-Aug 08:19 VU0EM005: 06D128L3 NEC-T40A NEC-T40A
06-Aug 08:19 VU0EM005:
06-Aug 08:19 VU0EM005: Start Verify JobId=421 Level=VolumeToCatalog 
Job=VerifySMTCZB0003.2007-08-06_08.19.38
06-Aug 08:19 VU0EM005: 3307 Issuing autochanger "unload slot 19, drive 0" 
command.
06-Aug 08:21 VU0EM005: 3304 Issuing autochanger "load slot 9, drive 0" command.
06-Aug 08:22 VU0EM005: 3305 Autochanger "load slot 9, drive 0", status is OK.
06-Aug 08:22 VU0EM005: 3301 Issuing autochanger "loaded? drive 0" command.
06-Aug 08:22 VU0EM005: 3302 Autochanger "loaded? drive 0", result is Slot 9.
06-Aug 08:22 VU0EM005: Ready to read from volume "06D128L3" on device "LTO3" 
(/dev/ULTRIUM-TD3).
06-Aug 08:22 VU0EM005: Forward spacing Volume "06D128L3" to file:block 0:1.
06-Aug 08:23 VU0EM005: End of file 1 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
06-Aug 08:25 VU0EM005: End of file 2 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
[...]
06-Aug 09:45 VU0EM005: End of file 58 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
06-Aug 09:46 VU0EM005: New file: 
/home/mksadmin/MK<FF><FF><FF><FA>S/IntegrityServer/mksis/bulk/^home/mksadmin/Projects/ERW_Projekte/LEGO/Library/Tools/csd/basics/std/docu/en/html/rcs/node2.html/1.5
06-Aug 09:46 VU0EM005:
06-Aug 09:46 VU0EM005: The following files are in the Catalog but not on disk:
06-Aug 09:46 VU0EM005: /home/mksadmin/.bash_history
06-Aug 09:46 VU0EM005: /home/mksadmin/.xinitrc
06-Aug 09:46 VU0EM005: /home/mksadmin/.Xauthority
[...several more hundred lines...]
06-Aug 09:46 VU0EM005: VerifySMTCZB0003.2007-08-06_08.19.38 Fatal error: 
bnet.c:241 Packet size too big from "File daemon:53.115.208.20:9102. 
Terminating connection.
06-Aug 09:53 VU0EM005: VerifySMTCZB0003.2007-08-06_08.19.38 Fatal error: 
read.c:139 Error sending to File daemon. ERR=Datenübergabe unterbrochen (broken 
pipe)
06-Aug 09:53 VU0EM005: VerifySMTCZB0003.2007-08-06_08.19.38 Error: bnet.c:439 
Write error sending 65536 bytes to client:53.115.208.20:36643: 
ERR=Datenübergabe unterbrochen (broken pipe)
06-Aug 09:53 VU0EM005: VerifySMTCZB0003.2007-08-06_08.19.38 Fatal error: No Job 
status returned from FD.
06-Aug 09:53 VU0EM005: VerifySMTCZB0003.2007-08-06_08.19.38 Error: Bacula 2.0.3 
(06Mar07): 06-Aug-2007 09:53:55
  JobId: 421
  Job: VerifySMTCZB0003.2007-08-06_08.19.38
  FileSet: SMTCZB0003
  Verify Level: VolumeToCatalog
  Client: SMTCZB0003
  Verify JobId: 415
  Verify Job: SMTCZB0003
  Start time: 06-Aug-2007 08:19:40
  End time: 06-Aug-2007 09:53:55
  Files Expected: 370,351
  Files Examined: 321,504
  Non-fatal FD errors: 0
  FD termination status: Error
  SD termination status: Error
  Termination: *** Verify Error ***




BTW: I'm wondering about:

06-Aug 09:46 VU0EM005: The following files are in the Catalog but not on disk:
06-Aug 09:46 VU0EM005: /home/mksadmin/.bash_history".

During this verify job I noticed high network traffic between the dir and fd.
Is this normal? I thought VolumeToCatalog compares the meta data saved on tape
with the meta data in the catalog and does not compare the actual data on disk?

Second thing:

06-Aug 09:46 VU0EM005: New file: 
/home/mksadmin/MK<FF><FF><FF><FA>S/IntegrityServer/mksis/bulk/^home/mksadmin/Projects/ERW_Projekte/LEGO/Library/Tools/csd/basics/std/docu/en/html/rcs/node2.html/1.5

What are these <FF> characters about? The file is with the correct name in the 
db and on tape!


The file /home/mksadmin/.bash_history (I guess all files of jobid 415) was
backed up correctly. There were no errors during backup. I can find the
filename in the db and restore it from tape.  

database:

Enter path with trailing slash: /home/mksadmin/
Enter filename: .bash_history
Enter Client name: SMTCZB0003
+-------+---------------------+------------+-------+------------+
| jobid | jobstarttime | volumename | level | clientname |
+-------+---------------------+------------+-------+------------+
| 415 | 2007-08-05 00:07:06 | 06D128L3 | F | SMTCZB0003 |
[...]


restore:

Enter JobId(s), comma separated, to restore: 415
You have selected the following JobId: 415

Building directory tree for JobId 415 ...  +++++++++++++++++++++++++++++++++
1 Job, 307,361 files inserted into the tree.
[...]
cwd is: /
$ cd /home/mksadmin
cwd is: /home/mksadmin/
$ ls .bash_history
.bash_history

The restore job finished without errors. So the file is in the db and on tape
but I get an error message during verify.




Client VU0EM003:
(client not behind a firewall in the same subnet as the the dir)

06-Aug 10:25 VU0EM005: Verifying against JobId=416 
Job=VU0EM003.2007-08-05_00.05.01
06-Aug 10:25 VU0EM005: Bootstrap records written to 
/opt/bacula/var/bacula/working/VU0EM005.restore.11.bsr
06-Aug 10:25 VU0EM005:
06-Aug 10:25 VU0EM005: The job will require the following
   Volume(s) Storage(s) SD Device(s)
===========================================================================
06-Aug 10:25 VU0EM005:
06-Aug 10:25 VU0EM005: 06D128L3 NEC-T40A NEC-T40A
06-Aug 10:25 VU0EM005: 06D138L3 NEC-T40A NEC-T40A
06-Aug 10:25 VU0EM005:
06-Aug 10:25 VU0EM005: Start Verify JobId=422 Level=VolumeToCatalog 
Job=VerifyVU0EM003.2007-08-06_10.25.16
06-Aug 10:25 VU0EM005: 3307 Issuing autochanger "unload slot 19, drive 0" 
command.
06-Aug 10:26 VU0EM005: 3304 Issuing autochanger "load slot 9, drive 0" command.
06-Aug 10:26 VU0EM005: 3305 Autochanger "load slot 9, drive 0", status is OK.
06-Aug 10:26 VU0EM005: 3301 Issuing autochanger "loaded? drive 0" command.
06-Aug 10:26 VU0EM005: 3302 Autochanger "loaded? drive 0", result is Slot 9.
06-Aug 10:26 VU0EM005: Ready to read from volume "06D128L3" on device "LTO3" 
(/dev/ULTRIUM-TD3).
06-Aug 10:26 VU0EM005: Forward spacing Volume "06D128L3" to file:block 68:0.
06-Aug 10:28 VU0EM005: End of file 69 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
[...]
06-Aug 11:03 VU0EM005: VerifyVU0EM003.2007-08-06_10.25.16 Fatal error: 
read.c:139 Error sending to File daemon. ERR=Die Wartezeit für die Verbindung 
ist abgelaufen
06-Aug 11:03 VU0EM005: VerifyVU0EM003.2007-08-06_10.25.16 Error: bnet.c:439 
Write error sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit 
für die Verbindung ist abgelaufen
[...the original verify job showed these errors after Note: 0000002 hours, I 
just didn't want to wait these 2 hours again, so it's juts cut & paste]
VU0EM003: VerifyVU0EM003.2007-08-06_10.25 Fatal error: verify_vol.c:108 Data 
record error. ERR=Connection reset by peer
VU0EM005: The following files are in the Catalog but not on disk:



I a bit lost what is going wrong here and what to check next. This did not
happen with verify jobs that were run against inc or diff jobids.


I changed the Heartbeat Interval for the SD from 10min to 5min and restarted
the bacula-sd. Then I started the the first verify job again. This time I got
only one message about missing files (/boot).

Right after that I also reran the verify for the second job, but this job
almost imediatly failed with an error. I could reproduce this 3 times in a row.

06-Aug 14:02 VU0EM005: Verifying against JobId=416 
Job=VU0EM003.2007-08-05_00.05.01
06-Aug 14:02 VU0EM005: Bootstrap records written to 
/opt/bacula/var/bacula/working/VU0EM005.restore.15.bsr
06-Aug 14:02 VU0EM005:
06-Aug 14:02 VU0EM005: The job will require the following
   Volume(s) Storage(s) SD Device(s)
===========================================================================
06-Aug 14:02 VU0EM005:
06-Aug 14:02 VU0EM005: 06D128L3 NEC-T40A NEC-T40A
06-Aug 14:02 VU0EM005: 06D138L3 NEC-T40A NEC-T40A
06-Aug 14:02 VU0EM005:
06-Aug 14:02 VU0EM005: Start Verify JobId=426 Level=VolumeToCatalog 
Job=VerifyVU0EM003.2007-08-06_14.02.42
06-Aug 14:04 VU0EM005: Ready to read from volume "06D128L3" on device "LTO3" 
(/dev/ULTRIUM-TD3).
06-Aug 14:04 VU0EM005: Forward spacing Volume "06D128L3" to file:block 68:0.
06-Aug 14:05 VU0EM005: End of file 69 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
06-Aug 14:06 VU0EM005: New file: 
/usr/s<FF><FF><FF><FA>rc/linux-headers-2.6.18-3-amd64/include/config/flat/node/mem/
06-Aug 14:06 VU0EM005:
06-Aug 14:06 VU0EM005: The following files are in the Catalog but not on disk:
06-Aug 14:06 VU0EM005: 
/server/wikis/farm/erw-test-wiki/data/pages/TestSnapArcGetEdgeRadius/revisions/00000001
06-Aug 14:06 VU0EM005: 
/server/wikis/farm/erw-test-wiki/data/pages/TestSnapArcGetEdgeRadius/revisions/00000002
[...]
06-Aug 14:06 VU0EM005: 
/server/projekte/ERW/Freigaben/Freigaben/SRS110_BR216/Freigaben/V0263_Serienfreigabe/HiL/Errormanagement_ABA_CMS_PAS_V0263_051006_Test_291106/m148_int-dtr-03_t3_/Auswert_Daten/m148_INT-DTR-03_t3_-06-12-0
1-10-25-02.rar
06-Aug 14:06 VU0EM005: VerifyVU0EM003.2007-08-06_14.02.42 Fatal error: 
bnet.c:241 Packet size too big from "File daemon:10.60.1.252:9102. Terminating 
connection.
06-Aug 14:06 VU0EM005: VerifyVU0EM003.2007-08-06_14.02.42 Fatal error: No Job 
status returned from FD.
06-Aug 14:08 VU0EM005: VerifyVU0EM003.2007-08-06_14.02.42 Error: Bacula 2.0.3 
(06Mar07): 06-Aug-2007 14:08:12
  JobId: 426
  Job: VerifyVU0EM003.2007-08-06_14.02.42
  FileSet: VU0EM003
  Verify Level: VolumeToCatalog
  Client: VU0EM003
  Verify JobId: 416
  Verify Job: VU0EM003
  Start time: 06-Aug-2007 14:02:44
  End time: 06-Aug-2007 14:08:12
  Files Expected: 1,631,258
  Files Examined: 34,757
  Non-fatal FD errors: 0
  FD termination status: Error
  SD termination status: Error
  Termination: *** Verify Error ***

06-Aug 14:08 VU0EM005: message.c:481 Mail prog: bsmtp: bsmtp.c:92 Fatal 
malformed reply from localhost: 552 5.3.4 Error: message file too big
06-Aug 14:08 VU0EM005: VerifyVU0EM003.2007-08-06_14.02.42 Error: message.c:492 
Mail program terminated in error.
CMD=/opt/bacula/sbin/bsmtp -h localhost -f "(Bacula) [EMAIL PROTECTED]" -s 
"Bacula: Verify Fatal Error of VU0EM003 Verify Volume to Catalog" [EMAIL 
PROTECTED] ERR=Child exited with code 1



I've np problems with backups, the inc and diff verifies were ok for the last 2
weeks. But this two full backup verify jobs are always failing.

Ralf





-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to