Hi all,

i'm having problems with virtual full backup on one of our clients. Everything was working fine for years but, suddenly, Bacula is unable to complete the job - i was trying to run it about 10 times during past few days, all of them ends with the same weird error.

The error is occuring near of the end of the job, after 240 volumes were created (we have a limit of 5 GB for volume size) and 1,2 TB of data was copied.

Here are last lines from job report:

11-jan 05:40 server08-sd JobId 56875: Ready to read from volume "cloud0015-16052" on File device "cloud0015-file-storage" (/backup/cloud0015). 11-jan 05:40 server08-sd JobId 56875: Forward spacing Volume "cloud0015-16052" to addr=226 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17313" as Used. 11-jan 05:40 server08-sd JobId 56875: End of medium on Volume "cloud0015-17313" Bytes=5,368,688,735 Blocks=83,220 at 11-Jan-2019 05:40. 11-jan 05:40 server00-dir JobId 56875: Created new Volume="cloud0015-17314", Pool="cloud0015-pool", MediaType="File-cloud0015" in catalog. 11-jan 05:40 server08-sd JobId 56875: Labeled new Volume "cloud0015-17314" on File device "cloud0015-file-storage-2" (/backup/cloud0015). 11-jan 05:40 server08-sd JobId 56875: Wrote label to prelabeled Volume "cloud0015-17314" on File device "cloud0015-file-storage-2" (/backup/cloud0015) 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17314" as Used. 11-jan 05:40 server08-sd JobId 56875: New volume "cloud0015-17314" mounted on device "cloud0015-file-storage-2" (/backup/cloud0015) at 11-Jan-2019 05:40. 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17314" as Used. 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17314" as Used. 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17314" as Used. 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17314" as Used. 11-jan 05:41 server08-sd JobId 56875: End of Volume "cloud0015-16052" at addr=5368688648 on device "cloud0015-file-storage" (/backup/cloud0015). 11-jan 05:41 server08-sd JobId 56875: Ready to read from volume "cloud0015-16053" on File device "cloud0015-file-storage" (/backup/cloud0015). 11-jan 05:41 server08-sd JobId 56875: Forward spacing Volume "cloud0015-16053" to addr=226 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17314" as Used. 11-jan 05:41 server08-sd JobId 56875: End of medium on Volume "cloud0015-17314" Bytes=5,368,688,678 Blocks=83,220 at 11-Jan-2019 05:41. 11-jan 05:41 server00-dir JobId 56875: Created new Volume="cloud0015-17315", Pool="cloud0015-pool", MediaType="File-cloud0015" in catalog. 11-jan 05:41 server08-sd JobId 56875: Labeled new Volume "cloud0015-17315" on File device "cloud0015-file-storage-2" (/backup/cloud0015). 11-jan 05:41 server08-sd JobId 56875: Wrote label to prelabeled Volume "cloud0015-17315" on File device "cloud0015-file-storage-2" (/backup/cloud0015) 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 05:41 server08-sd JobId 56875: New volume "cloud0015-17315" mounted on device "cloud0015-file-storage-2" (/backup/cloud0015) at 11-Jan-2019 05:41. 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 05:42 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 05:42 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 05:42 server00-dir JobId 56875: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 05:42 server00-dir JobId 56875: Error: Director's connection to SD for this Job was lost.





This is what is logged to syslog on SD side:

Jan 11 05:42:43 server08 bacula-sd[13585]: server08-sd: vbackup.c:130-56875 Fatal append error on device "cloud0015-file-storage-2" (/backup/cloud0015): ERR=file_dev.c:190 Could not open(/backup/cloud0015/cloud0015-17315,OPEN_READ_WRITE,0640): ERR=No such file or directory








There's no reason why the volume wasn't created. Permissions are ok:

# ls -ld /backup/cloud0015/
drwxr-x--- 2 bacula tape 36864 Jan 11 05:41 /backup/cloud0015/


Free disk space is sufficient:

# df -h /backup/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1       7.3T  5.1T  2.2T  70% /backup







Another strange thing is that while virtual full was running, new incremental backup was queue for that client. After full failed, incremetnal was run AND completed fine while using THAT ONE ERRONEOUS VOLUME:

11-jan 05:42 server00-dir JobId 56897: Start Backup JobId 56897, Job=cloud0015.2019-01-11_01.00.17_25 11-jan 05:42 server00-dir JobId 56897: There are no more Jobs associated with Volume "cloud0015-17315". Marking it purged. 11-jan 05:42 server00-dir JobId 56897: All records pruned from Volume "cloud0015-17315"; marking it "Purged"
11-jan 05:42 server00-dir JobId 56897: Recycled volume "cloud0015-17315"
11-jan 05:42 server00-dir JobId 56897: Using Device "cloud0015-file-storage" to write. 11-jan 05:42 cloud0015-fd JobId 56897: shell command: run ClientRunBeforeJob "/var/www/backends/webbackend/misc/mysql_backup.py"
11-jan 05:42 server00-dir JobId 56897: Sending Accurate information to the FD.
11-jan 05:52 server08-sd JobId 56897: Recycled volume "cloud0015-17315" on File device "cloud0015-file-storage" (/backup/cloud0015), all previous data lost. 11-jan 05:52 server00-dir JobId 56897: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 05:58 server00-dir JobId 56897: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 06:03 server00-dir JobId 56897: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 06:06 server00-dir JobId 56897: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 06:13 server00-dir JobId 56897: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 06:16 server08-sd JobId 56897: Elapsed time=00:24:42, Transfer rate=3.260 M Bytes/second 11-jan 06:16 server00-dir JobId 56897: Volume used once. Marking Volume "cloud0015-17315" as Used. 11-jan 06:16 server08-sd JobId 56897: Sending spooled attrs to the Director. Despooling 10,006,797 bytes ...




This starts to happen on Bacula 7.4.4 but after few days i decided to upgrade to 9.2.1, problem persists. Debian linux, 64bit, i'm using original Debian packages of Bacula.

I also checked the health of HDD where backups are stored, done short and long SMART tests, file system checks etc. No errors at all.

Any hints? Thanks

azur




_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to