Hello Azur

It looks that problem was an error on the connection:



*11-jan 05:42 server00-dir JobId 56875: Error: Director's connection to  SD
for this Job was lost.  *

You could try enable "Heartbeat Interval in your config files:
https://github.com/wanderleihuttel/bacula-utils/blob/master/dicas/heartbeat_interval.md

Best regards

*Wanderlei Hüttel*
http://www.bacula.com.br


Em sex, 11 de jan de 2019 às 06:08, <azu...@pobox.sk> escreveu:

> Hi all,
>
> i'm having problems with virtual full backup on one of our clients.
> Everything was working fine for years but, suddenly, Bacula is unable
> to complete the job - i was trying to run it about 10 times during
> past few days, all of them ends with the same weird error.
>
> The error is occuring near of the end of the job, after 240 volumes
> were created (we have a limit of 5 GB for volume size) and 1,2 TB of
> data was copied.
>
> Here are last lines from job report:
>
> 11-jan 05:40 server08-sd JobId 56875: Ready to read from volume
> "cloud0015-16052" on File device "cloud0015-file-storage"
> (/backup/cloud0015).
> 11-jan 05:40 server08-sd JobId 56875: Forward spacing Volume
> "cloud0015-16052" to addr=226
> 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17313" as Used.
> 11-jan 05:40 server08-sd JobId 56875: End of medium on Volume
> "cloud0015-17313" Bytes=5,368,688,735 Blocks=83,220 at 11-Jan-2019
> 05:40.
> 11-jan 05:40 server00-dir JobId 56875: Created new
> Volume="cloud0015-17314", Pool="cloud0015-pool",
> MediaType="File-cloud0015" in catalog.
> 11-jan 05:40 server08-sd JobId 56875: Labeled new Volume
> "cloud0015-17314" on File device "cloud0015-file-storage-2"
> (/backup/cloud0015).
> 11-jan 05:40 server08-sd JobId 56875: Wrote label to prelabeled Volume
> "cloud0015-17314" on File device "cloud0015-file-storage-2"
> (/backup/cloud0015)
> 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17314" as Used.
> 11-jan 05:40 server08-sd JobId 56875: New volume "cloud0015-17314"
> mounted on device "cloud0015-file-storage-2" (/backup/cloud0015) at
> 11-Jan-2019 05:40.
> 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17314" as Used.
> 11-jan 05:40 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17314" as Used.
> 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17314" as Used.
> 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17314" as Used.
> 11-jan 05:41 server08-sd JobId 56875: End of Volume "cloud0015-16052"
> at addr=5368688648 on device "cloud0015-file-storage"
> (/backup/cloud0015).
> 11-jan 05:41 server08-sd JobId 56875: Ready to read from volume
> "cloud0015-16053" on File device "cloud0015-file-storage"
> (/backup/cloud0015).
> 11-jan 05:41 server08-sd JobId 56875: Forward spacing Volume
> "cloud0015-16053" to addr=226
> 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17314" as Used.
> 11-jan 05:41 server08-sd JobId 56875: End of medium on Volume
> "cloud0015-17314" Bytes=5,368,688,678 Blocks=83,220 at 11-Jan-2019
> 05:41.
> 11-jan 05:41 server00-dir JobId 56875: Created new
> Volume="cloud0015-17315", Pool="cloud0015-pool",
> MediaType="File-cloud0015" in catalog.
> 11-jan 05:41 server08-sd JobId 56875: Labeled new Volume
> "cloud0015-17315" on File device "cloud0015-file-storage-2"
> (/backup/cloud0015).
> 11-jan 05:41 server08-sd JobId 56875: Wrote label to prelabeled Volume
> "cloud0015-17315" on File device "cloud0015-file-storage-2"
> (/backup/cloud0015)
> 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 05:41 server08-sd JobId 56875: New volume "cloud0015-17315"
> mounted on device "cloud0015-file-storage-2" (/backup/cloud0015) at
> 11-Jan-2019 05:41.
> 11-jan 05:41 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 05:42 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 05:42 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 05:42 server00-dir JobId 56875: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 05:42 server00-dir JobId 56875: Error: Director's connection to
> SD for this Job was lost.
>
>
>
>
>
> This is what is logged to syslog on SD side:
>
> Jan 11 05:42:43 server08 bacula-sd[13585]: server08-sd:
> vbackup.c:130-56875 Fatal append error on device
> "cloud0015-file-storage-2" (/backup/cloud0015): ERR=file_dev.c:190
> Could not
> open(/backup/cloud0015/cloud0015-17315,OPEN_READ_WRITE,0640): ERR=No
> such file or directory
>
>
>
>
>
>
>
>
> There's no reason why the volume wasn't created. Permissions are ok:
>
> # ls -ld /backup/cloud0015/
> drwxr-x--- 2 bacula tape 36864 Jan 11 05:41 /backup/cloud0015/
>
>
> Free disk space is sufficient:
>
> # df -h /backup/
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       7.3T  5.1T  2.2T  70% /backup
>
>
>
>
>
>
>
> Another strange thing is that while virtual full was running, new
> incremental backup was queue for that client. After full failed,
> incremetnal was run AND completed fine while using THAT ONE ERRONEOUS
> VOLUME:
>
> 11-jan 05:42 server00-dir JobId 56897: Start Backup JobId 56897,
> Job=cloud0015.2019-01-11_01.00.17_25
> 11-jan 05:42 server00-dir JobId 56897: There are no more Jobs
> associated with Volume "cloud0015-17315". Marking it purged.
> 11-jan 05:42 server00-dir JobId 56897: All records pruned from Volume
> "cloud0015-17315"; marking it "Purged"
> 11-jan 05:42 server00-dir JobId 56897: Recycled volume "cloud0015-17315"
> 11-jan 05:42 server00-dir JobId 56897: Using Device
> "cloud0015-file-storage" to write.
> 11-jan 05:42 cloud0015-fd JobId 56897: shell command: run
> ClientRunBeforeJob "/var/www/backends/webbackend/misc/mysql_backup.py"
> 11-jan 05:42 server00-dir JobId 56897: Sending Accurate information to the
> FD.
> 11-jan 05:52 server08-sd JobId 56897: Recycled volume
> "cloud0015-17315" on File device "cloud0015-file-storage"
> (/backup/cloud0015), all previous data lost.
> 11-jan 05:52 server00-dir JobId 56897: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 05:58 server00-dir JobId 56897: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 06:03 server00-dir JobId 56897: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 06:06 server00-dir JobId 56897: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 06:13 server00-dir JobId 56897: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 06:16 server08-sd JobId 56897: Elapsed time=00:24:42, Transfer
> rate=3.260 M Bytes/second
> 11-jan 06:16 server00-dir JobId 56897: Volume used once. Marking
> Volume "cloud0015-17315" as Used.
> 11-jan 06:16 server08-sd JobId 56897: Sending spooled attrs to the
> Director. Despooling 10,006,797 bytes ...
>
>
>
>
> This starts to happen on Bacula 7.4.4 but after few days i decided to
> upgrade to 9.2.1, problem persists. Debian linux, 64bit, i'm using
> original Debian packages of Bacula.
>
> I also checked the health of HDD where backups are stored, done short
> and long SMART tests, file system checks etc. No errors at all.
>
> Any hints? Thanks
>
> azur
>
>
>
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to