One of our Virtual Full jobs has been failing every day during Always Incremental consolidation, and I'm having trouble figuring out why.

*list joblog jobid=18065
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
2024-07-23 09:01:22 bareos-dir-prod JobId 18065: Start Virtual Backup JobId 18065, Job=node0057-AI.2024-07-23_09.00.04_41
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Bootstrap records written to /var/lib/bareos/bareos-dir-prod.restore.45.bsr
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Consolidating JobIds 17646,13356,13422,13488,13556,13625,13694 containing 2025684 files
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Connected Storage daemon at bareos-sd-t1-prod.foo.bar.edu:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device "FileStorage5" to read.
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Max configured use duration=72,000 sec. exceeded. Marking Volume "node0057-AI-Consolidated-12250" as Used.
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Created new Volume "node0057-AI-Consolidated-12276" in catalog.
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device "FileStorageConsolidated5" to write.
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Labeled new Volume "node0057-AI-Consolidated-12276" on device "FileStorageConsolidated5" (/var/lib/bareos/storage).
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Wrote label to prelabeled Volume "node0057-AI-Consolidated-12276" on device "FileStorageConsolidated5" (/var/lib/bareos/storage)
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Ready to read from volume "node0057-AI-Consolidated-6560" on device "FileStorage5" (/var/lib/bareos/storage).
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Forward spacing Volume "node0057-AI-Consolidated-6560" to file:block 0:274.
2024-07-23 09:02:15 bareos-dir-prod JobId 18065: Insert of attributes batch table with 800001 entries start
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes batch table done
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Fatal error: Director's comm line to SD dropped.
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes batch table with 3303 entries start
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes batch table done
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Replicating deleted files from jobids 17646,13356,13422,13488,13556,13625,13694 to jobid 18065
2024-07-23 09:03:03 bareos-dir-prod JobId 18065: Error: Bareos bareos-dir-prod 22.1.4 (28Feb24):
  Build OS:               Red Hat Enterprise Linux release 9.1 (Plow)
  JobId:                  18065
  Job:                    node0057-AI.2024-07-23_09.00.04_41
  Backup Level:           Virtual Full
  Client:                 "node0057.foo.bar.edu-fd" 22.1.5 (04Jun24) Red Hat Enterprise Linux Server release 7.9 (Maipo),redhat
  FileSet:                "LinuxAll" 2023-10-22 13:39:16
  Pool:                   "node0057-AI-Consolidated" (From Job Pool's NextPool resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "bareos-sd-t1-prod-Consolidated" (From Storage from Pool's NextPool resource)
  Scheduled time:         23-Jul-2024 09:00:04
  Start time:             21-May-2024 02:00:01
  End time:               21-May-2024 02:30:31
  Elapsed time:           30 mins 30 secs
  Priority:               10
  SD Files Written:       0
  SD Bytes Written:       110,846,184 (110.8 MB)
  Rate:                   60.6 KB/s
  Volume name(s):         node0057-AI-Consolidated-12276
  Volume Session Id:      19
  Volume Session Time:    1721645047
  Last Volume Bytes:      275 (275 B)
  SD Errors:              0
  SD termination status:  Error
  Accurate:               yes
  Bareos binary info:     Bareos subscription release
  Job triggered by:       User
  Termination:            *** Backup Error ***

*

Director and SD are separate hosts, and this issue seems to persist only with jobs from this client, node0057. I enabled debug tracing on the SD but haven't seen anything that makes sense to me.

...
bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 FI=804499 SessId=19 Strm=1998 len=65
bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 FI=804499 SessId=19 Strm=MD5 len=16
bareos-sd-t1-prod (100): stored/mac.cc:655-18065 ok=0
bareos-sd-t1-prod (130): stored/label.cc:627-18065 session_label record=fc052df8
bareos-sd-t1-prod (150): stored/label.cc:652-18065 Write sesson_label record JobId=18065 FI=EOS_LABEL SessId=19 Strm=18065 len=234 remainder=0
bareos-sd-t1-prod (150): stored/label.cc:660-18065 Leave WriteSessionLabel Block=390293886d File=0d
bareos-sd-t1-prod (100): stored/block.cc:567-18065 return WriteBlockToDev, job is canceled
bareos-sd-t1-prod (100): stored/mac.cc:684-18065 Set ok=FALSE after WriteBlockToDevice.
bareos-sd-t1-prod (200): stored/mac.cc:687-18065 Flush block to device pos 0:390293886
bareos-sd-t1-prod (100): stored/acquire.cc:538-18065 releasing device "FileStorageConsolidated5" (/var/lib/bareos/storage)
bareos-sd-t1-prod (100): stored/acquire.cc:560-18065 There are 0 writers in ReleaseDevice
bareos-sd-t1-prod (50): stored/askdir.cc:366-18065 >dird UpdCat Job=node0057-AI.2024-07-23_09.00.04_41 FileAttributes
bareos-sd-t1-prod (50): stored/askdir.cc:369-18065 create_jobmedia error BnetRecv
bareos-sd-t1-prod (200): stored/mac.cc:229-18062 bareos-sd-t1-prod (200): stored/acquire.cc:568-18065 ===== Wrote block new pos 2:4028935146
bareos-sd-t1-prod (50): stored/askdir.cc:298-18065 Update cat VolBytes=390293887
bareos-sd-t1-prod (50): stored/askdir.cc:317-18065 >dird
bareos-sd-t1-prod (200): stored/acquire.cc:587-18065 dir_update_vol_info. Release vol=node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
bareos-sd-t1-prod (150): stored/vol_mgr.cc:695-18065 === clear in_use vol=node0057-AI-Consolidated-12276
bareos-sd-t1-prod (150): stored/vol_mgr.cc:712-18065 === set not reserved vol=node0057-AI-Consolidated-12276 num_writers=0 dev_reserved=0 dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
bareos-sd-t1-prod (150): stored/vol_mgr.cc:740-18065 === clear in_use vol=node0057-AI-Consolidated-12276
bareos-sd-t1-prod (150): stored/vol_mgr.cc:751-18065 === remove volume node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
...

I can provide client/job/pool/storage configurations if they seem relevant here, and am continuing to poke at this myself. Thanks for any thoughts on troubleshooting.

Josh