Hi Joshua, Not sure if that feasible, but it may be interesting to raise the debug level to 500 on the sd and got only that job running ? Having the related configuration may also help people to understand the data flow and maybe spot something in.
Regards. On Tuesday 23 July 2024 at 19:04:02 UTC+2 Joshua Myles wrote: > Another thing I just noticed is that all of the failed jobs had an Elapsed > time right around 30 minutes, some just under 30 but all under 31. I did > rule out the (external) firewall timeout, which is 240 minutes, and am > looking through OS and application timeouts for anything around 30 minutes. > > Josh > > On Tue, Jul 23, 2024 at 11:28 AM Joshua Myles <jam...@mtu.edu> wrote: > >> One of our Virtual Full jobs has been failing every day during Always >> Incremental consolidation, and I'm having trouble figuring out why. >> >> *list joblog jobid=18065 >> Automatically selected Catalog: MyCatalog >> Using Catalog "MyCatalog" >> 2024-07-23 09:01:22 bareos-dir-prod JobId 18065: Start Virtual Backup >> JobId 18065, Job=node0057-AI.2024-07-23_09.00.04_41 >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Bootstrap records >> written to /var/lib/bareos/bareos-dir-prod.restore.45.bsr >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Consolidating JobIds >> 17646,13356,13422,13488,13556,13625,13694 containing 2025684 files >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Connected Storage >> daemon at bareos-sd-t1-prod.foo.bar.edu:9103, encryption: >> TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Encryption: >> TLS_CHACHA20_POLY1305_SHA256 TLSv1.3 >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device >> "FileStorage5" to read. >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Max configured use >> duration=72,000 sec. exceeded. Marking Volume >> "node0057-AI-Consolidated-12250" as Used. >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Created new Volume >> "node0057-AI-Consolidated-12276" in catalog. >> 2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device >> "FileStorageConsolidated5" to write. >> 2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Labeled new Volume >> "node0057-AI-Consolidated-12276" on device "FileStorageConsolidated5" >> (/var/lib/bareos/storage). >> 2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Wrote label to >> prelabeled Volume "node0057-AI-Consolidated-12276" on device >> "FileStorageConsolidated5" (/var/lib/bareos/storage) >> 2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Ready to read from >> volume "node0057-AI-Consolidated-6560" on device "FileStorage5" >> (/var/lib/bareos/storage). >> 2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Forward spacing >> Volume "node0057-AI-Consolidated-6560" to file:block 0:274. >> 2024-07-23 09:02:15 bareos-dir-prod JobId 18065: Insert of attributes >> batch table with 800001 entries start >> 2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes >> batch table done >> 2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Fatal error: Director's >> comm line to SD dropped. >> 2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes >> batch table with 3303 entries start >> 2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes >> batch table done >> 2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Replicating deleted >> files from jobids 17646,13356,13422,13488,13556,13625,13694 to jobid 18065 >> 2024-07-23 09:03:03 bareos-dir-prod JobId 18065: Error: Bareos >> bareos-dir-prod 22.1.4 (28Feb24): >> Build OS: Red Hat Enterprise Linux release 9.1 (Plow) >> JobId: 18065 >> Job: node0057-AI.2024-07-23_09.00.04_41 >> Backup Level: Virtual Full >> Client: "node0057.foo.bar.edu-fd" 22.1.5 (04Jun24) Red >> Hat Enterprise Linux Server release 7.9 (Maipo),redhat >> FileSet: "LinuxAll" 2023-10-22 13:39:16 >> Pool: "node0057-AI-Consolidated" (From Job Pool's >> NextPool resource) >> Catalog: "MyCatalog" (From Client resource) >> Storage: "bareos-sd-t1-prod-Consolidated" (From Storage >> from Pool's NextPool resource) >> Scheduled time: 23-Jul-2024 09:00:04 >> Start time: 21-May-2024 02:00:01 >> End time: 21-May-2024 02:30:31 >> Elapsed time: 30 mins 30 secs >> Priority: 10 >> SD Files Written: 0 >> SD Bytes Written: 110,846,184 (110.8 MB) >> Rate: 60.6 KB/s >> Volume name(s): node0057-AI-Consolidated-12276 >> Volume Session Id: 19 >> Volume Session Time: 1721645047 >> Last Volume Bytes: 275 (275 B) >> SD Errors: 0 >> SD termination status: Error >> Accurate: yes >> Bareos binary info: Bareos subscription release >> Job triggered by: User >> Termination: *** Backup Error *** >> >> * >> >> Director and SD are separate hosts, and this issue seems to persist only >> with jobs from this client, node0057. I enabled debug tracing on the SD but >> haven't seen anything that makes sense to me. >> >> ... >> bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 >> FI=804499 SessId=19 Strm=1998 len=65 >> bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 >> FI=804499 SessId=19 Strm=MD5 len=16 >> bareos-sd-t1-prod (100): stored/mac.cc:655-18065 ok=0 >> bareos-sd-t1-prod (130): stored/label.cc:627-18065 session_label >> record=fc052df8 >> bareos-sd-t1-prod (150): stored/label.cc:652-18065 Write sesson_label >> record JobId=18065 FI=EOS_LABEL SessId=19 Strm=18065 len=234 remainder=0 >> bareos-sd-t1-prod (150): stored/label.cc:660-18065 Leave >> WriteSessionLabel Block=390293886d File=0d >> bareos-sd-t1-prod (100): stored/block.cc:567-18065 return >> WriteBlockToDev, job is canceled >> bareos-sd-t1-prod (100): stored/mac.cc:684-18065 Set ok=FALSE after >> WriteBlockToDevice. >> bareos-sd-t1-prod (200): stored/mac.cc:687-18065 Flush block to device >> pos 0:390293886 >> bareos-sd-t1-prod (100): stored/acquire.cc:538-18065 releasing device >> "FileStorageConsolidated5" (/var/lib/bareos/storage) >> bareos-sd-t1-prod (100): stored/acquire.cc:560-18065 There are 0 writers >> in ReleaseDevice >> bareos-sd-t1-prod (50): stored/askdir.cc:366-18065 >dird UpdCat >> Job=node0057-AI.2024-07-23_09.00.04_41 FileAttributes bareos-sd-t1-prod >> (50): stored/askdir.cc:369-18065 create_jobmedia error BnetRecv >> bareos-sd-t1-prod (200): stored/mac.cc:229-18062 bareos-sd-t1-prod (200): >> stored/acquire.cc:568-18065 ===== Wrote block new pos 2:4028935146 >> bareos-sd-t1-prod (50): stored/askdir.cc:298-18065 Update cat >> VolBytes=390293887 >> bareos-sd-t1-prod (50): stored/askdir.cc:317-18065 >dird >> bareos-sd-t1-prod (200): stored/acquire.cc:587-18065 dir_update_vol_info. >> Release vol=node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" >> (/var/lib/bareos/storage) >> bareos-sd-t1-prod (150): stored/vol_mgr.cc:695-18065 === clear in_use >> vol=node0057-AI-Consolidated-12276 >> bareos-sd-t1-prod (150): stored/vol_mgr.cc:712-18065 === set not reserved >> vol=node0057-AI-Consolidated-12276 num_writers=0 dev_reserved=0 >> dev="FileStorageConsolidated5" (/var/lib/bareos/storage) >> bareos-sd-t1-prod (150): stored/vol_mgr.cc:740-18065 === clear in_use >> vol=node0057-AI-Consolidated-12276 >> bareos-sd-t1-prod (150): stored/vol_mgr.cc:751-18065 === remove volume >> node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" >> (/var/lib/bareos/storage) >> ... >> >> I can provide client/job/pool/storage configurations if they seem >> relevant here, and am continuing to poke at this myself. Thanks for any >> thoughts on troubleshooting. >> >> Josh >> >> -- >> You received this message because you are subscribed to the Google Groups >> "bareos-users" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to bareos-users...@googlegroups.com. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/bareos-users/4a802f7a-69e6-4cff-9756-f6f089c8aa3en%40googlegroups.com >> >> <https://groups.google.com/d/msgid/bareos-users/4a802f7a-69e6-4cff-9756-f6f089c8aa3en%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > > > -- > Joshua Myles > he/him > Senior Storage Administrator > Michigan Tech IT > > We can help. > mtu.edu/it > 906-369-1870 <(906)%20369-1870> > -- You received this message because you are subscribed to the Google Groups "bareos-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/bareos-users/6f63eed5-fe24-47c4-991c-18293155605cn%40googlegroups.com.