Hi Joshua,

I'm not sure whether it is feasible, but it could be useful to raise the debug 
level to 500 on the SD while only that job is running.
Sharing the related configuration may also help people understand the 
data flow and maybe spot something in it.
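
If the trace still ends up containing other jobs, it can be filtered afterwards. Below is a minimal Python sketch, assuming the trace file keeps the line shape seen in the excerpt quoted further down (daemon, level in parentheses, then source file, line number and job id). The name `filter_trace` and the regular expression are my own, not part of Bareos.

```python
import re

# Assumed line shape, taken from the SD trace excerpt quoted in this thread:
#   bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write ...
TRACE_RE = re.compile(
    r"^(?P<daemon>\S+) \((?P<level>\d+)\): "
    r"(?P<source>\S+?):(?P<line>\d+)-(?P<jobid>\d+) (?P<msg>.*)$"
)


def filter_trace(lines, jobid, max_level=None):
    """Yield (level, source, line, msg) for trace lines belonging to jobid.

    Lines that do not match the assumed shape (wrapped or interleaved
    output) are skipped rather than guessed at.
    """
    for raw in lines:
        m = TRACE_RE.match(raw.rstrip("\n"))
        if m is None or int(m["jobid"]) != jobid:
            continue
        level = int(m["level"])
        # Optionally drop the very chatty high-level messages.
        if max_level is not None and level > max_level:
            continue
        yield level, m["source"], int(m["line"]), m["msg"]
```

Calling `filter_trace(open(path), 18065, max_level=200)` on the trace file (the path is whatever the SD wrote it to) would keep only that job's lines at level 200 or lower, which makes a level 500 trace easier to read.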

Regards.
On Tuesday 23 July 2024 at 19:04:02 UTC+2 Joshua Myles wrote:

> Another thing I just noticed is that all of the failed jobs had an Elapsed 
> time right around 30 minutes, some just under 30 but all under 31. I did 
> rule out the (external) firewall timeout, which is 240 minutes, and am 
> looking through OS and application timeouts for anything around 30 minutes.
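
One way to check that pattern across all failed jobs is to pull the "Elapsed time" value out of each job log and compare it with the suspected 30-minute limit. A minimal Python sketch, assuming the field always looks like the one in the job log quoted below ("30 mins 30 secs", possibly with hours or days in front). The names `elapsed_seconds` and `near_limit` and the 60-second tolerance are my own choices, not Bareos settings.

```python
import re

UNIT_SECONDS = {"day": 86400, "hour": 3600, "min": 60, "sec": 1}
# Matches parts such as "30 mins", "1 hour" or "5 secs".
PART_RE = re.compile(r"(\d+)\s+(day|hour|min|sec)s?\b")


def elapsed_seconds(text):
    """Convert an elapsed-time string such as '30 mins 30 secs' to seconds."""
    parts = PART_RE.findall(text)
    if not parts:
        raise ValueError(f"no elapsed time found in {text!r}")
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in parts)


def near_limit(text, limit=30 * 60, tolerance=60):
    """True if the elapsed time is within `tolerance` seconds of `limit`."""
    return abs(elapsed_seconds(text) - limit) <= tolerance


print(elapsed_seconds("Elapsed time:           30 mins 30 secs"))  # 1830
```

If every failed job comes back `True` from `near_limit` and successful jobs do not, that would support the idea of a timeout close to 1800 seconds somewhere between Director and SD.
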
>
> Josh
>
> On Tue, Jul 23, 2024 at 11:28 AM Joshua Myles <jam...@mtu.edu> wrote:
>
>> One of our Virtual Full jobs has been failing every day during Always 
>> Incremental consolidation, and I'm having trouble figuring out why.
>>
>> *list joblog jobid=18065
>> Automatically selected Catalog: MyCatalog
>> Using Catalog "MyCatalog"
>>  2024-07-23 09:01:22 bareos-dir-prod JobId 18065: Start Virtual Backup 
>> JobId 18065, Job=node0057-AI.2024-07-23_09.00.04_41
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Bootstrap records 
>> written to /var/lib/bareos/bareos-dir-prod.restore.45.bsr
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Consolidating JobIds 
>> 17646,13356,13422,13488,13556,13625,13694 containing 2025684 files
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Connected Storage 
>> daemon at bareos-sd-t1-prod.foo.bar.edu:9103, encryption: 
>> TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065:  Encryption: 
>> TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device 
>> "FileStorage5" to read.
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Max configured use 
>> duration=72,000 sec. exceeded. Marking Volume 
>> "node0057-AI-Consolidated-12250" as Used.
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Created new Volume 
>> "node0057-AI-Consolidated-12276" in catalog.
>>  2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device 
>> "FileStorageConsolidated5" to write.
>>  2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Labeled new Volume 
>> "node0057-AI-Consolidated-12276" on device "FileStorageConsolidated5" 
>> (/var/lib/bareos/storage).
>>  2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Wrote label to 
>> prelabeled Volume "node0057-AI-Consolidated-12276" on device 
>> "FileStorageConsolidated5" (/var/lib/bareos/storage)
>>  2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Ready to read from 
>> volume "node0057-AI-Consolidated-6560" on device "FileStorage5" 
>> (/var/lib/bareos/storage).
>>  2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Forward spacing 
>> Volume "node0057-AI-Consolidated-6560" to file:block 0:274.
>>  2024-07-23 09:02:15 bareos-dir-prod JobId 18065: Insert of attributes 
>> batch table with 800001 entries start
>>  2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes 
>> batch table done
>>  2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Fatal error: Director's 
>> comm line to SD dropped.
>>  2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes 
>> batch table with 3303 entries start
>>  2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes 
>> batch table done
>>  2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Replicating deleted 
>> files from jobids 17646,13356,13422,13488,13556,13625,13694 to jobid 18065
>>  2024-07-23 09:03:03 bareos-dir-prod JobId 18065: Error: Bareos 
>> bareos-dir-prod 22.1.4 (28Feb24):
>>   Build OS:               Red Hat Enterprise Linux release 9.1 (Plow)
>>   JobId:                  18065
>>   Job:                    node0057-AI.2024-07-23_09.00.04_41
>>   Backup Level:           Virtual Full
>>   Client:                 "node0057.foo.bar.edu-fd" 22.1.5 (04Jun24) Red 
>> Hat Enterprise Linux Server release 7.9 (Maipo),redhat
>>   FileSet:                "LinuxAll" 2023-10-22 13:39:16
>>   Pool:                   "node0057-AI-Consolidated" (From Job Pool's 
>> NextPool resource)
>>   Catalog:                "MyCatalog" (From Client resource)
>>   Storage:                "bareos-sd-t1-prod-Consolidated" (From Storage 
>> from Pool's NextPool resource)
>>   Scheduled time:         23-Jul-2024 09:00:04
>>   Start time:             21-May-2024 02:00:01
>>   End time:               21-May-2024 02:30:31
>>   Elapsed time:           30 mins 30 secs
>>   Priority:               10
>>   SD Files Written:       0
>>   SD Bytes Written:       110,846,184 (110.8 MB)
>>   Rate:                   60.6 KB/s
>>   Volume name(s):         node0057-AI-Consolidated-12276
>>   Volume Session Id:      19
>>   Volume Session Time:    1721645047
>>   Last Volume Bytes:      275 (275 B)
>>   SD Errors:              0
>>   SD termination status:  Error
>>   Accurate:               yes
>>   Bareos binary info:     Bareos subscription release
>>   Job triggered by:       User
>>   Termination:            *** Backup Error ***
>>
>> *
>>
>> Director and SD are separate hosts, and this issue seems to persist only 
>> with jobs from this client, node0057. I enabled debug tracing on the SD but 
>> haven't seen anything that makes sense to me.
>>
>> ...
>> bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 
>> FI=804499 SessId=19 Strm=1998 len=65
>> bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 
>> FI=804499 SessId=19 Strm=MD5 len=16
>> bareos-sd-t1-prod (100): stored/mac.cc:655-18065 ok=0
>> bareos-sd-t1-prod (130): stored/label.cc:627-18065 session_label 
>> record=fc052df8
>> bareos-sd-t1-prod (150): stored/label.cc:652-18065 Write sesson_label 
>> record JobId=18065 FI=EOS_LABEL SessId=19 Strm=18065 len=234 remainder=0
>> bareos-sd-t1-prod (150): stored/label.cc:660-18065 Leave 
>> WriteSessionLabel Block=390293886d File=0d
>> bareos-sd-t1-prod (100): stored/block.cc:567-18065 return 
>> WriteBlockToDev, job is canceled
>> bareos-sd-t1-prod (100): stored/mac.cc:684-18065 Set ok=FALSE after 
>> WriteBlockToDevice.
>> bareos-sd-t1-prod (200): stored/mac.cc:687-18065 Flush block to device 
>> pos 0:390293886
>> bareos-sd-t1-prod (100): stored/acquire.cc:538-18065 releasing device 
>> "FileStorageConsolidated5" (/var/lib/bareos/storage)
>> bareos-sd-t1-prod (100): stored/acquire.cc:560-18065 There are 0 writers 
>> in ReleaseDevice
>> bareos-sd-t1-prod (50): stored/askdir.cc:366-18065 >dird UpdCat 
>> Job=node0057-AI.2024-07-23_09.00.04_41 FileAttributes bareos-sd-t1-prod 
>> (50): stored/askdir.cc:369-18065 create_jobmedia error BnetRecv
>> bareos-sd-t1-prod (200): stored/mac.cc:229-18062 bareos-sd-t1-prod (200): 
>> stored/acquire.cc:568-18065 ===== Wrote block new pos 2:4028935146
>> bareos-sd-t1-prod (50): stored/askdir.cc:298-18065 Update cat 
>> VolBytes=390293887
>> bareos-sd-t1-prod (50): stored/askdir.cc:317-18065 >dird 
>> bareos-sd-t1-prod (200): stored/acquire.cc:587-18065 dir_update_vol_info. 
>> Release vol=node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" 
>> (/var/lib/bareos/storage)
>> bareos-sd-t1-prod (150): stored/vol_mgr.cc:695-18065 === clear in_use 
>> vol=node0057-AI-Consolidated-12276
>> bareos-sd-t1-prod (150): stored/vol_mgr.cc:712-18065 === set not reserved 
>> vol=node0057-AI-Consolidated-12276 num_writers=0 dev_reserved=0 
>> dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
>> bareos-sd-t1-prod (150): stored/vol_mgr.cc:740-18065 === clear in_use 
>> vol=node0057-AI-Consolidated-12276
>> bareos-sd-t1-prod (150): stored/vol_mgr.cc:751-18065 === remove volume 
>> node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" 
>> (/var/lib/bareos/storage)
>> ...
>>
>> I can provide client/job/pool/storage configurations if they seem 
>> relevant here, and am continuing to poke at this myself. Thanks for any 
>> thoughts on troubleshooting.
>>
>> Josh
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "bareos-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to bareos-users...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/bareos-users/4a802f7a-69e6-4cff-9756-f6f089c8aa3en%40googlegroups.com
>>
>
>
> -- 
> Joshua Myles
> he/him
> Senior Storage Administrator
> Michigan Tech IT
>
> We can help.
> mtu.edu/it
> 906-369-1870
>
