[Bacula-users] bpipe problems on ver. 15.0.2

2024-06-14 Thread Žiga Žvan
Hi!
I'm using Bacula to back up some virtual machines from my ESXi hosts. It worked
on version 9.6.5 (CentOS); however, I'm having problems on version 15.0.2
(Ubuntu). The backup job ends with success, but the bacula-fd service gets
killed in the process...
Has anybody experienced similar problems?
Any suggestions on how to fix this?

Kind regards,
Ziga Zvan


### Relevant part of conf ###
Job {
  Name = "esxi_donke_SomeHost-backup"
  JobDefs = "SomeHost-job"
  ClientRunBeforeJob = "sshpass -p 'SomePassword' ssh -o StrictHostKeyChecking=no SomeUser@esxhost.domain.local /ghettoVCB-master/ghettoVCB.sh -g /ghettoVCB-master/ghettoVCB.conf -m SomeHost"
  ClientRunAfterJob = "sshpass -p 'SomePassword' ssh -o StrictHostKeyChecking=no SomeUser@esxhost.domain.local rm -rf /vmfs/volumes/ds2_raid6/backup/SomeHost"
}
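
In case it helps with reproducing this, the before/after commands can be run by
hand from the FD host; a minimal sketch reusing the same values as the Job above
(password, user and host are the placeholders from the conf, not real ones):

# same command the ClientRunBeforeJob directive runs: dump the VM with ghettoVCB
sshpass -p 'SomePassword' ssh -o StrictHostKeyChecking=no \
    SomeUser@esxhost.domain.local /ghettoVCB-master/ghettoVCB.sh \
    -g /ghettoVCB-master/ghettoVCB.conf -m SomeHost

# same cleanup the ClientRunAfterJob directive runs
sshpass -p 'SomePassword' ssh -o StrictHostKeyChecking=no \
    SomeUser@esxhost.domain.local rm -rf /vmfs/volumes/ds2_raid6/backup/SomeHost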


FileSet {
  Name = "SomeHost-fileset"
  Include {
    Options {
      signature = MD5
      Compression = GZIP1
    }
    Plugin = "bpipe:/mnt/bkp_SomeHost.tar:sshpass -p 'SomePassword' ssh -o StrictHostKeyChecking=no SomeUser@esxhost.domain.local /bin/tar -c /vmfs/volumes/ds2_raid6/backup/SomeHost:/bin/tar -C /storage/bacula/imagerestore -xvf -"
  }
  Exclude {
  }
}
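
As far as I understand the bpipe syntax, that Plugin line packs three
colon-separated fields into one string: the pseudo file name the stream is
catalogued under, the command whose stdout is backed up, and the command the
stream is piped into on restore. Broken out as a sketch (same values as above,
nothing new):

# Plugin = "bpipe:<pseudo file name>:<reader command, backup>:<writer command, restore>"
#   pseudo file name : /mnt/bkp_SomeHost.tar
#   reader (backup)  : sshpass ... ssh ... SomeUser@esxhost.domain.local /bin/tar -c /vmfs/volumes/ds2_raid6/backup/SomeHost
#   writer (restore) : /bin/tar -C /storage/bacula/imagerestore -xvf -

# The reader side can also be exercised on its own from the FD host, e.g.:
sshpass -p 'SomePassword' ssh -o StrictHostKeyChecking=no \
    SomeUser@esxhost.domain.local /bin/tar -c /vmfs/volumes/ds2_raid6/backup/SomeHost > /dev/null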

### Bacula-fd state after backup finished ###

× bacula-fd.service - Bacula File Daemon service
     Loaded: loaded (/lib/systemd/system/bacula-fd.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Tue 2024-06-11 13:00:08 CEST; 20h ago
    Process: 392733 ExecStart=/opt/bacula/bin/bacula-fd -fP -c /opt/bacula/etc/bacula-fd.conf (code=killed, signal=SEGV)
   Main PID: 392733 (code=killed, signal=SEGV)
        CPU: 3h 33min 48.142s

Jun 11 13:00:08 bacula bacula-fd[392733]: Bacula interrupted by signal 11: Segmentation violation
Jun 11 13:00:08 bacula bacula-fd[393952]: bsmtp: bsmtp.c:508-0 Failed to connect to mailhost localhost
Jun 11 13:00:08 bacula bacula-fd[392733]: The btraceback call returned 1
Jun 11 13:00:08 bacula bacula-fd[392733]: LockDump: /opt/bacula/working/bacula.392733.traceback
Jun 11 13:00:08 bacula bacula-fd[392733]: bacula-fd: smartall.c:418-1791 Orphaned buffer: bacula-fd 280 bytes at 55fad3bdf278>
Jun 11 13:00:08 bacula bacula-fd[392733]: bacula-fd: smartall.c:418-1791 Orphaned buffer: bacula-fd 280 bytes at 55fad3bdff08>
Jun 11 13:00:08 bacula bacula-fd[392733]: bacula-fd: smartall.c:418-1791 Orphaned buffer: bacula-fd 536 bytes at 55fad3beb678>
Jun 11 13:00:08 bacula systemd[1]: bacula-fd.service: Main process exited, code=killed, status=11/SEGV
Jun 11 13:00:08 bacula systemd[1]: bacula-fd.service: Failed with result 'signal'.
Jun 11 13:00:08 bacula systemd[1]: bacula-fd.service: Consumed 3h 33min 48.142s CPU time.


### Trace output ###
Check the log files for more information.

Please install a debugger (gdb) to receive a traceback.
Attempt to dump locks
threadid=0x7f16f1023640 max=2 current=-1
threadid=0x7f16f1824640 max=2 current=-1
threadid=0x7f16f202d640 max=0 current=-1
threadid=0x7f16f2093780 max=0 current=-1
Attempt to dump current JCRs. njcrs=0
List plugins. Hook count=1
Plugin 0x55fad3b0bf28 name="bpipe-fd.so"
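
The trace output above notes that gdb is missing, so btraceback could not
produce a full backtrace. A minimal sketch of how I would try to capture one on
the next crash (assuming stock Ubuntu package names and the paths from the log
above):

# install gdb so Bacula's btraceback script can attach on SIGSEGV
sudo apt-get install gdb

# optionally run the FD in the foreground at a higher debug level to reproduce
sudo systemctl stop bacula-fd
sudo /opt/bacula/bin/bacula-fd -f -d 200 -c /opt/bacula/etc/bacula-fd.conf

# after the next segfault, a traceback should land in the working directory
ls /opt/bacula/working/*.traceback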

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Again LTO9 and performances...

2024-06-14 Thread Marco Gaiarin
Greetings! Bill Arlofski via Bacula-users
  On that day, the following was said...

> Hope this helps!

Thanks to all for the hints and the explanations; Bacula is really a tough
beast... there's always room for improvement! ;-)

-- 
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Volume not being marked as Error in Catalog after S3/RGW timeout

2024-06-14 Thread Ana Emília M. Arruda
Hello Martin,

Glad to hear that!
Thanks a lot for the feedback. :-)
You are very welcome!

Best regards,
Ana

On Fri, Jun 14, 2024 at 10:26 AM Martin Reissner wrote:

> Hello Ana,
>
> I wanted to report back that after almost two weeks on Bacula 15.0.2 using
> the "Amazon" driver, the situation has improved a lot. We have seen no more
> backups failing because of timeouts. I had some errors with Copy jobs that
> are reading from S3/RGW and writing to file storage, but they might have
> been caused by something else, and it was only a few errors. Thank you for
> the heads up with the "Amazon" driver!
>
> All the best,
>
> Martin
>
> On 16.05.24 17:58, Ana Emília M. Arruda wrote:
> > Hello Martin,
> >
> > Yes, the Amazon driver will help with the timeout issues. And we have
> > been improving the Amazon driver continuously. Thus, I would move to this
> > one. If you still see this issue related to the volume not marked as error,
> > then we should investigate it.
> >
> > Best,
> > Ana
> >
> > On Wed, May 15, 2024 at 12:04 PM Martin Reissner wrote:
> >
> > Hello Ana,
> >
> > thank you for the heads up. An upgrade to one of the more recent
> > versions has been in my backlog for a while now, maybe this will get me
> > some time to actually get it done.
> > I'd still like to know whether what I am seeing with the volume not
> > being marked Error is an actual bug or something on my end, but if the
> > "Amazon" driver helps with the effects of the timeouts I'll gladly take it.
> >
> > Regards,
> >
> > Martin
> >
> > On 14.05.24 20:53, Ana Emília M. Arruda wrote:
> >  > Hello Martin,
> >  >
> >  > Do you think you can upgrade to 15.0.X? I would recommend you to
> >  > use the "Amazon" driver instead of the "S3" driver. You can simply
> >  > change the "Driver" in the cloud resource and restart the SD. I'm not
> >  > sure the Amazon driver is available in 13.0.2, but you can have a try.
> >  >
> >  > The Amazon driver is much more stable against such timeout issues.
> >  >
> >  > Best regards,
> >  > Ana
> >  >
> >  > On Mon, May 6, 2024 at 8:53 AM Martin Reissner <mreiss...@wavecon.de> wrote:
> >  >
> >  > Hello,
> >  >
> >  > by now I am mostly using our Ceph RGW with the S3 driver as
> >  > storage and this works just fine but time and again requests
> >  > towards the RGW time out.
> >  > This is of course our business and not Bacula's but due to a
> >  > behaviour I can't understand this causes us more trouble than it should.
> >  >
> >  > When one of these errors happens it looks like this in the logs:
> >  >
> >  > 04-Mai 02:32 mybackup-sd JobId 968544: Error: S3_delete_object ERR=RequestTimeout CURL Effective URL: https://myrgw/mystorage/myvolume-25809/part.10 CURL OS Error: 101 CURL Effective URL: https://myrgw/mystorage/myvolume/part.10 CURL OS Error: 101
> >  > 04-Mai 02:32 mybackup-sd JobId 968544: Fatal error: label.c:575 Truncate error on Cloud device "mydevice" (/opt/bacula/cloudcache): ERR= S3_delete_object ERR=RequestTimeout CURL Effective URL: https://myrgw/mystorage/myvolume/part.10 CURL OS Error: 101 CURL Effective URL: https://myrgw/mystorage/myvolume/part.10 CURL OS Error: 101
> >  > 04-Mai 02:32 mybackup-sd JobId 968544: Marking Volume "myvolume" in Error in Catalog.
> >  > 04-Mai 02:32 mybackup-sd JobId 968544: Fatal error: Job 968544 canceled.
> >  > 04-Mai 02:32 mybackup-dir JobId 968544: Error: Bacula Enterprise wc-backup2-dir 13.0.2 (18Feb23):
> >  >
> >  > However when I check the Volume status in the Catalog I see:
> >  >
> >  > *list volume=myvolume
> >  > +---------+------------+-----------+---------+----------+----------+--------------+---------+------+-----------+-----------+---------+----------+-----------+
> >  > | MediaId | VolumeName | VolStatus | Enabled | VolBytes | VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | VolType | VolParts | ExpiresIn |
> >  > +---------+------------+-----------+---------+----------+----------+--------------+---------+------+-----------+-----------+---------+----------+-----------+
> >  > 

Re: [Bacula-users] Volume not being marked as Error in Catalog after S3/RGW timeout

2024-06-14 Thread Martin Reissner

Hello Ana,

I wanted to report back that after almost two weeks on Bacula 15.0.2 using the
"Amazon" driver, the situation has improved a lot. We have seen no more backups
failing because of timeouts. I had some errors with Copy jobs that are reading
from S3/RGW and writing to file storage, but they might have been caused by
something else, and it was only a few errors. Thank you for the heads up with
the "Amazon" driver!

All the best,

Martin

On 16.05.24 17:58, Ana Emília M. Arruda wrote:

Hello Martin,

Yes, the Amazon driver will help with the timeout issues. And we have been 
improving the Amazon driver continuously. Thus, I would move to this one. If 
you still see this issue related to the volume not marked as error, then we 
should investigate it.

Best,
Ana

On Wed, May 15, 2024 at 12:04 PM Martin Reissner <mreiss...@wavecon.de> wrote:

Hello Ana,

thank you for the heads up. An upgrade to one of the more recent versions 
has been in my backlog for a while now, maybe this will get me some time to 
actually get it done.
I'd still like to know whether what I am seeing with the volume not being marked
Error is an actual bug or something on my end, but if the "Amazon" driver
helps with the effects of the timeouts I'll gladly take it.

Regards,

Martin

On 14.05.24 20:53, Ana Emília M. Arruda wrote:
 > Hello Martin,
 >
 > Do you think you can upgrade to 15.0.X? I would recommend you to use the "Amazon" driver
 > instead of the "S3" driver. You can simply change the "Driver" in the cloud resource and
 > restart the SD. I'm not sure the Amazon driver is available in 13.0.2, but you can have a try.
 >
 > The Amazon driver is much more stable against such timeout issues.
 >
 > Best regards,
 > Ana
 >
 > On Mon, May 6, 2024 at 8:53 AM Martin Reissner <mreiss...@wavecon.de> wrote:
 >
 >     Hello,
 >
 >     by now I am mostly using our Ceph RGW with the S3 driver as storage
 >     and this works just fine but time and again requests towards the RGW time out.
 >     This is of course our business and not Bacula's but due to a
 >     behaviour I can't understand this causes us more trouble than it should.
 >
 >     When one of these errors happens it looks like this in the logs:
 >
 >
 >     04-Mai 02:32 mybackup-sd JobId 968544: Error: S3_delete_object ERR=RequestTimeout CURL Effective URL: https://myrgw/mystorage/myvolume-25809/part.10 CURL OS Error: 101 CURL Effective URL: https://myrgw/mystorage/myvolume/part.10 CURL OS Error: 101
 >     04-Mai 02:32 mybackup-sd JobId 968544: Fatal error: label.c:575 Truncate error on Cloud device "mydevice" (/opt/bacula/cloudcache): ERR= S3_delete_object ERR=RequestTimeout CURL Effective URL: https://myrgw/mystorage/myvolume/part.10 CURL OS Error: 101 CURL Effective URL: https://myrgw/mystorage/myvolume/part.10 CURL OS Error: 101
 >     04-Mai 02:32 mybackup-sd JobId 968544: Marking Volume "myvolume" in Error in Catalog.
 >     04-Mai 02:32 mybackup-sd JobId 968544: Fatal error: Job 968544 canceled.
 >     04-Mai 02:32 mybackup-dir JobId 968544: Error: Bacula Enterprise wc-backup2-dir 13.0.2 (18Feb23):
 >
 >
 >     However when I check the Volume status in the Catalog I see:
 >
 >
 >     *list volume=myvolume
 >     +---------+------------+-----------+---------+----------+----------+--------------+---------+------+-----------+-----------+---------+----------+-----------+
 >     | MediaId | VolumeName | VolStatus | Enabled | VolBytes | VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | VolType | VolParts | ExpiresIn |
 >     +---------+------------+-----------+---------+----------+----------+--------------+---------+------+-----------+-----------+---------+----------+-----------+
 >     |  25,809 | myvolume   | Recycle   |       1 |        1 |        0 |      691,200 |       1 |    0 |         0 | CloudType |      14 |       12 |         0 |
 >     +---------+------------+-----------+---------+----------+----------+--------------+---------+------+-----------+-----------+---------+----------+-----------+
 >
 >
 >     The VolStatus "Recycle" causes the Volume to be used for subsequent
 >     Jobs, which then all