Hello Michel,
I tested at least one of the backups for validity, but in that case all
files were on one side (the remote).
The case where some files are on the remote side and some on the local
side has not happened yet.
But I will test this special case too (with unimportant data).
I migrated from Bareos to Bacula, and as far as I can tell this
procedure worked on Bareos.
What made me wonder was that bconsole in Bacula didn't offer me the
option of simply updating a volume and changing its storage. In Bareos
this is well-documented and supported behavior.
See:
https://docs.bareos.org/TasksAndConcepts/HowToManuallyTransferDataVolumes.html
And when something is not offered, it seems like it's not
wanted/supported and may fail for reasons I cannot foresee.
But as Rob already mentioned, I would like a completely automated
process for the future... every human intervention is a point of
failure. And at the very least I have to intervene manually when new
full backups are created on the client side. I put a Zabbix trigger on
the storage location of the SD: if any new files appear there, it means
the full backup has used up its retention time and I need to fetch the
new full backups manually and transport them to the storage daemon.
As some may know, we in Germany have a brand new and modern
infrastructure when it comes to the internet.
And the date when it was new was 1990. :D
Since a full backup can be 800 GiB or more, it's a pain to send it over
the internet to my offsite storage...
Best regards,
Jan
------------------------------------------------------------------------
JSIIT Consulting                 Phone: +49 40 22865465
Jan Sielemann                    Telefax: +49 40 22865468
Riekbornweg 22                   Mobile: +49 1578 6769143
22547 Hamburg                    GPG-Public-Key: 809774813E4461D7
<https://jsiit.net/public/338622A442F25A5869C8ADE9809774813E4461D7.pub.asc>
------------------------------------------------------------------------
On 1/20/26 22:43, Michel Figgins wrote:
Having a pool exist on multiple SDs makes me uncomfortable.
Have you tested a restore from one of the Fulls you moved to the
remote site?
- Michel
*From:* Rob Gerber <[email protected]>
*Sent:* Tuesday, January 20, 2026 1:24 PM
*To:* Jan Sielemann <[email protected]>
*Cc:* [email protected]
*Subject:* Re: [Bacula-users] Way of working or dangerous?
I think the usual best-practice advice is not to modify bacula's
database unless you have a very good reason to do so. The normal
reasons not to do something like that are that your scripts or manual
work aren't automated, haven't been as thoroughly bug-tested as
bacula, and are subject to human error. I understand that you have a
good reason to follow your current practice.
I understand your question about database safety, but I am not able to
answer it beyond what I have said above, due to a lack of knowledge.
I can, however, offer an alternative suggestion that would solve the
problem in a safer way.
To understand my idea, you need to know:
1. You can host your own s3 compatible storage server (not using any
cloud provider). This server uses the s3 protocol, but only relies on
your own infrastructure.
2. The bacula cloud storage plugin allows bacula to write to s3
compatible services, including ones you host yourself.
3. Uploads in the bacula cloud storage plugin are resumable, and cloud
volumes are split into 'parts', which have a configurable size. Cloud
job success doesn't depend on the successful upload of those volume
parts. If the upload fails for one or more parts, the cloud job does
not fail. You can retry uploading the parts as many times as needed.
This means that writing copy jobs to a cloud resource you host on your
remote SD server could solve your problem and eliminate the need for
manual volume transfers and database edits.
4. I am going to refer to 'cloud jobs' and 'copy jobs'. A copy job is
a job that copies the data from an existing bacula job from one pool
to another. A copy job does not rely on the FD at all. A cloud job
writes to a cloud resource, and doesn't have to be a copy job. A copy
job can write to any available type of resource, including cloud
resources.
5. The scariest failure mode for the bacula cloud plugin is that
failure to upload a part file does not automatically generate an error
(this is also a feature, in case of a temporary network interruption).
So you could have an error condition where your cloud volume parts
aren't all uploaded to the remote server, and bacula will not regard
this as a fatal error, even if some parts NEVER upload. For this
reason, I have written a part upload script that watches its output
for failure to upload a part, and exits 1 (setting the bacula admin
job that ran the script to an error state). *I cannot emphasize this
enough: you must have a cloud volume part sweeper script, and IMO you
must have some sort of logic to detect consistent failure to upload
those parts.* I can give you a copy of the script I'm using for this,
if you wind up going down this path (the relevant bconsole commands are
sketched just below; a rough outline of the script itself appears
further down in the notes).
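For reference, this is roughly what the manual part handling looks like
from bconsole in my setup. The storage and volume names are placeholders,
and the exact subcommand spellings can differ between versions, so treat
this as a sketch and check 'help cloud' in your own bconsole:

    # retry uploading any parts still sitting in the local cache for one volume
    * cloud upload storage=RemoteS3-Cloud volume=Vol-Full-0001

    # push pending parts for every volume in every pool on that storage
    * cloud allpools storage=RemoteS3-Cloud upload

    # compare what is in the local cache with what is already in the cloud
    * cloud list storage=RemoteS3-Cloud volume=Vol-Full-0001

    # drop the local cache copy of parts that have already been uploaded
    * cloud truncate storage=RemoteS3-Cloud volume=Vol-Full-0001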
My idea is:
1. Write all your local jobs to the local SD, then use bacula copy
jobs to copy them to a cloud resource on the local SD that points to an
s3 compatible server hosted on the same server as your remote SD. Data
from those jobs will then exist in both locations. Truncate your cloud
cache ASAP after upload, so you aren't storing the data twice locally.
OR
2. Create a cloud resource on your local SD, pointing to an s3
compatible server hosted on the same server as your remote SD.
Directly write your backup jobs to the cloud storage resource. Upload
generated cloud volume part files to your remote s3 compatible server.
Truncate your local cache as needed if you are limited on local
storage space.
Here is the general process for option 1. Option 2 is similar, without
the copy steps. A rough configuration sketch covering these steps
follows after the list.
1. Set up an s3 compatible storage server on your remote SD server.
RustFS, or an alternative. I think it's traditional to recommend MinIO
for this, but the company behind MinIO are actively removing features
from the project, so probably best to use something else.
2. Set up cloud storage resources in the local bacula SD conf file,
and equivalent storage resources in the bacula director conf file. Of
course, the 'cloud' storage resources will point to your s3 storage
service on the remote SD server. *It would be best if you used bacula
15.x for this, because the new cloud driver was vastly improved in
that version.* The new driver in bacula 15.x is named 'Amazon', but
it can work with any s3 compatible server.
3. Create Pool-Full and Pool-Incremental pointing to the local SD.
Write all original backups to these pools. Set 'next pool' to the
'copy' pools in step 4.
4. Configure Pool-Full-Copy and Pool-Incremental-Copy on the local SD.
Set storage to the cloud storage resource.
5. Run your local backups. All backups will write to the local SD.
6. Run copy jobs, using selection method 'pool uncopied jobs' for
Pool-Full and Pool-Incremental. One copy job instance per pool to be
copied.
6.1 For each local job that needs to be copied, bacula will launch 2
jobs: a *copy control job* (that orchestrates the copy process), and
a *copy job* (that will effectively 'replay' the original backup job,
but this time it reads data from the original local volumes and writes
to cloud volumes in the local cache, for eventual upload to the remote
s3 storage).
6.2 Each cloud job is written to the local cache first, then uploaded.
Because the process is done this way, the copy job can finish even if
the upload process doesn't complete at that time.
7. Run a 'cloud volume part sweeper' admin job after all the copy jobs
finish, to make sure that any parts which failed to upload are
uploaded to the s3 compatible server. The part sweeper job is very
important because any volume part that didn't successfully upload
during a cloud job won't be uploaded later without this sort of job.
The basic idea is the script runs a bconsole command: 'cloud allpools
storage=$storage upload'
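To make steps 2 through 7 more concrete, here is a rough sketch of how
the pieces could be wired together. All names (LocalFile, RemoteS3, the
bucket, keys, paths, retention values) are placeholders, and the
directive spellings should be checked against the cloud driver
documentation for the bacula version you end up running:

    # --- bacula-sd.conf on the LOCAL SD: cloud resource + cloud device ---
    Cloud {
      Name = RemoteS3
      Driver = "Amazon"                # the new 15.x driver; older versions use "S3"
      HostName = "s3.remote.example"   # your self-hosted s3 server on the remote SD box
      BucketName = "bacula-offsite"
      AccessKey = "xxxx"
      SecretKey = "xxxx"
      Protocol = HTTPS
      UriStyle = Path
      Upload = EachPart                # or 'No' to upload only when told to (see notes)
      TruncateCache = No               # truncate explicitly after successful upload
    }

    Device {
      Name = RemoteS3-Dev
      Device Type = Cloud
      Cloud = RemoteS3
      Archive Device = /var/lib/bacula/cloud-cache   # local part cache
      Media Type = CloudType
      Maximum Part Size = 2 GB         # size of the upload 'parts'
      Label Media = yes
      Random Access = yes
      Automatic Mount = yes
      Removable Media = no
      Always Open = no
    }

    # --- bacula-dir.conf: storage, pools, copy job, sweeper admin job ---
    Storage {
      Name = RemoteS3-Cloud
      Address = local-sd.example       # the LOCAL SD, which hosts the cloud device
      Password = "xxxx"
      Device = RemoteS3-Dev
      Media Type = CloudType
    }

    Pool {
      Name = Pool-Full                 # step 3: original backups land here
      Pool Type = Backup
      Storage = LocalFile              # your existing local disk storage
      Next Pool = Pool-Full-Copy       # step 3: where copy jobs send the data
      Volume Retention = 60 days
    }

    Pool {
      Name = Pool-Full-Copy            # step 4: cloud destination pool
      Pool Type = Backup
      Storage = RemoteS3-Cloud
      # retention and volume limits: see the notes below
    }

    Job {
      Name = Copy-Full-To-Cloud        # step 6: one copy job per source pool
      Type = Copy
      Selection Type = PoolUncopiedJobs
      Pool = Pool-Full                 # source pool; destination comes from Next Pool
      Client = local-fd                # required by the parser, not used by the copy
      FileSet = "Full Set"             # likewise
      Messages = Standard
    }

    Job {
      Name = Cloud-Part-Sweeper        # step 7: retries failed part uploads
      Type = Admin
      Client = local-fd                # again only to satisfy the parser
      FileSet = "Full Set"
      Pool = Pool-Full
      Messages = Standard
      RunScript {
        RunsWhen = Before
        RunsOnClient = no
        Command = "/opt/bacula/scripts/cloud-part-sweeper.sh"   # hypothetical path
      }
    }

Repeat the Pool/copy-job pair for Pool-Incremental and
Pool-Incremental-Copy.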
Some notes:
You can choose to truncate the local cloud cache after upload, and to
perform uploads of the cloud volume part files only after the copy
jobs finish. In the case I am describing, it makes sense to truncate
the local cloud cache ASAP after upload because all the same data is
already stored in the local SD.
If you instruct your cloud resources to upload only when told to
(instead of on job completion), you can let the copy jobs finish
writing their cloud volume parts to the local cache first, and then
upload those part files during off-peak times.
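If I remember the directive values correctly, that switch lives in the
Cloud resource on the SD; double-check the names against the
documentation for your version:

    Cloud {
      ...
      Upload = No              # parts stay in the local cache until you run 'cloud upload'
      # Upload = EachPart      # alternative: push each part as soon as it is written
      TruncateCache = No       # truncate manually (or from a script) after uploads succeed
    }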
You might want to set 'allow duplicate jobs = no' in your copy job /
jobdefs, because there is a possible condition where a job could be
queued for upload multiple times. Effectively what could happen is:
'Really Big Full Job 1' runs and finishes. Copy Job 2 selects 'Really
Big Full Job 1' to be copied and gets to work. Copy Job 2 hasn't
finished running by the next night, when Copy Job 3 launches, and
finds that 'Really Big Full Job 1' hasn't been successfully copied
yet. So Copy Job 3 also queues up to copy 'Really Big Full Job 1'.
Eventually Copy Job 2 finishes uploading its copy of 'Really Big Full
Job 1', so when Copy Job 4 runs 2 nights from now, it won't
select 'Really Big Full Job 1' to be copied yet again, but this does
nothing to prevent Copy Job 3 from uploading 'Really Big Full Job 1' a
second time. 'allow duplicate jobs = no' helps prevent this condition.
Not tying cloud resource uploads to copy job completion could also
help reduce this risk, since job duration would then depend only on
local conditions, not network availability.
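In the copy job or its jobdefs that would look roughly like this (a
sketch; 'Copy-Full-To-Cloud' is the placeholder name from the earlier
configuration sketch):

    Job {
      Name = Copy-Full-To-Cloud
      Type = Copy
      Selection Type = PoolUncopiedJobs
      Pool = Pool-Full
      Allow Duplicate Jobs = no
      # related directives such as 'Cancel Queued Duplicates' exist as well;
      # check the Job resource documentation for your version
      ...
    }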
Make sure that your local volumes and the cloud volumes have the same
retention periods. Any condition where a local job has a retention
period > the cloud retention period will result in the cloud job and
its associated volumes being pruned, and then the next copy job will
select the local job for upload again. Once it has uploaded, it will
be pruned again, and then uploaded again, and then... until the local
job expires.
For this reason, it is also probably best that you set your cloud pool
to limit the number of jobs per volume to 1, or limit the volume use
duration to 20hr or something.
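As a rough sketch of what I mean for the cloud pool (the retention value
is a placeholder; the point is that it matches the source pool and that
cloud volumes roll over frequently):

    Pool {
      Name = Pool-Full-Copy
      Pool Type = Backup
      Storage = RemoteS3-Cloud
      Volume Retention = 60 days        # same as Pool-Full, never shorter
      Maximum Volume Jobs = 1           # one copy job per cloud volume
      # Volume Use Duration = 20 hours  # alternative way to force volume rollover
    }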
Bacula will not fail a cloud job just because a cloud volume part
failed to upload. This is a feature, but there isn't any functionality
built into bacula right now (AFAIK) to bring this failure to your
attention (bacularis DOES highlight these failures in the bacularis
log). I have a cloud volume part sweeper script that writes its output
to a log file, then after the upload process completes, reads its log
file for any failure to upload messages. If it finds any parts failed
to upload, the script exits 1, which will cause the bacula admin job
that launched it to exit with a fatal error. This should bring the
failure to upload to your attention, and you can monitor to see if the
problem persists.
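For what it's worth, the skeleton of my sweeper script is roughly the
following. This is only a sketch: the storage name, log path, and the
failure pattern are assumptions you would have to adapt to your own
environment and to how your bacula version actually words upload errors:

    #!/bin/sh
    # cloud-part-sweeper.sh -- retry pending cloud part uploads and fail loudly
    # if the output suggests that parts still did not make it. Intended to be
    # run from a bacula Admin job so a non-zero exit flags the job as failed.

    STORAGE="RemoteS3-Cloud"                  # placeholder storage name
    LOG="/var/log/bacula/cloud-sweeper.log"   # placeholder log path

    # Ask the director to upload any parts still sitting in the local cache,
    # capturing bconsole's output for inspection afterwards.
    echo "cloud allpools storage=${STORAGE} upload" | bconsole > "$LOG" 2>&1

    # If the output mentions an upload problem, exit 1 so the Admin job that
    # launched us ends in an error state. The pattern below is a guess; run
    # the command once by hand to see how failures are actually worded.
    if grep -Ei "error|fail" "$LOG" >/dev/null; then
        echo "One or more cloud parts failed to upload; see $LOG"
        exit 1
    fi

    exit 0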
For option 2 (cloud resource first, no copy jobs), it is also possible
that you could configure a cloud resource on your local SD, and then
simply write your Pool-Full and Pool-Incremental backups directly to
the cloud resource. The cloud volumes would be written to your local
cloud cache, and would remain available for fast restore if needed
(unless you truncated the local cache). This alternative skips the
idea of making copy jobs at all, and just relies on the cloud plugin
directly. This isn't an invalid approach. It just depends on your
priorities. I prefer the idea of having a dedicated local storage
resource whose volumes aren't subject to cache truncation, so as long
as the local volumes still exist I can do a local restore. This 'cloud
first' plan could help if you have limited storage space on your local
SD. In that case, you could frequently truncate the local cache, to
keep local SD space available.
Regards,
Robert Gerber
402-237-8692
[email protected]
On Tue, Jan 20, 2026 at 1:16 PM Jan Sielemann via Bacula-users
<[email protected]> wrote:
Yes, but the question was more whether the database modification is a
good way of working.
I'm transporting the volume files via nightly sftp/rsync between the
hosts, in rare cases with a physical disk.
I just want to know whether modifying the database in this way is
advisable or a strict no-go (and why).
------------------------------------------------------------------------
On 1/20/26 20:09, Phil Stracchino wrote:
On 1/20/26 13:40, Dragan Milivojević wrote:
If your internet is so bad that you can't use bacula copy jobs,
use rsync to synchronize remote storage.
This is an excellent suggestion. Seconded. rsync will
transfer only as much data as is needed to sync the files, and
can resume an interrupted copy.
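For example, something along these lines (paths and host are
placeholders; --partial keeps interrupted transfers on the remote side
so a later run can pick them up again, and --bwlimit is an optional
throttle in KiB/s):

    rsync -av --partial --bwlimit=10240 \
        /var/lib/bacula/volumes/ \
        backup@remote-sd.example:/var/lib/bacula/volumes/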
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users