Having a pool exist on multiple SDs makes me uncomfortable.

Have you tested a restore from one of the Fulls you moved to the remote site?

- Michel

From: Rob Gerber <[email protected]>
Sent: Tuesday, January 20, 2026 1:24 PM
To: Jan Sielemann <[email protected]>
Cc: [email protected]
Subject: Re: [Bacula-users] Way of working or dangerous?

I think the usual best practice advice is to not modify bacula's database 
unless you have a very good reason to do so. The normal reasons to avoid doing 
something like that are that your scripts or manual steps aren't automated, 
haven't been tested as thoroughly as bacula itself, and are subject to human 
error. I understand that you have a good reason to follow your current practice.
I understand your question about database safety, but beyond what I have said 
above I don't have the knowledge to answer it.
I can, however, offer an alternative suggestion that would solve the problem in 
a safer way.

To understand my idea, you need to know:
1. You can host your own s3 compatible storage server (not using any cloud 
provider). This server uses the s3 protocol, but only relies on your own 
infrastructure.
2. The bacula cloud storage plugin allows bacula to write to s3 compatible 
services, including ones you host yourself.
3. Uploads in the bacula cloud storage plugin are resumable, and cloud volumes 
are split into 'parts', which have a configurable size. Cloud job success 
doesn't depend on the successful upload of those volume parts: if the upload 
fails for one or more parts, the cloud job does not fail, and you can retry the 
upload as many times as needed (see the bconsole example after this list). This 
means that writing copy jobs to a cloud resource backed by an s3 server hosted 
on your remote SD server could solve your problem and eliminate the need for 
manual volume transfers and database edits.
4. I am going to refer to 'cloud jobs' and 'copy jobs'. A copy job is a job 
that copies the data from an existing bacula job from one pool to another. A 
copy job does not rely on the FD at all. A cloud job writes to a cloud 
resource, and doesn't have to be a copy job. A copy job can write to any 
available type of resource, including cloud resources.
5. The scariest failure mode for the bacula cloud plugin is that failure to 
upload a part file does not automatically generate an error (this is also a 
feature, in case of a temporary network interruption). So you could have an 
error condition where your cloud volume parts aren't all uploaded to the remote 
server, and bacula will not regard this as a fatal error, even if some parts 
NEVER upload. For this reason, I have written a part upload script that watches 
its output for failure to upload a part, and exits 1 (setting the bacula admin 
job that ran the script to an error state). I cannot emphasize this enough. You 
must have a cloud volume part sweeper script, and IMO must have some sort of 
logic to detect consistent failure to upload those parts. I can give you a copy 
of the script I'm using for this, if you wind up going down this path.
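
For illustration, this is roughly what the manual part handling looks like 
from bconsole; the storage and volume names are placeholders, and it's worth 
checking 'help cloud' on your own installation, since the exact keywords can 
differ between bacula versions.

   # retry uploading any parts still sitting in the local cache, for all pools
   cloud allpools storage=CloudStorage upload

   # upload (or re-upload) the parts of one specific volume
   cloud storage=CloudStorage volume=Full-Copy-0001 upload

   # free local cache space once the parts are safely on the remote server
   cloud allpools storage=CloudStorage truncate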

My idea is:
1. Write all your local jobs to the local SD, then use bacula copy jobs to copy 
them, via a cloud resource on the local SD, to an s3 compatible server hosted 
on the same server as your remote SD. Data from the jobs will then exist in 
both locations. Truncate your cloud cache ASAP after upload, so you aren't 
storing the data twice on the local SD.
OR
2. Create a cloud resource on your local SD, pointing to an s3 compatible 
server hosted on the same server as your remote SD. Directly write your backup 
jobs to the cloud storage resource. Upload generated cloud volume part files to 
your remote s3 compatible server. Truncate your local cache as needed if you 
are limited on local storage space.

Here is the general process for option 1. Option 2 is similar, without the copy 
steps.
1. Set up an s3 compatible storage server on your remote SD server (RustFS, or 
an alternative). I think it's traditional to recommend MinIO for this, but the 
company behind MinIO is actively removing features from the project, so it's 
probably best to use something else.
2. Set up cloud storage resources in the local bacula SD conf file, and 
equivalent storage resources in the bacula director conf file (see the 
configuration sketch after this list). The 'cloud' storage resources will, of 
course, point to your s3 storage service on the remote SD server. It would be 
best to use bacula 15.x for this, because the cloud driver was vastly improved 
in that version. The new driver in bacula 15.x is named 'Amazon', but it works 
with any s3 compatible server.
3. Point Pool-Full and Pool-Incremental at the local SD and write all original 
backups to these pools. Set 'Next Pool' to the corresponding 'copy' pools from 
step 4.
4. Configure Pool-Full-Copy and Pool-Incremental-Copy, setting their storage to 
the cloud storage resource on the local SD.
5. Run your local backups. All backups will write to the local SD.
6. Run copy jobs, using selection method 'pool uncopied jobs' for Pool-Full and 
Pool-Incremental. One copy job instance per pool to be copied.
6.1 For each local job that needs to be copied, bacula will launch 2 jobs: a 
copy control job (that orchestrates the copy process), and a copy job (that 
will effectively 'replay' the original backup job, but this time it reads data 
from the original local volumes and writes to cloud volumes in the local cache, 
for eventual upload to the remote s3 storage).
6.2 Each cloud job is written to the local cache first, then uploaded. Because 
the process is done this way, the copy job can finish even if the upload 
process doesn't complete at that time.
7. Run a 'cloud volume part sweeper' admin job after all the copy jobs finish, 
to make sure that any parts which failed to upload do get uploaded to the s3 
compatible server. The part sweeper job is very important, because any volume 
part that didn't successfully upload during a cloud job won't be uploaded later 
without this sort of job. The basic idea is that the script runs a bconsole 
command such as: 'cloud allpools storage=$storage upload'
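
To make the moving parts concrete, here is a rough sketch of what the pieces 
could look like. Every name, address, password, and path below is a placeholder 
I made up, and the directives should be double checked against the cloud 
chapter of the bacula documentation for your version; treat it as a starting 
point, not a drop-in config.

   # bacula-sd.conf on the LOCAL SD (sketch only)
   Cloud {
     Name = RemoteS3
     Driver = "Amazon"        # new driver in bacula 15.x; older versions use "S3"
     HostName = "remote-sd.example.com:9000"
     BucketName = "bacula-offsite"
     AccessKey = "CHANGEME"
     SecretKey = "CHANGEME"
     Protocol = HTTPS
     Upload = EachPart        # or No, to upload only when a 'cloud upload' command runs
   }

   Device {
     Name = CloudDev
     Device Type = Cloud
     Cloud = RemoteS3
     Archive Device = /bacula/cloud-cache   # local cache where volume parts are written
     Maximum Part Size = 2000000000         # ~2 GB parts; tune to your link
     Media Type = CloudType
     LabelMedia = yes; Random Access = yes; AutomaticMount = yes;
     RemovableMedia = no; AlwaysOpen = no;
   }

   # bacula-dir.conf (sketch only)
   Storage {
     Name = CloudStorage
     Address = local-sd.example.com     # the LOCAL SD; it pushes parts to the remote s3
     SDPort = 9103
     Password = "sd-password"
     Device = CloudDev
     Media Type = CloudType
   }

   Pool {
     Name = Pool-Full-Copy
     Pool Type = Backup
     Storage = CloudStorage
     Volume Retention = 60 days         # keep identical to Pool-Full (see notes below)
   }

   # Pool-Full itself needs: Next Pool = Pool-Full-Copy

   Job {
     Name = CopyFullToCloud
     Type = Copy
     Selection Type = PoolUncopiedJobs
     Pool = Pool-Full                   # source pool; its Next Pool picks the target
     Client = local-fd                  # required by the Job syntax, barely used here
     FileSet = "Full Set"
     Messages = Standard
     Schedule = NightlyCopy
   }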


Some notes:

You can choose to truncate the local cloud cache after upload, and to perform 
uploads of the cloud volume part files only after the copy jobs finish. In the 
case I am describing, it makes sense to truncate the local cloud cache ASAP 
after upload, because the same data is already stored on the local SD.

If you configure your cloud resources to upload only when told to (instead of 
at the end of each job), you can focus on getting your copy job cloud volume 
parts written to the local cache first, and upload those part files during 
off-peak hours.

You might want to set 'allow duplicate jobs = no' in your copy job / jobdefs, 
because there is a possible condition where a job could be queued for copying 
multiple times. Effectively what could happen is: 'Really Big Full Job 1' runs 
and finishes. Copy Job 2 selects 'Really Big Full Job 1' to be copied and gets 
to work. Copy Job 2 hasn't finished by the next night, when Copy Job 3 launches 
and finds that 'Really Big Full Job 1' hasn't been successfully copied yet. So 
Copy Job 3 also queues up to copy 'Really Big Full Job 1'. Eventually Copy Job 
2 finishes uploading its copy of 'Really Big Full Job 1', so when Copy Job 4 
runs two nights from now, it won't select 'Really Big Full Job 1' to be copied 
yet again, but that does nothing to prevent Copy Job 3 from uploading 'Really 
Big Full Job 1' a second time. 'allow duplicate jobs = no' helps prevent this 
condition. Not tying cloud resource uploads to copy job completion could also 
reduce this risk, since job duration would then depend only on local 
conditions, not network availability.
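
In the copy job from the configuration sketch above, that would look something 
like this:

   Job {
     Name = CopyFullToCloud
     Type = Copy
     Selection Type = PoolUncopiedJobs
     Pool = Pool-Full
     Allow Duplicate Jobs = no   # prevents two instances of this copy job running at once
     # Client, FileSet, Messages, Schedule as in the earlier sketch
   }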

Make sure that your local volumes and the cloud volumes have the same retention 
periods. Any situation where a local job has a longer retention period than the 
cloud retention period will result in the cloud copy and its associated volumes 
being pruned, after which the next copy job will select the local job for 
upload again. Once it has uploaded, it will be pruned again, then uploaded 
again, and so on, until the local job expires. For this reason it is probably 
also best to limit your cloud pool to one job per volume ('Maximum Volume Jobs 
= 1'), or to limit the volume use duration to something like 20 hours.
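
In the cloud pool from the earlier sketch, that could look like this (the 
retention value is only an example and should mirror whatever your local pools 
use):

   Pool {
     Name = Pool-Full-Copy
     Pool Type = Backup
     Storage = CloudStorage
     Volume Retention = 60 days      # no shorter than Pool-Full's retention
     Maximum Volume Jobs = 1         # one job per cloud volume
     # or: Volume Use Duration = 20 hours
   }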

Bacula will not fail a cloud job just because a cloud volume part failed to 
upload. This is a feature, but as far as I know there isn't any functionality 
built into bacula right now to bring this failure to your attention (bacularis 
DOES highlight these failures in the bacularis log). I have a cloud volume part 
sweeper script that writes its output to a log file and then, after the upload 
process completes, reads that log file for any failure-to-upload messages. If 
any parts failed to upload, the script exits 1, which causes the bacula admin 
job that launched it to end with a fatal error. This should bring the upload 
failure to your attention, and you can then monitor to see whether the problem 
persists. A rough sketch of that idea follows.

For option 2 (cloud resource first, no copy jobs), you could simply configure a 
cloud resource on your local SD and write your Pool-Full and Pool-Incremental 
backups directly to it. The cloud volumes would be written to your local cloud 
cache and would remain available for fast restores (unless you truncated the 
local cache). This alternative skips copy jobs entirely and relies on the cloud 
plugin directly. It isn't an invalid approach; it just depends on your 
priorities. I prefer having a dedicated local storage resource whose volumes 
aren't subject to cache truncation, so that as long as the local volumes still 
exist I can do a local restore. The 'cloud first' plan could help if you have 
limited storage space on your local SD; in that case you could truncate the 
local cache frequently to keep local SD space available.
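
In config terms, option 2 just points the original pools at the cloud storage 
from the earlier sketch, roughly like this (placeholder names again):

   Pool {
     Name = Pool-Full
     Pool Type = Backup
     Storage = CloudStorage    # backups are written straight to the cloud device
     Volume Retention = 60 days
   }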

Regards,
Robert Gerber
402-237-8692
[email protected]



On Tue, Jan 20, 2026 at 1:16 PM Jan Sielemann via Bacula-users 
<[email protected]> wrote:

Yes, but the question was more whether the database modification is a good way 
of working.

I'm transporting the volume files via nightly sftp/rsync between the hosts, in 
rare cases with a physical disk.

I just want to know whether modifying the database in this way is advisable or 
a strict no-go (and why).






________________________________
On 1/20/26 20:09, Phil Stracchino wrote:
On 1/20/26 13:40, Dragan Milivojević wrote:

If your internet is so bad that you can't use bacula copy jobs, use
rsync to synchronize remote storage.

This is an excellent suggestion.  Seconded.  rsync will transfer only as much 
data as is needed to sync the files, and can resume an interrupted copy.
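
For reference, a typical invocation for that kind of volume sync, with 
placeholder paths and host, might be:

   # --partial keeps partially transferred files so an interrupted copy resumes
   rsync -av --partial /srv/bacula/volumes/ backup@remote-sd:/srv/bacula/volumes/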

_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users