On Tue, Oct 3, 2023, at 1:45 PM, Eric Bollengier via Bacula-users wrote:
> Hello Dan,
>
> On 9/30/23 15:54, Dan Langille wrote:
>> Hello,
>>
>> The Bacula PostgreSQL schema is missing several foreign keys (FK). Foreign
>> keys are not a new database concept; they've been around for decades. They
>> are reliable and robust.
>
> Yes, it's robust, but Bacula is doing mass insertion, and here, the cost is
> bigger than the benefits IMHO. Of course, if someone can run tests on a
> multi-billion record database and show us that the cost is around few %, we
> can
> consider them.
Here's hoping someone with that database can step up.
> Basically, each time you insert something, you need to query the other tables
> to
> check the existence or not of records. The checks for the existence of the
> records is already done at the bacula level one time per job, no real need to
> do
> it one time per insertion/update/delete.
Is that one-time-per-job check done via a SELECT on the database? I should
explain analyze that one too and see if we can optimize that.
>> Wednesday, I started a dbcheck on a Bacula database. Granted, that database
>> is 19 years old and this is the first time I've run dbcheck (as far as I
>> know). That dbcheck is still going. FYI, the dump to disk is about 140GB;
>> lots of cruft removal.
>>
>> When PostgreSQL was first added to Bacula, there was resistance to FK, and I
>> did not pursue the issue. Thus, it persists to this day. I hope to change
>> that.
>
> I don't think they are necessary, we use transaction in the cleanup now days,
> so
> orphan records should not exist anymore.
>
>> I would like to take that development work back up (pun intended), and start
>> adding foreign keys back into Bacula, at least for PostgresQL. That might
>> remove the need for dbcheck (again, at least for Bacula on PostgreSQL).
>
> dbcheck is needed from time to time to purge the Path table mostly, other
> tables
> should be ok with a recent version. Not sure the algorithms in dbcheck are
> always the best ones to find a remove records.
>
>> For example, one index I have been using this index for years. I find it
>> referenced[1] in the the 5.x documentation, but it is not part of the
>> catalog creation.
>>
>> "file_jobid_idx" btree (jobid)
>>
>> This index vastly improves the construction of the files, often going from
>> hours to seconds. I don't recall when that index was added here, but
>> building trees has never been an issue here.
>
> This index is great, but it uses GB of space and it has a cost during the
> insertion of the attributes. PostgreSQL can use the composed index on JobId,
> PathId instead to locate records, so you may add it back if you want, but it's
> not mandatory.
How many GB? Do we know? We can easily find out too.
bacula=# select pg_table_size('file_jobid_idx');
pg_table_size
---------------
37937438720
(1 row)
That's about 40GB. I'll check the explain analyse and see how things go and
come up with cost. Right now, we have no data.
Some might say: if the cost > 0 it's not worth it. However, it is worth it,
IMO. Databases are much better at enforcing relational integrity than code is.
It's what they do.
Thank you
--
Dan Langille
[email protected]
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users