On 2020-10-08 15:54, Phil Stracchino wrote:
> On 10/8/20 9:11 AM, Josip Deanovic wrote:
>> Do you have to turn off attribute spooling with 9.6.3 and 9.6.6?
>> Disabling attribute spooling will inflict noticeable performance
>> degradation.


> Unfortunately, yes, because the attribute spooling code, at least for
> the MySQL driver, is broken.  It caches all of the attribute data in a
> temporary table until the job is done, then dumps it all into the DB at
> once, ignoring the configured write batch size.  If the job copies more
> than 128K files, this exceeds Galera 3's hard writeset limit.

I see. Thank you for explaining it.
I used MySQL for a long time and had no problems, because I never
used a Galera MySQL cluster with Bacula.

> If it honored the batch size setting, it would be perfectly fine.  That
> said, I probably would not have done the spooling that way in the first
> place.  I would have cached the attribute data in memory until I had
> $BATCHSIZE records, then written them directly to the DB in a batch.  I
> honestly think this would perform better than saving them all until the
> end of the job and then flogging the DB with potentially millions of
> records at once.  That is ALWAYS a bad idea.
>
> I'd write and offer a patch (in fact I'd overhaul the entire MySQL
> driver), but I don't know nearly enough C++.
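
The batching described above could be sketched roughly as follows. This
is only an illustration in Python, not Bacula's C++ code: the table
name, the class, and the batch size of 1000 are made up, and sqlite3
stands in for MySQL so that the example runs on its own.

```python
import sqlite3

# Hypothetical stand-in for the configured write batch size.
BATCH_SIZE = 1000


class BatchedAttributeWriter:
    """Buffer attribute rows in memory and write them every batch_size rows."""

    def __init__(self, conn, batch_size=BATCH_SIZE):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0

    def add(self, job_id, path, attrs):
        # Keep the row in memory; write out as soon as a full batch exists.
        self.pending.append((job_id, path, attrs))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One transaction per batch, so no single write grows with job size.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO file_attrs (job_id, path, attrs) VALUES (?, ?, ?)",
                self.pending,
            )
        self.flushes += 1
        self.pending.clear()


def demo(total_rows=2500, batch_size=BATCH_SIZE):
    """Insert total_rows rows and return (rows stored, number of batches)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_attrs (job_id INTEGER, path TEXT, attrs TEXT)")
    writer = BatchedAttributeWriter(conn, batch_size)
    for i in range(total_rows):
        writer.add(1, f"/data/file{i}", "st_mode=0644")
    writer.flush()  # write the final partial batch at the end of the job
    count = conn.execute("SELECT COUNT(*) FROM file_attrs").fetchone()[0]
    return count, writer.flushes
```

With this shape, the largest single transaction is bounded by the batch
size rather than by the number of files in the job.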

I don't know how these things are implemented in Bacula.
It's possible that the Bacula team did it that way because they
thought it would help with setting up HA for the Bacula director daemon.

If one is using a database cluster in a round-robin setup, one
of the master nodes could start lagging, which could have unpredictable
effects on most applications (unless synchronous communication is
in use).

With some applications, depending on how they use the database,
this could lead to some kind of interlocking that would need to be
resolved by the cluster software; otherwise it could lead to
long or indefinite timeouts.

Round-robin across database nodes (master-master) is usually fine
for applications that issue small queries and don't have to
create awfully complex relations. Otherwise, the database cluster
software would need to take care of locking, which raises
the question of synchronous communication and of the overall
performance gain from such a setup.


I am aware that some of Bacula's decisions about handling
database connections are not the best.

For example, if you run a Copy job that is configured to select, say,
300 jobs to copy, Bacula will open 600 connections to the database.
Those connections stay open until the jobs are finished.
For each Copy job that completes, two connections are released.

And if your database has its connection limit set below the number of
connections Bacula temporarily needs, Bacula-dir would segfault.
I have experienced it with Postgres and I have found old posts
in the mailing list archives claiming that the same problem exists
with MySQL as well.
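
To put numbers on it, here is a small back-of-the-envelope sketch in
Python. The figure of two connections per selected job is only what I
observed, not something taken from Bacula's source, and the function
names and the "reserved" parameter are made up for illustration.

```python
# Observed behaviour described above, not a documented Bacula constant.
CONNECTIONS_PER_COPIED_JOB = 2


def peak_connections(selected_jobs, per_job=CONNECTIONS_PER_COPIED_JOB):
    """Connections open right after the Copy job has selected its jobs."""
    return selected_jobs * per_job


def open_connections(selected_jobs, completed_jobs,
                     per_job=CONNECTIONS_PER_COPIED_JOB):
    """Connections still open once completed_jobs copies have finished."""
    remaining = max(selected_jobs - completed_jobs, 0)
    return remaining * per_job


def exceeds_limit(selected_jobs, max_connections, reserved=0):
    """True if the peak, plus connections reserved for other clients,
    would go over the database's connection limit."""
    return peak_connections(selected_jobs) + reserved > max_connections


if __name__ == "__main__":
    print(peak_connections(300))       # 600
    print(open_connections(300, 1))    # 598
    # PostgreSQL ships with max_connections = 100 by default.
    print(exceeds_limit(300, 100))     # True
```

So with a stock connection limit, a Copy job selecting 300 jobs would
go well over it, unless the limit is raised or fewer jobs are selected
per run.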

There are a few ways to work around the problem with too many
connections, but the Bacula director shouldn't segfault.


--
Josip Deanovic


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
