Hello all,

I am using Bucardo 5.1.2 to replicate multiple groups of tables in multi-master mode. There are currently 2 machines (32 cores, 16 GB), but there may be more in the future to make the application more scalable (many PostgreSQL clients).

A group of tables relates to a specific event and consists of 10 tables and sequences. Two of those tables are updated frequently: approximately 20 rows are inserted and 20 or fewer rows are deleted each second.

The main application creates a fixed number of events, no more than 100 at this early stage of the application. When a new event is created, an external program creates the corresponding schema on all machines and calls bucardo to create a sync for the new tables and sequences.

Bucardo syncs are created like this:
bucardo add sync sync_xxx db_group=dbgroup_xxx relgroup=relgroup_xxx conflict_strategy=bucardo_latest autokick=1
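
As an illustration only, the per-event call could be sketched like this in Python. The function names, and the idea that "xxx" in the command above is replaced by an event identifier, are assumptions made for the sketch; the options themselves are copied unchanged from the command above.

```python
import subprocess


def build_add_sync_command(event_id):
    """Build the argv list for one `bucardo add sync` call.

    The sync_/dbgroup_/relgroup_ naming scheme is an assumption: it simply
    substitutes the event identifier for "xxx" in the command shown above.
    """
    return [
        "bucardo", "add", "sync", f"sync_{event_id}",
        f"db_group=dbgroup_{event_id}",
        f"relgroup=relgroup_{event_id}",
        "conflict_strategy=bucardo_latest",
        "autokick=1",
    ]


def create_sync(event_id):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_add_sync_command(event_id), check=True)
```

Passing the arguments as a list avoids shell quoting problems if an event identifier ever contains unusual characters.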

I first tried with only a few events, 1 or 2 (so 1 or 2 Bucardo syncs), and it worked correctly. If I increase the number of syncs (to 10), I notice that all sync statuses oscillate between Good and Bad and that there are a lot of errors in the logs. When the syncs are in the Bad state, it takes them one or two minutes to return to the Good state, meaning the tables are not updated during this time.

Here is the type of error that appears very often in the logs:
<<<
(2754) [Fri Feb 20 15:56:35 2015] KID (the_sync_XXX_6) Kid has died, error is: DBD::Pg::db pg_cancel failed: No asynchronous query is running at /usr/share/perl5/Bucardo.pm line 5403. Line: 5425 Main DB state: ? Error: none DB channel_db_bucardo_0 state: ? Error: none DB channel_db_bucardo_1 state: 40001 Error: 7
DBI::db=HASH(0x1cdce80)->disconnect invalidates 20 active statement handles (either destroy statement handles or call finish on them before disconnecting) at /usr/share/perl5/Bucardo.pm line 2692.
(2754) [Fri Feb 20 15:56:35 2015] KID (the_sync_XXX_6) Kid 2754 exiting at cleanup_kid. Sync "the_sync_XXX_6" channel_XXX_0.streams Reason: DBD::Pg::db pg_cancel failed: No asynchronous query is running at /usr/share/perl5/Bucardo.pm line 5403. Line: 5425 Main DB state: ? Error: none DB channel_db_bucardo_0 state: ? Error: none DB channel_db_bucardo_1 state: 40001 Error: 7
(2681) [Fri Feb 20 15:56:35 2015] KID (the_sync_XXX_8) Kid has died, error is: DBD::Pg::db pg_cancel failed: No asynchronous query is running at /usr/share/perl5/Bucardo.pm line 5403. Line: 5425 Main DB state: ? Error: none DB channel_db_bucardo_0 state: ? Error: none DB channel_db_bucardo_1 state: 40001 Error: 7
DBI::db=HASH(0x1ce50c8)->disconnect invalidates 26 active statement handles (either destroy statement handles or call finish on them before disconnecting) at /usr/share/perl5/Bucardo.pm line 2692.
(2681) [Fri Feb 20 15:56:35 2015] KID (the_sync_XXX_8) Kid 2681 exiting at cleanup_kid. Sync "the_sync_XXX_8" channel_XXX_0.streams Reason: DBD::Pg::db pg_cancel failed: No asynchronous query is running at /usr/share/perl5/Bucardo.pm line 5403. Line: 5425 Main DB state: ? Error: none DB channel_db_bucardo_0 state: ? Error: none DB channel_db_bucardo_1 state: 40001 Error: 7
>>>
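
To see which syncs die and with which database state, log records like the ones above could be summarised with a small script. This is only a sketch: it assumes the "Kid has died" record layout is exactly as in the excerpt, and the function names are made up for the example. SQLSTATE 40001 is PostgreSQL's serialization_failure code.

```python
import re
from collections import Counter

# Start of a death record: (pid) [timestamp] KID (sync_name) Kid has died, error is:
DEATH_RE = re.compile(r"\((\d+)\) \[([^\]]+)\] KID \(([^)]+)\) Kid has died, error is: ")
# Start of any KID record, used to find where the current record ends
HEADER_RE = re.compile(r"\(\d+\) \[[^\]]+\] KID \(")
# Per-database state fields such as "DB channel_db_bucardo_1 state: 40001"
STATE_RE = re.compile(r"DB (\S+) state: (\S+)")


def summarize_kid_deaths(log_text):
    """Return one dict per "Kid has died" record found in log_text."""
    records = []
    for match in DEATH_RE.finditer(log_text):
        # The record body runs until the next KID header or the end of the text
        nxt = HEADER_RE.search(log_text, match.end())
        end = nxt.start() if nxt else len(log_text)
        body = log_text[match.end():end]
        records.append({
            "pid": int(match.group(1)),
            "timestamp": match.group(2),
            "sync": match.group(3),
            "states": dict(STATE_RE.findall(body)),
        })
    return records


def count_serialization_failures(records):
    """Count, per sync, the death records where some database reported 40001."""
    counts = Counter()
    for record in records:
        if "40001" in record["states"].values():
            counts[record["sync"]] += 1
    return counts
```

The script scans the whole text with regular expressions rather than line by line, so it works whether or not the records are wrapped onto separate lines.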

If I restart bucardo, it does not solve the problem.

Questions:
1) How scalable is Bucardo? In other words, is there a limit on the number of syncs that could make a solution with many syncs unscalable? For example, are syncs independent of each other, or do they depend on a single process that controls all syncs, i.e. if one process is blocked, does that affect the others?

2) Is there a way to get rid of those errors? I guess they are related to the fact that the syncs are not refreshed.

3) Is there a way to make the syncs more responsive? For example, the kid can be created with the options "checktime", "lifetime", "maxkicks", "overdue" or "expired", but I am not sure I understand the benefits of those options.

4) I notice there are global options to control the Bucardo child processes, for example 'ctl_checkonkids_time'. Can it help to restart failed processes more quickly?

Last question, which may or may not be related: I notice that some syncs sometimes become inactive. After that, I find no way to make them work again: using bucardo activate XXX does not solve it, stopping and restarting the daemon does not help, and nothing special appears in the logs to explain what is wrong. So why can a sync become inactive, and what should I do in this case?

Thanks and regards,
Sylvain
_______________________________________________
Bucardo-general mailing list
[email protected]
https://mail.endcrypt.com/mailman/listinfo/bucardo-general
