+1

On Thu, 12 Mar 2020 at 20:40, Kevin Fenzi <ke...@scrye.com> wrote:
> We have been having the cluster fall over for still unknown reasons,
> but this patch should at least help prevent them:
>
> first we increase the net_ticktime parameter from its default of 60 to
> 120.
> rabbitmq sends a 'tick' to other cluster members every quarter of this
> time, and if a member is not heard from within roughly net_ticktime
> (plus or minus 25%) it assumes that cluster member is down. All these VMs
> are on the same net and in the same datacenter, but perhaps heavy load
> from other VMs causes them to sometimes not get a tick in time?
> http://www.rabbitmq.com/nettick.html
>
> Also, set our partitioning strategy to autoheal. Currently if some cluster
> member gets booted out, it gets paused, and stops processing entirely.
> With autoheal it will try and figure out a 'winning' partition and restart
> all the nodes that are not in that partition.
> https://www.rabbitmq.com/partitions.html
>
> Hopefully the first thing will make partitions less likely and the second
> will make them repair without causing massive pain to the cluster.
>
> Signed-off-by: Kevin Fenzi <ke...@scrye.com>
> ---
>  roles/rabbitmq_cluster/templates/rabbitmq.config | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/roles/rabbitmq_cluster/templates/rabbitmq.config b/roles/rabbitmq_cluster/templates/rabbitmq.config
> index 5c38dbd..82dd444 100644
> --- a/roles/rabbitmq_cluster/templates/rabbitmq.config
> +++ b/roles/rabbitmq_cluster/templates/rabbitmq.config
> @@ -21,7 +21,7 @@
>
>  %% How to respond to cluster partitions.
>  %% Documentation: https://www.rabbitmq.com/partitions.html
> -{cluster_partition_handling, pause_minority},
> +{cluster_partition_handling, autoheal},
>
>  %% And some general config
>  {log_levels, [{connection, none}]},
> @@ -29,9 +29,7 @@
>  {heartbeat, 600},
>  {channel_max, 128}
>  ]},
> -{kernel,
> -[
> -]},
> +{kernel, [{net_ticktime, 120}]},
>  {rabbitmq_management,
>  [
>  {listener, [{port, 15672},
> --
> 1.8.3.1
> _______________________________________________
> infrastructure mailing list -- infrastructure@lists.fedoraproject.org
> To unsubscribe send an email to
> infrastructure-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org
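
For what it's worth, here is a rough sketch of what the net_ticktime change means for failure detection. It assumes the behaviour described in the Erlang kernel documentation (a tick goes out every net_ticktime / 4 seconds, and an unresponsive peer is detected somewhere between net_ticktime minus 25% and net_ticktime plus 25%). The function name is hypothetical and not part of RabbitMQ or the patch.

```python
def detection_window(net_ticktime):
    """Return (tick_interval, min_t, max_t) in seconds for a net_ticktime.

    Assumption: ticks are sent every net_ticktime / 4 seconds, and a peer
    that stops responding is declared down after net_ticktime plus or
    minus 25%, as described in the Erlang kernel documentation.
    """
    quarter = net_ticktime / 4
    return quarter, net_ticktime - quarter, net_ticktime + quarter


# Compare the old default (60) with the value set by the patch (120).
for ticktime in (60, 120):
    interval, min_t, max_t = detection_window(ticktime)
    print(f"net_ticktime={ticktime}: tick every {interval:g}s, "
          f"peer declared down after {min_t:g}-{max_t:g}s")
```

Under those assumptions the patch roughly doubles how long a node can go quiet before it is treated as down (from about 45-75s to about 90-150s), which is the trade-off for fewer spurious partitions.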