+1

On Thu, 12 Mar 2020 at 20:40, Kevin Fenzi <ke...@scrye.com> wrote:

> We have been having the cluster fall over for still unknown reasons,
> but this patch should at least help prevent them:
>
> First, we increase the net_ticktime parameter from its default of 60 to
> 120.
> rabbitmq sends 4 'ticks' to other cluster members over this period, and if
> nothing is heard back from a member for the whole period (give or take 25%),
> it assumes that member is down. All these VMs are on the same network and in
> the same datacenter, but perhaps heavy load from other VMs sometimes keeps
> them from getting a tick in time?
> http://www.rabbitmq.com/nettick.html
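For reference, here is a minimal Python sketch of what the change does to detection timing. It assumes the rule described in the Erlang kernel / RabbitMQ net tick docs: a tick goes out every net_ticktime/4 seconds, and a silent peer is declared down somewhere between net_ticktime - 25% and net_ticktime + 25%. The names `TickWindow` and `tick_window` are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickWindow:
    tick_interval: float  # seconds between ticks sent to each peer
    min_detect: float     # earliest a silent peer can be declared down
    max_detect: float     # latest a silent peer will be declared down


def tick_window(net_ticktime: float) -> TickWindow:
    """Approximate liveness timing for a given net_ticktime (assumed rule:
    tick every T/4, peer considered down after T - T/4 .. T + T/4 of silence)."""
    quarter = net_ticktime / 4
    return TickWindow(quarter, net_ticktime - quarter, net_ticktime + quarter)


# Compare the old default with the proposed value.
for t in (60, 120):
    w = tick_window(t)
    print(f"net_ticktime={t}: tick every {w.tick_interval:g}s, "
          f"peer declared down after {w.min_detect:g}-{w.max_detect:g}s of silence")
```

This prints a 15s tick and a 45-75s window for 60, versus a 30s tick and a 90-150s window for 120. The trade-off is that a node that really is dead will take roughly twice as long to be noticed.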
>
> Also, set our partitioning strategy to autoheal. Currently, if a cluster
> member gets booted out, it is paused and stops processing entirely.
> With autoheal, it will try to figure out a 'winning' partition and restart
> all the nodes that are not in that partition.
> https://www.rabbitmq.com/partitions.html
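As a rough illustration of how the 'winning' partition gets picked, here is a Python sketch of the selection rule as I understand it from the partitions page: the partition with the most client connections wins, ties are broken by the most nodes, and any remaining tie is resolved arbitrarily. This only models that documented rule. RabbitMQ's actual Erlang implementation also handles coordination between nodes, which is ignored here, and `Partition`, `pick_winner`, `nodes_to_restart`, and the node names are all hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Partition:
    nodes: list[str]          # hypothetical node names in this partition
    client_connections: int  # clients connected to nodes in this partition


def pick_winner(partitions: list[Partition],
                rng: random.Random | None = None) -> Partition:
    """Most client connections wins; then most nodes; then an arbitrary pick."""
    rng = rng or random.Random()
    best = max((p.client_connections, len(p.nodes)) for p in partitions)
    tied = [p for p in partitions
            if (p.client_connections, len(p.nodes)) == best]
    return rng.choice(tied)


def nodes_to_restart(partitions: list[Partition],
                     rng: random.Random | None = None) -> list[str]:
    """Every node outside the winning partition gets restarted."""
    winner = pick_winner(partitions, rng)
    return sorted(n for p in partitions if p is not winner for n in p.nodes)


a = Partition(["rabbit@node1", "rabbit@node2"], client_connections=10)
b = Partition(["rabbit@node3"], client_connections=3)
print(nodes_to_restart([a, b]))  # → ['rabbit@node3']
```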
>
> Hopefully the first thing will make partitions less likely and the second
> will make them repair without causing massive pain to the cluster.
>
> Signed-off-by: Kevin Fenzi <ke...@scrye.com>
> ---
>  roles/rabbitmq_cluster/templates/rabbitmq.config | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/roles/rabbitmq_cluster/templates/rabbitmq.config b/roles/rabbitmq_cluster/templates/rabbitmq.config
> index 5c38dbd..82dd444 100644
> --- a/roles/rabbitmq_cluster/templates/rabbitmq.config
> +++ b/roles/rabbitmq_cluster/templates/rabbitmq.config
> @@ -21,7 +21,7 @@
>
>     %% How to respond to cluster partitions.
>     %% Documentation: https://www.rabbitmq.com/partitions.html
> -   {cluster_partition_handling, pause_minority},
> +   {cluster_partition_handling, autoheal},
>
>     %% And some general config
>     {log_levels, [{connection, none}]},
> @@ -29,9 +29,7 @@
>     {heartbeat, 600},
>     {channel_max, 128}
>    ]},
> - {kernel,
> -  [
> -  ]},
> + {kernel, [{net_ticktime,  120}]},
>   {rabbitmq_management,
>    [
>     {listener, [{port, 15672},
> --
> 1.8.3.1
> _______________________________________________
> infrastructure mailing list -- infrastructure@lists.fedoraproject.org
> To unsubscribe send an email to
> infrastructure-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org
>