Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues moving to Liberty

Sam Morrison Mon, 25 Jul 2016 16:16:47 -0700

The queue TTL is applied to both reply queues and fanout queues. I don’t think
it should apply to fanout queues; they should auto-delete. I can understand the
reason for having it on reply queues though, so maybe keeping the TTL only on
reply queues would be a way forward?


Or am I missing something and it is needed on fanout queues too?

Cheers,
Sam



> On 25 Jul 2016, at 8:47 PM, Dmitry Mescheryakov <dmescherya...@mirantis.com> 
> wrote:
> 
> Sam,
> 
> For your case I would suggest lowering rabbit_transient_queues_ttl until you 
> are comfortable with the volume of messages that arrives during that time. 
> Setting the parameter to 1 will essentially replicate the behaviour of 
> auto_delete queues. But I would suggest not setting it that low, as otherwise 
> your OpenStack will suffer from the original bug. A value like 20 seconds 
> should probably work in most cases.
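
One rough way to pick a value is to work backwards from the backlog you are willing to tolerate. The sketch below is illustrative only: the function name and the 1,000,000-message budget are made up, and it assumes every orphaned queue receives messages at a constant rate until it expires. The queue count and message rate are the figures reported in this thread.

```python
def max_ttl_seconds(backlog_budget, orphaned_queues, msgs_per_sec_per_queue):
    """Largest whole-second TTL that keeps the total backlog within budget.

    Assumes each orphaned queue fills at a constant rate until RabbitMQ
    expires it.
    """
    total_rate = orphaned_queues * msgs_per_sec_per_queue  # messages/second
    # rabbit_transient_queues_ttl is given in seconds; the thread treats 1
    # as the lowest setting, so never return less than that.
    return max(1, backlog_budget // total_rate)


# 1000 agents x 2 fanout queues each, 25 msg/s per queue (from this thread).
# The budget of 1,000,000 messages is a hypothetical example.
ttl = max_ttl_seconds(1_000_000, orphaned_queues=2000, msgs_per_sec_per_queue=25)
print(ttl)
```

With these inputs the result is 20, which happens to match the value suggested above; a different budget or message rate gives a different answer.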
> 
> I think there is room for improvement here: we could delete reply and 
> fanout queues on graceful shutdown. But I am not sure it will be easy to 
> implement, as it requires services (Nova, Neutron, etc.) to stop their RPC 
> server on SIGINT, and I don't know whether they do that right now.
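
The idea of cleaning up on graceful shutdown can be sketched as follows. This is not how any OpenStack service is implemented: `StubRPCServer`, its queue names, and `install_graceful_shutdown` are hypothetical stand-ins, and only Python's standard `signal` module is real. It simply shows a SIGINT handler that stops the server so the transient queues can be deleted explicitly instead of waiting for the TTL.

```python
import signal


class StubRPCServer:
    """Hypothetical stand-in for a service's RPC server."""

    def __init__(self):
        self.running = True
        # Made-up names for the transient queues this process declared.
        self.queues = {"reply_abc", "agent_fanout_abc"}

    def stop(self):
        # Stop consuming, then drop the transient queues right away
        # rather than leaving them for the broker to expire.
        self.running = False
        self.queues.clear()


def install_graceful_shutdown(server):
    """Register a SIGINT handler that stops the server; return the handler."""
    def handler(signum, frame):
        server.stop()

    # Must be called from the main thread of the process.
    signal.signal(signal.SIGINT, handler)
    return handler
```

A SIGKILL cannot be caught this way, so queues left behind by a killed process would still depend on the TTL.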
> 
> I don't think we can make the SIGKILL case any better. Other than that, the 
> issue could be investigated on the Neutron side; maybe the number of messages 
> could be reduced there.
> 
> Thanks,
> 
> Dmitry
> 
> 2016-07-25 9:27 GMT+03:00 Sam Morrison <sorri...@gmail.com>:
> We recently upgraded to Liberty and have come across some issues with queue 
> build-ups.
> 
> This is due to changes in the rabbit driver that set queue expiries instead 
> of using queue auto-delete.
> See https://bugs.launchpad.net/oslo.messaging/+bug/1515278 for more 
> information.
> 
> The fix for this bug is in Liberty. It does fix an issue; however, it causes 
> another one.
> 
> Every time you restart something that has a fanout queue (e.g. 
> cinder-scheduler or the neutron agents), you are left with a queue in rabbit 
> that is still bound to the RabbitMQ exchange (and so is still receiving 
> messages) but has no consumers.
> 
> The messages in these queues are basically rubbish and don’t need to exist. 
> Rabbit will delete the queues after 10 minutes (although the default in 
> master has since been changed to 30 minutes).
> 
> During this time the queue keeps growing with messages. This sets off our 
> Nagios alerts, and our ops guys have to deal with something that isn’t 
> really an issue. They basically delete the queue.
> 
> A bad scenario is when you make a change to your cloud that restarts all of 
> your 1000 neutron agents. This leaves a couple of dead queues per agent 
> hanging around (port updates and security group updates). We get around 
> 25 messages / second on these queues, so you can see that after 10 minutes 
> we have a ton of messages in them.
> 
> 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise.
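
The same arithmetic can be rerun for other TTL values. The helper below is a small sketch with a made-up name, assuming the message rate stays constant for the whole TTL; the three TTLs are the 20 seconds suggested in this thread, the 10-minute Liberty default, and the 30-minute default in master.

```python
def orphaned_backlog(agents, queues_per_agent, msgs_per_sec, ttl_seconds):
    """Messages that pile up in consumerless queues before they expire."""
    return agents * queues_per_agent * msgs_per_sec * ttl_seconds


# 1000 agents, 2 dead queues each, 25 messages/second per queue.
for ttl in (20, 600, 1800):
    print(f"TTL {ttl:>4}s -> {orphaned_backlog(1000, 2, 25, ttl):,} messages")
```

For a TTL of 600 seconds this reproduces the 30,000,000 figure; 20 seconds gives 1,000,000 and 1800 seconds gives 90,000,000.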
> 
> Has anyone else been suffering from this before I raise a bug?
> 
> Cheers,
> Sam
> 
> 
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
