Re: [openstack-dev] [Openstack-operators] [Fuel][Oslo][RabbitMQ][Shovel] Deprecate mirrored queues from HA AMQP cluster scenario

2015-06-08 Thread Michael Klishin
On 8 June 2015 at 15:10:15, Davanum Srinivas (dava...@gmail.com) wrote:
> I'd like to bring out a poll about deprecating the RabbitMQ mirrored
> queues for HA layout and replacing the AMQP clustering by shovel [0],
> [1]. I guess the federation would not be a good option, but let's
> consider it as well.

RabbitMQ team member here. 

Neither Shovel nor Federation will replace mirroring. Shovel moves messages
from a queue to an exchange (within a single node or between remote nodes 
and/or clusters).
It doesn't replicate anything.
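
To make the distinction concrete, here is a minimal toy sketch of "moving"
versus "replicating". It is an in-memory model only: the Queue class and the
shovel and mirror_publish functions are hypothetical names made up for this
illustration and are not RabbitMQ or plugin APIs.

```python
from collections import deque


class Queue:
    """Toy in-memory queue (hypothetical, not a RabbitMQ API)."""

    def __init__(self, name):
        self.name = name
        self.messages = deque()


def shovel(source, destination):
    """Move messages: each one is taken off the source and republished."""
    moved = 0
    while source.messages:
        destination.messages.append(source.messages.popleft())
        moved += 1
    return moved


def mirror_publish(message, master, mirrors):
    """Replicate: the message is stored on the master and on every mirror."""
    master.messages.append(message)
    for mirror in mirrors:
        mirror.messages.append(message)
```

After shovel() runs, the source holds nothing, so losing the destination
loses the messages. After mirror_publish(), every mirror still holds a copy.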

Federation has two parts to it:

 * Queue federation does not replicate. It distributes messages from a single
   logical queue between N nodes or clusters when there are no local consumers
   to consume them.
 * Exchange federation replicates a stream of messages going through an
   exchange. As messages are consumed upstream, downstream has no way of
   knowing about it.

> Why this must be done? The answer is that the rabbit cluster cannot
> detect and survive "micro outages" well and just ending up with some
> queues stuck and as a result, the rabbitmqctl control plane hanged
> completely unresponsive (until the rabbit node erased and recovered its
> cluster membership). These outages could be caused either by the network
> *or* by CPU load spikes. For example, like this bug in Fuel project [2]
> and this mail thread [3].

The right thing to do here is to introduce timeouts to rabbitmqctl. That work
was 99% finished in the past, but some RabbitMQ team members felt it should
produce more detailed error messages, which extended the scope of the change
significantly.
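
Until rabbitmqctl has timeouts of its own, a caller can bound the wait from
the outside. The following is a minimal sketch of that idea, not the change
discussed above: run_with_timeout is a hypothetical helper name, and it relies
only on the Python standard library's subprocess timeout.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (exit_code, stdout), or (None, "") if it timed out."""
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising.
        return None, ""
    return done.returncode, done.stdout
```

A monitoring script could call run_with_timeout(["rabbitmqctl", "status"], 30)
and treat a None exit code as "control plane unresponsive" instead of hanging.
The 30 second value is an arbitrary example, not a recommendation.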

> This seems rather the Erlang's 
> Mnesia generic clustering issue, than something what could be just fixed 
> in RabbitMQ, unless the mnesia based clustering would be dropped 
> completely ;)

While Mnesia indeed needs to be replaced to introduce AP (as in CAP) style
mirroring, the issue you're bringing up here has nothing to do with Mnesia.
Mnesia is not used by rabbitmqctl, and it is not used to store messages.
It's a rabbitmqctl issue, and potentially a hint that you may want to reduce
the net_ticktime value [1] (say, to 5-10 seconds) so that queue master
unavailability is detected faster.
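
As a rough guide to what a given net_ticktime buys you, here is a small
sketch. It assumes the behaviour described in the Erlang kernel documentation,
where an unresponsive node is detected within net_ticktime plus or minus 25%.
The detection_window function name is made up for this illustration.

```python
def detection_window(net_ticktime_s):
    """Return (min, max) seconds before an unresponsive peer is declared down.

    Assumes the documented net_ticktime +/- 25% detection window.
    """
    quarter = net_ticktime_s / 4
    return net_ticktime_s - quarter, net_ticktime_s + quarter


if __name__ == "__main__":
    for ticktime in (60, 10, 5):
        low, high = detection_window(ticktime)
        print(f"net_ticktime={ticktime}s: detected within {low}s to {high}s")
```

Lower values detect failures faster but are more likely to trip on short
network hiccups or CPU load spikes, so the value is a trade-off.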



1. http://www.rabbitmq.com/nettick.html
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ  



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Fuel][Oslo][RabbitMQ][Shovel] Deprecate mirrored queues from HA AMQP cluster scenario

2015-06-09 Thread Michael Klishin
Nodes can become unavailable with Shovel or Federation as well.

While both plugins will enqueue undelivered/unconfirmed messages internally,
recovery can take a while or never happen. So replication is certainly
necessary.

RabbitMQ has a guide that mentions several possible failure scenarios:
http://www.rabbitmq.com/reliability.html

Note that a lot of them do not even involve a messaging server, and would
be just as relevant for Pacemaker setups. This is as much an oslo.messaging
design concern as it is a concern for whatever messaging technology is used.
There's ongoing work on publisher confirms (one of the things
oslo.messaging must have) and on heartbeats, for faster peer unavailability
detection. Shovel, Federation, or AP-style mirroring wouldn't change this.
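
For readers unfamiliar with publisher confirms, here is a minimal sketch of
the bookkeeping a publisher does. It is a toy model, not oslo.messaging or
client library code: ConfirmTracker and its methods are hypothetical names.
It assumes the protocol behaviour where the broker acks a sequence number and
may set a "multiple" flag to confirm everything up to and including it.

```python
class ConfirmTracker:
    """Tracks published messages until the broker confirms them (toy model)."""

    def __init__(self):
        self.next_seq = 1      # sequence numbers start at 1 on a channel
        self.unconfirmed = {}  # sequence number -> message body

    def publish(self, body):
        """Record a message as published but not yet confirmed."""
        seq = self.next_seq
        self.unconfirmed[seq] = body
        self.next_seq += 1
        return seq

    def on_ack(self, delivery_tag, multiple=False):
        """Handle a broker ack; 'multiple' confirms all tags <= delivery_tag."""
        if multiple:
            for seq in [s for s in self.unconfirmed if s <= delivery_tag]:
                del self.unconfirmed[seq]
        else:
            self.unconfirmed.pop(delivery_tag, None)

    def pending(self):
        """Messages that must be republished if the connection is lost."""
        return [self.unconfirmed[s] for s in sorted(self.unconfirmed)]
```

Whatever is still in pending() when a node becomes unavailable is what the
client has to republish, regardless of how the broker side is deployed.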

So please clarify what problems are being tackled here. Currently there are
several largely unrelated things mentioned: rabbitmqctl timeouts, Fuel
provisioning, Mnesia being very consistency-oriented, desired
oslo.messaging fault tolerance improvements. I'm not sure how some of these
relate to each other and why OpenStack has to work around issues that should
be reported to the RabbitMQ team.

I will push for introducing the most basic timeout support
in ctl in the next bug fix release.

On Mon, Jun 8, 2015 at 5:24 PM, Bogdan Dobrelya wrote:

> > RabbitMQ team member here.
>
> Thank you for a quick response, Michael!
>
> >
> > Neither Shovel nor Federation will replace mirroring. Shovel moves messages
> > from a queue to an exchange (within a single node or between remote nodes
> > and/or clusters).
> > It doesn't replicate anything.
>
> Yes, the idea was to not just replace, but redesign OpenStack libs to
> use cluster-less messaging as well. It should assume that some messages
> from RPC conversations may be lost. And that messages aren't synced
> between different AMQP nodes specified in the config of OpenStack
> services (rabbit_hosts=).
>
> >
> > Federation has two parts to it:
> >
> >  * Queue federation: no replicate, distributes messages from a single
> >    logical queue between N nodes or clusters, when there are no local
> >    consumers to consume them.
> >  * Exchange federation replicates a stream of messages going through an
> >    exchange.
> >    As messages are consumed upstream, downstream has no way of knowing
> >    about it.
> >
> >
> > The right thing to do here is introduce timeouts to rabbitmqctl, which
> > was 99% finished in the past but some RabbitMQ team members felt it
> > should produce more detailed error messages, which extended the scope
> > of the change significantly.
> >
> >
> > While Mnesia indeed needs to be replaced to introduce AP (as in CAP)
> > style mirroring, the issue you're bringing up here has nothing to do
> > with Mnesia.
> > Mnesia is not used by rabbitmqctl, and it is not used to store messages.
> > It's a rabbitmqctl issue, and potentially a hint that you may want to
> > reduce net_ticktime value (say, to 5-10 seconds) to make queue master
> > unavailability detected faster.
> >
> >
>
> Thank you, I updated the bug comments [0]. We will test this option as
> well.
>
> [0] https://bugs.launchpad.net/fuel/+bug/1460762/comments/23
>
> >
> > 1. http://www.rabbitmq.com/nettick.html
> > --
> > MK
> >
> > Staff Software Engineer, Pivotal/RabbitMQ
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Skype #bogdando_at_yahoo.com
> Irc #bogdando



-- 
MK

Staff Software Engineer, Pivotal/RabbitMQ


Re: [openstack-dev] [Fuel][Oslo][RabbitMQ][Shovel] Deprecate mirrored queues from HA AMQP cluster scenario

2015-06-09 Thread Michael Klishin
On 9 June 2015 at 10:26:27, Michael Klishin (mklis...@pivotal.io) wrote:
> I will push for introducing the most basic timeout support
> in ctl in the next bug fix release.

Some (highly conservative, for the sake of backwards compatibility)
improvements are already merged and will be in 3.5.4:

https://github.com/rabbitmq/rabbitmq-server/pull/181
-- 
MK 

Staff Software Engineer, Pivotal/RabbitMQ 





[openstack-dev] [rabbitmq] ANN New ops-oriented guides on rabbitmq.com

2015-06-10 Thread Michael Klishin
First of all, apologies if this belongs strictly on openstack-docs. Based on
multiple discussions in Vancouver, I'd like more people to be aware of this.

As announced in April and at the summit in May, the RabbitMQ team at Pivotal
would like to help with OpenStack documentation and operations experience
around RabbitMQ.
It was later decided that most of the docs improvements should go to
rabbitmq.com.

I'm happy to announce that we have recently shipped two new ops-oriented
guides:

 * Networking: http://www.rabbitmq.com/networking.html 
 * Production Checklist: http://www.rabbitmq.com/production-checklist.html 

The former covers multiple subjects related to networking, in particular 
tuning for two common scenarios: maximum throughput and highest possible number 
of concurrent connections. 
The latter is aimed at users looking to move into production or validate their 
existing deployment.

Nothing OpenStack-specific so far, but both should be relevant to OpenStack
operators.
Expansions to the above guides and more OpenStack-focused docs are coming
as time permits.

Cheers.
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ  





[openstack-dev] Improving OpenStack documentation around RabbitMQ

2015-04-28 Thread Michael Klishin
Hi, 

I'm a RabbitMQ engineering team member and we'd like to help improve OpenStack
docs around it.

I've been reading the docs and making notes of what can be improved. We'd be
happy to contribute the changes. However, we're not very familiar with the
OpenStack development process and have a few questions before we start.

As far as I understand, OpenStack Kilo is about to ship. Does this mean we can
only contribute documentation improvements for the release after it? Are there
maintenance releases that doc improvements could go into? If so, how is this
reflected in repository branches?

Should the changes we propose be discussed on this list or in GitHub issues [1]?

Finally, we are considering adding a doc guide dedicated to OpenStack on
rabbitmq.com (we have one for EC2, for instance). Note that we are not
looking to replace what's on docs.openstack.org, only to provide a guide that
can go into more detail.
Does this sound like a good idea to the OpenStack community? Should we keep
everything on docs.openstack.org?
Would it be OK if we link to rabbitmq.com guides in any changes we contribute?
I don't think OpenStack Juno docs have a lot of external links: is that by
design?

Thanks.

1. https://github.com/openstack/openstack-manuals 
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ  





Re: [openstack-dev] Improving OpenStack documentation around RabbitMQ

2015-04-28 Thread Michael Klishin
On 28 April 2015 at 16:33:35, Davanum Srinivas (dava...@gmail.com) wrote:
> Hello Michael.
>  
> Just moving your thread to the correct mailing list.

Apologies, I've signed up to openstack-docs now and will re-post there. 
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ  





Re: [openstack-dev] Improving OpenStack documentation around RabbitMQ

2015-04-28 Thread Michael Klishin
On 28 April 2015 at 16:33:35, Davanum Srinivas (dava...@gmail.com) wrote:
> Have you seen this?
> https://github.com/openstack/ha-guide/tree/master/doc/high-availability-guide/ha_aa_rabbitmq
>   
>  
> That url was built from this github repo:
> https://github.com/openstack/ha-guide/tree/master/doc/high-availability-guide/ha_aa_rabbitmq
>   
>  
> There's a weekly meeting for the HA documentation to meet people  
> working on the HA guide:
> https://wiki.openstack.org/wiki/Meetings#HA_Guide_Update_Meeting  

Thank you, I'll take a look.

At this stage I'm trying to understand the process more than anything, e.g.
how documentation improvements to Kilo can be contributed after it ships.

Some of the improvements we have in mind are not HA-related. 
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ  





Re: [openstack-dev] Improving OpenStack documentation around RabbitMQ

2015-04-28 Thread Michael Klishin
On 28 April 2015 at 16:44:32, Michael Klishin (mklis...@pivotal.io) wrote:
> At this stage I'm trying to understand the process more than  
> anything. E.g. how can
> documentation improvements to Kilo be contributed after it  
> ships.
>  
> Some of the improvements we have in mind are not HA-related.

I've decided to start a proper new thread on openstack-docs with my original
questions.

Let's continue there. 
--  
MK  

Staff Software Engineer, Pivotal/RabbitMQ  


