Re: [openstack-dev] [all][massively distributed][architecture]Coordination between actions/WGs

Alec Hothan (ahothan) Thu, 01 Sep 2016 09:01:16 -0700

This topic of oslo messaging issues has been going on for a long time. The
main issue is not the transport itself (each transport has its own limitations)
but the code using oslo messaging (e.g. pieces of almost every openstack
service). It is relatively easy to write code using oslo messaging that works
with devstack or a small scale deployment. It is much harder to write such
code so that it works under the conditions of operations at scale, for several
reasons: there is frequently no appropriate test platform, the existing testing
tools are limited and, to top it all, the "fuzzy" oslo messaging API definition
makes the handling of abnormal conditions and load conditions very
unpredictable and inconsistent across components.


You can't solve this by just "fixing" the oslo messaging layer or by swapping
to another transport (you'll just open up another can of worms).

As suggested by Ian below, the only practical way to fix this is to define a
new set of APIs that is much more strictly defined, migrate openstack code
to these new APIs, and test adequately.
That is clearly very difficult to do while resources are moving away from
"stable and mature" services, attracted by the latest buzzwords (such as
containers).

On the original topic of this thread, having geographical distribution will 
certainly introduce a new set of issues at scale.


  Alec

On 9/1/16, 6:52 AM, "Ken Giusti" <kgiu...@gmail.com> wrote:

>On Wed, Aug 31, 2016 at 3:30 PM, Ian Wells <ijw.ubu...@cack.org.uk> wrote:
>> On 31 August 2016 at 10:12, Clint Byrum <cl...@fewbar.com> wrote:
>>>
>>> Excerpts from Duncan Thomas's message of 2016-08-31 12:42:23 +0300:
>>> > On 31 August 2016 at 11:57, Bogdan Dobrelya <bdobre...@mirantis.com>
>>> > wrote:
>>> >
>>> > > I agree that RPC design pattern, as it is implemented now, is a major
>>> > > blocker for OpenStack in general. It requires a major redesign,
>>> > > including handling of corner cases, on both sides, *especially* RPC
>>> > > call
>>> > > clients. Or maybe it just has to be abandoned and replaced by a
>>> > > more
>>> > > cloud friendly pattern.
>>> >
>>> >
>>> > Is there a writeup anywhere on what these issues are? I've heard this
>>> > sentiment expressed multiple times now, but without a writeup of the
>>> > issues
>>> > and the design goals of the replacement, we're unlikely to make progress
>>> > on
>>> > a replacement - even if somebody takes the heroic approach and writes a
>>> > full replacement themselves, the odds of getting community buy-in are
>>> > very
>>> > low.
>>>
>>> Right, this is exactly the sort of thing I'd like to gather a group of
>>> design-minded folks around in an Architecture WG. Oslo is busy with the
>>> implementations we have now, but I'm sure many oslo contributors would
>>> like to come up for air and talk about the design issues, and come up
>>> with a current design, and some revisions to it, or a whole new one,
>>> that can be used to put these summit hallway rumors to rest.
>>
>>
>> I'd say the issue is comparatively easy to describe.  In a call sequence:
>>
>> 1. A sends a message to B
>> 2. B receives message
>> 3. B acts upon message
>> 4. B responds to message
>> 5. A receives response
>> 6. A acts upon response
>>
>> ... you can have a fault at any point in that message flow (consider crashes
>> or program restarts).  If you ask for something to happen, you wait for a
>> reply, and you don't get one, what does it mean?  The operation may have
>> happened, with or without success, or it may not have gotten to the far end.
>> If you send the message, does that mean you'd like it to cause an action
>> tomorrow?  A year from now?  Or perhaps you'd like it to just not happen?
>> Do you understand what Oslo promises you here, and do you think every person
>> who ever wrote an RPC call in the whole OpenStack solution also understood
>> it?
>>
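
Ian's point can be illustrated with a toy simulation. This is only a sketch,
not oslo.messaging code: the `Fault` enum and the `call` function are made-up
names, and each fault is tied to a step in the list above. Whichever of the
four faults occurs, A observes the same timeout, while B may or may not have
acted.

```python
import enum


class Fault(enum.Enum):
    """Hypothetical fault points in the six-step call sequence."""
    NONE = "no fault"
    REQUEST_LOST = "request lost before reaching B"   # between steps 1 and 2
    B_CRASH_BEFORE_ACT = "B crashed before acting"    # between steps 2 and 3
    B_CRASH_AFTER_ACT = "B crashed after acting"      # between steps 3 and 4
    REPLY_LOST = "reply lost before reaching A"       # between steps 4 and 5


def call(fault):
    """Simulate one A -> B call; return (what A sees, whether B acted)."""
    if fault in (Fault.REQUEST_LOST, Fault.B_CRASH_BEFORE_ACT):
        return "timeout", False
    b_acted = True  # step 3 completed
    if fault in (Fault.B_CRASH_AFTER_ACT, Fault.REPLY_LOST):
        return "timeout", b_acted
    return "reply", b_acted


outcomes = {fault: call(fault) for fault in Fault}
for fault, (seen_by_a, b_acted) in outcomes.items():
    print(f"{fault.value:32} A sees {seen_by_a:8} B acted: {b_acted}")
```

From A's side a timeout alone cannot distinguish "nothing happened" from
"it happened but I never heard back", which is why the caller's recovery
logic needs a defined contract to rely on.
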
>
>Precisely - IMHO it's a shortcoming of the current o.m. RPC (and
>Notification) API in that it does not let the API user explicitly set
>the desired delivery guarantee when publishing.  Right now it's
>implied that the delivery guarantee is "At Most Once" but that's
>mostly not precisely defined in any meaningful way.
>
>Any messaging API should be explicit regarding what delivery
>guarantee(s) are possible.  In addition, an API should allow the user
>to designate the importance of a message on a per-send basis:  can
>this message be dropped?  can this message be duplicated?  At what
>point in time does the message become invalid (already offered for RPC
>via timeout, but not Notifications IIRC), etc....
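
As a rough sketch of what a per-send contract along those lines could look
like (purely hypothetical: `Delivery`, `Message`, `Receiver` and `send` are
invented for illustration and are not part of oslo.messaging), the sender
states whether the message may be dropped or duplicated and when it expires,
and the receiver enforces expiry and filters duplicates:

```python
import enum
from dataclasses import dataclass


class Delivery(enum.Enum):
    AT_MOST_ONCE = "may be dropped, never duplicated"
    AT_LEAST_ONCE = "never silently dropped, may be duplicated"


@dataclass
class Message:
    msg_id: str
    payload: dict
    delivery: Delivery
    expires_at: float  # deadline, on whatever clock the caller uses


class Receiver:
    """Hypothetical receiver: rejects stale messages, filters duplicates."""

    def __init__(self):
        self.seen_ids = set()
        self.handled = []

    def deliver(self, msg, now):
        if now >= msg.expires_at:
            return "expired"
        if msg.msg_id in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(msg.msg_id)
        self.handled.append(msg.payload)
        return "handled"


def send(msg, receiver, drops, now, max_attempts=3):
    """Send over a toy link that loses the first `drops` transmissions.

    Retransmission happens only if the caller asked for AT_LEAST_ONCE.
    """
    attempts = max_attempts if msg.delivery is Delivery.AT_LEAST_ONCE else 1
    for attempt in range(attempts):
        if attempt < drops:
            continue  # lost in transit, no ack seen by the sender
        return receiver.deliver(msg, now)
    return "dropped"


rx = Receiver()
boot = Message("m1", {"op": "boot"}, Delivery.AT_LEAST_ONCE, expires_at=100.0)
print(send(boot, rx, drops=1, now=10.0))   # retransmitted after one loss
print(send(boot, rx, drops=0, now=11.0))   # same id again, filtered
print(send(boot, rx, drops=0, now=200.0))  # past its deadline
```

The point is only that the guarantee and the expiry are explicit arguments
of the send, so both sides can be tested against them.
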
>
>And well-understood failure modes... things always fail...
>
>
>> I have opinions about other patterns we could use, but I don't want to push
>> my solutions here, I want to see if this is really as much of a problem as
>> it looks and if people concur with my summary above.  However, the right
>> approach is most definitely to create a new and more fitting set of oslo
>> interfaces for communication patterns, and then to encourage people to move
>> to the new ones from the old.  (Whether RabbitMQ is involved is neither here
>> nor there, as this is really a question of Oslo APIs, not their
>> implementation.)
>>
>
>Hmmmmmm... maybe.   Message bus technology is varied, and so is its
>behavior.  There are brokerless, point-to-point backends supported by
>oslo.messaging [1],[2] which will exhibit different
>capabilities/behaviors from the traditional broker-based
>store-and-forward backend (e.g. message acking end-to-end vs to the
>intermediary).
>
>All the more reason to have explicit delivery guarantees and well
>understood failure scenarios defined by the API.
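
A toy model of the acking difference Ken mentions, under the assumption that
a broker acknowledges a message once it has stored it while a point-to-point
link acknowledges only when the consumer itself has received it. `Broker`
and `Consumer` are invented names, not oslo.messaging driver classes:

```python
class Consumer:
    """Hypothetical consumer that may be down when a message arrives."""

    def __init__(self, up=True):
        self.up = up
        self.received = []

    def receive(self, msg):
        # Point-to-point: the ack comes from the consumer itself.
        if not self.up:
            return False
        self.received.append(msg)
        return True


class Broker:
    """Hypothetical store-and-forward intermediary."""

    def __init__(self):
        self.queue = []

    def publish(self, msg):
        # The ack means "stored by the broker", not "seen by the consumer".
        self.queue.append(msg)
        return True


consumer = Consumer(up=False)
broker = Broker()

acked_by_broker = broker.publish("resize instance")
acked_end_to_end = consumer.receive("resize instance")
print("broker ack:", acked_by_broker, "end-to-end ack:", acked_end_to_end)
```

The same "send succeeded" result therefore means different things depending
on the backend, unless the API says which one the caller is getting.
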
>
>[1] http://docs.openstack.org/developer/oslo.messaging/zmq_driver.html
>[2] http://docs.openstack.org/developer/oslo.messaging/AMQP1.0.html
>
>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
>-- 
>Ken Giusti  (kgiu...@gmail.com)