On Nov 13, 2014, at 10:59 AM, Clint Byrum <cl...@fewbar.com> wrote:

> Excerpts from Zane Bitter's message of 2014-11-13 09:55:43 -0800:
>> On 13/11/14 09:58, Clint Byrum wrote:
>>> Excerpts from Zane Bitter's message of 2014-11-13 05:54:03 -0800:
>>>> On 13/11/14 03:29, Murugan, Visnusaran wrote:
>>>>> Hi all,
>>>>>
>>>>> Convergence-POC distributes stack operations by sending resource
>>>>> actions over RPC for any heat-engine to execute. The entire stack
>>>>> lifecycle will be controlled by worker/observer notifications. This
>>>>> distributed model has its own advantages and disadvantages.
>>>>>
>>>>> Any stack operation has a timeout, and a single engine will be
>>>>> responsible for it. If that engine goes down, the timeout is lost
>>>>> along with it. The traditional fix is for the other engines to
>>>>> recreate the timeout from scratch. Also, a missed resource action
>>>>> notification will be detected only when the stack operation timeout
>>>>> fires.
>>>>>
>>>>> To overcome this, we will need the following capabilities:
>>>>>
>>>>> 1. Resource timeout (can be used for retry)
>>>>
>>>> I don't believe this is strictly needed for phase 1 (essentially we
>>>> don't have it now, so nothing gets worse).
>>>>
>>>
>>> We do have a stack timeout, and it stands to reason that we won't have a
>>> single box with a timeout greenthread after this, so a strategy is
>>> needed.
>>
>> Right, that was 2, but I was talking specifically about the resource
>> retry. I think we agree on both points.
>>
>>>> For phase 2, yes, we'll want it. One thing we haven't discussed much is
>>>> that if we used Zaqar for this then the observer could claim a message
>>>> but not acknowledge it until it had processed it, so we could have
>>>> guaranteed delivery.
>>>>
>>>
>>> Frankly, if oslo.messaging doesn't support reliable delivery then we
>>> need to add it.
>>
>> That is straight-up impossible with AMQP. Either you ack the message and
>> risk losing it if the worker dies before processing is complete, or you
>> don't ack the message until it's processed and you become a blocker for
>> every other worker trying to pull jobs off the queue. It works fine when
>> you have only one worker; otherwise not so much. This is the crux of the
>> whole "why isn't Zaqar just Rabbit" debate.
>>
>
> I'm not sure we have the same understanding of AMQP, so hopefully we can
> clarify here. This stackoverflow answer echoes my understanding:
>
> http://stackoverflow.com/questions/17841843/rabbitmq-does-one-consumer-block-the-other-consumers-of-the-same-queue
>
> Not ack'ing just means messages might get retransmitted if we never ack.
> It doesn't block other consumers. And as the link above quotes from the
> AMQP spec, when there are multiple consumers, FIFO is not guaranteed.
> Other consumers get other messages.
>
> So just add the ability for a consumer to read, work, and then ack to
> oslo.messaging, and this is mostly handled via AMQP. Of course that
> also likely means no zeromq for Heat without accepting that messages
> may be lost if workers die.
>
> Basically we need to add something that is not "RPC" but instead
> "jobqueue" that mimics this:
>
> http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/rpc/dispatcher.py#n131
>
> I've always been suspicious of this bit of code, as it basically means
> that if anything fails between that call and the one below it, we have
> lost contact, but as long as clients are written to re-send when there
> is a lack of reply, there shouldn't be a problem. But for a job queue
> there is no reply, and so the worker would dispatch, and then
> acknowledge after the dispatched call had returned (including having
> completed the step where new messages are added to the queue for any
> newly-possible children).
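
To make the read/work/ack pattern concrete, here's a rough sketch of what
such a job-queue consumer could look like on top of plain kombu. The
broker URL, the 'heat_jobs' queue name, and the do_resource_action /
enqueue_children helpers are all invented for illustration:

    from kombu import Connection

    def consume_one(do_resource_action, enqueue_children):
        with Connection('amqp://guest:guest@localhost//') as conn:
            queue = conn.SimpleQueue('heat_jobs')
            message = queue.get(block=True, timeout=10)
            try:
                do_resource_action(message.payload)
                # Enqueue any newly-possible children *before* acking,
                # so a crash here only means the job gets redelivered.
                enqueue_children(message.payload)
                message.ack()  # the broker drops the message only now
            except Exception:
                message.requeue()  # hand it back for another worker
            finally:
                queue.close()

The point being: an unacked message doesn't block the queue, it just
stays eligible for redelivery if this worker's connection dies.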
> Just to be clear, I believe what Zaqar adds is the ability to peek at
> a specific message ID and not affect it in the queue, which is entirely
> different than ACK'ing the ones you've already received in your session.
>
>> Most stuff in OpenStack gets around this by doing synchronous calls
>> across oslo.messaging, where there is an end-to-end ack. We don't want
>> that here though. We'll probably have to make do with having ways to
>> recover after a failure (kick off another update with the same data is
>> always an option). The hard part is that if something dies we don't
>> really want to wait until the stack timeout to start recovering.
>>
>
> I fully agree. Josh's point about using a coordination service like
> Zookeeper to maintain liveness is an interesting one here. If we just
> make sure that all the workers that have claimed work off the queue are
> alive, that should be sufficient to prevent a hanging stack situation
> like you describe above.
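
Since that was my point, let me make it concrete. With ZooKeeper this
kind of liveness tracking comes almost for free, because ephemeral znodes
vanish on their own when the session that created them dies. A minimal
sketch using kazoo (the paths and the engine id are invented):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # Ephemeral: this node disappears automatically if the engine dies.
    zk.create('/heat/engines/engine-1', b'', ephemeral=True,
              makepath=True)

    @zk.ChildrenWatch('/heat/engines')
    def engines_changed(children):
        # Fires whenever an engine joins or dies; work claimed by a
        # missing engine could be rescheduled here instead of waiting
        # for the stack timeout.
        print('live engines: %s' % children)

No polling, no periodic jobs; a dead engine becomes visible within one
session timeout.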
>>> Zaqar should have nothing to do with this and is, IMO, a
>>> poor choice at this stage, though I like the idea of using it in the
>>> future so that we can make Heat more of an outside-the-cloud app.
>>
>> I'm inclined to agree that it would be hard to force operators to deploy
>> Zaqar in order to be able to deploy Heat, and that we should probably be
>> cautious for that reason.
>>
>> That said, from a purely technical point of view it's not a poor choice
>> at all - it has *exactly* the semantics we want (unlike AMQP), and at
>> least to the extent that the operator wants to offer Zaqar to users
>> anyway it completely eliminates a whole backend that they would
>> otherwise have to deploy. It's a tragedy that all of OpenStack has not
>> been designed to build upon itself in this way, and it causes me
>> physical pain to know that we're about to perpetuate it.
>>
>>>>> 2. Recover from engine failure (loss of stack timeout, resource
>>>>> action notification)
>>>>>
>>>>> Suggestions:
>>>>>
>>>>> 1. Use a task queue like celery to host timeouts for both stack and
>>>>> resource.
>>>>
>>>> I believe Celery is more or less a non-starter as an OpenStack
>>>> dependency because it uses Kombu directly to talk to the queue, vs.
>>>> oslo.messaging which is an abstraction layer over Kombu, Qpid, ZeroMQ
>>>> and maybe others in the future. i.e. requiring Celery means that some
>>>> users would be forced to install Rabbit for the first time.
>>>>
>>>> One option would be to fork Celery and replace Kombu with
>>>> oslo.messaging as its abstraction layer. Good luck getting that
>>>> maintained though, since Celery _invented_ Kombu to be its
>>>> abstraction layer.
>>>>
>>>
>>> A slight side point here: Kombu supports Qpid and ZeroMQ. Oslo.messaging
>>
>> You're right about Kombu supporting Qpid; it appears they added it. I
>> don't see ZeroMQ on the list though:
>>
>> http://kombu.readthedocs.org/en/latest/userguide/connections.html#transport-comparison
>>
>
> They, confusingly, call it zmq, and it may not be in a recent release:
>
> https://github.com/celery/kombu/blob/master/kombu/transport/zmq.py
>
>>> is more about having a unified API than a set of magic backends. It
>>> actually boggles my mind why we didn't just use kombu (cue 20 reactions
>>> with people saying it wasn't EXACTLY right), but I think we're committed
>>
>> Well, we also have to take into account the fact that Qpid support was
>> added only during the last 9 months, whereas oslo.messaging was
>> implemented 3 years ago and time travel hasn't been invented yet (for
>> any definition of 'yet').
>>
>
> Go back in time 3 years and perhaps we could have done all the work
> we've done in kombu instead. Hindsight though.

+1 to this; I've seen the OpenStack community shy away from
helping/improving other open source projects, which saddens me. Kombu,
I think, is in this category, but the future is unwritten and there is
still hope!

>>> to oslo.messaging now. Anyway, celery would need no such refactor, as
>>> kombu would be able to access the same bus as everything else just fine.
>>
>> Interesting, so that would make it easier to get Celery added to the
>> global requirements, although we'd likely still have headaches to deal
>> with around configuration.
>>
>
> Yeah, I'm not advocating for celery, just pointing out that it has
> become more like what we already deploy. :)
>
>>>>> 2. Poll database for engine failures and restart timers / retrigger
>>>>> resource retry (IMHO this would be traditional, but heavyweight)
>>>>>
>>>>> 3. Migrate heat to use TaskFlow. (Too many code changes)
>>>>
>>>> If it's just handling timed triggers (maybe this is closer to #2) and
>>>> not migrating the whole code base, then I don't see why it would be a
>>>> big change (or even a change at all - it's basically new
>>>> functionality). I'm not sure if TaskFlow has something like this
>>>> already. If not we could also look at what Mistral is doing with
>>>> timed tasks and see if we could spin some of it out into an Oslo
>>>> library.
>>>>
>>>
>>> I feel like it boils down to something running periodically, checking
>>> for scheduled tasks that are due to run but have not run yet. I wonder
>>> if we can actually look at Ironic for how they do this, because Ironic
>>> polls the power state of machines constantly, and uses a hash ring to
>>> make sure only one conductor is polling any one machine at a time. If
>>> we broke stacks up into a hash ring like that for the purpose of
>>> singleton tasks like timeout checking, that might work out nicely.
>>
>> +1 for something like this, and +2 if we can get it from a library we
>> don't have to write ourselves (whether it be TaskFlow or something spun
>> out of Mistral or Ironic into Oslo).
>>
>
> Right, those things are fairly generic and would definitely fit nicely
> in a library.
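
For the curious, the ring itself is tiny. Here's a toy consistent-hash
ring in the spirit of what Ironic does (the engine names, partition
count, and the all_stack_ids/my_engine_id variables are all invented;
this is not Ironic's actual implementation):

    import bisect
    import hashlib
    import struct

    class HashRing(object):

        def __init__(self, nodes, partitions=32):
            # Several positions per node smooths out the distribution.
            self._ring = sorted(
                (self._hash('%s-%d' % (node, p)), node)
                for node in nodes for p in range(partitions))
            self._keys = [k for k, _ in self._ring]

        @staticmethod
        def _hash(key):
            # First 8 bytes of md5, as an unsigned integer.
            return struct.unpack(
                '>Q', hashlib.md5(key.encode()).digest()[:8])[0]

        def owner(self, stack_id):
            # Walk clockwise to the first position at/after the hash.
            i = bisect.bisect(self._keys, self._hash(stack_id))
            return self._ring[i % len(self._ring)][1]

    ring = HashRing(['engine-1', 'engine-2', 'engine-3'])
    # Each engine checks timeouts only for the stacks it owns:
    mine = [s for s in all_stack_ids if ring.owner(s) == my_engine_id]

Every engine computes the same ring from the list of live engines, so no
coordination is needed beyond agreeing on who is alive.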
> So, the simplest possible solution, I think, is to lock resource id +
> graph version. Since we are scared of Zookeeper, we'll need a periodic
> job in the engines that looks for stale locks, or we have to wait for
> another stack operation to check for them.

Maybe it's time we faced our fears: has anyone actually tried ZooKeeper?
Honestly, I'm starting to wonder, because it has some really neat
features if people would just try it out...
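
And while I'm on my soapbox: the lock Clint describes is a few lines with
kazoo's lock recipe, and the stale-lock problem disappears because the
underlying znode dies with the engine's session. A sketch (resource_id,
graph_version, and converge_resource are invented placeholders):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # One lock per resource id + graph version. If the holder crashes,
    # the lock is released automatically; no periodic stale-lock sweep.
    lock = zk.Lock('/heat/locks/%s/%s' % (resource_id, graph_version),
                   'engine-1')
    with lock:  # blocks until acquired
        converge_resource(resource_id)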