Hi Zane & Michael,

Please have a look at 
https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

Updated with a combined approach that does not require persisting the graph or 
removing the backup stack. This approach reduces DB queries by waiting for a 
completion notification on a topic. The drawback I see is that the delete-stack 
stream will be huge, as it will contain the entire graph. We can always dump such 
data into ResourceLock.data (JSON) and pass a simple flag, "load_stream_from_db", 
to the converge RPC call as a workaround for the delete operation.
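
Roughly what we have in mind for that workaround (all names and helpers below 
are only illustrative, not final):

    # Illustrative sketch only: persist the stream once for delete and pass a
    # flag over RPC instead of sending the whole graph with every message.
    import json

    def start_delete(stack_id, stream, resource_lock_api, rpc_client):
        # Dump the (potentially huge) stream dict into ResourceLock.data
        # rather than including it in every converge message.
        resource_lock_api.update_data(stack_id, json.dumps(stream))
        for leaf in stream['leaf_nodes']:
            rpc_client.converge(stack_id, resource_id=leaf,
                                load_stream_from_db=True)

    def converge(stack_id, resource_id, load_stream_from_db,
                 resource_lock_api, stream=None):
        # Worker side: when the flag is set, rebuild the stream from the DB.
        if load_stream_from_db:
            stream = json.loads(resource_lock_api.get_data(stack_id))
        # ... continue converging 'resource_id' using 'stream' ...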

To stop the current stack operation, we will use your traversal_id-based approach. 
If you feel the aggregator model creates too many queues, then we might have to 
poll the DB to get the resource status (which will impact performance adversely :) ).


Lock table: name (unique - the resource ID), stack_id, engine_id, data (JSON to 
store the stream dict)
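
Roughly, as a SQLAlchemy model (only a sketch; column types and sizes are 
placeholders):

    # Sketch of the proposed lock table; types and sizes are not final.
    from sqlalchemy import Column, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class ResourceLock(Base):
        __tablename__ = 'resource_lock'

        name = Column(String(255), primary_key=True)  # unique: the resource id
        stack_id = Column(String(36), nullable=False)
        engine_id = Column(String(36))
        data = Column(Text)                           # JSON-encoded stream dict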
                       
Your thoughts?
Vishnu (irc: ckmvishnu)
Unmesh (irc: unmeshg)


-----Original Message-----
From: Zane Bitter [mailto:zbit...@redhat.com] 
Sent: Thursday, December 4, 2014 10:50 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On 01/12/14 02:02, Anant Patil wrote:
> On GitHub: https://github.com/anantpatil/heat-convergence-poc

I'm trying to review this code at the moment, and finding some stuff I don't 
understand:

https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916

This appears to loop through all of the resources *prior* to kicking off any 
actual updates to check if the resource will change. This is impossible to do 
in general, since a resource may obtain a property value from an attribute of 
another resource and there is no way to know whether an update to said other 
resource would cause a change in the attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that looks 
like a simple fix, I'm now worried about the level to which this code has been 
tested.


I'm also trying to wrap my head around how resources are cleaned up in 
dependency order. If I understand correctly, you store in the ResourceGraph 
table the dependencies between various resource names in the current template 
(presumably there could also be some left around from previous templates too?). 
For each resource name there may be a number of rows in the Resource table, 
each with an incrementing version. 
As far as I can tell though, there's nowhere that the dependency graph for 
_previous_ templates is persisted? So if the dependency order changes in the 
template we have no way of knowing the correct order to clean up in any more? 
(There's not even a mechanism to associate a resource version with a particular 
template, which might be one avenue by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I added a 
scenario to my test framework to exercise it and discovered that my 
implementation was also buggy. Here's the fix: 
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40


> It was difficult, for me personally, to completely understand Zane's 
> PoC and how it would lay the foundation for aforementioned design 
> goals. It would be very helpful to have Zane's understanding here. I 
> could understand that there are ideas like async message passing and 
> notifying the parent which we also subscribe to.

So I guess the thing to note is that there are essentially two parts to my PoC:
1) A simulation framework that takes what will, in the final implementation, be 
multiple tasks running in parallel in separate processes and talking to a 
database, and replaces that with an event loop that runs the tasks sequentially 
in a single process with an in-memory data store. 
I could have built a more realistic simulator using Celery or something, but I 
preferred this way as it offers deterministic tests.
2) A toy implementation of Heat on top of this framework.

The files map roughly to Heat something like this:

converge.engine       -> heat.engine.service
converge.stack        -> heat.engine.stack
converge.resource     -> heat.engine.resource
converge.template     -> heat.engine.template
converge.dependencies -> actually is heat.engine.dependencies
converge.sync_point   -> no equivalent
converge.converger    -> no equivalent (this is convergence "worker")
converge.reality      -> represents the actual OpenStack services

For convenience, I just use the @asynchronous decorator to turn an ordinary 
method call into a simulated message.
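
The effect is roughly equivalent to this simplified stand-in (not the actual 
PoC code):

    # Simplified stand-in for the simulator's message passing: @asynchronous
    # queues the call instead of making it directly, and an event loop later
    # replays the queued "messages" one at a time, deterministically.
    import collections
    import functools

    _message_queue = collections.deque()

    def asynchronous(method):
        @functools.wraps(method)
        def enqueue(self, *args, **kwargs):
            _message_queue.append((method, self, args, kwargs))
        return enqueue

    def run_events():
        while _message_queue:
            method, target, args, kwargs = _message_queue.popleft()
            method(target, *args, **kwargs)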

The concept is essentially as follows:
At the start of a stack update (creates and deletes are also just
updates) we create any new resources in the DB and calculate the dependency graph 
for the update from the data in the DB and template. This graph is the same one 
used by updates in Heat currently, so it contains both the forward and reverse 
(cleanup) dependencies. The stack update then kicks off checks of all the leaf 
nodes, passing the pre-calculated dependency graph.
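
In heavily simplified form that first step is something like the following 
(illustrative only, not the PoC code; the db/converger helpers are stand-ins):

    # Heavily simplified sketch of starting a traversal.
    def leaf_nodes(graph):
        """Nodes with no requirements - where a traversal can start."""
        return [name for name, required in graph.items() if not required]

    def start_update(stack_id, traversal_id, graph, db, converger):
        # 'graph' maps each resource name to the set of names it requires,
        # and already contains both the forward and the cleanup edges.
        for name in graph:
            db.ensure_resource_row(stack_id, name)   # create any new resources
        for leaf in leaf_nodes(graph):
            # Each trigger carries the pre-calculated graph with it.
            converger.check_resource(stack_id, leaf, graph, traversal_id)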

Each resource check may result in a call to the create(), update() or
delete() methods of a Resource plugin. The resource also reads any attributes 
that will be required from it. Once this is complete, it triggers any dependent 
resources that are ready, or updates a SyncPoint in the database if there are 
dependent resources that have multiple requirements. The message triggering the 
next resource will contain the dependency graph again, as well as the RefIds 
and required attributes of any resources it depends on.
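
The logic for triggering dependents is roughly of this shape (again a 
simplification with stand-in helpers, not the PoC code):

    # Simplified sketch of notifying dependents once a resource check is done.
    # 'graph' maps each resource name to the set of names it requires;
    # sync_point_update() is assumed to merge our contribution into the
    # SyncPoint row and return everything recorded so far.
    def propagate(stack_id, name, graph, traversal_id, ref_id, attrs,
                  db, converger):
        contribution = {name: (ref_id, attrs)}
        for dep in (n for n, required in graph.items() if name in required):
            if len(graph[dep]) == 1:
                # Single requirement: trigger the dependent check directly.
                converger.check_resource(stack_id, dep, graph, traversal_id,
                                         contribution)
            else:
                # Multiple requirements: record our completion in a SyncPoint
                # and only trigger the dependent when everything it needs
                # has reported in.
                satisfied = db.sync_point_update(stack_id, dep, traversal_id,
                                                 contribution)
                if set(satisfied) >= graph[dep]:
                    converger.check_resource(stack_id, dep, graph,
                                             traversal_id, satisfied)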

The new dependencies thus created are added to the resource itself in the 
database at the time it is checked, allowing it to record the changes caused by 
a requirement being unexpectedly replaced without needing a global lock on 
anything.

When cleaning up resources, we also endeavour to remove any that are 
successfully deleted from the dependencies graph.
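
(Conceptually something like this plain-dict approximation; the real thing 
works on the stored dependencies structure:)

    # Prune a successfully deleted resource out of the dependency graph.
    def prune_deleted(graph, deleted_name):
        graph.pop(deleted_name, None)
        for required in graph.values():
            required.discard(deleted_name)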

Each traversal has a unique ID that is both stored in the stack and passed down 
through the resource check triggers. (At present this is the template ID, but 
it may make more sense to have a unique ID since old template IDs can be 
resurrected in the case of a rollback.) As soon as these fail to match, the 
resource checks stop propagating, so only an update of a single field is 
required (rather than locking an entire
table) before beginning a new stack update.
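
In sketch form (simplified, not the PoC code):

    # Simplified sketch of how a superseded traversal simply dies out.
    def check_resource(stack_id, name, graph, traversal_id, db, converger):
        stack = db.get_stack(stack_id)
        if stack.current_traversal != traversal_id:
            # A newer update has started; stop propagating. No table lock is
            # needed - the new traversal re-triggers from its own leaf nodes.
            return
        # ... perform the actual resource check, then trigger dependents ...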

Hopefully that helps a little. Please let me know if you have specific 
questions. I'm *very* happy to incorporate other ideas into it, since it's 
pretty quick to change, has tests to check for regressions, and is intended to 
be thrown away anyhow (so I genuinely don't care if some bits get thrown away 
earlier than others).


> In retrospective, we had to struggle a lot to understand the existing 
> Heat engine. We couldn't have done justice by just creating another 
> project in GitHub and without any concrete understanding of existing 
> state-of-affairs.

I completely agree, and you guys did the right thing by starting out looking at 
Heat. But remember, the valuable thing isn't the code, it's what you learned. 
My concern is that now that you have Heat pretty well figured out, you won't be 
able to continue to learn nearly as fast trying to wrestle with the Heat 
codebase as you could with the simulator. We don't want to fall into the trap 
of just shipping whatever we have because it's too hard to explore the other 
options; we want to identify a promising design and iterate it as quickly as 
possible.

cheers,
Zane.

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
