Hi Spyros,

Thanks for starting this thread. My initial understanding was that the
planned session would be more around heat performance/scalability issues
with magnum.

As most of the additional items you mentioned are around heat best
practices, I think the specs/reviews would be a great place to start the
discussion, and we can also squeeze them into the same session.

Some comments inline.

On Mon, Oct 10, 2016 at 9:24 PM, Spyros Trigazis <strig...@gmail.com> wrote:

> Hi Sergey,
>
> I have seen the session, I wanted to add more details to
> start the discussion earlier and to be better prepared.
>
> Thanks,
> Spyros
>
>
> On 10 October 2016 at 17:36, Sergey Kraynev <skray...@mirantis.com> wrote:
>
>> Hi Spyros,
>>
>> AFAIK we already have special session slot related with your topic.
>> So thank you for the providing all items here.
>> Rabi, can we add link on this mail to etherpad ? (it will save our time
>> during session :) )
>>
>> On 10 October 2016 at 18:11, Spyros Trigazis <strig...@gmail.com> wrote:
>>
>>> Hi heat and magnum.
>>>
>>> Apart from the scalability issues that have been observed, I'd like to
>>> add few more subjects to discuss during the summit.
>>>
>>> 1. One nested stack per node and linear scale of cluster creation
>>> time.
>>>
>>> 1.1
>>> For large stacks, the creation of all nested stacks scales linearly. We
>>> haven't run any tests using the convergence engine.
>>>
>>
From what I understand, magnum uses ResourceGroups and template resources
(e.g. Cluster->RGs->master/nodes) to build the cluster.

As the nested stack operations happen over rpc, they should be distributed
across all available engines, so the finding that the build time increases
linearly is not good. It would probably be worth providing more details of
the heat configuration (e.g. number of engine workers) on your test setup.
It would also be useful to run some tests with convergence enabled, as that
is the default from newton onwards.
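
For reference, convergence is toggled with the convergence_engine option in
heat.conf and the worker count with num_engine_workers. A quick way to see
how many heat-engine workers are actually available to share the nested
stack work is the service list API; a minimal sketch with python-heatclient
(the keystone auth values below are placeholders, and the call needs admin
credentials):

    # Minimal sketch: list the heat-engine workers that nested stack
    # operations get distributed across.  Auth values are placeholders.
    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from heatclient import client as heat_client

    auth = v3.Password(auth_url='http://controller:5000/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_name='Default',
                       project_domain_name='Default')
    heat = heat_client.Client('1', session=session.Session(auth=auth))

    for svc in heat.services.list():
        print('%s %s %s %s' % (svc.hostname, svc.binary,
                               svc.engine_id, svc.status))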

Magnum seems to use a collection of software configs (scripts) as a
multipart mime archive in the server user_data. So the build time for
every node would also depend on the time these scripts take at boot.

>>> 1.2
>>> For large stacks, 1000 nodes, the final call to heat to fetch the
>>> IPs for all nodes takes 3 to 4 minutes. In heat, the stack has status
>>> CREATE_COMPLETE but magnum's state is updated when this long final
>>> call is done. Can we do better? Maybe fetch only the master IPs or
>>> get the IPs in chunks.
>>>
>>

We seem to load the nested stacks in memory to retrieve their outputs. That
would probably explain the behaviour above, where all the nested stacks for
the nodes are loaded just to fetch their IPs. There is some work [1][2]
happening at the moment to change that.

[1] https://review.openstack.org/#/c/383839/
[2] https://review.openstack.org/#/c/384718
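
Until that lands, it might help to ask only for the outputs magnum actually
needs, one at a time, via the outputs API rather than fetching the fully
resolved stack. A rough sketch with python-heatclient ('heat' is an
authenticated client, and the output names below are just placeholders for
whatever the magnum templates expose):

    # Sketch: resolve only selected outputs (e.g. the master IPs) instead
    # of everything at once.  Output names below are placeholders.
    wanted = ['kube_masters', 'api_address']

    available = [o['output_key']
                 for o in heat.stacks.output_list(stack_id)['outputs']]
    for key in wanted:
        if key in available:
            out = heat.stacks.output_show(stack_id, key)['output']
            print('%s: %s' % (key, out['output_value']))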


>>> 1.3
>>> After the stack create API call to heat, magnum's conductor
>>> busy-waits heat with a thread/cluster. (In case of a magnum conductor
>>> restart, we lose that thread and we can't update the status in
>>> magnum). Investigate better ways to sync the status between magnum
>>> and heat.
>>>
>>

Rather than waiting/polling, you could probably implement an observer that
consumes events from the heat event-sink and updates magnum accordingly?
Maybe there are better options too.
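
As a very rough sketch of that direction (python-heatclient only;
update_cluster_status() is a stand-in for whatever magnum would do with the
status, and the marker/poll handling is simplified; heat's zaqar event
sinks would let the events be pushed instead of pulled):

    # Sketch only: one observer following heat events for a cluster stack,
    # instead of a per-cluster thread busy-waiting on the stack status.
    # 'heat' is an authenticated heatclient v1 client;
    # update_cluster_status() is a hypothetical magnum-side callback.
    import time

    FINAL = ('CREATE_COMPLETE', 'CREATE_FAILED',
             'UPDATE_COMPLETE', 'UPDATE_FAILED', 'DELETE_COMPLETE')

    def watch_stack(heat, stack_id, update_cluster_status, poll=10):
        marker = None
        while True:
            # Fetch events we have not seen yet.
            for ev in heat.events.list(stack_id, marker=marker, limit=100):
                marker = ev.id
                update_cluster_status(ev.resource_name, ev.resource_status,
                                      ev.resource_status_reason)
            stack = heat.stacks.get(stack_id)
            if stack.stack_status in FINAL:
                return stack.stack_status
            time.sleep(poll)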


>>> 2. Next generation magnum clusters
>>>
>>> A need that comes up frequently in magnum is heterogeneous clusters.
>>> * We want to be able to create clusters on different hardware (e.g. spawn
>>>   vms on nodes with SSDs and nodes without SSDs, or other special
>>>   hardware available only on some nodes of the cluster, such as FPGAs
>>>   or GPUs)
>>> * Spawn clusters across different AZs
>>>
>>> I'll briefly describe our plan here; for further information we have a
>>> detailed spec under review. [1]
>>>
>>> To address this issue we introduce the node-group concept in magnum.
>>> Each node-group will correspond to a different heat stack. The master
>>> nodes can be organized in one or more stacks, as can the worker nodes.
>>>
>>> We are investigating how to implement this feature. We are considering
>>> the following:
>>> At the moment, we have three template files (cluster, master and
>>> node), and all three together create one stack. The new
>>> generation of clusters will have a cluster stack containing
>>> the resources in the cluster template, specifically networks, lbaas,
>>> floating-ips etc. Then, the outputs of this stack would be passed as
>>> input to create the master node stack(s) and the worker node
>>> stack(s).
>>>
>>
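
The proposed split sounds reasonable from the heat side. A rough sketch of
what the orchestration flow could look like with python-heatclient (all
template, parameter and output names below are made up for illustration;
the real ones would be defined by the spec under review):

    # Sketch: create the cluster stack first, then feed its outputs
    # (network, lbaas, floating-ip ids, ...) as parameters into one stack
    # per node-group.  Names are illustrative only.
    def create_cluster(heat, cluster_tmpl, nodegroup_tmpl, nodegroups):
        created = heat.stacks.create(stack_name='k8s-cluster',
                                     template=cluster_tmpl)
        cluster_id = created['stack']['id']
        # ... wait for the cluster stack to reach CREATE_COMPLETE ...

        outputs = {o['output_key']: o['output_value']
                   for o in heat.stacks.get(cluster_id).outputs}

        for ng in nodegroups:       # e.g. masters, ssd-workers, gpu-workers
            heat.stacks.create(
                stack_name='k8s-cluster-%s' % ng['name'],
                template=nodegroup_tmpl,
                parameters={'fixed_network': outputs['fixed_network'],
                            'api_address': outputs['api_address'],
                            'flavor': ng['flavor'],
                            'node_count': ng['count']})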


>>> 3. Use of heat-agent
>>>
>>> A missing feature in magnum is lifecycle operations. For restarting
>>> services and for COE upgrades (upgrading docker, kubernetes and
>>> mesos) we are considering using the heat-agent. Another option is to
>>> create a magnum agent or a daemon like trove does.
>>>
>>> 3.1
>>> For restarts, a few systemctl restart or service restart commands will
>>> be issued. [2]
>>>
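
If you go the heat-agent/software deployment route, the per-node piece for
3.1 could stay as small as something like this (a sketch only; the actual
unit names per COE would come from the restart review linked below):

    # Sketch: restart COE services on a node.  Unit names are illustrative.
    import subprocess

    def restart_services(units=('docker.service', 'kubelet.service')):
        for unit in units:
            subprocess.check_call(['systemctl', 'restart', unit])
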
>>> 3.2
>>> For upgrades, there are the following scenarios:
>>> 1. Upgrade a service which runs in a container. In this case, a small
>>>    script that runs in each node is sufficient. No vm reboot required.
>>> 2. For an ubuntu based image or similar that requires a package upgrade
>>>    a similar small script is sufficient too. No vm reboot required.
>>> 3. For our fedora atomic images, we need to perform a rebase of the
>>>    rpm-ostree file system, which requires a reboot.
>>> 4. Finally, a thought under investigation is replacing the nodes one
>>>    by one using a different image. e.g. Upgrade from fedora 24 to 25
>>>    with new versions of packages all in a new qcow2 image. How could
>>>    we update the stack for this?
>>>
>>> Options 1. and 2. can be done by upgrading all worker nodes at once or
>>> one by one. Options 3. and 4. should be done one by one.
>>>
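
For 4., a stack update that only changes the image parameter may get you
most of the way: heat will rebuild or replace the servers (depending on the
server's image_update_policy), and a rolling_update policy on the
ResourceGroup can control how many nodes are touched at a time. A rough
sketch with python-heatclient (the parameter name is made up, and
existing=True requests a PATCH-style update that keeps the rest of the
stack as-is):

    # Sketch: PATCH-update the node stack with just the new image;
    # 'server_image' is an illustrative parameter name.
    heat.stacks.update(stack_id,
                       existing=True,
                       parameters={'server_image': 'fedora-atomic-25'})
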
>>> I'm drafting a spec about upgrades, should be ready by Wednesday.
>>>
>>> Cheers,
>>> Spyros
>>>
>>> [1] https://review.openstack.org/#/c/352734/
>>> [2] https://review.openstack.org/#/c/368981/
>>>
>>
>>
>> --
>> Regards,
>> Sergey.
>>


-- 
Regards,
Rabi Misra