Re: [openstack-dev] [ironic][edge] Notes from the PTG

Csatari, Gergely (Nokia - HU/Budapest) Fri, 28 Sep 2018 02:55:36 -0700

Hi Jim,

Thanks for sharing your notes.


One note about jumping to the autonomous control plane requirement.
This requirement was already identified during the Dublin PTG workshop [1].
It is needed for two reasons: the edge cloud instance should stay operational 
even if there is a network break towards other edge cloud instances, and the 
edge cloud instance should work together with other edge cloud instances 
running other versions of the control plane. In Denver we decided to leave 
these requirements out of the MVP architecture discussions.

Br,
Gerg0

[1]: https://wiki.openstack.org/w/index.php?title=OpenStack_Edge_Discussions_Dublin_PTG



From: Jim Rollenhagen <j...@jimrollenhagen.com>
Reply-To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Date: Wednesday, September 19, 2018 at 10:49 AM
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [ironic][edge] Notes from the PTG

I wrote up some notes from my perspective at the PTG for some internal teams 
and figured I may as well share them here. They're primarily from the ironic 
and edge WG rooms. Fairly raw, very long, but hopefully useful to someone. 
Enjoy.

Tuesday: edge

Edge WG (IMHO) has historically just talked about use cases, hand-waved a bit, 
and jumped to requiring an autonomous control plane per edge site - thus 
spending all of their time talking about how they will make glance and keystone 
sync data between control planes.

penick described roughly what we do with keystone/athenz and how that can be 
used in a federated keystone deployment to provide autonomy for any control 
plane, but also a single view via a global keystone.

penick and I both kept pushing for people to define a real architecture, and we 
ended up with 10-15 people huddled around an easel for most of the afternoon. 
Of note:

- Wind River (and others?) refuse to budge on the many control plane thing
    - This means that they will need some orchestration tooling up top in the 
main DC / client machines to even come close to reasonably managing all of 
these sites
    - They will probably need some syncing tooling
    - glance->glance isn’t a thing, no matter how many people say it is.
    - Glance PTL recommends syncing metadata outside of glance process, and a 
global(ly distributed?) glance backend.
- We also defined the single pane of glass architecture that Oath plans to 
deploy
    - Okay with losing connectivity from central control plane to single edge 
site
    - Each edge site is a cell
    - Each far edge site is just compute nodes
    - Still may want to consider image distribution to edge sites so we don’t 
have to go back to main DC?
    - Keystone can be distributed the same as first architecture
    - Nova folks may start investigating putting API hosts at the cell level to 
get the best of both worlds - if there’s a network partition, can still talk to 
cell API to manage things
    - Need to think about removing the need for rabbitmq between edge and far 
edge
        - Kafka was suggested in the edge room for oslo.messaging in general
        - Etcd watchers may be another option for an o.msg driver
        - Other options are more invasive into nova - involve changing 
how nova-compute talks to conductor (etcd, etc) or even putting REST APIs in 
nova-compute (and nova-conductor?)
        - Neutron is going to work on an OVS “superagent” - superagent does the 
RPC handling, talks some other way to child agents. Intended to scale to 
thousands of children. Primary use case is smart nics but seems like a win for 
the edge case as well.

penick took an action item to draw up the architecture diagrams in a digestible 
format.

Wednesday: ironic things

Started with a retrospective. See 
https://etherpad.openstack.org/p/ironic-stein-ptg-retrospective for the notes - 
there weren’t many surprising things here. We did discuss trying to target some 
quick wins for the beginning of the cycle, so that we didn’t have all of our 
features trying to land at the end. Using wsgi with the ironic-api was 
mentioned as a potential regression, but we agreed it’s a config/documentation 
issue. I took an action to make a task to document this better.

Next we quickly reviewed our vision doc, and people didn’t have much to say 
about it.

Metalsmith: it’s a thing, it’s being included into the ironic project. Dmitry 
is open to optionally supporting placement. Multiple instances will be a 
feature in the future. Otherwise mostly feature complete, goal is to keep it 
simple.

Networking-ansible: Red Hat is building tooling that integrates with upstream 
ansible modules for networking gear. Kind of an alternative to n-g-s. Not 
really much on plans here, RH just wanted to introduce it to the community. 
Some discussion about it possibly replacing n-g-s later, but no hard plans.

Deploy steps/templates: we talked about what the next steps are, and what an 
MVP looks like. Deploy templates are triggered by the traits that nodes are 
scheduled against, and can add steps before or after (or in between?) the 
default deploy steps. We agreed that we should add a RAID deploy step, with 
standing questions for how arguments are passed to that deploy step, and what 
the defaults look like. mgoddard and I took an action item to open an RFE 
for this. We also agreed that we should start thinking about how the current 
(only) deploy step should be split into multiple steps.
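
To make the ordering idea concrete, here is a minimal sketch of how steps from 
trait-triggered templates could be merged around the default deploy step. The 
data layout, the step names (apply_raid, verify), the trait names, and the 
"higher priority runs first" rule are all assumptions made for illustration; 
none of this is an agreed format.

```python
# Illustrative sketch only: the data layout and names below are assumptions,
# not ironic's actual deploy template format.

# The single deploy step that exists today, with an assumed priority.
DEFAULT_STEPS = [{"interface": "deploy", "step": "deploy", "priority": 100}]

# Hypothetical templates, keyed by the trait that triggers them.
TEMPLATES = {
    "CUSTOM_RAID1": [
        {"interface": "raid", "step": "apply_raid", "priority": 150,
         "args": {"raid_level": "1"}},
    ],
    "CUSTOM_POST_CHECK": [
        {"interface": "deploy", "step": "verify", "priority": 50, "args": {}},
    ],
}


def build_deploy_steps(scheduled_traits):
    """Merge template steps for the given traits with the default steps.

    Assumes higher priority runs first, so a step with a priority above the
    default's runs before it and a step with a lower priority runs after it.
    """
    steps = list(DEFAULT_STEPS)
    for trait in sorted(scheduled_traits):
        # Traits without a matching template contribute no steps.
        steps.extend(TEMPLATES.get(trait, []))
    return sorted(steps, key=lambda s: s["priority"], reverse=True)
```

With both hypothetical traits requested, the RAID step lands before the 
default deploy step and the verification step lands after it. How arguments 
and defaults should actually be passed is exactly the open question above.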

Graphical console: we discussed what the next steps are for this work. We 
agreed that we should document the interface and what is returned (a URL), and 
also start working on a redfish driver for graphical consoles. We also noted 
that we can test in the gate with qemu, but we only need to test that a correct 
URL is returned, not that the console actually works (because we don’t really 
care that qemu’s console works).

Python 3: we talked about the changes to our jobs that are needed. We agreed to 
use the base name of the jobs for Python 3 (as those will be used for a long 
time), and add a “python2” prefix for the Python 2 jobs. We also discussed 
dropping certain coverage for Python 2, as our CI jobs tend to mostly test the 
same codepaths with some config differences. Last, we talked about mixed 
environment Python 2 and 3 testing, as this will be a thing people doing 
rolling upgrades of Python versions will hit. I sent an email to the ML asking 
if others had done or thought about this, and it sounds like we can limit that 
testing to oslo.messaging, and a task was reported there.

Pre-upgrade checks: Not much was discussed here; TheJulia is going to look into 
it. One item of note is that there is an oslo project being proposed that can 
carry some of the common code for this.

Performance improvements: We first discussed our virt driver’s performance. It 
was found that Nova’s power sync loop makes a call to Ironic for each instance 
that the compute service is managing. We do some node caching in our driver 
that would be useful for this. I took an action item to look into it, and have 
a WIP patch: https://review.openstack.org/#/c/602127/ . That patch just needs a 
bug filed and unit tests written. On Thursday, we talked with Nova about other 
performance things, and agreed we should implement a hook in Nova that Ironic 
can call to say “power changed” and “deploy done” and other things like this. 
This will help reduce or eliminate polling from our virt driver to Ironic, and 
also allow Nova to notice these changes faster. More on that later?
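
The caching idea can be sketched roughly as follows. This is not the WIP patch 
and not the virt driver's real code: the class name, the list_nodes callable, 
the TTL, and the dict keys are all hypothetical. The point is only that one 
bulk listing can answer many per-instance power state lookups.

```python
import time


class CachedNodeLookup:
    """Answer per-instance power state queries from one bulk node listing.

    Hypothetical sketch: list_nodes is any callable returning an iterable of
    dicts with "instance_uuid" and "power_state" keys.
    """

    def __init__(self, list_nodes, ttl=60.0, clock=time.monotonic):
        self._list_nodes = list_nodes
        self._ttl = ttl
        self._clock = clock
        self._by_instance = {}
        self._expires = None  # None means "never fetched"

    def _refresh(self):
        # One API call for all nodes instead of one call per instance.
        nodes = self._list_nodes()
        self._by_instance = {
            n["instance_uuid"]: n for n in nodes if n.get("instance_uuid")
        }
        self._expires = self._clock() + self._ttl

    def power_state(self, instance_uuid):
        # Refetch only when the cache is empty or older than the TTL.
        if self._expires is None or self._clock() >= self._expires:
            self._refresh()
        node = self._by_instance.get(instance_uuid)
        return node["power_state"] if node else None
```

A sync loop over N instances then costs one listing per TTL window rather than 
N calls per pass, at the price of power states that may be up to ttl seconds 
stale.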

Splitting the conductor: we discussed the many tasks the conductor is 
responsible for, and pondered if we could or should split things up. This has 
implications (good and bad) for operability, scalability, and security. 
Splitting the conductor to multiple workers would allow operators to use 
different security models for different tasks (e.g. only allowing an “OOB 
worker” access to the OOB network). It would also allow folks to scale out 
workers that do lots of work (like the power status loop) separately from those 
that do minimal work (writing PXE configs). I intend to investigate this more 
during this cycle and lay out a plan for doing the work. This also may require 
better distributed locking, which TheJulia has started investigating.

Changing boot mode defaults: Apparently Intel is going to stop shipping 
hardware that is capable of legacy BIOS booting in 2020. We agreed that we 
should work toward changing the default boot mode to UEFI to better prepare our 
users, but we can’t drop legacy BIOS mode until all of the old hardware in the 
world is gone. TheJulia is going to dig through the code and make a task list.

UEFI HTTPClient booting: This is a DHCP class that allows the DHCP server to 
return a URL instead of a “next-server” (TFTP location) response. This is a 
clear value add, and TheJulia is going to work on it as she is already neck 
deep in that area of code. We also need to ensure that Neutron supports this. 
It should, as it’s just more DHCP options, but we need to verify.
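
A rough sketch of the server-side decision, assuming the client announces 
itself with a vendor class identifier that starts with "HTTPClient" and 
expects the boot URL in the bootfile-name option. The function and the option 
labels are illustrative only, and are not the configuration syntax of any 
particular DHCP server or of Neutron.

```python
def boot_options(vendor_class, tftp_server, tftp_bootfile, http_url):
    """Pick DHCP boot options based on the client's vendor class.

    Hypothetical helper: the dict keys are descriptive labels for this
    sketch, not a real DHCP server's option names.
    """
    if vendor_class.startswith("HTTPClient"):
        # HTTP boot client: answer with a URL instead of a TFTP location.
        return {
            "vendor-class-identifier": "HTTPClient",
            "bootfile-name": http_url,
        }
    # Anything else gets the classic TFTP "next-server" style response.
    return {"next-server": tftp_server, "bootfile-name": tftp_bootfile}
```

Whether Neutron can carry these options per port is the part that still needs 
to be verified.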

SecureBoot: I presented Oath’s secureboot model, which doesn’t depend on a 
centralized attestation server. It made sense to people, and we discussed 
putting the driver in tree. The process does rely on some enhancements to iPXE, 
so Oath is going to investigate upstreaming those changes and publishing more 
documentation, and then an in-tree driver should be no problem. We also 
discussed Ironic’s current SecureBoot (TrustedBoot?) implementations. Currently 
it only works with PXE, not iPXE or Grub2. TheJulia is going to look into 
adding this support. We should be able to do CI jobs for it, as TPM 1.2 and 2.0 
emulation both seem to be supported in QEMU as of 2.11.

NIC PXE configuration as a clean step: the DRAC driver team has a desire to 
configure NICs for PXE or not, and sync with the ironic database’s pxe_enabled 
field. This has gone back and forth in IRC. We were able to resolve some of the 
issues with it, and rpioso is going to write a small spec to make sure we get 
the details right.

Thursday: more ironic things

Neutron cross-project discussion: we discussed SmartNICs, which the Neutron 
team had also discussed the previous day. In short, SmartNICs are NICs that run 
OVS. The Neutron team discussed the scalability of their OVS agent running 
across thousands of machines, and are planning to make some sort of 
“superagent”. This superagent essentially owns a group of OVS agents. It will 
talk to Neutron over rabbit as usual, but then use some other protocol to talk 
to the OVS agents it is managing. This should help with rabbit load even in 
“standard” OpenStack environments, and is especially useful (to me) for 
minimizing rabbitmq connections from far edge sites. The catch with SmartNICs 
and Ironic is that the NICs must have power to be configured (and thus the 
machine must be on). This breaks our general model of “only configure 
networking with the machine off, to make sure we don’t cross streams between 
tenants and control plane”. We came to a decent compromise (I think), and 
agreed to continue in the ironic spec, and revisit the topic in Berlin.

Federation: we discussed federation and people seemed interested, however I 
don’t believe we made any real progress toward getting it done. There’s still a 
debate whether this should be something in Ironic itself, or if there should 
just be some sort of proxy layer in front of multiple Ironic environments. To 
be continued in the spec.

Agent polling: we discussed the spec to drop communication from IPA to the 
conductor. It seems like nobody has major issues with it, and the spec just 
needs some polishing before landing.

L3 deployments: We brought this up, and again there seems to be little 
contention. I ended up approving the spec shortly after.

Neutron event processing: This work has been hanging for years and not getting 
done. Some folks wondered if we should just poll Neutron, if that gets the work 
done more quickly. Others wondered if we should even care about it at all (we 
should). TheJulia is going to follow up with dtantsur and vdrok to see if we 
can get someone to mainline some caffeine and just get it done.

CMDB: Oath and CERN presented their work toward speccing out a CMDB application 
that can integrate with Ironic. We discussed the problems that they are trying 
to solve and agreed they need solving. We also agreed that strict schema is 
better than blobjects (© jaypipes). We agreed it probably doesn’t need to be in 
Ironic governance, but could be one day. The next steps are to start hacking in 
a new repo in the OpenStack infrastructure, and propose specs for any Ironic 
integration that is needed. Red Hat and Dell contributors also showed interest 
in the project and volunteered to help. Some folks are going to try and talk to 
the wider OpenStack community to find out if there’s interest or needs from 
projects like Nova/Neutron/Cinder, etc.

Stein goals: We put together a list of goals and voted on them. Julia has since 
proposed the patch to document them: https://review.openstack.org/#/c/603161/

Last thing Thursday: Cross-project discussions with Nova. Summarized here, but 
lots of detail in the etherpad under the Ironic section: 
https://etherpad.openstack.org/p/nova-ptg-stein

Power sync: We discussed some problems CERN has with the instance power sync 
(Rackspace also saw these problems). In short, nova asserts power state if the 
instance “should” be off but the power is turned on out-of-band. Operators 
definitely need to be aware of this when doing maintenance on active machines, 
but we also discussed Ironic calling back to Nova when Ironic knows that the 
power state has been updated (via Ironic API, etc). I volunteered to look at 
this, and dansmith volunteered to help out.

API heaviness: We discussed how many API calls our virt driver does. As 
mentioned earlier, I proposed a patch to make the power sync loop more 
lightweight. There’s also lots of polling for tasks like deploy and rescue, 
which we can dramatically reduce with a callback from Ironic to Nova. I also 
volunteered to investigate this, and dansmith again agreed to help.
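
The difference between polling and a callback can be sketched in-process like 
this. Everything here is hypothetical (the DeployWaiter class, the notify and 
wait names); it shows the shape of the idea, not an agreed Nova or Ironic 
interface.

```python
import threading


class DeployWaiter:
    """Block until a "deploy done" notification arrives, instead of polling.

    Hypothetical sketch: notify() stands in for whatever handler would
    receive the callback from Ironic.
    """

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def _event_for(self, node_id):
        # Create the per-node event on first use, from either side.
        with self._lock:
            return self._events.setdefault(node_id, threading.Event())

    def notify(self, node_id):
        """Record that the node finished; wakes any thread waiting on it."""
        self._event_for(node_id).set()

    def wait(self, node_id, timeout):
        """Return True if notified within timeout seconds, else False."""
        return self._event_for(node_id).wait(timeout)
```

Because the event is created on first use by either side, a notification that 
arrives before anyone waits is not lost. A real design would presumably still 
keep a slow fallback poll in case a callback is dropped.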

Compute host grouping: Ironic now has a mechanism for grouping conductors to 
nodes, and we want to mirror that in Nova. We discussed how to take the group 
as a config option and be able to find the other compute services managing that 
group, so we can build the hash ring correctly. We concluded that it’s a really 
hard problem (TM), and agreed to also add a config option like “peer_list” that 
can be used to list other compute services in the same group. This can be read 
dynamically each time we build the hash ring, or can be a mutable config with 
updates triggered by a SIGHUP. We’ll hash out the details in a blueprint or 
spec. Again, I agreed to begin the work, and dansmith agreed to help.
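
As a sketch of why a static peer list is enough to build the ring: if every 
compute service in a group is given the same set of peers, each one can build 
an identical consistent hash ring locally and agree on which service owns 
which node. The HashRing class, the ring_for_group helper, the replica count, 
and the choice of SHA-256 are assumptions for illustration, not the hash ring 
code Nova or Ironic actually use.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent hash ring with virtual replicas per host."""

    def __init__(self, hosts, replicas=32):
        # Each host gets several points on the ring to even out ownership.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [position for position, _ in self._ring]

    def get_host(self, node_uuid):
        if not self._ring:
            raise ValueError("hash ring has no hosts")
        # First ring point clockwise from the key, wrapping past the end.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


def ring_for_group(my_host, peer_list):
    """Build the ring from a hypothetical peer_list option plus ourselves."""
    return HashRing(sorted(set(peer_list) | {my_host}))
```

Two services configured with the same peers compute the same owner for every 
node without talking to each other. The hard part left for the blueprint is 
keeping that list correct as services come and go.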

Capabilities filter: This was the last topic. It’s been on the chopping block 
for ages, but we are just now reaching the point where it can be properly 
deprecated. We discussed the plan, and mostly agreed it was good enough. 
johnthetubaguy is going to send the plan wider and make sure it will work for 
folks. We also discussed modeling countable resources on Ironic resource 
providers, which will work as long as there is still some resource class with 
an inventory of one, like we have today. Some folks may investigate doing this, 
but it’s fuzzy how much people care or if we really need/want to do it.

Friday: kind of bummed around the Ironic and TC rooms. Lots of interesting 
discussions, but nothing I feel like writing about here (as Ironic 
conversations were things like code deep-dives not worth communicating widely, 
and the TC topics have been written about to death).

// jim

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
