Rossella Sblendido wrote:
Hello Artur,
thanks for starting this thread. See inline please.
On 10/15/2015 05:23 PM, Ihar Hrachyshka wrote:
Hi Artur,
thanks a lot for caring about upgrades!
There are a lot of good points below. As you noted, surprisingly, we
seem to have rolling upgrades working for the RPC layer. Before we go
into complicating the database workflow by doing the
oslo.versionedobjects transition heavy-lifting, I would like us to
spend cycles on making sure rolling upgrades work not just by
accident, but are also covered by appropriate gating (I mean grenade).
+1, agreed that the first step is to have test coverage; then we can go
on improving the process :)
I also feel that upgrades are in lots of ways not only a technical
issue, but a cultural one too. You should have reviewers being aware
of all the moving parts, and how a seemingly innocent change can
break the flow. That’s why I plan to start on a devref page
specifically about upgrades, where we could lay ground about which
scenarios we should support, and those we should not (f.e. we have
plenty of compatibility code in agents to handle the old-controller
scenario, which should not be supported); how all pieces interact and
behave in transition, and what to look for during reviews. Hopefully,
once such a page is up and read by folks, we will be able to have
more meaningful conversation about our upgrade strategy.
On 14 Oct 2015, at 20:10, Korzeniewski, Artur
<artur.korzeniew...@intel.com> wrote:
Hi all,
I would like to gather all upgrade activities in Neutron in one
place, in order to summarize the current status and future
activities on rolling upgrades in Mitaka.
If you think it’s worth it, we can start up a new etherpad page to
gather upgrade ideas and things to do.
1. RPC versioning
a. It is already implemented in Neutron.
b. TODO: To have rolling upgrades we have to implement
RPC version pinning in conf.
i. I'm not a big fan of this solution, but we can work out a
better idea if needed.
As Dan pointed out, and as I think Miguel was thinking about, we can
have the pin defined by the agents in the cluster. Actually, we can
have a per-agent pin.
I am not a big fan either, mostly because the pinning is a manual task.
Anyway, looking at the patch Dan linked,
https://review.openstack.org/#/c/233289/ ... if we remove the manual
step I can become a fan of this approach :)
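For illustration, a conf-based pin might boil down to something like this at send time (the option name and the min() rule here are assumptions for the sketch, not what the linked patch actually implements):

```python
# Sketch: a hypothetical RPC version pin resolved against what a peer
# supports. The option name and the min() rule are illustrative only.
PINNED_VERSION = "1.3"  # e.g. an operator-set value in neutron.conf

def choose_send_version(peer_max, pin=PINNED_VERSION):
    """Send the lower of the peer's max supported version and the pin."""
    def as_tuple(version):
        major, minor = version.split(".")
        return (int(major), int(minor))
    return min(peer_max, pin, key=as_tuple)

print(choose_send_version("1.5"))  # pinned down to 1.3
print(choose_send_version("1.2"))  # peer older than the pin -> 1.2
```

The manual step everyone dislikes is setting `PINNED_VERSION` by hand; a per-agent pin would replace that constant with values the agents themselves report.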
Yes, the minimum implementation we could agree on initially was pinning.
Direct requests of objects from agents
to neutron-server include the requested version, so that's always OK;
the complicated part is notification of object
changes via fanout.
In that case, I'm thinking of including the supported object versions in
agent status reports, so neutron-server can
decide at runtime which versions to send (in some cases it may need to
send several versions in parallel). I'm
long overdue to upload the strategy to the rpc callbacks devref, but it
will be along those lines.
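The status-report idea above could reduce to bookkeeping like the following on the server side (class and object names are made up for the sketch, not Neutron's actual code):

```python
# Sketch: server-side bookkeeping of the object versions agents report
# in their status updates, used to decide which versions a fanout
# notification must carry. All names here are illustrative.
agent_reports = {
    "ovs-agent-1": {"QosPolicy": "1.1"},
    "ovs-agent-2": {"QosPolicy": "1.0"},  # not yet upgraded
}

def versions_to_send(obj_name, reports):
    """Distinct versions needed so every agent gets one it understands."""
    return sorted({versions[obj_name]
                   for versions in reports.values()
                   if obj_name in versions})

print(versions_to_send("QosPolicy", agent_reports))  # ['1.0', '1.1']
```

When the set has more than one entry, the server would send each version in parallel, which is the "several versions in parallel" case mentioned above.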
c. Possible unit/functional tests to catch RPC version
incompatibilities between RPC revisions.
d. TODO: Multi-node Grenade job to have rolling upgrades
covered in CI.
That is not something for the unit or functional test level.
As you mentioned, we already have the grenade project that is designed
to test upgrades. To validate RPC compatibility on rolling upgrade we
would need a so-called ‘partial’ job (where different components run
with different versions; in the case of neutron it would mean a
new controller and old agents). The job is present in the nova gate and
validates RPC compatibility.
As far as I know, Russell Bryant was looking into introducing the job
for neutron, but was blocked by ongoing grenade refactoring to
support partial upgrades ‘the right way’ (using multinode setups). I
think that we should check with the grenade folks on that matter; I
have heard the start of Mitaka was the ETA for this work to complete.
2. Message content versioning – versioned objects
a. TODO: implement oslo.versionedobjects in the Mitaka cycle. The
interesting entities to be implemented: network, subnet, port,
security groups…
Though we haven’t touched base neutron resources in Liberty, we
introduced oslo.versionedobjects based NeutronObject class during
Liberty as part of QoS effort. I plan to expand on that work during
Mitaka.
++
The existing code for QoS resources can be found at:
https://github.com/openstack/neutron/tree/master/neutron/objects
b. Will OVO have impact on vendor plugins?
It surely can have a significant impact, but hopefully the dict compat
layer should make the transition smoother:
https://github.com/openstack/neutron/blob/master/neutron/objects/base.py#L50
Correct.
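For readers unfamiliar with the idea, a dict compat layer boils down to something like this (a stdlib-only illustration; see the linked `neutron/objects/base.py` for the real implementation):

```python
# Sketch of a dict-compat shim: an object that still answers dict-style
# access, so plugin code written against plain dicts keeps working
# while new code uses attribute access. Illustrative only.
class DictCompatObject:
    def __init__(self, **fields):
        self._fields = fields

    def __getitem__(self, key):      # legacy plugins: obj['name']
        return self._fields[key]

    def __getattr__(self, key):      # new code: obj.name
        try:
            return self._fields[key]
        except KeyError:
            raise AttributeError(key)

net = DictCompatObject(id="uuid-1", name="private")
print(net["name"], net.name)  # both access styles return 'private'
```

This is what lets vendor plugins migrate incrementally instead of in one big rewrite.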
c. Be strict on changes in version objects in code review, any
change in object structure should increment the minor
(backward-compatible) or major (breaking change) RPC version.
That’s assuming we have a clear mapping of objects onto current RPC
interfaces, which is not obvious. Another problem we would need to
solve is core resource extensions (currently available in ml2 only),
like qos or port_security, that modify resources based on controller
configuration.
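In practice, the review rule in (c) looks roughly like this: adding a field is a minor (backward-compatible) bump, and the new field is dropped when serializing for an older peer. A stdlib-only sketch of the oslo.versionedobjects pattern, not the real base class:

```python
# Sketch of the minor-bump discipline. 'dns_name' was added in 1.1,
# so a 1.0 receiver must never see it. Names are illustrative.
class PortSketch:
    VERSION = "1.1"  # was "1.0" before 'dns_name' was added

    def __init__(self, id, name, dns_name=None):
        self.id, self.name, self.dns_name = id, name, dns_name

    def to_primitive(self, target_version=VERSION):
        primitive = {"id": self.id, "name": self.name,
                     "dns_name": self.dns_name}
        if target_version == "1.0":
            primitive.pop("dns_name")  # field unknown to 1.0 receivers
        return primitive

port = PortSketch("uuid-1", "port1", dns_name="host.example")
print(sorted(port.to_primitive("1.0")))  # ['id', 'name']
```

A major bump would be the case where no such downgrade path exists, e.g. a field was removed or its meaning changed.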
d. Indirection API – messages in a newer format should be
translated to an older version by the neutron server.
For QoS, we used a new object agnostic subscriber mechanism to
propagate changes applied to QoS objects into agents:
http://docs.openstack.org/developer/neutron/devref/rpc_callbacks.html
It is already expected to downgrade objects based on agent version
(note it's not implemented yet, but will surely be ready during Mitaka):
https://github.com/openstack/neutron/blob/master/neutron/api/rpc/handlers/resources_rpc.py#L142
Yes, that's exactly what I was talking about above. It has object
retrieval for agents, where they can specify a version,
but subscription/notifications are the complicated part.
3. Database migration
a. Online schema migration was done in Liberty release, any
work left to do?
Nothing specific, maybe a bug or two here and there.
b. TODO: Online data migration to be introduced in the Mitaka cycle.
i. Online data migration can be done during normal
operation on the data.
ii. There should also be a script to invoke the data
migration in the background.
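The background script in (ii) might look like this in miniature: a bounded batch that can be called repeatedly while the service keeps running (the table, columns, and transformation are made up for the sketch):

```python
# Sketch: chunked online data migration. Each call migrates a bounded
# batch, so the job can run in the background without long locks.
# Table/column names and the upper() transform are illustrative.
import sqlite3

def migrate_batch(conn, batch_size=2):
    """Migrate up to batch_size rows; call until it returns 0."""
    rows = conn.execute(
        "SELECT id, old_value FROM widgets "
        "WHERE new_value IS NULL LIMIT ?", (batch_size,)).fetchall()
    for row_id, old in rows:
        conn.execute("UPDATE widgets SET new_value = ? WHERE id = ?",
                     (old.upper(), row_id))
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY,"
             " old_value TEXT, new_value TEXT)")
conn.executemany("INSERT INTO widgets (old_value) VALUES (?)",
                 [("a",), ("b",), ("c",)])
while migrate_batch(conn):
    pass
print(conn.execute("SELECT COUNT(*) FROM widgets"
                   " WHERE new_value IS NULL").fetchone()[0])  # 0
```

The same transform also has to run in the write path for new rows, so the backlog only ever shrinks.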
c. Currently the contract phase is doing the data migration.
But since the contract phase should be run offline, we should move
the data migration to the preceding step. Also, the contract phase
should be blocked if there is still relevant data in removed entities.
Yes, we definitely need a stop mechanism first, then play with data
migrations. I don't think we can consider data migration before we
have a way to hide the bloody migration details behind abstract
resources (read: versioned objects). Realistically, I would consider
data migration too far off at the moment to list as a todo step. But
we should definitely look forward to it.
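The "blocked if there is still relevant data" idea, i.e. the stop mechanism, could be as simple as a guard run before the contract phase; a sketch with a made-up table:

```python
# Sketch: a pre-contract guard that refuses to proceed while
# unmigrated rows remain, so the offline contract phase cannot drop
# data that was never carried over. Names are illustrative.
import sqlite3

def assert_safe_to_contract(db):
    remaining = db.execute(
        "SELECT COUNT(*) FROM widgets WHERE new_value IS NULL"
    ).fetchone()[0]
    if remaining:
        raise RuntimeError(
            "%d rows still unmigrated; finish the online data "
            "migration before running the contract phase" % remaining)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY,"
           " old_value TEXT, new_value TEXT)")
db.execute("INSERT INTO widgets (old_value) VALUES ('a')")
try:
    assert_safe_to_contract(db)
    blocked = False
except RuntimeError:
    blocked = True
print("contract blocked:", blocked)  # True while data remains
```

Once the online migration has drained the backlog, the guard passes and the short offline contract can run.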
i. Contract phase can be executed online, if all the new
code is running in the setup.
I am not sure how that's possible. Do you think it's realistic to
expect the controller to resolve a lot of the checks that the db
usually does (constraints?) while the schema is not enforced?
d. The other strategy is to not drop tables, alter names or
remove columns from the DB – what's in, it's in. We would have to pay
more attention in code reviews, merge only additive changes and
avoid questionable DB modifications.
I don’t like that approach. It suggests there is no way back if we
screw something. Having a short contract phase which is offline seems
to me like a reasonable approach. Anyway, it can be reconsidered
after we have the elephant in the room solved (the data migration
problem).
e. The Neutron server should be updated first, in order to do
data translation between the old format and the new schema. When doing
this, we can be sure that old data would not be inserted into old DB
structures.
To my taste, that's ^ the clearest way to go.
Correct.
I have performed the manual Kilo to Liberty upgrade, both
operationally and via code review of the RPC APIs. All is
working fine.
We can have some discussion on cross-project session [7] or we can
also review any issues with Neutron upgrade in Friday’s unplugged
session [8].
I will be more than happy to sit with folks interested in our upgrade
story and go write a plan for Mitaka.
I am interested too, and I am based in Italy (same time zone, yay!)
cheers,
Rossella
Ping me for discussion; it's a topic I'm interested in too.
Please ping me on irc (ihrachys), and we will think about how we can
sync effectively and push the effort forward. (btw I am located in the
Czech Republic, so we should be in the same time zone).
Regards,
Ihar
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev