Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> Top posting one note and direct comments inline. I’m proposing this as
> a member of the DefCore working group, but this proposal itself has not
> been accepted as the forward course of action by the working group.
> These are my own views as the administrator of the program and not
> those of the working group itself, which may independently reject the
> idea outside of the response from the upstream devs.
>
> I posted a link to this thread to the DefCore mailing list to make that
> working group aware of the outstanding issues.
>
> > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
> >
> > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> >>>>>> Last year, in response to Nova micro-versioning and extension
> >>>>>> updates[1], the QA team added strict API schema checking to
> >>>>>> Tempest to ensure that no additional properties were added to
> >>>>>> Nova API responses[2][3]. In the last year, at least three
> >>>>>> vendors participating in the OpenStack Powered Trademark program
> >>>>>> have been impacted by this change, two of which reported this to
> >>>>>> the DefCore Working Group mailing list earlier this year[4].
> >>>>>>
> >>>>>> The DefCore Working Group determines guidelines for the OpenStack
> >>>>>> Powered program, which includes capabilities with associated
> >>>>>> functional tests from Tempest that must be passed, and designated
> >>>>>> sections with associated upstream code [5][6].
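[The strict checking discussed above amounts to validating each API
response against a schema that rejects undeclared keys; in Tempest this
is done with JSON Schema and "additionalProperties": False. A minimal
pure-Python sketch of the idea follows; the field names are invented for
illustration and are not Tempest's real Nova schemas.]

```python
# Illustrative sketch only: strict response checking rejects any key in
# a response that the schema does not declare. Field names are made up.

ALLOWED_SERVER_KEYS = {"id", "name", "status"}

def unexpected_keys(response, allowed):
    """Return keys present in the response but absent from the schema."""
    return set(response) - set(allowed)

# A response containing only community-defined fields passes:
upstream = {"id": "abc123", "name": "vm1", "status": "ACTIVE"}
assert unexpected_keys(upstream, ALLOWED_SERVER_KEYS) == set()

# A vendor bolting an extra attribute onto the same response fails the
# strict check, where older, laxer validation would have let it slide:
vendor = dict(upstream, vendorScheduler="surprise")
assert unexpected_keys(vendor, ALLOWED_SERVER_KEYS) == {"vendorScheduler"}
```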
> >>>>>> In determining these guidelines, the working group attempts to
> >>>>>> balance the future direction of development with lagging
> >>>>>> indicators of deployments and user adoption.
> >>>>>>
> >>>>>> After a tremendous amount of consideration, I believe that the
> >>>>>> DefCore Working Group needs to implement a temporary waiver for
> >>>>>> the strict API checking requirements that were introduced last
> >>>>>> year, to give downstream deployers more time to catch up with the
> >>>>>> strict micro-versioning requirements determined by the
> >>>>>> Nova/Compute team and enforced by the Tempest/QA team.
> >>>>>
> >>>>> I'm very much opposed to this being done. If we're actually
> >>>>> concerned with interoperability and with verifying that things
> >>>>> behave in the same manner between multiple clouds, then doing this
> >>>>> would be a big step backwards. The fundamental disconnect here is
> >>>>> that the vendors who have implemented out-of-band extensions, or
> >>>>> were taking advantage of previously available places to inject
> >>>>> extra attributes, believe that doing so means they're
> >>>>> interoperable, which is quite far from reality. **The API is not
> >>>>> a place for vendor differentiation.**
> >>>>
> >>>> This is a temporary measure to address the fact that a large number
> >>>> of existing tests changed their behavior, rather than having new
> >>>> tests added to enforce this new requirement. The result is that
> >>>> deployments that previously passed these tests may no longer pass,
> >>>> and in fact we have several cases where that's true with deployers
> >>>> who are trying to maintain their own standard of
> >>>> backwards-compatibility for their end users.
> >>>
> >>> That's not what happened, though. The API hasn't changed and the
> >>> tests haven't really changed either. We made our enforcement on
> >>> Nova's APIs a bit stricter to ensure nothing unexpected appeared.
> >>> For the most part these tests work on any version of OpenStack. (We
> >>> only test it in the gate on supported stable releases, but I don't
> >>> expect things to have drastically shifted on older releases.) It
> >>> also doesn't matter which version of the API you run, v2.0 or v2.1.
> >>> Literally, the only case it ever fails is when you run something
> >>> extra, not from the community, either as an extension (which
> >>> themselves are going away [1]) or another service that wraps nova
> >>> or imitates nova. I'm personally not comfortable saying those
> >>> extras are ever part of the OpenStack APIs.
> >>>
> >>>> We have basically three options.
> >>>>
> >>>> 1. Tell deployers who are trying to do the right thing for their
> >>>> immediate users that they can't use the trademark.
> >>>>
> >>>> 2. Flag the related tests or remove them from the DefCore
> >>>> enforcement suite entirely.
> >>>>
> >>>> 3. Be flexible about giving consumers of Tempest time to meet the
> >>>> new requirement by providing a way to disable the checks.
> >>>>
> >>>> Option 1 goes against our own backwards compatibility policies.
> >>>
> >>> I don't think backwards compatibility policies really apply to what
> >>> we define as the set of tests that, as a community, we are saying a
> >>> vendor has to pass to say they're OpenStack. From my perspective, as
> >>> a community we take a hard stance on this and say that to be
> >>> considered an interoperable cloud (and to get the trademark) you
> >>> have to actually have an interoperable product. We slowly ratchet up
> >>> the requirements every 6 months; there isn't any implied backwards
> >>> compatibility in doing that. You passed in the past but not under
> >>> the newer, stricter guidelines.
> >>>
> >>> Also, even if I did think it applied, we're not talking about a
> >>> change which would fall into breaking that.
> >>> The change was introduced a year and a half ago during kilo and
> >>> landed a year ago during liberty:
> >>>
> >>> https://review.openstack.org/#/c/156130/
> >>>
> >>> That's way longer than our normal deprecation period of 3 months
> >>> and a release boundary.
> >>>>
> >>>> Option 2 gives us no winners and actually reduces the
> >>>> interoperability guarantees we already have in place.
> >>>>
> >>>> Option 3 applies our usual community standard of slowly rolling
> >>>> forward while maintaining compatibility as broadly as possible.
> >>>
> >>> Except in this case there isn't actually any compatibility being
> >>> maintained. We're saying that we can't make the requirements for
> >>> interoperability testing stricter until all the vendors who were
> >>> passing in the past are able to pass the stricter version.
> >>>>
> >>>> No one is suggesting that a permanent, or even open-ended,
> >>>> exception be granted.
> >>>
> >>> Sure, I agree a permanent or open-ended exception would be even
> >>> worse. But I still think as a community we need to draw a hard line
> >>> in the sand here. Just because this measure is temporary doesn't
> >>> make it any more palatable.
> >>>
> >>> By doing this, even as a temporary measure, we're saying it's OK to
> >>> call things an OpenStack API when you add random gorp to the
> >>> responses, which is something we've very clearly said as a community
> >>> is the exact opposite of the case, and which the testing reflects. I
> >>> still contend that just because some vendors were running old
> >>> versions of tempest and old versions of openstack, where their
> >>> incompatible API changes weren't caught, doesn't mean they should be
> >>> given a pass now.
> >>
> >> Nobody is saying random gorp is OK, and I'm not sure "line in the
> >> sand" rhetoric is really constructive.
> >> The issue is not with the nature of the API policies, it's with the
> >> implementation of those policies and how they were rolled out.
> >>
> >> DefCore defines its rules using named tests in Tempest. If these new
> >> enforcement policies had been applied by adding new tests to
> >> Tempest, then DefCore could have added them using its processes over
> >> a period of time and we wouldn't have had any issues. That's not
> >> what happened. Instead, the behavior of a bunch of *existing* tests
> >> changed. As a result, deployments that have not changed fail tests
> >> that they used to pass, without any action being taken on the
> >> deployer's part. We've moved the goal posts on our users in a way
> >> that was not easily discoverable, because it couldn't be tracked
> >> through the (admittedly limited) process we have in place for doing
> >> that tracking.
> >>
> >> So, we want a way to get the test results back to their existing
> >> status, which will then let us roll adoption forward smoothly
> >> instead of lurching from "pass" to "fail" to "pass".
>
> > It doesn't have to be a bright-line pass or fail. My primary concern
> > here is that making this change is basically saying we're going to
> > let things "pass" when running out-of-tree stuff that's adding
> > arbitrary fields to the response. This isn't really interoperable and
> > isn't being honest about what the vendor clouds are actually doing.
> > It would hide the truth from the people who rely on these results to
> > determine interoperability. The proposal as I read it (and maybe it's
> > my misconception) was to mask this and let vendor clouds "pass" until
> > they can fix it, which essentially hides the issue. Especially given
> > there are a lot of clouds and products that don't have any issue
> > here.
>
> The opposite is the intention of this proposal.
> It’s a compromise that admits that since the introduction of the
> OpenStack Powered program, and the release of this strict checking on
> additional properties, vendors that once passed now fail, and the
> incentives to force that change didn’t start being felt until they hit
> their product renewal cycle.
>
> It’s not trying to mask anything. To the contrary: by bringing it up
> here, and because the public test results would indicate which APIs
> send additional properties back, it’s shining a light on the issue and
> publicly stating that it’s not an acceptable long-term solution.
>
> > But, if we add another possible state on the defcore side like
> > conditional pass, warning, yellow, etc. (the name doesn't matter)
> > which is used to indicate that things on product X could only pass
> > when strict validation was disabled (and be clear about where and
> > why), then my concerns would be alleviated. I just do not want this
> > to end up not being visible to end users trying to evaluate the
> > interoperability of different clouds using the test results.
>
> The OpenStack Marketplace is where these comparisons would happen,
> and the APIs with additional response data would be stated.
>
> >>
> >> We should, separately, address the process issues and the
> >> limitations this situation has exposed. That may mean changing the
> >> way DefCore defines its policies, or tracks things, or uses Tempest.
> >> For example, in the future, we may want to tie versions of Tempest
> >> to versions of the trademark more closely, so that it's possible for
> >> someone running the Mitaka version of OpenStack to continue to use
> >> the Mitaka version of Tempest and not have to upgrade Tempest in
> >> order to retain their trademark (maybe that's how it already
> >> works?).
> >
> > Tempest master supports all currently supported stable branches. So
> > right now any commit to master is tested against a master cloud, a
> > mitaka cloud, and a liberty cloud in the gate.
> > We tag/push a release whenever we add or drop support for a release,
> > the most recent being dropping kilo. [1][2] That being said, the
> > openstack apis **should** be backwards compatible, so ideally master
> > tempest would work fine on older clouds. (Although this might not be
> > reality.) The primary wrinkle here is the tests which depend on
> > feature flags to indicate a feature's availability on newer
> > versions. We eventually remove flags after all supported releases
> > have a given feature. But this can be worked around with test
> > selection. (I.e., don't even try to run tests that require a feature
> > juno didn’t have.)
>
> The current active guidelines cover icehouse through mitaka. The
> release of 2016.08 will change that to cover juno through mitaka (with
> newton as an add-on to 2016.08 when it’s released). There’s overlap
> between the guidelines, so 2016.01 covers juno through mitaka while
> 2016.08 will cover kilo through newton. Essentially two years of
> releases.
>
> >> We may also need to consider that test implementation details may
> >> change, and have a review process within DefCore to help expose
> >> those changes to make them clearer to deployers.
> >>
> >> Fixing the process issue may also mean changing the way we implement
> >> things in Tempest. In this case, adding a flag helps move ahead more
> >> smoothly. Perhaps we adopt that as a general policy in the future
> >> when we make underlying behavioral changes like this to existing
> >> tests. Perhaps instead we have a policy that we do not change the
> >> behavior of existing tests in such significant ways, at least if
> >> they're tagged as being used by DefCore. I don't know -- those are
> >> things we need to discuss.
> >
> > Sure, I agree this thread raises larger issues which need to be
> > figured out. But that is probably an independent discussion.
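[The feature-flag-driven test selection described above can be sketched
roughly as follows; the flag names, config layout, and decorator are
invented stand-ins for this example, not Tempest's actual interface.]

```python
# Sketch of feature-flag test selection: tests exercising a feature the
# target cloud predates are skipped rather than failed. Flag names and
# config shape are illustrative only.
import unittest

# Stand-in for deployer-supplied configuration describing what the
# cloud under test supports (not what Tempest master supports).
FEATURE_FLAGS = {
    "microversions": True,  # present on this cloud
    "trunk_ports": False,   # landed in a release newer than this cloud
}

def requires_feature(name):
    """Skip the decorated test unless the named feature flag is on."""
    return unittest.skipUnless(
        FEATURE_FLAGS.get(name, False),
        "feature %r not available on the cloud under test" % name)

class ApiTests(unittest.TestCase):
    @requires_feature("microversions")
    def test_microversion_negotiation(self):
        # Real test logic would call the API here.
        self.assertTrue(True)

    @requires_feature("trunk_ports")
    def test_trunk_ports(self):
        # Never runs against this cloud: its flag above is disabled.
        self.fail("would fail on clouds without the feature")

# Run the suite programmatically so we can inspect the outcome.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ApiTests).run(result)
```

Run this way, an older cloud neither fails the newer test nor hides it
silently: the result records one pass and one explicit skip, preserving
the distinction between "doesn't have the feature" and "claims to but
breaks".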
>
> I’m beginning to wonder if we need to make DefCore use release
> branches, then back-port bug fixes and relevant feature additions as
> necessary.
We should definitely have that conversation, to understand what effect
it would have both on Tempest and on DefCore.

Doug

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev