On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <dce...@redhat.com> wrote:

> On 11/16/23 00:12, Ilya Maximets wrote:
> > On 11/15/23 22:20, Mark Michelson wrote:
> >> On 11/15/23 12:02, Ilya Maximets wrote:
> >>> On 11/15/23 17:00, Dumitru Ceara wrote:
> >>>> All of the above were changed to track the latest available releases.
> >>>> Initially that seemed like a good idea but in practice, a new
> >>>> release would potentially (silently) cause CI runs that used to
> >>>> pass on given stable versions to unexpectedly start failing.
> >>>>
> >>>> To address that, this series pins all versions we can control,
> >>>> allowing us to use different values for different branches.
> >>>>
> >>>> NOTE: the last commit of this series will look slightly different on
> >>>> stable branches, e.g., on branches <= 23.06 we need Python <= 3.11 and
> >>>> Fedora 37.
> >>>>
> >>>> The first commit of the series bumps the OVS submodule to include the
> >>>> latest fixes to errors reported by recent versions of flake8.
> >>>>
> >>>> Changes in v2:
> >>>> - added first patch to bump OVS submodule
> >>>> - addressed Ales' review comments:
> >>>>    - moved the logic to determine which image to use out of ci.sh;
> >>>>      it's now in the workflow itself.
> >>>>    - moved setting of OVN_IMAGE in the same block with all related
> >>>>      env vars
> >>>> - added a note for maintainers in the release process documentation
> >>>>    to mention that the new workflow will likely have to be triggered
> >>>>    manually (at least once) when branching for a new release.  GitHub
> >>>>    doesn't allow periodic jobs on non-default branches.
> >>>
> >>> IMHO, that sounds horrible to maintain.  Can we just build things
> >>> per run?
> >>
> >> What if we set up the "Containers" action to run on branch creation?
> >> Would that adequately substitute for the manual run that creates the
> >> image?
> >>
> >> If we only have to run the action once each time a branch is created, I
> >> don't think that qualifies as "horrible" to maintain. This could be
> >> lumped into our branch creation scripts. However, if it can be
> >> automated as I suggested above, that would be better.
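[For illustration, a branch-creation trigger along those lines might look roughly like this; the event guard, branch-name pattern, and build step are hypothetical:]

```yaml
# Hypothetical sketch: rebuild the CI container image when a release
# branch is created, instead of requiring a manual workflow run.
on:
  create:

jobs:
  build-container:
    # The 'create' event also fires for tags, so restrict to branches.
    if: github.event.ref_type == 'branch' && startsWith(github.event.ref, 'branch-')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and push image
        run: echo "build and push the CI image here"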
> >
> > The job will never be re-run, meaning we'll be using the same container
> > for many years.  If we ever need to update this container, it will have
> > to be done manually for each branch, except for main.
> >
> > Also, in the patch, the Cirrus CI is set to use the container from main,
> > so this will need to be adjusted for each branch and will need to become
> > part of the release patches (?).
> >
> > Another issue with the current implementation of containers is that it's
> > a pain for external contributors to change dependencies, because they'll
> > have to modify the CI in multiple places in order to build and use their
> > own containers.
> > And with these new changes it will become worse.  For example, I never
> > use 'main' or 'branch-*' branches in my own forks.  And CI depends on
> > the branch name now, and it will fall back to an unmaintained
> > old-style-named image.  Might not be easy to figure out why all your
> > builds are failing, especially if you're new to the project.
> >
>
> We could try to make it build the container image every time on stable
> branches and avoid any manual steps.
>
> >>
> >>> How much value these containers actually have?
>
> I think you mentioned the benefit lower already:
> - we can share the artifact (container image) with Cirrus CI; aarch64
> container build takes significantly longer
> - we can share the artifact with downstream CIs
>
> >>
> >> As patch 2 mentions, it allows us to use different distro bases for
> >> each version.
> >
> > This can be done by simply specifying 'container: name' in the job,
> > GHA will fetch and use that container.
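[As a side note, the per-job container selection mentioned above is a one-liner in a GHA workflow; a minimal sketch, where the image name and script path are hypothetical:]

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # Hypothetical image; GHA pulls it and runs all job steps inside it.
    container: ghcr.io/ovn-org/ovn-tests:fedora-38
    steps:
      - uses: actions/checkout@v3
      - run: ./ci.sh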
> >
> >> But I also think that having the container image cached
> >> allows for quicker test runs, since we don't have to reconstruct the
> >> environment each time we run the tests.
> >
> > It takes less than 2 minutes to create a full x86 environment.  We spend
> > more time re-building OVS and OVN for every variant of tests, even though
> > most of them are using the same compiler flags.
> >
> > Anyways, there is no need to push the container into a registry, we can
> > build it and pass it to the next job via artifacts.  I don't think
> > spending an extra 1-2 minutes per workflow (not job, workflow) is a
> > significant problem, if it allows us to just build whatever the current
> > dependencies are and not think about what kind of outdated build from
> > which branch is going to be used.
> >
> > The only downside of this is that we can't share the artifact with Cirrus CI
> > or some downstream CIs, but that also doesn't seem to be a significant
> > issue.  1-2 minutes per workflow is not that much.  Building the aarch64
> > container takes a lot more time, but switching it from fedora to ubuntu
> > takes only 10 minutes to build, so might be reasonable to just rebuild it
> > on-demand as well.
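[Passing a locally built image between jobs via artifacts, as suggested, could be sketched like this; action versions, the Dockerfile path, and image names are illustrative:]

```yaml
jobs:
  container:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Build the image from the current branch and export it to a tarball.
      - run: |
          docker build -t ovn-ci:local -f utilities/containers/Dockerfile .
          docker save ovn-ci:local -o ovn-ci.tar
      - uses: actions/upload-artifact@v3
        with:
          name: ovn-ci-image
          path: ovn-ci.tar

  tests:
    needs: container
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: ovn-ci-image
      # Re-import the image so subsequent steps can 'docker run' it.
      - run: docker load -i ovn-ci.tar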
> >
>
> Ales, why do we need the Cirrus CI image to be Fedora based?
>

We don't, but it's way easier for me to navigate. I'm not even sure how/if
the Ubuntu one works for aarch64. We could switch that one, but I don't see
any clear benefit in doing so. Also, one thing to note: even if it's 1-2
minutes (it's probably slightly more), we are losing the ability to have
reproducible builds. It's very easy today to take the image and reproduce
the same issue, IMO.


>
> > Not a strong opinion, but the branch name thing might be the most
> > annoying, if I understand correctly.
> >
>
> On the (very) short term however, my problem is that I can't merge any
> bug fix in OVN because I can't run any CI at the moment.  Mostly that's
> because different versions of the OVN (and OVS, via the submodule)
> Python code trigger different flake8 warnings with different
> flake8/Python/distro versions.
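[One stopgap for this kind of linter drift, independent of the container question, would be pinning the linter version in the workflow itself, so results are identical on every runner image; the step and version number below are illustrative:]

```yaml
    steps:
      # Hypothetical pin: the same flake8 release on every branch/runner.
      - run: python3 -m pip install 'flake8==6.1.0'
      - run: flake8 .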
>
> A quick fix for now could be to bump the OVS submodule again to
> fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what patch
> 1/3 is doing) but on _all_ stable branches.  That's not ideal because
> that commit (and its backports) is not part of any release yet.
>
> I wanted to avoid doing that, but it seems we don't have an easy
> alternative, so I'm starting to think we should just go for the OVS
> submodule bump and try to find a better solution for the container
> workflow in the longer term.
>
> What do you guys think?
>

I'm afraid that we will constantly chase those ghosts of different
dependency problems for various branches, and I'm not sure what the
proper way to handle that would be. The unfortunate thing is that we
don't directly control OVS, so some of those "hacks" may always be
necessary to keep it working.


>
> Thanks,
> Dumitru
>
Thanks,
Ales
