On 11/16/23 10:17, Dumitru Ceara wrote:
> On 11/16/23 10:02, Dumitru Ceara wrote:
>> On 11/16/23 09:51, Dumitru Ceara wrote:
>>> On 11/16/23 09:44, Ales Musil wrote:
>>>> On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>
>>>>> On 11/16/23 00:12, Ilya Maximets wrote:
>>>>>> On 11/15/23 22:20, Mark Michelson wrote:
>>>>>>> On 11/15/23 12:02, Ilya Maximets wrote:
>>>>>>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>>>>>>> All of the above were changed to track the latest available releases.
>>>>>>>>> Initially that seemed like a good idea but in practice a new release
>>>>>>>>> would potentially (silently) cause CI runs that used to pass on given
>>>>>>>>> stable versions to unexpectedly start failing.
>>>>>>>>>
>>>>>>>>> To address that, this series pins all versions we can control,
>>>>>>>>> allowing us to use different values for different branches.
>>>>>>>>>
>>>>>>>>> NOTE: the last commit of this series will look slightly different on
>>>>>>>>> stable branches, e.g., on branches <= 23.06 we need Python <= 3.11
>>>>>>>>> and Fedora 37.
>>>>>>>>>
>>>>>>>>> The first commit of the series bumps the OVS submodule to include the
>>>>>>>>> latest fixes to errors reported by recent versions of flake8.
>>>>>>>>>
>>>>>>>>> Changes in v2:
>>>>>>>>> - added first patch to bump the OVS submodule
>>>>>>>>> - addressed Ales' review comments:
>>>>>>>>>   - moved the logic to determine which image to use out of ci.sh;
>>>>>>>>>     it's now in the workflow itself.
>>>>>>>>>   - moved setting of OVN_IMAGE into the same block with all related
>>>>>>>>>     env vars
>>>>>>>>> - added a note for maintainers in the release process documentation
>>>>>>>>>   to mention that the new workflow will likely have to be triggered
>>>>>>>>>   manually (at least once) when branching for a new release; GitHub
>>>>>>>>>   doesn't allow periodic jobs on non-default branches.
>>>>>>>>
>>>>>>>> IMHO, that sounds horrible to maintain.  Can we just build things per
>>>>>>>> run?
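[Editor's note: for readers unfamiliar with the pinning approach the cover letter describes, per-branch pins of this kind usually end up as plain env vars near the top of a workflow. A minimal sketch follows; all names and values besides OVN_IMAGE (which the cover letter mentions) are illustrative, not the actual OVN workflow contents.]

```yaml
# Hypothetical sketch of per-branch version pinning in a GitHub Actions
# workflow; each stable branch carries its own values for these vars.
env:
  IMAGE_DISTRO: fedora-38    # distro base for the CI container (illustrative)
  PYTHON_VERSION: '3.11'     # e.g. branches <= 23.06 need Python <= 3.11
  OVN_IMAGE: ghcr.io/ovn-org/ovn-tests:fedora-38   # illustrative image name
```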
>>>>>>>
>>>>>>> What if we set up the "Containers" action to run on branch creation?
>>>>>>> Would that adequately substitute for the manual run that creates the
>>>>>>> image?
>>>>>>>
>>>>>>> If we only have to run the action once each time a branch is created,
>>>>>>> I don't think that qualifies as "horrible" to maintain.  This could be
>>>>>>> lumped into our branch creation scripts.  However, if it can be
>>>>>>> automated as I suggested above, that would be better.
>>>>>>
>>>>>> The job will never be re-run, meaning we'll be using the same container
>>>>>> for many years.  In case we need an update for this container, it will
>>>>>> need to be done manually for each branch, except for main.
>>>>>>
>>>>>> Also, in the patch, Cirrus CI is set to use the container from main,
>>>>>> so this will need to be adjusted for each branch and will need to
>>>>>> become part of the release patches (?).
>>>>>>
>>>>>> Another issue with the current implementation of containers is that
>>>>>> it's a pain for an external contributor to change dependencies, because
>>>>>> they'll have to modify the CI in multiple places in order to build and
>>>>>> use their own containers.  And with these new changes it will become
>>>>>> worse.  For example, I never use 'main' or 'branch-*' branches in my
>>>>>> own forks.  And CI depends on the branch name now, and it will fall
>>>>>> back to an unmaintained old-style-named image.  It might not be easy to
>>>>>> figure out why all your builds are failing, especially if you're new to
>>>>>> the project.
>>>>>>
>>>>>
>>>>> We could try to make it build the container image every time on stable
>>>>> branches and avoid any manual steps.
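[Editor's note: Mark's branch-creation idea above maps onto GitHub Actions' `create` event. A minimal sketch of what the trigger block of the "Containers" workflow could look like; the event list is illustrative, not the workflow's actual contents.]

```yaml
# Hypothetical trigger sketch: build the CI container image automatically
# when a branch is created, while keeping a manual escape hatch.
name: Containers
on:
  create:              # fires when a branch or tag is created
  workflow_dispatch:   # manual re-run for dependency updates
```

One caveat: a `create`-triggered run uses the workflow file as it exists on the newly created ref, so the trigger has to be present before the branch is cut.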
But the branch name will still be a problem.  Every time people use any
branch name other than 'main' for their own development, they'll get an
outdated container, unless they change the CI scripts.  Or am I missing
something?

>>>>>
>>>>>>>
>>>>>>>> How much value do these containers actually have?
>>>>>
>>>>> I think you mentioned the benefits lower already:
>>>>> - we can share the artifact (container image) with Cirrus CI; the
>>>>>   aarch64 container build takes significantly longer
>>>>> - we can share the artifact with downstream CIs
>>>>>
>>>>>>>
>>>>>>> As patch 2 mentions, it allows us to use different distro bases for
>>>>>>> each version.
>>>>>>
>>>>>> This can be done by simply specifying 'container: name' in the job;
>>>>>> GHA will fetch and use that container.
>>>>>>
>>>>>>> But I also think that having the container image cached
>>>>>>> allows for quicker test runs since we are not having to reconstruct
>>>>>>> the environment each time we perform test runs.
>>>>>>
>>>>>> It takes less than 2 minutes to create a full x86 environment.  We
>>>>>> spend more time re-building OVS and OVN for every variant of tests,
>>>>>> even though most of them are using the same compiler flags.
>>>>>>
>>>>>> Anyway, there is no need to push the container into a registry; we can
>>>>>> build it and pass it to the next job via artifacts.  I don't think
>>>>>> spending an extra 1-2 minutes per workflow (not job, workflow) is a
>>>>>> significant problem, if it allows us to just build whatever the current
>>>>>> dependencies are and not think about what kind of outdated build from
>>>>>> which branch is going to be used.
>>>>>>
>>>>>> The only downside of this is that we can't share the artifact with
>>>>>> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
>>>>>> significant issue.  1-2 minutes per workflow is not that much.
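[Editor's note: Ilya's build-and-pass-via-artifacts approach could be sketched roughly as below; job, image, and artifact names are illustrative, not taken from the actual OVN workflows.]

```yaml
# Hypothetical sketch: build the image once per workflow run and hand it
# to later jobs through an artifact instead of a registry.
jobs:
  container:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: docker build -t ovn-ci .
      - run: docker save ovn-ci | gzip > ovn-ci.tar.gz
      - uses: actions/upload-artifact@v3
        with:
          name: ovn-ci-image
          path: ovn-ci.tar.gz
  test:
    needs: container
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: ovn-ci-image
      - run: gunzip -c ovn-ci.tar.gz | docker load
      # ... run the actual tests inside the loaded image ...
```

The same artifact can also be downloaded from the run page and loaded locally with `docker load`, which preserves the ability to reproduce CI failures in the exact build environment.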
>>>>>> Building the aarch64 container takes a lot more time, but switching it
>>>>>> from Fedora to Ubuntu brings the build down to only 10 minutes, so it
>>>>>> might be reasonable to just rebuild it on-demand as well.
>>>>>>
>>>>>
>>>>> Ales, why do we need the Cirrus CI image to be Fedora based?
>>>>>
>>>>
>>>> We don't, but it's way easier for me to navigate.  I'm not even sure
>>>> how/if the Ubuntu one works for aarch64.  We can switch that one, however
>>>> I don't see any clear benefit in doing so.  Also one thing to note, even
>>>> if it's 1-2
>>>
>>> Well, it takes way less time to build, that might be an argument for it.

FWIW, Fedora takes an hour to build; Ubuntu only takes ~10 minutes.

>>>
>>>> minutes (it's probably slightly more), we are losing the ability to have
>>>> reproducible builds.  It's very easy today to take the image and
>>>> reproduce the same issue, IMO.
>>>>
>>>
>>> That's a good point, reproducible builds are important.  Storing the
>>> image as a GitHub CI artifact would still allow us to download it and
>>> load it locally, but it's way easier if it's just stored in the ghcr.io
>>> registry.

How many times did you use this for debugging in the past year?  If it's
not a day-to-day churn, it might be worth adding the extra step of
downloading an artifact.

>>>
>>>>
>>>>>
>>>>>> Not a strong opinion, but the branch name thing might be the most
>>>>>> annoying, if I understand correctly.
>>>>>>
>>>>>
>>>>> On the (very) short term, however, my problem is that I can't merge any
>>>>> bug fix in OVN because I can't run any CI at the moment.  Mostly that's
>>>>> because different versions of OVN (and OVS via submodule) python code
>>>>> trigger different flake8 warnings with different flake8/python/distro
>>>>> versions.
>>>>>
>>>>> A quick fix for that for now could be to bump the OVS submodule again to
>>>>> fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what patch
>>>>> 1/3 is doing) but on _all_ stable branches.
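[Editor's note: one complementary way to stop flake8 results from shifting underneath the stable branches, independent of the submodule bump, is to pin the linter itself in the CI install step. The exact version below is illustrative, not what OVN pins.]

```yaml
      # Hypothetical sketch: pin flake8 so newly released checks (such as
      # E721) don't silently start failing existing branches.
      - run: python3 -m pip install 'flake8==6.1.0'
```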
>>>>> That's not ideal because that commit (and its backports) are not part
>>>>> of any release yet.
>>>>>
>>>>> I wanted to avoid doing that but it seems we don't have an easy
>>>>> alternative, so I'm starting to think we should just go for the OVS
>>>>> submodule bump and try to find a better solution for the container
>>>>> workflow in the longer term.
>>>>>
>>
>> Just in case I wasn't clear, what I meant here is applying patch 1/3 to
>> main and its corresponding stable branch backports, bumping the submodule
>> to the tips of OVS branch-3.2, branch-3.1 and branch-3.0.

Bumping the submodule to the tips of OVS stable branches seems OK to me.

>>
>
> .. and we need to pin the Python version to 3.11 on branches <= 23.03.

In general, patches 1 and 3 of this set are fine (I didn't test).  The
main problem, IMO, is the branch name dependency that makes day-to-day
development harder.

>
>>>>> What do you guys think?
>>>>>
>>>>
>>>> I'm afraid that we will constantly chase those ghosts of different
>>>> dependency problems for various branches.  And I'm not sure what would
>>>> be the proper way to handle that.  The unfortunate thing is that we
>>>> don't directly control OVS, so some of those "hacks" might always be
>>>> necessary to have it working.
>>>>
>>>
>>> Isn't that a different discussion?  Partially because OVN uses
>>> "internal" OVS structures directly.
>>>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev