On 11/16/23 13:44, Ales Musil wrote:
> On Thu, Nov 16, 2023 at 1:40 PM Dumitru Ceara <dce...@redhat.com> wrote:
>
>> On 11/16/23 13:05, Ilya Maximets wrote:
>>> On 11/16/23 10:17, Dumitru Ceara wrote:
>>>> On 11/16/23 10:02, Dumitru Ceara wrote:
>>>>> On 11/16/23 09:51, Dumitru Ceara wrote:
>>>>>> On 11/16/23 09:44, Ales Musil wrote:
>>>>>>> On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>
>>>>>>>> On 11/16/23 00:12, Ilya Maximets wrote:
>>>>>>>>> On 11/15/23 22:20, Mark Michelson wrote:
>>>>>>>>>> On 11/15/23 12:02, Ilya Maximets wrote:
>>>>>>>>>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>>>>>>>>>> All of the above were changed to track the latest available
>>>>>>>>>>>> releases.  Initially that seemed like a good idea but in practice,
>>>>>>>>>>>> a new release would potentially (silently) cause CI runs that used
>>>>>>>>>>>> to pass on given stable versions to unexpectedly start failing.
>>>>>>>>>>>>
>>>>>>>>>>>> To address that this series pins all versions we can control,
>>>>>>>>>>>> allowing us to use different values for different branches.
>>>>>>>>>>>>
>>>>>>>>>>>> NOTE: the last commit of this series will look slightly different
>>>>>>>>>>>> on stable branches, e.g., on branches <= 23.06 we need
>>>>>>>>>>>> Python <= 3.11 and Fedora 37.
>>>>>>>>>>>>
>>>>>>>>>>>> The first commit of the series bumps the OVS submodule to include
>>>>>>>>>>>> the latest fixes to errors reported by recent versions of flake8.
>>>>>>>>>>>>
>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>> - added first patch to bump OVS submodule
>>>>>>>>>>>> - addressed Ales' review comments:
>>>>>>>>>>>>   - moved the logic to determine which image to use out of ci.sh;
>>>>>>>>>>>>     it's now in the workflow itself.
>>>>>>>>>>>>   - moved setting of OVN_IMAGE into the same block with all
>>>>>>>>>>>>     related env vars
>>>>>>>>>>>> - added a note for maintainers in the release process
>>>>>>>>>>>>   documentation to mention that the new workflow will likely have
>>>>>>>>>>>>   to be triggered manually (at least once) when branching for a
>>>>>>>>>>>>   new release.  GitHub doesn't allow periodic jobs on non-default
>>>>>>>>>>>>   branches.
>>>>>>>>>>>
>>>>>>>>>>> IMHO, that sounds horrible to maintain.  Can we just build things
>>>>>>>>>>> per run?
>>>>>>>>>>
>>>>>>>>>> What if we set up the "Containers" action to run on branch creation?
>>>>>>>>>> Would that adequately substitute for the manual run that creates the
>>>>>>>>>> image?
>>>>>>>>>>
>>>>>>>>>> If we only have to run the action once each time a branch is
>>>>>>>>>> created, I don't think that qualifies as "horrible" to maintain.
>>>>>>>>>> This could be lumped into our branch creation scripts.  However, if
>>>>>>>>>> it can be automated as I suggested above, that would be better.
>>>>>>>>>
>>>>>>>>> The job will never be re-run, meaning we'll be using the same
>>>>>>>>> container for many years.  If we ever need an update for this
>>>>>>>>> container, it will need to be done manually for each branch, except
>>>>>>>>> for main.
>>>>>>>>>
>>>>>>>>> Also, in the patch, the Cirrus CI is set to use the container from
>>>>>>>>> main, so this will need to be adjusted for each branch and will need
>>>>>>>>> to become part of the release patches (?).
>>>>>>>>>
>>>>>>>>> Another issue with the current implementation of containers is that
>>>>>>>>> it's a pain for an external contributor to change dependencies,
>>>>>>>>> because they'll have to modify the CI in multiple places in order to
>>>>>>>>> build and use their own containers.  And with these new changes it
>>>>>>>>> will become worse.  For example, I never use 'main' or 'branch-*'
>>>>>>>>> branches in my own forks.
>>>>>>>>> And CI depends on the branch name now, and it will fall back to an
>>>>>>>>> unmaintained old-style-named image.  It might not be easy to figure
>>>>>>>>> out why all your builds are failing, especially if you're new to the
>>>>>>>>> project.
>>>>>>>>
>>>>>>>> We could try to make it build the container image every time on
>>>>>>>> stable branches and avoid any manual steps.
>>>
>>> But the branch name will still be a problem.  Every time people use any
>>> branch name other than 'main' for their own development, they'll get an
>>> outdated container, unless they change the CI scripts.  Or am I missing
>>> something?
>>>
>>
>> Yeah, we'd have to make it build the container image every time on
>> non-default branches.  That's a problem because the Fedora container
>> takes so long.  So we'd have to separate the two types of container
>> builds.  Seems like a lot of trouble to go through indeed.
>>
>>>>>>>>>>> How much value do these containers actually have?
>>>>>>>>
>>>>>>>> I think you mentioned the benefits lower in the thread already:
>>>>>>>> - we can share the artifact (container image) with Cirrus CI; the
>>>>>>>>   aarch64 container build takes significantly longer
>>>>>>>> - we can share the artifact with downstream CIs
>>>>>>>>
>>>>>>>>>> As patch 2 mentions, it allows us to use different distro bases for
>>>>>>>>>> each version.
>>>>>>>>>
>>>>>>>>> This can be done by simply specifying 'container: name' in the job;
>>>>>>>>> GHA will fetch and use that container.
>>>>>>>>>
>>>>>>>>>> But I also think that having the container image cached allows for
>>>>>>>>>> quicker test runs since we don't have to reconstruct the
>>>>>>>>>> environment each time we perform test runs.
>>>>>>>>>
>>>>>>>>> It takes less than 2 minutes to create a full x86 environment.
>>>>>>>>> We spend more time re-building OVS and OVN for every variant of
>>>>>>>>> tests, even though most of them are using the same compiler flags.
>>>>>>>>>
>>>>>>>>> Anyways, there is no need to push the container into the registry; we
>>>>>>>>> can build it and pass it to the next job via artifacts.  I don't
>>>>>>>>> think spending an extra 1-2 minutes per workflow (not job, workflow)
>>>>>>>>> is a significant problem, if it allows us to just build whatever the
>>>>>>>>> current dependencies are and not think about what kind of outdated
>>>>>>>>> build from which branch is going to be used.
>>>>>>>>>
>>>>>>>>> The only downside of this is that we can't share the artifact with
>>>>>>>>> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
>>>>>>>>> significant issue.  1-2 minutes per workflow is not that much.
>>>>>>>>> Building the aarch64 container takes a lot more time, but switching
>>>>>>>>> it from Fedora to Ubuntu brings the build down to only 10 minutes,
>>>>>>>>> so it might be reasonable to just rebuild it on demand as well.
>>>>>>>>
>>>>>>>> Ales, why do we need the Cirrus CI image to be Fedora based?
>>>>>>>
>>>>>>> We don't, but it's way easier for me to navigate.  I'm not even sure
>>>>>>> how/if the Ubuntu one works for aarch64.  We can switch that one;
>>>>>>> however, I don't see any clear benefit in doing so.  Also, one thing
>>>>>>> to note: even if it's 1-2
>>>>>>
>>>>>> Well, it takes way less time to build, that might be an argument for it.
>>>
>>> FWIW, Fedora takes an hour to build, Ubuntu only takes ~10 minutes.
>>>
>>>>>>
>>>>>>> minutes (it's probably slightly more), we are losing the ability to
>>>>>>> have reproducible builds.  It's very easy today to take the image and
>>>>>>> reproduce the same issue IMO.
>>>>>>
>>>>>> That's a good point, reproducible builds are important.
>>>>>> Storing the image as a GitHub CI artifact would still allow us to
>>>>>> download it and load it locally, but it's way easier if it's just
>>>>>> stored in the ghcr.io registry.
>>>
>>> How many times did you use this for debugging in the past year?
>>> If it's not a day-to-day churn, it might be worth adding an extra
>>> step of downloading an artifact.
>>>
>>
>> I'm not saying it's hard to use the artifact directly, just that it's
>> more convenient to get the image from the registry.  I don't use it that
>> much, but when I do need it, it's way easier for me to do a regular
>> "podman/docker run" pointing it to the registry than to figure out how
>> to download and load the image from the artifact.  Anyhow, it's a minor
>> annoyance and can be automated.
>>
>>>>>>>>> Not a strong opinion, but the branch name thing might be the most
>>>>>>>>> annoying, if I understand correctly.
>>>>>>>>
>>>>>>>> In the (very) short term, however, my problem is that I can't merge
>>>>>>>> any bug fix in OVN because I can't run any CI at the moment.  Mostly
>>>>>>>> that's because different versions of OVN (and OVS via submodule)
>>>>>>>> Python code trigger different flake8 warnings with different
>>>>>>>> flake8/Python/distro versions.
>>>>>>>>
>>>>>>>> A quick fix for that for now could be to bump the OVS submodule again
>>>>>>>> to fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what
>>>>>>>> patch 1/3 is doing) but on _all_ stable branches.  That's not ideal
>>>>>>>> because that commit (and its backports) are not part of any release
>>>>>>>> yet.
>>>>>>>>
>>>>>>>> I wanted to avoid doing that, but it seems we don't have an easy
>>>>>>>> alternative, so I'm starting to think we should just go for the OVS
>>>>>>>> submodule bump and try to find a better solution for the container
>>>>>>>> workflow in the longer term.
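[Editor's note: for reference, the quick fix described above, bumping the OVS submodule to a newer commit such as fdbf0bb2aed5 and recording the new pointer, boils down to a handful of git commands. The sketch below is self-contained so it can run anywhere: it builds scratch stand-ins for the OVS upstream and the OVN tree; in the real tree the submodule path is simply ovs/ and the target commit would come from the tip of the relevant OVS stable branch.]

```shell
set -e
work=$(mktemp -d)
# Commit with a fixed identity so the sketch works in a clean environment.
git_c() { git -c user.email=ci@example.com -c user.name=ci "$@"; }

# Stand-in for the OVS upstream repository, with an old and a new commit.
git init -q "$work/ovs-upstream"
git_c -C "$work/ovs-upstream" commit -q --allow-empty -m "old release"
old=$(git -C "$work/ovs-upstream" rev-parse HEAD)
git_c -C "$work/ovs-upstream" commit -q --allow-empty \
    -m "flake8: Fix E721 check failures."
fix=$(git -C "$work/ovs-upstream" rev-parse HEAD)

# Stand-in for the OVN tree, with the OVS submodule pinned at the old commit.
git init -q "$work/ovn"
cd "$work/ovn"
git_c commit -q --allow-empty -m "Initial commit."
git_c -c protocol.file.allow=always submodule --quiet add \
    "$work/ovs-upstream" ovs
git -C ovs checkout -q "$old"
git add ovs
git_c commit -qm "Pin OVS at the old release."

# The actual bump: fetch, check out the wanted commit, record the new pointer.
git -C ovs fetch -q origin
git -C ovs checkout -q "$fix"
git add ovs
git_c commit -qm "ovs: Bump submodule to include the flake8 fixes."
echo "submodule now at: $(git -C ovs rev-parse HEAD)"
```

[Pointed at the real repositories, this is roughly what patch 1/3 and its stable-branch backports do.]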
>>>>> Just in case I wasn't clear, what I meant here is applying patch 1/3 to
>>>>> main and its corresponding stable branch backports, bumping the
>>>>> submodule to the tips of OVS branch-3.2, branch-3.1 and branch-3.0.
>>>
>>> Bumping the submodule to the tips of OVS stable branches seems OK to me.
>>>
>>
>> OK.
>>
>>>> .. and we need to pin the Python version to 3.11 on branches <= 23.03.
>>>
>>> In general, patches 1 and 3 of this set are fine (I didn't test).
>>
>> OK, I'll prepare v3 and backports and post them soon.
>>
>>> The main problem, IMO, is the branch name dependency that makes
>>> day-to-day development harder.
>>>
>>
>> I think it's still useful to be able to pin container versions on stable
>> branches and make them re-usable for other types of (downstream) CI.
>>
>> OTOH, if we have a dockerfile we use for building these CI container
>> images, maybe other CI systems (e.g., the RedHat downstream CI) could
>> just use the dockerfile to periodically pre-build their own environment.
>>
>> Ales, what do you think?
>>
>
> It's possible, but also very easy for that dockerfile to fall behind.
> Currently we test both images.  If we switch to Ubuntu for Cirrus CI and
> disable automatic builds of Fedora, I'm afraid that no one will care and
> Fedora will be just unmaintained.
>
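[Editor's note: keeping the Fedora image exercised could be done with a scheduled workflow that runs the tests inside it. A hypothetical sketch follows; the image name, tag scheme, and the ci.sh entry point are illustrative assumptions, not the project's actual layout, and as noted earlier in the thread GitHub only triggers schedule events on the default branch, so stable branches would still rely on manual workflow_dispatch runs.]

```yaml
name: Periodic Fedora tests
on:
  schedule:
    - cron: '0 2 * * 0'    # weekly; only runs on the default branch
  workflow_dispatch: {}    # allow manual runs, e.g. on stable branches

jobs:
  fedora:
    runs-on: ubuntu-latest
    # Hypothetical image name and tag; adjust to the real registry layout.
    container: ghcr.io/ovn-org/ovn-tests:fedora-main
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - name: build and test
        run: ./ci.sh    # hypothetical entry point
```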
Could we run a set of periodic GitHub workflows with Fedora as the container
image?  Or does that become even more complex?

>>>>>>>> What do you guys think?
>>>>>>>
>>>>>>> I'm afraid that we will constantly chase those ghosts of different
>>>>>>> dependency problems for various branches.  And I'm not sure what would
>>>>>>> be the proper way to handle that.  The unfortunate thing is that we
>>>>>>> don't control OVS directly, so some of those "hacks" might always be
>>>>>>> necessary to have it working.
>>>>>>
>>>>>> Isn't that a different discussion?  Partially because OVN uses
>>>>>> "internal" OVS structures directly.
>>>>>>

> Thanks,
> Ales

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev