On 11/16/23 00:12, Ilya Maximets wrote:
> On 11/15/23 22:20, Mark Michelson wrote:
>> On 11/15/23 12:02, Ilya Maximets wrote:
>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>> All of the above were changed to track the latest available releases.
>>>> Initially that seemed like a good idea, but in practice a new release
>>>> could silently cause CI runs that used to pass on a given stable
>>>> branch to unexpectedly start failing.
>>>>
>>>> To address that, this series pins all the versions we can control,
>>>> allowing us to use different values on different branches.
>>>>
>>>> NOTE: the last commit of this series will look slightly different on
>>>> stable branches, e.g., on branches <= 23.06 we need Python <= 3.11 and
>>>> Fedora 37.
>>>>
>>>> The first commit of the series bumps the OVS submodule to include the
>>>> latest fixes to errors reported by recent versions of flake8.
>>>>
>>>> Changes in v2:
>>>> - added first patch to bump OVS submodule
>>>> - addressed Ales' review comments:
>>>>    - moved the logic to determine which image to use out of ci.sh;
>>>>      it's now in the workflow itself.
>>>>    - moved the setting of OVN_IMAGE into the same block as all
>>>>      related env vars
>>>> - added a note for maintainers in the release process documentation
>>>>    to mention that the new workflow will likely have to be triggered
>>>>    manually (at least once) when branching for a new release.  GitHub
>>>>    doesn't allow periodic jobs on non-default branches.
>>>
>>> IMHO, that sounds horrible to maintain.  Can we just build things per run?
>>
>> What if we set up the "Containers" action to run on branch creation? 
>> Would that adequately substitute for the manual run that creates the image?
>>
>> If we only have to run the action once each time a branch is created, I 
>> don't think that qualifies as "horrible" to maintain. This could be 
>> lumped into our branch creation scripts. However, if it can be automated 
>> as I suggested above, that would be better.
> 
> The job will never be re-run, meaning we'll be using the same container
> for many years.  If we ever need to update this container, it will
> need to be done manually for each branch, except for main.
> 
> Also, in the patch, Cirrus CI is set to use the container from main,
> so this will need to be adjusted for each branch and will need to become
> part of the release patches (?).
> 
> Another issue with the current implementation of containers is that it's
> a pain for an external contributor to change dependencies, because they'll
> have to modify the CI in multiple places in order to build and use their
> own containers.
> And these new changes will make that worse.  For example, I never use
> 'main' or 'branch-*' branches in my own forks, and CI now depends on the
> branch name, so it will fall back to an unmaintained old-style-named
> image.  It might not be easy to figure out why all your builds are
> failing, especially if you're new to the project.
> 

We could try to build the container image on every run on stable
branches and avoid any manual steps.
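Something like the following at the start of the run, before invoking
ci.sh (the Dockerfile path, image name, and mounts are illustrative
assumptions, not what the series currently does):

  # Build the test image from the branch's own Dockerfile on every run,
  # instead of pulling a pre-built image from a registry.
  docker build -t ovn-tests:local -f utils/Dockerfile .
  # Run the existing CI entry point inside the freshly built image.
  docker run --rm -v "$PWD:/workspace" -w /workspace ovn-tests:local ./ci.sh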

>>
>>> How much value do these containers actually have?

I think you mentioned the benefits further down already:
- we can share the artifact (container image) with Cirrus CI; the
  aarch64 container build takes significantly longer
- we can share the artifact with downstream CIs (see the sketch below)
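To illustrate the second point, a downstream CI would only need to pull
the published image; the registry path and tag here are made up for the
example:

  # Fetch the image published by the Containers workflow (name is
  # hypothetical) and run the test entry point inside it.
  docker pull ghcr.io/ovn-org/ovn-tests:main
  docker run --rm ghcr.io/ovn-org/ovn-tests:main ./ci.sh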

>>
>> As patch 2 mentions, it allows us to use different distro bases for
>> each version.
> 
> This can be done by simply specifying 'container: name' in the job;
> GHA will fetch and use that container.
> 
>> But I also think that having the container image cached
>> allows for quicker test runs, since we don't have to reconstruct the
>> environment for each test run.
> 
> It takes less than 2 minutes to create a full x86 environment.  We spend
> more time re-building OVS and OVN for every test variant, even though
> most of them use the same compiler flags.
> 
> Anyway, there is no need to push the container to a registry; we can
> build it and pass it to the next job via artifacts.  I don't think
> spending an extra 1-2 minutes per workflow (not per job, per workflow)
> is a significant problem if it allows us to just build whatever the
> current dependencies are and not think about what kind of outdated
> build from which branch is going to be used.
> 
> The only downside of this is that we can't share the artifact with
> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
> significant issue.  1-2 minutes per workflow is not that much.  The
> aarch64 container takes a lot more time to build, but after switching
> it from Fedora to Ubuntu it takes only 10 minutes, so it might be
> reasonable to just rebuild it on-demand as well.
> 
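FWIW, passing the image between jobs via artifacts could be as simple
as saving and loading it; the upload/download between jobs would use
the usual artifacts actions (image and file names below are
illustrative):

  # In the build job: serialize the freshly built image to a file that
  # can be uploaded as a workflow artifact.
  docker save ovn-tests:local | gzip > ovn-tests.tar.gz
  # In a later job: restore the image from the downloaded artifact.
  docker load < ovn-tests.tar.gz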

Ales, why do we need the Cirrus CI image to be Fedora based?

> Not a strong opinion, but the branch name thing might be the most annoying,
> if I understand correctly.
> 

In the (very) short term, however, my problem is that I can't merge any
bug fix in OVN because I can't get CI to pass at the moment.  That's
mostly because different versions of the OVN (and, via the submodule,
OVS) Python code trigger different flake8 warnings with different
flake8/python/distro versions.
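That's essentially what pinning is meant to avoid: if CI installed a
fixed linter version instead of whatever the distro/pip ships on a
given day, a new flake8 release couldn't silently change the result.
A minimal sketch, with an illustrative version number:

  # Pin the linter rather than taking the latest release; the exact
  # version here is just an example.
  python3 -m pip install 'flake8==6.1.0'
  python3 -m flake8 .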

A quick fix for now could be to bump the OVS submodule again to
fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what patch
1/3 does), but on _all_ stable branches.  That's not ideal because
that commit (and its backports) is not part of any release yet.
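The bump itself is mechanical; on each branch it would look roughly
like this (assuming the submodule checkout lives in ovs/ and the commit
above is reachable; the commit message is just an example):

  # Point the OVS submodule at the commit cited above and record it.
  git -C ovs fetch origin
  git -C ovs checkout fdbf0bb2aed5
  git add ovs
  git commit -s -m "Bump OVS submodule to include flake8 E721 fixes."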

I wanted to avoid doing that, but it seems we don't have an easy
alternative, so I'm starting to think we should just go for the OVS
submodule bump and try to find a better solution for the container
workflow in the longer term.

What do you guys think?

Thanks,
Dumitru
