On Tue, Feb 09, 2021 at 09:58:29AM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 09, 2021 at 07:37:51AM +0100, Thomas Huth wrote:
> > On 08/02/2021 17.33, Daniel P. Berrangé wrote:
> > [...]
> > > For example, consider pushing 5 commits, one of which contains a
> > > dockerfile change. This will trigger a CI pipeline for the
> > > containers. Now consider you do some more work on the branch and push 3
> > > further commits, so you now have a branch of 8 commits. For the second
> > > push GitLab will only look at the 3 most recent commits, the other 5
> > > were already present. Thus GitLab will not realize that the branch has
> > > dockerfile changes that need to trigger the container build.
> > >
> > > This can cause real world problems:
> > >
> > >  - Push 5 commits to branch "foo", including a dockerfile change
> > >
> > >     => rebuilds the container images with content from "foo"
> > >     => build jobs runs against containers from "foo"
> > >
> > >  - Refresh your master branch with latest upstream master
> > >
> > >     => rebuilds the container images with content from "master"
> > >     => build jobs runs against containers from "master"
> > >
> > >  - Push 3 more commits to branch "foo", with no dockerfile change
> > >
> > >     => no container rebuild triggers
> > >     => build jobs runs against containers from "master"
> > >
> > > The "changes" conditional in gitlab is OK, *provided* your build
> > > jobs are not relying on any external state from previous builds.
> > >
> > > This is NOT the case in QEMU, because we are building container
> > > images and these are cached. This is a scenario in which the
> > > "changes" conditional is not usuable.
> > >
> > > The only other way to avoid this problem would be to use the git
> > > branch name as the container image tag, instead of always using
> > > "latest".
> >
> > I'm basically fine with your patch, but let me ask one more thing: Won't we
> > still have the problem if the user pushes to different branches
> > simultaneously? E.g. the user pushes to "foo" with changes to dockerfiles,
> > containers start to get rebuild, then pushes to master without waiting for
> > the previous CI to finish, then the containers get rebuild from the "master"
> > job without the local changes to the dockerfiles. Then in the "foo" CI
> > pipelines the following jobs might run with the containers that have been
> > built by the "master" job...
>
> Yes, this is the issue I describe in the cover letter.
>
> > So if we really want to get it bulletproof, do we have to use the git branch
> > name as the container image tag?
>
> That is possible, but I'm somewhat loathe to do that, as it means the
> container registry in developers forks will accumulate a growing list
> of image tags. I know gitlab will force expire once it gets beyond a
> certain number of tags, but it still felt pretty wasteful of space
> to create so many tags.
>
> Having said that, maybe this is not actually wasteful if we always
> use the "master" as a cache for docker, then the "new" images we
> build on each branch will just re-use existing docker layers and
> thus not add to disk usage. We'd only see extra usage if the branch
> contained changes to dockerfiles.

The challenge here is that I need the docker tag name to be in an env
variable in the gitlab-ci.yml file. I can directly use $CI_COMMIT_REF_NAME
to get the branch name, but the list of valid characters for a git branch
is way more permissive than the valid characters for a docker tag.

So we need to filter the git branch name to form a valid docker tag,
and AFAICT, there's no way to do that when setting a global env variable
in the gitlab-ci.yml. I can only do the filtering once we are in the
before_script: stage, and that's too late to use it in the image name
for the job.

We could ignore the problem and hope people always have sane branch
names?

  https://docs.docker.com/engine/reference/commandline/tag/

  "A tag name must be valid ASCII and may contain lowercase and
   uppercase letters, digits, underscores, periods and dashes.
   A tag name may not start with a period or a dash and may
   contain a maximum of 128 characters."

That rule would cover all my git branch names, but then ASCII covers
most common English needs. I worry that we might have contributors
who genuinely use non-ASCII chars in their git branch names, especially
speakers of non-English / non-European languages, e.g. Persian, Chinese
or Japanese.

Git is very permissive, allowing everything except a short list

  https://www.spinics.net/lists/git/msg133704.html

  "A branch name can not:
    - Have a path component that begins with "."
    - Have a double dot ".."
    - Have an ASCII control character, "~", "^", ":" or SP, anywhere
    - End with a "/"
    - End with ".lock"
    - Contain a "\" (backslash)"

The result will be that if someone names their git branch "🏂", then
all the CI jobs will fail in gitlab.

  $ git branch 🏂
    => works

  $ docker tag 470671670cac foo:🏂
  Error: invalid reference format
    => fails

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
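
For illustration only, the kind of filtering described above could look
roughly like the sketch below. The function name sanitize_tag and the
fallback tag "unnamed" are hypothetical and not taken from any QEMU CI
script. It assumes a POSIX shell with tr, sed and cut available, and it
applies the Docker rule quoted above: only ASCII letters, digits,
underscores, periods and dashes, no leading period or dash, at most 128
characters. Since it has to run as a script step (e.g. from
before_script:), it does not remove the limitation that the result comes
too late for the job's image name.

```shell
#!/bin/sh
# Hypothetical helper: map a git branch name onto the character set that
# Docker accepts for image tags.
sanitize_tag() {
    # 1. Replace every byte outside [A-Za-z0-9_.-] with "-". LC_ALL=C makes
    #    tr operate on bytes, so one multi-byte UTF-8 character turns into
    #    several dashes.
    # 2. Drop leading periods and dashes, which a tag may not start with.
    # 3. Truncate to the 128 character limit.
    _tag=$(printf '%s' "$1" \
        | LC_ALL=C tr -c 'A-Za-z0-9_.-' '-' \
        | sed 's/^[.-]*//' \
        | cut -c1-128)
    # Fall back to a fixed (assumed) name if nothing usable is left.
    printf '%s\n' "${_tag:-unnamed}"
}

sanitize_tag 'feature/foo'   # prints "feature-foo"
sanitize_tag '🏂'            # prints "unnamed"
```

Note that such a mapping is lossy: the branches "foo/bar" and "foo-bar"
would end up with the same tag, and every branch whose name contains no
usable characters would share the fallback tag.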