Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk merged PR #38442: URL: https://github.com/apache/airflow/pull/38442 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1537942743 ## .github/workflows/additional-ci-image-checks.yml: ## @@ -142,7 +140,7 @@ jobs: - name: "Login to ghcr.io" run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin - name: "Check that image builds quickly" -run: breeze shell --max-time 120 +run: breeze shell --max-time 600 --platform "linux/amd64" Review Comment: Yes. there are few steps above that take usually ~ 1 minute :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1537943156 ## .github/workflows/additional-ci-image-checks.yml: ## @@ -142,7 +140,7 @@ jobs: - name: "Login to ghcr.io" run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin - name: "Check that image builds quickly" -run: breeze shell --max-time 120 +run: breeze shell --max-time 600 --platform "linux/amd64" Review Comment: (quite a bit less than that usually) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
o-nikolas commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1537925307 ## .github/workflows/additional-ci-image-checks.yml: ## @@ -142,7 +140,7 @@ jobs: - name: "Login to ghcr.io" run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin - name: "Check that image builds quickly" -run: breeze shell --max-time 120 +run: breeze shell --max-time 600 --platform "linux/amd64" Review Comment: Is `600`s (10min) timeout intentionally a minute lower than the 11 minute one above? ## .github/workflows/additional-ci-image-checks.yml: ## @@ -89,42 +89,40 @@ jobs: # delay cache refresh. It does not attempt to upgrade to newer dependencies. # We only push CI cache as PROD cache usually does not gain as much from fresh cache because # it uses prepared airflow and provider packages that invalidate the cache anyway most of the time - # push-early-buildx-cache-to-github-registry: - # name: Push Early Image Cache - # uses: ./.github/workflows/push-image-cache.yml - # permissions: - # contents: read - # # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs - # # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. - # # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the - # # "in-workflow-build" condition - # packages: write - # secrets: inherit - # with: - # runs-on: ${{ inputs.runs-on }} - # cache-type: "Early" - # include-prod-images: "false" - # push-latest-images: "false" - # image-tag: ${{ inputs.image-tag }} - # python-versions: ${{ inputs.python-versions }} - # branch: ${{ inputs.branch }} - # use-uv: "true" - # include-success-outputs: ${{ inputs.include-success-outputs }} - # constraints-branch: ${{ inputs.constraints-branch }} - # docker-cache: ${{ inputs.docker-cache }} - # if: inputs.canary-run == 'true' && inputs.branch == 'main' + push-early-buildx-cache-to-github-registry: +name: Push Early Image Cache +uses: ./.github/workflows/push-image-cache.yml +permissions: + contents: read + # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs + # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. + # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the + # "in-workflow-build" condition + packages: write +secrets: inherit +with: + # Runs on Public runners + cache-type: "Early" + include-prod-images: "false" + push-latest-images: "false" + platform: "linux/amd64" + python-versions: ${{ inputs.python-versions }} + branch: ${{ inputs.branch }} + constraints-branch: ${{ inputs.constraints-branch }} + use-uv: "true" + include-success-outputs: ${{ inputs.include-success-outputs }} + docker-cache: ${{ inputs.docker-cache }} +if: inputs.canary-run == 'true' && inputs.branch == 'main' # Check that after earlier cache push, breeze command will build quickly check-that-image-builds-quickly: -timeout-minutes: 5 +timeout-minutes: 11 Review Comment: Nice catch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on PR #38442: URL: https://github.com/apache/airflow/pull/38442#issuecomment-2017747862 Looks green :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on PR #38442: URL: https://github.com/apache/airflow/pull/38442#issuecomment-2017463146 With this one we are back in the realm of ~ 2 m average for building image for regular builds (down from ~ 7m seen recently) . So the refactor done recently can be finally concluded with some cleanups once we merge that one :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536985669 ## .github/workflows/ci-image-build.yml: ## @@ -195,19 +194,19 @@ ${{ inputs.do-build == 'true' && inputs.image-tag || '' }}" run: > breeze ci-image build --tag-as-latest --image-tag "${{ inputs.image-tag }}" --python "${{ matrix.python-version }}" - --platform "linux/${{ inputs.platform }}" + --platform "${{ inputs.platform }}" env: DOCKER_CACHE: ${{ inputs.docker-cache }} INSTALL_MYSQL_CLIENT_TYPE: ${{ inputs.install-mysql-client-type }} UPGRADE_TO_NEWER_DEPENDENCIES: ${{ inputs.upgrade-to-newer-dependencies }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - BUILDER: ${{ inputs.platform == 'amd64' && 'default' || 'airflow_cache' }} + BUILDER: "airflow_cache" Review Comment: That was the clue. BUILDER should be always set to the same `airflow_cache` and then the cache will be nicely reused and rebuilt. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536951037 ## .github/workflows/finalize-tests.yml: ## @@ -120,31 +120,48 @@ jobs: run: git push - # Push BuildX cache to GitHub Registry in Apache repository, if all tests are successful and build - # is executed as result of direct push to "main" or one of the "vX-Y-test" branches - # It rebuilds all images using just-pushed constraints using buildx and pushes them to registry - # It will automatically check if a new python image was released and will pull the latest one if needed - # push-buildx-cache-to-github-registry: - #name: Push Regular Image Cache - #needs: [update-constraints] - #uses: ./.github/workflows/push-image-cache.yml - #permissions: - # contents: read - # packages: write - #secrets: inherit - #with: - # runs-on: ${{ inputs.runs-on }} - # cache-type: "Regular" - # include-prod-images: "true" - # push-latest-images: "true" - # use-uv: "true" - # image-tag: ${{ inputs.image-tag }} - # python-versions: ${{ inputs.python-versions }} - # branch: ${{ inputs.branch }} - # constraints-branch: ${{ inputs.constraints-branch }} - # include-success-outputs: ${{ inputs.include-success-outputs }} - # docker-cache: ${{ inputs.docker-cache }} - #if: inputs.canary-run == 'true' + push-buildx-cache-to-github-registry-amd: Review Comment: We have now "final" cache built for both AMD and ARM but in two separate workflows. This way AMD can be run using public runners but ARM will use self-hosted ones/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536950612 ## .github/workflows/additional-ci-image-checks.yml: ## @@ -89,42 +89,40 @@ jobs: # delay cache refresh. It does not attempt to upgrade to newer dependencies. # We only push CI cache as PROD cache usually does not gain as much from fresh cache because # it uses prepared airflow and provider packages that invalidate the cache anyway most of the time - # push-early-buildx-cache-to-github-registry: - # name: Push Early Image Cache - # uses: ./.github/workflows/push-image-cache.yml - # permissions: - # contents: read - # # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs - # # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. - # # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the - # # "in-workflow-build" condition - # packages: write - # secrets: inherit - # with: - # runs-on: ${{ inputs.runs-on }} - # cache-type: "Early" - # include-prod-images: "false" - # push-latest-images: "false" - # image-tag: ${{ inputs.image-tag }} - # python-versions: ${{ inputs.python-versions }} - # branch: ${{ inputs.branch }} - # use-uv: "true" - # include-success-outputs: ${{ inputs.include-success-outputs }} - # constraints-branch: ${{ inputs.constraints-branch }} - # docker-cache: ${{ inputs.docker-cache }} - # if: inputs.canary-run == 'true' && inputs.branch == 'main' + push-early-buildx-cache-to-github-registry: +name: Push Early Image Cache +uses: ./.github/workflows/push-image-cache.yml +permissions: + contents: read + # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs + # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. + # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the + # "in-workflow-build" condition + packages: write +secrets: inherit +with: + # Runs on Public runners + cache-type: "Early" + include-prod-images: "false" + push-latest-images: "false" + platform: "linux/amd64" + python-versions: ${{ inputs.python-versions }} + branch: ${{ inputs.branch }} + constraints-branch: ${{ inputs.constraints-branch }} + use-uv: "true" + include-success-outputs: ${{ inputs.include-success-outputs }} + docker-cache: ${{ inputs.docker-cache }} +if: inputs.canary-run == 'true' && inputs.branch == 'main' # Check that after earlier cache push, breeze command will build quickly check-that-image-builds-quickly: -timeout-minutes: 5 +timeout-minutes: 11 Review Comment: This job had not been tesitng what it was supposed to be testing. Having IMAGE_TAG, it was just pulling the image rarther than building it using cache and timeout of 2 minutes was far too low for building it. We might tweak it in the future as well when the cache gets stabilized. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536950198 ## .github/workflows/additional-ci-image-checks.yml: ## @@ -89,42 +89,40 @@ jobs: # delay cache refresh. It does not attempt to upgrade to newer dependencies. # We only push CI cache as PROD cache usually does not gain as much from fresh cache because # it uses prepared airflow and provider packages that invalidate the cache anyway most of the time - # push-early-buildx-cache-to-github-registry: - # name: Push Early Image Cache - # uses: ./.github/workflows/push-image-cache.yml - # permissions: - # contents: read - # # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs - # # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. - # # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the - # # "in-workflow-build" condition - # packages: write - # secrets: inherit - # with: - # runs-on: ${{ inputs.runs-on }} - # cache-type: "Early" - # include-prod-images: "false" - # push-latest-images: "false" - # image-tag: ${{ inputs.image-tag }} - # python-versions: ${{ inputs.python-versions }} - # branch: ${{ inputs.branch }} - # use-uv: "true" - # include-success-outputs: ${{ inputs.include-success-outputs }} - # constraints-branch: ${{ inputs.constraints-branch }} - # docker-cache: ${{ inputs.docker-cache }} - # if: inputs.canary-run == 'true' && inputs.branch == 'main' + push-early-buildx-cache-to-github-registry: +name: Push Early Image Cache +uses: ./.github/workflows/push-image-cache.yml +permissions: + contents: read + # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs + # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. + # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the + # "in-workflow-build" condition + packages: write +secrets: inherit +with: + # Runs on Public runners Review Comment: We build the AMD images using public runners, to save "self-hosted" ones for heavier jobs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536950005 ## .github/workflows/additional-ci-image-checks.yml: ## @@ -89,42 +89,40 @@ jobs: # delay cache refresh. It does not attempt to upgrade to newer dependencies. # We only push CI cache as PROD cache usually does not gain as much from fresh cache because # it uses prepared airflow and provider packages that invalidate the cache anyway most of the time - # push-early-buildx-cache-to-github-registry: - # name: Push Early Image Cache - # uses: ./.github/workflows/push-image-cache.yml - # permissions: - # contents: read - # # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs - # # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo. - # # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the - # # "in-workflow-build" condition - # packages: write - # secrets: inherit - # with: - # runs-on: ${{ inputs.runs-on }} - # cache-type: "Early" - # include-prod-images: "false" - # push-latest-images: "false" - # image-tag: ${{ inputs.image-tag }} - # python-versions: ${{ inputs.python-versions }} - # branch: ${{ inputs.branch }} - # use-uv: "true" - # include-success-outputs: ${{ inputs.include-success-outputs }} - # constraints-branch: ${{ inputs.constraints-branch }} - # docker-cache: ${{ inputs.docker-cache }} - # if: inputs.canary-run == 'true' && inputs.branch == 'main' + push-early-buildx-cache-to-github-registry: Review Comment: Re-enable pushing early image cache. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536949904 ## dev/breeze/src/airflow_breeze/utils/docker_command_utils.py: ## @@ -381,12 +381,15 @@ def prepare_base_build_command(image_params: CommonBuildParams) -> list[str]: ] ) if not image_params.docker_host: +builder = get_and_use_docker_context(image_params.builder) build_command_param.extend( [ "--builder", -get_and_use_docker_context(image_params.builder), +builder, ] ) +if builder != "default": Review Comment: When we are using builder different than default we use `--load` flag to load the built image to local docker engine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536949488 ## dev/breeze/src/airflow_breeze/utils/docker_command_utils.py: ## @@ -658,6 +661,7 @@ def autodetect_docker_context(): def get_and_use_docker_context(context: str): if context == "autodetect": context = autodetect_docker_context() +run_command(["docker", "context", "create", context], check=False) Review Comment: Creates missing context if needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk commented on code in PR #38442: URL: https://github.com/apache/airflow/pull/38442#discussion_r1536949451 ## dev/breeze/src/airflow_breeze/utils/image.py: ## @@ -197,18 +197,8 @@ def tag_image_as_latest(image_params: CommonBuildParams, output: Output | None) check=False, ) if command.returncode != 0: -return command -if image_params.push: -command = run_command( -[ -"docker", -"push", -image_params.airflow_image_name + ":latest", -], -output=output, -capture_output=True, -check=False, -) +get_console(output=output).print(command.stdout) Review Comment: Adds diagnostics in case `docker tag fails` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Fix image cache optimizations - speeding up the build [airflow]
potiuk opened a new pull request, #38442: URL: https://github.com/apache/airflow/pull/38442 The recent refactors in workflows broke the way how cache had been used in the CI builds. This PR brings back the optimizations by using the cache and rebuilding it. --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org