Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-25 Thread via GitHub


potiuk merged PR #38442:
URL: https://github.com/apache/airflow/pull/38442


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-25 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1537942743


##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -142,7 +140,7 @@ jobs:
   - name: "Login to ghcr.io"
 run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ 
github.actor }} --password-stdin
   - name: "Check that image builds quickly"
-run: breeze shell --max-time 120
+run: breeze shell --max-time 600 --platform "linux/amd64"

Review Comment:
   Yes. there are few steps above that take usually ~ 1 minute :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-25 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1537943156


##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -142,7 +140,7 @@ jobs:
   - name: "Login to ghcr.io"
 run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ 
github.actor }} --password-stdin
   - name: "Check that image builds quickly"
-run: breeze shell --max-time 120
+run: breeze shell --max-time 600 --platform "linux/amd64"

Review Comment:
   (quite a bit less than that usually)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-25 Thread via GitHub


o-nikolas commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1537925307


##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -142,7 +140,7 @@ jobs:
   - name: "Login to ghcr.io"
 run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ 
github.actor }} --password-stdin
   - name: "Check that image builds quickly"
-run: breeze shell --max-time 120
+run: breeze shell --max-time 600 --platform "linux/amd64"

Review Comment:
   Is `600`s (10min) timeout intentionally a minute lower than the 11 minute 
one above?



##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -89,42 +89,40 @@ jobs:
   # delay cache refresh. It does not attempt to upgrade to newer dependencies.
   # We only push CI cache as PROD cache usually does not gain as much from 
fresh cache because
   # it uses prepared airflow and provider packages that invalidate the cache 
anyway most of the time
-  # push-early-buildx-cache-to-github-registry:
-  #   name: Push Early Image Cache
-  #   uses: ./.github/workflows/push-image-cache.yml
-  #   permissions:
-  # contents: read
-  # # This write is only given here for `push` events from 
"apache/airflow" repo. It is not given for PRs
-  # # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
-  # # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
-  # # "in-workflow-build" condition
-  # packages: write
-  #   secrets: inherit
-  #   with:
-  # runs-on: ${{ inputs.runs-on }}
-  # cache-type: "Early"
-  # include-prod-images: "false"
-  # push-latest-images: "false"
-  # image-tag: ${{ inputs.image-tag }}
-  # python-versions: ${{ inputs.python-versions }}
-  # branch: ${{ inputs.branch }}
-  # use-uv: "true"
-  # include-success-outputs: ${{ inputs.include-success-outputs }}
-  # constraints-branch: ${{ inputs.constraints-branch }}
-  # docker-cache: ${{ inputs.docker-cache }}
-  #   if: inputs.canary-run == 'true' && inputs.branch == 'main'
+  push-early-buildx-cache-to-github-registry:
+name: Push Early Image Cache
+uses: ./.github/workflows/push-image-cache.yml
+permissions:
+  contents: read
+  # This write is only given here for `push` events from "apache/airflow" 
repo. It is not given for PRs
+  # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
+  # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
+  # "in-workflow-build" condition
+  packages: write
+secrets: inherit
+with:
+  # Runs on Public runners
+  cache-type: "Early"
+  include-prod-images: "false"
+  push-latest-images: "false"
+  platform: "linux/amd64"
+  python-versions: ${{ inputs.python-versions }}
+  branch: ${{ inputs.branch }}
+  constraints-branch: ${{ inputs.constraints-branch }}
+  use-uv: "true"
+  include-success-outputs: ${{ inputs.include-success-outputs }}
+  docker-cache: ${{ inputs.docker-cache }}
+if: inputs.canary-run == 'true' && inputs.branch == 'main'
 
   # Check that after earlier cache push, breeze command will build quickly
   check-that-image-builds-quickly:
-timeout-minutes: 5
+timeout-minutes: 11

Review Comment:
   Nice catch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-25 Thread via GitHub


potiuk commented on PR #38442:
URL: https://github.com/apache/airflow/pull/38442#issuecomment-2017747862

   Looks green :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-25 Thread via GitHub


potiuk commented on PR #38442:
URL: https://github.com/apache/airflow/pull/38442#issuecomment-2017463146

   With this one we are back in the realm of ~ 2 m average for building image 
for regular builds (down from ~ 7m seen recently) . So the refactor done 
recently can be finally concluded with some cleanups once we merge that one :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536985669


##
.github/workflows/ci-image-build.yml:
##
@@ -195,19 +194,19 @@ ${{ inputs.do-build == 'true' && inputs.image-tag || '' 
}}"
 run: >
   breeze ci-image build --tag-as-latest --image-tag "${{ 
inputs.image-tag }}"
   --python "${{ matrix.python-version }}"
-  --platform "linux/${{ inputs.platform }}"
+  --platform "${{ inputs.platform }}"
 env:
   DOCKER_CACHE: ${{ inputs.docker-cache }}
   INSTALL_MYSQL_CLIENT_TYPE: ${{ inputs.install-mysql-client-type }}
   UPGRADE_TO_NEWER_DEPENDENCIES: ${{ 
inputs.upgrade-to-newer-dependencies }}
   GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-  BUILDER: ${{ inputs.platform == 'amd64' && 'default' || 
'airflow_cache' }}
+  BUILDER: "airflow_cache"

Review Comment:
   That was the clue. BUILDER should be always set to the same `airflow_cache` 
and then the cache will be nicely reused and rebuilt.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536951037


##
.github/workflows/finalize-tests.yml:
##
@@ -120,31 +120,48 @@ jobs:
 run:
   git push
 
-  # Push BuildX cache to GitHub Registry in Apache repository, if all tests 
are successful and build
-  # is executed as result of direct push to "main" or one of the "vX-Y-test" 
branches
-  # It rebuilds all images using just-pushed constraints using buildx and 
pushes them to registry
-  # It will automatically check if a new python image was released and will 
pull the latest one if needed
-  #  push-buildx-cache-to-github-registry:
-  #name: Push Regular Image Cache
-  #needs: [update-constraints]
-  #uses: ./.github/workflows/push-image-cache.yml
-  #permissions:
-  #  contents: read
-  #  packages: write
-  #secrets: inherit
-  #with:
-  #  runs-on: ${{ inputs.runs-on }}
-  #  cache-type: "Regular"
-  #  include-prod-images: "true"
-  #  push-latest-images: "true"
-  #  use-uv: "true"
-  #  image-tag: ${{ inputs.image-tag }}
-  #  python-versions: ${{ inputs.python-versions }}
-  #  branch: ${{ inputs.branch }}
-  #  constraints-branch: ${{ inputs.constraints-branch }}
-  #  include-success-outputs: ${{ inputs.include-success-outputs }}
-  #  docker-cache: ${{ inputs.docker-cache }}
-  #if: inputs.canary-run == 'true'
+  push-buildx-cache-to-github-registry-amd:

Review Comment:
   We have now "final" cache built for both AMD and ARM but in two separate 
workflows. This way AMD can be run using public runners but ARM will use 
self-hosted ones/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536950612


##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -89,42 +89,40 @@ jobs:
   # delay cache refresh. It does not attempt to upgrade to newer dependencies.
   # We only push CI cache as PROD cache usually does not gain as much from 
fresh cache because
   # it uses prepared airflow and provider packages that invalidate the cache 
anyway most of the time
-  # push-early-buildx-cache-to-github-registry:
-  #   name: Push Early Image Cache
-  #   uses: ./.github/workflows/push-image-cache.yml
-  #   permissions:
-  # contents: read
-  # # This write is only given here for `push` events from 
"apache/airflow" repo. It is not given for PRs
-  # # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
-  # # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
-  # # "in-workflow-build" condition
-  # packages: write
-  #   secrets: inherit
-  #   with:
-  # runs-on: ${{ inputs.runs-on }}
-  # cache-type: "Early"
-  # include-prod-images: "false"
-  # push-latest-images: "false"
-  # image-tag: ${{ inputs.image-tag }}
-  # python-versions: ${{ inputs.python-versions }}
-  # branch: ${{ inputs.branch }}
-  # use-uv: "true"
-  # include-success-outputs: ${{ inputs.include-success-outputs }}
-  # constraints-branch: ${{ inputs.constraints-branch }}
-  # docker-cache: ${{ inputs.docker-cache }}
-  #   if: inputs.canary-run == 'true' && inputs.branch == 'main'
+  push-early-buildx-cache-to-github-registry:
+name: Push Early Image Cache
+uses: ./.github/workflows/push-image-cache.yml
+permissions:
+  contents: read
+  # This write is only given here for `push` events from "apache/airflow" 
repo. It is not given for PRs
+  # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
+  # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
+  # "in-workflow-build" condition
+  packages: write
+secrets: inherit
+with:
+  # Runs on Public runners
+  cache-type: "Early"
+  include-prod-images: "false"
+  push-latest-images: "false"
+  platform: "linux/amd64"
+  python-versions: ${{ inputs.python-versions }}
+  branch: ${{ inputs.branch }}
+  constraints-branch: ${{ inputs.constraints-branch }}
+  use-uv: "true"
+  include-success-outputs: ${{ inputs.include-success-outputs }}
+  docker-cache: ${{ inputs.docker-cache }}
+if: inputs.canary-run == 'true' && inputs.branch == 'main'
 
   # Check that after earlier cache push, breeze command will build quickly
   check-that-image-builds-quickly:
-timeout-minutes: 5
+timeout-minutes: 11

Review Comment:
   This job had not been tesitng what it was supposed to be testing. Having 
IMAGE_TAG, it was just pulling the image rarther than building it using cache 
and timeout of 2 minutes was far too low for building it. We might tweak it in 
the future as well when the cache gets stabilized.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536950198


##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -89,42 +89,40 @@ jobs:
   # delay cache refresh. It does not attempt to upgrade to newer dependencies.
   # We only push CI cache as PROD cache usually does not gain as much from 
fresh cache because
   # it uses prepared airflow and provider packages that invalidate the cache 
anyway most of the time
-  # push-early-buildx-cache-to-github-registry:
-  #   name: Push Early Image Cache
-  #   uses: ./.github/workflows/push-image-cache.yml
-  #   permissions:
-  # contents: read
-  # # This write is only given here for `push` events from 
"apache/airflow" repo. It is not given for PRs
-  # # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
-  # # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
-  # # "in-workflow-build" condition
-  # packages: write
-  #   secrets: inherit
-  #   with:
-  # runs-on: ${{ inputs.runs-on }}
-  # cache-type: "Early"
-  # include-prod-images: "false"
-  # push-latest-images: "false"
-  # image-tag: ${{ inputs.image-tag }}
-  # python-versions: ${{ inputs.python-versions }}
-  # branch: ${{ inputs.branch }}
-  # use-uv: "true"
-  # include-success-outputs: ${{ inputs.include-success-outputs }}
-  # constraints-branch: ${{ inputs.constraints-branch }}
-  # docker-cache: ${{ inputs.docker-cache }}
-  #   if: inputs.canary-run == 'true' && inputs.branch == 'main'
+  push-early-buildx-cache-to-github-registry:
+name: Push Early Image Cache
+uses: ./.github/workflows/push-image-cache.yml
+permissions:
+  contents: read
+  # This write is only given here for `push` events from "apache/airflow" 
repo. It is not given for PRs
+  # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
+  # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
+  # "in-workflow-build" condition
+  packages: write
+secrets: inherit
+with:
+  # Runs on Public runners

Review Comment:
   We build the AMD images using public runners, to save "self-hosted" ones for 
heavier jobs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536950005


##
.github/workflows/additional-ci-image-checks.yml:
##
@@ -89,42 +89,40 @@ jobs:
   # delay cache refresh. It does not attempt to upgrade to newer dependencies.
   # We only push CI cache as PROD cache usually does not gain as much from 
fresh cache because
   # it uses prepared airflow and provider packages that invalidate the cache 
anyway most of the time
-  # push-early-buildx-cache-to-github-registry:
-  #   name: Push Early Image Cache
-  #   uses: ./.github/workflows/push-image-cache.yml
-  #   permissions:
-  # contents: read
-  # # This write is only given here for `push` events from 
"apache/airflow" repo. It is not given for PRs
-  # # from forks. This is to prevent malicious PRs from creating images in 
the "apache/airflow" repo.
-  # # For regular build for PRS this "build-prod-images" workflow will be 
skipped anyway by the
-  # # "in-workflow-build" condition
-  # packages: write
-  #   secrets: inherit
-  #   with:
-  # runs-on: ${{ inputs.runs-on }}
-  # cache-type: "Early"
-  # include-prod-images: "false"
-  # push-latest-images: "false"
-  # image-tag: ${{ inputs.image-tag }}
-  # python-versions: ${{ inputs.python-versions }}
-  # branch: ${{ inputs.branch }}
-  # use-uv: "true"
-  # include-success-outputs: ${{ inputs.include-success-outputs }}
-  # constraints-branch: ${{ inputs.constraints-branch }}
-  # docker-cache: ${{ inputs.docker-cache }}
-  #   if: inputs.canary-run == 'true' && inputs.branch == 'main'
+  push-early-buildx-cache-to-github-registry:

Review Comment:
   Re-enable pushing early image cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536949904


##
dev/breeze/src/airflow_breeze/utils/docker_command_utils.py:
##
@@ -381,12 +381,15 @@ def prepare_base_build_command(image_params: 
CommonBuildParams) -> list[str]:
 ]
 )
 if not image_params.docker_host:
+builder = get_and_use_docker_context(image_params.builder)
 build_command_param.extend(
 [
 "--builder",
-get_and_use_docker_context(image_params.builder),
+builder,
 ]
 )
+if builder != "default":

Review Comment:
   When we are using builder different than default we use `--load` flag to 
load the built image to local docker engine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536949488


##
dev/breeze/src/airflow_breeze/utils/docker_command_utils.py:
##
@@ -658,6 +661,7 @@ def autodetect_docker_context():
 def get_and_use_docker_context(context: str):
 if context == "autodetect":
 context = autodetect_docker_context()
+run_command(["docker", "context", "create", context], check=False)

Review Comment:
   Creates missing context if needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk commented on code in PR #38442:
URL: https://github.com/apache/airflow/pull/38442#discussion_r1536949451


##
dev/breeze/src/airflow_breeze/utils/image.py:
##
@@ -197,18 +197,8 @@ def tag_image_as_latest(image_params: CommonBuildParams, 
output: Output | None)
 check=False,
 )
 if command.returncode != 0:
-return command
-if image_params.push:
-command = run_command(
-[
-"docker",
-"push",
-image_params.airflow_image_name + ":latest",
-],
-output=output,
-capture_output=True,
-check=False,
-)
+get_console(output=output).print(command.stdout)

Review Comment:
   Adds diagnostics in case `docker tag fails`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Fix image cache optimizations - speeding up the build [airflow]

2024-03-24 Thread via GitHub


potiuk opened a new pull request, #38442:
URL: https://github.com/apache/airflow/pull/38442

   The recent refactors in workflows broke the way how cache had been used in 
the CI builds. This PR brings back the optimizations by using the cache and 
rebuilding it.
   
   
   
   
   
   
   
   
   ---
   **^ Add meaningful description above**
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a 
newsfragment file, named `{pr_number}.significant.rst` or 
`{issue_number}.significant.rst`, in 
[newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org