[ 
https://issues.apache.org/jira/browse/HADOOP-19859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075764#comment-18075764
 ] 

ASF GitHub Bot commented on HADOOP-19859:
-----------------------------------------

ajfabbri commented on code in PR #8451:
URL: https://github.com/apache/hadoop/pull/8451#discussion_r3132362703


##########
.github/workflows/build_image_cache.yml:
##########
@@ -0,0 +1,53 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: Image Cache
+
+# Security: write privileges are safe since this is triggered only by
+# `push` and `workflow_dispatch` (implying user has write access).
+on:
+  # Run jobs when a commit is merged
+  push:
+    branches:
+      - 'trunk'
+      - 'branch-*'
+    paths:
+      - 'dev-support/docker/**'

Review Comment:
   Cool. Makes sense to rebuild when anything in here changes. In the future we 
might just publish an image and use it directly instead of always doing a 
cached build? We can iterate on it though. 👍 
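   
   A rough sketch of what "use it directly" could look like, assuming the 
   `-static` tag pushed by this PR is pullable by the consuming job. The job name 
   and the Maven command are illustrative only and not part of this PR:
   
   ```yaml
   # Hypothetical consumer job: run steps inside the published image
   # instead of rebuilding it from the registry cache on every run.
   jobs:
     build:
       runs-on: ubuntu-24.04
       container:
         # Assumed tag: derived from the `tags:` template in this PR
         # with os=ubuntu_24 and ref_name=trunk.
         image: ghcr.io/apache/hadoop/gha-build-ubuntu_24-image-cache:trunk-static
       steps:
         - uses: actions/checkout@v6
         # Illustrative build command only.
         - run: mvn -B -DskipTests install
   ```
   
   The trade-off would be that the image is only as fresh as the last push to 
   `dev-support/docker/**`, which matches the `paths:` trigger here.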



##########
.github/workflows/build_image_cache.yml:
##########
@@ -0,0 +1,51 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: Image Cache
+
+on:
+  # Run jobs when a commit is merged
+  push:
+    branches:
+      - 'trunk'
+      - 'branch-*'
+    paths:
+      - 'dev-support/docker/**'
+  workflow_dispatch:

Review Comment:
   Agreed. It is a "trusted" action. (We are still careful to follow best 
practices below, and our CodeQL scanning will help enforce that going forward.)
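   
   For reference, a minimal sketch of how the caller side might invoke the 
   template. The job name is hypothetical and the rest of this file is not shown 
   in the hunk above. As far as I understand, a called workflow can only keep or 
   reduce the caller's `GITHUB_TOKEN` permissions, so the caller job has to grant 
   `packages: write` for the template's push to work:
   
   ```yaml
   # Hypothetical caller job in build_image_cache.yml.
   jobs:
     build-image-cache:
       # The reusable workflow cannot elevate permissions on its own,
       # so write access to packages is granted here, per job.
       permissions:
         packages: write
       uses: ./.github/workflows/tmpl_build_image_cache.yml
       with:
         os: ubuntu_24
   ```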



##########
.github/workflows/tmpl_build_image_cache.yml:
##########
@@ -0,0 +1,62 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: Build Image Cache
+
+on:
+  workflow_call:
+    inputs:
+      os:
+        required: false
+        type: string
+        description: Operating system to create build image cache for.
+        default: ubuntu_24
+
+# Default to minimal permissions for workflow.
+permissions:
+  packages: read
+
+jobs:
+  main:
+    name: build-image-cache-${{ inputs.os }}-${{ github.ref_name }}
+    if: github.repository == 'apache/hadoop'
+    runs-on: ubuntu-24.04
+    permissions:
+      packages: write
+    steps:
+      - name: Checkout Hadoop repository
+        uses: actions/checkout@v6
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
+      - name: Login to DockerHub
+        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Build image cache for ${{ inputs.os }}-${{ github.ref_name }}
+        id: docker_build
+        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
+        with:
+          context: ./dev-support/docker/
+          file: ./dev-support/docker/Dockerfile_${{ inputs.os }}
+          push: true
+          tags: ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}-static
+          cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}
+          cache-to: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }},mode=max

Review Comment:
   Just getting familiar with this and reading docs. Is this based on the Spark 
CI workflows?
   
   `type=registry` ([docs](https://docs.docker.com/build/cache/backends/)) 
   
   > registry: embeds the build cache into a separate image, and pushes to a 
dedicated location separate from the main output.
   
   `cache-to:` exports the cache to a particular backend (here, the registry) 
after a build. `cache-from:` specifies where to import the cache from at the 
start of a build. IIUC the local BuildKit cache is always enabled but does not 
persist between runs, so it only helps with multiple builds within the same 
workflow run.
   
   The locations passed in (`ref=`) act as the key for the cache lookup, and we 
separate these by OS and branch name.
   
   `mode=max` exports all intermediate layers of the image build, whereas 
`mode=min` (the default) only exports the layers that end up in the final 
image. This looks good to me. 👍 
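   
   To make the import side concrete, here is a sketch of a read-only consumer 
   step, assuming a Buildx setup step ran earlier in the same job. The step name 
   and the local tag `hadoop-build:local` are illustrative and not part of this 
   PR. The `ref=` value is the template's cache ref with `os=ubuntu_24` and 
   `ref_name=trunk` substituted:
   
   ```yaml
   # Hypothetical consumer step: import the registry cache, export nothing.
   - name: Build CI image from registry cache
     uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
     with:
       context: ./dev-support/docker/
       file: ./dev-support/docker/Dockerfile_ubuntu_24
       push: false
       # Load the result into the local Docker image store for later steps.
       load: true
       # Illustrative local tag.
       tags: hadoop-build:local
       # No cache-to here, so this job never writes to the shared cache.
       cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-ubuntu_24-image-cache:trunk
   ```
   
   Leaving out `cache-to` means such a job would only need `packages: read`, 
   which lines up with the workflow-level default in this file.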
   





> Use cache to speed up GHA infra image building
> ----------------------------------------------
>
>                 Key: HADOOP-19859
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19859
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Cheng Pan
>            Priority: Major
>              Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
