kou commented on code in PR #47600:
URL: https://github.com/apache/arrow/pull/47600#discussion_r2372243104
##########
dev/release/04-binary-download.sh:
##########
@@ -39,3 +42,22 @@ crossbow_job_prefix="release-${version_with_rc}"
: ${CROSSBOW_JOB_ID:="${crossbow_job_prefix}-${CROSSBOW_JOB_NUMBER}"}
archery crossbow download-artifacts --no-fetch ${CROSSBOW_JOB_ID} "$@"
+
+# Wait for the GitHub Workflow that creates the Linux packages
+# to finish before downloading the artifacts.
+. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}"
"package_linux.yml"
+
+RUN_ID=$(get_run_id)
Review Comment:
Ah, you want to use a function defined in `utils-watch-gh-workflow.sh`.
##########
.github/workflows/package_linux.yml:
##########
@@ -0,0 +1,298 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Package Linux
+
+on:
+ push:
+ branches:
+ - '**'
+ - '!dependabot/**'
+ paths:
+ - '.dockerignore'
+ - '.github/workflows/check_labels.yml'
+ - '.github/workflows/package_linux.yml'
+ - '.github/workflows/report_ci.yml'
+ - 'cpp/**'
+ - 'c_glib/**'
+ - 'dev/tasks/linux-packages/**'
+ - 'format/Flight.proto'
+ - 'testing'
+ tags:
+ - '**'
+ pull_request:
+ paths:
+ - '.dockerignore'
+ - '.github/workflows/check_labels.yml'
+ - '.github/workflows/package_linux.yml'
+ - '.github/workflows/report_ci.yml'
+ - 'cpp/**'
+ - 'c_glib/**'
+ - 'dev/tasks/linux-packages/**'
+ - 'format/Flight.proto'
+ - 'testing'
+ types:
+ - labeled
+ - opened
+ - reopened
+ - synchronize
+ schedule:
+ - cron: "0 2 * * *"
+ workflow_dispatch:
+ inputs:
+ version:
+ description: "The Arrow version"
+ type: string
+ required: true
+ no_rc_version:
+ description: "The Arrow version without RC"
+ type: string
+ required: true
+
+concurrency:
+ group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{
github.workflow }}
+ cancel-in-progress: true
+
+permissions:
+ contents: read
+
+jobs:
+ check-labels:
+ uses: ./.github/workflows/check_labels.yml
+ secrets: inherit
+ with:
+ parent-workflow: package_linux
+
+ package:
+ name: ${{ matrix.id }}
+ runs-on: ${{ contains(matrix.id, 'amd64') && 'ubuntu-latest' ||
'ubuntu-24.04-arm' }}
+ needs: check-labels
+ if: >-
+ needs.check-labels.outputs.force == 'true' ||
+ contains(fromJSON(needs.check-labels.outputs.ci-extra-labels || '[]'),
'CI: Extra') ||
+ contains(fromJSON(needs.check-labels.outputs.ci-extra-labels || '[]'),
'CI: Extra: Package: Linux')
+ timeout-minutes: 75
+ strategy:
+ fail-fast: false
+ matrix:
+ id:
+ - almalinux-8-amd64
+ - almalinux-8-arm64
+ - almalinux-9-amd64
+ - almalinux-9-arm64
+ - almalinux-10-amd64
+ - almalinux-10-arm64
+ - amazon-linux-2023-amd64
+ - amazon-linux-2023-arm64
+ - centos-9-stream-amd64
+ - centos-9-stream-arm64
+ - centos-7-amd64
+ - debian-bookworm-amd64
+ - debian-bookworm-arm64
+ - debian-trixie-amd64
+ - debian-trixie-arm64
+ - debian-forky-amd64
+ - debian-forky-arm64
+ - ubuntu-jammy-amd64
+ - ubuntu-jammy-arm64
+ - ubuntu-noble-amd64
+ - ubuntu-noble-arm64
+ env:
+ DOCKER_VOLUME_PREFIX: ".docker/"
+ ARROW_VERSION: ${{ inputs.version || ''}}
+ NO_RC_VERSION: ${{ inputs.no_rc_version || ''}}
+ steps:
+ - name: Prepare environment variables
+ env:
+ ID: ${{ matrix.id }}
+ run: |
+ set -ex
+ # Example: almalinux-8-amd64 -> amd64
+ architecture="${ID##*-}"
+ echo "ARCHITECTURE=${architecture}" >> "${GITHUB_ENV}"
+ # Example: almalinux-8-amd64 -> almalinux-8
+ target="${ID%-*}"
+ case "${target}" in
+ almalinux-*|amazon-linux-*|centos-*)
+ echo "TASK_NAMESPACE=yum" >> "${GITHUB_ENV}"
+ echo "UPLOAD_EXTENSIONS=rpm" >> "${GITHUB_ENV}"
+ if [[ "${architecture}" == "arm64" ]]; then
+ # Example: almalinux-8 -> almalinux-8-aarch64
+ target="${target}-aarch64"
+ fi
+ echo "YUM_TARGETS=${target}" >> "${GITHUB_ENV}"
+ ;;
+ *)
+ echo "TASK_NAMESPACE=apt" >> "${GITHUB_ENV}"
+ upload_extensions=(ddeb deb debian.tar.xz .dsc .orig.tar.gz)
+ echo "UPLOAD_EXTENSIONS=${upload_extensions[*]}" >>
"${GITHUB_ENV}"
+ if [[ "${architecture}" == "arm64" ]]; then
+ # Example: ubuntu-noble -> ubuntu-noble-arm64
+ target="${target}-arm64"
+ fi
+ echo "APT_TARGETS=${target}" >> "${GITHUB_ENV}"
+ ;;
+ esac
+ echo "TARGET=${target}" >> "${GITHUB_ENV}"
+ - name: Checkout Arrow
+ uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 #
v5.0.0
+ with:
+ fetch-depth: 0
+ submodules: recursive
+ - name: Free up disk space
+ if: runner.os == 'Linux' && runner.arch == 'X64'
+ shell: bash
+ run: |
+ ci/scripts/util_free_space.sh
+ - name: Cache Docker Volumes
+ uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
+ with:
+ path: .docker
+ key: package-linux-${{ matrix.id }}-${{ hashFiles('cpp/**') }}
+ restore-keys: package-linux-${{ matrix.id }}-
+ - name: Set up Ruby
+ run: |
+ sudo apt update
+ sudo apt install -y \
+ rake \
+ ruby \
+ ruby-dev
+ - name: Prepare apache-arrow-apt-source for arm64
+ if: ${{ env.ARCHITECTURE == 'arm64' }}
+ run: |
+ pushd dev/tasks/linux-packages/apache-arrow-apt-source/apt
+ for target in *-*; do
+ cp -a ${target} ${target}-arm64
+ done
+ popd
+ - name: Prepare apache-arrow-release for arm64
+ if: ${{ env.ARCHITECTURE == 'arm64' }}
+ run: |
+ pushd dev/tasks/linux-packages/apache-arrow-release/yum
+ for target in *-*; do
+ cp -a ${target} ${target}-aarch64
+ done
+ popd
+ - name: Get Arrow Version
+ id: get-arrow-version
+ run: |
+ # In case of dev (no workflow_dispatch inputs) compute ARROW_VERSION
and
+ # use the same version for NO_RC_VERSION.
+ if [ -z "${ARROW_VERSION}" ]; then
+ ARROW_VERSION=$(git describe --tags --abbrev=0 --match
"apache-arrow-[0-9]*.*" | sed 's/^apache-arrow-//')
+ ARROW_VERSION="${ARROW_VERSION}$(date +%Y%m%d)"
+ echo "ARROW_VERSION=${ARROW_VERSION}" >> $GITHUB_ENV
+ echo "NO_RC_VERSION=${ARROW_VERSION}" >> $GITHUB_ENV
+ fi
+ - name: Build
+ run: |
+ set -e
+ pushd dev/tasks/linux-packages
+ rake version:update ARROW_RELEASE_TIME="$(date --iso-8601=seconds)"
+ rake docker:pull || :
+ rake --trace ${TASK_NAMESPACE}:build BUILD_DIR=build
+ popd
+ env:
+ REPO: ghcr.io/${{ github.repository }}-package-linux
+ - name: Login to Dockerhub
+ if: >-
+ success() &&
+ github.event_name == 'push' &&
+ github.repository == 'apache/arrow' &&
+ github.ref_name == 'main'
+ uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 #
v3.5.0
+ with:
+ registry: ghcr.io
+ username: ${{ github.actor }}
+ password: ${{ secrets.GITHUB_TOKEN}}
+ - name: Docker Push
+ continue-on-error: true
+ if: >-
+ success() &&
+ github.event_name == 'push' &&
+ github.repository == 'apache/arrow' &&
+ github.ref_name == 'main'
+ shell: bash
+ run: |
+ pushd dev/tasks/linux-packages
+ rake docker:push
+ popd
+ env:
+ REPO: ${{ secrets.REPO }}
+ - name: Build artifact tarball
+ shell: bash
+ run: |
+ set -ex
+ pushd dev/tasks/linux-packages
+ tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories
Review Comment:
Let's simplify paths more to simplify `binary-task.rb`:
```suggestion
mkdir -p artifacts
cp -a */${TASK_NAMESPACE}/repositories/* artifacts/
tar cvzf ${{ matrix.id }}.tar.gz -C artifacts .
```
##########
dev/tasks/linux-packages/helper.rb:
##########
@@ -47,13 +47,13 @@ def arrow_source_dir
def detect_version(release_time)
version_env = ENV["ARROW_VERSION"]
- return version_env if version_env
+ return version_env unless version_env.nil? || version_env.empty?
Review Comment:
We can simplify this:
```suggestion
return version_env unless version_env.to_s.empty?
```
##########
dev/release/binary-task.rb:
##########
@@ -1645,45 +1645,37 @@ def define_apt_rc_tasks
progress_label = "Copying: #{distribution} #{code_name}"
progress_reporter = ProgressReporter.new(progress_label)
- distribution_dir = "#{incoming_dir}/#{distribution}"
- pool_dir = "#{distribution_dir}/pool/#{code_name}"
- rm_rf(pool_dir, verbose: verbose?)
- mkdir_p(pool_dir, verbose: verbose?)
- source_dir_prefix = "#{artifacts_dir}/#{distribution}-#{code_name}"
- # apache/arrow uses debian-bookworm-{amd64,arm64} but
- # apache/arrow-adbc uses debian-bookworm. So the following
- # glob must much both of them.
- Dir.glob("#{source_dir_prefix}*/*") do |path|
- base_name = File.basename(path)
+ destination_prefix = "#{incoming_dir}/#{distribution}"
+ rm_rf(destination_prefix, verbose: verbose?)
+
+ # Copy the entire repository structure
+ source_pattern =
"#{artifacts_dir}/#{distribution}-#{code_name}*/*/" \
+ "apt/repositories/#{distribution}"
+ Dir.glob(source_pattern) do |repo_source|
+ progress_reporter.increment_max
+ mkdir_p(File.dirname(destination_prefix), verbose: verbose?)
+ cp_r(repo_source, destination_prefix, preserve: true, verbose:
verbose?)
+ progress_reporter.advance
+ end
+
+ # Create latest package links after copying
+ Dir.glob("#{destination_prefix}/**/*-apt-source_*.deb") do
|apt_source_path|
+ base_name = File.basename(apt_source_path)
package_name = ENV["DEB_PACKAGE_NAME"]
Review Comment:
We can remove this because `*-apt-source` is only used in this block. We
don't need to customize it by an environment variable.
##########
dev/release/binary-task.rb:
##########
@@ -2043,59 +2035,23 @@ def define_yum_rc_tasks
progress_label = "Copying: #{distribution} #{distribution_version}"
progress_reporter = ProgressReporter.new(progress_label)
- destination_prefix = [
- incoming_dir,
- distribution,
- distribution_version,
- ].join("/")
+ destination_prefix =
"#{incoming_dir}/#{distribution}/#{distribution_version}"
Review Comment:
Oh... This part uses `prefix` not `dir`... Sorry... This should be `dir`...
##########
dev/release/04-binary-download.sh:
##########
@@ -39,3 +42,22 @@ crossbow_job_prefix="release-${version_with_rc}"
: ${CROSSBOW_JOB_ID:="${crossbow_job_prefix}-${CROSSBOW_JOB_NUMBER}"}
archery crossbow download-artifacts --no-fetch ${CROSSBOW_JOB_ID} "$@"
+
+# Wait for the GitHub Workflow that creates the Linux packages
+# to finish before downloading the artifacts.
+. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}"
"package_linux.yml"
+
+RUN_ID=$(get_run_id)
+# Download the artifacts created by the package_linux.yml workflow
+download_artifacts "${SOURCE_DIR}/../../packages/${CROSSBOW_JOB_ID}"
Review Comment:
How about using GitHub Release for RC instead of GitHub Actions artifacts?
If we use GitHub Release, we don't need `RUN_ID`: `gh release download
apache-arrow-${VERSION}-rc${RC} ...`
##########
dev/release/binary-task.rb:
##########
@@ -1645,45 +1645,37 @@ def define_apt_rc_tasks
progress_label = "Copying: #{distribution} #{code_name}"
progress_reporter = ProgressReporter.new(progress_label)
- distribution_dir = "#{incoming_dir}/#{distribution}"
- pool_dir = "#{distribution_dir}/pool/#{code_name}"
- rm_rf(pool_dir, verbose: verbose?)
- mkdir_p(pool_dir, verbose: verbose?)
- source_dir_prefix = "#{artifacts_dir}/#{distribution}-#{code_name}"
- # apache/arrow uses debian-bookworm-{amd64,arm64} but
- # apache/arrow-adbc uses debian-bookworm. So the following
- # glob must much both of them.
- Dir.glob("#{source_dir_prefix}*/*") do |path|
- base_name = File.basename(path)
+ destination_prefix = "#{incoming_dir}/#{distribution}"
Review Comment:
Thanks for writing Ruby!
Sure. I can help. Can I push something to this branch if it's needed?
BTW, `destination_dir` is better for this variable name because this is a
directory name not a prefix of path. (In this code, "prefix" is used for `abc`
for `abcde` directory and "dir" is used for `abcde` directory.)
##########
dev/release/utils-watch-gh-workflow.sh:
##########
@@ -30,17 +30,30 @@ TAG=$1
WORKFLOW=$2
: "${REPOSITORY:=${GITHUB_REPOSITORY:-apache/arrow}}"
+get_run_id() {
+ gh run list \
+ --branch "${TAG}" \
+ --jq '.[].databaseId' \
+ --json databaseId \
+ --limit 1 \
+ --repo "${REPOSITORY}" \
+ --workflow "${WORKFLOW}"
+}
+
+download_artifacts() {
+ RUN_ID=$(get_run_id)
+ echo "Downloading artitfacts for workflow with ID: ${RUN_ID}"
+ gh run download \
+ ${RUN_ID} \
+ --repo "${REPOSITORY}" \
+ --dir "$1"
+}
Review Comment:
Hmm. Defining this function in this file isn't good because this file name
is `watch-gh-workflow`.
##########
dev/release/04-binary-download.sh:
##########
@@ -39,3 +42,22 @@ crossbow_job_prefix="release-${version_with_rc}"
: ${CROSSBOW_JOB_ID:="${crossbow_job_prefix}-${CROSSBOW_JOB_NUMBER}"}
archery crossbow download-artifacts --no-fetch ${CROSSBOW_JOB_ID} "$@"
+
+# Wait for the GitHub Workflow that creates the Linux packages
+# to finish before downloading the artifacts.
+. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}"
"package_linux.yml"
Review Comment:
We don't need `.` here because all information is passed via arguments:
```suggestion
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}"
"package_linux.yml"
```
##########
dev/release/post-05-update-gh-release-notes.sh:
##########
@@ -36,7 +36,7 @@ WORKFLOW="release.yml"
# Wait for the GitHub Workflow that creates the GitHub Release
# to finish before updating the release notes.
-"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}"
+. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}"
Review Comment:
We don't need `.` here.
##########
dev/release/binary-task.rb:
##########
@@ -1645,45 +1645,37 @@ def define_apt_rc_tasks
progress_label = "Copying: #{distribution} #{code_name}"
progress_reporter = ProgressReporter.new(progress_label)
- distribution_dir = "#{incoming_dir}/#{distribution}"
- pool_dir = "#{distribution_dir}/pool/#{code_name}"
- rm_rf(pool_dir, verbose: verbose?)
- mkdir_p(pool_dir, verbose: verbose?)
- source_dir_prefix = "#{artifacts_dir}/#{distribution}-#{code_name}"
- # apache/arrow uses debian-bookworm-{amd64,arm64} but
- # apache/arrow-adbc uses debian-bookworm. So the following
- # glob must much both of them.
Review Comment:
Yes.
But I can change ADBC to simplify this script.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]