This is an automated email from the ASF dual-hosted git repository.
espino pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/cloudberry.git
The following commit(s) were added to refs/heads/main by this push:
new d6b001ea91b Improve CI reliability and developer productivity through
test scheduling optimizations, mirror stability fixes, and a new artifact reuse
feature. (#1379)
d6b001ea91b is described below
commit d6b001ea91b0108fd834371d89a637b99328c502
Author: Ed Espino <[email protected]>
AuthorDate: Tue Oct 7 08:41:17 2025 -0700
Improve CI reliability and developer productivity through test scheduling
optimizations, mirror stability fixes, and a new artifact reuse feature. (#1379)
* Document disk-intensive test placement in greenplum_schedule
Add comment explaining why autovacuum-template0-segment and profile tests
are positioned early in the test schedule. These tests consume significant
disk space through WAL generation, XID consumption, and autovacuum
operations.
Running them early when ~20GB disk space is available (vs ~10GB later) helps
avoid disk exhaustion issues during test execution.
* Fix Rocky Linux mirror instability in CI
Add repository metadata refresh and retry logic to handle transient
mirror failures during RPM installation. This addresses frequent 404
errors from Rocky Linux mirrors that cause CI failures.
Changes:
- Run 'dnf clean all' and 'dnf makecache --refresh' before installation
- Add '--setopt=retries=10' to dnf install command
- Apply fix to both rpm-install-test and test jobs
This improves CI reliability without changing functionality.
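The refresh-and-retry sequence listed above can be sketched as a standalone shell function. This is an illustration, not the workflow's actual step: the function name `install_rpm_with_refresh` and the `DNF` override variable are assumptions introduced here, while the dnf subcommands and options are the ones named in the change list.

```shell
#!/usr/bin/env bash
# Sketch only: install_rpm_with_refresh and DNF are hypothetical names.
# DNF can be overridden (e.g. DNF=echo) to dry-run the command sequence.
DNF="${DNF:-dnf}"

install_rpm_with_refresh() {
    local rpm_file="$1"

    # Drop cached metadata so stale mirror listings are not reused.
    "$DNF" clean all || return 1

    # Force a metadata refresh; fall back to a plain makecache if that fails.
    "$DNF" makecache --refresh || "$DNF" makecache || return 1

    # Ask dnf to retry failed downloads up to 10 times before giving up.
    "$DNF" install -y --setopt=retries=10 "$rpm_file"
}
```

Setting `DNF=echo` before calling the function prints the three commands instead of running them, which is a cheap way to review the sequence outside a container.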
* Add artifact reuse feature for faster test iteration
Enable reusing build artifacts from previous workflow runs to speed up
test iteration by ~50-70 minutes. This is useful for debugging test
failures without rebuilding.
Changes:
- Add 'reuse_artifacts_from_run_id' workflow input parameter
- Skip build job when reusing artifacts from specified run
- Skip rpm-install-test job when reusing artifacts
- Update artifact download steps to support cross-run downloads
- Add proper job conditionals to handle skipped build job
Usage:
Manually trigger workflow and specify a previous run ID in the
'reuse_artifacts_from_run_id' input field. Leave empty to build fresh.
This maintains backward compatibility - default behavior unchanged.
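The decision logic behind the new input can be sketched in shell. The workflow itself expresses this with GitHub Actions expressions, so the two function names below are hypothetical and exist only to illustrate the behavior: an empty input means "build fresh and use this run's artifacts", a non-empty input means "skip the build and download from that run".

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical; the workflow implements
# this with GitHub Actions expressions rather than shell.

# Pick the run whose artifacts are downloaded: the reuse ID when one is
# given, otherwise the current run.
resolve_artifact_run_id() {
    local reuse_id="$1" current_id="$2"
    printf '%s\n' "${reuse_id:-$current_id}"
}

# The build job runs only when no reuse ID was supplied.
should_build() {
    [ -z "$1" ]
}
```

With an empty first argument, `resolve_artifact_run_id "" 222` prints `222` and `should_build ""` succeeds; with `12345678901` as the first argument, that ID is printed and `should_build` fails, which corresponds to the skipped build job.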
* Add GitHub Actions workflow documentation for developers
Create comprehensive documentation for GitHub Actions workflows, focusing
on features that help developers iterate faster when debugging CI issues.
Key sections:
- Manual workflow triggers and input parameters
- Artifact reuse feature with step-by-step guide
- Running workflows in forked repositories
- Troubleshooting common issues
This documentation enables developers to:
- Reuse build artifacts to save ~50-70 minutes per test iteration
- Run CI validation in their forks before submitting PRs
- Understand available workflow options and test selections
- Debug test failures more efficiently
* Pin Rocky Linux repos to stable 9.x release
Use --releasever=9 to pin dnf to stable Rocky Linux 9.x repos instead
of bleeding-edge point releases (e.g., 9.6) that may not be fully synced
across all mirrors.
Rocky Linux maintains binary compatibility within major versions, so
pinning to 9 ensures we get stable, widely-mirrored packages while
remaining compatible with the 9.6 container OS.
This complements the earlier retry/refresh logic by addressing the root
cause: new point releases have metadata sync lag across the mirror network.
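The relationship between the container's point release and the pinned major version can be sketched as follows. `major_releasever` is a hypothetical helper that is not part of the workflow (which hardcodes `9`); it takes a version string such as `9.6` and yields the major component that would be passed to `--releasever`.

```shell
#!/usr/bin/env bash
# Sketch only: major_releasever is a hypothetical helper, not part of the workflow.

# Strip everything from the first dot onward, e.g. 9.6 -> 9.
major_releasever() {
    printf '%s\n' "${1%%.*}"
}

# Possible use inside a container (not executed here; assumes RPM_FILE is set):
#   . /etc/os-release
#   dnf install -y --releasever="$(major_releasever "$VERSION_ID")" "$RPM_FILE"
```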
* Move all autovacuum tests to early execution
Move autovacuum and autovacuum-segment tests alongside
autovacuum-template0-segment to run early in the schedule when more
disk space is available.
All three autovacuum tests are disk-intensive and benefit from running
when ~20GB is available rather than later when space may be constrained.
This grouping also improves test organization by keeping related tests
together.
* Clarify secrets configuration in workflow documentation
Update README to clarify that no manual secret configuration is required
for normal development workflows:
- GITHUB_TOKEN is automatically provided by GitHub
- Only used for artifact reuse feature (downloading previous run artifacts)
- DockerHub secrets only needed for custom container image builds
(advanced/maintainer use case)
This removes confusion about required setup steps for fork users.
---
.github/workflows/README.md | 258 +++++++++++++++++++++++++++++++++
.github/workflows/build-cloudberry.yml | 39 ++++-
src/test/regress/greenplum_schedule | 16 +-
3 files changed, 303 insertions(+), 10 deletions(-)
diff --git a/.github/workflows/README.md b/.github/workflows/README.md
new file mode 100644
index 00000000000..ae1651742e0
--- /dev/null
+++ b/.github/workflows/README.md
@@ -0,0 +1,258 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# GitHub Actions Workflows
+
+This directory contains GitHub Actions workflows for Apache Cloudberry CI/CD.
+
+## Table of Contents
+
+- [Available Workflows](#available-workflows)
+- [Manual Workflow Triggers](#manual-workflow-triggers)
+- [Artifact Reuse for Faster Testing](#artifact-reuse-for-faster-testing)
+- [Running Workflows in Forked Repositories](#running-workflows-in-forked-repositories)
+
+## Available Workflows
+
+| Workflow | Purpose | Trigger |
+|----------|---------|---------|
+| `build-cloudberry.yml` | Main CI: build, test, create RPMs | Push, PR, Manual |
+| `build-dbg-cloudberry.yml` | Debug build with assertions enabled | Push, PR, Manual |
+| `apache-rat-audit.yml` | License header compliance check | Push, PR |
+| `coverity.yml` | Static code analysis with Coverity | Weekly, Manual |
+| `sonarqube.yml` | Code quality analysis with SonarQube | Push to main |
+| `docker-cbdb-build-containers.yml` | Build Docker images for CI | Manual |
+| `docker-cbdb-test-containers.yml` | Build test Docker images | Manual |
+
+## Manual Workflow Triggers
+
+Many workflows support manual triggering via `workflow_dispatch`, allowing developers to run CI jobs on-demand.
+
+### How to Manually Trigger a Workflow
+
+1. Navigate to the **Actions** tab in GitHub
+2. Select the workflow from the left sidebar (e.g., "Build and Test Cloudberry")
+3. Click the **Run workflow** button (top right)
+4. Select your branch
+5. Configure input parameters (if available)
+6. Click **Run workflow**
+
+### Workflow Input Parameters
+
+#### `build-cloudberry.yml` - Main CI
+
+| Parameter | Description | Default | Example |
+|-----------|-------------|---------|---------|
+| `test_selection` | Comma-separated list of tests to run, or "all" | `all` | `ic-good-opt-off,ic-contrib` |
+| `reuse_artifacts_from_run_id` | Run ID to reuse build artifacts from (see below) | _(empty)_ | `12345678901` |
+
+**Available test selections:**
+- `all` - Run all test suites
+- `ic-good-opt-off` - Installcheck with optimizer off
+- `ic-good-opt-on` - Installcheck with optimizer on
+- `ic-contrib` - Contrib extension tests
+- `ic-resgroup` - Resource group tests
+- `ic-resgroup-v2` - Resource group v2 tests
+- `ic-resgroup-v2-memory-accounting` - Resource group memory tests
+- `ic-singlenode` - Single-node mode tests
+- `make-installcheck-world` - Full test suite
+- And more... (see workflow for complete list)
+
+## Artifact Reuse for Faster Testing
+
+When debugging test failures, rebuilding Cloudberry (~50-70 minutes) on every iteration is inefficient. The artifact reuse feature allows you to reuse build artifacts from a previous successful run.
+
+### How It Works
+
+1. Build artifacts (RPMs, source tarballs) from a previous workflow run are downloaded
+2. Build job is skipped (saves ~45-60 minutes)
+3. RPM installation test is skipped (saves ~5-10 minutes)
+4. Test jobs run with the reused artifacts
+5. You can iterate on test configurations without rebuilding
+
+### Step-by-Step Guide
+
+#### 1. Find the Run ID
+
+After a successful build (even if tests failed), get the run ID:
+
+**Option A: From GitHub Actions UI**
+- Go to **Actions** tab → Click on a completed workflow run
+- The URL will be: `https://github.com/apache/cloudberry/actions/runs/12345678901`
+- The run ID is `12345678901`
+
+**Option B: From GitHub CLI**
+```bash
+# List recent workflow runs
+gh run list --workflow=build-cloudberry.yml --limit 5
+
+# Get run ID from specific branch
+gh run list --workflow=build-cloudberry.yml --branch=my-feature --limit 1
+```
+
+#### 2. Trigger New Run with Artifact Reuse
+
+**Via GitHub UI:**
+1. Go to **Actions** → **Build and Test Cloudberry**
+2. Click **Run workflow**
+3. Enter the run ID in **"Reuse build artifacts from a previous run ID"**
+4. Optionally customize **test_selection**
+5. Click **Run workflow**
+
+**Via GitHub CLI:**
+```bash
+# Reuse artifacts from run 12345678901, run only specific tests
+gh workflow run build-cloudberry.yml \
+ --field reuse_artifacts_from_run_id=12345678901 \
+ --field test_selection=ic-good-opt-off
+```
+
+#### 3. Monitor Test Execution
+
+- Build job will be skipped (shows as "Skipped" in Actions UI)
+- RPM Install Test will be skipped
+- Test jobs will run with artifacts from the specified run ID
+- Total time: ~15-30 minutes (vs ~65-100 minutes for full build+test)
+
+### Use Cases
+
+**Debugging a specific test failure:**
+```bash
+# Run 1: Full build + all tests (finds test failure in ic-good-opt-off)
+gh workflow run build-cloudberry.yml
+
+# Get the ID of the most recent run of this workflow
+RUN_ID=$(gh run list --workflow=build-cloudberry.yml --limit 1 --json databaseId --jq '.[0].databaseId')
+
+# Run 2: Reuse artifacts, run only failing test
+gh workflow run build-cloudberry.yml \
+ --field reuse_artifacts_from_run_id=$RUN_ID \
+ --field test_selection=ic-good-opt-off
+```
+
+**Testing different configurations:**
+```bash
+# Test with optimizer off, then on, using same build
+gh workflow run build-cloudberry.yml \
+ --field reuse_artifacts_from_run_id=$RUN_ID \
+ --field test_selection=ic-good-opt-off
+
+gh workflow run build-cloudberry.yml \
+ --field reuse_artifacts_from_run_id=$RUN_ID \
+ --field test_selection=ic-good-opt-on
+```
+
+### Limitations
+
+- Artifacts expire after 90 days (GitHub default retention)
+- Run ID must be from the same repository (cross-repository and cross-fork reuse is not supported)
+- Artifacts must include both RPM and source build artifacts
+- Cannot reuse artifacts across different OS/architecture combinations
+- Changes to source code require a fresh build
+
+## Running Workflows in Forked Repositories
+
+GitHub Actions workflows are enabled in forks, allowing you to validate changes before submitting a Pull Request.
+
+### Initial Setup (One-Time)
+
+1. **Fork the repository** to your GitHub account
+
+2. **Enable GitHub Actions** in your fork:
+ - Go to your fork's **Actions** tab
+ - Click **"I understand my workflows, go ahead and enable them"**
+
+**Secrets Configuration:**
+
+No manual secret configuration is required for the main build and test workflows.
+
+- `GITHUB_TOKEN` is automatically provided by GitHub and used when downloading artifacts from previous runs (artifact reuse feature)
+- DockerHub secrets (`DOCKERHUB_USER`, `DOCKERHUB_TOKEN`) are only required for building custom container images (advanced/maintainer use case, not needed for typical development)
+
+### Workflow Behavior in Forks
+
+- ✅ **Automated triggers work**: Push and PR events trigger workflows
+- ✅ **Manual triggers work**: `workflow_dispatch` is fully functional
+- ✅ **Artifact reuse works**: Can reuse artifacts from previous runs in your fork
+- ⚠️ **Cross-fork artifact reuse**: Not supported (security restriction)
+- ⚠️ **Some features may be limited**: Certain features requiring organization-level secrets may not work
+
+### Best Practices for Fork Development
+
+1. **Test locally first** when possible (faster iteration)
+2. **Use manual triggers** to avoid burning GitHub Actions minutes unnecessarily
+3. **Use artifact reuse** to iterate on test failures efficiently
+4. **Push to feature branches** to trigger automated CI
+5. **Review Actions tab** to ensure workflows completed successfully before opening PR
+
+### Example Fork Workflow
+
+```bash
+# 1. Create feature branch in fork
+git checkout -b fix-test-failure
+
+# 2. Make changes and push to fork
+git commit -am "Fix test failure"
+git push origin fix-test-failure
+
+# 3. CI runs automatically on push
+
+# 4. If tests fail, iterate using artifact reuse
+# Get run ID from your fork's Actions tab
+gh workflow run build-cloudberry.yml \
+ --field reuse_artifacts_from_run_id=12345678901 \
+ --field test_selection=ic-good-opt-off
+
+# 5. Once tests pass, open PR to upstream
+gh pr create --web
+```
+
+## Troubleshooting
+
+### "Build job was skipped but tests failed to start"
+
+**Cause:** Artifacts from specified run ID not found or expired
+
+**Solution:**
+- Verify the run ID is correct
+- Check that run completed successfully (built artifacts)
+- Run a fresh build if artifacts expired (>90 days)
+
+### "Workflow not found in fork"
+
+**Cause:** GitHub Actions not enabled in fork
+
+**Solution:**
+- Go to fork's **Actions** tab
+- Click to enable workflows
+
+### "Resource not accessible by integration"
+
+**Cause:** Workflow trying to access artifacts from different repository
+
+**Solution:**
+- Can only reuse artifacts from same repository
+- Run a fresh build in your fork first, then reuse those artifacts
+
+## Additional Resources
+
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
+- [Cloudberry Contributing Guide](../../CONTRIBUTING.md)
+- [Cloudberry Build Guide](../../deploy/build/README.md)
+- [DevOps Scripts](../../devops/README.md)
diff --git a/.github/workflows/build-cloudberry.yml b/.github/workflows/build-cloudberry.yml
index fecd44a9637..04d5e827b6e 100644
--- a/.github/workflows/build-cloudberry.yml
+++ b/.github/workflows/build-cloudberry.yml
@@ -113,6 +113,11 @@ on:
required: false
default: 'all'
type: string
+ reuse_artifacts_from_run_id:
+ description: 'Reuse build artifacts from a previous run ID (leave empty to build fresh)'
+ required: false
+ default: ''
+ type: string
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
@@ -412,6 +417,7 @@ jobs:
needs: [check-skip]
runs-on: ubuntu-22.04
timeout-minutes: 120
+ if: github.event.inputs.reuse_artifacts_from_run_id == ''
outputs:
build_timestamp: ${{ steps.set_timestamp.outputs.timestamp }}
@@ -687,6 +693,10 @@ jobs:
rpm-install-test:
name: RPM Install Test Apache Cloudberry
needs: [check-skip, build]
+ if: |
+ !cancelled() &&
+ (needs.build.result == 'success' || needs.build.result == 'skipped') &&
+ github.event.inputs.reuse_artifacts_from_run_id == ''
runs-on: ubuntu-22.04
timeout-minutes: 120
@@ -710,6 +720,8 @@ jobs:
name: apache-cloudberry-db-incubating-rpm-build-artifacts
path: ${{ github.workspace }}/rpm_build_artifacts
merge-multiple: false
+ run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+ github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Cloudberry Environment Initialization
if: needs.check-skip.outputs.should_skip != 'true'
@@ -814,12 +826,18 @@ jobs:
echo "Version: ${RPM_VERSION}"
echo "Release: ${RPM_RELEASE}"
+ # Refresh repository metadata to avoid mirror issues
+ echo "Refreshing repository metadata..."
+ dnf clean all
+ dnf makecache --refresh || dnf makecache
+
# Clean install location
rm -rf /usr/local/cloudberry-db
- # Install RPM
+ # Install RPM with retry logic for mirror issues
+ # Use --releasever=9 to pin to stable Rocky Linux 9 repos (not bleeding-edge 9.6)
echo "Starting installation..."
- if ! time dnf install -y "${RPM_FILE}"; then
+ if ! time dnf install -y --setopt=retries=10 --releasever=9 "${RPM_FILE}"; then
echo "::error::RPM installation failed"
exit 1
fi
@@ -858,6 +876,9 @@ jobs:
test:
name: ${{ matrix.test }}
needs: [check-skip, build, prepare-test-matrix]
+ if: |
+ !cancelled() &&
+ (needs.build.result == 'success' || needs.build.result == 'skipped')
runs-on: ubuntu-22.04
timeout-minutes: 120
# actionlint-allow matrix[*].pg_settings
@@ -1087,6 +1108,8 @@ jobs:
name: apache-cloudberry-db-incubating-rpm-build-artifacts
path: ${{ github.workspace }}/rpm_build_artifacts
merge-multiple: false
+ run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+ github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Download Cloudberry Source build artifacts
if: needs.check-skip.outputs.should_skip != 'true'
@@ -1095,6 +1118,8 @@ jobs:
name: apache-cloudberry-db-incubating-source-build-artifacts
path: ${{ github.workspace }}/source_build_artifacts
merge-multiple: false
+ run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+ github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Verify downloaded artifacts
if: needs.check-skip.outputs.should_skip != 'true'
@@ -1186,12 +1211,18 @@ jobs:
echo "Version: ${RPM_VERSION}"
echo "Release: ${RPM_RELEASE}"
+ # Refresh repository metadata to avoid mirror issues
+ echo "Refreshing repository metadata..."
+ dnf clean all
+ dnf makecache --refresh || dnf makecache
+
# Clean install location
rm -rf /usr/local/cloudberry-db
- # Install RPM
+ # Install RPM with retry logic for mirror issues
+ # Use --releasever=9 to pin to stable Rocky Linux 9 repos (not bleeding-edge 9.6)
echo "Starting installation..."
- if ! time dnf install -y "${RPM_FILE}"; then
+ if ! time dnf install -y --setopt=retries=10 --releasever=9 "${RPM_FILE}"; then
echo "::error::RPM installation failed"
exit 1
fi
diff --git a/src/test/regress/greenplum_schedule b/src/test/regress/greenplum_schedule
index ecf37e73029..039e8d7e9c4 100755
--- a/src/test/regress/greenplum_schedule
+++ b/src/test/regress/greenplum_schedule
@@ -15,6 +15,16 @@
# hitting max_connections limit on segments.
#
+# Run disk-intensive tests early when maximum disk space is available.
+# These tests consume significant disk space through WAL generation, XID consumption,
+# and autovacuum operations. Running them early helps avoid disk exhaustion issues.
+test: autovacuum
+test: autovacuum-segment
+test: autovacuum-template0-segment
+
+# check profile feature
+test: profile
+
# test for builtin namespace pg_ext_aux
test: pg_ext_aux
@@ -321,9 +331,6 @@ test: oid_wraparound
# hence it should be run in isolation.
test: fts_recovery_in_progress
ignore: mirror_replay
-test: autovacuum
-test: autovacuum-segment
-test: autovacuum-template0-segment
# gpexpand introduce the partial tables, check them if they can run correctly
test: gangsize gang_reuse
@@ -334,9 +341,6 @@ test: run_utility_gpexpand_phase1
# check correct error message when create extension error on segment
test: create_extension_fail
-# check profile feature
-test: profile
-
# check offload entry root slice to QE feature
test: offload_entry_to_qe