aglinxinyuan commented on code in PR #4976: URL: https://github.com/apache/texera/pull/4976#discussion_r3205080369
########## .github/workflows/fork-ci.yml: ########## @@ -0,0 +1,99 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Fork CI — Apache Spark model +# +# Runs the full PR-gating suite in the contributor's fork on every branch +# push. Mirrors what required-checks.yml runs in apache/texera on a +# ci-enable/** push: build matrix + license headers, aggregated into a +# single "fork-ci" result that notify_test_workflow.yml surfaces on the PR +# as the "Build" commit status. +# +# This workflow is a no-op in the canonical apache/texera repository — +# every job is guarded by (github.repository_owner != 'apache') so it +# never consumes main-repo runner quota. +# +# Secrets (CODECOV_TOKEN, NX_CLOUD_ACCESS_TOKEN) are unavailable in forks +# and degrade gracefully: Codecov uploads are skipped (fail_ci_if_error: +# false) and NX Cloud remote caching is disabled; builds still pass. + +name: Fork CI + +on: + push: + branches-ignore: + - main + - 'release/**' + - 'ci-enable/**' + +permissions: + contents: read + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + build: + if: github.repository_owner != 'apache' + uses: ./.github/workflows/build.yml + with: + run_frontend: true + run_amber: true + run_amber_integration: true + run_platform: true + run_python: true + run_agent_service: true Review Comment: **Hardcoding all `run_*: true` bypasses the label-driven CI gating documented in [AGENTS.md → "CI labels & gating"](https://github.com/apache/texera/blob/main/AGENTS.md#ci-labels--gating) and implemented by `LABEL_STACKS` in `required-checks.yml`.** Before this PR, a docs-only PR (label union → empty) skipped every build stack; a frontend-only PR ran only `frontend`; etc. After this PR, fork CI runs the full matrix on every push regardless of labels. Apache's runner budget is unaffected (public-fork minutes are free), but this means slower PR feedback for contributors and wasted compute on every docs-only change. Two ways to keep parity: 1. Mirror the `LABEL_STACKS` precheck logic into `fork-ci.yml` (re-fetch PR labels for `head_ref` via the GitHub API on push, set the `run_*` outputs accordingly). The fork has read access to PR labels on the upstream repo. 2. Accept the regression for now and update the AGENTS.md "CI labels & gating" section to reflect that label gating now applies only to push events / backports, not PR builds. ########## .github/workflows/update_build_status.yml: ########## @@ -0,0 +1,186 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +name: Update build status workflow + +on: + workflow_run: + workflows: ["On pull request update"] + types: [completed] + schedule: + - cron: "*/5 * * * *" + workflow_dispatch: + +jobs: + update: + name: Update build status + runs-on: ubuntu-latest + # workflow_run mode polls up to 60 min waiting for fork CI completion. + # Other modes (cron, workflow_dispatch) finish in seconds. + timeout-minutes: 65 + permissions: + actions: read + statuses: write + steps: + - name: "Update build status" + uses: actions/github-script@v8 + env: + TRIGGER_EVENT: ${{ github.event_name }} + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + script: | + const isLongPoll = process.env.TRIGGER_EVENT === 'workflow_run' + const maxIterations = isLongPoll ? 120 : 1 // 120 * 30s = 60 min + const sleepMs = 30000 + const statusContext = 'Build' + + console.log('=== update_build_status: ' + new Date().toISOString() + ' ===') + console.log('trigger=' + process.env.TRIGGER_EVENT + ' mode=' + (isLongPoll ? 'long-poll up to ' + maxIterations + ' iters' : 'single pass')) + + // Map fork run state -> commit status state. + function runToStatusState(run) { + if (run.status !== 'completed') return 'pending' + if (run.conclusion === 'success') return 'success' + if (run.conclusion === 'cancelled') return 'error' + return 'failure' + } + + // Find the most recent Build status for a SHA. /statuses/{sha} + // returns statuses in reverse chronological order; the first + // matching context is current. + async function getCurrentBuildStatus(sha) { + try { + const resp = await github.request('GET /repos/{owner}/{repo}/commits/{ref}/statuses', { + owner: context.repo.owner, + repo: context.repo.repo, + ref: sha, + per_page: 100, + }) + return resp.data.find(s => s.context === statusContext) || null + } catch (error) { + console.log(' -> failed to read current Build status: ' + error.message) + return null + } + } + + // One pass over all open PRs. Returns { pendingCount, updateCount } + // — pendingCount is how many PRs are still in non-terminal state + // (used by the long-poll loop to decide whether to keep going). + async function syncOnce() { + const prsIter = github.paginate.iterator( + 'GET /repos/{owner}/{repo}/pulls', + { owner: context.repo.owner, repo: context.repo.repo, state: 'open', per_page: 100 } + ) + let prCount = 0 + let updateCount = 0 + let pendingCount = 0 + + for await (const prs of prsIter) { + for (const pr of prs.data) { + prCount++ + if (!pr.head.repo) continue + if (pr.head.repo.full_name === context.repo.owner + '/' + context.repo.repo) continue + + const current = await getCurrentBuildStatus(pr.head.sha) + // Don't fight notify's deliberate non-fork failure decisions + // (blocked branch, fork inaccessible). target_url for those + // points at the PR or notify run — never at the fork run. + if (current && current.state === 'failure') { + const isForkTarget = current.target_url && current.target_url.includes('/' + pr.head.repo.full_name + '/actions/runs/') + if (!isForkTarget) continue Review Comment: **Sticky-failure bug: this skip rule prevents recovery from notify's "detection failed" path.** In `notify_test_workflow.yml` line 337-338, the "Fork CI run not detected" path sets: ```js const forkActionsUrl = 'https://github.com/' + headRepo + '/actions' await failCheck('Fork CI run not detected for ' + head_sha.substring(0, 8), forkActionsUrl) ``` That `target_url` is the fork's Actions tab — it has no `/runs/<id>/`, so `isForkTarget` here is `false`, so the `continue` fires and update never re-checks this PR. Even if the fork CI run eventually registers and completes successfully, the `Build` status stays red until the contributor force-pushes. This breaks the "5-min cron picks up late runs" recovery path that the 60s detection budget in notify implicitly assumes. Under heavy GitHub Actions load, queued-run registration past 60s is plausible. Fix options: - Use a sentinel `target_url` for detection-failed that the updater treats as "keep looking" (e.g. point at `.../actions/workflows/fork-ci.yml`, which `isForkTarget` would still skip but a new `isLookAgainTarget` check would catch). - Or special-case detection-failed by description (`current.description.startsWith('Fork CI run not detected')`) — distinct from the genuinely terminal failures (blocked branch, fork inaccessible) the comment is justifying. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
