This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a9d24683e474 [SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts
a9d24683e474 is described below

commit a9d24683e4740506e24f99485c4ae8bdadd6496f
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Fri May 22 15:54:07 2026 -0700

    [SPARK-57002][INFRA] Enforce Upstream-First policy in merge_spark_pr.py cherry-pick prompts
    
    ### What changes were proposed in this pull request?
    
    When a committer manually types `branch-M.N` at the cherry-pick prompt while `branch-M.x` exists and has not yet received the commit, the script now surfaces the Upstream-First policy and offers to pick into both branches in one step (the policy-compliant default). The committer can still pick only `branch-M.N` if the commit is genuinely a `branch-M.N`-only maintenance bugfix, or abort.
    
    Implementation notes:
    
    - Split `cherry_pick` into `_do_cherry_pick` (fetch + cherry-pick + push) and `cherry_pick` (prompt + policy check). The policy wrapper returns a list of refs so the main loop can advance its remaining-branches list correctly when one prompt consumes two branches.
    - Replace the `branch_iter` iterator with a mutable `remaining_branches` list in the main cherry-pick loop, so picks consumed by the two-branch path are accounted for in the next prompt's default.
    - Add an `already_picked` parameter to `cherry_pick` so the policy check skips its prompt when `branch-M.x` is in the set of refs already touched this session (e.g. when the PR was merged into `branch-M.x` and the loop is now picking into `branch-M.N`).
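
The list bookkeeping described in the notes above can be sketched in isolation. This is a minimal illustration, not the script's code: `advance_remaining` is a hypothetical helper name, the branch names are made up, and it returns `None` when nothing is left, whereas the script falls back to `branch_names[0]`.

```python
def advance_remaining(remaining_branches, picked):
    """Remove every ref in `picked` from `remaining_branches` (in place).

    Returns the next default branch, or None when no branch is left.
    Hypothetical helper: the real script inlines this logic in main().
    """
    for ref in picked:
        if ref in remaining_branches:
            remaining_branches.remove(ref)
    return remaining_branches[0] if remaining_branches else None


# Assumed branch list, newest first, for a PR merged into master.
remaining = ["branch-4.x", "branch-4.2", "branch-4.1"]

# A single prompt that takes the two-branch path reports both refs,
# so the next prompt's default skips past both of them.
next_default = advance_remaining(remaining, ["branch-4.x", "branch-4.2"])
print(next_default)  # branch-4.1
```

A plain iterator cannot express this, because one prompt may consume two entries and the second one is not necessarily the iterator's next item.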
    
    ### Why are the changes needed?
    
    The Upstream-First backporting policy (documented in the header comment of `dev/merge_spark_pr.py`) requires non-bugfix commits to flow through `branch-M.x` before reaching `branch-M.N`. The merge script already orders `branch-M.x` ahead of `branch-M.N` as the cherry-pick default. However, when a committer types `branch-M.N` at the prompt, the script silently proceeds and `branch-M.x` is never revisited.
    
    This has led to commits landing on `branch-4.2` but missing `branch-4.x`. Six such commits were observed on the current branches (as of 2026-05-22):
    
    - SPARK-56700 (#55651)
    - SPARK-56676 (#55623)
    - SPARK-56838 (#55836)
    - SPARK-56650 (#55589)
    - SPARK-56856 (#55969)
    - SPARK-56977 (#56023)
    
    All six landed on master and `branch-4.2` but were not cherry-picked to `branch-4.x`, requiring follow-up backports.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, for committers using `dev/merge_spark_pr.py`. When the typed cherry-pick target is `branch-M.N` and `branch-M.x` exists and has not yet been picked, an additional prompt asks whether to pick into both. Accepting the default ("both") preserves prior behavior plus an extra cherry-pick to `branch-M.x`.
    
    No change when the committer accepts the default `branch-M.x` target, or when picking into `branch-M.x` first and `branch-M.N` second (the typical policy-compliant flow).
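
The cases in which the extra prompt does or does not appear can be checked with a standalone sketch. It mirrors the conditions of the `_upstream_first_sibling` helper in the diff, but the function name `sibling_to_prompt_for` and the branch list are assumptions made for illustration only.

```python
import re


def sibling_to_prompt_for(target_ref, pick_ref, branch_names, already_picked=()):
    """Return the branch-M.x that triggers the extra prompt, or None.

    Hypothetical stand-in that follows the same checks as the helper in the diff.
    """
    if target_ref != "master":
        return None  # only merges into master can bypass branch-M.x
    m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
    if m is None:
        return None  # typed target is not of the form branch-M.N
    candidate = "branch-%s.x" % m.group(1)
    if candidate in branch_names and candidate not in already_picked:
        return candidate
    return None


branches = ["branch-4.x", "branch-4.2"]  # assumed release branches

# Typing branch-4.2 first: the extra prompt appears.
print(sibling_to_prompt_for("master", "branch-4.2", branches))  # branch-4.x
# Accepting the default branch-4.x: no extra prompt.
print(sibling_to_prompt_for("master", "branch-4.x", branches))  # None
# branch-4.x already picked this session: no extra prompt.
print(sibling_to_prompt_for("master", "branch-4.2", branches, ("branch-4.x",)))  # None
```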
    
    ### How was this patch tested?
    
    - `python3 -m doctest dev/merge_spark_pr.py` passes (34/34, all pre-existing tests; none cover the new policy logic).
    - New `cherry_pick` policy logic was reviewed for behavior but **not exercised end-to-end**: actually running `merge_spark_pr.py` requires committer privileges and a live open PR to merge. Edge cases were traced by reading the code (PR target = master with manual branch-M.N entry; PR target = branch-M.x with default branch-M.N pick; multiple iterations after a two-branch pick).
    - Reviewers familiar with the merge flow are encouraged to verify behavior on first real use, especially the abort path and the interaction with manual conflict resolution inside `_do_cherry_pick`.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Opus 4.7)
    
    Closes #56058 from viirya/infra-merge-script-upstream-first-policy.
    
    Authored-by: Liang-Chi Hsieh <[email protected]>
    Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
 dev/merge_spark_pr.py | 113 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 102 insertions(+), 11 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index b630e13b968c..6e5da30f94b9 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -470,11 +470,8 @@ def merge_pr(pr_num, target_ref, title, body, pr_repo_desc, pr_author, co_author
     return merge_hash
 
 
-def cherry_pick(pr_num, merge_hash, default_branch):
-    pick_ref = bold_input("Enter a branch name [%s]: " % default_branch)
-    if pick_ref == "":
-        pick_ref = default_branch
-
+def _do_cherry_pick(pr_num, merge_hash, pick_ref):
+    """Cherry-pick `merge_hash` onto `pick_ref` and push. Returns the pushed ref."""
     pick_branch_name = "%s_PICK_PR_%s_%s" % (BRANCH_PREFIX, pr_num, pick_ref.upper())
 
     run_cmd("git fetch %s %s:%s" % (PUSH_REMOTE_NAME, pick_ref, pick_branch_name))
@@ -495,7 +492,6 @@ def cherry_pick(pr_num, merge_hash, default_branch):
     try:
         run_cmd("git push %s %s:%s" % (PUSH_REMOTE_NAME, pick_branch_name, pick_ref))
     except Exception as e:
-        clean_up()
         fail("Exception while pushing: %s" % e)
 
     pick_hash = run_cmd("git rev-parse %s" % pick_branch_name)[:8]
@@ -506,6 +502,85 @@ def cherry_pick(pr_num, merge_hash, default_branch):
     return pick_ref
 
 
+def _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked):
+    """Return the sibling branch-M.x if Upstream-First should prompt, else None.
+
+    The policy only applies when the PR was merged into master: that's the only case
+    where the committer can type branch-M.N at the cherry-pick prompt and bypass the
+    rolling branch-M.x. When the PR was opened against branch-M.x the merge itself
+    lands there (nothing to bypass), and when it was opened against branch-M.N the
+    author already chose per-branch scope.
+
+    >>> _upstream_first_sibling("master", "branch-4.2", ["branch-4.x", "branch-4.2"], ())
+    'branch-4.x'
+    >>> _upstream_first_sibling("master", "branch-4.2", ["branch-4.x", "branch-4.2"],
+    ...                         ("branch-4.x",))
+    >>> _upstream_first_sibling("master", "branch-4.x", ["branch-4.x"], ())
+    >>> _upstream_first_sibling("master", "branch-4.99", ["branch-4.2"], ())
+    >>> _upstream_first_sibling("branch-4.x", "branch-4.2", ["branch-4.x", "branch-4.2"], ())
+    >>> _upstream_first_sibling("branch-4.2", "branch-3.5", ["branch-4.x", "branch-3.5"], ())
+    """
+    if target_ref != "master":
+        return None
+    m = re.match(r"^branch-(\d+)\.(\d+)$", pick_ref)
+    if not m:
+        return None
+    candidate = "branch-%s.x" % m.group(1)
+    if candidate in branch_names and candidate not in already_picked:
+        return candidate
+    return None
+
+
+def cherry_pick(pr_num, merge_hash, default_branch, branch_names, target_ref, already_picked=()):
+    """Prompt for a target branch and cherry-pick `merge_hash` onto it.
+
+    Enforces the Upstream-First policy (see header comment) via
+    `_upstream_first_sibling`: when the PR was merged into master and the committer
+    types a branch-M.N target while branch-M.x is also a known release branch AND
+    has not already received this commit, prompt to confirm whether to pick into
+    BOTH (the policy-compliant default) or branch-M.N only (treated as a
+    maintenance-only bugfix). Returns the list of refs actually picked into, so
+    the main loop can advance its remaining-branches list correctly.
+    """
+    pick_ref = bold_input("Enter a branch name [%s]: " % default_branch)
+    if pick_ref == "":
+        pick_ref = default_branch
+
+    sibling_x = _upstream_first_sibling(target_ref, pick_ref, branch_names, already_picked)
+    if sibling_x is not None:
+        print()
+        print("=" * 80)
+        print(
+            "Upstream-First policy: non-bugfix commits on %s should also land on %s."
+            % (pick_ref, sibling_x)
+        )
+        print(
+            "If this is a %s-only maintenance bugfix, you may pick %s alone." % (pick_ref, pick_ref)
+        )
+        print("Otherwise, pick both (%s first, then %s)." % (sibling_x, pick_ref))
+        print("=" * 80)
+        choice = (
+            bold_input(
+                "Pick into [b]oth %s + %s / [o]nly %s / [a]bort (default: both): "
+                % (sibling_x, pick_ref, pick_ref)
+            )
+            .strip()
+            .lower()
+        )
+        if choice in ("", "b", "both"):
+            picked_x = _do_cherry_pick(pr_num, merge_hash, sibling_x)
+            picked_n = _do_cherry_pick(pr_num, merge_hash, pick_ref)
+            return [picked_x, picked_n]
+        elif choice in ("o", "only"):
+            return [_do_cherry_pick(pr_num, merge_hash, pick_ref)]
+        elif choice in ("a", "abort"):
+            fail("Aborted by user at Upstream-First policy prompt.")
+        else:
+            fail("Unrecognized choice %r; aborting." % choice)
+
+    return [_do_cherry_pick(pr_num, merge_hash, pick_ref)]
+
+
 def print_jira_issue_summary(issue):
     summary = "Summary\t\t%s\n" % issue.fields.summary
     assignee = issue.fields.assignee
@@ -832,7 +907,6 @@ def main():
     branches = get_json("%s/branches" % GITHUB_API_BASE)
     branch_names = list(filter(lambda x: x.startswith("branch-"), [x["name"] for x in branches]))
     branch_names = sorted(branch_names, key=semver_branch_rank, reverse=True)
-    branch_iter = iter(branch_names)
 
     if len(sys.argv) == 1:
         pr_num = bold_input("Which pull request would you like to merge? (e.g. 34): ")
@@ -942,7 +1016,8 @@ def main():
             fail("Couldn't find any merge commit for #%s, you may need to update HEAD." % pr_num)
 
         print("Found commit %s:\n%s" % (merge_hash, message))
-        cherry_pick(pr_num, merge_hash, next(branch_iter, branch_names[0]))
+        default = branch_names[0]
+        cherry_pick(pr_num, merge_hash, default, branch_names, target_ref, already_picked=())
         sys.exit(0)
 
     if not bool(pr["mergeable"]):
@@ -976,11 +1051,27 @@ def main():
         print("PR #%s is still open after push; closing it explicitly." % pr_num)
         close_pr(pr_num)
 
+    # Walk a mutable remaining-branches list so the next default correctly skips any
+    # branches already picked, including branches consumed by the Upstream-First two-branch
+    # path inside cherry_pick (e.g. picking branch-M.x + branch-M.N in a single prompt).
+    # merged_refs doubles as the already_picked set passed to cherry_pick: it starts with
+    # target_ref (the merge sink, never to be re-picked) and grows with every cherry-pick.
+    remaining_branches = [b for b in branch_names if b != target_ref]
     pick_prompt = "Would you like to pick %s into another branch?" % merge_hash
     while bold_input("\n%s (y/N): " % pick_prompt).lower() == "y":
-        merged_refs = merged_refs + [
-            cherry_pick(pr_num, merge_hash, next(branch_iter, branch_names[0]))
-        ]
+        default = remaining_branches[0] if remaining_branches else branch_names[0]
+        picked = cherry_pick(
+            pr_num,
+            merge_hash,
+            default,
+            branch_names,
+            target_ref,
+            already_picked=tuple(merged_refs),
+        )
+        merged_refs = merged_refs + picked
+        for b in picked:
+            if b in remaining_branches:
+                remaining_branches.remove(b)
 
     if asf_jira is not None:
         continue_maybe("Would you like to update an associated JIRA?")

