[PR] feat(spark): add show_inflight_commits and cleanup_stale_inflight_com… [hudi]

via GitHub Fri, 08 May 2026 12:43:14 -0700


mahsoodebrahim opened a new pull request, #18709:
URL: https://github.com/apache/hudi/pull/18709

feat(spark): add show_inflight_commits and cleanup_stale_inflight_commits
stored procedures

### Describe the issue this Pull Request addresses

A stuck inflight commit (write, compaction, log-compaction, or clustering)
blocks async clean, archival, and subsequent compaction on a Hudi table. Today
operators have to drop into `hudi-cli` and run a `repair rollback` per stuck
instant — fine for single-table triage, awkward when an oncall is looking at
several tables and just wants to inspect or remediate via Spark SQL.

This PR adds two `CALL` procedures so that workflow can stay in SQL.

### Summary and Changelog

Two new Spark SQL stored procedures:

**`show_inflight_commits(table, min_age_minutes?)`** — lists pending
`REQUESTED` and `INFLIGHT` instants from the active timeline across all action
types (write, rollback, clean, restore, indexing). The optional
`min_age_minutes` filter narrows the result to stale instants only.
Output: `(instant_time, action, state)`.

**`cleanup_stale_inflight_commits(table, allowed_inflight_interval_minutes?,
include_ingestion_commits?, dry_run?)`** — rolls back stale inflight
write-timeline commits older than the threshold (default 180 minutes).
Per-action routing:

- `COMMIT` / `DELTA_COMMIT` / `REPLACE_COMMIT` → `client.rollback()` (same
pattern as `RollbackToInstantTimeProcedure`).
- `COMPACTION` / `LOG_COMPACTION` / `CLUSTERING` → dedicated
`table.rollbackInflight*` methods, mirroring
`RunRollbackInflightTableServiceProcedure.scala`. `HoodieSparkTable`,
`tsClient`, and the pending-rollback lookup are constructed lazily on the first
such instant — matched sets that contain no table-service inflights pay zero
`HoodieSparkTable.create` overhead.
- `include_ingestion_commits` defaults to `false`. Opting in allows rollback
of `COMMIT`/`DELTA_COMMIT` inflights, which can drop in-progress ingestion
data; documented as a safety warning in the procedure's Scaladoc.
- `dry_run` defaults to `false`. When `true`, matched instants are listed
with `rollback_status = NULL` ("matched but not actioned") and the write client
is not constructed.

Output: `(instant_time, action, rollback_status)` where `rollback_status` is
`true` = succeeded, `false` = failed, `NULL` = dry-run preview.

Files added:
-
`hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieTimelineCleanupUtil.java`
— single public method `inflightWriteCommitsOlderThan` backing the cleanup
procedure.
-
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowInflightCommitsProcedure.scala`
-
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CleanupStaleInflightCommitsProcedure.scala`
-
`hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestShowInflightCommitsProcedure.scala`
-
`hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestCleanupStaleInflightCommitsProcedure.scala`

Files modified:
-
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HoodieProcedures.scala`
— register the two new procedures (2 lines).

### Impact

**Public API.** Adds two new `CALL` procedure names
(`show_inflight_commits`, `cleanup_stale_inflight_commits`) and one new public
utility class (`org.apache.hudi.HoodieTimelineCleanupUtil`). The output schemas
are part of the procedures' going-forward contracts:

- `show_inflight_commits`: `(instant_time STRING, action STRING, state
STRING)`. The `state` column distinguishes `REQUESTED` from `INFLIGHT`, which
is useful when triaging — a `REQUESTED` instant can usually be deleted directly
while an `INFLIGHT` one needs a true rollback.
- `cleanup_stale_inflight_commits`: `(instant_time STRING, action STRING,
rollback_status BOOLEAN nullable)`. `rollback_status` is three-valued: `true` =
succeeded, `false` = failed, `NULL` = dry-run preview.

**Performance.** No impact on existing code paths. Within the cleanup
procedure, the `HoodieSparkTable` / `tsClient` / pending-rollback lookup state
is `lazy val`, so tables whose matched-instant set contains no compaction /
log-compaction / clustering pay zero `HoodieSparkTable.create` overhead.
Dry-run mode skips `HoodieCLIUtils.createHoodieWriteClient` entirely.

**No breaking changes.**

### Risk Level

**low** — additive only:

- 5 new files plus a 2-line registration edit in `HoodieProcedures.scala`.
No existing procedure or write-path behavior is modified.
- Per-action rollback routing reuses APIs already invoked in
`RollbackToInstantTimeProcedure` and
`RunRollbackInflightTableServiceProcedure`; no new low-level rollback logic is
introduced.
- Each per-instant rollback is wrapped in a `try`/`catch`; failures are
surfaced as `rollback_status = false` rather than aborting the procedure
mid-set.
- 13 unit tests cover the empty / threshold / `include_ingestion_commits`
gating / `dry_run` paths plus the dedicated `COMPACTION_ACTION`,
`CLUSTERING_ACTION`, partitioned-COW, and MOR `DELTA_COMMIT_ACTION` branches.
`mvn checkstyle:check` and `mvn scalastyle:check` are both clean.

The defensive `try`/`catch` arms (rollback throws → `rollback_status =
false`) are not directly exercised by tests because triggering them
deterministically requires fabricating timeline corruption; the arms mirror the
same pattern used elsewhere and are intentionally minimal.

### Documentation Update

A follow-up website PR against `apache/hudi-site` is needed to document the
two new procedures alongside the existing `CALL` procedure docs. No new configs
are added and no default values of existing configs are changed.

### Contributor's checklist

- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] feat(spark): add show_inflight_commits and cleanup_stale_inflight_com… [hudi]

Reply via email to