mahsoodebrahim opened a new pull request, #18709:
URL: https://github.com/apache/hudi/pull/18709

   feat(spark): add show_inflight_commits and cleanup_stale_inflight_commits 
stored procedures
   
   ### Describe the issue this Pull Request addresses
   
   A stuck inflight commit (write, compaction, log-compaction, or clustering) 
blocks async clean, archival, and subsequent compaction on a Hudi table. Today 
operators have to drop into `hudi-cli` and run a `repair rollback` per stuck 
instant — fine for single-table triage, awkward when an oncall is looking at 
several tables and just wants to inspect or remediate via Spark SQL.
   
   This PR adds two `CALL` procedures so that workflow can stay in SQL.
   
   ### Summary and Changelog
   
   Two new Spark SQL stored procedures:
   
   **`show_inflight_commits(table, min_age_minutes?)`** — lists pending 
`REQUESTED` and `INFLIGHT` instants from the active timeline across all action 
types (write, rollback, clean, restore, indexing). The optional 
`min_age_minutes` filter narrows the result to stale instants only.
   Output: `(instant_time, action, state)`.
   
   **`cleanup_stale_inflight_commits(table, allowed_inflight_interval_minutes?, 
include_ingestion_commits?, dry_run?)`** — rolls back stale inflight 
write-timeline commits older than the threshold (default 180 minutes). 
Per-action routing:
   
   - `COMMIT` / `DELTA_COMMIT` / `REPLACE_COMMIT` → `client.rollback()` (same 
pattern as `RollbackToInstantTimeProcedure`).
   - `COMPACTION` / `LOG_COMPACTION` / `CLUSTERING` → dedicated 
`table.rollbackInflight*` methods, mirroring 
`RunRollbackInflightTableServiceProcedure.scala`. `HoodieSparkTable`, 
`tsClient`, and the pending-rollback lookup are constructed lazily on the first 
such instant — matched sets that contain no table-service inflights pay zero 
`HoodieSparkTable.create` overhead.
   - `include_ingestion_commits` defaults to `false`. Opting in allows rollback 
of `COMMIT`/`DELTA_COMMIT` inflights, which can drop in-progress ingestion 
data; documented as a safety warning in the procedure's Scaladoc.
   - `dry_run` defaults to `false`. When `true`, matched instants are listed 
with `rollback_status = NULL` ("matched but not actioned") and the write client 
is not constructed.
   
   Output: `(instant_time, action, rollback_status)` where `rollback_status` is 
`true` = succeeded, `false` = failed, `NULL` = dry-run preview.
   
   Files added:
   - 
`hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieTimelineCleanupUtil.java`
 — single public method `inflightWriteCommitsOlderThan` backing the cleanup 
procedure.
   - 
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowInflightCommitsProcedure.scala`
   - 
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CleanupStaleInflightCommitsProcedure.scala`
   - 
`hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestShowInflightCommitsProcedure.scala`
   - 
`hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestCleanupStaleInflightCommitsProcedure.scala`
   
   Files modified:
   - 
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/HoodieProcedures.scala`
 — register the two new procedures (2 lines).
   
   ### Impact
   
   **Public API.** Adds two new `CALL` procedure names 
(`show_inflight_commits`, `cleanup_stale_inflight_commits`) and one new public 
utility class (`org.apache.hudi.HoodieTimelineCleanupUtil`). The output schemas 
are part of the procedures' going-forward contracts:
   
   - `show_inflight_commits`: `(instant_time STRING, action STRING, state 
STRING)`. The `state` column distinguishes `REQUESTED` from `INFLIGHT`, which 
is useful when triaging — a `REQUESTED` instant can usually be deleted directly 
while an `INFLIGHT` one needs a true rollback.
   - `cleanup_stale_inflight_commits`: `(instant_time STRING, action STRING, 
rollback_status BOOLEAN nullable)`. `rollback_status` is three-valued: `true` = 
succeeded, `false` = failed, `NULL` = dry-run preview.
   
   **Performance.** No impact on existing code paths. Within the cleanup 
procedure, the `HoodieSparkTable` / `tsClient` / pending-rollback lookup state 
is `lazy val`, so tables whose matched-instant set contains no compaction / 
log-compaction / clustering pay zero `HoodieSparkTable.create` overhead. 
Dry-run mode skips `HoodieCLIUtils.createHoodieWriteClient` entirely.
   
   **No breaking changes.**
   
   ### Risk Level
   
   **low** — additive only:
   
   - 5 new files plus a 2-line registration edit in `HoodieProcedures.scala`. 
No existing procedure or write-path behavior is modified.
   - Per-action rollback routing reuses APIs already invoked in 
`RollbackToInstantTimeProcedure` and 
`RunRollbackInflightTableServiceProcedure`; no new low-level rollback logic is 
introduced.
   - Each per-instant rollback is wrapped in a `try`/`catch`; failures are 
surfaced as `rollback_status = false` rather than aborting the procedure 
mid-set.
   - 13 unit tests cover the empty / threshold / `include_ingestion_commits` 
gating / `dry_run` paths plus the dedicated `COMPACTION_ACTION`, 
`CLUSTERING_ACTION`, partitioned-COW, and MOR `DELTA_COMMIT_ACTION` branches. 
`mvn checkstyle:check` and `mvn scalastyle:check` are both clean.
   
   The defensive `try`/`catch` arms (rollback throws → `rollback_status = 
false`) are not directly exercised by tests because triggering them 
deterministically requires fabricating timeline corruption; the arms mirror the 
same pattern used elsewhere and are intentionally minimal.
   
   ### Documentation Update
   
   A follow-up website PR against `apache/hudi-site` is needed to document the 
two new procedures alongside the existing `CALL` procedure docs. No new configs 
are added and no default values of existing configs are changed.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to