the-other-tim-brown commented on code in PR #18359:
URL: https://github.com/apache/hudi/pull/18359#discussion_r2992070048


##########
rfc/rfc-100/rfc-100-blob-cleaner-design.md:
##########
@@ -0,0 +1,777 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# RFC-100 Part 2: Blob Cleanup for Unstructured Data
+
+## Proposers
+
+- @voon
+
+## Approvers
+
+- (TBD)
+
+## Status
+
+Issue: <Link to GH feature issue>
+
+> Please keep the status updated in `rfc/README.md`.
+
+---
+
+## Abstract
+
+When Hudi cleans expired file slices, out-of-line blob files they reference may become orphaned --
+still consuming storage but unreachable by any query. This RFC extends the existing file slice
+cleaner to identify and delete these orphaned blob files safely and efficiently. The design uses a
+three-stage pipeline: (1) per-file-group set-difference to find locally-orphaned blobs, (2) an MDT
+secondary index lookup for cross-file-group verification of externally-referenced blobs, and (3)
+container file lifecycle resolution. For Hudi-created blobs, cleanup is essentially free -- structural
+path uniqueness eliminates cross-file-group concerns entirely. For user-provided external blobs,
+targeted index lookups scale with the number of candidates, not the table size. Tables without blob
+columns pay zero cost.
+
+---
+
+## Background
+
+### Why Blob Cleanup Is Needed
+
+RFC-100 introduces out-of-line blob storage for unstructured data (images, video, documents). A
+record's `BlobReference` field points to an external blob file by `(path, offset, length)`. When
+the cleaner expires old file slices, the blob files they reference may no longer be needed -- but the
+existing cleaner has no concept of transitive references. It deletes file slices without considering
+the blob files they point to. Without blob cleanup, orphaned blobs accumulate indefinitely.
+
+### Two Blob Flows
+
+Blob cleanup must support two distinct entry flows with fundamentally different properties:
+
+**Flow 1 -- Hudi-created blobs.** Blobs created by Hudi's write path, stored at
+`{table}/.hoodie/blobs/{partition}/{col}/{instant}/{blob_id}`. The commit instant in the path
+guarantees uniqueness (C11), and blobs are scoped to a single file group (P3). Cross-file-group
+sharing does not occur. This is the expected majority flow for Phase 3 workloads.
+
+**Flow 2 -- User-provided external blobs.** Users have existing blob files in external storage
+(e.g., `s3://media-bucket/videos/`). Records reference these blobs directly by path. Hudi manages
+the *references*, not the *storage layout*. Cross-file-group sharing is common -- multiple records
+across different file groups can point to the same blob. This is the expected primary flow for
+Phase 1 workloads.
+
+| Property                  | Flow 1 (Hudi-created)             | Flow 2 (External)                    |
+|---------------------------|-----------------------------------|--------------------------------------|
+| Path uniqueness           | Guaranteed (instant in path, C11) | Not guaranteed (user controls)       |
+| Cross-FG sharing          | Does not occur (FG-scoped)        | Common (multiple records, same blob) |
+| Writer/cleaner race       | Cannot occur (D2)                 | Can occur (D3)                       |
+| Per-FG cleanup sufficient | Yes                               | No -- cross-FG verification needed   |
+
+### Constraints and Requirements Reference
+
+Full descriptions and failure modes are in [Appendix B](rfc-100-blob-cleaner-problem.md).
+
+| ID  | Constraint                                      | Flow 1 | Flow 2 | Remarks                      |
+|-----|-------------------------------------------------|--------|--------|------------------------------|
+| C1  | Blob immutability (append-once, read-many)      | Y      | Y      |                              |
+| C2  | Delete-and-re-add same path                     | --     | Y      | Eliminated for Flow 1 by C11 |
+| C3  | Cross-file-group blob sharing                   | --     | Y      | Common for external blobs    |
+| C4  | Container files (`(offset, length)` ranges)     | Y      | Y      |                              |
+| C5  | MOR log updates shadow base file blob refs      | Y      | Y      |                              |
+| C6  | Existing cleaner is per-file-group scoped       | Y      | Y      |                              |
+| C7  | OCC is per-file-group                           | Y      | Y      | No global contention allowed |
+| C8  | Clustering moves blob refs between file groups  | Y      | Y      |                              |
+| C9  | Savepoints freeze file slices and blob refs     | Y      | Y      |                              |
+| C10 | Rollback can invalidate or resurrect references | Y      | Y      |                              |
+| C11 | Blob paths include commit instant               | Y      | --     | Eliminates C2, C3, C13       |
+| C12 | Archival removes commit metadata                | Y      | Y      |                              |
+| C13 | Cross-FG verification needed at scale           | --     | Y      |                              |
+
+| ID  | Requirement                                                      |
+|-----|------------------------------------------------------------------|
+| R1  | No premature deletion (hard invariant)                           |
+| R2  | No permanent orphans (bounded cleanup)                           |
+| R3  | Container awareness (range-level liveness)                       |
+| R4  | MOR correctness (over-retention acceptable, under-retention not) |
+| R5  | Concurrency safety (no global serialization)                     |
+| R6  | Scale proportional to work, not table size                       |
+| R7  | No cost for non-blob tables                                      |
+| R8  | All cleaning policies supported                                  |
+| R9  | Crash safety and idempotency                                     |
+| R10 | Observability (metrics for deleted, retained, reclaimed)         |
+
+---
+
+## Design Overview
+
+### Design Philosophy
+
+Blob cleanup extends the existing `CleanPlanner` / `CleanActionExecutor` pipeline -- same timeline
+instant, same plan-execute-complete lifecycle, same crash recovery and OCC integration. A
+`hasBlobColumns()` check gates all blob logic so non-blob tables pay zero cost.
+
+The two flows have different cost structures, and the design keeps them separate. Flow 1
+(Hudi-created blobs) gets per-FG cleanup with no cross-FG overhead. Flow 2 (external blobs) gets
+targeted cross-FG verification via MDT secondary index. Dispatch is a string prefix check on the
+blob path.
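The prefix dispatch can be sketched in a few lines (illustrative Python; the constant and function names are assumptions for this sketch, not Hudi APIs):

```python
# Illustrative sketch of the flow dispatch described above.
# HUDI_BLOB_PREFIX and classify_blob are hypothetical names, not Hudi APIs.
HUDI_BLOB_PREFIX = "/.hoodie/blobs/"

def classify_blob(table_path: str, blob_path: str) -> str:
    """Return 'hudi' (Flow 1) or 'external' (Flow 2) for a blob path."""
    if blob_path.startswith(table_path + HUDI_BLOB_PREFIX):
        return "hudi"      # FG-scoped, instant in path: per-FG cleanup suffices
    return "external"      # user-controlled path: needs cross-FG verification
```

Because the check is a pure string comparison on data already in hand, classification adds no I/O to the planning path.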
+
+### Three-Stage Pipeline
+
+| Stage       | Scope                | Purpose                                                                          | When it runs                           |
+|-------------|----------------------|----------------------------------------------------------------------------------|----------------------------------------|
+| **Stage 1** | Per-file-group       | Collect expired/retained blob refs, compute set difference, dispatch by category | Always (for blob tables)               |
+| **Stage 2** | Cross-file-group     | Verify external blob candidates against MDT secondary index or fallback scan     | Only when external candidates exist    |
+| **Stage 3** | Container resolution | Determine delete vs. flag-for-compaction at the container level                  | Only when container blobs are involved |
+
+### Independent Implementability
+
+The three stages have clean input/output interfaces and can be implemented, tested, and shipped
+independently:
+
+| Stage   | Input                                                   | Output                                              |
+|---------|---------------------------------------------------------|-----------------------------------------------------|
+| Stage 1 | `FileGroupCleanResult` (expired + retained slices)      | `hudi_blob_deletes`, `external_candidates`          |
+| Stage 2 | `external_candidates`, `cleaned_fg_ids`                 | `external_deletes`                                  |
+| Stage 3 | `hudi_blob_deletes` + `external_deletes`, retained refs | `blob_files_to_delete`, `containers_for_compaction` |
+
+A shared foundation layer must land first (see [Rollout / Adoption Plan](#rollout--adoption-plan)), after which stages
+can proceed in any order.
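The stage contracts above could be sketched as typed records (illustrative Python; all names are placeholders for the proposed structures, not existing Hudi classes):

```python
# Hypothetical sketch of the per-stage I/O contracts; names are placeholders.
from typing import NamedTuple, Set, Tuple

# (path, offset, length) -- the blob identity tuple used throughout this RFC
BlobRef = Tuple[str, int, int]

class Stage1Output(NamedTuple):
    hudi_blob_deletes: Set[BlobRef]      # safe to delete, no cross-FG check
    external_candidates: Set[BlobRef]    # forwarded to Stage 2

class Stage2Output(NamedTuple):
    external_deletes: Set[BlobRef]       # confirmed globally orphaned

class Stage3Output(NamedTuple):
    blob_files_to_delete: Set[str]       # whole files, every range dead
    containers_for_compaction: Set[str]  # partially-live container files
```

Each stage consumes only its declared inputs, which is what allows the three stages to be built and shipped in any order once the shared foundation lands.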
+
+### Key Decisions
+
+| Decision            | Choice                                                  | Rationale                                                          |
+|---------------------|---------------------------------------------------------|--------------------------------------------------------------------|
+| Blob identity       | `(path, offset, length)` tuple                          | Handles containers (C4) and path reuse (C2) correctly              |
+| Cleanup scope       | Per-FG (Hudi blobs) + MDT index lookup (external blobs) | Aligns with OCC (C7) and existing cleaner (C6); scales for C13     |
+| Dispatch mechanism  | Path prefix check on blob path                          | Zero-cost classification; Hudi blobs match `.hoodie/blobs/` prefix |
+| Cross-FG mechanism  | MDT secondary index on `reference.external_path`        | Short-circuits on first non-cleaned FG ref; first-class for Flow 2 |
+| Write-path overhead | None (Flow 1); MDT index maintenance (Flow 2)           | Index maintained by existing MDT pipeline, not a new write cost    |
+| MOR strategy        | Over-retain (union of base + log refs)                  | Safe (C5, R4); cleaned after compaction                            |
+| Container strategy  | Tuple-level tracking; delete only when all ranges dead  | Correct (C4, R3); partial containers flagged for blob compaction   |
+
+```mermaid
+flowchart LR
+    subgraph Planning["CleanPlanActionExecutor.requestClean()"]
+        direction TB
+        Gate{"hasBlobColumns()?"}
+        Gate -- No --> Skip["Skip blob cleanup<br/>(zero cost)"]
+        Gate -- Yes --> CP
+
+        subgraph CP["CleanPlanner (per-partition, per-FG)"]
+            direction TB
+            Policy["Policy method<br/>→ FileGroupCleanResult<br/>(expired + retained slices)"]
+            S1["<b>Stage 1</b><br/>Per-FG blob ref<br/>set difference + dispatch"]
+            Policy --> S1
+        end
+
+        S1 --> S2["<b>Stage 2</b><br/>Cross-FG verification<br/>(MDT secondary index)"]
+        S1 -->|hudi_blob_deletes| S3
+        S2 -->|external_deletes| S3["<b>Stage 3</b><br/>Container lifecycle<br/>resolution"]
+    end
+
+    subgraph Plan["HoodieCleanerPlan"]
+        FP["filePathsToBeDeleted<br/>(existing)"]
+        BP["blobFilesToDelete<br/>(new)"]

Review Comment:
   We decided in another [PR](https://github.com/apache/hudi/pull/18259) that this is too large to write to the plan. Has that changed?



##########
rfc/rfc-100/rfc-100-blob-cleaner-design.md:
##########
+```mermaid
+flowchart LR
+    subgraph Planning["CleanPlanActionExecutor.requestClean()"]
+        direction TB
+        Gate{"hasBlobColumns()?"}
+        Gate -- No --> Skip["Skip blob cleanup<br/>(zero cost)"]
+        Gate -- Yes --> CP
+
+        subgraph CP["CleanPlanner (per-partition, per-FG)"]
+            direction TB
+            Policy["Policy method<br/>→ FileGroupCleanResult<br/>(expired + retained slices)"]
+            S1["<b>Stage 1</b><br/>Per-FG blob ref<br/>set difference + dispatch"]
+            Policy --> S1
+        end
+
+        S1 --> S2["<b>Stage 2</b><br/>Cross-FG verification<br/>(MDT secondary index)"]
+        S1 -->|hudi_blob_deletes| S3
+        S2 -->|external_deletes| S3["<b>Stage 3</b><br/>Container lifecycle<br/>resolution"]
+    end
+
+    subgraph Plan["HoodieCleanerPlan"]
+        FP["filePathsToBeDeleted<br/>(existing)"]
+        BP["blobFilesToDelete<br/>(new)"]
+        CC["containersToCompact<br/>(new)"]
+    end
+
+    S3 --> BP
+    S3 --> CC
+    CP --> FP
+
+    subgraph Execution["CleanActionExecutor.runClean()"]
+        direction TB
+        DF["Delete file slices<br/>(existing, parallel)"]
+        DB["Delete blob files<br/>(new, parallel)"]
+        RC["Record containers<br/>for blob compaction"]
+    end
+
+    FP --> DF
+    BP --> DB
+    CC --> RC
+```
+
+---
+
+## Algorithm
+
+### Stage 1: Per-File-Group Local Cleanup
+
+Stage 1 runs after the existing policy logic determines which file slices are expired and retained
+for a given file group. It collects blob refs from both sets and computes locally-orphaned blobs by
+set difference.
+
+```
+Input:  A file group FG with expired_slices and retained_slices (from policy)
+Output: hudi_blob_deletes     -- blobs safe to delete immediately
+        external_candidates   -- external blobs needing cross-FG verification
+
+for each file_group being cleaned:
+
+    // Collect expired blob refs (base files + log files)
+    // Must read log files: blob refs introduced and superseded within the log
+    // chain before compaction would otherwise become permanent orphans.
+    expired_refs = Set<(path, offset, length)>()
+    for slice in expired_slices:
+        for ref in extractBlobRefs(slice.baseFile):   // columnar projection
+            if ref.type == OUT_OF_LINE and ref.managed == true:
+                expired_refs.add((ref.path, ref.offset, ref.length))
+        for ref in extractBlobRefs(slice.logFiles):   // full record read
+            if ref.type == OUT_OF_LINE and ref.managed == true:
+                expired_refs.add((ref.path, ref.offset, ref.length))
+
+    if expired_refs is empty:
+        continue                                       // no blob work for this FG
+
+    // Collect retained blob refs (base files only)
+    // Cleaning is fenced on compaction: retained base files contain the merged
+    // state. Log reads are unnecessary -- any shadowed base ref causes safe
+    // over-retention, cleaned after the next compaction cycle.
+    retained_refs = Set<(path, offset, length)>()
+    for slice in retained_slices:
+        for ref in extractBlobRefs(slice.baseFile):   // columnar projection only
+            if ref.type == OUT_OF_LINE and ref.managed == true:
+                retained_refs.add((ref.path, ref.offset, ref.length))
+
+    // Compute local orphans by set difference
+    local_orphans = expired_refs - retained_refs
+
+    // Dispatch by blob category
+    for ref in local_orphans:
+        if ref.path starts with TABLE_PATH + "/.hoodie/blobs/":
+            hudi_blob_deletes.add(ref)             // P3: no cross-FG refs possible
+        else:
+            external_candidates.add(ref)           // C13: cross-FG refs are common
+```
+
+**Correctness notes:**
+
+- **Hudi-created blobs:** If a blob ref appears in expired but not retained slices of the same FG,
+  it is globally orphaned -- Hudi blobs are FG-scoped (C11), so no cross-FG check is needed.
+- **MOR -- expired side reads base + logs:** Blob refs can be introduced and superseded entirely
+  within the log chain (e.g., `log@t2: row1→blob_B`, then `log@t3: row1→blob_C`). After
+  compaction, `blob_B` exists only in the expired log. Skipping logs would orphan it permanently.
+- **MOR -- retained side reads base only:** Cleaning is fenced on compaction, so retained base
+  files contain the merged state. Shadowed base refs cause over-retention (safe), cleaned after
+  the next compaction.
+- **Savepoints:** Inherited from the existing cleaner -- savepointed slices stay in the retained set.
+- **Replaced FGs (clustering):** `retained_slices` is empty, so all blob refs become candidates.
+  Hudi blobs are safe to delete (clustering creates new blobs in the target FG). External blobs
+  flow to Stage 2 (clustering copies the pointer, so Stage 2 finds it in the target FG).
+
+### Stage 2: Cross-FG Verification (External Blobs)
+
+Stage 2 executes only when `external_candidates` is non-empty. For Flow 1 workloads (Hudi-created
+blobs only), this stage is skipped entirely.
+
+#### Primary path: MDT secondary index
+
+When the MDT secondary index on `reference.external_path` is available and fully built:
+
+```
+Input:  external_candidates, cleaned_fg_ids
+Output: external_deletes (confirmed globally orphaned)
+
+candidate_paths = external_candidates.map(ref -> ref.path).distinct()
+
+// Step 1: Batched prefix scan on secondary index
+// Key format: escaped(external_path)$escaped(record_key)
+// Returns ALL record keys that reference each candidate path
+path_to_record_keys = mdtMetadata.readSecondaryIndexDataTableRecordKeysWithKeys(
+    HoodieListData.eager(candidate_paths), indexPartitionName)

Review Comment:
   The `HoodieData` here always references `HoodieListData`, but can we use the RDD-backed `HoodieData` when running on Spark?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
