Copilot commented on code in PR #17613: URL: https://github.com/apache/hudi/pull/17613#discussion_r2624961134
########## website/blog/2025-12-16-maximizing-throughput-nbcc.md: ########## @@ -0,0 +1,138 @@ +--- +title: "Maximizing Throughput with Apache Hudi NBCC: Stop Retrying, Start Scaling" +excerpt: "Learn how Hudi's Non-Blocking Concurrency Control eliminates retry storms for concurrent writers, maximizing throughput in streaming and mixed workloads." +author: "Shiyan Xu" +category: blog +image: /assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png +tags: + - hudi + - data lakehouse + - concurrency control + - streaming +--- + +Data lakehouses often run multiple concurrent writers—streaming ingestion, batch ETL, maintenance jobs. The default approach, Optimistic Concurrency Control (OCC), assumes conflicts are rare and handles them through retries. That assumption breaks down in increasingly common scenarios, such as running maintenance batch jobs on tables receiving streaming writes. When conflicts become the norm, retries pile up with OCC, and the write throughput tanks. + +Hudi introduced [Non-Blocking Concurrency Control (NBCC)](https://hudi.apache.org/docs/concurrency_control#non-blocking-concurrency-control) in release 1.0, solving this problem by allowing writers to append data files in parallel and using the write completion time to determine the serialization order for reads or merges. We'll explore why OCC struggles under real-world concurrency, how NBCC works under the hood, and how to configure NBCC in your pipelines. + +## The Problem with Retries + +Picture this scenario: your streaming pipeline ingests clickstream data every minute from multiple Kafka topics. A nightly GDPR deletion job kicks off at midnight, scanning across thousands of partitions to purge user records—also touching data files the ingestion pipeline is actively writing to. By 3 AM, you get paged—the deletion job has failed repeatedly, burning compute resources while the ingestion writer keeps winning the race to commit. + + Review Comment: Missing image file "p1-occ-retries.png". The blog post references this image on line 22, but only three PNG files are included in the PR (p3-recordkey-filegroup.png, p4-completion-time.png, and p6-nbcc-compaction.png). The images p1-occ-retries.png, p2-nbcc-overview.png, and p5-truetime.gif are referenced but not included. ```suggestion *(Diagram: OCC retries stacking up under heavy contention between long-running and short-running writers.)* ``` ########## website/blog/2025-12-16-maximizing-throughput-nbcc.md: ########## @@ -0,0 +1,138 @@ +--- +title: "Maximizing Throughput with Apache Hudi NBCC: Stop Retrying, Start Scaling" +excerpt: "Learn how Hudi's Non-Blocking Concurrency Control eliminates retry storms for concurrent writers, maximizing throughput in streaming and mixed workloads." +author: "Shiyan Xu" +category: blog +image: /assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png +tags: + - hudi + - data lakehouse + - concurrency control + - streaming +--- + +Data lakehouses often run multiple concurrent writers—streaming ingestion, batch ETL, maintenance jobs. The default approach, Optimistic Concurrency Control (OCC), assumes conflicts are rare and handles them through retries. That assumption breaks down in increasingly common scenarios, such as running maintenance batch jobs on tables receiving streaming writes. When conflicts become the norm, retries pile up with OCC, and the write throughput tanks. + +Hudi introduced [Non-Blocking Concurrency Control (NBCC)](https://hudi.apache.org/docs/concurrency_control#non-blocking-concurrency-control) in release 1.0, solving this problem by allowing writers to append data files in parallel and using the write completion time to determine the serialization order for reads or merges. We'll explore why OCC struggles under real-world concurrency, how NBCC works under the hood, and how to configure NBCC in your pipelines. + +## The Problem with Retries + +Picture this scenario: your streaming pipeline ingests clickstream data every minute from multiple Kafka topics. A nightly GDPR deletion job kicks off at midnight, scanning across thousands of partitions to purge user records—also touching data files the ingestion pipeline is actively writing to. By 3 AM, you get paged—the deletion job has failed repeatedly, burning compute resources while the ingestion writer keeps winning the race to commit. + + + +OCC assumes conflicts are rare—an assumption that held in traditional batch-oriented data lakes where jobs were scheduled sequentially. Most transactions will not overlap, so let them proceed optimistically and check for conflicts at commit time. But high-frequency streaming breaks this assumption: when you have minute-level ingestion plus long-running maintenance jobs, overlapping writes are not the exception—they are the norm. + +This is a classic concurrency anti-pattern: under OCC, conflict probability grows with transaction duration. Long-running jobs competing against frequent short writes lose nearly every commit race and retry indefinitely. When both concurrent writers are running ingestion, without careful coordination between the writers (e.g., segregating writers by partitions), the consequences become more severe: conflicts occur more often, overall throughput is reduced, and compute costs increase. The key insight is that retries are the throughput killer—we need a fundamentally different approach. + +## Hudi NBCC: Write in Parallel, Serialize by Completion Time + +NBCC avoids conflicts by design: let every writer append updates to Hudi’s log files in the Merge-on-Read (MOR) table, then let readers or mergers follow the serialization order based on write completion time. Let's say there are two writers, both updating a record concurrently. Under NBCC, each writer produces its own log file containing the update. Since there's no file contention, there's nothing to conflict on. At read time or during compaction, Hudi follows the write completion time and processes the associated log files in the proper order. + + Review Comment: Missing image file "p2-nbcc-overview.png". This image is referenced on line 32 but not included in the PR. ```suggestion  ``` ########## website/blog/2025-12-16-maximizing-throughput-nbcc.md: ########## @@ -0,0 +1,138 @@ +--- +title: "Maximizing Throughput with Apache Hudi NBCC: Stop Retrying, Start Scaling" +excerpt: "Learn how Hudi's Non-Blocking Concurrency Control eliminates retry storms for concurrent writers, maximizing throughput in streaming and mixed workloads." +author: "Shiyan Xu" +category: blog +image: /assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png +tags: + - hudi + - data lakehouse + - concurrency control + - streaming +--- + +Data lakehouses often run multiple concurrent writers—streaming ingestion, batch ETL, maintenance jobs. The default approach, Optimistic Concurrency Control (OCC), assumes conflicts are rare and handles them through retries. That assumption breaks down in increasingly common scenarios, such as running maintenance batch jobs on tables receiving streaming writes. When conflicts become the norm, retries pile up with OCC, and the write throughput tanks. + +Hudi introduced [Non-Blocking Concurrency Control (NBCC)](https://hudi.apache.org/docs/concurrency_control#non-blocking-concurrency-control) in release 1.0, solving this problem by allowing writers to append data files in parallel and using the write completion time to determine the serialization order for reads or merges. We'll explore why OCC struggles under real-world concurrency, how NBCC works under the hood, and how to configure NBCC in your pipelines. + +## The Problem with Retries + +Picture this scenario: your streaming pipeline ingests clickstream data every minute from multiple Kafka topics. A nightly GDPR deletion job kicks off at midnight, scanning across thousands of partitions to purge user records—also touching data files the ingestion pipeline is actively writing to. By 3 AM, you get paged—the deletion job has failed repeatedly, burning compute resources while the ingestion writer keeps winning the race to commit. + + + +OCC assumes conflicts are rare—an assumption that held in traditional batch-oriented data lakes where jobs were scheduled sequentially. Most transactions will not overlap, so let them proceed optimistically and check for conflicts at commit time. But high-frequency streaming breaks this assumption: when you have minute-level ingestion plus long-running maintenance jobs, overlapping writes are not the exception—they are the norm. + +This is a classic concurrency anti-pattern: under OCC, conflict probability grows with transaction duration. Long-running jobs competing against frequent short writes lose nearly every commit race and retry indefinitely. When both concurrent writers are running ingestion, without careful coordination between the writers (e.g., segregating writers by partitions), the consequences become more severe: conflicts occur more often, overall throughput is reduced, and compute costs increase. The key insight is that retries are the throughput killer—we need a fundamentally different approach. + +## Hudi NBCC: Write in Parallel, Serialize by Completion Time + +NBCC avoids conflicts by design: let every writer append updates to Hudi’s log files in the Merge-on-Read (MOR) table, then let readers or mergers follow the serialization order based on write completion time. Let's say there are two writers, both updating a record concurrently. Under NBCC, each writer produces its own log file containing the update. Since there's no file contention, there's nothing to conflict on. At read time or during compaction, Hudi follows the write completion time and processes the associated log files in the proper order. + + + +Both OCC and NBCC require locking—OCC during commit validation, NBCC during timestamp generation. The key difference is how long the lock is held, and what happens after. OCC holds the lock while validating: for concurrent commits, it compares the sets of written files to detect conflicts—so validation time grows with both transaction size and concurrent writer count. If validation detects a conflict, the losing writers discard their completed work and retry. NBCC's lock duration is a negligible constant (a configurable clock skew duration, 200ms by default) regardless of transaction size: acquire lock, generate timestamp, sleep for clock skew, release. No file-level validation, no conflict detection, no retries. + +| | OCC | NBCC | +|:---------------|:------------------------------------------------|:--------------------------------------------| +| On conflict | Abort and retry | No conflicts—each writer appends separately | +| Lock duration | Scales with the number of written files to validate | Constant (brief clock skew duration) | +| Resource waste | High | Nearly none | + +Hudi supports both OCC and NBCC for multi-writer scenarios. Hudi also offers [early conflict detection](https://hudi.apache.org/docs/concurrency_control/#early-conflict-detection) for OCC, which can reduce wasted work by failing faster. However, OCC's validation lock duration still exceeds NBCC's timestamp generation time, and retries still occur after conflicts are detected—both impacting overall write throughput. + +## How NBCC Works Under the Hood + +Hudi NBCC relies on several design foundations to enable conflict-free concurrent writes and maximize throughput. + +### Record Keys and File Groups + +Hudi organizes data into file groups, where records with the same record key always route to the same file group. Hudi uses [indexes](https://hudi.apache.org/docs/indexes) to efficiently route records to file groups. For MOR tables, updates don't rewrite base files—instead, writers append updates to log files within the file group. + + + +This record colocation is a key foundation for making NBCC possible. Records and their updates will always be routed to the same file group based on record keys, either in base files or log files—all associated with the same file ID that identifies the file group. The record key to file group mapping and the file ID association support read and merge operations by efficiently locating files to process. Also, concurrent Hudi writers use timestamps and write tokens to generate non-conflicting file names within each file group, and thus data writing can be non-blocking. + +### Completion Time: Serializing Concurrent Writes + +With NBCC, concurrent writers produce log files whose write transactions overlap in time. To process these files correctly, we need a proper serialization order. Consider: Writer A starts deltacommit 1 at T1 and completes at T5; Writer B starts deltacommit 2 at T2 and completes at T4. If we order by start timestamp, deltacommit 1 would be processed first—but it actually finished later than deltacommit 2. Completion time reflects the correct order for processing the files. + + + +Tracking the completion time is critical for NBCC. Concurrent writers flush records to files in parallel without any guarantee of completion order based on the start time. Hudi timeline tracks when each write is actually completed, enabling the correct serialization order for the writes. + +### TrueTime-like Timestamp Generation + +Distributed writers running on different machines face clock skew—their local clocks may differ by tens or even hundreds of milliseconds. Without coordination, two writers could generate the same timestamp or produce incorrect ordering. + +Hudi solves this with a TrueTime-like mechanism inspired by [Google Spanner](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf): + + Review Comment: Missing image file "p5-truetime.gif". This image is referenced on line 70 but not included in the PR. ```suggestion  ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
