(hudi) branch asf-site updated: docs(blog): new blog for Hudi NBCC (#17613)

xushiyan Tue, 16 Dec 2025 14:28:02 -0800

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 280efa64a6c5 docs(blog): new blog for Hudi NBCC (#17613)
280efa64a6c5 is described below

commit 280efa64a6c549152ac6ba3d76696a539cff11b2
Author: Shiyan Xu <[email protected]>
AuthorDate: Tue Dec 16 16:27:36 2025 -0600

    docs(blog): new blog for Hudi NBCC (#17613)
---
 .../blog/2025-12-16-maximizing-throughput-nbcc.md  | 138 +++++++++++++++++++++
 .../p1-occ-retries.png                             | Bin 0 -> 48438 bytes
 .../p2-nbcc-overview.png                           | Bin 0 -> 51569 bytes
 .../p3-recordkey-filegroup.png                     | Bin 0 -> 34785 bytes
 .../p4-completion-time.png                         | Bin 0 -> 32325 bytes
 .../p5-truetime.gif                                | Bin 0 -> 104598 bytes
 .../p6-nbcc-compaction.png                         | Bin 0 -> 35097 bytes
 7 files changed, 138 insertions(+)

diff --git a/website/blog/2025-12-16-maximizing-throughput-nbcc.md 
b/website/blog/2025-12-16-maximizing-throughput-nbcc.md
new file mode 100644
index 000000000000..8ed3b5879874
--- /dev/null
+++ b/website/blog/2025-12-16-maximizing-throughput-nbcc.md
@@ -0,0 +1,138 @@
+---
+title: "Maximizing Throughput with Apache Hudi NBCC: Stop Retrying, Start 
Scaling"
+excerpt: "Learn how Hudi's Non-Blocking Concurrency Control eliminates retry 
storms for concurrent writers, maximizing throughput in streaming and mixed 
workloads."
+author: "Shiyan Xu"
+category: blog
+image: 
/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png
+tags:
+  - hudi
+  - data lakehouse
+  - concurrency control
+  - streaming
+---
+
+Data lakehouses often run multiple concurrent writers—streaming ingestion, 
batch ETL, maintenance jobs. The default approach, Optimistic Concurrency 
Control (OCC), assumes conflicts are rare and handles them through retries. 
That assumption breaks down in increasingly common scenarios, such as running 
maintenance batch jobs on tables receiving streaming writes. When conflicts 
become the norm, retries pile up with OCC, and the write throughput tanks.
+
+Hudi introduced [Non-Blocking Concurrency Control 
(NBCC)](https://hudi.apache.org/docs/concurrency_control#non-blocking-concurrency-control)
 in release 1.0, solving this problem by allowing writers to append data files 
in parallel and using the write completion time to determine the serialization 
order for reads or merges. We'll explore why OCC struggles under real-world 
concurrency, how NBCC works under the hood, and how to configure NBCC in your 
pipelines.
+
+## The Problem with Retries
+
+Picture this scenario: your streaming pipeline ingests clickstream data every 
minute from multiple Kafka topics. A nightly GDPR deletion job kicks off at 
midnight, scanning across thousands of partitions to purge user records—also 
touching data files the ingestion pipeline is actively writing to. By 3 AM, you 
get paged—the deletion job has failed repeatedly, burning compute resources 
while the ingestion writer keeps winning the race to commit.
+
+![p1-occ-retries](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png)
+
+OCC assumes conflicts are rare—an assumption that held in traditional 
batch-oriented data lakes where jobs were scheduled sequentially. Most 
transactions will not overlap, so let them proceed optimistically and check for 
conflicts at commit time. But high-frequency streaming breaks this assumption: 
when you have minute-level ingestion plus long-running maintenance jobs, 
overlapping writes are not the exception—they are the norm.
+
+This is a classic concurrency anti-pattern: under OCC, conflict probability 
grows with transaction duration. Long-running jobs competing against frequent 
short writes lose nearly every commit race and retry indefinitely. When both 
concurrent writers are running ingestion, without careful coordination between 
the writers (e.g., segregating writers by partitions), the consequences become 
more severe: conflicts occur more often, overall throughput is reduced, and 
compute costs increase. The [...]
+
+## Hudi NBCC: Write in Parallel, Serialize by Completion Time
+
+NBCC avoids conflicts by design: let every writer append updates to Hudi’s log 
files in the Merge-on-Read (MOR) table, then let readers or mergers follow the 
serialization order based on write completion time. Let's say there are two 
writers, both updating a record concurrently. Under NBCC, each writer produces 
its own log file containing the update. Since there's no file contention, 
there's nothing to conflict on. At read time or during compaction, Hudi follows 
the write completion time [...]
+
+![p2-nbcc-overview](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png)
+
+Both OCC and NBCC require locking—OCC during commit validation, NBCC during 
timestamp generation. The key difference is how long the lock is held, and what 
happens after. OCC holds the lock while validating: for concurrent commits, it 
compares the sets of written files to detect conflicts—so validation time grows 
with both transaction size and concurrent writer count. If validation detects a 
conflict, the losing writers discard their completed work and retry. NBCC's 
lock duration is a ne [...]
+
+|                | OCC                                             | NBCC      
                                  |
+|:---------------|:------------------------------------------------|:--------------------------------------------|
+| On conflict    | Abort and retry                                 | No 
conflicts—each writer appends separately |
+| Lock duration  | Scales with the number of written files to validate | 
Constant (brief clock skew duration)        |
+| Resource waste | High                                            | Nearly 
none                                 |
+
+Hudi supports both OCC and NBCC for multi-writer scenarios. Hudi also offers 
[early conflict 
detection](https://hudi.apache.org/docs/concurrency_control/#early-conflict-detection)
 for OCC, which can reduce wasted work by failing faster. However, OCC's 
validation lock duration still exceeds NBCC's timestamp generation time, and 
retries still occur after conflicts are detected—both impacting overall write 
throughput.
+
+## How NBCC Works Under the Hood
+
+Hudi NBCC relies on several design foundations to enable conflict-free 
concurrent writes and maximize throughput.
+
+### Record Keys and File Groups
+
+Hudi organizes data into file groups, where records with the same record key 
always route to the same file group. Hudi uses 
[indexes](https://hudi.apache.org/docs/indexes) to efficiently route records to 
file groups. For MOR tables, updates don't rewrite base files—instead, writers 
append updates to log files within the file group.
+
+![p3-recordkey-filegroup](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p3-recordkey-filegroup.png)
+
+This record colocation is a key foundation for making NBCC possible. Records 
and their updates will always be routed to the same file group based on record 
keys, either in base files or log files—all associated with the same file ID 
that identifies the file group. The record key to file group mapping and the 
file ID association support read and merge operations by efficiently locating 
files to process. Also, concurrent Hudi writers use timestamps and write tokens 
to generate non-conflict [...]
+
+### Completion Time: Serializing Concurrent Writes
+
+With NBCC, concurrent writers produce log files whose write transactions 
overlap in time. To process these files correctly, we need a proper 
serialization order. Consider: Writer A starts deltacommit 1 at T1 and 
completes at T5; Writer B starts deltacommit 2 at T2 and completes at T4. If we 
order by start timestamp, deltacommit 1 would be processed first—but it 
actually finished later than deltacommit 2. Completion time reflects the 
correct order for processing the files.
+
+![p4-completion-time](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p4-completion-time.png)
+
+Tracking the completion time is critical for NBCC. Concurrent writers flush 
records to files in parallel without any guarantee of completion order based on 
the start time. Hudi timeline tracks when each write is actually completed, 
enabling the correct serialization order for the writes.
+
+### TrueTime-like Timestamp Generation
+
+Distributed writers running on different machines face clock skew—their local 
clocks may differ by tens or even hundreds of milliseconds. Without 
coordination, two writers could generate the same timestamp or produce 
incorrect ordering.
+
+Hudi solves this with a TrueTime-like mechanism inspired by [Google 
Spanner](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf):
+
+![p5-truetime](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif)
+
+The process works as follows:
+
+1. Acquire a distributed lock
+2. Generate timestamp using local clock
+3. Sleep for X milliseconds
+4. Release lock
+
+The sleep accounts for the worst-case clock skew between writers. By waiting 
longer than the maximum expected skew before releasing the lock, Hudi 
guarantees that any subsequent writer will generate a strictly greater 
timestamp.
+
+These monotonically increasing timestamps ensure that concurrent writers never 
produce conflicting or out-of-order commits—achieving transaction 
serializability and completing the foundation for NBCC.
+
+### Supporting Designs
+
+In scenarios suited for NBCC, we may encounter long commit histories and need 
to properly merge records. Hudi 1.0 introduced two supporting designs that 
complement NBCC.
+
+**LSM Timeline**: High-frequency streaming can produce millions of commits 
over time. Listing and parsing individual timeline files would be prohibitively 
slow, and the storage overhead can become a headache. [Hudi 
timeline](https://hudi.apache.org/docs/timeline) uses an LSM Tree 
structure—archiving older commits into sorted and compacted Parquet files for 
efficient lookups and reduced storage footprint.
+
+**Merge Modes**: When log files from concurrent writers need to be merged 
during reads or compaction, we need proper record-merging logic. Hudi supports 
flexible [merge modes](https://hudi.apache.org/docs/record_merger#merge-modes) 
that control how records with the same key are resolved—whether to keep the 
latest by commit time, respect user-defined ordering fields, or apply custom 
merge functions.
+
+## Using NBCC
+
+With the design foundations covered, let's see how NBCC works in practice.
+
+### NBCC in Action
+
+NBCC allows concurrent writers to append data to log files in a 
non-conflicting way. Subsequent read or merge operations will need to follow 
the write completion time to process the files. Take compaction as an example, 
as shown in the diagram below.
+
+![p6-nbcc-compaction](/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png)
+
+Consider two writers: Writer A creates deltacommit 1 (starts at T1, completes 
at T5); Writer B creates deltacommit 2 (starts at T2, completes at T3). When 
compaction is scheduled at T4, the planner only includes files from deltacommit 
2, since its completion time (T3) is earlier than the compaction schedule time 
(T4). Deltacommit 1, though started earlier, is excluded because it hadn't 
completed yet when compaction was planned—its files will be included in a later 
compaction.
+
+Snapshot reads follow the same rules. A query at T4 includes data from 
deltacommit 2 but excludes deltacommit 1, which hasn't finished yet. A query 
after T5 includes both deltacommits, with log files read in completion time 
order and records merged according to the configured merge mode.
+
+### Configuration
+
+NBCC requires Hudi 1.0+ and a lock provider for TrueTime-like timestamp 
generation. Common options for lock providers include ZooKeeper-based and 
DynamoDB-based, which integrate with existing infrastructure many organizations 
already run. Hudi 1.1 introduced the [storage-based lock 
provider](https://hudi.apache.org/docs/concurrency_control#storage-based-lock-provider),
 which uses cloud storage conditional writes (S3, GCS) and requires no external 
server—an option with less operational ov [...]
+
+```py
+hudi_writer_options = {
+  'hoodie.write.concurrency.mode': 'NON_BLOCKING_CONCURRENCY_CONTROL',
+  'hoodie.write.lock.provider': 
'org.apache.hudi.client.transaction.lock.StorageBasedLockProvider',
+}
+```
+
+To enable NBCC for your concurrent writers, configure the concurrency mode and 
lock provider options for each writer, as shown in the example above.
+
+### When to Use NBCC
+
+If you are running multiple concurrent streaming writers, or running streaming 
ingestion with batch maintenance jobs like GDPR deletion, NBCC is more suitable 
than OCC. The table below summarizes some common examples:
+
+| Use Case                                                     | 
Recommendation | Why                                      |
+|:-------------------------------------------------------------|:---------------|:-----------------------------------------|
+| Batch ETL with single writer or multiple coordinated writers | OCC is fine   
 | No concurrency conflicts                 |
+| Multiple concurrent streaming writers                        | NBCC          
 | Avoid retry storms                       |
+| Mixed streaming + batch maintenance                         | NBCC           
| Long-running jobs will not starve        |
+| Copy-on-Write (COW) tables with infrequent updates                           
| OCC is fine    | COW rewrites base files anyway           |
+| MOR tables with frequent updates                             | NBCC          
 | Maximum benefit from log file separation |
+
+Hudi NBCC is designed specifically for MOR tables. COW tables rewrite entire 
base files on updates, so file-level conflicts are unavoidable regardless of 
concurrency control mode. As of now, NBCC is restricted to working with tables 
using the simple bucket index or partition-level bucket index. Learn more from 
the [concurrency control docs 
page](https://hudi.apache.org/docs/concurrency_control#non-blocking-concurrency-control).
+
+## Summary
+
+OCC assumes conflicts are rare. When you mix high-frequency streaming with 
long-running maintenance jobs, OCC's retry-on-conflict model breaks 
down—causing wasted compute, reduced throughput, and job starvation. Retries 
are the throughput killer.
+
+NBCC takes a different approach: let every writer succeed by appending to 
separate log files, then follow the write completion time for reads and 
compaction. Three design foundations make this possible—record keys and file 
groups that colocate records and their updates, completion time tracking that 
properly orders overlapping write transactions, and TrueTime-like timestamp 
generation that guarantees monotonically increasing timestamps across 
distributed writers.
+
+The result: maximum throughput for concurrent writes in Hudi pipelines. 
Long-running jobs complete without being starved, multiple ingestion pipelines 
coexist without contention, and your data platform scales without coordination 
overhead. Stop retrying, start scaling—see [the 
docs](https://hudi.apache.org/docs/overview) to get started.
diff --git 
a/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png
 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png
new file mode 100644
index 000000000000..fabf7d93897e
Binary files /dev/null and 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p1-occ-retries.png
 differ
diff --git 
a/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png
 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png
new file mode 100644
index 000000000000..7d69ac12be56
Binary files /dev/null and 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p2-nbcc-overview.png
 differ
diff --git 
a/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p3-recordkey-filegroup.png
 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p3-recordkey-filegroup.png
new file mode 100644
index 000000000000..03136a636e3f
Binary files /dev/null and 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p3-recordkey-filegroup.png
 differ
diff --git 
a/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p4-completion-time.png
 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p4-completion-time.png
new file mode 100644
index 000000000000..0aeca68885f3
Binary files /dev/null and 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p4-completion-time.png
 differ
diff --git 
a/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif
 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif
new file mode 100644
index 000000000000..1d1eef9b5e97
Binary files /dev/null and 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p5-truetime.gif
 differ
diff --git 
a/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png
 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png
new file mode 100644
index 000000000000..9e80f0c240ef
Binary files /dev/null and 
b/website/static/assets/images/blog/2025-12-16-maximizing-throughput-nbcc/p6-nbcc-compaction.png
 differ

(hudi) branch asf-site updated: docs(blog): new blog for Hudi NBCC (#17613)

Reply via email to