[ https://issues.apache.org/jira/browse/HBASE-29645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043527#comment-18043527 ]

Hudson commented on HBASE-29645:
--------------------------------

Results for branch branch-2.6
        [build #394 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/394/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/394/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/394//console].


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/394/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/394//console].


(x) {color:red}-1 jdk17 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/394//console].

> AsyncBufferedMutatorImpl concurrency improvement
> ------------------------------------------------
>
>                 Key: HBASE-29645
>                 URL: https://issues.apache.org/jira/browse/HBASE-29645
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 3.0.0-beta-1, 2.6.3, 2.5.12
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.5, 2.5.14
>
>         Attachments: AsyncBufferedMutatorBenchmark.java
>
>
> This patch modifies the AsyncBufferedMutatorImpl class to improve its 
> performance under concurrent usage.
> While AsyncTable#batch() is largely asynchronous, it can block during its 
> preparation phase, for instance while looking up region locations. The 
> original implementation of AsyncBufferedMutatorImpl relied on 
> coarse-grained synchronized blocks for thread safety, and calls to 
> AsyncTable#batch() occurred inside one of those blocks. As a result, when 
> one thread triggered a buffer flush (either because the buffer was full or 
> a periodic timer fired), all other threads attempting to add mutations via 
> the mutate method were blocked until the table.batch() call completed, 
> which could take a surprisingly long time and cause severe contention.
> The new implementation replaces the broad synchronized blocks with a 
> ReentrantLock. The lock is acquired only for the brief period needed to 
> safely copy the current batch of mutations and futures into local 
> variables and swap in a new internal buffer, and is released immediately 
> afterward. The batch() call then executes outside the locked section, so 
> other threads can continue adding new mutations concurrently while the 
> flush of the previous batch proceeds independently. The client has, by 
> definition, already opted in to asynchronous and potentially interleaved 
> commit of the mutations submitted to AsyncBufferedMutator. Minimizing the 
> scope of the critical section reduces thread contention and significantly 
> boosts throughput under load. Other profiler-driven efficiency changes are 
> also included, such as elimination of Stream API and array-resizing 
> hotspots.
> To validate the performance improvement of these changes, a JMH benchmark, 
> AsyncBufferedMutatorBenchmark (attached to this issue), was created to 
> measure the performance of the mutate method under various conditions. It 
> focuses specifically on the overhead and concurrency management of 
> AsyncBufferedMutatorImpl itself, not the underlying network communication. To 
> achieve this, it uses the Mockito framework to create a mock AsyncTable that 
> instantly returns completed futures, isolating the mutator's buffering logic 
> for measurement. It runs tests with 1, 10, and 100 threads to simulate no, 
> medium, and high levels of concurrency. It uses a low value (100) for 
> maxMutations to force frequent flushes based on the number of mutations, and 
> a very high value (100,000) to ensure flushes are rare in that measurement 
> case. The benchmark measures the average time per operation in microseconds, 
> where a lower score indicates better performance and higher throughput.
> With a single thread and no contention the performance of both 
> implementations is nearly identical. The minor variations are negligible and 
> show that the new locking mechanism does not introduce any performance 
> regression in the non-concurrent case. For example, with a 10MB buffer and 
> high maxMutations, the NEW implementation scored 0.167 µs/op while the OLD 
> scored 0.169 µs/op, a statistically insignificant difference. When the test 
> is run with 10 threads, a noticeable gap appears. In the scenario designed to 
> cause frequent flushes (maxMutations = 100), the NEW implementation is 
> approximately 12 times faster than the OLD one (14.250 µs/op for NEW vs. 
> 172.463 µs/op for OLD). This is because the OLD implementation forces threads 
> to wait while flushes occur, and flushes incur a synthetic thread sleep of 
> 1ms to simulate occasional unexpected blocking behavior in 
> AsyncTable#batch(), whereas the NEW implementation allows them to proceed 
> without contention. The most significant results come from the 100-thread 
> tests, which simulate high contention. In the frequent flush scenario 
> (maxMutations = 100) the NEW implementation is 114 times faster in the 
> synthetic benchmark scenario (16.123 µs/op for NEW vs. 1847.567 µs/op for 
> OLD). Note that blocking I/O observed in a real client, e.g. for region 
> location lookups, can produce a much more significant impact. With the OLD 
> code, 100 threads are constantly competing for a lock that is held for a long 
> duration, leading to a contention storm. The NEW code's reduced locking scope 
> almost entirely eliminates this bottleneck.
> Hardware: Apple Silicon (aarch64) M1 Max / 64 GB
> JVM: openjdk version "17.0.11" 2024-04-16 LTS  / OpenJDK 64-Bit Server VM 
> Zulu17.50+19-CA (build 17.0.11+9-LTS, mixed mode, sharing)
> ||Threads||Max Mutations||OLD Implementation (µs/op)||NEW Implementation (µs/op)||Performance Gain (OLD / NEW)||
> |1|100|14.091|16.313|0.86x (comparable)|
> |1|100,000|0.169|0.167|1.01x (comparable)|
> |10|100|172.463|14.250|12.10x|
> |10|100,000|2.465|1.072|2.30x|
> |100|100|1847.567|16.123|114.59x|
> |100|100,000|24.125|12.796|1.89x|
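
The lock-copy-swap pattern described in the issue can be sketched roughly as follows. This is a simplified, hypothetical illustration using stand-in types (String mutations, an inline no-op flush in place of AsyncTable#batch()), not the actual HBase patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the narrow-critical-section buffering pattern.
// Class and method names are illustrative, not the real HBase API.
class BufferedMutatorSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final int maxMutations;
    private List<String> mutations = new ArrayList<>();
    private List<CompletableFuture<Void>> futures = new ArrayList<>();

    BufferedMutatorSketch(int maxMutations) {
        this.maxMutations = maxMutations;
    }

    CompletableFuture<Void> mutate(String mutation) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        List<String> toFlush = null;
        List<CompletableFuture<Void>> toComplete = null;
        lock.lock();
        try {
            // Critical section: only buffer bookkeeping happens here.
            mutations.add(mutation);
            futures.add(future);
            if (mutations.size() >= maxMutations) {
                // Capture the full buffers locally and swap in fresh ones.
                toFlush = mutations;
                toComplete = futures;
                mutations = new ArrayList<>();
                futures = new ArrayList<>();
            }
        } finally {
            lock.unlock();
        }
        // The potentially slow batch submission runs OUTSIDE the lock,
        // so other threads can keep buffering mutations concurrently.
        if (toFlush != null) {
            flush(toFlush, toComplete);
        }
        return future;
    }

    private void flush(List<String> batch, List<CompletableFuture<Void>> fs) {
        // Stand-in for AsyncTable#batch(): completes futures immediately,
        // as the mock AsyncTable in the benchmark does.
        for (CompletableFuture<Void> f : fs) {
            f.complete(null);
        }
    }
}
```

In this shape the lock only guards list appends and the buffer swap; a real implementation would additionally handle the periodic-timer flush and close() under the same discipline.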



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
