[ 
https://issues.apache.org/jira/browse/HBASE-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867513#comment-13867513
 ] 

Chao Shi commented on HBASE-10305:
----------------------------------

bq. you can first create a new test case that does the batch update without 
WAL, to make sure that the problem is the HLog sync.

I've tried specifying SKIP_WAL on the client side, and the performance improves 
greatly. Moreover, I think I can confirm the problem from the thread stacks, 
where threads are stuck waiting for log sync.

bq. It depends on what level of write guarantee you want. There is a delayed 
log syncing feature already present. In that case there won't be an immediate 
sync after a log append. Instead you can configure a time period at which the 
syncer thread does a sync for the appends made up to that point.
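
For illustration only, the delayed-sync idea quoted above can be sketched in 
plain Java. This is a toy model, not HBase code: the class DeferredSyncLog and 
all of its methods are hypothetical names, and the call to sync() in main() 
stands in for the periodic syncer thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of deferred log syncing (hypothetical, not the HBase implementation).
public class DeferredSyncLog {
    private final List<String> pending = new ArrayList<>(); // appended, not yet synced
    private int syncCount = 0;
    private int durableEdits = 0;

    // An append only buffers the edit; it does not trigger a sync.
    public synchronized void append(String edit) {
        pending.add(edit);
    }

    // Meant to be called periodically; one sync covers every append made so far.
    public synchronized void sync() {
        if (pending.isEmpty()) {
            return; // nothing appended since the last sync
        }
        durableEdits += pending.size();
        pending.clear();
        syncCount++;
    }

    public synchronized int syncCount() { return syncCount; }
    public synchronized int durableEdits() { return durableEdits; }
    public synchronized int unsyncedEdits() { return pending.size(); }

    public static void main(String[] args) {
        DeferredSyncLog log = new DeferredSyncLog();
        for (int i = 0; i < 10; i++) {
            log.append("edit-" + i);
        }
        System.out.println("unsynced before sync: " + log.unsyncedEdits());
        log.sync(); // stands in for the syncer thread's timer firing
        System.out.println("syncs=" + log.syncCount() + ", durable=" + log.durableEdits());
    }
}
```

The trade-off the sketch makes visible: edits counted by unsyncedEdits() would 
be lost if the process died before the next sync.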

Hi Anoop, I understand that delayed sync should alleviate this problem. But I 
think the default behaviour (i.e. sync immediately) may be misleading. 
A good row-key design spreads the workload evenly over the cluster. 
However, this will lead to unexpected performance degradation as data grows. 
(The overhead of log syncs increases linearly with the number of regions a 
batch touches once the batch size is larger than the number of servers.)
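
As a rough illustration of that arithmetic, here is a small self-contained 
Java model. It is not HBase code: WalSyncModel, the round-robin placement of 
regions on servers, and the "one sync per server" alternative are all 
assumptions made for the sake of the example. It counts how many syncs a 
single evenly spread batch would cost under each scheme.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of sync counts per batch (hypothetical, not the HBase implementation).
public class WalSyncModel {
    // Assumed layout: region ids 0..N-1 are placed on servers round-robin.
    static int serverOf(int region, int servers) {
        return region % servers;
    }

    // One sync per distinct region touched, as described in the issue.
    static int syncsPerRegion(int[] regionsTouched) {
        Set<Integer> regions = new HashSet<>();
        for (int r : regionsTouched) {
            regions.add(r);
        }
        return regions.size();
    }

    // One sync per distinct server touched (a hypothetical grouped sync).
    static int syncsPerServer(int[] regionsTouched, int servers) {
        Set<Integer> hit = new HashSet<>();
        for (int r : regionsTouched) {
            hit.add(serverOf(r, servers));
        }
        return hit.size();
    }

    public static void main(String[] args) {
        int servers = 5;
        for (int regionsPerServer : new int[] {1, 10}) {
            int totalRegions = servers * regionsPerServer;
            int[] batch = new int[100];
            for (int i = 0; i < batch.length; i++) {
                batch[i] = i % totalRegions; // mutations spread evenly over regions
            }
            System.out.println(regionsPerServer + " regions/server: per-region syncs="
                    + syncsPerRegion(batch) + ", per-server syncs="
                    + syncsPerServer(batch, servers));
        }
    }
}
```

With 5 servers and a batch of 100 mutations, the per-region count grows from 5 
to 50 when going from 1 to 10 regions per server, while the per-server count 
stays at 5.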

It seems this behaviour is intended, so I'm going to close this ticket if 
no one else has a better suggestion.

> Batch update performance drops as the number of regions grows
> -------------------------------------------------------------
>
>                 Key: HBASE-10305
>                 URL: https://issues.apache.org/jira/browse/HBASE-10305
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Chao Shi
>
> In our use case, we use a small number (~5) of proxy programs that read from 
> a queue and batch update to HBase. Our program is multi-threaded and the HBase 
> client batches mutations to each RS.
> We found we're getting lower TPS when there are more regions. I think the 
> reason is that the RS syncs the HLog for each region. Suppose there is a single 
> region: the batch update will only touch one region and therefore syncs the 
> HLog once. And suppose there are 10 regions per server: in RS#multi() it has to 
> process updates for each individual region and sync the HLog 10 times.
> Please note that in our scenario, batched mutations usually are independent 
> of each other and touch a varying number of regions.
> We are using the 0.94 series, but I think the trunk should have the same 
> problem after a quick look into the code.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
