[ 
https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476528#comment-13476528
 ] 

Kannan Muthukkaruppan commented on HBASE-6980:
----------------------------------------------

I did a quick prototype against 89-fb, and the results were as expected. My 
test setup does WAL-less puts. Previously I wasn't able to go much beyond 
100MB/second of ingest into HBase; with parallel flushing, I was able to get 
a 3-4x improvement.

Two locks got in the way of the implementation (I temporarily just commented 
them out in the prototype):

* In MemStoreFlusher.java, the lock variable named "lock" seems to be 
acquired in MemStoreFlusher.java:interruptIfNecessary() to ensure that an 
orderly shutdown happens only after any in-progress flush completes.  Because 
flushRegion() also grabs the same lock, we will need to figure out if we can 
simply get rid of the lock or use reader-writer locks (such that the flushers 
can grab it in read mode, and the interrupt grabs it in write mode).

* In HLog.java: startCacheFlush/completeCacheFlush() grab the cacheFlushLock. 
This lock is also grabbed by the log roller (rollWriter()) and HLog.close() 
methods. It is not clear to me yet why the rollWriter() needs to grab the 
cacheFlushLock.
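
To make the reader-writer option for the MemStoreFlusher lock concrete, here 
is a minimal sketch, not the actual 89-fb code. The class FlushLockSketch, the 
shuttingDown flag and the activeFlushes counter are hypothetical names made up 
for illustration; only the method names flushRegion() and 
interruptIfNecessary() come from the description above, and their bodies here 
are stand-ins. It assumes the lock exists only to keep shutdown from 
overlapping an in-progress flush.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of the reader-writer idea; not the real MemStoreFlusher. */
class FlushLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger activeFlushes = new AtomicInteger();
    private volatile boolean shuttingDown = false;

    /** Flusher threads share the read lock, so several flushes can overlap. */
    boolean flushRegion(Runnable flushWork) {
        lock.readLock().lock();
        try {
            if (shuttingDown) {
                return false; // shutdown already started: refuse new flushes
            }
            activeFlushes.incrementAndGet();
            try {
                flushWork.run(); // stand-in for the real flush
            } finally {
                activeFlushes.decrementAndGet();
            }
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * The shutdown path takes the write lock, so it blocks until every
     * in-progress flush has released its read lock. Returns the number of
     * flushes still running, which is 0 while the write lock is held.
     */
    int interruptIfNecessary() {
        lock.writeLock().lock();
        try {
            shuttingDown = true; // assumption: a flag stands in for the real interrupt
            return activeFlushes.get();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FlushLockSketch flusher = new FlushLockSketch();
        // Each flush waits (up to 5s) until both are inside flushRegion(),
        // which can only happen if the read lock really is shared.
        CountDownLatch bothInside = new CountDownLatch(2);
        Runnable work = () -> {
            bothInside.countDown();
            try {
                bothInside.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(() -> flusher.flushRegion(work));
        Thread b = new Thread(() -> flusher.flushRegion(work));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("running at shutdown: " + flusher.interruptIfNecessary());
        System.out.println("accepted after shutdown: " + flusher.flushRegion(() -> {}));
    }
}
```

Whether this is preferable to dropping the lock entirely depends on what the 
lock was originally meant to protect, which is the open question here.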

If anyone has further thoughts on a good resolution for the above locks or the 
exact original intent for those locks (Stack?), please share your ideas.

                
> Parallel Flushing Of Memstores
> ------------------------------
>
>                 Key: HBASE-6980
>                 URL: https://issues.apache.org/jira/browse/HBASE-6980
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> For write dominated workloads, single threaded memstore flushing is an 
> unnecessary bottleneck. With a single flusher thread, we are basically not 
> setup to take advantage of the aggregate throughput that multi-disk nodes 
> provide.
> * For puts with WAL enabled, the bottleneck is more likely the "single" WAL 
> per region server. So this particular fix may not buy as much unless we 
> unlock that bottleneck with multiple commit logs per region server. (Topic 
> for a separate JIRA-- HBASE-6981).
> * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk 
> imports), we should be able to support much better ingest rates with parallel 
> flushing of memstores.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
