[ 
https://issues.apache.org/jira/browse/HBASE-24436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119762#comment-17119762
 ] 

Andrew Kyle Purtell edited comment on HBASE-24436 at 5/29/20, 4:58 PM:
-----------------------------------------------------------------------

If you allocate one pool at the region level with 10 threads for 10 stores, you 
have 10 threads for opening all store files in the region. If you allocate 10 
pools for 10 stores with 10 threads each, you have 100 threads for opening all 
store files in the region*. If the goal is to open regions as quickly as 
possible, the parallelism of 100 threads is better than 10. Allocating 
proportionally at the store level is better aligned with that goal than 
allocating at the region level. With fixed-size pools, allocating the resource 
at the store level seems the better option. That's the simple point.

A compromise approach could allocate a pool at the region level that is sized 
proportionally to the number of store _files_, not stores, with some 
configurable upper bound.
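
As a rough illustration of that compromise, here is a minimal sketch of one 
possible sizing rule. Everything in it is an assumption made for illustration: 
the class and method names are hypothetical and not part of HBase, the 
one-thread-per-store-file ratio is just the simplest proportional rule, and 
maxThreads stands in for whatever configurable upper bound would be chosen.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper for illustration only; not an HBase class. */
final class RegionOpenPoolSizing {
  private RegionOpenPoolSizing() {}

  /**
   * Assumed rule: one thread per store file, at least 1 thread,
   * and never more than the configured upper bound.
   */
  static int poolSize(int totalStoreFiles, int maxThreads) {
    if (maxThreads < 1) {
      throw new IllegalArgumentException("maxThreads must be >= 1");
    }
    return Math.max(1, Math.min(totalStoreFiles, maxThreads));
  }

  /** Builds one region-level pool sized by the total store file count. */
  static ExecutorService newRegionPool(int[] storeFilesPerStore, int maxThreads) {
    int total = 0;
    for (int count : storeFilesPerStore) {
      total += count; // skewed stores contribute more to the total
    }
    return Executors.newFixedThreadPool(poolSize(total, maxThreads));
  }
}
```

With this rule a region whose stores hold 1, 2 and 40 files gets the same pool 
as one with 43 evenly spread files, so a skewed store is not limited to a 
per-store share of the threads.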

Your point about diminishing returns of increased parallelism is well taken. I 
would expect ForkJoinPool's implementation, with its internal controls on 
parallelism and its work-stealing approach to executing runnables, to account 
for this. My confidence in FJP's implementation may be misplaced, but that 
remains to be seen (and we/I can look at the code if we are serious about using 
it to recursively decompose the region open process).
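
To make "recursively decompose" concrete, here is a minimal sketch of how a 
list of store files could be split into fork/join subtasks. It is not taken 
from HBase: the task class, the THRESHOLD value, and the opener callback 
(standing in for whatever actually opens a store file) are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.RecursiveTask;
import java.util.function.Consumer;

/** Hypothetical task for illustration only; not an HBase class. */
final class OpenStoreFilesTask extends RecursiveTask<Integer> {
  // Assumed cutoff: at or below this many files, open them sequentially.
  private static final int THRESHOLD = 4;

  private final List<String> paths;
  private final Consumer<String> opener; // stand-in for the real open logic

  OpenStoreFilesTask(List<String> paths, Consumer<String> opener) {
    this.paths = paths;
    this.opener = opener;
  }

  /** Opens every file in this slice and returns how many were opened. */
  @Override
  protected Integer compute() {
    if (paths.size() <= THRESHOLD) {
      for (String path : paths) {
        opener.accept(path);
      }
      return paths.size();
    }
    int mid = paths.size() / 2;
    OpenStoreFilesTask left = new OpenStoreFilesTask(paths.subList(0, mid), opener);
    OpenStoreFilesTask right = new OpenStoreFilesTask(paths.subList(mid, paths.size()), opener);
    left.fork();                    // idle workers may steal this half
    int opened = right.compute();   // current worker handles the other half
    return opened + left.join();
  }
}
```

A caller would run it with something like 
new ForkJoinPool(parallelism).invoke(new OpenStoreFilesTask(paths, opener)), 
where parallelism bounds the number of worker threads. Note that opening files 
is blocking I/O, and ForkJoinPool only compensates for blocked workers when 
tasks cooperate through ForkJoinPool.ManagedBlocker, so how well this behaves 
in practice would still need to be checked.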

* - Rather than a fixed pool per store, I would recommend a sizing formula based 
on the number of store files and some configurable upper bound.


> The store file open and close thread pool should be shared at the region level
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-24436
>                 URL: https://issues.apache.org/jira/browse/HBASE-24436
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Minor
>
> For now, we generally divide threads evenly among column families, but there 
> are cases where some column families have many more store files than 
> others (maybe that's life, right?). In that case, some Stores finish 
> quickly while others are still struggling. We should share the thread pool 
> at the region level to handle data skew.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
