[ 
https://issues.apache.org/jira/browse/HBASE-24436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117881#comment-17117881
 ] 

Andrew Kyle Purtell commented on HBASE-24436:
---------------------------------------------

In a separate brainstorming session where I work, we came up with the idea of
using a fork-join pool for open work. The pool is allocated at the
regionserver level. Open work is recursively decomposed to align with the
fork-join model. Each open task is a runnable. Each store file open is a
runnable. We let the pool decide the appropriate level of parallelism and do
not give an explicit bound. I can't prove it without trying it and collecting
metrics, but this should let us open faster than any option that uses
fixed-size pools when the number of files per store is unevenly distributed
and some stores are quite large (like date-tiered ones).
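
A minimal sketch of that decomposition, assuming the JDK's
java.util.concurrent.ForkJoinPool and RecursiveAction. The class names
(ForkJoinOpenSketch, OpenRegion, OpenStore, OpenStoreFile) are hypothetical
and are not HBase code. The "open" is simulated with a counter. The no-arg
ForkJoinPool constructor sizes the pool from the number of available
processors, so no explicit bound is given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: none of these classes exist in HBase.
final class ForkJoinOpenSketch {

    /** Leaf task: stands in for opening a single store file. */
    static final class OpenStoreFile extends RecursiveAction {
        private final String path;
        private final AtomicInteger opened;

        OpenStoreFile(String path, AtomicInteger opened) {
            this.path = path;
            this.opened = opened;
        }

        @Override
        protected void compute() {
            // Real code would open a reader for `path`; here we only count it.
            opened.incrementAndGet();
        }
    }

    /** Opens one store by forking one task per store file. */
    static final class OpenStore extends RecursiveAction {
        private final List<String> files;
        private final AtomicInteger opened;

        OpenStore(List<String> files, AtomicInteger opened) {
            this.files = files;
            this.opened = opened;
        }

        @Override
        protected void compute() {
            List<OpenStoreFile> tasks = new ArrayList<>();
            for (String f : files) {
                tasks.add(new OpenStoreFile(f, opened));
            }
            invokeAll(tasks); // fork all file opens, then join them
        }
    }

    /** Opens one region by forking one task per store. */
    static final class OpenRegion extends RecursiveAction {
        private final List<List<String>> stores;
        private final AtomicInteger opened;

        OpenRegion(List<List<String>> stores, AtomicInteger opened) {
            this.stores = stores;
            this.opened = opened;
        }

        @Override
        protected void compute() {
            List<OpenStore> tasks = new ArrayList<>();
            for (List<String> files : stores) {
                tasks.add(new OpenStore(files, opened));
            }
            invokeAll(tasks); // a store with many files simply yields more leaf tasks
        }
    }

    /** Runs a region open on the shared pool and returns how many files were opened. */
    static int openRegion(ForkJoinPool pool, List<List<String>> stores) {
        AtomicInteger opened = new AtomicInteger();
        pool.invoke(new OpenRegion(stores, opened)); // blocks until the whole tree completes
        return opened.get();
    }
}
```

Because every file open is its own task in one shared pool, idle workers can
steal work from a store that has many files instead of sitting in a separate
fixed-size pool. Real store file opens block on I/O, so an actual
implementation might need something like ForkJoinPool.ManagedBlocker; that is
an assumption and is not covered by this sketch.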

In the meantime, if you are going to continue to use fixed-size pools, they
should be allocated per store, not per region, so they will at least be
proportional to the number of stores (as opposed to the number of regions,
which is not the same and may be substantially lower). Imagine a table with 10
families: with this proposal you are exchanging 10 fixed-size pools for 1
fixed-size pool, which seems like the wrong direction.

> The store file open and close thread pool should be shared at the region level
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-24436
>                 URL: https://issues.apache.org/jira/browse/HBASE-24436
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Minor
>
> For now, we generally allocate threads evenly per column family, but in
> some cases some column families have many more store files than others
> (maybe that's life, right?). In that case, some Stores finish quickly while
> others are still struggling. We should share the thread pool at the region
> level to handle data skew.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
