[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607542#comment-14607542
 ] 

Lars Hofhansl commented on HBASE-13959:
---------------------------------------

bq. Is there an existing jira to capture the requirements for central execution 
pool? If not, I can work on it.

Don't think so. Let's create a new one.

bq. Meanwhile, is it OK to work on a follow-up patch to increase the 
concurrency further by creating both reference files in parallel? This work 
will not go to waste when a central pool is introduced.

Sure. Be very careful, though: for correctness it is essential that the 
daughters come online in the correct order (daughter B first), or at least 
are made visible in META in that order.
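
One way to picture the constraint above: the two reference files can be created concurrently as long as publication stays strictly ordered. The following is only a minimal sketch, not HBase code. The class name, method name and the string placeholders standing in for reference-file creation and the META update are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; none of these names exist in HBase.
class OrderedDaughterSketch {
    // Returns the order in which the daughters were "published".
    static List<String> splitAndPublish() {
        // Assumption: each supplier stands in for creating one reference file.
        CompletableFuture<String> refA =
            CompletableFuture.supplyAsync(() -> "daughterA");
        CompletableFuture<String> refB =
            CompletableFuture.supplyAsync(() -> "daughterB");

        // Wait until BOTH files exist before anything becomes visible.
        CompletableFuture.allOf(refA, refB).join();

        // Publication is sequential and ordered: daughter B first, then A.
        List<String> published = new ArrayList<>();
        published.add(refB.join());
        published.add(refA.join());
        return published;
    }
}
```

The point of the sketch is that the parallelism is confined to the file-creation step, so whichever file finishes first has no influence on the order in which the daughters become visible.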

bq. This should be battle tested. I do remember constant failures with # of split 
threads > 1 in my test runs a while ago (that was on a master branch ~ 9-10 
months ago).
That's the number of parallel splits executed; this is different. Clearly the 
intention of the code was to do this. As it is, we make a new ThreadPool with 
exactly one thread in it in the vast majority of cases.
I don't see how making multiple references in parallel can cause harm unless 
we're overloading the system somewhere (maybe too many threads on the RS or not 
enough handler threads at the NN).
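
To make the distinction concrete, here is a minimal sketch of sizing the pool by the number of storefiles, capped by a configurable maximum, rather than by the number of stores. It is not the actual patch: the class, the `poolSize` and `splitAll` helpers, and the `maxThreads` parameter are hypothetical, and the task body is a placeholder for creating a reference file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not the HBASE-13959 patch.
class SplitPoolSketch {
    // Size the pool by storefile count, capped by maxThreads (assumed
    // to come from configuration), and never below one thread.
    static int poolSize(int storeFileCount, int maxThreads) {
        return Math.max(1, Math.min(storeFileCount, maxThreads));
    }

    // Submits one task per storefile and collects results in input order.
    static List<String> splitAll(List<String> storeFiles, int maxThreads)
            throws Exception {
        ExecutorService pool =
            Executors.newFixedThreadPool(poolSize(storeFiles.size(), maxThreads));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String file : storeFiles) {
                // Placeholder for creating the reference files of one storefile.
                Callable<String> task = () -> file + ".ref";
                futures.add(pool.submit(task));
            }
            List<String> refs = new ArrayList<>();
            for (Future<String> f : futures) {
                refs.add(f.get()); // propagates any task failure
            }
            return refs;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single store holding 20 storefiles and a cap of 8, this gives 8 worker threads instead of 1. The cap is the knob to dial down if the RS or the NN handlers turn out to be overloaded.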

If I remember right, you run with the number of blocking store files set to 
200, right? You may have to dial down the max threads introduced here.

I have no problem running this in our production systems (of course we'll do 
our usual acceptance testing in any case).

bq. Why does creation of reference files take so long?
Each is creating a new file, which takes a while as the NN has to persist its 
state (the NameNode can handle 10000 operations/sec, but each individual one 
takes a bit to finish). 1/5s-1/3s per file is about what I had measured in the 
past.
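
As a back-of-the-envelope check on these numbers, the sketch below estimates how long the reference-file phase takes for a given thread count. It assumes, optimistically, that per-file latency stays constant under concurrency and that tasks run in full "waves" of one per thread. The class and method names are hypothetical.

```java
// Hypothetical estimate only; assumes constant per-file latency.
class SplitDurationEstimate {
    // Each storefile needs two reference files (one per daughter).
    static long estimateMillis(int storeFiles, int threads, long perFileMillis) {
        long tasks = 2L * storeFiles;
        long waves = (tasks + threads - 1) / threads; // ceiling division
        return waves * perFileMillis;
    }
}
```

For the case in the issue description (20 storefiles, 350 ms per reference file), `estimateMillis(20, 1, 350)` gives 14000 ms, which matches the reported 14s. With 8 threads the same model gives 5 waves, or 1750 ms.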


> Region splitting uses a single thread in most common cases
> ----------------------------------------------------------
>
>                 Key: HBASE-13959
>                 URL: https://issues.apache.org/jira/browse/HBASE-13959
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.12
>            Reporter: Hari Krishna Dara
>            Assignee: Hari Krishna Dara
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>
>         Attachments: 13959-0.98.txt, 13959-suggest.txt, HBASE-13959-2.patch, 
> HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959-5.patch, 
> HBASE-13959.patch, region-split-durations-compared.png
>
>
> When storefiles need to be split as part of a region split, the current logic 
> uses a threadpool whose size is set to the number of stores. 
> Since the most common table setup involves only a single column family, this 
> translates to having a single store, and so the threadpool is run with a 
> single thread. However, in a write-heavy workload, there could be several 
> tens of storefiles in a store at the time of splitting, and with a threadpool 
> size of one, these files end up getting split sequentially.
> With a bit of tracing, I noticed that it takes an average of 350ms to 
> create a single reference file, and splitting each storefile involves 
> creating two of these, so with a storefile count of 20, it takes about 14s 
> just to get through this phase alone (2 reference files for each storefile), 
> pushing the total time the region is offline to 18s or more. For environments 
> that are set up to fail fast, this makes the client exhaust all retries and 
> fail with NotServingRegionException.
> The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
