[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Krishna Dara updated HBASE-13959:
--------------------------------------
    Attachment: HBASE-13959-4.patch

Based on the latest comments from Lars.

> Region splitting takes too long because it uses a single thread in most common cases
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-13959
>                 URL: https://issues.apache.org/jira/browse/HBASE-13959
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.12
>            Reporter: Hari Krishna Dara
>            Assignee: Hari Krishna Dara
>            Priority: Critical
>             Fix For: 0.98.14
>
>         Attachments: 13959-suggest.txt, HBASE-13959-2.patch,
>                      HBASE-13959-3.patch, HBASE-13959-4.patch,
>                      HBASE-13959.patch, region-split-durations-compared.png
>
>
> When storefiles need to be split as part of a region split, the current logic
> uses a threadpool whose size is set to the number of stores. Since the most
> common table setup involves only a single column family, this translates to
> a single store, and so the threadpool runs with a single thread. However, in
> a write-heavy workload, there could be several tens of storefiles in a store
> at the time of splitting, and with a threadpool size of one, these files end
> up getting split sequentially.
> With a bit of tracing, I noticed that it takes an average of 350ms to create
> a single reference file, and splitting each storefile involves creating two
> of these, so with a storefile count of 20, it takes about 14s just to get
> through this phase alone, pushing the total time the region is offline to
> 18s or more. For environments that are set up to fail fast, this makes the
> client exhaust all retries and fail with NotServingRegionException.
> The fix should increase the concurrency of this operation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
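
The arithmetic in the description above (20 storefiles x 2 reference files x 350ms = 14s on one thread) can be reproduced with a small sketch. This is not the code from any of the attached patches. The class name, the `poolSize` helper, and its `maxThreads` cap are hypothetical names introduced for illustration, and the timing model is idealized: it assumes reference-file creation parallelizes perfectly and ignores filesystem contention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; all names here are hypothetical, not HBase APIs.
class SplitConcurrencySketch {

    // Idealized wall-clock estimate: one task per storefile, each task
    // creating two reference files back to back.
    static long estimateMillis(int storefiles, int threads, long millisPerReference) {
        long rounds = (storefiles + threads - 1) / threads; // ceiling division
        return rounds * 2 * millisPerReference;
    }

    // Assumed sizing rule: scale with the storefile count, bounded by a cap,
    // instead of using the number of stores.
    static int poolSize(int storefiles, int maxThreads) {
        return Math.max(1, Math.min(storefiles, maxThreads));
    }

    // Runs one stand-in task per storefile on a fixed pool and returns the
    // number of "reference files" produced.
    static int splitAll(int storefiles, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < storefiles; i++) {
                Callable<Integer> task = () -> 2; // stand-in: top + bottom reference
                futures.add(pool.submit(task));
            }
            int created = 0;
            for (Future<Integer> f : futures) {
                created += f.get();
            }
            return created;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("split task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(estimateMillis(20, 1, 350));               // 14000
        System.out.println(estimateMillis(20, poolSize(20, 8), 350)); // 2100
        System.out.println(splitAll(20, poolSize(20, 8)));            // 40
    }
}
```

Under these assumptions, a cap of 8 threads would shrink the 14s phase to roughly 2.1s for 20 storefiles. Real gains depend on how well the underlying filesystem handles concurrent reference-file creation.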