[ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030224#comment-13030224 ]
Hudson commented on HBASE-3721:
-------------------------------

Integrated in HBase-TRUNK #1909 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1909/])

> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>             Fix For: 0.92.0
>
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721-v6.patch,
> 3721.txt, LoadIncrementalHFiles.java
>
>
> From Adam Phelps:
> From the logs it looks like <1% of the hfiles we're loading have to be split.
> Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually
> thinking our problem is that this code loads the hfiles sequentially. Our
> largest table has over 2500 regions and the data being loaded is fairly well
> distributed across them, so there end up being around 2500 HFiles for each
> load period. At 1-2 seconds per HFile that means the loading process is very
> time-consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can utilize ExecutorService to achieve better parallelism on multi-core
> machines.
> New configuration parameter "hbase.loadincremental.threads.max" is introduced
> which sets the maximum number of threads for parallel bulk load.
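A back-of-envelope estimate from the numbers in the report: 2500 HFiles at 1-2 seconds each is roughly 2500-5000 seconds (about 42-83 minutes) when loaded one at a time. With N concurrent loads the ideal wall-clock time shrinks by a factor of N, assuming the region servers can absorb the concurrent requests. The following is a minimal Java sketch of the pattern described in the issue (submitting each blocking load to an ExecutorService with a bounded thread count); it is not the committed patch. `HFileLoader`, `loadAll` and the HFile paths are hypothetical, the blocking server.bulkLoadHFile() RPC is simulated with Thread.sleep, the bound that the patch reads from "hbase.loadincremental.threads.max" is passed in as a plain int, and splitting of HFiles that straddle region boundaries is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBulkLoadSketch {

    /** Hypothetical stand-in for a blocking per-HFile load such as server.bulkLoadHFile(). */
    @FunctionalInterface
    interface HFileLoader {
        void load(String hfilePath) throws Exception;
    }

    /**
     * Loads every HFile with at most maxThreads blocking calls in flight at once.
     * Assumption: in HBase this bound would come from "hbase.loadincremental.threads.max";
     * here it is simply a parameter.
     */
    static int loadAll(List<String> hfiles, HFileLoader loader, int maxThreads)
            throws InterruptedException, ExecutionException {
        // A fixed-size pool enforces a hard upper bound on concurrent loads.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String path : hfiles) {
                // Submitted as a Callable (it returns null), so checked exceptions
                // from the loader are captured in the Future instead of being lost.
                futures.add(pool.submit(() -> {
                    loader.load(path);
                    return null;
                }));
            }
            // Block until every load finishes; the first failed load in submission
            // order surfaces here as an ExecutionException wrapping the cause.
            for (Future<?> f : futures) {
                f.get();
            }
            return futures.size();
        } finally {
            // Stop accepting work; already-submitted tasks still run to completion.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            files.add("/bulk/cf/hfile-" + i); // hypothetical HFile paths
        }
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        HFileLoader slowLoader = path -> {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
            Thread.sleep(50); // simulate a blocking RPC taking 50 ms
            inFlight.decrementAndGet();
        };
        long start = System.nanoTime();
        int loaded = loadAll(files, slowLoader, 4);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("loaded=" + loaded + " peakConcurrency=" + peak.get()
                + " elapsedMs=" + elapsedMs);
    }
}
```

With 16 simulated 50 ms loads and 4 threads, the run should take on the order of 200 ms rather than the roughly 800 ms a sequential loop would need, and the observed peak concurrency cannot exceed 4. A fixed-size pool keeps the number of simultaneous bulk-load requests bounded, which matters because every extra thread translates into extra concurrent work on the region servers.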