[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215080#comment-13215080 ]
Jean-Daniel Cryans commented on HBASE-4365:
-------------------------------------------
Conclusion for the 1TB upload:
Flush size: 512MB
Split size: 20GB
Without patch: 18012s
With patch: 12505s
That's 1.44x faster, so a huge improvement. The difference comes from how long
it takes to split the first few regions without the patch. In the past I would
start the test with a smaller split size and then, once I had a good
distribution, do an online alter to set it to 20GB. Not anymore with this
patch :)
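For reference, the 1.44x figure follows directly from the two timings reported
above. A minimal sketch of the arithmetic (the function and variable names are
only illustrative, not part of the patch or the test harness):

```python
# Wall-clock times for the 1TB upload, as reported in the comment (seconds).
WITHOUT_PATCH_S = 18012
WITH_PATCH_S = 12505


def speedup(baseline_s, patched_s):
    """How many times faster the patched run is than the baseline."""
    return baseline_s / patched_s


def time_saved_fraction(baseline_s, patched_s):
    """Fraction of the baseline wall-clock time that the patch removes."""
    return (baseline_s - patched_s) / baseline_s


ratio = speedup(WITHOUT_PATCH_S, WITH_PATCH_S)
saved = time_saved_fraction(WITHOUT_PATCH_S, WITH_PATCH_S)

print(f"speedup: {ratio:.2f}x")
print(f"time saved: {WITHOUT_PATCH_S - WITH_PATCH_S}s ({saved:.1%} of baseline)")
```

In other words, the patched run finishes 5507 seconds sooner, which is roughly
31% of the unpatched upload time.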
Another observation: the upload in general is slowed down by "too many store
files" blocking. I traced this to compactions taking a long time to get rid of
reference files (3.5GB taking more than 10 minutes); during that time you can
hit the block multiple times. We really ought to look at how to optimize
compactions: consider compacting those big files in several threads instead of
only one, and allow references to reference files so that some compactions can
be skipped altogether.
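To put the compaction figure in perspective, here is a rough back-of-the-envelope
estimate of the throughput it implies. This is only a sketch: it assumes "GB"
means GiB, and since the compaction took *more than* 10 minutes the result is
an upper bound, not a measurement.

```python
# Figures from the comment: 3.5GB of reference files, more than 10 minutes.
GIB = 1024 ** 3
MIB = 1024 ** 2

compacted_bytes = 3.5 * GIB   # assumption: "GB" is read as GiB
min_elapsed_s = 10 * 60       # "more than 10 minutes", so a lower bound on time

# Dividing by the minimum elapsed time yields the maximum possible throughput.
max_throughput_mib_s = compacted_bytes / min_elapsed_s / MIB

print(f"implied compaction throughput: under {max_throughput_mib_s:.1f} MiB/s")
```

That works out to under 6 MiB/s for a single compaction thread, which is why
the blocking limit can be hit several times while one such compaction runs.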
> Add a decent heuristic for region size
> --------------------------------------
>
> Key: HBASE-4365
> URL: https://issues.apache.org/jira/browse/HBASE-4365
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.92.1, 0.94.0
> Reporter: Todd Lipcon
> Priority: Critical
> Labels: usability
> Attachments: 4365-v2.txt, 4365.txt
>
>
> A few of us were brainstorming this morning about what the default region
> size should be. There were a few general points made:
> - in some ways it's better to be too-large than too-small, since you can
> always split a table further, but you can't merge regions currently
> - with HFile v2 and multithreaded compactions there are fewer reasons to
> avoid very-large regions (10GB+)
> - for small tables you may want a small region size just so you can
> distribute load better across a cluster
> - for big tables, multi-GB is probably best