[ https://issues.apache.org/jira/browse/HBASE-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281536#comment-15281536 ]
Matteo Bertozzi commented on HBASE-15808:
-----------------------------------------

+1

> Reduce potential bulk load intermediate space usage and waste
> -------------------------------------------------------------
>
>                 Key: HBASE-15808
>                 URL: https://issues.apache.org/jira/browse/HBASE-15808
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.2
>
>         Attachments: HBASE-15808-v2.patch, HBASE-15808-v3.patch, HBASE-15808.patch
>
>
> If the bulk load input files do not match the existing region boundaries, the
> files will be split.
> In the unfortunate cases where the files need to be split multiple times,
> the process can consume unnecessary space and can even exhaust the available space.
> Here is an over-simplified example.
> Original size of input files:
>     consumed space: size --> 300GB
> After a round of splits:
>     consumed space: size + tmpspace1 --> 300GB + 300GB
> After another round of splits:
>     consumed space: size + tmpspace1 + tmpspace2 --> 300GB + 300GB + 300GB
> ..
> Currently we don't do any cleanup in the process. At least all the
> intermediate tmpspace (not the last one) can be deleted in the process.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
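
The space arithmetic in the quoted description can be sketched as a small simulation. This is only an illustration under the same over-simplified assumption as the issue text (every split round writes a temporary copy as large as the original input); the class and method names are hypothetical and do not come from HBase or from the attached patches. With cleanup enabled, the sketch deletes the previous round's temporary output once the next round has finished, keeping the original input and the last temporary output.

```java
public class BulkLoadSpaceSketch {

    /**
     * Simulates disk usage across split rounds.
     *
     * Assumption (from the over-simplified example): each round writes
     * a temporary copy of size inputGb.
     *
     * @return a two-element array: {peak space in GB, space in GB after the last round}
     */
    public static long[] simulate(long inputGb, int rounds, boolean cleanup) {
        long consumed = inputGb;   // the original input files
        long peak = consumed;
        long previousTmp = 0;      // size of the prior round's temporary output

        for (int round = 1; round <= rounds; round++) {
            // The new round's output coexists with everything written so far.
            consumed += inputGb;
            peak = Math.max(peak, consumed);

            // Once this round is done, the prior round's output is no longer needed.
            if (cleanup && previousTmp > 0) {
                consumed -= previousTmp;
            }
            previousTmp = inputGb;
        }
        return new long[] {peak, consumed};
    }

    public static void main(String[] args) {
        long[] noCleanup = simulate(300, 3, false);
        long[] withCleanup = simulate(300, 3, true);
        System.out.println("no cleanup:   peak=" + noCleanup[0] + "GB, final=" + noCleanup[1] + "GB");
        System.out.println("with cleanup: peak=" + withCleanup[0] + "GB, final=" + withCleanup[1] + "GB");
    }
}
```

Under these assumptions, usage without cleanup grows by one full copy per round, while with cleanup it stays bounded at the input plus two temporary copies during a round and falls back to the input plus one copy afterwards.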