[ 
https://issues.apache.org/jira/browse/HBASE-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278799#comment-15278799
 ] 

Matteo Bertozzi commented on HBASE-15808:
-----------------------------------------

patch looks ok to me. checkstyle may complain about that first try/catch 
alignment.

any chance that we can have a unit test so if someone is going to remove it we 
will notice?
looks like we already have some tests that do splits in 
TestLoadIncrementalHFiles, but maybe it is not so trivial to find and check the 
files with what we have today.

> Reduce potential bulk load intermediate space usage and waste
> -------------------------------------------------------------
>
>                 Key: HBASE-15808
>                 URL: https://issues.apache.org/jira/browse/HBASE-15808
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.2
>
>         Attachments: HBASE-15808-v2.patch, HBASE-15808.patch
>
>
> If the bulk load input files do not match the existing region boundaries, the 
> files will be split.
> In the unfortunate cases where the files need to be split multiple times,
> the process can consume unnecessary space and can even run out of space.
> Here is an over-simplified example.
> Original size of input files:  
>   consumed space: size --> 300GB
> After a round of splits: 
>   consumed space: size + tmpspace1 --> 300GB + 300GB
> After another round of splits: 
>   consumed space:  size + tmpspace1 + tmpspace2 --> 300GB + 300GB + 300GB
> ..
> Currently we don't do any cleanup in the process. At least all the 
> intermediate tmpspace (not the last one) can be deleted in the process.
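
The arithmetic in the over-simplified example above can be sketched as follows. This is only an illustration, not code from the patch: the class and method names are hypothetical, and it assumes, as the example does, that every split round rewrites the full data set into a new temporary directory. It counts space at the end of each round; the transient usage while a round is still running can be higher.

```java
// Hypothetical sketch of the space accounting in the issue description.
// Assumption: each split round writes a full-size temporary copy of the input.
class BulkLoadSpaceSketch {

    /** Space (GB) held after the given number of rounds if nothing is ever deleted. */
    static long withoutCleanupGb(long inputGb, int splitRounds) {
        // Original input plus one temporary copy per completed round.
        return inputGb * (1L + splitRounds);
    }

    /** Space (GB) held after the given number of rounds if earlier temp output is deleted. */
    static long withCleanupGb(long inputGb, int splitRounds) {
        // Original input plus only the last round's temporary copy (if any round ran).
        int liveTempCopies = Math.min(splitRounds, 1);
        return inputGb * (1L + liveTempCopies);
    }

    public static void main(String[] args) {
        long inputGb = 300;
        for (int rounds = 0; rounds <= 3; rounds++) {
            System.out.println(rounds + " round(s): "
                + withoutCleanupGb(inputGb, rounds) + " GB without cleanup, "
                + withCleanupGb(inputGb, rounds) + " GB with cleanup");
        }
    }
}
```

Under these assumptions the usage without cleanup grows linearly with the number of split rounds, while with cleanup it stays at twice the input size once the first round has run.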



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
