[ 
https://issues.apache.org/jira/browse/HBASE-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560232#comment-13560232
 ] 

Nick Dimiduk commented on HBASE-5475:
-------------------------------------

I have an application that does this. It depends on what is currently an 
[external library|https://github.com/ndimiduk/reservoirsampler] implementing a 
reservoir sampler over the input data to produce the splits file. The code is 
actually from one of the examples in Alex Holmes's book. I'd like to roll the 
functionality into ImportTsv, but my application works quite differently 
from the current tool.
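
For readers unfamiliar with the technique, here is a rough sketch of how a reservoir sampler can turn a stream of row keys into split points. It is not the code from the linked library or the book. The class name SplitSampler, both method names, the use of String row keys, and the one-split-per-line output are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper: not part of HBase or of the linked reservoirsampler library.
class SplitSampler {

    /** Algorithm R: keep a uniform random sample of at most k keys from a stream of unknown length. */
    static List<String> reservoirSample(Iterator<String> keys, int k, Random rng) {
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (keys.hasNext()) {
            String key = keys.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k keys.
                reservoir.add(key);
            } else {
                // Replace a random slot with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, key);
                }
            }
        }
        return reservoir;
    }

    /** Sort the sample and pick up to numRegions - 1 evenly spaced keys as region boundaries. */
    static List<String> splitPoints(List<String> sample, int numRegions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int idx = (int) ((long) i * sorted.size() / numRegions);
            if (idx >= sorted.size()) {
                break; // only reachable when the sample is empty
            }
            String candidate = sorted.get(idx);
            // Skip duplicates so the split points stay strictly increasing.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Synthetic row keys stand in for the first column of the real input.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            keys.add(String.format("row-%05d", i));
        }
        List<String> sample = reservoirSample(keys.iterator(), 1000, new Random(42));
        // Print one split point per line (an assumed layout for a splits file).
        for (String split : splitPoints(sample, 4)) {
            System.out.println(split);
        }
    }
}
```

The sampler needs only one pass over the input and memory proportional to the sample size, which is why it suits producing a splits file before the table exists. A real job would compare row keys as bytes rather than as Java strings.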
                
> Allow importtsv and Import to work truly offline when using bulk import option
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-5475
>                 URL: https://issues.apache.org/jira/browse/HBASE-5475
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Lars Hofhansl
>            Priority: Minor
>
> Currently importtsv (and now also Import with HBASE-5440) support using 
> HFileOutputFormat for later bulk loading.
> However, that currently cannot be done without access to the table we're 
> going to import to, because both importtsv and Import need to look up the 
> split points and find the compression setting.
> It would be nice if there were an offline way to provide the split points 
> and compression setting.
