[ https://issues.apache.org/jira/browse/HBASE-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560232#comment-13560232 ]
Nick Dimiduk commented on HBASE-5475:
-------------------------------------

I have an application that does this. It depends on what is currently an [external library|https://github.com/ndimiduk/reservoirsampler] that implements a reservoir sampler over the input data to produce the splits file. The code is actually from one of the examples in Alex Holmes's book. I'd like to roll the functionality into ImportTsv, but my application works quite differently from the current tool.

> Allow importtsv and Import to work truly offline when using bulk import option
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-5475
>                 URL: https://issues.apache.org/jira/browse/HBASE-5475
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Lars Hofhansl
>            Priority: Minor
>
> Currently importtsv (and now also Import with HBASE-5440) supports using
> HFileOutputFormat for later bulk loading.
> However, currently that cannot be done without having access to the table we're
> going to import to, because both importtsv and Import need to look up the
> split points and find the compression setting.
> It would be nice if there were an offline way to provide the split points
> and compression setting.
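
For readers unfamiliar with the technique named in the comment, here is a minimal sketch of how a reservoir sampler could yield split points without touching the table. It is not the code from the linked library or from ImportTsv. The function names `reservoir_sample` and `split_points` are hypothetical, row keys are assumed to be plain strings, and writing the result in whatever splits-file format the import job expects is left out.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(keys: Iterable[str], k: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return a uniform random sample of up to k keys from a stream (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[str] = []
    for i, key in enumerate(keys):
        if i < k:
            # Fill the reservoir with the first k keys.
            reservoir.append(key)
        else:
            # Keep the new key with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = key
    return reservoir


def split_points(sample: List[str], num_regions: int) -> List[str]:
    """Pick up to num_regions - 1 evenly spaced keys from the sorted sample."""
    ordered = sorted(set(sample))
    if num_regions < 2 or not ordered:
        return []
    step = len(ordered) / num_regions
    points: List[str] = []
    for r in range(1, num_regions):
        candidate = ordered[min(int(r * step), len(ordered) - 1)]
        # Skip repeats, which can occur when the sample is small.
        if not points or candidate != points[-1]:
            points.append(candidate)
    return points


if __name__ == "__main__":
    # Hypothetical input: 1000 row keys read from the TSV being imported.
    row_keys = (f"row{i:04d}" for i in range(1000))
    sample = reservoir_sample(row_keys, k=100, rng=random.Random(42))
    for point in split_points(sample, num_regions=4):
        print(point)
```

The sampler makes a single pass over the input and holds only `k` keys in memory, which is why it suits large import files. The split points are only as balanced as the sample is representative, so a larger `k` trades memory for more even regions.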