[ 
https://issues.apache.org/jira/browse/HBASE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-7697.
---------------------------------

    Resolution: Invalid

Closing as invalid because this issue is too vague. If you're interested, see 
the related mapreduce improvements in HBASE-8084.
                
> Consolidate tools for getting data into, out of HBase
> -----------------------------------------------------
>
>                 Key: HBASE-7697
>                 URL: https://issues.apache.org/jira/browse/HBASE-7697
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, mapreduce
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>
> The user experience for importing data into HBase and dumping data out of 
> HBase is poor. The existing tools, as I understand them, include:
> - org.apache.hadoop.hbase.mapreduce.Export,
> - org.apache.hadoop.hbase.mapreduce.Import,
> - org.apache.hadoop.hbase.mapreduce.ImportTsv,
> - org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and
> - org.apache.hadoop.hbase.mapreduce.CopyTable
> Each one provides specific features that do not necessarily overlap with the 
> others. For instance, Import and ImportTsv could have most of their logic 
> combined, sharing common driver code and leaving the details of the 
> file-format up to the user to provide via a pluggable mapper. Export and 
> CopyTable both map over a target table; it's only the detail of what they do 
> with the data that is different. Bulk operations via HFiles could be a more 
> common use-case as well, not just a special case of ImportTsv.
> The list of [open 
> issues|https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HBASE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20text%20~%20%22ImportTsv%22%20ORDER%20BY%20updatedDate%20DESC]
>  against ImportTsv alone indicates users are using the tool, and I certainly 
> advise it for people getting started with a new HBase deployment.
> I propose a single interface for getting data into and out of HBase. It would 
> be pluggable, allowing users to override details of their file formats and 
> schemas. We can provide implementations that replicate existing tool 
> behaviors as example modules. These tools are also a reasonable place, IMHO, 
> to include support for creation and loading of snapshots.
> I started down the path of a specific tool intended to overcome some of the 
> limitations of ImportTsv, and it has since been refactored into a more 
> general-purpose application. Initial patches forthcoming. Comments strongly 
> encouraged.
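
For illustration only, the pluggable design described in the issue could be
sketched roughly as follows. This is plain Java with no HBase or MapReduce
dependency, and every name in it (Cell, LineParser, TsvParser, runImport) is
hypothetical, not an existing HBase API. It assumes the "pluggable mapper"
reduces to a parser that turns one input line into cells, with the driver
code shared across formats.

```java
import java.util.ArrayList;
import java.util.List;

class PluggableImportSketch {

    /** One parsed cell. Hypothetical type, not an HBase class. */
    public static final class Cell {
        public final String row;
        public final String column;
        public final String value;

        public Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /** The pluggable piece: turns one input line into zero or more cells. */
    public interface LineParser {
        List<Cell> parse(String line);
    }

    /** Example plug-in for a tab-separated layout: row key, then one value per column. */
    public static final class TsvParser implements LineParser {
        private final String[] columns;

        public TsvParser(String... columns) {
            this.columns = columns;
        }

        @Override
        public List<Cell> parse(String line) {
            String[] fields = line.split("\t", -1);
            List<Cell> cells = new ArrayList<>();
            // Skip lines that do not have exactly one field per column plus the row key.
            if (fields.length != columns.length + 1) {
                return cells;
            }
            for (int i = 0; i < columns.length; i++) {
                cells.add(new Cell(fields[0], columns[i], fields[i + 1]));
            }
            return cells;
        }
    }

    /** Shared driver: format-agnostic, delegates all parsing to the plug-in. */
    public static List<Cell> runImport(Iterable<String> lines, LineParser parser) {
        List<Cell> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(parser.parse(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = runImport(
                List.of("r1\ta\tb", "malformed line"),
                new TsvParser("d:x", "d:y"));
        for (Cell c : cells) {
            System.out.println(c.row + " " + c.column + " " + c.value);
        }
    }
}
```

Under this assumption, supporting a different file format means supplying
another LineParser; the driver stays unchanged. A real tool would emit Puts or
HFiles from a MapReduce job rather than collect cells in a list.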

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
