[ https://issues.apache.org/jira/browse/HBASE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Dimiduk resolved HBASE-7697.
---------------------------------

    Resolution: Invalid

Closing as invalid because this is pretty vague. If you're interested, see related mapreduce improvements in HBASE-8084.

> Consolidate tools for getting data into, out of HBase
> -----------------------------------------------------
>
>                 Key: HBASE-7697
>                 URL: https://issues.apache.org/jira/browse/HBASE-7697
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, mapreduce
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>
> The user experience for importing data into HBase and getting a dump out of
> HBase is pretty poor. The existing tools, as I understand them, include:
> - org.apache.hadoop.hbase.mapreduce.Export,
> - org.apache.hadoop.hbase.mapreduce.Import,
> - org.apache.hadoop.hbase.mapreduce.ImportTsv,
> - org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and
> - org.apache.hadoop.hbase.mapreduce.CopyTable
> Each one provides specific features that do not necessarily overlap with the
> others. For instance, Import and ImportTsv could have most of their logic
> combined, sharing common driver code and leaving the details of the
> file format up to the user to provide via a pluggable mapper. Export and
> CopyTable both map over a target table; it's only the detail of what they do
> with the data that is different. Bulk operations via HFiles could be a more
> common use case as well, not just a special case of ImportTsv.
>
> The list of [open issues|https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HBASE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20text%20~%20%22ImportTsv%22%20ORDER%20BY%20updatedDate%20DESC]
> against ImportTsv alone indicates that people are using the tool, and I certainly
> recommend it to people getting started with a new HBase deployment.
>
> I propose a single interface for getting data into and out of HBase.
> It would be pluggable, allowing users to override details of their file formats and
> schemas. We can provide implementations that replicate existing tool
> behaviors as example modules. These tools are also a reasonable place, IMHO,
> to include support for creation and loading of snapshots.
>
> I started down the path of a specific tool intended to overcome some of the
> limitations of ImportTsv, and it has since been refactored into a more
> general-purpose application. Initial patches forthcoming. Comments strongly
> encouraged.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
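The consolidation proposed in the quoted description, with shared driver code and the file format supplied by a pluggable component, might look roughly like the plain-Java sketch below. It is an illustration under stated assumptions. It is not the patch mentioned in the issue, and it is not an HBase API. `CellValue`, `RecordParser`, `TsvRecordParser` and `ImportDriver` are hypothetical names. There is no MapReduce or HBase client code, and "writing" a cell is reduced to collecting it in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of these are HBase classes.

/** One cell to write: row key, column family, qualifier and value. */
final class CellValue {
    final String row, family, qualifier, value;

    CellValue(String row, String family, String qualifier, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }

    @Override
    public String toString() {
        return row + "/" + family + ":" + qualifier + "=" + value;
    }
}

/** The pluggable part: turns one input line into zero or more cells. */
interface RecordParser {
    List<CellValue> parse(String line);
}

/**
 * A tab-separated parser loosely in the spirit of ImportTsv (assumed behavior):
 * the first field is the row key, and the remaining fields map, in order, to the
 * configured "family:qualifier" column specs. Extra fields are ignored.
 */
final class TsvRecordParser implements RecordParser {
    private final String[] columns; // e.g. "d:name"; specs without ':' are not handled

    TsvRecordParser(String... columns) {
        this.columns = columns;
    }

    @Override
    public List<CellValue> parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        List<CellValue> cells = new ArrayList<>();
        for (int i = 1; i < fields.length && i <= columns.length; i++) {
            String[] fq = columns[i - 1].split(":", 2);
            cells.add(new CellValue(fields[0], fq[0], fq[1], fields[i]));
        }
        return cells;
    }
}

/** Shared driver: identical for every input format; only the parser changes. */
final class ImportDriver {
    private final RecordParser parser;

    ImportDriver(RecordParser parser) {
        this.parser = parser;
    }

    /** Parses every non-blank line and collects the resulting cells in input order. */
    List<CellValue> run(List<String> lines) {
        List<CellValue> out = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            out.addAll(parser.parse(line));
        }
        return out;
    }
}
```

For example, `new ImportDriver(new TsvRecordParser("d:name", "d:age")).run(Arrays.asList("row1\tAlice\t30", "row2\tBob\t25"))` yields four cells, from `row1/d:name=Alice` through `row2/d:age=25`. Because `RecordParser` has a single method, another format can be plugged in as a lambda without touching the driver. This is the kind of reuse the description suggests for Import and ImportTsv.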