[ https://issues.apache.org/jira/browse/HADOOP-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540582 ]
stack commented on HADOOP-2075:
-------------------------------
The bulk uploader needs to tolerate a myriad of input data types. Data will
likely need massaging and, ultimately, sorting: writing HRegion content
directly into HDFS is preferred over going against the hbase API -- bulk
uploads against the hbase API will be dog slow -- but direct writes require
the data be sorted first. Using mapreduce, as sketched below, would make
sense.
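A minimal sketch of such a job, assuming the old org.apache.hadoop.mapred
API; the class names, the tab-separated (row key, cell value) input layout,
and the text output are all hypothetical -- a real tool would write HRegion
files in the reducer rather than plain text:

{code:java}
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class BulkSortJob {

  // Emits (row key, cell value) so the shuffle delivers cells sorted by
  // row key -- the order HRegion content needs on disk.
  public static class SortMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text rowKey, Text cellValue,
        OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // Any per-record massaging of the raw input would happen here.
      out.collect(rowKey, cellValue);
    }
  }

  // Receives each row key's cells in sorted order; a real tool would
  // write them into HRegion files in HDFS here instead of text output.
  public static class SortReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text rowKey, Iterator<Text> values,
        OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      while (values.hasNext()) {
        out.collect(rowKey, values.next());
      }
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(BulkSortJob.class);
    conf.setJobName("hbase-bulk-sort");
    // Splits each input line at the first tab into key and value.
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.setMapperClass(SortMapper.class);
    conf.setReducerClass(SortReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
{code}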
Look too at using PIG because it has a few LOAD implementations -- from
files on local disk or HDFS -- and some facility for transforming data as
it moves tuples around. We would need to write a special STORE operator
that writes the data out sorted as HRegions directly into HDFS; a skeleton
follows. (This would be different from PIG-6, which is about writing into
hbase via the API.)
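A hypothetical skeleton of such an operator, assuming Pig's StoreFunc
interface of the era (bindTo/putNext/finish); the HRegionStorage name and
all of the region-writing details are placeholders, not a real
implementation:

{code:java}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

public class HRegionStorage implements StoreFunc {
  private OutputStream os;

  // Pig hands the operator the stream for the output partition.
  public void bindTo(OutputStream os) throws IOException {
    this.os = os;
  }

  // Called once per tuple; a real implementation would accumulate
  // sorted cells and write HRegion files rather than raw bytes.
  public void putNext(Tuple t) throws IOException {
    os.write(t.toString().getBytes());
    os.write('\n');
  }

  // Flush and close region files once the partition is complete.
  public void finish() throws IOException {
    os.flush();
  }
}
{code}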
Also, chatting with Jim: this is a pretty important issue. It is the first
one folks run into when they start to get serious about hbase.
> [hbase] Bulk load and dump tools
> --------------------------------
>
> Key: HADOOP-2075
> URL: https://issues.apache.org/jira/browse/HADOOP-2075
> Project: Hadoop
> Issue Type: New Feature
> Components: contrib/hbase
> Reporter: stack
> Priority: Minor
>
> Hbase needs tools to facilitate bulk upload and possibly dumping. Going via
> the current APIs, uploads can take a long time even when using many
> concurrent clients, particularly if the dataset is large and cell content
> is small. PNUTS folks talked of the need for a different API to manage bulk
> upload/dump. Another notion would be to have the bulk loader tools somehow
> write regions directly into hdfs.