I am trying to use this technique to bulk load about 20 billion rows. I
tried it on a smaller set of 20 million rows. One thing I had to take
care of was writing custom partitioning logic so that a given range of
keys goes to only one particular reducer, since there was some mention
of global ordering.
For example:  Users (1 -- 1 million) ---> Reducer 1, and so on
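
To make the range idea concrete, here is a minimal sketch of the mapping only, assuming numeric, 1-based user ids and equal-sized ranges. The class name, method name, and parameters are hypothetical and not taken from my actual job. In a real MR job the same arithmetic would sit inside getPartition() of a Hadoop Partitioner subclass; it is kept free of Hadoop dependencies here so it can be run on its own.

```java
// Hypothetical, dependency-free sketch of range partitioning.
// Assumption: keys are numeric user ids starting at 1.
class RangePartitionSketch {

    /**
     * Maps a 1-based user id to a 0-based reducer index so that each
     * contiguous block of rangeSize ids lands on exactly one reducer.
     * Ids beyond the last range are clamped onto the last reducer.
     */
    static int partitionFor(long userId, long rangeSize, int numReducers) {
        if (userId < 1 || rangeSize < 1 || numReducers < 1) {
            throw new IllegalArgumentException(
                "userId, rangeSize and numReducers must all be >= 1");
        }
        long bucket = (userId - 1) / rangeSize;   // ids 1..rangeSize -> bucket 0
        return (int) Math.min(bucket, numReducers - 1L);
    }

    public static void main(String[] args) {
        long rangeSize = 1_000_000L;  // "1 -- 1 million" per reducer
        int numReducers = 20;         // assumed reducer count for the example
        long[] samples = {1L, 1_000_000L, 1_000_001L, 25_000_000L};
        for (long id : samples) {
            System.out.println(id + " -> partition "
                + partitionFor(id, rangeSize, numReducers));
        }
    }
}
```

Because lower key ranges always map to lower reducer indexes, the reducer outputs taken in index order follow the key order, which is my understanding of what the global ordering requirement asks for. Clamping overflow ids onto the last reducer is only a placeholder choice; it would skew that reducer if the key space is larger than expected.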

My questions are:
1.  Can I divide the bulk load into multiple runs? The existing
bulk load bails out if it finds an HDFS output directory with the same
name.
2.  What I want to do is make multiple runs of 10 billion rows and then
combine the output before running loadtable.rb. Is this possible?
I think this may be required in case my MR bulk load fails partway
through and I need to restart from where it crashed.

Does anyone with experience of huge bulk loads have any tips?


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
stack
Sent: Thursday, January 14, 2010 6:19 AM
To: [email protected]
Subject: Re: HBase bulk load

See
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
St.Ack

On Wed, Jan 13, 2010 at 4:30 PM, Ted Yu <[email protected]> wrote:

> Jonathan:
> Since you implemented
>
>
> https://issues.apache.org/jira/si/jira.issueviews:issue-html/HBASE-48/HBASE-48.html
> ,
> maybe you can point me to some document how bulk load is used ?
> I found bin/loadtable.rb and assume that can be used to import data back
> into HBase.
>
> Thanks
>
