I am trying to use this technique to bulk load, say, 20 billion rows. I tried it first on a smaller set of 20 million rows. One thing I had to take care of was writing custom partitioning logic so that each range of keys goes to one particular reducer, since there was some mention of global ordering. For example, users 1 to 1 million go to reducer 1, and so on.
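To make the range partitioning concrete, here is a minimal sketch of the arithmetic, assuming numeric user ids that start at 1 and fixed-size ranges. The function name `partition_for` and the parameters `range_size` and `num_reducers` are made up for illustration. In the actual job this logic would sit inside the `getPartition` method of a Hadoop `Partitioner` subclass, and the partition index is 0-based, so "reducer 1" above corresponds to index 0 here.

```python
def partition_for(user_id: int, range_size: int, num_reducers: int) -> int:
    """Map a 1-based numeric user id to a 0-based reducer index by key range.

    Hypothetical helper for illustration only; not part of Hadoop or HBase.
    """
    if user_id < 1:
        raise ValueError("user_id must be >= 1")
    if range_size < 1 or num_reducers < 1:
        raise ValueError("range_size and num_reducers must be >= 1")
    # ids 1..range_size -> index 0, range_size+1..2*range_size -> index 1, ...
    index = (user_id - 1) // range_size
    # Clamp ids beyond the last planned range onto the last reducer, so every
    # key on reducer i still sorts before every key on reducer i+1.
    return min(index, num_reducers - 1)


if __name__ == "__main__":
    for uid in (1, 1_000_000, 1_000_001, 5_000_000):
        print(uid, "->", partition_for(uid, 1_000_000, 20))
```

This only preserves a global order if the numeric order of the ids matches the byte order of the row keys as written (for example fixed-width, zero-padded ids), which is an assumption about my key format rather than something the sketch enforces.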
My questions are:

1. Can I divide the bulk load into multiple runs? The existing bulk load bails out if it finds an HDFS output directory with the same name.
2. What I want to do is make multiple runs of 10 billion rows each and then combine the output before running loadtable.rb. Is this possible? I am thinking this may be required in case my MR bulk load fails partway through and I need to restart from where it crashed.

Any tips from anyone with experience of huge bulk loads?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of stack
Sent: Thursday, January 14, 2010 6:19 AM
To: [email protected]
Subject: Re: HBase bulk load

See
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk

St.Ack

On Wed, Jan 13, 2010 at 4:30 PM, Ted Yu <[email protected]> wrote:

> Jonathan:
> Since you implemented
> https://issues.apache.org/jira/si/jira.issueviews:issue-html/HBASE-48/HBASE-48.html
> , maybe you can point me to some document how bulk load is used ?
> I found bin/loadtable.rb and assume that can be used to import data back
> into HBase.
>
> Thanks
