Ved Prakash wrote:
Hi,

I am trying to upload close to 4 GB of data into HBase, but I am not able to
do so. Following are my observations:

1. Before failing, about 800,000 rows had been inserted, and the time taken
for this upload was close to 6 hours, and this is just 1/10th of the total
rows that I have to insert. At this rate it would take ages before I can
insert all the data into the table, and by the time I finished I would have
a similar amount of data ready for another insertion. Is there any better way
to do this?

Tell us more about your setup, Ved. How many regionservers? What version of HBase? Is your uploader single-threaded? (If so, you need to fix this.)
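A multi-threaded uploader can be sketched roughly like this (Python rather than your PHP for brevity; `insert_row` is a hypothetical stand-in for whatever per-row REST call your script already makes):

```python
from concurrent.futures import ThreadPoolExecutor

def insert_row(row):
    # Hypothetical stand-in for a REST PUT of one row to HBase.
    return len(row)  # pretend the row was written

def upload(rows, workers=8):
    # Fan the rows out across a pool of worker threads so several
    # inserts are in flight at once instead of one at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(insert_row, rows))
```

With a single-threaded loop, every insert waits for the previous round trip; a handful of workers usually buys a large speedup against a cluster that is otherwise idle.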

2. I saw this happen whenever a datanode goes down. Doesn't HBase have a
mechanism to continue loading data to the other datanodes when one
fails?
It does. Regions from the failed server get deployed elsewhere and away we go again (at least, that's how it's supposed to work in theory).

For inserting data into the table, I wrote a PHP script that connects
through the REST interface [https://issues.apache.org/jira/browse/HBASE-37].
Since the datafile is big, on my first execution I hit a similar
failure even when none of my datanodes were down. I thought of breaking it
into smaller files and then processing them, but that doesn't help.

Tell us also more about your data format and schema. What size are the inserts? Are you inserting rows in lexicographically ascending order (not so good), or do your inserts go to random locations in the row namespace (better)?
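One common trick to avoid hammering a single region with lexicographically ascending keys is to salt them. A minimal sketch (the two-digit bucket prefix is an assumption for illustration, not something from this thread):

```python
import hashlib

def salted_key(row_key, buckets=16):
    # Derive a stable bucket from a hash of the key, so consecutive
    # source keys land in different parts of the row namespace.
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return "%02d-%s" % (bucket, row_key)
```

The trade-off is that range scans over the original key order now have to fan out across all the buckets.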

It's probably smart to break a big file up into smaller pieces, unless you have a means of resuming from where an upload failed.

St.Ack



Can you guys help me with this?

Thanks