Ved Prakash wrote:
Hi,

I am trying to upload close to 4 GB of data into HBase, but I am not able to
do so. Following are my observations:

1. Before failing, about 800,000 rows had been inserted, and the time taken
for this upload was close to 6 hours, and this is just 1/10th of the total
rows that I have to insert. At this rate it would take ages before I can
insert all the data into the table, and by the time I finished I would have
a similar amount of data ready for another insertion. Is there any better way
to do this?

Tell us more about your setup, Ved. How many regionservers? What version of HBase? Is your uploader single-threaded? (If so, you need to fix this.)
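A multi-threaded uploader can be sketched roughly like this (Python rather than your PHP for brevity; `insert_row` is a hypothetical stand-in for whatever per-row REST call your script already makes):

```python
from concurrent.futures import ThreadPoolExecutor

def insert_row(row):
    # Hypothetical stand-in for a REST PUT of one row to HBase.
    return len(row)  # pretend the row was written

def upload(rows, workers=8):
    # Fan the rows out across a pool of worker threads so several
    # inserts are in flight at once instead of one at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(insert_row, rows))
```

With a single-threaded loop, every insert waits for the previous round trip; a handful of workers usually buys a large speedup against a cluster that is otherwise idle.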

2. I saw this happen whenever a datanode goes down. Doesn't HBase have a
mechanism to continue loading data to the other datanodes when one
fails?
It does. Regions from the failed server get deployed elsewhere and away we go again (at least, that's how it's supposed to work in theory).

For inserting data into the table, I wrote a PHP script that connects
through the REST interface [https://issues.apache.org/jira/browse/HBASE-37].
Since the datafile is big, on my first execution I hit a similar
failure even when none of my datanodes were down. I thought of breaking it
into smaller files and then processing them, but that doesn't help.

Tell us also more about your data format and schema. What size are the inserts? Are you inserting rows in lexicographically ascending order (not so good), or do your inserts go to random locations in the row namespace (better)?
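One common trick to avoid hammering a single region with lexicographically ascending keys is to salt them. A minimal sketch (the two-digit bucket prefix is an assumption for illustration, not something from this thread):

```python
import hashlib

def salted_key(row_key, buckets=16):
    # Derive a stable bucket from a hash of the key, so consecutive
    # source keys land in different parts of the row namespace.
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return "%02d-%s" % (bucket, row_key)
```

The trade-off is that range scans over the original key order now have to fan out across all the buckets.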

It's probably smart to break a big file up into smaller pieces, unless you have a means of resuming from where an upload failed.

St.Ack



Can you guys help me with this?

Thanks