Hi Amandeep,

Following is my EC2 cluster configuration: High-CPU Medium Instance, 1.7 GB
of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units
each), 350 GB of instance storage, 32-bit platform. So I don't think I have
much of an option on the memory front. However, is there any way I can make
use of the 5 EC2 Compute Units to increase performance?

Regarding the table splits, I don't see HBase doing the splits
automatically. After loading about 17,000 rows into Table1, I still see it
as one region (checked on the web UI); that's why I had to split it
manually. Or is there any configuration/setting I have to change to ensure
that tables are split automatically?

I will increase the dataXceivers to 2048 and the ulimit to 32k, as you
suggested. Sketches of what I plan to change are below.
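On the compute units: with 2 virtual cores per node, the knob that should
matter most is how many map tasks each tasktracker runs at once. A minimal
hadoop-site.xml snippet (assuming the stock 0.19 single-file config; the
value 2 is just a starting point matched to the 2 cores):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>Concurrent map tasks per tasktracker, roughly one per
  core.</description>
</property>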
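On the splits: as I understand it, HBase only splits a region once a store
file passes hbase.hregion.max.filesize (256MB by default), so 17,000 small
rows staying in a single region is expected rather than a misconfiguration.
If earlier splits are wanted for test-sized data, one option is lowering
the threshold in hbase-site.xml (64MB below is only an illustration):

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>67108864</value>
  <description>Split a region once a store file exceeds this many
  bytes.</description>
</property>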
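And for the two limits, this is what I plan to set: the dataXceiver cap
goes in hadoop-site.xml (note Hadoop's own misspelling of the property
name), and the 32k file limit goes in /etc/security/limits.conf for
whichever user runs the daemons (the "hadoop" account below is an
assumption):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
</property>

# /etc/security/limits.conf
hadoop  soft  nofile  32768
hadoop  hard  nofile  32768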
Thanks a ton,
Rakhi

> > Hi Amandeep,
> > I have 1GB of memory on each node of the EC2 cluster (C1 Medium).
> > I am using hadoop-0.19.0 and hbase-0.19.0.
> > Well, we are starting with 10,000 rows, but later it will go up to
> > 100,000 rows.
>
> 1GB is too low. You need around 4GB to get a stable system.
>
> > My map task basically reads an HBase table 'Table1', performs analysis
> > on each row, and dumps the analysis results into another HBase table
> > 'Table2'. Each analysis task takes about 3-4 minutes when tested on a
> > local machine (the algorithm part, without the map reduce).
> >
> > I have divided 'Table1' into 30 regions before sending it to the map,
> > and set the maximum number of map tasks to 20.
>
> Let HBase do the division into regions. Leave the table as it is in the
> default state.
>
> > I have set dataXceivers to 1024 and ulimit to 1024.
>
> Yes, increase these: 2048 dataXceivers and a 32k ulimit.
>
> > I am able to process about 300 rows in an hour, which I feel is quite
> > slow. How do I increase the performance?
>
> The reasons are mentioned above.
>
> > Meanwhile I will try setting the dataXceivers to 2048 and increasing
> > the file limit as you mentioned.
> >
> > Thanks,
> > Rakhi
> >
> > On Wed, Apr 8, 2009 at 11:40 AM, Amandeep Khurana <[email protected]>
> > wrote:
> >
> > > 20 nodes is good enough to begin with. How much memory do you have on
> > > each node? IMO, you should keep 1GB per daemon and 1GB for the MR
> > > job, like Andrew suggested.
> > > You don't necessarily have to separate the datanodes and tasktrackers
> > > as long as you have enough resources.
> > > 10,000 rows isn't big at all from an HBase standpoint. What kind of
> > > computation are you doing before dumping data into HBase? And what
> > > versions of Hadoop and HBase are you running?
> > >
> > > There's another thing you should do: increase the dataXceivers limit
> > > to 2048 (that's what I use).
> > >
> > > If you have root privilege over the cluster, then increase the file
> > > limit to 32k (see the HBase FAQ for details).
> > >
> > > Try this out and see how it goes.
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> > > On Tue, Apr 7, 2009 at 2:45 AM, Rakhi Khatwani <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > > I have a 20 node cluster on EC2 (small instance). I have a set of
> > > > tables which store a huge amount of data (tried with 10,000 rows;
> > > > more to be added). But during my map reduce jobs, some of the
> > > > region servers shut down, thereby causing data loss and stopping my
> > > > program execution; in fact, one of my tables got damaged. Whenever
> > > > I scan that table, I get the "could not obtain block" error.
> > > >
> > > > 1. I want to make the cluster more robust, since it contains a lot
> > > > of data and it's really important that it remains stable.
> > > > 2. If one of my tables gets damaged (even after restarting DFS and
> > > > HBase), how do I go about recovering it?
> > > >
> > > > My EC2 cluster mostly has the default configuration, with
> > > > hadoop-site and hbase-site having some entries pertaining to
> > > > map-reduce (for example, the number of map tasks,
> > > > mapred.task.timeout, etc.).
> > > >
> > > > Your help will be greatly appreciated.
> > > > Thanks,
> > > > Raakhi Khatwani
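
(For reference, the Table1 -> Table2 job discussed above would look roughly
like the sketch below against the 0.19-era mapred API. Class and method
names are from memory, so treat the exact signatures as assumptions; the
"data:"/"analysis:result" columns and analyze() are hypothetical
placeholders for the real per-row analysis.)

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.mapred.IdentityTableReduce;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class AnalysisJob {

  // Reads each row of Table1, runs the analysis, and emits a BatchUpdate
  // destined for Table2.
  public static class AnalysisMap extends MapReduceBase
      implements TableMap<ImmutableBytesWritable, BatchUpdate> {

    public void map(ImmutableBytesWritable row, RowResult value,
        OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
        Reporter reporter) throws IOException {
      BatchUpdate bu = new BatchUpdate(row.get());
      bu.put("analysis:result", analyze(value)); // hypothetical output column
      output.collect(row, bu);
    }

    // Placeholder for the real 3-4 minute per-row computation.
    private byte[] analyze(RowResult value) {
      Cell cell = value.get(Bytes.toBytes("data:raw")); // hypothetical input column
      return cell == null ? new byte[0] : cell.getValue();
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(new HBaseConfiguration(), AnalysisJob.class);
    conf.setJobName("table1-analysis");
    // Scan the "data:" family of Table1 into AnalysisMap ...
    TableMapReduceUtil.initTableMapJob("Table1", "data:", AnalysisMap.class,
        ImmutableBytesWritable.class, BatchUpdate.class, conf);
    // ... and write the collected BatchUpdates into Table2 unchanged.
    TableMapReduceUtil.initTableReduceJob("Table2", IdentityTableReduce.class,
        conf);
    JobClient.runJob(conf);
  }
}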
