20 nodes is good enough to begin with. How much memory do you have on each node? IMO you should keep 1 GB per daemon and 1 GB for the MR job, like Andrew suggested. You don't necessarily have to separate the datanodes and tasktrackers as long as you have enough resources. 10,000 rows isn't big at all from an HBase standpoint. What kind of computation are you doing before dumping data into HBase? And what versions of Hadoop and HBase are you running?
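To make that memory split concrete, here's a rough sketch of where those heap sizes live; the exact values and file locations are my assumptions, so adjust them to whatever your instances can actually spare:

  # conf/hadoop-env.sh -- heap (in MB) for each Hadoop daemon (namenode, datanode, tasktracker)
  export HADOOP_HEAPSIZE=1000

  # conf/hbase-env.sh -- heap (in MB) for each HBase daemon (master, region servers)
  export HBASE_HEAPSIZE=1000

  <!-- conf/hadoop-site.xml: cap each map/reduce child JVM at roughly 1 GB -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>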
There's another thing you should do: increase the DataXceivers limit to 2048 (that's what I use). If you have root privileges on the cluster, also raise the open-file limit to 32k (see the HBase FAQ for details); I've put a sketch of both settings below the quoted message. Try this out and see how it goes.

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Tue, Apr 7, 2009 at 2:45 AM, Rakhi Khatwani <[email protected]> wrote:

> Hi,
> I have a 20 node cluster on ec2 (small instance).... i have a set of
> tables which store huge amount of data (tried wid 10,000 rows... more to be
> added).... but during my map reduce jobs, some of the region servers shut
> down thereby causing data loss, stop in my program execution and infact one
> of my tables got damaged. when ever i scan the table, i get the could not
> obtain block error.
>
> 1. i want to make the cluster more robust. since it contains a lot of data.
> and its really important that they remain stable.
> 2. if one of my tables gets damaged (even after restarting dfs n hbase),
> how do i go about recovering it?
>
> my ec2 cluster mostly has the default configuration.
> with hadoop-site n hbase-site have some entries pertaining to map-reduce
> (for example. num of map tasks, mapred.task.timeout etc).
>
> Your help will be greatly appreciated.
> Thanks,
> Raakhi Khatwani
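As mentioned above, here's a rough sketch of those two settings; treat the file paths and the assumption that the daemons run as a "hadoop" user as things to verify against your own setup:

  <!-- conf/hadoop-site.xml on every datanode (yes, the property name really is
       spelled "xcievers"); restart the datanodes after changing it -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

  # /etc/security/limits.conf -- raise the open-file limit for the user
  # that runs the Hadoop/HBase daemons (assumed here to be "hadoop")
  hadoop  -  nofile  32768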
