Stack,

Thank you very much for your comments. I am running a cluster with 20 nodes; 19 of them serve as both regionservers and members of the ZooKeeper quorum. The versions I am using are Hadoop 0.20.1 and HBase 0.20.1. I started with an empty table and tried to load 200 million records into it. Each record contains a key. In my MR program, during setup I open an HTable; in my mapper, I fetch the row from the HTable via the key in the record, make some changes to the columns, and write the row back to the HTable through TableOutputFormat by passing a Put. There are no reduce tasks involved. (Though it is unnecessary to fetch rows from an empty table, I intended to do that.)
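Roughly, the mapper looks like the sketch below (not my exact code; `extractKey` and `recompute` are hypothetical helpers, and the table, family, and column names are placeholders, written against the HBase 0.20 client API):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class UpdateMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private HTable table;  // opened once per task, in setup

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(new HBaseConfiguration(), "mytable");  // placeholder name
  }

  @Override
  protected void map(LongWritable offset, Text record, Context context)
      throws IOException, InterruptedException {
    byte[] key = Bytes.toBytes(extractKey(record));  // hypothetical helper
    Result row = table.get(new Get(key));            // fetch the existing row

    Put put = new Put(key);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
            Bytes.toBytes(recompute(row)));          // hypothetical helper
    // TableOutputFormat's record writer performs the actual HBase update
    context.write(new ImmutableBytesWritable(key), put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.close();
  }

  private String extractKey(Text record) { return record.toString(); }
  private String recompute(Result row) { return ""; }
}
```

The job would be configured with no reduce tasks (`job.setNumReduceTasks(0)`) and `TableOutputFormat` as the output format.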
Additionally, when I reduced the number of regionservers and ZooKeeper quorum members, I got a different error:

org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:929)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:580)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:562)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:693)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:589)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:562)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:693)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:593)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:556)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:105)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.getRecordWriter(TableOutputFormat.java:116)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:573)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Many thanks in advance.
zhenyu

On Wed, Oct 28, 2009 at 12:39 PM, stack <[email protected]> wrote:
> What's your cluster topology? How many nodes are involved? When you see the
> below message, how many regions are in your table? How are you loading your
> table?
> Thanks,
> St.Ack
>
> On Wed, Oct 28, 2009 at 7:45 AM, Zhenyu Zhong <[email protected]> wrote:
> >
> > Nitay,
> >
> > I really appreciate it.
> >
> > As Ryan suggested, I increased the ZooKeeper session timeout to 40 seconds
> > and put the GC options -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> > in place. I set the heap size to 4GB. I also set vm.swappiness=0.
> >
> > However, it still ran into problems. Please find the following errors.
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> > contact region server x.x.x.x:60021 for region
> > YYYY,117.99.7.153,1256396118155, row '1170491458', but failed after 10
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> >
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
> >         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:413)
> >
> > The input file is about 10GB, around 200 million rows of data.
> > This load doesn't seem too large, yet these errors keep popping up.
> >
> > Do the regionservers need to be deployed to dedicated machines?
> > Does ZooKeeper need to be deployed to dedicated machines as well?
> >
> > Best,
> > zhenyu
> >
> > On Wed, Oct 28, 2009 at 1:37 AM, nitay <[email protected]> wrote:
> > >
> > > Hi Zhenyu,
> > >
> > > Sorry for the delay. I started working on this a while back, before I
> > > left my job for another company. Since then I haven't had much time to
> > > work on HBase, unfortunately :(. I'll try to dig up what I had, see
> > > what shape it's in, and update you.
> > >
> > > Cheers,
> > > -n
> > >
> > > On Oct 27, 2009, at 3:38 PM, Ryan Rawson wrote:
> > >
> > >> Sorry, I must have mistyped; I meant to say "40 seconds". You can
> > >> still see multi-second pauses at times, so you need to give yourself a
> > >> bigger buffer.
> > >>
> > >> The parallel threads argument should not be necessary, but you do need
> > >> the UseConcMarkSweepGC flag as well.
> > >>
> > >> Let us know how it goes!
> > >> -ryan
> > >>
> > >> On Tue, Oct 27, 2009 at 3:19 PM, Zhenyu Zhong <[email protected]> wrote:
> > >>
> > >>> Ryan,
> > >>> I really appreciate your feedback.
> > >>> I have set zookeeper.session.timeout to 40 seconds, which is way
> > >>> higher than 40ms.
> > >>> At the same time, -Xms is set to 4GB, which should be sufficient.
> > >>> I also tried GC options like
> > >>>
> > >>> -XX:ParallelGCThreads=8
> > >>> -XX:+UseConcMarkSweepGC
> > >>>
> > >>> I even set vm.swappiness=0
> > >>>
> > >>> However, I still came across the problem that a RegionServer shut
> > >>> itself down.
> > >>>
> > >>> Best,
> > >>> zhong
> > >>>
> > >>> On Tue, Oct 27, 2009 at 6:05 PM, Ryan Rawson <[email protected]> wrote:
> > >>>
> > >>>> Set the ZK timeout to something like 40ms, and give the GC enough Xmx
> > >>>> so you never risk entering the much-dreaded concurrent-mode failure,
> > >>>> whereby the entire heap must be GCed.
> > >>>>
> > >>>> Consider testing Java 7 and the G1 GC.
> > >>>>
> > >>>> We could get a JNI thread to do this, but no one has done so yet. I am
> > >>>> personally hoping for G1, and in the meantime overprovision our Xmx to
> > >>>> avoid the concurrent-mode failures.
> > >>>>
> > >>>> -ryan
> > >>>>
> > >>>> On Tue, Oct 27, 2009 at 2:59 PM, Zhenyu Zhong <[email protected]> wrote:
> > >>>>
> > >>>>> Ryan,
> > >>>>>
> > >>>>> Thank you very much.
> > >>>>> May I ask whether there are any ways to get around this problem to
> > >>>>> make HBase more stable?
> > >>>>>
> > >>>>> best,
> > >>>>> zhong
> > >>>>>
> > >>>>> On Tue, Oct 27, 2009 at 4:06 PM, Ryan Rawson <[email protected]> wrote:
> > >>>>>
> > >>>>>> There isn't any working code yet. Just an idea, and a prototype.
> > >>>>>>
> > >>>>>> There is some sense that if we can get the G1 GC, we could get rid
> > >>>>>> of all long pauses and avoid the need for this.
> > >>>>>>
> > >>>>>> -ryan
> > >>>>>>
> > >>>>>> On Mon, Oct 26, 2009 at 2:30 PM, Zhenyu Zhong <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> I am very interested in the solution that Joey proposed and would
> > >>>>>>> like to give it a try.
> > >>>>>>> Does anyone have any ideas on how to deploy this zk_wrapper in JNI
> > >>>>>>> integration?
> > >>>>>>>
> > >>>>>>> I would really appreciate it.
> > >>>>>>>
> > >>>>>>> thanks
> > >>>>>>> zhong
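For reference, the tuning settings discussed in this thread, collected in one place as a config sketch (values are taken from the messages above; the exact property and variable names are my reading of the HBase 0.20 docs and should be double-checked):

```shell
# hbase-env.sh -- heap size and GC flags from the thread
export HBASE_HEAPSIZE=4000   # in MB, i.e. the 4GB heap mentioned above
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8"

# OS level: discourage swapping (set vm.swappiness = 0 in /etc/sysctl.conf,
# then apply with `sysctl -p`)
```

In addition, hbase-site.xml would carry the ZooKeeper session timeout Ryan suggested: set the `zookeeper.session.timeout` property to 40000 (the value is in milliseconds, so 40000 is the 40 seconds discussed above).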
