Stack,

Thank you very much for your comments. I am running a cluster with 20 nodes; 19 of them serve as both regionservers and members of the ZooKeeper quorum. The versions I am using are Hadoop 0.20.1 and HBase 0.20.1. I started with an empty table and tried to load 200 million records into it. Each record contains a key. In my MR program, during setup I open an HTable; in my mapper, I fetch the row from the HTable via the key in the record, make some changes to the columns, and write the row back to the HTable through TableOutputFormat by passing a Put. There are no reduce tasks involved. (Though it is unnecessary to fetch rows from an empty table, I intended to do that.)
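Roughly, the mapper looks like the sketch below (not my exact code; `extractKey` and `recompute` are hypothetical helpers, and the table, family, and column names are placeholders, written against the HBase 0.20 client API):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class UpdateMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private HTable table;  // opened once per task, in setup

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(new HBaseConfiguration(), "mytable");  // placeholder name
  }

  @Override
  protected void map(LongWritable offset, Text record, Context context)
      throws IOException, InterruptedException {
    byte[] key = Bytes.toBytes(extractKey(record));  // hypothetical helper
    Result row = table.get(new Get(key));            // fetch the existing row

    Put put = new Put(key);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
            Bytes.toBytes(recompute(row)));          // hypothetical helper
    // TableOutputFormat's record writer performs the actual HBase update
    context.write(new ImmutableBytesWritable(key), put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.close();
  }

  private String extractKey(Text record) { return record.toString(); }
  private String recompute(Result row) { return ""; }
}
```

The job would be configured with no reduce tasks (`job.setNumReduceTasks(0)`) and `TableOutputFormat` as the output format.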
Additionally, when I reduced the number of regionservers and ZooKeeper quorum members, I got a different error:

org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:929)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:580)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:562)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:693)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:589)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:562)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:693)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:593)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:556)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:105)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.getRecordWriter(TableOutputFormat.java:116)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:573)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Many thanks in advance.
zhenyu

On Wed, Oct 28, 2009 at 12:39 PM, stack <[email protected]> wrote:
> What's your cluster topology? How many nodes are involved? When you see the
> below message, how many regions are in your table? How are you loading your
> table?
> Thanks,
> St.Ack
>
> On Wed, Oct 28, 2009 at 7:45 AM, Zhenyu Zhong <[email protected]> wrote:
> >
> > Nitay,
> >
> > I really appreciate it.
> >
> > As Ryan suggested, I increased the ZooKeeper session timeout to 40 seconds
> > and put the GC options -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> > in place. I set the heap size to 4GB. I also set vm.swappiness=0.
> >
> > However, it still ran into problems. Please find the following errors.
> >
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> > contact region server x.x.x.x:60021 for region
> > YYYY,117.99.7.153,1256396118155, row '1170491458', but failed after 10
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > setting up proxy to /x.x.x.x:60021 after attempts=1
> >
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
> >         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:413)
> >
> > The input file is about 10GB, around 200 million rows of data.
> > This load doesn't seem too large, yet these errors keep popping up.
> >
> > Do the regionservers need to be deployed to dedicated machines?
> > Does ZooKeeper need to be deployed to dedicated machines as well?
> >
> > Best,
> > zhenyu
> >
> > On Wed, Oct 28, 2009 at 1:37 AM, nitay <[email protected]> wrote:
> > >
> > > Hi Zhenyu,
> > >
> > > Sorry for the delay. I started working on this a while back, before I
> > > left my job for another company. Since then I haven't had much time to
> > > work on HBase, unfortunately :(. I'll try to dig up what I had, see
> > > what shape it's in, and update you.
> > >
> > > Cheers,
> > > -n
> > >
> > > On Oct 27, 2009, at 3:38 PM, Ryan Rawson wrote:
> > >
> > >> Sorry, I must have mistyped; I meant to say "40 seconds". You can
> > >> still see multi-second pauses at times, so you need to give yourself a
> > >> bigger buffer.
> > >>
> > >> The parallel threads argument should not be necessary, but you do need
> > >> the UseConcMarkSweepGC flag as well.
> > >>
> > >> Let us know how it goes!
> > >> -ryan
> > >>
> > >> On Tue, Oct 27, 2009 at 3:19 PM, Zhenyu Zhong <[email protected]> wrote:
> > >>
> > >>> Ryan,
> > >>> I really appreciate your feedback.
> > >>> I have set zookeeper.session.timeout to 40 seconds, which is way
> > >>> higher than 40ms.
> > >>> At the same time, -Xms is set to 4GB, which should be sufficient.
> > >>> I also tried GC options like
> > >>>
> > >>> -XX:ParallelGCThreads=8
> > >>> -XX:+UseConcMarkSweepGC
> > >>>
> > >>> I even set vm.swappiness=0
> > >>>
> > >>> However, I still came across the problem that a RegionServer shut
> > >>> itself down.
> > >>>
> > >>> Best,
> > >>> zhong
> > >>>
> > >>> On Tue, Oct 27, 2009 at 6:05 PM, Ryan Rawson <[email protected]> wrote:
> > >>>
> > >>>> Set the ZK timeout to something like 40ms, and give the GC enough Xmx
> > >>>> so you never risk entering the much-dreaded concurrent-mode failure,
> > >>>> whereby the entire heap must be GCed.
> > >>>>
> > >>>> Consider testing Java 7 and the G1 GC.
> > >>>>
> > >>>> We could get a JNI thread to do this, but no one has done so yet. I am
> > >>>> personally hoping for G1, and in the meantime overprovision our Xmx to
> > >>>> avoid the concurrent-mode failures.
> > >>>>
> > >>>> -ryan
> > >>>>
> > >>>> On Tue, Oct 27, 2009 at 2:59 PM, Zhenyu Zhong <[email protected]> wrote:
> > >>>>
> > >>>>> Ryan,
> > >>>>>
> > >>>>> Thank you very much.
> > >>>>> May I ask whether there are any ways to get around this problem to
> > >>>>> make HBase more stable?
> > >>>>>
> > >>>>> best,
> > >>>>> zhong
> > >>>>>
> > >>>>> On Tue, Oct 27, 2009 at 4:06 PM, Ryan Rawson <[email protected]> wrote:
> > >>>>>
> > >>>>>> There isn't any working code yet. Just an idea, and a prototype.
> > >>>>>>
> > >>>>>> There is some sense that if we can get the G1 GC, we could get rid
> > >>>>>> of all long pauses and avoid the need for this.
> > >>>>>>
> > >>>>>> -ryan
> > >>>>>>
> > >>>>>> On Mon, Oct 26, 2009 at 2:30 PM, Zhenyu Zhong <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> I am very interested in the solution that Joey proposed and would
> > >>>>>>> like to give it a try.
> > >>>>>>> Does anyone have any ideas on how to deploy this zk_wrapper in JNI
> > >>>>>>> integration?
> > >>>>>>>
> > >>>>>>> I would really appreciate it.
> > >>>>>>>
> > >>>>>>> thanks
> > >>>>>>> zhong
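For reference, the tuning settings discussed in this thread, collected in one place as a config sketch (values are taken from the messages above; the exact property and variable names are my reading of the HBase 0.20 docs and should be double-checked):

```shell
# hbase-env.sh -- heap size and GC flags from the thread
export HBASE_HEAPSIZE=4000   # in MB, i.e. the 4GB heap mentioned above
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8"

# OS level: discourage swapping (set vm.swappiness = 0 in /etc/sysctl.conf,
# then apply with `sysctl -p`)
```

In addition, hbase-site.xml would carry the ZooKeeper session timeout Ryan suggested: set the `zookeeper.session.timeout` property to 40000 (the value is in milliseconds, so 40000 is the 40 seconds discussed above).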
