On Fri, Apr 9, 2010 at 11:41 AM, Stack <[email protected]> wrote:

> On Thu, Apr 8, 2010 at 2:26 PM, Raghu Angadi <[email protected]> wrote:
> > Regionserver log on node 72 is at : http://bit.ly/cd9acU (160K gzipped).
> > To give a scale of reads, the local datanode had 3.5K reads spread across
> > about 20 blocks. Pretty much all the reads were from the same DFSClient
> > ID. will watch if this happens again.
> >
>

Sorry for the delay. This is 3.5K reads per second, and it goes on for maybe
minutes at a time. We have not been running load against this cluster yet;
will update soon on how the cluster behaves and whether it stays stable.

Overall, do you think writes are expected to slow down gracefully when there
is not enough RAM?

Raghu.

> Thanks for digging in Raghu.
>
> Looking in logs, lots of churn -- other regionservers shedding
> regions, restarting?  -- and write load would probably do better if
> given more RAM.  You keep hitting the ceiling where we put down a gate
> blocking writes until flushes finish.
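
(Inline note for the archives: my back-of-the-envelope for that gate, assuming
the 0.20-era property names below are the right ones; please correct me if
not. With the defaults it works out to roughly 64MB * 2 = 128MB of unflushed
data per region before updates get blocked.)

import org.apache.hadoop.hbase.HBaseConfiguration;

public class WriteGateCheck {
  public static void main(String[] args) {
    // Reads whatever hbase-site.xml/hbase-default.xml is on the classpath;
    // the fallback values are only my assumption of the shipped defaults.
    HBaseConfiguration conf = new HBaseConfiguration();
    long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);
    int multiplier = conf.getInt("hbase.hregion.memstore.block.multiplier", 2);
    System.out.println("writes to a region should block at about "
        + (flushSize * multiplier) / (1024 * 1024) + "MB of unflushed data");
  }
}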
>
> What time interval are we talking about regarding the 3.5k reads across 20
> blocks?  Were the 20 blocks under the ${HBASE_ROOTDIR}/.META. directory?
> This regionserver was carrying the .META. region it looks like, so it's
> going to be popular.  I'd think the cache should be running
> interference on i/o but maybe it's not doing a good job of it.  The
> write load/churn might be blowing the cache.  Yeah, log at DEBUG and
> we'll get a better idea.
>
> You're doing a big upload?
>
> Cache is a config where you set how much of heap to allocate.  Default
> is 0.2 IIRC.
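
(Quick arithmetic on that 0.2 default, in case it helps anyone else sizing the
cache; the heap figures below are just our current and planned regionserver
heaps, and the fraction is the hfile.block.cache.size value as I understand it.)

public class BlockCacheMath {
  public static void main(String[] args) {
    // hfile.block.cache.size: fraction of regionserver heap handed to the
    // LruBlockCache (0.2 by default, per Stack above)
    double fraction = 0.2;
    for (int heapGb : new int[] {3, 6}) {  // our current and planned heap sizes
      System.out.printf("%dGB heap -> ~%.0fMB block cache%n",
          heapGb, heapGb * 1024 * fraction);
    }
  }
}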
> St.Ack
>
>
>
>
>
> > We are increasing memory from 3GB to 6GB. Any pointers about how to set
> > size of block cache will be helpful.
> > will enable DEBUG for LruBlockCache.
> >
> > Raghu.
> >
> > On Thu, Apr 8, 2010 at 12:46 AM, Stack <[email protected]> wrote:
> >
> >> On Thu, Apr 8, 2010 at 12:20 AM, Raghu Angadi <[email protected]> wrote:
> >> > Thanks Stack. will move to 20.3 or 20 trunk very soon. more responses
> >> > inline below.
> >> >
> >> Do.  A 0.20.4 should be around in the next week or so, which will be
> >> better still, FYI.
> >>
> >> > On Wed, Apr 7, 2010 at 8:52 PM, Stack <[email protected]> wrote:
> >> >
> >> >> On Wed, Apr 7, 2010 at 7:49 PM, Raghu Angadi <[email protected]> wrote:
> >> >> > We are working with a small HBase cluster (5 nodes) with fairly
> >> >> > beefy nodes. While looking at why all the regionservers died at one
> >> >> > time, noticed that these servers read some files 100s of times a
> >> >> > second. This may not be cause of the error... but do you think this
> >> >> > is odd?
> >> >> >
> >> >> Check end of regionserver log.  Should say why RegionServer went
> >> >> away.  The usual reason is long GC pause, one that is longer than zk
> >> >> session timeout.
> >> >
> >> >
> >> > This seems to be the case... There were CMS GC failures (promotion
> >> > failed, Full GC etc). There were 4-5 pauses of about 4-10 seconds over
> >> > a minute or so. Is that enough to kill ZK session? We are increasing
> >> > the memory and will go through tuning tips on wiki.
> >> >
> >>
> >> ZK session in your 0.20.1 is probably 40 seconds IIRC but yeah, this
> >> is common enough until a bit of tuning is done.  If you update to
> >> 0.20.3 at least, the zk session is 60 seconds but you should try and
> >> avoid the promotion failures if you can.
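
(Will double-check what our 0.20.1 build actually resolves this to; a minimal
check, assuming the property name is zookeeper.session.timeout and that it is
in milliseconds. Please correct me if the name is different on 0.20.1.)

import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZkTimeoutCheck {
  public static void main(String[] args) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // falls back to 60000 only if the property is not set anywhere on the classpath
    System.out.println("zookeeper.session.timeout = "
        + conf.getInt("zookeeper.session.timeout", 60000) + " ms");
  }
}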
> >>
> >> > There are various other errors in the log over couple of hours of RS
> >> > run. will post a link to the full log.
> >> >
> >> > --- failure on RS-72 ---
> >> > 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
> >> > closing session 0x127d58da4e70002 to sun.nio.ch.selectionkeyi...@426295eb
> >> > java.io.IOException: TIMED OUT
> >> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> >> > 2010-04-06 22:11:07,668 WARN org.apache.zookeeper.ClientCnxn: Exception
> >> > closing session 0x27d58da6de0088 to sun.nio.ch.selectionkeyi...@283f4633
> >> > java.io.IOException: TIMED OUT
> >> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> >> > 2010-04-06 22:11:07,672 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> >> > handler 27 on 60020, call put([...@20a192c7,
> >> > [Lorg.apache.hadoop.hbase.client.Put;@4fab578d) from 10.10.0.72:60211 :
> >> > error: java.io.IOException: Server not running, aborting
> >> > java.io.IOException: Server not running, aborting
> >> > at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2345)
> >> > ------------
> >>
> >>
> >> These look like the zk session time out sequence.
> >>
> >>
> >> >
> >> > --- failure on RS-73 after a few minutes ---
> >> >
> >> > 2010-04-06 22:21:41,867 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -4957903368956265878 lease expired
> >> > 2010-04-06 22:21:47,806 WARN org.apache.zookeeper.ClientCnxn: Exception
> >> > closing session 0x127d58da4e7002a to sun.nio.ch.selectionkeyi...@15ef1241
> >> > java.io.IOException: TIMED OUT
> >> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> >> > 2010-04-06 22:21:47,806 ERROR
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> >> > java.lang.OutOfMemoryError: Java heap space
> >> > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> >> > at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> >> > -------
> >> >
> >>
> >> This is zk session timeout and an OOME.  The GC couldn't succeed.
> >>
> >> How much memory are you giving these puppies?  CMS is kinda sloppy, so
> >> you need to give it a bit more space to work in.
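
(We are going from 3GB to 6GB. A trivial check that can be run with the same
-Xmx/HBASE_OPTS as the regionserver to confirm what the JVM actually gets:)

public class HeapCheck {
  public static void main(String[] args) {
    // prints the max heap this JVM will use, in MB
    System.out.println("max heap = "
        + Runtime.getRuntime().maxMemory() / (1024 * 1024) + "MB");
  }
}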
> >>
> >> >  [...]
> >> >
> >> >> > 2010-04-06 21:51:43,328 INFO org.apache.hadoop.hdfs.DFSClient: Could
> >> >> > not obtain block blk_-7610953303919156937_1089667 from any node:
> >> >> >  java.io.IOException: No live nodes contain current block
> >> >> > [...]
> >> >> > ----
> >> >>
> >> >> Are you accessing from mapreduce?  If so, does your hadoop have
> >> >> hdfs-127?
> >> >>
> >> >> Then there are the usual suspects.  Xceivers count -- up it to 2k or
> >> >> so -- and ulimit should be much greater than the default 1024.
> >> >
> >> >
> >> > Yes. Most of the traffic now is puts from reducers. I think HDFS is a
> >> > recent Cloudera release. I will check. Most likely it won't have
> >> > hdfs-127.
> >> >
> >>
> >> My guess is that it does.. but yeah, check (You should remember that
> >> one -- smile)
> >>
> >>
> >> > Yup, we hit the Xceivers limit very early. The limit is 2k and the fd
> >> > limit is also high.
> >> >
> >> > [...]
> >> >
> >> >> > There are thousands of repeated reads of many small files like
> >> >> > this.
> >> >> >
> >> >> > --- From NN log, this block was created
> >> >> > for /hbase/.META./1028785192/info/1728561479703335912
> >> >> > 2010-04-06 21:51:20,906 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> >> >> > NameSystem.allocateBlock:
> >> >> > /hbase/.META./1028785192/info/1728561479703335912.
> >> >> > blk_8972126557191254374_1090962
> >> >> > ----
> >> >> >
> >> >> > Btw, we had single replication set for this file by mistake.
> >> >> >
> >> >>
> >> >> So, if there is any error getting a block, there is no recourse.  Were
> >> >> there concurrent processes sucking i/o from HDFS running at the same
> >> >> time?
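
(The single replication was indeed our mistake. In case it is useful to anyone
else, an untested sketch of putting replication back to 3 under /hbase; the
shell equivalent should be "hadoop fs -setrep -R 3 /hbase".)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FixReplication {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    setRep(fs, new Path("/hbase"), (short) 3);
  }

  static void setRep(FileSystem fs, Path dir, short rep) throws IOException {
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDir()) {
        setRep(fs, st.getPath(), rep);        // recurse into region/family dirs
      } else {
        fs.setReplication(st.getPath(), rep); // ask the NN to re-replicate the file
      }
    }
  }
}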
> >> >
> >> >
> >> >
> >> >> Writing, clients need to figure out where to write.  They'll do this
> >> >> by doing a lookup in .META. and then cache the info.  If clients are
> >> >> short-lived, then there will be lots of .META. hits.
> >> >>
> >> >
> >> > Client here is HBase client (in our case reducers)?
> >> >
> >>
> >> Your reducers run for a while?  So clients have a chance to exploit the
> >> cache of metadata info?
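
(To clarify what I mean by the HBase client here, a rough sketch of the
reducer pattern, with made-up table, family, and buffer-size values. The point
is that the HTable is created once per task and reused, so the .META. lookups
should stay cached for the life of the reducer.)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class PutReducer extends MapReduceBase
    implements Reducer<Text, Text, NullWritable, NullWritable> {

  private HTable table;

  @Override
  public void configure(JobConf job) {
    try {
      // one HTable per task; its region location lookups are cached for the task lifetime
      table = new HTable(new HBaseConfiguration(job), "mytable");
      table.setAutoFlush(false);                 // batch puts client-side
      table.setWriteBufferSize(2 * 1024 * 1024); // example buffer size only
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<NullWritable, NullWritable> out, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Put p = new Put(Bytes.toBytes(key.toString()));
      p.add(Bytes.toBytes("info"), Bytes.toBytes("v"),
          Bytes.toBytes(values.next().toString()));
      table.put(p); // buffered until the write buffer fills
    }
  }

  @Override
  public void close() throws IOException {
    table.flushCommits(); // push any buffered puts before the task exits
  }
}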
> >>
> >> >
> >> >> And as Ryan says, whats the caching stats look like for the .META.
> >> >> region?  (See server it was hosted on and check its logs -- we dump
> >> >> cache metrics every minute or so).
> >> >>
> >> >
> >> > What should I look for in the log for these stats? I didn't notice
> >> > stats. I will send a link to the full log.
> >> >
> >>
> >> Looks like this in regionserver log: 2010-04-08 00:43:51,822 DEBUG
> >> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> >> Total=6.6769104MB (7001248), Free=807.12305MB (846329888), Max=813.8MB
> >> (853331136), Counts: Blocks=0, Access=202589, Hit=0, Miss=202589,
> >> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
> >> Evicted/Run=NaN
> >>
> >> You need to enable DEBUG (I'd point you at FAQ for how to do this but
> >> you know how).
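
(For the archives: I believe the way to do this is to add
log4j.logger.org.apache.hadoop.hbase.io.hfile.LruBlockCache=DEBUG to
conf/log4j.properties on the regionservers, or just
log4j.logger.org.apache.hadoop.hbase=DEBUG for the whole package, and then
restart them. Please correct me if that is not the right logger name.)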
> >>
> >> St.Ack
> >>
> >> > Raghu.
> >> >
> >> >
> >> >> St.Ack
> >> >>
> >> >
> >>
> >
>
