Ted,
Your attachments didn't come through. Try putting them up on the web or pastebin somewhere.

What's happening in the RegionServer logs between the time that the get works and the get doesn't work?

Also, I recommend upgrading to 0.20.3, there are critical fixes.

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Saturday, March 13, 2010 3:48 AM
To: hbase-user@hadoop.apache.org
Subject: Re: slow response in hbase shell

We use hbase 0.20.1.
There are 3 region servers, with 13 regions on each server.

I don't see any exception in master log.

I was able to run 10 successful get commands before hitting the following.
I am attaching (partial) master log and region server log from 10.10.31.137.

hbase(main):014:0> get 'ruletable', 'com.about.acne'
COLUMN                CELL
 lpm_1.0:category     timestamp=1268347483823, value=http://acne.about.com\t9002:0.86580086\thost\n
1 row(s) in 0.0120 seconds
hbase(main):015:0> get 'ruletable', 'com.about.acne'
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.10.31.137:60020 for region ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> wrote:

Seems like something weird is going on with your regionservers and balancing.
Can you post big snippets from the regionserver and master logs? Have you checked to see what's going on? Is there repetitive balancing going on that never seems to reach steady-state?

How many regions and how many nodes on which version of HBase?

> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Friday, March 12, 2010 8:24 PM
> To: hbase-user@hadoop.apache.org
> Subject: slow response in hbase shell
>
> Hi,
>
> We sometimes saw over 5 second delay running get in hbase 0.20.1 shell:
>
> hbase(main):002:0> get 'ruletable', 'ca.tsn.www'
> 0 row(s) in 10.1330 seconds
>
> From our 3 region servers there are a lot of such messages:
>
> 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020] regionserver.HRegionServer(493): MSG_REGION_CLOSE: crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgenre\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_container,1266944566406: Overloaded
> 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020] regionserver.HRegionServer(493): MSG_REGION_CLOSE: domaincrawltable,,1268098564908: Overloaded
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>   40428  343638 11523230
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>   40430  343655 11307703
>
> grep Overloaded hbase-hbaseadmin-regionserver.log | wc
>   40466  343961 11340379
>
> Was the slow response due to the load balancing?
>
> The strange thing was that after several quick responses I would see:
>
> hbase(main):004:0> get 'ruletable', 'com.about.acne'
> COLUMN                CELL
>  lpm_1.0:category     timestamp=1268347483823, value=http://acne.about.com\t9002:0.86580086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):005:0> get 'ruletable', 'com.about.acne'
> COLUMN                CELL
>  lpm_1.0:category     timestamp=1268347483823, value=http://acne.about.com\t9002:0.86580086\thost\n
> 1 row(s) in 0.0040 seconds
> hbase(main):006:0> get 'ruletable', 'com.about.acne'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.10.31.136:60020 for region ruletable,,1268083966723, row 'com.about.acne', but failed after 5 attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
>         at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
> But 006:60010/master.jsp refreshes quickly and shows all three regionservers.
> Don't know why hbase shell encountered NSRE.
>
> Thanks
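
A raw `grep Overloaded ... | wc` total does not show whether the region closes are still happening or are settling down. One possible way to check for balancing that never reaches steady state is to bucket the messages by minute and by table. The Python sketch below is only an illustration: the function name `summarize_overloaded` and the regular expression are assumptions, modeled on the two MSG_REGION_CLOSE lines quoted in this thread, and other HBase versions may format these lines differently.

```python
import re
from collections import Counter

# Assumed line shape, taken from the two sample log lines in this thread:
#   <date> <time>,<ms> INFO [...] ...: MSG_REGION_CLOSE: <region name>: Overloaded
CLOSE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} INFO .*"
    r"MSG_REGION_CLOSE: (?P<region>.+): Overloaded\s*$"
)


def summarize_overloaded(lines):
    """Count 'Overloaded' region closes per minute and per table.

    `lines` is any iterable of log lines, for example an open file.
    """
    per_minute = Counter()
    per_table = Counter()
    for line in lines:
        match = CLOSE_RE.match(line)
        if match is None:
            continue  # not an Overloaded close message
        per_minute[match.group("minute")] += 1
        # A region name looks like "table,startkey,regionid"; the table
        # name is everything before the first comma.
        table = match.group("region").split(",", 1)[0]
        per_table[table] += 1
    return per_minute, per_table
```

To use it, pass an open log file, e.g. `with open("hbase-hbaseadmin-regionserver.log", errors="replace") as f: per_minute, per_table = summarize_overloaded(f)`, and print the per-minute counts in sorted order. If the counts stay roughly constant minute after minute, that would be consistent with regions being closed and reassigned continuously, though the master log is still needed to confirm what is driving it.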