I don't see the RetriesExhaustedException at the client. But the pattern
that I see is that each client thread makes a certain number of requests
(none of those requests actually succeed), before one of its requests gets
blocked and hangs. That number is the same every time. There is nothing
particularly useful in the client log, and I also see nothing in the region
server log. I tried turning up logging on the regionservers from INFO to
DEBUG, but on restart I see no extra debug messages. To make the change, I
just changed hbase.root.logger in hbase/conf/log4j.properties, but it had no
effect. Am I missing something?
This problem happens even after an HBase restart.
I've tried different numbers of client threads. For small numbers (e.g. 1-5)
I get stuck. I tried some very high numbers like 200, and that always makes
progress, although it slows way down. So it seems that many client threads
are getting stuck.
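A minimal sketch of one way to see where the stuck client threads are blocked, assuming the client is a Java program that can be modified. The class name ClientThreadDump and the method dumpAllThreads are made up for illustration; only Thread.getAllStackTraces() is a standard JDK call. Running jstack or kill -QUIT against the client JVM should give the same information without code changes.

```java
import java.util.Map;

// Hypothetical helper, not part of HBase: prints every live thread in this JVM
// with its state and stack, similar to what a jstack dump shows.
class ClientThreadDump {

    /** Builds a plain-text dump of all live threads in the current JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a real client this would be called once the request threads stop making progress.
        System.out.println(dumpAllThreads());
    }
}
```

The frames of a hung request thread would show whether it is waiting on a socket read, sleeping between retries, or blocked on a lock.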
I haven't been able to figure out which region servers have the problem. I
only have 3, though, so I did a thread dump on each. I looked at the
handlers and they all look like this:
"IPC Server handler 94 on 60020" daemon prio=10 tid=0x08550c00 nid=0x6014 waiting on condition [0x6b1cd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x78382718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:901)
Let me know if there are other threads I should pay attention to.
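For what it's worth, a thread parked in LinkedBlockingQueue.take() is normally an idle worker waiting for something to arrive on its queue, not one stuck in the middle of a request. A minimal sketch to illustrate that, using only JDK classes; the names IdleHandlerDemo, callQueue and stateOfIdleHandler are hypothetical, and this is not HBase's code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only: a worker thread that blocks in take() on an empty queue,
// which is the same shape as the handler stack above.
class IdleHandlerDemo {

    /** Starts a worker on an empty queue and returns the state it settles into. */
    static Thread.State stateOfIdleHandler() {
        final BlockingQueue<Runnable> callQueue = new LinkedBlockingQueue<>();
        Thread handler = new Thread(() -> {
            try {
                while (true) {
                    callQueue.take().run(); // blocks here while the queue is empty
                }
            } catch (InterruptedException e) {
                // interrupted: let the worker exit
            }
        }, "demo handler");
        handler.setDaemon(true);
        handler.start();
        try {
            // Poll for up to 5 seconds until the worker has parked inside take().
            long deadline = System.currentTimeMillis() + 5000;
            while (handler.getState() != Thread.State.WAITING
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            return handler.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            handler.interrupt(); // stop the worker once its state has been read
        }
    }

    public static void main(String[] args) {
        System.out.println("idle handler state: " + stateOfIdleHandler());
    }
}
```

If every handler on all three servers looks like this, the dumps would suggest the requests are not reaching the handlers at all, rather than the handlers all being occupied.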
Finally, as for version, I have the official 0.20.2 release, which I
downloaded perhaps 2 months ago.
Thanks,
Adam
On 1/14/10 10:51 PM, "stack" <[email protected]> wrote:
> On Wed, Jan 13, 2010 at 10:48 PM, Adam Silberstein
> <[email protected]> wrote:
>
>> ....
>> And then, 5 seconds into it, I start seeing these:
>>
>> 10/01/13 22:33:27 DEBUG client.HConnectionManager$TableServers: Reloading
>> region usertable,user263042644,1263431524192 location because regionserver
>> didn't accept updates; tries=0 of max=10, waiting=2000ms
>>
>> But the region servers are all up, and none logged any messages during the
>> client run.
>>
>
> This sounds like a lockup, Adam, or a close out, like the client can't get
> into the server because all its handlers are occupied. Does it get
> to RetriesExhaustedException? If so, what's the log message? It would be
> interesting to see if any of the edits in the batch are making it on to the
> server. Can you figure out which server it's going to? Thread dump it? Is your
> hbase up to date? There was a deadlock in hbase there for about a day
> about a week or so back. Maybe you picked it up?
>
> St.Ack