[ 
https://issues.apache.org/jira/browse/HBASE-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11714:
-------------------------------

    Description: 
Discussed on the user@hbase mailing list 
(http://markmail.org/thread/w3cqjxwo2smkn2jw)

"Recently switched from 0.94 and 0.98, and finding that periodically things
are having issues - lots of retry exceptions" :

2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
last exception: java.net.SocketTimeoutException: Call to
ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
because java.net.SocketTimeoutException: 2000 millis timeout while waiting
for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
ops.

there are 2 methods in RpcRetryingCaller: callWithRetries and 
callWithoutRetries.
it looks the timeout setup of callWithRetries is good, while callWithoutRetries 
is wrong(multi RPC for this user): caller cannot specify a valid timeout, but 
callWithoutRetries still calls beforeCall, which looks a method for 
callWithRetries only,  to set timeout. since RpcRetryingCaller#callTimeout  is 
not set, thread local timeout is set to 2s(MIN_RPC_TIMEOUT) via 
RpcClient.setRpcTimeout, which is the final pinginterval set to the socket.

when there are heavy write workload and the rpc cannot complete in 2s, the 
client close the connection, so the server side connection is reset and finally 
cause problem in HBASE-11705

> RpcRetryingCaller#callWithoutRetries set rpc timeout to 2 seconds incorrectly
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-11714
>                 URL: https://issues.apache.org/jira/browse/HBASE-11714
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Qiang Tian
>
> Discussed on the user@hbase mailing list 
> (http://markmail.org/thread/w3cqjxwo2smkn2jw)
> "Recently switched from 0.94 and 0.98, and finding that periodically things
> are having issues - lots of retry exceptions" :
> 2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
> table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
> last exception: java.net.SocketTimeoutException: Call to
> ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
> because java.net.SocketTimeoutException: 2000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
> remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
> ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
> started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
> ops.
> there are 2 methods in RpcRetryingCaller: callWithRetries and 
> callWithoutRetries.
> it looks the timeout setup of callWithRetries is good, while 
> callWithoutRetries is wrong(multi RPC for this user): caller cannot specify a 
> valid timeout, but callWithoutRetries still calls beforeCall, which looks a 
> method for callWithRetries only,  to set timeout. since 
> RpcRetryingCaller#callTimeout  is not set, thread local timeout is set to 
> 2s(MIN_RPC_TIMEOUT) via RpcClient.setRpcTimeout, which is the final 
> pinginterval set to the socket.
> when there are heavy write workload and the rpc cannot complete in 2s, the 
> client close the connection, so the server side connection is reset and 
> finally cause problem in HBASE-11705



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to