[ https://issues.apache.org/jira/browse/HBASE-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qiang Tian updated HBASE-11714:
-------------------------------

    Component/s: IPC/RPC

> RpcRetryingCaller#callWithoutRetries set rpc timeout to 2 seconds incorrectly
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-11714
>                 URL: https://issues.apache.org/jira/browse/HBASE-11714
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.3
>            Reporter: Qiang Tian
>            Assignee: Qiang Tian
>
> Discussed on the user@hbase mailing list 
> (http://markmail.org/thread/w3cqjxwo2smkn2jw)
> "Recently switched from 0.94 and 0.98, and finding that periodically things
> are having issues - lots of retry exceptions" :
> 2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
> table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
> last exception: java.net.SocketTimeoutException: Call to
> ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
> because java.net.SocketTimeoutException: 2000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
> remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
> ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
> started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
> ops.
> There are two methods in RpcRetryingCaller: callWithRetries and
> callWithoutRetries.
> The timeout setup in callWithRetries looks correct, but callWithoutRetries
> (used for the multi RPC in this user's case) is wrong. The caller cannot
> specify a valid timeout, yet callWithoutRetries still calls beforeCall,
> which appears to be intended only for callWithRetries, to set the timeout.
> Because RpcRetryingCaller#callTimeout is not set, the thread-local timeout
> is set to 2 s (MIN_RPC_TIMEOUT) via RpcClient.setRpcTimeout, and that value
> ends up as the ping interval set on the socket.
> Under a heavy write workload, when an RPC cannot complete within 2 s, the
> client closes the connection. The server-side connection is then reset,
> which eventually causes the problem reported in HBASE-11705.
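
A minimal Java sketch of the timeout arithmetic described above, assuming the clamping works as the report states: the remaining budget is callTimeout minus the elapsed time, floored at MIN_RPC_TIMEOUT. The class RpcTimeoutSketch, its remainingTimeout and beforeCall methods, and the RPC_TIMEOUT thread local are hypothetical stand-ins, not HBase source; only callTimeout, MIN_RPC_TIMEOUT and the 2000 ms value come from the report.

```java
/**
 * Hypothetical model of the timeout computation described in HBASE-11714.
 * This is NOT the HBase implementation; names are illustrative only.
 */
public class RpcTimeoutSketch {
    // Lower bound on the per-call RPC timeout (2 s, per the report).
    static final int MIN_RPC_TIMEOUT = 2000;

    // Hypothetical stand-in for the thread-local timeout set by RpcClient.setRpcTimeout.
    static final ThreadLocal<Integer> RPC_TIMEOUT = ThreadLocal.withInitial(() -> 0);

    /**
     * Remaining time budget for this call, clamped to MIN_RPC_TIMEOUT.
     * callTimeout == 0 models "never set by the caller".
     */
    static int remainingTimeout(int callTimeout, long elapsedMillis) {
        int remaining = (int) (callTimeout - elapsedMillis);
        // An unset (0) or exhausted budget is <= 0, so the 2 s floor wins.
        return Math.max(remaining, MIN_RPC_TIMEOUT);
    }

    /** Models beforeCall: stores the computed timeout in the thread local. */
    static void beforeCall(int callTimeout, long startMillis, long nowMillis) {
        RPC_TIMEOUT.set(remainingTimeout(callTimeout, nowMillis - startMillis));
    }

    public static void main(String[] args) {
        // callWithRetries-style call: 60 s budget, 1 s already spent.
        beforeCall(60_000, 0L, 1_000L);
        System.out.println(RPC_TIMEOUT.get()); // 59000

        // callWithoutRetries-style call: callTimeout was never set (0).
        beforeCall(0, 0L, 5L);
        System.out.println(RPC_TIMEOUT.get()); // 2000
    }
}
```

Under this model, any call that reaches beforeCall without a caller-supplied callTimeout gets the 2 s floor regardless of how long the server actually needs, which is consistent with the "2000 millis timeout" SocketTimeoutException in the log quoted above.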



--
This message was sent by Atlassian JIRA
(v6.2#6252)
