[ https://issues.apache.org/jira/browse/HBASE-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092592#comment-14092592 ]
Nicolas Liochon commented on HBASE-11714:
-----------------------------------------

Makes sense, I've updated the description in HBASE-11374 with the error message.

> RpcRetryingCaller#callWithoutRetries set rpc timeout to 2 seconds incorrectly
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-11714
>                 URL: https://issues.apache.org/jira/browse/HBASE-11714
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.3
>            Reporter: Qiang Tian
>            Assignee: Qiang Tian
>             Fix For: 0.98.4
>
>         Attachments: hbase-11714-0.98.patch
>
>
> Discussed on the user@hbase mailing list
> (http://markmail.org/thread/w3cqjxwo2smkn2jw)
> {quote}
> "Recently switched from 0.94 and 0.98, and finding that periodically things
> are having issues - lots of retry exceptions" :
> {quote}
> client log:
> {quote}
> 2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
> table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
> last exception: java.net.SocketTimeoutException: Call to
> ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
> because java.net.SocketTimeoutException: 2000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
> remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
> ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
> started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
> ops.
> {quote}
> analysis:
> There are 2 methods in RpcRetryingCaller: callWithRetries and
> callWithoutRetries.
> It looks like the timeout setup in callWithRetries is correct, while the one
> in callWithoutRetries is wrong (this is the path used by the multi RPC for
> this user): the caller cannot specify a valid timeout, but
> callWithoutRetries still calls beforeCall, which looks like a method meant
> only for callWithRetries, to set the timeout.
> Since RpcRetryingCaller#callTimeout is not set, the thread-local timeout is
> set to 2s (MIN_RPC_TIMEOUT) via RpcClient.setRpcTimeout, and that value
> becomes the final pinginterval set on the socket.
> Under a heavy write workload, when the rpc cannot complete in 2s, the client
> closes the connection, so the server-side connection is reset, which finally
> exposes the problem in HBASE-11705.
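
For readers following the analysis, here is a minimal sketch of the arithmetic it describes. It is a simplified model, not the HBase source: the class RpcTimeoutSketch and the helper remainingTimeout are hypothetical names, and the only values taken from the issue are the 2s floor (MIN_RPC_TIMEOUT) and the fact that callTimeout is left unset (modeled here as 0) on the callWithoutRetries path.

```java
// Simplified model of the timeout computation described in the analysis.
// Not HBase source; class and method names are hypothetical.
class RpcTimeoutSketch {
    // Floor applied to the per-call timeout (2 s, per the issue text).
    static final int MIN_RPC_TIMEOUT = 2000;

    // Hypothetical helper: the time budget left for the call, clamped to the floor.
    // callTimeoutMs is the caller-supplied budget; 0 models "never set".
    static int remainingTimeout(int callTimeoutMs, long elapsedMs) {
        int remaining = (int) (callTimeoutMs - elapsedMs);
        return Math.max(remaining, MIN_RPC_TIMEOUT);
    }
}
```

Under this model, an unset callTimeout always collapses to the 2000 ms floor regardless of elapsed time, which matches the "2000 millis timeout" in the client log, while a caller that does supply a budget (for example 60000 ms) keeps most of it.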