Nicolas Liochon created HBASE-10566:
---------------------------------------

             Summary: cleanup rpcTimeout in the client
                 Key: HBASE-10566
                 URL: https://issues.apache.org/jira/browse/HBASE-10566
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 0.99.0
            Reporter: Nicolas Liochon
            Assignee: Nicolas Liochon
             Fix For: 0.99.0


There are two issues:
1) Confusion between the socket timeout and the call timeout.
Socket timeouts should be small: a default of around 20 seconds, which could
be lowered to single-digit values for some apps. If we cannot write to the
socket within 10 seconds, we have an issue. This is different from the total
call duration (send query + do query + receive response), which can be longer,
as it can include remote calls on the server and so on. Today we have a single
value, so we cannot have low socket read timeouts.
2) The timeout can differ between calls. Typically, if the total time,
retries included, is 60 seconds and the first attempt failed after 2 seconds,
then the remaining time is 58s. HBase does this today, but by hacking with a
thread-local storage (TLS) variable. It's a hack: it should have been a
parameter of the methods, but the TLS allowed us to bypass all the layers
(maybe protobuf makes this complicated, to be confirmed). It also does not
really work, because we can have multithreading issues: we use the updated rpc
timeout of someone else, or we create a new BlockingRpcChannelImplementation
with a random default timeout.
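
To illustrate both points, here is a minimal sketch of how the remaining call
time could be carried as an explicit value instead of a thread-local, and how
the socket timeout could be capped by it. CallDeadline and its methods are
hypothetical names made up for this example; they are not existing HBase
classes, and the current time is passed in explicitly to keep the sketch
deterministic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an HBase class): time budget of one logical call. */
final class CallDeadline {
    private final long deadlineNanos;

    private CallDeadline(long deadlineNanos) {
        this.deadlineNanos = deadlineNanos;
    }

    /** Starts a budget of totalTimeoutMs, retries included, at nowNanos. */
    static CallDeadline startingAt(long nowNanos, long totalTimeoutMs) {
        return new CallDeadline(nowNanos + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMs));
    }

    /** Time left for the whole call, never negative. */
    long remainingMs(long nowNanos) {
        long left = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - nowNanos);
        return Math.max(0L, left);
    }

    /** The socket timeout stays small, and never exceeds what is left of the call. */
    long socketTimeoutMs(long nowNanos, long configuredSocketTimeoutMs) {
        return Math.min(configuredSocketTimeoutMs, remainingMs(nowNanos));
    }
}
```

With a 60 second budget and a first attempt that failed after 2 seconds,
remainingMs gives 58000, and a 20 second socket timeout is still used as is.
Because the object is passed from layer to layer, a thread cannot pick up the
timeout of another call.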

Ideally, we could send the call timeout to the server as well: it would then
be able to dismiss on its own the calls that it received but that got stuck in
the request queue or in internal retries (on hdfs for example).
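
A rough sketch of what the server side could look like, assuming the client
sends its remaining timeout with each request and the server turns it into a
local deadline when the request arrives. QueuedCall and ExpiringCallQueue are
hypothetical names for this example, not HBase classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical queued request carrying a server-local deadline. */
final class QueuedCall {
    final String name;
    final long deadlineMs;

    QueuedCall(String name, long receivedAtMs, long clientTimeoutMs) {
        this.name = name;
        // Deadline is computed from the server clock, so client clock skew does not matter.
        this.deadlineMs = receivedAtMs + clientTimeoutMs;
    }

    boolean isExpired(long nowMs) {
        return nowMs >= deadlineMs;
    }
}

/** Hypothetical request queue that skips calls the client has already given up on. */
final class ExpiringCallQueue {
    private final Queue<QueuedCall> queue = new ArrayDeque<>();

    void add(QueuedCall call) {
        queue.add(call);
    }

    /** Returns the next call that is still worth executing, or null if none is left. */
    QueuedCall pollLive(long nowMs) {
        QueuedCall call;
        while ((call = queue.poll()) != null) {
            if (!call.isExpired(nowMs)) {
                return call;
            }
            // Expired: drop it silently, nobody is waiting for the answer.
        }
        return null;
    }
}
```

The same deadline check could be repeated before each internal retry, so that
a call stuck on hdfs is abandoned once its deadline has passed.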

This will make the system more reactive to failures.
I think we can solve this now, especially after HBASE-10525. The main issue is
to find something that fits well with protobuf...
Then it should be easy to have a pool of threads for writers and readers,
without a single thread per region server as today.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
