[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908619#comment-13908619 ]

stack commented on HBASE-10566:
-------------------------------

bq.  Nick pointed me to this, it's quite interesting: 
https://code.google.com/p/protobuf-rpc-pro/wiki/RpcTimeout

Yes.  It would be fun to try and drop in a new transport, one that had 
fanciness like that of pb-rpc-pro with bidirectional messaging and cancel, etc.

Or this dead one: https://code.google.com/p/netty-protobuf-rpc/  There are 
others too.

This is a thorny issue N.  Thanks for digging in.

bq.  if we cannot write to the socket in 10 seconds, we have an issue.

Yes.  We've inherited a load of our timeouts from our batch-oriented parent 
and have yet to change them in many cases.
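
To make the distinction concrete, here is a rough sketch (plain java.net, not 
our actual RpcClient; names are illustrative) of keeping the per-socket 
timeout separate from the overall call deadline:

{code:java}
// Sketch only, not HBase code: the socket-level timeout guards a single
// stalled connect/read, while the call deadline bounds the whole
// send + server work + receive round trip.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TwoTimeouts {
  public static Socket open(InetSocketAddress addr, int socketTimeoutMs)
      throws IOException {
    Socket s = new Socket();
    s.connect(addr, socketTimeoutMs);  // fail fast on a stalled connect
    s.setSoTimeout(socketTimeoutMs);   // fail fast on a stalled read
    return s;
  }

  // Tracked per call, independently of the socket timeout above.
  public static long remainingMs(long callDeadlineNanos) throws IOException {
    long leftMs = (callDeadlineNanos - System.nanoTime()) / 1_000_000L;
    if (leftMs <= 0) throw new IOException("call deadline exceeded");
    return leftMs;
  }
}
{code}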

bq. Today we have a single value; it does not allow us to have low socket read 
timeouts.

Yes. Excellent.  This sloppiness has been allowed to prevail down through the 
years (sorry about that).

bq. Maybe protobuf makes this complicated, to be confirmed), and it does not 
really work either, because we can have multithreading issues

This we inherited from the Hadoop RPC.

You think we should just do a new transport altogether?  You are fixing ugly 
legacy.

bq. we could send the call timeout to the server as well: 

Yes.  Server might reject a call, even before it starts working on it, because 
it is already past its timeout (for whatever reason).
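
Something like this sketch is what I have in mind; the timeout shipped in the 
call header is an assumption, not our current wire format:

{code:java}
// Hypothetical sketch: if the client ships its call timeout with the request,
// a handler can drop a call that already expired in the queue before doing
// any work on it.
final class QueuedCall {
  final long receivedNanos = System.nanoTime();
  final long clientTimeoutMs;  // hypothetical header field set by the client

  QueuedCall(long clientTimeoutMs) {
    this.clientTimeoutMs = clientTimeoutMs;
  }

  boolean expired() {
    long waitedMs = (System.nanoTime() - receivedNanos) / 1_000_000L;
    return clientTimeoutMs > 0 && waitedMs >= clientTimeoutMs;
  }
}
// In the handler: if (call.expired()) respond with a timeout error instead
// of running the (already useless) call.
{code}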


bq.           getStub().multi(null, request); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 
pcrc not used

Good.

bq. we would need to instantiate one rpcController per call (we more or less do 
that already)

We do this already -- at least IIRC, this is what the model imposes on us.

We use the rpcController at the moment for carrying our cellblock across the 
pb rpc interface (it doesn't allow for extra args as is).
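
For the record, a bare-bones sketch of one controller per call with a per-call 
timeout riding on it; only the timeout accessor is invented, the rest is the 
standard pb RpcController interface:

{code:java}
// Sketch: one controller instance per call. The timeout field is an
// assumption for illustration; the overridden methods are the standard
// com.google.protobuf.RpcController interface.
import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;

class PerCallController implements RpcController {
  private final int callTimeoutMs;  // hypothetical per-call timeout
  private boolean failed;
  private String errorText;

  PerCallController(int callTimeoutMs) { this.callTimeoutMs = callTimeoutMs; }
  int getCallTimeout() { return callTimeoutMs; }

  @Override public void reset() { failed = false; errorText = null; }
  @Override public boolean failed() { return failed; }
  @Override public String errorText() { return errorText; }
  @Override public void startCancel() { }
  @Override public void setFailed(String reason) { failed = true; errorText = reason; }
  @Override public boolean isCanceled() { return false; }
  @Override public void notifyOnCancel(RpcCallback<Object> callback) { }
}

// Per call, instead of getStub().multi(null, request):
//   PerCallController pcrc = new PerCallController(remainingMs);
//   getStub().multi(pcrc, request);
{code}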

Let me look at the patch.

> cleanup rpcTimeout in the client
> --------------------------------
>
>                 Key: HBASE-10566
>                 URL: https://issues.apache.org/jira/browse/HBASE-10566
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.99.0
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>             Fix For: 0.99.0
>
>         Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single-digit timeouts for some apps: if we cannot write to the 
> socket in 10 seconds, we have an issue. This is different from the total 
> duration (send query + do query + receive query), which can be longer, as it 
> can include remote calls on the server and so on. Today we have a single 
> value; it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included, is 60 seconds and the first attempt failed after 2 
> seconds, then the remaining budget is 58s. HBase does this today, but by 
> hacking with a thread-local storage variable. It's a hack (it should have 
> been a parameter of the methods; the TLS allowed bypassing all the layers. 
> Maybe protobuf makes this complicated, to be confirmed), and it does not 
> really work either, because we can have multithreading issues (we use the 
> updated rpc timeout of someone else, or we create a new 
> BlockingRpcChannelImplementation with a random default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss on its own the calls it received but that got stuck in the 
> request queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> find something that fits well with protobuf...
> Then it should be easy to have a pool of threads for writers and readers, 
> w/o a single thread per region server as today.
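
For illustration, the retry-budget arithmetic from point 2 of the description 
as a tiny sketch (the class name is illustrative, not HBase code):

{code:java}
// Illustration of point 2: each retry gets what is left of the overall
// operation timeout (60s total, first attempt fails after 2s => ~58s left),
// passed explicitly instead of through a thread-local.
public final class CallBudget {
  private final long deadlineNanos;

  public CallBudget(long totalTimeoutMs) {
    this.deadlineNanos = System.nanoTime() + totalTimeoutMs * 1_000_000L;
  }

  /** Remaining milliseconds to hand to the next attempt. */
  public long remainingMs() {
    return Math.max(0L, (deadlineNanos - System.nanoTime()) / 1_000_000L);
  }
}
{code}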



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
