[ https://issues.apache.org/jira/browse/THRIFT-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134564#comment-17134564 ]

Duru Can Celasun commented on THRIFT-5233:
------------------------------------------

> I started to look into the rabbit hole of I/O timeouts in the go library 
> (mainly the socket timeout on TSocket), and started to realize that the 
> current way we handle it is not great.

Yeah, most of that code predates context in the stdlib, so there is nothing 
more granular than the socket timeout. The problem you describe is valid, but 
honestly none of the options seem ideal.

Option 1 is definitely off the table. If I understand your proposal correctly, 
option 2 only breaks TProtocol whereas option 3 breaks TTransport as well. 
Custom TProtocol implementations in the wild are rare, but we should 
definitely try to avoid breaking TTransport. So if you'd like to open a PR, 
let's go with option 2.
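
For reference, the break on the interface would look roughly like this (a 
sketch of the proposal, not final signatures; the exact shape would be 
settled in the PR):

    package thrift

    import "context"

    // Sketch: every TProtocol Read* method gains a ctx parameter.
    type TProtocol interface {
        ReadMessageBegin(ctx context.Context) (name string,
            typeId TMessageType, seqId int32, err error)
        ReadStructBegin(ctx context.Context) (name string, err error)
        // ... the remaining Read* (and possibly Write*) methods change
        // the same way; everything else stays as it is today.
    }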

Even then, it's a fairly big BC break. I don't like it, but we broke 
compatibility for context propagation before and this is more of the same.

> I believe some compiler change might also be required to pass it all the way 
> to TProtocol calls.

Yes, I think you'll have to change the compiler to pass the context into the 
various iprot.Read* calls it generates, but that should be a small change.
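
On the generated side I'd expect something along these lines (illustrative 
only; PingArgs is a made-up struct name, and the exact output is up to the 
generator):

    package gen

    import (
        "context"
        "fmt"

        "github.com/apache/thrift/lib/go/thrift"
    )

    // PingArgs is a made-up args struct, standing in for generated code.
    type PingArgs struct{}

    // Sketch of a generated Read method after the change: the ctx from
    // the call site is threaded into every iprot call.
    func (p *PingArgs) Read(ctx context.Context, iprot thrift.TProtocol) error {
        if _, err := iprot.ReadStructBegin(ctx); err != nil {
            return thrift.PrependError(fmt.Sprintf("%T read error: ", p), err)
        }
        // ... the field-reading loop is unchanged, except each
        // iprot.Read* call now takes ctx as its first argument ...
        return iprot.ReadStructEnd(ctx)
    }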

> I/O timeout handling in go library
> ----------------------------------
>
>                 Key: THRIFT-5233
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5233
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Go - Compiler, Go - Library
>    Affects Versions: 0.13.0
>            Reporter: Yuxuan Wang
>            Priority: Major
>
> While debugging the bug in the first implementation of THRIFT-5214, I started 
> to look into the rabbit hole of I/O timeouts in the go library (mainly the 
> socket timeout on TSocket), and started to realize that the current way we 
> handle it is not great.
> From the client's perspective, a request goes like this:
> the client prepares the request TStruct -> sends the request over the wire -> 
> starts reading the response over the wire -> handles the response or I/O error
> The problem here is that the server only sends the first byte of the response 
> after it has finished processing the request. So if the client incorrectly 
> configures the socket timeout on the TSocket used by this request to a value 
> that's too low, the read will just report an i/o timeout when the deadline is 
> reached, and the client will fail the whole request.
> This causes two problems:
> # Clients can't set the socket timeout too low. In fact, if they use a client 
> pool (so for most requests there's no connection overhead), they need the 
> socket timeout to be at least the latency SLA of the server they talk to; 
> otherwise there's a serious risk that the client will fail a lot of requests 
> prematurely.
> # On the other hand, setting the socket timeout too high is also bad 
> practice. If the server is overloaded and cannot handle requests in a timely 
> fashion, the client should have a way to fail faster, instead of waiting for 
> a very long timeout to expire.
> If the client sets a deadline on the context object for the call, the usual 
> expectation is that the request fails once the deadline has passed (a small 
> overhead is usually acceptable). But the socket timeout is usually some fixed 
> value configured into the program, while the actual deadline on the context 
> is more variable (e.g. this could be a server talking to an upstream server: 
> it has a fixed total deadline for the whole endpoint, but the steps before 
> this call can take a variable amount of time, so the deadline left here can 
> vary a lot).
> This leads to the point that we need a way to keep the socket timeout and the 
> deadline set in the context object in sync.
> There are a few different options. The first obvious one is to pass the 
> context object all the way down to TSocket.Read, so that function can extract 
> the deadline from the context object (if any) and use it as the socket 
> timeout instead of the pre-configured, fixed one.
> But that one is easy to rule out, because it would change the function 
> signature of TSocket.Read, so TSocket would no longer implement the io.Reader 
> interface.
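
To make the signature clash concrete, a minimal sketch (ctxSocket is a 
hypothetical type used only for illustration):

    package sketch

    import "context"

    // Hypothetical context-aware socket type.
    type ctxSocket struct{}

    // Adding ctx changes the method so it no longer matches io.Reader's
    // Read(p []byte) (n int, err error).
    func (s *ctxSocket) Read(ctx context.Context, p []byte) (int, error) {
        return 0, nil
    }

    // var _ io.Reader = (*ctxSocket)(nil)
    //   ^ would fail to compile: *ctxSocket does not implement io.Reader
    //     (wrong signature for Read), which is why option 1 is ruled out.
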
> The next option, I think, is to handle it at the TProtocol level. The idea is 
> that we pass the context object all the way into TProtocol.ReadMessageBegin 
> (and maybe the other read functions of TProtocol?). That way, it can handle 
> read errors, check whether the error is a timeout, and if it is and the 
> context deadline hasn't passed, just keep retrying. This way, we can set the 
> fixed socket timeout to a low value when we know we will always have a real 
> deadline attached to the context, and let the TProtocol implementation keep 
> retrying instead. If the user doesn't attach a deadline to the context 
> object, they still need to set a large value for the socket timeout.
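
A minimal sketch of that retry idea (readWithRetry is a hypothetical helper, 
not an existing library function):

    package sketch

    import (
        "context"
        "net"
    )

    // readWithRetry retries a read that failed with a timeout for as long
    // as the context deadline has not passed; all other errors (and
    // timeouts after the deadline) are returned to the caller unchanged.
    func readWithRetry(ctx context.Context, read func() error) error {
        for {
            err := read()
            if err == nil {
                return nil
            }
            if netErr, ok := err.(net.Error); ok && netErr.Timeout() && ctx.Err() == nil {
                continue
            }
            return err
        }
    }

A TProtocol.Read* implementation would wrap its underlying transport reads 
with something like this instead of failing on the first i/o timeout.
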
> A slightly different option is to add a SetReadTimeout function to the 
> TTransport interface. All the TTransport implementations wrapping a TSocket 
> would just pass it along. If the terminal transport is actually not a TSocket 
> (for example, it's a TMemoryBuffer), the function just does nothing. This 
> way, the TProtocol.Read* functions can extract the deadline from the context 
> object and call their TTransport's SetReadTimeout with it.
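
A sketch of that shape (SetReadTimeout is the proposed addition, not an 
existing TTransport method; syncReadTimeout is a hypothetical helper):

    package sketch

    import (
        "context"
        "time"
    )

    // The proposed addition: socket-backed transports apply the timeout,
    // wrappers forward it, and non-socket transports (e.g. TMemoryBuffer)
    // implement it as a no-op.
    type readTimeoutSetter interface {
        SetReadTimeout(timeout time.Duration) error
    }

    // syncReadTimeout shows how a TProtocol.Read* function could sync the
    // socket timeout with the context deadline before each read.
    func syncReadTimeout(ctx context.Context, trans readTimeoutSetter) error {
        if deadline, ok := ctx.Deadline(); ok {
            return trans.SetReadTimeout(time.Until(deadline))
        }
        return nil // no deadline: leave the configured timeout as-is
    }
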
> Either way, this is going to be a breaking change, since we need to add the 
> context object to the TProtocol.Read* function signatures. As a result, I 
> believe some compiler change might also be required to pass it all the way 
> down to the TProtocol calls. But even if compiler changes are needed, that 
> part should be minimal (just making sure we pass the context object along 
> correctly), and most of the changes would be in the TProtocol 
> implementations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
