[ 
https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869045#comment-16869045
 ] 

Jens Geyer commented on THRIFT-4887:
------------------------------------

{quote}
my fix is in client side
{quote}

I somehow assumed we are talking about the server end. So that's good news.

{quote}
It's fine to reinitialize protocol/transport stack from scratch, but might not 
close sockets as subsequent requests(maybe another thrift method that works 
fine) could re-use them. Opening a lot sockets consume a lot resources and 
could easily lead to Socket Exceptions
{quote}

That has been the policy with Thrift connections: if something really bad 
happens, throw away the connection. If it's a client side problem then fine: 
Just write a better client. If the client fails due to such a problem, it will 
still fail on the second and third try as well. Nobody cares about sockets when 
the whole app keeps crashing. I mean, we still talk about a scenario that only 
happens because someone manipulated the IDL in an incompatible manner and did 
not mange to get all client and server versions straight  - which BTW would be 
a big-time illusion to get such a constellation managed in a highly distributed 
solution. There is a chance as long as you control all machines involved, but 
as soon as you don't have access to just one old client you will get ugly phone 
calls.

I don't say that I don't see the value of the addition. But I'm also not 100% 
convinced. However, I don't want to get in the way and would really love to 
hear other opinions as well. 





> Thrift will OOM at a low concurrency if fields added and old client requests 
> new server
> ---------------------------------------------------------------------------------------
>
>                 Key: THRIFT-4887
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4887
>             Project: Thrift
>          Issue Type: Bug
>         Environment: Almost all versions from 0.8.0 to the newest 
> 0.13.0-snapshot, and verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
>            Reporter: aqingsir
>            Priority: Major
>         Attachments: readI32.jpg
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> (Could see  more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the 
> changement of some fields in an IDL file, old client still requested new 
> server and crashed due to OOM.
> IDL changement could be stated as: Return value of the interface is a list, 
> element of which is a struct object and has 5 fields. A new field is added to 
> the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find 
> a demo project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are 
> redundant or have a type mismatch. But it won't consume subsequent fields if 
> there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble 
> comes to the next request who reuses this connection, as the cursor still 
> points to some middle position of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means 
> the length of the method's name. Unbelievable the length could be as large as 
> 184549632, which is about 176M. This explains why OOM occurs even at a 
> concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the 
> end of a method call.
> I'll submit a PR soon for Java version.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to