[
https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869045#comment-16869045
]
Jens Geyer commented on THRIFT-4887:
------------------------------------
{quote}
my fix is in client side
{quote}
I somehow assumed we are talking about the server end. So that's good news.
{quote}
It's fine to reinitialize protocol/transport stack from scratch, but might not
close sockets as subsequent requests(maybe another thrift method that works
fine) could re-use them. Opening a lot sockets consume a lot resources and
could easily lead to Socket Exceptions
{quote}
That has been the policy with Thrift connections: if something really bad
happens, throw away the connection. If it's a client side problem then fine:
Just write a better client. If the client fails due to such a problem, it will
still fail on the second and third try as well. Nobody cares about sockets when
the whole app keeps crashing. I mean, we still talk about a scenario that only
happens because someone manipulated the IDL in an incompatible manner and did
not mange to get all client and server versions straight - which BTW would be
a big-time illusion to get such a constellation managed in a highly distributed
solution. There is a chance as long as you control all machines involved, but
as soon as you don't have access to just one old client you will get ugly phone
calls.
I don't say that I don't see the value of the addition. But I'm also not 100%
convinced. However, I don't want to get in the way and would really love to
hear other opinions as well.
> Thrift will OOM at a low concurrency if fields added and old client requests
> new server
> ---------------------------------------------------------------------------------------
>
> Key: THRIFT-4887
> URL: https://issues.apache.org/jira/browse/THRIFT-4887
> Project: Thrift
> Issue Type: Bug
> Environment: Almost all versions from 0.8.0 to the newest
> 0.13.0-snapshot, and verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
> Reporter: aqingsir
> Priority: Major
> Attachments: readI32.jpg
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> (Could see more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the
> changement of some fields in an IDL file, old client still requested new
> server and crashed due to OOM.
> IDL changement could be stated as: Return value of the interface is a list,
> element of which is a struct object and has 5 fields. A new field is added to
> the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find
> a demo project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are
> redundant or have a type mismatch. But it won't consume subsequent fields if
> there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble
> comes to the next request who reuses this connection, as the cursor still
> points to some middle position of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means
> the length of the method's name. Unbelievable the length could be as large as
> 184549632, which is about 176M. This explains why OOM occurs even at a
> concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the
> end of a method call.
> I'll submit a PR soon for Java version.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)