[ 
https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864485#comment-16864485
 ] 

Jens Geyer commented on THRIFT-4887:
------------------------------------

I just did a quick check using netstd.

{quote}
Why there's an OOM issue

As we know Thrift tries to consume all data in inputstream by skipping fields 
that are redundant or have a type mismatch. At the same time Thrift validates 
every struct object and throws an exception if it's invalid. It conflicts 
because Thrift won't consume subsequent fields if there's an exception.

The current Thrift RPC request fails on such an exception, just as expected, 
but nothing is done to the underlying inputstream, which means there still 
exists some redundant data, and cursor points to some middle position of the 
inputstream.
{quote}

That's not bow it should work. The idea is that the validation throws and 
aborts the read, that's correct so far. But as a co nsequence, the Code should 
end up in {{Server.Execute()}} where any exception is caught and the whole 
protocol/Transport stack is reinitialized from scratch. If that is not the case 
with Java anymore, that should be corrected.



> Thrift will OOM at a low concurrency if fields added and old client requests 
> new server
> ---------------------------------------------------------------------------------------
>
>                 Key: THRIFT-4887
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4887
>             Project: Thrift
>          Issue Type: Bug
>         Environment: Almost all versions from 0.8.0 to the newest 
> 0.13.0-snapshot, and verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
>            Reporter: aqingsir
>            Priority: Major
>         Attachments: readI32.jpg
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> (Could see  more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the 
> changement of some fields in an IDL file, old client still requested new 
> server and crashed due to OOM.
> IDL changement could be stated as: Return value of the interface is a list, 
> element of which is a struct object and has 5 fields. A new field is added to 
> the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find 
> a demo project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are 
> redundant or have a type mismatch. But it won't consume subsequent fields if 
> there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble 
> comes to the next request who reuses this connection, as the cursor still 
> points to some middle position of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means 
> the length of the method's name. Unbelievable the length could be as large as 
> 184549632, which is about 176M. This explains why OOM occurs even at a 
> concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the 
> end of a method call.
> I'll submit a PR soon for Java version.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to