aqingsir created THRIFT-4887:
--------------------------------

             Summary: Thrift will OOM at a low concurrency if fields added and 
old client requests new server
                 Key: THRIFT-4887
                 URL: https://issues.apache.org/jira/browse/THRIFT-4887
             Project: Thrift
          Issue Type: Bug
         Environment: Almost all versions from 0.8.0 to the newest 
0.13.0-snapshot, and verify on 0.9.3/0.11.0.

Almost all languages, and verified on Go/Java
            Reporter: aqingsir
         Attachments: readI32.jpg

(Could see  more on [https://github.com/aqingsao/thrift-oom])

//background

A serious issue occured in our prod env and finally it came out to be the 
changement of some fields in an IDL file, old client still requested new server 
and crashed due to OOM.

IDL changement could be stated as: Return value of the interface is a list, 
element of which is a struct object and has 5 fields. A new field is added to 
the middle of the struct.

// to reproduce

In this case a low concurency of 10 will reproduce this issue, you could find a 
demo project on: [https://github.com/aqingsao/thrift-oom]

// reason

Thrift tries to consume all data in inputstream by skipping fields that are 
redundant or have a type mismatch. But it won't consume subsequent fields if 
there's an exception.

In such a case Thrift does nothing the underlying inputstream, so trouble comes 
to the next request who reuses this connection, as the cursor still points to 
some middle position of the inputstream.

As Thrift always starts with a readI32() method for any response, which means 
the length of the method's name. Unbelievable the length could be as large as 
184549632, which is about 176M. This explains why OOM occurs even at a 
concurrency of 10

// how to fix

Always clear inputstream in TSocket if there are any redundant data at the end 
of a method call.

I'll submit a PR soon for Java version.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to