[ https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870555#comment-16870555 ]
aqingsir commented on THRIFT-4887: ---------------------------------- Thanks James, The missing guide is great enough to worth the name. I also agree with the solution of preventing replacement of previously assigned slot numbers, but could it be done on Thrift protocol level instead of a best practice, cause I doubt whether one fifth of Thrift users have ever read or understoodd the missing guide. > Thrift will OOM at a low concurrency if fields added and old client requests > new server > --------------------------------------------------------------------------------------- > > Key: THRIFT-4887 > URL: https://issues.apache.org/jira/browse/THRIFT-4887 > Project: Thrift > Issue Type: Bug > Environment: Almost all versions from 0.8.0 to the newest > 0.13.0-snapshot, and verify on 0.9.3/0.11.0. > Almost all languages, and verified on Go/Java > Reporter: aqingsir > Assignee: James E. King III > Priority: Major > Attachments: readI32.jpg > > Time Spent: 40m > Remaining Estimate: 0h > > (Could see more on [https://github.com/aqingsao/thrift-oom]) > //background > A serious issue occured in our prod env and finally it came out to be the > changement of some fields in an IDL file, old client still requested new > server and crashed due to OOM. > IDL changement could be stated as: Return value of the interface is a list, > element of which is a struct object and has 5 fields. A new field is added to > the middle of the struct. > // to reproduce > In this case a low concurency of 10 will reproduce this issue, you could find > a demo project on: [https://github.com/aqingsao/thrift-oom] > // reason > Thrift tries to consume all data in inputstream by skipping fields that are > redundant or have a type mismatch. But it won't consume subsequent fields if > there's an exception. > In such a case Thrift does nothing the underlying inputstream, so trouble > comes to the next request who reuses this connection, as the cursor still > points to some middle position of the inputstream. > As Thrift always starts with a readI32() method for any response, which means > the length of the method's name. Unbelievable the length could be as large as > 184549632, which is about 176M. This explains why OOM occurs even at a > concurrency of 10 > // how to fix > Always clear inputstream in TSocket if there are any redundant data at the > end of a method call. > I'll submit a PR soon for Java version. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)