[
https://issues.apache.org/jira/browse/THRIFT-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869099#comment-16869099
]
aqingsir commented on THRIFT-4887:
----------------------------------
Glad to see that we are aligned with each other, and many thanks for your quick
and patient reply.
The only minor difference is that should a connection be thrown away if a
request fails. At first it should be made clear that whether such a failure is
a connection failure, or is it just a request failure?
In the case we discussed above, it only fails on requests to a specific method,
which has a mismatch of IDL version between client and server sides, so in my
opinion the connection should be maintained and continue to work cause it might
visit multiple instead of one downstream methods.
To make it clear, let's say that a thrift client might visit 5 downstream
methods, what is supposed to be done if 1 method is broken?
# If this method is an important and mainstream method, errors should be
printed and even business might fail.
# If this method is not a mainstream method, errors should be printed and
nothing else.
And in both cases no crash is expected.
A quick summary is that a connection should be not thrown away if it's only a
request failure rather than a connection failure so that it could be re-used to
the utmost extent.
And at last I respect your opinion. It's really a good idea to hear from others.
> Thrift will OOM at a low concurrency if fields added and old client requests
> new server
> ---------------------------------------------------------------------------------------
>
> Key: THRIFT-4887
> URL: https://issues.apache.org/jira/browse/THRIFT-4887
> Project: Thrift
> Issue Type: Bug
> Environment: Almost all versions from 0.8.0 to the newest
> 0.13.0-snapshot, and verify on 0.9.3/0.11.0.
> Almost all languages, and verified on Go/Java
> Reporter: aqingsir
> Priority: Major
> Attachments: readI32.jpg
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> (Could see more on [https://github.com/aqingsao/thrift-oom])
> //background
> A serious issue occured in our prod env and finally it came out to be the
> changement of some fields in an IDL file, old client still requested new
> server and crashed due to OOM.
> IDL changement could be stated as: Return value of the interface is a list,
> element of which is a struct object and has 5 fields. A new field is added to
> the middle of the struct.
> // to reproduce
> In this case a low concurency of 10 will reproduce this issue, you could find
> a demo project on: [https://github.com/aqingsao/thrift-oom]
> // reason
> Thrift tries to consume all data in inputstream by skipping fields that are
> redundant or have a type mismatch. But it won't consume subsequent fields if
> there's an exception.
> In such a case Thrift does nothing the underlying inputstream, so trouble
> comes to the next request who reuses this connection, as the cursor still
> points to some middle position of the inputstream.
> As Thrift always starts with a readI32() method for any response, which means
> the length of the method's name. Unbelievable the length could be as large as
> 184549632, which is about 176M. This explains why OOM occurs even at a
> concurrency of 10
> // how to fix
> Always clear inputstream in TSocket if there are any redundant data at the
> end of a method call.
> I'll submit a PR soon for Java version.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)