Michael Scott Leuthaeuser created THRIFT-3805:
-------------------------------------------------
Summary: Golang server susceptible to memory spike from malformed
message
Key: THRIFT-3805
URL: https://issues.apache.org/jira/browse/THRIFT-3805
Project: Thrift
Issue Type: Bug
Components: Go - Library
Affects Versions: 0.9.3
Environment: OSX 10.10.5, Debian 7.10
Reporter: Michael Scott Leuthaeuser
Priority: Critical
When a thrift server is started through the Go library, using the default
TServerSocket, the processor from the generated files, and a SimpleServer2
(which uses BinaryProtocol), it listens on the created socket for incoming
messages. However, if the incoming message is *not* from another Thrift
server, and thus the data submitted over the connection does not conform to the
Thrift message serialization format (for us, this was triggered by automated
security scanners connecting to the exposed port and sending it data to see how
it would react), it triggers a massive memory spike.
This appears to happen because the BinaryProtocol expects the first 4 bytes of
the file to indicate the size of the following message, interpreted as a signed
Int32 in BigEndian format. If this value is positive, it reads that many bytes
from the connection.
However, to do this, it first *allocates* a slice of that many bytes. If the
provided message is not a thrift serialization, this signed Int32 can be
massive (we're typically seeing it have a value of 1-2 billion), causing it to
allocate *hundreds* of megabytes of data. This slice is retained until the
expected number of bytes are received (which generally doesn't happen for a
malformed connection), or until the connection is closed from the client end.
When either happen, the BinaryProtocol then type-casts the entire array to a
String type, which causes a *second* allocation of equal size, before returning
the message to be processed.
Since this processing fails, both the byte slice and the string are discarded
and reclaimed on the next garbage collection cycle, but in the mean type, it
will have allocated an amount of memory equal to double the value of that
Int32, in bytes. We're typically seeing this allocation exceed 800-1000 MB per
variable (so 800 MB when the connection is received, and another 800-1000 MB
when it's closed). The first allocation will persist as long as the connection
is maintained, and other connections that are similarly malformed will cause
concurrent allocation spikes, which can easily crash the server.
To demonstrate this, one can make an extremely simple single-method thrift
contract, start a server, then netcat to the host port and feed it at least 4
characters of data.
To fix this, we implemented an iterative buffered read into the BinaryProtocol
itself. Instead of allocating the entire array at once, if the message
indicates it is larger than a certain size (we have it set at 32kb, but the
actual value is largely arbitrary), it instead reads the message in increments
of that size, and joins them using the standard library bytes.Buffer type.
Here's a diff of our changes:
https://gist.github.com/kaedys/53b8fd05690d5d8f202afa7878d6e3d5
This fixes the issue of having a massive initial allocation due to a malformed
request, while still allowing messages of arbitrary incoming size. It will
likely marginally slow down the reading of extremely large messages (ones
substantially larger than the constant defined), but the bytes.Buffer type is
well optimized for growing itself as it is written to. I also realize that
this only occurs with malformed requests to the server, but I would think a
server package should be relatively hardened against malformed data being
submitted to it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)