[ 
https://issues.apache.org/jira/browse/HADOOP-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HADOOP-10940:
---------------------------------

    Attachment: HADOOP-10940.patch

Invalid rpc responses cause the client to go OOM.  This is killing oozie 
servers when users try to use a 2.x client against a 0.23 cluster.  The same 
applies for 2.x to 1.x.

Added an {{IpcStreams}} object to manage the rpc encoding/decoding.  Response 
size must be > 0 and < the data length used by the rpc server.  Request 
decoding is simpler and more efficient.
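
To illustrate the bounds check described above, here is a minimal sketch.  It 
is not the code in the patch: the class name {{ResponseLengthCheck}}, the 
method {{readResponse}} and the {{maxLength}} parameter are hypothetical, and 
the limit is assumed to be passed in by the caller.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper, not the IpcStreams class from the patch.
final class ResponseLengthCheck {

    /**
     * Reads a length-prefixed response and rejects lengths outside (0, maxLength)
     * before allocating, so a bogus length cannot trigger a huge allocation.
     * maxLength is an assumed, caller-supplied limit.
     */
    static byte[] readResponse(DataInputStream in, int maxLength) throws IOException {
        int length = in.readInt();
        if (length <= 0 || length >= maxLength) {
            throw new IOException("RPC response has invalid length " + length);
        }
        byte[] buf = new byte[length];
        in.readFully(buf);
        return buf;
    }
}
```

The point of the sketch is only the ordering: validate the length first, 
allocate the buffer second.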

If the first response has length -1, then it's assumed to be a pre-rpcv9 error 
response.  Pre-rpcv9 responses began with the callId, not a length, and the 
callId for an error was -1.
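
A rough sketch of that detection, under assumptions: the names 
{{LegacyResponseCheck}}, {{LegacyRpcException}} and {{readFirstLength}} are 
made up for illustration and are not taken from the patch.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical illustration; names are not from the patch.
final class LegacyResponseCheck {

    /** Pre-rpcv9 error responses began with callId -1 instead of a length. */
    static final int LEGACY_ERROR_CALL_ID = -1;

    /** Hypothetical exception type used to tell the legacy case apart. */
    static final class LegacyRpcException extends IOException {
        LegacyRpcException(String msg) {
            super(msg);
        }
    }

    /** Reads the first int of the first response and classifies it. */
    static int readFirstLength(DataInputStream in, int maxLength) throws IOException {
        int length = in.readInt();
        if (length == LEGACY_ERROR_CALL_ID) {
            // What looks like a length of -1 is assumed to be a legacy error callId.
            throw new LegacyRpcException("server appears to speak a pre-v9 rpc protocol");
        }
        if (length <= 0 || length >= maxLength) {
            throw new IOException("RPC response has invalid length " + length);
        }
        return length;
    }
}
```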

This patch also fixes flushing issues, namely multiple sends before reading a 
response.  This occurs in two cases:
# insecure: connection header+context+call
# secure: connection header+sasl negotiate

Without the fix to control flushing, unit tests that verify an invalid rpc 
version always failed with a broken pipe.  When the server reads the 
connection header for an incompatible client, it sends an error response and 
immediately closes the socket.  The client may still be in the process of 
sending the multiple messages listed above, which causes a broken pipe.
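
One way to avoid multiple sends is to collect the messages and hand them to 
the socket in a single write.  The sketch below shows that idea only; it is an 
assumption about the general approach, not the patch's implementation, and 
{{SingleFlushSketch}}, {{CountingStream}} and {{sendTogether}} are hypothetical 
names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration of batching several messages into one send.
final class SingleFlushSketch {

    /** Stand-in for a socket stream that counts how many writes reach it. */
    static final class CountingStream extends ByteArrayOutputStream {
        int writes = 0;

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            writes++;
            super.write(b, off, len);
        }
    }

    /**
     * Buffers all messages (e.g. connection header, context, call) in memory,
     * then passes them to the underlying stream in one write and one flush.
     */
    static void sendTogether(OutputStream socketOut, byte[]... messages) throws IOException {
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        for (byte[] m : messages) {
            pending.write(m, 0, m.length);
        }
        pending.writeTo(socketOut); // single write to the underlying stream
        socketOut.flush();
    }
}
```

With the messages sent as one unit, the server closing the socket after the 
connection header no longer races against later client writes.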

I believe the flushing fix may also resolve the sporadic unit test failures 
under Windows about the remote end closing the connection.

> RPC client does no bounds checking of responses
> -----------------------------------------------
>
>                 Key: HADOOP-10940
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10940
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-10940.patch
>
>
> The rpc client does no bounds checking of server responses.  In the case of 
> communicating with an older and incompatible RPC, this may lead to OOM issues 
> and leaking of resources.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
