[
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563323#comment-15563323
]
Ashu Pachauri commented on HBASE-16752:
---------------------------------------
My current plan is to return a RequestTooBigException to the client.
Currently, when the server encounters an oversized request, it logs the error
locally and drops the connection after reading only the data length, without
reading the entire RPC off the wire.
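For illustration, the length check described above could be sketched as follows.
This is only a sketch under assumptions: it assumes the request starts with a
4-byte big-endian length prefix, and the class, method, and constant names are
hypothetical, not the actual RpcServer code.

```java
import java.nio.ByteBuffer;

public class LengthPrefixCheck {
    // Hypothetical constant; the 256 MB figure is the 1.3 default from the issue description.
    public static final int MAX_REQUEST_SIZE = 256 * 1024 * 1024;

    /**
     * Decides whether to reject a request by looking only at its length prefix.
     * The payload itself is never read off the wire.
     */
    public static boolean isTooBig(ByteBuffer lengthPrefix, int maxRequestSize) {
        // Assumption: the first 4 bytes carry the total data length.
        int dataLength = lengthPrefix.getInt();
        return dataLength > maxRequestSize;
    }

    public static void main(String[] args) {
        ByteBuffer small = ByteBuffer.allocate(4).putInt(1024);
        small.flip();
        ByteBuffer huge = ByteBuffer.allocate(4).putInt(MAX_REQUEST_SIZE + 1);
        huge.flip();
        System.out.println("small rejected: " + isTooBig(small, MAX_REQUEST_SIZE));
        System.out.println("huge rejected: " + isTooBig(huge, MAX_REQUEST_SIZE));
    }
}
```

Because only the prefix is consumed, the server in this sketch has not yet seen
the request header, so it has no call ID to attach to an error response.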
I plan to maintain the current behavior. However, to return a response I need
to read the request off the wire to get the RPC call ID. I can think of two
ways to tackle this:
1. Read only the first few bytes of the RPC and try to decode the request
header from them. This makes assumptions about the request header format and
size, and would need to be kept in sync with changes to the request format.
2. Instead of returning a response with the correct call ID, return one with a
special (negative) call ID for responses that are to be followed by dropping
the connection, and modify the RPC connection implementation to handle these
responses in a special way. This breaks the fundamental client-server model,
because the server is sending special instructions to the client which the
client did not ask for.
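A rough sketch of how the client side of option 2 might look, for illustration
only. Everything here is an assumption: the sentinel value, the class and method
names, and the choice to fail every outstanding call on the connection are
hypothetical and do not reflect the actual HBase RPC client.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class SentinelCallIdDemo {
    // Hypothetical sentinel: a negative call ID that no real call would use.
    public static final int CONNECTION_FATAL_CALL_ID = -1;

    // Calls sent on this connection that are still waiting for a response.
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public CompletableFuture<String> register(int callId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(callId, future);
        return future;
    }

    /** Handles one response; returns false when the connection should be closed. */
    public boolean onResponse(int callId, String payload, String error) {
        if (callId == CONNECTION_FATAL_CALL_ID) {
            // The server could not tell which call was at fault, so every
            // outstanding call on this connection fails with the reported error.
            for (CompletableFuture<String> future : pending.values()) {
                future.completeExceptionally(new IOException(error));
            }
            pending.clear();
            return false;
        }
        CompletableFuture<String> future = pending.remove(callId);
        if (future != null) {
            future.complete(payload);
        }
        return true;
    }
}
```

With this shape the caller would see the server's error message (for example,
the name of the exception) rather than only a "Broken pipe" when the connection
is dropped.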
Personally, I like the second approach because it just sets a special contract
and does not require any funky stuff to decode the header, but I am open to
suggestions.
> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference
> in RPC size limit
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-16752
> URL: https://issues.apache.org/jira/browse/HBASE-16752
> Project: HBase
> Issue Type: Bug
> Components: Replication, rpc
> Affects Versions: 1.3.0
> Reporter: Ashu Pachauri
> Assignee: Ashu Pachauri
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it
> by default to 256 MB. This means that during upgrade scenarios (or when the
> source is on 1.2 and the peer is already on 1.3), it's possible to encounter a
> situation where we try to send an RPC larger than 256 MB, because we never
> unroll a WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection
> without returning the underlying error to the client, and the client only
> sees a "Broken pipe" error.
> I am not sure what the proper fix is here (or if one is needed) to make sure
> this does not happen, but we should return the underlying exception to the
> RpcClient, because without it the problem can be difficult to diagnose,
> especially for someone new to HBase.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)