[jira] [Commented] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

Ashu Pachauri (JIRA) Mon, 10 Oct 2016 13:03:47 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563323#comment-15563323
 ]


Ashu Pachauri commented on HBASE-16752:
---------------------------------------

My current plan is to return a RequestTooBigException to the client. 
Currently, when the server encounters an oversized request, it logs the error 
locally and drops the connection after reading only the data length, without 
having to read the entire RPC off the wire.
I plan to keep that behavior. However, to return a response I need to read the 
request off the wire to get the RPC call ID. I can think of two ways to tackle 
this:
1. Read only the first few bytes of the RPC and try to decode the request 
header from them. This makes assumptions about the request header format and 
size, and will need to be kept in sync with changes to the request format.
2. Instead of returning a response with the correct call ID, return one with a 
special (negative) call ID for responses that are to be followed by dropping 
the connection, and modify the RPC connection implementation to handle these 
responses in a special way. This breaks the fundamental client-server model, 
because the server is sending special instructions to the client which the 
client did not ask for.
Personally, I like the second approach because it just sets a special contract 
and does not require any funky stuff to decode the header, but I am open to 
suggestions.
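
To make the second option concrete, here is a minimal sketch of the idea. It is 
not the actual RpcServer or RpcClient code: the class name, the sentinel value 
-1, and the frame layout (an int call ID followed by a UTF string) are all 
assumptions made for illustration. The only figure taken from the issue is the 
256 MB default limit.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Illustrative sketch of option 2; none of these names come from HBase. */
final class OversizedRpcSketch {
    // Hypothetical sentinel: a negative call ID that no real call would use.
    static final int CONNECTION_FATAL_CALL_ID = -1;
    // 256 MB, the default limit in 1.3 according to the issue description.
    static final int MAX_REQUEST_SIZE = 256 * 1024 * 1024;

    /**
     * Server side: read only the 4-byte length prefix. Returns null if the
     * request is within the limit, otherwise an error frame to send before
     * the connection is dropped. The request body is never read.
     */
    static byte[] rejectIfTooBig(DataInputStream in) throws IOException {
        int dataLength = in.readInt();
        if (dataLength <= MAX_REQUEST_SIZE) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(CONNECTION_FATAL_CALL_ID); // assumed frame layout
        out.writeUTF("RequestTooBigException: RPC of " + dataLength
            + " bytes exceeds limit of " + MAX_REQUEST_SIZE + " bytes");
        out.flush();
        return buf.toByteArray();
    }

    /**
     * Client side: returns the error message if the frame carries the
     * sentinel call ID, or null for an ordinary response.
     */
    static String readFatalError(DataInputStream in) throws IOException {
        int callId = in.readInt();
        if (callId >= 0) {
            return null; // normal response, handled by the usual call lookup
        }
        return in.readUTF();
    }
}
```

In this sketch the client would fail every outstanding call on the connection 
with the returned message, since it cannot tell which call triggered the error.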

> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference 
> in RPC size limit
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16752
>                 URL: https://issues.apache.org/jira/browse/HBASE-16752
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, rpc
>    Affects Versions: 1.3.0
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it 
> by default to 256 MB. This means that during upgrade scenarios (or when the 
> source is on 1.2 and the peer is already on 1.3), it's possible to encounter 
> a situation where we try to send an RPC larger than 256 MB, because we never 
> unroll a WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection 
> without returning the underlying error to the client, and the client only 
> sees a "Broken pipe" error.
> I am not sure what the proper fix is here (or if one is needed) to make sure 
> this does not happen, but we should return the underlying exception to the 
> RpcClient, because without it, it can be difficult to diagnose the problem, 
> especially for someone new to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
