[jira] [Commented] (HBASE-16752) Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit

Ashu Pachauri (JIRA) Wed, 12 Oct 2016 01:49:08 -0700

    [ https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568079#comment-15568079 ]


Ashu Pachauri commented on HBASE-16752:
---------------------------------------

[~anoop.hbase] As currently implemented, the server gives the client no 
feedback (there is no RequestTooBigException); it simply drops the connection. 
This has two side effects:
1. The client only sees the connection drop, with no reason given, which can be 
hard to debug for people not very familiar with the HBase codebase. Even if I 
do try to return a RequestTooBigException (a new exception), the client simply 
discards it because the server sends a call ID that the client is not 
expecting (the server does not know the correct call ID because it does not 
want to read the whole request, as it's too large).
2. The client will retry the same RPC again and again and keep failing (until 
retries are exhausted, or forever in the case of replication).
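
To illustrate the call-ID problem in point 1, here is a minimal sketch under 
simplifying assumptions. The frame layout (a 4-byte length followed by a 4-byte 
call ID) and the names FrameSizeCheck, inspect, header and UNKNOWN_CALL_ID are 
hypothetical; real HBase framing carries the call ID inside a protobuf request 
header. The point is only that a server which rejects on the length prefix has 
not yet read the call ID.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified frame: [4-byte total length][4-byte call ID][payload].
// Real HBase framing differs; this only models the order in which fields are read.
final class FrameSizeCheck {
    static final int UNKNOWN_CALL_ID = -1;

    /** Outcome of inspecting the start of a frame. */
    static final class Result {
        final boolean accepted;
        final int callId;

        Result(boolean accepted, int callId) {
            this.accepted = accepted;
            this.callId = callId;
        }
    }

    /** Reads the length prefix first; the call ID is read only for accepted frames. */
    static Result inspect(ByteBuffer frame, int maxRequestSize) {
        int declaredLength = frame.getInt();
        if (declaredLength > maxRequestSize) {
            // Rejected before the header is read, so there is no call ID
            // available to tag an error response with.
            return new Result(false, UNKNOWN_CALL_ID);
        }
        return new Result(true, frame.getInt());
    }

    /** Builds the first 8 bytes of a frame declaring the given length and call ID. */
    static ByteBuffer header(int declaredLength, int callId) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(declaredLength).putInt(callId);
        buf.flip();
        return buf;
    }
}
```

In this model, any error sent back for an oversized frame would carry 
UNKNOWN_CALL_ID, which a client matching responses to outstanding calls by ID 
would have no reason to accept.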

The implication for replication is that if the destination peer is upgraded to 
1.3 (where servers enforce this limit), replication can fail because the source 
can send RPCs larger than the peer will accept. A temporary fix is for the 
HBase admin to override this RPC size limit on the peer. We could also change 
the default in HBase 1.3 (currently 256 MB per call, 1 GB total call queue 
size) to match the max call queue size on HBase 1.2 (1 GB), but that would 
defeat the purpose of this config.
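
As a rough sketch of how the default and an admin override relate, assuming the 
per-call limit is read from a key named hbase.ipc.max.request.size and defaults 
to a quarter of the 1 GB call queue size (both are assumptions to verify against 
the 1.3 RpcServer source; the class and method names here are hypothetical, and 
a plain Map stands in for the HBase Configuration):

```java
import java.util.Map;

final class RpcSizeLimits {
    // Assumed key name; verify against the RpcServer source before relying on it.
    static final String MAX_REQUEST_SIZE_KEY = "hbase.ipc.max.request.size";
    static final long DEFAULT_MAX_CALLQUEUE_SIZE = 1024L * 1024 * 1024;          // 1 GB total
    static final long DEFAULT_MAX_REQUEST_SIZE = DEFAULT_MAX_CALLQUEUE_SIZE / 4; // 256 MB per call

    /** Returns the configured per-call limit in bytes, or the default if unset. */
    static long effectiveMaxRequestSize(Map<String, String> conf) {
        String value = conf.get(MAX_REQUEST_SIZE_KEY);
        return value == null ? DEFAULT_MAX_REQUEST_SIZE : Long.parseLong(value);
    }
}
```

Raising the per-call value all the way to the call queue size removes the 
mismatch with a 1.2 source, at the cost of letting a single call fill the queue.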

That said, I do not plan to fix the replication problem itself, only to give 
better feedback to the client so that the problem can be easily diagnosed, a 
temporary fix can be applied, and clients can be modified to respect the RPC 
size limit.
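
One way a client could respect the limit is to cap the bytes it puts in each 
call. The sketch below is a generic greedy batcher, not the replication source's 
actual shipping logic; SizeBoundedBatcher and split are hypothetical names, and 
entries are represented only by their sizes in bytes.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeBoundedBatcher {
    /**
     * Greedily groups entry sizes (in bytes) into batches whose total stays
     * within maxBatchBytes. An entry that exceeds the limit on its own still
     * gets a batch to itself, because this sketch never splits an entry.
     */
    static List<List<Long>> split(List<Long> entrySizes, long maxBatchBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : entrySizes) {
            // Close the current batch if adding this entry would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Batching alone does not help when a single entry is larger than the limit; that 
case would need the entry itself to be broken up before sending.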


> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference 
> in RPC size limit
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16752
>                 URL: https://issues.apache.org/jira/browse/HBASE-16752
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, rpc
>    Affects Versions: 1.3.0
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it 
> by default to 256 MB. This means that during upgrade scenarios (or when the 
> source is on 1.2 and the peer is already on 1.3), it's possible to encounter a 
> situation where we try to send an RPC larger than 256 MB, because we never 
> unroll a WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection 
> without returning the underlying error to the client, and the client only 
> sees a "Broken pipe" error.
> I am not sure what the proper fix is here (or if one is needed) to make sure 
> this does not happen, but we should return the underlying exception to the 
> RpcClient, because without it, it can be difficult to diagnose the problem, 
> especially for someone new to HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
