[ https://issues.apache.org/jira/browse/HBASE-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15568079#comment-15568079 ]
Ashu Pachauri commented on HBASE-16752:
---------------------------------------

[~anoop.hbase] The way it's implemented right now, there is no feedback to the client (there is no RequestTooBigException); the connection is simply dropped. This has two side effects:

1. The client only sees connection drops without any reason, which may be hard to debug for people not very familiar with the HBase codebase. Even if I do try to return a RequestTooBigException (a new exception), the client simply discards it because the server sends a call ID that the client is not expecting. (The server has an incorrect call ID because it does not want to read the whole request, as it's too large.)
2. The client will retry the same RPC again and again and keep failing (until retries are exhausted, or forever in the case of replication).

The implication for replication is that if the destination peer is upgraded to 1.3 (where servers enforce this limit), replication can fail because the source can take large RPCs while the peer cannot. A temporary fix here is for the HBase admin to override this RPC size limit on the peer. We could also change the default on HBase 1.3 (currently 256 MB per call, 1 GB total call queue size) to match the max call queue size on HBase 1.2 (1 GB), but that would defeat the purpose of this config.

That said, I do not plan to fix the replication problem, just to give better feedback to the client so that this can be easily diagnosed, a temporary fix can be applied, and clients can be modified to respect the RPC size limit.
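As a rough illustration of what "clients can be modified to respect the rpc size limit" could look like, here is a minimal sketch in plain Java. It is not HBase code: the class RpcSizeBatcher, the method split, and the idea of representing each entry only by its serialized size in bytes are all hypothetical, and the limit is passed in as a parameter rather than read from any real configuration. It greedily packs entries into batches that stay under the limit and fails fast on an entry that is too large by itself, instead of retrying an RPC the server will always reject.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HBase: groups entries so no batch exceeds a size limit. */
final class RpcSizeBatcher {
    private RpcSizeBatcher() {}

    /**
     * Greedily packs entry sizes (in bytes) into batches whose total is at most maxRequestBytes.
     * An entry larger than the limit on its own can never be sent, so it is reported
     * immediately rather than being retried against the server.
     */
    static List<List<Long>> split(List<Long> entrySizes, long maxRequestBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : entrySizes) {
            if (size > maxRequestBytes) {
                throw new IllegalArgumentException(
                    "entry of " + size + " bytes exceeds request limit of " + maxRequestBytes);
            }
            // Close the current batch if adding this entry would push it over the limit.
            if (!current.isEmpty() && currentBytes + size > maxRequestBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Note that this only helps when individual entries fit under the limit. A single entry that is itself larger than the limit (such as a large WALEdit that is never unrolled) would still need either a higher limit on the peer or a way to split the entry, which this sketch does not attempt.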
> Upgrading from 1.2 to 1.3 can lead to replication failures due to difference in RPC size limit
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16752
>                 URL: https://issues.apache.org/jira/browse/HBASE-16752
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, rpc
>    Affects Versions: 1.3.0
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>
> In HBase 1.2, we don't limit the size of a single RPC, but in 1.3 we limit it by
> default to 256 MB. This means that during upgrade scenarios (or when the source
> is on 1.2 and the peer is already on 1.3), it's possible to encounter a situation
> where we try to send an RPC larger than 256 MB, because we never unroll a
> WALEdit while sending replication traffic.
> RpcServer throws the underlying exception locally, but closes the connection
> without returning the underlying error to the client, and the client only sees a
> "Broken pipe" error.
> I am not sure what the proper fix is here (or if one is needed) to make sure
> this does not happen, but we should return the underlying exception to the
> RpcClient, because without it, it can be difficult to diagnose the problem,
> especially for someone new to HBase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)