[ https://issues.apache.org/jira/browse/HBASE-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137339#comment-15137339 ]

Josh Elser commented on HBASE-15221:
------------------------------------

I think the root cause is that {{AsyncProcess.receiveGlobalFailures}} isn't 
getting invoked in the normal path (we fall into {{receiveMultiAction()}} -- 
the exceptions aren't raised, only passed along in the MultiResponse). For 
consistency with what 0.98 is doing, we should clear the cache for the server 
in AsyncProcess and not do this custom handling up in HTableMultiplexer.
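To make the failure mode concrete, here is a minimal sketch under stated assumptions. It is a hypothetical, simplified model, NOT HBase's actual AsyncProcess or MultiResponse API. The names ResponseCacheSketch, LocationCache, ActionOutcome, RegionNotServedError and receiveOutcomes are illustrative only. The point it shows: when per-action failures come back as values inside a multi-response instead of being thrown, no global failure handler runs, so the code that walks the response has to clear the cached locations for that server itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model; NOT HBase's AsyncProcess/MultiResponse API. */
public class ResponseCacheSketch {

    /** Hypothetical stand-in for NotServingRegionException. */
    static class RegionNotServedError extends Exception {
        RegionNotServedError(String region) { super("region not served: " + region); }
    }

    /** Hypothetical client-side location cache: region name -> server name. */
    static class LocationCache {
        final Map<String, String> regionToServer = new HashMap<>();

        void put(String region, String server) { regionToServer.put(region, server); }

        String get(String region) { return regionToServer.get(region); }

        /** Drops every cached location that points at the given server. */
        void clearServer(String server) {
            regionToServer.values().removeIf(s -> s.equals(server));
        }
    }

    /** Hypothetical per-action outcome: the error is carried as a value, never thrown. */
    static class ActionOutcome {
        final String region;
        final Exception error; // null means the action succeeded
        ActionOutcome(String region, Exception error) { this.region = region; this.error = error; }
    }

    /**
     * Walks the outcomes of one multi-action response sent to {@code server}.
     * Because errors arrive inside the response, nothing else will invalidate
     * the cache, so a "region not served" error clears the server's entries here.
     * Returns the number of failed actions.
     */
    static int receiveOutcomes(LocationCache cache, String server, List<ActionOutcome> outcomes) {
        int failures = 0;
        boolean staleLocations = false;
        for (ActionOutcome o : outcomes) {
            if (o.error != null) {
                failures++;
                if (o.error instanceof RegionNotServedError) {
                    staleLocations = true;
                }
            }
        }
        if (staleLocations) {
            cache.clearServer(server);
        }
        return failures;
    }

    public static void main(String[] args) {
        LocationCache cache = new LocationCache();
        cache.put("regionA", "rs1");
        cache.put("regionB", "rs1");
        cache.put("regionC", "rs2");
        List<ActionOutcome> outcomes = Arrays.asList(
                new ActionOutcome("regionA", null),
                new ActionOutcome("regionB", new RegionNotServedError("regionB")));
        int failed = receiveOutcomes(cache, "rs1", outcomes);
        // prints "1 null rs2": rs1's entries are gone, rs2's entry is untouched
        System.out.println(failed + " " + cache.get("regionA") + " " + cache.get("regionC"));
    }
}
```

Clearing everything cached for the server (rather than only the failing region) mirrors the "clear the cache for the server" approach described above; it costs a few extra lookups but never leaves a stale entry behind.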

Let me try to put together an addendum to what was already applied (preemptive 
thanks to [~tedyu] and [~busbey] for getting pulled along by my sleuthing).

> HTableMultiplexer improvements (stale region locations and resource leaks)
> --------------------------------------------------------------------------
>
>                 Key: HBASE-15221
>                 URL: https://issues.apache.org/jira/browse/HBASE-15221
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
>         Attachments: HBASE-15221.001.patch, HBASE-15221.002.patch, 
> HBASE-15221.003.patch, HBASE-15221.branch-1.patch, HBASE-15221.v4.patch
>
>
> It looks like HTableMultiplexer has a couple of issues.
> Upon failing to send a Put to the appropriate RS, the Put is re-queued into 
> the system. Normally this is fine, as such an exception is transient and the 
> Put eventually succeeds. However, when the Put was rejected because of a 
> NotServingRegionException (e.g. split, balance, merge), the re-queued Put 
> still uses the same cached HRegionLocation. This means that the Put is 
> repeatedly sent back to the same RS and is eventually dropped on the floor. 
> We need to invalidate the location cache (or make sure we refresh it) when we 
> re-queue the Put.
> The internal ClusterConnection is also leaked. If a user creates many 
> HTableMultiplexers, they'll eventually run into issues (memory, zk 
> connections, etc.) because those connections are never cleaned up. 
> HTableMultiplexer needs a close method.
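Both problems in the description can be illustrated with one small sketch. It is a hypothetical model, NOT HBase's HTableMultiplexer: MultiplexerSketch, FakeConnection, PendingPut, the meta map (standing in for the authoritative region locations) and the retriesLeft counter are all assumptions for illustration. It shows (1) invalidating the cached location before re-queuing a Put rejected because the region is no longer served, so the retry re-resolves it, and (2) owning the connection and releasing it in close().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a Put multiplexer; NOT HBase's HTableMultiplexer. */
public class MultiplexerSketch implements AutoCloseable {

    /** Hypothetical stand-in for the internally created connection. */
    static class FakeConnection implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    /** A queued Put, reduced to its target region and remaining retries. */
    static class PendingPut {
        final String region;
        int retriesLeft;
        PendingPut(String region, int retriesLeft) { this.region = region; this.retriesLeft = retriesLeft; }
    }

    final Map<String, String> meta;                      // authoritative region -> server (hypothetical)
    final Map<String, String> cache = new HashMap<>();  // client-side location cache
    final Deque<PendingPut> queue = new ArrayDeque<>();
    final FakeConnection connection = new FakeConnection();
    int dropped = 0;

    MultiplexerSketch(Map<String, String> meta) { this.meta = meta; }

    /** Uses the cached location if present, otherwise resolves and caches it. */
    String locate(String region) {
        return cache.computeIfAbsent(region, meta::get);
    }

    /** Returns the accepting server, or null if the Put was re-queued or dropped. */
    String sendOnce(PendingPut put) {
        String server = locate(put.region);
        if (server != null && server.equals(meta.get(put.region))) {
            return server; // the server really hosts the region: accepted
        }
        // Rejected, as with NotServingRegionException: invalidate BEFORE re-queuing,
        // otherwise the retry would hit the same stale server again.
        cache.remove(put.region);
        if (put.retriesLeft-- > 0) {
            queue.addLast(put);
        } else {
            dropped++;
        }
        return null;
    }

    /** Sends queued Puts until the queue is empty; returns how many were delivered. */
    int drain() {
        int delivered = 0;
        while (!queue.isEmpty()) {
            if (sendOnce(queue.pollFirst()) != null) {
                delivered++;
            }
        }
        return delivered;
    }

    /** Releases the owned connection so many multiplexers don't leak resources. */
    @Override public void close() {
        connection.close();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("r1", "rs2"); // the region has moved to rs2
        try (MultiplexerSketch m = new MultiplexerSketch(meta)) {
            m.cache.put("r1", "rs1"); // stale entry from before the move
            m.queue.addLast(new PendingPut("r1", 3));
            System.out.println(m.drain() + " delivered, " + m.dropped + " dropped");
        }
    }
}
```

With the stale entry, the first attempt is rejected, the entry is removed, and the retry resolves rs2 and succeeds. Without the cache.remove call, every retry would go back to rs1 until the retries ran out and the Put was dropped. Implementing AutoCloseable lets callers use try-with-resources so the connection is always released.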



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
