[ https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957176#comment-14957176 ]

Eric Newton commented on ACCUMULO-4028:
---------------------------------------

May want to use Read/Write locks to eliminate some of the contention in 
ZooCache.
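
A rough sketch of the idea, not the actual ZooCache code: the class name
RwCacheSketch and the loader function (standing in for the ZooKeeper RPC) are
made up for illustration. Readers share the read lock on a cache hit, and the
slow lookup on a miss runs without holding any lock, so one slow RPC should
not stall lookups that are already cached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical stand-in for a ZooKeeper-backed cache; NOT the real ZooCache.
class RwCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Assumption: the loader represents the remote ZooKeeper read.
    private final Function<String, byte[]> loader;

    RwCacheSketch(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String path) {
        // Fast path: many threads may hold the read lock at once.
        lock.readLock().lock();
        try {
            byte[] cached = cache.get(path);
            if (cached != null) {
                return cached;
            }
        } finally {
            lock.readLock().unlock();
        }

        // Miss: do the slow lookup while holding no lock at all.
        byte[] loaded = loader.apply(path);

        // Publish under the exclusive write lock, held only briefly.
        lock.writeLock().lock();
        try {
            byte[] existing = cache.get(path);
            if (existing != null) {
                return existing; // another thread filled it in first
            }
            cache.put(path, loaded);
            return loaded;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The trade-off in this sketch is that two threads missing on the same path may
both call the loader; that costs a duplicate lookup but keeps the write lock
from ever being held across an RPC.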


> ServerClient getConnection is inefficient
> -----------------------------------------
>
>                 Key: ACCUMULO-4028
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4028
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.5, 1.5.4, 1.6.4, 1.7.0
>         Environment: Large production environment.
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.6.5, 1.7.1, 1.8.0
>
>
> Several bulk load FATE operations were taking a long time, but actual bulk 
> load statistics were quite good.
> The master bulk load threads were stuck in LoadFiles, specifically trying to 
> get a connection to a random tablet server.
> The method to get a random connection looks at all the tablet server locks in 
> zookeeper. On a large cluster (say, one with more than 1000 nodes), this is a 
> lot of lookups in zookeeper.  And this is done for every file to be bulk 
> loaded.
> Normally, these lookups would be cached in zooCache, and subsequent lookups 
> would all be served from local memory.  But the cache is a singleton in the 
> master, so other activities, especially those that make RPC calls to 
> zookeeper while holding the lock, will delay these lookups.
> The master has a list of the active tablet servers. It can pick one at random 
> and create a new connection to it, making potentially thousands fewer 
> calls to the zoocache for each file to be loaded.
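
The proposal in the last paragraph of the description could look roughly like
the following. This is only an illustration: RandomServerPick and the
List<String> of server addresses are hypothetical, and the real master keeps
its own structure for live tablet servers. The point is that the choice is
made from an in-memory snapshot, with no per-server cache or ZooKeeper lookup.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; the names here are not from the Accumulo code base.
class RandomServerPick {
    // Picks one entry at random from a snapshot of the active servers.
    // Returns empty when no servers are currently known.
    static Optional<String> pick(List<String> activeServers) {
        if (activeServers.isEmpty()) {
            return Optional.empty();
        }
        int index = ThreadLocalRandom.current().nextInt(activeServers.size());
        return Optional.of(activeServers.get(index));
    }
}
```

The caller would then open a connection to the returned address and retry
with another pick if the connection fails.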



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
