[ https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957176#comment-14957176 ]
Eric Newton commented on ACCUMULO-4028:
---------------------------------------

May want to use Read/Write locks to eliminate some of the contention in ZooCache.

> ServerClient getConnection is inefficient
> -----------------------------------------
>
>                 Key: ACCUMULO-4028
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4028
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.5, 1.5.4, 1.6.4, 1.7.0
>         Environment: Large production environment.
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.6.5, 1.7.1, 1.8.0
>
>
> Several bulk load FATE operations were taking a long time, but actual bulk
> load statistics were quite good.
> The master bulk load threads were stuck in LoadFiles, specifically trying to
> get a connection to a random tablet server.
> The method to get a random connection looks at all the tablet server locks in
> ZooKeeper. On a large cluster (say, one with more than 1000 nodes), this is a
> lot of lookups in ZooKeeper. And this is done for every file to be bulk
> loaded.
> Normally, these lookups would be cached in ZooCache, and subsequent lookups
> would all be served from local memory. But the cache is a singleton in the
> master, so other activities, especially those that make RPC calls to
> ZooKeeper while holding the lock, will delay these lookups.
> The master has a list of the active tablet servers. It can pick one at random
> and create a new connection to it, potentially saving thousands of calls to
> ZooCache for each file to be loaded.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
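
For illustration only, here is a rough Java sketch of the two ideas discussed above: a cache guarded by a read/write lock, so that cache hits do not queue behind a slow lookup, and picking a random entry from an in-memory list of active servers instead of scanning every lock node. None of this is Accumulo code. The names BulkLoadSketch, ReadMostlyCache and pickRandomServer, and the "host:port" strings, are hypothetical, and the slow lookup is just a stand-in for a ZooKeeper read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical sketch; this is not Accumulo's ZooCache or ServerClient. */
final class BulkLoadSketch {

  /** Cache hits share a read lock; the slow lookup runs with no lock held. */
  static final class ReadMostlyCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key, Function<K, V> slowLookup) {
      lock.readLock().lock();
      try {
        V cached = entries.get(key);
        if (cached != null) {
          return cached; // many readers can reach this point concurrently
        }
      } finally {
        lock.readLock().unlock();
      }

      // Miss: do the slow call (assumed to stand in for a remote read)
      // without holding either lock, so other readers are not delayed.
      V loaded = slowLookup.apply(key);

      lock.writeLock().lock();
      try {
        // If another thread stored a value first, keep that one.
        // A null result is not cached and will be looked up again.
        V existing = entries.putIfAbsent(key, loaded);
        return existing != null ? existing : loaded;
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  /** Picks one entry from an in-memory list of active servers. */
  static String pickRandomServer(List<String> activeServers) {
    if (activeServers.isEmpty()) {
      throw new IllegalStateException("no active tablet servers");
    }
    int index = ThreadLocalRandom.current().nextInt(activeServers.size());
    return activeServers.get(index);
  }

  public static void main(String[] args) {
    ReadMostlyCache<String, String> cache = new ReadMostlyCache<>();
    // The lambda is a placeholder for an expensive lookup.
    System.out.println(cache.get("tserver-1", k -> "lock data for " + k));

    List<String> active = Arrays.asList("host1:9997", "host2:9997", "host3:9997");
    System.out.println(pickRandomServer(active));
  }
}
```

The trade-off in this sketch is that two threads missing on the same key may both perform the slow lookup; that duplicates some work but keeps the write lock held only for the brief map update.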