[jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient

2015-10-14 Thread Eric Newton (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957176#comment-14957176
 ] 

Eric Newton commented on ACCUMULO-4028:
---

May want to use Read/Write locks to eliminate some of the contention in 
ZooCache.
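
A minimal sketch of the read/write-lock idea, for illustration only. It is not the real ZooCache code: the class name, the `loader` function (standing in for the remote ZooKeeper read) and the path strings are all assumptions. Cached reads share the read lock, and the slow load happens with no lock held, so one slow lookup does not stall readers of other entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical stand-in for a ZooKeeper-backed cache; not the actual ZooCache API.
class ReadWriteCacheSketch {
  private final Map<String, byte[]> cache = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Assumed callback that performs the slow remote read for a path.
  private final Function<String, byte[]> loader;

  ReadWriteCacheSketch(Function<String, byte[]> loader) {
    this.loader = loader;
  }

  byte[] get(String path) {
    // Fast path: many readers may hold the read lock at the same time.
    lock.readLock().lock();
    try {
      byte[] hit = cache.get(path);
      if (hit != null) {
        return hit;
      }
    } finally {
      lock.readLock().unlock();
    }

    // Cache miss: do the slow remote read while holding no lock at all.
    byte[] loaded = loader.apply(path);

    // Take the exclusive lock only for the short in-memory update.
    lock.writeLock().lock();
    try {
      // Another thread may have loaded the same path meanwhile; keep the first value.
      byte[] existing = cache.putIfAbsent(path, loaded);
      return existing != null ? existing : loaded;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

The trade-off in this sketch is that two threads missing on the same path may both perform the remote read; only the first result is kept.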


> ServerClient getConnection is inefficient
> -
>
> Key: ACCUMULO-4028
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4028
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.4.5, 1.5.4, 1.6.4, 1.7.0
> Environment: Large production environment.
> Reporter: Eric Newton
> Assignee: Eric Newton
> Fix For: 1.6.5, 1.7.1, 1.8.0
>
>
> Several bulk load FATE operations were taking a long time, but actual bulk 
> load statistics were quite good.
> The master bulk load threads were stuck in LoadFiles, specifically trying to 
> get a connection to a random tablet server.
> The method to get a random connection looks at all the tablet server locks in 
> zookeeper. On a large cluster (say, one with more than 1000 nodes), this is a 
> lot of lookups in zookeeper.  And this is done for every file to be bulk 
> loaded.
> Normally, these lookups would be cached in zooCache, and the next lookup 
> would be served entirely from local memory.  But the cache is a singleton in the 
> master, so other activities, especially those that make RPC calls to 
> zookeeper while holding the lock, will delay these lookups.
> The master has a list of the active tablet servers. It can pick one at random 
> and create a new connection to it, potentially making thousands fewer 
> calls to the zoocache for each file to be loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient

2015-10-14 Thread marco polo (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957442#comment-14957442
 ] 

marco polo commented on ACCUMULO-4028:
--

[~ecn], I have the patch ready for you from ACCUMULO-3508 to make ZooCache use 
read/write locks. I'll send this patch to you ASAP. Thanks. 



[jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient

2015-10-14 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957444#comment-14957444
 ] 

Josh Elser commented on ACCUMULO-4028:
--

{code}
+  TServerInstance servers[] = master.onlineTabletServers().toArray(new TServerInstance[0]);
{code}

Would be better to pass an array that is the correct size instead of 0. This 
will save another array allocation.

It seems a shame we have to transform the Set into an array just so we can get 
random access over it. Seems wasteful every time we have a file to bulk load. 
Might it be better to make the array once, wrap it as an UnmodifiableList and 
pass the single reference to each Callable?
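
Roughly what I have in mind, as a sketch rather than a patch. `String` stands in for `TServerInstance`, and the class and method names here are made up for illustration. The array is built once at the right size, wrapped read-only, and the same reference is handed to every Callable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical sketch; String stands in for TServerInstance.
class SharedServerListSketch {

  // Snapshot the online servers once, using a correctly sized array,
  // and expose the result as a read-only, random-access list.
  static List<String> snapshot(Set<String> onlineServers) {
    String[] servers = onlineServers.toArray(new String[onlineServers.size()]);
    return Collections.unmodifiableList(Arrays.asList(servers));
  }

  // Build one task per file. Every task shares the same list reference and
  // picks a random server from it. Assumes the server list is not empty,
  // because Random.nextInt(0) throws IllegalArgumentException.
  static List<Callable<String>> tasksFor(List<String> files, List<String> servers, Random random) {
    List<Callable<String>> tasks = new ArrayList<>();
    for (String file : files) {
      tasks.add(() -> file + " -> " + servers.get(random.nextInt(servers.size())));
    }
    return tasks;
  }
}
```

The snapshot can go stale if a server dies mid-load, so whatever uses the chosen server still needs to handle a failed connection.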



[jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient

2015-10-14 Thread Eric Newton (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957593#comment-14957593
 ] 

Eric Newton commented on ACCUMULO-4028:
---

I've hoisted the creation of the array to outside of the loop.



[jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient

2015-10-14 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957596#comment-14957596
 ] 

Josh Elser commented on ACCUMULO-4028:
--

Thanks!



[jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient

2015-10-14 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957229#comment-14957229
 ] 

Josh Elser commented on ACCUMULO-4028:
--

bq. And this is done for every file to be bulk loaded.

bq. The master has a list of the active tablet servers. It can pick one at 
random and create a new connection to it, using, potentially thousands of fewer 
calls to the zoocache for each file to be loaded.

Would it make sense for the BulkImporter to also batch its calls to the master, 
getting many random tservers at one time instead of one as it processes each file? 
Batching increases the likelihood that a server dies before we get to use it, 
but that should be rare and should already be retried automatically (I 
hope).
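
A sketch of the batching idea, under assumptions: `String` stands in for a tserver identity, and `pickBatch`, `firstLive` and the `isAlive` check are hypothetical names, not existing BulkImporter or master APIs. The caller asks for a batch of distinct random servers once, then walks the batch and skips any server that has died since.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in Accumulo.
class RandomServerBatchSketch {

  // Returns up to batchSize distinct servers in random order.
  // Assumes batchSize is not negative.
  static List<String> pickBatch(Set<String> onlineServers, int batchSize, Random random) {
    List<String> shuffled = new ArrayList<>(onlineServers);
    Collections.shuffle(shuffled, random);
    int n = Math.min(batchSize, shuffled.size());
    return new ArrayList<>(shuffled.subList(0, n));
  }

  // Returns the first server in the batch that still passes the liveness
  // check, or null so the caller knows to fetch a fresh batch and retry.
  static String firstLive(List<String> batch, Predicate<String> isAlive) {
    for (String server : batch) {
      if (isAlive.test(server)) {
        return server;
      }
    }
    return null;
  }
}
```

A larger batch means fewer round trips to the master but a longer window in which the picked servers can go away, which is the trade-off mentioned above.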
