[jira] [Updated] (DRILL-3890) TPCH Concurrency test hit foremanException even when all the drillbits are alive

2015-10-02 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-3890:

Assignee: (was: Victoria Markman)

> TPCH Concurrency test hit foremanException even when all the drillbits are 
> alive
> 
>
> Key: DRILL-3890
> URL: https://issues.apache.org/jira/browse/DRILL-3890
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.2.0
> Environment: ucs-node 1 - node 11 (10+1 node cluster), RHEL 6.4 Linux 
> 2.6.32-358.el6.x86_64, MapR 4.0.2.29870.GA, drill 1.2 master git commitID 
> f78ab84183e73216b76732f66f87ccf48e2340d3
>Reporter: Dechang Gu
>
> In the TPCH Concurrency test, when the number of query threads is 24 or more, many 
> queries are terminated due to "ForemanException", for example:
> SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during 
> query. Identified nodes were [ucs-node8.perf.lab:31010].
> (org.apache.drill.exec.work.foreman.ForemanException) One more more nodes 
> lost connectivity during query. Identified nodes were 
> [ucs-node8.perf.lab:31010].
> org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
> org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
> org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
> org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
> org.apache.curator.framework.listen.ListenerContainer$1.run():92
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
> org.apache.curator.framework.listen.ListenerContainer.forEach():83
> org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
> org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
> org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
> org.apache.curator.framework.listen.ListenerContainer$1.run():92
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
> org.apache.curator.framework.listen.ListenerContainer.forEach():83
> org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
> org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
> org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
> java.util.concurrent.Executors$RunnableAdapter.call():471
> java.util.concurrent.FutureTask.run():262
> java.util.concurrent.Executors$RunnableAdapter.call():471
> java.util.concurrent.FutureTask.run():262
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
> But at that time the drillbit on the identified node is still active. 
> It works fine with 16 or fewer query threads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3890) TPCH Concurrency test hit foremanException even when all the drillbits are alive

2015-10-02 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-3890:
---
Component/s: (was: Functions - Drill)
 Execution - Flow

> TPCH Concurrency test hit foremanException even when all the drillbits are 
> alive
> 
>
> Key: DRILL-3890
> URL: https://issues.apache.org/jira/browse/DRILL-3890
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: ucs-node 1 - node 11 (10+1 node cluster), RHEL 6.4 Linux 
> 2.6.32-358.el6.x86_64, MapR 4.0.2.29870.GA, drill 1.2 master git commitID 
> f78ab84183e73216b76732f66f87ccf48e2340d3
>Reporter: Dechang Gu
>
> In the TPCH Concurrency test, when the number of query threads is 24 or more, many 
> queries are terminated due to "ForemanException", for example:
> SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during 
> query. Identified nodes were [ucs-node8.perf.lab:31010].
> (org.apache.drill.exec.work.foreman.ForemanException) One more more nodes 
> lost connectivity during query. Identified nodes were 
> [ucs-node8.perf.lab:31010].
> org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
> org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
> org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
> org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
> org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
> org.apache.curator.framework.listen.ListenerContainer$1.run():92
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
> org.apache.curator.framework.listen.ListenerContainer.forEach():83
> org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
> org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
> org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
> org.apache.curator.framework.listen.ListenerContainer$1.run():92
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
> org.apache.curator.framework.listen.ListenerContainer.forEach():83
> org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
> org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
> org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
> java.util.concurrent.Executors$RunnableAdapter.call():471
> java.util.concurrent.FutureTask.run():262
> java.util.concurrent.Executors$RunnableAdapter.call():471
> java.util.concurrent.FutureTask.run():262
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
> But at that time the drillbit on the identified node is still active. 
> It works fine with 16 or fewer query threads.





[jira] [Updated] (DRILL-3890) TPCH Concurrency test hit foremanException even when all the drillbits are alive

2015-10-02 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-3890:
---
Description: 
In the TPCH Concurrency test, when the number of query threads is 24 or more, many 
queries are terminated due to "ForemanException", for example:

{code}
SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during 
query. Identified nodes were [ucs-node8.perf.lab:31010].
(org.apache.drill.exec.work.foreman.ForemanException) One more more nodes lost 
connectivity during query. Identified nodes were [ucs-node8.perf.lab:31010].

org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
org.apache.curator.framework.listen.ListenerContainer$1.run():92
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
org.apache.curator.framework.listen.ListenerContainer.forEach():83
org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
org.apache.curator.framework.listen.ListenerContainer$1.run():92
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
org.apache.curator.framework.listen.ListenerContainer.forEach():83
org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
java.util.concurrent.Executors$RunnableAdapter.call():471
java.util.concurrent.FutureTask.run():262
java.util.concurrent.Executors$RunnableAdapter.call():471
java.util.concurrent.FutureTask.run():262
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
{code}

But at that time the drillbit on the identified node is still active. 
It works fine with 16 or fewer query threads.

  was:
In the TPCH Concurrency test, when the number of query threads is 24 or more, many 
queries are terminated due to "ForemanException", for example:
SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during 
query. Identified nodes were [ucs-node8.perf.lab:31010].
(org.apache.drill.exec.work.foreman.ForemanException) One more more nodes lost 
connectivity during query. Identified nodes were [ucs-node8.perf.lab:31010].
org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
org.apache.curator.framework.listen.ListenerContainer$1.run():92
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
org.apache.curator.framework.listen.ListenerContainer.forEach():83
org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
org.apache.curator.framework.listen.ListenerContainer$1.run():92
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
org.apache.curator.framework.listen.ListenerContainer.forEach():83
org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
java.util.concurrent.Executors$RunnableAdapter.call():471
java.util.concurrent.FutureTask.run():262
java.util.concurrent.Executors$RunnableAdapter.call():471
java.util.concurrent.FutureTask.run():262
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745


But at that time the drillbit on the identified node is still active. 
It works fine with 16 or fewer query threads.


> TPCH Concurrency test hit foremanException even when all the drillbits are 
> alive
> 
>
>