[jira] [Updated] (DRILL-3890) TPCH Concurrency test hit foremanException even when all the drillbits are alive
[ https://issues.apache.org/jira/browse/DRILL-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-3890:
    Assignee:     (was: Victoria Markman)

> TPCH Concurrency test hit foremanException even when all the drillbits are alive
>
>                 Key: DRILL-3890
>                 URL: https://issues.apache.org/jira/browse/DRILL-3890
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.2.0
>         Environment: ucs-node 1 - node 11 (10+1 node cluster), RHEL 6.4 Linux
> 2.6.32-358.el6.x86_64, MapR 4.0.2.29870.GA, drill 1.2 master git commitID
> f78ab84183e73216b76732f66f87ccf48e2340d3
>            Reporter: Dechang Gu
>
> In the TPCH Concurrency test, when the number of query threads is 24 or more, many
> queries are terminated due to "ForemanException", for example:
> SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during
> query. Identified nodes were [ucs-node8.perf.lab:31010].
> (org.apache.drill.exec.work.foreman.ForemanException) One more more nodes
> lost connectivity during query. Identified nodes were
> [ucs-node8.perf.lab:31010].
>     org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
>     org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
>     org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
>     org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
>     org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
>     org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
>     org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
>     org.apache.curator.framework.listen.ListenerContainer$1.run():92
>     com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
>     org.apache.curator.framework.listen.ListenerContainer.forEach():83
>     org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
>     org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
>     org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
>     org.apache.curator.framework.listen.ListenerContainer$1.run():92
>     com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
>     org.apache.curator.framework.listen.ListenerContainer.forEach():83
>     org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
>     org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
>     org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
>     java.util.concurrent.Executors$RunnableAdapter.call():471
>     java.util.concurrent.FutureTask.run():262
>     java.util.concurrent.Executors$RunnableAdapter.call():471
>     java.util.concurrent.FutureTask.run():262
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
> However, the drillbit on the identified node was still active at that time.
> The test works fine with 16 or fewer query threads.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (DRILL-3890) TPCH Concurrency test hit foremanException even when all the drillbits are alive
[ https://issues.apache.org/jira/browse/DRILL-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam updated DRILL-3890:
---
    Component/s:     (was: Functions - Drill)
                 Execution - Flow

> TPCH Concurrency test hit foremanException even when all the drillbits are alive
>
>                 Key: DRILL-3890
>                 URL: https://issues.apache.org/jira/browse/DRILL-3890
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: ucs-node 1 - node 11 (10+1 node cluster), RHEL 6.4 Linux
> 2.6.32-358.el6.x86_64, MapR 4.0.2.29870.GA, drill 1.2 master git commitID
> f78ab84183e73216b76732f66f87ccf48e2340d3
>            Reporter: Dechang Gu
>
> In the TPCH Concurrency test, when the number of query threads is 24 or more, many
> queries are terminated due to "ForemanException", for example:
> SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during
> query. Identified nodes were [ucs-node8.perf.lab:31010].
> (org.apache.drill.exec.work.foreman.ForemanException) One more more nodes
> lost connectivity during query. Identified nodes were
> [ucs-node8.perf.lab:31010].
>     org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
>     org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
>     org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
>     org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
>     org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
>     org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
>     org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
>     org.apache.curator.framework.listen.ListenerContainer$1.run():92
>     com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
>     org.apache.curator.framework.listen.ListenerContainer.forEach():83
>     org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
>     org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
>     org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
>     org.apache.curator.framework.listen.ListenerContainer$1.run():92
>     com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
>     org.apache.curator.framework.listen.ListenerContainer.forEach():83
>     org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
>     org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
>     org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
>     java.util.concurrent.Executors$RunnableAdapter.call():471
>     java.util.concurrent.FutureTask.run():262
>     java.util.concurrent.Executors$RunnableAdapter.call():471
>     java.util.concurrent.FutureTask.run():262
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
> However, the drillbit on the identified node was still active at that time.
> The test works fine with 16 or fewer query threads.
[jira] [Updated] (DRILL-3890) TPCH Concurrency test hit foremanException even when all the drillbits are alive
[ https://issues.apache.org/jira/browse/DRILL-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam updated DRILL-3890:
---
    Description:
In the TPCH Concurrency test, when the number of query threads is 24 or more, many queries are terminated due to "ForemanException", for example:
{code}
SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during query. Identified nodes were [ucs-node8.perf.lab:31010].
(org.apache.drill.exec.work.foreman.ForemanException) One more more nodes lost connectivity during query. Identified nodes were [ucs-node8.perf.lab:31010].
    org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
    org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
    org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
    org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
    org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
    org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
    org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
    org.apache.curator.framework.listen.ListenerContainer$1.run():92
    com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
    org.apache.curator.framework.listen.ListenerContainer.forEach():83
    org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
    org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
    org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
    org.apache.curator.framework.listen.ListenerContainer$1.run():92
    com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
    org.apache.curator.framework.listen.ListenerContainer.forEach():83
    org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
    org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
    org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
    java.util.concurrent.Executors$RunnableAdapter.call():471
    java.util.concurrent.FutureTask.run():262
    java.util.concurrent.Executors$RunnableAdapter.call():471
    java.util.concurrent.FutureTask.run():262
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745
{code}
However, the drillbit on the identified node was still active at that time.
The test works fine with 16 or fewer query threads.

  was:
In the TPCH Concurrency test, when the number of query threads is 24 or more, many queries are terminated due to "ForemanException", for example:
SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during query. Identified nodes were [ucs-node8.perf.lab:31010].
(org.apache.drill.exec.work.foreman.ForemanException) One more more nodes lost connectivity during query. Identified nodes were [ucs-node8.perf.lab:31010].
    org.apache.drill.exec.work.foreman.QueryManager$2.drillbitUnregistered():527
    org.apache.drill.exec.coord.ClusterCoordinator.drillbitUnregistered():68
    org.apache.drill.exec.coord.zk.ZKClusterCoordinator.updateEndpoints():248
    org.apache.drill.exec.coord.zk.ZKClusterCoordinator.access$300():58
    org.apache.drill.exec.coord.zk.ZKClusterCoordinator$ZKListener.cacheChanged():154
    org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():164
    org.apache.curator.x.discovery.details.ServiceCacheImpl$2.apply():160
    org.apache.curator.framework.listen.ListenerContainer$1.run():92
    com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
    org.apache.curator.framework.listen.ListenerContainer.forEach():83
    org.apache.curator.x.discovery.details.ServiceCacheImpl.childEvent():157
    org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():509
    org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply():503
    org.apache.curator.framework.listen.ListenerContainer$1.run():92
    com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute():293
    org.apache.curator.framework.listen.ListenerContainer.forEach():83
    org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners():500
    org.apache.curator.framework.recipes.cache.EventOperation.invoke():35
    org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run():762
    java.util.concurrent.Executors$RunnableAdapter.call():471
    java.util.concurrent.FutureTask.run():262
    java.util.concurrent.Executors$RunnableAdapter.call():471
    java.util.concurrent.FutureTask.run():262
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745
However, the drillbit on the identified node was still active at that time.
The test works fine with 16 or fewer query threads.

> TPCH Concurrency test hit foremanException even when all the drillbits are alive
>
>