[ https://issues.apache.org/jira/browse/DRILL-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Ravi updated DRILL-6998:
---------------------------------
    Description:
The following query joins two tables on *two* (>1) fields.
{noformat}
select count(*) from lineitem l inner join partsupp p on l.l_partkey = p.ps_partkey AND l.l_suppkey = p.ps_suppkey
{noformat}
The query does not return, even though Fragment 0:0 reports a state change from {{RUNNING}} -> {{FINISHED}}.

The following is the jstack output for {{Frag0:0}}.
{noformat}
"23b85137-b102-39a9-70d9-72381c5fb93b:frag:0:0" #16037 daemon prio=10 os_prio=0 tid=0x00007f5f48d415d0 nid=0x1a61 waiting on condition [0x00007f61b32b2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.close(RuntimeFilterSink.java:116)
    at org.apache.drill.exec.work.filter.RuntimeFilterRouter.waitForComplete(RuntimeFilterRouter.java:113)
    at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:738)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.wrapUpCompletion(QueryStateProcessor.java:315)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.running(QueryStateProcessor.java:276)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:92)
    - locked <0x000000055f9a7468> (a org.apache.drill.exec.work.foreman.QueryStateProcessor)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:349)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:342)
    at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
    at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.addEvent(QueryStateProcessor.java:344)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.addToEventQueue(QueryStateProcessor.java:155)
    at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:213)
    at org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:519)
    at org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:65)
    at org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:483)
    at org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:155)
    at org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:65)
    at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:546)
    at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:63)
    at org.apache.drill.exec.work.batch.ControlMessageHandler.requestFragmentStatus(ControlMessageHandler.java:253)
    at org.apache.drill.exec.rpc.control.LocalControlConnectionManager.runCommand(LocalControlConnectionManager.java:130)
    at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragmentStatus(ControlTunnel.java:89)
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.sendStatus(FragmentStatusReporter.java:122)
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.stateChanged(FragmentStatusReporter.java:91)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:367)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
From the code, it seems that {{RuntimeFilterSink.close}} is stuck at
{code:java}
while (!asyncAggregateWorker.over.get()) {
  try {
    Thread.sleep(100);
  } catch (InterruptedException e) {
    logger.error("interrupted while sleeping to wait for the aggregating worker thread to exit", e);
  }
}
{code}
This is because {{AsyncAggregateWorker}} exits due to the following exception before it can set {{asyncAggregateWorker.over}} to *true*, so the loop condition never becomes false.
{noformat}
2019-01-22 16:01:18,773 [drill-executor-1301] ERROR o.a.d.e.w.filter.RuntimeFilterSink - Failed to aggregate or route the RFW
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.drill.exec.work.filter.RuntimeFilterWritable.unwrap(RuntimeFilterWritable.java:67) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterWritable.aggregate(RuntimeFilterWritable.java:78) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.aggregate(RuntimeFilterSink.java:140) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.access$600(RuntimeFilterSink.java:52) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink$AsyncAggregateWorker.run(RuntimeFilterSink.java:246) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_151]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
{noformat}
A simple fix would be to add {{over.set(true)}} to the {{finally}} block in {{AsyncAggregateWorker.run}}.
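The suggested fix can be illustrated with a minimal, self-contained sketch. It is not Drill's code: the class {{WorkerCompletionSketch}}, the nested {{Worker}}, and the helpers {{awaitOver}} and {{runAndWait}} are hypothetical names invented for this illustration, and only the {{over}} flag and the polling loop mirror what the issue describes. The point it shows is that setting the flag in {{finally}} releases the waiting thread whether the worker body succeeds or throws.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class WorkerCompletionSketch {

  /** Hypothetical stand-in for the aggregating worker; only the flag handling matters here. */
  static class Worker implements Runnable {
    final AtomicBoolean over = new AtomicBoolean(false);
    private final Runnable body;

    Worker(Runnable body) {
      this.body = body;
    }

    @Override
    public void run() {
      try {
        body.run();
      } catch (RuntimeException e) {
        // Log and give up on the work, as the ERROR line in the issue suggests.
        System.err.println("Failed to aggregate or route the RFW: " + e);
      } finally {
        // Runs on both the normal and the exceptional path,
        // so a thread polling on `over` is always released.
        over.set(true);
      }
    }
  }

  /** Polls the flag like the loop in the issue, but gives up after a timeout. */
  static boolean awaitOver(Worker worker, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!worker.over.get()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // the worker never signalled completion
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  /** Runs the body on a separate thread and reports whether the waiter was released. */
  static boolean runAndWait(Runnable body) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Worker worker = new Worker(body);
      executor.submit(worker);
      return awaitOver(worker, 2000);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    boolean released = runAndWait(() -> {
      int[] single = new int[1];
      single[1] = 42; // index 1 is out of bounds for a one-element array
    });
    System.out.println("waiter released after failure: " + released);
  }
}
```

The timeout in {{awaitOver}} is an addition of this sketch rather than part of the proposed fix; it only keeps the example from hanging if the flag is never set.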
Hit the issue with latest changes in the PR -> https://github.com/apache/drill/pull/1600

> Queries failing with "Failed to aggregate or route the RFW" due to "java.lang.ArrayIndexOutOfBoundsException" do not complete
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6998
>                 URL: https://issues.apache.org/jira/browse/DRILL-6998
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.16.0
>            Reporter: Abhishek Ravi
>            Assignee: weijie.tong
>            Priority: Major
>             Fix For: 1.16.0