[ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093471#comment-16093471 ]
Roman commented on DRILL-5083:
------------------------------

I seem to have reproduced this issue. I am using Drill from master (35d07c3bd), which includes the fixes for [DRILL-5420|https://issues.apache.org/jira/browse/DRILL-5420] and [DRILL-5599|https://issues.apache.org/jira/browse/DRILL-5599] (the CANCELLATION_REQUESTED issues).

Here is the list of options I set:
planner.enable_hashjoin = false;
planner.enable_hashagg = false;
planner.enable_mergejoin = true;
planner.memory.max_query_memory_per_node = 1048576;

I ran a query that should fail after ~2 min with "RESOURCE ERROR: External Sort encountered an error while spilling to disk" and manually cancelled it after 1 min 40 sec. In this case the query hangs in the CANCELLATION_REQUESTED state until I restart the drillbit. The query appears to hang during code generation. Here is the relevant thread from my jstack output:
{code:xml}
"26989d8b-2aa4-1a56-f7d3-2b1d7b55d786:frag:10:1" #181 daemon prio=10 os_prio=0 tid=0x00007f1eec0b0800 nid=0x2be0 sleeping[0x00007f1edc650000]
   java.lang.Thread.State: RUNNABLE
    at com.sun.codemodel.JStringLiteral.generate(JStringLiteral.java:61)
    at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
    at com.sun.codemodel.JFormatter.g(JFormatter.java:363)
    at com.sun.codemodel.JInvocation.generate(JInvocation.java:185)
    at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
    at com.sun.codemodel.JThrow.state(JThrow.java:67)
    at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
    at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
    at com.sun.codemodel.JBlock.generate(JBlock.java:436)
    at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
    at com.sun.codemodel.JConditional.state(JConditional.java:115)
    at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
    at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
    at com.sun.codemodel.JBlock.generate(JBlock.java:436)
    at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
    at com.sun.codemodel.JBlock.state(JBlock.java:464)
    at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
    at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
    at com.sun.codemodel.JBlock.generate(JBlock.java:436)
    at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
    at com.sun.codemodel.JBlock.state(JBlock.java:464)
    at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
    at com.sun.codemodel.JMethod.declare(JMethod.java:460)
    at com.sun.codemodel.JFormatter.d(JFormatter.java:376)
    at com.sun.codemodel.JDefinedClass.declareBody(JDefinedClass.java:815)
    at com.sun.codemodel.JDefinedClass.declare(JDefinedClass.java:788)
    at com.sun.codemodel.JFormatter.d(JFormatter.java:376)
    at com.sun.codemodel.JFormatter.write(JFormatter.java:406)
    at com.sun.codemodel.JPackage.build(JPackage.java:438)
    at com.sun.codemodel.JCodeModel.build(JCodeModel.java:311)
    at com.sun.codemodel.JCodeModel.build(JCodeModel.java:301)
    at org.apache.drill.exec.expr.CodeGenerator.generate(CodeGenerator.java:191)
    at org.apache.drill.exec.compile.CodeCompiler.createInstances(CodeCompiler.java:177)
    at org.apache.drill.exec.compile.CodeCompiler.createInstance(CodeCompiler.java:159)
    at org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:325)
    at org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:319)
    at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.generateNewWorker(MergeJoinBatch.java:382)
    at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:193)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:325)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
    at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:140)
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
    at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:144)
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - <0x00000000e2398088> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
This is not the same jstack as the one described in the previous comments, but in this case I can get an infinite loop in MergeJoin
and I think it is related to this issue.

> RecordIterator can sometimes restart a query on close
> -----------------------------------------------------
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Roman
> Priority: Minor
> Attachments: DrillOperatorErrorHandlingRedesign.pdf
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
> // Check whether next() should even have been called in current state.
> if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup,
> then {{next( )}} should gracefully handle that case.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
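
The mismatch described in the issue (a strict {{next( )}} that rejects calls once a failure is recorded, and a cleanup path that still drains through {{next( )}}) can be illustrated with a minimal, self-contained sketch. This is not Drill code: the class {{CleanupSafeIterator}}, its fields, and the string "batches" are hypothetical stand-ins, and skipping the drain after a failure is only one of the possible ways to make the two sides agree.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical stand-in for an iterator over upstream batches.
 * Not Drill's RecordIterator; all names here are illustrative only.
 */
class CleanupSafeIterator {
    private final Deque<String> inflight = new ArrayDeque<>();
    private final int totalBatches;
    private final int failAfter;            // fail once this many batches were produced; -1 = never
    private int produced;
    private RuntimeException exceptionState; // set when the simulated upstream fails

    CleanupSafeIterator(int totalBatches, int failAfter) {
        this.totalBatches = totalBatches;
        this.failAfter = failAfter;
    }

    /** Strict next(): refuses to run again once a failure has been recorded. */
    String next() {
        if (exceptionState != null) {
            throw new IllegalStateException("next() called after failure", exceptionState);
        }
        if (produced == failAfter) {
            exceptionState = new RuntimeException("simulated spill failure");
            throw exceptionState;
        }
        if (produced >= totalBatches) {
            return null; // upstream exhausted
        }
        String batch = "batch-" + produced++;
        inflight.add(batch);
        return batch;
    }

    boolean hasFailed() {
        return exceptionState != null;
    }

    /**
     * Cleanup that agrees with the strict next(): it drains the upstream only
     * while no failure is recorded, then releases whatever is still held.
     * Returns the number of batches released.
     */
    int close() {
        if (exceptionState == null) {
            while (next() != null) {
                // drain remaining batches so they can be released below
            }
        }
        int released = inflight.size();
        inflight.clear();
        return released;
    }

    public static void main(String[] args) {
        CleanupSafeIterator it = new CleanupSafeIterator(5, 2);
        it.next();
        it.next();
        try {
            it.next(); // simulated failure
        } catch (RuntimeException e) {
            System.out.println("upstream failed: " + e.getMessage());
        }
        // close() does not call next() again, so no IllegalStateException here.
        System.out.println("released on close: " + it.close());
    }
}
```

The alternative suggested in the description would keep calling {{next( )}} during cleanup and instead relax the guard for that case; the sketch above only shows the variant where cleanup avoids the call.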