[ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093471#comment-16093471
 ] 

Roman commented on DRILL-5083:
------------------------------

I think I have reproduced this issue. 
I am using Drill built from master (35d07c3bd), which includes the fixes for 
[DRILL-5420|https://issues.apache.org/jira/browse/DRILL-5420] and 
[DRILL-5599|https://issues.apache.org/jira/browse/DRILL-5599] 
(the CANCELLATION_REQUESTED issues). These are the options I set:

planner.enable_hashjoin = false;
planner.enable_hashagg = false;
planner.enable_mergejoin = true;
planner.memory.max_query_memory_per_node = 1048576;
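
For anyone trying to reproduce this, the options above could be applied with Drill's {{ALTER SESSION SET}} statement. The following is only a sketch that builds those statements as strings. It assumes the options were set at session level (the comment does not say whether session or system scope was used), and the class and method names are hypothetical, not part of Drill.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Drill): turns the option list from this
// comment into ALTER SESSION SET statements. Session scope is an assumption.
class ReproSessionOptions {

    // The four options listed in the comment, in the same order.
    static Map<String, Object> reproOptions() {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("planner.enable_hashjoin", false);
        options.put("planner.enable_hashagg", false);
        options.put("planner.enable_mergejoin", true);
        options.put("planner.memory.max_query_memory_per_node", 1048576L);
        return options;
    }

    // Builds one statement per option; option names are back-quoted
    // because they contain dots.
    static List<String> alterSessionStatements(Map<String, Object> options) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, Object> e : options.entrySet()) {
            statements.add("ALTER SESSION SET `" + e.getKey() + "` = " + e.getValue());
        }
        return statements;
    }

    public static void main(String[] args) {
        for (String sql : alterSessionStatements(reproOptions())) {
            System.out.println(sql + ";");
        }
    }
}
```

The printed statements can be pasted into sqlline or sent over any client connection before running the query.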

I ran a query that should fail after ~2 min with "RESOURCE ERROR: External Sort 
encountered an error while spilling to disk" and cancelled it manually after 
1 min 40 sec. In this case the query hangs in the CANCELLATION_REQUESTED state 
until I restart the drillbit. The query appears to be stuck in code generation. 
Here is the relevant jstack output:
{code:xml}
   "26989d8b-2aa4-1a56-f7d3-2b1d7b55d786:frag:10:1" #181 daemon prio=10 os_prio=0 tid=0x00007f1eec0b0800 nid=0x2be0 sleeping[0x00007f1edc650000]
   java.lang.Thread.State: RUNNABLE
        at com.sun.codemodel.JStringLiteral.generate(JStringLiteral.java:61)
        at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
        at com.sun.codemodel.JFormatter.g(JFormatter.java:363)
        at com.sun.codemodel.JInvocation.generate(JInvocation.java:185)
        at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
        at com.sun.codemodel.JThrow.state(JThrow.java:67)
        at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
        at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
        at com.sun.codemodel.JBlock.generate(JBlock.java:436)
        at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
        at com.sun.codemodel.JConditional.state(JConditional.java:115)
        at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
        at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
        at com.sun.codemodel.JBlock.generate(JBlock.java:436)
        at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
        at com.sun.codemodel.JBlock.state(JBlock.java:464)
        at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
        at com.sun.codemodel.JBlock.generateBody(JBlock.java:448)
        at com.sun.codemodel.JBlock.generate(JBlock.java:436)
        at com.sun.codemodel.JFormatter.g(JFormatter.java:350)
        at com.sun.codemodel.JBlock.state(JBlock.java:464)
        at com.sun.codemodel.JFormatter.s(JFormatter.java:386)
        at com.sun.codemodel.JMethod.declare(JMethod.java:460)
        at com.sun.codemodel.JFormatter.d(JFormatter.java:376)
        at com.sun.codemodel.JDefinedClass.declareBody(JDefinedClass.java:815)
        at com.sun.codemodel.JDefinedClass.declare(JDefinedClass.java:788)
        at com.sun.codemodel.JFormatter.d(JFormatter.java:376)
        at com.sun.codemodel.JFormatter.write(JFormatter.java:406)
        at com.sun.codemodel.JPackage.build(JPackage.java:438)
        at com.sun.codemodel.JCodeModel.build(JCodeModel.java:311)
        at com.sun.codemodel.JCodeModel.build(JCodeModel.java:301)
        at org.apache.drill.exec.expr.CodeGenerator.generate(CodeGenerator.java:191)
        at org.apache.drill.exec.compile.CodeCompiler.createInstances(CodeCompiler.java:177)
        at org.apache.drill.exec.compile.CodeCompiler.createInstance(CodeCompiler.java:159)
        at org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:325)
        at org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:319)
        at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.generateNewWorker(MergeJoinBatch.java:382)
        at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:193)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
        at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:325)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
        at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:140)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
        at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:144)
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00000000e2398088> (a java.util.concurrent.ThreadPoolExecutor$Worker)

{code}

This stack trace is not the same as the ones described in earlier comments, 
but in this case I can get an infinite loop in MergeJoin, and I think it is 
related to this issue.

> RecordIterator can sometimes restart a query on close
> -----------------------------------------------------
>
>                 Key: DRILL-5083
>                 URL: https://issues.apache.org/jira/browse/DRILL-5083
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Roman
>            Priority: Minor
>         Attachments: DrillOperatorErrorHandlingRedesign.pdf
>
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>       // Check whether next() should even have been called in current state.
>       if (null != exceptionState) {
>         throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.
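
One way to read the suggestion in the description above is sketched below: {{next()}} keeps rejecting calls after a failure during normal operation, but reports "no more data" when it is called from the cleanup path. This is only an illustration of the idea, not Drill's actual code. The class, the {{Outcome}} enum, and the {{closing}} flag are all hypothetical names.

```java
// Hypothetical stand-in for an upstream batch iterator; not Drill's
// IteratorValidatorBatchIterator or RecordIterator.
final class CleanupSafeIterator {

    enum Outcome { OK, NONE }

    private int remaining;                    // batches still available upstream
    private RuntimeException exceptionState;  // set once an upstream call has failed
    private boolean closing;                  // true while close() drains in-flight batches

    CleanupSafeIterator(int batches) {
        this.remaining = batches;
    }

    // Simulates an upstream failure, such as an OOM while spilling.
    void fail(RuntimeException e) {
        this.exceptionState = e;
    }

    Outcome next() {
        if (exceptionState != null) {
            if (closing) {
                // Cleanup path: report end of data instead of throwing again.
                return Outcome.NONE;
            }
            throw new IllegalStateException("next() called after failure", exceptionState);
        }
        if (remaining == 0) {
            return Outcome.NONE;
        }
        remaining--;
        return Outcome.OK;
    }

    // Drains whatever is still available and returns how many batches were released.
    int close() {
        closing = true;
        int drained = 0;
        while (next() == Outcome.OK) {
            drained++;
        }
        return drained;
    }
}
```

With this shape, a {{close()}} issued after a failure returns without raising a second exception, while a stray {{next()}} outside cleanup still fails loudly.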



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
