[ https://issues.apache.org/jira/browse/DRILL-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744390#comment-14744390 ]
Victoria Markman commented on DRILL-2418:
-----------------------------------------

The original error happened much later during execution. Since we started throwing this new error message, the memory leak is no longer reproducible.

I automated the unsupported implicit cast cases (under Functional/Passing/joins/implicit_cast_not_supported) and ran them in a loop of 20 iterations with 10 concurrent queries, and I no longer see a memory leak.

{code}
heap(b)     direct(b)  jvm_direct(b)
2442800296  11799156   1509996862
2207849320  11798941   1493219626
1968600792  11798941   1493219598
1737329064  11798941   1493219582
1503715872  11798941   1493219558
1282546296  11798941   1493219538
1050979952  11798941   1493203108
3357847152  11799156   1493203340
3120888640  11798941   1493203308
2885817664  11798941   1493203288
2654219216  11798941   1493203268
2421491408  11799156   1493203236
2193786112  11798941   1493203220
1957000352  11798941   1493186745
1725191640  11798941   1493186725
1500829160  11798941   1493186701
1282839416  11798941   1493186689
1061937448  11798941   1493186661
3366707496  11798941   1493186893
3131793720  11798941   1493186865
2901312272  11798941   1493186849
{code}

> Memory leak during execution if comparison function is not found
> ----------------------------------------------------------------
>
> Key: DRILL-2418
> URL: https://issues.apache.org/jira/browse/DRILL-2418
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Chris Westin
> Fix For: 1.2.0
>
> Attachments: cast_tbl_1.parquet, cast_tbl_2.parquet, not_supported_cast.txt
>
>
> While testing implicit cast during join, I ran into an issue: if you run a query that throws an exception during execution, and you run enough of those, Drill will eventually run out of memory.
> Here is a query example:
> {code}
> select count(*) from cast_tbl_1 a, cast_tbl_2 b where a.c_float = b.c_time
> failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to_nulls_high( TIME:OPTIONAL, FLOAT4:OPTIONAL ) returns INT:REQUIRED
> [ 633c8ce3-1ed2-4a0a-8248-1e3d5b4f7c0a on atsqa4-133.qa.lab:31010 ]
> [ 633c8ce3-1ed2-4a0a-8248-1e3d5b4f7c0a on atsqa4-133.qa.lab:31010 ]
> Test_Failed: 2015/03/10 18:34:15.0015 - Failed to execute.
> {code}
> If you set planner.slice_target to 1, you run out of memory after about 40 such failures on my cluster.
> {code}
> select count(*) from cast_tbl_1 a, cast_tbl_2 b where a.d38 = b.c_double
> Query failed: OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
> {code}
> From the drillbit.log:
> {code}
> 2015-03-10 18:34:34,588 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.store.parquet.FooterGatherer - Fetch Parquet Footers: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.190007ms avg, 1ms max.
> 2015-03-10 18:34:34,591 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.store.parquet.FooterGatherer - Fetch Parquet Footers: Executed 1 out of 1 using 1 threads. Time: 0ms total, 0.953679ms avg, 0ms max.
> 2015-03-10 18:34:34,627 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host atsqa4-136.qa.lab. Skipping affinity to that host.
> 2015-03-10 18:34:34,627 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.609586ms avg, 1ms max.
> 2015-03-10 18:34:34,629 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Failure finding Drillbit running on host atsqa4-136.qa.lab. Skipping affinity to that host.
> 2015-03-10 18:34:34,629 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.d.e.s.parquet.ParquetGroupScan - Load Parquet RowGroup block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.270340ms avg, 1ms max.
> 2015-03-10 18:34:34,684 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING --> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Failure while getting memory allocator for fragment.
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.common.exceptions.ExecutionSetupException: Failure while getting memory allocator for fragment.
>     at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:119) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.setupRootFragment(Foreman.java:535) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:307) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:511) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     ... 4 common frames omitted
> Caused by: org.apache.drill.exec.memory.OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
>     at org.apache.drill.exec.memory.TopLevelAllocator.getChildAllocator(TopLevelAllocator.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:116) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     ... 8 common frames omitted
> 2015-03-10 18:34:34,700 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 96a7baf4-f17a-454c-831b-f3dc77bd4381: OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Failure while getting memory allocator for fragment.
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.common.exceptions.ExecutionSetupException: Failure while getting memory allocator for fragment.
>     at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:119) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.setupRootFragment(Foreman.java:535) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:307) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:511) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     ... 4 common frames omitted
> Caused by: org.apache.drill.exec.memory.OutOfMemoryException: You attempted to create a new child allocator with initial reservation 3000000 but only 916199 bytes of memory were available.
>     at org.apache.drill.exec.memory.TopLevelAllocator.getChildAllocator(TopLevelAllocator.java:121) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     at org.apache.drill.exec.ops.FragmentContext.<init>(FragmentContext.java:116) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>     ... 8 common frames omitted
> 2015-03-10 18:34:34,700 [2b00c6c5-5525-ae65-25f8-24ea2d88ba2f:foreman] INFO o.a.drill.exec.work.foreman.Foreman - foreman cleaning up - status: [0=>[0=>FragmentData [isLocal=true, status=profile {
> {code}
> I will attach a reproduction. I have to add that I have no proof that this error is actually causing the memory leak (speculation on my part).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
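
For anyone who wants to check the heap/direct/jvm_direct samples in the comment mechanically rather than by eye, here is a minimal sketch. The sample values are copied from the comment. The helper names (`parse_samples`, `net_growth`, `spread`, `looks_leaky`) and the tolerance are hypothetical and chosen for illustration only; this is not how the test framework or Drill itself checks for leaks.

```python
# Samples copied from the comment: heap, direct, jvm_direct (all in bytes).
SAMPLES = """\
2442800296 11799156 1509996862
2207849320 11798941 1493219626
1968600792 11798941 1493219598
1737329064 11798941 1493219582
1503715872 11798941 1493219558
1282546296 11798941 1493219538
1050979952 11798941 1493203108
3357847152 11799156 1493203340
3120888640 11798941 1493203308
2885817664 11798941 1493203288
2654219216 11798941 1493203268
2421491408 11799156 1493203236
2193786112 11798941 1493203220
1957000352 11798941 1493186745
1725191640 11798941 1493186725
1500829160 11798941 1493186701
1282839416 11798941 1493186689
1061937448 11798941 1493186661
3366707496 11798941 1493186893
3131793720 11798941 1493186865
2901312272 11798941 1493186849
"""


def parse_samples(text):
    """Return a list of (heap, direct, jvm_direct) integer tuples."""
    return [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def net_growth(values):
    """Last sample minus first sample; positive means usage went up."""
    return values[-1] - values[0]


def spread(values):
    """Difference between the largest and smallest sample."""
    return max(values) - min(values)


def looks_leaky(values, tolerance_bytes):
    """Hypothetical heuristic: flag net growth beyond a chosen tolerance."""
    return net_growth(values) > tolerance_bytes


rows = parse_samples(SAMPLES)
direct = [row[1] for row in rows]
jvm_direct = [row[2] for row in rows]

if __name__ == "__main__":
    print("samples:", len(rows))
    print("direct: spread", spread(direct), "net growth", net_growth(direct))
    print("jvm_direct: net growth", net_growth(jvm_direct))
    # Tolerance of 1 MiB is an arbitrary illustrative choice.
    print("direct looks leaky:", looks_leaky(direct, 1024 * 1024))
```

Across these samples the direct column varies by only 215 bytes and jvm_direct does not grow, which is consistent with the statement that the leak no longer reproduces. Heap is left out of the check because it rises and falls with garbage collection.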