[ https://issues.apache.org/jira/browse/DRILL-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596250#comment-14596250 ]
Victoria Markman commented on DRILL-3265:
-----------------------------------------

This query now runs: the memory leak was fixed. I still need to verify correctness of the result.
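One way to do that correctness check is to recompute the running sum without a window function and compare the results. The query below is only a sketch: it assumes correlated scalar subqueries are supported on this build, and NULL keys would need separate handling (a window partition groups NULL ss_store_sk rows together, while the equality predicate drops them):
{code}
select s1.ss_store_sk,
       s1.ss_sold_date_sk,
       (select sum(s2.ss_quantity)
          from store_sales s2
         where s2.ss_store_sk = s1.ss_store_sk
           and s2.ss_sold_date_sk <= s1.ss_sold_date_sk) as running_sum
from store_sales s1;
{code}
The <= on the order key mirrors the window's default RANGE frame, where all rows tied on ss_sold_date_sk are peers and are included in the sum.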
Query with window function and sort below that spills to disk runs out of memory
--------------------------------------------------------------------------------

                Key: DRILL-3265
                URL: https://issues.apache.org/jira/browse/DRILL-3265
            Project: Apache Drill
         Issue Type: Bug
         Components: Execution - Relational Operators
   Affects Versions: 1.0.0
           Reporter: Victoria Markman
           Assignee: Deneche A. Hakim
             Labels: window_function
            Fix For: 1.1.0
        Attachments: drill-3265.log

Failure:
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . > sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk)
. . . . . . . . . . . . > from store_sales;
java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Fragment 1:13
[Error Id: 72609220-e431-41fc-8505-8f9740c96153 on atsqa4-133.qa.lab:31010]
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
        at sqlline.SqlLine.print(SqlLine.java:1583)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:738)
        at sqlline.SqlLine.begin(SqlLine.java:612)
        at sqlline.SqlLine.start(SqlLine.java:366)
        at sqlline.SqlLine.main(SqlLine.java:259)
{code}
The failure looks legitimate from the customer's point of view (granted, we know we can run out of memory in some cases). However, there are a couple of problems with this scenario:
1. It looks like we are running out of memory during the disk-based sort:
{code}
2015-06-09 16:55:45,693 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:19] WARN  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 3311 batch groups. Current allocated memory: 17456644
2015-06-09 16:55:45,763 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:13] WARN  o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 20833 due to memory limit. Current allocation: 19989451
java.lang.Exception: null
        at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:253) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:273) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.vector.UInt1Vector.allocateNew(UInt1Vector.java:171) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.vector.NullableIntVector.allocateNew(NullableIntVector.java:204) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.vector.AllocationHelper.allocateNew(AllocationHelper.java:56) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.test.generated.PriorityQueueCopierGen139.allocateVectors(PriorityQueueCopierTemplate.java:123) [na:na]
        at org.apache.drill.exec.test.generated.PriorityQueueCopierGen139.next(PriorityQueueCopierTemplate.java:66) [na:na]
        at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:224) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:92) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.window.WindowFrameRecordBatch.innerNext(WindowFrameRecordBatch.java:123) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:95) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:259) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:253) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_71]
        at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) [hadoop-common-2.5.1-mapr-1503.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:253) [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
2015-06-09 16:55:45,765 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:13] INFO  o.a.d.c.exceptions.UserException - User Error Occurred
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
{code}
2. The same sort running by itself and spilling to disk finishes successfully.
3. There is an indication in drillbit.log that a memory leak has occurred:
{code}
2015-06-09 16:55:46,053 [2a88e633-b714-49a1-c36a-509a9c817c77:frag:1:11] ERROR o.a.d.c.exceptions.UserException - SYSTEM ERROR: java.lang.IllegalStateException: Failure while closing accountor. Expected private and shared pools to be set to initial values. However, one or more were not. Stats are
  zone     init      allocated  delta
  private  10000000  10636      9989364
  shared   10000000  3114634    6885366.
Fragment 1:11
[Error Id: 607d4f33-dda8-4cd9-8489-e34e914737e9 on atsqa4-133.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: java.lang.IllegalStateException: Failure while closing accountor. Expected private and shared pools to be set to initial values. However, one or more were not. Stats are
  zone     init      allocated  delta
  private  10000000  10636      9989364
  shared   10000000  3114634    6885366.
Fragment 1:11
{code}
4. Occasionally, when re-executing the same query after it failed, I see an error about failing to create a spill directory, even though I have enough disk space in my spill directory, which is configured to be on a local file system (remember, the sort by itself runs fine):
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . > sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk)
. . . . . . . . . . . . > from store_sales;
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: java.io.IOException: Mkdirs failed to create /tmp/drill/spill/2a88e74d-df47-146d-0cf9-8773303a7f37/major_fragment_1/minor_fragment_0/operator_4 (exists=false, cwd=file:///opt/mapr/drill/drill-1.1.0/bin)
{code}
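For reference on item 4: the external sort's spill location comes from boot options in drill-override.conf. A minimal sketch of that configuration is below; the /tmp path matches the error above, but treat the keys and paths as illustrative rather than this cluster's verified settings:
{code}
drill.exec: {
  // directories the external sort may spill to (example path)
  sort.external.spill.directories: [ "/tmp/drill/spill" ],
  // file system used for spilling; "file:///" means the local file system
  sort.external.spill.fs: "file:///"
}
{code}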
Additional information:
Plan for the query that runs out of memory:
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . > sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk)
. . . . . . . . . . . . > from store_sales;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[CASE(>($3, 0), CAST($4):ANY, null)])
01-02          Window(window#0=[window(partition {1} order by [2] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT($0), $SUM0($0)])])
01-03            SelectionVectorRemover
01-04              Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
01-05                Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2])
01-06                  HashToRandomExchange(dist0=[[$1]])
02-01                    UnorderedMuxExchange
03-01                      Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1))])
03-02                        Project(ss_quantity=[$2], ss_store_sk=[$1], ss_sold_date_sk=[$0])
03-03                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcds100/parquet/store_sales]], selectionRoot=/tpcds100/parquet/store_sales, numFiles=1, columns=[`ss_quantity`, `ss_store_sk`, `ss_sold_date_sk`]]])
{code}
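A note for reading the plan above: with an ORDER BY and no explicit frame, the SQL default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is exactly what the Window operator prints; the CASE(>($3, 0), ...) in the Project appears to be the planner's null-safe decomposition of SUM into COUNT and $SUM0. The query is therefore equivalent to the explicit form:
{code}
select
    sum(ss_quantity) over(
        partition by ss_store_sk
        order by ss_sold_date_sk
        range between unbounded preceding and current row)
from store_sales;
{code}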
The same sort executes successfully when the window function is not present:
{code}
0: jdbc:drill:schema=dfs> explain plan for select ss_quantity, ss_store_sk, ss_sold_date_sk from store_sales order by ss_store_sk, ss_sold_date_sk;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2])
00-02        SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC])
01-01          SelectionVectorRemover
01-02            Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[ASC])
01-03              Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2])
01-04                HashToRandomExchange(dist0=[[$1]], dist1=[[$2]])
02-01                  UnorderedMuxExchange
03-01                    Project(ss_quantity=[$0], ss_store_sk=[$1], ss_sold_date_sk=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2, hash64AsDouble($1)))])
03-02                      Project(ss_quantity=[$2], ss_store_sk=[$1], ss_sold_date_sk=[$0])
03-03                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcds100/parquet/store_sales]], selectionRoot=/tpcds100/parquet/store_sales, numFiles=1, columns=[`ss_quantity`, `ss_store_sk`, `ss_sold_date_sk`]]])
{code}
To reproduce:
1. Single drillbit
2. Configuration:
   DRILL_MAX_DIRECT_MEMORY="8G"
   DRILL_HEAP="4G"
3. Dataset: tpcds100 parquet
4. Run query:
{code}
select
    sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk)
from store_sales;
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)