[ https://issues.apache.org/jira/browse/DRILL-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Farkas resolved DRILL-5138.
-----------------------------------
    Resolution: Fixed

> TopN operator on top of ~110 GB data set is very slow
> -----------------------------------------------------
>
>                 Key: DRILL-5138
>                 URL: https://issues.apache.org/jira/browse/DRILL-5138
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Rahul Challapalli
>            Assignee: Timothy Farkas
>
> git.commit.id.abbrev=cf2b7c7
> No of cores : 23
> No of disks : 5
> DRILL_MAX_DIRECT_MEMORY="24G"
> DRILL_MAX_HEAP="12G"
> The below query ran for more than 4 hours and did not complete. The table is
> ~110 GB
> {code}
> select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
> {code}
> Physical Plan :
> {code}
> 00-00 Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative
> cost = {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 352
> 00-01 Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0,
> cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 351
> 00-02 Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount
> = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 350
> 00-03 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost =
> {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13
> network, 0.0 memory}, id = 349
> 00-04 Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost =
> {1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13
> network, 0.0 memory}, id = 348
> 00-05 SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) :
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost):
> rowcount = 1.439980416E9, cumulative cost = {1.0079862912E10 rows,
> 4.1759432064E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
> 01-01 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*,
> ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative
> cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13
> network, 0.0 memory}, id = 346
> 01-02 TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative
> cost = {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13
> network, 0.0 memory}, id = 345
> 01-03 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity,
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost =
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network,
> 0.0 memory}, id = 344
> 01-04 HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) :
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost =
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network,
> 0.0 memory}, id = 343
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY
> T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost =
> {4.319941248E9 rows, 1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 342
> 03-01 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2,
> hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY
> cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9,
> cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0
> network, 0.0 memory}, id = 341
> 03-02 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity,
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost =
> {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
> = 340
> 03-03 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]],
> selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales,
> numFiles=1, usedMetadataFile=false, columns=[`*`]]]) : rowType =
> (DrillRecordRow[*, cs_quantity, cs_wholesale_cost]): rowcount =
> 1.439980416E9, cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 339
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)