[ https://issues.apache.org/jira/browse/DRILL-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Farkas resolved DRILL-5138.
-----------------------------------
Resolution: Fixed
> TopN operator on top of ~110 GB data set is very slow
> -----------------------------------------------------
>
> Key: DRILL-5138
> URL: https://issues.apache.org/jira/browse/DRILL-5138
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Reporter: Rahul Challapalli
> Assignee: Timothy Farkas
>
> git.commit.id.abbrev=cf2b7c7
> Number of cores: 23
> Number of disks: 5
> DRILL_MAX_DIRECT_MEMORY="24G"
> DRILL_MAX_HEAP="12G"
> The query below ran for more than 4 hours without completing. The table is
> ~110 GB.
> {code}
> select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
> {code}
> Physical Plan :
> {code}
> 00-00 Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative
> cost = {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 352
> 00-01 Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0,
> cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 351
> 00-02 Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount
> = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io,
> 4.1287118487552E13 network, 0.0 memory}, id = 350
> 00-03 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost =
> {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13
> network, 0.0 memory}, id = 349
> 00-04 Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost =
> {1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13
> network, 0.0 memory}, id = 348
> 00-05 SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) :
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost):
> rowcount = 1.439980416E9, cumulative cost = {1.0079862912E10 rows,
> 4.1759432064E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
> 01-01 SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*,
> ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative
> cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13
> network, 0.0 memory}, id = 346
> 01-02 TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative
> cost = {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13
> network, 0.0 memory}, id = 345
> 01-03 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity,
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost =
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network,
> 0.0 memory}, id = 344
> 01-04 HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) :
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost =
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network,
> 0.0 memory}, id = 343
> 02-01 UnorderedMuxExchange : rowType = RecordType(ANY
> T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost =
> {4.319941248E9 rows, 1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 342
> 03-01 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2,
> hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY
> cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9,
> cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0
> network, 0.0 memory}, id = 341
> 03-02 Project(T0¦¦*=[$0], cs_quantity=[$1],
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity,
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost =
> {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id
> = 340
> 03-03 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]],
> selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales,
> numFiles=1, usedMetadataFile=false, columns=[`*`]]]) : rowType =
> (DrillRecordRow[*, cs_quantity, cs_wholesale_cost]): rowcount =
> 1.439980416E9, cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 339
> {code}