Robert Hou created DRILL-6565:
---------------------------------

             Summary: cume_dist does not return enough rows
                 Key: DRILL-6565
                 URL: https://issues.apache.org/jira/browse/DRILL-6565
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.14.0
            Reporter: Robert Hou
            Assignee: Pritesh Maker
         Attachments: drillbit.log.7802

This query should return 64 rows but only returns 38 rows:

alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select * from (
  select cume_dist() over (order by Index) IntervalSecondValuea, Index
  from (select * from dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet` order by BigIntvalue)) d
where d.Index = 1;

I tried to reproduce the problem by using a smaller table, but it does not reproduce.
I tried to reproduce the problem without the outside select statement, but it does not reproduce.

Here is the explain plan:
{noformat}
| 00-00 Screen : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {757200.0 rows, 1.1573335922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4034
00-01 ProjectAllowDup(IntervalSecondValuea=[$0], Index=[$1]) : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {756000.0 rows, 1.1572135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4033
00-02 Project(w0$o0=[$1], $0=[$0]) : rowType = RecordType(DOUBLE w0$o0, ANY $0): rowcount = 12000.0, cumulative cost = {744000.0 rows, 1.1548135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4032
00-03 SelectionVectorRemover : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {732000.0 rows, 1.1524135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4031
00-04 Filter(condition=[=($0, 1)]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {720000.0 rows, 1.1512135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4030
00-05 Window(window#0=[window(partition {} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [CUME_DIST()])]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 80000.0, cumulative cost = {640000.0 rows, 1.1144135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4029
00-06 SelectionVectorRemover : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {560000.0 rows, 1.0984135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4028
00-07 Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {480000.0 rows, 1.0904135922911648E7 cpu, 0.0 io, 0.0 network, 1920000.0 memory}, id = 4027
00-08 Project($0=[ITEM($0, 'Index')]) : rowType = RecordType(ANY $0): rowcount = 80000.0, cumulative cost = {400000.0 rows, 5692067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4026
00-09 SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {320000.0 rows, 5612067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4025
00-10 Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {240000.0 rows, 5532067.961455824 cpu, 0.0 io, 0.0 network, 1280000.0 memory}, id = 4024
00-11 Project(T2¦¦**=[$0], BigIntvalue=[$1]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {160000.0 rows, 320000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4023
00-12 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet]], selectionRoot=maprfs:/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet, numFiles=1, numRowGroups=6, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY BigIntvalue): rowcount = 80000.0, cumulative cost = {80000.0 rows, 160000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4022
{noformat}

I have attached the drillbit.log.
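
For reference, the expectation behind the row count can be modeled outside Drill. The sketch below is a plain-Python stand-in, not Drill code: it assumes standard SQL semantics for CUME_DIST (rows with a value less than or equal to the current one, divided by total rows), uses hypothetical toy data in place of the Index column, and ignores nulls. It illustrates that the window function emits one row per input row, so filtering on Index = 1 afterwards should keep exactly as many rows as the input has with Index = 1.

```python
from bisect import bisect_right


def cume_dist(values):
    """Model CUME_DIST() OVER (ORDER BY value), returned in input order.

    Assumes no nulls; peers (equal values) share the same result.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Count of rows with value <= current value, divided by total rows.
    return [bisect_right(ordered, v) / n for v in values]


# Hypothetical toy data standing in for the Index column.
index_col = [3, 1, 2, 1, 3, 1]

# Inner select: one output row per input row.
windowed = list(zip(cume_dist(index_col), index_col))

# Outer select: where d.Index = 1.
filtered = [(cd, idx) for cd, idx in windowed if idx == 1]

# The filter must not lose rows relative to the raw input.
print(len(filtered), index_col.count(1))
```

In the reported case the two counts would be 64 and 64; Drill returning 38 suggests rows are dropped somewhere between the Window and Filter operators, though the report does not establish where.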

The commit id is:
| 1.14.0-SNAPSHOT | aa127b70b1e46f7f4aa19881f25eda583627830a | DRILL-6523: Fix NPE for describe of partial schema | 22.06.2018 @ 11:28:23 PDT | r...@mapr.com | 23.06.2018 @ 02:05:10 PDT |

fourvarchar_asc_nulls95.q

--
This message was sent by Atlassian JIRA (v7.6.3#76005)