[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sorabh Hamirwasia updated DRILL-7154:
-------------------------------------
    Reviewer: Boaz Ben-Zvi

> TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-7154
>                 URL: https://issues.apache.org/jira/browse/DRILL-7154
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.16.0
>            Reporter: Robert Hou
>            Assignee: Hanumath Rao Maduri
>            Priority: Blocker
>             Fix For: 1.16.0
>
>         Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.data.log, hashagg.nostats.foreman.log, hashagg.stats.disabled.data.log, hashagg.stats.disabled.foreman.log
>
> Here is TPCH 04 with sf 1000:
> {noformat}
> select
>   o.o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> where
>   o.o_orderdate >= date '1996-10-01'
>   and o.o_orderdate < date '1996-10-01' + interval '3' month
>   and exists (
>     select
>       *
>     from
>       lineitem l
>     where
>       l.l_orderkey = o.o_orderkey
>       and l.l_commitdate < l.l_receiptdate
>   )
> group by
>   o.o_orderpriority
> order by
>   o.o_orderpriority;
> {noformat}
> TPCH query 4 takes 30% longer. The plan is the same, but the Hash Agg operator in the new plan takes longer. One possible reason is that the Hash Agg operator in the new plan is not using as many buckets as the old plan did; it also uses less memory than the Hash Agg operator in the old plan.
>
> Here is the old plan:
> {noformat}
> 00-00  Screen : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9163601940441746E10 rows, 9.07316867594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5645
> 00-01    Project(o_orderpriority=[$0], order_count=[$1]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9163226940441746E10 rows, 9.07313117594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5644
> 00-02      SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9159476940441746E10 rows, 9.07238117594483E10 cpu, 2.2499969127E10 io, 3.59423968386048E12 network, 2.2631985057468002E10 memory}, id = 5643
> 01-01        OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9155726940441746E10 rows, 9.0643982838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5642
> 02-01          SelectionVectorRemover : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9151976940441746E10 rows, 9.0640232838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5641
> 02-02            Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9148226940441746E10 rows, 9.0636482838025E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2631985057468002E10 memory}, id = 5640
> 02-03              HashAgg(group=[{0}], order_count=[$SUM0($1)]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 375.0, cumulative cost = {1.9144476940441746E10 rows, 9.030890595055101E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.2571985057468002E10 memory}, id = 5639
> 02-04                HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, cumulative cost = {1.9106976940441746E10 rows, 8.955890595055101E10 cpu, 2.2499969127E10 io, 3.56351968386048E12 network, 2.1911985057468002E10 memory}, id = 5638
> 03-01                  HashAgg(group=[{0}], order_count=[COUNT()]) : rowType = RecordType(ANY o_orderpriority, BIGINT order_count): rowcount = 3.75E7, cumulative cost = {1.9069476940441746E10 rows, 8.895890595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 2.1911985057468002E10 memory}, id = 5637
> 03-02                    Project(o_orderpriority=[$1]) : rowType = RecordType(ANY o_orderpriority): rowcount = 3.75E8, cumulative cost = {1.8694476940441746E10 rows, 8.145890595055101E10 cpu, 2.2499969127E10 io, 3.25631968386048E12 network, 1.5311985057468002E10 memory}, id = 5636
> {noformat}
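For anyone trying to reproduce the two cases, a minimal sketch of toggling optimizer statistics at the session level before running TPCH query 4. This assumes the session option controlling the new statistics feature in 1.16 is `planner.statistics.use`; verify the exact option name against your build before relying on it.

{noformat}
-- Sketch only: option name planner.statistics.use is assumed, not confirmed by this issue.
ALTER SESSION SET `planner.statistics.use` = false;  -- statistics disabled (the slow case reported here)
-- run TPCH query 4 against the sf 1000 dataset
ALTER SESSION SET `planner.statistics.use` = true;   -- statistics enabled (baseline)
{noformat}

Comparing the two resulting query profiles (the attached .sys.drill files) at the HashAgg operator is what surfaced the bucket-count and memory difference described above.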
Sorabh Hamirwasia updated DRILL-7154:
    Labels: ready-to-commit  (was: )
Robert Hou updated DRILL-7154:
    Priority: Blocker  (was: Critical)
Robert Hou updated DRILL-7154:
    Attachment: hashagg.nostats.data.log
                hashagg.stats.disabled.data.log
Robert Hou updated DRILL-7154:
    Attachment: (was: hashagg.nostats.log)
Robert Hou updated DRILL-7154:
    Attachment: (was: hashagg.stats.disabled.log)
Robert Hou updated DRILL-7154:
    Attachment: hashagg.nostats.foreman.log
                hashagg.stats.disabled.foreman.log
[jira] [Updated] (DRILL-7154) TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
[ https://issues.apache.org/jira/browse/DRILL-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-7154: -- Summary: TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled (was: TPCH query 4 and 17 take longer with sf 1000 when Statistics are disabled) > TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled > - > > Key: DRILL-7154 > URL: https://issues.apache.org/jira/browse/DRILL-7154 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.16.0 >Reporter: Robert Hou >Assignee: Boaz Ben-Zvi >Priority: Critical > Fix For: 1.16.0 > > Attachments: 235a3ed4-e3d1-f3b7-39c5-fc947f56b6d5.sys.drill, > 235a471b-aa97-bfb5-207d-3f25b4b5fbbb.sys.drill, hashagg.nostats.log, > hashagg.stats.disabled.log
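The digest does not show how statistics were toggled between the two runs. A minimal sketch of the likely repro steps, assuming Drill 1.16's `ANALYZE TABLE ... COMPUTE STATISTICS` command and the `planner.statistics.use` session option; the storage-plugin path `dfs.tpch1000` is a placeholder, not the reporter's actual setup:

```sql
-- Collect table statistics so the planner can use them (path is hypothetical):
ANALYZE TABLE dfs.tpch1000.`orders` COMPUTE STATISTICS;
ANALYZE TABLE dfs.tpch1000.`lineitem` COMPUTE STATISTICS;

-- Disable statistics for the session to reproduce the slower "new" plans:
ALTER SESSION SET `planner.statistics.use` = false;

-- Re-enable statistics for the comparison run:
ALTER SESSION SET `planner.statistics.use` = true;
```

With the option off, the planner falls back to its default row-count estimates, which is presumably what drives the different HashAgg sizing observed above.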