logan-keede commented on issue #17261: URL: https://github.com/apache/datafusion/issues/17261#issuecomment-3503076679
```
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortPreservingMergeExec: [c0@0 ASC NULLS LAST, c1@1
ASC NULLS LAST, c2@2 ASC NULLS LAST, c3@3 ASC NULLS LAST, c4@4 ASC NULLS LAST,
c5@5 ASC NULLS LAST, c6@6 ASC NULLS LAST, c7@7 ASC NULLS LAST, c8@8 ASC NULLS
LAST, c9@9 ASC NULLS LAST], metrics=[output_rows=10000,
elapsed_compute=5.645754ms, output_bytes=1562.5 KB]
|
| | UnionExec, metrics=[output_rows=10000,
elapsed_compute=665.791µs, output_bytes=1562.5 KB]
|
| | SortExec: expr=[c0@0 ASC NULLS LAST, c1@1 ASC
NULLS LAST, c2@2 ASC NULLS LAST, c3@3 ASC NULLS LAST, c4@4 ASC NULLS LAST, c5@5
ASC NULLS LAST, c6@6 ASC NULLS LAST, c7@7 ASC NULLS LAST, c8@8 ASC NULLS LAST],
preserve_partitioning=[true], metrics=[output_rows=9000,
elapsed_compute=12.662881ms, output_bytes=1406.2 KB, spill_count=0,
spilled_bytes=0.0 B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[c0@0 as c0, c1@1 as c1,
c2@2 as c2, c3@3 as c3, c4@4 as c4, c5@5 as c5, c6@6 as c6, c7@7 as c7, c8@8 as
c8, CAST(c9@9 AS Decimal128(20, 0)) as c9], metrics=[output_rows=9000,
elapsed_compute=838.677µs, output_bytes=1406.2 KB]
|
| | RepartitionExec:
partitioning=RoundRobinBatch(12), input_partitions=9, metrics=[spill_count=0,
spilled_bytes=0.0 B, spilled_rows=0, fetch_time=8.041255ms,
repartition_time=9ns, send_time=44.691µs]
|
| | UnionExec, metrics=[output_rows=9000,
elapsed_compute=162.148µs, output_bytes=1335.9 KB]
|
| | ProjectionExec: expr=[c0@0 as c0, c1@1 as
c1, c2@2 as c2, c3@3 as c3, c4@4 as c4, c5@5 as c5, c6@6 as c6, c7@7 as c7,
CAST(c8@8 AS Decimal128(20, 0)) as c8, c9@9 as c9], metrics=[output_rows=8000,
elapsed_compute=733.787µs, output_bytes=1187.5 KB]
|
| | UnionExec, metrics=[output_rows=8000,
elapsed_compute=132.399µs, output_bytes=1125.0 KB]
|
| | ProjectionExec: expr=[c0@0 as c0, c1@1
as c1, c2@2 as c2, c3@3 as c3, c4@4 as c4, c5@5 as c5, c6@6 as c6, CAST(c7@7 AS
Decimal128(20, 0)) as c7, c8@8 as c8, c9@9 as c9], metrics=[output_rows=7000,
elapsed_compute=687.809µs, output_bytes=984.4 KB]
|
| | UnionExec,
metrics=[output_rows=7000, elapsed_compute=111.441µs, output_bytes=929.7 KB]
|
| | ProjectionExec: expr=[c0@0 as c0,
c1@1 as c1, c2@2 as c2, c3@3 as c3, c4@4 as c4, c5@5 as c5, CAST(c6@6 AS
Decimal128(20, 0)) as c6, c7@7 as c7, c8@8 as c8, c9@9 as c9],
metrics=[output_rows=6000, elapsed_compute=642.729µs, output_bytes=796.9 KB]
|
| | UnionExec,
metrics=[output_rows=6000, elapsed_compute=91.198µs, output_bytes=750.0 KB]
|
| | ProjectionExec: expr=[c0@0 as
c0, c1@1 as c1, c2@2 as c2, c3@3 as c3, c4@4 as c4, CAST(c5@5 AS Decimal128(20,
0)) as c5, c6@6 as c6, c7@7 as c7, c8@8 as c8, c9@9 as c9],
metrics=[output_rows=5000, elapsed_compute=569.976µs, output_bytes=625.0 KB]
|
| | UnionExec,
metrics=[output_rows=5000, elapsed_compute=72.586µs, output_bytes=585.9 KB]
|
| | ProjectionExec: expr=[c0@0
as c0, c1@1 as c1, c2@2 as c2, c3@3 as c3, CAST(c4@4 AS Decimal128(20, 0)) as
c4, c5@5 as c5, c6@6 as c6, c7@7 as c7, c8@8 as c8, c9@9 as c9],
metrics=[output_rows=4000, elapsed_compute=456.798µs, output_bytes=468.8 KB]
|
| | UnionExec,
metrics=[output_rows=4000, elapsed_compute=42.078µs, output_bytes=437.5 KB]
|
| | ProjectionExec:
expr=[c0@0 as c0, c1@1 as c1, c2@2 as c2, CAST(c3@3 AS Decimal128(20, 0)) as
c3, c4@4 as c4, c5@5 as c5, c6@6 as c6, c7@7 as c7, c8@8 as c8, c9@9 as c9],
metrics=[output_rows=3000, elapsed_compute=343.243µs, output_bytes=328.1 KB]
|
| | UnionExec,
metrics=[output_rows=3000, elapsed_compute=31.691µs, output_bytes=304.7 KB]
|
| | ProjectionExec:
expr=[c0@0 as c0, c1@1 as c1, CAST(c2@2 AS Decimal128(20, 0)) as c2, c3@3 as
c3, c4@4 as c4, c5@5 as c5, c6@6 as c6, c7@7 as c7, c8@8 as c8, c9@9 as c9],
metrics=[output_rows=2000, elapsed_compute=227.115µs, output_bytes=203.1 KB]
|
| | UnionExec,
metrics=[output_rows=2000, elapsed_compute=18.204µs, output_bytes=187.5 KB]
|
| |
ProjectionExec: expr=[CAST(c0@0 AS Decimal128(20, 0)) as c0, CAST(1 AS
Decimal128(20, 0)) as c1, 2 as c2, 3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7,
8 as c8, 9 as c9], metrics=[output_rows=1000, elapsed_compute=331.999µs,
output_bytes=93.8 KB]
|
| |
DataSourceExec: partitions=1, partition_sizes=[1], output_ordering=c0@0 ASC,
metrics=[]
|
| |
ProjectionExec: expr=[CAST(0 AS Decimal128(20, 0)) as c0, CAST(c1@0 AS
Decimal128(20, 0)) as c1, 2 as c2, 3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7,
8 as c8, 9 as c9], metrics=[output_rows=1000, elapsed_compute=235.608µs,
output_bytes=93.8 KB]
|
| |
DataSourceExec: partitions=1, partition_sizes=[1], metrics=[]
|
| | ProjectionExec:
expr=[CAST(0 AS Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1,
CAST(c2@0 AS Decimal128(20, 0)) as c2, 3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as
c7, 8 as c8, 9 as c9], metrics=[output_rows=1000, elapsed_compute=274.968µs,
output_bytes=101.6 KB]
|
| | DataSourceExec:
partitions=1, partition_sizes=[1], metrics=[]
|
| | ProjectionExec:
expr=[CAST(0 AS Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1,
CAST(2 AS Decimal128(20, 0)) as c2, CAST(c3@0 AS Decimal128(20, 0)) as c3, 4 as
c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, 9 as c9], metrics=[output_rows=1000,
elapsed_compute=278.882µs, output_bytes=109.4 KB]
|
| | DataSourceExec:
partitions=1, partition_sizes=[1], metrics=[]
|
| | ProjectionExec:
expr=[CAST(0 AS Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1,
CAST(2 AS Decimal128(20, 0)) as c2, CAST(3 AS Decimal128(20, 0)) as c3,
CAST(c4@0 AS Decimal128(20, 0)) as c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, 9 as
c9], metrics=[output_rows=1000, elapsed_compute=357.672µs, output_bytes=117.2
KB]
|
| | DataSourceExec:
partitions=1, partition_sizes=[1], metrics=[]
|
| | ProjectionExec: expr=[CAST(0
AS Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1, CAST(2 AS
Decimal128(20, 0)) as c2, CAST(3 AS Decimal128(20, 0)) as c3, CAST(4 AS
Decimal128(20, 0)) as c4, CAST(c5@0 AS Decimal128(20, 0)) as c5, 6 as c6, 7 as
c7, 8 as c8, 9 as c9], metrics=[output_rows=1000, elapsed_compute=290.957µs,
output_bytes=125.0 KB]
|
| | DataSourceExec:
partitions=1, partition_sizes=[1], metrics=[]
|
| | ProjectionExec: expr=[CAST(0 AS
Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1, CAST(2 AS
Decimal128(20, 0)) as c2, CAST(3 AS Decimal128(20, 0)) as c3, CAST(4 AS
Decimal128(20, 0)) as c4, CAST(5 AS Decimal128(20, 0)) as c5, CAST(c6@0 AS
Decimal128(20, 0)) as c6, 7 as c7, 8 as c8, 9 as c9],
metrics=[output_rows=1000, elapsed_compute=350.579µs, output_bytes=132.8 KB]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | ProjectionExec: expr=[CAST(0 AS
Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1, CAST(2 AS
Decimal128(20, 0)) as c2, CAST(3 AS Decimal128(20, 0)) as c3, CAST(4 AS
Decimal128(20, 0)) as c4, CAST(5 AS Decimal128(20, 0)) as c5, CAST(6 AS
Decimal128(20, 0)) as c6, CAST(c7@0 AS Decimal128(20, 0)) as c7, 8 as c8, 9 as
c9], metrics=[output_rows=1000, elapsed_compute=215.061µs, output_bytes=140.6
KB] |
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | ProjectionExec: expr=[CAST(0 AS
Decimal128(20, 0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1, CAST(2 AS
Decimal128(20, 0)) as c2, CAST(3 AS Decimal128(20, 0)) as c3, CAST(4 AS
Decimal128(20, 0)) as c4, CAST(5 AS Decimal128(20, 0)) as c5, CAST(6 AS
Decimal128(20, 0)) as c6, CAST(7 AS Decimal128(20, 0)) as c7, CAST(c8@0 AS
Decimal128(20, 0)) as c8, 9 as c9], metrics=[output_rows=1000,
elapsed_compute=310.342µs, output_bytes=148.4 KB] |
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c9@9 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=1000,
elapsed_compute=381.008µs, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[CAST(0 AS Decimal128(20,
0)) as c0, CAST(1 AS Decimal128(20, 0)) as c1, CAST(2 AS Decimal128(20, 0)) as
c2, CAST(3 AS Decimal128(20, 0)) as c3, CAST(4 AS Decimal128(20, 0)) as c4,
CAST(5 AS Decimal128(20, 0)) as c5, CAST(6 AS Decimal128(20, 0)) as c6, CAST(7
AS Decimal128(20, 0)) as c7, CAST(8 AS Decimal128(20, 0)) as c8, CAST(c9@0 AS
Decimal128(20, 0)) as c9], metrics=[output_rows=1000,
elapsed_compute=243.051µs, output_bytes=156.2 KB] |
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| |
|
+-------------------+------------------------------------------------------------------------------------------------------------------
```
This is the result I get by using `EXPLAIN ANALYZE` with `ctx.sql` where
`ctx` is an instance of `SessionContext` , on the other hand using
datafusion-cli I get:-
```
ORDER BY c0, c1, c2, c3, c4, c5, c6, c7, c8, c9;
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortPreservingMergeExec: [c0@0 ASC NULLS LAST, c1@1
ASC NULLS LAST, c2@2 ASC NULLS LAST, c3@3 ASC NULLS LAST, c4@4 ASC NULLS LAST,
c5@5 ASC NULLS LAST, c6@6 ASC NULLS LAST, c7@7 ASC NULLS LAST, c8@8 ASC NULLS
LAST, c9@9 ASC NULLS LAST], metrics=[output_rows=30, elapsed_compute=202.88µs,
output_bytes=2.3 KB] |
| | UnionExec, metrics=[output_rows=30,
elapsed_compute=1.117171ms, output_bytes=2.3 KB]
|
| | SortExec: expr=[c0@0 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=1.712165ms, output_bytes=0.0 B, spill_count=0,
spilled_bytes=0.0 B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[CAST(c0@0 AS Int64) as c0,
1 as c1, 2 as c2, 3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=79.9µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c1@1 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=1.244665ms, output_bytes=0.0 B, spill_count=0,
spilled_bytes=0.0 B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, CAST(c1@0 AS
Int64) as c1, 2 as c2, 3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, 9
as c9], metrics=[output_rows=3, elapsed_compute=86.887µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c2@2 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=1.217714ms, output_bytes=0.0 B, spill_count=0,
spilled_bytes=0.0 B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1,
CAST(c2@0 AS Int64) as c2, 3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7, 8 as
c8, 9 as c9], metrics=[output_rows=3, elapsed_compute=59.552µs,
output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c3@3 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=136.622µs, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
CAST(c3@0 AS Int64) as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=90.253µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c4@4 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=1.248026ms, output_bytes=0.0 B, spill_count=0,
spilled_bytes=0.0 B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
3 as c3, CAST(c4@0 AS Int64) as c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=100.374µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c5@5 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=1.01873ms, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
3 as c3, 4 as c4, CAST(c5@0 AS Int64) as c5, 6 as c6, 7 as c7, 8 as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=70.435µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c6@6 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=512.958µs, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
3 as c3, 4 as c4, 5 as c5, CAST(c6@0 AS Int64) as c6, 7 as c7, 8 as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=64.86µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c7@7 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=584.371µs, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
3 as c3, 4 as c4, 5 as c5, 6 as c6, CAST(c7@0 AS Int64) as c7, 8 as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=64.695µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c8@8 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=161.32µs, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7, CAST(c8@0 AS Int64) as c8, 9 as
c9], metrics=[output_rows=3, elapsed_compute=72.97µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| | SortExec: expr=[c9@9 ASC NULLS LAST],
preserve_partitioning=[false], metrics=[output_rows=3,
elapsed_compute=48.301µs, output_bytes=0.0 B, spill_count=0, spilled_bytes=0.0
B, spilled_rows=0, batches_split=0]
|
| | ProjectionExec: expr=[0 as c0, 1 as c1, 2 as c2,
3 as c3, 4 as c4, 5 as c5, 6 as c6, 7 as c7, 8 as c8, CAST(c9@0 AS Int64) as
c9], metrics=[output_rows=3, elapsed_compute=24.715µs, output_bytes=280.0 B]
|
| | DataSourceExec: partitions=1,
partition_sizes=[1], metrics=[]
|
| |
|
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
Which is much cleaner and I believe that's what @alamb is referring to in
the above comment.
Is this difference between `ctx.sql` and `datafusion-cli` expected?
It seems like benchmarks is skipping over some optimizations somehow.
Perhaps making sure that benchmarks and datafusion-cli execute queries in
the same way will be enough to solve this issue. WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
