alamb commented on code in PR #20015:
URL: https://github.com/apache/datafusion/pull/20015#discussion_r2731983401
##########
datafusion/sqllogictest/test_files/projection_pushdown.slt:
##########
@@ -361,13 +361,58 @@ SELECT id, s['value'] FROM simple_struct ORDER BY s['value'];
5 250
4 300
+###
+# Test 4.4: Projection with duplicate column through Sort
+# The projection expands the logical output (3→4 columns) but reduces physical columns
+# since the duplicate column reuses an existing source column.
+###
+
+statement ok
+COPY (
+ SELECT
+ column1 as col_a,
+ column2 as col_b,
+ column3 as col_c
+ FROM VALUES
+ (1, 2, 3),
+ (4, 5, 6),
+ (7, 8, 9)
+) TO 'test_files/scratch/projection_pushdown/three_cols.parquet'
+STORED AS PARQUET;
+
+statement ok
+CREATE EXTERNAL TABLE three_cols STORED AS PARQUET
+LOCATION 'test_files/scratch/projection_pushdown/three_cols.parquet';
+
+query TT
+EXPLAIN SELECT col_a, col_b, col_c, col_b as col_b_dup FROM three_cols ORDER BY col_a;
+----
+logical_plan
+01)Sort: three_cols.col_a ASC NULLS LAST
+02)--Projection: three_cols.col_a, three_cols.col_b, three_cols.col_c, three_cols.col_b AS col_b_dup
+03)----TableScan: three_cols projection=[col_a, col_b, col_c]
+physical_plan
+01)SortExec: expr=[col_a@0 ASC NULLS LAST], preserve_partitioning=[false]
+02)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/projection_pushdown/three_cols.parquet]]}, projection=[col_a, col_b, col_c, col_b@1 as col_b_dup], file_type=parquet
+
+# Verify correctness
+query IIII
+SELECT col_a, col_b, col_c, col_b as col_b_dup FROM three_cols ORDER BY col_a;
Review Comment:
Given that this data was inserted in `col_a` order, it might be better to check
a different sort order (e.g. `ORDER BY col_a DESC`), so that the result actually
depends on the sort.
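A rough sketch of what that check could look like, assuming the three rows written by the `COPY` above are the only contents of `three_cols`. The expected rows here are derived by hand from those values, not taken from a test run:

```
# Verify correctness with a sort order that differs from insertion order
query IIII
SELECT col_a, col_b, col_c, col_b as col_b_dup FROM three_cols ORDER BY col_a DESC;
----
7 8 9 8
4 5 6 5
1 2 3 2
```

The `EXPLAIN` query would presumably need the same `DESC` change if the intent is for the plan and the correctness check to cover the same query.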
##########
datafusion/sqllogictest/test_files/projection_pushdown.slt:
##########
@@ -361,13 +361,58 @@ SELECT id, s['value'] FROM simple_struct ORDER BY s['value'];
5 250
4 300
+###
+# Test 4.4: Projection with duplicate column through Sort
+# The projection expands the logical output (3→4 columns) but reduces physical columns
Review Comment:
I don't understand the comment "reduces physical columns", as there are three
physical columns and all three are scanned 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]