alamb commented on code in PR #14273:
URL: https://github.com/apache/datafusion/pull/14273#discussion_r1929724090
##########
datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part:
##########
@@ -31,13 +31,13 @@ logical_plan
01)Projection: sum(lineitem.l_extendedprice * lineitem.l_discount) AS revenue
02)--Aggregate: groupBy=[[]], aggr=[[sum(lineitem.l_extendedprice * lineitem.l_discount)]]
03)----Projection: lineitem.l_extendedprice, lineitem.l_discount
-04)------Filter: lineitem.l_shipdate >= Date32("1994-01-01") AND lineitem.l_shipdate < Date32("1995-01-01") AND lineitem.l_discount >= Decimal128(Some(5),15,2) AND lineitem.l_discount <= Decimal128(Some(7),15,2) AND lineitem.l_quantity < Decimal128(Some(2400),15,2)
-05)--------TableScan: lineitem projection=[l_quantity, l_extendedprice, l_discount, l_shipdate], partial_filters=[lineitem.l_shipdate >= Date32("1994-01-01"), lineitem.l_shipdate < Date32("1995-01-01"), lineitem.l_discount >= Decimal128(Some(5),15,2), lineitem.l_discount <= Decimal128(Some(7),15,2), lineitem.l_quantity < Decimal128(Some(2400),15,2)]
+04)------Filter: lineitem.l_shipdate >= Date32("1994-01-01") AND lineitem.l_shipdate < Date32("1995-01-01") AND CAST(lineitem.l_discount AS Float64) >= Float64(0.049999999999999996) AND CAST(lineitem.l_discount AS Float64) <= Float64(0.06999999999999999) AND lineitem.l_quantity < Decimal128(Some(2400),15,2)
Review Comment:
This will likely cause a performance regression: it casts the entire
`lineitem.l_discount` column to Float64 before the comparison, whereas
previously the column could be compared directly against a Decimal128 constant.
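
To make the concern concrete, here is a minimal sketch in plain Rust (no DataFusion or Arrow APIs) of the two filter shapes in the plan above. It assumes a Decimal128(15, 2) column is stored as unscaled `i128` values and that a cast to Float64 amounts to `unscaled / 10^scale`; the function names are hypothetical and this is not a benchmark of the actual kernels.

```rust
/// Decimal path: compare the unscaled integers directly against
/// bounds that were scaled once, at planning time.
fn count_decimal(unscaled: &[i128], lo: i128, hi: i128) -> usize {
    unscaled.iter().filter(|&&v| v >= lo && v <= hi).count()
}

/// Float path: every row is converted to f64 before it is compared.
/// Assumption: the cast behaves like `unscaled / 10^scale`.
fn count_via_f64(unscaled: &[i128], scale: u32, lo: f64, hi: f64) -> usize {
    let divisor = 10_i128.pow(scale) as f64;
    unscaled
        .iter()
        .map(|&v| v as f64 / divisor) // extra per-row work
        .filter(|&x| x >= lo && x <= hi)
        .count()
}

fn main() {
    // l_discount values 0.04 through 0.08 as unscaled Decimal128(15, 2).
    let discounts: [i128; 5] = [4, 5, 6, 7, 8];

    // Bounds taken from the old plan: Decimal128(Some(5),15,2) and Decimal128(Some(7),15,2).
    let exact = count_decimal(&discounts, 5, 7);

    // Bounds taken from the new plan's Float64 literals.
    let cast = count_via_f64(&discounts, 2, 0.049999999999999996, 0.06999999999999999);

    println!("decimal comparison keeps {exact} rows, float comparison keeps {cast} rows");
}
```

Besides the per-row conversion, the sketch keeps three rows on the decimal path but only two on the float path, because 0.07 as an f64 is slightly larger than 0.06999999999999999. Whether the real plan behaves the same way depends on how the cast is implemented.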
##########
datafusion/sqllogictest/test_files/tpch/plans/q11.slt.part:
##########
@@ -49,7 +49,7 @@ limit 10;
logical_plan
01)Sort: value DESC NULLS FIRST, fetch=10
02)--Projection: partsupp.ps_partkey, sum(partsupp.ps_supplycost * partsupp.ps_availqty) AS value
-03)----Inner Join: Filter: CAST(sum(partsupp.ps_supplycost * partsupp.ps_availqty) AS Decimal128(38, 15)) > __scalar_sq_1.sum(partsupp.ps_supplycost * partsupp.ps_availqty) * Float64(0.0001)
+03)----Inner Join: Filter: CAST(sum(partsupp.ps_supplycost * partsupp.ps_availqty) AS Float64) > __scalar_sq_1.sum(partsupp.ps_supplycost * partsupp.ps_availqty) * Float64(0.0001)
Review Comment:
I vaguely remember the use of Decimal here was important for TPCH results
(maybe for correctness or something 🤔 )
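
One general reason decimal arithmetic can matter for results like these, sketched in plain Rust: sums over f64 values accumulate rounding error, while sums over unscaled decimal integers are exact. This only illustrates the floating-point behavior; it is not a reproduction of q11, and the helper names are hypothetical.

```rust
/// Sums values stored as f64; each addition may round.
fn sum_f64(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Sums unscaled decimal values (the caller fixes the scale);
/// integer addition is exact as long as it does not overflow.
fn sum_unscaled(values: &[i128]) -> i128 {
    values.iter().sum()
}

fn main() {
    let as_float = sum_f64(&[0.1; 10]);
    let as_decimal = sum_unscaled(&[10; 10]); // ten times 0.10 at scale 2

    println!("f64 sum equals 1.0: {}", as_float == 1.0);
    println!("decimal sum equals 1.00: {}", as_decimal == 100);
}
```

A comparison such as `sum > threshold` can therefore flip for values near the threshold once both sides are computed in Float64.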
##########
datafusion/core/tests/parquet/mod.rs:
##########
@@ -184,7 +184,13 @@ impl TestOutput {
/// and the appropriate scenario
impl ContextWithParquet {
async fn new(scenario: Scenario, unit: Unit) -> Self {
- Self::with_config(scenario, unit, SessionConfig::new()).await
+ let mut session_config = SessionConfig::new();
+ // TODO (https://github.com/apache/datafusion/issues/12817) once this is the default behavior, remove from here
Review Comment:
Does this mean that DataFusion will no longer prune predicates like
`decimal_col = 5.0`?
If so, this seems like a significant regression for anyone who relies on
decimal types (Comet, for example).
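
A toy model of the question, in plain Rust: statistics-based pruning can skip a row group when a literal falls outside the column's min/max range, but only if the pruner recognizes the predicate's shape. Everything here is hypothetical (`RowGroupStats`, `Predicate`, `can_prune`), and the assumption that a pruner does not look through a cast on the column is exactly the open question, not a statement about DataFusion's actual behavior.

```rust
/// Min/max statistics for one row group, as unscaled Decimal128(_, 2) values.
struct RowGroupStats {
    min: i128,
    max: i128,
}

/// Hypothetical, heavily simplified predicate shapes.
enum Predicate {
    /// `decimal_col = <decimal literal>` (literal already unscaled)
    ColEqDecimal(i128),
    /// `CAST(decimal_col AS Float64) = <float literal>`
    CastColEqFloat(f64),
}

/// Returns true when the row group can be skipped without reading it.
fn can_prune(stats: &RowGroupStats, pred: &Predicate) -> bool {
    match pred {
        // The literal is comparable with the statistics, so the range check applies.
        Predicate::ColEqDecimal(v) => *v < stats.min || *v > stats.max,
        // Assumption: this toy pruner does not look through the cast,
        // so it conservatively keeps the row group.
        Predicate::CastColEqFloat(_) => false,
    }
}

fn main() {
    // Row group covering 1.00 through 4.00.
    let stats = RowGroupStats { min: 100, max: 400 };

    let direct = Predicate::ColEqDecimal(500); // decimal_col = 5.00
    let through_cast = Predicate::CastColEqFloat(5.0);

    println!("direct comparison prunes: {}", can_prune(&stats, &direct));
    println!("cast comparison prunes: {}", can_prune(&stats, &through_cast));
}
```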
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]