yyanyy commented on code in PR #55252:
URL: https://github.com/apache/spark/pull/55252#discussion_r3054447715


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala:
##########
@@ -197,7 +206,8 @@ case class DataSourceV2ScanRelation(
       ),
       ordering = ordering.map(
         _.map(o => o.copy(child = QueryPlan.normalizeExpressions(o.child, 
output)))
-      )
+      ),
+      pushedFilters = pushedFilters.map(QueryPlan.normalizeExpressions(_, 
output))

Review Comment:
   I debated with myself a bit on this and decided to not include pushed 
filters whose references are no longer in the pruned output.
   
   The reason for me to include them before, is that I was a bit worried that a 
fully pushed filter can be lost from `pushedFilter` when the filtered column is 
not projected:
   
   > `SELECT j FROM t WHERE i > 3` with a connector that pushes both 
GreaterThan and IsNotNull:
   
   1. After analysis: `Project([j], Filter(i > 3, DataSourceV2Relation([i, 
j])))`
   2. InferFiltersFromConstraints (runs before V2ScanRelationPushDown): derives 
`IsNotNull(i) → Project([j], Filter(i > 3 AND IsNotNull(i), 
DataSourceV2Relation([i, j])))`
   3. pushDownFilters: both `i > 3` and `IsNotNull(i)` fully pushed → Filter 
node removed → `Project([j], ScanBuilderHolder), pushedFilterExpressions = [i > 
3]`
   4. pruneColumns: only j is referenced → `output = [j]`, column `i` pruned → 
pushed filter `i > 3` references pruned `i` → dropped
   
   And I was a bit worried since part of the intention for this field was to 
keep what guarantee the connector can make, and now we couldn't have the 
complete information.
   
   But for now, I think this is acceptable since 1/ pushedFilters is 
informational only; a potential future improvement could be to keep fully 
pushed filters as post-scan Filter nodes, which would prevent the column from 
being pruned in the first place, and 2/ it appears that spark plan validator 
will also catch and reject this case when it realizes that some references in 
expressions is not resolvable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to