Re: [PR] [SPARK-36783][SQL] ScanOperation should not push Filter through nondeterministic Project [spark]

2024-05-15 Thread via GitHub


wForget commented on PR #34023:
URL: https://github.com/apache/spark/pull/34023#issuecomment-2114043502

   > Yes, because after pruning, less data are scanned and the non-deterministic function in the SELECT list may return different results if we don't prune ahead.
   
   Makes sense, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-36783][SQL] ScanOperation should not push Filter through nondeterministic Project [spark]

2024-05-15 Thread via GitHub


cloud-fan commented on PR #34023:
URL: https://github.com/apache/spark/pull/34023#issuecomment-2114040674

   Yes, because after pruning, less data are scanned and the non-deterministic function in the SELECT list may return different results if we don't prune ahead.
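   A minimal Python sketch of the point above. A seeded `random.Random` that is advanced once per input row stands in for a non-deterministic SELECT-list expression. The table, column names, and helper functions are hypothetical and illustrative; this is not Spark code.

   ```python
   import random

   # Illustrative rows (c1, dt); hypothetical data, not from the thread.
   ROWS = [(1, "2024-05-14"), (2, "2024-05-15"), (3, "2024-05-14"), (4, "2024-05-15")]


   def project_with_rand(rows, seed=42):
       """Hypothetical non-deterministic Project: SELECT c1, <rand> AS r, dt.

       The generator advances once per input row, so the value a row receives
       depends on how many rows were evaluated before it.
       """
       rng = random.Random(seed)
       return [(c1, rng.random(), dt) for c1, dt in rows]


   def filter_dt(rows, day):
       """Filter on dt, which is the last field in both row shapes."""
       return [row for row in rows if row[-1] == day]


   # Plan as written: Filter above the non-deterministic Project.
   filter_above = filter_dt(project_with_rand(ROWS), "2024-05-15")

   # Filter pushed below the Project (what the PR stops ScanOperation from doing).
   filter_below = project_with_rand(filter_dt(ROWS, "2024-05-15"))

   if __name__ == "__main__":
       print([(c1, round(r, 3)) for c1, r, _ in filter_above])
       print([(c1, round(r, 3)) for c1, r, _ in filter_below])
   ```

   Both plans keep the same rows (`c1` = 2 and 4), but the generated `r` values differ. Once fewer rows are scanned, each surviving row sees a different generator state, so pushing the filter down is not a semantics-preserving rewrite.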





Re: [PR] [SPARK-36783][SQL] ScanOperation should not push Filter through nondeterministic Project [spark]

2024-05-15 Thread via GitHub


wForget commented on PR #34023:
URL: https://github.com/apache/spark/pull/34023#issuecomment-2113995703

   @cloud-fan This change also affects `PhysicalOperation`, so Hive partition pruning no longer takes effect when there is a non-deterministic `Project`. Is this expected behavior?
   
   For example:
   ```sql
   select *
   from (
     select
       c1,
       reflect('java.net.URLDecoder', 'decode', c2, 'UTF-8') as c2,
       dt
     from test
   ) t1
   where dt = '2024-05-15'
   limit 10;
   ```
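   A simplified model of why the `dt` predicate in this query does not reach the scan. It assumes the guard is evaluated over the whole project list, which matches the behavior described here. `ProjectItem` and `filter_can_be_pushed` are hypothetical names, not Spark's `ScanOperation`/`PhysicalOperation` code.

   ```python
   from dataclasses import dataclass


   @dataclass(frozen=True)
   class ProjectItem:
       """Hypothetical model of one SELECT-list entry."""
       name: str
       deterministic: bool


   def filter_can_be_pushed(project_list):
       """Sketch of the rule in the PR title: a Filter may move below a Project
       only if every projected expression is deterministic. The columns the
       predicate references (here only dt) are not consulted."""
       return all(item.deterministic for item in project_list)


   # Project list of the query above; c2 is the reflect(...) call, treated as
   # non-deterministic as described in the question.
   query_project = [
       ProjectItem("c1", True),
       ProjectItem("c2", False),
       ProjectItem("dt", True),
   ]

   # The same projection without the reflect call.
   plain_project = [
       ProjectItem("c1", True),
       ProjectItem("c2", True),
       ProjectItem("dt", True),
   ]

   if __name__ == "__main__":
       print(filter_can_be_pushed(query_project))
       print(filter_can_be_pushed(plain_project))
   ```

   Although `dt` passes through the subquery unchanged, the single non-deterministic entry keeps the whole filter above the `Project` in this model. The filter therefore never sits next to the scan, where partition pruning would apply.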

