ahshahid commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1811946724

   @dongjoon-hyun I think the exchange-reuse issue went uncaught because of a 
combination of factors:
   1) Spark is not tested against any concrete DataSourceV2 implementation 
(such as Iceberg).
   2) The simulation of a DataSourceV2 implementation using InMemoryTableScan 
is buggy: its equals / hashCode do not take pushed runtime filters into 
account. As a result, an exchange-reuse bug would not be caught (i.e. a 
mismatch between cached exchange plans would go undetected, giving false 
assurance of reuse).
   3) If I am not wrong, the TPCDS tests are run using Hive as the DataSource, 
and I am not sure whether it supports pushdown of runtime filters.
   4) The bug in AQE only shows up in TPCDS if the tables are partitioned and 
the equi-join involves a partitioning column. I am not sure whether the TPCDS 
tests currently use partitioned tables.
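
To illustrate point 2, here is a minimal sketch in plain Java of how an equals / hashCode that ignores pushed runtime filters can hide a reuse mismatch. All names (`ScanEqualitySketch`, `BuggyScan`, `FixedScan`, `reuseOrRegister`, the table and filter strings) are hypothetical stand-ins and not Spark classes; the `HashMap` only approximates a reuse cache keyed by plan equality, which is an assumption about the mechanism rather than Spark's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ScanEqualitySketch {

    // Hypothetical scan node whose equality ignores the pushed runtime filters.
    static final class BuggyScan {
        final String table;
        final List<String> runtimeFilters;

        BuggyScan(String table, List<String> runtimeFilters) {
            this.table = table;
            this.runtimeFilters = runtimeFilters;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BuggyScan && ((BuggyScan) o).table.equals(table);
        }

        @Override
        public int hashCode() {
            return table.hashCode();
        }
    }

    // Same shape, but the runtime filters take part in equals and hashCode.
    static final class FixedScan {
        final String table;
        final List<String> runtimeFilters;

        FixedScan(String table, List<String> runtimeFilters) {
            this.table = table;
            this.runtimeFilters = runtimeFilters;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FixedScan)) {
                return false;
            }
            FixedScan other = (FixedScan) o;
            return table.equals(other.table)
                && runtimeFilters.equals(other.runtimeFilters);
        }

        @Override
        public int hashCode() {
            return Objects.hash(table, runtimeFilters);
        }
    }

    // Approximates a reuse cache: an equal key returns the plan registered first.
    static <K> String reuseOrRegister(Map<K, String> cache, K scan, String planId) {
        String existing = cache.putIfAbsent(scan, planId);
        return existing != null ? existing : planId;
    }

    // True when two scans with different runtime filters share one cached plan.
    static boolean buggyReuses() {
        Map<BuggyScan, String> cache = new HashMap<>();
        reuseOrRegister(cache,
            new BuggyScan("store_sales", List.of("ss_sold_date_sk IN (1, 2)")), "exchange#1");
        String got = reuseOrRegister(cache,
            new BuggyScan("store_sales", List.of("ss_sold_date_sk IN (3)")), "exchange#2");
        return got.equals("exchange#1");
    }

    static boolean fixedReuses() {
        Map<FixedScan, String> cache = new HashMap<>();
        reuseOrRegister(cache,
            new FixedScan("store_sales", List.of("ss_sold_date_sk IN (1, 2)")), "exchange#1");
        String got = reuseOrRegister(cache,
            new FixedScan("store_sales", List.of("ss_sold_date_sk IN (3)")), "exchange#2");
        return got.equals("exchange#1");
    }

    public static void main(String[] args) {
        System.out.println("buggy scan reuses mismatched plan: " + buggyReuses());
        System.out.println("fixed scan reuses mismatched plan: " + fixedReuses());
    }
}
```

With the filter-blind equality, the second scan is handed the plan registered for the first one even though their filters differ, so a test that only checks "was the exchange reused" would pass for the wrong reason. Once the filters take part in equality, the two scans are treated as distinct.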
   
   Yes, I have been able to reproduce the issue using InMemoryTableScan as the 
DataSourceV2 implementation for the TPCDS tests. I will check in a prototype 
test that reproduces the bug using q14b; if needed, all the tests can be run.
   
   I ought to point out, though, that while running my test I also hit the 
issue of computeStats being called twice (which throws an error only in 
testing). I have not debugged that yet, and I am not sure whether the 
assertion that computeStats occurs only once is maintainable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

