ahshahid commented on PR #43808: URL: https://github.com/apache/spark/pull/43808#issuecomment-1811946724
@dongjoon-hyun I think the exchange-reuse issue went uncaught for a combination of reasons:

1) Spark is not tested against any concrete DataSourceV2 implementation (such as Iceberg).
2) The simulation of a DataSourceV2 implementation using InMemoryTableScan is buggy: its equals/hashCode do not take pushed runtime filters into account. As a result, an exchange-reuse bug would not be caught (i.e. a mismatch between cached exchange plans would go undetected, giving a false assurance of reuse).
3) If I am not wrong, the TPCDS tests run with Hive as the data source, and I am not sure whether it supports pushdown of runtime filters.
4) The bug in AQE only shows up in TPCDS if the tables are partitioned and the equi-join involves a partitioning column. I am not sure whether the TPCDS tests currently use partitioned tables.

Yes, I have been able to reproduce the issue using InMemoryTableScan as the DataSourceV2 implementation for the TPCDS tests. I will check in a prototype test that reproduces the bug using q14b; if needed, all the tests can be run.

I should point out that while running my test I also hit the issue of computeStats being called twice (which throws an error only in testing). I have not debugged that yet, and I am not sure the assertion that computeStats occurs only once is maintainable.
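
To make point 2 concrete, here is a toy sketch in plain Python. It is not Spark code: `BuggyScan`, `FixedScan`, `plan_exchanges` and the filter string are hypothetical names made up for illustration. It only shows, under that simplification, how an equality check that ignores runtime filters lets a reuse cache treat two differently filtered scans as the same plan.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class BuggyScan:
    """Hypothetical scan node whose equality ignores runtime filters."""
    table: str
    # compare=False excludes this field from __eq__ and __hash__,
    # mimicking an equals/hashCode that forgets pushed runtime filters.
    runtime_filters: Tuple[str, ...] = field(default=(), compare=False)


@dataclass(frozen=True)
class FixedScan:
    """Hypothetical scan node whose equality includes runtime filters."""
    table: str
    runtime_filters: Tuple[str, ...] = ()


def plan_exchanges(scans):
    """For each scan, return the cached scan whose exchange would be reused."""
    cache = {}
    # setdefault returns the first equal scan seen, i.e. the "reused" one.
    return [cache.setdefault(scan, scan) for scan in scans]


# Two scans of the same table; only the second has a pushed runtime filter.
pushed = ("ss_item_sk IN (1, 2)",)  # made-up filter text
buggy = plan_exchanges([BuggyScan("store_sales"), BuggyScan("store_sales", pushed)])
fixed = plan_exchanges([FixedScan("store_sales"), FixedScan("store_sales", pushed)])

print(buggy[1] is buggy[0])  # the filtered scan was matched to the unfiltered one
print(fixed[1] is fixed[0])  # the two scans are kept apart
```

With the buggy equality, the filtered scan silently resolves to the unfiltered one, so a test built on it cannot tell a correct reuse from a mismatched one. Including the filters in equality keeps the two plans distinct.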