Shekharrajak opened a new pull request, #3060: URL: https://github.com/apache/datafusion-comet/pull/3060
Ref https://github.com/apache/datafusion-comet/issues/3053

## Rationale for this change

Dynamic Partition Pruning (DPP) was previously disabled for native_datafusion scan due to subquery handling concerns. However, Spark's DPP mechanism already evaluates dynamic filters and provides pre-filtered partition lists via dynamicallySelectedPartitions. This PR enables DPP support, achieving up to 301x I/O reduction by leveraging Spark's existing DPP infrastructure with native scan.

## What changes are included in this PR?

* Remove DPP fallback in CometNativeScan.isSupported() - native scan now accepts DPP queries
* Add DPP auto-selection in CometScanRule.selectScan() - automatically selects native_datafusion for queries with dynamic pruning filters

## How are these changes tested?

Unit tests

* Add CometDPPSuite - test suite validating DPP with native scan
* Add CometDPPBenchmark - benchmark showing I/O reduction metrics

```
Implementation                     numOutputRows    Reduction Factor
--------------------------------------------------------------------------------
Spark (baseline)                   15,781,140       1.0x
Comet (auto scan)                  5,295,380        3.0x
Comet (native_datafusion + DPP)    52,500           301x
```

Run the benchmark using

```
SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometDPPBenchmark
# check the result:
cat spark/benchmarks/CometDPPBenchmark-jdk17-results.txt
```

<img width="946" height="914" alt="Screenshot 2026-01-10 at 1 25 43 AM" src="https://github.com/user-attachments/assets/596198c8-d676-4a93-b716-ef7eebdb502f" />
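For readers checking the numbers, the reduction factors in the benchmark table appear to be the baseline numOutputRows divided by each implementation's numOutputRows. That definition is an assumption inferred from the table, not taken from CometDPPBenchmark itself, and the helper name `reduction_factor` is hypothetical. A minimal Python sketch of the arithmetic:

```python
# Row counts copied from the benchmark table in this PR description.
baseline_rows = 15_781_140
scans = {
    "Spark (baseline)": 15_781_140,
    "Comet (auto scan)": 5_295_380,
    "Comet (native_datafusion + DPP)": 52_500,
}


def reduction_factor(baseline: int, rows: int) -> float:
    """Assumed definition: baseline output rows / this scan's output rows."""
    return baseline / rows


# Compute the factor for every implementation listed in the table.
factors = {name: reduction_factor(baseline_rows, rows) for name, rows in scans.items()}

for name, factor in factors.items():
    print(f"{name:<34}{factor:>8.1f}x")
```

Under this assumption the last ratio is about 300.6, which matches the 301x in the table after rounding to the nearest integer. Note that the metric counts output rows from the scan, so it is a proxy for I/O rather than a direct measurement of bytes read.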
