Shekharrajak opened a new pull request, #3060:
URL: https://github.com/apache/datafusion-comet/pull/3060

   
   Ref https://github.com/apache/datafusion-comet/issues/3053
   
   ## Rationale for this change
   
   Dynamic Partition Pruning (DPP) was previously disabled for the 
native_datafusion scan due to subquery handling concerns. However, Spark's DPP 
mechanism already evaluates dynamic filters and provides pre-filtered partition 
lists via dynamicallySelectedPartitions. This PR enables DPP support, achieving 
up to a 301x reduction in scan output rows (numOutputRows in the benchmark 
below) by leveraging Spark's existing DPP infrastructure with the native scan.
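
To make the mechanism above concrete, here is a minimal, self-contained Python sketch. It is a toy model, not Comet or Spark code: the fact table, the dimension rows, and the helper names (`dynamic_filter_keys`, `select_partitions`) are all hypothetical. The dimension-side filter is evaluated first, and only the partitions whose keys survive are handed to the scan, which is roughly the role the description assigns to dynamicallySelectedPartitions.

```python
# Toy model of dynamic partition pruning (DPP); illustrative only.
# Hypothetical fact table: partition name -> number of rows in that partition.
fact_partitions = {f"store_id={i}": 1000 for i in range(100)}

# Hypothetical dimension table rows: (store_id, country).
dim_rows = [(i, "NL" if i < 5 else "US") for i in range(100)]


def dynamic_filter_keys(dim_rows, country):
    """Evaluate the dimension-side filter and return the matching join keys."""
    return {store_id for store_id, c in dim_rows if c == country}


def select_partitions(partitions, keys):
    """Keep only partitions whose key value appears in the dynamic filter result."""
    return {
        name: rows
        for name, rows in partitions.items()
        if int(name.split("=", 1)[1]) in keys
    }


keys = dynamic_filter_keys(dim_rows, "NL")
selected = select_partitions(fact_partitions, keys)

# Rows a scan would have to read without and with pruning.
rows_without_dpp = sum(fact_partitions.values())
rows_with_dpp = sum(selected.values())
print(rows_without_dpp, rows_with_dpp)
```

In this toy setup the scan only has to open 5 of the 100 partitions. The point is that pruning happens before the scan starts, so a scan that simply accepts the pre-filtered partition list gets the benefit without evaluating the subquery itself.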
   
   ## What changes are included in this PR?
   
   * Remove DPP fallback in CometNativeScan.isSupported() - native scan now 
accepts DPP queries
   * Add DPP auto-selection in CometScanRule.selectScan() - automatically 
selects native_datafusion for queries with dynamic pruning filters
   
   
   ## How are these changes tested?
   
   Unit tests:
   
   * Add CometDPPSuite - test suite validating DPP with native scan
   * Add CometDPPBenchmark - benchmark showing I/O reduction metrics
   
   
   ```
   Implementation                     numOutputRows   Reduction Factor
   --------------------------------------------------------------------
   Spark (baseline)                      15,781,140               1.0x
   Comet (auto scan)                      5,295,380               3.0x
   Comet (native_datafusion + DPP)           52,500               301x
   ```
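
The reduction factors in the table above can be re-derived from the reported numOutputRows values. The following is a small Python check, assuming the factor is defined as baseline rows divided by the rows reported for each implementation (the PR does not state the formula explicitly, so that definition is an assumption):

```python
# Row counts copied from the benchmark table above.
num_output_rows = {
    "Spark (baseline)": 15_781_140,
    "Comet (auto scan)": 5_295_380,
    "Comet (native_datafusion + DPP)": 52_500,
}

baseline = num_output_rows["Spark (baseline)"]

# Assumed definition: reduction factor = baseline rows / rows for the implementation.
reduction = {name: baseline / rows for name, rows in num_output_rows.items()}

for name, factor in reduction.items():
    print(f"{name:35s} {factor:8.2f}x")
```

Under that definition the ratios come out at about 2.98 and 300.59, which round to the 3.0x and 301x shown in the table.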
   
   Run the benchmark using 
   
   ```
   SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometDPPBenchmark
   
   # Check the result:
   cat spark/benchmarks/CometDPPBenchmark-jdk17-results.txt
   ```
   
   <img width="946" height="914" alt="Screenshot 2026-01-10 at 1 25 43 AM" src="https://github.com/user-attachments/assets/596198c8-d676-4a93-b716-ef7eebdb502f" />
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

