[jira] [Created] (SPARK-33760) Extend Dynamic Partition Pruning Support to DataSources

Anoop Johnson (Jira) Fri, 11 Dec 2020 09:09:04 -0800

Anoop Johnson created SPARK-33760:
-------------------------------------

             Summary: Extend Dynamic Partition Pruning Support to DataSources
                 Key: SPARK-33760
                 URL: https://issues.apache.org/jira/browse/SPARK-33760
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Anoop Johnson



The implementation of Dynamic Partition Pruning (DPP) in Spark is
[specific|https://github.com/apache/spark/blob/fb2e3af4b5d92398d57e61b766466cc7efd9d7cb/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala#L59-L64]
to HadoopFsRelation. As a result, DPP is not triggered for queries that read
from other data sources, such as DataSource v2 relations.

DataSource v2 readers can expose partition metadata. Can we use this
metadata to extend DPP to these data sources as well?
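To make the proposal concrete, the pruning step can be modeled in a few lines. This is a minimal Python sketch, not Spark code: it assumes a hypothetical reader that reports, for each input partition, the value of each partition column (`PartitionInfo`), and `collect_join_keys` and `prune_partitions` are illustrative names, not Spark APIs. The join keys are only known at runtime, after the filter on the other side of the join has been applied, which is what makes the pruning dynamic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class PartitionInfo:
    """Hypothetical partition metadata a DataSource v2 reader might expose."""
    location: str
    # Partition column name -> value for this input partition.
    values: Dict[str, object] = field(default_factory=dict)


def collect_join_keys(dim_rows: Iterable[dict], key: str,
                      predicate: Callable[[dict], bool]) -> Set[object]:
    """Runtime step: apply the filter on the dimension side of the join
    and collect the join-key values that survive it."""
    return {row[key] for row in dim_rows if predicate(row)}


def prune_partitions(partitions: Iterable[PartitionInfo], column: str,
                     runtime_keys: Set[object]) -> List[PartitionInfo]:
    """Keep only partitions that could contain rows matching the join keys."""
    kept = []
    for p in partitions:
        if column not in p.values:
            # No metadata for this column: we cannot prove the partition is
            # irrelevant, so keep it; pruning must never drop matching rows.
            kept.append(p)
        elif p.values[column] in runtime_keys:
            kept.append(p)
    return kept


# Example: a fact table partitioned by date_id, joined with a filtered date dimension.
fact_partitions = [
    PartitionInfo("sales/date_id=20201130", {"date_id": 20201130}),
    PartitionInfo("sales/date_id=20201201", {"date_id": 20201201}),
    PartitionInfo("sales/date_id=20201202", {"date_id": 20201202}),
    PartitionInfo("sales/unpartitioned"),  # reports no partition values
]
date_dim = [
    {"date_id": 20201130, "is_holiday": False},
    {"date_id": 20201201, "is_holiday": True},
    {"date_id": 20201202, "is_holiday": False},
]

keys = collect_join_keys(date_dim, "date_id", lambda r: r["is_holiday"])
pruned = prune_partitions(fact_partitions, "date_id", keys)
for p in pruned:
    print(p.location)  # prints sales/date_id=20201201, then sales/unpartitioned
```

Note the conservative default: a partition that reports no value for the pruning column is kept, because skipping it could lose matching rows. Handling such partitions is one example of the corner cases a DataSource v2 implementation would need to address.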

I would appreciate any thoughts, as well as pointers to corner cases we need to handle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
