[jira] [Created] (SPARK-35864) CatalogFileIndex only used if Partition Columns Nonempty

Josh (Jira) Wed, 23 Jun 2021 09:57:29 -0700

Josh created SPARK-35864:
----------------------------

             Summary: CatalogFileIndex only used if Partition Columns Nonempty
                 Key: SPARK-35864
                 URL: https://issues.apache.org/jira/browse/SPARK-35864
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.2
            Reporter: Josh



Currently, when deciding whether to use a CatalogFileIndex, we gate on 
catalogTable.get.partitionColumnNames.nonEmpty ([see 
here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L398]).
I believe what we actually want is to check 
catalogTable.get.dataSchema.nonEmpty, as I don't think a table actually needs 
partition columns in order to be read through a CatalogFileIndex. This isn't a 
correctness issue, just a missed optimization whenever a table has no 
partition columns.
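
To make the difference concrete, here is a minimal, self-contained sketch of the two gates. It does not use Spark's real classes: TableInfo, GateSketch, currentGate and proposedGate are hypothetical names invented for illustration, the schema is modelled as a plain list of column names, and any other conditions that surround the check in DataSource.scala are left out on purpose.

```scala
// Hypothetical stand-in for the two CatalogTable fields discussed above.
// This is NOT Spark's CatalogTable; dataSchema is modelled as column names only.
final case class TableInfo(
    partitionColumnNames: Seq[String],
    dataSchema: Seq[String])

object GateSketch {
  // Gate as described in this issue: requires at least one partition column.
  def currentGate(catalogTable: Option[TableInfo]): Boolean =
    catalogTable.exists(_.partitionColumnNames.nonEmpty)

  // Gate as proposed in this issue: requires a non-empty data schema instead.
  def proposedGate(catalogTable: Option[TableInfo]): Boolean =
    catalogTable.exists(_.dataSchema.nonEmpty)

  // An unpartitioned table that still has data columns.
  val unpartitioned: TableInfo =
    TableInfo(partitionColumnNames = Nil, dataSchema = Seq("id", "name"))

  // A partitioned table, for comparison.
  val partitioned: TableInfo =
    TableInfo(partitionColumnNames = Seq("day"), dataSchema = Seq("id", "name"))
}
```

Under these assumptions the two gates agree for the partitioned table and differ only for the unpartitioned one, which is the case where the optimization is currently missed.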

 

Palantir [fixed this in our 
fork|https://github.com/palantir/spark/commit/e040ff5bf4d1b2d37264ad19468e0892c63b9798]
 a long time ago, but I don't think anyone ever remembered to push it upstream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
