Josh created SPARK-35864:
----------------------------

             Summary: CatalogFileIndex only used if Partition Columns Nonempty
                 Key: SPARK-35864
                 URL: https://issues.apache.org/jira/browse/SPARK-35864
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.2
            Reporter: Josh

Currently, when deciding whether to use a CatalogFileIndex, we gate on catalogTable.get.partitionColumnNames.nonEmpty ([see here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L398]).

I believe what we actually want is to check catalogTable.get.dataSchema.nonEmpty, as I don't think a table needs partition columns in order to be read through a CatalogFileIndex. This isn't a correctness issue, just a missed optimization whenever a table has no partition columns.

Palantir [fixed this in our fork|https://github.com/palantir/spark/commit/e040ff5bf4d1b2d37264ad19468e0892c63b9798] a long time ago, but I don't think anyone ever remembered to push it upstream.
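
To make the difference between the two checks concrete, here is a minimal standalone sketch. It does not use Spark: TableSketch is a hypothetical stand-in for the catalog table that carries only the two fields named above, and currentGate / proposedGate are illustrative names, not Spark functions. Any other conditions the real check in DataSource.scala may include are left out.

```scala
// Hypothetical stand-in for the catalog table; only the two fields
// mentioned in this issue are modeled (assumption, not Spark's class).
final case class TableSketch(
    partitionColumnNames: Seq[String],
    dataSchema: Seq[String])

object GateSketch {
  // Gate as described in this issue: needs at least one partition column.
  def currentGate(table: Option[TableSketch]): Boolean =
    table.exists(_.partitionColumnNames.nonEmpty)

  // Gate proposed in this issue: needs a non-empty data schema instead.
  def proposedGate(table: Option[TableSketch]): Boolean =
    table.exists(_.dataSchema.nonEmpty)

  def main(args: Array[String]): Unit = {
    // A table with data columns but no partition columns.
    val unpartitioned = Some(TableSketch(Nil, Seq("id", "name")))
    println(s"current gate:  ${currentGate(unpartitioned)}")   // false
    println(s"proposed gate: ${proposedGate(unpartitioned)}")  // true
  }
}
```

For an unpartitioned table the first check fails and the second passes, which is the case this issue is about.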