Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/14690

> For one thing, a ListingFileCatalog performs a file tree traversal right off the bat. However, the external catalog returns the locations of partitions as part of the listPartitionsByFilter call. I believe that should suffice for the purpose of building a query plan for metastore-backed tables and executing it.

You'd have to re-implement a large portion of the parallel traversal logic here, right? I think we should keep this PR minimal and leave that for future work. I am also thinking of adding a per-directory file listing cache as a followup to avoid performance regressions, which would likely involve refactoring this path anyways.

> I would be wary of amending our data sources to support case-insensitive field resolution. For one thing, strictly speaking it can lead to ambiguity in schema resolution. In the (potential but unlikely) event that a (case-sensitive) data source schema has two distinct fields x1 and x2 such that x1.toLowerCase == x2.toLowerCase we're going to get undefined behavior.

> For another, for case-sensitive data sources this adds code complexity in their implementation.

I do agree this might be an issue with other datasources. For parquet though, I talked with @liancheng and we don't think there are any issues with supporting case-insensitive field resolution. Given that, I think we can also leave this for future work when we add datasource table support. It might also be that we need to add back something like https://github.com/apache/spark/pull/14750

> Finally, this would require us to read the schema files. That's something I'm trying to avoid in this patch.

Not sure what you mean here, but the parquet change should be execution time only. I'll submit a PR here for that.
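
To make the ambiguity concern quoted above concrete, here is a minimal sketch in plain Scala with no Spark dependency. The `resolveField` helper and the example field names are hypothetical, purely for illustration; this is not Spark's resolution code or anything from this PR. It assumes one possible policy, which is to report an error when more than one field matches rather than pick one silently.

```scala
import java.util.Locale

// Hypothetical helper (not Spark API): look up a requested column name in a
// case-sensitive list of field names, comparing lower-cased forms.
object CaseInsensitiveResolution {
  def resolveField(fields: Seq[String], requested: String): Either[String, String] = {
    val key = requested.toLowerCase(Locale.ROOT)
    // Collect every field whose lower-cased name equals the lower-cased request.
    val matches = fields.filter(_.toLowerCase(Locale.ROOT) == key)
    matches match {
      case Seq()       => Left(s"no field matches '$requested'")
      case Seq(single) => Right(single)
      // Two or more fields collide once case is ignored: the x1/x2 situation.
      case many        => Left(s"ambiguous reference '$requested': ${many.mkString(", ")}")
    }
  }

  def main(args: Array[String]): Unit = {
    // Unique match: resolves to the field's original casing.
    println(resolveField(Seq("id", "eventTime"), "eventtime"))
    // prints: Right(eventTime)

    // "Value" and "value" are distinct fields that collide when lower-cased.
    println(resolveField(Seq("id", "Value", "value"), "VALUE"))
    // prints: Left(ambiguous reference 'VALUE': Value, value)
  }
}
```

Failing on the collision is only one choice; the point is that some explicit policy is needed, since otherwise the result depends on field order.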