BlakeOrth opened a new issue, #17049: URL: https://github.com/apache/datafusion/issues/17049
### Describe the bug When using a `ListingTableFactory` (i.e. using the `datafusion-cli`) tables that leverage Hive partitioning do not yield the expected columns derived from the hive key/value pairs (and other such oddities). ### To Reproduce A publicly accessible S3 bucket hosting Hive partitioned parquet data can be used to show the issues. The general structure of the data can be seen below: ```console $ aws s3 ls s3://overturemaps-us-west-2/release/2025-07-23.0/ PRE theme=addresses/ PRE theme=base/ PRE theme=buildings/ PRE theme=divisions/ PRE theme=places/ PRE theme=transportation/ ``` ```console $ aws s3 ls s3://overturemaps-us-west-2/release/2025-07-23.0/ --recursive 2025-07-22 14:42:50 1073537247 release/2025-07-23.0/theme=addresses/type=address/part-00000-57f746d8-98a1-4faa-94c2-4f084343ecac-c000.zstd.parquet 2025-07-22 14:43:06 1013189887 release/2025-07-23.0/theme=addresses/type=address/part-00001-57f746d8-98a1-4faa-94c2-4f084343ecac-c000.zstd.parquet . . . ``` Here are a couple of examples off odd behavior when trying to access this data. ```sql DataFusion CLI v49.0.0 > SET datafusion.execution.listing_table_ignore_subdirectory to false; 0 row(s) fetched. Elapsed 0.001 seconds. > CREATE EXTERNAL TABLE overture_maps STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-07-23.0/'; 0 row(s) fetched. Elapsed 0.645 seconds. > describe overture_maps; +-------------+-----------+-------------+ | column_name | data_type | is_nullable | +-------------+-----------+-------------+ +-------------+-----------+-------------+ 0 row(s) fetched. Elapsed 0.001 seconds. ``` In the above example no schema has been detected at all, either from Hive keys or the underlying parquet data itself, in spite of ensuring the listing table can traverse subdirectories. ```sql > CREATE EXTERNAL TABLE overture_maps STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-07-23.0/theme=addresses/'; 0 row(s) fetched. Elapsed 0.911 seconds. > select column_name, is_nullable from information_schema.columns where table_name='overture_maps'; +----------------+-------------+ | column_name | is_nullable | +----------------+-------------+ | id | YES | | geometry | YES | | bbox | YES | | country | YES | | postcode | YES | | street | YES | | number | YES | | unit | YES | | address_levels | YES | | postal_city | YES | | version | YES | | sources | YES | +----------------+-------------+ 12 row(s) fetched. Elapsed 0.005 seconds. > select column_name, is_nullable from information_schema.columns where table_name='overture_maps' and column_name='type'; +-------------+-------------+ | column_name | is_nullable | +-------------+-------------+ +-------------+-------------+ 0 row(s) fetched. Elapsed 0.006 seconds. ``` Next, when a path to a subdirectory is given to create the table, the underlying parquet schema is successfully detected, however the columns corresponding to the Hive keys are still missing. ### Expected behavior When accessing a hive partitioned dataset with no explicit schema set, and having subdirectory traversal enabled, I would expect to get a table that can be queried/filtered on both the columns represented in the parquet schema as well as the hive partitions. ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org