BlakeOrth opened a new pull request, #17050: URL: https://github.com/apache/datafusion/pull/17050
## Which issue does this PR close?

- Closes #17049

## What changes are included in this PR?

- Fixes an issue in the `ListingTableFactory` where Hive columns are not detected and incorporated into the table schema when the user has not set an explicit schema
- Fixes an issue where NO files are detected when a path that represents a collection has a `.` in the final element of the prefix, because the content following the `.` was interpreted as a file extension (e.g. `s3://bucket/prefix/version.v1/` would only attempt to list files ending with `.v1` instead of the expected extension such as `.csv` or `.parquet`)
- Fixes an issue where subdirectories that do not follow Hive formatting (i.e. `key=value`) could be erroneously interpreted as contributing to the table schema

## Are these changes tested?

I'm initially submitting this as a draft PR without tests to provide a solid basis for discussion on whether this is the desired solution to the linked issue. If/when the solution is ready to merge, I will make additional commits to address feedback and to implement tests. At present, the changes have been tested functionally using `datafusion-cli` and the public dataset noted in the issue.

## Are there any user-facing changes?

Part of the reason I've left this in draft is that the changes could impact users of the `ListingTableFactory`. In my mind, the behavior represented here is the "expected" behavior, but there's a good possibility that users are relying on the previous behavior and could get unexpected results if this PR is merged.
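To make the intended behavior concrete for discussion, here is a minimal, standard-library-only sketch of the path rules described in the list of changes. It is not the code in this PR and does not use DataFusion's API: `hive_partition_columns` and `listing_extension` are hypothetical helpers, and treating a trailing `/` as the marker for a collection is an assumption made for illustration.

```rust
/// Hypothetical helper (not DataFusion's API): collect Hive-style `key=value`
/// partition columns from the directory segments between `table_root` and the
/// file name. Segments that are not `key=value` are skipped, so they never
/// contribute to the table schema.
fn hive_partition_columns(table_root: &str, file_path: &str) -> Vec<(String, String)> {
    let relative = file_path.strip_prefix(table_root).unwrap_or(file_path);
    let mut segments: Vec<&str> = relative.split('/').filter(|s| !s.is_empty()).collect();
    // The last segment is the file name, not a directory.
    segments.pop();
    segments
        .into_iter()
        .filter_map(|segment| {
            let (key, value) = segment.split_once('=')?;
            if key.is_empty() || value.is_empty() {
                return None;
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

/// Hypothetical helper (not DataFusion's API): choose the extension used to
/// filter files while listing. Assumption: a path ending in `/` denotes a
/// collection, so a `.` in its final segment is not read as an extension.
fn listing_extension<'a>(path: &'a str, default_extension: &'a str) -> &'a str {
    if path.ends_with('/') {
        return default_extension;
    }
    let last = path.rsplit('/').next().unwrap_or(path);
    match last.rfind('.') {
        // Keep the leading dot, e.g. ".csv".
        Some(idx) => &last[idx..],
        None => default_extension,
    }
}
```

Under these assumptions, `listing_extension("s3://bucket/prefix/version.v1/", ".parquet")` falls back to `.parquet` rather than `.v1`, and a file under `year=2024/tmp/` yields only the `year` column because `tmp` is not in `key=value` form.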