rdettai edited a comment on issue #1220: URL: https://github.com/apache/arrow-datafusion/issues/1220#issuecomment-957591340
#1185 is also a follow-up to #1139 and is closely related to this issue. Maybe we can merge the two issues and create subtasks?

@alamb my idea was that each standard/technique for getting the list of files (table catalog) should be a different provider. The `ListingTable` provider might handle folder structures that differ slightly from the Hive one (e.g. `mytable/2021/11/02` instead of `mytable/year=2021/month=11/day=02`), but it focuses on setups where the partitions are encoded in the folder structure itself and are discovered by **"listing"** the file system. Most of the code inside the `datasource/listing` module should be specialized to do precisely that (e.g. choose a listing strategy, parse the paths...). Everything else can be moved out into a common module for reuse in other table providers 😊.
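
To make the "parse the paths" part concrete, here is a minimal sketch of how partition values could be recovered from a listed file path, for both the Hive layout and the bare-directory layout mentioned above. The function `parse_partitions` and its parameters are hypothetical and are not part of the DataFusion API; the sketch assumes `table_root` has no trailing slash and that the partition columns are known in directory order.

```rust
/// Hypothetical helper (not a DataFusion API): extract partition values from
/// a file path located under `table_root`.
///
/// `partition_cols` lists the expected partition columns in directory order.
/// Each directory segment may be Hive-style (`year=2021`) or bare (`2021`).
/// Returns `None` if the path is outside the table, has the wrong depth,
/// or names a different column than expected.
fn parse_partitions<'a>(
    table_root: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // Keep only the part of the path below the table root.
    let relative = file_path.strip_prefix(table_root)?.strip_prefix('/')?;

    let mut segments: Vec<&'a str> = relative.split('/').collect();
    // The last segment is the file name, not a partition directory.
    segments.pop()?;

    if segments.len() != partition_cols.len() {
        return None;
    }

    let mut values = Vec::with_capacity(segments.len());
    for (segment, col) in segments.into_iter().zip(partition_cols) {
        let value = match segment.split_once('=') {
            // Hive-style segment: the column name must match the expected one.
            Some((name, value)) if name == *col => value,
            Some(_) => return None,
            // Bare segment: the whole directory name is the value.
            None => segment,
        };
        values.push(value);
    }
    Some(values)
}
```

With columns `["year", "month", "day"]`, both `mytable/year=2021/month=11/day=02/data.parquet` and `mytable/2021/11/02/data.parquet` yield the values `2021`, `11`, `02`. Logic of this kind is specific to listing-based discovery, so under the proposal it would stay inside `datasource/listing`.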
