rdettai edited a comment on issue #1220:
URL: 
https://github.com/apache/arrow-datafusion/issues/1220#issuecomment-957591340


   #1185 is also a follow-up to #1139 and is closely related to this issue. 
Maybe we can merge the two issues and create subtasks?
   
   @alamb my idea was that each standard/technique for getting the list of 
files (table catalog) should be a different provider. The `ListingTable` 
provider might handle folder structures that differ slightly from the 
Hive one (e.g. `mytable/2021/11/02` instead of 
`mytable/year=2021/month=11/day=02`), but it focuses on setups where the 
partitions are encoded in the folder structure itself and are discovered by 
**"listing"** the file system. Most of the code inside the `datasource/listing` 
module should be specialized to do precisely that (e.g. choose a listing 
strategy, parse the paths...). Everything else can (should 😉) be taken out and 
factored into a common module for reuse in other table providers 😊.
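
   To illustrate the kind of path parsing meant above, here is a minimal 
sketch, not actual DataFusion code. The function `parse_partitions` and its 
parameters are hypothetical names made up for this example. It assumes the 
partition columns are known up front and accepts both the Hive layout 
(`year=2021`) and the bare positional layout (`2021`):

```rust
/// Hypothetical helper (not part of DataFusion): extracts partition values
/// from a listed file path, given the table root and the expected partition
/// columns in directory order.
fn parse_partitions<'a>(
    table_root: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // Keep only the part of the path below the table root.
    let rel = file_path
        .strip_prefix(table_root.trim_end_matches('/'))?
        .strip_prefix('/')?;
    let segments: Vec<&'a str> = rel.split('/').collect();

    // One directory per partition column, plus the trailing file name.
    if segments.len() != partition_cols.len() + 1 {
        return None;
    }

    let mut values = Vec::with_capacity(partition_cols.len());
    for (col, seg) in partition_cols.iter().zip(segments.iter().copied()) {
        let value = match seg.split_once('=') {
            // Hive style: `col=value`, and the key must match the column.
            Some((name, value)) if name == *col => value,
            // A `key=value` segment for an unexpected column is rejected.
            Some(_) => return None,
            // Positional style: the directory name is the value itself.
            None => seg,
        };
        values.push(value);
    }
    Some(values)
}
```

   With columns `["year", "month", "day"]`, both 
`mytable/year=2021/month=11/day=02/f.parquet` and 
`mytable/2021/11/02/f.parquet` would yield the values `2021`, `11`, `02`. 
Only this kind of logic would need to live in `datasource/listing`.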


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


