alamb commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1120413298
> From my perspective It might be beneficial to push information about data source from TableProvider to ObjectStore. Then ObjectStore for a local file system, would combine data(table) location and strategy for listing that kind of storage. As a result listing methods present in ObjectStore could drop the concept of path as a way to access data.

I really like the idea of providing an extensible storage interface that allows APIs such as those suggested by @Cheappie and @timvw.

Given that these APIs seem to add semantics to the list of files on object storage, perhaps we could add an extra layer specifically for them, rather than trying to extend `ObjectStore` or adding more logic to `ListingTable`.

Perhaps something like the `StorageFormat` in:

```text
┌───────────────────────────────────┐
│                                   │
│           ListingTable            │
│                                   │
└───────────────────────────────────┘
┌───────────────────────────────────┐
│          StorageCatalog           │
│  (e.g. figure out which files on  │
│     object store to process)      │
└───────────────────────────────────┘
┌────────────────┐ ┌────────────────┐
│  ObjectStore   │ │  File Format   │
│(e.g. S3, HDFS) │ │ (e.g. parquet) │
│                │ │                │
└────────────────┘ └────────────────┘
```
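
To make the layering concrete, here is a minimal Rust sketch of how the middle layer (labelled `StorageCatalog` in the diagram) might sit between the table and the store. Everything in it is hypothetical and for illustration only: the simplified `ObjectStore` trait, `InMemoryStore`, `SuffixCatalog`, `plan_scan`, and `demo` are assumptions of this sketch, not DataFusion's actual API (the real `ObjectStore` trait is async and much richer).

```rust
/// Hypothetical, simplified stand-in for an object store: it can only list keys.
/// This is NOT DataFusion's real `ObjectStore` trait.
trait ObjectStore {
    fn list(&self, prefix: &str) -> Vec<String>;
}

/// In-memory store, used only so the sketch runs without external services.
struct InMemoryStore {
    keys: Vec<String>,
}

impl ObjectStore for InMemoryStore {
    fn list(&self, prefix: &str) -> Vec<String> {
        self.keys
            .iter()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect()
    }
}

/// Hypothetical middle layer: decides which files on the store to process.
trait StorageCatalog {
    fn files_to_scan(&self, store: &dyn ObjectStore) -> Vec<String>;
}

/// One possible (assumed) strategy: every file under `root` with a given suffix.
struct SuffixCatalog {
    root: String,
    suffix: String,
}

impl StorageCatalog for SuffixCatalog {
    fn files_to_scan(&self, store: &dyn ObjectStore) -> Vec<String> {
        store
            .list(&self.root)
            .into_iter()
            .filter(|k| k.ends_with(&self.suffix))
            .collect()
    }
}

/// The table delegates file selection to the catalog instead of interpreting
/// paths itself (a simplified stand-in for the real `ListingTable`).
struct ListingTable {
    store: Box<dyn ObjectStore>,
    catalog: Box<dyn StorageCatalog>,
}

impl ListingTable {
    fn plan_scan(&self) -> Vec<String> {
        self.catalog.files_to_scan(&*self.store)
    }
}

/// Usage example: wire the three layers together and list the files to scan.
fn demo() -> Vec<String> {
    let store = InMemoryStore {
        keys: vec![
            "data/t1/a.parquet".to_string(),
            "data/t1/_SUCCESS".to_string(),
            "data/t1/b.parquet".to_string(),
            "data/t2/c.parquet".to_string(),
        ],
    };
    let catalog = SuffixCatalog {
        root: "data/t1/".to_string(),
        suffix: ".parquet".to_string(),
    };
    let table = ListingTable {
        store: Box::new(store),
        catalog: Box::new(catalog),
    };
    table.plan_scan()
}
```

With this split, a different listing strategy would be another `StorageCatalog` implementation, leaving both the store and the table untouched.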
