EmilyMatt opened a new issue, #16991: URL: https://github.com/apache/datafusion/issues/16991
### Is your feature request related to a problem or challenge?

Currently, a `FileGroup` contains `PartitionedFile`s which, when created from a path, have the scheme and authority stripped during parsing. `DataSourceExec` itself supports only a single `ObjectStoreUrl`/object store. So if I create a `FileGroup` from the partitioned files "s3://my-bucket/file" and "file:///my-local-file", I get a file-not-found error from the object store, because the data source can only be created with one object store: either the S3 one or the local file one.

For my use case, my partitioner found that the best partitioning scheme was to load files from those two locations in one partition, and the current exec doesn't let me do that.

### Describe the solution you'd like

- `DataSourceExec` should maintain a map of `ObjectStoreUrl` -> `ObjectStore`, or just a set of registered `ObjectStoreUrl`s that can then be used to fetch the object store from the task context.
- `PartitionedFile` should carry an `ObjectStoreUrl` as well as the file `Path`, so the right object store can be selected for the file opener.
- When the `DataSourceExec` is created, it should verify that the object store for every `PartitionedFile` is available to the operator.

### Describe alternatives you've considered

Currently I need to do one of the following:

- Manually fix the partitioning, which is not great for my flow.
- Manually create a `DataSourceExec` for each file/store and execute it, then have an operator manually inject each record batch into the rest of the streams. This loses the ease of use of `DataSourceExec`'s partitioning mechanism, which would otherwise generate the stream for each partition automatically.

### Additional context

_No response_
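To make the proposal concrete, here is a minimal std-only sketch of the idea: each file keeps its store URL (scheme plus authority) next to its path, and a validation step checks every file against a map of registered stores. Everything in it is hypothetical and for illustration only: `StoreUrl`, `FileEntry`, `split_url`, and `validate` are stand-ins, not DataFusion's `ObjectStoreUrl`, `PartitionedFile`, or any existing API.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an object store URL (scheme + authority),
// e.g. "s3://my-bucket" or "file://". Not DataFusion's real type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct StoreUrl(String);

// Hypothetical stand-in for a partitioned file that remembers which
// store it belongs to, in addition to its path inside that store.
#[derive(Debug)]
struct FileEntry {
    store_url: StoreUrl,
    path: String,
}

/// Split a full URL into the store URL and the path within the store,
/// instead of discarding the scheme and authority.
fn split_url(url: &str) -> Option<FileEntry> {
    // Byte offset just past "://".
    let scheme_end = url.find("://")? + 3;
    // The first '/' after the authority starts the path.
    let path_start = scheme_end + url[scheme_end..].find('/')?;
    Some(FileEntry {
        store_url: StoreUrl(url[..path_start].to_string()),
        path: url[path_start + 1..].to_string(),
    })
}

/// Check, at construction time, that every file's store is registered.
/// `S` is whatever store handle the caller keeps in the map.
fn validate<S>(files: &[FileEntry], stores: &HashMap<StoreUrl, S>) -> Result<(), String> {
    for file in files {
        if !stores.contains_key(&file.store_url) {
            return Err(format!(
                "no object store registered for {} (file {})",
                file.store_url.0, file.path
            ));
        }
    }
    Ok(())
}
```

With this shape, a file opener could look up the store by `file.store_url` for each file, so a single partition could mix "s3://my-bucket/file" and "file:///my-local-file" as long as both stores are registered.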