EmilyMatt opened a new issue, #16991:
URL: https://github.com/apache/datafusion/issues/16991

   ### Is your feature request related to a problem or challenge?
   
   Currently, a FileGroup contains PartitionedFiles, which, when created from
   a path, have the scheme and authority stripped from them during parsing.
   DataSourceExec itself supports only a single ObjectStoreUrl/ObjectStore.
   Therefore, if I create a FileGroup from the partitioned files
   "s3://my-bucket/file" and "file:///my-local-file", I get a file-not-found
   error from the object store, because the data source can only be created
   with a single object store: either the S3 one or the local-file one.
   In my use case, my partitioner found that the best partitioning scheme was
   to load files from those two locations in one partition, and the current
   exec doesn't let me do that.
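
To make the failure mode concrete, here is a minimal, std-only Rust sketch of the split described above. `split_url` is a hypothetical helper written for illustration, not a DataFusion API. It only shows that once the scheme and authority are dropped, nothing in the remaining path says which store a file belongs to.

```rust
/// Hypothetical helper (not a DataFusion API): splits a URL into its
/// "scheme://authority" prefix and the remaining path.
fn split_url(url: &str) -> Option<(String, String)> {
    let (scheme, rest) = url.split_once("://")?;
    // Everything before the first '/' after "://" is the authority.
    // For "file:///..." the authority is empty.
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx + 1..]),
        None => (rest, ""),
    };
    Some((format!("{scheme}://{authority}"), path.to_string()))
}

fn main() {
    for url in ["s3://my-bucket/file", "file:///my-local-file"] {
        let (store, path) = split_url(url).expect("URL with a scheme");
        // Only `path` survives in the file entry; `store` is what gets lost.
        println!("store = {store}, path = {path}");
    }
}
```

If only the `path` half is kept and a single store is used for the whole exec, one of the two lookups has to go to the wrong store.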
   
   ### Describe the solution you'd like
   
   DataSourceExec should maintain a map of ObjectStoreUrl -> ObjectStore, or
   just a set of registered ObjectStoreUrls that can then be used to fetch the
   object store from the task context.
   PartitionedFile itself should carry an ObjectStoreUrl as well as the file
   Path, so that the right object store can be selected for the file opener.
   
   Additionally, when the DataSourceExec is created, it should verify that the
   object store for every PartitionedFile is available to the operator.
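
The shape of this proposal can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not DataFusion code: `StoreUrl`, `Store`, `FileEntry`, and `MultiStoreSource` are hypothetical stand-ins for ObjectStoreUrl, ObjectStore, PartitionedFile, and DataSourceExec, and the real types and constructors differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an object store URL such as "s3://my-bucket".
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct StoreUrl(String);

/// Hypothetical stand-in for an object store; here it is just a label.
#[derive(Debug)]
struct Store {
    name: &'static str,
}

/// Hypothetical stand-in for a partitioned file that remembers its store.
#[derive(Debug, Clone)]
struct FileEntry {
    store_url: StoreUrl,
    path: String,
}

/// Hypothetical stand-in for the exec node: owns a registry keyed by store URL.
struct MultiStoreSource {
    stores: HashMap<StoreUrl, Store>,
    file_groups: Vec<Vec<FileEntry>>,
}

impl MultiStoreSource {
    /// Fails at construction time if any file points at an unregistered store.
    fn try_new(
        stores: HashMap<StoreUrl, Store>,
        file_groups: Vec<Vec<FileEntry>>,
    ) -> Result<Self, String> {
        for file in file_groups.iter().flatten() {
            if !stores.contains_key(&file.store_url) {
                return Err(format!(
                    "no object store registered for {} (file {})",
                    file.store_url.0, file.path
                ));
            }
        }
        Ok(Self { stores, file_groups })
    }

    /// Picks the store for one file; the check in `try_new` guarantees a hit.
    fn store_for(&self, file: &FileEntry) -> &Store {
        &self.stores[&file.store_url]
    }
}

fn main() {
    let s3 = StoreUrl("s3://my-bucket".into());
    let local = StoreUrl("file://".into());
    let stores = HashMap::from([
        (s3.clone(), Store { name: "s3" }),
        (local.clone(), Store { name: "local" }),
    ]);
    // One file group (one partition) mixing files from both stores.
    let group = vec![
        FileEntry { store_url: s3, path: "file".into() },
        FileEntry { store_url: local, path: "my-local-file".into() },
    ];
    let source = MultiStoreSource::try_new(stores, vec![group])
        .expect("all stores registered");
    for file in source.file_groups.iter().flatten() {
        println!("{} -> {}", file.path, source.store_for(file).name);
    }
}
```

Validating in the constructor moves the file-not-found surprise from execution time to plan-construction time, which is the point of the verification step requested above.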
   
   ### Describe alternatives you've considered
   
   Currently I need to either fix the partitioning manually, which is not
   great for my flow, or create a separate DataSourceExec for each file/store,
   execute it, and then use an operator to manually inject each record batch
   into the rest of the streams. That defeats the ease of use of
   DataSourceExec's partitioning mechanism, which would otherwise generate the
   stream for each partition automatically.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

