samredai commented on pull request #3691: URL: https://github.com/apache/iceberg/pull/3691#issuecomment-1017927265
> I think as a generic answer, looking at what [fsspec](https://filesystem-spec.readthedocs.io/en/latest/developer.html#implementing-a-backend) has done (and having these as separate packages that the user can install in their environment) probably makes sense.

Thanks for pointing out the entry point mechanism that fsspec uses. I have to take a closer look at it, but I really like the idea of the user simply plugging in a custom implementation while we maintain "known implementations" in the main library.

> Specifically for S3, if pyarrow is a hard dependency for parquet reading, providing reference implementations based off of its file systems (it comes prepackaged with S3) could make sense.

I'm wondering whether the FileIO implementations need to be storage-specific at all. For example, pyarrow, boto, and smart_open could each be used as an implementation for various cloud storage solutions. Instead of having something like `PyarrowS3FileIO` to distinguish it from, say, a `BotoS3FileIO`, we could have a single `PyarrowFileIO` that acts as an entry point to any of the storage IO options provided by pyarrow.

I don't think this has any implications for this PR in particular, so I'll work on updating it ASAP with the suggestions, and we can tackle these other questions in follow-up discussions.
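To make the library-level idea a bit more concrete, here is a rough sketch of a FileIO that is keyed on the library rather than on the storage system. Everything in it is hypothetical: `FileIO`, `LibraryFileIO`, `register`, `open_input`, and the `mem` scheme are illustration-only names and not the interface from this PR, and it uses only the standard library rather than pyarrow. The point is just that the URI scheme selects the backend, so one class can cover every storage option the underlying library supports.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Callable, Dict
from urllib.parse import urlparse


class FileIO(ABC):
    """Hypothetical base class; the interface in this PR may differ."""

    @abstractmethod
    def open_input(self, location: str) -> BinaryIO:
        """Return a readable binary stream for the given location."""


class LibraryFileIO(FileIO):
    """One FileIO per library instead of one per storage system.

    Stands in for the `PyarrowFileIO` idea: the scheme of the location
    picks the backend, so no `...S3FileIO` style subclasses are needed.
    A pyarrow-backed version would presumably resolve the scheme through
    pyarrow's own filesystem classes (assumption, not shown here).
    """

    def __init__(self) -> None:
        # Maps a URI scheme to a callable that opens "netloc/path".
        self._openers: Dict[str, Callable[[str], BinaryIO]] = {}

    def register(self, scheme: str, opener: Callable[[str], BinaryIO]) -> None:
        self._openers[scheme] = opener

    def open_input(self, location: str) -> BinaryIO:
        parsed = urlparse(location)
        opener = self._openers.get(parsed.scheme)
        if opener is None:
            raise ValueError(f"No backend registered for scheme {parsed.scheme!r}")
        # e.g. "mem://bucket/data.bin" is passed on as "bucket/data.bin"
        return opener(parsed.netloc + parsed.path)


if __name__ == "__main__":
    # Toy in-memory "object store" standing in for a real storage backend.
    objects = {"bucket/data.bin": b"abc"}
    file_io = LibraryFileIO()
    file_io.register("mem", lambda key: io.BytesIO(objects[key]))
    with file_io.open_input("mem://bucket/data.bin") as stream:
        print(stream.read())
```

With this shape, supporting another storage system means registering one more scheme on the same class, while a user-supplied implementation could still be plugged in separately through an entry point.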
