samredai commented on pull request #3691:
URL: https://github.com/apache/iceberg/pull/3691#issuecomment-1017927265


   > I think as a generic answer looking at what 
[fsspec](https://filesystem-spec.readthedocs.io/en/latest/developer.html#implementing-a-backend)
 has done (and having these as separate packages) that the user can install in 
their environment probably makes sense.
   
   Thanks for pointing out the entry point mechanism that fsspec uses. I have 
to take a closer look at it, but I really like the idea of the user simply 
plugging in a custom implementation while we maintain "known implementations" 
in the main library.
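
   A minimal sketch of how that split could work, assuming Python's standard 
`importlib.metadata` entry points. The group name `iceberg.fileio` and the 
`iceberg.io.pyarrow:PyarrowFileIO` target are hypothetical placeholders, not 
existing identifiers in Iceberg or fsspec:

```python
from importlib.metadata import entry_points

# Hypothetical group name, chosen only for illustration.
ENTRY_POINT_GROUP = "iceberg.fileio"

# "Known implementations" that the main library would maintain itself.
# The "module:attr" target below is a placeholder, not a real module path.
KNOWN_IMPLEMENTATIONS = {
    "pyarrow": "iceberg.io.pyarrow:PyarrowFileIO",
}


def discover_fileio_implementations(group=ENTRY_POINT_GROUP):
    """Merge built-in implementations with those registered by installed packages."""
    found = dict(KNOWN_IMPLEMENTATIONS)
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    for ep in selected:
        # ep.value is the "module:attr" target; ep.load() would import it.
        found[ep.name] = ep.value
    return found
```

   A separately installed package would advertise its implementation under the 
same group in its packaging metadata, and it would then show up in the returned 
mapping alongside the built-in entries.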
   
   > Specifically for S3, if pyarrow is a hard dependency for parquet reading 
providing reference implementations based off of its file systems (it comes 
prepackaged with S3) could make sense.
   
   I'm wondering whether the FileIO implementations need to be storage-specific. 
For example, pyarrow, boto, and smart_open could each serve as an implementation 
for several cloud storage solutions. Instead of having something like 
`PyarrowS3FileIO` to distinguish it from, say, a `BotoS3FileIO`, we 
could have a single `PyarrowFileIO` that acts as an entry point to any of the 
storage I/O options pyarrow provides. I don't think this has any implications 
for this PR in particular, so I'll work on updating it ASAP with the 
suggestions, and we can tackle these other questions in follow-up discussions.
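
   To make the library-scoped idea concrete, here is a rough sketch under stated 
assumptions. The method names `open_input` and `open_output` and the 
injectable `resolver` argument are hypothetical and not part of this PR. The 
pyarrow calls used (`pyarrow.fs.FileSystem.from_uri`, `open_input_file`, 
`open_output_stream`) do exist in pyarrow:

```python
class PyarrowFileIO:
    """Hypothetical library-scoped FileIO: one class for every pyarrow backend."""

    def __init__(self, resolver=None):
        # resolver maps a URI to a (filesystem, path) pair. It is injectable
        # so the dispatch logic can be exercised without pyarrow installed.
        self._resolver = resolver or self._pyarrow_resolver

    @staticmethod
    def _pyarrow_resolver(uri):
        # Imported lazily so the class can be defined without pyarrow present.
        from pyarrow.fs import FileSystem

        # from_uri infers the filesystem (local, S3, ...) from the URI scheme
        # and returns it together with the path inside that filesystem.
        return FileSystem.from_uri(uri)

    def open_input(self, location):
        filesystem, path = self._resolver(location)
        return filesystem.open_input_file(path)

    def open_output(self, location):
        filesystem, path = self._resolver(location)
        return filesystem.open_output_stream(path)
```

   With this shape the storage backend is selected by the location's URI scheme 
rather than by the class name, so no separate `PyarrowS3FileIO` would be needed.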


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


