Hello all, I was thinking that a filesystem with S3 support would be great to have in the Python SDK. If I am not wrong, it would simply involve implementing the filesystem classes in <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystem.py> for S3, right?
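To make the idea a bit more concrete, here is a rough sketch of the path-handling part that I imagine such a filesystem would need. Everything here is an assumption on my part: `S3PathHelper` and `parse_s3_path` are hypothetical names, and the class deliberately does not import or subclass anything from Beam so that it runs on its own. A real implementation would presumably subclass `FileSystem` from the file linked above and do the actual I/O through an AWS client library.

```python
import re

S3_PREFIX = "s3://"


def parse_s3_path(path):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    match = re.match(r"^s3://([^/]+)/(.*)$", path)
    if match is None:
        raise ValueError(f"Not a valid S3 path: {path!r}")
    return match.group(1), match.group(2)


class S3PathHelper:
    """Standalone sketch of path handling only; NOT a Beam FileSystem subclass."""

    @classmethod
    def scheme(cls):
        # URI scheme this helper would be registered under.
        return "s3"

    def join(self, basepath, *paths):
        # Join components with exactly one '/' between them.
        if not basepath.startswith(S3_PREFIX):
            raise ValueError(f"Basepath {basepath!r} must be an S3 path.")
        path = basepath
        for p in paths:
            path = path.rstrip("/") + "/" + p.lstrip("/")
        return path

    def split(self, path):
        # Return (head, tail), where tail is the last path component.
        path = path.strip()
        if not path.startswith(S3_PREFIX):
            raise ValueError(f"Path {path!r} must be an S3 path.")
        prefix_len = len(S3_PREFIX)
        last_sep = path[prefix_len:].rfind("/")
        if last_sep >= 0:
            last_sep += prefix_len
        if last_sep > 0:
            return path[:last_sep], path[last_sep + 1:]
        return path, ""
```

The remaining work (open, create, list, copy, delete and so on) is where the S3 client calls would go, and that is the part I am least sure about.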
I am not very familiar with S3, with filesystems, or with AWS in general, but I have some outstanding questions:

- Does this mean that we would probably need an extra [s3] target for installing apache_beam, like we do with [gcp]?
  - Not strictly necessary, but probably desirable...
- How do we handle KMS in the GCS filesystem?
  - Would the filesystem encapsulation make KMS support in an S3 filesystem difficult?
  - Or even more... is the KMS support in AWS very different from that in GCP?
  - I'd love comments from anyone informed about this : )
- Is this project of an appropriate size for a GSoC student?

Thoughts?
Best
-P.