Implementing (ARROW-1119) [Python] Enable reading Parquet data sets from Amazon S3

Kevin Moore Wed, 21 Jun 2017 21:54:35 -0700

Has anyone started looking into how to read data sets from S3? I have begun
investigating and wondered whether anyone already has a design in mind.


We could implement an S3FileSystem class in pyarrow/filesystem.py. The
filesystem components could probably be written against the AWS Python SDK.

The HDFS file system and file classes, however, are implemented at least
partially in Cython & C++. Is there an advantage to doing that for S3 too?

Thanks,

Kevin

----
Kevin Moore
CEO, Quilt Data, Inc.
[email protected] | LinkedIn <https://www.linkedin.com/in/kevinemoore/>
(415) 497-7895


Data packages for fast, reproducible data science
quiltdata.com
