Google Cloud Storage connector into Hadoop

James Malone Mon, 07 Dec 2015 14:36:07 -0800

Hello,

We're from a team within Google Cloud Platform focused on OSS and data
technologies, especially Hadoop (and Spark). Before we cut a JIRA for
something we’d like to do, we wanted to reach out to this list to ask two
quick questions, describe our proposed action, and check for any major
objections.


Proposed action:
We have a Hadoop connector[1] (more info[2]) for Google Cloud Storage (GCS)
which we have been building and maintaining for some time. After we clean
up our code and tests to conform to these[3] and other requirements, we
would like to contribute it to Hadoop. We have many customers using the
connector in high-throughput production Hadoop clusters; we’d like to make
it easier and faster to use Hadoop with GCS.

Timeline:
Presently, we are working on the beta of Google Cloud Dataproc[4], which
limits our time a bit, so we’re targeting late Q1 2016 for creating a JIRA
issue and adapting our connector code as needed.

Our (quick) questions:
* Do we need to take any (non-coding) action for this beyond submitting a
JIRA when we are ready?
* Are there any up-front concerns or questions which we can (or will need
to) address?

Thank you!

James Malone
On behalf of the Google Big Data OSS Engineering Team

Links:
[1] - https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs
[2] - https://cloud.google.com/hadoop/google-cloud-storage-connector
[3] - https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs
[4] - https://cloud.google.com/dataproc
