Hi,
Sadly, upgrading from googledatastore to google-cloud-datastore is non-trivial (
https://issues.apache.org/jira/browse/BEAM-4543). I wrote a doc to
summarize the plan:
https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing

Contents pasted below:
Beam Python SDK: Datastore Client Upgrade

[email protected]

public, draft, 2018-10-16
Objective

Upgrade Beam's Python SDK dependency to google-cloud-datastore v1.7.0
(or later), replacing googledatastore v7.0.1, and provide Beam users with a
migration path to a new Datastore PTransform API.
Background

Beam currently uses the googledatastore package to access Google
Cloud Datastore. However, that package does not appear to receive regular
releases (the last release was in 2017-04
<https://pypi.org/project/googledatastore/>), and it does not officially
support Python 3 <https://issues.apache.org/jira/browse/BEAM-4543>.

The current Beam API for Datastore queries exposes googledatastore types,
such as using a protobuf to define a query (wordcount example
<https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>).
Conversely, google-cloud-datastore hides this implementation detail (query
API
<https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>).
Because the Beam API must change the data types it accepts, users are
forced to change their code. This is what makes the migration to
google-cloud-datastore non-trivial.
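
To make the difference concrete, here is a minimal sketch of a query
description built only from plain Python types, so callers never handle
protobuf messages. The QuerySpec and PropertyFilter names, their fields, and
the "Word"/"count" values are hypothetical, invented for illustration; they
are not part of Beam, googledatastore, or google-cloud-datastore.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Hypothetical types for illustration only; not an existing Beam or
# Datastore client API.
_ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class PropertyFilter:
    """One comparison on an entity property, using plain Python values."""
    name: str
    op: str
    value: Any

    def __post_init__(self):
        # Reject operators early instead of failing inside the client.
        if self.op not in _ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")


@dataclass
class QuerySpec:
    """Client-agnostic description of a query; holds no protobuf objects."""
    kind: str
    namespace: Optional[str] = None
    filters: List[PropertyFilter] = field(default_factory=list)
    limit: Optional[int] = None

    def where(self, name, op, value):
        """Add a filter and return self so calls can be chained."""
        self.filters.append(PropertyFilter(name, op, value))
        return self


# Example: describe "all Word entities whose count is greater than 10".
spec = QuerySpec(kind="Word").where("count", ">", 10)
```

A transform accepting such a type could translate it to whatever the
underlying client needs, which would keep user code independent of the
client library in use.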
Proposal

This proposal includes a period in which two Beam APIs are available to
access Datastore.


   - Add new PTransforms that use google-cloud-datastore and mark the
     existing API (ReadFromDatastore, WriteToDatastore,
     DeleteFromDatastore) as deprecated.

   - Implement apache_beam/io/datastore.py using google-cloud-datastore,
     taking care not to expose Datastore client internals.

   - (optional) Remove googledatastore from the GCP_REQUIREMENTS
     <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139>
     package list and add it to a separate extras list, so that it is
     installed with, e.g., pip install apache-beam[gcp,googledatastore].

   - Remove the googledatastore-based API from Beam after 2 releases.
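
The deprecation step in the first item could be sketched roughly as follows,
using only the standard library. The deprecated decorator, the "x.y.0"
release placeholder, and the stand-in ReadFromDatastore class below are
assumptions made for illustration; the real transform lives in Beam, which
may use its own deprecation mechanism.

```python
import functools
import warnings


def deprecated(since, replacement):
    """Class decorator: emit a DeprecationWarning on every instantiation."""
    def decorator(cls):
        original_init = cls.__init__

        @functools.wraps(original_init)
        def wrapped_init(self, *args, **kwargs):
            warnings.warn(
                f"{cls.__name__} is deprecated since {since}; "
                f"use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            original_init(self, *args, **kwargs)

        cls.__init__ = wrapped_init
        return cls
    return decorator


# Stand-in for the existing googledatastore-based transform (hypothetical
# body; "x.y.0" is a placeholder, not a real Beam release number).
@deprecated(since="x.y.0",
            replacement="the google-cloud-datastore-based transform")
class ReadFromDatastore:
    def __init__(self, project, query):
        self.project = project
        self.query = query
```

With this in place, existing pipelines keep working during the overlap
period, and users see a warning pointing them to the new API each time they
construct the old transform.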
