Udi thank you for the proposal and thank you for sharing it in plain email.
My comments are below.

Overall, this is a good plan to get us out of a tough situation with an old
dependency.

On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri <[email protected]> wrote:

> Hi,
> Sadly upgrading googledatastore -> google-cloud-datastore is non-trivial (
> https://issues.apache.org/jira/browse/BEAM-4543). I wrote a doc to
> summarize the plan:
> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_
> UCESKSrsvDTWNKqb4/edit?usp=sharing
>
> Contents pasted below:
> Beam Python SDK: Datastore Client Upgrade
>
> [email protected]
>
> public, draft, 2018-10-16
> Objective
>
> Upgrade Beam's Python SDK dependency to use google-cloud-datastore v1.70
> (or later), replacing googledatastore v7.0.1, providing Beam users a
> migration path to a new Datastore PTransform API.
> Background
>
> Beam currently uses the googledatastore package to provide access to
> Google Cloud Datastore, however that package doesn't seem to be getting
> regular releases (last release in 2017-04
> <https://pypi.org/project/googledatastore/>) and it doesn't officially
> support Python 3 <https://issues.apache.org/jira/browse/BEAM-4543>.
>
> The current Beam API for Datastore queries exposes googledatastore types,
> such as using a protobuf to define a query (wordcount example
> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>).
> Conversely, google-cloud-datastore hides this implementation detail (query
> API
> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>).
> Since Beam API has to change the data types it accepts, it forces users to
> change their code. This makes the migration to google-cloud-datastore
> non-trivial.
> Proposal
>
> This proposal includes a period in which two Beam APIs are available to
> access Datastore.
>
>
>    -
>
>    Add a new PTransforms that use google-cloud-datastore and mark as
>    deprecated the existing API (ReadFromDatastore, WriteToDatastore,
>    DeleteFromDatastore).
>    -
>
>    Implement apache_beam/io/datastore.py using google-cloud-datastore,
>    taking care to not expose Datastore client internals.
>    -
>
>    (optional) Remove googledatastore from GCP_REQUIREMENTS
>    
> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139>
>    package list, and add it to a separate list, e.g., pip install
>    apache-beam[gcp,googledatastore].
>
>
I would like to avoid defining new sets of extra packages. Assuming that
these two packages are not incompatible together, we could keep them both
in [gcp].


>
>    -
>
>    Remove googledatastore-based API from Beam after 2 releases.
>
>
The removal needs to wait until next major version by default. Unless, we
have a way of asking our users and ensuring that nobody is really using the
existing API. Removing a current API in 2 releases (~3 months period) will
hurt some users.

Reply via email to