On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath <[email protected]>
wrote:

> Thanks Udi. Added some comments.
>
> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay <[email protected]> wrote:
>
>> Udi thank you for the proposal and thank you for sharing it in plain
>> email. My comments are below.
>>
>> Overall, this is a good plan to get us out of a tough situation with an
>> old dependency.
>>
>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri <[email protected]> wrote:
>>
>>> Hi,
>>> Sadly upgrading googledatastore -> google-cloud-datastore is non-trivial
>>> (https://issues.apache.org/jira/browse/BEAM-4543). I wrote a doc to
>>> summarize the plan:
>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_
>>> UCESKSrsvDTWNKqb4/edit?usp=sharing
>>>
>>> Contents pasted below:
>>> Beam Python SDK: Datastore Client Upgrade
>>>
>>> [email protected]
>>>
>>> public, draft, 2018-10-16
>>> Objective
>>>
>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore v1.70
>>> (or later), replacing googledatastore v7.0.1, providing Beam users a
>>> migration path to a new Datastore PTransform API.
>>> Background
>>>
>>> Beam currently uses the googledatastore package to provide access to
>>> Google Cloud Datastore, however that package doesn't seem to be getting
>>> regular releases (last release in 2017-04
>>> <https://pypi.org/project/googledatastore/>) and it doesn't officially
>>> support Python 3 <https://issues.apache.org/jira/browse/BEAM-4543>.
>>>
>>> The current Beam API for Datastore queries exposes googledatastore
>>> types, such as using a protobuf to define a query (wordcount example
>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>).
>>> Conversely, google-cloud-datastore hides this implementation detail (query
>>> API
>>> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>).
>>> Since Beam API has to change the data types it accepts, it forces users to
>>> change their code. This makes the migration to google-cloud-datastore
>>> non-trivial.
>>> Proposal
>>>
>>> This proposal includes a period in which two Beam APIs are available to
>>> access Datastore.
>>>
>>>
>>>    -
>>>
>>>    Add a new PTransforms that use google-cloud-datastore and mark as
>>>    deprecated the existing API (ReadFromDatastore, WriteToDatastore,
>>>    DeleteFromDatastore).
>>>    -
>>>
>>>    Implement apache_beam/io/datastore.py using google-cloud-datastore,
>>>    taking care to not expose Datastore client internals.
>>>    -
>>>
>>>    (optional) Remove googledatastore from GCP_REQUIREMENTS
>>>    
>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139>
>>>    package list, and add it to a separate list, e.g., pip install
>>>    apache-beam[gcp,googledatastore].
>>>
>>>
>> I would like to avoid defining new sets of extra packages. Assuming that
>> these two packages are not incompatible together, we could keep them both
>> in [gcp].
>>
>
> I think we might need this since googleclouddatastore package (1) does not
> seems to be getting upgraded (2) depends on older versions of packages (for
> example, httplib2).
>
> This conflicts with more recent releases of other tools (for example,
> gsutil).
>

This is fine, if it is the only viable option. But note that it is also a
breaking change in the way people install beam in order to use old
datastore APIs.


>
>
>>
>>
>>>
>>>    -
>>>
>>>    Remove googledatastore-based API from Beam after 2 releases.
>>>
>>>
>> The removal needs to wait until next major version by default. Unless, we
>> have a way of asking our users and ensuring that nobody is really using the
>> existing API. Removing a current API in 2 releases (~3 months period) will
>> hurt some users.
>>
> +1
>
>>
>>
>>

Reply via email to