Thanks for taking the time to communicate about the updated plan. Really
appreciate it.

Kenn

On Tue, Mar 19, 2019 at 10:25 AM Udi Meiri <[email protected]> wrote:

> Update: I'm back to working on this.
> To allow a smoother migration, I'm planning on having apache-beam depend
> on both googledatastore and google-cloud-datastore and having 2 Beam
> modules.
> The newer client is a bit more limited in expressing queries (only ANDs
> for composite filtering).
> OTOH it supports transactions so we could add inserts of incomplete
> entities.
>
> Updated plan here:
> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit
>
> On Wed, Oct 17, 2018 at 12:49 PM Ahmet Altay <[email protected]> wrote:
>
>>
>>
>> On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath <
>> [email protected]> wrote:
>>
>>> Thanks Udi. Added some comments.
>>>
>>> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay <[email protected]> wrote:
>>>
>>>> Udi, thank you for the proposal and for sharing it in plain email. My
>>>> comments are below.
>>>>
>>>> Overall, this is a good plan to get us out of a tough situation with an
>>>> old dependency.
>>>>
>>>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> Sadly upgrading googledatastore -> google-cloud-datastore is
>>>>> non-trivial (https://issues.apache.org/jira/browse/BEAM-4543). I
>>>>> wrote a doc to summarize the plan:
>>>>>
>>>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing
>>>>>
>>>>> Contents pasted below:
>>>>> Beam Python SDK: Datastore Client Upgrade
>>>>>
>>>>> [email protected]
>>>>>
>>>>> public, draft, 2018-10-16
>>>>> Objective
>>>>>
>>>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore
>>>>> v1.70 (or later), replacing googledatastore v7.0.1, providing Beam users a
>>>>> migration path to a new Datastore PTransform API.
>>>>> Background
>>>>>
>>>>> Beam currently uses the googledatastore package to provide access to
>>>>> Google Cloud Datastore, however that package doesn't seem to be getting
>>>>> regular releases (last release in 2017-04
>>>>> <https://pypi.org/project/googledatastore/>) and it doesn't
>>>>> officially support Python 3
>>>>> <https://issues.apache.org/jira/browse/BEAM-4543>.
>>>>>
>>>>> The current Beam API for Datastore queries exposes googledatastore
>>>>> types, such as using a protobuf to define a query (wordcount example
>>>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>).
>>>>> Conversely, google-cloud-datastore hides this implementation detail (query
>>>>> API
>>>>> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>).
>>>>> Since the Beam API has to change the data types it accepts, users are
>>>>> forced to change their code. This makes the migration to
>>>>> google-cloud-datastore non-trivial.
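
To illustrate what an API that does not expose client types could look like, here is a minimal sketch of a client-neutral query description. The names `Filter`, `Query`, and `matches` are hypothetical; they are not part of Beam, googledatastore, or google-cloud-datastore. Filters are combined with AND only, mirroring the composite-filter restriction mentioned for the newer client.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical operator table; not taken from any Datastore client.
_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class Filter:
    """One property comparison, e.g. Filter("age", ">=", 18)."""
    prop: str
    op: str
    value: Any


@dataclass
class Query:
    """Hypothetical plain-Python query description (no protobufs exposed)."""
    kind: str
    filters: List[Filter] = field(default_factory=list)

    def matches(self, kind: str, entity: Dict[str, Any]) -> bool:
        # Every filter must hold: composite filtering is AND-only here.
        if kind != self.kind:
            return False
        return all(
            f.prop in entity and _OPS[f.op](entity[f.prop], f.value)
            for f in self.filters
        )


# Example: adults living in Oslo (illustrative data only).
query = Query(
    kind="Person",
    filters=[Filter("age", ">=", 18), Filter("city", "=", "Oslo")],
)
```

A user would build such a value without importing either client library, and the transform would translate it to whichever client it uses internally. The `matches` method exists only to make the AND semantics concrete in this sketch.
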
>>>>> Proposal
>>>>>
>>>>> This proposal includes a period in which two Beam APIs are available
>>>>> to access Datastore.
>>>>>
>>>>>
>>>>>    - Add new PTransforms that use google-cloud-datastore and mark the
>>>>>      existing API (ReadFromDatastore, WriteToDatastore,
>>>>>      DeleteFromDatastore) as deprecated.
>>>>>    - Implement apache_beam/io/datastore.py using
>>>>>      google-cloud-datastore, taking care not to expose Datastore client
>>>>>      internals.
>>>>>    - (optional) Remove googledatastore from the GCP_REQUIREMENTS
>>>>>      <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139>
>>>>>      package list, and add it to a separate list, e.g., pip install
>>>>>      apache-beam[gcp,googledatastore].
>>>>>
>>>> I would like to avoid defining new sets of extra packages. Assuming
>>>> that these two packages are not incompatible together, we could keep them
>>>> both in [gcp].
>>>>
>>>
>>> I think we might need this since the googledatastore package (1) does
>>> not seem to be getting upgraded and (2) depends on older versions of
>>> packages (for example, httplib2).
>>>
>>> This conflicts with more recent releases of other tools (for example,
>>> gsutil).
>>>
>>
>> This is fine, if it is the only viable option. But note that it is also a
>> breaking change in the way people install beam in order to use old
>> datastore APIs.
>>
>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>>    - Remove the googledatastore-based API from Beam after 2 releases.
>>>>>
>>>>>
>>>> By default, the removal needs to wait until the next major version,
>>>> unless we have a way of asking our users and ensuring that nobody is
>>>> really using the existing API. Removing a current API in 2 releases (a
>>>> period of ~3 months) will hurt some users.
>>>>
>>> +1
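
However long the deprecation period ends up being, the old transforms could warn users at call time. Below is a minimal sketch using only the standard `warnings` module. The decorator `deprecated`, the function `read_from_datastore_v1`, and the `X.Y.0` version string are hypothetical placeholders, not Beam's actual API or release numbers.

```python
import functools
import warnings


def deprecated(since, replacement):
    """Hypothetical decorator that warns whenever the wrapped callable runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since}; "
                f"use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(since="X.Y.0", replacement="the google-cloud-datastore based transform")
def read_from_datastore_v1(query):
    # Stand-in for the existing googledatastore-based read path.
    return f"reading with {query}"
```

Python hides `DeprecationWarning` by default in many contexts, so a real rollout would probably also need release notes or a louder warning category for users to notice.
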
>>>
>>>>
>>>>
>>>>
>>
