Re: Python Datastore client upgrade plan

2019-03-19 Thread Kenneth Knowles
Thanks for taking the time to communicate about the updated plan. Really
appreciate it.

Kenn



Re: Python Datastore client upgrade plan

2019-03-19 Thread Udi Meiri
Update: I'm back to working on this.
To allow a smoother migration, I'm planning on having apache-beam depend on
both googledatastore and google-cloud-datastore and having 2 Beam modules.
The newer client is a bit more limited in expressing queries (only ANDs for
composite filtering).
OTOH it supports transactions so we could add inserts of incomplete
entities.

Updated plan here:
https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit




Re: Python Datastore client upgrade plan

2018-10-17 Thread Ahmet Altay
On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath 
wrote:

> Thanks Udi. Added some comments.
>
> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay  wrote:
>
>> Udi thank you for the proposal and thank you for sharing it in plain
>> email. My comments are below.
>>
>> Overall, this is a good plan to get us out of a tough situation with an
>> old dependency.
>>
>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri  wrote:
>>
>>> Hi,
>>> Sadly upgrading googledatastore -> google-cloud-datastore is non-trivial
>>> (https://issues.apache.org/jira/browse/BEAM-4543). I wrote a doc to
>>> summarize the plan:
>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing
>>>
>>> Contents pasted below:
>>> Beam Python SDK: Datastore Client Upgrade
>>>
>>> eh...@google.com
>>>
>>> public, draft, 2018-10-16
>>> Objective
>>>
>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore v1.70
>>> (or later), replacing googledatastore v7.0.1, providing Beam users a
>>> migration path to a new Datastore PTransform API.
>>> Background
>>>
>>> Beam currently uses the googledatastore package to provide access to
>>> Google Cloud Datastore, however that package doesn't seem to be getting
>>> regular releases (last release in 2017-04) and it doesn't officially
>>> support Python 3.
>>>
>>> The current Beam API for Datastore queries exposes googledatastore
>>> types, such as using a protobuf to define a query (wordcount example).
>>> Conversely, google-cloud-datastore hides this implementation detail
>>> (query API).
>>> Since Beam API has to change the data types it accepts, it forces users to
>>> change their code. This makes the migration to google-cloud-datastore
>>> non-trivial.
>>> Proposal
>>>
>>> This proposal includes a period in which two Beam APIs are available to
>>> access Datastore.
>>>
>>>
>>> - Add new PTransforms that use google-cloud-datastore and mark as
>>>   deprecated the existing API (ReadFromDatastore, WriteToDatastore,
>>>   DeleteFromDatastore).
>>> - Implement apache_beam/io/datastore.py using google-cloud-datastore,
>>>   taking care to not expose Datastore client internals.
>>> - (optional) Remove googledatastore from the GCP_REQUIREMENTS package
>>>   list, and add it to a separate list, e.g., pip install
>>>   apache-beam[gcp,googledatastore].
>>>
>>>
>> I would like to avoid defining new sets of extra packages. Assuming that
>> these two packages are not incompatible together, we could keep them both
>> in [gcp].
>>
>
> I think we might need this since the googledatastore package (1) does not
> seem to be getting upgraded and (2) depends on older versions of packages
> (for example, httplib2).
>
> This conflicts with more recent releases of other tools (for example,
> gsutil).
>

This is fine if it is the only viable option. But note that it is also a
breaking change in the way people install Beam in order to use the old
Datastore APIs.


>
>
>>
>>
>>>
>>> - Remove googledatastore-based API from Beam after 2 releases.
>>>
>>>
>> The removal needs to wait until the next major version by default, unless
>> we have a way of asking our users and ensuring that nobody is really using
>> the existing API. Removing a current API in 2 releases (a period of about
>> 3 months) will hurt some users.
>>
> +1
>
>>
>>
>>


Re: Python Datastore client upgrade plan

2018-10-17 Thread Chamikara Jayalath
Thanks Udi. Added some comments.

On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay  wrote:

> Udi thank you for the proposal and thank you for sharing it in plain
> email. My comments are below.
>
> Overall, this is a good plan to get us out of a tough situation with an
> old dependency.
>
> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri  wrote:
>
>> Hi,
>> Sadly upgrading googledatastore -> google-cloud-datastore is non-trivial (
>> https://issues.apache.org/jira/browse/BEAM-4543). I wrote a doc to
>> summarize the plan:
>>
>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing
>>
>> Contents pasted below:
>> Beam Python SDK: Datastore Client Upgrade
>>
>> eh...@google.com
>>
>> public, draft, 2018-10-16
>> Objective
>>
>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore v1.70
>> (or later), replacing googledatastore v7.0.1, providing Beam users a
>> migration path to a new Datastore PTransform API.
>> Background
>>
>> Beam currently uses the googledatastore package to provide access to
>> Google Cloud Datastore, however that package doesn't seem to be getting
>> regular releases (last release in 2017-04) and it doesn't officially
>> support Python 3.
>>
>> The current Beam API for Datastore queries exposes googledatastore types,
>> such as using a protobuf to define a query (wordcount example).
>> Conversely, google-cloud-datastore hides this implementation detail
>> (query API).
>> Since Beam API has to change the data types it accepts, it forces users to
>> change their code. This makes the migration to google-cloud-datastore
>> non-trivial.
>> Proposal
>>
>> This proposal includes a period in which two Beam APIs are available to
>> access Datastore.
>>
>>
>> - Add new PTransforms that use google-cloud-datastore and mark as
>>   deprecated the existing API (ReadFromDatastore, WriteToDatastore,
>>   DeleteFromDatastore).
>> - Implement apache_beam/io/datastore.py using google-cloud-datastore,
>>   taking care to not expose Datastore client internals.
>> - (optional) Remove googledatastore from the GCP_REQUIREMENTS package
>>   list, and add it to a separate list, e.g., pip install
>>   apache-beam[gcp,googledatastore].
>>
>>
> I would like to avoid defining new sets of extra packages. Assuming that
> these two packages are not incompatible together, we could keep them both
> in [gcp].
>

I think we might need this since the googledatastore package (1) does not
seem to be getting upgraded and (2) depends on older versions of packages
(for example, httplib2).

This conflicts with more recent releases of other tools (for example,
gsutil).


>
>
>>
>> - Remove googledatastore-based API from Beam after 2 releases.
>>
>>
> The removal needs to wait until the next major version by default, unless
> we have a way of asking our users and ensuring that nobody is really using
> the existing API. Removing a current API in 2 releases (a period of about
> 3 months) will hurt some users.
>
+1

>
>
>


Re: Python Datastore client upgrade plan

2018-10-17 Thread Ahmet Altay
Udi thank you for the proposal and thank you for sharing it in plain email.
My comments are below.

Overall, this is a good plan to get us out of a tough situation with an old
dependency.

On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri  wrote:

> Hi,
> Sadly upgrading googledatastore -> google-cloud-datastore is non-trivial (
> https://issues.apache.org/jira/browse/BEAM-4543). I wrote a doc to
> summarize the plan:
> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing
>
> Contents pasted below:
> Beam Python SDK: Datastore Client Upgrade
>
> eh...@google.com
>
> public, draft, 2018-10-16
> Objective
>
> Upgrade Beam's Python SDK dependency to use google-cloud-datastore v1.70
> (or later), replacing googledatastore v7.0.1, providing Beam users a
> migration path to a new Datastore PTransform API.
> Background
>
> Beam currently uses the googledatastore package to provide access to
> Google Cloud Datastore, however that package doesn't seem to be getting
> regular releases (last release in 2017-04) and it doesn't officially
> support Python 3.
>
> The current Beam API for Datastore queries exposes googledatastore types,
> such as using a protobuf to define a query (wordcount example).
> Conversely, google-cloud-datastore hides this implementation detail
> (query API).
> Since Beam API has to change the data types it accepts, it forces users to
> change their code. This makes the migration to google-cloud-datastore
> non-trivial.
> Proposal
>
> This proposal includes a period in which two Beam APIs are available to
> access Datastore.
>
>
> - Add new PTransforms that use google-cloud-datastore and mark as
>   deprecated the existing API (ReadFromDatastore, WriteToDatastore,
>   DeleteFromDatastore).
> - Implement apache_beam/io/datastore.py using google-cloud-datastore,
>   taking care to not expose Datastore client internals.
> - (optional) Remove googledatastore from the GCP_REQUIREMENTS package
>   list, and add it to a separate list, e.g., pip install
>   apache-beam[gcp,googledatastore].
>
>
I would like to avoid defining new sets of extra packages. Assuming that
these two packages are not incompatible together, we could keep them both
in [gcp].
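
For illustration only, the two packaging options being weighed here could be
sketched as a setup.py-style extras mapping. The function name
build_extras and the unpinned google-cloud-datastore entry are assumptions
made for this sketch; only the googledatastore 7.0.1 version and the
[gcp,googledatastore] spelling come from the proposal, and none of this is
Beam's actual setup.py.

```python
def build_extras(separate_legacy_extra):
    """Return an extras_require-style mapping for one of the two options.

    Hypothetical helper: package pins are illustrative, not Beam's real ones.
    """
    legacy_client = ['googledatastore==7.0.1']
    new_client = ['google-cloud-datastore']
    if separate_legacy_extra:
        # Proposal option: pip install apache-beam[gcp,googledatastore]
        return {'gcp': new_client, 'googledatastore': legacy_client}
    # Alternative option: keep both clients under the existing [gcp] extra.
    return {'gcp': new_client + legacy_client}


if __name__ == '__main__':
    print(build_extras(separate_legacy_extra=True))
    print(build_extras(separate_legacy_extra=False))
```

With the first option, users of the old API have to change their install
command; with the second, the install command stays the same but both
clients' dependencies must be installable together.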


>
> - Remove googledatastore-based API from Beam after 2 releases.
>
>
The removal needs to wait until the next major version by default, unless we
have a way of asking our users and ensuring that nobody is really using the
existing API. Removing a current API in 2 releases (a period of about 3
months) will hurt some users.
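
As a rough sketch of what "mark as deprecated" could mean while both APIs
coexist, the old entry points might emit a DeprecationWarning that names the
replacement. The decorator and the read_from_datastore_legacy stand-in below
are hypothetical names written for this example with the standard warnings
module; they are not Beam's actual annotations or transforms.

```python
import functools
import warnings


def deprecated(replacement):
    """Wrap a callable so each call emits a DeprecationWarning.

    Hypothetical helper for illustration; not a Beam API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                '%s is deprecated; use %s instead.'
                % (func.__name__, replacement),
                DeprecationWarning,
                stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(replacement='the google-cloud-datastore based transforms')
def read_from_datastore_legacy(query):
    """Stand-in for the existing googledatastore-based read path."""
    return ('legacy-read', query)
```

The old call keeps working and returns the same result, so users can migrate
on their own schedule while still seeing the notice.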