[VOTE] Vendored Dependencies Release

2019-08-27 Thread Rui Wang
Please review the release of the following artifacts that we vendor:

 * beam-vendor-calcite-1_20_0

Hi everyone,

Please review and vote on the release candidate #1 for the
org.apache.beam:beam-vendor-calcite-1_20_0:0.1,
as follows:

[ ] +1, Approve the release

[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:

* the official Apache source release to be deployed to dist.apache.org [1],
which is signed with the key with fingerprint
0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],

* all artifacts to be deployed to the Maven Central Repository [3],

* commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,

Rui

[1] https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0

[2] https://dist.apache.org/repos/dist/release/beam/KEYS

[3] https://repository.apache.org/content/repositories/orgapachebeam-1083/

[4]
https://github.com/apache/beam/commit/664e25019fc1977e7041e4b834e8d9628b912473


Re: Improve container support

2019-08-27 Thread Hannah Jiang
add dev@

On Tue, Aug 27, 2019 at 9:29 PM Hannah Jiang  wrote:

> Thanks for commenting and discussions.
> I created a Google Doc
> 
>  for
> easy commenting and reviewing. From now on, all changes will be made in
> the Google Doc, and I will sync them to the wiki once all plans are finalized.
>
> Thanks,
> Hannah
>
> On Tue, Aug 27, 2019 at 9:24 PM Ahmet Altay  wrote:
>
>> Hi datapls-engprod,
>>
>> I have a question. Do you know what it would take to create a new GCP
>> project similar to apache-beam-testing for the purpose of distributing GCR
>> packages? We can use the same billing account.
>>
>> Hannah, Robert, depending on the complexity of creating another GCP
>> project we can go with that, or simply create a new Bintray account. Either
>> way would give us a clean new project for publishing artifacts.
>>
>> Ahmet
>>
>> -- Forwarded message -
>> From: Robert Bradshaw 
>> Date: Tue, Aug 27, 2019 at 6:48 PM
>> Subject: Re: Improve container support
>> To: dev 
>>
>>
>> On Tue, Aug 27, 2019 at 6:20 PM Ahmet Altay  wrote:
>> >
>> > On Tue, Aug 27, 2019 at 5:50 PM Robert Bradshaw 
>> wrote:
>> >>
>> >> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
>> wrote:
>> >> >
>> >> > Hi team
>> >> >
>> >> > I am working on improving docker container support for Beam. We
>> would like to publish prebuilt containers for each release version and
>> daily snapshot. Current work focuses on release images only and it would be
>> part of the release process.
>> >>
>> >> This would be great!
>> >>
>> >> > The release images will be pushed to GCR, which is publicly
>> >> > accessible (pullable). We will use the following locations:
>> >> > Repository: gcr.io/beam
>> >> > Project: apache-beam-testing
>> >>
>> >> Given that these are release artifacts, we should use a project with
>> >> more restricted access than "anyone who opens a PR on github."
>> >
>> >
>> > We have two options:
>> > - gcr.io works based on the permissions of the GCS bucket that backs
>> > it. GCS supports bucket-only permissions. These permissions need to be
>> > explicitly granted, and the service accounts used by the Jenkins jobs
>> > do not have them today.
>> > - we can create a new project on GCR, Bintray, or anything else that
>> > offers the same service.
>>
>> I think the cleanest is to simply have a new project whose membership
>> consists of (interested) PMC members. If we have to populate this
>> manually I think that'd still be OK as the churn is quite low.
>>
>


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Ruoyun Huang
Congratulations Valentyn!

On Tue, Aug 27, 2019 at 6:16 PM Daniel Oliveira 
wrote:

> Congratulations Valentyn!
>
> On Tue, Aug 27, 2019, 11:31 AM Boyuan Zhang  wrote:
>
>> Congratulations!
>>
>> On Tue, Aug 27, 2019 at 10:44 AM Udi Meiri  wrote:
>>
>>> Congrats!
>>>
>>> On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:
>>>
 Congrats Valentyn!

 On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Thank you everyone!
>
> On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Congrats, well deserved!
>>
>> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>>
>> Congrats Valentyn!
>> On 8/26/19 11:43 PM, Rui Wang wrote:
>>
>> Congratulations!
>>
>>
>> -Rui
>>
>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
>> wrote:
>>
>>> Congratulations Valentyn, well deserved!
>>>
>>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 Congrats Valentyn!

 On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
 wrote:

> Thanks Valentyn!
>
> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
> wrote:
>
>> Thank you Valentyn! Congratulations!
>>
>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>>
>>> Hi,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a new
>>> committer: Valentyn Tymofieiev
>>>
>>> Valentyn has made numerous contributions to Beam over the last
>>> several
>>> years (including 100+ pull requests), most recently pushing
>>> through
>>> the effort to make Beam compatible with Python 3. He is also an
>>> active
>>> participant in design discussions on the list, participates in
>>> release
>>> candidate validation, and proactively helps keep our tests green.
>>>
>>> In consideration of Valentyn's contributions, the Beam PMC
>>> trusts him
>>> with the responsibilities of a Beam committer [1].
>>>
>>> Thank you, Valentyn, for your contributions and looking forward
>>> to many more!
>>>
>>> Robert, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>
>>

-- 

Ruoyun  Huang


Re: Improve container support

2019-08-27 Thread Ahmet Altay
On Tue, Aug 27, 2019 at 5:50 PM Robert Bradshaw  wrote:

> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
> wrote:
> >
> > Hi team
> >
> > I am working on improving docker container support for Beam. We would
> like to publish prebuilt containers for each release version and daily
> snapshot. Current work focuses on release images only and it would be part
> of the release process.
>
> This would be great!
>
> > The release images will be pushed to GCR, which is publicly
> > accessible (pullable). We will use the following locations:
> > Repository: gcr.io/beam
> > Project: apache-beam-testing
>
> Given that these are release artifacts, we should use a project with
> more restricted access than "anyone who opens a PR on github."
>

We have two options:
- gcr.io works based on the permissions of the GCS bucket that backs it.
GCS supports bucket-only permissions. These permissions need to be
explicitly granted, and the service accounts used by the Jenkins jobs do
not have them today.
- we can create a new project on GCR, Bintray, or anything else that
offers the same service.


>
> > More details, including naming and tagging scheme, can be found at wiki
> which is written by several contributors.
>
> Would it make sense to put this in a format more amenable to commenting?
>
> > I would like to discuss these two questions.
> > 1. How many tests do we need to run before pushing images to GCR?
> > Publishing artifacts is the last step of the release process, so by this
> > point we have already verified the whole codebase. In addition, many
> > Jenkins tests use containers, so the images have already been verified
> > several times. Do we need to run the tests again?
> >
> > 2. How many tests do we need to run to validate pushed images?
> > When we push the images, we assume they will work and pass all the
> > tests. After pushing, we should confirm the images are pullable and
> > usable. I suggest we run several tests on Dataflow with each pushed
> > image. What do you think?
>
> I think the release manager should publish these images as part of the
> release process (just like publishing to the Maven repo and svn), and
> validation should happen as part of validating the artifacts during the
> vote.
>


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Daniel Oliveira
Congratulations Valentyn!

On Tue, Aug 27, 2019, 11:31 AM Boyuan Zhang  wrote:

> Congratulations!
>
> On Tue, Aug 27, 2019 at 10:44 AM Udi Meiri  wrote:
>
>> Congrats!
>>
>> On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:
>>
>>> Congrats Valentyn!
>>>
>>> On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev 
>>> wrote:
>>>
 Thank you everyone!

 On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Congrats, well deserved!
>
> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>
> Congrats Valentyn!
> On 8/26/19 11:43 PM, Rui Wang wrote:
>
> Congratulations!
>
>
> -Rui
>
> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
> wrote:
>
>> Congratulations Valentyn, well deserved!
>>
>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Congrats Valentyn!
>>>
>>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
>>> wrote:
>>>
 Thanks Valentyn!

 On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
 wrote:

> Thank you Valentyn! Congratulations!
>
> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
> rober...@google.com> wrote:
>
>> Hi,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Valentyn Tymofieiev
>>
>> Valentyn has made numerous contributions to Beam over the last
>> several
>> years (including 100+ pull requests), most recently pushing
>> through
>> the effort to make Beam compatible with Python 3. He is also an
>> active
>> participant in design discussions on the list, participates in
>> release
>> candidate validation, and proactively helps keep our tests green.
>>
>> In consideration of Valentyn's contributions, the Beam PMC trusts
>> him
>> with the responsibilities of a Beam committer [1].
>>
>> Thank you, Valentyn, for your contributions and looking forward
>> to many more!
>>
>> Robert, on behalf of the Apache Beam PMC
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>
>


Improve container support

2019-08-27 Thread Hannah Jiang
Hi team

I am working on improving docker container support for Beam. We would like
to publish prebuilt containers for each release version and daily snapshot.
Current work focuses on release images only and it would be part of the
release process.

The release images will be pushed to GCR, which is publicly
accessible (pullable). We will use the following locations:
*Repository*: gcr.io/beam
*Project*: apache-beam-testing
More details, including the naming and tagging scheme, can be found on the wiki

which was written by several contributors.

I would like to discuss these two questions.
*1. How many tests do we need to run before pushing images to GCR?*
Publishing artifacts is the last step of the release process, so by this
point we have already verified the whole codebase. In addition, many
Jenkins tests use containers, so the images have already been verified
several times. Do we need to run the tests again?

*2. How many tests do we need to run to validate pushed images?*
When we push the images, we assume they will work and pass all the
tests. After pushing, we should confirm the images are pullable and
usable. I suggest we run several tests on Dataflow with each pushed
image. What do you think?

This work can be refined later as we explore more during our release
process.
Please comment or edit the wiki page or reply to this email with your
opinions.

Thanks,
Hannah


Re: Write-through-cache in State logic

2019-08-27 Thread Robert Bradshaw
Just to clarify, the repeated list of cache tokens in the process
bundle request is used to validate reading *and* stored when writing?
In that sense, should they just be called version identifiers or
something like that?
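The validate-on-read / record-on-write semantics in question could be sketched roughly as follows (a hypothetical illustration in Python with invented names, not the actual SDK harness code): the token supplied by the runner acts as a version identifier, reads are served from cache only if the stored token matches, and writes record the token they were written under.

```python
# Hypothetical sketch of token-validated caching. A stale token (a new
# "version" supplied by the runner) silently invalidates old entries.

class TokenValidatedCache:
    def __init__(self):
        self._entries = {}  # state_key -> (token, value)

    def get(self, state_key, current_token):
        entry = self._entries.get(state_key)
        if entry is not None and entry[0] == current_token:
            return entry[1]  # cache hit: stored token still valid
        return None  # miss, or entry written under an older version

    def put(self, state_key, current_token, value):
        # Record the value together with the token it was written under.
        self._entries[state_key] = (current_token, value)

cache = TokenValidatedCache()
cache.put("counter", b"token-1", 42)
assert cache.get("counter", b"token-1") == 42   # same version: hit
assert cache.get("counter", b"token-2") is None  # new version: miss
```

Whether these are called "cache tokens" or "version identifiers", the mechanics are the same: equality of the opaque bytes is the only validity check.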

On Tue, Aug 27, 2019 at 11:33 AM Maximilian Michels  wrote:
>
> Thanks. Updated:
>
> message ProcessBundleRequest {
>   // (Required) A reference to the process bundle descriptor that must be
>   // instantiated and executed by the SDK harness.
>   string process_bundle_descriptor_reference = 1;
>
>   // A cache token which can be used by an SDK to check for the validity
>   // of cached elements which have a cache token associated.
>   message CacheToken {
>
>     // A flag to indicate a cache token is valid for user state.
>     message UserState {}
>
>     // A flag to indicate a cache token is valid for a side input.
>     message SideInput {
>       // The id of a side input.
>       string side_input = 1;
>     }
>
>     // The scope of a cache token.
>     oneof type {
>       UserState user_state = 1;
>       SideInput side_input = 2;
>     }
>
>     // The cache token identifier which should be globally unique.
>     bytes token = 10;
>   }
>
>   // (Optional) A list of cache tokens that can be used by an SDK to reuse
>   // cached data returned by the State API across multiple bundles.
>   repeated CacheToken cache_tokens = 2;
> }
>
> On 27.08.19 19:22, Lukasz Cwik wrote:
>
> SideInputState -> SideInput (side_input_state -> side_input)
> + more comments around the messages and the fields.
>
>
> On Tue, Aug 27, 2019 at 10:18 AM Maximilian Michels  wrote:
>>
>> We would have to differentiate cache tokens for user state and side inputs. 
>> How about something like this?
>>
>> message ProcessBundleRequest {
>>   // (Required) A reference to the process bundle descriptor that must be
>>   // instantiated and executed by the SDK harness.
>>   string process_bundle_descriptor_reference = 1;
>>
>>   message CacheToken {
>>
>>     message UserState {
>>     }
>>
>>     message SideInputState {
>>       string side_input_id = 1;
>>     }
>>
>>     oneof type {
>>       UserState user_state = 1;
>>       SideInputState side_input_state = 2;
>>     }
>>
>>     bytes token = 10;
>>   }
>>
>>   // (Optional) A list of cache tokens that can be used by an SDK to reuse
>>   // cached data returned by the State API across multiple bundles.
>>   repeated CacheToken cache_tokens = 2;
>> }
>>
>> -Max
>>
>> On 27.08.19 18:43, Lukasz Cwik wrote:
>>
>> The bundle's view of side inputs should never change during processing;
>> it should be a point-in-time snapshot.
>>
>> I was just trying to say that deferring the cache token for side inputs
>> until side input request time simplified the runner's implementation,
>> since that is conclusively when the runner would need to take a look at
>> the side input. Putting the tokens in the ProcessBundleRequest
>> complicates that, but it does make the SDK implementation significantly
>> simpler, which is a win.
>>
>> On Tue, Aug 27, 2019 at 9:14 AM Maximilian Michels  wrote:
>>>
>>> Thanks for the quick response.
>>>
>>> Just to clarify, the issue with versioning side input is also present
>>> when supplying the cache tokens on a request basis instead of per
>>> bundle. The SDK never knows when the Runner receives a new version of
>>> the side input. Like you pointed out, it needs to mark side inputs as
>>> stale and generate new cache tokens for the stale side inputs.
>>>
>>> The difference between per-request tokens and per-bundle tokens would be
>>> that the side input can only change after a bundle completes vs. during
>>> the bundle. Side inputs are always fuzzy in that regard because there is
>>> no precise instance where side inputs are atomically updated, other than
>>> the assumption that they eventually will be updated. In that regard
>>> per-bundle tokens for side input seem to be fine.
>>>
>>> All of the above is not an issue for user state, as its cache can remain
>>> valid for the lifetime of a Runner<=>SDK Harness connection. A simple
>>> solution would be to not cache side input because there are many cases
>>> where the caching just adds additional overhead. However, I can also
>>> imagine cases where side input is valid forever and caching would be
>>> very beneficial.
>>>
>>> For the first version I want to focus on user state because that's where
>>> I see the most benefit for caching. I don't see a problem though for the
>>> Runner to detect new side input and reflect that in the cache tokens
>>> supplied for a new bundle.
>>>
>>> -Max
>>>
>>> On 26.08.19 22:27, Lukasz Cwik wrote:
>>> > Your summary below makes sense to me. I can see that recovery from
>>> > rolling back doesn't need to be a priority and simplifies the solution
>>> > for user state caching down to one token.
>>> >
>>> > Providing cache tokens upfront does require the Runner to know what
>>> > "version" of everything it may supply to the SDK upfront (instead of on
>>> > 

Re: Write-through-cache in State logic

2019-08-27 Thread Robert Bradshaw
On Sun, Aug 18, 2019 at 7:30 PM Rakesh Kumar  wrote:
>
> not to completely hijack Max's question but a tangential question regarding 
> LRU cache.
>
> What is the preferred python library for LRU cache?
> I noticed that cachetools [1] is used as one of the dependencies for GCP [2]. 
> Cachetools[1] has LRU cache and it supports Python 2 & 3. It can potentially 
> support our use case. Can we move cachetools to the required package list
> [3] and use it for cross-bundle caching?
>
> 1. https://pypi.org/project/cachetools/
> 2. 
> https://github.com/apache/beam/blob/96abacba9b8c7475c753eb3c0b58cca27c46feb1/sdks/python/setup.py#L143
> 3. 
> https://github.com/apache/beam/blob/96abacba9b8c7475c753eb3c0b58cca27c46feb1/sdks/python/setup.py#L104

cachetools sounds like a fine choice to me.
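For readers unfamiliar with what cachetools' `LRUCache` provides, the semantics can be sketched with the stdlib alone (this is an illustrative re-implementation, not the cachetools source): a bounded mapping that evicts the least-recently-used entry once full.

```python
# Stdlib sketch of LRU-cache semantics, roughly what cachetools.LRUCache
# offers: reads mark an entry as recently used; inserts beyond maxsize
# evict the entry that was used least recently.
from collections import OrderedDict

class LRUCache:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a" so "b" becomes the eviction candidate
cache.put("c", 3)  # evicts "b"
assert cache.get("b") is None
assert cache.get("a") == 1
```

With the real library, `cachetools.LRUCache(maxsize=...)` exposes the same idea through a dict-like interface, which is what makes it a light dependency for cross-bundle caching.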


Re: Write-through-cache in State logic

2019-08-27 Thread Lukasz Cwik
Open up a PR for the proto changes and we can work through any minor
comments there.

On Tue, Aug 27, 2019 at 11:33 AM Maximilian Michels  wrote:

> Thanks. Updated:
>
> message ProcessBundleRequest {
>   // (Required) A reference to the process bundle descriptor that must be
>   // instantiated and executed by the SDK harness.
>   string process_bundle_descriptor_reference = 1;
>
>   // A cache token which can be used by an SDK to check for the validity
>   // of cached elements which have a cache token associated.
>   message CacheToken {
>
>     // A flag to indicate a cache token is valid for user state.
>     message UserState {}
>
>     // A flag to indicate a cache token is valid for a side input.
>     message SideInput {
>       // The id of a side input.
>       string side_input = 1;
>     }
>
>     // The scope of a cache token.
>     oneof type {
>       UserState user_state = 1;
>       SideInput side_input = 2;
>     }
>
>     // The cache token identifier which should be globally unique.
>     bytes token = 10;
>   }
>
>   // (Optional) A list of cache tokens that can be used by an SDK to reuse
>   // cached data returned by the State API across multiple bundles.
>   repeated CacheToken cache_tokens = 2;
> }
>
> On 27.08.19 19:22, Lukasz Cwik wrote:
>
> SideInputState -> SideInput (side_input_state -> side_input)
> + more comments around the messages and the fields.
>
>
> On Tue, Aug 27, 2019 at 10:18 AM Maximilian Michels 
> wrote:
>
>> We would have to differentiate cache tokens for user state and side
>> inputs. How about something like this?
>>
>> message ProcessBundleRequest {
>>   // (Required) A reference to the process bundle descriptor that must be
>>   // instantiated and executed by the SDK harness.
>>   string process_bundle_descriptor_reference = 1;
>>
>>   message CacheToken {
>>
>>     message UserState {
>>     }
>>
>>     message SideInputState {
>>       string side_input_id = 1;
>>     }
>>
>>     oneof type {
>>       UserState user_state = 1;
>>       SideInputState side_input_state = 2;
>>     }
>>
>>     bytes token = 10;
>>   }
>>
>>   // (Optional) A list of cache tokens that can be used by an SDK to reuse
>>   // cached data returned by the State API across multiple bundles.
>>   repeated CacheToken cache_tokens = 2;
>> }
>>
>> -Max
>>
>> On 27.08.19 18:43, Lukasz Cwik wrote:
>>
>> The bundle's view of side inputs should never change during processing;
>> it should be a point-in-time snapshot.
>>
>> I was just trying to say that deferring the cache token for side inputs
>> until side input request time simplified the runner's implementation,
>> since that is conclusively when the runner would need to take a look at
>> the side input. Putting the tokens in the ProcessBundleRequest
>> complicates that, but it does make the SDK implementation significantly
>> simpler, which is a win.
>>
>> On Tue, Aug 27, 2019 at 9:14 AM Maximilian Michels 
>> wrote:
>>
>>> Thanks for the quick response.
>>>
>>> Just to clarify, the issue with versioning side input is also present
>>> when supplying the cache tokens on a request basis instead of per
>>> bundle. The SDK never knows when the Runner receives a new version of
>>> the side input. Like you pointed out, it needs to mark side inputs as
>>> stale and generate new cache tokens for the stale side inputs.
>>>
>>> The difference between per-request tokens and per-bundle tokens would be
>>> that the side input can only change after a bundle completes vs. during
>>> the bundle. Side inputs are always fuzzy in that regard because there is
>>> no precise instant where side inputs are atomically updated, other than
>>> the assumption that they eventually will be updated. In that regard
>>> per-bundle tokens for side input seem to be fine.
>>>
>>> All of the above is not an issue for user state, as its cache can remain
>>> valid for the lifetime of a Runner<=>SDK Harness connection. A simple
>>> solution would be to not cache side input because there are many cases
>>> where the caching just adds additional overhead. However, I can also
>>> imagine cases where side input is valid forever and caching would be
>>> very beneficial.
>>>
>>> For the first version I want to focus on user state because that's where
>>> I see the most benefit for caching. I don't see a problem though for the
>>> Runner to detect new side input and reflect that in the cache tokens
>>> supplied for a new bundle.
>>>
>>> -Max
>>>
>>> On 26.08.19 22:27, Lukasz Cwik wrote:
>>> > Your summary below makes sense to me. I can see that recovery from
>>> > rolling back doesn't need to be a priority and simplifies the solution
>>> > for user state caching down to one token.
>>> >
>>> > Providing cache tokens upfront does require the Runner to know what
>>> > "version" of everything it may supply to the SDK upfront (instead of on
>>> > request) which would mean that the Runner may need to have a mapping
>>> > from cache token to internal version identifier for things like side

Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Boyuan Zhang
Congratulations!

On Tue, Aug 27, 2019 at 10:44 AM Udi Meiri  wrote:

> Congrats!
>
> On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:
>
>> Congrats Valentyn!
>>
>> On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> Thank you everyone!
>>>
>>> On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Congrats, well deserved!

 On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:

 Congrats Valentyn!
 On 8/26/19 11:43 PM, Rui Wang wrote:

 Congratulations!


 -Rui

 On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
 wrote:

> Congratulations Valentyn, well deserved!
>
> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Congrats Valentyn!
>>
>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
>> wrote:
>>
>>> Thanks Valentyn!
>>>
>>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
>>> wrote:
>>>
 Thank you Valentyn! Congratulations!

 On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
 rober...@google.com> wrote:

> Hi,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Valentyn Tymofieiev
>
> Valentyn has made numerous contributions to Beam over the last
> several
> years (including 100+ pull requests), most recently pushing through
> the effort to make Beam compatible with Python 3. He is also an
> active
> participant in design discussions on the list, participates in
> release
> candidate validation, and proactively helps keep our tests green.
>
> In consideration of Valentyn's contributions, the Beam PMC trusts
> him
> with the responsibilities of a Beam committer [1].
>
> Thank you, Valentyn, for your contributions and looking forward to
> many more!
>
> Robert, on behalf of the Apache Beam PMC
>
> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>




Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Udi Meiri
Congrats!

On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:

> Congrats Valentyn!
>
> On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev 
> wrote:
>
>> Thank you everyone!
>>
>> On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Congrats, well deserved!
>>>
>>> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>>>
>>> Congrats Valentyn!
>>> On 8/26/19 11:43 PM, Rui Wang wrote:
>>>
>>> Congratulations!
>>>
>>>
>>> -Rui
>>>
>>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
>>> wrote:
>>>
 Congratulations Valentyn, well deserved!

 On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Congrats Valentyn!
>
> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
> wrote:
>
>> Thanks Valentyn!
>>
>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu  wrote:
>>
>>> Thank you Valentyn! Congratulations!
>>>
>>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw 
>>> wrote:
>>>
 Hi,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Valentyn Tymofieiev

 Valentyn has made numerous contributions to Beam over the last
 several
 years (including 100+ pull requests), most recently pushing through
 the effort to make Beam compatible with Python 3. He is also an
 active
 participant in design discussions on the list, participates in
 release
 candidate validation, and proactively helps keep our tests green.

 In consideration of Valentyn's contributions, the Beam PMC trusts
 him
 with the responsibilities of a Beam committer [1].

 Thank you, Valentyn, for your contributions and looking forward to
 many more!

 Robert, on behalf of the Apache Beam PMC

 [1]
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer

>>>
>>>




Re: Write-through-cache in State logic

2019-08-27 Thread Lukasz Cwik
SideInputState -> SideInput (side_input_state -> side_input)
+ more comments around the messages and the fields.


On Tue, Aug 27, 2019 at 10:18 AM Maximilian Michels  wrote:

> We would have to differentiate cache tokens for user state and side
> inputs. How about something like this?
>
> message ProcessBundleRequest {
>   // (Required) A reference to the process bundle descriptor that must be
>   // instantiated and executed by the SDK harness.
>   string process_bundle_descriptor_reference = 1;
>
>   message CacheToken {
>
>     message UserState {
>     }
>
>     message SideInputState {
>       string side_input_id = 1;
>     }
>
>     oneof type {
>       UserState user_state = 1;
>       SideInputState side_input_state = 2;
>     }
>
>     bytes token = 10;
>   }
>
>   // (Optional) A list of cache tokens that can be used by an SDK to reuse
>   // cached data returned by the State API across multiple bundles.
>   repeated CacheToken cache_tokens = 2;
> }
>
> -Max
>
> On 27.08.19 18:43, Lukasz Cwik wrote:
>
> The bundle's view of side inputs should never change during processing;
> it should be a point-in-time snapshot.
>
> I was just trying to say that deferring the cache token for side inputs
> until side input request time simplified the runner's implementation,
> since that is conclusively when the runner would need to take a look at
> the side input. Putting the tokens in the ProcessBundleRequest
> complicates that, but it does make the SDK implementation significantly
> simpler, which is a win.
>
> On Tue, Aug 27, 2019 at 9:14 AM Maximilian Michels  wrote:
>
>> Thanks for the quick response.
>>
>> Just to clarify, the issue with versioning side input is also present
>> when supplying the cache tokens on a request basis instead of per
>> bundle. The SDK never knows when the Runner receives a new version of
>> the side input. Like you pointed out, it needs to mark side inputs as
>> stale and generate new cache tokens for the stale side inputs.
>>
>> The difference between per-request tokens and per-bundle tokens would be
>> that the side input can only change after a bundle completes vs. during
>> the bundle. Side inputs are always fuzzy in that regard because there is
>> no precise instant where side inputs are atomically updated, other than
>> the assumption that they eventually will be updated. In that regard
>> per-bundle tokens for side input seem to be fine.
>>
>> All of the above is not an issue for user state, as its cache can remain
>> valid for the lifetime of a Runner<=>SDK Harness connection. A simple
>> solution would be to not cache side input because there are many cases
>> where the caching just adds additional overhead. However, I can also
>> imagine cases where side input is valid forever and caching would be
>> very beneficial.
>>
>> For the first version I want to focus on user state because that's where
>> I see the most benefit for caching. I don't see a problem though for the
>> Runner to detect new side input and reflect that in the cache tokens
>> supplied for a new bundle.
>>
>> -Max
>>
>> On 26.08.19 22:27, Lukasz Cwik wrote:
>> > Your summary below makes sense to me. I can see that recovery from
>> > rolling back doesn't need to be a priority and simplifies the solution
>> > for user state caching down to one token.
>> >
>> > Providing cache tokens upfront does require the Runner to know what
>> > "version" of everything it may supply to the SDK upfront (instead of on
>> > request) which would mean that the Runner may need to have a mapping
>> > from cache token to internal version identifier for things like side
>> > inputs which are typically broadcast. The Runner would also need to poll
>> > to see if the side input has changed in the background to not block
>> > processing bundles with "stale" side input data.
>> >
>> > Ping me once you have the Runner PR updated and I'll take a look again.
>> >
>> > On Mon, Aug 26, 2019 at 12:20 PM Maximilian Michels wrote:
>> >
>> > Thank you for the summary Luke. I really appreciate the effort you put
>> > into this!
>> >
>> >  > Based upon your discussion you seem to want option #1
>> >
>> > I'm actually for option #2. The option to cache/invalidate side inputs
>> > is important, and we should incorporate this in the design. That's why
>> > option #1 is not flexible enough. However, a first implementation could
>> > defer caching of side inputs.
>> >
>> > Option #3 was my initial thinking and the first version of the PR, but I
>> > think we agreed that there wouldn't be much gain from keeping a cache
>> > token per state id.
>> >
>> > Option #4 is what is specifically documented in the reference doc and
>> > already part of the Proto, where valid tokens are provided for each new
>> > bundle and also as part of the response of a get/put/clear. We mentioned
>> > that the reply does not have to be waited on 

Re: Write-through-cache in State logic

2019-08-27 Thread Maximilian Michels
We would have to differentiate cache tokens for user state and side 
inputs. How about something like this?


message ProcessBundleRequest {
  // (Required) A reference to the process bundle descriptor that must be
  // instantiated and executed by the SDK harness.
  string process_bundle_descriptor_reference = 1;

  message CacheToken {

    message UserState {
    }

    message SideInputState {
      string side_input_id = 1;
    }

    oneof type {
      UserState user_state = 1;
      SideInputState side_input_state = 2;
    }

    bytes token = 10;
  }

  // (Optional) A list of cache tokens that can be used by an SDK to reuse
  // cached data returned by the State API across multiple bundles.
  repeated CacheToken cache_tokens = 2;
}

-Max

On 27.08.19 18:43, Lukasz Cwik wrote:
The bundle's view of side inputs should never change during processing 
and should have a point-in-time snapshot.


I was just trying to say that the cache token for side inputs being 
deferred till side input request time simplified the runners 
implementation since that is conclusively when the runner would need 
to take a look at the side input. Putting them as part of the 
ProcessBundleRequest complicates that but does make the SDK 
implementation significantly simpler which is a win.


On Tue, Aug 27, 2019 at 9:14 AM Maximilian Michels wrote:


Thanks for the quick response.

Just to clarify, the issue with versioning side input is also present
when supplying the cache tokens on a request basis instead of per
bundle. The SDK never knows when the Runner receives a new version of
the side input. Like you pointed out, it needs to mark side inputs as
stale and generate new cache tokens for the stale side inputs.

The difference between per-request tokens and per-bundle tokens
would be
that the side input can only change after a bundle completes vs.
during
the bundle. Side inputs are always fuzzy in that regard because
there is
no precise instance where side inputs are atomically updated,
other than
the assumption that they eventually will be updated. In that regard
per-bundle tokens for side input seem to be fine.

All of the above is not an issue for user state, as its cache can
remain
valid for the lifetime of a Runner<=>SDK Harness connection. A simple
solution would be to not cache side input because there are many
cases
where the caching just adds additional overhead. However, I can also
imagine cases where side input is valid forever and caching would be
very beneficial.

For the first version I want to focus on user state because that's
where
I see the most benefit for caching. I don't see a problem though
for the
Runner to detect new side input and reflect that in the cache tokens
supplied for a new bundle.

-Max

On 26.08.19 22:27, Lukasz Cwik wrote:
> Your summary below makes sense to me. I can see that recovery from
> rolling back doesn't need to be a priority and simplifies the
solution
> for user state caching down to one token.
>
> Providing cache tokens upfront does require the Runner to know what
> "version" of everything it may supply to the SDK upfront
(instead of on
> request) which would mean that the Runner may need to have a
mapping
> from cache token to internal version identifier for things like
side
> inputs which are typically broadcast. The Runner would also need
to poll
> to see if the side input has changed in the background to not block
> processing bundles with "stale" side input data.
>
> Ping me once you have the Runner PR updated and I'll take a look
again.
>
> > On Mon, Aug 26, 2019 at 12:20 PM Maximilian Michels wrote:
>
>     Thank you for the summary Luke. I really appreciate the
effort you put
>     into this!
>
>      > Based upon your discussion you seem to want option #1
>
>     I'm actually for option #2. The option to cache/invalidate
side inputs
>     is important, and we should incorporate this in the design.
That's why
>     option #1 is not flexible enough. However, a first
implementation could
>     defer caching of side inputs.
>
>     Option #3 was my initial thinking and the first version of
the PR, but I
>     think we agreed that there wouldn't be much gain from
keeping a cache
>     token per state id.
>
>     Option #4 is what is specifically documented in the
reference doc and
>     already part of the Proto, where valid tokens are provided
for each new
>     bundle and also as part of the response of a get/put/clear.
We mentioned
>     that the reply does not have to be waited on synchronously
(I mentioned
> >     it even), but it complicates the implementation.

Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Yichi Zhang
Congrats Valentyn!

On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev 
wrote:

> Thank you everyone!
>
> On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko 
> wrote:
>
>> Congrats, well deserved!
>>
>> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>>
>> Congrats Valentyn!
>> On 8/26/19 11:43 PM, Rui Wang wrote:
>>
>> Congratulations!
>>
>>
>> -Rui
>>
>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
>> wrote:
>>
>>> Congratulations Valentyn, well deserved!
>>>
>>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath 
>>> wrote:
>>>
 Congrats Valentyn!

 On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
 wrote:

> Thanks Valentyn!
>
> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu  wrote:
>
>> Thank you Valentyn! Congratulations!
>>
>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw 
>> wrote:
>>
>>> Hi,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a new
>>> committer: Valentyn Tymofieiev
>>>
>>> Valentyn has made numerous contributions to Beam over the last
>>> several
>>> years (including 100+ pull requests), most recently pushing through
>>> the effort to make Beam compatible with Python 3. He is also an
>>> active
>>> participant in design discussions on the list, participates in
>>> release
>>> candidate validation, and proactively helps keep our tests green.
>>>
>>> In consideration of Valentyn's contributions, the Beam PMC trusts him
>>> with the responsibilities of a Beam committer [1].
>>>
>>> Thank you, Valentyn, for your contributions and looking forward to
>>> many more!
>>>
>>> Robert, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>
>>


Re: Write-through-cache in State logic

2019-08-27 Thread Lukasz Cwik
The bundle's view of side inputs should never change during processing and
should have a point-in-time snapshot.

I was just trying to say that the cache token for side inputs being
deferred till side input request time simplified the runners implementation
since that is conclusively when the runner would need to take a look at the
side input. Putting them as part of the ProcessBundleRequest complicates
that but does make the SDK implementation significantly simpler which is a
win.

On Tue, Aug 27, 2019 at 9:14 AM Maximilian Michels  wrote:

> Thanks for the quick response.
>
> Just to clarify, the issue with versioning side input is also present
> when supplying the cache tokens on a request basis instead of per
> bundle. The SDK never knows when the Runner receives a new version of
> the side input. Like you pointed out, it needs to mark side inputs as
> stale and generate new cache tokens for the stale side inputs.
>
> The difference between per-request tokens and per-bundle tokens would be
> that the side input can only change after a bundle completes vs. during
> the bundle. Side inputs are always fuzzy in that regard because there is
> no precise instance where side inputs are atomically updated, other than
> the assumption that they eventually will be updated. In that regard
> per-bundle tokens for side input seem to be fine.
>
> All of the above is not an issue for user state, as its cache can remain
> valid for the lifetime of a Runner<=>SDK Harness connection. A simple
> solution would be to not cache side input because there are many cases
> where the caching just adds additional overhead. However, I can also
> imagine cases where side input is valid forever and caching would be
> very beneficial.
>
> For the first version I want to focus on user state because that's where
> I see the most benefit for caching. I don't see a problem though for the
> Runner to detect new side input and reflect that in the cache tokens
> supplied for a new bundle.
>
> -Max
>
> On 26.08.19 22:27, Lukasz Cwik wrote:
> > Your summary below makes sense to me. I can see that recovery from
> > rolling back doesn't need to be a priority and simplifies the solution
> > for user state caching down to one token.
> >
> > Providing cache tokens upfront does require the Runner to know what
> > "version" of everything it may supply to the SDK upfront (instead of on
> > request) which would mean that the Runner may need to have a mapping
> > from cache token to internal version identifier for things like side
> > inputs which are typically broadcast. The Runner would also need to poll
> > to see if the side input has changed in the background to not block
> > processing bundles with "stale" side input data.
> >
> > Ping me once you have the Runner PR updated and I'll take a look again.
> >
> > On Mon, Aug 26, 2019 at 12:20 PM Maximilian Michels wrote:
> >
> > Thank you for the summary Luke. I really appreciate the effort you
> put
> > into this!
> >
> >  > Based upon your discussion you seem to want option #1
> >
> > I'm actually for option #2. The option to cache/invalidate side
> inputs
> > is important, and we should incorporate this in the design. That's
> why
> > option #1 is not flexible enough. However, a first implementation
> could
> > defer caching of side inputs.
> >
> > Option #3 was my initial thinking and the first version of the PR,
> but I
> > think we agreed that there wouldn't be much gain from keeping a cache
> > token per state id.
> >
> > Option #4 is what is specifically documented in the reference doc and
> > already part of the Proto, where valid tokens are provided for each
> new
> > bundle and also as part of the response of a get/put/clear. We
> mentioned
> > that the reply does not have to be waited on synchronously (I
> mentioned
> > it even), but it complicates the implementation. The idea Thomas and
> I
> > expressed was that a response is not even necessary if we assume
> > validity of the upfront provided cache tokens for the lifetime of a
> > bundle and that cache tokens will be invalidated as soon as the
> Runner
> > fails in any way. This is naturally the case for Flink because it
> will
> > simply "forget" its current cache tokens.
> >
> > I currently envision the following schema:
> >
> > Runner
> > ==
> >
> > - Runner generates a globally unique cache token, one for user state
> and
> > one for each side input
> >
> > - The token is supplied to the SDK Harness for each bundle request
> >
> > - For the lifetime of a Runner<=>SDK Harness connection this cache
> token
> > will not change
> > - Runner will generate a new token if the connection/key space
> changes
> > between Runner and SDK Harness
> >
> >
> > SDK
> > ===
> >
> > - For each bundle the SDK worker stores the list of valid cache
> tokens
> > - The SDK Harness keeps a global cache across all its (local) workers which is an LRU cache: state_key => (cache_token, value)

Re: Write-through-cache in State logic

2019-08-27 Thread Maximilian Michels

Thanks for the quick response.

Just to clarify, the issue with versioning side input is also present 
when supplying the cache tokens on a request basis instead of per 
bundle. The SDK never knows when the Runner receives a new version of 
the side input. Like you pointed out, it needs to mark side inputs as 
stale and generate new cache tokens for the stale side inputs.


The difference between per-request tokens and per-bundle tokens would be 
that the side input can only change after a bundle completes vs. during 
the bundle. Side inputs are always fuzzy in that regard because there is 
no precise instance where side inputs are atomically updated, other than 
the assumption that they eventually will be updated. In that regard 
per-bundle tokens for side input seem to be fine.


All of the above is not an issue for user state, as its cache can remain 
valid for the lifetime of a Runner<=>SDK Harness connection. A simple 
solution would be to not cache side input because there are many cases 
where the caching just adds additional overhead. However, I can also 
imagine cases where side input is valid forever and caching would be 
very beneficial.


For the first version I want to focus on user state because that's where 
I see the most benefit for caching. I don't see a problem though for the 
Runner to detect new side input and reflect that in the cache tokens 
supplied for a new bundle.


-Max

On 26.08.19 22:27, Lukasz Cwik wrote:
Your summary below makes sense to me. I can see that recovery from 
rolling back doesn't need to be a priority and simplifies the solution 
for user state caching down to one token.


Providing cache tokens upfront does require the Runner to know what 
"version" of everything it may supply to the SDK upfront (instead of on 
request) which would mean that the Runner may need to have a mapping 
from cache token to internal version identifier for things like side 
inputs which are typically broadcast. The Runner would also need to poll 
to see if the side input has changed in the background to not block 
processing bundles with "stale" side input data.


Ping me once you have the Runner PR updated and I'll take a look again.

On Mon, Aug 26, 2019 at 12:20 PM Maximilian Michels wrote:


Thank you for the summary Luke. I really appreciate the effort you put
into this!

 > Based upon your discussion you seem to want option #1

I'm actually for option #2. The option to cache/invalidate side inputs
is important, and we should incorporate this in the design. That's why
option #1 is not flexible enough. However, a first implementation could
defer caching of side inputs.

Option #3 was my initial thinking and the first version of the PR, but I
think we agreed that there wouldn't be much gain from keeping a cache
token per state id.

Option #4 is what is specifically documented in the reference doc and
already part of the Proto, where valid tokens are provided for each new
bundle and also as part of the response of a get/put/clear. We mentioned
that the reply does not have to be waited on synchronously (I mentioned
it even), but it complicates the implementation. The idea Thomas and I
expressed was that a response is not even necessary if we assume
validity of the upfront provided cache tokens for the lifetime of a
bundle and that cache tokens will be invalidated as soon as the Runner
fails in any way. This is naturally the case for Flink because it will
simply "forget" its current cache tokens.

I currently envision the following schema:

Runner
==

- Runner generates a globally unique cache token, one for user state and
one for each side input

- The token is supplied to the SDK Harness for each bundle request 


- For the lifetime of a Runner<=>SDK Harness connection this cache token
will not change
- Runner will generate a new token if the connection/key space changes
between Runner and SDK Harness
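The Runner-side token lifecycle above can be sketched as follows. This is a hypothetical illustration under the stated assumptions (one global user-state token plus one token per side input, all regenerated on reconnect/failure); the class and method names are not actual Beam runner code:

```python
import uuid


class RunnerCacheTokens:
    """Sketch of Runner-side cache-token bookkeeping for one
    Runner<=>SDK Harness connection."""

    def __init__(self, side_input_ids):
        self._side_input_ids = list(side_input_ids)
        self.reset()

    def reset(self):
        # Called when the connection/key space changes or a bundle/checkpoint
        # fails: every previously issued token becomes invalid.
        self.user_state_token = uuid.uuid4().bytes
        self.side_input_tokens = {
            s: uuid.uuid4().bytes for s in self._side_input_ids
        }

    def invalidate_side_input(self, side_input_id):
        # A new side input version arrived: only that side input's token
        # changes; user state stays cacheable.
        self.side_input_tokens[side_input_id] = uuid.uuid4().bytes

    def tokens_for_bundle(self):
        # Supplied with every ProcessBundleRequest.
        return [self.user_state_token] + list(self.side_input_tokens.values())
```

The key property this models is that user-state caching survives side-input updates, while any failure path falls back to `reset()` and thus a cold cache.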


SDK
===

- For each bundle the SDK worker stores the list of valid cache tokens
- The SDK Harness keeps a global cache across all its (local) workers,
which is an LRU cache: state_key => (cache_token, value)
- get: Lookup cache using the valid cache token for the state. If no
match, then fetch from Runner and use the already available token for
caching
- put: Put value in cache with a valid cache token, put value to pending
writes which will be flushed out latest when the bundle ends
- clear: same as put but clear cache
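The get/put/clear semantics above can be sketched like this. It is a minimal illustration, not the actual Beam SDK harness code; `state_key`, the `fetch_from_runner` hook, and the pending-write representation are assumptions:

```python
from collections import OrderedDict


class WriteThroughStateCache:
    """LRU cache shared across local workers: (cache_token, state_key) -> value."""

    def __init__(self, max_entries=1000):
        self._cache = OrderedDict()
        self._max = max_entries
        self.pending_writes = []  # flushed at the latest when the bundle ends

    def _touch(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict least recently used

    def get(self, state_key, cache_token, fetch_from_runner):
        key = (cache_token, state_key)
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        # Miss (or stale token): fetch from the Runner and cache under the
        # bundle's already-available token.
        value = fetch_from_runner(state_key)
        self._touch(key, value)
        return value

    def put(self, state_key, cache_token, value):
        self._touch((cache_token, state_key), value)
        self.pending_writes.append(('put', state_key, value))

    def clear(self, state_key, cache_token):
        self._touch((cache_token, state_key), None)
        self.pending_writes.append(('clear', state_key, None))
```

Because entries are keyed by the token as well as the state key, a Runner that hands out a fresh token after a failure automatically makes every old entry unreachable; no explicit invalidation message is needed.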

It does look like this is not too far off from what you were describing.
The main difference is that we just work with a single cache token. In
my opinion we do not need the second cache token for writes, as long as
we ensure that we generate a new cache token if the
bundle/checkpoint fails.

I have a draft PR
   for the Runner: 

Re: Help triaging Jira issues

2019-08-27 Thread Lukasz Cwik
It still requests that we upgrade to Jira Pro.

For those who are interested, you can select "Project automation" from the
Beam Project settings page or use this direct link
https://issues.apache.org/jira/secure/AutomationProjectAdminAction!default.jspa?projectKey=BEAM

On Tue, Aug 27, 2019 at 5:20 AM Ismaël Mejía  wrote:

> Apache's JIRA was updated recently. Luke (or someone else maybe) can
> please help
> me check if we still cannot create the rule to self triage the issues that
> were
> created and already have an assigned contributor. This could easily cut
> issue triage work in half because, due to JIRA's UI, it is really easy to
> forget to self-triage the issues they create.
>
> Also a kind reminder to contributors and in particular current committers,
> when
> you create JIRAs, they are not automatically 'self-triaged' but in many
> cases
> they should be. Please take care to do the triage if it is already
> assigned on creation or if you judge it is complete enough but prefer to
> let it
> unassigned in case someone else can work on it. That will for sure reduce
> the
> triage work until this becomes automatic.
>
> On Wed, Jun 12, 2019 at 6:16 PM Lukasz Cwik  wrote:
> >
> > I looked at automating the two in JIRA but got the unhelpful:
> >
> > "You are using Automation Lite for Jira. This is the free offering of
> Automation for Jira Pro and only contains a small subset of the many
> awesome features of the paid app. For example, project admins like yourself
> can only create and edit automation rules in the paid offering."
> >
> > On Wed, Jun 12, 2019 at 2:22 AM Ismaël Mejía  wrote:
> >>
> >> Kenn can you or someone else with more JIRA-fu than me automatize both
> cases (I
> >> just triaged most of the still untriaged issues and found multiple new
> >> instances of
> >> both cases).
> >>
> >> On Fri, Jun 7, 2019 at 10:27 PM Kenneth Knowles 
> wrote:
> >> >
> >> > Nice. I noticed the huge drop in untriaged issues. Both of those
> ideas for automation sound reasonable.
> >> >
> >> > I think the other things that are harder to optimize can probably be
> addressed by re-triaging stale bugs. We will probably find those that
> should have been closed and those that are just sitting on an inactive
> contributor.
> >> >
> >> > Kenn
> >> >
> >> > On Fri, Jun 7, 2019 at 12:53 AM Ismaël Mejía 
> wrote:
> >> >>
> >> >> I took a look and reduced the untriaged issues to around 100. I
> >> >> noticed however some patterns that are producing more untriaged
> issues
> >> >> than we should have. Those can probably be automated (if JIRA has
> ways
> >> >> to do it):
> >> >>
> >> >> 1. Issues created and assigned on creation can be marked as open.
> >> >> 2. Once an issue has an associated PR it could be marked as open if
> it
> >> >> was in Triaged state.
> >> >>
> >> >> Another common case that is probably harder to automate is issues that
> >> >> are in Triaged state because we forgot to resolve/close them. I don’t
> >> >> know how we can improve these, apart of reminding people to look that
> >> >> they do not have untriaged assigned issues.
> >> >>
> >> >> Another interesting triage to do are the issues that are Open and
> >> >> assigned to members of the community that are not active anymore in
> >> >> the project, but that’s probably worth of another discussion, as well
> >> >> as how can we more effectively track open unassigned issues (which
> are
> >> >> currently around 1600).
> >> >>
> >> >> On Wed, Jun 5, 2019 at 7:03 PM Tanay Tummalapalli <
> ttanay...@gmail.com> wrote:
> >> >> >
> >> >> > Hi Kenneth,
> >> >> >
> >> >> > I already follow the issues@ mailing list pretty much daily.
> >> >> > I'd like to help with triaging issues, especially ones related to
> the Python SDK since I'm most familiar with it.
> >> >> >
> >> >> > On Wed, Jun 5, 2019 at 10:26 PM Alex Van Boxel 
> wrote:
> >> >> >>
> >> >> >> Hey Kenneth, I help out. I'm planning to contribute more on Beam
> and it seems to be ideal to keep up-to-date with the project.
> >> >> >>
> >> >> >>  _/
> >> >> >> _/ Alex Van Boxel
> >> >> >>
> >> >> >>
> >> >> >> On Wed, Jun 5, 2019 at 6:46 PM Kenneth Knowles 
> wrote:
> >> >> >>>
> >> >> >>> Hi all,
> >> >> >>>
> >> >> >>> I am requesting help in triaging incoming issues. I made a
> search here: https://issues.apache.org/jira/issues/?filter=12345682
> >> >> >>>
> >> >> >>> I have a daily email subscription to this filter as a reminder,
> but rarely can really sit down to do triage for very long. It has grown
> from just under 200 to just over 200. The rate is actually pretty low but
> there is a backlog. I also want to start re-triaging stale bugs but
> priority would be (1) keep up with new bugs (2) clear backlog (3) re-triage
> stale bugs.
> >> >> >>>
> >> >> >>> Just FYI what I look for before I clicked "Triaged" is:
> >> >> >>>
> >> >> >>>  - correct component
> >> >> >>>  - correct priority
> >> >> >>>  - maybe ping someone in a comment or assign
> >> >> >>>  - write to dev@ if it is a major problem

Re: Help triaging Jira issues

2019-08-27 Thread Ismaël Mejía
Apache's JIRA was updated recently. Luke (or someone else maybe) can please help
me check if we still cannot create the rule to self triage the issues that were
created and already have an assigned contributor. This could easily cut issue
triage work in half because, due to JIRA's UI, it is really easy to forget to
self-triage the issues they create.

Also a kind reminder to contributors and in particular current committers, when
you create JIRAs, they are not automatically 'self-triaged' but in many cases
they should be. Please take care to do the triage if it is already
assigned on creation or if you judge it is complete enough but prefer to let it
unassigned in case someone else can work on it. That will for sure reduce the
triage work until this becomes automatic.

On Wed, Jun 12, 2019 at 6:16 PM Lukasz Cwik  wrote:
>
> I looked at automating the two in JIRA but got the unhelpful:
>
> "You are using Automation Lite for Jira. This is the free offering of 
> Automation for Jira Pro and only contains a small subset of the many awesome 
> features of the paid app. For example, project admins like yourself can 
> only create and edit automation rules in the paid offering."
>
> On Wed, Jun 12, 2019 at 2:22 AM Ismaël Mejía  wrote:
>>
>> Kenn can you or someone else with more JIRA-fu than me automatize both cases 
>> (I
>> just triaged most of the still untriaged issues and found multiple new
>> instances of
>> both cases).
>>
>> On Fri, Jun 7, 2019 at 10:27 PM Kenneth Knowles  wrote:
>> >
>> > Nice. I noticed the huge drop in untriaged issues. Both of those ideas for 
>> > automation sound reasonable.
>> >
>> > I think the other things that are harder to optimize can probably be 
>> > addressed by re-triaging stale bugs. We will probably find those that 
>> > should have been closed and those that are just sitting on an inactive 
>> > contributor.
>> >
>> > Kenn
>> >
>> > On Fri, Jun 7, 2019 at 12:53 AM Ismaël Mejía  wrote:
>> >>
>> >> I took a look and reduced the untriaged issues to around 100. I
>> >> noticed however some patterns that are producing more untriaged issues
>> >> than we should have. Those can probably be automated (if JIRA has ways
>> >> to do it):
>> >>
>> >> 1. Issues created and assigned on creation can be marked as open.
>> >> 2. Once an issue has an associated PR it could be marked as open if it
>> >> was in Triaged state.
>> >>
>> >> Another common case that is probably harder to automate is issues that
>> >> are in Triaged state because we forgot to resolve/close them. I don’t
>> >> know how we can improve these, apart of reminding people to look that
>> >> they do not have untriaged assigned issues.
>> >>
>> >> Another interesting triage to do are the issues that are Open and
>> >> assigned to members of the community that are not active anymore in
>> >> the project, but that’s probably worth of another discussion, as well
>> >> as how can we more effectively track open unassigned issues (which are
>> >> currently around 1600).
>> >>
>> >> On Wed, Jun 5, 2019 at 7:03 PM Tanay Tummalapalli  
>> >> wrote:
>> >> >
>> >> > Hi Kenneth,
>> >> >
>> >> > I already follow the issues@ mailing list pretty much daily.
>> >> > I'd like to help with triaging issues, especially ones related to the 
>> >> > Python SDK since I'm most familiar with it.
>> >> >
>> >> > On Wed, Jun 5, 2019 at 10:26 PM Alex Van Boxel  wrote:
>> >> >>
>> >> >> Hey Kenneth, I help out. I'm planning to contribute more on Beam and 
>> >> >> it seems to be ideal to keep up-to-date with the project.
>> >> >>
>> >> >>  _/
>> >> >> _/ Alex Van Boxel
>> >> >>
>> >> >>
>> >> >> On Wed, Jun 5, 2019 at 6:46 PM Kenneth Knowles  wrote:
>> >> >>>
>> >> >>> Hi all,
>> >> >>>
>> >> >>> I am requesting help in triaging incoming issues. I made a search 
>> >> >>> here: https://issues.apache.org/jira/issues/?filter=12345682
>> >> >>>
>> >> >>> I have a daily email subscription to this filter as a reminder, but 
>> >> >>> rarely can really sit down to do triage for very long. It has grown 
>> >> >>> from just under 200 to just over 200. The rate is actually pretty low 
>> >> >>> but there is a backlog. I also want to start re-triaging stale bugs 
>> >> >>> but priority would be (1) keep up with new bugs (2) clear backlog (3) 
>> >> >>> re-triage stale bugs.
>> >> >>>
>> >> >>> Just FYI what I look for before I clicked "Triaged" is:
>> >> >>>
>> >> >>>  - correct component
>> >> >>>  - correct priority
>> >> >>>  - maybe ping someone in a comment or assign
>> >> >>>  - write to dev@ if it is a major problem
>> >> >>>
>> >> >>> If I can't figure that out, then I ask the reporter for clarification 
>> >> >>> and "Start Watching" the issue so I will receive their response.
>> >> >>>
>> >> >>> To avoid duplicate triage work it may help to assign to yourself 
>> >> >>> temporarily during triage phase.
>> >> >>>
>> >> >>> Any help greatly appreciated!
>> >> >>>
>> >> >>> Kenn


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Alexey Romanenko
Congrats, well deserved!

> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
> 
> Congrats Valentyn!
> On 8/26/19 11:43 PM, Rui Wang wrote:
>> Congratulations!
>> 
>> 
>> -Rui
>> 
>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang wrote:
>> Congratulations Valentyn, well deserved!
>> 
>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath wrote:
>> Congrats Valentyn!
>> 
>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada wrote:
>> Thanks Valentyn!
>> 
>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu wrote:
>> Thank you Valentyn! Congratulations!
>> 
>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw wrote:
>> Hi,
>> 
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Valentyn Tymofieiev
>> 
>> Valentyn has made numerous contributions to Beam over the last several
>> years (including 100+ pull requests), most recently pushing through
>> the effort to make Beam compatible with Python 3. He is also an active
>> participant in design discussions on the list, participates in release
>> candidate validation, and proactively helps keep our tests green.
>> 
>> In consideration of Valentyn's contributions, the Beam PMC trusts him
>> with the responsibilities of a Beam committer [1].
>> 
>> Thank you, Valentyn, for your contributions and looking forward to many more!
>> 
>> Robert, on behalf of the Apache Beam PMC
>> 
>> [1] 
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>  
>> 



Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Jan Lukavský

Congrats Valentyn!

On 8/26/19 11:43 PM, Rui Wang wrote:

Congratulations!


-Rui

On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang wrote:


Congratulations Valentyn, well deserved!

On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath wrote:

Congrats Valentyn!

On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada wrote:

Thanks Valentyn!

On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu wrote:

Thank you Valentyn! Congratulations!

On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw wrote:

Hi,

Please join me and the rest of the Beam PMC in
welcoming a new
committer: Valentyn Tymofieiev

Valentyn has made numerous contributions to Beam
over the last several
years (including 100+ pull requests), most
recently pushing through
the effort to make Beam compatible with Python 3.
He is also an active
participant in design discussions on the list,
participates in release
candidate validation, and proactively helps keep
our tests green.

In consideration of Valentyn's contributions, the
Beam PMC trusts him
with the responsibilities of a Beam committer [1].

Thank you, Valentyn, for your contributions and
looking forward to many more!

Robert, on behalf of the Apache Beam PMC

[1]

https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer



Re: [ANNOUNCE] Beam 2.15.0 Released!

2019-08-27 Thread Ismaël Mejía
Thanks, really a smooth release, impressive.

On Tue, Aug 27, 2019 at 7:26 AM jincheng sun  wrote:
>
> Cheers!! Thanks for driving the release, Yifan!
> Thanks a lot to everyone who helped making this release possible!
>
> Best,
> Jincheng
>
> Thomas Weise wrote on Tue, Aug 27, 2019 at 12:54 PM:
>>
>> Yifan, thanks for managing this release. It went smoothly!
>>
>>
>> On Fri, Aug 23, 2019 at 2:32 PM Kenneth Knowles  wrote:
>>>
>>> Nice work!
>>>
>>> On Fri, Aug 23, 2019 at 11:26 AM Charles Chen  wrote:

 Thank you Yifan!

 On Fri, Aug 23, 2019 at 11:12 AM Hannah Jiang  
 wrote:
>
> Thank you Yifan!
>
> On Fri, Aug 23, 2019 at 11:09 AM Yichi Zhang  wrote:
>>
>> Thank you Yifan!
>>
>> On Fri, Aug 23, 2019 at 11:06 AM Robin Qiu  wrote:
>>>
>>> Thank you Yifan!
>>>
>>> On Fri, Aug 23, 2019 at 11:05 AM Rui Wang  wrote:

 Thank you Yifan!

 -Rui

 On Fri, Aug 23, 2019 at 9:21 AM Pablo Estrada  
 wrote:
>
> Thanks Yifan!
>
> On Fri, Aug 23, 2019 at 8:54 AM Connell O'Callaghan 
>  wrote:
>>
>>
>> +1 thank you Yifan!!!
>>
>> On Fri, Aug 23, 2019 at 8:49 AM Ahmet Altay  wrote:
>>>
>>> Thank you Yifan!
>>>
>>> On Fri, Aug 23, 2019 at 8:00 AM Yifan Zou  
>>> wrote:

 The Apache Beam team is pleased to announce the release of version 
 2.15.0.

 Apache Beam is an open source unified programming model to define 
 and
 execute data processing pipelines, including ETL, batch and stream
 (continuous) processing. See https://beam.apache.org

 You can download the release here:

 https://beam.apache.org/get-started/downloads/

 This release includes bug fixes, features, and improvements 
 detailed on
 the Beam blog: 
 https://beam.apache.org/blog/2019/08/22/beam-2.15.0.html

 Thanks to everyone who contributed to this release, and we hope 
 you enjoy
 using Beam 2.15.0.

 Yifan Zou


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Juta Staes
Congratulations Valentyn!

On Tue, 27 Aug 2019 at 09:58, Robbe Sneyders  wrote:

> Congrats Valentyn!
>
>
> * Robbe Sneyders*
>
> ML6 Gent
> 
>
> M: +32 474 71 31 08
>
>
> On Tue, 27 Aug 2019 at 09:26, Gleb Kanterov  wrote:
>
>> Congratulations Valentyn!
>>
>> On Tue, Aug 27, 2019 at 7:22 AM jincheng sun 
>> wrote:
>>
>>> Congrats Valentyn!
>>>
>>> Best,
>>> Jincheng
>>>
>>> Ankur Goenka wrote on Tue, Aug 27, 2019 at 10:37 AM:
>>>
 Congratulations Valentyn!

 On Mon, Aug 26, 2019, 5:02 PM Yifan Zou  wrote:

> Congratulations, Valentyn! Well deserved!
>
> On Mon, Aug 26, 2019 at 3:31 PM Aizhamal Nurmamat kyzy <
> aizha...@google.com> wrote:
>
>> Congratulations! and thank you for your contributions, Valentyn!
>>
>> On Mon, Aug 26, 2019 at 3:26 PM Thomas Weise  wrote:
>>
>>> Congrats!
>>>
>>>
>>> On Mon, Aug 26, 2019 at 3:22 PM Heejong Lee 
>>> wrote:
>>>
 Congratulations! :)

 On Mon, Aug 26, 2019 at 2:44 PM Rui Wang  wrote:

> Congratulations!
>
>
> -Rui
>
> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang <
> hannahji...@google.com> wrote:
>
>> Congratulations Valentyn, well deserved!
>>
>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Congrats Valentyn!
>>>
>>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada <
>>> pabl...@google.com> wrote:
>>>
 Thanks Valentyn!

 On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
 wrote:

> Thank you Valentyn! Congratulations!
>
> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
> rober...@google.com> wrote:
>
>> Hi,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Valentyn Tymofieiev
>>
>> Valentyn has made numerous contributions to Beam over the
>> last several
>> years (including 100+ pull requests), most recently pushing
>> through
>> the effort to make Beam compatible with Python 3. He is also
>> an active
>> participant in design discussions on the list, participates
>> in release
>> candidate validation, and proactively helps keep our tests
>> green.
>>
>> In consideration of Valentyn's contributions, the Beam PMC
>> trusts him
>> with the responsibilities of a Beam committer [1].
>>
>> Thank you, Valentyn, for your contributions and looking
>> forward to many more!
>>
>> Robert, on behalf of the Apache Beam PMC
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>
>>
>> --
>> Cheers,
>> Gleb
>>
>

-- 


*Juta Staes*
ML6 Gent




Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Robbe Sneyders
Congrats Valentyn!


*Robbe Sneyders*

ML6 Gent


M: +32 474 71 31 08


On Tue, 27 Aug 2019 at 09:26, Gleb Kanterov  wrote:

> Congratulations Valentyn!
>
> On Tue, Aug 27, 2019 at 7:22 AM jincheng sun 
> wrote:
>
>> Congrats Valentyn!
>>
>> Best,
>> Jincheng
>>
>> Ankur Goenka wrote on Tue, Aug 27, 2019 at 10:37 AM:
>>
>>> Congratulations Valentyn!
>>>
>>> On Mon, Aug 26, 2019, 5:02 PM Yifan Zou  wrote:
>>>
 Congratulations, Valentyn! Well deserved!

 On Mon, Aug 26, 2019 at 3:31 PM Aizhamal Nurmamat kyzy <
 aizha...@google.com> wrote:

> Congratulations! And thank you for your contributions, Valentyn!
>
> On Mon, Aug 26, 2019 at 3:26 PM Thomas Weise  wrote:
>
>> Congrats!
>>
>>
>> On Mon, Aug 26, 2019 at 3:22 PM Heejong Lee 
>> wrote:
>>
>>> Congratulations! :)
>>>
>>> On Mon, Aug 26, 2019 at 2:44 PM Rui Wang  wrote:
>>>
 Congratulations!


 -Rui

 On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang <
 hannahji...@google.com> wrote:

> Congratulations Valentyn, well deserved!
>
> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Congrats Valentyn!
>>
>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
>> wrote:
>>
>>> Thanks Valentyn!
>>>
>>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
>>> wrote:
>>>
 Thank you Valentyn! Congratulations!

 On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
 rober...@google.com> wrote:

> Hi,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Valentyn Tymofieiev
>
> Valentyn has made numerous contributions to Beam over the last
> several
> years (including 100+ pull requests), most recently pushing
> through
> the effort to make Beam compatible with Python 3. He is also
> an active
> participant in design discussions on the list, participates in
> release
> candidate validation, and proactively helps keep our tests
> green.
>
> In consideration of Valentyn's contributions, the Beam PMC
> trusts him
> with the responsibilities of a Beam committer [1].
>
> Thank you, Valentyn, for your contributions and looking
> forward to many more!
>
> Robert, on behalf of the Apache Beam PMC
>
> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>

>
> --
> Cheers,
> Gleb
>


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Gleb Kanterov
Congratulations Valentyn!

On Tue, Aug 27, 2019 at 7:22 AM jincheng sun 
wrote:

> Congrats Valentyn!
>
> Best,
> Jincheng
>
> Ankur Goenka wrote on Tue, Aug 27, 2019 at 10:37 AM:
>
>> Congratulations Valentyn!
>>
>> On Mon, Aug 26, 2019, 5:02 PM Yifan Zou  wrote:
>>
>>> Congratulations, Valentyn! Well deserved!
>>>
>>> On Mon, Aug 26, 2019 at 3:31 PM Aizhamal Nurmamat kyzy <
>>> aizha...@google.com> wrote:
>>>
 Congratulations! And thank you for your contributions, Valentyn!

 On Mon, Aug 26, 2019 at 3:26 PM Thomas Weise  wrote:

> Congrats!
>
>
> On Mon, Aug 26, 2019 at 3:22 PM Heejong Lee 
> wrote:
>
>> Congratulations! :)
>>
>> On Mon, Aug 26, 2019 at 2:44 PM Rui Wang  wrote:
>>
>>> Congratulations!
>>>
>>>
>>> -Rui
>>>
>>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
>>> wrote:
>>>
 Congratulations Valentyn, well deserved!

 On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Congrats Valentyn!
>
> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
> wrote:
>
>> Thanks Valentyn!
>>
>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
>> wrote:
>>
>>> Thank you Valentyn! Congratulations!
>>>
>>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
>>> rober...@google.com> wrote:
>>>
 Hi,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Valentyn Tymofieiev

 Valentyn has made numerous contributions to Beam over the last
 several
 years (including 100+ pull requests), most recently pushing
 through
 the effort to make Beam compatible with Python 3. He is also an
 active
 participant in design discussions on the list, participates in
 release
 candidate validation, and proactively helps keep our tests
 green.

 In consideration of Valentyn's contributions, the Beam PMC
 trusts him
 with the responsibilities of a Beam committer [1].

 Thank you, Valentyn, for your contributions and looking forward
 to many more!

 Robert, on behalf of the Apache Beam PMC

 [1]
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer

>>>

-- 
Cheers,
Gleb