Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Chamikara Jayalath
Udi or anybody else who is familiar with Nexmark, please -1 the vote
thread if you think this particular performance regression for Spark/Direct
runners is a blocker. Otherwise I think we can continue the vote.

Thanks,
Cham

On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath 
wrote:

> Are either of these regressions due to known issues? If not, should they
> be considered release blockers?
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>
>> For DirectRunner there are regressions in query 7 sql direct runner
>> batch mode (2x) and streaming mode (5x).
>>
>>
>> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>>
>>> I see a regression for query 7 spark runner batch mode on
>>> about 2018-11-13.
>>> [image: image.png]
>>>
>>> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Please review and vote on the release candidate #1 for the version
>>>> 2.9.0, as follows:
>>>> [ ] +1, Approve the release
>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>
>>>>
>>>> The complete staging area is available for your review, which includes:
>>>> * JIRA release notes [1],
>>>> * the official Apache source release to be deployed to dist.apache.org
>>>> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>> * source code tag "v2.9.0-RC1" [5],
>>>> * website pull request listing the release [6] and publishing the API
>>>> reference manual [7].
>>>> * Python artifacts are deployed along with the source release to the
>>>> dist.apache.org [2].
>>>> * Validation sheet with a tab for 2.9.0 release to help with
>>>> validation [8].
>>>>
>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>> approval, with at least 3 PMC affirmative votes.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> [1]
>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>> [4]
>>>> https://repository.apache.org/content/repositories/orgapachebeam-1054/
>>>> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
>>>> [6] https://github.com/apache/beam/pull/7215
>>>> [7] https://github.com/apache/beam-site/pull/584
>>>> [8]
>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529

>>>


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Chamikara Jayalath
Are either of these regressions due to known issues? If not, should they be
considered release blockers?

Thanks,
Cham

On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:

> For DirectRunner there are regressions in query 7 sql direct runner batch
> mode (2x) and streaming mode (5x).
>
>
> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>
>> I see a regression for query 7 spark runner batch mode on
>> about 2018-11-13.
>> [image: image.png]
>>
>> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.9.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.9.0-RC1" [5],
>>> * website pull request listing the release [6] and publishing the API
>>> reference manual [7].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> * Validation sheet with a tab for 2.9.0 release to help with validation
>>> [8].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1054/
>>> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
>>> [6] https://github.com/apache/beam/pull/7215
>>> [7] https://github.com/apache/beam-site/pull/584
>>> [8]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>>
>>


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Udi Meiri
For DirectRunner there are regressions in query 7 sql direct runner batch
mode (2x) and streaming mode (5x).


On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:

> I see a regression for query 7 spark runner batch mode on
> about 2018-11-13.
> [image: image.png]
>
> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
> wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version 2.9.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.9.0-RC1" [5],
>> * website pull request listing the release [6] and publishing the API
>> reference manual [7].
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.9.0 release to help with validation
>> [8].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1054/
>> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
>> [6] https://github.com/apache/beam/pull/7215
>> [7] https://github.com/apache/beam-site/pull/584
>> [8]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>
>




Re: Reviews for a few changes to the Go SDK

2018-12-06 Thread Ahmet Altay
Hi Andrew,

Following up, I reviewed both changes. Left a comment for the first one,
and merged the second one. Let me know if you need additional help.

Ahmet

On Mon, Dec 3, 2018 at 10:59 AM Ahmet Altay  wrote:

> Hi Andrew,
>
> +Robert Burke (assignee for both JIRAs) would be a
> better reviewer but he is out of office this week. I was helping him with a
> few reviews recently and I would be happy to review your changes too in his
> absence.
>
> Since you are not in a rush, I will try to review your changes before end
> of the week.
>
> Ahmet
>
> On Sun, Dec 2, 2018 at 2:05 PM Andrew Brampton  wrote:
>
>> Hi,
>>
>> I've been making a few changes to the experimental Go SDK. I'm in no
>> rush, but per the contributors guide I'm sharing my intent, and looking for
>> a reviewer.
>>
>> Specifically:
>> [BEAM-6144] Add support for the autoscalingAlgorithm flag
>> [BEAM-6155] Migrate the Go SDK to the modern GCS library
>>
>> thanks
>> Andrew
>>
>


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Udi Meiri
I see a regression for query 7 spark runner batch mode on
about 2018-11-13.
[image: image.png]

On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.9.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.9.0-RC1" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.9.0 release to help with validation
> [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Cham
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1054/
> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
> [6] https://github.com/apache/beam/pull/7215
> [7] https://github.com/apache/beam-site/pull/584
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>




Re: Jira

2018-12-06 Thread Lukasz Cwik
Welcome, I have added you.

On Thu, Dec 6, 2018 at 11:12 AM Dustin Rhodes  wrote:

> Hi,
>
> This is Dustin from Google. I'm working on the Dataflow Runner. Can someone
> add me as a contributor for Beam's Jira issue tracker? I would like to be
> able to assign this Jira issue I created to myself
> https://issues.apache.org/jira/browse/BEAM-6190 .  My Jira ID is dustin12.
>
> Thanks!
> Dustin
>


Re: GSOC - Summer of Code, on Beam?

2018-12-06 Thread jhzgg2017



On 2018/12/05 00:29:48, Pablo Estrada  wrote: 
> Hi Austin!
> Thanks a lot for surfacing this. I participated in GSOC as a student a
> couple times, and loved it. This being my first time around as a committer,
> I'm excited to try and help.
> 
> I think, for starters, it may be good to find issues in JIRA to label with
> "gsoc", so please everyone who knows of good candidate project issues,
> label them with "gsoc".
> 
> And then we can find mentors for these issues, and start helping students
> in the application process.
> 
> Best
> -P.
> 
> On Tue, Dec 4, 2018 at 3:46 PM Austin Bennett 
> wrote:
> 
> > Would it make sense to have any GSOC students for next summer work on
> > Beam?  Do we have some candidate things that would be suitable and
> > sufficiently discrete projects?
> >
> > Initial applications for organizations will not even open for about a month,
> > though I thought it worth getting a sense from the group.
> >
> > A bit of info:
> > https://summerofcode.withgoogle.com/archive/
> >
> > https://opensource.googleblog.com/2018/11/google-summer-of-code-15-years-strong.html
> >

Hi Pablo!
I am a junior majoring in CS and am interested in Apache Beam and data
processing. I hope to participate in GSOC and work on Beam next summer.
Could you give me some advice on how to prepare for it? Thanks a lot.



Jira

2018-12-06 Thread Dustin Rhodes
Hi,

This is Dustin from Google. I'm working on the Dataflow Runner. Can someone
add me as a contributor for Beam's Jira issue tracker? I would like to be
able to assign this Jira issue I created to myself
https://issues.apache.org/jira/browse/BEAM-6190 .  My Jira ID is dustin12.

Thanks!
Dustin


Re: org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey

2018-12-06 Thread Maximilian Michels

Hi Alex,

Thank you for your PR. I agree PAssert is much nicer.

You're right. Despite the source not being executed in parallel, the results of 
the source will then be distributed round-robin to all the other tasks. Note 
that there is no GroupBy operation. Thanks for spotting the issue.


I think the apparent severity of the issue fixed in your PR was also caused by 
the more critical concurrency issue fixed in 
https://github.com/apache/beam/pull/7171/files. I've run the test several 
thousand times since then and haven't seen any failures. But that's just 
on my machine; PAssert is clearly much safer.


> Please also try to follow this pattern on any other tests which you may have.

You already had me convinced by your arguments before you said this :) I'll 
change the remaining tests to use PAssert as well. I'll request a review for it 
from you if you don't mind.


I think it would make sense if we synced more promptly when we both look into the 
same test. Are you on the ASF Slack? Feel free to reach out to me there.


Thanks,
Max

On 05.12.18 21:41, Alex Amato wrote:
I believe that the ParDos are being invoked in parallel. I'm not sure about the 
exact semantics, but I believe that Beam will execute separate keys on separate 
threads when it processes different bundles for those different keys.
I logged the thread IDs in this test to verify that different threads are 
invoking this code.

Applying my fix, I was able to pass the test 400/400 runs.

I talked to Luke, and he suggested using PAssert, which is the most thread-safe 
and standard way to verify pipeline results. It also simplifies the code a 
little bit, removing the last unnecessary DoFn.


PTAL at this PR. I recommend committing it to remove the concurrency issue in 
collecting test results and to remove flakiness in this test.

https://github.com/apache/beam/pull/7214/files

Please also try to follow this pattern on any other tests which you may have.

FWIW, here are the logged thread IDs that I saw. I appended logs to a 
ConcurrentLinkedQueue and printed them at the end of the test, so this shows the 
separate threads and their interleaving.
processElement collectResults 26 : 5000 results: 1601591000 threadId: pool-32-thread-15
processElement collectResults 34 : 4093 results: 1360449464 threadId: pool-32-thread-14
processElement collectResults 47 : 4093 results: 323962224 threadId: pool-32-thread-19
processElement collectResults 19 : 4093 results: 323962224 threadId: pool-32-thread-19
processElement collectResults 45 : 4093 results: 167183883 threadId: pool-32-thread-18
processElement collectResults 0 : 4093 results: 167183883 threadId: pool-32-thread-18
processElement collectResults 2 : 4093 results: 167183883 threadId: pool-32-thread-18
processElement collectResults 30 : 4093 results: 865903006 threadId: pool-32-thread-21
processElement collectResults 11 : 4093 results: 865903006 threadId: pool-32-thread-21
processElement collectResults 1 : 4093 results: 865903006 threadId: pool-32-thread-21
processElement collectResults 41 : 4093 results: 1183940089 threadId: pool-32-thread-23
processElement collectResults 7 : 4093 results: 1183940089 threadId: pool-32-thread-23
processElement collectResults 13 : 4093 results: 1183940089 threadId: pool-32-thread-23
processElement collectResults 36 : 4093 results: 1183940089 threadId: pool-32-thread-23
processElement collectResults 21 : 4093 results: 907415986 threadId: pool-32-thread-17
processElement collectResults 32 : 4093 results: 907415986 threadId: pool-32-thread-17
processElement collectResults 10 : 4093 results: 907415986 threadId: pool-32-thread-17
processElement collectResults 20 : 4093 results: 907415986 threadId: pool-32-thread-17
processElement collectResults 14 : 4093 results: 907415986 threadId: pool-32-thread-17
processElement collectResults 24 : 4093 results: 1391785351 threadId: pool-32-thread-15
processElement collectResults 46 : 4093 results: 1391785351 threadId: pool-32-thread-15
processElement collectResults 17 : 4093 results: 1391785351 threadId: pool-32-thread-15
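
For readers who want to reproduce the kind of log shown above, here is a minimal stdlib-only sketch of the technique described (appending log lines to a ConcurrentLinkedQueue from several threads and reading them back once all work is done). It is not the code from the test or the PR: the class name ThreadLogDemo, the method collect, and the pool and task counts are all hypothetical, and a plain ExecutorService stands in for whatever threads the runner actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from PortableTimersExecutionTest.
public class ThreadLogDemo {

  // Runs `tasks` tasks on a pool of `poolSize` threads. Each task appends one
  // log line naming the thread that executed it. ConcurrentLinkedQueue makes
  // the concurrent appends safe without external locking.
  static List<String> collect(int poolSize, int tasks) {
    Queue<String> log = new ConcurrentLinkedQueue<>();
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    for (int i = 0; i < tasks; i++) {
      final int element = i;
      pool.execute(() ->
          log.add("processElement " + element
              + " threadId: " + Thread.currentThread().getName()));
    }
    pool.shutdown();
    try {
      // Wait for every task to finish before reading the log.
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(log);
  }

  public static void main(String[] args) {
    // Print the collected lines; the order and thread names vary per run.
    collect(4, 20).forEach(System.out::println);
  }
}
```

With a non-thread-safe collection such as a plain ArrayList in place of the queue, concurrent appends could lose entries or throw, which is the class of problem the shared results list was suspected of.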




On Wed, Dec 5, 2018 at 6:49 AM Maximilian Michels wrote:


Thank you for looking into the test. For me the flakiness was solely caused by
the non thread-safe GrpcStateService. I have since closed the JIRA issue because
I didn't see another failure since the fix.

Your fixes are valid, but they won't fix flakiness (if present) in the current
testing pipeline. Why? The results are only ever written by 1 worker because the
testing source uses Impulse which generates a signal only received by a single
worker. So the shared results list is not a problem for this test.

Let me quickly comment on the changes you mentioned:

1) Yes, if we had a parallel source, the List should be a concurrent or
synchronized list.

2) Using a static list should be fine for testing purposes. There are no other
tests accessing 

[VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Chamikara Jayalath
Hi everyone,

Please review and vote on the release candidate #1 for the version 2.9.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.9.0-RC1" [5],
* website pull request listing the release [6] and publishing the API
reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.9.0 release to help with validation [8].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Cham

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
[2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1054/
[5] https://github.com/apache/beam/tree/v2.9.0-RC1
[6] https://github.com/apache/beam/pull/7215
[7] https://github.com/apache/beam-site/pull/584
[8]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529