Re: beam9 failing most of the python tests

2018-12-07 Thread Ankur Goenka
Virtual env setup is failing because of the following error. Can we reboot
the machine to see if it fixes the issue?

:beam-sdks-python:setupVirtualenv FAILED
Running virtualenv with interpreter /usr/bin/python2
New python executable in
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Portable_Python_Commit@2/src/build/gradleenv/1327086738/bin/python2
Also creating executable in
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Portable_Python_Commit@2/src/build/gradleenv/1327086738/bin/python
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 2363, in <module>
    main()
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 719, in main
    symlink=options.symlink)
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 942, in create_environment
    site_packages=site_packages, clear=clear, symlink=symlink))
  File "/usr/lib/python3/dist-packages/virtualenv.py", line 1423, in install_python
    raise e
OSError: [Errno 11] Resource temporarily unavailable

On Mon, Dec 3, 2018 at 1:12 PM Ankur Goenka  wrote:

> Hi,
>
> I see that beam9 is failing a significantly higher number of Python-related
> builds [1].
> This also results in more failures of beam_PreCommit_Portable_Python_Commit
> [2] on beam9.
> Can someone with access to beam9 take a look?
>
> Thanks,
> Ankur
>
>
> [1] https://builds.apache.org/computer/beam9/builds
> [2]
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/buildTimeTrend
>


[Call for items] ❄️ December Beam Newsletter

2018-12-07 Thread Rose Nguyen
Hi folks:

Time for the last newsletter of the year!

*Add to [1] the highlights from November to now (or planned events and
talks) that you want to share by 12/12 11:59 p.m. PDT.*

We will collect the notes via Google docs but send out the final version
directly to the user mailing list. If you do not know how to format
something, it is OK to just put down the info and I will edit. I'll ship
out the newsletter on 12/13.

[1]
https://docs.google.com/document/d/1q4KBkcLR7orr6n_QUHMpAVBKYzKYahlRZw_K_PaIuDo

Cheers,
-- 
Rose Thị Nguyễn


OOO

2018-12-07 Thread Lukasz Cwik
I'll be away for the next three months taking care of my little one[1] and
am excited to see what happens within Apache Beam when I return.

I have been mainly focusing on the portability and SplittableDoFn efforts.
If there are questions while I'm out, feel free to reach out to this dev@
list as there are several community members that have been involved.

For portability related stuff:
Thomas Weise
Robert Bradshaw
Maximilian Michels
Ankur Goenka

For SplittableDoFn stuff:
Robert Bradshaw
Ismael Mejia
JB Onofre

1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-07 Thread Ismaël Mejía
Looking at the dates on the Spark runner git log there was a PR merged to
change Spark translation from classes to URNs. I cannot see how this can
impact performance. Looking at the other queries in the dashboards, there
seems to be great variability in the executions of the Spark runner, to
the point that it feels like we don't have guarantees anymore. I wonder if
this is because of other loads sharing the server(s), or because our sample
is too small relative to the standard deviation.

I would proceed with the release. The real question is whether we can somehow
constrain the execution of these tests to get more consistent output.
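
One way to put a number on the variability discussed above is to compare the sample standard deviation of the run times with their mean, and to look at how the standard error shrinks as more runs are collected. The sketch below is only an illustration: the class and method names are made up for this note, and the run times are placeholder values, not measurements taken from the Nexmark dashboards.

```java
public class RunTimeSpread {
  /** Arithmetic mean of the samples. */
  public static double mean(double[] xs) {
    double sum = 0.0;
    for (double x : xs) {
      sum += x;
    }
    return sum / xs.length;
  }

  /** Sample standard deviation (n - 1 in the denominator); needs at least two samples. */
  public static double sampleStdDev(double[] xs) {
    if (xs.length < 2) {
      throw new IllegalArgumentException("need at least two samples");
    }
    double m = mean(xs);
    double squaredDeviations = 0.0;
    for (double x : xs) {
      squaredDeviations += (x - m) * (x - m);
    }
    return Math.sqrt(squaredDeviations / (xs.length - 1));
  }

  /** Standard error of the mean; it shrinks proportionally to 1 / sqrt(n). */
  public static double standardError(double[] xs) {
    return sampleStdDev(xs) / Math.sqrt(xs.length);
  }

  public static void main(String[] args) {
    // Placeholder run times in seconds (hypothetical, for illustration only).
    double[] runTimesSec = {2, 4, 4, 4, 5, 5, 7, 9};
    System.out.println("mean           = " + mean(runTimesSec));
    System.out.println("sample std dev = " + sampleStdDev(runTimesSec));
    System.out.println("standard error = " + standardError(runTimesSec));
  }
}
```

If the standard error is of the same order as the change being investigated, a single slow run probably cannot be told apart from noise, which would argue for collecting more runs before treating a spike as a regression.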


On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
wrote:

> Hi all,
> Regarding query7 in spark:
> - there doesn't seem to be a functional regression: query passes and
> output size is still the same
>
> - Also the performance degradation seems to be only on spark, the other
> runners do not seem to suffer from it.
>
> - performance degradation seems to be constant from 11/12 so we can
> eliminate temporary load on the jenkins server that would generate delays
> in Max transform.
>
> => query7 uses Max transform, fanout and side inputs, has one of these
> parts recently (11/12/18) changed in spark?
>
> Etienne
>
> On Thursday, 6 December 2018 at 21:32 -0800, Chamikara Jayalath wrote:
>
> Udi or anybody else who is familiar with Nexmark, please -1 the vote
> thread if you think this particular performance regression for Spark/Direct
> runners is a blocker. Otherwise I think we can continue the vote.
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath 
> wrote:
>
> Are either of these regressions due to known issues ? If not should they
> be considered release blockers ?
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>
> For DirectRunner there are regressions in query 7 sql direct runner batch
> mode (2x) and streaming mode (5x).
>
>
> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>
> I see a regression for query 7 spark runner batch mode on about
> 2018-11-13.
>
> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
> wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.9.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.9.0-RC1" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.9.0 release to help with validation
> [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Cham
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1054/
> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
> [6] https://github.com/apache/beam/pull/7215
> [7] https://github.com/apache/beam-site/pull/584
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>
>


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-07 Thread Etienne Chauchot
Hi all,
Regarding query7 in spark:
- there doesn't seem to be a functional regression: query passes and output
size is still the same
- Also the performance degradation seems to be only on spark, the other
runners do not seem to suffer from it.
- performance degradation seems to be constant from 11/12 so we can eliminate
temporary load on the jenkins server that would generate delays in Max
transform.
=> query7 uses Max transform, fanout and side inputs, has one of these parts
recently (11/12/18) changed in spark?

Etienne

On Thursday, 6 December 2018 at 21:32 -0800, Chamikara Jayalath wrote:
> Udi or anybody else who is familiar with Nexmark, please -1 the vote thread
> if you think this particular performance regression for Spark/Direct
> runners is a blocker. Otherwise I think we can continue the vote.
> Thanks,
> Cham
> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath  
> wrote:
> > Are either of these regressions due to known issues ? If not should they be 
> > considered release blockers ?
> > 
> > Thanks,
> > Cham
> > On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
> > > For DirectRunner there are regressions in query 7 sql direct runner batch 
> > > mode (2x) and streaming mode (5x).
> > > 
> > > 
> > > On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
> > > > I see a regression for query 7 spark runner batch mode on about 
> > > > 2018-11-13.
> > > > On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
> > > >  wrote:
> > > > > Hi everyone,
> > > > > 
> > > > > Please review and vote on the release candidate #1 for the version 
> > > > > 2.9.0, as follows:
> > > > > [ ] +1, Approve the release
> > > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > > > 
> > > > > 
> > > > > The complete staging area is available for your review, which 
> > > > > includes:
> > > > > * JIRA release notes [1],
> > > > > * the official Apache source release to be deployed to 
> > > > > dist.apache.org [2], which is signed with the key with
> > > > > fingerprint EEAC70DF3D0BC23B [3],
> > > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > > * source code tag "v2.9.0-RC1" [5],
> > > > > * website pull request listing the release [6] and publishing the API 
> > > > > reference manual [7].
> > > > > * Python artifacts are deployed along with the source release to the 
> > > > > dist.apache.org [2].
> > > > > > * Validation sheet with a tab for 2.9.0 release to help with 
> > > > > > validation [8].
> > > > > 
> > > > > The vote will be open for at least 72 hours. It is adopted by 
> > > > > majority approval, with at least 3 PMC
> > > > > affirmative votes.
> > > > > 
> > > > > Thanks,
> > > > > Cham
> > > > > 
> > > > > [1] 
> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
> > > > > [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
> > > > > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > > > > [4] 
> > > > > https://repository.apache.org/content/repositories/orgapachebeam-1054/
> > > > > [5] https://github.com/apache/beam/tree/v2.9.0-RC1
> > > > > [6] https://github.com/apache/beam/pull/7215
> > > > > [7] https://github.com/apache/beam-site/pull/584
> > > > > [8] 
> > > > > https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529


Re: Stand at FOSDEM 2019

2018-12-07 Thread Maximilian Michels

I've put us in the schedule for Saturday with the exact time to be determined.

Btw, who is speaking at FOSDEM? Not sure if all Dev rooms have already announced 
their talks. I'll be talking about Portability in the HPC/Big Data room.


Cheers,
Max

On 30.11.18 18:24, Alexey Romanenko wrote:

I’m going to visit FOSDEM this year as well and will be glad to help as much as 
I can.


On 30 Nov 2018, at 13:00, Maximilian Michels  wrote:

Thank you for all your reactions. Looks like we will have a great presence at 
FOSDEM :)

@Matthias: Yes, I'm planning to go.
@Wout: Locals are perfect :)
@Gris: Thanks for helping out with the merch!
@JB: Are you also around?

I'll try to book something for Saturday afternoon.

Thanks,
Max

On 30.11.18 09:15, Wout Scheepers wrote:

I’m based in Brussels and happy to help out.
Wout
*From: *Griselda Cuevas 
*Reply-To: *"dev@beam.apache.org" 
*Date: *Thursday, 29 November 2018 at 21:44
*To: *"dev@beam.apache.org" 
*Subject: *Re: Stand at FOSDEM 2019
+1 -- I'm happy to help with the merch, I'll be attending and will help staff 
the booth :)
G

On Thu, 29 Nov 2018 at 05:46, Suneel Marthi wrote:

+1

On Thu, Nov 29, 2018 at 6:14 AM Matthias Baetens wrote:

Hey Max,
Great idea. I'd be very keen to join. I'll look at my calendar
over the weekend to see if this would work.
Are you going yourself?
Cheers,
Matthias

On Thu, 29 Nov 2018 at 11:06 Maximilian Michels wrote:

Hi,
For everyone who might be attending FOSDEM19: What do you think about
taking a slot for Beam at the Apache stand?
A slot is 2-3 hours. It is a great way to spread the word about Beam. We
wouldn't have to prepare much, just bring some merch.
There is still plenty of space:
https://cwiki.apache.org/confluence/display/COMDEV/FOSDEM+2019
Cheers,
Max
PS: FOSDEM is an open-source conference in Brussels, Feb 2-3, 2019




Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-12-07 Thread Robert Bradshaw
How should we move forward on this? The idea looks good, and even
comes with a PR to review. Any objections to the names?
On Wed, Dec 5, 2018 at 12:48 PM Jeff Klukas  wrote:
>
> Reminder that I'm looking for review on 
> https://github.com/apache/beam/pull/7160
>
> On Thu, Nov 29, 2018, 11:48 AM Jeff Klukas wrote:
>> I created a JIRA and a PR for this:
>>
>> https://issues.apache.org/jira/browse/BEAM-6150
>> https://github.com/apache/beam/pull/7160
>>
>> On naming, I'm proposing that SerializableFunction extend ProcessFunction 
>> (since this new superinterface is particularly appropriate for user code 
>> executed inside a ProcessElement method) and that SimpleFunction extend 
>> InferableFunction (since type information and coder inference are what 
>> distinguish this from ProcessFunction).
>>
>> We originally discussed deprecating SerializableFunction and SimpleFunction 
>> in favor of the new types, but there appear to be two fairly separate use 
>> cases for SerializableFunction. It's either defining user code that will be 
>> executed in a DoFn, in which case I think we always want to prefer the new 
>> interface that allows declared exceptions. But it's also used where the code 
>> is to be executed as part of pipeline construction, in which case it may be 
>> reasonable to want to restrict checked exceptions. In any case, deprecating 
>> SerializableFunction and SimpleFunction can be discussed further in the 
>> future.
>>
>>
>> On Wed, Nov 28, 2018 at 9:53 PM Kenneth Knowles  wrote:
>>>
>>> Nice! A clean solution and an opportunity to bikeshed on names. This has 
>>> everything I love.
>>>
>>> Kenn
>>>
>>> On Wed, Nov 28, 2018 at 6:43 PM Jeff Klukas  wrote:

 It looks like we can make the new interface a superinterface for the 
 existing SerializableFunction while maintaining binary compatibility [0].

 We'd have:

 public interface NewSerializableFunction<InputT, OutputT> extends Serializable {
   OutputT apply(InputT input) throws Exception;
 }

 and then modify SerializableFunction to inherit from it:

 public interface SerializableFunction<InputT, OutputT>
     extends NewSerializableFunction<InputT, OutputT>, Serializable {
   @Override
   OutputT apply(InputT input);
 }


 IIUC, we can then more or less replace all references to 
 SerializableFunction with NewSerializableFunction across the beam codebase 
 without having to introduce any new overrides. I'm working on a proof of 
 concept now.
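
 The superinterface idea above can be tried out in isolation. What follows is a minimal, self-contained sketch and not the code from the PR: the interfaces are re-declared locally under the names used in this thread, and invoke() and parseStrict() are hypothetical helpers that stand in for library code and user code respectively.

```java
import java.io.Serializable;

public class SuperinterfaceSketch {
  /** Wider interface (placeholder name from the thread): apply may throw checked exceptions. */
  public interface NewSerializableFunction<InputT, OutputT> extends Serializable {
    OutputT apply(InputT input) throws Exception;
  }

  /** Existing interface, re-declared as a subinterface that narrows the throws clause. */
  public interface SerializableFunction<InputT, OutputT>
      extends NewSerializableFunction<InputT, OutputT>, Serializable {
    @Override
    OutputT apply(InputT input);
  }

  /** Hypothetical stand-in for library code that now accepts the wider type. */
  public static <InputT, OutputT> OutputT invoke(
      NewSerializableFunction<InputT, OutputT> fn, InputT input) {
    try {
      return fn.apply(input);
    } catch (RuntimeException e) {
      throw e;
    } catch (Exception e) {
      // Checked exceptions from user code are wrapped in one place.
      throw new RuntimeException(e);
    }
  }

  /** Hypothetical user function that declares a checked exception. */
  public static Integer parseStrict(String s) throws Exception {
    if (s.isEmpty()) {
      throw new Exception("empty input");
    }
    return Integer.parseInt(s);
  }

  public static void main(String[] args) {
    // Existing user code keeps compiling: no try/catch is needed at the call site.
    SerializableFunction<String, Integer> oldStyle = String::length;
    int direct = oldStyle.apply("beam");

    // An old-style function is accepted wherever the wider type is expected.
    int viaLibrary = invoke(oldStyle, "beam");

    // New-style user code may declare checked exceptions.
    NewSerializableFunction<String, Integer> newStyle = SuperinterfaceSketch::parseStrict;
    int parsed = invoke(newStyle, "42");

    System.out.println(direct + " " + viaLibrary + " " + parsed);
  }
}
```

 In this sketch, code that calls apply() through the existing SerializableFunction type needs no exception handling, while code that calls it through the wider type has to handle or declare Exception. That is why only the library-side references would move to the new interface.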

 It's not clear what the actual names for NewSerializableFunction and 
 NewSimpleFunction should be.

 [0] 
 https://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html#jls-13.4.4


 On Mon, Nov 26, 2018 at 9:54 PM Thomas Weise  wrote:
>
> +1 for introducing the new interface now and deprecating the old one. The 
> major version change then provides the opportunity to remove deprecated 
> code.
>
>
> On Mon, Nov 26, 2018 at 10:09 AM Lukasz Cwik  wrote:
>>
>> Before 3.0 we will still want to introduce this, giving people time 
>> to migrate. Would it make sense to do that now and deprecate the 
>> alternatives that it replaces?
>>
>> On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas  wrote:
>>>
>>> Picking up this thread again. Based on the feedback from Kenn, Reuven, 
>>> and Romain, it sounds like there's no objection to the idea of 
>>> SimpleFunction and SerializableFunction declaring that they throw 
>>> Exception. So the discussion at this point is about whether there's an 
>>> acceptable way to introduce that change.
>>>
>>> IIUC, Kenn was suggesting that we need to ensure backwards 
>>> compatibility for existing user code both at runtime and on recompile, 
>>> which means we can't simply add the declaration to the existing 
>>> interfaces, since that would cause errors at compile time for user code 
>>> directly invoking SerializableFunction instances.
>>>
>>> I don't see an obvious way that introducing a new functional interface 
>>> would help without littering the API with more variants (it's already a 
>>> bit confusing that e.g. MapElements has multiple via() methods to 
>>> support three different function interfaces).
>>>
>>> Perhaps this kind of cleanup is best left for Beam 3.0?
>>>
>>> On Mon, Oct 15, 2018 at 6:51 PM Reuven Lax  wrote:

 Compilation compatibility is a big part of what Beam aims to provide 
 with its guarantees. Romain makes a good point that most users are not 
 invoking SerializableFunctions themselves (they are usually invoked 
 inside of Beam classes such as MapElements); however, I suspect some 
 users do these things.

 On Sun, Oct 14, 2018 at 2:38 PM Kenneth Knowles  
 wrote:
>
> Romain has brought up two good aspects of backwards compatibility:
>