Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-17 Thread Jean-Baptiste Onofré
Thanks Tim.

I think we will be able to remove findbugs after some runs/checks using
ErrorProne to see the gaps.

Regards
JB

On 18 May 2018 at 07:49, Tim Robertson wrote:
>Thank you all.
>
>I think this is clear.  Removing findbugs can happen at a future point.
>
>@Scott - I've been working through the java IO error prone issues (some
>already merged, some with open PRs now) so will take those IO Jiras. I
>will
>enable failOnWarning, address dependency issues for findbugs and tackle
>the
>error prone warnings.
>
>
>On Fri, May 18, 2018 at 1:07 AM, Scott Wegner 
>wrote:
>
>> +0.02173913
>>
>> I'm happy to replace FindBugs with ErrorProne, but we need to first
>> upgrade ErrorProne analyzer warnings to errors. Currently the
>codebase is
>> full of warning spam, and there's no enforcement preventing future
>> violations from being added.
>>
>> I've done the work for enforcing ErrorProne analysis on java-sdk-core
>[1],
>> and I've sharded out the rest of the Java components in JIRA issues
>[2] (45
>> total).  Fixing the issues is relatively straightforward, and I've
>tried to
>> provide enough guidance to make them as starter tasks (example: [3]).
>Teng
>> Peng has already started on Spark [4] (thanks!)
>>
>> [1] https://github.com/apache/beam/pull/5319
>> [2] https://issues.apache.org/jira/issues/?jql=project%20%
>>
>3D%20BEAM%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20errorprone
>> [3] https://issues.apache.org/jira/browse/BEAM-4347
>> [4] https://issues.apache.org/jira/browse/BEAM-4318
>>
>> On Thu, May 17, 2018 at 2:00 PM Ismaël Mejía 
>wrote:
>>
>>> +0.7 also. Findbugs support for more recent versions of Java is
>lacking
>>> and
>>> the maintenance seems frozen in time.
>>>
>>> As a possible plan b can we identify the missing important
>validations to
>>> identify how much we lose and if it is considerable, maybe we can
>create a
>>> minimal configuration for those, and eventually migrate from
>findbugs to
>>> spotbugs (https://github.com/spotbugs/spotbugs/) that seems at least
>to
>>> be
>>> maintained and the most active findbugs fork.
>>>
>>>
>>> On Thu, May 17, 2018 at 9:31 PM Kenneth Knowles 
>wrote:
>>>
>>> > +0.7 I think we should work to remove findbugs. Errorprone covers
>most
>>> of
>>> the same stuff but better and faster.
>>>
>>> > The one thing I'm not sure about is nullness analysis. Findbugs
>has some
>>> serious limitations there but it really improves code quality and
>prevents
>>> blunders. I'm not sure errorprone covers that. I know the Checker
>analyzer
>>> has a full solution that makes NPE impossible as in most modern
>languages.
>>> Maybe that is easy to plug in. The core Java SDK is a good candidate
>for
>>> the first place to do it since it affects everything else.
>>>
>>> > On Thu, May 17, 2018 at 12:02 PM Tim Robertson <
>>> timrobertson...@gmail.com>
>>> wrote:
>>>
>>> >> Hi all,
>>> >> [bringing a side thread discussion from slack to here]
>>>
>>> >> We're tackling error-prone warnings now and we aim to fail the
>build on
>>> warnings raised [1].
>>>
>>> >> Enabling failOnWarning also fails the build on findbugs warnings.
>>> Currently I see places where these  arise from missing a dependency
>on
>>> findbugs_annotations and I asked on slack the best way to introduce
>this
>>> globally in gradle.
>>>
>>> >> In that discussion the idea was floated to consider removing
>findbugs
>>> completely given it is older, has licensing considerations and is
>not
>>> released regularly.
>>>
>>> >> What do people think about this idea please?
>>>
>>> >> Thanks,
>>> >> Tim
>>> >> [1]
>>> https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a
>>> 7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E
>>>
>>


Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-17 Thread Tim Robertson
Thank you all.

I think this is clear.  Removing findbugs can happen at a future point.

@Scott - I've been working through the java IO error prone issues (some
already merged, some with open PRs now) so will take those IO Jiras. I will
enable failOnWarning, address dependency issues for findbugs and tackle the
error prone warnings.
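
For reference, a minimal Gradle sketch of that change (illustrative only, not
the actual Beam build code; it assumes the net.ltgt.errorprone plugin is
already applied to the project):

  // Sketch: fail javac on warnings and promote one ErrorProne check to ERROR.
  // The check name below is just an example.
  tasks.withType(JavaCompile) {
    options.compilerArgs << "-Xlint:all" << "-Werror"
    options.compilerArgs << "-Xep:DefaultCharset:ERROR"
  }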


On Fri, May 18, 2018 at 1:07 AM, Scott Wegner  wrote:

> +0.02173913
>
> I'm happy to replace FindBugs with ErrorProne, but we need to first
> upgrade ErrorProne analyzer warnings to errors. Currently the codebase is
> full of warning spam, and there's no enforcement preventing future
> violations from being added.
>
> I've done the work for enforcing ErrorProne analysis on java-sdk-core [1],
> and I've sharded out the rest of the Java components in JIRA issues [2] (45
> total).  Fixing the issues is relatively straightforward, and I've tried to
> provide enough guidance to make them as starter tasks (example: [3]). Teng
> Peng has already started on Spark [4] (thanks!)
>
> [1] https://github.com/apache/beam/pull/5319
> [2] https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20BEAM%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20errorprone
> [3] https://issues.apache.org/jira/browse/BEAM-4347
> [4] https://issues.apache.org/jira/browse/BEAM-4318
>
> On Thu, May 17, 2018 at 2:00 PM Ismaël Mejía  wrote:
>
>> +0.7 also. Findbugs support for more recent versions of Java is lacking
>> and
>> the maintenance seems frozen in time.
>>
>> As a possible plan b can we identify the missing important validations to
>> identify how much we lose and if it is considerable, maybe we can create a
>> minimal configuration for those, and eventually migrate from findbugs to
>> spotbugs (https://github.com/spotbugs/spotbugs/) that seems at least to
>> be
>> maintained and the most active findbugs fork.
>>
>>
>> On Thu, May 17, 2018 at 9:31 PM Kenneth Knowles  wrote:
>>
>> > +0.7 I think we should work to remove findbugs. Errorprone covers most
>> of
>> the same stuff but better and faster.
>>
>> > The one thing I'm not sure about is nullness analysis. Findbugs has some
>> serious limitations there but it really improves code quality and prevents
>> blunders. I'm not sure errorprone covers that. I know the Checker analyzer
>> has a full solution that makes NPE impossible as in most modern languages.
>> Maybe that is easy to plug in. The core Java SDK is a good candidate for
>> the first place to do it since it affects everything else.
>>
>> > On Thu, May 17, 2018 at 12:02 PM Tim Robertson <
>> timrobertson...@gmail.com>
>> wrote:
>>
>> >> Hi all,
>> >> [bringing a side thread discussion from slack to here]
>>
>> >> We're tackling error-prone warnings now and we aim to fail the build on
>> warnings raised [1].
>>
>> >> Enabling failOnWarning also fails the build on findbugs warnings.
>> Currently I see places where these  arise from missing a dependency on
>> findbugs_annotations and I asked on slack the best way to introduce this
>> globally in gradle.
>>
>> >> In that discussion the idea was floated to consider removing findbugs
>> completely given it is older, has licensing considerations and is not
>> released regularly.
>>
>> >> What do people think about this idea please?
>>
>> >> Thanks,
>> >> Tim
>> >> [1]
>> https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a
>> 7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E
>>
>


Re: Current progress on Portable runners

2018-05-17 Thread Thomas Weise
Hi Eugene,

Thanks for putting this together, this is a very nice update and brings
much needed visibility to those hoping to make use of the portability
features or contribute to them.

Since the P1 (MVP) milestone is "wordcount" and some of the next things
listed are more contributor oriented, perhaps we can be more specific about
what functionality users can expect?

The next P2 milestone is basically everything and that is a lot. It might
actually help to break this down a bit more. A couple of things that I'm
specifically interested in for Python on Flink:

AFAIK state and timer support in Python is not being worked on yet. Is
anyone planning to, and is there any idea of when the SDK and portable
runner might support it?

Session windows are supported in the Python SDK, but will they (and all
other windowing features) work equally well on the portable Flink runner?
We know that custom window functions will need work..

BTW, can you clarify the dependency between streaming support (which I'm
working on) and SDF? Does it refer to new connectors?

Thanks,
Thomas


On Thu, May 17, 2018 at 3:12 PM, Eugene Kirpichov 
wrote:

> Hi all,
>
> A little over a month ago, a large group of Beam community members has
> been working on a prototype of a portable Flink runner - that is, a runner
> that can execute Beam pipelines on Flink via the Portability API
> . The prototype was developed in a 
> separate
> branch  and was
> successfully demonstrated at Flink Forward, where it ran Python and Go
> pipelines in a limited setting.
>
> Since then, a smaller group of people (Ankur Goenka, Axel Magnuson, Ben
> Sidhom and myself) have been working on productionizing the prototype to
> address its limitations and do things "the right way", preparing to reuse
> this work for developing other portable runners (e.g. Spark). This involves
> a surprising amount of work, since many important design and implementation
> concerns could be ignored for the purposes of a prototype. I wanted to give
> an update on where we stand now.
>
> Our immediate milestone in sight is *Run Java and Python batch WordCount
> examples against a distributed remote Flink cluster*. That involves a few
> moving parts, roughly in order of appearance:
>
> *Job submission:*
> - The SDK is configured to use a "portable runner", whose responsibility
> is to run the pipeline against a given JobService endpoint.
> - The portable runner converts the pipeline to a portable Pipeline proto
> - The runner finds out which artifacts it needs to stage, and stages them
> against an ArtifactStagingService
> - A Flink-specific JobService receives the Pipeline proto, performs some
> optimizations (e.g. fusion) and translates it to Flink datasets and
> functions
>
> *Job execution:*
> - A Flink function executes a fused chain of Beam transforms (an
> "executable stage") by converting the input and the stage to bundles and
> executing them against an SDK harness
> - The function starts the proper SDK harness, auxiliary services (e.g.
> artifact retrieval, side input handling) and wires them together
> - The function feeds the data to the harness and receives data back.
>
> *And here is our status of implementation for these parts:* basically,
> almost everything is either done or in review.
>
> *Job submission:*
> - General-purpose portable runner in the Python SDK: done
> ; Java SDK: also done
> 
> - Artifact staging from the Python SDK: in review (PR
> , PR
> ); in java, it's done also
> - Flink JobService: in review 
> - Translation from a Pipeline proto to Flink datasets and functions: done
> 
> - ArtifactStagingService implementation that stages artifacts to a
> location on a distributed filesystem: in development (design is clear)
>
> *Job execution:*
> - Flink function for executing via an SDK harness: done
> 
> - APIs for managing lifecycle of an SDK harness: done
> 
> - Specific implementation of those APIs using Docker: part done
> , part in review
> 
> - ArtifactRetrievalService that retrieves artifacts from the location
> where ArtifactStagingService staged them: in development.
>
> We expect that the in-review parts will be done, and the in-development
> parts be developed, in the next 2-3 weeks. We will, of course, update the
> community when this important milestone is reached.
>
> *After that, the next milestones include:*
> - Set up Java, Python and Go ValidatesRunner tests to run against the
> portable Flink runner, and get them to pass
> - Expand 

Re: Java PreCommit seems broken

2018-05-17 Thread Scott Wegner
I noticed that these tests simply run "mvn clean install" on the archetype
project. But I don't see any dependent task which installs built artifacts
into the local Maven repo. Is that an oversight?

If that's the case, perhaps the tests are failing sporadically when there
are no previously installed snapshot artifacts cached on the machine.
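
If that is indeed the cause, the fix is likely a small task wiring along these
lines (a hypothetical sketch only; the real task and module names in the
Gradle build would need to be checked):

  // Make the archetype test install the snapshot artifacts it consumes first.
  // ":beam-sdks-java-core" is a placeholder for whichever modules are needed.
  generateAndBuildArchetypeTest.dependsOn ":beam-sdks-java-core:publishToMavenLocal"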

On Thu, May 17, 2018, 2:45 PM Pablo Estrada  wrote:

> I'm seeing failures on Maven Archetype-related tests.
>
> Build Scan of a sample run: https://scans.gradle.com/s/kr23q43mh6fmk
>
> And the failure is here specifically:
> https://scans.gradle.com/s/kr23q43mh6fmk/console-log?task=:beam-sdks-java-maven-archetypes-examples:generateAndBuildArchetypeTest#L116
>
>
> Does anyone know why this might be happening?
> Best
> -P.
> --
> Got feedback? go/pabloem-feedback
> 
>


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-17 Thread Jean-Baptiste Onofré
Hi,

The build was OK yesterday but the maven-metadata is still missing.

That's the point to fix before we can move forward on the release.

I'm going to tackle this later today.

Regards
JB

On 05/18/2018 02:41 AM, Ahmet Altay wrote:
> Hi JB and all,
> 
> I wanted to follow up on my previous email. The python streaming issue I
> mentioned is resolved and removed from the blocker list. Blocker list is empty
> now. You can go ahead with the release branch cut when you are ready.
> 
> Thank you,
> Ahmet
> 
> 
> On Sun, May 13, 2018 at 8:43 AM, Jean-Baptiste Onofré  > wrote:
> 
> Hi guys,
> 
> just to let you know that the build fully passed on my box.
> 
> I'm testing the artifacts right now.
> 
> Regards
> JB
> 
> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
> 
> Hi guys,
> 
> Apache Beam 2.4.0 has been released on March 20th.
> 
> According to our cycle of release (roughly 6 weeks), we should think
> about 2.5.0.
> 
> I'm volunteer to tackle this release.
> 
> I'm proposing the following items:
> 
> 1. We start the Jira triage now, up to Tuesday
> 2. I would like to cut the release on Tuesday night (Europe time)
> 2bis. I think it's wiser to still use Maven for this release. Do you
> think we
> will be ready to try a release with Gradle ?
> 
> After this release, I would like a discussion about:
> 1. Gradle release (if we release 2.5.0 with Maven)
> 2. Isolate release cycle per Beam part. I think it would be 
> interesting
> to have
> different release cycle: SDKs, DSLs, Runners, IOs. That's another
> discussion, I
> will start a thread about that.
> 
> Thoughts ?
> 
> Regards
> JB
> 
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Jesse Anderson
Congrats!

On Thu, May 17, 2018, 6:44 PM Robert Burke  wrote:

> Congrats & welcome back!
>
> On Thu, May 17, 2018, 5:44 PM Huygaa Batsaikhan  wrote:
>
>> Welcome back, Gris! Congratulations!
>>
>> On Thu, May 17, 2018 at 4:24 PM Robert Bradshaw 
>> wrote:
>>
>>> Congratulations, Gris! And welcome back!
>>> On Thu, May 17, 2018 at 3:30 PM Robin Qiu  wrote:
>>>
>>> > Congratulations! Welcome back!
>>>
>>> > On Thu, May 17, 2018 at 3:23 PM Reuven Lax  wrote:
>>>
>>> >> Congratulations! Good to see you back!
>>>
>>> >> Reuven
>>>
>>> >> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas 
>>> wrote:
>>>
>>> >>> Hi Everyone,
>>>
>>>
>>> >>> I was absent from the mailing list, slack channel and our Beam
>>> community for the past six weeks, the reason was that I took a leave to
>>> focus on finishing my Masters Degree, which I finally did on May 15th.
>>>
>>>
>>> >>> I graduated as a Masters of Engineering in Operations Research with a
>>> concentration in Data Science from UC Berkeley. I'm glad to be part of
>>> this
>>> community and I'd like to share this accomplishment with you so I'm
>>> adding
>>> two pictures of that day :)
>>>
>>>
>>> >>> Given that I've seen so many new folks around, I'd like to use this
>>> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
>>> Now that I'm back, I'll continue to work on supporting our community in
>>> two
>>> main streams: Contribution Experience & Events, Meetups, and Conferences.
>>>
>>>
>>> >>> It's good to be back and I look forward to collaborating with you.
>>>
>>>
>>> >>> Cheers,
>>>
>>> >>> Gris
>>>
>>


Re: SQL shaded jars don't work. How to test?

2018-05-17 Thread Andrew Pilloud
Yep, I added the issue as a blocker.
https://issues.apache.org/jira/projects/BEAM/issues/BEAM-4357

On Thu, May 17, 2018, 6:05 PM Kenneth Knowles  wrote:

> This sounds like a release blocker. Can you add it to the list? (Assign
> fix version on jira)
>
> Kenn
>
> On Thu, May 17, 2018, 17:30 Lukasz Cwik  wrote:
>
>> Typically we have a test block which uses a configuration that has the
>> shadow/shadowTest configurations on the classpath instead of the
>> compile/testCompile configurations. The most common examples are validates
>> runner/integration tests for example:
>>
>> https://github.com/apache/beam/blob/0c5ebc449554a02cae5e4fd01afb07ecdb0bbaea/runners/direct-java/build.gradle#L84
>>
>> On Thu, May 17, 2018 at 3:59 PM Andrew Pilloud 
>> wrote:
>>
>>> I decided to try our new JDBC support with sqlline and discovered that
>>> our SQL shaded jar is completely broken. As
>>> in java.lang.NoClassDefFoundError all over the place. How are we testing
>>> the output jars from other beam packages? Is there an example I can follow
>>> to make our integration tests run against the release artifacts?
>>>
>>> Andrew
>>>
>>


Proposal: keeping precommit times fast

2018-05-17 Thread Udi Meiri
HI,
I have a proposal to improve contributor experience by keeping precommit
times low.

I'm looking to get community consensus and approval about:
1. How long should precommits take? 2 hours @95th percentile over the past
4 weeks is the current proposal.
2. The process for dealing with slowness. Do we: fix, roll back, remove a
test from precommit?
Rolling back if a fix is estimated to take longer than 2 weeks is the
current proposal.

https://docs.google.com/document/d/1udtvggmS2LTMmdwjEtZCcUQy6aQAiYTI3OrTP8CLfJM/edit?usp=sharing




Re: I'm back and ready to help grow our community!

2018-05-17 Thread Robert Burke
Congrats & welcome back!

On Thu, May 17, 2018, 5:44 PM Huygaa Batsaikhan  wrote:

> Welcome back, Gris! Congratulations!
>
> On Thu, May 17, 2018 at 4:24 PM Robert Bradshaw 
> wrote:
>
>> Congratulations, Gris! And welcome back!
>> On Thu, May 17, 2018 at 3:30 PM Robin Qiu  wrote:
>>
>> > Congratulations! Welcome back!
>>
>> > On Thu, May 17, 2018 at 3:23 PM Reuven Lax  wrote:
>>
>> >> Congratulations! Good to see you back!
>>
>> >> Reuven
>>
>> >> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas 
>> wrote:
>>
>> >>> Hi Everyone,
>>
>>
>> >>> I was absent from the mailing list, slack channel and our Beam
>> community for the past six weeks, the reason was that I took a leave to
>> focus on finishing my Masters Degree, which I finally did on May 15th.
>>
>>
>> >>> I graduated as a Masters of Engineering in Operations Research with a
>> concentration in Data Science from UC Berkeley. I'm glad to be part of
>> this
>> community and I'd like to share this accomplishment with you so I'm adding
>> two pictures of that day :)
>>
>>
>> >>> Given that I've seen so many new folks around, I'd like to use this
>> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
>> Now that I'm back, I'll continue to work on supporting our community in
>> two
>> main streams: Contribution Experience & Events, Meetups, and Conferences.
>>
>>
>> >>> It's good to be back and I look forward to collaborating with you.
>>
>>
>> >>> Cheers,
>>
>> >>> Gris
>>
>


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-17 Thread Ahmet Altay
On Thu, May 17, 2018 at 6:08 PM, Kenneth Knowles  wrote:

> In case you didn't see the other thread, Andrew just discovered a problem
> in SQL's jar build. It may be a release blocker.
>

I missed Andrew's email. I only looked at the release blocking list. If it
might be a release blocker, could you please add it to the list?


>
> Just an FYI. Since the fix is likely small fixes to build file it seems ok
> to cut the branch and cherry pick.
>
> Kenn
>
> On Thu, May 17, 2018, 17:41 Ahmet Altay  wrote:
>
>> Hi JB and all,
>>
>> I wanted to follow up on my previous email. The python streaming issue I
>> mentioned is resolved and removed from the blocker list. Blocker list is
>> empty now. You can go ahead with the release branch cut when you are ready.
>>
>> Thank you,
>> Ahmet
>>
>>
>> On Sun, May 13, 2018 at 8:43 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi guys,
>>>
>>> just to let you know that the build fully passed on my box.
>>>
>>> I'm testing the artifacts right now.
>>>
>>> Regards
>>> JB
>>>
>>> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
>>>
 Hi guys,

 Apache Beam 2.4.0 has been released on March 20th.

 According to our cycle of release (roughly 6 weeks), we should think
 about 2.5.0.

 I'm volunteer to tackle this release.

 I'm proposing the following items:

 1. We start the Jira triage now, up to Tuesday
 2. I would like to cut the release on Tuesday night (Europe time)
 2bis. I think it's wiser to still use Maven for this release. Do you
 think we
 will be ready to try a release with Gradle ?

 After this release, I would like a discussion about:
 1. Gradle release (if we release 2.5.0 with Maven)
 2. Isolate release cycle per Beam part. I think it would be interesting
 to have
 different release cycle: SDKs, DSLs, Runners, IOs. That's another
 discussion, I
 will start a thread about that.

 Thoughts ?

 Regards
 JB


>>


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-17 Thread Kenneth Knowles
In case you didn't see the other thread, Andrew just discovered a problem
in SQL's jar build. It may be a release blocker.

Just an FYI. Since the fix is likely small fixes to build file it seems ok
to cut the branch and cherry pick.

Kenn

On Thu, May 17, 2018, 17:41 Ahmet Altay  wrote:

> Hi JB and all,
>
> I wanted to follow up on my previous email. The python streaming issue I
> mentioned is resolved and removed from the blocker list. Blocker list is
> empty now. You can go ahead with the release branch cut when you are ready.
>
> Thank you,
> Ahmet
>
>
> On Sun, May 13, 2018 at 8:43 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi guys,
>>
>> just to let you know that the build fully passed on my box.
>>
>> I'm testing the artifacts right now.
>>
>> Regards
>> JB
>>
>> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
>>
>>> Hi guys,
>>>
>>> Apache Beam 2.4.0 has been released on March 20th.
>>>
>>> According to our cycle of release (roughly 6 weeks), we should think
>>> about 2.5.0.
>>>
>>> I'm volunteer to tackle this release.
>>>
>>> I'm proposing the following items:
>>>
>>> 1. We start the Jira triage now, up to Tuesday
>>> 2. I would like to cut the release on Tuesday night (Europe time)
>>> 2bis. I think it's wiser to still use Maven for this release. Do you
>>> think we
>>> will be ready to try a release with Gradle ?
>>>
>>> After this release, I would like a discussion about:
>>> 1. Gradle release (if we release 2.5.0 with Maven)
>>> 2. Isolate release cycle per Beam part. I think it would be interesting
>>> to have
>>> different release cycle: SDKs, DSLs, Runners, IOs. That's another
>>> discussion, I
>>> will start a thread about that.
>>>
>>> Thoughts ?
>>>
>>> Regards
>>> JB
>>>
>>>
>


Re: SQL shaded jars don't work. How to test?

2018-05-17 Thread Kenneth Knowles
This sounds like a release blocker. Can you add it to the list? (Assign fix
version on jira)

Kenn

On Thu, May 17, 2018, 17:30 Lukasz Cwik  wrote:

> Typically we have a test block which uses a configuration that has the
> shadow/shadowTest configurations on the classpath instead of the
> compile/testCompile configurations. The most common examples are validates
> runner/integration tests for example:
>
> https://github.com/apache/beam/blob/0c5ebc449554a02cae5e4fd01afb07ecdb0bbaea/runners/direct-java/build.gradle#L84
>
> On Thu, May 17, 2018 at 3:59 PM Andrew Pilloud 
> wrote:
>
>> I decided to try our new JDBC support with sqlline and discovered that
>> our SQL shaded jar is completely broken. As
>> in java.lang.NoClassDefFoundError all over the place. How are we testing
>> the output jars from other beam packages? Is there an example I can follow
>> to make our integration tests run against the release artifacts?
>>
>> Andrew
>>
>


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Huygaa Batsaikhan
Welcome back, Gris! Congratulations!

On Thu, May 17, 2018 at 4:24 PM Robert Bradshaw  wrote:

> Congratulations, Gris! And welcome back!
> On Thu, May 17, 2018 at 3:30 PM Robin Qiu  wrote:
>
> > Congratulations! Welcome back!
>
> > On Thu, May 17, 2018 at 3:23 PM Reuven Lax  wrote:
>
> >> Congratulations! Good to see you back!
>
> >> Reuven
>
> >> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas 
> wrote:
>
> >>> Hi Everyone,
>
>
> >>> I was absent from the mailing list, slack channel and our Beam
> community for the past six weeks, the reason was that I took a leave to
> focus on finishing my Masters Degree, which I finally did on May 15th.
>
>
> >>> I graduated as a Masters of Engineering in Operations Research with a
> concentration in Data Science from UC Berkeley. I'm glad to be part of this
> community and I'd like to share this accomplishment with you so I'm adding
> two pictures of that day :)
>
>
> >>> Given that I've seen so many new folks around, I'd like to use this
> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
> Now that I'm back, I'll continue to work on supporting our community in two
> main streams: Contribution Experience & Events, Meetups, and Conferences.
>
>
> >>> It's good to be back and I look forward to collaborating with you.
>
>
> >>> Cheers,
>
> >>> Gris
>


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-17 Thread Ahmet Altay
Hi JB and all,

I wanted to follow up on my previous email. The python streaming issue I
mentioned is resolved and removed from the blocker list. Blocker list is
empty now. You can go ahead with the release branch cut when you are ready.

Thank you,
Ahmet


On Sun, May 13, 2018 at 8:43 AM, Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> just to let you know that the build fully passed on my box.
>
> I'm testing the artifacts right now.
>
> Regards
> JB
>
> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
>
>> Hi guys,
>>
>> Apache Beam 2.4.0 has been released on March 20th.
>>
>> According to our cycle of release (roughly 6 weeks), we should think
>> about 2.5.0.
>>
>> I'm volunteer to tackle this release.
>>
>> I'm proposing the following items:
>>
>> 1. We start the Jira triage now, up to Tuesday
>> 2. I would like to cut the release on Tuesday night (Europe time)
>> 2bis. I think it's wiser to still use Maven for this release. Do you
>> think we
>> will be ready to try a release with Gradle ?
>>
>> After this release, I would like a discussion about:
>> 1. Gradle release (if we release 2.5.0 with Maven)
>> 2. Isolate release cycle per Beam part. I think it would be interesting
>> to have
>> different release cycle: SDKs, DSLs, Runners, IOs. That's another
>> discussion, I
>> will start a thread about that.
>>
>> Thoughts ?
>>
>> Regards
>> JB
>>
>>


Re: SQL shaded jars don't work. How to test?

2018-05-17 Thread Lukasz Cwik
Typically we have a test block which uses a configuration that has the
shadow/shadowTest configurations on the classpath instead of the
compile/testCompile configurations. The most common examples are validates
runner/integration tests for example:
https://github.com/apache/beam/blob/0c5ebc449554a02cae5e4fd01afb07ecdb0bbaea/runners/direct-java/build.gradle#L84
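
The shape of such a test block is roughly the following (a simplified sketch
of the pattern, not the actual build file; it assumes the shadow plugin's
shadowJar task and the project's shadowTest configuration are available):

  task shadedJarTest(type: Test) {
    dependsOn shadowJar
    testClassesDirs = sourceSets.test.output.classesDirs
    // Run against the shaded jar plus the shadowTest configuration,
    // instead of the unshaded compile/testCompile classpath.
    classpath = files(shadowJar.archivePath) + configurations.shadowTest
  }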

On Thu, May 17, 2018 at 3:59 PM Andrew Pilloud  wrote:

> I decided to try our new JDBC support with sqlline and discovered that our
> SQL shaded jar is completely broken. As in java.lang.NoClassDefFoundError
> all over the place. How are we testing the output jars from other beam
> packages? Is there an example I can follow to make our integration tests
> run against the release artifacts?
>
> Andrew
>


Re: Java code under main depends on junit?

2018-05-17 Thread Thomas Weise
Thanks!

IMO we should at least run "mvn verify -DskipTests" in precommit until the
maven build can be retired (== deleted from master).


On Thu, May 17, 2018 at 5:00 PM, Anton Kedin  wrote:

> Opened PR  to fix the current
> build issue, opened BEAM-4358
>  to extract test
> dependencies.
>
> Should we keep maven precommits running for now if we have to fix the
> issues like these? In the PR I had to fix another issue in the same
> project, and I suspect other projects are broken for me for similar reasons.
>
> Regards,
> Anton
>
> On Thu, May 17, 2018 at 4:52 PM Kenneth Knowles  wrote:
>
>> I know what you mean. But indeed, test artifacts are unsuitable to depend
>> on since transitive deps don't work correctly. I think it makes sense to
>> have a separate test utility. For the core, one reason we didn't was to
>> have PAssert available in main. But now that we have Gradle we actually can
>> do that because it is not a true cycle but a false cycle introduced by
>> maven.
>>
>> For GCP it is even easier.
>>
>> Kenn
>>
>>
>> On Thu, May 17, 2018, 16:28 Thomas Weise  wrote:
>>
>>> It is possible to depend on a test artifact to achieve the same, but
>>> unfortunately not transitively.
>>>
>>> Mixing test utilities into the main artifacts seems undesirable, since
>>> they are only needed for tests. It may give more food to the shading
>>> monster also..
>>>
>>> So it is probably better to create a dedicated test tools artifact that
>>> qualifies as transitive dependency?
>>>
>>> Thanks
>>>
>>>
>>> On Thu, May 17, 2018 at 4:17 PM, Kenneth Knowles  wrote:
>>>
 This seems correct. Test jars are for tests. Utilities to be used for
 tests need to be in main jars. (If for no other reason, this is how
 transitive deps work)

 We've considered putting these things in a separate package (still in
 main). Just no one has done it.

 Kenn

 On Thu, May 17, 2018, 16:04 Thomas Weise  wrote:

> Hi,
>
> Is the following dependency intended or an oversight?
>
> https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f807
> 2916cd79e8/sdks/java/io/google-cloud-platform/src/
> main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32
>
> It appears that dependent code is in test scope.
>
> Should the build flag this (the maven build fails)?
>
> Thanks
>
>
>>>


Re: Java code under main depends on junit?

2018-05-17 Thread Anton Kedin
Opened PR  to fix the current
build issue, opened BEAM-4358
 to extract test
dependencies.

Should we keep maven precommits running for now if we have to fix the
issues like these? In the PR I had to fix another issue in the same
project, and I suspect other projects are broken for me for similar reasons.

Regards,
Anton

On Thu, May 17, 2018 at 4:52 PM Kenneth Knowles  wrote:

> I know what you mean. But indeed, test artifacts are unsuitable to depend
> on since transitive deps don't work correctly. I think it makes sense to
> have a separate test utility. For the core, one reason we didn't was to
> have PAssert available in main. But now that we have Gradle we actually can
> do that because it is not a true cycle but a false cycle introduced by
> maven.
>
> For GCP it is even easier.
>
> Kenn
>
>
> On Thu, May 17, 2018, 16:28 Thomas Weise  wrote:
>
>> It is possible to depend on a test artifact to achieve the same, but
>> unfortunately not transitively.
>>
>> Mixing test utilities into the main artifacts seems undesirable, since
>> they are only needed for tests. It may give more food to the shading
>> monster also..
>>
>> So it is probably better to create a dedicated test tools artifact that
>> qualifies as transitive dependency?
>>
>> Thanks
>>
>>
>> On Thu, May 17, 2018 at 4:17 PM, Kenneth Knowles  wrote:
>>
>>> This seems correct. Test jars are for tests. Utilities to be used for
>>> tests need to be in main jars. (If for no other reason, this is how
>>> transitive deps work)
>>>
>>> We've considered putting these things in a separate package (still in
>>> main). Just no one has done it.
>>>
>>> Kenn
>>>
>>> On Thu, May 17, 2018, 16:04 Thomas Weise  wrote:
>>>
 Hi,

 Is the following dependency intended or an oversight?


 https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f8072916cd79e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32

 It appears that dependent code is in test scope.

 Should the build flag this (the maven build fails)?

 Thanks


>>


Re: Java code under main depends on junit?

2018-05-17 Thread Kenneth Knowles
I know what you mean. But indeed, test artifacts are unsuitable to depend
on since transitive deps don't work correctly. I think it makes sense to
have a separate test utility. For the core, one reason we didn't was to
have PAssert available in main. But now that we have Gradle we actually can
do that because it is not a true cycle but a false cycle introduced by
maven.

For GCP it is even easier.

Kenn


On Thu, May 17, 2018, 16:28 Thomas Weise  wrote:

> It is possible to depend on a test artifact to achieve the same, but
> unfortunately not transitively.
>
> Mixing test utilities into the main artifacts seems undesirable, since
> they are only needed for tests. It may give more food to the shading
> monster also..
>
> So it is probably better to create a dedicated test tools artifact that
> qualifies as transitive dependency?
>
> Thanks
>
>
> On Thu, May 17, 2018 at 4:17 PM, Kenneth Knowles  wrote:
>
>> This seems correct. Test jars are for tests. Utilities to be used for
>> tests need to be in main jars. (If for no other reason, this is how
>> transitive deps work)
>>
>> We've considered putting these things in a separate package (still in
>> main). Just no one has done it.
>>
>> Kenn
>>
>> On Thu, May 17, 2018, 16:04 Thomas Weise  wrote:
>>
>>> Hi,
>>>
>>> Is the following dependency intended or an oversight?
>>>
>>>
>>> https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f8072916cd79e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32
>>>
>>> It appears that dependent code is in test scope.
>>>
>>> Should the build flag this (the maven build fails)?
>>>
>>> Thanks
>>>
>>>
>


Re: Java code under main depends on junit?

2018-05-17 Thread Thomas Weise
It is possible to depend on a test artifact to achieve the same, but
unfortunately not transitively.

Mixing test utilities into the main artifacts seems undesirable, since they
are only needed for tests. It may give more food to the shading monster
also..

So it is probably better to create a dedicated test tools artifact that
qualifies as transitive dependency?
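
Concretely, something like this (a hypothetical sketch; the module name and
dependencies are made up for illustration):

  // A module whose *main* source set holds test utilities such as TestPubsub,
  // so downstream projects pick up its dependencies transitively.
  project(":beam-sdks-java-io-google-cloud-platform-test-utils") {
    dependencies {
      compile project(":beam-sdks-java-io-google-cloud-platform")
      compile "junit:junit:4.12"   // acceptable here: this artifact exists only for tests
    }
  }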

Thanks


On Thu, May 17, 2018 at 4:17 PM, Kenneth Knowles  wrote:

> This seems correct. Test jars are for tests. Utilities to be used for
> tests need to be in main jars. (If for no other reason, this is how
> transitive deps work)
>
> We've considered putting these things in a separate package (still in
> main). Just no one has done it.
>
> Kenn
>
> On Thu, May 17, 2018, 16:04 Thomas Weise  wrote:
>
>> Hi,
>>
>> Is the following dependency intended or an oversight?
>>
>> https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f807
>> 2916cd79e8/sdks/java/io/google-cloud-platform/src/
>> main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32
>>
>> It appears that dependent code is in test scope.
>>
>> Should the build flag this (the maven build fails)?
>>
>> Thanks
>>
>>


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Robert Bradshaw
Congratulations, Gris! And welcome back!
On Thu, May 17, 2018 at 3:30 PM Robin Qiu  wrote:

> Congratulations! Welcome back!

> On Thu, May 17, 2018 at 3:23 PM Reuven Lax  wrote:

>> Congratulations! Good to see you back!

>> Reuven

>> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas  wrote:

>>> Hi Everyone,


>>> I was absent from the mailing list, slack channel and our Beam
community for the past six weeks, the reason was that I took a leave to
focus on finishing my Masters Degree, which I finally did on May 15th.


>>> I graduated as a Masters of Engineering in Operations Research with a
concentration in Data Science from UC Berkeley. I'm glad to be part of this
community and I'd like to share this accomplishment with you so I'm adding
two pictures of that day :)


>>> Given that I've seen so many new folks around, I'd like to use this
opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
Now that I'm back, I'll continue to work on supporting our community in two
main streams: Contribution Experience & Events, Meetups, and Conferences.


>>> It's good to be back and I look forward to collaborating with you.


>>> Cheers,

>>> Gris


Re: Java code under main depends on junit?

2018-05-17 Thread Kenneth Knowles
This seems correct. Test jars are for tests. Utilities to be used for tests
need to be in main jars. (If for no other reason, this is how transitive
deps work)

We've considered putting these things in a separate package (still in
main). Just no one has done it.

Kenn

On Thu, May 17, 2018, 16:04 Thomas Weise  wrote:

> Hi,
>
> Is the following dependency intended or an oversight?
>
>
> https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f8072916cd79e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32
>
> It appears that dependent code is in test scope.
>
> Should the build flag this (the maven build fails)?
>
> Thanks
>
>


Re: Java code under main depends on junit?

2018-05-17 Thread Anton Kedin
My fault, I'll fix the maven issue.

I added this file, and it is intentionally not in test scope. The purpose of this
class is similar to TestPipeline, in that other packages which depend on
GCP IO can use this class in tests, including integration tests. For
example, right now Beam SQL project depends on GCP IO project and uses both
TestPipeline and TestPubsub in the integration tests. Is there a better
approach for such use case?

Regards,
Anton

On Thu, May 17, 2018 at 4:04 PM Thomas Weise  wrote:

> Hi,
>
> Is the following dependency intended or an oversight?
>
>
> https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f8072916cd79e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32
>
> It appears that dependent code is in test scope.
>
> Should the build flag this (the maven build fails)?
>
> Thanks
>
>


Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-17 Thread Scott Wegner
+0.02173913

I'm happy to replace FindBugs with ErrorProne, but we need to first upgrade
ErrorProne analyzer warnings to errors. Currently the codebase is full of
warning spam, and there's no enforcement preventing future violations from
being added.

I've done the work for enforcing ErrorProne analysis on java-sdk-core [1],
and I've sharded out the rest of the Java components in JIRA issues [2] (45
total).  Fixing the issues is relatively straightforward, and I've tried to
provide enough guidance to make them suitable as starter tasks (example: [3]). Teng
Peng has already started on Spark [4] (thanks!)

[1] https://github.com/apache/beam/pull/5319
[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20errorprone
[3] https://issues.apache.org/jira/browse/BEAM-4347
[4] https://issues.apache.org/jira/browse/BEAM-4318

On Thu, May 17, 2018 at 2:00 PM Ismaël Mejía  wrote:

> +0.7 also. Findbugs support for more recent versions of Java is lacking and
> the maintenance seems frozen in time.
>
> As a possible plan b can we identify the missing important validations to
> identify how much we lose and if it is considerable, maybe we can create a
> minimal configuration for those, and eventually migrate from findbugs to
> spotbugs (https://github.com/spotbugs/spotbugs/) that seems at least to be
> maintained and the most active findbugs fork.
>
>
> On Thu, May 17, 2018 at 9:31 PM Kenneth Knowles  wrote:
>
> > +0.7 I think we should work to remove findbugs. Errorprone covers most of
> the same stuff but better and faster.
>
> > The one thing I'm not sure about is nullness analysis. Findbugs has some
> serious limitations there but it really improves code quality and prevents
> blunders. I'm not sure errorprone covers that. I know the Checker analyzer
> has a full solution that makes NPE impossible as in most modern languages.
> Maybe that is easy to plug in. The core Java SDK is a good candidate for
> the first place to do it since it affects everything else.
>
> > On Thu, May 17, 2018 at 12:02 PM Tim Robertson <
> timrobertson...@gmail.com>
> wrote:
>
> >> Hi all,
> >> [bringing a side thread discussion from slack to here]
>
> >> We're tackling error-prone warnings now and we aim to fail the build on
> warnings raised [1].
>
> >> Enabling failOnWarning also fails the build on findbugs warnings.
> Currently I see places where these  arise from missing a dependency on
> findbugs_annotations and I asked on slack the best way to introduce this
> globally in gradle.
>
> >> In that discussion the idea was floated to consider removing findbugs
> completely given it is older, has licensing considerations and is not
> released regularly.
>
> >> What do people think about this idea please?
>
> >> Thanks,
> >> Tim
> >> [1]
>
> https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E
>


Java code under main depends on junit?

2018-05-17 Thread Thomas Weise
Hi,

Is the following dependency intended or an oversight?

https://github.com/apache/beam/blob/06c70bdf871c5da8a115011b43f8072916cd79e8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java#L32

It appears that dependent code is in test scope.

Should the build flag this (the maven build fails)?

Thanks


SQL shaded jars don't work. How to test?

2018-05-17 Thread Andrew Pilloud
I decided to try our new JDBC support with sqlline and discovered that our
SQL shaded jar is completely broken. As in java.lang.NoClassDefFoundError
all over the place. How are we testing the output jars from other beam
packages? Is there an example I can follow to make our integration tests
run against the release artifacts?

Andrew


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Robin Qiu
Congratulations! Welcome back!

On Thu, May 17, 2018 at 3:23 PM Reuven Lax  wrote:

> Congratulations! Good to see you back!
>
> Reuven
>
> On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas  wrote:
>
>> Hi Everyone,
>>
>>
>> I was absent from the mailing list, slack channel and our Beam community
>> for the past six weeks, the reason was that I took a leave to focus on
>> finishing my Masters Degree, which I finally did on May 15th.
>>
>>
>> I graduated as a Masters of Engineering in Operations Research with a
>> concentration in Data Science from UC Berkeley. I'm glad to be part of this
>> community and I'd like to share this accomplishment with you so I'm adding
>> two pictures of that day :)
>>
>>
>> Given that I've seen so many new folks around, I'd like to use this
>> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
>> Now that I'm back, I'll continue to work on supporting our community in two
>> main streams: Contribution Experience & Events, Meetups, and Conferences.
>>
>>
>> It's good to be back and I look forward to collaborating with you.
>>
>>
>> Cheers,
>>
>> Gris
>>
>


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Reuven Lax
Congratulations! Good to see you back!

Reuven

On Thu, May 17, 2018 at 2:24 PM Griselda Cuevas  wrote:

> Hi Everyone,
>
>
> I was absent from the mailing list, slack channel and our Beam community
> for the past six weeks, the reason was that I took a leave to focus on
> finishing my Masters Degree, which I finally did on May 15th.
>
>
> I graduated as a Masters of Engineering in Operations Research with a
> concentration in Data Science from UC Berkeley. I'm glad to be part of this
> community and I'd like to share this accomplishment with you so I'm adding
> two pictures of that day :)
>
>
> Given that I've seen so many new folks around, I'd like to use this
> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
> Now that I'm back, I'll continue to work on supporting our community in two
> main streams: Contribution Experience & Events, Meetups, and Conferences.
>
>
> It's good to be back and I look forward to collaborating with you.
>
>
> Cheers,
>
> Gris
>


Current progress on Portable runners

2018-05-17 Thread Eugene Kirpichov
Hi all,

A little over a month ago, a large group of Beam community members has been
working on a prototype of a portable Flink runner - that is, a runner that can
execute Beam pipelines on Flink via the Portability API
. The prototype was developed in
a separate
branch  and was
successfully demonstrated at Flink Forward, where it ran Python and Go
pipelines in a limited setting.

Since then, a smaller group of people (Ankur Goenka, Axel Magnuson, Ben
Sidhom and myself) have been working on productionizing the prototype to
address its limitations and do things "the right way", preparing to reuse
this work for developing other portable runners (e.g. Spark). This involves
a surprising amount of work, since many important design and implementation
concerns could be ignored for the purposes of a prototype. I wanted to give
an update on where we stand now.

Our immediate milestone in sight is *Run Java and Python batch WordCount
examples against a distributed remote Flink cluster*. That involves a few
moving parts, roughly in order of appearance:

*Job submission:*
- The SDK is configured to use a "portable runner", whose responsibility is
to run the pipeline against a given JobService endpoint.
- The portable runner converts the pipeline to a portable Pipeline proto
- The runner finds out which artifacts it needs to stage, and stages them
against an ArtifactStagingService
- A Flink-specific JobService receives the Pipeline proto, performs some
optimizations (e.g. fusion) and translates it to Flink datasets and
functions

*Job execution:*
- A Flink function executes a fused chain of Beam transforms (an
"executable stage") by converting the input and the stage to bundles and
executing them against an SDK harness
- The function starts the proper SDK harness, auxiliary services (e.g.
artifact retrieval, side input handling) and wires them together
- The function feeds the data to the harness and receives data back.

*And here is our status of implementation for these parts:* basically,
almost everything is either done or in review.

*Job submission:*
- General-purpose portable runner in the Python SDK: done
; Java SDK: also done

- Artifact staging from the Python SDK: in review (PR
, PR
); in java, it's done also
- Flink JobService: in review 
- Translation from a Pipeline proto to Flink datasets and functions: done

- ArtifactStagingService implementation that stages artifacts to a location
on a distributed filesystem: in development (design is clear)

*Job execution:*
- Flink function for executing via an SDK harness: done

- APIs for managing lifecycle of an SDK harness: done

- Specific implementation of those APIs using Docker: part done
, part in review

- ArtifactRetrievalService that retrieves artifacts from the location where
ArtifactStagingService staged them: in development.

We expect that the in-review parts will be done, and the in-development
parts be developed, in the next 2-3 weeks. We will, of course, update the
community when this important milestone is reached.

*After that, the next milestones include:*
- Set up Java, Python and Go ValidatesRunner tests to run against the
portable Flink runner, and get them to pass
- Expand Python and Go to parity in terms of such test coverage
- Implement the portable Spark runner, with a similar lifecycle but reusing
almost all of the Flink work
- Add support for streaming to both (which requires SDF - that work is
progressing in parallel and by this point should be in a suitable place)

*For people who would like to get involved in this effort: *You can already
help out by improving ValidatesRunner test coverage in Python and Go. Java
has >300 such tests, Python has only a handful. There'll be a large amount
of parallelizable work once we get the VR test suites running - stay tuned.
SDF+Portability is also expected to produce a lot of parallelizable work up
for grabs within several weeks.

Thanks!


Re: I'm back and ready to help grow our community!

2018-05-17 Thread OrielResearch Eila Arich-Landkof
Congratulations


On Thu, May 17, 2018 at 5:19 PM, Griselda Cuevas  wrote:

> Hi Everyone,
>
>
> I was absent from the mailing list, slack channel and our Beam community
> for the past six weeks, the reason was that I took a leave to focus on
> finishing my Masters Degree, which I finally did on May 15th.
>
>
> I graduated as a Masters of Engineering in Operations Research with a
> concentration in Data Science from UC Berkeley. I'm glad to be part of this
> community and I'd like to share this accomplishment with you so I'm adding
> two pictures of that day :)
>
>
> Given that I've seen so many new folks around, I'd like to use this
> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
> Now that I'm back, I'll continue to work on supporting our community in two
> main streams: Contribution Experience & Events, Meetups, and Conferences.
>
>
> It's good to be back and I look forward to collaborating with you.
>
>
> Cheers,
>
> Gris
>



-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/


Java PreCommit seems broken

2018-05-17 Thread Pablo Estrada
I'm seeing failures on Maven Archetype-related tests.

Build Scan of a sample run: https://scans.gradle.com/s/kr23q43mh6fmk

And the failure is here specifically:
https://scans.gradle.com/s/kr23q43mh6fmk/console-log?task=:beam-sdks-java-maven-archetypes-examples:generateAndBuildArchetypeTest#L116


Does anyone know why this might be happening?
Best
-P.
-- 
Got feedback? go/pabloem-feedback


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Jason Kuster
Wonderful Gris; warmest congratulations on the milestone and glad to have
you back. :D

On Thu, May 17, 2018 at 2:36 PM Kenneth Knowles  wrote:

> Congratulations!!
>
> On Thu, May 17, 2018 at 2:21 PM Griselda Cuevas  wrote:
>
>> Hi Everyone,
>>
>>
>> I was absent from the mailing list, slack channel and our Beam community
>> for the past six weeks, the reason was that I took a leave to focus on
>> finishing my Masters Degree, which I finally did on May 15th.
>>
>>
>> I graduated as a Masters of Engineering in Operations Research with a
>> concentration in Data Science from UC Berkeley. I'm glad to be part of this
>> community and I'd like to share this accomplishment with you so I'm adding
>> two pictures of that day :)
>>
>>
>> Given that I've seen so many new folks around, I'd like to use this
>> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
>> Now that I'm back, I'll continue to work on supporting our community in two
>> main streams: Contribution Experience & Events, Meetups, and Conferences.
>>
>>
>> It's good to be back and I look forward to collaborating with you.
>>
>>
>> Cheers,
>>
>> Gris
>>
>

-- 
---
Jason Kuster
Apache Beam / Google Cloud Dataflow

See something? Say something. go/jasonkuster-feedback


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Kenneth Knowles
Congratulations!!

On Thu, May 17, 2018 at 2:21 PM Griselda Cuevas  wrote:

> Hi Everyone,
>
>
> I was absent from the mailing list, slack channel and our Beam community
> for the past six weeks, the reason was that I took a leave to focus on
> finishing my Masters Degree, which I finally did on May 15th.
>
>
> I graduated as a Masters of Engineering in Operations Research with a
> concentration in Data Science from UC Berkeley. I'm glad to be part of this
> community and I'd like to share this accomplishment with you so I'm adding
> two pictures of that day :)
>
>
> Given that I've seen so many new folks around, I'd like to use this
> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
> Now that I'm back, I'll continue to work on supporting our community in two
> main streams: Contribution Experience & Events, Meetups, and Conferences.
>
>
> It's good to be back and I look forward to collaborating with you.
>
>
> Cheers,
>
> Gris
>


Re: I'm back and ready to help grow our community!

2018-05-17 Thread Udi Meiri
Welcome back and congrats again!

On Thu, May 17, 2018 at 2:23 PM Dmitry Demeshchuk 
wrote:

> While this may be a bit off topic, I still want to say this.
>
> Congratulations on your graduation, Gris!
>
> On Thu, May 17, 2018 at 2:19 PM, Griselda Cuevas  wrote:
>
>> Hi Everyone,
>>
>>
>> I was absent from the mailing list, slack channel and our Beam community
>> for the past six weeks, the reason was that I took a leave to focus on
>> finishing my Masters Degree, which I finally did on May 15th.
>>
>>
>> I graduated as a Masters of Engineering in Operations Research with a
>> concentration in Data Science from UC Berkeley. I'm glad to be part of this
>> community and I'd like to share this accomplishment with you so I'm adding
>> two pictures of that day :)
>>
>>
>> Given that I've seen so many new folks around, I'd like to use this
>> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
>> Now that I'm back, I'll continue to work on supporting our community in two
>> main streams: Contribution Experience & Events, Meetups, and Conferences.
>>
>>
>> It's good to be back and I look forward to collaborating with you.
>>
>>
>> Cheers,
>>
>> Gris
>>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>




Re: I'm back and ready to help grow our community!

2018-05-17 Thread Dmitry Demeshchuk
While this may be a bit off topic, I still want to say this.

Congratulations on your graduation, Gris!

On Thu, May 17, 2018 at 2:19 PM, Griselda Cuevas  wrote:

> Hi Everyone,
>
>
> I was absent from the mailing list, slack channel and our Beam community
> for the past six weeks, the reason was that I took a leave to focus on
> finishing my Masters Degree, which I finally did on May 15th.
>
>
> I graduated as a Masters of Engineering in Operations Research with a
> concentration in Data Science from UC Berkeley. I'm glad to be part of this
> community and I'd like to share this accomplishment with you so I'm adding
> two pictures of that day :)
>
>
> Given that I've seen so many new folks around, I'd like to use this
> opportunity to re-introduce myself. I'm Gris Cuevas and I work at Google.
> Now that I'm back, I'll continue to work on supporting our community in two
> main streams: Contribution Experience & Events, Meetups, and Conferences.
>
>
> It's good to be back and I look forward to collaborating with you.
>
>
> Cheers,
>
> Gris
>



-- 
Best regards,
Dmitry Demeshchuk.


Re: Wait.on() - "Do this, then that" transform

2018-05-17 Thread Eugene Kirpichov
I mean it has to return a PCollection of something that contains elements
representing the result of completing processing of the respective window.
E.g. FileIO.write() returns a PCollection of filenames; SpannerIO.write()
returns simply a PCollection of Void.

However, connectors such as BigtableIO.write() and BigQueryIO.write() don't
return such a PCollection. The former returns PDone; the latter returns a
PCollection of failed inserts that in some cases is unconnected to the
actual processing (when using load jobs).

On Thu, May 17, 2018 at 1:55 PM Ismaël Mejía  wrote:

> This sounds super interesting and useful !
>
> Eugene can you please elaborate on this phrase 'has to return a result that
> can be waited on'. It is not clear for me what this means and I would like
> to understand this to evaluate what other IOs could potentially support
> this.
>
>
> On Thu, May 17, 2018 at 10:13 PM Eugene Kirpichov 
> wrote:
>
> > Thanks Kenn, forwarding to user@ is a good idea; just did that.
>
> > JB - this is orthogonal to SDF, because I'd expect this transform to be
> primarily used for waiting on the results of SomethingIO.write(), whereas
> SDF is primarily useful for implementing SomethingIO.read().
>
> > On Mon, May 14, 2018 at 10:25 PM Jean-Baptiste Onofré 
> wrote:
>
> >> Cool !!!
>
> >> I guess we can leverage this in IOs with SDF.
>
> >> Thanks
> >> Regards
> >> JB
>
> >> On 14/05/2018 23:48, Eugene Kirpichov wrote:
> >> > Hi folks,
> >> >
> >> > Wanted to give a heads up about the existence of a commonly requested
> >> > feature and its first successful production usage.
> >> >
> >> > The feature is the Wait.on() transform [1] , and the first successful
> >> > production usage is in Spanner [2] .
> >> >
> >> > The Wait.on() transform allows you to "do this, then that" - in the
> >> > sense that a.apply(Wait.on(signal)) re-emits PCollection "a", but only
> >> > after the PCollection "signal" is "done" in the same window (i.e. when
> >> > no more elements can arrive into the same window of "signal"). The
> >> > PCollection "signal" is typically a collection of results of some
> >> > operation - so Wait.on(signal) allows you to wait until that operation
> >> > is done. It transparently works correctly in streaming pipelines too.
> >> >
> >> > This may sound a little convoluted, so the example from documentation
> >> > should help.
> >> >
> >> > PCollection<Void> firstWriteResults = data.apply(ParDo.of(...write to
> >> > first database...));
> >> > data.apply(Wait.on(firstWriteResults))
> >> >   // Windows of this intermediate PCollection will be processed no
> >> > earlier than when
> >> >   // the respective window of firstWriteResults closes.
> >> >   .apply(ParDo.of(...write to second database...));
> >> >
> >> > This is indeed what Spanner folks have done, and AFAIK they intend
> this
> >> > for importing multiple dependent database tables - e.g. first import a
> >> > parent table; when it's done, import the child table - all within one
> >> > pipeline. You can see example code in the tests [3].
> >> >
> >> > Please note that this kind of stuff requires support from the IO
> >> > connector - IO.write() has to return a result that can be waited on.
> The
> >> > code of SpannerIO is a great example; another example is
> FileIO.write().
> >> >
> >> > People have expressed wishes for similar support in Bigtable and
> >> > BigQuery connectors but it's not there yet. It would be really cool if
> >> > somebody added it to these connectors or others (I think there was a
> >> > recent thread discussing how to add it to BigQueryIO).
> >> >
> >> > [1]
> >> >
>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Wait.java
> >> > [2] https://github.com/apache/beam/pull/4264
> >> > [3]
> >> >
>
> https://github.com/apache/beam/blob/a3ce091b3bbebf724c63be910bd3bc4cede4d11f/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java#L158
> >> >
>


Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-17 Thread Ismaël Mejía
+0.7 also. Findbugs support for more recent versions of Java is lacking and
the maintenance seems frozen in time.

As a possible plan B, we could first identify the important validations we
would lose; if the gap is considerable, we could create a minimal
configuration for those, and eventually migrate from findbugs to
spotbugs (https://github.com/spotbugs/spotbugs/), which at least seems to be
maintained and is the most active findbugs fork.


On Thu, May 17, 2018 at 9:31 PM Kenneth Knowles  wrote:

> +0.7 I think we should work to remove findbugs. Errorprone covers most of
the same stuff but better and faster.

> The one thing I'm not sure about is nullness analysis. Findbugs has some
serious limitations there but it really improves code quality and prevents
blunders. I'm not sure errorprone covers that. I know the Checker analyzer
has a full solution that makes NPE impossible as in most modern languages.
Maybe that is easy to plug in. The core Java SDK is a good candidate for
the first place to do it since it affects everything else.

> On Thu, May 17, 2018 at 12:02 PM Tim Robertson 
wrote:

>> Hi all,
>> [bringing a side thread discussion from slack to here]

>> We're tackling error-prone warnings now and we aim to fail the build on
warnings raised [1].

>> Enabling failOnWarning also fails the build on findbugs warnings.
Currently I see places where these arise from missing a dependency on
findbugs_annotations and I asked on slack the best way to introduce this
globally in gradle.

>> In that discussion the idea was floated to consider removing findbugs
completely given it is older, has licensing considerations and is not
released regularly.

>> What do people think about this idea please?

>> Thanks,
>> Tim
>> [1]
https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E


Re: Wait.on() - "Do this, then that" transform

2018-05-17 Thread Ismaël Mejía
This sounds super interesting and useful !

Eugene can you please elaborate on this phrase 'has to return a result that
can be waited on'. It is not clear for me what this means and I would like
to understand this to evaluate what other IOs could potentially support
this.


On Thu, May 17, 2018 at 10:13 PM Eugene Kirpichov 
wrote:

> Thanks Kenn, forwarding to user@ is a good idea; just did that.

> JB - this is orthogonal to SDF, because I'd expect this transform to be
primarily used for waiting on the results of SomethingIO.write(), whereas
SDF is primarily useful for implementing SomethingIO.read().

> On Mon, May 14, 2018 at 10:25 PM Jean-Baptiste Onofré 
wrote:

>> Cool !!!

>> I guess we can leverage this in IOs with SDF.

>> Thanks
>> Regards
>> JB

>> On 14/05/2018 23:48, Eugene Kirpichov wrote:
>> > Hi folks,
>> >
>> > Wanted to give a heads up about the existence of a commonly requested
>> > feature and its first successful production usage.
>> >
>> > The feature is the Wait.on() transform [1] , and the first successful
>> > production usage is in Spanner [2] .
>> >
>> > The Wait.on() transform allows you to "do this, then that" - in the
>> > sense that a.apply(Wait.on(signal)) re-emits PCollection "a", but only
>> > after the PCollection "signal" is "done" in the same window (i.e. when
>> > no more elements can arrive into the same window of "signal"). The
>> > PCollection "signal" is typically a collection of results of some
>> > operation - so Wait.on(signal) allows you to wait until that operation
>> > is done. It transparently works correctly in streaming pipelines too.
>> >
>> > This may sound a little convoluted, so the example from documentation
>> > should help.
>> >
> >> > PCollection<Void> firstWriteResults = data.apply(ParDo.of(...write to
>> > first database...));
>> > data.apply(Wait.on(firstWriteResults))
>> >   // Windows of this intermediate PCollection will be processed no
>> > earlier than when
>> >   // the respective window of firstWriteResults closes.
>> >   .apply(ParDo.of(...write to second database...));
>> >
>> > This is indeed what Spanner folks have done, and AFAIK they intend this
>> > for importing multiple dependent database tables - e.g. first import a
>> > parent table; when it's done, import the child table - all within one
>> > pipeline. You can see example code in the tests [3].
>> >
>> > Please note that this kind of stuff requires support from the IO
>> > connector - IO.write() has to return a result that can be waited on.
The
>> > code of SpannerIO is a great example; another example is
FileIO.write().
>> >
>> > People have expressed wishes for similar support in Bigtable and
>> > BigQuery connectors but it's not there yet. It would be really cool if
>> > somebody added it to these connectors or others (I think there was a
>> > recent thread discussing how to add it to BigQueryIO).
>> >
>> > [1]
>> >
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Wait.java
>> > [2] https://github.com/apache/beam/pull/4264
>> > [3]
>> >
https://github.com/apache/beam/blob/a3ce091b3bbebf724c63be910bd3bc4cede4d11f/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java#L158
>> >


Re: Beam high level directions (was "Graal instead of docker?")

2018-05-17 Thread Romain Manni-Bucau
All runners just provide translations, so it is easy to build features on
top of the primitives (i.e. the basic translations) instead of requiring
runners to use the same lib - which is not yet done and will likely not be
done when new reusable parts are added. Keep in mind Beam is starting to
have runners not hosted at Beam.

Having a runner able to do that is a more elegant and robust way than the
fallback, which consists of letting the user visit the pipeline to
translate the DAG N times, but it leads to almost the same result apart
from a few specific exceptions.

On Thu, May 17, 2018 at 21:48, Kenneth Knowles wrote:

> If all engines were identical, having a shared optimizer would be useful.
> Having a proxy runner that performs optimizations before submission to an
> actual engine-specific runner has downsides in both directions:
>
>  - obscures the ability of engine-specific runners to optimize the Beam
> primitives because they only receive post-optimized graph
>  - has to be extremely conservative in its optimizations because it does
> not know about the semantics of the underlying engine
>
> Building it as libraries lets engine-specific runners do what is best for
> their engine, while still maximizing reuse.
>
> Kenn
>
> On Thu, May 17, 2018 at 11:43 AM Robert Burke  wrote:
>
>> The approach you're looking for sounds like the user's Runner of Choice
>> would use a user-side version of the runner core, without changing the
>> Runner of Choice?
>>
>> So a user would update their version of the SDK, and the runner would
>> have to pull the core component from the user pipeline?
>>
>> That sounds like it increases pipeline size and decreases pipeline
>> portability, especially for pipelines that are not in the same language as
>> the runner-core, such as for Python and Go.
>>
>> It's not clear to me what runners would be doing in that scenario either.
>> Do you have a proposal about where the interface boundaries would be?
>>
>> On Wed, May 16, 2018, 10:05 PM Romain Manni-Bucau 
>> wrote:
>>
>>> The runner core doesn't fully align on that, or, rephrased more accurately,
>>> it doesn't go as far as it could for me. Having to call it is still an
>>> issue since it requires a runner update instead of getting the new feature
>>> for free. The next step sounds like it should probably be *one* runner where
>>> implementations plug in their translations. It would reverse the current
>>> pattern and prepare Beam for the future. One good example of such an
>>> implementation is the SDF, which can "just" reuse DoFn primitives to wire
>>> its support through runners.
>>>
>>> On Thu, May 17, 2018 at 02:01, Jesse Anderson wrote:
>>>
 This -> "I'd like that each time you think that you ask yourself "does
 it need?"."

 On Wed, May 16, 2018 at 4:53 PM Robert Bradshaw 
 wrote:

> Thanks for your email, Romain. It helps understand your goals and where
> you're coming from. I'd also like to see a thinner core, and agree it's
> beneficial to reduce dependencies where possible, especially when
> supporting the usecase where the pipeline is constructed in an
> environment
> other than an end-user's main.
>
> It seems a lot of the portability work, despite being on the surface
> driven
> by multi-language, aligns well with many of these goals. For example,
> all
> the work going on in runners-core to provide a rich library that all
> (Java,
> and perhaps non-Java) runners can leverage to do DAG preprocessing
> (fusion,
> combiner lifting, ...) and handle the low-level details of managing
> worker
> subprocesses. As you state, the more we can put into these libraries,
> the
> more all runners can get "for free" by interacting with them, while
> still
> providing the flexibility to adapt to their differing models and
> strengths.
>
> Getting this right is, for me at least, one of the highest priorities
> for
> Beam.
>
> - Robert
> On Wed, May 16, 2018 at 11:51 AM Kenneth Knowles 
> wrote:
>
> > Hi Romain,
>
> > This gives a clear view of your perspective. I also recommend you ask
> around to those who have been working on Beam and big data processing
> for a
> long time to learn more about their perspective.
>
> > Your "Beam Analysis" is pretty accurate about what we've been trying
> to
> build. I would say (a) & (b) as "any language on any runner" and (c)
> is our
> plan of how to do it: define primitives which are fundamental to
> parallel
> processing and formalize a language-independent representation, with
> adapters for each language and data processing engine.
>
> > Of course anyone in the community may have their own particular
> goal. We
> don't control what they work on, and we are grateful for their efforts.

Re: Wait.on() - "Do this, then that" transform

2018-05-17 Thread Eugene Kirpichov
Thanks Kenn, forwarding to user@ is a good idea; just did that.

JB - this is orthogonal to SDF, because I'd expect this transform to be
primarily used for waiting on the results of SomethingIO.write(), whereas
SDF is primarily useful for implementing SomethingIO.read().

On Mon, May 14, 2018 at 10:25 PM Jean-Baptiste Onofré 
wrote:

> Cool !!!
>
> I guess we can leverage this in IOs with SDF.
>
> Thanks
> Regards
> JB
>
> On 14/05/2018 23:48, Eugene Kirpichov wrote:
> > Hi folks,
> >
> > Wanted to give a heads up about the existence of a commonly requested
> > feature and its first successful production usage.
> >
> > The feature is the Wait.on() transform [1] , and the first successful
> > production usage is in Spanner [2] .
> >
> > The Wait.on() transform allows you to "do this, then that" - in the
> > sense that a.apply(Wait.on(signal)) re-emits PCollection "a", but only
> > after the PCollection "signal" is "done" in the same window (i.e. when
> > no more elements can arrive into the same window of "signal"). The
> > PCollection "signal" is typically a collection of results of some
> > operation - so Wait.on(signal) allows you to wait until that operation
> > is done. It transparently works correctly in streaming pipelines too.
> >
> > This may sound a little convoluted, so the example from documentation
> > should help.
> >
> > PCollection<Void> firstWriteResults = data.apply(ParDo.of(...write to
> > first database...));
> > data.apply(Wait.on(firstWriteResults))
> >   // Windows of this intermediate PCollection will be processed no
> > earlier than when
> >   // the respective window of firstWriteResults closes.
> >   .apply(ParDo.of(...write to second database...));
> >
> > This is indeed what Spanner folks have done, and AFAIK they intend this
> > for importing multiple dependent database tables - e.g. first import a
> > parent table; when it's done, import the child table - all within one
> > pipeline. You can see example code in the tests [3].
> >
> > Please note that this kind of stuff requires support from the IO
> > connector - IO.write() has to return a result that can be waited on. The
> > code of SpannerIO is a great example; another example is FileIO.write().
> >
> > People have expressed wishes for similar support in Bigtable and
> > BigQuery connectors but it's not there yet. It would be really cool if
> > somebody added it to these connectors or others (I think there was a
> > recent thread discussing how to add it to BigQueryIO).
> >
> > [1]
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Wait.java
> > [2] https://github.com/apache/beam/pull/4264
> > [3]
> >
> https://github.com/apache/beam/blob/a3ce091b3bbebf724c63be910bd3bc4cede4d11f/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java#L158
> >
>


Jenkins build is back to normal : beam_SeedJob_Standalone #1015

2018-05-17 Thread Apache Jenkins Server
See 




Re: Beam high level directions (was "Graal instead of docker?")

2018-05-17 Thread Kenneth Knowles
If all engines were identical, having a shared optimizer would be useful.
Having a proxy runner that performs optimizations before submission to an
actual engine-specific runner has downsides in both directions:

 - obscures the ability of engine-specific runners to optimize the Beam
primitives because they only receive post-optimized graph
 - has to be extremely conservative in its optimizations because it does
not know about the semantics of the underlying engine

Building it as libraries lets engine-specific runners do what is best for
their engine, while still maximizing reuse.

Kenn

On Thu, May 17, 2018 at 11:43 AM Robert Burke  wrote:

> The approach you're looking for sounds like the user's Runner of Choice
> would use a user-side version of the runner core, without changing the
> Runner of Choice?
>
> So a user would update their version of the SDK, and the runner would have
> to pull the core component from the user pipeline?
>
> That sounds like it increases pipeline size and decreases pipeline
> portability, especially for pipelines that are not in the same language as
> the runner-core, such as for Python and Go.
>
> It's not clear to me what runners would be doing in that scenario either.
> Do you have a proposal about where the interface boundaries would be?
>
> On Wed, May 16, 2018, 10:05 PM Romain Manni-Bucau 
> wrote:
>
>> The runner core doesn't fully align on that, or, rephrased more accurately,
>> it doesn't go as far as it could for me. Having to call it is still an
>> issue since it requires a runner update instead of getting the new feature
>> for free. The next step sounds like it should probably be *one* runner where
>> implementations plug in their translations. It would reverse the current
>> pattern and prepare Beam for the future. One good example of such an
>> implementation is the SDF, which can "just" reuse DoFn primitives to wire
>> its support through runners.
>>
>> On Thu, May 17, 2018 at 02:01, Jesse Anderson wrote:
>>
>>> This -> "I'd like that each time you think that you ask yourself "does
>>> it need?"."
>>>
>>> On Wed, May 16, 2018 at 4:53 PM Robert Bradshaw 
>>> wrote:
>>>
 Thanks for your email, Romain. It helps understand your goals and where
 you're coming from. I'd also like to see a thinner core, and agree it's
 beneficial to reduce dependencies where possible, especially when
 supporting the usecase where the pipeline is constructed in an
 environment
 other than an end-user's main.

 It seems a lot of the portability work, despite being on the surface
 driven
 by multi-language, aligns well with many of these goals. For example,
 all
 the work going on in runners-core to provide a rich library that all
 (Java,
 and perhaps non-Java) runners can leverage to do DAG preprocessing
 (fusion,
 combiner lifting, ...) and handle the low-level details of managing
 worker
 subprocesses. As you state, the more we can put into these libraries,
 the
 more all runners can get "for free" by interacting with them, while
 still
 providing the flexibility to adapt to their differing models and
 strengths.

 Getting this right is, for me at least, one of the highest priorities
 for
 Beam.

 - Robert
 On Wed, May 16, 2018 at 11:51 AM Kenneth Knowles 
 wrote:

 > Hi Romain,

 > This gives a clear view of your perspective. I also recommend you ask
 around to those who have been working on Beam and big data processing
 for a
 long time to learn more about their perspective.

 > Your "Beam Analysis" is pretty accurate about what we've been trying
 to
 build. I would say (a) & (b) as "any language on any runner" and (c) is
 our
 plan of how to do it: define primitives which are fundamental to
 parallel
 processing and formalize a language-independent representation, with
 adapters for each language and data processing engine.

 > Of course anyone in the community may have their own particular goal.
 We
 don't control what they work on, and we are grateful for their efforts.

 > Technically, there is plenty to agree with. I think as you learn about
 Beam you will find that many of your suggestions are already handled in
 some way. You may also continue to learn sometimes about the specific
 reasons things are done in a different way than you expected. These
 should
 help you find how to build what you want to build.

 > Kenn

 > On Wed, May 16, 2018 at 1:14 AM Romain Manni-Bucau <
 rmannibu...@gmail.com>
 wrote:

 >> Hi guys,

 >> Since it is not the first time we have a thread where we end up not
 understanding each other, I'd like to take this as an opportunity to
 clarify what i'm looking for, in a more formal 

Re: [DISCUSS] Remove findbugs from sdks/java

2018-05-17 Thread Kenneth Knowles
+0.7 I think we should work to remove findbugs. Errorprone covers most of
the same stuff but better and faster.

The one thing I'm not sure about is nullness analysis. Findbugs has some
serious limitations there but it really improves code quality and prevents
blunders. I'm not sure errorprone covers that. I know the Checker analyzer
has a full solution that makes NPE impossible as in most modern languages.
Maybe that is easy to plug in. The core Java SDK is a good candidate for
the first place to do it since it affects everything else.
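
To illustrate the kind of blunder that analysis is meant to prevent, here is a
small sketch (not Beam code; it assumes the Checker Framework's nullness
annotations are available on the classpath). The checker accepts the
dereference below only because of the explicit null check; remove the check
and it becomes a compile-time error instead of a runtime NPE.

import java.util.HashMap;
import java.util.Map;
import org.checkerframework.checker.nullness.qual.Nullable;

class NullnessSketch {
  private static final Map<String, String> CODERS = new HashMap<>();

  // May return null for unknown keys, and the return type says so.
  static @Nullable String lookupCoder(String typeName) {
    return CODERS.get(typeName);
  }

  static int coderNameLength(String typeName) {
    String coder = lookupCoder(typeName);
    if (coder == null) {
      return 0;
    }
    // Without the check above, the nullness checker rejects this dereference
    // at compile time rather than letting it fail with a NullPointerException.
    return coder.length();
  }
}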

On Thu, May 17, 2018 at 12:02 PM Tim Robertson 
wrote:

> Hi all,
> [bringing a side thread discussion from slack to here]
>
> We're tackling error-prone warnings now and we aim to fail the build on
> warnings raised [1].
>
> Enabling failOnWarning also fails the build on findbugs warnings.
> Currently I see places where these arise from missing a dependency on
> findbugs_annotations and I asked on slack the best way to introduce this
> globally in gradle.
>
> In that discussion the idea was floated to consider removing findbugs
> completely given it is older, has licensing considerations and is not
> released regularly.
>
> What do people think about this idea please?
>
> Thanks,
> Tim
> [1]
> https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E
>


[DISCUSS] Remove findbugs from sdks/java

2018-05-17 Thread Tim Robertson
Hi all,
[bringing a side thread discussion from slack to here]

We're tackling error-prone warnings now and we aim to fail the build on
warnings raised [1].
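
As a concrete (and hedged) illustration, assuming error-prone's standard
reference-equality check is among the enabled ones, this is the sort of thing
that failing on warnings is meant to catch:

import java.util.Objects;

class ErrorProneWarningSketch {
  // Value comparison; this form is fine.
  static boolean sameRunner(String a, String b) {
    return Objects.equals(a, b);
  }
  // Writing 'a == b' here instead would trip error-prone's reference-equality
  // warning, and with failOnWarning that warning breaks the build instead of
  // scrolling past in the log.
}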

Enabling failOnWarning also fails the build on findbugs warnings.
Currently I see places where these arise from missing a dependency on
findbugs_annotations and I asked on slack the best way to introduce this
globally in gradle.

In that discussion the idea was floated to consider removing findbugs
completely given it is older, has licensing considerations and is not
released regularly.

What do people think about this idea please?

Thanks,
Tim
[1]
https://lists.apache.org/thread.html/95aae2785c3cd728c2d3378cbdff2a7ba19caffcd4faa2049d2e2f46@%3Cdev.beam.apache.org%3E


Re: Beam high level directions (was "Graal instead of docker?")

2018-05-17 Thread Robert Burke
The approach you're looking for sounds like the user's Runner of Choice
would use a user-side version of the runner core, without changing the
Runner of Choice?

So a user would update their version of the SDK, and the runner would have
to pull the core component from the user pipeline?

That sounds like it increases pipeline size and decreases pipeline
portability, especially for pipelines that are not in the same language as
the runner-core, such as for Python and Go.

It's not clear to me what runners would be doing in that scenario either.
Do you have a proposal about where the interface boundaries would be?

On Wed, May 16, 2018, 10:05 PM Romain Manni-Bucau 
wrote:

> The runner core doesn't fully align on that, or, rephrased more accurately,
> it doesn't go as far as it could for me. Having to call it is still an
> issue since it requires a runner update instead of getting the new feature
> for free. The next step sounds like it should probably be *one* runner where
> implementations plug in their translations. It would reverse the current
> pattern and prepare Beam for the future. One good example of such an
> implementation is the SDF, which can "just" reuse DoFn primitives to wire
> its support through runners.
>
> On Thu, May 17, 2018 at 02:01, Jesse Anderson wrote:
>
>> This -> "I'd like that each time you think that you ask yourself "does
>> it need?"."
>>
>> On Wed, May 16, 2018 at 4:53 PM Robert Bradshaw 
>> wrote:
>>
>>> Thanks for your email, Romain. It helps understand your goals and where
>>> you're coming from. I'd also like to see a thinner core, and agree it's
>>> beneficial to reduce dependencies where possible, especially when
>>> supporting the usecase where the pipeline is constructed in an
>>> environment
>>> other than an end-user's main.
>>>
>>> It seems a lot of the portability work, despite being on the surface
>>> driven
>>> by multi-language, aligns well with many of these goals. For example, all
>>> the work going on in runners-core to provide a rich library that all
>>> (Java,
>>> and perhaps non-Java) runners can leverage to do DAG preprocessing
>>> (fusion,
>>> combiner lifting, ...) and handle the low-level details of managing
>>> worker
>>> subprocesses. As you state, the more we can put into these libraries, the
>>> more all runners can get "for free" by interacting with them, while still
>>> providing the flexibility to adapt to their differing models and
>>> strengths.
>>>
>>> Getting this right is, for me at least, one of the highest priorities for
>>> Beam.
>>>
>>> - Robert
>>> On Wed, May 16, 2018 at 11:51 AM Kenneth Knowles  wrote:
>>>
>>> > Hi Romain,
>>>
>>> > This gives a clear view of your perspective. I also recommend you ask
>>> around to those who have been working on Beam and big data processing
>>> for a
>>> long time to learn more about their perspective.
>>>
>>> > Your "Beam Analysis" is pretty accurate about what we've been trying to
>>> build. I would say (a) & (b) as "any language on any runner" and (c) is
>>> our
>>> plan of how to do it: define primitives which are fundamental to parallel
>>> processing and formalize a language-independent representation, with
>>> adapters for each language and data processing engine.
>>>
>>> > Of course anyone in the community may have their own particular goal.
>>> We
>>> don't control what they work on, and we are grateful for their efforts.
>>>
>>> > Technically, there is plenty to agree with. I think as you learn about
>>> Beam you will find that many of your suggestions are already handled in
>>> some way. You may also continue to learn sometimes about the specific
>>> reasons things are done in a different way than you expected. These
>>> should
>>> help you find how to build what you want to build.
>>>
>>> > Kenn
>>>
>>> > On Wed, May 16, 2018 at 1:14 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> wrote:
>>>
>>> >> Hi guys,
>>>
>>> >> Since it is not the first time we have a thread where we end up not
>>> understanding each other, I'd like to take this as an opportunity to
>>> clarify what i'm looking for, in a more formal way. This assumes our
>>> misunderstandings come from the fact I mainly tried to fix issues one by
>>> ones, instead of painting the big picture I'm getting after. (My rational
>>> was I was not able to invest more time in that but I start to think it
>>> was
>>> not a good chocie). I really hope it helps.
>>>
>>> >> 1. Beam analysis
>>>
>>> >> Beam has three main goals:
>>>
>>> >> a. Being a portable API accross runners (I also call them
>>> "implementations" by opposition of "api")
>>> >> b. Bringing some interoperability between languages and therefore
>>> users
>>> >> c. Provide primitives (groupby for instance), I/O and generic
>>> processing
>>> items
>>>
>>> >> Indeed it doesn't cover all beam's features but, high level, it is
>>> what
>>> it brings.
>>>
>>> >> In terms of advantages and why choosing 

Re: JDBC support for Beam SQL

2018-05-17 Thread Andrew Pilloud
I hear some reasonable concerns around locking us into Calcite JDBC. I
agree that there are quite a few unknowns around whether it would work at all.

I didn't think of option three, thanks for the suggestion Anton! I think
building JDBC from scratch would be a large, tedious project. We should
leverage whatever libraries we can to avoid the protocol compatibility work
on that front. However I do agree that rewriting the layer between Calcite
Avatica and the Calcite Planner might be the right path forward. I've stuck
to option one for now knowing that we will revisit options two and three in
a few months once some of our other feature concerns have a clear path
forward.

Pull request is here: https://github.com/apache/beam/pull/5399
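
For anyone less familiar with the JDBC surface being discussed, here is a
minimal sketch of the user-facing shape using plain java.sql. The JDBC URL,
driver registration and table name are assumptions for illustration only;
whichever internal option we pick, this shape stays the same.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BeamSqlJdbcSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical JDBC URL; the real scheme depends on the driver we ship.
    try (Connection connection = DriverManager.getConnection("jdbc:beam:");
        Statement statement = connection.createStatement();
        ResultSet rows =
            statement.executeQuery(
                "SELECT event_type, COUNT(*) AS c FROM my_events GROUP BY event_type")) {
      while (rows.next()) {
        System.out.println(rows.getString("event_type") + " -> " + rows.getLong("c"));
      }
    }
  }
}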

Andrew

On Wed, May 16, 2018 at 10:32 AM Anton Kedin  wrote:

> Among these options I would lean towards option 1. We already support a
> lot of infrastructure to call into Calcite for the non-JDBC path, so adding
> some code to generate config does not seem like that big of a deal, especially
> if it will be a supported way at some point in Calcite.
>
> Pulling implementation RelNode out of JDBC seems to bring a lot more
> unknowns:
>  - it feels it goes against the JDBC approach as we're basically going
> around JDBC result sets;
>  - we will expose 2 ways to extract results, with different schemas,
> types, etc;
>
> I think the third option is to implement the JDBC driver ourselves without
> using Calcite JDBC infrastructure. This way we have the only path into
> Calcite and control everything. I don't know how much effort it would take
> to implement a functional JDBC driver that covers our use cases, though I
> think it's on a similar order of magnitude, since we don't have to implement
> a lot of the API in the beginning, e.g. transactions, cursors, DML.
>
>
> On Wed, May 16, 2018 at 10:15 AM Kenneth Knowles  wrote:
>
>> IIUC in #2 Beam SQL would live on the other side of a JDBC boundary from
>> any use of it (including the BeamSQL transform). I'm a bit worried we'll
>> have a problem plumbing all the info we need, either now or later,
>> especially if we make funky extensions to support our version of SQL.
>>
>> Kenn
>>
>> On Wed, May 16, 2018 at 10:08 AM Andrew Pilloud 
>> wrote:
>>
>>> I'm currently adding JDBC support to Beam SQL! Unfortunately Calcite has
>>> two distinct entry points, one for JDBC and one for everything else (see
>>> CALCITE-1525). Eventually that will change, but I'd like to avoid having
>>> two versions of Beam SQL until Calcite converges on a single path for
>>> parsing SQL. Here are the options I am looking at:
>>>
>>> 1. Make JDBC the source of truth for Calcite config and state. Generate
>>> a FrameworkConfig based on the JDBC connection and continue to use the
>>> non-JDBC interface to Calcite. This option comes with the risk that the two
>>> paths into Calcite will diverge (as there is a bunch of code copied from
>>> Calcite to generate the config), but is the easiest to implement and
>>> understand.
>>>
>>> 2. Make JDBC the only path into Calcite. Use prepareStatement and unwrap
>>> to extract a BeamRelNode out of the JDBC interface. This eliminates a
>>> significant amount of code in Beam, but the unwrap path is a little
>>> convoluted.
>>>
>>> Both options leave the user facing non-JDBC interface to Beam SQL
>>> unchanged, these changes are internal.
>>>
>>> Andrew
>>>
>>


Jenkins build is back to normal : beam_SeedJob #1726

2018-05-17 Thread Apache Jenkins Server
See 



Re: Performance Testing - request for comments

2018-05-17 Thread Łukasz Gajowy
Hi, a small update on this:

I improved the command in Perfkit. If you're interested, you can find the
link to the PR below [1]. I also noticed that the task used for running
integration tests sometimes gets cached (locally; this doesn't happen on
Jenkins). The PR for this issue is also below [2].

Regarding the idea to collect the execution time from the profile report: it
seems doable and would require changing the beam_integration_benchmark.py
implementation [3]. I postponed doing this, as I consider getting the build
right to be the more crucial task.

Best regards,
Łukasz

[1] https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/pull/1690
[2] https://github.com/apache/beam/pull/5395
[3]
https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/bcec62124a432f7b9cc81af4da5f659723249189/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py#L161

2018-05-14 12:35 GMT+02:00 Łukasz Gajowy :

> Hi,
>
> thanks for all the advice - much appreciated! During the mvn -> gradle
> migration we just "translated" the existing mvn commands to gradle. We
> definitely need to improve them in PerfKit now. I also like Scott's idea
> about using the --profile flag. It would be awesome to utilize this in
> Perfkit so I will investigate the topic further too.
>
> Best regards,
> Łukasz
>
> 2018-05-10 1:55 GMT+02:00 Lukasz Cwik :
>
>> +1 on only specifying the target that you need to build, You should use
>> './gradlew -p path/to/java/project assemble' OR './gradlew
>> :project-artifact-name:assemble' to build the jars that you should need.
>> You can run these commands in a checked out version of your workspace and
>> validate that they produce what you expect.
>>
>> On Tue, May 8, 2018 at 9:17 AM Scott Wegner  wrote:
>>
>>> A few thoughts:
>>>
>>> 1. Gradle can intelligently build only the dependencies necessary for a
>>> task, so it shouldn't build all of Python for the test suite if you only
>>> specify the task you're interested in. I'm not sure of the command for
>>> "build all of the dependencies of my tests but don't run my tests"; maybe
>>> "./gradlew mytests -x mytests" ?
>>>
>>> 2. Some tasks in the build are not yet cacheable for various reasons. So
>>> you may see them getting rebuilt on the second execution even on success,
>>> which would then be included in your overall build timing. Information
>>> about which tasks were used from the build cache is available in the Gradle
>>> build scan (--scan).
>>>
>>> Another idea for measuring the execution time of just your tests would
>>> be to pull this out of Gradle's build report.  Adding the --profile flag
>>> generates a report in $buildDir/reports/profile, which should have the
>>> timing info for just the task you're interested in:
>>> https://docs.gradle.org/current/userguide/command_line_interface.html
>>>
>>> On Tue, May 8, 2018 at 8:23 AM Łukasz Gajowy 
>>> wrote:
>>>
 Hi Beam Devs,

 currently PerfkitBenchmarker (a tool used to invoke performance tests)
 has two phases that run gradle commands:

- Pre-build phase: this is where the whole beam repo is built. This
phase is to prepare the necessary artifacts so that it doesn't happen when
executing tests.
- Actual test running phase. After all necessary code is built we
run the test and measure its execution time. The execution time is
displayed on the PerfKit dashboard [1].

 After the recent mvn - gradle migration we noticed that we are unable
 to "Pre build" the code[2]. Because one of the python related tasks fails,
 the whole "preBuild" phase fails silently and the actual building happens
 in the "test running" phase which increases the execution time (this is
 visible in the plots on the dashboard).

 This whole situation made me wonder about several things, and I'd like
 to ask you for opinions. I think:

- we should skip all the python related tasks while building beam
for java performance tests in PerfKit. Those are not needed anyway when we
are running java tests. Is it possible to skip them in one go (e.g. in the
same fashion we skip all checks using the -xcheck option)?
- the same goes for Python tests: we should skip all java related
tasks when building beam for python performance tests in PerfKit. Note that
this bullet is something to be developed in the future, as the
beam_PerformanceTests_Python job (the only Python Performance test job) is
failing for 4 months now and seems abandoned. IMO it should be done when
someone brings the test back to life. For now the job should be
disabled.
- we should modify Perfkit so that when the prebuild phase fails
for some reason, the test is not executed. Now we don't do this and the
test execution time depends on whether "gradle integrationTest" 

Re: com.google.api.services.clouddebugger.v2.CloudDebugger ???

2018-05-17 Thread Lukasz Cwik
Thanks Cham, I forgot to mention the recent migration to 1.23.0.

On Wed, May 16, 2018 at 5:56 PM Chamikara Jayalath 
wrote:

> Are you running using 2.4.0 or HEAD ? We upgraded google-api-client
> dependencies of HEAD to 1.23 last month:
> https://github.com/apache/beam/pull/5046/files
>
> If you are using HEAD make sure that you are not picking up clouddebugger
> 1.22 (or any other 1.22 dependency).
>
> Thanks,
> Cham
>
> On Wed, May 16, 2018 at 8:49 AM Lukasz Cwik  wrote:
>
>> The Dataflow worker relies on this dependency to be supplied as part of
>> the user's application. Are you sure the way you build/package your
>> application hasn't changed in the past few days?
>>
>> Note that the DataflowRunner has specified
>> "google-api-services-clouddebugger" as a dependency[1]. The other issue
>> could be that a dependency of CloudDebugger is incompatible. There is a
>> known issue where "google-api-client" 1.22.0 is incompatible with 1.23.0
>> and hence all Google libraries that depend on "google-api-client" should
>> use a set of dependencies that are consistent with using a single version
>> of "google-api-client" (The DataflowRunner in Apache Beam uses 1.22.0).
>>
>> 1:
>> https://mvnrepository.com/artifact/org.apache.beam/beam-runners-google-cloud-dataflow-java/2.4.0
>>
>> On Wed, May 16, 2018 at 3:53 AM Frank Yellin  wrote:
>>
>>> Something just started breaking on me in the last day or so. I have
>>> the trivial program:
>>>
>>> public static void main(String[] args) {
>>>   PipelineOptions options =
>>>   PipelineOptionsFactory.fromArgs(args).create();
>>>   Pipeline pipeline = Pipeline.create(options);
>>> }
>>>
>>>
>>> If I call it with no arguments, it quietly does nothing, as expected.
>>> If I pass the argument
>>> --runner=dataflowRunner
>>> I get the error message
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> com/google/api/services/clouddebugger/v2/CloudDebugger
>>>
>>>
>>> I can't find any reference to v2/CloudDebugger and I do not know who is
>>> including it or calling it.  Has something changed in the last few days on
>>> GCE?
>>>
>>> Help!
>>>
>>>


Build failed in Jenkins: beam_SeedJob_Standalone #1014

2018-05-17 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9ba58eea7bbaffdb16f849836cf51c1f59282d06 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9ba58eea7bbaffdb16f849836cf51c1f59282d06
Commit message: "Merge pull request #5290: [BEAM-3983] Restore BigQuery SQL 
Support with copied enums"
 > git rev-list --no-walk 9ba58eea7bbaffdb16f849836cf51c1f59282d06 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
java.lang.IllegalArgumentException: beam_PostCommit_Py_VR_Dataflow already 
exists
at hudson.model.Items.verifyItemDoesNotAlreadyExist(Items.java:640)
at hudson.model.AbstractItem.renameTo(AbstractItem.java:254)
at hudson.model.Job.renameTo(Job.java:657)
at 
javaposse.jobdsl.plugin.JenkinsJobManagement.renameJob(JenkinsJobManagement.java:558)
at 
javaposse.jobdsl.plugin.JenkinsJobManagement.renameJobMatching(JenkinsJobManagement.java:351)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown 
Source)
at 
javaposse.jobdsl.plugin.InterruptibleJobManagement.renameJobMatching(InterruptibleJobManagement.groovy:51)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown 
Source)
at 
javaposse.jobdsl.dsl.AbstractDslScriptLoader$_extractGeneratedJobs_closure4.doCall(AbstractDslScriptLoader.groovy:191)
at sun.reflect.GeneratedMethodAccessor5199.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at 
org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at groovy.lang.Closure.call(Closure.java:430)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2040)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
at 
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at 
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at 
javaposse.jobdsl.dsl.AbstractDslScriptLoader.extractGeneratedJobs(AbstractDslScriptLoader.groovy:187)
at 

Re: ElasticsearchIOTest failed during gradle build

2018-05-17 Thread Etienne Chauchot
Thanks for the tests and the details Tim !
Etienne 
On Thursday, May 17, 2018 at 15:29 +0200, Tim Robertson wrote:
> Hey folks,
> 
> I am new to gradle, but Boyuan and I had a chat on the Beam slack late last 
> night (CEST) about this.
> 
> Here are my notes I've collected from my build attempts but I haven't yet 
> isolated the problem:
> 
>   - seemingly only happens with -PisRelease
>   - need --info and --stacktrace or else you miss detail
>   - it is sporadic and happens on different projects
>   - gradle caches come into play (subsequent build might pass the stage) 
>     - race condition?
>     - I remove ~/.gradle each time
>   - I suspected jar signing - but I have commented that out and the issue 
> remains
>   - zip exceptions I have seen include:
>      -  archive is not a ZIP archive
>      - invalid block type
>      - too many length or distance symbols
>   - It is using the zip reader org.apache.tools.zip.ZipFile (from Ant I 
> believe)
> 
> I hope this helps,
> Tim
> 
> 
> On Thu, May 17, 2018 at 3:15 PM, Etienne Chauchot  
> wrote:
> > Hey,
> > Thanks for pointing out ! I'll take a look. Very strange ZipException
> > 
> > Etienne
> > 
> > > On Wednesday, May 16, 2018 at 11:50 -0700, Boyuan Zhang wrote:
> > > Hey all,
> > > 
> > > I'm working on debugging the release process and when running 
> > > ./gradlew -PisRelease clean build, I got
> > > several test failures. Here is one build scan: 
> > > https://scans.gradle.com/s/t4ryx7y3jhdeo/console-log?task=:beam-sdks
> > > -java-io-elasticsearch-tests-5:test#L3. Any idea about why this happened?
> > > 
> > > Thanks for all your help!
> > > 
> > > Boyuan

Build failed in Jenkins: beam_SeedJob #1725

2018-05-17 Thread Apache Jenkins Server
See 

--
GitHub pull request #5242 of commit eb00d20f65e394b263fed9a9f7958ea58fb782ca, 
no merge conflicts.
Setting status of eb00d20f65e394b263fed9a9f7958ea58fb782ca to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1725/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5242/*:refs/remotes/origin/pr/5242/*
 > git rev-parse refs/remotes/origin/pr/5242/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5242/merge^{commit} # timeout=10
Checking out Revision 080f09d078086c79adc4d4655c69dda3e13b9f2d 
(refs/remotes/origin/pr/5242/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 080f09d078086c79adc4d4655c69dda3e13b9f2d
Commit message: "Merge eb00d20f65e394b263fed9a9f7958ea58fb782ca into 
9ba58eea7bbaffdb16f849836cf51c1f59282d06"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
java.lang.IllegalArgumentException: beam_PostCommit_Py_VR_Dataflow already 
exists
at hudson.model.Items.verifyItemDoesNotAlreadyExist(Items.java:640)
at hudson.model.AbstractItem.renameTo(AbstractItem.java:254)
at hudson.model.Job.renameTo(Job.java:657)
at 
javaposse.jobdsl.plugin.JenkinsJobManagement.renameJob(JenkinsJobManagement.java:558)
at 
javaposse.jobdsl.plugin.JenkinsJobManagement.renameJobMatching(JenkinsJobManagement.java:351)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown 
Source)
at 
javaposse.jobdsl.plugin.InterruptibleJobManagement.renameJobMatching(InterruptibleJobManagement.groovy:51)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown 
Source)
at 
javaposse.jobdsl.dsl.AbstractDslScriptLoader$_extractGeneratedJobs_closure4.doCall(AbstractDslScriptLoader.groovy:191)
at sun.reflect.GeneratedMethodAccessor5199.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at 
org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at groovy.lang.Closure.call(Closure.java:430)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2040)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
at 
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at 

Re: ElasticsearchIOTest failed during gradle build

2018-05-17 Thread Tim Robertson
Hey folks,

I am new to gradle, but Boyuan and I had a chat on the Beam slack late last
night (CEST) about this.

Here are my notes I've collected from my build attempts but I haven't yet
isolated the problem:

  - seemingly only happens with -PisRelease
  - need --info and --stacktrace or else you miss detail
  - it is sporadic and happens on different projects
  - gradle caches come into play (subsequent build might pass the stage)
- race condition?
- I remove ~/.gradle each time
  - I suspected jar signing - but I have commented that out and the issue
remains
  - zip exceptions I have seen include:
 -  archive is not a ZIP archive
 - invalid block type
 - too many length or distance symbols
  - It is using the zip reader org.apache.tools.zip.ZipFile (from Ant I
believe)

I hope this helps,
Tim


On Thu, May 17, 2018 at 3:15 PM, Etienne Chauchot 
wrote:

> Hey,
> Thanks for pointing out ! I'll take a look. Very strange ZipException
>
> Etienne
>
> On Wednesday, May 16, 2018 at 11:50 -0700, Boyuan Zhang wrote:
>
> Hey all,
>
> I'm working on debugging the release process and when running
> ./gradlew -PisRelease clean build, I got several test failures. Here is one
> build scan: https://scans.gradle.com/s/t4ryx7y3jhdeo/console-log?
> task=:beam-sdks-java-io-elasticsearch-tests-5:test#L3. Any idea about why
> this happened?
>
> Thanks for all your help!
>
> Boyuan
>
>


Re: ElasticsearchIOTest failed during gradle build

2018-05-17 Thread Etienne Chauchot
Hey,
Thanks for pointing out ! I'll take a look. Very strange ZipException

Etienne

On Wednesday, May 16, 2018 at 11:50 -0700, Boyuan Zhang wrote:
> Hey all,
> 
> I'm working on debugging the release process and when running 
> ./gradlew -PisRelease clean build, I got
> several test failures. Here is one build scan: 
> https://scans.gradle.com/s/t4ryx7y3jhdeo/console-log?task=:beam-sdks-jav
> a-io-elasticsearch-tests-5:test#L3. Any idea about why this happened?
> 
> Thanks for all your help!
> 
> Boyuan

Build failed in Jenkins: beam_SeedJob #1724

2018-05-17 Thread Apache Jenkins Server
See 

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam12 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9ba58eea7bbaffdb16f849836cf51c1f59282d06 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9ba58eea7bbaffdb16f849836cf51c1f59282d06
Commit message: "Merge pull request #5290: [BEAM-3983] Restore BigQuery SQL 
Support with copied enums"
 > git rev-list --no-walk 9ba58eea7bbaffdb16f849836cf51c1f59282d06 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
java.lang.IllegalArgumentException: beam_PostCommit_Py_VR_Dataflow already 
exists
at hudson.model.Items.verifyItemDoesNotAlreadyExist(Items.java:640)
at hudson.model.AbstractItem.renameTo(AbstractItem.java:254)
at hudson.model.Job.renameTo(Job.java:657)
at 
javaposse.jobdsl.plugin.JenkinsJobManagement.renameJob(JenkinsJobManagement.java:558)
at 
javaposse.jobdsl.plugin.JenkinsJobManagement.renameJobMatching(JenkinsJobManagement.java:351)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown 
Source)
at 
javaposse.jobdsl.plugin.InterruptibleJobManagement.renameJobMatching(InterruptibleJobManagement.groovy:51)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown 
Source)
at 
javaposse.jobdsl.dsl.AbstractDslScriptLoader$_extractGeneratedJobs_closure4.doCall(AbstractDslScriptLoader.groovy:191)
at sun.reflect.GeneratedMethodAccessor5199.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at 
org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at groovy.lang.Closure.call(Closure.java:430)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2040)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
at 
org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
at 
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at 
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at 
javaposse.jobdsl.dsl.AbstractDslScriptLoader.extractGeneratedJobs(AbstractDslScriptLoader.groovy:187)
at sun.reflect.GeneratedMethodAccessor5281.invoke(Unknown Source)

Re: Performance Testing Dashboard - which results should be official?

2018-05-17 Thread Łukasz Gajowy
That is correct - I asked for purely organizational purposes. Please keep
in mind that there is still some work to do in terms of getting rid of some
test flakiness, properly building the test code before running the tests
and detecting the anomalies/regressions that happen in IOs. We're working
on it and will inform the community when it's done.

Thank you for all the comments so far!

2018-05-16 23:11 GMT+02:00 Kenneth Knowles :

> Commented on the JIRA. I think this topic isn't so much about
> runner-to-runner comparison but just getting organized. For me working on a
> particular runner or IO or DSL the results are very helpful for seeing
> trends over time.
>
> On Wed, May 16, 2018 at 7:05 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Lukasz,
>>
>> Thanks, gonna comment in the Jira.
>>
>> Generally speaking, I'm not a big fan of comparing one runner against
>> another, because there are a bunch of parameters that can influence the
>> results.
>>
>> Regards
>> JB
>>
>> On 16/05/2018 15:54, Łukasz Gajowy wrote:
>> > Hi all,
>> >
>> > I created an issue which I believe is interesting in terms of what
>> > should and shouldn't be included in the Performance Testing dashboard.
>> > Speaking more generally, we have to settle which results should be
>> > treated as the official ones. The issue description contains my idea for
>> > solving it, but I might have missed something there. If you're
>> > interested in this topic and willing to contribute, you're welcome to!
>> >
>> > Issue link: https://issues.apache.org/jira/browse/BEAM-4298
>> >
>> > (please note that there's a related issue linked)
>> >
>> >
>> > Best regards,
>> > Łukasz Gajowy
>>
>
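
As a rough illustration of the regression detection mentioned in this thread
(this is not the dashboards' actual implementation; the baseline method and
the 25% tolerance are assumptions made for the example), a check of roughly
this shape is enough to flag a suspiciously slow run against a benchmark's
recent history:

    // Hypothetical sketch: flag the latest run if it is noticeably slower
    // than the average of the earlier runs. The 1.25 tolerance is an
    // arbitrary illustrative threshold, not a value used by Beam.
    def flagRegression(List<Double> runtimesMs, double tolerance = 1.25) {
        def history = runtimesMs[0..<runtimesMs.size() - 1]
        def baseline = history.sum() / history.size()
        return runtimesMs.last() > baseline * tolerance
    }

    assert flagRegression([100.0d, 102.0d, 98.0d, 140.0d])   // ~40% slower -> flagged
    assert !flagRegression([100.0d, 102.0d, 98.0d, 105.0d])  // within tolerance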


Build failed in Jenkins: beam_SeedJob #1723

2018-05-17 Thread Apache Jenkins Server
See 

--
GitHub pull request #5180 of commit c89c4aba92e38f6e7b7adca7ce0165e679b76690, no merge conflicts.
Setting status of c89c4aba92e38f6e7b7adca7ce0165e679b76690 to PENDING with url https://builds.apache.org/job/beam_SeedJob/1723/ and message: 'Build started sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git +refs/heads/*:refs/remotes/origin/* +refs/pull/5180/*:refs/remotes/origin/pr/5180/*
 > git rev-parse refs/remotes/origin/pr/5180/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5180/merge^{commit} # timeout=10
Checking out Revision 88b5cdf16d91b75b04f43d1618fdb8896564e1d0 (refs/remotes/origin/pr/5180/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 88b5cdf16d91b75b04f43d1618fdb8896564e1d0
Commit message: "Merge c89c4aba92e38f6e7b7adca7ce0165e679b76690 into 9ba58eea7bbaffdb16f849836cf51c1f59282d06"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
java.lang.IllegalArgumentException: beam_PostCommit_Py_VR_Dataflow already exists
at hudson.model.Items.verifyItemDoesNotAlreadyExist(Items.java:640)
at hudson.model.AbstractItem.renameTo(AbstractItem.java:254)
at hudson.model.Job.renameTo(Job.java:657)
at javaposse.jobdsl.plugin.JenkinsJobManagement.renameJob(JenkinsJobManagement.java:558)
at javaposse.jobdsl.plugin.JenkinsJobManagement.renameJobMatching(JenkinsJobManagement.java:351)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown Source)
at javaposse.jobdsl.plugin.InterruptibleJobManagement.renameJobMatching(InterruptibleJobManagement.groovy:51)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown Source)
at javaposse.jobdsl.dsl.AbstractDslScriptLoader$_extractGeneratedJobs_closure4.doCall(AbstractDslScriptLoader.groovy:191)
at sun.reflect.GeneratedMethodAccessor5199.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at groovy.lang.Closure.call(Closure.java:430)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2040)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at 

Build failed in Jenkins: beam_SeedJob_Standalone #1013

2018-05-17 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git +refs/heads/*:refs/remotes/origin/* +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9ba58eea7bbaffdb16f849836cf51c1f59282d06 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9ba58eea7bbaffdb16f849836cf51c1f59282d06
Commit message: "Merge pull request #5290: [BEAM-3983] Restore BigQuery SQL Support with copied enums"
 > git rev-list --no-walk 9ba58eea7bbaffdb16f849836cf51c1f59282d06 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
java.lang.IllegalArgumentException: beam_PostCommit_Py_VR_Dataflow already exists
at hudson.model.Items.verifyItemDoesNotAlreadyExist(Items.java:640)
at hudson.model.AbstractItem.renameTo(AbstractItem.java:254)
at hudson.model.Job.renameTo(Job.java:657)
at javaposse.jobdsl.plugin.JenkinsJobManagement.renameJob(JenkinsJobManagement.java:558)
at javaposse.jobdsl.plugin.JenkinsJobManagement.renameJobMatching(JenkinsJobManagement.java:351)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown Source)
at javaposse.jobdsl.plugin.InterruptibleJobManagement.renameJobMatching(InterruptibleJobManagement.groovy:51)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown Source)
at javaposse.jobdsl.dsl.AbstractDslScriptLoader$_extractGeneratedJobs_closure4.doCall(AbstractDslScriptLoader.groovy:191)
at sun.reflect.GeneratedMethodAccessor5199.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at groovy.lang.Closure.call(Closure.java:430)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2040)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at javaposse.jobdsl.dsl.AbstractDslScriptLoader.extractGeneratedJobs(AbstractDslScriptLoader.groovy:187)
at 

Build failed in Jenkins: beam_SeedJob #1722

2018-05-17 Thread Apache Jenkins Server
See 


Changes:

[kmj] Bugfix: Read BQ bytes processed from correct field.

[apilloud] [BEAM-3983] Add utils for converting to BigQuery types

[apilloud] [BEAM-3983][SQL] Add BigQuery table provider

[apilloud] [BEAM-4248] Copy enums from com.google.cloud

[github] [BEAM-4300] Fix ValidatesRunner tests in Python: run with same mechanism

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam12 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git +refs/heads/*:refs/remotes/origin/* +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 9ba58eea7bbaffdb16f849836cf51c1f59282d06 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9ba58eea7bbaffdb16f849836cf51c1f59282d06
Commit message: "Merge pull request #5290: [BEAM-3983] Restore BigQuery SQL Support with copied enums"
 > git rev-list --no-walk c3c2ffdce7a4da2cf65f47ff8cb01f30f423170a # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
java.lang.IllegalArgumentException: beam_PostCommit_Py_VR_Dataflow already exists
at hudson.model.Items.verifyItemDoesNotAlreadyExist(Items.java:640)
at hudson.model.AbstractItem.renameTo(AbstractItem.java:254)
at hudson.model.Job.renameTo(Job.java:657)
at javaposse.jobdsl.plugin.JenkinsJobManagement.renameJob(JenkinsJobManagement.java:558)
at javaposse.jobdsl.plugin.JenkinsJobManagement.renameJobMatching(JenkinsJobManagement.java:351)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown Source)
at javaposse.jobdsl.plugin.InterruptibleJobManagement.renameJobMatching(InterruptibleJobManagement.groovy:51)
at javaposse.jobdsl.dsl.JobManagement$renameJobMatching$5.call(Unknown Source)
at javaposse.jobdsl.dsl.AbstractDslScriptLoader$_extractGeneratedJobs_closure4.doCall(AbstractDslScriptLoader.groovy:191)
at sun.reflect.GeneratedMethodAccessor4451.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at groovy.lang.Closure.call(Closure.java:414)
at groovy.lang.Closure.call(Closure.java:430)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2040)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
at