[jira] [Commented] (BEAM-2527) Test queries on unbounded PCollections with BeamSql dsl API

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069539#comment-16069539
 ] 

ASF GitHub Bot commented on BEAM-2527:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3477

[BEAM-2527] Test queries on unbounded PCollections with BeamSql dsl API

R: @xumingming @takidau 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3477


commit 57c7e777cdc47975630c24b77ccf2c79381556c0
Author: mingmxu 
Date:   2017-06-29T03:54:02Z

add tests with unbounded pcollections.




> Test queries on unbounded PCollections with BeamSql dsl API
> ---
>
> Key: BEAM-2527
> URL: https://issues.apache.org/jira/browse/BEAM-2527
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>
> Beam SQL supports both BOUNDED and UNBOUNDED PCollections. 
> This task will add unit tests for UNBOUNDED PCollection. For BOUNDED 
> PCollection, see BEAM-2452.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3477: [BEAM-2527] Test queries on unbounded PCollections ...

2017-06-29 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3477

[BEAM-2527] Test queries on unbounded PCollections with BeamSql dsl API

R: @xumingming @takidau 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3477


commit 57c7e777cdc47975630c24b77ccf2c79381556c0
Author: mingmxu 
Date:   2017-06-29T03:54:02Z

add tests with unbounded pcollections.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2529) SpannerWriteIT failing in postcommit

2017-06-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069500#comment-16069500
 ] 

Jean-Baptiste Onofré commented on BEAM-2529:


It's now OK: 
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4271/

Thanks !

> SpannerWriteIT failing in postcommit
> 
>
> Key: BEAM-2529
> URL: https://issues.apache.org/jira/browse/BEAM-2529
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 2.1.0
>
>
> There are some NPEs and some other issues after the upgrade of commons-lang3 
> in 
> https://github.com/apache/beam/commit/6681472a2aa277ba83fd9e2ffec5a57c46d5820c#diff-f74bb28a007fa6d75d065f14a7c76b38





[jira] [Resolved] (BEAM-2529) SpannerWriteIT failing in postcommit

2017-06-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2529.

   Resolution: Fixed
Fix Version/s: 2.1.0

> SpannerWriteIT failing in postcommit
> 
>
> Key: BEAM-2529
> URL: https://issues.apache.org/jira/browse/BEAM-2529
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 2.1.0
>
>
> There are some NPEs and some other issues after the upgrade of commons-lang3 
> in 
> https://github.com/apache/beam/commit/6681472a2aa277ba83fd9e2ffec5a57c46d5820c#diff-f74bb28a007fa6d75d065f14a7c76b38





[GitHub] beam pull request #3437: Add commons-text to shaded jar to fix SpannerIO Wri...

2017-06-29 Thread jbonofre
Github user jbonofre closed the pull request at:

https://github.com/apache/beam/pull/3437




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3484

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #4271

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Assigned] (BEAM-2230) Core SDK ApiSurface should be only org.apache.beam.sdk and should be defined outside of the general ApiSurface class

2017-06-29 Thread Innocent (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Innocent reassigned BEAM-2230:
--

Assignee: Innocent

> Core SDK ApiSurface should be only org.apache.beam.sdk and should be defined 
> outside of the general ApiSurface class
> 
>
> Key: BEAM-2230
> URL: https://issues.apache.org/jira/browse/BEAM-2230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Innocent
>
> Currently, ApiSurface.getSdkApiSurface() is highly specialized and also not 
> correct.





[jira] [Assigned] (BEAM-2231) ApiSurface should be lazy

2017-06-29 Thread Innocent (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Innocent reassigned BEAM-2231:
--

Assignee: Innocent

> ApiSurface should be lazy
> -
>
> Key: BEAM-2231
> URL: https://issues.apache.org/jira/browse/BEAM-2231
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Innocent
>
> Currently, the ApiSurface loads classes recursively, when they should be 
> pruned before loading by the pruning pattern. This has caused crashes on 
> classes that are never referenced in our code.
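The "prune before loading" idea in the description can be sketched as follows. This is an illustrative stand-alone example, not Beam's actual ApiSurface code; the class name `LazySurfaceScan` and the pruning pattern are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch only (not Beam's ApiSurface): class *names* are matched
// against a pruning pattern first, and only the survivors would ever be handed
// to a class loader. A class outside the pattern is never loaded, so an
// unloadable transitive reference cannot crash the scan.
public class LazySurfaceScan {

  // Hypothetical pruning pattern: ignore everything under org.junit.
  static final Pattern PRUNED = Pattern.compile("^org\\.junit\\..*");

  // Filter names before any Class.forName call happens.
  public static List<String> classesToLoad(List<String> classNames) {
    return classNames.stream()
        .filter(name -> !PRUNED.matcher(name).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "org.apache.beam.sdk.Pipeline",
        "org.junit.Assert");
    // Only the non-pruned name survives; org.junit.Assert is never loaded.
    System.out.println(classesToLoad(names));
  }
}
```

The key design point is that pruning operates on strings, so it stays safe even for classes whose bytecode would fail to load.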





[jira] [Assigned] (BEAM-1785) Port game example in Beam repo to use InjectorUnboundedSource

2017-06-29 Thread Innocent (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Innocent reassigned BEAM-1785:
--

Assignee: Innocent

> Port game example in Beam repo to use InjectorUnboundedSource
> -
>
> Key: BEAM-1785
> URL: https://issues.apache.org/jira/browse/BEAM-1785
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Innocent
>Priority: Minor
>
> We developed an UnboundedSource so users didn't have to run a separate 
> injector, but it isn't in the Beam example. We should put it there, which 
> eliminates a big chunk of complexity.





[jira] [Commented] (BEAM-1939) Serialize more coders via URN + Class name

2017-06-29 Thread Innocent (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069461#comment-16069461
 ] 

Innocent commented on BEAM-1939:


Hi Thomas, can you please provide more information on this ticket? I did not 
find any Coders.java class in runner-core/constructor; maybe this has changed 
since the ticket was created. Did you have any particular class in mind when 
you created the ticket?
Thanks.

> Serialize more coders via URN + Class name
> --
>
> Key: BEAM-1939
> URL: https://issues.apache.org/jira/browse/BEAM-1939
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Innocent
>Priority: Trivial
>
> If the size of serializing Standard Coders becomes too large, an arbitrary 
> Standard Coder can be encoded, alongside its components, via a URN, with the 
> class looked up when it is to be deserialized.
> See 
> https://github.com/tgroh/beam/commit/070854845346d8e4df824e4aa374688bd095c2c6 
> as an example
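The scheme described above can be sketched as a URN-to-class registry. This is a minimal illustration of the idea, not Beam's actual API; the registry class, the URN strings, and the stand-in classes are all assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the idea in the linked commit: rather than Java-serializing
// a whole Standard Coder, write out only its URN (plus component URNs), and
// resolve the URN back to a class at deserialization time. The registry and
// URNs here are illustrative, not Beam's real coder URNs.
public class UrnCoderRegistry {

  private static final Map<String, Class<?>> URN_TO_CLASS = new HashMap<>();

  static {
    // Hypothetical URNs mapped to stand-in classes.
    URN_TO_CLASS.put("urn:example:coder:varlong", Long.class);
    URN_TO_CLASS.put("urn:example:coder:bytes", byte[].class);
  }

  // "Serialize": all that needs to be written is the URN.
  public static String toUrn(Class<?> coderClass) {
    return URN_TO_CLASS.entrySet().stream()
        .filter(e -> e.getValue().equals(coderClass))
        .map(Map.Entry::getKey)
        .findFirst()
        .orElseThrow(() -> new IllegalArgumentException("No URN for " + coderClass));
  }

  // "Deserialize": look the class up again from the URN.
  public static Class<?> fromUrn(String urn) {
    Class<?> clazz = URN_TO_CLASS.get(urn);
    if (clazz == null) {
      throw new IllegalArgumentException("Unknown URN: " + urn);
    }
    return clazz;
  }
}
```

The payload shrinks because only a short string crosses the wire, at the cost of both sides agreeing on the registry contents.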





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2519

2017-06-29 Thread Apache Jenkins Server
See 




[2/2] beam git commit: This closes #3472: Remove Apache Commons dependency which DataflowRunner appears to not stage

2017-06-29 Thread kenn
This closes #3472: Remove Apache Commons dependency which DataflowRunner 
appears to not stage

  Ditch apache commons


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/893bf428
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/893bf428
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/893bf428

Branch: refs/heads/master
Commit: 893bf428a279e4c7f81accdd5e7eb7a4aaebb6b2
Parents: 7fc73d7 39a2ed0
Author: Kenneth Knowles 
Authored: Thu Jun 29 20:26:09 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jun 29 20:26:09 2017 -0700

--
 sdks/java/io/google-cloud-platform/pom.xml  | 11 --
 .../beam/sdk/io/gcp/spanner/RandomUtils.java| 41 
 .../beam/sdk/io/gcp/spanner/SpannerReadIT.java  | 11 ++
 .../beam/sdk/io/gcp/spanner/SpannerWriteIT.java | 10 ++---
 4 files changed, 47 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/893bf428/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
--

http://git-wip-us.apache.org/repos/asf/beam/blob/893bf428/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
--



[1/2] beam git commit: Ditch apache commons

2017-06-29 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 7fc73d790 -> 893bf428a


Ditch apache commons


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/39a2ed0c
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/39a2ed0c
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/39a2ed0c

Branch: refs/heads/master
Commit: 39a2ed0ccb53bcc96c179c64405c80226bac7b9b
Parents: 2dd1907
Author: Mairbek Khadikov 
Authored: Thu Jun 29 10:12:50 2017 -0700
Committer: Mairbek Khadikov 
Committed: Thu Jun 29 10:12:50 2017 -0700

--
 sdks/java/io/google-cloud-platform/pom.xml  | 11 --
 .../beam/sdk/io/gcp/spanner/RandomUtils.java| 41 
 .../beam/sdk/io/gcp/spanner/SpannerReadIT.java  | 11 ++
 .../beam/sdk/io/gcp/spanner/SpannerWriteIT.java | 10 ++---
 4 files changed, 47 insertions(+), 26 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/39a2ed0c/sdks/java/io/google-cloud-platform/pom.xml
--
diff --git a/sdks/java/io/google-cloud-platform/pom.xml 
b/sdks/java/io/google-cloud-platform/pom.xml
index 94066c7..09a430a 100644
--- a/sdks/java/io/google-cloud-platform/pom.xml
+++ b/sdks/java/io/google-cloud-platform/pom.xml
@@ -258,19 +258,8 @@
       <artifactId>proto-google-common-protos</artifactId>
     </dependency>
 
-    <dependency>
-      <groupId>org.apache.commons</groupId>
-      <artifactId>commons-lang3</artifactId>
-      <scope>provided</scope>
-    </dependency>
-
 
 
-    <dependency>
-      <groupId>org.apache.commons</groupId>
-      <artifactId>commons-text</artifactId>
-      <scope>test</scope>
-    </dependency>
     <dependency>
       <groupId>org.apache.beam</groupId>
       <artifactId>beam-sdks-java-core</artifactId>
       <classifier>tests</classifier>

http://git-wip-us.apache.org/repos/asf/beam/blob/39a2ed0c/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/RandomUtils.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/RandomUtils.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/RandomUtils.java
new file mode 100644
index 000..f479b4a
--- /dev/null
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/RandomUtils.java
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.spanner;
+
+import java.util.Random;
+
+/**
+ * Useful randomness related utilities.
+ */
+public class RandomUtils {
+
+  private static final char[] ALPHANUMERIC =
+      "1234567890abcdefghijklmnopqrstuvwxyz".toCharArray();
+
+  private RandomUtils() {
+  }
+
+  public static String randomAlphaNumeric(int length) {
+    Random random = new Random();
+    char[] result = new char[length];
+    for (int i = 0; i < length; i++) {
+      result[i] = ALPHANUMERIC[random.nextInt(ALPHANUMERIC.length)];
+    }
+    return new String(result);
+  }
+
+}
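The new helper replaces RandomStringUtils/RandomStringGenerator from Apache Commons with plain java.util.Random, which is the point of the change: nothing extra to stage. A self-contained usage sketch (the helper's logic is reproduced here, under a hypothetical class name, so the snippet compiles on its own):

```java
import java.util.Random;

// Stand-alone copy of the RandomUtils logic from the patch above, plus a small
// usage check. Only java.util.Random is needed; no Apache Commons dependency.
public class RandomUtilsDemo {

  private static final char[] ALPHANUMERIC =
      "1234567890abcdefghijklmnopqrstuvwxyz".toCharArray();

  public static String randomAlphaNumeric(int length) {
    Random random = new Random();
    char[] result = new char[length];
    for (int i = 0; i < length; i++) {
      result[i] = ALPHANUMERIC[random.nextInt(ALPHANUMERIC.length)];
    }
    return new String(result);
  }

  public static void main(String[] args) {
    String s = randomAlphaNumeric(8);
    System.out.println(s.length()); // always 8; characters drawn from ALPHANUMERIC
  }
}
```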

http://git-wip-us.apache.org/repos/asf/beam/blob/39a2ed0c/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
index ca43b40..9f7c64e 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
@@ -40,8 +40,6 @@ import org.apache.beam.sdk.testing.TestPipelineOptions;
 import org.apache.beam.sdk.transforms.Count;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionView;
-import org.apache.commons.lang3.RandomStringUtils;
-import org.apache.commons.text.RandomStringGenerator;
 import org.junit.After;
 import org.junit.Before;
 import org.junit.Rule;
@@ -127,7 +125,7 @@ public class SpannerReadIT {
   .set("key")
 

[GitHub] beam pull request #3472: Get rid of Apache Commons dependency

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3472




[jira] [Commented] (BEAM-2529) SpannerWriteIT failing in postcommit

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069415#comment-16069415
 ] 

ASF GitHub Bot commented on BEAM-2529:
--

Github user kennknowles closed the pull request at:

https://github.com/apache/beam/pull/3476


> SpannerWriteIT failing in postcommit
> 
>
> Key: BEAM-2529
> URL: https://issues.apache.org/jira/browse/BEAM-2529
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> There are some NPEs and some other issues after the upgrade of commons-lang3 
> in 
> https://github.com/apache/beam/commit/6681472a2aa277ba83fd9e2ffec5a57c46d5820c#diff-f74bb28a007fa6d75d065f14a7c76b38





[GitHub] beam pull request #3476: [BEAM-2529] Fix dependency scope error in GCP IO mo...

2017-06-29 Thread kennknowles
Github user kennknowles closed the pull request at:

https://github.com/apache/beam/pull/3476




Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3483

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #4269

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Assigned] (BEAM-1187) GCP Transport not performing timed backoff after connection failure

2017-06-29 Thread Innocent (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Innocent reassigned BEAM-1187:
--

Assignee: Luke Cwik  (was: Innocent)

> GCP Transport not performing timed backoff after connection failure
> ---
>
> Key: BEAM-1187
> URL: https://issues.apache.org/jira/browse/BEAM-1187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core, sdk-java-gcp
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>
> The http request retries are failing and seemingly being immediately retried 
> if there is a connection exception. Note that below all the times are the 
> same, and also that we are logging too much. This seems to be related to the 
> interaction by the chaining http request initializer combining the Credential 
> initializer followed by the RetryHttpRequestInitializer. Also, note that we 
> never log "Request failed with IOException, will NOT retry" which implies 
> that the retry logic never made it to the RetryHttpRequestInitializer.
> Action items are:
> 1) Ensure that the RetryHttpRequestInitializer is used
> 2) Ensure that calls do backoff
> 3) PR/3430 -Reduce the logging to one terminal statement saying that we 
> retried X times and final failure was YYY-
> Dump of console output:
> Dec 20, 2016 9:12:20 AM 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner fromOptions
> INFO: PipelineOptions.filesToStage was not specified. Defaulting to files 
> from the classpath: will stage 1 files. Enable logging at DEBUG level to see 
> which files will be staged.
> Dec 20, 2016 9:12:21 AM 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
> INFO: Executing pipeline on the Dataflow Service, which will have billing 
> implications related to Google Compute Engine usage and other Google Cloud 
> Services.
> Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
> stageClasspathElements
> INFO: Uploading 1 files from PipelineOptions.filesToStage to staging location 
> to prepare for execution.
> Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
> stageClasspathElements
> INFO: Uploading PipelineOptions.filesToStage complete: 1 files newly 
> uploaded, 0 files cached
> Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
> WARNING: exception thrown while executing request
> java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>   at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>   at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:589)
>   at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>   at sun.net.www.http.HttpClient.(HttpClient.java:211)
>   at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>   at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
>   at 
> com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
>   at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>   at 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:632)
>   at 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:201)
>   at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
>   at 
> 
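Action item 2 above ("ensure that calls do backoff") amounts to spacing retries out exponentially instead of firing them back-to-back as the identical timestamps in the log show. A stand-alone sketch of such a schedule; the constants and class name are illustrative assumptions, not the RetryHttpRequestInitializer's actual implementation:

```java
// Illustrative exponential-backoff schedule, not Beam's or google-http-client's
// actual retry code: the delay doubles per attempt and is capped, so retries
// can never be issued at the same instant as the failed request.
public class Backoff {

  static final long BASE_DELAY_MS = 500;   // hypothetical initial delay
  static final long MAX_DELAY_MS = 60_000; // hypothetical cap

  // Delay before retry number `attempt` (0-based).
  public static long delayMillis(int attempt) {
    long delay = BASE_DELAY_MS << Math.min(attempt, 30); // clamp shift to avoid overflow
    return Math.min(delay, MAX_DELAY_MS);
  }

  public static void main(String[] args) {
    // A retry loop would sleep between attempts instead of retrying at once:
    for (int attempt = 0; attempt < 4; attempt++) {
      System.out.println("attempt " + attempt + " waits " + delayMillis(attempt) + " ms");
      // Thread.sleep(delayMillis(attempt)); // omitted to keep the demo fast
    }
  }
}
```

Production implementations usually add random jitter on top of this schedule so that many clients do not retry in lockstep.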

Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4270

2017-06-29 Thread Apache Jenkins Server
See 


--
[...truncated 1.21 MB...]
2017-06-30T02:39:03.532 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/jamon/jamon-runtime/2.3.1/jamon-runtime-2.3.1.pom
2017-06-30T02:39:03.702 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/jamon/jamon-runtime/2.3.1/jamon-runtime-2.3.1.pom
 (2 KB at 10.5 KB/sec)
2017-06-30T02:39:03.704 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/jamon/jamon-project/1.0.2/jamon-project-1.0.2.pom
2017-06-30T02:39:03.751 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/jamon/jamon-project/1.0.2/jamon-project-1.0.2.pom
 (9 KB at 190.1 KB/sec)
2017-06-30T02:39:03.766 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-archives/2.6.0/hadoop-archives-2.6.0.pom
2017-06-30T02:39:03.794 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-archives/2.6.0/hadoop-archives-2.6.0.pom
 (5 KB at 150.5 KB/sec)
[JENKINS] Archiving disabled
2017-06-30T02:39:04.417 [INFO]  
   
2017-06-30T02:39:04.417 [INFO] 

2017-06-30T02:39:04.417 [INFO] Skipping Apache Beam :: Parent
2017-06-30T02:39:04.417 [INFO] This project has been banned from the build due 
to previous failures.
2017-06-30T02:39:04.417 [INFO] 

[JENKINS] Archiving disabled
[JENKINS] Archiving disabled
2017-06-30T02:39:23.531 [INFO] 

2017-06-30T02:39:23.531 [INFO] Reactor Summary:
2017-06-30T02:39:23.531 [INFO] 
2017-06-30T02:39:23.531 [INFO] Apache Beam :: Parent .. SUCCESS [ 40.675 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: Build Tools . SUCCESS [  9.511 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs  SUCCESS [  4.632 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Common .. SUCCESS [  2.090 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Common :: Runner API  SUCCESS [ 15.565 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Common :: Fn API  SUCCESS [ 15.963 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java  SUCCESS [  1.846 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: Core  SUCCESS [02:21 min]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: Runners . SUCCESS [  1.603 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: Runners :: Core Construction Java ... SUCCESS [ 26.443 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: Runners :: Core Java  SUCCESS [ 50.818 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: Runners :: Direct Java .. SUCCESS [03:47 min]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: IO .. SUCCESS [  3.116 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: IO :: AMQP .. SUCCESS [ 30.423 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: IO :: Common  SUCCESS [  5.978 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: IO :: Cassandra . SUCCESS [ 35.377 s]
2017-06-30T02:39:23.531 [INFO] Apache Beam :: SDKs :: Java :: IO :: Elasticsearch . SUCCESS [ 45.331 s]
2017-06-30T02:39:23.531 

[jira] [Commented] (BEAM-2529) SpannerWriteIT failing in postcommit

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069357#comment-16069357
 ] 

ASF GitHub Bot commented on BEAM-2529:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3476

[BEAM-2529] Fix dependency scope error in GCP IO module

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam SpannerWriteIT

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3476


commit 598ba4459078808ebde48eef16e28bf5d7683d68
Author: Kenneth Knowles 
Date:   2017-06-30T02:10:32Z

Fix dependency scope error in GCP IO module




> SpannerWriteIT failing in postcommit
> 
>
> Key: BEAM-2529
> URL: https://issues.apache.org/jira/browse/BEAM-2529
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
>
> There are some NPEs and some other issues after the upgrade of commons-lang3 
> in 
> https://github.com/apache/beam/commit/6681472a2aa277ba83fd9e2ffec5a57c46d5820c#diff-f74bb28a007fa6d75d065f14a7c76b38





[GitHub] beam pull request #3476: [BEAM-2529] Fix dependency scope error in GCP IO mo...

2017-06-29 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3476

[BEAM-2529] Fix dependency scope error in GCP IO module

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam SpannerWriteIT

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3476


commit 598ba4459078808ebde48eef16e28bf5d7683d68
Author: Kenneth Knowles 
Date:   2017-06-30T02:10:32Z

Fix dependency scope error in GCP IO module






[jira] [Commented] (BEAM-2193) JOIN: inner join, left join, right join, full outer join

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069336#comment-16069336
 ] 

ASF GitHub Bot commented on BEAM-2193:
--

Github user xumingming closed the pull request at:

https://github.com/apache/beam/pull/3277


> JOIN: inner join, left join, right join, full outer join
> 
>
> Key: BEAM-2193
> URL: https://issues.apache.org/jira/browse/BEAM-2193
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>  Labels: dsl_sql_merge
>






[GitHub] beam pull request #3277: [BEAM-2193] implement join

2017-06-29 Thread xumingming
Github user xumingming closed the pull request at:

https://github.com/apache/beam/pull/3277




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4268

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4267

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-06-29 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069316#comment-16069316
 ] 

Solomon Duskis commented on BEAM-2545:
--

I would suggest upgrading to the 1.0.0-pre1 release.   We did a complete 
overhaul of BulkMutation between 0.9.7.1 and 1.0.0-pre1.  We didn't see "stale 
requests" in our tests of 0.9.7.1, but we saw some stuckness under heavy load.  
1.0.0-pre1 didn't exhibit any of the problems we saw in earlier versions.

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: 2.1.0
>
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> Error mutating row key00175 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00175"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00176 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00176"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00177 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00177"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00178 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00178"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00179 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00179"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00180 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00180"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00181 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00181"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00182 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00182"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00183 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00183"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00184 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00184"
> }
> ]: UNKNOWN: Stale requests.
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
> Stacktrace
> java.lang.RuntimeException: 
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> Error mutating row key00175 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00175"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00176 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00176"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00177 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00177"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00178 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00178"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00179 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00179"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00180 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00180"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00181 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00181"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00182 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00182"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00183 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00183"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00184 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00184"
> }
> ]: UNKNOWN: Stale requests.
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
>   at 
> 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2518

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2517

2017-06-29 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4266

2017-06-29 Thread Apache Jenkins Server
See 


--
[...truncated 1.10 MB...]
at 
org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:140)
at 
org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:73)
at 
org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:81)
at 
org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:49)
at 
org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:140)
at 
org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:132)
at 
org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:103)
at 
org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.doExecute(TransportDeleteIndexAction.java:76)
at 
org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.doExecute(TransportDeleteIndexAction.java:49)
at 
org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:182)
at 
org.elasticsearch.action.ingest.IngestActionFilter.apply(IngestActionFilter.java:82)
at 
org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:180)
at 
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:153)
at 
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:87)
at 
org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:75)
at 
org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:64)
at 
org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
at 
org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:392)
at 
org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1215)
at 
org.elasticsearch.client.support.AbstractClient$IndicesAdmin.delete(AbstractClient.java:1357)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HIFIOWithElasticTest$ElasticEmbeddedServer.shutdown(HIFIOWithElasticTest.java:250)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HIFIOWithElasticTest.shutdownServer(HIFIOWithElasticTest.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
at 
org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:157)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
at 

[GitHub] beam pull request #3462: Add a Combine Test for Sliding Windows without Cont...

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3462


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Add a Combine Test for Sliding Windows without Context

2017-06-29 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 52cea71ed -> 7fc73d790


Add a Combine Test for Sliding Windows without Context


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/6ade8426
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/6ade8426
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/6ade8426

Branch: refs/heads/master
Commit: 6ade8426edc2ace1a9bec8f9501d8dad17e91365
Parents: 1e16aa2
Author: Thomas Groh 
Authored: Wed Jun 28 12:51:31 2017 -0700
Committer: Thomas Groh 
Committed: Wed Jun 28 12:52:37 2017 -0700

--
 .../apache/beam/sdk/transforms/CombineTest.java | 63 
 1 file changed, 63 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/6ade8426/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java
index e2469ab..b24d82d 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java
@@ -29,11 +29,13 @@ import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertThat;
 
 import com.google.common.base.MoreObjects;
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableSet;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
 import java.io.Serializable;
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.HashSet;
@@ -325,6 +327,67 @@ public class CombineTest implements Serializable {
 
   @Test
   @Category(ValidatesRunner.class)
+  public void testSlidingWindowsCombine() {
+    PCollection<String> input =
+        pipeline
+            .apply(
+                Create.timestamped(
+                    TimestampedValue.of("a", new Instant(1L)),
+                    TimestampedValue.of("b", new Instant(2L)),
+                    TimestampedValue.of("c", new Instant(3L))))
+            .apply(
+                Window.into(
+                    SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1L))));
+    PCollection<List<String>> combined =
+        input.apply(
+            Combine.globally(
+                    new CombineFn<String, List<String>, List<String>>() {
+                      @Override
+                      public List<String> createAccumulator() {
+                        return new ArrayList<>();
+                      }
+
+                      @Override
+                      public List<String> addInput(List<String> accumulator, String input) {
+                        accumulator.add(input);
+                        return accumulator;
+                      }
+
+                      @Override
+                      public List<String> mergeAccumulators(Iterable<List<String>> accumulators) {
+                        // Mutate all of the accumulators. Instances should be used in only one
+                        // place, and not reused after merging.
+                        List<String> cur = createAccumulator();
+                        for (List<String> accumulator : accumulators) {
+                          accumulator.addAll(cur);
+                          cur = accumulator;
+                        }
+                        return cur;
+                      }
+
+                      @Override
+                      public List<String> extractOutput(List<String> accumulator) {
+                        List<String> result = new ArrayList<>(accumulator);
+                        Collections.sort(result);
+                        return result;
+                      }
+                    })
+                .withoutDefaults());
+
+    PAssert.that(combined)
+        .containsInAnyOrder(
+            ImmutableList.of("a"),
+            ImmutableList.of("a", "b"),
+            ImmutableList.of("a", "b", "c"),
+            ImmutableList.of("b", "c"),
+            ImmutableList.of("c"));
+
+    pipeline.run();
+  }
+
+  @Test
+  @Category(ValidatesRunner.class)
   public void testSlidingWindowsCombineWithContext() {
 // [a: 1, 1], [a: 4; b: 1], [b: 13]
    PCollection<KV<String, Integer>> perKeyInput =


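The window contents asserted in the test above follow from how sliding windows are assigned: with SlidingWindows.of(3ms).every(1ms), an element at timestamp t falls into every 3ms window whose half-open range [start, start + 3) contains t. A minimal plain-Java sketch of that assignment rule (the helper `windowStarts` is hypothetical, not a Beam API):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowsDemo {
  // Hypothetical helper (not a Beam API): the start timestamps of the windows
  // containing an element at timestamp t, for windows of `size` millis that
  // open every `period` millis. Each window covers [start, start + size).
  static List<Long> windowStarts(long t, long size, long period) {
    List<Long> starts = new ArrayList<>();
    long lastStart = t - Math.floorMod(t, period); // latest window start <= t
    for (long s = lastStart; s > t - size; s -= period) {
      starts.add(s);
    }
    return starts;
  }

  public static void main(String[] args) {
    // "a" at t=1 with size=3, period=1 falls in windows starting at 1, 0, -1,
    // i.e. [1,4), [0,3), [-1,2). Repeating for "b"(t=2) and "c"(t=3) yields the
    // five window contents the PAssert checks: {a}, {a,b}, {a,b,c}, {b,c}, {c}.
    System.out.println(windowStarts(1L, 3L, 1L)); // [1, 0, -1]
  }
}
```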

[2/2] beam git commit: This closes #3462: Add a Combine Test for Sliding Windows without Context

2017-06-29 Thread kenn
This closes #3462: Add a Combine Test for Sliding Windows without Context


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7fc73d79
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7fc73d79
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7fc73d79

Branch: refs/heads/master
Commit: 7fc73d790803760c1a0684106be0c38126266598
Parents: 52cea71 6ade842
Author: Kenneth Knowles 
Authored: Thu Jun 29 17:19:43 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jun 29 17:19:43 2017 -0700

--
 .../apache/beam/sdk/transforms/CombineTest.java | 63 
 1 file changed, 63 insertions(+)
--




[jira] [Commented] (BEAM-2537) Can't run IO ITs - conflicting project vs projectId pipeline options

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069243#comment-16069243
 ] 

ASF GitHub Bot commented on BEAM-2537:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3465


> Can't run IO ITs - conflicting project vs projectId pipeline options
> 
>
> Key: BEAM-2537
> URL: https://issues.apache.org/jira/browse/BEAM-2537
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions, sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> Today, the GCP IO ITs use a mix of --project and --projectId pipeline options.
> e.g., GcpOptions.java uses --project:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L81
> and, e.g., the Datastore IO ITs use that parameter:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java#L60
> however, BigtableTestOptions uses --projectId:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestOptions.java#L30
> as does the spanner IO IT:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java#L61
> These conflicts mean that you can't run the tests as a suite successfully if 
> you need to manually specify the project.
> This works with the integration tests currently checked in because they use 
> the default project value (apache-beam-testing) when running in CI. However, 
> that's not an option for running these tests as part of a suite when doing 
> local development.
> cc [~jasonkuster]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
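The conflict described in BEAM-2537 can be sketched without Beam: each IT reads its project from a differently named flag, so one command line cannot satisfy both. The parser and flag names below are a hypothetical illustration, not Beam's PipelineOptionsFactory:

```java
import java.util.HashMap;
import java.util.Map;

public class OptionConflictDemo {
  // Hypothetical flag parser (illustration only): turns "--key=value"
  // arguments into a lookup map, the way pipeline options are keyed by name.
  static Map<String, String> parse(String... args) {
    Map<String, String> opts = new HashMap<>();
    for (String arg : args) {
      String[] kv = arg.substring(2).split("=", 2); // strip leading "--"
      opts.put(kv[0], kv[1]);
    }
    return opts;
  }

  public static void main(String[] args) {
    // One command line for the whole suite:
    Map<String, String> opts = parse("--project=my-test-project");
    // ITs keyed on "project" (e.g. Datastore) see the value...
    System.out.println(opts.get("project"));   // my-test-project
    // ...while ITs keyed on "projectId" (e.g. Bigtable, Spanner) see nothing
    // and silently fall back to their default.
    System.out.println(opts.get("projectId")); // null
  }
}
```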


[2/3] beam git commit: Don't call .testingPipelineOptions() a second time

2017-06-29 Thread kenn
Don't call .testingPipelineOptions() a second time


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ab7f6f6d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ab7f6f6d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ab7f6f6d

Branch: refs/heads/master
Commit: ab7f6f6dbd3ad67c5da577bc9395cb09e35069d9
Parents: 0acfe70
Author: Stephen Sisk 
Authored: Thu Jun 29 13:29:22 2017 -0700
Committer: Stephen Sisk 
Committed: Thu Jun 29 13:29:22 2017 -0700

--
 .../java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java   | 2 +-
 .../java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java  | 2 +-
 .../java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java | 2 +-
 .../java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ab7f6f6d/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
index e47fd0f..91f0bae 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
@@ -42,7 +42,7 @@ public class BigtableReadIT {
 BigtableTestOptions options = TestPipeline.testingPipelineOptions()
 .as(BigtableTestOptions.class);
 
-    String project = TestPipeline.testingPipelineOptions().as(GcpOptions.class).getProject();
+    String project = options.as(GcpOptions.class).getProject();
 
     BigtableOptions.Builder bigtableOptionsBuilder = new BigtableOptions.Builder()
 .setProjectId(project)

http://git-wip-us.apache.org/repos/asf/beam/blob/ab7f6f6d/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
index 72ba836..010bcc4 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
@@ -79,7 +79,7 @@ public class BigtableWriteIT implements Serializable {
   public void setup() throws Exception {
 PipelineOptionsFactory.register(BigtableTestOptions.class);
     options = TestPipeline.testingPipelineOptions().as(BigtableTestOptions.class);
-    project = TestPipeline.testingPipelineOptions().as(GcpOptions.class).getProject();
+    project = options.as(GcpOptions.class).getProject();
 
 bigtableOptions =
 new Builder()

http://git-wip-us.apache.org/repos/asf/beam/blob/ab7f6f6d/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
index 32183f9..bfbda50 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
@@ -88,7 +88,7 @@ public class SpannerReadIT {
 PipelineOptionsFactory.register(SpannerTestPipelineOptions.class);
     options = TestPipeline.testingPipelineOptions().as(SpannerTestPipelineOptions.class);
 
-    project = TestPipeline.testingPipelineOptions().as(GcpOptions.class).getProject();
+    project = options.as(GcpOptions.class).getProject();
 
     spanner = SpannerOptions.newBuilder().setProjectId(project).build().getService();
 

http://git-wip-us.apache.org/repos/asf/beam/blob/ab7f6f6d/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
 

[1/3] beam git commit: GCP IO ITs now all use --project option

2017-06-29 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 99b5b0563 -> 52cea71ed


GCP IO ITs now all use --project option

Up until now, some IO ITs used --projectId and others used --project

This mixing meant that running all the tests in one test run was
impossible.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0acfe704
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0acfe704
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0acfe704

Branch: refs/heads/master
Commit: 0acfe70439b50184a69601ca4bb8cff9780fa4ef
Parents: 99b5b05
Author: Stephen Sisk 
Authored: Wed Jun 28 15:34:45 2017 -0700
Committer: Stephen Sisk 
Committed: Thu Jun 29 13:24:45 2017 -0700

--
 .../beam/sdk/io/gcp/bigtable/BigtableReadIT.java|  5 -
 .../sdk/io/gcp/bigtable/BigtableTestOptions.java|  5 -
 .../beam/sdk/io/gcp/bigtable/BigtableWriteIT.java   |  4 +++-
 .../beam/sdk/io/gcp/spanner/SpannerReadIT.java  | 16 
 .../beam/sdk/io/gcp/spanner/SpannerWriteIT.java | 15 +++
 5 files changed, 22 insertions(+), 23 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/0acfe704/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
index a064bd6..e47fd0f 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadIT.java
@@ -20,6 +20,7 @@ package org.apache.beam.sdk.io.gcp.bigtable;
 import com.google.bigtable.v2.Row;
 import com.google.cloud.bigtable.config.BigtableOptions;
 import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.TestPipeline;
@@ -41,8 +42,10 @@ public class BigtableReadIT {
 BigtableTestOptions options = TestPipeline.testingPipelineOptions()
 .as(BigtableTestOptions.class);
 
+    String project = TestPipeline.testingPipelineOptions().as(GcpOptions.class).getProject();
+
     BigtableOptions.Builder bigtableOptionsBuilder = new BigtableOptions.Builder()
-        .setProjectId(options.getProjectId())
+        .setProjectId(project)
         .setInstanceId(options.getInstanceId());
 
 final String tableId = "BigtableReadTest";

http://git-wip-us.apache.org/repos/asf/beam/blob/0acfe704/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestOptions.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestOptions.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestOptions.java
index 0ab7576..03cb697 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestOptions.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableTestOptions.java
@@ -25,11 +25,6 @@ import org.apache.beam.sdk.testing.TestPipelineOptions;
  * Properties needed when using Bigtable with the Beam SDK.
  */
 public interface BigtableTestOptions extends TestPipelineOptions {
-  @Description("Project ID for Bigtable")
-  @Default.String("apache-beam-testing")
-  String getProjectId();
-  void setProjectId(String value);
-
   @Description("Instance ID for Bigtable")
   @Default.String("beam-test")
   String getInstanceId();

http://git-wip-us.apache.org/repos/asf/beam/blob/0acfe704/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
index 1d168f1..72ba836 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableWriteIT.java
@@ -73,15 +73,17 @@ public class BigtableWriteIT implements Serializable {
   private static 

[GitHub] beam pull request #3465: [BEAM-2537] GCP IO ITs now all use --project option

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3465


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam git commit: This closes #3465: [BEAM-2537] GCP IO ITs now all use --project option

2017-06-29 Thread kenn
This closes #3465: [BEAM-2537] GCP IO ITs now all use --project option

  Don't call .testingPipelineOptions() a second time
  GCP IO ITs now all use --project option


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/52cea71e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/52cea71e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/52cea71e

Branch: refs/heads/master
Commit: 52cea71ed4094b8dc192449923b4c95b57b9ea5f
Parents: 99b5b05 ab7f6f6
Author: Kenneth Knowles 
Authored: Thu Jun 29 17:09:18 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jun 29 17:09:18 2017 -0700

--
 .../beam/sdk/io/gcp/bigtable/BigtableReadIT.java|  5 -
 .../sdk/io/gcp/bigtable/BigtableTestOptions.java|  5 -
 .../beam/sdk/io/gcp/bigtable/BigtableWriteIT.java   |  4 +++-
 .../beam/sdk/io/gcp/spanner/SpannerReadIT.java  | 16 
 .../beam/sdk/io/gcp/spanner/SpannerWriteIT.java | 15 +++
 5 files changed, 22 insertions(+), 23 deletions(-)
--
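The follow-up commit above replaces a second TestPipeline.testingPipelineOptions() call with options.as(GcpOptions.class). That works because a PipelineOptions object is one shared bag of named properties exposed through multiple typed interface views. A rough dynamic-proxy sketch of that idea, with illustrative names only (this is not Beam's actual implementation):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class AsViewDemo {
  public interface BaseOptions { String getProject(); }
  public interface ExtraOptions { String getInstanceId(); }

  // Hypothetical as(): expose the same property map through another typed view,
  // mapping each getter name ("getFoo") to its property key ("foo").
  public static <T> T as(Map<String, Object> props, Class<T> iface) {
    InvocationHandler h = (proxy, method, args) ->
        props.get(Character.toLowerCase(method.getName().charAt(3))
            + method.getName().substring(4));
    return iface.cast(Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, h));
  }

  public static void main(String[] args) {
    Map<String, Object> props = new HashMap<>();
    props.put("project", "my-project");
    props.put("instanceId", "beam-test");
    // Two typed views over the same underlying state -- no need to build a
    // second options object just to read a differently-typed field.
    System.out.println(as(props, BaseOptions.class).getProject());     // my-project
    System.out.println(as(props, ExtraOptions.class).getInstanceId()); // beam-test
  }
}
```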




Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark #2516

2017-06-29 Thread Apache Jenkins Server
See 


--
[...truncated 448.14 KB...]
2017-06-30T00:06:21.131 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.DefaultTriggerStateMachineTest
2017-06-30T00:06:21.131 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterAllStateMachineTest
2017-06-30T00:06:21.133 [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterAllStateMachineTest
2017-06-30T00:06:21.133 [INFO] Running 
org.apache.beam.runners.core.triggers.ReshuffleTriggerStateMachineTest
2017-06-30T00:06:21.135 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.ReshuffleTriggerStateMachineTest
2017-06-30T00:06:21.136 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterProcessingTimeStateMachineTest
2017-06-30T00:06:21.138 [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterProcessingTimeStateMachineTest
2017-06-30T00:06:21.138 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterPaneStateMachineTest
2017-06-30T00:06:21.140 [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.AfterPaneStateMachineTest
2017-06-30T00:06:21.141 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterSynchronizedProcessingTimeStateMachineTest
2017-06-30T00:06:21.143 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterSynchronizedProcessingTimeStateMachineTest
2017-06-30T00:06:21.522 [INFO] 
2017-06-30T00:06:21.522 [INFO] Results:
2017-06-30T00:06:21.522 [INFO] 
2017-06-30T00:06:21.522 [WARNING] Tests run: 223, Failures: 0, Errors: 0, 
Skipped: 1
2017-06-30T00:06:21.522 [INFO] 
[JENKINS] Recording test results
2017-06-30T00:06:23.969 [INFO] 
2017-06-30T00:06:23.969 [INFO] --- 
build-helper-maven-plugin:3.0.0:regex-properties (render-artifact-id) @ 
beam-runners-core-java ---
2017-06-30T00:06:24.020 [INFO] 
2017-06-30T00:06:24.020 [INFO] --- jacoco-maven-plugin:0.7.8:report (report) @ 
beam-runners-core-java ---
2017-06-30T00:06:24.022 [INFO] Loading execution data file 

2017-06-30T00:06:24.054 [INFO] Analyzed bundle 'Apache Beam :: Runners :: Core 
Java' with 193 classes
2017-06-30T00:06:24.580 [INFO] 
2017-06-30T00:06:24.580 [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ 
beam-runners-core-java ---
2017-06-30T00:06:24.603 [INFO] Building jar: 

2017-06-30T00:06:24.678 [INFO] 
2017-06-30T00:06:24.678 [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ beam-runners-core-java ---
2017-06-30T00:06:24.819 [INFO] 
2017-06-30T00:06:24.819 [INFO] --- maven-jar-plugin:3.0.2:test-jar 
(default-test-jar) @ beam-runners-core-java ---
2017-06-30T00:06:24.832 [INFO] Building jar: 

2017-06-30T00:06:24.899 [INFO] 
2017-06-30T00:06:24.899 [INFO] --- maven-shade-plugin:3.0.0:shade 
(bundle-and-repackage) @ beam-runners-core-java ---
2017-06-30T00:06:24.902 [INFO] Excluding 
org.apache.beam:beam-sdks-java-core:jar:2.1.0-SNAPSHOT from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-core:jar:2.8.9 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-databind:jar:2.8.9 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.14 from 
the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding net.bytebuddy:byte-buddy:jar:1.6.8 
from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding org.apache.avro:avro:jar:1.8.2 from 
the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
org.codehaus.jackson:jackson-core-asl:jar:1.9.13 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
com.thoughtworks.paranamer:paranamer:jar:2.7 from the shaded jar.
2017-06-30T00:06:24.902 [INFO] Excluding org.tukaani:xz:jar:1.5 from the shaded 
jar.
2017-06-30T00:06:24.902 [INFO] Excluding 
org.xerial.snappy:snappy-java:jar:1.1.4-M3 

Build failed in Jenkins: beam_PerformanceTests_Python #43

2017-06-29 Thread Apache Jenkins Server
See 


Changes:

[jbonofre] Define the projectId in the SpannerIO Read Test (utest, not itest)

[lcwik] [BEAM-2373] Upgrade commons-compress dependency version to 1.14

[altay] Select SDK distribution based on the selected SDK name

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam7 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 99b5b0563f85e6e2e22bb7b9148b4058955aeae1 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99b5b0563f85e6e2e22bb7b9148b4058955aeae1
 > git rev-list bf5aa1bca4861d6867978a9508dc5c952bd7fc2b # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson3720446743593330834.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson3699940936324549660.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson1420830304241512074.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.1.1 
in /home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in 
/usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt 
(line 16))
Requirement already satisfied (use --upgrade to upgrade): 
colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson4430559336406215381.sh
+ pip install --user -e 'sdks/python/[gcp,test]'
Obtaining 
file://
  Running setup.py 
(path:
 egg_info for package from 
file://

:66:
 UserWarning: You are using version 1.5.4 of pip. However, version 7.0.0 is 
recommended.
  _PIP_VERSION, REQUIRED_PIP_VERSION
no previously-included directories found matching 'doc/.build'

Installed 

[jira] [Updated] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-06-29 Thread Stephen Sisk (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Sisk updated BEAM-2545:
---
Fix Version/s: 2.1.0

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: 2.1.0
>
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> Error mutating row key00175 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00175"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00176 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00176"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00177 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00177"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00178 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00178"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00179 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00179"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00180 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00180"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00181 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00181"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00182 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00182"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00183 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00183"
> }
> ]: UNKNOWN: Stale requests.
> Error mutating row key00184 with mutations [set_cell {
>   family_name: "cf"
>   value: "value00184"
> }
> ]: UNKNOWN: Stale requests.
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
> Stacktrace
> java.lang.RuntimeException: 
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> (same ten "Error mutating row ... ]: UNKNOWN: Stale requests." entries as 
> listed above)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
>   at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:133)
>   at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:89)
>   at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:54)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>   

[jira] [Commented] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-06-29 Thread Stephen Sisk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069203#comment-16069203
 ] 

Stephen Sisk commented on BEAM-2545:


cc [~sduskis] I realized this could potentially be related to the recent 
upgrade to the 0.9.71 client in Beam.

> bigtable e2e tests failing -  UNKNOWN: Stale requests/Error mutating row
> 
>
> Key: BEAM-2545
> URL: https://issues.apache.org/jira/browse/BEAM-2545
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> The BigtableWriteIT is taking a long time (~10min) and throwing errors. 
> Example test run: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> (same ten "Error mutating row ... ]: UNKNOWN: Stale requests." entries as 
> listed in BEAM-2545 above)
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>  at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
> Stacktrace
> java.lang.RuntimeException: 
> (96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing 
> to Bigtable. First 10 errors: 
> (same ten "Error mutating row ... ]: UNKNOWN: Stale requests." entries as 
> listed above)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
>   at 
> org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
>   at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:133)
>   at 
> org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:89)
>   at 
> 

[jira] [Created] (BEAM-2545) bigtable e2e tests failing - UNKNOWN: Stale requests/Error mutating row

2017-06-29 Thread Stephen Sisk (JIRA)
Stephen Sisk created BEAM-2545:
--

 Summary: bigtable e2e tests failing -  UNKNOWN: Stale 
requests/Error mutating row
 Key: BEAM-2545
 URL: https://issues.apache.org/jira/browse/BEAM-2545
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
Reporter: Stephen Sisk
Assignee: Stephen Sisk


The BigtableWriteIT is taking a long time (~10min) and throwing errors. 

Example test run: 
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4264/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.bigtable/BigtableWriteIT/testE2EBigtableWrite/

(96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing to 
Bigtable. First 10 errors: 
Error mutating row key00175 with mutations [set_cell {
  family_name: "cf"
  value: "value00175"
}
]: UNKNOWN: Stale requests.
Error mutating row key00176 with mutations [set_cell {
  family_name: "cf"
  value: "value00176"
}
]: UNKNOWN: Stale requests.
Error mutating row key00177 with mutations [set_cell {
  family_name: "cf"
  value: "value00177"
}
]: UNKNOWN: Stale requests.
Error mutating row key00178 with mutations [set_cell {
  family_name: "cf"
  value: "value00178"
}
]: UNKNOWN: Stale requests.
Error mutating row key00179 with mutations [set_cell {
  family_name: "cf"
  value: "value00179"
}
]: UNKNOWN: Stale requests.
Error mutating row key00180 with mutations [set_cell {
  family_name: "cf"
  value: "value00180"
}
]: UNKNOWN: Stale requests.
Error mutating row key00181 with mutations [set_cell {
  family_name: "cf"
  value: "value00181"
}
]: UNKNOWN: Stale requests.
Error mutating row key00182 with mutations [set_cell {
  family_name: "cf"
  value: "value00182"
}
]: UNKNOWN: Stale requests.
Error mutating row key00183 with mutations [set_cell {
  family_name: "cf"
  value: "value00183"
}
]: UNKNOWN: Stale requests.
Error mutating row key00184 with mutations [set_cell {
  family_name: "cf"
  value: "value00184"
}
]: UNKNOWN: Stale requests.
 at 
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
 at 
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)
Stacktrace

java.lang.RuntimeException: 
(96dc5c8efaf8fa26): java.io.IOException: At least 25 errors occurred writing to 
Bigtable. First 10 errors: 
(same ten "Error mutating row ... ]: UNKNOWN: Stale requests." entries as 
listed in the description above)
at 
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.checkForFailures(BigtableIO.java:655)
at 
org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.finishBundle(BigtableIO.java:607)

at 
org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:133)
at 
org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:89)
at 
org.apache.beam.runners.dataflow.TestDataflowRunner.run(TestDataflowRunner.java:54)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
at 
org.apache.beam.sdk.io.gcp.bigtable.BigtableWriteIT.testE2EBigtableWrite(BigtableWriteIT.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

[1/2] beam git commit: [BEAM-2193] Implement FULL, INNER, and OUTER JOIN: - FULL and INNER supported on all variations of unbounded/bounded joins. - OUTER JOIN supported when outer side is unbounded.

2017-06-29 Thread takidau
Repository: beam
Updated Branches:
  refs/heads/DSL_SQL ab4b11886 -> 2096da25e


[BEAM-2193] Implement FULL, INNER, and OUTER JOIN:
- FULL and INNER supported on all variations of unbounded/bounded joins.
- OUTER JOIN supported when outer side is unbounded.
- Unbounded/bounded joins implemented via side inputs.
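The unbounded-vs-bounded strategy above can be illustrated with a minimal
plain-Python sketch (not the actual Beam Java implementation in this commit):
the bounded side is materialized as an in-memory lookup map, which is
essentially what broadcasting it as a side input amounts to, and each element
of the unbounded stream is joined against that map. All names here are
hypothetical.

```python
def side_input_join(unbounded_stream, bounded_table, join_type="INNER"):
    """Join streaming (key, value) pairs against a bounded keyed side.

    join_type: "INNER" drops unmatched stream elements; "OUTER" keeps them
    with a None right-hand value (the unbounded stream is the outer side).
    """
    # The bounded side is small enough to hold in memory, analogous to
    # distributing it to every worker as a side input.
    lookup = dict(bounded_table)
    for key, left in unbounded_stream:
        if key in lookup:
            yield key, (left, lookup[key])
        elif join_type == "OUTER":
            yield key, (left, None)

orders = [("u1", "order-1"), ("u2", "order-2"), ("u3", "order-3")]  # unbounded
users = [("u1", "Alice"), ("u2", "Bob")]                            # bounded

inner = list(side_input_join(iter(orders), users, "INNER"))
outer = list(side_input_join(iter(orders), users, "OUTER"))
```

With this shape, FULL/INNER semantics fall out of whether unmatched stream
elements are emitted, which mirrors why OUTER JOIN is only supported when the
outer side is the unbounded one.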


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/928cec59
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/928cec59
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/928cec59

Branch: refs/heads/DSL_SQL
Commit: 928cec597175c363d444331b35ac8793297a242b
Parents: ab4b118
Author: James Xu 
Authored: Mon May 29 11:11:34 2017 +0800
Committer: Tyler Akidau 
Committed: Thu Jun 29 16:32:23 2017 -0700

--
 dsls/pom.xml|   2 +-
 dsls/sql/pom.xml|  16 +-
 .../beam/dsls/sql/planner/BeamRuleSets.java |   6 +-
 .../beam/dsls/sql/rel/BeamAggregationRel.java   |  19 +-
 .../apache/beam/dsls/sql/rel/BeamJoinRel.java   | 305 +++
 .../apache/beam/dsls/sql/rule/BeamJoinRule.java |  53 
 .../beam/dsls/sql/schema/BeamSqlRecordType.java |   2 +-
 .../apache/beam/dsls/sql/schema/BeamSqlRow.java |   2 +-
 .../beam/dsls/sql/schema/BeamSqlRowCoder.java   |   3 -
 .../dsls/sql/transform/BeamJoinTransforms.java  | 166 ++
 .../org/apache/beam/dsls/sql/TestUtils.java | 125 
 .../dsls/sql/planner/MockedBeamSqlTable.java|   5 +-
 .../beam/dsls/sql/planner/MockedTable.java  |  33 ++
 .../dsls/sql/planner/MockedUnboundedTable.java  | 120 
 .../rel/BeamJoinRelBoundedVsBoundedTest.java| 195 
 .../rel/BeamJoinRelUnboundedVsBoundedTest.java  | 242 +++
 .../BeamJoinRelUnboundedVsUnboundedTest.java| 219 +
 .../dsls/sql/schema/BeamSqlRowCoderTest.java|   2 +-
 18 files changed, 1486 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/928cec59/dsls/pom.xml
--
diff --git a/dsls/pom.xml b/dsls/pom.xml
index d932698..a518d03 100644
--- a/dsls/pom.xml
+++ b/dsls/pom.xml
@@ -66,7 +66,7 @@
 
   
 
-
+
 
   
 org.apache.maven.plugins

http://git-wip-us.apache.org/repos/asf/beam/blob/928cec59/dsls/sql/pom.xml
--
diff --git a/dsls/sql/pom.xml b/dsls/sql/pom.xml
index a2279d5..54f590e 100644
--- a/dsls/sql/pom.xml
+++ b/dsls/sql/pom.xml
@@ -157,6 +157,11 @@
   provided
 
 
+  org.apache.kafka
+  kafka-clients
+  0.9.0.1
+
+
   com.google.guava
   guava
 
@@ -193,21 +198,18 @@
   joda-time
 
 
-  org.apache.kafka
-  kafka-clients
-  0.10.1.0
-  provided
-
-
   com.google.auto.value
   auto-value
   provided
 
-
 
   org.apache.beam
   beam-runners-direct-java
   test
 
+
+  org.apache.beam
+  beam-sdks-java-extensions-join-library
+
   
 

http://git-wip-us.apache.org/repos/asf/beam/blob/928cec59/dsls/sql/src/main/java/org/apache/beam/dsls/sql/planner/BeamRuleSets.java
--
diff --git 
a/dsls/sql/src/main/java/org/apache/beam/dsls/sql/planner/BeamRuleSets.java 
b/dsls/sql/src/main/java/org/apache/beam/dsls/sql/planner/BeamRuleSets.java
index 6c73558..552ff8f 100644
--- a/dsls/sql/src/main/java/org/apache/beam/dsls/sql/planner/BeamRuleSets.java
+++ b/dsls/sql/src/main/java/org/apache/beam/dsls/sql/planner/BeamRuleSets.java
@@ -19,15 +19,14 @@ package org.apache.beam.dsls.sql.planner;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableSet;
-
 import java.util.Iterator;
-
 import org.apache.beam.dsls.sql.rel.BeamRelNode;
 import org.apache.beam.dsls.sql.rule.BeamAggregationRule;
 import org.apache.beam.dsls.sql.rule.BeamFilterRule;
 import org.apache.beam.dsls.sql.rule.BeamIOSinkRule;
 import org.apache.beam.dsls.sql.rule.BeamIOSourceRule;
 import org.apache.beam.dsls.sql.rule.BeamIntersectRule;
+import org.apache.beam.dsls.sql.rule.BeamJoinRule;
 import org.apache.beam.dsls.sql.rule.BeamMinusRule;
 import org.apache.beam.dsls.sql.rule.BeamProjectRule;
 import org.apache.beam.dsls.sql.rule.BeamSortRule;
@@ -47,7 +46,8 @@ public class BeamRuleSets {
   .builder().add(BeamIOSourceRule.INSTANCE, 
BeamProjectRule.INSTANCE,
   BeamFilterRule.INSTANCE, BeamIOSinkRule.INSTANCE,
   BeamAggregationRule.INSTANCE, BeamSortRule.INSTANCE, 
BeamValuesRule.INSTANCE,
-  BeamIntersectRule.INSTANCE, BeamMinusRule.INSTANCE, 
BeamUnionRule.INSTANCE)
+  

[2/2] beam git commit: [BEAM-2193] This closes #3277

2017-06-29 Thread takidau
[BEAM-2193] This closes #3277


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2096da25
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2096da25
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2096da25

Branch: refs/heads/DSL_SQL
Commit: 2096da25e85d97ab52850453c2130ff706d7bcdf
Parents: ab4b118 928cec5
Author: Tyler Akidau 
Authored: Thu Jun 29 16:34:45 2017 -0700
Committer: Tyler Akidau 
Committed: Thu Jun 29 16:34:45 2017 -0700

--
 dsls/pom.xml|   2 +-
 dsls/sql/pom.xml|  16 +-
 .../beam/dsls/sql/planner/BeamRuleSets.java |   6 +-
 .../beam/dsls/sql/rel/BeamAggregationRel.java   |  19 +-
 .../apache/beam/dsls/sql/rel/BeamJoinRel.java   | 305 +++
 .../apache/beam/dsls/sql/rule/BeamJoinRule.java |  53 
 .../beam/dsls/sql/schema/BeamSqlRecordType.java |   2 +-
 .../apache/beam/dsls/sql/schema/BeamSqlRow.java |   2 +-
 .../beam/dsls/sql/schema/BeamSqlRowCoder.java   |   3 -
 .../dsls/sql/transform/BeamJoinTransforms.java  | 166 ++
 .../org/apache/beam/dsls/sql/TestUtils.java | 125 
 .../dsls/sql/planner/MockedBeamSqlTable.java|   5 +-
 .../beam/dsls/sql/planner/MockedTable.java  |  33 ++
 .../dsls/sql/planner/MockedUnboundedTable.java  | 120 
 .../rel/BeamJoinRelBoundedVsBoundedTest.java| 195 
 .../rel/BeamJoinRelUnboundedVsBoundedTest.java  | 242 +++
 .../BeamJoinRelUnboundedVsUnboundedTest.java| 219 +
 .../dsls/sql/schema/BeamSqlRowCoderTest.java|   2 +-
 18 files changed, 1486 insertions(+), 29 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4265

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2490) ReadFromText function is not taking all data with glob operator (*)

2017-06-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069092#comment-16069092
 ] 

Guillermo Rodríguez Cano commented on BEAM-2490:


Awesome. Thanks [~chamikara]!

I guess this issue is then blocked until the gzip performance issue in Python 
is resolved. I'll have a look at the use of zlib over the weekend and report 
the findings in the new issue :)

> ReadFromText function is not taking all data with glob operator (*) 
> 
>
> Key: BEAM-2490
> URL: https://issues.apache.org/jira/browse/BEAM-2490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.0.0
> Environment: Usage with Google Cloud Platform: Dataflow runner
>Reporter: Olivier NGUYEN QUOC
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> I run a very simple pipeline:
> * Read my files from Google Cloud Storage
> * Split with '\n' char
> * Write in on a Google Cloud Storage
> I have 8 files that match with the pattern:
> * my_files_2016090116_20160902_060051_xx.csv.gz (229.25 MB)
> * my_files_2016090117_20160902_060051_xx.csv.gz (184.1 MB)
> * my_files_2016090118_20160902_060051_xx.csv.gz (171.73 MB)
> * my_files_2016090119_20160902_060051_xx.csv.gz (151.34 MB)
> * my_files_2016090120_20160902_060051_xx.csv.gz (129.69 MB)
> * my_files_2016090121_20160902_060051_xx.csv.gz (151.7 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (346.46 MB)
> * my_files_2016090122_20160902_060051_xx.csv.gz (222.57 MB)
> This code should take them all:
> {code:python}
> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
> {code}
> It runs well but there is only a 288.62 MB file in output of this pipeline 
> (instead of a 1.5 GB file).
> The whole pipeline code:
> {code:python}
> data = (p | 'ReadMyFiles' >> beam.io.ReadFromText(
>   "gs://_folder1/my_files_20160901*.csv.gz",
>   skip_header_lines=1,
>   compression_type=beam.io.filesystem.CompressionTypes.GZIP
>   )
>| 'SplitLines' >> beam.FlatMap(lambda x: x.split('\n'))
> )
> output = (
>   data| "Write" >> beam.io.WriteToText('gs://XXX_folder2/test.csv', 
> num_shards=1)
> )
> {code}
> Dataflow indicates that the estimated size of the output after the 
> ReadFromText step is only 602.29 MB, which corresponds neither to any single 
> input file's size nor to the combined size of the files matching the pattern.
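The glob pattern itself can be ruled out with a quick local check. This is a
hedged sketch: `fnmatch` only approximates GCS glob matching, and the eighth
file name is assumed to be hour 23 since the report lists hour 22 twice,
presumably a typo. All eight names match, which points the data loss at the
gzip read path rather than at the pattern.

```python
import fnmatch

# File names as listed in the report (the "xx" placeholders kept as-is;
# the eighth entry assumed to be hour 23 rather than a second hour 22).
filenames = [
    "my_files_2016090116_20160902_060051_xx.csv.gz",
    "my_files_2016090117_20160902_060051_xx.csv.gz",
    "my_files_2016090118_20160902_060051_xx.csv.gz",
    "my_files_2016090119_20160902_060051_xx.csv.gz",
    "my_files_2016090120_20160902_060051_xx.csv.gz",
    "my_files_2016090121_20160902_060051_xx.csv.gz",
    "my_files_2016090122_20160902_060051_xx.csv.gz",
    "my_files_2016090123_20160902_060051_xx.csv.gz",
]
pattern = "my_files_20160901*.csv.gz"
matched = [f for f in filenames if fnmatch.fnmatch(f, pattern)]
assert len(matched) == 8  # the glob selects every file; the loss is elsewhere
```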



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4264

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2017-06-29 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2535:
--
Description: 
Today, we have insufficient control over the event time timestamp of elements 
output from a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time 
of processing.

But for both of these, we may want to reserve the right to output a particular 
time, aka set a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
sure output is not droppable, but does not fully explain window expiration and 
late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, 
timers should be viewed as another channel of input, with a watermark, and 
items on that channel _all need event time timestamps even if they are 
delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be 
separated (with nice defaults) from the specification of the event time of 
resulting outputs. These timestamps will determine a side channel with a new 
"timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so 
that event time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of 
the input watermark and the timer watermark. In this way, whenever a timer is 
set, the window is not going to be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is 
expired; this may be as simple as exhausting the timer channel as soon as the 
input watermark indicates expiration of a window

This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems 
reasonable to use timers as an implementation detail (e.g. in runners-core 
utilities) without wanting any of this additional machinery. For example, if 
there is no possibility of output from the timer callback.

  was:
Today, we have insufficient control over the event time timestamp of elements 
output from a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time 
of processing.

But for both of these, we may want to reserve the right to output a particular 
time, aka set a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
sure output is not droppable, but does not fully explain window expiration and 
late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, 
timers should be viewed as another channel of input, with a watermark, and 
items on that channel _all need event time timestamps even if they are 
delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be 
separated (with nice defaults) from the specification of the event time of 
resulting outputs. These timestamps will determine a side channel with a new 
"timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so 
that event time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of 
the input watermark and the timer watermark. In this way, whenever a timer is 
set, the window is not going to be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is 
expired; this may be as simple as exhausting the timer channel as soon as the 
input watermark indicates expiration of a window


> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
> 2. For a processing time timer, it is the current input watermark at the time 
> of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as 

[jira] [Comment Edited] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2017-06-29 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067342#comment-16067342
 ] 

Kenneth Knowles edited comment on BEAM-2535 at 6/29/17 8:45 PM:


CC [~reuvenlax] [~aljoscha] [~lzljs3620320] [~jkff]


was (Author: kenn):
CC [~reuvenlax] [~aljoscha] [~lzljs3620320]

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
> 2. For a processing time timer, it is the current input watermark at the time 
> of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification of the event time of 
> resulting outputs. These timestamps will determine a side channel with a new 
> "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, 
> so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum 
> of the input watermark and the timer watermark. In this way, whenever a timer 
> is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is 
> expired; this may be as simple as exhausting the timer channel as soon as the 
> input watermark indicates expiration of a window



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-29 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068957#comment-16068957
 ] 

Kenneth Knowles commented on BEAM-2140:
---

Yea, as you say SDF can play by its own rules, so you can certainly just choose 
to not drop any timers, and you can clean up stored element+restriction pairs 
when your input watermark is far enough along and there are no more timers 
pending for that window. Seems likely that in the SDF implementation you might 
find an even simpler way to just look at the residual restriction, notice it is 
empty, and clear the state.

I don't think an SDF should wait at all in a batch pipeline. In the unified 
model, this means bounded-to-bounded SDFs. It doesn't really make sense to even 
use timers to implement this, so I think the discussion is moot.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.





[jira] [Commented] (BEAM-2544) AvroIOTest is flaky

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068941#comment-16068941
 ] 

ASF GitHub Bot commented on BEAM-2544:
--

GitHub user alex-filatov opened a pull request:

https://github.com/apache/beam/pull/3475

[BEAM-2544] Fix flaky AvroIOTest

The source of the flakiness is a race condition in the "write then read" 
subset of tests: the test pipeline was constructed so that the write and read 
operations ran concurrently instead of sequentially.
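The race and its fix can be illustrated with a small hypothetical sketch, with plain files and threads standing in for the Beam pipelines; none of these names come from AvroIOTest itself.

```python
# Illustration of the race the fix removes: a reader running concurrently
# with the writer may open the output file before it exists. Running the
# steps sequentially (write, wait for completion, then read) removes it.
import os
import tempfile
import threading

def write_file(path):
    with open(path, "w") as f:
        f.write("records")

def read_file(path):
    with open(path) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "output.avro")

# Sequential "write then read": join() is analogous to waiting for the
# write pipeline to finish before constructing the read pipeline.
writer = threading.Thread(target=write_file, args=(path,))
writer.start()
writer.join()
assert read_file(path) == "records"
```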

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alex-filatov/beam 
beam-2544-fix-flaky-avroittest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3475


commit 1b6b38c7069820ca81dd18f44576c3959c35d80a
Author: Alex Filatov 
Date:   2017-06-29T20:23:04Z

[BEAM-2544] Fix flaky AvroIOTest by eliminating race condition in "write 
then read" tests.




> AvroIOTest is flaky
> ---
>
> Key: BEAM-2544
> URL: https://issues.apache.org/jira/browse/BEAM-2544
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Alex Filatov
>Assignee: Davor Bonaci
>Priority: Minor
>
> "Write then read" tests randomly fail.
> Steps to reproduce:
> cd /runners/direct-java
> mvn clean compile
> mvn surefire:test@validates-runner-tests -Dtest=AvroIOTest
> Repeat last step until a failure (on my machine failure rate is approx 1/3).
> Example:
> [ERROR] 
> testAvroIOWriteAndReadSchemaUpgrade(org.apache.beam.sdk.io.AvroIOTest)  Time 
> elapsed: 0.198 s  <<< ERROR!
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /var/folders/1c/sl733g5s1g7_4mq61_qmbjx4gn/T/junit3332447750239941326/output.avro
>  (No such file or directory)
>   at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
>   at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:201)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:340)
>   at 
> org.apache.beam.sdk.io.AvroIOTest.testAvroIOWriteAndReadSchemaUpgrade(AvroIOTest.java:275)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 

[GitHub] beam pull request #3475: [BEAM-2544] Fix flaky AvroIOTest

2017-06-29 Thread alex-filatov
GitHub user alex-filatov opened a pull request:

https://github.com/apache/beam/pull/3475

[BEAM-2544] Fix flaky AvroIOTest

The source of the flakiness is a race condition in the "write then read" 
subset of tests: the test pipeline was constructed so that the write and read 
operations ran concurrently instead of sequentially.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alex-filatov/beam 
beam-2544-fix-flaky-avroittest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3475


commit 1b6b38c7069820ca81dd18f44576c3959c35d80a
Author: Alex Filatov 
Date:   2017-06-29T20:23:04Z

[BEAM-2544] Fix flaky AvroIOTest by eliminating race condition in "write 
then read" tests.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2544) AvroIOTest is flaky

2017-06-29 Thread Alex Filatov (JIRA)
Alex Filatov created BEAM-2544:
--

 Summary: AvroIOTest is flaky
 Key: BEAM-2544
 URL: https://issues.apache.org/jira/browse/BEAM-2544
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Alex Filatov
Assignee: Davor Bonaci
Priority: Minor


"Write then read" tests randomly fail.

Steps to reproduce:

cd /runners/direct-java
mvn clean compile
mvn surefire:test@validates-runner-tests -Dtest=AvroIOTest

Repeat last step until a failure (on my machine failure rate is approx 1/3).

Example:

[ERROR] testAvroIOWriteAndReadSchemaUpgrade(org.apache.beam.sdk.io.AvroIOTest)  
Time elapsed: 0.198 s  <<< ERROR!
java.lang.RuntimeException: java.io.FileNotFoundException: 
/var/folders/1c/sl733g5s1g7_4mq61_qmbjx4gn/T/junit3332447750239941326/output.avro
 (No such file or directory)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:201)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:340)
at 
org.apache.beam.sdk.io.AvroIOTest.testAvroIOWriteAndReadSchemaUpgrade(AvroIOTest.java:275)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
at 
org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:157)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
Caused by: java.io.FileNotFoundException: 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2515

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4263

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4262

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-2373) AvroSource: Premature End of stream Exception on SnappyCompressorInputStream

2017-06-29 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-2373.
-
Resolution: Fixed

> AvroSource: Premature End of stream Exception on SnappyCompressorInputStream
> 
>
> Key: BEAM-2373
> URL: https://issues.apache.org/jira/browse/BEAM-2373
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Michael Luckey
>Assignee: Michael Luckey
>Priority: Critical
> Fix For: 2.1.0
>
>
> During processing we encountered on some of our snappy encoded avro input 
> files
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: java.io.IOException: 
> Premature end of stream
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:330)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> Caused by: java.io.IOException: Premature end of stream
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.expandLiteral(SnappyCompressorInputStream.java:310)
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.fill(SnappyCompressorInputStream.java:169)
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.read(SnappyCompressorInputStream.java:134)
>  at 
> org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:839)
>  at 
> org.apache.avro.io.BinaryDecoder$ByteSource.compactAndFill(BinaryDecoder.java:692)
>  at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:471)
>  at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>  at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
>  at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
>  at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>  at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>  at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
>  at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>  at 
> org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:579)
>  at 
> org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:198)
>  at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:479)
>  at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:277)
>  at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:148)
>  at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
>  at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This seems to be caused by a bug in apache.commons.compress:1.9, which was 
> addressed here:
> https://github.com/apache/commons-compress/commit/9ae37525134089dd0c9ee1cf8738192b70e0fc07
> Used a pipeline using AvroIO, on spark and direct, both on hdfs and local 
> file system.
> In our short tests we got it running without exceptions by either:
> * upgrading to commons.compress:1.14
> * applying the patch to the 1.9er code of SnappyCompressorInputStream
> Impacts on other components were not tested, of course :(





[jira] [Commented] (BEAM-2521) Simplify packaging for python distributions

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068867#comment-16068867
 ] 

ASF GitHub Bot commented on BEAM-2521:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3474


> Simplify packaging for python distributions
> ---
>
> Key: BEAM-2521
> URL: https://issues.apache.org/jira/browse/BEAM-2521
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: 2.1.0
>
>






[GitHub] beam pull request #3474: [BEAM-2521] Select SDK distribution based on the se...

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3474




[2/2] beam git commit: This closes #3474

2017-06-29 Thread altay
This closes #3474


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/99b5b056
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/99b5b056
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/99b5b056

Branch: refs/heads/master
Commit: 99b5b0563f85e6e2e22bb7b9148b4058955aeae1
Parents: 3690073 dc1dca8
Author: Ahmet Altay 
Authored: Thu Jun 29 12:45:44 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Jun 29 12:45:44 2017 -0700

--
 .../runners/dataflow/internal/dependency.py | 23 ++--
 1 file changed, 16 insertions(+), 7 deletions(-)
--




[1/2] beam git commit: Select SDK distribution based on the selected SDK name

2017-06-29 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 369007316 -> 99b5b0563


Select SDK distribution based on the selected SDK name


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/dc1dca86
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/dc1dca86
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/dc1dca86

Branch: refs/heads/master
Commit: dc1dca8633775545b5b4b509724716108d5d01e4
Parents: 3690073
Author: Ahmet Altay 
Authored: Thu Jun 29 10:56:25 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Jun 29 12:45:41 2017 -0700

--
 .../runners/dataflow/internal/dependency.py | 23 ++--
 1 file changed, 16 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/dc1dca86/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py 
b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
index a40a582..62c09ed 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
@@ -69,12 +69,15 @@ from apache_beam.utils import processes
 from apache_beam.options.pipeline_options import GoogleCloudOptions
 from apache_beam.options.pipeline_options import SetupOptions
 
+# All constants are for internal use only; no backwards-compatibility
+# guarantees.
 
+# In a released version BEAM_CONTAINER_VERSION and BEAM_FNAPI_CONTAINER_VERSION
+# should match each other, and should be in the same format as the SDK version
+# (i.e. MAJOR.MINOR.PATCH). For non-released (dev) versions, read below.
 # Update this version to the next version whenever there is a change that will
 # require changes to legacy Dataflow worker execution environment.
 # This should be in the beam-[version]-[date] format, date is optional.
-# BEAM_CONTAINER_VERSION and BEAM_FNAPI_CONTAINER version should coincide
-# when we make a release.
 BEAM_CONTAINER_VERSION = 'beam-2.1.0-20170626'
 # Update this version to the next version whenever there is a change that
 # requires changes to SDK harness container or SDK harness launcher.
@@ -86,9 +89,14 @@ WORKFLOW_TARBALL_FILE = 'workflow.tar.gz'
 REQUIREMENTS_FILE = 'requirements.txt'
 EXTRA_PACKAGES_FILE = 'extra_packages.txt'
 
+# Package names for different distributions
 GOOGLE_PACKAGE_NAME = 'google-cloud-dataflow'
 BEAM_PACKAGE_NAME = 'apache-beam'
 
+# SDK identifiers for different distributions
+GOOGLE_SDK_NAME = 'Google Cloud Dataflow SDK for Python'
+BEAM_SDK_NAME = 'Apache Beam SDK for Python'
+
 
 def _dependency_file_copy(from_path, to_path):
   """Copies a local file to a GCS file or vice versa."""
@@ -536,19 +544,20 @@ def get_sdk_name_and_version():
   container_version = _get_required_container_version()
   try:
     pkg.get_distribution(GOOGLE_PACKAGE_NAME)
-    return ('Google Cloud Dataflow SDK for Python', container_version)
+    return (GOOGLE_SDK_NAME, container_version)
   except pkg.DistributionNotFound:
-    return ('Apache Beam SDK for Python', beam_version.__version__)
+    return (BEAM_SDK_NAME, beam_version.__version__)
 
 
 def get_sdk_package_name():
   """For internal use only; no backwards-compatibility guarantees.
 
   Returns the PyPI package name to be staged to Google Cloud Dataflow."""
-  container_version = _get_required_container_version()
-  if container_version == BEAM_CONTAINER_VERSION:
+  sdk_name, _ = get_sdk_name_and_version()
+  if sdk_name == GOOGLE_SDK_NAME:
+    return GOOGLE_PACKAGE_NAME
+  else:
     return BEAM_PACKAGE_NAME
-  return GOOGLE_PACKAGE_NAME
 
 
 def _download_pypi_sdk_package(temp_dir):
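The selection logic introduced by this diff can be exercised in isolation. The sketch below re-implements it with the pkg_resources lookup replaced by an injected stub; the `installed` parameter and the `DistributionNotFound` stand-in class are inventions for illustration, not part of the real module.

```python
# Minimal re-implementation of the SDK-selection logic from the diff
# above, made runnable anywhere by stubbing the distribution lookup.
GOOGLE_PACKAGE_NAME = 'google-cloud-dataflow'
BEAM_PACKAGE_NAME = 'apache-beam'
GOOGLE_SDK_NAME = 'Google Cloud Dataflow SDK for Python'
BEAM_SDK_NAME = 'Apache Beam SDK for Python'

class DistributionNotFound(Exception):
    """Stand-in for pkg_resources.DistributionNotFound."""

def get_sdk_package_name(installed):
    """Pick the PyPI package to stage based on which SDK is installed.

    `installed` stands in for pkg_resources.get_distribution: it raises
    DistributionNotFound when the Dataflow distribution is absent.
    """
    try:
        installed(GOOGLE_PACKAGE_NAME)
        sdk_name = GOOGLE_SDK_NAME
    except DistributionNotFound:
        sdk_name = BEAM_SDK_NAME
    # Key change in the diff: select by SDK name, not container version.
    if sdk_name == GOOGLE_SDK_NAME:
        return GOOGLE_PACKAGE_NAME
    return BEAM_PACKAGE_NAME

def missing(name):
    raise DistributionNotFound(name)

# With the Dataflow distribution present, stage the Dataflow package:
assert get_sdk_package_name(lambda name: object()) == GOOGLE_PACKAGE_NAME
# Without it, stage apache-beam:
assert get_sdk_package_name(missing) == BEAM_PACKAGE_NAME
```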



Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2514

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2513

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2373) AvroSource: Premature End of stream Exception on SnappyCompressorInputStream

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068790#comment-16068790
 ] 

ASF GitHub Bot commented on BEAM-2373:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3252


> AvroSource: Premature End of stream Exception on SnappyCompressorInputStream
> 
>
> Key: BEAM-2373
> URL: https://issues.apache.org/jira/browse/BEAM-2373
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Michael Luckey
>Assignee: Michael Luckey
>Priority: Critical
> Fix For: 2.1.0
>
>
> During processing we encountered on some of our snappy encoded avro input 
> files
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: java.io.IOException: 
> Premature end of stream
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:330)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> Caused by: java.io.IOException: Premature end of stream
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.expandLiteral(SnappyCompressorInputStream.java:310)
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.fill(SnappyCompressorInputStream.java:169)
>  at 
> org.apache.beam.sdk.repackaged.org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.read(SnappyCompressorInputStream.java:134)
>  at 
> org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:839)
>  at 
> org.apache.avro.io.BinaryDecoder$ByteSource.compactAndFill(BinaryDecoder.java:692)
>  at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:471)
>  at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>  at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
>  at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
>  at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>  at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>  at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
>  at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>  at 
> org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:579)
>  at 
> org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord(BlockBasedSource.java:198)
>  at 
> org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl(FileBasedSource.java:479)
>  at 
> org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.advance(OffsetBasedSource.java:277)
>  at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:148)
>  at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
>  at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This seems to be caused by a bug in apache.commons.compress:1.9, which was 
> addressed here:
> https://github.com/apache/commons-compress/commit/9ae37525134089dd0c9ee1cf8738192b70e0fc07
> Used a pipeline using AvroIO, on spark and direct, both on hdfs and local 
> file system.
> In our short tests we got it running without exceptions by either:
> * upgrading to commons.compress:1.14
> * applying the patch to the 1.9er code of SnappyCompressorInputStream
> Impacts on other components were not tested, of 

[GitHub] beam pull request #3252: BEAM-2373 Upgrade commons-compress dependency versi...

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3252




[1/2] beam git commit: [BEAM-2373] Upgrade commons-compress dependency version to 1.14

2017-06-29 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master ae3a75263 -> 369007316


[BEAM-2373] Upgrade commons-compress dependency version to 1.14


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5744fa84
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5744fa84
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5744fa84

Branch: refs/heads/master
Commit: 5744fa84520c16cce73752e7d04e8b6628ef8979
Parents: ae3a752
Author: Michael Luckey 
Authored: Mon May 29 01:00:48 2017 +0200
Committer: Luke Cwik 
Committed: Thu Jun 29 11:56:36 2017 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/5744fa84/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 536a11c..fe51660 100644
--- a/pom.xml
+++ b/pom.xml
@@ -101,7 +101,7 @@
 
 
 
-1.9
+1.14
 3.6
 1.1
 2.24.0



[2/2] beam git commit: [BEAM-2373] Upgrade commons-compress dependency version to 1.14

2017-06-29 Thread lcwik
[BEAM-2373] Upgrade commons-compress dependency version to 1.14

This closes #3252


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/36900731
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/36900731
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/36900731

Branch: refs/heads/master
Commit: 3690073169bdd512bba24e1eb5daaea991e8879d
Parents: ae3a752 5744fa8
Author: Luke Cwik 
Authored: Thu Jun 29 11:57:16 2017 -0700
Committer: Luke Cwik 
Committed: Thu Jun 29 11:57:16 2017 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-29 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068782#comment-16068782
 ] 

Eugene Kirpichov commented on BEAM-2140:


Conceptually, watermarks are for PCollections - lower bound on timestamps of 
new elements that may get added to the collection.
However, at the implementation level, watermarks are assigned to transforms: 
they have an "input watermark" and "output watermark" (I suppose, per input and 
per output).
The difference between the output watermark of a transform producing a 
PCollection and the input watermark of a transform consuming it is as follows: 
the input watermark is additionally held by "pending elements" that we know 
need to be processed but have not been yet.
The input watermark is also held by the event-time of pending timers set by the 
transform. In other words, logically the transform's input is (output of the 
producer of the input) + (timers set by the transform itself), and the input 
watermark is held by both of these.

Currently the input watermark of a transform is held only by _event-time_ 
timers; however, it makes sense to hold it also by _processing-time_ timers. 
For that we need to assign them an event-time timestamp. Currently this isn't 
happening at all (except assigning an "effective timestamp" to output from the 
timer firing, when it fires - it is assigned from the current input watermark). 
The suggestion in case of SDF is to use the ProcessContinuation's output 
watermark as the event-time for the residual timer.

We also discussed handling of processing-time timers in batch. Coming from the 
point of view that things should work exactly the same way in batch - setting a 
processing-time timer in batch for firing in 5 minutes should actually fire it 
after 5 minutes, including possibly delaying the bundle until processing-time 
timers quiesce. Motivating use case is, say, using an SDF-based polling 
continuous glob expander in a batch pipeline - it should process the same set 
of files it would in a streaming pipeline.

A few questions I still do not understand:
- Where exactly do the processing-timers get dropped, and on what condition? 
Kenn says that event-time timers don't get dropped: we just forbid setting them 
if they would be already "late". 
- When can an input to the SDF, or a timer set by the SDF, be late at all, and 
should the SDF drop them? Technically a runner is free to drop late data at any 
point in the pipeline, but in practice this happens after GBKs; and since an 
SDF semantically need not involve a GBK, it should be allowed to not drop 
anything late, no? Like a regular DoFn would (as long as it doesn't leak 
state).

Seems like we also should file JIRAs for the following:
- state leakage
- handling processing-time timers in batch properly
- holding watermark by processing-time timers
- allowing the timer API (internals or the user-facing one) to specify the 
event time of processing-time timers
- more?
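The watermark relationships described above can be summarized in a minimal sketch (not Beam's actual implementation; the function and parameter names here are illustrative): the input watermark of a transform is the minimum of the upstream output watermark and all holds, where holds come from pending elements and from the event-time timestamps of the transform's own pending timers (including, per the proposal above, processing-time timers once they are assigned an event time).

```python
def input_watermark(producer_output_wm,
                    pending_element_timestamps,
                    pending_timer_event_times):
    """Compute a transform's input watermark.

    Holds come from elements known to need processing but not yet
    processed, and from the event-time timestamps of timers the
    transform has set for itself.
    """
    holds = list(pending_element_timestamps) + list(pending_timer_event_times)
    # With no holds, the input watermark tracks the producer's output watermark.
    return min([producer_output_wm] + holds)

# A pending element at 90 holds the input watermark back below the
# producer's output watermark of 100.
wm = input_watermark(100, [90, 120], [95])
```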

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4260

2017-06-29 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3471: Define the projectId in the SpannerIO Read Test (ut...

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3471




[1/2] beam git commit: Define the projectId in the SpannerIO Read Test (utest, not itest)

2017-06-29 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/master bf5aa1bca -> ae3a75263


Define the projectId in the SpannerIO Read Test (utest, not itest)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c6de4233
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c6de4233
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c6de4233

Branch: refs/heads/master
Commit: c6de4233d1c1bc812ba2dea45291d9dcb40aa152
Parents: bf5aa1b
Author: Jean-Baptiste Onofré 
Authored: Thu Jun 29 13:25:36 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Thu Jun 29 20:39:39 2017 +0200

--
 .../org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java  | 6 ++
 1 file changed, 6 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c6de4233/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java
index e5d4e72..5ba2da0 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java
@@ -150,6 +150,7 @@ public class SpannerIOReadTest implements Serializable {
   public void runQuery() throws Exception {
 SpannerIO.Read read =
 SpannerIO.read()
+.withProjectId("test")
 .withInstanceId("123")
 .withDatabaseId("aaa")
 .withTimestamp(Timestamp.now())
@@ -176,6 +177,7 @@ public class SpannerIOReadTest implements Serializable {
   public void runRead() throws Exception {
 SpannerIO.Read read =
 SpannerIO.read()
+.withProjectId("test")
 .withInstanceId("123")
 .withDatabaseId("aaa")
 .withTimestamp(Timestamp.now())
@@ -202,6 +204,7 @@ public class SpannerIOReadTest implements Serializable {
   public void runReadUsingIndex() throws Exception {
 SpannerIO.Read read =
 SpannerIO.read()
+.withProjectId("test")
 .withInstanceId("123")
 .withDatabaseId("aaa")
 .withTimestamp(Timestamp.now())
@@ -232,11 +235,13 @@ public class SpannerIOReadTest implements Serializable {
 
 PCollectionView tx = pipeline
 .apply("tx", SpannerIO.createTransaction()
+.withProjectId("test")
 .withInstanceId("123")
 .withDatabaseId("aaa")
 .withServiceFactory(serviceFactory));
 
 PCollection one = pipeline.apply("read q", SpannerIO.read()
+.withProjectId("test")
 .withInstanceId("123")
 .withDatabaseId("aaa")
 .withTimestamp(Timestamp.now())
@@ -244,6 +249,7 @@ public class SpannerIOReadTest implements Serializable {
 .withServiceFactory(serviceFactory)
 .withTransaction(tx));
 PCollection two = pipeline.apply("read r", SpannerIO.read()
+.withProjectId("test")
 .withInstanceId("123")
 .withDatabaseId("aaa")
 .withTimestamp(Timestamp.now())



[2/2] beam git commit: This closes #3471

2017-06-29 Thread jbonofre
This closes #3471


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ae3a7526
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ae3a7526
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ae3a7526

Branch: refs/heads/master
Commit: ae3a75263c3fc06e0dc16bc6964e199df167e32f
Parents: bf5aa1b c6de423
Author: Jean-Baptiste Onofré 
Authored: Thu Jun 29 20:39:59 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Thu Jun 29 20:39:59 2017 +0200

--
 .../org/apache/beam/sdk/io/gcp/spanner/SpannerIOReadTest.java  | 6 ++
 1 file changed, 6 insertions(+)
--




[jira] [Commented] (BEAM-2425) Package does not exist when building beam-sdks-java-javadoc in JDK1.7

2017-06-29 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068768#comment-16068768
 ] 

Mark Liu commented on BEAM-2425:


friendly ping. Do you have time to look at it?

> Package does not exist when building beam-sdks-java-javadoc in JDK1.7
> 
>
> Key: BEAM-2425
> URL: https://issues.apache.org/jira/browse/BEAM-2425
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Jean-Baptiste Onofré
>
> Error from [Jenkins 
> build|https://builds.apache.org/job/beam_PostCommit_Java_JDK_Versions_Test/57/jdk=JDK%201.7%20(latest),label=beam/console]:
> {code}
> [javadoc] Constructing Javadoc information...
>   [javadoc] 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
>  1.7 
> (latest)/label/beam/sdks/java/javadoc/target/collected/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java:20:
>  error: package com.datastax.driver.core does not exist
>   [javadoc] import com.datastax.driver.core.Cluster;
>   [javadoc]^
> ...
>   [javadoc] 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
>  1.7 
> (latest)/label/beam/sdks/java/javadoc/target/collected/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java:27:
>  error: package com.datastax.driver.core.policies does not exist
>   [javadoc] import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
>   [javadoc] ^
> ...
> {code}
> This is a Maven postcommit build on JDK 1.7.
> We have excluded many packages for Java 8 in [this ant build 
> script|https://github.com/apache/beam/blob/master/sdks/java/javadoc/ant.xml], 
> but not org.apache.beam.sdk.io.cassandra. Probably we need to add it there?





[jira] [Comment Edited] (BEAM-2425) Package does not exist when building beam-sdks-java-javadoc in JDK1.7

2017-06-29 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068768#comment-16068768
 ] 

Mark Liu edited comment on BEAM-2425 at 6/29/17 6:38 PM:
-

friendly ping. Did you have a chance to look at it?


was (Author: markflyhigh):
friendly ping. Do you have time to look at it?

> Package does not exist when building beam-sdks-java-javadoc in JDK1.7
> 
>
> Key: BEAM-2425
> URL: https://issues.apache.org/jira/browse/BEAM-2425
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Jean-Baptiste Onofré
>
> Error from [Jenkins 
> build|https://builds.apache.org/job/beam_PostCommit_Java_JDK_Versions_Test/57/jdk=JDK%201.7%20(latest),label=beam/console]:
> {code}
> [javadoc] Constructing Javadoc information...
>   [javadoc] 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
>  1.7 
> (latest)/label/beam/sdks/java/javadoc/target/collected/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java:20:
>  error: package com.datastax.driver.core does not exist
>   [javadoc] import com.datastax.driver.core.Cluster;
>   [javadoc]^
> ...
>   [javadoc] 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
>  1.7 
> (latest)/label/beam/sdks/java/javadoc/target/collected/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java:27:
>  error: package com.datastax.driver.core.policies does not exist
>   [javadoc] import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
>   [javadoc] ^
> ...
> {code}
> This is a Maven postcommit build on JDK 1.7.
> We have excluded many packages for Java 8 in [this ant build 
> script|https://github.com/apache/beam/blob/master/sdks/java/javadoc/ant.xml], 
> but not org.apache.beam.sdk.io.cassandra. Probably we need to add it there?





[jira] [Created] (BEAM-2543) HBaseIOTest failed Java cross-JDK version test

2017-06-29 Thread Mark Liu (JIRA)
Mark Liu created BEAM-2543:
--

 Summary: HBaseIOTest failed Java cross-JDK version test
 Key: BEAM-2543
 URL: https://issues.apache.org/jira/browse/BEAM-2543
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Mark Liu
Assignee: Ismaël Mejía


HBaseIOTest has failed the Jenkins Java cross-JDK tests since June 22, as the 
Jenkins history shows. JDK 7, OpenJDK 7, and OpenJDK 8 are all affected.

Errors:
{code}
2017-06-29T18:13:56.187 [ERROR] org.apache.beam.sdk.io.hbase.HBaseIOTest  
Time elapsed: 0 s  <<< ERROR!
java.io.IOException: Shutting down
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:235)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:97)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1036)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1002)
at 
org.apache.beam.sdk.io.hbase.HBaseIOTest.beforeClass(HBaseIOTest.java:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at 
org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed construction of Master: class 
org.apache.hadoop.hbase.master.HMaster: Illegal character in path at index 89: 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
 1.7 
(latest)/label/beam/sdks/java/io/hbase/target/test-data/b11a0828-4628-4fe9-885d-073fb641ddc9
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:220)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:155)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:217)
... 23 more
Caused by: java.lang.IllegalArgumentException: Illegal character in path at 
index 89: 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
 1.7 
(latest)/label/beam/sdks/java/io/hbase/target/test-data/b11a0828-4628-4fe9-885d-073fb641ddc9
at java.net.URI.create(URI.java:859)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.hbase.fs.HFileSystem.<init>(HFileSystem.java:80)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:613)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:564)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:412)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
... 26 more
Caused by: java.net.URISyntaxException: Illegal character in path at index 89: 
file:/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_JDK_Versions_Test/jdk/JDK
 1.7 
(latest)/label/beam/sdks/java/io/hbase/target/test-data/b11a0828-4628-4fe9-885d-073fb641ddc9
at 
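The root cause in the trace above is the space in the Jenkins workspace path ("JDK 1.7 (latest)"): a space is not a legal URI character, which is exactly what `java.net.URI.create` rejects. A small sketch of the problem and the usual fix, percent-encoding the path, shown in Python for illustration (the path below is a shortened stand-in, not the actual Jenkins path):

```python
from urllib.parse import quote, urlparse

# A path containing a space and parentheses, like the Jenkins matrix-label
# directory "JDK 1.7 (latest)" in the error above.
raw_path = "/workspace/jdk/JDK 1.7 (latest)/label/beam/target/test-data"

# Naively prepending "file:" yields an invalid URI; strict URI parsers
# (such as java.net.URI) reject the unescaped space.
# Percent-encoding the path portion produces a legal URI.
encoded = "file://" + quote(raw_path)  # space -> %20, "(" -> %28, ")" -> %29
parsed = urlparse(encoded)
```

The Java-side fixes are analogous: either encode the path before building the URI, or avoid spaces in the workspace directory name altogether.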

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2512

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2542) Python performance test failed in Beam repo init

2017-06-29 Thread Mark Liu (JIRA)
Mark Liu created BEAM-2542:
--

 Summary: Python performance test failed in Beam repo init
 Key: BEAM-2542
 URL: https://issues.apache.org/jira/browse/BEAM-2542
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu


Jenkins link: 
https://builds.apache.org/view/Beam/job/beam_PerformanceTests_Python/39

Errors:
{code}
2017-06-29 00:00:32,173 b368d0d5 MainThread beam_integration_benchmark(1/1) 
ERROR Error during benchmark beam_integration_benchmark
Traceback (most recent call last):
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Python/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
 line 555, in RunBenchmark
DoPreparePhase(spec, detailed_timer)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Python/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
 line 450, in DoPreparePhase
spec.BenchmarkPrepare(spec)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Python/PerfKitBenchmarker/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py",
 line 91, in Prepare
beam_benchmark_helper.InitializeBeamRepo(benchmark_spec)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_Python/PerfKitBenchmarker/perfkitbenchmarker/beam_benchmark_helper.py",
 line 136, in InitializeBeamRepo
mvn_command)
TypeError: AddRunnerProfileMvnArgument() takes exactly 3 arguments (2 given)
{code}

Need to investigate the benchmark code in PerfKit.
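The TypeError in the traceback is a caller/callee mismatch: the benchmark helper invoked a function with two arguments although it is declared with three. A minimal reproduction of that error class (the function name mirrors the traceback, but the body and parameters here are illustrative, not PerfKit's actual code):

```python
def AddRunnerProfileMvnArgument(runner, mvn_command, profile_flag):
    # Declared with three parameters...
    return mvn_command + [profile_flag]

try:
    # ...but invoked with only two, as in the traceback above.
    AddRunnerProfileMvnArgument("dataflow", ["mvn", "verify"])
except TypeError as e:
    # Python 2 reports "takes exactly 3 arguments (2 given)";
    # Python 3 reports a missing required positional argument.
    message = str(e)
```

The fix is simply to update the call site (or the signature) so the two agree.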





[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-29 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068740#comment-16068740
 ] 

Eugene Kirpichov commented on BEAM-2140:


So, to elaborate on what Kenn said. We dug a bit deeper into this yesterday and 
came up with the following conclusions.

1) The reason that this stuff works in Dataflow and Direct runner is that, for 
running SDF, they use a code path that simply _does not drop late data/timers 
or GC state_. These happen in LateDataDroppingRunner and ReduceFnRunner and 
StatefulDoFnRunner - and the path for running ProcessFn does not involve any of 
these. Aljoscha, maybe you can see why your current codepaths for running 
ProcessFn in Flink involve dropping of late data / late timers, and make them 
not involve it? :) (I'm not sure where this dropping happens in Flink)
2) As a consequence, however, state doesn't get GC'd. In practice this means 
that, if you apply an SDF to input that is in many windows (e.g. to input 
windowed by fixed or sliding windows), it will slowly leak state. However, in 
practice this is likely not a huge concern because SDFs are expected to mostly 
be used when the amount of input is not super large (at least compared to 
output), and it is usually globally windowed. Especially in streaming use 
cases. I.e. it can be treated as a "Known issue" rather than "SDF does not work 
at all". *I would recommend proceeding to implement it in Flink runner with 
this same known issue*, and then solving the issue uniformly across all runners.

Posting this comment for now and writing another on how to do it without state 
leakage.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.





Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4259

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2511

2017-06-29 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Python #42

2017-06-29 Thread Apache Jenkins Server
See 


Changes:

[klk] Only use ASCII 'a' through 'z' for temporary Spanner tables

[altay] Add PubSub I/O support to Python DirectRunner

[altay] Use SDK harness container for FnAPI jobs when

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam6 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision bf5aa1bca4861d6867978a9508dc5c952bd7fc2b (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bf5aa1bca4861d6867978a9508dc5c952bd7fc2b
 > git rev-list 997bf402511d2b2468385fcfd2b3aa4dba52eac5 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson2853509941124061676.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson5209238805176174882.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson169748395403883254.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.1.1 
in /home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in 
/usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt 
(line 16))
Requirement already satisfied (use --upgrade to upgrade): 
colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/hudson7312679479954471964.sh
+ pip install --user -e 'sdks/python/[gcp,test]'
Obtaining 
file://
  Running setup.py 
(path:
 egg_info for package from 
file://

:66:
 UserWarning: You are using version 1.5.4 of pip. However, version 7.0.0 is 
recommended.
  _PIP_VERSION, REQUIRED_PIP_VERSION
no previously-included directories found matching 'doc/.build'

Installed 


  

[jira] [Commented] (BEAM-2521) Simplify packaging for python distributions

2017-06-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068713#comment-16068713
 ] 

ASF GitHub Bot commented on BEAM-2521:
--

GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/3474

[BEAM-2521] Select SDK distribution based on the selected SDK name

R: @tvalentyn 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam dist2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3474


commit fdd6309ac0b5e9555cbe2bf215894e959990763a
Author: Ahmet Altay 
Date:   2017-06-29T17:56:25Z

Select SDK distribution based on the selected SDK name




> Simplify packaging for python distributions
> ---
>
> Key: BEAM-2521
> URL: https://issues.apache.org/jira/browse/BEAM-2521
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: 2.1.0
>
>






[GitHub] beam pull request #3474: [BEAM-2521] Select SDK distribution based on the se...

2017-06-29 Thread aaltay
GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/3474

[BEAM-2521] Select SDK distribution based on the selected SDK name

R: @tvalentyn 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam dist2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3474


commit fdd6309ac0b5e9555cbe2bf215894e959990763a
Author: Ahmet Altay 
Date:   2017-06-29T17:56:25Z

Select SDK distribution based on the selected SDK name






[GitHub] beam pull request #3473: Simplify package selection logic

2017-06-29 Thread tvalentyn
Github user tvalentyn closed the pull request at:

https://github.com/apache/beam/pull/3473




[GitHub] beam pull request #3473: Simplify package selection logic

2017-06-29 Thread tvalentyn
GitHub user tvalentyn opened a pull request:

https://github.com/apache/beam/pull/3473

Simplify package selection logic

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tvalentyn/beam 
simplify_package_selection_logic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3473


commit 5c4b236d3c9f540f4bb1d038eb01fc7d187482b2
Author: Valentyn Tymofieiev 
Date:   2017-06-29T17:36:33Z

Simplify package selection logic






[GitHub] beam pull request #3468: Use SDK harness container for FnAPI jobs when worke...

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3468




[2/2] beam git commit: This closes #3468

2017-06-29 Thread altay
This closes #3468


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/bf5aa1bc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/bf5aa1bc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/bf5aa1bc

Branch: refs/heads/master
Commit: bf5aa1bca4861d6867978a9508dc5c952bd7fc2b
Parents: 2dd1907 f46a40c
Author: Ahmet Altay 
Authored: Thu Jun 29 10:35:56 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Jun 29 10:35:56 2017 -0700

--
 .../runners/dataflow/internal/apiclient.py  |  6 +--
 .../runners/dataflow/internal/dependency.py | 44 +---
 2 files changed, 39 insertions(+), 11 deletions(-)
--




[1/2] beam git commit: Use SDK harness container for FnAPI jobs when worker_harness_container_image is not specified. Add a separate image tag to use with the SDK harness container.

2017-06-29 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 2dd1907c6 -> bf5aa1bca


Use SDK harness container for FnAPI jobs when worker_harness_container_image is 
not specified. Add a separate image tag to use with the SDK harness container.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f46a40c2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f46a40c2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f46a40c2

Branch: refs/heads/master
Commit: f46a40c279499737bb7fb45af5e299d76f6af49b
Parents: 2dd1907
Author: Valentyn Tymofieiev 
Authored: Wed Jun 28 16:41:03 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Jun 29 10:35:53 2017 -0700

--
 .../runners/dataflow/internal/apiclient.py  |  6 +--
 .../runners/dataflow/internal/dependency.py | 44 +---
 2 files changed, 39 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f46a40c2/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
index df1a3f2..edac9d7 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
@@ -38,7 +38,6 @@ from apache_beam.io.filesystems import FileSystems
 from apache_beam.io.gcp.internal.clients import storage
 from apache_beam.runners.dataflow.internal import dependency
 from apache_beam.runners.dataflow.internal.clients import dataflow
-from apache_beam.runners.dataflow.internal.dependency import get_required_container_version
 from apache_beam.runners.dataflow.internal.dependency import get_sdk_name_and_version
 from apache_beam.runners.dataflow.internal.names import PropertyNames
 from apache_beam.transforms import cy_combiners
@@ -205,11 +204,8 @@ class Environment(object):
   pool.workerHarnessContainerImage = (
   self.worker_options.worker_harness_container_image)
 else:
-  # Default to using the worker harness container image for the current SDK
-  # version.
   pool.workerHarnessContainerImage = (
-  'dataflow.gcr.io/v1beta3/python:%s' %
-  get_required_container_version())
+  dependency.get_default_container_image_for_current_sdk(job_type))
 if self.worker_options.use_public_ips is not None:
   if self.worker_options.use_public_ips:
 pool.ipConfiguration = (

http://git-wip-us.apache.org/repos/asf/beam/blob/f46a40c2/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
index 03e1794..a40a582 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
@@ -71,9 +71,15 @@ from apache_beam.options.pipeline_options import SetupOptions
 
 
 # Update this version to the next version whenever there is a change that will
-# require changes to the execution environment.
+# require changes to legacy Dataflow worker execution environment.
 # This should be in the beam-[version]-[date] format, date is optional.
+# BEAM_CONTAINER_VERSION and BEAM_FNAPI_CONTAINER version should coincide
+# when we make a release.
 BEAM_CONTAINER_VERSION = 'beam-2.1.0-20170626'
+# Update this version to the next version whenever there is a change that
+# requires changes to SDK harness container or SDK harness launcher.
+# This should be in the beam-[version]-[date] format, date is optional.
+BEAM_FNAPI_CONTAINER_VERSION = 'beam-2.1.0-20170621'
 
 # Standard file names used for staging files.
 WORKFLOW_TARBALL_FILE = 'workflow.tar.gz'
@@ -474,10 +480,33 @@ def _stage_beam_sdk_tarball(sdk_remote_location, staged_path, temp_dir):
 'type of location: %s' % sdk_remote_location)
 
 
-def get_required_container_version():
+def get_default_container_image_for_current_sdk(job_type):
+  """For internal use only; no backwards-compatibility guarantees.
+
+  Args:
+job_type (str): BEAM job type.
+
+  Returns:
+str: Google Cloud Dataflow container image for remote execution.
+  """
+  # TODO(tvalentyn): Use enumerated type instead of strings for job types.
+  if job_type == 'FNAPI_BATCH' or job_type == 'FNAPI_STREAMING':
+image_name = 'dataflow.gcr.io/v1beta3/python-fnapi'
+  else:
+image_name = 'dataflow.gcr.io/v1beta3/python'
+  image_tag = _get_required_container_version(job_type)
+  return image_name + ':' + image_tag
+
+
+def 

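The hunk above replaces the hard-coded image string with a job-type-aware lookup. A standalone sketch of that selection logic follows; the helper name is illustrative, and the per-job-type version lookup (`_get_required_container_version` in the patch) is collapsed into the two module constants copied from the diff:

```python
# Illustrative re-implementation of the image selection added in
# dependency.py. Container version tags are copied from the patch above.
BEAM_CONTAINER_VERSION = 'beam-2.1.0-20170626'        # legacy worker tag
BEAM_FNAPI_CONTAINER_VERSION = 'beam-2.1.0-20170621'  # SDK harness tag


def default_container_image(job_type):
    """Pick the Dataflow container image for the given job type."""
    if job_type in ('FNAPI_BATCH', 'FNAPI_STREAMING'):
        # FnAPI jobs use the SDK harness container, tagged independently
        # so the harness can ship on its own cadence.
        return 'dataflow.gcr.io/v1beta3/python-fnapi:' + BEAM_FNAPI_CONTAINER_VERSION
    # All other job types keep the classic worker image.
    return 'dataflow.gcr.io/v1beta3/python:' + BEAM_CONTAINER_VERSION
```

This is why the patch notes that the two version constants should coincide at release time: both images are cut from the same SDK version, and only the date suffixes drift between releases.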
Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2510

2017-06-29 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3472: Get rid of Apache Commons dependency

2017-06-29 Thread mairbek
GitHub user mairbek opened a pull request:

https://github.com/apache/beam/pull/3472

Get rid of Apache Commons dependency

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mairbek/beam commons

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3472


commit 39a2ed0ccb53bcc96c179c64405c80226bac7b9b
Author: Mairbek Khadikov 
Date:   2017-06-29T17:12:50Z

Ditch apache commons






[jira] [Commented] (BEAM-2298) Java WordCount doesn't work in Window OS for glob expressions or file: prefixed paths

2017-06-29 Thread Flavio Fiszman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068632#comment-16068632
 ] 

Flavio Fiszman commented on BEAM-2298:
--

Hi Vinay, thanks for the extra piece of information.
I'm currently working on a fix for this problem and will update this issue with 
progress.

> Java WordCount doesn't work in Window OS for glob expressions or file: 
> prefixed paths
> -
>
> Key: BEAM-2298
> URL: https://issues.apache.org/jira/browse/BEAM-2298
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Flavio Fiszman
> Fix For: 2.1.0
>
>
> I am not able to build beam repo in Windows OS, so I copied the jar file from my Mac.
> WordCount failed with the following cmd:
> java -cp beam-examples-java-2.0.0-jar-with-dependencies.jar org.apache.beam.examples.WordCount --inputFile=input.txt --output=counts
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource getEstimatedSizeBytes
> INFO: Filepattern input.txt matched 1 files with total size 0
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
> INFO: Matched 1 files for pattern input.txt
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.FileBasedSource split
> INFO: Splitting filepattern input.txt into bundles of size 0 took 0 ms and produced 1 files and 0 bundles
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.WriteFiles$2 processElement
> INFO: Finalizing write operation TextWriteOperation{tempDirectory=C:\Users\Pei\Desktop\.temp-beam-2017-05-135_13-09-48-1\, windowedWrites=false}.
> May 15, 2017 6:09:48 AM org.apache.beam.sdk.io.WriteFiles$2 processElement
> INFO: Creating 1 empty output shards in addition to 0 written for a total of 1.
> Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: Unable to find registrar for c
> at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:322)
> at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
> at org.apache.beam.examples.WordCount.main(WordCount.java:184)
> Caused by: java.lang.IllegalStateException: Unable to find registrar for c
> at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
> at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
> at org.apache.beam.sdk.io.FileSystems.matchResources(FileSystems.java:174)
> at org.apache.beam.sdk.io.FileSystems.filterMissingFiles(FileSystems.java:367)
> at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:251)
> at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:641)
> at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:529)
> at org.apache.beam.sdk.io.WriteFiles$2.processElement(WriteFiles.java:592)
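The "Unable to find registrar for c" error suggests the Windows drive prefix `C:` is being parsed as a URI scheme, so the filesystem lookup searches for a registrar named `c`. A hypothetical scheme parser that reproduces the symptom (this is not Beam's actual code, only an illustration of the failure mode):

```python
import re


def guess_scheme(path, default='file'):
    """Treat anything before the first ':' that looks like a URI scheme
    as the scheme name -- the behavior that trips over Windows paths."""
    m = re.match(r'(?P<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):', path)
    return m.group('scheme').lower() if m else default


print(guess_scheme(r'C:\Users\Pei\Desktop\counts'))  # drive letter read as scheme 'c'
print(guess_scheme('/home/user/counts'))             # no scheme, falls back to 'file'
print(guess_scheme('gs://bucket/counts'))            # genuine scheme 'gs'
```

A fix along these lines would need to special-case single-letter "schemes" followed by `\` or treat them as local drive letters before consulting the registrar table.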



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-2521) Simplify packaging for python distributions

2017-06-29 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-2521.
---
Resolution: Fixed

> Simplify packaging for python distributions
> ---
>
> Key: BEAM-2521
> URL: https://issues.apache.org/jira/browse/BEAM-2521
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: 2.1.0
>
>






[1/2] beam git commit: Add PubSub I/O support to Python DirectRunner

2017-06-29 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 4d41e25d8 -> 2dd1907c6


Add PubSub I/O support to Python DirectRunner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fb7ec28c
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fb7ec28c
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fb7ec28c

Branch: refs/heads/master
Commit: fb7ec28cfb1291b04e0eac738054eefe0bb9a103
Parents: 4d41e25
Author: Charles Chen 
Authored: Mon Jun 26 18:03:53 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Jun 29 09:46:03 2017 -0700

--
 .../apache_beam/examples/streaming_wordcount.py | 12 ++-
 sdks/python/apache_beam/io/gcp/pubsub.py| 91 +++-
 sdks/python/apache_beam/io/gcp/pubsub_test.py   | 89 +++
 .../runners/dataflow/dataflow_runner.py | 11 +--
 .../apache_beam/runners/direct/direct_runner.py | 54 
 .../runners/direct/transform_evaluator.py   | 89 +++
 6 files changed, 281 insertions(+), 65 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/fb7ec28c/sdks/python/apache_beam/examples/streaming_wordcount.py
--
diff --git a/sdks/python/apache_beam/examples/streaming_wordcount.py b/sdks/python/apache_beam/examples/streaming_wordcount.py
index 4c29f2b..7696d77 100644
--- a/sdks/python/apache_beam/examples/streaming_wordcount.py
+++ b/sdks/python/apache_beam/examples/streaming_wordcount.py
@@ -28,6 +28,8 @@ import logging
 
 
 import apache_beam as beam
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import StandardOptions
 import apache_beam.transforms.window as window
 
 
@@ -41,13 +43,17 @@ def run(argv=None):
   parser = argparse.ArgumentParser()
   parser.add_argument(
   '--input_topic', required=True,
-  help='Input PubSub topic of the form "/topics/<project>/<topic>".')
+  help=('Input PubSub topic of the form '
+'"projects/<project>/topics/<topic>".'))
    parser.add_argument(
    '--output_topic', required=True,
-  help='Output PubSub topic of the form "/topics/<project>/<topic>".')
+  help=('Output PubSub topic of the form '
+'"projects/<project>/topic/<topic>".'))
   known_args, pipeline_args = parser.parse_known_args(argv)
+  options = PipelineOptions(pipeline_args)
+  options.view_as(StandardOptions).streaming = True
 
-  with beam.Pipeline(argv=pipeline_args) as p:
+  with beam.Pipeline(options=options) as p:
 
 # Read from PubSub into a PCollection.
 lines = p | beam.io.ReadStringsFromPubSub(known_args.input_topic)
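The hunk above splits the example's own flags from the remaining pipeline flags with `parse_known_args`, then forces streaming mode on the resulting options. The flag-splitting half can be sketched with the standard library alone; the topic values and the `--streaming` leftover flag below are placeholders:

```python
import argparse


def split_args(argv):
    """Separate the example's own flags from pipeline flags, as the patch does:
    known flags are consumed here, everything else is forwarded onward."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_topic', required=True)
    parser.add_argument('--output_topic', required=True)
    return parser.parse_known_args(argv)


known, rest = split_args([
    '--input_topic', 'projects/p/topics/in',
    '--output_topic', 'projects/p/topics/out',
    '--streaming'])
# 'known' carries the example's flags; 'rest' (here ['--streaming']) is what
# the patch feeds into PipelineOptions before setting streaming = True.
```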

http://git-wip-us.apache.org/repos/asf/beam/blob/fb7ec28c/sdks/python/apache_beam/io/gcp/pubsub.py
--
diff --git a/sdks/python/apache_beam/io/gcp/pubsub.py b/sdks/python/apache_beam/io/gcp/pubsub.py
index fabe296..32d388a 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub.py
@@ -24,12 +24,16 @@ This API is currently under development and is subject to 
change.
 
 from __future__ import absolute_import
 
+import re
+
 from apache_beam import coders
 from apache_beam.io.iobase import Read
 from apache_beam.io.iobase import Write
 from apache_beam.runners.dataflow.native_io import iobase as dataflow_io
+from apache_beam.transforms import core
 from apache_beam.transforms import PTransform
 from apache_beam.transforms import Map
+from apache_beam.transforms import window
 from apache_beam.transforms.display import DisplayDataItem
 
 
@@ -43,11 +47,12 @@ class ReadStringsFromPubSub(PTransform):
 """Initializes ``ReadStringsFromPubSub``.
 
 Attributes:
-  topic: Cloud Pub/Sub topic in the form "/topics/<project>/<topic>". If
-provided then subscription must be None.
+  topic: Cloud Pub/Sub topic in the form "projects/<project>/topics/
+<topic>". If provided, subscription must be None.
    subscription: Existing Cloud Pub/Sub subscription to use in the
-form "projects/<project>/subscriptions/<subscription>". If provided then
-topic must be None.
+form "projects/<project>/subscriptions/<subscription>". If not
+specified, a temporary subscription will be created from the specified
+topic. If provided, topic must be None.
   id_label: The attribute on incoming Pub/Sub messages to use as a unique
 record identifier.  When specified, the value of this attribute (which
 can be any string that uniquely identifies the record) will be used for
@@ -56,17 +61,14 @@ class ReadStringsFromPubSub(PTransform):
 case, deduplication of the stream will be strictly best effort.
 """
 super(ReadStringsFromPubSub, self).__init__()
-if topic and subscription:
-  raise ValueError("Only one of topic or 

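The message is cut off mid-hunk, but the surviving `raise ValueError("Only one of topic or ...` line shows the constructor validating that exactly one source is supplied. A minimal sketch of that mutual-exclusion check; the function name and the wording of the second error message are assumptions, not Beam's actual text:

```python
def pubsub_source(topic=None, subscription=None):
    """Require exactly one of topic or subscription, as the truncated
    hunk above does, and report which one was chosen."""
    if topic and subscription:
        raise ValueError('Only one of topic or subscription should be provided.')
    if not (topic or subscription):
        raise ValueError('Either a topic or subscription must be provided.')
    return ('topic', topic) if topic else ('subscription', subscription)
```

This pattern keeps the two constructor arguments optional while still guaranteeing the transform always has exactly one concrete source to read from.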
[GitHub] beam pull request #3454: Add PubSub I/O support to Python DirectRunner

2017-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3454




[2/2] beam git commit: This closes #3454

2017-06-29 Thread altay
This closes #3454


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2dd1907c
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2dd1907c
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2dd1907c

Branch: refs/heads/master
Commit: 2dd1907c65cdd04f2a54f2ef1368ee39f72c19fe
Parents: 4d41e25 fb7ec28
Author: Ahmet Altay 
Authored: Thu Jun 29 09:46:07 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Jun 29 09:46:07 2017 -0700

--
 .../apache_beam/examples/streaming_wordcount.py | 12 ++-
 sdks/python/apache_beam/io/gcp/pubsub.py| 91 +++-
 sdks/python/apache_beam/io/gcp/pubsub_test.py   | 89 +++
 .../runners/dataflow/dataflow_runner.py | 11 +--
 .../apache_beam/runners/direct/direct_runner.py | 54 
 .../runners/direct/transform_evaluator.py   | 89 +++
 6 files changed, 281 insertions(+), 65 deletions(-)
--




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink #3279

2017-06-29 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2509

2017-06-29 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-29 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068561#comment-16068561
 ] 

Kenneth Knowles commented on BEAM-2140:
---

Yea. I think if timers are set with an event time timestamp, then we hold for 
that time and use the min of the hold and the input watermark to govern GC.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.




