[jira] [Updated] (BEAM-2575) ApexRunner doesn't emit watermarks for additional outputs

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-2575:
---
Fix Version/s: 2.1.0

> ApexRunner doesn't emit watermarks for additional outputs 
> --
>
> Key: BEAM-2575
> URL: https://issues.apache.org/jira/browse/BEAM-2575
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Thomas Weise
>Assignee: Thomas Weise
> Fix For: 2.1.0, 2.2.0
>
>
> https://lists.apache.org/thread.html/51113a207f96d0522fb81adb65e35e134a0c52cf4bbe1cfc46508d83@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2701

2017-07-24 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Python #140

2017-07-24 Thread Apache Jenkins Server
See 


Changes:

[klk] Add stub DisplayDataTranslation

[klk] Fix tests that passed invalid input to DynamicDestinations

[klk] Add Pipeline rehydration from proto

[klk] Dehydrate then rehydrate Pipeline before DirectRunner.run()

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam2 (beam) in workspace 

Cloning the remote Git repository
Cloning repository https://github.com/apache/beam.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git fetch --tags --progress https://github.com/apache/beam.git +refs/heads/*:refs/remotes/origin/* +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 01408c864e9d844f4ffb74cc3f18276ff6a5c447 (origin/master)
Commit message: "This closes #3334: [BEAM-2333] Go to proto and back before running a pipeline in Java DirectRunner"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 01408c864e9d844f4ffb74cc3f18276ff6a5c447
 > git rev-list 0064fb37ad13a10fc510e567d21873403a42340a # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins8070557183743054525.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins1481964471315043761.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins5135023019300192958.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.1.1 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in /usr/local/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt (line 16))
Requirement already satisfied (use --upgrade to upgrade): colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in /home/jenkins/.local/lib/python2.7/site-packages (from -r PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins5966406085273035054.sh
+ pip install --user -e 'sdks/python/[gcp,test]'
Obtaining 
file://
  Running setup.py 

[jira] [Commented] (BEAM-79) Gearpump runner

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099471#comment-16099471
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user manuzhang opened a pull request:

https://github.com/apache/beam/pull/3637

[BEAM-79] Prepare for merging gearpump-runner

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/beam gearpump-runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3637


commit e655f53c2e732bcd082449ba8d0b551f417e8aaa
Author: manuzhang 
Date:   2017-07-21T06:27:37Z

Revert accidental changes to sdks/java/pom.xml

commit daa7566939e212095ef745ff004580bfe4209b38
Author: manuzhang 
Date:   2017-07-21T06:58:13Z

Upgrade BEAM version to 2.2.0-SNAPSHOT in gearpump-runner

commit 49d4ed59567852e978d14729c89b9ea91bb96c32
Author: manuzhang 
Date:   2017-07-21T14:33:36Z

Add beam-runners-gearpump dependency to javadoc




> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-gearpump
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.





[GitHub] beam pull request #3637: [BEAM-79] Prepare for merging gearpump-runner

2017-07-24 Thread manuzhang
GitHub user manuzhang opened a pull request:

https://github.com/apache/beam/pull/3637



[jira] [Commented] (BEAM-79) Gearpump runner

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099465#comment-16099465
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user manuzhang opened a pull request:

https://github.com/apache/beam/pull/3636

[BEAM-79] merge gearpump-runner into master

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/beam gearpump-runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3636


commit 9478f4117de3a2d0ea40614ed4cb801918610724
Author: manuzhang 
Date:   2016-03-15T08:15:16Z

[BEAM-79] add Gearpump runner

commit 02b2248a5b3c8a2c064547d7380bebc97f849bf1
Author: Kenneth Knowles 
Date:   2016-07-20T16:07:48Z

This closes #323

commit 2a0ba61e8507e1539115b583749a78f14d577bd8
Author: Kenneth Knowles 
Date:   2016-08-25T18:36:45Z

Merge branch master into gearpump-runner

commit 1672b5483e029292816397248dc6fe63bf51f4af
Author: manuzhang 
Date:   2016-07-23T06:10:15Z

move integration tests to profile

commit 276a2e106aa1a573fc2eb2426b640f63cf68
Author: manuzhang 
Date:   2016-07-28T08:30:13Z

add package-info.java

commit 40be715a696bb1218b209f7ad9a979b7e5d088d3
Author: Kenneth Knowles 
Date:   2016-08-10T17:26:57Z

Update Gearpump runner version to 0.3.0-incubating

commit bc1b354949416db3b52c4f37c66968bdb86f0813
Author: manuzhang 
Date:   2016-08-11T23:22:00Z

Rename DoFn to OldDoFn in Gearpump runner

commit 091a15a07c7625ae3009cefaecece3a29a34c109
Author: Kenneth Knowles 
Date:   2016-08-25T18:40:03Z

This closess #750

commit fb74c936ed92c7a8548c338cc03957794fc60902
Author: Dan Halperin 
Date:   2016-08-26T23:25:58Z

gearpump: switch to stable version

They have apparently deleted the SNAPSHOT jar and now builds are failing.

commit bf0a2edae11416a3cbddeaff2c0a70adc272c5fe
Author: Dan Halperin 
Date:   2016-08-27T00:46:42Z

Closes #895

commit 89921c41ca9d4c333af45efa32359a631214c1df
Author: bchambers 
Date:   2016-07-29T16:41:17Z

Remove Counter and associated code

Aggregator is the model level concept. Counter was specific to the
Dataflow Runner, and is now not needed as part of Beam.

commit 7fc2c6848f002ac8b2ccbe35e6b5a576777a7af9
Author: Mark Liu 
Date:   2016-08-03T00:25:14Z

[BEAM-495] Create General Verifier for File Checksum

commit b47549e4893a6d487c00ea0ba02619168a3f19f3
Author: Mark Liu 
Date:   2016-08-03T00:47:46Z

Add output checksum to  WordCountITOptions

commit 58cd781c82fa728f34f5ab0641f8f9b6edcf449c
Author: Ian Zhou 
Date:   2016-08-05T22:31:59Z

Added unit tests and error handling in removeTemporaryTables

commit 36a9aa232ea56de449930194788becce585212ef
Author: Thomas Groh 
Date:   2016-08-09T02:09:58Z

Improve Write Error Message

If provided with an Unbounded PCollection, Write will fail due to
restriction of calling finalize only once. This error message fails in a
deep stack trace based on it not being possible to apply a GroupByKey.
Instead, throw immediately on application with a specific error message.
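
The fail-fast idea in this commit message can be sketched as a self-contained model; `IsBounded` and `WriteValidation` below are illustrative names, not the actual Beam `Write` code, which performs this check on the input PCollection at application time:

```java
// Sketch of the fail-fast check described above: reject an unbounded
// input when the transform is applied, with a clear message, instead of
// failing later inside a deep GroupByKey stack trace.
// IsBounded and WriteValidation are illustrative names, not Beam's API.
enum IsBounded { BOUNDED, UNBOUNDED }

class WriteValidation {
  static void checkBoundedness(IsBounded boundedness) {
    if (boundedness == IsBounded.UNBOUNDED) {
      throw new IllegalArgumentException(
          "Write can only be applied to a bounded PCollection");
    }
  }
}
```

The design point is that the exception is raised at pipeline-construction time, where the user's own code is on the stack, rather than at expansion time.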

commit d5641553cebb02f08ca7c1fe667948d39cb3962c
Author: Thomas Groh 
Date:   2016-08-09T17:47:09Z

Remove Streaming Write Overrides in DataflowRunner

[GitHub] beam pull request #3636: [BEAM-79] merge gearpump-runner into master

2017-07-24 Thread manuzhang
GitHub user manuzhang opened a pull request:

https://github.com/apache/beam/pull/3636

[BEAM-79] merge gearpump-runner into master

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/beam gearpump-runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3636


commit 9478f4117de3a2d0ea40614ed4cb801918610724
Author: manuzhang 
Date:   2016-03-15T08:15:16Z

[BEAM-79] add Gearpump runner

commit 02b2248a5b3c8a2c064547d7380bebc97f849bf1
Author: Kenneth Knowles 
Date:   2016-07-20T16:07:48Z

This closes #323

commit 2a0ba61e8507e1539115b583749a78f14d577bd8
Author: Kenneth Knowles 
Date:   2016-08-25T18:36:45Z

Merge branch master into gearpump-runner

commit 1672b5483e029292816397248dc6fe63bf51f4af
Author: manuzhang 
Date:   2016-07-23T06:10:15Z

move integration tests to profile

commit 276a2e106aa1a573fc2eb2426b640f63cf68
Author: manuzhang 
Date:   2016-07-28T08:30:13Z

add package-info.java

commit 40be715a696bb1218b209f7ad9a979b7e5d088d3
Author: Kenneth Knowles 
Date:   2016-08-10T17:26:57Z

Update Gearpump runner version to 0.3.0-incubating

commit bc1b354949416db3b52c4f37c66968bdb86f0813
Author: manuzhang 
Date:   2016-08-11T23:22:00Z

Rename DoFn to OldDoFn in Gearpump runner

commit 091a15a07c7625ae3009cefaecece3a29a34c109
Author: Kenneth Knowles 
Date:   2016-08-25T18:40:03Z

This closess #750

commit fb74c936ed92c7a8548c338cc03957794fc60902
Author: Dan Halperin 
Date:   2016-08-26T23:25:58Z

gearpump: switch to stable version

They have apparently deleted the SNAPSHOT jar and now builds are failing.

commit bf0a2edae11416a3cbddeaff2c0a70adc272c5fe
Author: Dan Halperin 
Date:   2016-08-27T00:46:42Z

Closes #895

commit 89921c41ca9d4c333af45efa32359a631214c1df
Author: bchambers 
Date:   2016-07-29T16:41:17Z

Remove Counter and associated code

Aggregator is the model level concept. Counter was specific to the
Dataflow Runner, and is now not needed as part of Beam.

commit 7fc2c6848f002ac8b2ccbe35e6b5a576777a7af9
Author: Mark Liu 
Date:   2016-08-03T00:25:14Z

[BEAM-495] Create General Verifier for File Checksum

commit b47549e4893a6d487c00ea0ba02619168a3f19f3
Author: Mark Liu 
Date:   2016-08-03T00:47:46Z

Add output checksum to  WordCountITOptions

commit 58cd781c82fa728f34f5ab0641f8f9b6edcf449c
Author: Ian Zhou 
Date:   2016-08-05T22:31:59Z

Added unit tests and error handling in removeTemporaryTables

commit 36a9aa232ea56de449930194788becce585212ef
Author: Thomas Groh 
Date:   2016-08-09T02:09:58Z

Improve Write Error Message

If provided with an Unbounded PCollection, Write will fail due to
restriction of calling finalize only once. This error message fails in a
deep stack trace based on it not being possible to apply a GroupByKey.
Instead, throw immediately on application with a specific error message.

commit d5641553cebb02f08ca7c1fe667948d39cb3962c
Author: Thomas Groh 
Date:   2016-08-09T17:47:09Z

Remove Streaming Write Overrides in DataflowRunner

These writes should be forbidden based on the boundedness of the input
PCollection. As Write explicitly forbids the application of the
transform to an Unbounded PCollection, this will be equivalent in 

[jira] [Commented] (BEAM-2333) Rehydrate Pipeline from Runner API proto

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099405#comment-16099405
 ] 

ASF GitHub Bot commented on BEAM-2333:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3334


> Rehydrate Pipeline from Runner API proto
> 
>
> Key: BEAM-2333
> URL: https://issues.apache.org/jira/browse/BEAM-2333
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: beam-python-everywhere
>






[2/5] beam git commit: Add stub DisplayDataTranslation

2017-07-24 Thread kenn
Add stub DisplayDataTranslation


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/efe2dc17
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/efe2dc17
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/efe2dc17

Branch: refs/heads/master
Commit: efe2dc17af2a66c5c33076467cb46f73bb7fb9ab
Parents: 0064fb3
Author: Kenneth Knowles 
Authored: Wed Jul 19 20:09:52 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 24 18:53:25 2017 -0700

--
 .../construction/DisplayDataTranslation.java| 39 
 .../construction/PTransformTranslation.java |  4 +-
 2 files changed, 42 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/efe2dc17/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java
new file mode 100644
index 000..ff7f9f2
--- /dev/null
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DisplayDataTranslation.java
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction;
+
+import com.google.protobuf.Any;
+import com.google.protobuf.BoolValue;
+import org.apache.beam.sdk.common.runner.v1.RunnerApi;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+
+/** Utilities for going to/from DisplayData protos. */
+public class DisplayDataTranslation {
+  public static RunnerApi.DisplayData toProto(DisplayData displayData) {
+    // TODO https://issues.apache.org/jira/browse/BEAM-2645
+    return RunnerApi.DisplayData.newBuilder()
+        .addItems(
+            RunnerApi.DisplayData.Item.newBuilder()
+                .setId(RunnerApi.DisplayData.Identifier.newBuilder().setKey("stubImplementation"))
+                .setLabel("Stub implementation")
+                .setType(RunnerApi.DisplayData.Type.BOOLEAN)
+                .setValue(Any.pack(BoolValue.newBuilder().setValue(true).build())))
+        .build();
+  }
+}

http://git-wip-us.apache.org/repos/asf/beam/blob/efe2dc17/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
index 3b94724..d459645 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
@@ -33,6 +33,7 @@ import org.apache.beam.sdk.common.runner.v1.RunnerApi;
 import org.apache.beam.sdk.common.runner.v1.RunnerApi.FunctionSpec;
 import org.apache.beam.sdk.runners.AppliedPTransform;
 import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.display.DisplayData;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PInput;
 import org.apache.beam.sdk.values.POutput;
@@ -118,7 +119,8 @@ public class PTransformTranslation {
 }
 
 transformBuilder.setUniqueName(appliedPTransform.getFullName());
-// TODO: Display Data
+    transformBuilder.setDisplayData(
+        DisplayDataTranslation.toProto(DisplayData.from(appliedPTransform.getTransform())));
 
 PTransform transform = appliedPTransform.getTransform();
 // A RawPTransform directly vends its payload. Because it will generally be



[1/5] beam git commit: Fix tests that passed invalid input to DynamicDestinations

2017-07-24 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 0064fb37a -> 01408c864


Fix tests that passed invalid input to DynamicDestinations


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/12c277f3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/12c277f3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/12c277f3

Branch: refs/heads/master
Commit: 12c277f31b2b8a295e0a41cab3290943a3eff7cd
Parents: efe2dc1
Author: Kenneth Knowles 
Authored: Fri Jul 21 20:32:20 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 24 18:53:25 2017 -0700

--
 .../construction/PTransformMatchersTest.java| 25 +-
 .../direct/WriteWithShardingFactoryTest.java| 27 +++-
 .../runners/dataflow/DataflowRunnerTest.java| 24 -
 .../beam/sdk/io/DynamicFileDestinations.java|  6 -
 4 files changed, 78 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/12c277f3/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
--
diff --git 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
index 99d3dd1..316645b 100644
--- 
a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
+++ 
b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PTransformMatchersTest.java
@@ -27,6 +27,7 @@ import com.google.common.base.MoreObjects;
 import com.google.common.collect.ImmutableMap;
 import java.io.Serializable;
 import java.util.Collections;
+import javax.annotation.Nullable;
 import org.apache.beam.sdk.coders.KvCoder;
 import org.apache.beam.sdk.coders.StringUtf8Coder;
 import org.apache.beam.sdk.coders.VarIntCoder;
@@ -62,7 +63,9 @@ import org.apache.beam.sdk.transforms.View;
 import org.apache.beam.sdk.transforms.View.CreatePCollectionView;
 import org.apache.beam.sdk.transforms.ViewFn;
 import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
+import org.apache.beam.sdk.transforms.windowing.PaneInfo;
 import org.apache.beam.sdk.transforms.windowing.Window;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.values.KV;
@@ -547,7 +550,8 @@ public class PTransformMatchersTest implements Serializable {
 WriteFiles write =
 WriteFiles.to(
 new FileBasedSink(
-StaticValueProvider.of(outputDirectory), DynamicFileDestinations.constant(null)) {
+StaticValueProvider.of(outputDirectory),
+DynamicFileDestinations.constant(new FakeFilenamePolicy())) {
   @Override
   public WriteOperation createWriteOperation() {
 return null;
@@ -580,4 +584,23 @@ public class PTransformMatchersTest implements Serializable {
 write,
 p);
   }
+
+  private static class FakeFilenamePolicy extends FilenamePolicy {
+@Override
+public ResourceId windowedFilename(
+int shardNumber,
+int numShards,
+BoundedWindow window,
+PaneInfo paneInfo,
+FileBasedSink.OutputFileHints outputFileHints) {
+  throw new UnsupportedOperationException("should not be called");
+}
+
+@Nullable
+@Override
+public ResourceId unwindowedFilename(
+int shardNumber, int numShards, FileBasedSink.OutputFileHints outputFileHints) {
+  throw new UnsupportedOperationException("should not be called");
+}
+  }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/12c277f3/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java
--
diff --git 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java
 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java
index 546a181..6dd069c 100644
--- 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java
+++ 
b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/WriteWithShardingFactoryTest.java
@@ -36,6 +36,7 @@ import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
 import java.util.UUID;
+import javax.annotation.Nullable;
 

[3/5] beam git commit: Add Pipeline rehydration from proto

2017-07-24 Thread kenn
Add Pipeline rehydration from proto


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/43481595
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/43481595
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/43481595

Branch: refs/heads/master
Commit: 43481595ebc854f4a7188609fd53267497e68124
Parents: 12c277f
Author: Kenneth Knowles 
Authored: Fri May 26 11:22:50 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 24 18:53:26 2017 -0700

--
 .../construction/PTransformTranslation.java |   8 +
 .../core/construction/PipelineTranslation.java  | 280 +++
 .../core/construction/RehydratedComponents.java |   3 +-
 .../core/construction/SdkComponents.java|  52 
 .../construction/PipelineTranslationTest.java   | 199 +
 .../core/construction/SdkComponentsTest.java| 107 ---
 .../src/main/proto/beam_runner_api.proto|   4 +-
 .../main/java/org/apache/beam/sdk/Pipeline.java |  15 +-
 .../beam/sdk/runners/TransformHierarchy.java|  69 +
 9 files changed, 574 insertions(+), 163 deletions(-)
--
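
Conceptually, the dehydrate/rehydrate round trip this change enables can be sketched as below. The real code goes through the Runner API proto; this self-contained model stands in plain strings for the proto, and `PipelineSketch` and its methods are illustrative names, not Beam's API:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of pipeline dehydration/rehydration: a runner should
// operate only on what survives a round trip through the portable form.
class PipelineSketch {
  final List<String> transformNames = new ArrayList<>();

  // "Dehydrate": capture the pipeline as portable data (stands in for
  // the Runner API proto).
  List<String> toPortable() {
    return new ArrayList<>(transformNames);
  }

  // "Rehydrate": rebuild an equivalent pipeline from the portable form.
  static PipelineSketch fromPortable(List<String> portable) {
    PipelineSketch rebuilt = new PipelineSketch();
    rebuilt.transformNames.addAll(portable);
    return rebuilt;
  }
}
```

Running the round trip before `DirectRunner.run()` (as the sibling commit does) is a cheap way to catch anything that does not survive translation.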


http://git-wip-us.apache.org/repos/asf/beam/blob/43481595/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
index d459645..b8365c9 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
@@ -92,6 +92,7 @@ public class PTransformTranslation {
   List subtransforms,
   SdkComponents components)
   throws IOException {
+// TODO include DisplayData https://issues.apache.org/jira/browse/BEAM-2645
RunnerApi.PTransform.Builder transformBuilder = RunnerApi.PTransform.newBuilder();
for (Map.Entry taggedInput : appliedPTransform.getInputs().entrySet()) {
   checkArgument(
@@ -136,6 +137,7 @@ public class PTransformTranslation {
 }
 transformBuilder.setSpec(payload);
   }
+  rawPTransform.registerComponents(components);
 } else if (KNOWN_PAYLOAD_TRANSLATORS.containsKey(transform.getClass())) {
   FunctionSpec payload =
   KNOWN_PAYLOAD_TRANSLATORS
@@ -225,6 +227,8 @@ public class PTransformTranslation {
 public Any getPayload() {
   return null;
 }
+
+public void registerComponents(SdkComponents components) {}
   }
 
   /**
@@ -255,6 +259,10 @@ public class PTransformTranslation {
 transformSpec.setParameter(payload);
   }
 
+  // Transforms like Combine may have Coders that need to be added but do not
+  // occur in a black-box traversal
+  transform.getTransform().registerComponents(components);
+
   return transformSpec.build();
 }
   }

http://git-wip-us.apache.org/repos/asf/beam/blob/43481595/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
--
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
new file mode 100644
index 000..9e4839a
--- /dev/null
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineTranslation.java
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.core.construction;
+
+import static 

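The `registerComponents` hook added in this commit exists because a black-box traversal of a composite transform's inputs and outputs cannot see every coder the transform uses; the in-diff comment calls out Combine, whose accumulator coder never appears on a graph edge. A toy sketch of that idea, using made-up stand-ins (`Components`, `ToyCombine`) rather than Beam's real `SdkComponents` API:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Made-up stand-in for SdkComponents: a registry of coder ids.
class Components {
    final Set<String> coders = new LinkedHashSet<>();

    void registerCoder(String coderId) {
        coders.add(coderId);
    }
}

// Made-up Combine-like transform: the accumulator coder is used internally
// but never appears on an input or output edge, so edge traversal misses it.
class ToyCombine {
    final String inputCoder = "VarIntCoder";
    final String outputCoder = "DoubleCoder";
    final String accumCoder = "MeanAccumCoder";  // internal only

    // What a black-box traversal of the graph can see: just the edges.
    void registerEdgeCoders(Components c) {
        c.registerCoder(inputCoder);
        c.registerCoder(outputCoder);
    }

    // The explicit hook: components that do not occur on any edge.
    void registerComponents(Components c) {
        c.registerCoder(accumCoder);
    }
}
```

Without the explicit hook, translating the toy transform would silently drop the accumulator coder from the proto's component set.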
[4/5] beam git commit: Dehydrate then rehydrate Pipeline before DirectRunner.run()

2017-07-24 Thread kenn
Dehydrate then rehydrate Pipeline before DirectRunner.run()


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8ca45915
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8ca45915
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8ca45915

Branch: refs/heads/master
Commit: 8ca45915693839edb14f824fa6835ebe3e67
Parents: 4348159
Author: Kenneth Knowles 
Authored: Fri May 26 11:23:05 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 24 18:53:26 2017 -0700

--
 .../beam/runners/direct/DirectRunner.java   | 11 +-
 .../runners/direct/ViewOverrideFactoryTest.java | 41 
 2 files changed, 10 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/8ca45915/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
--
diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
index 4621224..c5f29e5 100644
--- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
+++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
@@ -22,6 +22,7 @@ import com.google.common.base.Supplier;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.ImmutableSet;
+import java.io.IOException;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.EnumSet;
@@ -31,6 +32,7 @@ import java.util.Set;
 import org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems;
 import org.apache.beam.runners.core.construction.PTransformMatchers;
 import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.core.construction.PipelineTranslation;
 import org.apache.beam.runners.core.construction.SplittableParDo;
 import org.apache.beam.runners.direct.DirectRunner.DirectPipelineResult;
 import 
org.apache.beam.runners.direct.TestStreamEvaluatorFactory.DirectTestStreamFactory;
@@ -156,7 +158,14 @@ public class DirectRunner extends PipelineRunner<DirectPipelineResult> {
   }
 
   @Override
-  public DirectPipelineResult run(Pipeline pipeline) {
+  public DirectPipelineResult run(Pipeline originalPipeline) {
+Pipeline pipeline;
+try {
+  pipeline = PipelineTranslation.fromProto(
+  PipelineTranslation.toProto(originalPipeline));
+} catch (IOException exception) {
+  throw new RuntimeException("Error preparing pipeline for direct execution.", exception);
+}
 pipeline.replaceAll(defaultTransformOverrides());
 MetricsEnvironment.setMetricsSupported(true);
 DirectGraphVisitor graphVisitor = new DirectGraphVisitor();

http://git-wip-us.apache.org/repos/asf/beam/blob/8ca45915/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewOverrideFactoryTest.java
--
diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewOverrideFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewOverrideFactoryTest.java
index 6af9273..94d8d70 100644
--- a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewOverrideFactoryTest.java
+++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/ViewOverrideFactoryTest.java
@@ -23,22 +23,17 @@ import static org.hamcrest.Matchers.hasSize;
 import static org.hamcrest.Matchers.is;
 import static org.junit.Assert.assertThat;
 
-import com.google.common.collect.ImmutableSet;
 import java.io.Serializable;
 import java.util.List;
-import java.util.Set;
 import java.util.concurrent.atomic.AtomicBoolean;
 import org.apache.beam.runners.direct.ViewOverrideFactory.WriteView;
 import org.apache.beam.sdk.Pipeline.PipelineVisitor;
 import org.apache.beam.sdk.runners.AppliedPTransform;
 import 
org.apache.beam.sdk.runners.PTransformOverrideFactory.PTransformReplacement;
 import org.apache.beam.sdk.runners.TransformHierarchy.Node;
-import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.Create;
-import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.transforms.ParDo;
 import org.apache.beam.sdk.transforms.View.CreatePCollectionView;
 import org.apache.beam.sdk.transforms.ViewFn;
 import org.apache.beam.sdk.transforms.windowing.WindowMappingFn;
@@ -62,42 +57,6 @@ public class ViewOverrideFactoryTest implements Serializable {
   new ViewOverrideFactory<>();
 
   @Test
-  public void 

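The change above makes `DirectRunner.run()` round-trip the pipeline through its Runner API proto before executing it, so anything that cannot survive translation fails at submission time instead of diverging later on a portable runner. A minimal stand-alone sketch of the same dehydrate-then-rehydrate pattern, using plain Java serialization on a made-up `ToyPipeline` instead of Beam's `PipelineTranslation`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Toy pipeline spec; Serializable stands in for the Runner API proto.
class ToyPipeline implements Serializable {
    final String rootTransform;

    ToyPipeline(String rootTransform) {
        this.rootTransform = rootTransform;
    }

    // "toProto": flatten the pipeline to a portable byte representation.
    static byte[] dehydrate(ToyPipeline p) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(p);
        } catch (IOException e) {
            throw new RuntimeException("Error preparing pipeline for direct execution.", e);
        }
        return bos.toByteArray();
    }

    // "fromProto": rebuild a fresh pipeline object from the bytes.
    static ToyPipeline rehydrate(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ToyPipeline) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException("Error preparing pipeline for direct execution.", e);
        }
    }

    // Mirrors the new run(): execute a rehydrated copy, never the original.
    static String run(ToyPipeline original) {
        ToyPipeline pipeline = rehydrate(dehydrate(original));
        return "ran " + pipeline.rootTransform;
    }
}
```

The runner then applies its transform overrides to the rehydrated copy, which is a distinct object from the one the user constructed.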
[GitHub] beam pull request #3334: [BEAM-2333] Go to proto and back before running a p...

2017-07-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3334


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[5/5] beam git commit: This closes #3334: [BEAM-2333] Go to proto and back before running a pipeline in Java DirectRunner

2017-07-24 Thread kenn
This closes #3334: [BEAM-2333] Go to proto and back before running a pipeline 
in Java DirectRunner

  Dehydrate then rehydrate Pipeline before DirectRunner.run()
  Add Pipeline rehydration from proto
  Fix tests that passed invalid input to DynamicDestinations
  Add stub DisplayDataTranslation


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/01408c86
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/01408c86
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/01408c86

Branch: refs/heads/master
Commit: 01408c864e9d844f4ffb74cc3f18276ff6a5c447
Parents: 0064fb3 8ca4591
Author: Kenneth Knowles 
Authored: Mon Jul 24 18:59:57 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 24 18:59:57 2017 -0700

--
 .../construction/DisplayDataTranslation.java|  39 +++
 .../construction/PTransformTranslation.java |  12 +-
 .../core/construction/PipelineTranslation.java  | 280 +++
 .../core/construction/RehydratedComponents.java |   3 +-
 .../core/construction/SdkComponents.java|  52 
 .../construction/PTransformMatchersTest.java|  25 +-
 .../construction/PipelineTranslationTest.java   | 199 +
 .../core/construction/SdkComponentsTest.java| 107 ---
 .../beam/runners/direct/DirectRunner.java   |  11 +-
 .../runners/direct/ViewOverrideFactoryTest.java |  41 ---
 .../direct/WriteWithShardingFactoryTest.java|  27 +-
 .../runners/dataflow/DataflowRunnerTest.java|  24 +-
 .../src/main/proto/beam_runner_api.proto|   4 +-
 .../main/java/org/apache/beam/sdk/Pipeline.java |  15 +-
 .../beam/sdk/io/DynamicFileDestinations.java|   6 +-
 .../beam/sdk/runners/TransformHierarchy.java|  69 +
 16 files changed, 704 insertions(+), 210 deletions(-)
--




[GitHub] beam pull request #3635: Runner api read good

2017-07-24 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3635

Runner api read good

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-api-read-good

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3635.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3635


commit 617d3aff0da7d28448f862e4478c62d2810088d8
Author: Robert Bradshaw 
Date:   2017-07-17T23:51:52Z

Use runner API for read operation.

commit 61db58c4022cbcb505be97b6c9877592b725aacb
Author: Robert Bradshaw 
Date:   2017-07-25T00:19:47Z

Disable abc metaclass due to issues with pickling.






[jira] [Commented] (BEAM-1820) Source.getDefaultOutputCoder() should be @Nullable

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099290#comment-16099290
 ] 

ASF GitHub Bot commented on BEAM-1820:
--

GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3634

[BEAM-1820] Source.getDefaultOutputCoder() throws 
CannotProvideCoderException

This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but 
it turned out that a lot more needs to be done to fix it properly, and the 
necessary changes require knowledge of sufficiently dark corners of the SDK and 
runners that asking an external contributor to do this is unfair, so I did the 
changes myself.

TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to 
succeed.

Source.getDefaultOutputCoder is only a hint for inferring the Coder of its 
elements, and it should not be required to succeed.

E.g. when a runner is replacing a Read.from(source) transform, and the 
override needs to know a coder for elements of the source, if the source 
doesn't provide a coder, the user may have set a coder on the returned 
PCollection explicitly. In this case, the runner should use that coder.

In other cases, when an API uses a Source and needs its coder, it must let 
the caller provide a Coder explicitly (e.g. SourceTestUtils).

I am treating this as a backwards-compatible change because 
Source.getDefaultOutputCoder() is a runner-facing rather than user-facing API. 
From the point of view of a user implementing a Source, adding a throws to the 
base class method signature is compatible.

R: @tgroh 
CC: @lgajowy


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam source-coder-fallback

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3634


commit 830c5657736ffe49e06e16039f947e7227fa45a7
Author: Eugene Kirpichov 
Date:   2017-07-24T23:38:27Z

[BEAM-1820] Source.getDefaultOutputCoder() throws 
CannotProvideCoderException

This is based on https://github.com/apache/beam/pull/3549 by @lgajowy,
but it turned out that a lot more needs to be done to fix it properly,
and the necessary changes require knowledge of sufficiently dark
corners of the SDK and runners that asking an external contributor
to do this is unfair, so I did the changes myself.

TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to 
succeed.

Source.getDefaultOutputCoder is only a hint for inferring the Coder
of its elements, and it should not be required to succeed.

E.g. when a runner is replacing a Read.from(source) transform,
and the override needs to know a coder for elements of the source,
if the source doesn't provide a coder, the user may have set
a coder on the returned PCollection explicitly. In this case,
the runner should use that coder.

In other cases, when an API uses a Source and needs its coder,
it must let the caller provide a Coder explicitly (e.g.
SourceTestUtils).
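The fallback rule described above can be sketched with made-up types (illustrative stand-ins, not Beam's real `Source`/`PCollection` API): prefer a coder set explicitly on the output, fall back to the source's hint, and fail only when neither is available.

```java
// Toy stand-in for a Source whose coder is only a hint (null = cannot infer).
class ToySource {
    final String defaultCoder;

    ToySource(String defaultCoder) {
        this.defaultCoder = defaultCoder;
    }
}

class CoderResolution {
    // Resolution order sketched from the PR description: the coder set on the
    // output PCollection wins; the source's default is only a fallback hint.
    static String resolve(String explicitCoder, ToySource source) {
        if (explicitCoder != null) {
            return explicitCoder;
        }
        if (source.defaultCoder != null) {
            return source.defaultCoder;
        }
        throw new IllegalStateException(
            "Unable to provide a coder: set one on the output explicitly.");
    }
}
```

Under this rule a runner replacing `Read.from(source)` still works when the source cannot provide a coder, as long as the user set one on the returned collection.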




> Source.getDefaultOutputCoder() should be @Nullable
> --
>
> Key: BEAM-1820
> URL: https://issues.apache.org/jira/browse/BEAM-1820
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Łukasz Gajowy
>  Labels: easyfix, starter
>
> Source.getDefaultOutputCoder() returns a coder for elements produced by the 
> source.
> However, the Source objects are nearly always hidden from the user and 
> instead encapsulated in a transform. Often, an enclosing transform has a 
> better idea of what coder should be used to encode these elements (e.g. a 
> user supplied a Coder to that transform's configuration). In that case, it'd 
> be good if Source.getDefaultOutputCoder() could just return null, and coder 
> would have to be handled by the enclosing transform or perhaps specified on 
> the output of that transform explicitly.
> Right now there's a bunch of code in the SDK and runners that assumes 
> Source.getDefaultOutputCoder() returns non-null. That code would need to be 
> fixed to instead use the coder set on the collection produced by 
> Read.from(source).
> It all appears pretty easy to fix, so this is a good starter item.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3634: [BEAM-1820] Source.getDefaultOutputCoder() throws C...

2017-07-24 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3634

[BEAM-1820] Source.getDefaultOutputCoder() throws 
CannotProvideCoderException

This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but 
it turned out that a lot more needs to be done to fix it properly, and the 
necessary changes require knowledge of sufficiently dark corners of the SDK and 
runners that asking an external contributor to do this is unfair, so I did the 
changes myself.

TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to 
succeed.

Source.getDefaultOutputCoder is only a hint for inferring the Coder of its 
elements, and it should not be required to succeed.

E.g. when a runner is replacing a Read.from(source) transform, and the 
override needs to know a coder for elements of the source, if the source 
doesn't provide a coder, the user may have set a coder on the returned 
PCollection explicitly. In this case, the runner should use that coder.

In other cases, when an API uses a Source and needs its coder, it must let 
the caller provide a Coder explicitly (e.g. SourceTestUtils).

I am treating this as a backwards-compatible change because 
Source.getDefaultOutputCoder() is a runner-facing rather than user-facing API. 
From the point of view of a user implementing a Source, adding a throws to the 
base class method signature is compatible.

R: @tgroh 
CC: @lgajowy


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam source-coder-fallback

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3634


commit 830c5657736ffe49e06e16039f947e7227fa45a7
Author: Eugene Kirpichov 
Date:   2017-07-24T23:38:27Z

[BEAM-1820] Source.getDefaultOutputCoder() throws 
CannotProvideCoderException

This is based on https://github.com/apache/beam/pull/3549 by @lgajowy,
but it turned out that a lot more needs to be done to fix it properly,
and the necessary changes require knowledge of sufficiently dark
corners of the SDK and runners that asking an external contributor
to do this is unfair, so I did the changes myself.

TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to 
succeed.

Source.getDefaultOutputCoder is only a hint for inferring the Coder
of its elements, and it should not be required to succeed.

E.g. when a runner is replacing a Read.from(source) transform,
and the override needs to know a coder for elements of the source,
if the source doesn't provide a coder, the user may have set
a coder on the returned PCollection explicitly. In this case,
the runner should use that coder.

In other cases, when an API uses a Source and needs its coder,
it must let the caller provide a Coder explicitly (e.g.
SourceTestUtils).






[jira] [Commented] (BEAM-2676) move BeamSqlRow and BeamSqlRowType to sdk/java/core

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099249#comment-16099249
 ] 

ASF GitHub Bot commented on BEAM-2676:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3633

[BEAM-2676] move BeamSqlRow and BeamSqlRowType to sdk/java/core

It contains two parts:
1. remove the SQL word from the names:
`BeamSqlRow` --> `BeamRow`
`BeamSqlRowType` --> `BeamRowType`

2. move from package `org.apache.beam.dsls.sql.schema` to `org.apache.beam.sdk.sd` (**sd** stands for structured data), in module `beam-sdks-java-core`

*Hint*: 
The following 4 files were changed to remove dependencies on Calcite/BeamSql; the others were updated automatically by the IDE due to the class renames:
* BeamRow.java
* BeamRowCoder.java
* BeamRowType.java
* BeamSqlRowCoderTest.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2676

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3633


commit 8e9382862cfac475db286b35d5cab379de83615d
Author: mingmxu 
Date:   2017-07-24T23:03:35Z

move BeamSqlRow and BeamSqlRowType to sdk/java/core




> move BeamSqlRow and BeamSqlRowType to sdk/java/core
> ---
>
> Key: BEAM-2676
> URL: https://issues.apache.org/jira/browse/BEAM-2676
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>
> BeamSqlRow/BeamSqlRowType is the fundamental of structured data processing in 
> Beam, like joins, simple projections/expansions. It's more visible to move 
> them to sdk-java-core.
> It contains two parts:
> 1). remove SQL word in the name,
> BeamSqlRow --> BeamRow
> BeamSqlRowType --> BeamRowType
> 2). move from package {{org.apache.beam.dsls.sql.schema}} to 
> {{org.apache.beam.sdk.sd}} (sd stands for structure data), in module 
> {{beam-sdks-java-core}}





[GitHub] beam pull request #3633: [BEAM-2676] move BeamSqlRow and BeamSqlRowType to s...

2017-07-24 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3633

[BEAM-2676] move BeamSqlRow and BeamSqlRowType to sdk/java/core

It contains two parts:
1. remove the SQL word from the names:
`BeamSqlRow` --> `BeamRow`
`BeamSqlRowType` --> `BeamRowType`

2. move from package `org.apache.beam.dsls.sql.schema` to `org.apache.beam.sdk.sd` (**sd** stands for structured data), in module `beam-sdks-java-core`

*Hint*: 
The following 4 files were changed to remove dependencies on Calcite/BeamSql; the others were updated automatically by the IDE due to the class renames:
* BeamRow.java
* BeamRowCoder.java
* BeamRowType.java
* BeamSqlRowCoderTest.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2676

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3633


commit 8e9382862cfac475db286b35d5cab379de83615d
Author: mingmxu 
Date:   2017-07-24T23:03:35Z

move BeamSqlRow and BeamSqlRowType to sdk/java/core






[jira] [Commented] (BEAM-2677) AvroIO.read without specifying a schema

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099182#comment-16099182
 ] 

ASF GitHub Bot commented on BEAM-2677:
--

GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3632

[BEAM-2677] AvroIO.parseGenericRecords - schemaless AvroIO.read

To be done properly, this PR needs https://github.com/apache/beam/pull/3549.

R: @mairbek CC: @reuvenlax (please take a look at the API but hold off a 
full review until that PR is submitted)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam avroio-dynamic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3632


commit 8bcafd94a38d9ccb962acd809536ff4bfcf70036
Author: Eugene Kirpichov 
Date:   2017-07-24T22:07:15Z

[BEAM-2677] AvroIO.parseGenericRecords - schemaless AvroIO.read




> AvroIO.read without specifying a schema
> ---
>
> Key: BEAM-2677
> URL: https://issues.apache.org/jira/browse/BEAM-2677
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> Sometimes it is inconvenient to require the user of AvroIO.read/readAll to 
> specify a Schema for the Avro files they are reading, especially if different 
> files may have different schemas.
> It is possible to read GenericRecord objects from an Avro file, however it is 
> not possible to provide a Coder for GenericRecord without knowing the schema: 
> a GenericRecord knows its schema so we can encode it into a byte array, but 
> we can not decode it from a byte array without knowing the schema (and 
> encoding the full schema together with every record would be impractical).
> Instead, a reasonable approach is to treat schemaless GenericRecord as 
> unencodable and use the same approach as JdbcIO - a user-specified parse 
> callback.
> Suggested API: AvroIO.parseGenericRecords(SerializableFunction<GenericRecord, T> parseFn).from(filepattern).
> CC: [~mkhadikov] [~reuvenlax]





[GitHub] beam pull request #3632: [BEAM-2677] AvroIO.parseGenericRecords - schemaless...

2017-07-24 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3632

[BEAM-2677] AvroIO.parseGenericRecords - schemaless AvroIO.read

To be done properly, this PR needs https://github.com/apache/beam/pull/3549.

R: @mairbek CC: @reuvenlax (please take a look at the API but hold off a 
full review until that PR is submitted)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam avroio-dynamic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3632


commit 8bcafd94a38d9ccb962acd809536ff4bfcf70036
Author: Eugene Kirpichov 
Date:   2017-07-24T22:07:15Z

[BEAM-2677] AvroIO.parseGenericRecords - schemaless AvroIO.read






[jira] [Created] (BEAM-2677) AvroIO.read without specifying a schema

2017-07-24 Thread Eugene Kirpichov (JIRA)
Eugene Kirpichov created BEAM-2677:
--

 Summary: AvroIO.read without specifying a schema
 Key: BEAM-2677
 URL: https://issues.apache.org/jira/browse/BEAM-2677
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Eugene Kirpichov


Sometimes it is inconvenient to require the user of AvroIO.read/readAll to 
specify a Schema for the Avro files they are reading, especially if different 
files may have different schemas.

It is possible to read GenericRecord objects from an Avro file, however it is 
not possible to provide a Coder for GenericRecord without knowing the schema: a 
GenericRecord knows its schema so we can encode it into a byte array, but we 
can not decode it from a byte array without knowing the schema (and encoding 
the full schema together with every record would be impractical).

Instead, a reasonable approach is to treat schemaless GenericRecord as 
unencodable and use the same approach as JdbcIO - a user-specified parse 
callback.

Suggested API: AvroIO.parseGenericRecords(SerializableFunction<GenericRecord, T> parseFn).from(filepattern).

CC: [~mkhadikov] [~reuvenlax]
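The encode/decode asymmetry described in this issue is the crux: a record that carries its schema can be encoded, but opaque bytes cannot be decoded without a schema, which is why an eager parse callback is proposed instead of a GenericRecord coder. A toy model of that asymmetry (made-up classes, not Avro's real API):

```java
import java.util.function.Function;

// Toy record that knows its own schema, standing in for Avro's GenericRecord.
class ToyRecord {
    final String schema;
    final String value;

    ToyRecord(String schema, String value) {
        this.schema = schema;
        this.value = value;
    }

    // Encoding works: the record carries its schema with it.
    String encode() {
        return schema + ":" + value;
    }

    // Decoding requires the schema up front; without it the bytes are opaque.
    static ToyRecord decode(String schema, String payload) {
        if (!payload.startsWith(schema + ":")) {
            throw new IllegalArgumentException("payload does not match schema " + schema);
        }
        return new ToyRecord(schema, payload.substring(schema.length() + 1));
    }

    // The proposed alternative: parse each record to an encodable type T
    // immediately, so no schemaless record ever needs a coder downstream.
    static <T> T parse(ToyRecord record, Function<ToyRecord, T> parseFn) {
        return parseFn.apply(record);
    }
}
```

Because `parse` runs while the record (and thus its schema) is still in hand, the downstream pipeline only ever has to encode the user's chosen type `T`.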





[jira] [Resolved] (BEAM-2442) DSL SQL: Public classes/methods should not expose/use calcite types

2017-07-24 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-2442.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

> DSL SQL: Public classes/methods should not expose/use calcite types
> ---
>
> Key: BEAM-2442
> URL: https://issues.apache.org/jira/browse/BEAM-2442
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>Assignee: James Xu
>  Labels: dsl_sql_merge
> Fix For: 2.2.0
>
>
> Calcite is an internal implementation detail of how Beam SQL is operating. To 
> prevent a hard dependence on Calcite, public methods and classes should not 
> rely on consuming/producing Calcite types.
> For example, BeamSqlRecordType uses org.apache.calcite.sql.type.SqlTypeName 
> instead of using the Java SQL types 
> (https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html).
> This task is to create an ApiSurfaceTest to help find, fix, and prevent 
> org.apache.calcite.* from being exposed. Example ApiSurfaceTest: 
> https://github.com/apache/beam/blob/367fcb28d544934797d25cb34d54136b2d7d6e99/sdks/java/core/src/test/java/org/apache/beam/SdkCoreApiSurfaceTest.java





[jira] [Updated] (BEAM-2676) move BeamSqlRow and BeamSqlRowType to sdk/java/core

2017-07-24 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-2676:
-
Description: 
BeamSqlRow/BeamSqlRowType is the fundamental of structured data processing in 
Beam, like joins, simple projections/expansions. It's more visible to move them 
to sdk-java-core.

It contains two parts:
1). remove SQL word in the name,
BeamSqlRow --> BeamRow
BeamSqlRowType --> BeamRowType

2). move from package {{org.apache.beam.dsls.sql.schema}} to 
{{org.apache.beam.sdk.sd}} (sd stands for structure data), in module 
{{beam-sdks-java-core}}


  was:
BeamSqlRow/BeamSqlRowType is the fundamental of structured data processing in 
Beam, like joins, simple projections/expansions. It's more visible to move them 
to sdk-java-core.

It contains two parts:
1). remove SQL word in the name,
BeamSqlRow --> BeamRow
BeamSqlRowType --> BeamRowType

2). move from package {{org.apache.beam.dsls.sql.schema}} to 
{{org.apache.beam.sdk.sd}}. (sd stands for structure data)



> move BeamSqlRow and BeamSqlRowType to sdk/java/core
> ---
>
> Key: BEAM-2676
> URL: https://issues.apache.org/jira/browse/BEAM-2676
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>
> BeamSqlRow/BeamSqlRowType is the fundamental of structured data processing in 
> Beam, like joins, simple projections/expansions. It's more visible to move 
> them to sdk-java-core.
> It contains two parts:
> 1). remove SQL word in the name,
> BeamSqlRow --> BeamRow
> BeamSqlRowType --> BeamRowType
> 2). move from package {{org.apache.beam.dsls.sql.schema}} to 
> {{org.apache.beam.sdk.sd}} (sd stands for structure data), in module 
> {{beam-sdks-java-core}}





[jira] [Resolved] (BEAM-2440) DSL SQL: Reduce visibility to simplify backwards compatibility

2017-07-24 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-2440.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

> DSL SQL: Reduce visibility to simplify backwards compatibility
> --
>
> Key: BEAM-2440
> URL: https://issues.apache.org/jira/browse/BEAM-2440
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>Assignee: James Xu
>  Labels: dsl_sql_merge
> Fix For: 2.2.0
>
>
> The package namespace should be flattened into one java package and 
> everything made package private except for the public classes like BeamSql, 
> BeamSqlCli, BeamSqlRow, BeamSqlRowCoder, ...
> This will simplify the backwards compatibility story after merging since it 
> reduces the visible surface that users can interact with.





[jira] [Commented] (BEAM-2676) move BeamSqlRow and BeamSqlRowType to sdk/java/core

2017-07-24 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099155#comment-16099155
 ] 

Xu Mingmin commented on BEAM-2676:
--

[~robertwb] [~lcwik], any comments?

> move BeamSqlRow and BeamSqlRowType to sdk/java/core
> ---
>
> Key: BEAM-2676
> URL: https://issues.apache.org/jira/browse/BEAM-2676
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>
> BeamSqlRow/BeamSqlRowType is the fundamental of structured data processing in 
> Beam, like joins, simple projections/expansions. It's more visible to move 
> them to sdk-java-core.
> It contains two parts:
> 1). remove SQL word in the name,
> BeamSqlRow --> BeamRow
> BeamSqlRowType --> BeamRowType
> 2). move from package {{org.apache.beam.dsls.sql.schema}} to 
> {{org.apache.beam.sdk.sd}}. (sd stands for structure data)





[jira] [Created] (BEAM-2676) move BeamSqlRow and BeamSqlRowType to sdk/java/core

2017-07-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2676:


 Summary: move BeamSqlRow and BeamSqlRowType to sdk/java/core
 Key: BEAM-2676
 URL: https://issues.apache.org/jira/browse/BEAM-2676
 Project: Beam
  Issue Type: Test
  Components: dsl-sql
Reporter: Xu Mingmin
Assignee: Xu Mingmin


BeamSqlRow/BeamSqlRowType is the fundamental of structured data processing in 
Beam, like joins, simple projections/expansions. It's more visible to move them 
to sdk-java-core.

It contains two parts:
1). remove SQL word in the name,
BeamSqlRow --> BeamRow
BeamSqlRowType --> BeamRowType

2). move from package {{org.apache.beam.dsls.sql.schema}} to 
{{org.apache.beam.sdk.sd}}. (sd stands for structure data)






[jira] [Commented] (BEAM-2675) Consolidate ValidatesRunner Groovy Definitions

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099142#comment-16099142
 ] 

ASF GitHub Bot commented on BEAM-2675:
--

GitHub user jasonkuster opened a pull request:

https://github.com/apache/beam/pull/3631

[BEAM-2675] Consolidate ValidatesRunner suite building into one file.

Signed-off-by: Jason Kuster 

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject 
line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

This change reduces duplication in the ValidatesRunner tests and avoids the 
proliferation of additional files in favor of the minimum information necessary 
(runner name, maven args to run) for the test.

R: @kennknowles 

Feel free to punt to someone else -- assigning to you as you've been 
working with the ValidatesRunner tests for a while.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jasonkuster/beam vr-consolidate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3631


commit f48be1487cb69294460541a09cea4a27819f32fe
Author: Jason Kuster 
Date:   2017-07-24T21:12:18Z

Consolidate ValidatesRunner suite building into one file.

Signed-off-by: Jason Kuster 




> Consolidate ValidatesRunner Groovy Definitions
> --
>
> Key: BEAM-2675
> URL: https://issues.apache.org/jira/browse/BEAM-2675
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>






[GitHub] beam pull request #3631: [BEAM-2675] Consolidate ValidatesRunner suite build...

2017-07-24 Thread jasonkuster
GitHub user jasonkuster opened a pull request:

https://github.com/apache/beam/pull/3631

[BEAM-2675] Consolidate ValidatesRunner suite building into one file.

Signed-off-by: Jason Kuster 

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [x] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [x] Each commit in the pull request should have a meaningful subject 
line and body.
 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [x] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [x] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

This change reduces duplication in the ValidatesRunner tests and avoids the 
proliferation of additional files in favor of the minimum information necessary 
(runner name, maven args to run) for the test.

R: @kennknowles 

Feel free to punt to someone else -- assigning to you as you've been 
working with the ValidatesRunner tests for a while.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jasonkuster/beam vr-consolidate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3631


commit f48be1487cb69294460541a09cea4a27819f32fe
Author: Jason Kuster 
Date:   2017-07-24T21:12:18Z

Consolidate ValidatesRunner suite building into one file.

Signed-off-by: Jason Kuster 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2675) Consolidate ValidatesRunner Groovy Definitions

2017-07-24 Thread Jason Kuster (JIRA)
Jason Kuster created BEAM-2675:
--

 Summary: Consolidate ValidatesRunner Groovy Definitions
 Key: BEAM-2675
 URL: https://issues.apache.org/jira/browse/BEAM-2675
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Jason Kuster
Assignee: Jason Kuster








[jira] [Commented] (BEAM-2673) BigQuery Sink should use the Load API

2017-07-24 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099103#comment-16099103
 ] 

Sourabh Bajaj commented on BEAM-2673:
-

No, as there is no truncate mode in streaming; you can only append to or create 
a new table.

> BigQuery Sink should use the Load API
> -
>
> Key: BEAM-2673
> URL: https://issues.apache.org/jira/browse/BEAM-2673
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Ahmet Altay
>
> Currently the BigQuery sink is written to by using the streaming api in the 
> direct runner. Instead we should just use the load api and also simplify the 
> management of different create and write disposition. 





[jira] [Commented] (BEAM-2673) BigQuery Sink should use the Load API

2017-07-24 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099098#comment-16099098
 ] 

Ahmet Altay commented on BEAM-2673:
---

Isn't this also hit when Python streaming is used with the DataflowRunner?

> BigQuery Sink should use the Load API
> -
>
> Key: BEAM-2673
> URL: https://issues.apache.org/jira/browse/BEAM-2673
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Ahmet Altay
>
> Currently the BigQuery sink is written to by using the streaming api in the 
> direct runner. Instead we should just use the load api and also simplify the 
> management of different create and write disposition. 





[jira] [Commented] (BEAM-2445) DSL SQL to use service locator pattern to automatically register UDFs

2017-07-24 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099079#comment-16099079
 ] 

Xu Mingmin commented on BEAM-2445:
--

Talked with Luke; we plan to enable the service loader for UDFs in both the CLI 
and the DSL:
1). for the CLI
The service loader is invoked by
{code}
add jar [load] [force|ignore|error] /path/to/jar
{code}
which loads all functions automatically and handles any potential conflict 
according to {{[force|ignore|error]}}.

Meanwhile, the traditional usage is kept for users who prefer it, and as a way 
to resolve conflicts from auto-loading:
{code}
add jar /path/to/jar
add function fun_name path.to.udf.class
{code}

2). for the DSL
The DSL can also benefit from this service loader, avoiding a long chain of 
{{withUdf().withUdf().withUdf()...}}

The service is proposed as 
{code}
interface UdfRegistrar {
  Map udfsByName();
}
{code}
UDF developers can implement it to enable the service loader.
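The registration-with-conflict-policy idea described above can be sketched as a small registry. This is a hypothetical illustration (Beam's actual implementation is Java and uses `ServiceLoader`); the class and method names below are invented for the sketch, assuming the `force|ignore|error` semantics from the CLI discussion.

```python
# Hypothetical sketch (not Beam's actual API) of a UDF registry that
# resolves name conflicts with the force/ignore/error policies above.

class UdfConflictError(Exception):
    pass

class UdfRegistry:
    def __init__(self):
        self._udfs = {}

    def register_all(self, udfs_by_name, policy="error"):
        """Register a batch of UDFs, mimicking 'add jar [force|ignore|error]'."""
        for name, fn in udfs_by_name.items():
            if name in self._udfs:
                if policy == "force":      # overwrite the existing UDF
                    self._udfs[name] = fn
                elif policy == "ignore":   # keep the existing UDF
                    continue
                else:                      # 'error': refuse the conflict
                    raise UdfConflictError(name)
            else:
                self._udfs[name] = fn

    def lookup(self, name):
        return self._udfs[name]

registry = UdfRegistry()
registry.register_all({"square": lambda x: x * x})
# A second jar defining the same name is skipped under 'ignore'.
registry.register_all({"square": lambda x: x ** 2}, policy="ignore")
print(registry.lookup("square")(3))  # → 9
```

The explicit `add function fun_name path.to.udf.class` form then maps to a single-entry `register_all` call with the `force` policy.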

> DSL SQL to use service locator pattern to automatically register UDFs
> -
>
> Key: BEAM-2445
> URL: https://issues.apache.org/jira/browse/BEAM-2445
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>Assignee: James Xu
>Priority: Minor
>
> Use a service locator pattern to find UDFs that can be registered. The 
> service loader can be used to register UDFs for standard functions via DSL 
> SQL, additional UDFs registered by third party libraries, and end user 
> created UDFs.
> Example ServiceLoader usage within Apache Beam to find coder providers:
> https://github.com/apache/beam/blob/7126fdc6ee5671e99a2dede3f25ba616aa0e8fa4/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L147





[jira] [Commented] (BEAM-2673) BigQuery Sink should use the Load API

2017-07-24 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099077#comment-16099077
 ] 

Chamikara Jayalath commented on BEAM-2673:
--

I think the fix here is to add a new BQ sink (which will work for both the 
Direct and Dataflow runners), which will be a considerably large change. I 
agree that we should prioritize this, but I'm not sure 2.2.0 is viable.

Also, IIUC, the delay currently only gets hit in the DirectRunner when using 
WRITE_TRUNCATE, which is not that severe.

> BigQuery Sink should use the Load API
> -
>
> Key: BEAM-2673
> URL: https://issues.apache.org/jira/browse/BEAM-2673
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Ahmet Altay
>
> Currently the BigQuery sink is written to by using the streaming api in the 
> direct runner. Instead we should just use the load api and also simplify the 
> management of different create and write disposition. 





[jira] [Created] (BEAM-2674) Runner API translators should own their rehydration

2017-07-24 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-2674:
-

 Summary: Runner API translators should own their rehydration
 Key: BEAM-2674
 URL: https://issues.apache.org/jira/browse/BEAM-2674
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


This allows, for example, a `ParDo` or `Combine` to have its additional inputs 
set up correctly when rehydrated, and `Combine` and others with auxiliary 
coders to register them appropriately. Currently these happen via somewhat 
hackish one-off methods for each purpose.
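The shape of the proposal can be illustrated with a toy registry in which each translator owns both directions of the translation, keyed by URN. This is a sketch only; the URN, class names, and dict-based "protos" are stand-ins, not Beam's real translation machinery.

```python
# Illustrative sketch: each translator owns dehydration AND rehydration,
# looked up by URN in one registry (dicts stand in for real protos).

TRANSLATORS = {}

def register(urn):
    def wrap(cls):
        TRANSLATORS[urn] = cls()
        return cls
    return wrap

@register("beam:transform:pardo:v1")
class ParDoTranslator:
    URN = "beam:transform:pardo:v1"

    def dehydrate(self, transform):
        # A real translator would also record side inputs here, so that
        # rehydration can reconstruct the additional inputs correctly.
        return {"urn": self.URN, "fn": transform["fn"]}

    def rehydrate(self, proto):
        return {"fn": proto["fn"]}

def rehydrate(proto):
    # No per-transform special cases: dispatch on the URN alone.
    return TRANSLATORS[proto["urn"]].rehydrate(proto)

pardo = {"fn": "MyDoFn"}
proto = TRANSLATORS["beam:transform:pardo:v1"].dehydrate(pardo)
print(rehydrate(proto))  # → {'fn': 'MyDoFn'}
```

The point is that the one-off helper methods disappear: adding a transform means adding one translator that carries its own round-trip logic.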





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3645

2017-07-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2673) BigQuery Sink should use the Load API

2017-07-24 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099069#comment-16099069
 ] 

Ahmet Altay commented on BEAM-2673:
---

[~chamikara] Given the added delay in 
https://github.com/apache/beam/pull/3630, should we work on fixing this issue 
for the 2.2.0 release?

> BigQuery Sink should use the Load API
> -
>
> Key: BEAM-2673
> URL: https://issues.apache.org/jira/browse/BEAM-2673
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Ahmet Altay
>
> Currently the BigQuery sink is written to by using the streaming api in the 
> direct runner. Instead we should just use the load api and also simplify the 
> management of different create and write disposition. 





[GitHub] beam pull request #3630: We shouldn't write to re-created tables for 2 mins

2017-07-24 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/3630

We shouldn't write to re-created tables for 2 mins

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @chamikaramj PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-truncate-bigquery-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3630


commit 9f6350093a3eea568e67e235daa2c25586ce4dc5
Author: Sourabh Bajaj 
Date:   2017-07-24T20:17:27Z

We shouldn't write to re-created tables for 2 mins






[jira] [Created] (BEAM-2673) BigQuery Sink should use the Load API

2017-07-24 Thread Sourabh Bajaj (JIRA)
Sourabh Bajaj created BEAM-2673:
---

 Summary: BigQuery Sink should use the Load API
 Key: BEAM-2673
 URL: https://issues.apache.org/jira/browse/BEAM-2673
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Sourabh Bajaj
Assignee: Ahmet Altay


Currently the BigQuery sink is written to using the streaming API in the 
direct runner. Instead we should just use the load API, and also simplify the 
management of the different create and write dispositions.
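The dispatch the issue asks for can be sketched as routing writes to the load API wherever the pipeline mode and disposition allow. The function and return values below are illustrative stand-ins, not the Beam Python SDK's actual API; the streaming constraint (no truncate, append/create only) follows the discussion in this thread.

```python
# Hypothetical sketch of choosing a BigQuery write method by pipeline
# mode and write disposition (names are illustrative, not Beam's API).

WRITE_APPEND = "WRITE_APPEND"
WRITE_TRUNCATE = "WRITE_TRUNCATE"

def choose_write_method(streaming_pipeline, write_disposition):
    """Pick a BigQuery write method for a given mode/disposition."""
    if streaming_pipeline:
        # Streaming inserts cannot truncate; they can only append
        # to an existing table or create a new one.
        if write_disposition == WRITE_TRUNCATE:
            raise ValueError("WRITE_TRUNCATE is not supported in streaming")
        return "streaming_inserts"
    # Batch pipelines can use load jobs, which support all dispositions
    # and avoid the delay of streaming into a just-recreated table.
    return "load_job"

print(choose_write_method(False, WRITE_TRUNCATE))  # → load_job
```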





[jira] [Created] (BEAM-2672) test merge in BeamSqlUdaf

2017-07-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2672:


 Summary: test merge in BeamSqlUdaf
 Key: BEAM-2672
 URL: https://issues.apache.org/jira/browse/BEAM-2672
 Project: Beam
  Issue Type: Test
  Components: dsl-sql
Reporter: Xu Mingmin
Assignee: Xu Mingmin


The integration test with {{DirectRunner}} cannot cover the {{merge}} method 
in {{BeamSqlUdaf}}.
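Because a runner is free to never merge accumulators (the DirectRunner case above), the merge path needs a direct unit test that calls it explicitly. The class below is a toy mean aggregator illustrating that test shape; it mimics the accumulator lifecycle but is not Beam's {{BeamSqlUdaf}} API.

```python
# Toy UDAF-style mean aggregator with an explicit merge step. The issue's
# point: a runner may never call merge, so test it directly on accumulators.

class MeanUdaf:
    def create_accumulator(self):
        return (0.0, 0)  # (running sum, count)

    def add_input(self, acc, value):
        s, c = acc
        return (s + value, c + 1)

    def merge(self, accumulators):
        # Combine partial results from different bundles/workers.
        return (sum(a[0] for a in accumulators),
                sum(a[1] for a in accumulators))

    def extract_output(self, acc):
        s, c = acc
        return s / c if c else 0.0

udaf = MeanUdaf()
# Simulate two bundles producing partial accumulators, then merge them.
a = udaf.add_input(udaf.add_input(udaf.create_accumulator(), 1), 2)
b = udaf.add_input(udaf.create_accumulator(), 6)
print(udaf.extract_output(udaf.merge([a, b])))  # → 3.0
```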





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2700

2017-07-24 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Python_Verify #2792

2017-07-24 Thread Apache Jenkins Server
See 


Changes:

[aviemzur] [BEAM-2314] Add ValidatesRunner test for merging custom windows

--
[...truncated 575.96 KB...]
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-3.1.1.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
test_default_value_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-36.2.2.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-3.1.1.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
test_empty_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
test_iterable_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
test_flattened_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
test_multi_valued_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok

==
ERROR: test_par_do_with_multiple_outputs_and_using_yield 
(apache_beam.transforms.ptransform_test.PTransformTest)
--
Traceback (most recent call last):
  File 
"
 line 220, in test_par_do_with_multiple_outputs_and_using_yield
pipeline.run()
  File 
"
 line 96, in run
result = super(TestPipeline, self).run()
  File 
"
 line 319, in run
self.to_runner_api(), self.runner, self._options).run(False)
  File 
"
 line 328, in run
return self.runner.run(self)
  File 
"
 line 38, in run
self.result = super(TestDataflowRunner, self).run(pipeline)
  File 
"
 line 272, in run
self.dataflow_client.create_job(self.job), self)
  File 
"
 line 168, in wrapper
return fun(*args, **kwargs)
  File 
"
 line 438, in create_job
self.create_job_description(job)
  File 
"
 line 461, in create_job_description
job.options, file_copy=self._gcs_file_copy)
  File 
"
 line 327, in stage_job_resources
setup_options.requirements_file, requirements_cache_path)
  File 
"
 line 261, in _populate_requirements_cache
processes.check_call(cmd_args)
  File 
"
 line 44, in check_call
return subprocess.check_call(*args, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2699

2017-07-24 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2445) DSL SQL to use service locator pattern to automatically register UDFs

2017-07-24 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2445:

Labels:   (was: dsl_sql_merge)

> DSL SQL to use service locator pattern to automatically register UDFs
> -
>
> Key: BEAM-2445
> URL: https://issues.apache.org/jira/browse/BEAM-2445
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>Assignee: James Xu
>Priority: Minor
>
> Use a service locator pattern to find UDFs that can be registered. The 
> service loader can be used to register UDFs for standard functions via DSL 
> SQL, additional UDFs registered by third party libraries, and end user 
> created UDFs.
> Example ServiceLoader usage within Apache Beam to find coder providers:
> https://github.com/apache/beam/blob/7126fdc6ee5671e99a2dede3f25ba616aa0e8fa4/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L147





Build failed in Jenkins: beam_PerformanceTests_Python #139

2017-07-24 Thread Apache Jenkins Server
See 


Changes:

[aviemzur] [BEAM-2314] Add ValidatesRunner test for merging custom windows

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam6 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 0064fb37ad13a10fc510e567d21873403a42340a (origin/master)
Commit message: "This closes #3286"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0064fb37ad13a10fc510e567d21873403a42340a
 > git rev-list f54072a1b156176624db5b666ed9c308f24bc2f9 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6613939584988975999.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins7045287388938022842.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/jenkins502235382605065751.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.1.1 
in /home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in 
/usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt 
(line 16))
Requirement already satisfied (use --upgrade to upgrade): 
colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins1083794523405365569.sh
+ pip install --user -e 'sdks/python/[gcp,test]'
Obtaining 
file://
  Running setup.py 
(path:
 egg_info for package from 
file://

:66:
 UserWarning: You are using version 1.5.4 of pip. However, version 7.0.0 is 
recommended.
  _PIP_VERSION, REQUIRED_PIP_VERSION
no previously-included directories found matching 'doc/.build'

Installed 


warning: no files found matching 'README.md'

[jira] [Commented] (BEAM-2575) ApexRunner doesn't emit watermarks for additional outputs

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098906#comment-16098906
 ] 

ASF GitHub Bot commented on BEAM-2575:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3629

Cherry-pick pull #3529 into Release 2.1.0

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
BEAM-2575 ApexRunner doesn't emit watermarks for additional outputs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam cherrypick_3529

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3629


commit 96ff22060c3e3e99e5bf913f5f58c095f48c0021
Author: Thomas Weise 
Date:   2017-07-09T18:57:43Z

BEAM-2575 ApexRunner doesn't emit watermarks for additional outputs




> ApexRunner doesn't emit watermarks for additional outputs 
> --
>
> Key: BEAM-2575
> URL: https://issues.apache.org/jira/browse/BEAM-2575
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Thomas Weise
>Assignee: Thomas Weise
> Fix For: 2.2.0
>
>
> https://lists.apache.org/thread.html/51113a207f96d0522fb81adb65e35e134a0c52cf4bbe1cfc46508d83@%3Cdev.beam.apache.org%3E





[GitHub] beam pull request #3629: Cherry-pick pull #3529 into Release 2.1.0

2017-07-24 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3629

Cherry-pick pull #3529 into Release 2.1.0

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
BEAM-2575 ApexRunner doesn't emit watermarks for additional outputs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam cherrypick_3529

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3629


commit 96ff22060c3e3e99e5bf913f5f58c095f48c0021
Author: Thomas Weise 
Date:   2017-07-09T18:57:43Z

BEAM-2575 ApexRunner doesn't emit watermarks for additional outputs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (BEAM-1928) Populate Runner API Components from the Java SDK

2017-07-24 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1928.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Components are populated. Future work to use the Runner-API copy of the 
Pipeline everywhere is tracked independently.

> Populate Runner API Components from the Java SDK
> 
>
> Key: BEAM-1928
> URL: https://issues.apache.org/jira/browse/BEAM-1928
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model-runner-api, runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: 2.2.0
>
>
> Permit conversion of Java SDK components to runner API protocol buffers, and 
> the extraction of those SDK components from the protocol buffers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2558) Add a CombineFnTester

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098899#comment-16098899
 ] 

ASF GitHub Bot commented on BEAM-2558:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3628

[BEAM-2558] Migrate checkCombineFn in TestUtils to CombineFnTester

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
This makes CombineFnTester significantly more discoverable, and usable
without having dependencies on the test JAR.

Update existing tests.

This is purely a move of code with minor renaming, visibility limiting, and
updates to include empty accumulators on both ends of `mergeAccumulators`.

It is also worth ensuring that the output (as returned by `extractOutput`)
is not the same object as any of the merged accumulators, since reusing an
accumulator potentially interacts poorly with a lifted Combine.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam combine_fn_tester

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3628


commit 75a7ef967fe488d24b0c633213dd864875874918
Author: Thomas Groh 
Date:   2017-07-20T20:36:04Z

Migrate checkCombineFn in TestUtils to CombineFnTester

This makes CombineFnTester significantly more discoverable, and usable
without having dependencies on the test JAR.

Update existing tests.




> Add a CombineFnTester
> -
>
> Key: BEAM-2558
> URL: https://issues.apache.org/jira/browse/BEAM-2558
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This should be in the style of {{CoderProperties}} or {{EqualsTester}}: the 
> user should provide some inputs, and it should exercise potential paths by 
> which those elements may be accumulated and ensure that they all produce the 
> same results.
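>
> The property described above can be sketched as follows (a minimal
> illustration of the idea, not the actual {{CombineFnTester}} API;
> {{MeanFn}} and {{check_combine_fn}} are hypothetical names):

```python
# Hypothetical sketch of the CombineFnTester property: every way of
# sharding the inputs into accumulators, then merging, must yield the
# same final output. MeanFn and check_combine_fn are illustrative
# names, not Beam API.

class MeanFn:
    def create_accumulator(self):
        return (0.0, 0)  # (sum, count)

    def add_input(self, acc, value):
        s, n = acc
        return (s + value, n + 1)

    def merge_accumulators(self, accs):
        sums, counts = zip(*accs)
        return (sum(sums), sum(counts))

    def extract_output(self, acc):
        s, n = acc
        return s / n if n else float('nan')


def check_combine_fn(fn, inputs, expected):
    # Try every shard size, accumulating each shard independently.
    for shard_size in range(1, len(inputs) + 1):
        shards = [inputs[i:i + shard_size]
                  for i in range(0, len(inputs), shard_size)]
        accs = []
        for shard in shards:
            acc = fn.create_accumulator()
            for value in shard:
                acc = fn.add_input(acc, value)
            accs.append(acc)
        # Include empty accumulators on both ends of the merge, as the
        # pull request description suggests.
        merged = fn.merge_accumulators(
            [fn.create_accumulator()] + accs + [fn.create_accumulator()])
        assert fn.extract_output(merged) == expected, shard_size


check_combine_fn(MeanFn(), [1.0, 2.0, 3.0, 4.0], 2.5)
```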



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3628: [BEAM-2558] Migrate checkCombineFn in TestUtils to ...

2017-07-24 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3628

[BEAM-2558] Migrate checkCombineFn in TestUtils to CombineFnTester

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---
This makes CombineFnTester significantly more discoverable, and usable
without having dependencies on the test JAR.

Update existing tests.

This is purely a move of code with minor renaming, visibility limiting, and
updates to include empty accumulators on both ends of `mergeAccumulators`.

It is also worth ensuring that the output (as returned by `extractOutput`)
is not the same object as any of the merged accumulators, since reusing an
accumulator potentially interacts poorly with a lifted Combine.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam combine_fn_tester

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3628


commit 75a7ef967fe488d24b0c633213dd864875874918
Author: Thomas Groh 
Date:   2017-07-20T20:36:04Z

Migrate checkCombineFn in TestUtils to CombineFnTester

This makes CombineFnTester significantly more discoverable, and usable
without having dependencies on the test JAR.

Update existing tests.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (BEAM-2314) Add ValidatesRunner test for custom merging windows

2017-07-24 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-2314.
--
   Resolution: Fixed
Fix Version/s: 2.2.0

> Add ValidatesRunner test for custom merging windows
> ---
>
> Key: BEAM-2314
> URL: https://issues.apache.org/jira/browse/BEAM-2314
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Aljoscha Krettek
>Assignee: Etienne Chauchot
> Fix For: 2.2.0
>
>
> While working on BEAM-160 it was discovered that not all Runners supported 
> custom merging {{WindowFns}}. This is now fixed but we should still add a 
> {{ValidatesRunner}} test for this.
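> In essence, a custom merging {{WindowFn}} must collapse overlapping or
> adjacent candidate windows into one; a session-style merge can be
> sketched like this (illustrative only, not the Beam {{WindowFn}} API):

```python
# Sketch of what a custom merging WindowFn must do: candidate windows
# within `gap` of each other are merged into one. Session-style merge,
# illustrative only; not the Beam WindowFn API.

def merge_windows(windows, gap):
    # windows: list of (start, end) intervals.
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1] + gap:
            # Overlaps (or is within the gap of) the previous window.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

assert merge_windows([(0, 5), (4, 9), (20, 25)], gap=0) == [(0, 9), (20, 25)]
```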



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2636) user_score on DataflowRunner is broken

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098626#comment-16098626
 ] 

ASF GitHub Bot commented on BEAM-2636:
--

GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/3627

[BEAM-2636] Make sure we only override the correct class

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @jbonofre PTAL
CC: @aaltay 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-2636-cherry-pick

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3627


commit ab21078b0ea88f610d164245c379da8634ecd854
Author: Sourabh Bajaj 
Date:   2017-07-19T17:08:14Z

[BEAM-2636] Make sure we only override the correct class




> user_score on DataflowRunner is broken
> --
>
> Key: BEAM-2636
> URL: https://issues.apache.org/jira/browse/BEAM-2636
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Ahmet Altay
>Assignee: Sourabh Bajaj
> Fix For: 2.2.0
>
>
> UserScore has a custom transform named {{WriteToBigQuery}}. The Dataflow 
> runner has special code handling transforms with that name, so this will 
> break for all user transforms that have this name.
> We can either:
> - Handle this correctly
> - Or document this as a reserved keyword and change the example.
> cc: [~chamikara]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3627: [BEAM-2636] Make sure we only override the correct ...

2017-07-24 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/3627

[BEAM-2636] Make sure we only override the correct class

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @jbonofre PTAL
CC: @aaltay 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-2636-cherry-pick

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3627


commit ab21078b0ea88f610d164245c379da8634ecd854
Author: Sourabh Bajaj 
Date:   2017-07-19T17:08:14Z

[BEAM-2636] Make sure we only override the correct class




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1274) Add SSL/TLS support to ElasticSearch IO

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098606#comment-16098606
 ] 

ASF GitHub Bot commented on BEAM-1274:
--

GitHub user jbonofre opened a pull request:

https://github.com/apache/beam/pull/3626

[BEAM-1274] Add SSL mutual authentication support

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [X] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [X] Each commit in the pull request should have a meaningful subject 
line and body.
 - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [X] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [X] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/beam BEAM-1274-ES-TLS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3626


commit 3c5a1b8ef6f35a447c6fe185f00e7cf2c0cc0b63
Author: Jean-Baptiste Onofré 
Date:   2017-07-24T15:53:15Z

[BEAM-1274] Add SSL mutual authentication support




> Add SSL/TLS support to ElasticSearch IO
> ---
>
> Key: BEAM-1274
> URL: https://issues.apache.org/jira/browse/BEAM-1274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>  Labels: security
>
> Some users of ElasticSearch IO need to connect to ElasticSearch using HTTPS. 
> This feature is an improvement of 
> https://issues.apache.org/jira/browse/BEAM-425



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3626: [BEAM-1274] Add SSL mutual authentication support

2017-07-24 Thread jbonofre
GitHub user jbonofre opened a pull request:

https://github.com/apache/beam/pull/3626

[BEAM-1274] Add SSL mutual authentication support

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [X] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [X] Each commit in the pull request should have a meaningful subject 
line and body.
 - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [X] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [X] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/beam BEAM-1274-ES-TLS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3626


commit 3c5a1b8ef6f35a447c6fe185f00e7cf2c0cc0b63
Author: Jean-Baptiste Onofré 
Date:   2017-07-24T15:53:15Z

[BEAM-1274] Add SSL mutual authentication support




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2314) Add ValidatesRunner test for custom merging windows

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098573#comment-16098573
 ] 

ASF GitHub Bot commented on BEAM-2314:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3286


> Add ValidatesRunner test for custom merging windows
> ---
>
> Key: BEAM-2314
> URL: https://issues.apache.org/jira/browse/BEAM-2314
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Aljoscha Krettek
>Assignee: Etienne Chauchot
>
> While working on BEAM-160 it was discovered that not all Runners supported 
> custom merging {{WindowFns}}. This is now fixed but we should still add a 
> {{ValidatesRunner}} test for this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3286: [BEAM-2314] Add ValidatesRunner test for merging cu...

2017-07-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3286


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-2314] Add ValidatesRunner test for merging custom windows

2017-07-24 Thread aviemzur
Repository: beam
Updated Branches:
  refs/heads/master f54072a1b -> 0064fb37a


[BEAM-2314] Add ValidatesRunner test for merging custom windows


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/dfa983ce
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/dfa983ce
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/dfa983ce

Branch: refs/heads/master
Commit: dfa983ce4adb85d211497460254b6a95944ce869
Parents: f54072a
Author: Etienne Chauchot 
Authored: Mon May 29 12:05:51 2017 +0200
Committer: Aviem Zur 
Committed: Mon Jul 24 14:33:00 2017 +0300

--
 runners/spark/pom.xml   |   3 +-
 .../sdk/testing/UsesCustomWindowMerging.java|  23 +++
 .../sdk/transforms/windowing/WindowTest.java| 184 +++
 3 files changed, 209 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/dfa983ce/runners/spark/pom.xml
--
diff --git a/runners/spark/pom.xml b/runners/spark/pom.xml
index 7f70204..35e933b 100644
--- a/runners/spark/pom.xml
+++ b/runners/spark/pom.xml
@@ -77,7 +77,8 @@
   
 org.apache.beam.sdk.testing.UsesSplittableParDo,
 org.apache.beam.sdk.testing.UsesCommittedMetrics,
-org.apache.beam.sdk.testing.UsesTestStream
+org.apache.beam.sdk.testing.UsesTestStream,
+org.apache.beam.sdk.testing.UsesCustomWindowMerging
   
   none
   1

http://git-wip-us.apache.org/repos/asf/beam/blob/dfa983ce/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesCustomWindowMerging.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesCustomWindowMerging.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesCustomWindowMerging.java
new file mode 100644
index 000..fc40e02
--- /dev/null
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesCustomWindowMerging.java
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.testing;
+
+/**
+ * Category tag for validation tests which utilize custom window merging.
+ */
+public interface UsesCustomWindowMerging {}

http://git-wip-us.apache.org/repos/asf/beam/blob/dfa983ce/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java
index 65af7a1..5b6d046 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java
@@ -31,19 +31,30 @@ import static org.junit.Assert.assertTrue;
 import static org.mockito.Mockito.when;
 
 import com.google.common.collect.Iterables;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
 import java.io.Serializable;
+import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
 import java.util.concurrent.atomic.AtomicBoolean;
 import org.apache.beam.sdk.Pipeline.PipelineVisitor;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.Coder.NonDeterministicException;
+import org.apache.beam.sdk.coders.CustomCoder;
 import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.coders.VarIntCoder;
 import org.apache.beam.sdk.io.GenerateSequence;
 import org.apache.beam.sdk.runners.TransformHierarchy;
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.TestPipeline;
+import 

[jira] [Commented] (BEAM-2587) Build fails due to python sdk

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098571#comment-16098571
 ] 

Jean-Baptiste Onofré commented on BEAM-2587:


Let's try to fix that for 2.1.0.

> Build fails due to python sdk
> -
>
> Key: BEAM-2587
> URL: https://issues.apache.org/jira/browse/BEAM-2587
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Ahmet Altay
> Fix For: 2.1.0
>
>
> Build fails with the following errors when {{mvn clean package}} is used on a 
> clean Ubuntu 16.04 LTS machine with pip 8.x. The issue is resolved when pip 
> is upgraded to pip 9.x.
> "RuntimeError: Not in apache git tree; unable to find proto definitions."
> "DistutilsOptionError: can't combine user with prefix, exec_prefix/home, or 
> install_(plat)base​"
> We need to understand the issue and maybe add a note about requiring pip 9.x 
> for development. Note that this does not affect end users using prepackaged 
> artifacts from central repositories.
> cc: [~robertwb]
> Script for reproduction:
> {code}
> #!/bin/bash
> set -e
> readonly MACHINE_ID=$(hexdump -n 1 -e '"%x"' /dev/random)
> readonly MACHINE="${USER}-beam-build-${MACHINE_ID}"
> readonly ZONE="us-central1-c"
> # provision building machine
> echo "Provisioning Build Machine (Ubuntu 16.04 LTS)"
> gcloud compute instances create "$MACHINE" \
>   --zone="$ZONE" \
>   --image-project="ubuntu-os-cloud" \
>   --image-family="ubuntu-1604-lts"
> # wait for ssh to be ready
> echo "Waiting for machine to finish booting"
> sleep 30
> # ssh into the machine
> # 1. install dependencies as specified by beam readme
> # 2. download beam source from github
> # 3. build with maven
> echo "Downloading and building Apache Beam (release-2.1.0)"
> gcloud compute ssh "$MACHINE" --zone="$ZONE" << EOF
> sudo apt-get --assume-yes update
> sudo apt-get --assume-yes install \
> openjdk-8-jdk \
> maven \
> python-setuptools \
> python-pip
> wget https://github.com/apache/beam/archive/release-2.1.0.tar.gz
> tar -xzf release-2.1.0.tar.gz
> cd beam-release-2.1.0
> mvn clean package
> EOF
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[2/2] beam git commit: This closes #3286

2017-07-24 Thread aviemzur
This closes #3286


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0064fb37
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0064fb37
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0064fb37

Branch: refs/heads/master
Commit: 0064fb37ad13a10fc510e567d21873403a42340a
Parents: f54072a dfa983c
Author: Aviem Zur 
Authored: Mon Jul 24 18:22:42 2017 +0300
Committer: Aviem Zur 
Committed: Mon Jul 24 18:22:42 2017 +0300

--
 runners/spark/pom.xml   |   3 +-
 .../sdk/testing/UsesCustomWindowMerging.java|  23 +++
 .../sdk/transforms/windowing/WindowTest.java| 184 +++
 3 files changed, 209 insertions(+), 1 deletion(-)
--




[jira] [Updated] (BEAM-2587) Build fails due to python sdk

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-2587:
---
Fix Version/s: 2.2.0

> Build fails due to python sdk
> -
>
> Key: BEAM-2587
> URL: https://issues.apache.org/jira/browse/BEAM-2587
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Ahmet Altay
> Fix For: 2.1.0
>
>
> Build fails with the following errors when {{mvn clean package}} is used on a 
> clean Ubuntu 16.04 LTS machine with pip 8.x. The issue is resolved when pip 
> is upgraded to pip 9.x.
> "RuntimeError: Not in apache git tree; unable to find proto definitions."
> "DistutilsOptionError: can't combine user with prefix, exec_prefix/home, or 
> install_(plat)base​"
> We need to understand the issue and maybe add a note about requiring pip 9.x 
> for development. Note that this does not affect end users using prepackaged 
> artifacts from central repositories.
> cc: [~robertwb]
> Script for reproduction:
> {code}
> #!/bin/bash
> set -e
> readonly MACHINE_ID=$(hexdump -n 1 -e '"%x"' /dev/random)
> readonly MACHINE="${USER}-beam-build-${MACHINE_ID}"
> readonly ZONE="us-central1-c"
> # provision building machine
> echo "Provisioning Build Machine (Ubuntu 16.04 LTS)"
> gcloud compute instances create "$MACHINE" \
>   --zone="$ZONE" \
>   --image-project="ubuntu-os-cloud" \
>   --image-family="ubuntu-1604-lts"
> # wait for ssh to be ready
> echo "Waiting for machine to finish booting"
> sleep 30
> # ssh into the machine
> # 1. install dependencies as specified by beam readme
> # 2. download beam source from github
> # 3. build with maven
> echo "Downloading and building Apache Beam (release-2.1.0)"
> gcloud compute ssh "$MACHINE" --zone="$ZONE" << EOF
> sudo apt-get --assume-yes update
> sudo apt-get --assume-yes install \
> openjdk-8-jdk \
> maven \
> python-setuptools \
> python-pip
> wget https://github.com/apache/beam/archive/release-2.1.0.tar.gz
> tar -xzf release-2.1.0.tar.gz
> cd beam-release-2.1.0
> mvn clean package
> EOF
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2587) Build fails due to python sdk

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-2587:
---
Fix Version/s: (was: 2.2.0)
   2.1.0

> Build fails due to python sdk
> -
>
> Key: BEAM-2587
> URL: https://issues.apache.org/jira/browse/BEAM-2587
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.1.0
>Reporter: Ahmet Altay
> Fix For: 2.1.0
>
>
> Build fails with the following errors when {{mvn clean package}} is used on a 
> clean Ubuntu 16.04 LTS machine with pip 8.x. The issue is resolved when pip 
> is upgraded to pip 9.x.
> "RuntimeError: Not in apache git tree; unable to find proto definitions."
> "DistutilsOptionError: can't combine user with prefix, exec_prefix/home, or 
> install_(plat)base​"
> We need to understand the issue and maybe add a note about requiring pip 9.x 
> for development. Note that this does not affect end users using prepackaged 
> artifacts from central repositories.
> cc: [~robertwb]
> Script for reproduction:
> {code}
> #!/bin/bash
> set -e
> readonly MACHINE_ID=$(hexdump -n 1 -e '"%x"' /dev/random)
> readonly MACHINE="${USER}-beam-build-${MACHINE_ID}"
> readonly ZONE="us-central1-c"
> # provision building machine
> echo "Provisioning Build Machine (Ubuntu 16.04 LTS)"
> gcloud compute instances create "$MACHINE" \
>   --zone="$ZONE" \
>   --image-project="ubuntu-os-cloud" \
>   --image-family="ubuntu-1604-lts"
> # wait for ssh to be ready
> echo "Waiting for machine to finish booting"
> sleep 30
> # ssh into the machine
> # 1. install dependencies as specified by beam readme
> # 2. download beam source from github
> # 3. build with maven
> echo "Downloading and building Apache Beam (release-2.1.0)"
> gcloud compute ssh "$MACHINE" --zone="$ZONE" << EOF
> sudo apt-get --assume-yes update
> sudo apt-get --assume-yes install \
> openjdk-8-jdk \
> maven \
> python-setuptools \
> python-pip
> wget https://github.com/apache/beam/archive/release-2.1.0.tar.gz
> tar -xzf release-2.1.0.tar.gz
> cd beam-release-2.1.0
> mvn clean package
> EOF
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2445) DSL SQL to use service locator pattern to automatically register UDFs

2017-07-24 Thread James Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Xu reassigned BEAM-2445:
--

Assignee: James Xu

> DSL SQL to use service locator pattern to automatically register UDFs
> -
>
> Key: BEAM-2445
> URL: https://issues.apache.org/jira/browse/BEAM-2445
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>Assignee: James Xu
>Priority: Minor
>  Labels: dsl_sql_merge
>
> Use a service locator pattern to find UDFs that can be registered. The 
> service loader can be used to register UDFs for standard functions via DSL 
> SQL, additional UDFs registered by third party libraries, and end user 
> created UDFs.
> Example ServiceLoader usage within Apache Beam to find coder providers:
> https://github.com/apache/beam/blob/7126fdc6ee5671e99a2dede3f25ba616aa0e8fa4/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L147
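A minimal, self-contained sketch of the service locator pattern proposed above, using Java's built-in `java.util.ServiceLoader`. The `UdfRegistrar` interface and `UdfLoader` class are illustrative names, not Beam's actual API; real implementations would be listed in a `META-INF/services` file on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical SPI interface a UDF provider would implement.
interface UdfRegistrar {
  String functionName();
}

class UdfLoader {
  // Discovers every UdfRegistrar implementation declared in
  // META-INF/services/UdfRegistrar on the classpath and collects
  // the UDF names it would register.
  static List<String> discover() {
    List<String> names = new ArrayList<>();
    for (UdfRegistrar registrar : ServiceLoader.load(UdfRegistrar.class)) {
      names.add(registrar.functionName());
    }
    return names;
  }
}
```

With no providers declared on the classpath, discovery simply yields an empty list; third-party jars add UDFs just by shipping a service file.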





[jira] [Commented] (BEAM-2441) DSL SQL maven module location and name

2017-07-24 Thread James Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098523#comment-16098523
 ] 

James Xu commented on BEAM-2441:


Since there are already many extensions in `sdks/java/extensions`, how about 
using `sdks/java/extensions/sql`?

> DSL SQL maven module location and name
> --
>
> Key: BEAM-2441
> URL: https://issues.apache.org/jira/browse/BEAM-2441
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>  Labels: dsl_sql_merge
>
> The current maven module location is *dsl/sql*, unfortunately this occludes 
> the fact that this is for the Java SDK and also prevents alternative language 
> implementations.
> Some alternative locations could be:
> {code}
> sdks/java/extensions/sql
> sdks/java/dsls/sql
> dsls/sql/java
> {code}





[jira] [Resolved] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2571.

Resolution: Fixed

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0, 2.2.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098448#comment-16098448
 ] 

ASF GitHub Bot commented on BEAM-2571:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3625


> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0, 2.2.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[GitHub] beam pull request #3625: [BEAM-2571] Fix FlinkRunner Watermark issues on 2.1...

2017-07-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3625


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/4] beam git commit: [BEAM-2571] Change DoFnOperator to use Long.MAX_VALUE as max watermark

2017-07-24 Thread jbonofre
[BEAM-2571] Change DoFnOperator to use Long.MAX_VALUE as max watermark

This is in line with what Flink does and what BoundedSourceWrapper and
UnboundedSourceWrapper do.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1686805d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1686805d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1686805d

Branch: refs/heads/release-2.1.0
Commit: 1686805d84316f2a8438c642214cdabe5b579381
Parents: 608de07
Author: Aljoscha Krettek 
Authored: Wed Jul 12 14:42:37 2017 +0200
Committer: Aljoscha Krettek 
Committed: Mon Jul 24 14:29:55 2017 +0200

--
 .../flink/translation/wrappers/streaming/DoFnOperator.java   | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/1686805d/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
--
diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
index 350f323..751eba1 100644
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
@@ -334,7 +334,7 @@ public class DoFnOperator
   protected final long getPushbackWatermarkHold() {
 // if we don't have side inputs we never hold the watermark
 if (sideInputs.isEmpty()) {
-  return BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis();
+  return Long.MAX_VALUE;
 }
 
 try {
@@ -353,7 +353,7 @@ public class DoFnOperator
   BagState pushedBack =
   pushbackStateInternals.state(StateNamespaces.global(), 
pushedBackTag);
 
-  long min = BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis();
+  long min = Long.MAX_VALUE;
   for (WindowedValue value : pushedBack.read()) {
 min = Math.min(min, value.getTimestamp().getMillis());
   }
@@ -426,7 +426,7 @@ public class DoFnOperator
 }
 
 pushedBack.clear();
-long min = BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis();
+long min = Long.MAX_VALUE;
 for (WindowedValue pushedBackValue : newPushedBack) {
   min = Math.min(min, pushedBackValue.getTimestamp().getMillis());
   pushedBack.add(pushedBackValue);
@@ -524,7 +524,7 @@ public class DoFnOperator
 
 pushedBack.clear();
 
-setPushedBackWatermark(BoundedWindow.TIMESTAMP_MAX_VALUE.getMillis());
+setPushedBackWatermark(Long.MAX_VALUE);
 
 pushbackDoFnRunner.finishBundle();
   }



[3/4] beam git commit: [BEAM-2571] Respect watermark contract in Flink DoFnOperator

2017-07-24 Thread jbonofre
[BEAM-2571] Respect watermark contract in Flink DoFnOperator

In Flink, a watermark T specifies that there will be no elements with a
timestamp <= T in the future. In Beam, a watermark T specifies that
there will not be elements with a timestamp < T in the future. This leads
to problems when the watermark is exactly "on the timer timestamp", most
prominently, this happened with Triggers, where Flink would fire the
Trigger too early and the Trigger would determine (based on the
watermark) that it is not yet time to fire the window while Flink
thought it was time.

This also adds a test that specifically tests the edge case.
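The off-by-one conversion described above can be captured in a standalone sketch. `WatermarkUtils` and `flinkWouldFireTimer` are illustrative names, not the actual `DoFnOperator` API; the helper mirrors the `toFlinkRuntimeWatermark` method the commit adds.

```java
class WatermarkUtils {
  // Beam watermark T: no future elements with timestamp < T.
  // Flink watermark T: no future elements with timestamp <= T.
  // Subtracting 1 aligns the two semantics before advancing Flink's
  // timer service.
  static long toFlinkRuntimeWatermark(long beamWatermark) {
    return beamWatermark - 1;
  }

  // Flink fires an event-time timer once its watermark reaches the
  // timer's timestamp.
  static boolean flinkWouldFireTimer(long timerTimestamp, long flinkWatermark) {
    return flinkWatermark >= timerTimestamp;
  }
}
```

With a Beam watermark exactly on a timer timestamp, the converted Flink watermark no longer fires that timer prematurely, which is precisely the Trigger edge case the commit fixes.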


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ade50652
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ade50652
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ade50652

Branch: refs/heads/release-2.1.0
Commit: ade506526b4ff56eb4ed15e9eea04d1d3345bc13
Parents: 5c4a95a
Author: Aljoscha Krettek 
Authored: Wed Jul 12 15:38:06 2017 +0200
Committer: Aljoscha Krettek 
Committed: Mon Jul 24 14:29:56 2017 +0200

--
 .../wrappers/streaming/DoFnOperator.java|  13 ++-
 .../flink/streaming/DoFnOperatorTest.java   | 117 ++-
 2 files changed, 128 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ade50652/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
--
diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
index 8da8de5..8884ce1 100644
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
@@ -472,7 +472,7 @@ public class DoFnOperator
   // hold back by the pushed back values waiting for side inputs
   long pushedBackInputWatermark = Math.min(getPushbackWatermarkHold(), 
mark.getTimestamp());
 
-  timerService.advanceWatermark(pushedBackInputWatermark);
+  
timerService.advanceWatermark(toFlinkRuntimeWatermark(pushedBackInputWatermark));
 
   Instant watermarkHold = stateInternals.watermarkHold();
 
@@ -501,6 +501,17 @@ public class DoFnOperator
   }
 
   /**
+   * Converts a Beam watermark to a Flink watermark. This is only relevant 
when considering what
+   * event-time timers to fire: in Beam, a watermark {@code T} says there will 
not be any elements
+   * with a timestamp {@code < T} in the future. A Flink watermark {@code T} 
says there will not be
+   * any elements with a timestamp {@code <= T} in the future. We correct this 
by subtracting
+   * {@code 1} from a Beam watermark before passing to any relevant Flink 
runtime components.
+   */
+  private static long toFlinkRuntimeWatermark(long beamWatermark) {
+return beamWatermark - 1;
+  }
+
+  /**
* Emits all pushed-back data. This should be used once we know that there 
will not be
* any future side input, i.e. that there is no point in waiting.
*/

http://git-wip-us.apache.org/repos/asf/beam/blob/ade50652/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/DoFnOperatorTest.java
--
diff --git 
a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/DoFnOperatorTest.java
 
b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/DoFnOperatorTest.java
index ad9d236..4d2a912 100644
--- 
a/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/DoFnOperatorTest.java
+++ 
b/runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/DoFnOperatorTest.java
@@ -33,6 +33,7 @@ import javax.annotation.Nullable;
 import org.apache.beam.runners.core.StatefulDoFnRunner;
 import org.apache.beam.runners.flink.FlinkPipelineOptions;
 import org.apache.beam.runners.flink.translation.types.CoderTypeInformation;
+import org.apache.beam.runners.flink.translation.types.CoderTypeSerializer;
 import 
org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.KvCoder;
@@ -197,6 +198,118 @@ public class DoFnOperatorTest {
 testHarness.close();
   }
 
+  /**
+   * This test specifically verifies that we correctly map Flink watermarks to 
Beam watermarks. In
+   * Beam, a watermark {@code T} guarantees there will not be elements with a 
timestamp
+   * {@code < T} in the 

[1/4] beam git commit: [BEAM-2571] Clarify pushedback variable name in DoFnOperator

2017-07-24 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/release-2.1.0 608de07e4 -> 1cf560b4b


[BEAM-2571] Clarify pushedback variable name in DoFnOperator


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5c4a95aa
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5c4a95aa
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5c4a95aa

Branch: refs/heads/release-2.1.0
Commit: 5c4a95aa63da23027b619a50928ebebe5beb05c2
Parents: 1686805
Author: Aljoscha Krettek 
Authored: Wed Jul 12 14:39:58 2017 +0200
Committer: Aljoscha Krettek 
Committed: Mon Jul 24 14:29:55 2017 +0200

--
 .../flink/translation/wrappers/streaming/DoFnOperator.java | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/5c4a95aa/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
--
diff --git 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
index 751eba1..8da8de5 100644
--- 
a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
+++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java
@@ -470,15 +470,15 @@ public class DoFnOperator
   setCurrentInputWatermark(mark.getTimestamp());
 
   // hold back by the pushed back values waiting for side inputs
-  long actualInputWatermark = Math.min(getPushbackWatermarkHold(), 
mark.getTimestamp());
+  long pushedBackInputWatermark = Math.min(getPushbackWatermarkHold(), 
mark.getTimestamp());
 
-  timerService.advanceWatermark(actualInputWatermark);
+  timerService.advanceWatermark(pushedBackInputWatermark);
 
   Instant watermarkHold = stateInternals.watermarkHold();
 
   long combinedWatermarkHold = Math.min(watermarkHold.getMillis(), 
getPushbackWatermarkHold());
 
-  long potentialOutputWatermark = Math.min(currentInputWatermark, 
combinedWatermarkHold);
+  long potentialOutputWatermark = Math.min(pushedBackInputWatermark, 
combinedWatermarkHold);
 
   if (potentialOutputWatermark > currentOutputWatermark) {
 setCurrentOutputWatermark(potentialOutputWatermark);
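The hunk above computes the operator's output watermark as a minimum over the held-back input watermark and the combined state/pushback holds. A standalone sketch of that min-of-holds logic (class and parameter names are illustrative, not the `DoFnOperator` API):

```java
class WatermarkHolds {
  // The output watermark may not pass the earliest of: the input
  // watermark held back by pushed-back elements, the state watermark
  // hold, and the pushback hold itself.
  static long potentialOutputWatermark(
      long inputWatermark, long stateHold, long pushbackHold) {
    long pushedBackInputWatermark = Math.min(pushbackHold, inputWatermark);
    long combinedWatermarkHold = Math.min(stateHold, pushbackHold);
    return Math.min(pushedBackInputWatermark, combinedWatermarkHold);
  }
}
```

When there are no holds (both at `Long.MAX_VALUE`, per the earlier commit in this series), the output watermark simply tracks the input watermark.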



[4/4] beam git commit: This closes #3625

2017-07-24 Thread jbonofre
This closes #3625


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1cf560b4
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1cf560b4
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1cf560b4

Branch: refs/heads/release-2.1.0
Commit: 1cf560b4bb05bed3a112f902b452ac3bf59085d1
Parents: 608de07 ade5065
Author: Jean-Baptiste Onofré 
Authored: Mon Jul 24 15:55:15 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Mon Jul 24 15:55:15 2017 +0200

--
 .../wrappers/streaming/DoFnOperator.java|  25 ++--
 .../flink/streaming/DoFnOperatorTest.java   | 117 ++-
 2 files changed, 134 insertions(+), 8 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3644

2017-07-24 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #4445

2017-07-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-24 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098322#comment-16098322
 ] 

Luke Cwik commented on BEAM-2658:
-

Arbitrary number based precedence orders are difficult to maintain. I agree 
with you that we should special case `DefaultCoder` and `SerializableCoder` for 
now and once we get the third or fourth special case, that may give us 
constraints on how we should structure coder provider ordering.
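One way to sketch the special-casing discussed above: keep discovery order for ordinary providers but always append designated fallbacks last. This is an illustrative sketch, not Beam's `CoderRegistry` implementation; provider names are stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

class ProviderOrdering {
  // Returns providers in their original order, except that any provider
  // listed as a fallback (e.g. a SerializableCoder-style provider) is
  // moved to the end, so more specific providers win ties.
  static List<String> order(List<String> providers, List<String> fallbacks) {
    List<String> ordered = new ArrayList<>();
    for (String p : providers) {
      if (!fallbacks.contains(p)) {
        ordered.add(p);
      }
    }
    for (String f : fallbacks) {
      if (providers.contains(f)) {
        ordered.add(f);
      }
    }
    return ordered;
  }
}
```

Under this scheme a Protobuf-aware provider would be consulted before the serializable fallback, so `Timestamp` would no longer resolve to `SerializableCoder`.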

> SerializableCoder has higher precedence over ProtoCoder in 
> CoderRegistry#getCoder
> -
>
> Key: BEAM-2658
> URL: https://issues.apache.org/jira/browse/BEAM-2658
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Neville Li
>Assignee: Davor Bonaci
>Priority: Minor
>
> {code}
> import com.google.protobuf.Timestamp;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.coders.CannotProvideCoderException;
> import org.apache.beam.sdk.coders.Coder;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> public class CoderTest {
>   public static void main(String[] args) throws CannotProvideCoderException {
> PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
> Pipeline pipeline = Pipeline.create(options);
> Coder coder = 
> pipeline.getCoderRegistry().getCoder(Timestamp.class);
> // class org.apache.beam.sdk.coders.SerializableCoder
> System.out.println(coder.getClass());
>   }
> }
> {code}
> Right now we're sorting {{CoderProviderRegistrar}}s by canonical name but 
> {{SerializableCoderProvider}} should be added last as a fallback if there're 
> other {{CoderProvider}}s that support the same type.
> {code}
> Set registrars = 
> Sets.newTreeSet(ObjectsClassComparator.INSTANCE);
> {code}





[jira] [Comment Edited] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-24 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098322#comment-16098322
 ] 

Luke Cwik edited comment on BEAM-2658 at 7/24/17 12:57 PM:
---

Arbitrary number based precedence orders are difficult to maintain. I agree 
with you that we should special case *DefaultCoder* and *SerializableCoder* for 
now and once we get the third or fourth special case, that may give us 
constraints on how we should structure coder provider ordering.


was (Author: lcwik):
Arbitrary number based precedence orders are difficult to maintain. I agree 
with you that we should special case ```DefaultCoder``` and 
```SerializableCoder``` for now and once we get the third or fourth special 
case, that may give us constraints on how we should structure coder provider 
ordering.

> SerializableCoder has higher precedence over ProtoCoder in 
> CoderRegistry#getCoder
> -
>
> Key: BEAM-2658
> URL: https://issues.apache.org/jira/browse/BEAM-2658
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Neville Li
>Assignee: Davor Bonaci
>Priority: Minor
>
> {code}
> import com.google.protobuf.Timestamp;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.coders.CannotProvideCoderException;
> import org.apache.beam.sdk.coders.Coder;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> public class CoderTest {
>   public static void main(String[] args) throws CannotProvideCoderException {
> PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
> Pipeline pipeline = Pipeline.create(options);
> Coder coder = 
> pipeline.getCoderRegistry().getCoder(Timestamp.class);
> // class org.apache.beam.sdk.coders.SerializableCoder
> System.out.println(coder.getClass());
>   }
> }
> {code}
> Right now we're sorting {{CoderProviderRegistrar}}s by canonical name but 
> {{SerializableCoderProvider}} should be added last as a fallback if there're 
> other {{CoderProvider}}s that support the same type.
> {code}
> Set registrars = 
> Sets.newTreeSet(ObjectsClassComparator.INSTANCE);
> {code}





[jira] [Comment Edited] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-24 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098322#comment-16098322
 ] 

Luke Cwik edited comment on BEAM-2658 at 7/24/17 12:57 PM:
---

Arbitrary number based precedence orders are difficult to maintain. I agree 
with you that we should special case ```DefaultCoder``` and 
```SerializableCoder``` for now and once we get the third or fourth special 
case, that may give us constraints on how we should structure coder provider 
ordering.


was (Author: lcwik):
Arbitrary number based precedence orders are difficult to maintain. I agree 
with you that we should special case `DefaultCoder` and `SerializableCoder` for 
now and once we get the third or fourth special case, that may give us 
constraints on how we should structure coder provider ordering.

> SerializableCoder has higher precedence over ProtoCoder in 
> CoderRegistry#getCoder
> -
>
> Key: BEAM-2658
> URL: https://issues.apache.org/jira/browse/BEAM-2658
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Neville Li
>Assignee: Davor Bonaci
>Priority: Minor
>
> {code}
> import com.google.protobuf.Timestamp;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.coders.CannotProvideCoderException;
> import org.apache.beam.sdk.coders.Coder;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> public class CoderTest {
>   public static void main(String[] args) throws CannotProvideCoderException {
> PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
> Pipeline pipeline = Pipeline.create(options);
> Coder coder = 
> pipeline.getCoderRegistry().getCoder(Timestamp.class);
> // class org.apache.beam.sdk.coders.SerializableCoder
> System.out.println(coder.getClass());
>   }
> }
> {code}
> Right now we're sorting {{CoderProviderRegistrar}}s by canonical name but 
> {{SerializableCoderProvider}} should be added last as a fallback if there're 
> other {{CoderProvider}}s that support the same type.
> {code}
> Set registrars = 
> Sets.newTreeSet(ObjectsClassComparator.INSTANCE);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink #3479

2017-07-24 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Flink #3478

2017-07-24 Thread Apache Jenkins Server
See 


--
[...truncated 300.37 KB...]
2017-07-24T12:34:47.636 [INFO] Excluding 
com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded jar.
2017-07-24T12:34:48.407 [INFO] Replacing original artifact with shaded artifact.
2017-07-24T12:34:48.407 [INFO] Replacing 

 with 

2017-07-24T12:34:48.407 [INFO] Replacing original test artifact with shaded 
test artifact.
2017-07-24T12:34:48.407 [INFO] Replacing 

 with 

2017-07-24T12:34:48.514 [INFO] 
2017-07-24T12:34:48.514 [INFO] --- maven-dependency-plugin:3.0.1:analyze-only 
(default) @ beam-sdks-common-runner-api ---
2017-07-24T12:34:48.912 [INFO] No dependency problems found
[JENKINS] Archiving disabled
2017-07-24T12:34:50.297 [INFO]  
   
2017-07-24T12:34:50.297 [INFO] 

2017-07-24T12:34:50.297 [INFO] Building Apache Beam :: SDKs :: Common :: Fn API 
2.1.0-SNAPSHOT
2017-07-24T12:34:50.297 [INFO] 

2017-07-24T12:34:50.303 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/grpc/grpc-core/1.2.0/grpc-core-1.2.0.pom
2017-07-24T12:34:50.329 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/grpc/grpc-core/1.2.0/grpc-core-1.2.0.pom
 (3 KB at 100.0 KB/sec)
2017-07-24T12:34:50.331 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/errorprone/error_prone_annotations/2.0.15/error_prone_annotations-2.0.15.pom
2017-07-24T12:34:50.355 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/errorprone/error_prone_annotations/2.0.15/error_prone_annotations-2.0.15.pom
 (2 KB at 63.8 KB/sec)
2017-07-24T12:34:50.356 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/errorprone/error_prone_parent/2.0.15/error_prone_parent-2.0.15.pom
2017-07-24T12:34:50.382 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/errorprone/error_prone_parent/2.0.15/error_prone_parent-2.0.15.pom
 (6 KB at 200.2 KB/sec)
2017-07-24T12:34:50.384 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/3.0.1/jsr305-3.0.1.pom
2017-07-24T12:34:50.409 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/3.0.1/jsr305-3.0.1.pom
 (5 KB at 167.4 KB/sec)
2017-07-24T12:34:50.411 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/grpc/grpc-context/1.2.0/grpc-context-1.2.0.pom
2017-07-24T12:34:50.435 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/grpc/grpc-context/1.2.0/grpc-context-1.2.0.pom
 (2 KB at 71.1 KB/sec)
2017-07-24T12:34:50.437 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/instrumentation/instrumentation-api/0.3.0/instrumentation-api-0.3.0.pom
2017-07-24T12:34:50.461 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/instrumentation/instrumentation-api/0.3.0/instrumentation-api-0.3.0.pom
 (2 KB at 61.0 KB/sec)
2017-07-24T12:34:50.463 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/grpc/grpc-protobuf/1.2.0/grpc-protobuf-1.2.0.pom
2017-07-24T12:34:50.487 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/grpc/grpc-protobuf/1.2.0/grpc-protobuf-1.2.0.pom
 (3 KB at 108.0 KB/sec)
2017-07-24T12:34:50.488 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/protobuf/protobuf-java-util/3.2.0/protobuf-java-util-3.2.0.pom
2017-07-24T12:34:50.513 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/protobuf/protobuf-java-util/3.2.0/protobuf-java-util-3.2.0.pom
 (5 KB at 166.0 KB/sec)
2017-07-24T12:34:50.516 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/code/gson/gson/2.7/gson-2.7.pom
2017-07-24T12:34:50.541 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/code/gson/gson/2.7/gson-2.7.pom 
(2 KB at 56.4 KB/sec)
2017-07-24T12:34:50.543 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/code/gson/gson-parent/2.7/gson-parent-2.7.pom
2017-07-24T12:34:50.567 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/google/code/gson/gson-parent/2.7/gson-parent-2.7.pom
 (4 KB at 145.7 KB/sec)
2017-07-24T12:34:50.569 [INFO] 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2698

2017-07-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098293#comment-16098293
 ] 

ASF GitHub Bot commented on BEAM-2571:
--

GitHub user aljoscha opened a pull request:

https://github.com/apache/beam/pull/3625

[BEAM-2571] Fix FlinkRunner Watermark issues on 2.1.0 branch



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/beam 
jira-2571-fix-flink-watermark-issues-2.1.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3625


commit 5c4a95aa63da23027b619a50928ebebe5beb05c2
Author: Aljoscha Krettek 
Date:   2017-07-12T12:39:58Z

[BEAM-2571] Clarify pushedback variable name in DoFnOperator

commit 1686805d84316f2a8438c642214cdabe5b579381
Author: Aljoscha Krettek 
Date:   2017-07-12T12:42:37Z

[BEAM-2571] Change DoFnOperator to use Long.MAX_VALUE as max watermark

This is in line with what Flink does and what BoundedSourceWrapper and
UnboundedSourceWrapper do.

commit ade506526b4ff56eb4ed15e9eea04d1d3345bc13
Author: Aljoscha Krettek 
Date:   2017-07-12T13:38:06Z

[BEAM-2571] Respect watermark contract in Flink DoFnOperator

In Flink, a watermark T specifies that there will be no elements with a
timestamp <= T in the future. In Beam, a watermark T specifies that
there will not be elements with a timestamp < T in the future. This leads
to problems when the watermark is exactly "on the timer timestamp", most
prominently, this happened with Triggers, where Flink would fire the
Trigger too early and the Trigger would determine (based on the
watermark) that it is not yet time to fire the window while Flink
thought it was time.

This also adds a test that specifically tests the edge case.




> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0, 2.2.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[GitHub] beam pull request #3625: [BEAM-2571] Fix FlinkRunner Watermark issues on 2.1...

2017-07-24 Thread aljoscha
GitHub user aljoscha opened a pull request:

https://github.com/apache/beam/pull/3625

[BEAM-2571] Fix FlinkRunner Watermark issues on 2.1.0 branch



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/beam 
jira-2571-fix-flink-watermark-issues-2.1.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3625


commit 5c4a95aa63da23027b619a50928ebebe5beb05c2
Author: Aljoscha Krettek 
Date:   2017-07-12T12:39:58Z

[BEAM-2571] Clarify pushedback variable name in DoFnOperator

commit 1686805d84316f2a8438c642214cdabe5b579381
Author: Aljoscha Krettek 
Date:   2017-07-12T12:42:37Z

[BEAM-2571] Change DoFnOperator to use Long.MAX_VALUE as max watermark

This is in line with what Flink does and what BoundedSourceWrapper and
UnboundedSourceWrapper do.

commit ade506526b4ff56eb4ed15e9eea04d1d3345bc13
Author: Aljoscha Krettek 
Date:   2017-07-12T13:38:06Z

[BEAM-2571] Respect watermark contract in Flink DoFnOperator

In Flink, a watermark T specifies that there will be no elements with a
timestamp <= T in the future. In Beam, a watermark T specifies that
there will not be elements with a timestamp < T in the future. This leads
to problems when the watermark is exactly "on the timer timestamp", most
prominently, this happened with Triggers, where Flink would fire the
Trigger too early and the Trigger would determine (based on the
watermark) that it is not yet time to fire the window while Flink
thought it was time.

This also adds a test that specifically tests the edge case.






[jira] [Updated] (BEAM-2642) Upgrade to Google Auth 0.7.1

2017-07-24 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2642:

Fix Version/s: (was: 2.2.0)

> Upgrade to Google Auth 0.7.1
> 
>
> Key: BEAM-2642
> URL: https://issues.apache.org/jira/browse/BEAM-2642
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
> Fix For: 2.1.0
>
>
> Looking up application default credentials on a GCE VM can fail due to the VM 
> metadata server being unavailable during VM launch. This is a rare event, but 
> Google Cloud Dataflow customers hit this case one or two times a month 
> due to the sheer number of VMs. GCE attempted to mitigate VM metadata server 
> unavailability but was only able to reduce it by an order of magnitude, so 
> we need support from the client to retry. Additionally, when contacting the 
> GCE VM metadata server, we should use the fixed IP address, avoiding the 
> nameserver lookup (another potential point of failure).
> Problem area in the code:
> https://github.com/google/google-auth-library-java/blob/b94f8e4d02bf6917af2e2f7ef8d7114a51dbcfa8/oauth2_http/java/com/google/auth/oauth2/DefaultCredentialsProvider.java#L261
> Note that the code in this library and the Apiary auth support code are very 
> similar. The fix was done within the Apiary auth code (note the use of the 
> static IP address and also the presence of a fixed number of retries):
> https://github.com/google/google-api-java-client/blob/4fc8c099d9db5646770868cc1bc9a33c9225b3c7/google-api-client/src/main/java/com/google/api/client/googleapis/auth/oauth2/OAuth2Utils.java#L74
> It turned out that the fixes resulted in zero future customer contacts about 
> this issue.
> Google Auth 0.7.1 was released containing these fixes mentioned in 
> https://github.com/google/google-auth-library-java/issues/109
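A minimal sketch of the client-side behavior the issue asks for — a fixed number of retries plus the fixed metadata IP — might look like this (all names are illustrative assumptions, not the google-auth-library-java API):

```java
import java.util.concurrent.Callable;

class MetadataRetry {
    // Fixed GCE metadata server IP; using it skips the nameserver lookup,
    // which the issue identifies as another potential point of failure.
    static final String METADATA_SERVER_IP = "169.254.169.254";

    // Retry a metadata call a fixed number of times with simple backoff,
    // in the spirit of the Apiary-side fix linked above.
    static <T> T withRetries(Callable<T> call, int maxAttempts) {
        if (maxAttempts <= 0) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = new RuntimeException("attempt " + (attempt + 1) + " failed", e);
                try {
                    Thread.sleep(50L << attempt);  // exponential backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        throw last;
    }
}
```

A caller would wrap the metadata request (against METADATA_SERVER_IP) in the `Callable`, so transient unavailability during VM launch is absorbed by the bounded retries.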



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2662) Quickstart for Spark Java not working

2017-07-24 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2662:

Fix Version/s: (was: 2.2.0)

> Quickstart for Spark Java not working
> -
>
> Key: BEAM-2662
> URL: https://issues.apache.org/jira/browse/BEAM-2662
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, runner-spark
>Affects Versions: 2.1.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Blocker
> Fix For: 2.1.0
>
>
> Spark version is missing in generated examples archetype pom.xml:
> {code}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-streaming_2.10</artifactId>
>   <version>${spark.version}</version>  <--- Missing
>   <scope>runtime</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>org.slf4j</groupId>
>       <artifactId>jul-to-slf4j</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> {code}
> Quickstart fails with:
> {code}
> lcwik@lcwik0:~/release/word-count-beam$ mvn compile exec:java 
> -Dexec.mainClass=org.apache.beam.examples.WordCount  
> -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" 
> -Pspark-runner
> [INFO] Scanning for projects...
> [ERROR] [ERROR] Some problems were encountered while processing the POMs:
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.spark:spark-streaming_2.10:jar is missing. @ line 241, column 21
>  @ 
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]   
> [ERROR]   The project org.example:word-count-beam:0.1 
> (/usr/local/google/home/lcwik/release/word-count-beam/pom.xml) has 1 error
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.spark:spark-streaming_2.10:jar is missing. @ line 241, column 21
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> {code}
> I generated the archetype pom.xml with:
> {code}
> mvn archetype:generate   -DarchetypeGroupId=org.apache.beam   
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples   
> -DarchetypeVersion=2.1.0   -DgroupId=org.example   
> -DartifactId=word-count-beam   -Dversion="0.1"   
> -Dpackage=org.apache.beam.examples -DinteractiveMode=false
> {code}
> after adding 
> https://repository.apache.org/content/repositories/orgapachebeam-1019/ as a 
> repo in my settings.xml





[jira] [Updated] (BEAM-2671) CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark runner

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-2671:
---
Fix Version/s: 2.2.0
   2.1.0

> CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark 
> runner
> 
>
> Key: BEAM-2671
> URL: https://issues.apache.org/jira/browse/BEAM-2671
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Jean-Baptiste Onofré
> Fix For: 2.1.0, 2.2.0
>
>
> Error message:
> Flatten.Iterables/FlattenIterables/FlatMap/ParMultiDo(Anonymous).out0: 
> Expected: iterable over [] in any order
>  but: Not matched: "late"





[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098281#comment-16098281
 ] 

Jean-Baptiste Onofré commented on BEAM-2571:


As discussed on the mailing list, we should cherry-pick the fix onto the 
release-2.1.0 branch.

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0, 2.2.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[jira] [Updated] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-2571:
---
Fix Version/s: 2.1.0

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0, 2.2.0
>
>





[jira] [Reopened] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reopened BEAM-2571:


> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0, 2.2.0
>
>





[jira] [Closed] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-24 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-2571.
--
Resolution: Fixed

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.2.0
>
>





Build failed in Jenkins: beam_PerformanceTests_Python #138

2017-07-24 Thread Apache Jenkins Server
See 


Changes:

[aljoscha.krettek] [BEAM-2571] Change DoFnOperator to use Long.MAX_VALUE as max 
watermark

[aljoscha.krettek] [BEAM-2571] Clarify pushedback variable name in DoFnOperator

[aljoscha.krettek] [BEAM-2571] Respect watermark contract in Flink DoFnOperator

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam7 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision f54072a1b156176624db5b666ed9c308f24bc2f9 (origin/master)
Commit message: "This closes #3600"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f54072a1b156176624db5b666ed9c308f24bc2f9
 > git rev-list 5f1d1365ba73aeac5a76a31c64ead2d7e94e4040 # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
[EnvInject] - Executing scripts and injecting environment variables after the 
SCM step.
[EnvInject] - Injecting as environment variables the properties content 
SPARK_LOCAL_IP=127.0.0.1

[EnvInject] - Variables injected successfully.
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins3289046703058156085.sh
+ rm -rf PerfKitBenchmarker
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins6864417314985724946.sh
+ git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
Cloning into 'PerfKitBenchmarker'...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins4221975222546660170.sh
+ pip install --user -r PerfKitBenchmarker/requirements.txt
Requirement already satisfied (use --upgrade to upgrade): python-gflags==3.1.1 
in /home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 14))
Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in 
/usr/local/lib/python2.7/dist-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 15))
Requirement already satisfied (use --upgrade to upgrade): setuptools in 
/usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/requirements.txt 
(line 16))
Requirement already satisfied (use --upgrade to upgrade): 
colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/python2.7/site-packages 
(from -r PerfKitBenchmarker/requirements.txt (line 17))
  Installing extra requirements: 'windows'
Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 18))
Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 19))
Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 20))
Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 21))
Requirement already satisfied (use --upgrade to upgrade): numpy in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 22))
Requirement already satisfied (use --upgrade to upgrade): functools32 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 23))
Requirement already satisfied (use --upgrade to upgrade): contextlib2>=0.5.1 in 
/home/jenkins/.local/lib/python2.7/site-packages (from -r 
PerfKitBenchmarker/requirements.txt (line 24))
Cleaning up...
[beam_PerformanceTests_Python] $ /bin/bash -xe 
/tmp/jenkins1365742652084777342.sh
+ pip install --user -e 'sdks/python/[gcp,test]'
Obtaining 
file://
  Running setup.py 
(path:
 egg_info for package from 
file://

:66:
 UserWarning: You are using version 1.5.4 of pip. However, version 7.0.0 is 
recommended.
  _PIP_VERSION, REQUIRED_PIP_VERSION
no previously-included directories found matching 'doc/.build'
 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2697

2017-07-24 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4444

2017-07-24 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2671) CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark runner

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098110#comment-16098110
 ] 

Jean-Baptiste Onofré commented on BEAM-2671:


We saw that and, IMHO, it should be fixed for 2.1.0 RC3. I am going to take a look.

> CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark 
> runner
> 
>
> Key: BEAM-2671
> URL: https://issues.apache.org/jira/browse/BEAM-2671
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Jean-Baptiste Onofré
>





[jira] [Assigned] (BEAM-2671) CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark runner

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reassigned BEAM-2671:
--

Assignee: Jean-Baptiste Onofré  (was: Amit Sela)

> CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark 
> runner
> 
>
> Key: BEAM-2671
> URL: https://issues.apache.org/jira/browse/BEAM-2671
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Jean-Baptiste Onofré
>





[jira] [Created] (BEAM-2671) CreateStreamTest.testFirstElementLate validatesRunner test fails on Spark runner

2017-07-24 Thread Etienne Chauchot (JIRA)
Etienne Chauchot created BEAM-2671:
--

 Summary: CreateStreamTest.testFirstElementLate validatesRunner 
test fails on Spark runner
 Key: BEAM-2671
 URL: https://issues.apache.org/jira/browse/BEAM-2671
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Amit Sela


Error message:
Flatten.Iterables/FlattenIterables/FlatMap/ParMultiDo(Anonymous).out0: 
Expected: iterable over [] in any order
 but: Not matched: "late"





[jira] [Created] (BEAM-2670) ParDoTest.testPipelineOptionsParameter* validatesRunner tests fail on Spark runner

2017-07-24 Thread Etienne Chauchot (JIRA)
Etienne Chauchot created BEAM-2670:
--

 Summary: ParDoTest.testPipelineOptionsParameter* validatesRunner 
tests fail on Spark runner
 Key: BEAM-2670
 URL: https://issues.apache.org/jira/browse/BEAM-2670
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Amit Sela
Priority: Minor


Tests succeed when run alone but fail when the whole ParDoTest is run. This may 
be related to PipelineOptions being reused / not cleaned up.





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2696

2017-07-24 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2669) Kryo serialization exception when DStreams containing non-Kryo-serializable data are cached

2017-07-24 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-2669:

Description: 
Today, when we detect re-use of a dataset in a pipeline in the Spark runner, we 
eagerly cache it to avoid calculating the same data multiple times.
([EvaluationContext.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L148])

When the dataset is bounded, which in Spark is represented by an {{RDD}}, we 
call {{RDD#persist}} and use storage level provided by the user via 
{{SparkPipelineOptions}}. 
([BoundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java#L103-L103])

When the dataset is unbounded, which in Spark is represented by a {{DStream}} 
we call {{DStream.cache()}} which defaults to persist the {{DStream}} using 
storage level {{MEMORY_ONLY_SER}} 
([UnboundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java#L61])
 
([DStream.scala|https://github.com/apache/spark/blob/v1.6.3/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L169])

Storage level {{MEMORY_ONLY_SER}} means Spark will serialize the data using its 
configured serializer. Since we configure this to be Kryo in a hard coded 
fashion, this means the data will be serialized using Kryo. 
([SparkContextFactory.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java#L99-L99])

Due to this, if your {{DStream}} contains non-Kryo-serializable data you will 
encounter Kryo serialization exceptions and your task will fail.

Possible actions we should consider:
# Remove the hard coded Spark serializer configuration, this should be taken 
from the user's configuration of Spark, no real reason for us to interfere with 
this.
# Use the user's configured storage level configuration from 
{{SparkPipelineOptions}} when caching unbounded datasets ({{DStream}}s), same 
as we do for bounded datasets.
# Make caching of re-used datasets configurable in {{SparkPipelineOptions}} 
(enable/disable). Although overloading our configuration with more options is 
always something not to be taken lightly.

  was:
Today, when we detect re-use of a dataset in Spark runner we eagerly cache it 
to avoid calculating the same data multiple times.
([EvaluationContext.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L148])

When the dataset is bounded, which in Spark is represented by an {{RDD}}, we 
call {{RDD#persist}} and use storage level provided by the user via 
{{SparkPipelineOptions}}. 
([BoundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java#L103-L103])

When the dataset is unbounded, which in Spark is represented by a {{DStream}} 
we call {{DStream.cache()}} which defaults to persist the {{DStream}} using 
storage level {{MEMORY_ONLY_SER}} 
([UnboundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java#L61])
 
([DStream.scala|https://github.com/apache/spark/blob/v1.6.3/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L169])

Storage level {{MEMORY_ONLY_SER}} means Spark will serialize the data using its 
configured serializer. Since we configure this to be Kryo in a hard coded 
fashion, this means the data will be serialized using Kryo. 
([SparkContextFactory.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java#L99-L99])

Due to this, if your {{DStream}} contains non-Kryo-serializable data you will 
encounter Kryo serialization exceptions and your task will fail.

Possible actions we should consider:
# Remove the hard coded Spark serializer configuration, this should be taken 
from the user's configuration of Spark, no real reason for us to interfere with 
this.
# Use the user's configured storage level configuration from 
{{SparkPipelineOptions}} when caching unbounded datasets ({{DStream}}s), same 
as we do for bounded datasets.
# Make caching of re-used datasets configurable in {{SparkPipelineOptions}} 
(enable/disable). Although overloading our configuration with more options is 
always something not to be taken lightly.


> Kryo serialization exception when DStreams containing non-Kryo-serializable 
> data are cached
> ---
>
> Key: BEAM-2669
> URL: 

[jira] [Commented] (BEAM-2669) Kryo serialization exception when DStreams containing non-Kryo-serializable data are cached

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098070#comment-16098070
 ] 

Jean-Baptiste Onofré commented on BEAM-2669:


Thanks for the details !

> Kryo serialization exception when DStreams containing non-Kryo-serializable 
> data are cached
> ---
>
> Key: BEAM-2669
> URL: https://issues.apache.org/jira/browse/BEAM-2669
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.4.0, 0.5.0, 0.6.0, 2.0.0
>Reporter: Aviem Zur
>Assignee: Amit Sela
>





[jira] [Commented] (BEAM-2669) Kryo serialization exception when DStreams containing non-Kryo-serializable data are cached

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098072#comment-16098072
 ] 

Jean-Baptiste Onofré commented on BEAM-2669:


I would be in favor of option 2; I don't remember why we apply the user-defined 
storage level only to {{RDD}}s and not {{DStream}}s.

> Kryo serialization exception when DStreams containing non-Kryo-serializable 
> data are cached
> ---
>
> Key: BEAM-2669
> URL: https://issues.apache.org/jira/browse/BEAM-2669
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.4.0, 0.5.0, 0.6.0, 2.0.0
>Reporter: Aviem Zur
>Assignee: Amit Sela
>





[jira] [Created] (BEAM-2669) Kryo serialization exception when DStreams containing non-Kryo-serializable data are cached

2017-07-24 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-2669:
---

 Summary: Kryo serialization exception when DStreams containing 
non-Kryo-serializable data are cached
 Key: BEAM-2669
 URL: https://issues.apache.org/jira/browse/BEAM-2669
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Affects Versions: 2.0.0, 0.6.0, 0.5.0, 0.4.0
Reporter: Aviem Zur
Assignee: Amit Sela


Today, when we detect re-use of a dataset in Spark runner we eagerly cache it 
to avoid calculating the same data multiple times.
([EvaluationContext.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/EvaluationContext.java#L148])

When the dataset is bounded, which in Spark is represented by an {{RDD}}, we 
call {{RDD#persist}} and use storage level provided by the user via 
{{SparkPipelineOptions}}. 
([BoundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java#L103-L103])

When the dataset is unbounded, which in Spark is represented by a {{DStream}} 
we call {{DStream.cache()}} which defaults to persist the {{DStream}} using 
storage level {{MEMORY_ONLY_SER}} 
([UnboundedDataset.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java#L61])
 
([DStream.scala|https://github.com/apache/spark/blob/v1.6.3/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L169])

Storage level {{MEMORY_ONLY_SER}} means Spark will serialize the data using its 
configured serializer. Since we configure this to be Kryo in a hard coded 
fashion, this means the data will be serialized using Kryo. 
([SparkContextFactory.java|https://github.com/apache/beam/blob/v2.0.0/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java#L99-L99])

Due to this, if your {{DStream}} contains non-Kryo-serializable data you will 
encounter Kryo serialization exceptions and your task will fail.

Possible actions we should consider:
# Remove the hard coded Spark serializer configuration, this should be taken 
from the user's configuration of Spark, no real reason for us to interfere with 
this.
# Use the user's configured storage level configuration from 
{{SparkPipelineOptions}} when caching unbounded datasets ({{DStream}}s), same 
as we do for bounded datasets.
# Make caching of re-used datasets configurable in {{SparkPipelineOptions}} 
(enable/disable). Although overloading our configuration with more options is 
always something not to be taken lightly.
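Options 2 and 3 above can be sketched as a small cache-policy helper (all names here are illustrative assumptions, not the actual Spark runner API):

```java
import java.util.Optional;

class CachePolicy {
    enum StorageLevel { MEMORY_ONLY, MEMORY_ONLY_SER, MEMORY_AND_DISK, MEMORY_AND_DISK_SER }

    // Option 3: caching of re-used datasets can be disabled entirely.
    // Option 2: otherwise, the user-provided storage level from
    // SparkPipelineOptions applies to bounded (RDD) and unbounded (DStream)
    // datasets alike, instead of DStream.cache()'s MEMORY_ONLY_SER default.
    static Optional<StorageLevel> cacheLevel(boolean cachingEnabled, String userLevel) {
        if (!cachingEnabled) {
            return Optional.empty();
        }
        return Optional.of(StorageLevel.valueOf(userLevel));
    }
}
```

Translating the returned level into a `persist(...)` call for both dataset kinds would remove the hard-coded serialization path that triggers the Kryo failures described above.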




