[jira] [Resolved] (BEAM-1397) Introduce IO metrics

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1397.
-----------------------------
   Resolution: Done
Fix Version/s: First stable release

> Introduce IO metrics
> --------------------
>
> Key: BEAM-1397
> URL: https://issues.apache.org/jira/browse/BEAM-1397
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> Introduce the usage of metrics API in IOs.
> POC using {{CountingInput}}:
> * Add metrics to {{CountingInput}}
> * A {{RunnableOnService}} test that creates a pipeline and asserts these 
> metrics.
> * Close any gaps in Direct runner and Spark runner to support these metrics.
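For context, the counter-and-assert pattern the POC describes can be sketched in plain Java. This is a minimal model with illustrative names, not the actual Beam API (the real implementation would use Beam's `Metrics.counter(...)` / `Counter.inc()` and a `PAssert`-style pipeline test):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal stand-in for a metrics container; all names here are illustrative.
public class IoMetricsSketch {
    static final Map<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

    static void inc(String name) {
        COUNTERS.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    public static void main(String[] args) {
        // A CountingInput-like bounded source emitting 0..9; each element
        // read bumps an "elements-read" counter, as the POC proposes.
        for (long i = 0; i < 10; i++) {
            inc("elements-read");
        }
        // The RunnableOnService-style check: the counter must equal the
        // number of elements the source produced.
        long read = COUNTERS.get("elements-read").get();
        if (read != 10) {
            throw new AssertionError("expected 10, got " + read);
        }
        System.out.println(read);
    }
}
```

A runner supports these metrics once such an assertion passes against its own metrics plumbing rather than a local map.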



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Dataflow #2662

2017-03-27 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #3063

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1398) KafkaIO metrics

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944515#comment-15944515
 ] 

ASF GitHub Bot commented on BEAM-1398:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2344

[BEAM-1398] KafkaIO metrics.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-XXX] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `XXX` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam kafka-io-metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2344


commit 200ef65c5751327aa058421d035f5f68e3db0ec1
Author: Aviem Zur 
Date:   2017-03-28T04:29:53Z

[BEAM-1398] KafkaIO metrics.




> KafkaIO metrics
> ---
>
> Key: BEAM-1398
> URL: https://issues.apache.org/jira/browse/BEAM-1398
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Add metrics to {{KafkaIO}} using the metrics API.
> Metrics per split, where applicable (feel free to add more ideas here):
> * Backlog in bytes.
> * Backlog in number of messages.
> * Messages consumed.
> * Bytes consumed.
> * Messages produced.
> Add a {{RunnableOnService}} test that creates a pipeline and asserts metric 
> values.
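The two backlog metrics proposed above reduce to simple offset arithmetic per split (topic-partition). A hedged sketch in plain Java; the method and parameter names (`backlogMessages`, `latestOffset`, `nextOffset`) are illustrative, not KafkaIO's actual fields:

```java
// Sketch of per-split backlog derivation for a Kafka-style source.
public class SplitBacklogSketch {
    // Backlog in number of messages for one split: the distance between the
    // latest offset the broker holds and the next offset the reader will
    // consume. Clamped at zero in case the reader is momentarily ahead.
    static long backlogMessages(long latestOffset, long nextOffset) {
        return Math.max(0, latestOffset - nextOffset);
    }

    // Backlog in bytes, estimated from an observed average message size.
    static long backlogBytes(long backlogMessages, long avgMessageBytes) {
        return backlogMessages * avgMessageBytes;
    }

    public static void main(String[] args) {
        long backlog = backlogMessages(1_000, 400); // 600 messages behind
        System.out.println(backlog);
        System.out.println(backlogBytes(backlog, 128)); // 600 * 128 bytes
    }
}
```

Messages/bytes consumed and produced, by contrast, are plain counters incremented on each read or write.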



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Spark #1405

2017-03-27 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Flink #2091

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1397) Introduce IO metrics

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944484#comment-15944484
 ] 

ASF GitHub Bot commented on BEAM-1397:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2162


> Introduce IO metrics
> --------------------
>
> Key: BEAM-1397
> URL: https://issues.apache.org/jira/browse/BEAM-1397
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Introduce the usage of metrics API in IOs.
> POC using {{CountingInput}}:
> * Add metrics to {{CountingInput}}
> * A {{RunnableOnService}} test that creates a pipeline and asserts these 
> metrics.
> * Close any gaps in Direct runner and Spark runner to support these metrics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2162: [BEAM-1397] Introduce IO metrics

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2162


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2162

2017-03-27 Thread aviemzur
This closes #2162


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/48fee91f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/48fee91f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/48fee91f

Branch: refs/heads/master
Commit: 48fee91f7d720d03da53476e9a237eabcbfc0460
Parents: 85b820c 65b5f00
Author: Aviem Zur 
Authored: Tue Mar 28 06:51:39 2017 +0300
Committer: Aviem Zur 
Committed: Tue Mar 28 06:51:39 2017 +0300

--
 .../beam/runners/spark/TestSparkRunner.java | 14 +++-
 .../apache/beam/runners/spark/io/SourceRDD.java | 51 +-
 .../runners/spark/io/SparkUnboundedSource.java  | 48 +
 .../spark/metrics/SparkMetricsContainer.java| 11 ++-
 .../spark/stateful/StateSpecFunctions.java  | 35 +++---
 .../spark/translation/TransformTranslator.java  |  3 +-
 .../streaming/StreamingTransformTranslator.java |  4 +-
 .../streaming/StreamingSourceMetricsTest.java   | 71 
 .../org/apache/beam/sdk/io/CountingSource.java  |  8 +++
 .../apache/beam/sdk/metrics/MetricsTest.java| 45 +
 10 files changed, 244 insertions(+), 46 deletions(-)
--




[1/2] beam git commit: [BEAM-1397] Introduce IO metrics

2017-03-27 Thread aviemzur
Repository: beam
Updated Branches:
  refs/heads/master 85b820c37 -> 48fee91f7


[BEAM-1397] Introduce IO metrics


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/65b5f001
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/65b5f001
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/65b5f001

Branch: refs/heads/master
Commit: 65b5f001a4e1790206efe3ff2d418018680ea621
Parents: 85b820c
Author: Aviem Zur 
Authored: Tue Mar 28 06:49:59 2017 +0300
Committer: Aviem Zur 
Committed: Tue Mar 28 06:49:59 2017 +0300

--
 .../beam/runners/spark/TestSparkRunner.java | 14 +++-
 .../apache/beam/runners/spark/io/SourceRDD.java | 51 +-
 .../runners/spark/io/SparkUnboundedSource.java  | 48 +
 .../spark/metrics/SparkMetricsContainer.java| 11 ++-
 .../spark/stateful/StateSpecFunctions.java  | 35 +++---
 .../spark/translation/TransformTranslator.java  |  3 +-
 .../streaming/StreamingTransformTranslator.java |  4 +-
 .../streaming/StreamingSourceMetricsTest.java   | 71 
 .../org/apache/beam/sdk/io/CountingSource.java  |  8 +++
 .../apache/beam/sdk/metrics/MetricsTest.java| 45 +
 10 files changed, 244 insertions(+), 46 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/65b5f001/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
--
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
index e40534f..be9ff2e 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
@@ -135,7 +135,12 @@ public final class TestSparkRunner extends PipelineRunner<SparkPipelineResult> {
         isOneOf(PipelineResult.State.STOPPED, PipelineResult.State.DONE));
 
     // validate assertion succeeded (at least once).
-    int successAssertions = result.getAggregatorValue(PAssert.SUCCESS_COUNTER, Integer.class);
+    int successAssertions = 0;
+    try {
+      successAssertions = result.getAggregatorValue(PAssert.SUCCESS_COUNTER, Integer.class);
+    } catch (NullPointerException e) {
+      // No assertions registered will cause an NPE here.
+    }
     Integer expectedAssertions = testSparkPipelineOptions.getExpectedAssertions() != null
         ? testSparkPipelineOptions.getExpectedAssertions() : expectedNumberOfAssertions;
     assertThat(
@@ -145,7 +150,12 @@ public final class TestSparkRunner extends PipelineRunner<SparkPipelineResult> {
         successAssertions,
         is(expectedAssertions));
     // validate assertion didn't fail.
-    int failedAssertions = result.getAggregatorValue(PAssert.FAILURE_COUNTER, Integer.class);
+    int failedAssertions = 0;
+    try {
+      failedAssertions = result.getAggregatorValue(PAssert.FAILURE_COUNTER, Integer.class);
+    } catch (NullPointerException e) {
+      // No assertions registered will cause an NPE here.
+    }
     assertThat(
         String.format("Found %d failed assertions.", failedAssertions),
         failedAssertions,
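The guard in the hunk above is a default-on-exception pattern: keep a zero default when the lookup throws. A minimal standalone illustration in plain Java (the map-backed `getAggregatorValue` is a simplified stand-in, not the Beam method):

```java
import java.util.HashMap;
import java.util.Map;

public class GuardDemo {
    // Stand-in for result.getAggregatorValue(...): unboxing the null
    // returned for an unregistered aggregator throws NullPointerException.
    static final Map<String, Integer> AGGREGATORS = new HashMap<>();

    static int getAggregatorValue(String name) {
        return AGGREGATORS.get(name); // null unboxing -> NPE
    }

    public static void main(String[] args) {
        int successAssertions = 0;
        try {
            successAssertions = getAggregatorValue("PAssert.SUCCESS_COUNTER");
        } catch (NullPointerException e) {
            // No assertions registered: keep the default of zero.
        }
        System.out.println(successAssertions);
    }
}
```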

http://git-wip-us.apache.org/repos/asf/beam/blob/65b5f001/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SourceRDD.java
--
diff --git a/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SourceRDD.java b/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SourceRDD.java
index 1a3537f..2f9a827 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SourceRDD.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SourceRDD.java
@@ -20,15 +20,21 @@ package org.apache.beam.runners.spark.io;
 
 import static com.google.common.base.Preconditions.checkArgument;
 
+import java.io.Closeable;
 import java.io.IOException;
 import java.util.Collections;
 import java.util.Iterator;
 import java.util.List;
+import org.apache.beam.runners.spark.metrics.MetricsAccumulator;
+import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
 import org.apache.beam.runners.spark.translation.SparkRuntimeContext;
 import org.apache.beam.sdk.io.BoundedSource;
 import org.apache.beam.sdk.io.Source;
 import org.apache.beam.sdk.io.UnboundedSource;
+import org.apache.beam.sdk.metrics.MetricsContainer;
+import org.apache.beam.sdk.metrics.MetricsEnvironment;
 import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.Accumulator;
 import org.apache.spark.Dependency;
 import org.apache.spark.HashPartitioner;
 

[jira] [Commented] (BEAM-1439) Beam Example(s) exploring public document datasets

2017-03-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944466#comment-15944466
 ] 

Kenneth Knowles commented on BEAM-1439:
---

And also, please engage with the Beam community early - before applications are 
reviewed!

Here are some ideas for getting engaged:

# Work through Beam's "getting started" materials such as 
https://beam.apache.org/get-started/quickstart-java/
#* Especially get as familiar as you can with the runner that you are 
interested in
# Subscribe to d...@beam.apache.org and/or u...@beam.apache.org
# You are welcome to share your application drafts on d...@beam.apache.org for 
early feedback and mentorship (this is quite normal for GSoC+Apache; even if 
you are not selected for GSoC, you will learn and make new acquaintances)
# Pick up starter bugs to get familiar with the codebase beyond our getting 
started material

> Beam Example(s) exploring public document datasets
> --
>
> Key: BEAM-1439
> URL: https://issues.apache.org/jira/browse/BEAM-1439
> Project: Beam
>  Issue Type: Wish
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>  Labels: gsoc2017, java, mentor, python
>
> In Beam, we have examples illustrating counting the occurrences of words and 
> performing a basic TF-IDF analysis on the works of Shakespeare (or whatever 
> you point it at). It would be even cooler to do these analyses, and more, on 
> a much larger data set that is really the subject of current investigations.
> In chatting with professors at the University of Washington, I've learned 
> that scholars of many fields would really like to explore new and highly 
> customized ways of processing the growing body of publicly-available 
> scholarly documents, such as PubMed Central. Queries like "show me documents 
> where chemical compounds X and Y were both used in the 'method' section" are 
> one motivating example.
> So I propose a Google Summer of Code project wherein a student writes some 
> large-scale Beam pipelines to perform analyses such as term frequency, bigram 
> frequency, etc.
> Skills required:
>  - Java or Python
>  - (nice to have) Working through the Beam getting started materials



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : beam_PostCommit_Python_Verify #1645

2017-03-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2661

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-1622) Java: Rename RunnableOnService to ValidatesRunner

2017-03-27 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-1622.
---
   Resolution: Fixed
Fix Version/s: First stable release

> Java: Rename RunnableOnService to ValidatesRunner
> -------------------------------------------------
>
> Key: BEAM-1622
> URL: https://issues.apache.org/jira/browse/BEAM-1622
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core, testing
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: First stable release
>
>
> We agreed on the dev list a long time ago to this rename, and Python SDK has 
> executed it. Since then I've had a half dozen conversations to clarify it, so 
> let's get this done.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-848) Shuffle the input read-values to get maximum parallelism.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-848:
---
Summary: Shuffle the input read-values to get maximum parallelism.  (was: A 
better shuffle after reading from within mapWithState.)

> Shuffle the input read-values to get maximum parallelism.
> ---------------------------------------------------------
>
> Key: BEAM-848
> URL: https://issues.apache.org/jira/browse/BEAM-848
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> It would be wise to shuffle the read values _after_ flatmap to increase 
> parallelism in processing of the data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-848) A better shuffle after reading from within mapWithState.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-848:
---
Description: It would be wise to shuffle the read values _after_ flatmap to 
increase parallelism in processing of the data.  (was: The SparkRunner uses 
{{mapWithState}} to read and manage CheckpointMarks, and this stateful 
operation will be followed by a shuffle: 
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159

Since the stateful read maps "splitSource" -> "partition of a list of read 
values", the following shuffle won't benefit in any way (the list of read 
values has not been flatMapped yet). In order to avoid shuffle we need to set 
the input RDD ({{SourceRDD.Unbounded}}) partitioner to be a default 
{{HashPartitioner}} since {{mapWithState}} would use the same partitioner and 
will skip shuffle if the partitioners match.

It would be wise to shuffle the read values _after_ flatmap.

I will break this into two tasks:
# Set default-partitioner to the input RDD.
# Shuffle (using Coders) the input.)

> A better shuffle after reading from within mapWithState.
> --------------------------------------------------------
>
> Key: BEAM-848
> URL: https://issues.apache.org/jira/browse/BEAM-848
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> It would be wise to shuffle the read values _after_ flatmap to increase 
> parallelism in processing of the data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1074:

Description: 
The SparkRunner uses {{mapWithState}} to read and manage CheckpointMarks, and 
this stateful operation will be followed by a shuffle: 
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159

Since the stateful read maps "splitSource" -> "partition of a list of read 
values", the following shuffle won't benefit in any way (the list of read 
values has not been flatMapped yet). In order to avoid shuffle we need to set 
the input RDD ({{SourceRDD.Unbounded}}) partitioner to be a default 
{{HashPartitioner}} since {{mapWithState}} would use the same partitioner and 
will skip shuffle if the partitioners match.

  was: This will make sure the following stateful read within {{mapWithState}} 
won't shuffle the read values as long as they are grouped in a {{List}}.


> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> The SparkRunner uses {{mapWithState}} to read and manage CheckpointMarks, and 
> this stateful operation will be followed by a shuffle: 
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159
> Since the stateful read maps "splitSource" -> "partition of a list of read 
> values", the following shuffle won't benefit in any way (the list of read 
> values has not been flatMapped yet). In order to avoid shuffle we need to set 
> the input RDD ({{SourceRDD.Unbounded}}) partitioner to be a default 
> {{HashPartitioner}} since {{mapWithState}} would use the same partitioner and 
> will skip shuffle if the partitioners match.
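The reason matching partitioners skip the shuffle can be modeled in a few lines: when the upstream RDD and the stateful stage place keys with the same partitioning function and partition count, every key is already where the state stage needs it. A hedged plain-Java model of a `HashPartitioner`-style placement rule (an illustration, not Spark code; the key name is hypothetical):

```java
public class PartitionerMatchSketch {
    // HashPartitioner-style placement: non-negative hash modulo the
    // partition count.
    static int nonNegativeMod(int hash, int numPartitions) {
        int mod = hash % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        String key = "splitSource-3"; // hypothetical split key

        // Upstream (SourceRDD.Unbounded) and the mapWithState stage use the
        // same partitioning parameters, so the key lands on the same
        // partition in both stages and no records need to move.
        int upstream = nonNegativeMod(key.hashCode(), numPartitions);
        int stateful = nonNegativeMod(key.hashCode(), numPartitions);
        System.out.println(upstream == stateful);
    }
}
```

Two hash partitioners with equal partition counts compare equal, which is the condition under which the state stage reuses the existing placement.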



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1074:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-848)

> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> This will make sure the following stateful read within {{mapWithState}} won't 
> shuffle the read values as long as they are grouped in a {{List}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Flink #2090

2017-03-27 Thread Apache Jenkins Server
See 


--
[...truncated 216.05 KB...]
[... repeated "x [deleted] (none) -> origin/pr/*" ref-deletion lines truncated ...]

Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Spark #1404

2017-03-27 Thread Apache Jenkins Server
See 


--
[...truncated 216.27 KB...]
[... repeated "x [deleted] (none) -> origin/pr/*" ref-deletion lines truncated ...]

Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #3062

2017-03-27 Thread Apache Jenkins Server
See 


--
[...truncated 219.30 KB...]
[... repeated "x [deleted] (none) -> origin/pr/*" ref-deletion lines truncated ...]
remote: Counting objects: 1408, done.
[... "remote: Compressing objects" progress output truncated ...]

[jira] [Resolved] (BEAM-1792) Spark runner uses its own filtering logic to match metrics

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1792.
-----------------------------
   Resolution: Fixed
Fix Version/s: First stable release

> Spark runner uses its own filtering logic to match metrics
> --
>
> Key: BEAM-1792
> URL: https://issues.apache.org/jira/browse/BEAM-1792
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
> Fix For: First stable release
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1792) Spark runner uses its own filtering logic to match metrics

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944438#comment-15944438
 ] 

ASF GitHub Bot commented on BEAM-1792:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2304


> Spark runner uses its own filtering logic to match metrics
> --
>
> Key: BEAM-1792
> URL: https://issues.apache.org/jira/browse/BEAM-1792
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>






[GitHub] beam pull request #2304: [BEAM-1792] Changing Spark to use MetricFiltering

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2304


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1792] Use MetricFiltering in Spark runner.

2017-03-27 Thread aviemzur
Repository: beam
Updated Branches:
  refs/heads/master fe441e34b -> 85b820c37


[BEAM-1792] Use MetricFiltering in Spark runner.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/241ded90
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/241ded90
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/241ded90

Branch: refs/heads/master
Commit: 241ded9022a9214c1d0768b1cb3c7a740a409873
Parents: fe441e3
Author: Pablo 
Authored: Fri Mar 24 10:48:43 2017 -0700
Committer: Aviem Zur 
Committed: Tue Mar 28 05:51:14 2017 +0300

--
 .../spark/metrics/SparkMetricResults.java   | 40 +---
 1 file changed, 2 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/241ded90/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkMetricResults.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkMetricResults.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkMetricResults.java
index c02027a..faf4c52 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkMetricResults.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkMetricResults.java
@@ -19,17 +19,15 @@
 package org.apache.beam.runners.spark.metrics;
 
 import com.google.common.base.Function;
-import com.google.common.base.Objects;
 import com.google.common.base.Predicate;
 import com.google.common.collect.FluentIterable;
-import java.util.Set;
 import org.apache.beam.sdk.metrics.DistributionData;
 import org.apache.beam.sdk.metrics.DistributionResult;
 import org.apache.beam.sdk.metrics.GaugeData;
 import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricFiltering;
 import org.apache.beam.sdk.metrics.MetricKey;
 import org.apache.beam.sdk.metrics.MetricName;
-import org.apache.beam.sdk.metrics.MetricNameFilter;
 import org.apache.beam.sdk.metrics.MetricQueryResults;
 import org.apache.beam.sdk.metrics.MetricResult;
 import org.apache.beam.sdk.metrics.MetricResults;
@@ -88,44 +86,10 @@ public class SparkMetricResults extends MetricResults {
   return new Predicate() {
 @Override
 public boolean apply(MetricUpdate metricResult) {
-  return matches(filter, metricResult.getKey());
+  return MetricFiltering.matches(filter, metricResult.getKey());
 }
   };
 }
-
-private boolean matches(MetricsFilter filter, MetricKey key) {
-  return matchesName(key.metricName(), filter.names())
-  && matchesScope(key.stepName(), filter.steps());
-}
-
-private boolean matchesName(MetricName metricName, Set<MetricNameFilter> nameFilters) {
-  if (nameFilters.isEmpty()) {
-return true;
-  }
-
-  for (MetricNameFilter nameFilter : nameFilters) {
-if ((nameFilter.getName() == null || 
nameFilter.getName().equals(metricName.name()))
-&& Objects.equal(metricName.namespace(), 
nameFilter.getNamespace())) {
-  return true;
-}
-  }
-
-  return false;
-}
-
-private boolean matchesScope(String actualScope, Set<String> scopes) {
-  if (scopes.isEmpty() || scopes.contains(actualScope)) {
-return true;
-  }
-
-  for (String scope : scopes) {
-if (actualScope.startsWith(scope)) {
-  return true;
-}
-  }
-
-  return false;
-}
   }
 
   private static final Function



[2/2] beam git commit: This closes #2304

2017-03-27 Thread aviemzur
This closes #2304


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/85b820c3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/85b820c3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/85b820c3

Branch: refs/heads/master
Commit: 85b820c3799ff292873acfd5d456c1f3c4321ae9
Parents: fe441e3 241ded9
Author: Aviem Zur 
Authored: Tue Mar 28 05:55:49 2017 +0300
Committer: Aviem Zur 
Committed: Tue Mar 28 05:55:49 2017 +0300

--
 .../spark/metrics/SparkMetricResults.java   | 40 +---
 1 file changed, 2 insertions(+), 38 deletions(-)
--
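The `MetricFiltering.matches` call that replaces the inlined Spark logic combines two checks: name matching against the configured name filters, and scope matching that also accepts a filter scope that is a prefix of the actual step name. A minimal Python sketch of those semantics (function and parameter names here are illustrative, not Beam's actual API):

```python
def matches_name(metric_ns, metric_name, name_filters):
    # An empty filter set matches everything; otherwise require an exact
    # namespace match and, if the filter carries a name, an exact name match.
    if not name_filters:
        return True
    return any(
        ns == metric_ns and (name is None or name == metric_name)
        for ns, name in name_filters)


def matches_scope(actual_scope, scopes):
    # An empty scope set matches everything; otherwise accept exact scopes
    # or any filter scope that is a prefix of the actual step name.
    if not scopes:
        return True
    return actual_scope in scopes or any(
        actual_scope.startswith(scope) for scope in scopes)


def matches(name_filters, scopes, metric_ns, metric_name, step_name):
    # A metric is reported only when both checks pass.
    return (matches_name(metric_ns, metric_name, name_filters)
            and matches_scope(step_name, scopes))
```

With this shape, a filter on step `"read"` also matches metrics reported from nested steps such as `"read/Split"`, which is the behavior the removed Spark-local code implemented.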




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2660

2017-03-27 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #3061

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1793) Frequent python post commit errors

2017-03-27 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944369#comment-15944369
 ] 

Ahmet Altay commented on BEAM-1793:
---

Another error after the change:
https://builds.apache.org/job/beam_PostCommit_Python_Verify/1644/consoleFull

Is there a way to verify that it ran on one of the pinned nodes?

> Frequent python post commit errors
> --
>
> Key: BEAM-1793
> URL: https://issues.apache.org/jira/browse/BEAM-1793
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Sourabh Bajaj
>Priority: Critical
>
> 1. Failed because virtualenv was already installed, and the postcommit script 
> made a wrong assumption about its location.
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/1499/consoleFull
> 2. Failed because a really old version of pip is installed:
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/1585/consoleFull
> (Also #1586 and #1597)
> 3. Failed because a file was already in the cache and pip did not want to 
> overwrite it, even though this is not usually a problem:
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/1596/consoleFull
> (Might be related: https://issues.apache.org/jira/browse/BEAM-1788)





Build failed in Jenkins: beam_PostCommit_Python_Verify #1644

2017-03-27 Thread Apache Jenkins Server
See 


Changes:

[altay] Add region option to Dataflow pipeline options.

--
[...truncated 505.83 KB...]
test_getitem_duplicates_ignored 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_getitem_must_be_valid_type_param 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_getitem_must_be_valid_type_param_cant_be_object_instance 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_getitem_nested_unions_flattened 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_nested_compatibility 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_compatibility 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_enforcement_composite_type_in_union 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_enforcement_not_part_of_union 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_enforcement_part_of_union 
(apache_beam.typehints.typehints_test.UnionHintTestCase) ... ok
test_union_hint_repr (apache_beam.typehints.typehints_test.UnionHintTestCase) 
... ok
test_deprecated_with_since_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_deprecated_without_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_deprecated_without_since_should_fail 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_experimental_with_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_experimental_without_current 
(apache_beam.utils.annotations_test.AnnotationTests) ... ok
test_frequency (apache_beam.utils.annotations_test.AnnotationTests)
Tests that the filter 'once' is sufficient to print once per ... ok
test_gcs_path (apache_beam.utils.path_test.Path) ... ok
test_unix_path (apache_beam.utils.path_test.Path) ... ok
test_windows_path (apache_beam.utils.path_test.Path) ... ok
test_dataflow_job_file 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_display_data (apache_beam.utils.pipeline_options_test.PipelineOptionsTest) 
... ok
test_experiments (apache_beam.utils.pipeline_options_test.PipelineOptionsTest) 
... ok
test_extra_package 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_from_dictionary 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_get_all_options 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_option_with_space 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_override_options 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_redefine_options 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_template_location 
(apache_beam.utils.pipeline_options_test.PipelineOptionsTest) ... ok
test_dataflow_job_file_and_template_location_mutually_exclusive 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_gcs_path (apache_beam.utils.pipeline_options_validator_test.SetupTest) ... 
ok
test_is_service_runner 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_job_name (apache_beam.utils.pipeline_options_validator_test.SetupTest) ... 
ok
test_local_runner (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_missing_required_options 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_num_workers (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_project (apache_beam.utils.pipeline_options_validator_test.SetupTest) ... 
ok
test_streaming (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_test_matcher (apache_beam.utils.pipeline_options_validator_test.SetupTest) 
... ok
test_validate_dataflow_job_file 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_validate_template_location 
(apache_beam.utils.pipeline_options_validator_test.SetupTest) ... ok
test_method_forwarding_not_windows (apache_beam.utils.processes_test.Exec) ... 
ok
test_method_forwarding_windows (apache_beam.utils.processes_test.Exec) ... ok
test_call_two_objects (apache_beam.utils.retry_test.RetryStateTest) ... ok
test_single_failure (apache_beam.utils.retry_test.RetryStateTest) ... ok
test_two_failures (apache_beam.utils.retry_test.RetryStateTest) ... ok
test_log_calls_for_permanent_failure (apache_beam.utils.retry_test.RetryTest) 
... ok
test_log_calls_for_transient_failure (apache_beam.utils.retry_test.RetryTest) 
... ok
test_with_default_number_of_retries (apache_beam.utils.retry_test.RetryTest) 
... ok
test_with_explicit_decorator (apache_beam.utils.retry_test.RetryTest) ... ok
test_with_explicit_initial_delay (apache_beam.utils.retry_test.RetryTest) ... ok
test_with_explicit_number_of_retries (apache_beam.utils.retry_test.RetryTest) 
... ok

[jira] [Commented] (BEAM-1809) Fix one error in TravisCI build

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944358#comment-15944358
 ] 

ASF GitHub Bot commented on BEAM-1809:
--

Github user wtanaka closed the pull request at:

https://github.com/apache/beam/pull/2326


> Fix one error in TravisCI build
> ---
>
> Key: BEAM-1809
> URL: https://issues.apache.org/jira/browse/BEAM-1809
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 0.6.0
>Reporter: Wesley Tanaka
>Assignee: Davor Bonaci
>
> TravisCI builds are failing on:
> 2017-03-25T17:49:22.852 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default) on 
> project beam-sdks-parent: Failed during checkstyle execution: Unable to find 
> suppressions file at location: beam/suppressions.xml: Could not find resource 
> 'beam/suppressions.xml'. -> [Help 1]
> e.g. https://travis-ci.org/apache/beam/jobs/215023943
> See https://github.com/apache/beam/pull/2326





[GitHub] beam pull request #2326: [BEAM-1809] Make TravisCI builds less broken

2017-03-27 Thread wtanaka
Github user wtanaka closed the pull request at:

https://github.com/apache/beam/pull/2326




[GitHub] beam pull request #2342: Add region option to Dataflow pipeline options.

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2342




[1/2] beam git commit: Add region option to Dataflow pipeline options.

2017-03-27 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master e58155263 -> fe441e34b


Add region option to Dataflow pipeline options.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c8d251fc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c8d251fc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c8d251fc

Branch: refs/heads/master
Commit: c8d251fc682489909be387b6270a3c053ab3f187
Parents: e581552
Author: Ahmet Altay 
Authored: Mon Mar 27 17:09:34 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 18:10:49 2017 -0700

--
 .../runners/dataflow/internal/apiclient.py  | 35 +++-
 .../apache_beam/utils/pipeline_options.py   | 11 +-
 2 files changed, 30 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c8d251fc/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
index f7daed0..6fa2f26 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
@@ -451,11 +451,12 @@ class DataflowApplicationClient(object):
 
   @retry.with_exponential_backoff(num_retries=3, initial_delay_secs=3)
   def get_job_metrics(self, job_id):
-request = dataflow.DataflowProjectsJobsGetMetricsRequest()
+request = dataflow.DataflowProjectsLocationsJobsGetMetricsRequest()
 request.jobId = job_id
+request.location = self.google_cloud_options.region
 request.projectId = self.google_cloud_options.project
 try:
-  response = self._client.projects_jobs.GetMetrics(request)
+  response = self._client.projects_locations_jobs.GetMetrics(request)
 except exceptions.BadStatusCodeError as e:
   logging.error('HTTP status %d. Unable to query metrics',
 e.response.status)
@@ -464,12 +465,13 @@ class DataflowApplicationClient(object):
 
   def submit_job_description(self, job):
"""Creates and executes a job request."""
-request = dataflow.DataflowProjectsJobsCreateRequest()
+request = dataflow.DataflowProjectsLocationsJobsCreateRequest()
 request.projectId = self.google_cloud_options.project
+request.location = self.google_cloud_options.region
 request.job = job.proto
 
 try:
-  response = self._client.projects_jobs.Create(request)
+  response = self._client.projects_locations_jobs.Create(request)
 except exceptions.BadStatusCodeError as e:
   logging.error('HTTP status %d trying to create job'
 ' at dataflow service endpoint %s',
@@ -509,9 +511,10 @@ class DataflowApplicationClient(object):
   # Other states could only be set by the service.
   return False
 
-request = dataflow.DataflowProjectsJobsUpdateRequest()
+request = dataflow.DataflowProjectsLocationsJobsUpdateRequest()
 request.jobId = job_id
 request.projectId = self.google_cloud_options.project
+request.location = self.google_cloud_options.region
 request.job = dataflow.Job(requestedState=new_state)
 
 self._client.projects_jobs.Update(request)
@@ -539,10 +542,11 @@ class DataflowApplicationClient(object):
 (e.g. '2015-03-10T00:01:53.074Z')
   currentStateTime: UTC time for the current state of the job.
 """
-request = dataflow.DataflowProjectsJobsGetRequest()
+request = dataflow.DataflowProjectsLocationsJobsGetRequest()
 request.jobId = job_id
 request.projectId = self.google_cloud_options.project
-response = self._client.projects_jobs.Get(request)
+request.location = self.google_cloud_options.region
+response = self._client.projects_locations_jobs.Get(request)
 return response
 
   @retry.with_exponential_backoff()  # Using retry defaults from utils/retry.py
@@ -588,8 +592,9 @@ class DataflowApplicationClient(object):
 JOB_MESSAGE_WARNING, JOB_MESSAGE_ERROR.
  messageText: A message string.
 """
-request = dataflow.DataflowProjectsJobsMessagesListRequest(
-jobId=job_id, projectId=self.google_cloud_options.project)
+request = dataflow.DataflowProjectsLocationsJobsMessagesListRequest(
+jobId=job_id, location=self.google_cloud_options.region,
+projectId=self.google_cloud_options.project)
 if page_token is not None:
   request.pageToken = page_token
 if start_time is not None:
@@ -599,34 +604,34 @@ class DataflowApplicationClient(object):
 if minimum_importance is not None:
   if minimum_importance == 'JOB_MESSAGE_DEBUG':
 request.minimumImportance = (
-

[2/2] beam git commit: This closes #2342

2017-03-27 Thread altay
This closes #2342


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fe441e34
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fe441e34
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fe441e34

Branch: refs/heads/master
Commit: fe441e34bbb2e0ddb438c72f9801533e7ddb29c4
Parents: e581552 c8d251f
Author: Ahmet Altay 
Authored: Mon Mar 27 18:11:11 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 18:11:11 2017 -0700

--
 .../runners/dataflow/internal/apiclient.py  | 35 +++-
 .../apache_beam/utils/pipeline_options.py   | 11 +-
 2 files changed, 30 insertions(+), 16 deletions(-)
--
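The change above follows one pattern throughout `apiclient.py`: swap each `DataflowProjectsJobs*Request` for its `DataflowProjectsLocationsJobs*Request` counterpart and populate `location` from the new region option. A sketch of that shape using a stand-in request class (not the generated Dataflow client):

```python
class LocationsJobsGetRequest(object):
    """Stand-in for the generated DataflowProjectsLocationsJobsGetRequest."""

    def __init__(self):
        self.jobId = None
        self.projectId = None
        self.location = None


def build_get_request(job_id, project, region):
    # Regional job endpoints require the location in addition to the
    # project and job id, which is what each hunk in the diff adds.
    request = LocationsJobsGetRequest()
    request.jobId = job_id
    request.projectId = project
    request.location = region
    return request
```

The same three assignments repeat for the create, update, get-metrics, and list-messages requests in the diff.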




Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #3060

2017-03-27 Thread Apache Jenkins Server
See 


Changes:

[altay] Querying of both structured and unstructured metrics in dataflow.

--
[...truncated 1.40 MB...]
2017-03-28T00:59:33.348 [INFO] Number of foreign imports: 1
2017-03-28T00:59:33.348 [INFO] import: Entry[import  from realm 
ClassRealm[maven.api, parent: null]]
2017-03-28T00:59:33.348 [INFO] 
2017-03-28T00:59:33.348 [INFO] 
-
2017-03-28T00:59:33.348 [INFO] 
2017-03-28T00:59:33.348 [INFO]  at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:168)
2017-03-28T00:59:33.348 [INFO]  at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
2017-03-28T00:59:33.348 [INFO]  ... 20 more
2017-03-28T00:59:33.348 [INFO] Caused by: 
org.apache.maven.plugin.PluginContainerException: A required class was missing 
while executing org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade: 
org/codehaus/plexus/util/xml/XmlStreamWriter
2017-03-28T00:59:33.348 [INFO] 
-
2017-03-28T00:59:33.348 [INFO] realm = plugin>org.apache.maven.plugins:maven-shade-plugin:2.4.1
2017-03-28T00:59:33.348 [INFO] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #3059

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-772) Implement Metrics support for Dataflow Runner

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944304#comment-15944304
 ] 

ASF GitHub Bot commented on BEAM-772:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2290


> Implement Metrics support for Dataflow Runner
> -
>
> Key: BEAM-772
> URL: https://issues.apache.org/jira/browse/BEAM-772
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Ben Chambers
>Assignee: Ben Chambers
>






[GitHub] beam pull request #2290: [BEAM-772] Querying of both structured and unstruct...

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2290




[1/2] beam git commit: Querying of both structured and unstructured metrics in dataflow.

2017-03-27 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 6d627534b -> e58155263


Querying of both structured and unstructured metrics in dataflow.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b98bde91
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b98bde91
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b98bde91

Branch: refs/heads/master
Commit: b98bde912c2b2dbcb0ce1d30f6af32b7219d831d
Parents: 6d62753
Author: Pablo 
Authored: Wed Mar 22 13:22:32 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 17:32:32 2017 -0700

--
 .../runners/dataflow/dataflow_metrics.py| 86 +++-
 .../runners/dataflow/dataflow_metrics_test.py   | 63 --
 .../runners/dataflow/dataflow_runner.py |  2 +-
 3 files changed, 123 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b98bde91/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
index db5a2bc..f90e3d5 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py
@@ -30,25 +30,67 @@ from apache_beam.metrics.metric import MetricResults
 from apache_beam.metrics.metricbase import MetricName
 
 
-# TODO(pabloem)(JIRA-1381) Implement this once metrics are queriable from
-# dataflow service
 class DataflowMetrics(MetricResults):
   """Implementation of MetricResults class for the Dataflow runner."""
 
-  def __init__(self, dataflow_client=None, job_result=None):
+  def __init__(self, dataflow_client=None, job_result=None, job_graph=None):
 """Initialize the Dataflow metrics object.
 
 Args:
   dataflow_client: apiclient.DataflowApplicationClient to interact with the
 dataflow service.
   job_result: DataflowPipelineResult with the state and id information of
-the job
+the job.
+  job_graph: apiclient.Job instance to be able to translate between 
internal
+step names (e.g. "s2"), and user step names (e.g. "split").
 """
 super(DataflowMetrics, self).__init__()
 self._dataflow_client = dataflow_client
 self.job_result = job_result
 self._queried_after_termination = False
 self._cached_metrics = None
+self._job_graph = job_graph
+
+  def _translate_step_name(self, internal_name):
+"""Translate between internal step names (e.g. "s1") and user step 
names."""
+if not self._job_graph:
+  raise ValueError('Could not translate the internal step name.')
+
+try:
+  [step] = [step
+for step in self._job_graph.proto.steps
+if step.name == internal_name]
+  [user_step_name] = [prop.value.string_value
+  for prop in step.properties.additionalProperties
+  if prop.key == 'user_name']
+except ValueError:
+  raise ValueError('Could not translate the internal step name.')
+return user_step_name
+
+  def _get_metric_key(self, metric):
+"""Populate the MetricKey object for a queried metric result."""
+try:
+  # If ValueError is thrown within this try-block, it is because of
+  # one of the following:
+  # 1. Unable to translate the step name. Only happening with improperly
+  #   formatted job graph (unlikely), or step name not being the internal
+  #   step name (only happens for unstructured-named metrics).
+  # 2. Unable to unpack [step] or [namespace]; which should only happen
+  #   for unstructured names.
+  [step] = [prop.value
+for prop in metric.name.context.additionalProperties
+if prop.key == 'step']
+  step = self._translate_step_name(step)
+  [namespace] = [prop.value
+ for prop in metric.name.context.additionalProperties
+ if prop.key == 'namespace']
+  name = metric.name.name
+except ValueError:
+  # An unstructured metric name is "step/namespace/name", but step names
+  # can (and often do) contain slashes. Must only split on the right-most
+  # two slashes, to preserve the full step name.
+  [step, namespace, name] = metric.name.name.rsplit('/', 2)
+return MetricKey(step, MetricName(namespace, name))
 
   def _populate_metric_results(self, response):
 """Take a list of metrics, and convert it to a list of MetricResult."""
@@ -59,29 +101,31 @@ class DataflowMetrics(MetricResults):
 # Get the tentative/committed versions of every metric together.
 metrics_by_name = defaultdict(lambda: {})
 for metric in 

[2/2] beam git commit: This closes #2290

2017-03-27 Thread altay
This closes #2290


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e5815526
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e5815526
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e5815526

Branch: refs/heads/master
Commit: e58155263ecff4b5533fda0033da4d4cecb66c5e
Parents: 6d62753 b98bde9
Author: Ahmet Altay 
Authored: Mon Mar 27 17:32:59 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 17:32:59 2017 -0700

--
 .../runners/dataflow/dataflow_metrics.py| 86 +++-
 .../runners/dataflow/dataflow_metrics_test.py   | 63 --
 .../runners/dataflow/dataflow_runner.py |  2 +-
 3 files changed, 123 insertions(+), 28 deletions(-)
--
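The comment in `_get_metric_key` above explains why the fallback path splits only on the right-most two slashes: step names themselves often contain slashes. A standalone sketch of that parsing (the sample metric name is hypothetical):

```python
from collections import namedtuple

MetricKey = namedtuple('MetricKey', ['step', 'namespace', 'name'])


def parse_unstructured_name(full_name):
    # An unstructured metric name is "step/namespace/name", where the step
    # portion may itself contain slashes, so split from the right, at most
    # twice, to keep the full step name intact.
    step, namespace, name = full_name.rsplit('/', 2)
    return MetricKey(step, namespace, name)
```

For example, `parse_unstructured_name('read/Split/myNamespace/elementCount')` keeps `read/Split` together as the step instead of splitting it at its internal slash.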




[jira] [Commented] (BEAM-1735) Retry 403:rateLimitExceeded in GCS

2017-03-27 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944295#comment-15944295
 ] 

Davor Bonaci commented on BEAM-1735:


This is a batched copy request to Cloud Storage during FileBasedSink 
finalization.

Stack trace from an SDK that was built pre-Beam package rename:
java.io.IOException: Error executing batch GCS request
at 
com.google.cloud.dataflow.sdk.util.GcsUtil.executeBatches(GcsUtil.java:414)
at com.google.cloud.dataflow.sdk.util.GcsUtil.copy(GcsUtil.java:421)
at 
com.google.cloud.dataflow.sdk.io.FileBasedSink$GcsOperations.copy(FileBasedSink.java:663)
at 
com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.copyToOutputFiles(FileBasedSink.java:378)
at 
com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.finalize(FileBasedSink.java:342)
at 
com.google.cloud.dataflow.sdk.io.Write$Bound$2.processElement(Write.java:416)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error 
trying to copy [cut]/part-14135-of-22733.protobuf.avro: 
{"code":403,"errors":[{"domain":"usageLimits","message":"User Rate Limit 
Exceeded","reason":"userRateLimitExceeded"}],"message":"User Rate Limit 
Exceeded"}
at 
com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
at 
com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:455)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79)
at com.google.cloud.dataflow.sdk.util.GcsUtil.executeBatches(GcsUtil.java:409)
at com.google.cloud.dataflow.sdk.util.GcsUtil.copy(GcsUtil.java:421)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$GcsOperations.copy(FileBasedSink.java:663)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.copyToOutputFiles(FileBasedSink.java:378)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriteOperation.finalize(FileBasedSink.java:342)
at com.google.cloud.dataflow.sdk.io.Write$Bound$2.processElement(Write.java:416)
at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:221)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:284)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:220)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:170)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error trying to copy 
[cut]/part-14135-of-22733.protobuf.avro: 
{"code":403,"errors":[{"domain":"usageLimits","message":"User Rate Limit Exceeded","reason":"userRateLimitExceeded"}],"message":"User Rate Limit Exceeded"}
at com.google.cloud.dataflow.sdk.util.GcsUtil$3.onFailure(GcsUtil.java:484)
at com.google.api.client.googleapis.batch.json.JsonBatchCallback.onFailure(JsonBatchCallback.java:54)
at 

[GitHub] beam pull request #2342: Add region option to Dataflow pipeline options.

2017-03-27 Thread aaltay
GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/2342

Add region option to Dataflow pipeline options.

R: @charlesccychen 
cc: @dhalperi

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam region

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2342


commit fd1190f196b0c3e3747e1482ca8e2b28a484310d
Author: Ahmet Altay 
Date:   2017-03-28T00:09:34Z

Add region option to Dataflow pipeline options.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: beam_PerformanceTests_Dataflow #238

2017-03-27 Thread Apache Jenkins Server
See 


Changes:

[klk] Port Java modules from RunnableOnService to ValidatesRunner

[klk] Remove deprecated RunnableOnService

[altay] Pin the python postcommit jenkins nodes

[tgroh] Do not reify timestamps in Reshuffle in Dataflow

[tgroh] Make WindowMappingFn#maximumLookback Configurable but Final

[klk] [BEAM-1778] Second clean up pass of dataflow references/URLs in Java SDK

[tgroh] Update Dataflow Worker Image

[altay] Fix Python Dataflow default job name

[altay] [BEAM-1693] Detect supported Python & pip executables in Python-SDK

[davor] [BEAM-1778] Clean up unused dataflow javadoc directory

--
[...truncated 272.09 KB...]
 * [new ref] refs/pull/928/head -> origin/pr/928/head
 * [new ref] refs/pull/929/head -> origin/pr/929/head
 * [new ref] refs/pull/93/head -> origin/pr/93/head
 * [new ref] refs/pull/930/head -> origin/pr/930/head
 * [new ref] refs/pull/930/merge -> origin/pr/930/merge
 * [new ref] refs/pull/931/head -> origin/pr/931/head
 * [new ref] refs/pull/931/merge -> origin/pr/931/merge
 * [new ref] refs/pull/932/head -> origin/pr/932/head
 * [new ref] refs/pull/932/merge -> origin/pr/932/merge
 * [new ref] refs/pull/933/head -> origin/pr/933/head
 * [new ref] refs/pull/933/merge -> origin/pr/933/merge
 * [new ref] refs/pull/934/head -> origin/pr/934/head
 * [new ref] refs/pull/934/merge -> origin/pr/934/merge
 * [new ref] refs/pull/935/head -> origin/pr/935/head
 * [new ref] refs/pull/936/head -> origin/pr/936/head
 * [new ref] refs/pull/936/merge -> origin/pr/936/merge
 * [new ref] refs/pull/937/head -> origin/pr/937/head
 * [new ref] refs/pull/937/merge -> origin/pr/937/merge
 * [new ref] refs/pull/938/head -> origin/pr/938/head
 * [new ref] refs/pull/939/head -> origin/pr/939/head
 * [new ref] refs/pull/94/head -> origin/pr/94/head
 * [new ref] refs/pull/940/head -> origin/pr/940/head
 * [new ref] refs/pull/940/merge -> origin/pr/940/merge
 * [new ref] refs/pull/941/head -> origin/pr/941/head
 * [new ref] refs/pull/941/merge -> origin/pr/941/merge
 * [new ref] refs/pull/942/head -> origin/pr/942/head
 * [new ref] refs/pull/942/merge -> origin/pr/942/merge
 * [new ref] refs/pull/943/head -> origin/pr/943/head
 * [new ref] refs/pull/943/merge -> origin/pr/943/merge
 * [new ref] refs/pull/944/head -> origin/pr/944/head
 * [new ref] refs/pull/945/head -> origin/pr/945/head
 * [new ref] refs/pull/945/merge -> origin/pr/945/merge
 * [new ref] refs/pull/946/head -> origin/pr/946/head
 * [new ref] refs/pull/946/merge -> origin/pr/946/merge
 * [new ref] refs/pull/947/head -> origin/pr/947/head
 * [new ref] refs/pull/947/merge -> origin/pr/947/merge
 * [new ref] refs/pull/948/head -> origin/pr/948/head
 * [new ref] refs/pull/948/merge -> origin/pr/948/merge
 * [new ref] refs/pull/949/head -> origin/pr/949/head
 * [new ref] refs/pull/949/merge -> origin/pr/949/merge
 * [new ref] refs/pull/95/head -> origin/pr/95/head
 * [new ref] refs/pull/95/merge -> origin/pr/95/merge
 * [new ref] refs/pull/950/head -> origin/pr/950/head
 * [new ref] refs/pull/951/head -> origin/pr/951/head
 * [new ref] refs/pull/951/merge -> origin/pr/951/merge
 * [new ref] refs/pull/952/head -> origin/pr/952/head
 * [new ref] refs/pull/952/merge -> origin/pr/952/merge
 * [new ref] refs/pull/953/head -> origin/pr/953/head
 * [new ref] refs/pull/954/head -> origin/pr/954/head
 * [new ref] refs/pull/954/merge -> origin/pr/954/merge
 * [new ref] refs/pull/955/head -> origin/pr/955/head
 * [new ref] refs/pull/955/merge -> origin/pr/955/merge
 * [new ref] refs/pull/956/head -> origin/pr/956/head
 * [new ref] refs/pull/957/head -> origin/pr/957/head
 * [new ref] refs/pull/958/head -> origin/pr/958/head
 * [new ref] refs/pull/959/head -> origin/pr/959/head
 * [new ref] refs/pull/959/merge -> origin/pr/959/merge
 * [new ref] refs/pull/96/head -> origin/pr/96/head
 * [new ref] refs/pull/96/merge -> origin/pr/96/merge
 * [new ref] refs/pull/960/head -> origin/pr/960/head
 * [new ref] refs/pull/960/merge -> origin/pr/960/merge
 * [new ref] refs/pull/961/head -> origin/pr/961/head
 * [new ref] refs/pull/962/head -> origin/pr/962/head
 * [new ref] refs/pull/962/merge -> origin/pr/962/merge
 * [new ref] refs/pull/963/head -> origin/pr/963/head
 * [new ref] refs/pull/963/merge -> origin/pr/963/merge
 * [new ref] refs/pull/964/head -> origin/pr/964/head
 * [new ref] 

[GitHub] beam pull request #2341: Refers to NeedsRunner in javadoc of TestPipeline

2017-03-27 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/2341

Refers to NeedsRunner in javadoc of TestPipeline

This is more correct than referring to ValidatesRunner,
since NeedsRunner is the far more common use case for end-users,
and ValidatesRunner is its special case.

R: @tgroh 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam needs-runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2341.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2341


commit 29dfb5a4ec8594d18a0ea3813a047c7c69e9eec4
Author: Eugene Kirpichov 
Date:   2017-03-27T23:16:41Z

Refers to NeedsRunner in javadoc of TestPipeline

This is more correct than referring to ValidatesRunner,
since NeedsRunner is the far more common use case for end-users,
and ValidatesRunner is its special case.






Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #3056

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-688) Failure of beam-sdks-java-maven-archetypes-starter with undeclared dependency error

2017-03-27 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944171#comment-15944171
 ] 

Davor Bonaci commented on BEAM-688:
---

Anybody else? ([~kenn], [~neelesh77], [~venbijjam])

> Failure of beam-sdks-java-maven-archetypes-starter with undeclared dependency 
> error
> ---
>
> Key: BEAM-688
> URL: https://issues.apache.org/jira/browse/BEAM-688
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>
> The starter archetype has flaky dependencies. It is reported to fail reliably 
> on repeated installs.
> {noformat}
> [INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @ 
> beam-sdks-java-maven-archetypes-starter ---
> [WARNING] Used undeclared dependencies found:
> [WARNING]org.slf4j:slf4j-api:jar:1.7.14:runtime
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1717) Maven release/deploy tries to uploads some artifacts more than once

2017-03-27 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944168#comment-15944168
 ] 

Davor Bonaci commented on BEAM-1717:


Thanks! (I'll let you know if I get a little bit of time and find something 
obvious.)

> Maven release/deploy tries to uploads some artifacts more than once 
> 
>
> Key: BEAM-1717
> URL: https://issues.apache.org/jira/browse/BEAM-1717
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Amit Sela
>Assignee: Amit Sela
>Priority: Minor
> Fix For: First stable release
>
>
> Running maven {{release}} or {{deploy}} causes some artifacts to be deployed 
> more than once, which fails deployments to a release Nexus.
> While this is not an issue for the Apache release process (because it uses a 
> staging Nexus), this affects users who wish to deploy their own fork. 





[jira] [Assigned] (BEAM-1820) Source.getDefaultOutputCoder() should be @Nullable

2017-03-27 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1820:
--

Assignee: (was: Davor Bonaci)

> Source.getDefaultOutputCoder() should be @Nullable
> --
>
> Key: BEAM-1820
> URL: https://issues.apache.org/jira/browse/BEAM-1820
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>  Labels: easyfix, starter
>
> Source.getDefaultOutputCoder() returns a coder for elements produced by the 
> source.
> However, the Source objects are nearly always hidden from the user and 
> instead encapsulated in a transform. Often, an enclosing transform has a 
> better idea of what coder should be used to encode these elements (e.g. a 
> user supplied a Coder to that transform's configuration). In that case, it'd 
> be good if Source.getDefaultOutputCoder() could just return null, and coder 
> would have to be handled by the enclosing transform or perhaps specified on 
> the output of that transform explicitly.
> Right now there's a bunch of code in the SDK and runners that assumes 
> Source.getDefaultOutputCoder() returns non-null. That code would need to be 
> fixed to instead use the coder set on the collection produced by 
> Read.from(source).
> It all appears pretty easy to fix, so this is a good starter item.
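The fallback pattern described above can be sketched in a few lines. This is a minimal, Beam-free illustration (the issue concerns the Java SDK's `Source.getDefaultOutputCoder()`); the class and method names below are simplified stand-ins, not the real Beam API:

```python
# Simplified stand-ins -- hypothetical names, not the real Beam classes --
# illustrating the proposed contract: a Source may return None from its
# default-coder hook, and the enclosing transform then falls back to the
# coder configured on it.

class StringCoder:
    name = "StringCoder"

class Source:
    def get_default_output_coder(self):
        # Proposed behavior: returning None is allowed; the enclosing
        # transform becomes responsible for supplying a coder.
        return None

class Read:
    def __init__(self, source, fallback_coder):
        self.source = source
        self.fallback_coder = fallback_coder

    def output_coder(self):
        # Prefer the source's coder, but tolerate None instead of
        # assuming it is always non-null.
        coder = self.source.get_default_output_coder()
        return coder if coder is not None else self.fallback_coder

read = Read(Source(), StringCoder())
print(read.output_coder().name)  # -> StringCoder
```

The runner-side fix amounts to routing every such lookup through the transform-level fallback rather than calling the source's hook directly.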





[jira] [Commented] (BEAM-1778) Clean up pass of dataflow/google references/URLs in Java SDK

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944165#comment-15944165
 ] 

ASF GitHub Bot commented on BEAM-1778:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2316


> Clean up pass of dataflow/google references/URLs in Java SDK
> 
>
> Key: BEAM-1778
> URL: https://issues.apache.org/jira/browse/BEAM-1778
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Melissa Pashniak
>Assignee: Melissa Pashniak
>






[jira] [Commented] (BEAM-1622) Java: Rename RunnableOnService to ValidatesRunner

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944164#comment-15944164
 ] 

ASF GitHub Bot commented on BEAM-1622:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/194


> Java: Rename RunnableOnService to ValidatesRunner
> -
>
> Key: BEAM-1622
> URL: https://issues.apache.org/jira/browse/BEAM-1622
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core, testing
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>
> We agreed on the dev list a long time ago to this rename, and Python SDK has 
> executed it. Since then I've had a half dozen conversations to clarify it, so 
> let's get this done.





[1/2] beam git commit: [BEAM-1778] Clean up unused dataflow javadoc directory

2017-03-27 Thread davor
Repository: beam
Updated Branches:
  refs/heads/master affcca669 -> 6d627534b


[BEAM-1778] Clean up unused dataflow javadoc directory


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/aa64a63b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/aa64a63b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/aa64a63b

Branch: refs/heads/master
Commit: aa64a63bd735027943d25105a42ef8cd05bfdda8
Parents: affcca6
Author: melissa 
Authored: Fri Mar 24 11:40:37 2017 -0700
Committer: Davor Bonaci 
Committed: Mon Mar 27 15:38:35 2017 -0700

--
 sdks/java/javadoc/dataflow-sdk-docs/package-list | 11 ---
 1 file changed, 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/aa64a63b/sdks/java/javadoc/dataflow-sdk-docs/package-list
--
diff --git a/sdks/java/javadoc/dataflow-sdk-docs/package-list 
b/sdks/java/javadoc/dataflow-sdk-docs/package-list
deleted file mode 100644
index 5c98c85..000
--- a/sdks/java/javadoc/dataflow-sdk-docs/package-list
+++ /dev/null
@@ -1,11 +0,0 @@
-com.google.cloud.dataflow.sdk
-org.apache.beam.sdk.annotations
-com.google.cloud.dataflow.sdk.coders
-com.google.cloud.dataflow.sdk.io
-com.google.cloud.dataflow.sdk.options
-com.google.cloud.dataflow.sdk.runners
-com.google.cloud.dataflow.sdk.testing
-com.google.cloud.dataflow.sdk.transforms
-com.google.cloud.dataflow.sdk.transforms.join
-com.google.cloud.dataflow.sdk.transforms.windowing
-com.google.cloud.dataflow.sdk.values



[GitHub] beam-site pull request #194: [BEAM-1622] Reword RunnableOnService to Validat...

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/194




[3/3] beam-site git commit: This closes #194

2017-03-27 Thread davor
This closes #194


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0ee03242
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0ee03242
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0ee03242

Branch: refs/heads/asf-site
Commit: 0ee0324228bc59e92e05690ce3353359600fd05b
Parents: e342393 7453a1a
Author: Davor Bonaci 
Authored: Mon Mar 27 15:36:54 2017 -0700
Committer: Davor Bonaci 
Committed: Mon Mar 27 15:36:54 2017 -0700

--
 content/contribute/testing/index.html | 22 +++---
 src/contribute/testing.md | 18 +-
 2 files changed, 20 insertions(+), 20 deletions(-)
--




[2/3] beam-site git commit: Regenerate website

2017-03-27 Thread davor
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/7453a1a8
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/7453a1a8
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/7453a1a8

Branch: refs/heads/asf-site
Commit: 7453a1a820a0d76fab7384e3504e6ae6bd4d15ce
Parents: 81bafde
Author: Davor Bonaci 
Authored: Mon Mar 27 15:36:54 2017 -0700
Committer: Davor Bonaci 
Committed: Mon Mar 27 15:36:54 2017 -0700

--
 content/contribute/testing/index.html | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/7453a1a8/content/contribute/testing/index.html
--
diff --git a/content/contribute/testing/index.html 
b/content/contribute/testing/index.html
index c0ebd6d..f9bc735 100644
--- a/content/contribute/testing/index.html
+++ b/content/contribute/testing/index.html
@@ -167,13 +167,13 @@
   
   Testing 
Types
   Unit
-  RunnableOnService (Working 
Title)
+  ValidatesRunner (Working 
Title)
   E2E
 
   
   Testing 
Systems
   E2E Testing Framework
-  RunnableOnService Tests
+  ValidatesRunner Tests
   Effective use of 
the TestPipeline JUnit rule
   API Surface testing
 
@@ -331,11 +331,11 @@ details on those testing types.

Correctness

-   E2E Test, https://github.com/apache/beam/blob/master/runners/pom.xml#L47;>@RunnableonService
+   E2E Test, https://github.com/apache/beam/blob/master/runners/pom.xml#L47;>@ValidatesRunner

https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java#L78;>WordCountIT,
 https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java;>ParDoTest

-   E2E, @RunnableonService
+   E2E, @ValidatesRunner

Postcommit

@@ -450,7 +450,7 @@ viewed.
 
 Running in postcommit removes as stringent of a time constraint, which 
gives us
 the ability to do some more comprehensive testing. In postcommit we have a test
-suite running the RunnableOnService tests against each supported runner, and
+suite running the ValidatesRunner tests against each supported runner, and
 another for running the full set of E2E tests against each runner.
 Currently-supported runners are Dataflow, Flink, Spark, and Gearpump, with
 others soon to follow. Work is ongoing to enable Flink, Spark, and Gearpump in
@@ -475,9 +475,9 @@ importance of testing, Beam has a robust set of unit tests, 
as well as testing
 coverage measurement tools, which protect the codebase from simple to moderate
 breakages. Beam Java unit tests are written in JUnit.
 
-RunnableOnService (Working Title)
+ValidatesRunner (Working Title)
 
-RunnableOnService tests contain components of both component and end-to-end
+ValidatesRunner tests contain components of both component and end-to-end
 tests. They fulfill the typical purpose of a component test - they are meant to
 test a well-scoped piece of Beam functionality or the interactions between two
 such pieces and can be run in a component-test-type fashion against the
@@ -487,7 +487,7 @@ functionality, but runner functionality as well. They are 
more lightweight than
 a traditional end-to-end test and, because of their well-scoped nature, provide
 good signal as to what exactly is working or broken against a particular 
runner.
 
-The name “RunnableOnService” is an artifact of when Beam was still the 
Google
+The name “ValidatesRunner” is an artifact of when Beam was still the 
Google
 Cloud Dataflow SDK and https://issues.apache.org/jira/browse/BEAM-655;>will be
 changing to something more
 indicative of its use in the coming months.
@@ -537,9 +537,9 @@ environments. We currently provide the ability to run 
against the DirectRunner,
 against a local Spark instance, a local Flink instance, and against the Google
 Cloud Dataflow service.
 
-RunnableOnService Tests
+ValidatesRunner Tests
 
-RunnableOnService tests are tests built to use the Beam TestPipeline class,
+ValidatesRunner tests are tests built to use the Beam TestPipeline class,
 which enables test authors to write simple functionality verification. They are
 meant to use some of the built-in utilities of the SDK, namely PAssert, to
 verify that the simple pipelines they run end in the correct state.
@@ -568,7 +568,7 @@ due to the one of the following scenarios:
 
 Abandoned node detection is automatically enabled when a real 
pipeline 
 runner (i.e. not a CrashingRunner) 
and/or a 
-@NeedsRunner / @RunnableOnService annotation are detected.
+@NeedsRunner / @ValidatesRunner annotation are detected.
 
 Consider the 

[jira] [Commented] (BEAM-1693) Detect supported Python & pip executables in Python-SDK

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944158#comment-15944158
 ] 

ASF GitHub Bot commented on BEAM-1693:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2243


> Detect supported Python & pip executables in Python-SDK
> ---
>
> Key: BEAM-1693
> URL: https://issues.apache.org/jira/browse/BEAM-1693
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>
> Python SDK currently supports Python-2.7 only.
> The Python interpreter & pip definition in pom.xml points to {{python2}} & 
> {{pip2}} respectively. 
> Users with multiple Python interpreters installed might end up having python2 
> and pip2 pointing to their 2.6 installation. (This scenario happens mostly on 
> OS X machines.)
> There is no single, valid name for the executables as different OSes install 
> those binaries under various names:
> - CentOS6/EPEL: pip (python 2.6) & pip2 (python 2.6) & pip2.6 (python 2.6)
> - CentOS7/EPEL: pip (python 2.7) & pip2 (python 2.7) & pip2.7 (python 2.7)
> - Debian7: pip (python 2.7) & pip-2.6 (python 2.6) & pip-2.7 (python 2.7)
> - Debian8: pip (python 2.7) & pip2 (python 2.7)
> - Debian9: pip (python 2.7) & pip2 (python 2.7)
> - Ubuntu1204: pip (python 2.7)
> - Ubuntu1404: pip2 (python 2.7)
> - Ubuntu1604: pip (python 2.7) & pip2 (python 2.7)
> - OS X: pip (python 2.6) & pip2 (python 2.6) & pip2.7 (brew / python 2.7)
> - Windows: pip-2.7 (python.org based installer)
> To overcome this problem the pom.xml should be extended to determine the 
> suitable Python interpreter & pip automatically, in a platform independent 
> way.
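The probing strategy described in the issue (try each candidate name, run `--version`, and accept the first one reporting Python 2.7) can be sketched as follows. This is an illustrative Python sketch, not the actual fix (the commit implements it as a groovy-maven-plugin script); the candidate names and the `[Pp]ython 2\.7` version check mirror the issue and commit, while `find_executable`/`probe` are hypothetical helpers, and process spawning is replaced by a lookup so the sketch is self-contained:

```python
import re

# Version check and candidate list as described in the issue; the real
# implementation would run `<candidate> --version` in a subprocess.
REQUIRED_VERSION = re.compile(r".*[Pp]ython 2\.7.*")
PYTHON_CANDIDATES = ["python2.7", "python-2.7", "python2", "python-2", "python"]

def find_executable(candidates, get_version_output):
    """Return the first candidate whose version output matches Python 2.7.

    get_version_output maps a candidate name to its version string and
    raises OSError when the binary is absent (stand-in for a failed exec).
    """
    for candidate in candidates:
        try:
            output = get_version_output(candidate)
        except OSError:
            continue
        if REQUIRED_VERSION.match(output):
            return candidate
    return None

# Example: a host (like the OS X case above) where plain `python` is 2.6
# but a `python2.7` binary exists.
fake_host = {"python": "Python 2.6.6", "python2.7": "Python 2.7.13"}

def probe(name):
    if name not in fake_host:
        raise OSError(name)
    return fake_host[name]

print(find_executable(PYTHON_CANDIDATES, probe))  # -> python2.7
```

Ordering the candidates from most to least specific is what makes the 2.6-as-default hosts resolve correctly: the specific `python2.7` name wins before the ambiguous `python` is even tried.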





[jira] [Commented] (BEAM-1797) add CoGroupByKey to chapter 'Using GroupByKey'

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944160#comment-15944160
 ] 

ASF GitHub Bot commented on BEAM-1797:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam-site/pull/195

[BEAM-1797] add CoGroupByKey to chapter 'Using GroupByKey'  

`CoGroupByKey` is the main component for creating a JOIN operation in a Beam 
pipeline; this adds its usage to the chapter `Using GroupByKey`.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 9ae9977702147e8e791166a9a5ab2b724a3aa14c
Author: mingmxu 
Date:   2017-03-18T04:08:18Z

add code snippet to page design-your-pipeline

commit e95f83e069b1433ffed910721fe61dcee891dcff
Author: mingmxu 
Date:   2017-03-21T17:35:51Z

consistent words

commit 116d9bf8854bf7cde252b3a9ecb1e5e52f8aabcb
Author: mingmxu 
Date:   2017-03-21T18:26:28Z

restate pipeline multiple-in multiple-out

commit f31778f698c0f1a14694a7503583d229144ae982
Author: mingmxu 
Date:   2017-03-21T20:43:12Z

correct spelling

commit 1f8dcbf013081f5bd1473e31c472b8f9a60177cb
Author: mingmxu 
Date:   2017-03-27T22:28:26Z

add the usage of CoGroupByKey

commit dbd3bd8791d5d1780f2b27aa3bae00a9739eea2b
Author: XuMingmin 
Date:   2017-03-27T22:31:17Z

Merge pull request #1 from apache/asf-site

sync-up code from main repository




> add CoGroupByKey to chapter 'Using GroupByKey'  
> 
>
> Key: BEAM-1797
> URL: https://issues.apache.org/jira/browse/BEAM-1797
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> In 
> https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
> the chapter {{Using GroupByKey}} describes how to apply a GroupBy on a 
> single {{PCollection}} with {{GroupByKey}}. 
> Would like to add the usage of {{CoGroupByKey}}, which runs on multiple 
> {{PCollection}}s.
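The relational-join semantics behind this can be shown without Beam at all. The sketch below models keyed PCollections as plain lists of (key, value) pairs and groups the values from every input under each key, which is essentially what CoGroupByKey produces; `cogroup_by_key` is an illustrative stand-in, not the Beam API:

```python
from collections import defaultdict

def cogroup_by_key(**tagged_inputs):
    """Group values from several keyed collections under each key.

    Each keyword argument is a tag -> list of (key, value) pairs; the
    result maps every key to a dict of tag -> list of values, with empty
    lists for tags that had no value for that key.
    """
    result = defaultdict(lambda: {tag: [] for tag in tagged_inputs})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            result[key][tag].append(value)
    return dict(result)

emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "555-0100")]

joined = cogroup_by_key(emails=emails, phones=phones)
print(joined["amy"])   # -> {'emails': ['amy@example.com'], 'phones': ['555-0100']}
print(joined["carl"])  # -> {'emails': ['carl@example.com'], 'phones': []}
```

In Beam the tags come from a KeyedPCollectionTuple (Java) or a dict of PCollections (Python), but the per-key grouping across inputs is the same idea.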





[GitHub] beam-site pull request #195: [BEAM-1797] add CoGroupByKey to chapter 'Using ...

2017-03-27 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam-site/pull/195

[BEAM-1797] add CoGroupByKey to chapter 'Using GroupByKey'  

`CoGroupByKey` is the main component for creating a JOIN operation in a Beam 
pipeline; this adds its usage to the chapter `Using GroupByKey`.  

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 9ae9977702147e8e791166a9a5ab2b724a3aa14c
Author: mingmxu 
Date:   2017-03-18T04:08:18Z

add code snippet to page design-your-pipeline

commit e95f83e069b1433ffed910721fe61dcee891dcff
Author: mingmxu 
Date:   2017-03-21T17:35:51Z

consistent words

commit 116d9bf8854bf7cde252b3a9ecb1e5e52f8aabcb
Author: mingmxu 
Date:   2017-03-21T18:26:28Z

restate pipeline multiple-in multiple-out

commit f31778f698c0f1a14694a7503583d229144ae982
Author: mingmxu 
Date:   2017-03-21T20:43:12Z

correct spelling

commit 1f8dcbf013081f5bd1473e31c472b8f9a60177cb
Author: mingmxu 
Date:   2017-03-27T22:28:26Z

add the usage of CoGroupByKey

commit dbd3bd8791d5d1780f2b27aa3bae00a9739eea2b
Author: XuMingmin 
Date:   2017-03-27T22:31:17Z

Merge pull request #1 from apache/asf-site

sync-up code from main repository






[1/2] beam git commit: [BEAM-1693] Detect supported Python & pip executables in Python-SDK

2017-03-27 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 07274bbfe -> affcca669


[BEAM-1693] Detect supported Python & pip executables in Python-SDK


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c7c153a5
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c7c153a5
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c7c153a5

Branch: refs/heads/master
Commit: c7c153a5945d351c868f6d99f0c809317d16d6b8
Parents: 07274bb
Author: Tibor Kiss 
Authored: Tue Mar 14 20:51:35 2017 +0100
Committer: Ahmet Altay 
Committed: Mon Mar 27 15:32:51 2017 -0700

--
 pom.xml|  1 +
 sdks/python/findSupportedPython.groovy | 80 +
 sdks/python/pom.xml| 31 ++-
 3 files changed, 110 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c7c153a5/pom.xml
--
diff --git a/pom.xml b/pom.xml
index bf1d4f0..e7ae4ca 100644
--- a/pom.xml
+++ b/pom.xml
@@ -137,6 +137,7 @@
 v1-rev71-1.22.0
 4.4.1
 4.3.5.RELEASE
+2.0
 
 -Werror
 
-Xpkginfo:always

http://git-wip-us.apache.org/repos/asf/beam/blob/c7c153a5/sdks/python/findSupportedPython.groovy
--
diff --git a/sdks/python/findSupportedPython.groovy 
b/sdks/python/findSupportedPython.groovy
new file mode 100644
index 000..6984132
--- /dev/null
+++ b/sdks/python/findSupportedPython.groovy
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/* This (groovy-maven-plugin) script finds the supported python interpreter and pip
+ * binary in the path. As no strict naming convention exists amongst OSes
+ * for Python & pip (some call it python2.7, others name it python-2.7),
+ * the script tries to execute the candidates and query their version.
+ * The first matching interpreter & pip is assigned to "python.interpreter.bin"
+ * and "python.pip.bin" (maven) properties respectively.
+ */
+
+import org.apache.maven.plugin.MojoFailureException
+
+requiredPythonVersion = /.*[Pp]ython 2\.7.*/
+
+pythonCandidates = ["python2.7", "python-2.7", "python2", "python-2", "python"]
+pipCandidates = ["pip2.7", "pip-2.7", "pip2", "pip-2", "pip"]
+
+def String findExecutable(String[] candidates, versionRegex) {
+for (candidate in candidates) {
+try {
+def exec = "${candidate} --version".execute()
+
+def consoleSB = new StringBuilder()
+exec.waitForProcessOutput(consoleSB, consoleSB)
+consoleStr = consoleSB.toString().replaceAll("\\r|\\n", "")
+
+if (exec.exitValue() == 0 && consoleStr ==~ versionRegex) {
+return candidate
+}
+} catch (IOException e) {
+continue
+}
+}
+return null
+}
+
+def Boolean isWindows() {
+return 
System.properties['os.name'].toLowerCase(Locale.ROOT).contains('windows');
+}
+
+/* On MS Windows applications with dots in the filename can only be executed
+ * if the .exe suffix is also included. That is 'pip2.7' will cause an 
execution error,
+ * while 'pip2.7.exe' will succeed (given that pip2.7.exe is an executable in 
the PATH).
+ * The specializeCandidateForOS closure takes care of this conversion.
+ */
+def specializeCandidateForOS = { it -> isWindows() ? it + '.exe' : it }
+
+pythonBin = findExecutable(pythonCandidates.collect(specializeCandidateForOS) 
as String[],
+   requiredPythonVersion)
+pipBin = findExecutable(pipCandidates.collect(specializeCandidateForOS) as 
String[],
+requiredPythonVersion)
+
+if (pythonBin == null) {
+   throw new MojoFailureException("Unable to find Python 2.7 in path")
+}
+
+if (pipBin == null) {
+   throw new MojoFailureException("Unable to find pip for Python 2.7 in path")
+}
+
+log.info("Using python interpreter binary '" + pythonBin + "' with pip '" + pipBin + "'")

[jira] [Resolved] (BEAM-1801) default_job_name can generate names not accepted by DataFlow

2017-03-27 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-1801.
---
   Resolution: Fixed
Fix Version/s: First stable release

> default_job_name can generate names not accepted by DataFlow
> 
>
> Key: BEAM-1801
> URL: https://issues.apache.org/jira/browse/BEAM-1801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Dennis Docter
>Assignee: Pablo Estrada
>Priority: Trivial
> Fix For: First stable release
>
>
> The default job name generated by: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L288
> is partially derived from the os username of the executing user. These may 
> contain characters not accepted by Dataflow, resulting in errors like:
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:
> (a1b878f3562c0e6d): Error processing pipeline. Causes: (a1b878f3562c04ae): 
> Prefix for cluster 'beamapp-dennis.docter-032-03231324-1edc-harness' should 
> match '[a-z]([-a-z0-9]{0,61}[a-z0-9])?'. This probably means the joblabel is 
> invalid.
> To solve this issue, sanitise the username to only contain alphanumeric 
> characters and dashes.
> Also, there seems to be no length restriction, while Dataflow imposes a 
> 63-character limit in the above case. Limiting the length to substantially 
> less than that, to allow for suffixes (like -harness in this case), may be 
> wise.
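As a standalone sketch of the sanitisation described above (the helper name and the exact truncation rule are illustrative assumptions; the merged patch is authoritative):

```python
import re
from datetime import datetime

def build_default_job_name(user_name, max_len=63):
    """Sketch: lowercase the username, drop characters outside
    [-a-z0-9], and truncate so the full job name stays within
    Dataflow's 63-character limit."""
    user_name = re.sub('[^-a-z0-9]', '', user_name.lower())
    date_component = datetime.utcnow().strftime('%m%d%H%M%S-%f')
    app_user_name = 'beamapp-{}'.format(user_name)
    job_name = '{}-{}'.format(app_user_name, date_component)
    if len(job_name) > max_len:
        # Trim the username portion, never the timestamp, to fit.
        job_name = '{}-{}'.format(
            app_user_name[:-(len(job_name) - max_len)], date_component)
    return job_name
```

Lowercasing plus the character filter guarantees the name starts with 'b' (from the 'beamapp-' prefix) and contains only [-a-z0-9]; the truncation trims only the username portion so the total stays within 63 characters.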



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2243: [BEAM-1693] Detect supported Python & pip executabl...

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2243


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2243

2017-03-27 Thread altay
This closes #2243


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/affcca66
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/affcca66
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/affcca66

Branch: refs/heads/master
Commit: affcca669f2a87b9f6410f99cf9608ab92af5b28
Parents: 07274bb c7c153a
Author: Ahmet Altay 
Authored: Mon Mar 27 15:33:10 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 15:33:10 2017 -0700

--
 pom.xml|  1 +
 sdks/python/findSupportedPython.groovy | 80 +
 sdks/python/pom.xml| 31 ++-
 3 files changed, 110 insertions(+), 2 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2657

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1801) default_job_name can generate names not accepted by DataFlow

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944132#comment-15944132
 ] 

ASF GitHub Bot commented on BEAM-1801:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2322


> default_job_name can generate names not accepted by DataFlow
> 
>
> Key: BEAM-1801
> URL: https://issues.apache.org/jira/browse/BEAM-1801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Dennis Docter
>Assignee: Pablo Estrada
>Priority: Trivial
>
> The default job name generated by: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L288
> is partially derived from the os username of the executing user. These may 
> contain characters not accepted by Dataflow, resulting in errors like:
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:
> (a1b878f3562c0e6d): Error processing pipeline. Causes: (a1b878f3562c04ae): 
> Prefix for cluster 'beamapp-dennis.docter-032-03231324-1edc-harness' should 
> match '[a-z]([-a-z0-9]{0,61}[a-z0-9])?'. This probably means the joblabel is 
> invalid.
> To solve this issue, sanitise the username to only container alphanumeric 
> characters and dashes.
> Also there seems to be no length restriction and dataflow imposes a 63 
> character length limit in the above case. Limiting on length substantially 
> shorter than that to allow for postfixes (like -harness in this case) may be 
> wise.





[1/2] beam git commit: Fix Python Dataflow default job name

2017-03-27 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 44472c76c -> 07274bbfe


Fix Python Dataflow default job name


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/021e2a07
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/021e2a07
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/021e2a07

Branch: refs/heads/master
Commit: 021e2a075df4832cb43e678d203fb7c56711032b
Parents: 44472c7
Author: Pablo 
Authored: Fri Mar 24 14:31:33 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 15:15:16 2017 -0700

--
 .../runners/dataflow/internal/apiclient.py  | 21 
 .../runners/dataflow/internal/apiclient_test.py | 12 ++-
 2 files changed, 28 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/021e2a07/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py 
b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
index e980b14..f7daed0 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
@@ -285,12 +285,25 @@ class Job(object):
 indent=2, sort_keys=True)
 
   @staticmethod
+  def _build_default_job_name(user_name):
+"""Generates a default name for a job.
+
+user_name is lowercased, and any characters outside of [-a-z0-9]
+are removed. If necessary, the user_name is truncated to shorten
+the job name to 63 characters."""
+user_name = re.sub('[^-a-z0-9]', '', user_name.lower())
+date_component = datetime.utcnow().strftime('%m%d%H%M%S-%f')
+app_user_name = 'beamapp-{}'.format(user_name)
+job_name = '{}-{}'.format(app_user_name, date_component)
+if len(job_name) > 63:
+  job_name = '{}-{}'.format(app_user_name[:-(len(job_name) - 63)],
+date_component)
+return job_name
+
+  @staticmethod
   def default_job_name(job_name):
 if job_name is None:
-  user_name = getpass.getuser().lower()
-  date_component = datetime.utcnow().strftime('%m%d%H%M%S-%f')
-  app_name = 'beamapp'
-  job_name = '{}-{}-{}'.format(app_name, user_name, date_component)
+  job_name = Job._build_default_job_name(getpass.getuser())
 return job_name
 
   def __init__(self, options):

http://git-wip-us.apache.org/repos/asf/beam/blob/021e2a07/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
--
diff --git 
a/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py 
b/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
index d60c7a5..e9aaacb 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
@@ -15,7 +15,6 @@
 # limitations under the License.
 #
 """Unit tests for the apiclient module."""
-
 import unittest
 
 from mock import Mock
@@ -45,6 +44,17 @@ class UtilTest(unittest.TestCase):
 pipeline_options,
 DataflowRunner.BATCH_ENVIRONMENT_MAJOR_VERSION)
 
+  def test_invalid_default_job_name(self):
+# Regexp for job names in dataflow.
+regexp = '^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$'
+
+job_name = apiclient.Job._build_default_job_name('invalid.-_user_n*/ame')
+self.assertRegexpMatches(job_name, regexp)
+
+job_name = apiclient.Job._build_default_job_name(
+'invalid-extremely-long.username_that_shouldbeshortened_or_is_invalid')
+self.assertRegexpMatches(job_name, regexp)
+
   def test_default_job_name(self):
 job_name = apiclient.Job.default_job_name(None)
 regexp = 'beamapp-.*-[0-9]{10}-[0-9]{6}'



[2/2] beam git commit: This closes #2322

2017-03-27 Thread altay
This closes #2322


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/07274bbf
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/07274bbf
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/07274bbf

Branch: refs/heads/master
Commit: 07274bbfe765733e971cd42ecc97650ceb809167
Parents: 44472c7 021e2a0
Author: Ahmet Altay 
Authored: Mon Mar 27 15:16:39 2017 -0700
Committer: Ahmet Altay 
Committed: Mon Mar 27 15:16:39 2017 -0700

--
 .../runners/dataflow/internal/apiclient.py  | 21 
 .../runners/dataflow/internal/apiclient_test.py | 12 ++-
 2 files changed, 28 insertions(+), 5 deletions(-)
--




[jira] [Updated] (BEAM-1797) add CoGroupByKey to chapter 'Using GroupByKey'

2017-03-27 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-1797:
-
Description: 
In https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one single 
{{PCollection}} with {{GroupByKey}}. 
Would like to add the usage of {{CoGroupByKey}} which run on multiple 
{{PCollection}} s

  was:
In https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one single 
{{PCollection}} with {{GroupByKey}}. 
Would like to change the subject title as
{code}
Using GroupByKey and CoGroupByKey
{code}
add the usage of {{CoGroupByKey}} which run on multiple {{PCollection}} s


> add CoGroupByKey to chapter 'Using GroupByKey'  
> 
>
> Key: BEAM-1797
> URL: https://issues.apache.org/jira/browse/BEAM-1797
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> In 
> https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
> the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one 
> single {{PCollection}} with {{GroupByKey}}. 
> Would like to add the usage of {{CoGroupByKey}} which run on multiple 
> {{PCollection}} s





[jira] [Updated] (BEAM-1797) add CoGroupByKey to chapter 'Using GroupByKey'

2017-03-27 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-1797:
-
Description: 
In https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one single 
{{PCollection}} with {{GroupByKey}}. 
Would like to change the subject title as
{code}
Using GroupByKey and CoGroupByKey
{code}
add the usage of {{CoGroupByKey}} which run on multiple {{PCollection}} s

  was:
In https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one single 
{{PCollection}} with {{GroupByKey}}. 
Would like to change the subject title as
{code}
Using GroupByKey and CoGroupByKey
{code}
add the usage of {{CoGroupByKey}} which run on multiple {{PCollection}}s


> add CoGroupByKey to chapter 'Using GroupByKey'  
> 
>
> Key: BEAM-1797
> URL: https://issues.apache.org/jira/browse/BEAM-1797
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> In 
> https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
> the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one 
> single {{PCollection}} with {{GroupByKey}}. 
> Would like to change the subject title as
> {code}
> Using GroupByKey and CoGroupByKey
> {code}
> add the usage of {{CoGroupByKey}} which run on multiple {{PCollection}} s





[jira] [Updated] (BEAM-1797) add CoGroupByKey to chapter 'Using GroupByKey'

2017-03-27 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-1797:
-
Description: 
In https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one single 
{{PCollection}} with {{GroupByKey}}. 
Would like to change the subject title as
{code}
Using GroupByKey and CoGroupByKey
{code}
add the usage of {{CoGroupByKey}} which run on multiple {{PCollection}}s

  was:
In https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
the chapter {{Using GroupByKey}} describes how to join inside of one 
{{PCollection}} with {{GroupByKey}}. 
Would like to change the subject title as
{code}
Join using GroupByKey and CoGroupByKey
{code}
add the usage of {{CoGroupByKey}} which joins between {{PCollection}}s


> add CoGroupByKey to chapter 'Using GroupByKey'  
> 
>
> Key: BEAM-1797
> URL: https://issues.apache.org/jira/browse/BEAM-1797
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> In 
> https://beam.apache.org/documentation/programming-guide/#transforms-sideio, 
> the chapter {{Using GroupByKey}} describes how to apply a GroupBy on one 
> single {{PCollection}} with {{GroupByKey}}. 
> Would like to change the subject title as
> {code}
> Using GroupByKey and CoGroupByKey
> {code}
> add the usage of {{CoGroupByKey}} which run on multiple {{PCollection}}s





[jira] [Commented] (BEAM-1328) Serialize/deserialize WindowingStrategy in a language-agnostic manner

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944111#comment-15944111
 ] 

ASF GitHub Bot commented on BEAM-1328:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2340

[BEAM-1328] DataflowRunner: Send WindowingStrategy as proto

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam ship-WindowingStrategy

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2340.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2340


commit 8f77b5e67794689b2b324c07447147d9661a7ea3
Author: Kenneth Knowles 
Date:   2017-03-27T21:55:12Z

DataflowRunner: Send WindowingStrategy as proto




> Serialize/deserialize WindowingStrategy in a language-agnostic manner
> -
>
> Key: BEAM-1328
> URL: https://issues.apache.org/jira/browse/BEAM-1328
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-py
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>
> This is an upcoming blocker for Python, as the Python SDK needs to be able to 
> ship the pieces of the windowing strategy in a way a runner can grok. Only 
> the WindowFn should remain language-specific.
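To illustrate the idea in the description above (field names and the URN are hypothetical, not Beam's actual portable model): every piece of the windowing strategy is expressed with portable primitives, while the WindowFn travels as an opaque, URN-tagged payload that only the owning SDK interprets.

```python
import json

def windowing_strategy_to_portable(strategy):
    """Toy serialization: portable primitives for every field except
    the WindowFn, which is carried as an opaque URN-tagged payload."""
    return json.dumps({
        'accumulation_mode': strategy['accumulation_mode'],      # e.g. 'DISCARDING'
        'output_time': strategy['output_time'],                  # e.g. 'END_OF_WINDOW'
        'allowed_lateness_ms': strategy['allowed_lateness_ms'],
        'window_fn': {
            'urn': strategy['window_fn']['urn'],                 # identifies the fn kind
            'payload': strategy['window_fn']['payload'],         # opaque, SDK-specific
        },
    })
```

A runner can act on every field except the payload, which it round-trips back to the SDK harness untouched.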





[GitHub] beam pull request #2340: [BEAM-1328] DataflowRunner: Send WindowingStrategy ...

2017-03-27 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2340

[BEAM-1328] DataflowRunner: Send WindowingStrategy as proto

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam ship-WindowingStrategy

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2340.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2340


commit 8f77b5e67794689b2b324c07447147d9661a7ea3
Author: Kenneth Knowles 
Date:   2017-03-27T21:55:12Z

DataflowRunner: Send WindowingStrategy as proto






[jira] [Resolved] (BEAM-1514) change default timestamp in KafkaIO

2017-03-27 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin resolved BEAM-1514.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
> Fix For: Not applicable
>
>
> When users use Kafka 0.10, the 'timestamp' field from Kafka should be used as 
> the default event timestamp.





[jira] [Updated] (BEAM-1705) add method in Pipeline to visualize the DAG

2017-03-27 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-1705:
-
Priority: Minor  (was: Major)

> add method in Pipeline to visualize the DAG
> ---
>
> Key: BEAM-1705
> URL: https://issues.apache.org/jira/browse/BEAM-1705
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> add a method to print out a Pipeline in an easy-to-read format.
> Take WindowedWordCount.java for example; the output could look like:
> {code}
> 1. TextIO.Read/Read.out [PCollection]
>   2. ParDo(AddTimestamp).out [PCollection]
> 3. Window.Into()/Window.Assign.out [PCollection]
>   4. WordCount.CountWords/ParDo(ExtractWords).out [PCollection]
> 5. WordCount.CountWords/Count.PerElement/Init/Map.out [PCollection]
>   6. WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey.out [PCollection]
> 7. WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues/ParDo(Anonymous).out [PCollection]
>   8. ParDo(Anonymous).out [PCollection]
> 9. GroupByKey.out [PCollection]
>   10. ParDo(WriteWindowedFiles).out [PCollection]
> {code}
> Add the serial numbers here so it's readable for a complex pipeline with 
> multiple-in and multiple-out.
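A numbering scheme like the one above can be sketched with a plain depth-first walk (a hypothetical helper operating on a name-to-children map, not Beam's actual visitor API):

```python
def render_dag(roots, children):
    """children maps a transform name to its downstream transform names.
    Walk depth-first from the roots, giving every transform a serial
    number and indenting by depth so fan-out stays readable."""
    lines, counter = [], [0]

    def visit(name, depth):
        counter[0] += 1
        lines.append('{}{}. {}'.format(' ' * (2 * depth), counter[0], name))
        for child in children.get(name, []):
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return '\n'.join(lines)
```

In Beam's Java SDK the same walk could be driven by a `PipelineVisitor` passed to `Pipeline.traverseTopologically`.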





[jira] [Updated] (BEAM-1775) fix issue of start_from_previous_offset in KafkaIO

2017-03-27 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-1775:
-
Priority: Minor  (was: Major)

> fix issue of start_from_previous_offset in KafkaIO
> --
>
> Key: BEAM-1775
> URL: https://issues.apache.org/jira/browse/BEAM-1775
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> Jins George jins.geo...@aeris.net via aermail.onmicrosoft.com 
> Hello,
> I am writing a Beam pipeline (streaming) with the Flink runner to consume data 
> from Kafka, apply some transformations, and persist to HBase.
> If I restart the application (due to failure/manual restart), the consumer does 
> not resume from the offset where it was prior to restart. It always resumes 
> from the latest offset.
> If I enable Flink checkpointing with the HDFS state back-end, the system 
> appears to be resuming from the earliest offset.
> Is there a recommended way to resume from the offset where it was stopped ?
> Thanks,
> Jins George





Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #3055

2017-03-27 Thread Apache Jenkins Server
See 


--
[...truncated 215.66 KB...]
 x [deleted] (none) -> origin/pr/942/head
 x [deleted] (none) -> origin/pr/942/merge
 x [deleted] (none) -> origin/pr/943/head
 x [deleted] (none) -> origin/pr/943/merge
 x [deleted] (none) -> origin/pr/944/head
 x [deleted] (none) -> origin/pr/945/head
 x [deleted] (none) -> origin/pr/945/merge
 x [deleted] (none) -> origin/pr/946/head
 x [deleted] (none) -> origin/pr/946/merge
 x [deleted] (none) -> origin/pr/947/head
 x [deleted] (none) -> origin/pr/947/merge
 x [deleted] (none) -> origin/pr/948/head
 x [deleted] (none) -> origin/pr/948/merge
 x [deleted] (none) -> origin/pr/949/head
 x [deleted] (none) -> origin/pr/949/merge
 x [deleted] (none) -> origin/pr/95/head
 x [deleted] (none) -> origin/pr/95/merge
 x [deleted] (none) -> origin/pr/950/head
 x [deleted] (none) -> origin/pr/951/head
 x [deleted] (none) -> origin/pr/951/merge
 x [deleted] (none) -> origin/pr/952/head
 x [deleted] (none) -> origin/pr/952/merge
 x [deleted] (none) -> origin/pr/953/head
 x [deleted] (none) -> origin/pr/954/head
 x [deleted] (none) -> origin/pr/954/merge
 x [deleted] (none) -> origin/pr/955/head
 x [deleted] (none) -> origin/pr/955/merge
 x [deleted] (none) -> origin/pr/956/head
 x [deleted] (none) -> origin/pr/957/head
 x [deleted] (none) -> origin/pr/958/head
 x [deleted] (none) -> origin/pr/959/head
 x [deleted] (none) -> origin/pr/959/merge
 x [deleted] (none) -> origin/pr/96/head
 x [deleted] (none) -> origin/pr/96/merge
 x [deleted] (none) -> origin/pr/960/head
 x [deleted] (none) -> origin/pr/960/merge
 x [deleted] (none) -> origin/pr/961/head
 x [deleted] (none) -> origin/pr/962/head
 x [deleted] (none) -> origin/pr/962/merge
 x [deleted] (none) -> origin/pr/963/head
 x [deleted] (none) -> origin/pr/963/merge
 x [deleted] (none) -> origin/pr/964/head
 x [deleted] (none) -> origin/pr/965/head
 x [deleted] (none) -> origin/pr/965/merge
 x [deleted] (none) -> origin/pr/966/head
 x [deleted] (none) -> origin/pr/967/head
 x [deleted] (none) -> origin/pr/967/merge
 x [deleted] (none) -> origin/pr/968/head
 x [deleted] (none) -> origin/pr/968/merge
 x [deleted] (none) -> origin/pr/969/head
 x [deleted] (none) -> origin/pr/969/merge
 x [deleted] (none) -> origin/pr/97/head
 x [deleted] (none) -> origin/pr/97/merge
 x [deleted] (none) -> origin/pr/970/head
 x [deleted] (none) -> origin/pr/970/merge
 x [deleted] (none) -> origin/pr/971/head
 x [deleted] (none) -> origin/pr/971/merge
 x [deleted] (none) -> origin/pr/972/head
 x [deleted] (none) -> origin/pr/973/head
 x [deleted] (none) -> origin/pr/974/head
 x [deleted] (none) -> origin/pr/974/merge
 x [deleted] (none) -> origin/pr/975/head
 x [deleted] (none) -> origin/pr/975/merge
 x [deleted] (none) -> origin/pr/976/head
 x [deleted] (none) -> origin/pr/976/merge
 x [deleted] (none) -> origin/pr/977/head
 x [deleted] (none) -> origin/pr/977/merge
 x [deleted] (none) -> origin/pr/978/head
 x [deleted] (none) -> origin/pr/978/merge
 x [deleted] (none) -> origin/pr/979/head
 x [deleted] (none) -> origin/pr/979/merge
 x [deleted] (none) -> origin/pr/98/head
 x [deleted] (none) -> origin/pr/980/head
 x [deleted] (none) -> origin/pr/980/merge
 x [deleted] (none) -> origin/pr/981/head
 x [deleted] (none) -> origin/pr/982/head
 x [deleted] (none) -> origin/pr/982/merge
 x [deleted] (none) -> origin/pr/983/head
 x [deleted] (none) -> origin/pr/983/merge
 x [deleted] (none) -> origin/pr/984/head
 x [deleted] (none) -> origin/pr/984/merge
 x [deleted] (none) -> origin/pr/985/head
 x [deleted] (none) -> origin/pr/985/merge
 x [deleted] (none) -> origin/pr/986/head
 x [deleted] (none) -> origin/pr/986/merge
 x [deleted] (none) -> origin/pr/987/head
 x [deleted] (none) -> origin/pr/988/head
 x [deleted] (none) -> origin/pr/988/merge
 x [deleted] (none) -> 

[GitHub] beam pull request #2337: Update Dataflow Worker Image

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2337




[1/2] beam git commit: This closes #2337

2017-03-27 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master ced1e5c3a -> 44472c76c


This closes #2337


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/44472c76
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/44472c76
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/44472c76

Branch: refs/heads/master
Commit: 44472c76cd33a95a4202c918d3b33cd7fbcc9fb9
Parents: ced1e5c 24eee22
Author: Thomas Groh 
Authored: Mon Mar 27 13:58:15 2017 -0700
Committer: Thomas Groh 
Committed: Mon Mar 27 13:58:15 2017 -0700

--
 runners/google-cloud-dataflow-java/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[jira] [Commented] (BEAM-1805) a new option `asInnerJoin` for CoGroupByKey

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944017#comment-15944017
 ] 

ASF GitHub Bot commented on BEAM-1805:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/2339

[BEAM-1805] a new option `asInnerJoin` for CoGroupByKey

*INNER_JOIN* is available with 
[Join](https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java).
However, it is limited to two `PCollection`s; it would be helpful to add it to 
`CoGroupByKey` for cases where multiple `PCollection`s are joined.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-1805

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2339


commit 09df9b5c15dd5b409d9da937225caa073a663255
Author: XuMingmin 
Date:   2017-03-15T21:45:17Z

Merge pull request #3 from apache/master

sync-up code

commit 93929873179a04fb0038400d57b55b3779d7b085
Author: XuMingmin 
Date:   2017-03-24T23:17:00Z

Merge pull request #4 from apache/master

sync up code

commit 62ff6bd83725414aa2ca11d30a5bd7776afe99fa
Author: XuMingmin 
Date:   2017-03-27T18:30:44Z

Merge pull request #5 from apache/master

sync-up code

commit 6f087a085094bd2520094ca0c5c540ae74b31542
Author: mingmxu 
Date:   2017-03-27T20:38:45Z

initial commit




> a new option `asInnerJoin` for CoGroupByKey
> ---
>
> Key: BEAM-1805
> URL: https://issues.apache.org/jira/browse/BEAM-1805
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> {{CoGroupByKey}} joins multiple {{PCollection}}s of key-value pairs, acting as 
> a full-outer join.
> Option {{asInnerJoin()}} restricts the output to inner-join behavior.
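The semantics can be sketched on plain Python data (a toy model, not Beam's API): CoGroupByKey produces a full-outer grouping of all inputs by key, and asInnerJoin would keep only the keys that appear in every input.

```python
from collections import defaultdict

def co_group_by_key(**tagged):
    """Toy CoGroupByKey: each input is a list of (key, value) pairs;
    the result maps key -> {tag: [values]} (a full-outer join, so a
    key absent from an input gets an empty list for that tag)."""
    out = defaultdict(lambda: {tag: [] for tag in tagged})
    for tag, kvs in tagged.items():
        for k, v in kvs:
            out[k][tag].append(v)
    return dict(out)

def as_inner_join(cogrouped):
    """Hypothetical asInnerJoin: keep only keys with at least one
    value from every input."""
    return {k: groups for k, groups in cogrouped.items()
            if all(groups[tag] for tag in groups)}
```

This generalizes the two-input inner join in the join library to any number of inputs.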





[GitHub] beam pull request #2339: [BEAM-1805] a new option `asInnerJoin` for CoGroupB...

2017-03-27 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/2339

[BEAM-1805] a new option `asInnerJoin` for CoGroupByKey

*INNER_JOIN* is available with 
[Join](https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java).
However, it is limited to two `PCollection`s; it would be helpful to add it to 
`CoGroupByKey` for cases where multiple `PCollection`s are joined.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-1805

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2339


commit 09df9b5c15dd5b409d9da937225caa073a663255
Author: XuMingmin 
Date:   2017-03-15T21:45:17Z

Merge pull request #3 from apache/master

sync-up code

commit 93929873179a04fb0038400d57b55b3779d7b085
Author: XuMingmin 
Date:   2017-03-24T23:17:00Z

Merge pull request #4 from apache/master

sync up code

commit 62ff6bd83725414aa2ca11d30a5bd7776afe99fa
Author: XuMingmin 
Date:   2017-03-27T18:30:44Z

Merge pull request #5 from apache/master

sync-up code

commit 6f087a085094bd2520094ca0c5c540ae74b31542
Author: mingmxu 
Date:   2017-03-27T20:38:45Z

initial commit






[jira] [Created] (BEAM-1820) Source.getDefaultOutputCoder() should be @Nullable

2017-03-27 Thread Eugene Kirpichov (JIRA)
Eugene Kirpichov created BEAM-1820:
--

 Summary: Source.getDefaultOutputCoder() should be @Nullable
 Key: BEAM-1820
 URL: https://issues.apache.org/jira/browse/BEAM-1820
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Eugene Kirpichov
Assignee: Davor Bonaci


Source.getDefaultOutputCoder() returns a coder for elements produced by the 
source.

However, the Source objects are nearly always hidden from the user and instead 
encapsulated in a transform. Often, an enclosing transform has a better idea of 
what coder should be used to encode these elements (e.g. a user supplied a 
Coder to that transform's configuration). In that case, it'd be good if 
Source.getDefaultOutputCoder() could just return null, and coder would have to 
be handled by the enclosing transform or perhaps specified on the output of 
that transform explicitly.

Right now there's a bunch of code in the SDK and runners that assumes 
Source.getDefaultOutputCoder() returns non-null. That code would need to be 
fixed to instead use the coder set on the collection produced by 
Read.from(source).

It all appears pretty easy to fix, so this is a good starter item.
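The proposed fallback logic can be sketched in plain Java. This is a hypothetical model, not the real Beam API: coders are represented by strings, `sourceDefaultCoder` stands in for a now-@Nullable `Source.getDefaultOutputCoder()`, and `resolveOutputCoder` stands in for the enclosing transform choosing the coder.

```java
// Hypothetical sketch of the coder resolution BEAM-1820 proposes.
// Names and types are illustrative only; the real Source/Coder/Read
// classes are not used here.
public class CoderFallback {

    // Stand-in for Source.getDefaultOutputCoder(), now allowed to
    // return null when the source has no opinion.
    static String sourceDefaultCoder(boolean sourceKnowsCoder) {
        return sourceKnowsCoder ? "AvroCoder" : null;
    }

    // Stand-in for the enclosing transform resolving the output coder
    // of the collection produced by Read.from(source).
    static String resolveOutputCoder(boolean sourceKnowsCoder, String transformCoder) {
        String fromSource = sourceDefaultCoder(sourceKnowsCoder);
        if (fromSource != null) {
            return fromSource; // the source still wins when it has an opinion
        }
        if (transformCoder == null) {
            throw new IllegalStateException(
                "No coder available: set one on the transform or its output");
        }
        return transformCoder; // user-supplied coder on the enclosing transform
    }

    public static void main(String[] args) {
        System.out.println(resolveOutputCoder(true, "StringUtf8Coder"));  // AvroCoder
        System.out.println(resolveOutputCoder(false, "StringUtf8Coder")); // StringUtf8Coder
    }
}
```

Runners currently assuming a non-null result would move to the second branch: read the coder off the output collection rather than off the source.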



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Python_Verify #1638

2017-03-27 Thread Apache Jenkins Server
See 


Changes:

[tgroh] Do not reify timestamps in Reshuffle in Dataflow

[tgroh] Make WindowMappingFn#maximumLookback Configurable but Final

--
[...truncated 760.33 KB...]
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": 
"assert_that/Group/Map(_merge_tagged_vals_under_key).out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s11"
}, 
"serialized_fn": "", 
"user_name": "assert_that/Group/Map(_merge_tagged_vals_under_key)"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s13", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": ""
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "assert_that/Unkey.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s12"
}, 
"serialized_fn": "", 
"user_name": "assert_that/Unkey"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s14", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.CallableWrapperDoFn", 
"type": "STRING", 
"value": "_equal"
  }, 
  {
"key": "fn", 
"label": "Transform Function", 
"namespace": "apache_beam.transforms.core.ParDo", 
"shortValue": "CallableWrapperDoFn", 
"type": "STRING", 
"value": "apache_beam.transforms.core.CallableWrapperDoFn"
  }
], 
"non_parallel_inputs": {}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  

[jira] [Commented] (BEAM-1778) Clean up pass of dataflow/google references/URLs in Java SDK

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943971#comment-15943971
 ] 

ASF GitHub Bot commented on BEAM-1778:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2311


> Clean up pass of dataflow/google references/URLs in Java SDK
> 
>
> Key: BEAM-1778
> URL: https://issues.apache.org/jira/browse/BEAM-1778
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Melissa Pashniak
>Assignee: Melissa Pashniak
>






[GitHub] beam pull request #2311: [BEAM-1778] Second clean up pass of dataflow refere...

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2311




[1/2] beam git commit: [BEAM-1778] Second clean up pass of dataflow references/URLs in Java SDK

2017-03-27 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 1ea3b35ba -> ced1e5c3a


[BEAM-1778] Second clean up pass of dataflow references/URLs in Java SDK


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/07a50b48
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/07a50b48
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/07a50b48

Branch: refs/heads/master
Commit: 07a50b48af9c176732ff172e3d612052d4e15386
Parents: 87c8ef0
Author: melissa 
Authored: Thu Mar 23 16:49:41 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Mar 27 13:15:48 2017 -0700

--
 .../main/java/org/apache/beam/sdk/io/AvroSource.java |  2 +-
 .../java/org/apache/beam/sdk/io/BoundedSource.java   |  4 +---
 .../main/java/org/apache/beam/sdk/io/PubsubIO.java   |  6 --
 .../org/apache/beam/sdk/io/PubsubUnboundedSink.java  |  2 --
 .../apache/beam/sdk/io/PubsubUnboundedSource.java|  2 --
 .../main/java/org/apache/beam/sdk/io/TFRecordIO.java |  6 +++---
 .../main/java/org/apache/beam/sdk/io/XmlSink.java|  4 ++--
 .../org/apache/beam/sdk/io/range/ByteKeyRange.java   |  4 +---
 .../java/org/apache/beam/sdk/options/GcpOptions.java |  2 +-
 .../beam/sdk/testing/SerializableMatchers.java   |  2 +-
 .../org/apache/beam/sdk/testing/StreamingIT.java |  2 +-
 .../java/org/apache/beam/sdk/util/CoderUtils.java| 15 +++
 .../beam/sdk/coders/protobuf/ProtobufUtilTest.java   |  1 -
 .../apache/beam/sdk/runners/PipelineRunnerTest.java  |  2 +-
 .../io/gcp/bigquery/BigQueryTableRowIterator.java|  3 +--
 .../beam/sdk/io/gcp/datastore/DatastoreV1.java   |  2 +-
 .../org/apache/beam/sdk/io/hdfs/HDFSFileSink.java|  3 +--
 .../java/org/apache/beam/sdk/io/jdbc/JdbcIO.java |  4 
 .../java/org/apache/beam/sdk/io/kafka/KafkaIO.java   |  4 +---
 .../resources/META-INF/maven/archetype-metadata.xml  |  2 +-
 .../resources/META-INF/maven/archetype-metadata.xml  |  2 +-
 .../resources/META-INF/maven/archetype-metadata.xml  |  2 +-
 22 files changed, 26 insertions(+), 50 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/07a50b48/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
index fe3ac5c..0c52dea 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
@@ -109,7 +109,7 @@ import 
org.apache.commons.compress.utils.CountingInputStream;
  * than the end offset of the source.
  *
  * To use XZ-encoded Avro files, please include an explicit dependency on 
{@code xz-1.5.jar},
- * which has been marked as optional in the Maven {@code sdk/pom.xml} for 
Google Cloud Dataflow:
+ * which has been marked as optional in the Maven {@code sdk/pom.xml}.
  *
  * {@code
  * 

http://git-wip-us.apache.org/repos/asf/beam/blob/07a50b48/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java
index 8e5145c..8538e7f 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java
@@ -104,9 +104,7 @@ public abstract class BoundedSource<T> extends Source<T> {
*
* Sources which support dynamic work rebalancing should use
    * {@link org.apache.beam.sdk.io.range.RangeTracker} to manage the (source-specific)
-   * range of positions that is being split. If your source supports dynamic work rebalancing,
-   * please use that class to implement it if possible; if not possible, please contact the team
-   * at dataflow-feedb...@google.com.
+   * range of positions that is being split.
*/
   @Experimental(Experimental.Kind.SOURCE_SINK)
   public abstract static class BoundedReader<T> extends Source.Reader<T> {

http://git-wip-us.apache.org/repos/asf/beam/blob/07a50b48/sdks/java/core/src/main/java/org/apache/beam/sdk/io/PubsubIO.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/PubsubIO.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/PubsubIO.java
index 806b7da..c1ad353 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/PubsubIO.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/PubsubIO.java
@@ -839,9 +839,6 @@ public class PubsubIO {
  * TODO: Consider replacing with 

[2/2] beam git commit: This closes #2311: Second clean up pass of dataflow references/URLs in Java SDK

2017-03-27 Thread kenn
This closes #2311: Second clean up pass of dataflow references/URLs in Java SDK

  [BEAM-1778] Second clean up pass of dataflow references/URLs in Java SDK


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ced1e5c3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ced1e5c3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ced1e5c3

Branch: refs/heads/master
Commit: ced1e5c3a1ee1e90ffe5843f7b4e8daf2cb3d46e
Parents: 1ea3b35 07a50b4
Author: Kenneth Knowles 
Authored: Mon Mar 27 13:16:01 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Mar 27 13:16:01 2017 -0700

--
 .../main/java/org/apache/beam/sdk/io/AvroSource.java |  2 +-
 .../java/org/apache/beam/sdk/io/BoundedSource.java   |  4 +---
 .../main/java/org/apache/beam/sdk/io/PubsubIO.java   |  6 --
 .../org/apache/beam/sdk/io/PubsubUnboundedSink.java  |  2 --
 .../apache/beam/sdk/io/PubsubUnboundedSource.java|  2 --
 .../main/java/org/apache/beam/sdk/io/TFRecordIO.java |  6 +++---
 .../main/java/org/apache/beam/sdk/io/XmlSink.java|  4 ++--
 .../org/apache/beam/sdk/io/range/ByteKeyRange.java   |  4 +---
 .../java/org/apache/beam/sdk/options/GcpOptions.java |  2 +-
 .../beam/sdk/testing/SerializableMatchers.java   |  2 +-
 .../org/apache/beam/sdk/testing/StreamingIT.java |  2 +-
 .../java/org/apache/beam/sdk/util/CoderUtils.java| 15 +++
 .../beam/sdk/coders/protobuf/ProtobufUtilTest.java   |  1 -
 .../apache/beam/sdk/runners/PipelineRunnerTest.java  |  2 +-
 .../io/gcp/bigquery/BigQueryTableRowIterator.java|  3 +--
 .../beam/sdk/io/gcp/datastore/DatastoreV1.java   |  2 +-
 .../org/apache/beam/sdk/io/hdfs/HDFSFileSink.java|  3 +--
 .../java/org/apache/beam/sdk/io/jdbc/JdbcIO.java |  4 
 .../java/org/apache/beam/sdk/io/kafka/KafkaIO.java   |  4 +---
 .../resources/META-INF/maven/archetype-metadata.xml  |  2 +-
 .../resources/META-INF/maven/archetype-metadata.xml  |  2 +-
 .../resources/META-INF/maven/archetype-metadata.xml  |  2 +-
 22 files changed, 26 insertions(+), 50 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ced1e5c3/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
--



[jira] [Created] (BEAM-1819) Key should be available in @OnTimer methods

2017-03-27 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-1819:
-

 Summary: Key should be available in @OnTimer methods
 Key: BEAM-1819
 URL: https://issues.apache.org/jira/browse/BEAM-1819
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh


Every timer firing has an associated key. This key should be available when the 
timer is delivered to a user's {{DoFn}}, so they don't have to store it in 
state.
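The workaround this issue wants to eliminate can be modeled in plain Java. This is a hypothetical sketch, not the Beam API: the map stands in for per-key `ValueState`, and the two methods stand in for `@ProcessElement` and `@OnTimer`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the current workaround: because the @OnTimer
// callback does not receive the firing key, the DoFn must copy the key
// into per-key state when setting the timer, then read it back when
// the timer fires.
public class KeyedTimerWorkaround {

    // Stand-in for a per-key ValueState<String>; in Beam, state is
    // already scoped to the key, so each key sees only its own cell.
    private final Map<String, String> keyState = new HashMap<>();

    // Stand-in for @ProcessElement: set a timer and stash the key.
    void processElement(String key) {
        keyState.put(key, key); // the workaround: key copied into state
    }

    // Stand-in for @OnTimer: only the state handle for the firing key
    // is available, so the key itself must be recovered from state.
    String onTimer(String stateHandleForFiringKey) {
        return keyState.get(stateHandleForFiringKey);
    }

    public static void main(String[] args) {
        KeyedTimerWorkaround fn = new KeyedTimerWorkaround();
        fn.processElement("user-42");
        System.out.println(fn.onTimer("user-42")); // prints user-42
    }
}
```

Delivering the key directly to the timer callback would make the state cell, and this round trip, unnecessary.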





[2/2] beam git commit: Make WindowMappingFn#maximumLookback Configurable but Final

2017-03-27 Thread tgroh
Make WindowMappingFn#maximumLookback Configurable but Final

This enforces that it return a constant value.
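The mechanics appear in the diff below; as a minimal stand-alone sketch of the same pattern (hypothetical names, not the Beam classes), the constant is fixed at construction time and exposed through a final accessor, so subclasses can configure it but cannot vary it per call:

```java
// Hypothetical sketch of the "configurable but final" pattern applied
// in this commit: the lookback value is supplied once to the
// constructor, and the final accessor guarantees a constant result.
public abstract class MappingFnSketch {
    private final long maximumLookbackMillis;

    protected MappingFnSketch(long maximumLookbackMillis) {
        this.maximumLookbackMillis = maximumLookbackMillis;
    }

    // Final: subclasses cannot override this to return varying values.
    public final long maximumLookbackMillis() {
        return maximumLookbackMillis;
    }

    // Subclasses still implement the mapping itself.
    public abstract String getSideInputWindow(String mainWindow);

    public static void main(String[] args) {
        MappingFnSketch identity = new MappingFnSketch(0L) {
            @Override
            public String getSideInputWindow(String mainWindow) {
                return mainWindow; // identity mapping, zero lookback
            }
        };
        System.out.println(identity.maximumLookbackMillis()); // prints 0
    }
}
```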


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/61bb6b4e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/61bb6b4e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/61bb6b4e

Branch: refs/heads/master
Commit: 61bb6b4e301a3675f20728730a2d691a79941156
Parents: 88ffc97
Author: Thomas Groh 
Authored: Thu Mar 23 15:20:06 2017 -0700
Committer: Thomas Groh 
Committed: Mon Mar 27 12:55:30 2017 -0700

--
 .../apache/beam/sdk/testing/StaticWindows.java  |  8 +--
 .../sdk/transforms/windowing/GlobalWindows.java |  6 -
 .../windowing/PartitioningWindowFn.java |  6 -
 .../transforms/windowing/SlidingWindows.java|  5 
 .../transforms/windowing/WindowMappingFn.java   | 24 +---
 .../sdk/util/IdentitySideInputWindowFn.java |  6 -
 6 files changed, 22 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/61bb6b4e/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
index 4be88c8..fde1669 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
@@ -103,7 +103,7 @@ final class StaticWindows extends NonMergingWindowFn<Object, BoundedWindow> {
 
   @Override
   public WindowMappingFn getDefaultWindowMappingFn() {
-    return new WindowMappingFn<BoundedWindow>() {
+    return new WindowMappingFn<BoundedWindow>(Duration.millis(Long.MAX_VALUE)) {
   @Override
   public BoundedWindow getSideInputWindow(BoundedWindow mainWindow) {
 checkArgument(
@@ -112,12 +112,6 @@ final class StaticWindows extends NonMergingWindowFn<Object, BoundedWindow> {
 StaticWindows.class.getSimpleName());
 return mainWindow;
   }
-
-  @Override
-  public Duration maximumLookback() {
-// TODO: This may be unsafe.
-return Duration.millis(Long.MAX_VALUE);
-  }
 };
   }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/61bb6b4e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
index e91fad1..400be1f 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
@@ -20,7 +20,6 @@ package org.apache.beam.sdk.transforms.windowing;
 import java.util.Collection;
 import java.util.Collections;
 import org.apache.beam.sdk.coders.Coder;
-import org.joda.time.Duration;
 import org.joda.time.Instant;
 
 /**
@@ -56,11 +55,6 @@ public class GlobalWindows extends NonMergingWindowFn<Object, GlobalWindow> {
 public GlobalWindow getSideInputWindow(BoundedWindow mainWindow) {
   return GlobalWindow.INSTANCE;
 }
-
-@Override
-public Duration maximumLookback() {
-  return Duration.ZERO;
-}
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/beam/blob/61bb6b4e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
index 40cff8a..40ee68a 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
@@ -19,7 +19,6 @@ package org.apache.beam.sdk.transforms.windowing;
 
 import java.util.Arrays;
 import java.util.Collection;
-import org.joda.time.Duration;
 import org.joda.time.Instant;
 
 /**
@@ -52,11 +51,6 @@ public abstract class PartitioningWindowFn<T, W extends BoundedWindow>
 }
 return assignWindow(mainWindow.maxTimestamp());
   }
-
-  @Override
-  public Duration maximumLookback() {
-return Duration.ZERO;
-  }
 };
   }
 


[GitHub] beam pull request #2307: Make WindowMappingFn#maximumLookback Configurable b...

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2307




[1/2] beam git commit: This closes #2307

2017-03-27 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 88ffc97ec -> 1ea3b35ba


This closes #2307


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1ea3b35b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1ea3b35b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1ea3b35b

Branch: refs/heads/master
Commit: 1ea3b35ba00ee69d57cc9bb3b3a0e932df4b8cd7
Parents: 88ffc97 61bb6b4
Author: Thomas Groh 
Authored: Mon Mar 27 12:55:30 2017 -0700
Committer: Thomas Groh 
Committed: Mon Mar 27 12:55:30 2017 -0700

--
 .../apache/beam/sdk/testing/StaticWindows.java  |  8 +--
 .../sdk/transforms/windowing/GlobalWindows.java |  6 -
 .../windowing/PartitioningWindowFn.java |  6 -
 .../transforms/windowing/SlidingWindows.java|  5 
 .../transforms/windowing/WindowMappingFn.java   | 24 +---
 .../sdk/util/IdentitySideInputWindowFn.java |  6 -
 6 files changed, 22 insertions(+), 33 deletions(-)
--




[jira] [Commented] (BEAM-1738) DataflowRunner should override Reshuffle transform

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943942#comment-15943942
 ] 

ASF GitHub Bot commented on BEAM-1738:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2269


> DataflowRunner should override Reshuffle transform
> --
>
> Key: BEAM-1738
> URL: https://issues.apache.org/jira/browse/BEAM-1738
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Ben Chambers
>Assignee: Thomas Groh
> Fix For: Not applicable
>
>
> Verify that the code works, and then remove the reification of windows for 
> the Dataflow Runner since it handles Reshuffle specially and doesn't need the 
> explicit reification.





[GitHub] beam pull request #2269: [BEAM-1738] Do not reify timestamps in Reshuffle in...

2017-03-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2269




[1/2] beam git commit: This closes #2269

2017-03-27 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 815c9fa75 -> 88ffc97ec


This closes #2269


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/88ffc97e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/88ffc97e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/88ffc97e

Branch: refs/heads/master
Commit: 88ffc97ecc57861bcf823c6e6af549b01e023755
Parents: 815c9fa e991e61
Author: Thomas Groh 
Authored: Mon Mar 27 12:54:03 2017 -0700
Committer: Thomas Groh 
Committed: Mon Mar 27 12:54:03 2017 -0700

--
 .../beam/runners/dataflow/DataflowRunner.java   |  1 +
 .../dataflow/ReshuffleOverrideFactory.java  | 84 
 2 files changed, 85 insertions(+)
--




[2/2] beam git commit: Do not reify timestamps in Reshuffle in Dataflow

2017-03-27 Thread tgroh
Do not reify timestamps in Reshuffle in Dataflow

Dataflow has special handling of the ReshuffleTrigger that outputs
elements with their original timestamps.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e991e619
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e991e619
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e991e619

Branch: refs/heads/master
Commit: e991e619657cbb02a0edacbb72a95ddf67eaa4c3
Parents: 815c9fa
Author: Thomas Groh 
Authored: Fri Mar 17 16:21:27 2017 -0700
Committer: Thomas Groh 
Committed: Mon Mar 27 12:54:03 2017 -0700

--
 .../beam/runners/dataflow/DataflowRunner.java   |  1 +
 .../dataflow/ReshuffleOverrideFactory.java  | 84 
 2 files changed, 85 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/e991e619/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
index 24e23da..c612a20 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
@@ -366,6 +366,7 @@ public class DataflowRunner extends PipelineRunner<DataflowPipelineJob> {
   BatchViewOverrides.BatchViewAsIterable.class, this));
 }
 ptoverrides
+        .put(PTransformMatchers.classEqualTo(Reshuffle.class), new ReshuffleOverrideFactory())
         // Order is important. Streaming views almost all use Combine internally.
 .put(
 PTransformMatchers.classEqualTo(Combine.GroupedValues.class),

http://git-wip-us.apache.org/repos/asf/beam/blob/e991e619/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/ReshuffleOverrideFactory.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/ReshuffleOverrideFactory.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/ReshuffleOverrideFactory.java
new file mode 100644
index 000..230f5dc
--- /dev/null
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/ReshuffleOverrideFactory.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.dataflow;
+
+import 
org.apache.beam.runners.core.construction.SingleInputOutputOverrideFactory;
+import org.apache.beam.sdk.runners.PTransformOverrideFactory;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.transforms.windowing.Window;
+import org.apache.beam.sdk.util.IdentityWindowFn;
+import org.apache.beam.sdk.util.Reshuffle;
+import org.apache.beam.sdk.util.ReshuffleTrigger;
+import org.apache.beam.sdk.util.WindowingStrategy;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.joda.time.Duration;
+
+/**
+ * A {@link PTransformOverrideFactory} that overrides {@link Reshuffle} with a 
version that does not
+ * reify timestamps. Dataflow has special handling of the {@link 
ReshuffleTrigger} which never
+ * buffers elements and outputs elements with their original timestamp.
+ */
+class ReshuffleOverrideFactory<K, V>
+    extends SingleInputOutputOverrideFactory<
+        PCollection<KV<K, V>>, PCollection<KV<K, V>>, Reshuffle<K, V>> {
+  @Override
+  public PTransform<PCollection<KV<K, V>>, PCollection<KV<K, V>>>

[jira] [Commented] (BEAM-1807) IO ITs: shared language neutral directory for kubernetes resources

2017-03-27 Thread Stephen Sisk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943940#comment-15943940
 ] 

Stephen Sisk commented on BEAM-1807:


great! For Jenkins: what is involved in updating the seed job?  (ie, where do I 
edit that/is that something I can edit?)

> IO ITs: shared language neutral directory for kubernetes resources
> --
>
> Key: BEAM-1807
> URL: https://issues.apache.org/jira/browse/BEAM-1807
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Stephen Sisk
>Assignee: Davor Bonaci
>
> This is a follow-up to BEAM-1644. As was discussed there: 
> "
> It is the case that different IOs will be created that connect to the same 
> data stores - HadoopInputFormat in particular uses ES and cassandra, which 
> are also used in their respective IOs as well. Jdbc is likely to have the 
> same type of overlap.
> It would be nice to share [...] kubernetes/docker scripts so that we don't 
> need to repeat them in each module. 
> "
> For BEAM-1644, we created a directory for java io-common resources - that's 
> perfect for the java pipeline options we needed. However, we shouldn't put 
> kubernetes resources in the newly created sdks/java/io/common because that'd 
> indicate that the scripts are java specific. 
> It's also worth noting that we have this problem already for jenkins and 
> travis, and solved it by creating .jenkins and .travis directories at the 
> top-level.
> Proposal
> ===
> move .jenkins and .travis into a new top level ".test-infra" folder, and put 
> a kubernetes directory there.
> So the new structure would look like:
> .test-infra
>   jenkins
>   travis
>   kubernetes
> sdks
> runners
> examples
> ...
> I don't know if travis/jenkins must look in .travis/.jenkins directories or 
> if those are things that we can change. If both do, would lessen my 
> excitement, but if at least one other thing can share that directory, that 
> would make it worthwhile in my mind.
> Alternate proposal
> ===
> add a top-level .kubernetes directory alongside .jenkins/.travis. 
> I'm not a huge fan of this since I'd love to not add more top level clutter.
> Alternate proposal
> ===
> We could create:
> sdks/common/test-infra/kubernetes
> and put the scripts there. 
> I don't like this option as much because it's kind of a just a random 
> directory and is disconnected from the rest of the test infrastructure 
> scripts that we use. I also prefer the other option since it reduces the 
> amount of top-level clutter.
> Thoughts?
> cc [~jasonkuster] [~davor] [~iemejia]





[GitHub] beam pull request #2192: DO NOT MERGE: Track the reader in UnboundedReaderEv...

2017-03-27 Thread bjchambers
Github user bjchambers closed the pull request at:

https://github.com/apache/beam/pull/2192




[jira] [Commented] (BEAM-1027) Hosting data stores to enable IO Transform testing

2017-03-27 Thread Stephen Sisk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943935#comment-15943935
 ] 

Stephen Sisk commented on BEAM-1027:


We currently have 1 IO data store running on a k8 instance. I'll resolve this 
bug once we've gotten a second going (so we know it's repeatable!)

> Hosting data stores to enable IO Transform testing
> --
>
> Key: BEAM-1027
> URL: https://issues.apache.org/jira/browse/BEAM-1027
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
>
> Currently we have a good set of unit tests for our IO Transforms - those
> tend to run against in-memory versions of the data stores. However, we'd
> like to further increase our test coverage to include running them against
> real instances of the data stores that the IO Transforms work against (e.g.
> cassandra, mongodb, kafka, etc…), which means we'll need to have real
> instances of various data stores.
> Additionally, if we want to do performance regression detection, it's
> important to have instances of the services that behave realistically,
> which isn't true of in-memory or dev versions of the services.
> My proposed solution is in 
> https://lists.apache.org/thread.html/367fd9669411f21c9ec1f2d27df60464f49d5ce81e6bd16de401d035@%3Cdev.beam.apache.org%3E
>  
> - it still needs further discussion, and (assuming we agree on the general 
> idea), the beam community needs to decide which cluster management software 
> we want to use.





Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2655

2017-03-27 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #3051

2017-03-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1818) Expose side-channel inputs in PTransform

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943902#comment-15943902
 ] 

ASF GitHub Bot commented on BEAM-1818:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2338

[BEAM-1818] Add PTransform#getAdditionalInputs()

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This permits all PTransforms to expose any input PValues they receive
via a side channel. This lets the Pipeline include those inputs as input
to the PTransform node even when the actual generic PTransform types do
not include those inputs (e.g. as a Side Input to a ParDo, which is not
represented in the input PCollection type).

Implement getAdditionalInputs in ParDo and Combine.

getAdditionalInputs is currently unused.
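The side-channel pattern the PR describes can be sketched in plain Java. This is a minimal, self-contained model, not the actual SDK API: `Value`, `SimpleTransform`, `ParDoLike`, and the `"lookup"` tag are illustrative stand-ins for Beam's `PValue`, `PTransform`, `ParDo`, and `TupleTag`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal stand-in for a pipeline value (Beam's PValue). */
class Value {
    final String name;
    Value(String name) { this.name = name; }
}

/** Transform base class: by default a transform has no side-channel inputs. */
abstract class SimpleTransform {
    /** Inputs consumed outside the main input, keyed by tag. */
    Map<String, Value> getAdditionalInputs() {
        return Collections.emptyMap();
    }
}

/** A ParDo-like transform that also consumes side inputs. */
class ParDoLike extends SimpleTransform {
    private final Map<String, Value> sideInputs = new LinkedHashMap<>();

    ParDoLike withSideInput(String tag, Value v) {
        sideInputs.put(tag, v);
        return this;
    }

    @Override
    Map<String, Value> getAdditionalInputs() {
        // Expose side inputs so the pipeline graph can record them as
        // inputs to this node, even though the transform's generic input
        // type does not mention them.
        return Collections.unmodifiableMap(sideInputs);
    }
}

public class AdditionalInputsDemo {
    public static void main(String[] args) {
        Value lookup = new Value("lookupTable");
        ParDoLike pardo = new ParDoLike().withSideInput("lookup", lookup);
        // The pipeline can now discover the side-channel input by tag.
        System.out.println(pardo.getAdditionalInputs().containsKey("lookup"));
    }
}
```

The key design point mirrored here is that the base class returns an empty map, so only transforms that actually take side-channel inputs (like ParDo and Combine in the PR) need to override the accessor.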

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam ptransform_additional_outputs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2338.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2338


commit d4314198869436a2923039938460e62eb9879133
Author: Thomas Groh 
Date:   2017-03-22T20:48:57Z

Add PTransform#getAdditionalInputs()

This permits all PTransforms to expose any input PValues they receive
via a side channel. This lets the Pipeline include those inputs as input
to the PTransform node even when the actual generic PTransform types do
not include those inputs (e.g. as a Side Input to a ParDo, which is not
represented in the input PCollection type).

Implement getAdditionalInputs in ParDo and Combine.

getAdditionalInputs is currently unused.




> Expose side-channel inputs in PTransform
> 
>
> Key: BEAM-1818
> URL: https://issues.apache.org/jira/browse/BEAM-1818
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This enables all PTransforms to consume inputs beyond those provided in the 
> PInput they are applied to, the same way ParDo and Combine do with 
> {{getSideInputs}}.
> See 
> https://docs.google.com/document/d/1GlcfuMLCTnvleZUWUN4ZAhK_ybXJwDb8ms593DFuA5U/edit





[jira] [Updated] (BEAM-1818) Expose side-channel inputs in PTransform

2017-03-27 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh updated BEAM-1818:
--
Description: 
This enables all PTransforms to consume inputs beyond those provided in the 
PInput they are applied to, the same way ParDo and Combine do with 
{{getSideInputs}}.

See 
https://docs.google.com/document/d/1GlcfuMLCTnvleZUWUN4ZAhK_ybXJwDb8ms593DFuA5U/edit

  was:This enables all PTransforms to consume inputs without using those inputs 
provided in the PInput they are applied to, the same way ParDo and Combine do 
with {{getSideInputs}}


> Expose side-channel inputs in PTransform
> 
>
> Key: BEAM-1818
> URL: https://issues.apache.org/jira/browse/BEAM-1818
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This enables all PTransforms to consume inputs beyond those provided in the 
> PInput they are applied to, the same way ParDo and Combine do with 
> {{getSideInputs}}.
> See 
> https://docs.google.com/document/d/1GlcfuMLCTnvleZUWUN4ZAhK_ybXJwDb8ms593DFuA5U/edit




