[jira] [Created] (BEAM-1491) HDFSFileSource should be able to read the HADOOP_CONF(YARN_CONF) environmen variable

2017-02-15 Thread yangping wu (JIRA)
yangping wu created BEAM-1491:
-

 Summary: HDFSFileSource should be able to read the 
HADOOP_CONF(YARN_CONF) environmen variable
 Key: BEAM-1491
 URL: https://issues.apache.org/jira/browse/BEAM-1491
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Affects Versions: 0.5.0
Reporter: yangping wu
Assignee: Davor Bonaci


Currently, if we want to read file store on HDFS, we will do it as follow:
{code} PCollection> resultCollection = 
p.apply(HDFSFileSource.readFrom(
"hdfs://hadoopserver:8020/tmp/data.txt",
TextInputFormat.class, LongWritable.class, Text.class));
{code}
As we have seen above, we must be set {{hdfs://hadoopserver:8020}} in the file 
path, and we cann't set any variables when read file, because in 
[HDFSFileSource.java|https://github.com/apache/beam/blob/master/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSource.java#L310]
 we initialize {{job}} instance as follow:
{code}
this.job = Job.getInstance();
{code}
we should initialize {{job}} instance by configure:
{code}
this.job = Job.getInstance(conf);
{code}
where {{conf}} is instance of {{Configuration}}, and we initialize {{conf}} by 
reading {{HADOOP_CONF}}({{YARN_CONF}}) environmen variable,then we can read 
HDFS file like this:
{code} PCollection> resultCollection = 
p.apply(HDFSFileSource.readFrom(
"/tmp/data.txt",
TextInputFormat.class, LongWritable.class, Text.class));
{code}
note we don't specify {{hdfs://hadoopserver:8020}} prefix, because the program 
read it from  {{HADOOP_CONF}}({{YARN_CONF}}) environmen.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #953

2017-02-15 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2012: [BEAM-1393/1394/1445] Update to Flink 1.2.0 and fix...

2017-02-15 Thread aljoscha
GitHub user aljoscha opened a pull request:

https://github.com/apache/beam/pull/2012

[BEAM-1393/1394/1445] Update to Flink 1.2.0 and fix resulting problems

This is a copy of #1960. I'm hoping to be able to figure out what's causing 
the problems in the Maven verify hook.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/beam finish-pr-1969-flink12

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2012.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2012


commit a4e182a731b8e09b6b9017c1f316991883492884
Author: JingsongLi 
Date:   2017-02-07T08:11:12Z

[BEAM-1393/1394/1445] Update to Flink 1.2.0 and fix resulting problems

We now use the Flink InternalTimerService for all our timer needs and we
use Flink Broadcast state to store side inputs.

Changing those was both necessary and easier to to in one commit while
updating to Flink 1.2.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1393) Update Flink Runner to Flink 1.2.0

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867524#comment-15867524
 ] 

ASF GitHub Bot commented on BEAM-1393:
--

GitHub user aljoscha opened a pull request:

https://github.com/apache/beam/pull/2012

[BEAM-1393/1394/1445] Update to Flink 1.2.0 and fix resulting problems

This is a copy of #1960. I'm hoping to be able to figure out what's causing 
the problems in the Maven verify hook.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/beam finish-pr-1969-flink12

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2012.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2012


commit a4e182a731b8e09b6b9017c1f316991883492884
Author: JingsongLi 
Date:   2017-02-07T08:11:12Z

[BEAM-1393/1394/1445] Update to Flink 1.2.0 and fix resulting problems

We now use the Flink InternalTimerService for all our timer needs and we
use Flink Broadcast state to store side inputs.

Changing those was both necessary and easier to to in one commit while
updating to Flink 1.2.




> Update Flink Runner to Flink 1.2.0
> --
>
> Key: BEAM-1393
> URL: https://issues.apache.org/jira/browse/BEAM-1393
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Jingsong Lee
>
> When we update to 1.2.0 we can use the new internal Timer API that is 
> available to Flink operators: {{InternalTimerService}} and also use broadcast 
> state to store side-input data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-774) Implement Metrics support for Spark runner

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867529#comment-15867529
 ] 

ASF GitHub Bot commented on BEAM-774:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1750


> Implement Metrics support for Spark runner
> --
>
> Key: BEAM-774
> URL: https://issues.apache.org/jira/browse/BEAM-774
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Ben Chambers
>Assignee: Aviem Zur
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/7] beam git commit: [BEAM-774] Implement Metrics support for Spark runner

2017-02-15 Thread amitsela
Repository: beam
Updated Branches:
  refs/heads/master e720a7c43 -> 24ecf6bbf


[BEAM-774] Implement Metrics support for Spark runner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8e203ea2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8e203ea2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8e203ea2

Branch: refs/heads/master
Commit: 8e203ea2ef34f01eacaa99eb43294f143532ecc3
Parents: e720a7c
Author: Aviem Zur 
Authored: Wed Jan 11 14:42:53 2017 +0200
Committer: Sela 
Committed: Wed Feb 15 11:10:47 2017 +0200

--
 .../beam/runners/spark/SparkPipelineResult.java |   5 +-
 .../apache/beam/runners/spark/SparkRunner.java  |   3 +
 .../runners/spark/metrics/MetricAggregator.java | 113 
 .../spark/metrics/MetricsAccumulator.java   |  60 
 .../spark/metrics/MetricsAccumulatorParam.java  |  42 +++
 .../spark/metrics/SparkMetricResults.java   | 188 
 .../spark/metrics/SparkMetricsContainer.java| 288 +++
 .../runners/spark/metrics/package-info.java |  20 ++
 .../runners/spark/translation/DoFnFunction.java |  26 +-
 .../translation/DoFnRunnerWithMetrics.java  |  98 +++
 .../spark/translation/EvaluationContext.java|   4 +
 .../spark/translation/SparkContextFactory.java  |   2 +
 .../spark/translation/TransformTranslator.java  |  13 +-
 .../streaming/StreamingTransformTranslator.java |  12 +-
 .../apache/beam/sdk/metrics/MetricMatchers.java |  96 +++
 .../apache/beam/sdk/metrics/MetricsTest.java|  24 +-
 16 files changed, 974 insertions(+), 20 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/8e203ea2/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
index b1027a6..d0d5569 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
@@ -24,6 +24,8 @@ import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.TimeoutException;
 import org.apache.beam.runners.spark.aggregators.SparkAggregators;
+import org.apache.beam.runners.spark.metrics.SparkMetricResults;
+import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
 import org.apache.beam.runners.spark.translation.SparkContextFactory;
 import org.apache.beam.sdk.AggregatorRetrievalException;
 import org.apache.beam.sdk.AggregatorValues;
@@ -122,7 +124,8 @@ public abstract class SparkPipelineResult implements 
PipelineResult {
 
   @Override
   public MetricResults metrics() {
-throw new UnsupportedOperationException("The SparkRunner does not 
currently support metrics.");
+return new SparkMetricResults(
+
SparkMetricsContainer.getAccumulator(SparkContextFactory.EMPTY_CONTEXT));
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/beam/blob/8e203ea2/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
index 46492f8..cc20a30 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
@@ -29,6 +29,7 @@ import 
org.apache.beam.runners.spark.aggregators.AccumulatorSingleton;
 import org.apache.beam.runners.spark.aggregators.NamedAggregators;
 import org.apache.beam.runners.spark.aggregators.SparkAggregators;
 import 
org.apache.beam.runners.spark.aggregators.metrics.AggregatorMetricSource;
+import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
 import org.apache.beam.runners.spark.translation.EvaluationContext;
 import org.apache.beam.runners.spark.translation.SparkContextFactory;
 import org.apache.beam.runners.spark.translation.SparkPipelineTranslator;
@@ -141,6 +142,8 @@ public final class SparkRunner extends 
PipelineRunner {
 final Accumulator accum =
 SparkAggregators.getOrCreateNamedAggregators(jsc, maybeCheckpointDir);
 final NamedAggregators initialValue = accum.value();
+// Instantiate metrics accumulator
+SparkMetricsContainer.getAccumulator(jsc);
 
 if (opts.getEnableSparkMetricSinks()) {
   final MetricsSystem metricsSystem = 
SparkEnv$.MODULE$.get().metricsSystem();

http://git-wip-us.apache.org/repos/asf/beam/blob/8e203ea2/runners/spark/src/main/java/org/apach

[5/7] beam git commit: Register beam metrics with a MetricSource in Spark

2017-02-15 Thread amitsela
Register beam metrics with a MetricSource in Spark


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/31624fed
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/31624fed
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/31624fed

Branch: refs/heads/master
Commit: 31624fed4e15dd9e5f8aeac6315ca3cfb73f8616
Parents: 8e203ea
Author: Aviem Zur 
Authored: Tue Jan 17 15:03:59 2017 +0200
Committer: Sela 
Committed: Wed Feb 15 11:10:48 2017 +0200

--
 .../beam/runners/spark/SparkPipelineResult.java |   6 +-
 .../apache/beam/runners/spark/SparkRunner.java  |  31 +--
 .../spark/aggregators/AccumulatorSingleton.java | 137 
 .../aggregators/AggregatorsAccumulator.java | 137 
 .../spark/aggregators/NamedAggregators.java |   2 +-
 .../spark/aggregators/SparkAggregators.java |   2 +-
 .../aggregators/metrics/AggregatorMetric.java   |  44 
 .../metrics/AggregatorMetricSource.java |  50 -
 .../metrics/WithNamedAggregatorsSupport.java| 174 ---
 .../spark/aggregators/metrics/sink/CsvSink.java |  39 
 .../aggregators/metrics/sink/GraphiteSink.java  |  39 
 .../aggregators/metrics/sink/package-info.java  |  23 --
 .../runners/spark/metrics/AggregatorMetric.java |  43 
 .../spark/metrics/AggregatorMetricSource.java   |  50 +
 .../runners/spark/metrics/CompositeSource.java  |  49 +
 .../spark/metrics/MetricsAccumulator.java   |  15 +-
 .../spark/metrics/MetricsAccumulatorParam.java  |   2 +-
 .../runners/spark/metrics/SparkBeamMetric.java  |  62 ++
 .../spark/metrics/SparkBeamMetricSource.java|  50 +
 .../spark/metrics/SparkMetricResults.java   |  12 +-
 .../spark/metrics/SparkMetricsContainer.java|  38 ++--
 .../spark/metrics/WithMetricsSupport.java   | 209 +++
 .../runners/spark/metrics/sink/CsvSink.java |  38 
 .../spark/metrics/sink/GraphiteSink.java|  38 
 .../spark/metrics/sink/package-info.java|  22 ++
 .../translation/DoFnRunnerWithMetrics.java  |  57 +++--
 .../spark/translation/SparkContextFactory.java  |   2 -
 .../spark/translation/TransformTranslator.java  |   3 +-
 .../streaming/StreamingTransformTranslator.java |   3 +-
 .../spark/aggregators/ClearAggregatorsRule.java |   5 +-
 .../metrics/sink/InMemoryMetrics.java   |  10 +-
 .../spark/src/test/resources/metrics.properties |  10 +-
 .../src/main/resources/beam/findbugs-filter.xml |   4 +-
 33 files changed, 802 insertions(+), 604 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/31624fed/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
index d0d5569..b0958b0 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java
@@ -25,7 +25,6 @@ import java.util.concurrent.TimeUnit;
 import java.util.concurrent.TimeoutException;
 import org.apache.beam.runners.spark.aggregators.SparkAggregators;
 import org.apache.beam.runners.spark.metrics.SparkMetricResults;
-import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
 import org.apache.beam.runners.spark.translation.SparkContextFactory;
 import org.apache.beam.sdk.AggregatorRetrievalException;
 import org.apache.beam.sdk.AggregatorValues;
@@ -46,8 +45,8 @@ public abstract class SparkPipelineResult implements 
PipelineResult {
 
   protected final Future pipelineExecution;
   protected JavaSparkContext javaSparkContext;
-
   protected PipelineResult.State state;
+  private final SparkMetricResults metricResults = new SparkMetricResults();
 
   SparkPipelineResult(final Future pipelineExecution,
   final JavaSparkContext javaSparkContext) {
@@ -124,8 +123,7 @@ public abstract class SparkPipelineResult implements 
PipelineResult {
 
   @Override
   public MetricResults metrics() {
-return new SparkMetricResults(
-
SparkMetricsContainer.getAccumulator(SparkContextFactory.EMPTY_CONTEXT));
+return metricResults;
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/beam/blob/31624fed/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
index cc20a30..3dc4857 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners

[7/7] beam git commit: This closes #1750

2017-02-15 Thread amitsela
This closes #1750


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/24ecf6bb
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/24ecf6bb
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/24ecf6bb

Branch: refs/heads/master
Commit: 24ecf6bbfccefb33e846f7dac941b2f2e30842fd
Parents: e720a7c 3784b54
Author: Sela 
Authored: Wed Feb 15 11:28:27 2017 +0200
Committer: Sela 
Committed: Wed Feb 15 11:28:27 2017 +0200

--
 runners/spark/pom.xml   |   1 -
 .../beam/runners/spark/SparkPipelineResult.java |   5 +-
 .../apache/beam/runners/spark/SparkRunner.java  |  38 +++-
 .../beam/runners/spark/TestSparkRunner.java |   5 +
 .../spark/aggregators/AccumulatorSingleton.java | 137 
 .../aggregators/AggregatorsAccumulator.java | 111 ++
 .../spark/aggregators/NamedAggregators.java |   2 +-
 .../spark/aggregators/SparkAggregators.java |   4 +-
 .../aggregators/metrics/AggregatorMetric.java   |  44 
 .../metrics/AggregatorMetricSource.java |  50 -
 .../metrics/WithNamedAggregatorsSupport.java| 174 ---
 .../spark/aggregators/metrics/sink/CsvSink.java |  39 
 .../aggregators/metrics/sink/GraphiteSink.java  |  39 
 .../aggregators/metrics/sink/package-info.java  |  23 --
 .../runners/spark/metrics/AggregatorMetric.java |  43 
 .../spark/metrics/AggregatorMetricSource.java   |  50 +
 .../runners/spark/metrics/CompositeSource.java  |  49 +
 .../spark/metrics/MetricsAccumulator.java   | 124 +++
 .../spark/metrics/MetricsAccumulatorParam.java  |  42 
 .../runners/spark/metrics/SparkBeamMetric.java  |  62 ++
 .../spark/metrics/SparkBeamMetricSource.java|  50 +
 .../spark/metrics/SparkMetricResults.java   | 181 
 .../spark/metrics/SparkMetricsContainer.java| 155 ++
 .../spark/metrics/WithMetricsSupport.java   | 209 +++
 .../runners/spark/metrics/package-info.java |  20 ++
 .../runners/spark/metrics/sink/CsvSink.java |  38 
 .../spark/metrics/sink/GraphiteSink.java|  38 
 .../spark/metrics/sink/package-info.java|  22 ++
 .../runners/spark/translation/DoFnFunction.java |  26 ++-
 .../translation/DoFnRunnerWithMetrics.java  |  91 
 .../spark/translation/EvaluationContext.java|   4 +
 .../spark/translation/MultiDoFnFunction.java|  25 ++-
 .../spark/translation/TransformTranslator.java  |  29 ++-
 .../spark/translation/streaming/Checkpoint.java | 137 
 .../translation/streaming/CheckpointDir.java|  69 --
 .../SparkRunnerStreamingContextFactory.java |   1 +
 .../streaming/StreamingTransformTranslator.java |  27 ++-
 .../spark/aggregators/ClearAggregatorsRule.java |   5 +-
 .../metrics/sink/InMemoryMetrics.java   |  10 +-
 .../ResumeFromCheckpointStreamingTest.java  |  43 +++-
 .../spark/src/test/resources/metrics.properties |  10 +-
 .../src/main/resources/beam/findbugs-filter.xml |   4 +-
 .../beam/sdk/metrics/DistributionData.java  |   3 +-
 .../org/apache/beam/sdk/metrics/MetricKey.java  |   3 +-
 .../apache/beam/sdk/metrics/MetricUpdates.java  |   3 +-
 .../apache/beam/sdk/metrics/MetricMatchers.java |  96 +
 .../apache/beam/sdk/metrics/MetricsTest.java|  49 -
 47 files changed, 1727 insertions(+), 663 deletions(-)
--




[GitHub] beam pull request #1750: [BEAM-774] Implement Metrics support for Spark runn...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1750


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/7] beam git commit: Throw UnsupportedOperationException for committed metrics results in spark runner

2017-02-15 Thread amitsela
Throw UnsupportedOperationException for committed metrics results in spark 
runner

Added metrics support for MultiDo


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d7d49ce8
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d7d49ce8
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d7d49ce8

Branch: refs/heads/master
Commit: d7d49ce8a1bff63d4205fd641c90e36b0f88bb17
Parents: 2286578
Author: Aviem Zur 
Authored: Sun Jan 29 12:54:07 2017 +0200
Committer: Sela 
Committed: Wed Feb 15 11:10:48 2017 +0200

--
 runners/spark/pom.xml   |  1 -
 .../beam/runners/spark/TestSparkRunner.java |  5 
 .../runners/spark/metrics/SparkBeamMetric.java  |  4 ++--
 .../spark/metrics/SparkMetricResults.java   |  3 ++-
 .../spark/metrics/SparkMetricsContainer.java| 20 ++--
 .../spark/translation/MultiDoFnFunction.java| 25 ++--
 .../spark/translation/TransformTranslator.java  | 15 
 .../streaming/StreamingTransformTranslator.java | 14 +++
 .../apache/beam/sdk/metrics/MetricsTest.java| 25 +---
 9 files changed, 80 insertions(+), 32 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/d7d49ce8/runners/spark/pom.xml
--
diff --git a/runners/spark/pom.xml b/runners/spark/pom.xml
index c9d8e30..3ef7ef4 100644
--- a/runners/spark/pom.xml
+++ b/runners/spark/pom.xml
@@ -77,7 +77,6 @@
 org.apache.beam.sdk.testing.UsesStatefulParDo,
 org.apache.beam.sdk.testing.UsesTimersInParDo,
 org.apache.beam.sdk.testing.UsesSplittableParDo,
-org.apache.beam.sdk.testing.UsesAttemptedMetrics,
 org.apache.beam.sdk.testing.UsesCommittedMetrics
   
   1

http://git-wip-us.apache.org/repos/asf/beam/blob/d7d49ce8/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
index 798ca47..e770164 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
@@ -22,6 +22,7 @@ import static org.hamcrest.MatcherAssert.assertThat;
 import static org.hamcrest.Matchers.is;
 
 import org.apache.beam.runners.core.UnboundedReadFromBoundedSource;
+import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.PipelineResult.State;
 import org.apache.beam.sdk.io.BoundedReadFromUnboundedSource;
@@ -106,6 +107,10 @@ public final class TestSparkRunner extends 
PipelineRunner {
   @Override
   public SparkPipelineResult run(Pipeline pipeline) {
 TestPipelineOptions testPipelineOptions = 
pipeline.getOptions().as(TestPipelineOptions.class);
+
+// clear metrics singleton
+SparkMetricsContainer.clear();
+
 SparkPipelineResult result = delegate.run(pipeline);
 result.waitUntilFinish();
 

http://git-wip-us.apache.org/repos/asf/beam/blob/d7d49ce8/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
index 0c656d7..8e31b22 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
@@ -41,10 +41,10 @@ class SparkBeamMetric implements Metric {
 MetricQueryResults metricQueryResults =
 metricResults.queryMetrics(MetricsFilter.builder().build());
 for (MetricResult metricResult : metricQueryResults.counters()) {
-  metrics.put(renderName(metricResult), metricResult.committed());
+  metrics.put(renderName(metricResult), metricResult.attempted());
 }
 for (MetricResult metricResult : 
metricQueryResults.distributions()) {
-  DistributionResult result = metricResult.committed();
+  DistributionResult result = metricResult.attempted();
   metrics.put(renderName(metricResult) + ".count", result.count());
   metrics.put(renderName(metricResult) + ".sum", result.sum());
   metrics.put(renderName(metricResult) + ".min", result.min());

http://git-wip-us.apache.org/repos/asf/beam/blob/d7d49ce8/runners/spark/src/main/java/o

[3/7] beam git commit: Remove duplicate classes from spark runner marking sdk classes Serializable instead.

2017-02-15 Thread amitsela
Remove duplicate classes from spark runner marking sdk classes Serializable 
instead.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/22865780
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/22865780
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/22865780

Branch: refs/heads/master
Commit: 228657808f76e38f0b16767020d6d7e149d5dcdf
Parents: 31624fe
Author: Aviem Zur 
Authored: Thu Jan 19 13:58:14 2017 +0200
Committer: Sela 
Committed: Wed Feb 15 11:10:48 2017 +0200

--
 .../runners/spark/metrics/MetricAggregator.java | 113 --
 .../spark/metrics/MetricsAccumulatorParam.java  |   4 +-
 .../spark/metrics/SparkMetricResults.java   |  28 ++-
 .../spark/metrics/SparkMetricsContainer.java| 205 +++
 .../beam/sdk/metrics/DistributionData.java  |   3 +-
 .../org/apache/beam/sdk/metrics/MetricKey.java  |   3 +-
 .../apache/beam/sdk/metrics/MetricUpdates.java  |   3 +-
 7 files changed, 46 insertions(+), 313 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/22865780/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/MetricAggregator.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/MetricAggregator.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/MetricAggregator.java
deleted file mode 100644
index 79e49ce..000
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/MetricAggregator.java
+++ /dev/null
@@ -1,113 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.beam.runners.spark.metrics;
-
-import java.io.Serializable;
-import org.apache.beam.sdk.metrics.DistributionData;
-import org.apache.beam.sdk.metrics.MetricKey;
-
-
-/**
- * Metric values wrapper which adds aggregation methods.
- * @param  Metric value type.
- */
-abstract class MetricAggregator implements Serializable {
-  private final MetricKey key;
-  protected ValueT value;
-
-  private MetricAggregator(MetricKey key, ValueT value) {
-this.key = key;
-this.value = value;
-  }
-
-  public MetricKey getKey() {
-return key;
-  }
-
-  public ValueT getValue() {
-return value;
-  }
-
-  @SuppressWarnings("unused")
-  abstract MetricAggregator updated(ValueT update);
-
-  static class CounterAggregator extends MetricAggregator {
-CounterAggregator(MetricKey key, Long value) {
-  super(key, value);
-}
-
-@Override
-CounterAggregator updated(Long counterUpdate) {
-  value = value + counterUpdate;
-  return this;
-}
-  }
-
-  static class DistributionAggregator extends 
MetricAggregator {
-DistributionAggregator(MetricKey key, DistributionData value) {
-  super(key, value);
-}
-
-@Override
-DistributionAggregator updated(DistributionData distributionUpdate) {
-  this.value = new 
SparkDistributionData(this.value.combine(distributionUpdate));
-  return this;
-}
-  }
-
-  static class SparkDistributionData extends DistributionData implements 
Serializable {
-private final long sum;
-private final long count;
-private final long min;
-private final long max;
-
-SparkDistributionData(DistributionData original) {
-  this.sum = original.sum();
-  this.count = original.count();
-  this.min = original.min();
-  this.max = original.max();
-}
-
-@Override
-public long sum() {
-  return sum;
-}
-
-@Override
-public long count() {
-  return count;
-}
-
-@Override
-public long min() {
-  return min;
-}
-
-@Override
-public long max() {
-  return max;
-}
-  }
-
-  static  MetricAggregator updated(MetricAggregator metricAggregator, 
Object updateValue) {
-//noinspection unchecked
-return metricAggregator.updated((T) updateValue);
-  }
-}
-

http://git-wip-us.apache.org/repos/asf/beam/blob/22865780/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/MetricsAcc

[6/7] beam git commit: Recover metrics values from checkpoint

2017-02-15 Thread amitsela
Recover metrics values from checkpoint


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3784b541
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3784b541
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3784b541

Branch: refs/heads/master
Commit: 3784b5417e3b6053e70c25606f2a7e5022ba0a6a
Parents: d7d49ce
Author: Aviem Zur 
Authored: Wed Feb 1 14:46:51 2017 +0200
Committer: Sela 
Committed: Wed Feb 15 11:10:52 2017 +0200

--
 .../apache/beam/runners/spark/SparkRunner.java  |  10 +-
 .../aggregators/AggregatorsAccumulator.java |  46 ++-
 .../spark/aggregators/SparkAggregators.java |   2 +-
 .../spark/metrics/MetricsAccumulator.java   |  65 -
 .../spark/translation/TransformTranslator.java  |   4 +-
 .../spark/translation/streaming/Checkpoint.java | 137 +++
 .../translation/streaming/CheckpointDir.java|  69 --
 .../SparkRunnerStreamingContextFactory.java |   1 +
 .../streaming/StreamingTransformTranslator.java |   4 +-
 .../ResumeFromCheckpointStreamingTest.java  |  43 --
 10 files changed, 256 insertions(+), 125 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3784b541/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
index 3dc4857..ebac375 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
@@ -30,13 +30,14 @@ import 
org.apache.beam.runners.spark.aggregators.NamedAggregators;
 import org.apache.beam.runners.spark.aggregators.SparkAggregators;
 import org.apache.beam.runners.spark.metrics.AggregatorMetricSource;
 import org.apache.beam.runners.spark.metrics.CompositeSource;
+import org.apache.beam.runners.spark.metrics.MetricsAccumulator;
 import org.apache.beam.runners.spark.metrics.SparkBeamMetricSource;
 import org.apache.beam.runners.spark.translation.EvaluationContext;
 import org.apache.beam.runners.spark.translation.SparkContextFactory;
 import org.apache.beam.runners.spark.translation.SparkPipelineTranslator;
 import org.apache.beam.runners.spark.translation.TransformEvaluator;
 import org.apache.beam.runners.spark.translation.TransformTranslator;
-import org.apache.beam.runners.spark.translation.streaming.CheckpointDir;
+import 
org.apache.beam.runners.spark.translation.streaming.Checkpoint.CheckpointDir;
 import 
org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.io.Read;
@@ -143,6 +144,8 @@ public final class SparkRunner extends 
PipelineRunner {
 : Optional.absent();
 final Accumulator aggregatorsAccumulator =
 SparkAggregators.getOrCreateNamedAggregators(jsc, maybeCheckpointDir);
+// Instantiate metrics accumulator
+MetricsAccumulator.init(jsc, maybeCheckpointDir);
 final NamedAggregators initialValue = aggregatorsAccumulator.value();
 if (opts.getEnableSparkMetricSinks()) {
   final MetricsSystem metricsSystem = 
SparkEnv$.MODULE$.get().metricsSystem();
@@ -180,10 +183,13 @@ public final class SparkRunner extends 
PipelineRunner {
   
JavaStreamingContext.getOrCreate(checkpointDir.getSparkCheckpointDir().toString(),
   contextFactory);
 
-  // Checkpoint aggregator values
+  // Checkpoint aggregator/metrics values
   jssc.addStreamingListener(
   new JavaStreamingListenerWrapper(
   new 
AggregatorsAccumulator.AccumulatorCheckpointingSparkListener()));
+  jssc.addStreamingListener(
+  new JavaStreamingListenerWrapper(
+  new MetricsAccumulator.AccumulatorCheckpointingSparkListener()));
 
   // register listeners.
   for (JavaStreamingListener listener: 
mOptions.as(SparkContextOptions.class).getListeners()) {

http://git-wip-us.apache.org/repos/asf/beam/blob/3784b541/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
index 187205b..1b49e91 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
@@ -21,11 +21,8 @@ pac

[4/7] beam git commit: Register beam metrics with a MetricSource in Spark

2017-02-15 Thread amitsela
http://git-wip-us.apache.org/repos/asf/beam/blob/31624fed/runners/spark/src/test/resources/metrics.properties
--
diff --git a/runners/spark/src/test/resources/metrics.properties 
b/runners/spark/src/test/resources/metrics.properties
index 143532d..b2af378 100644
--- a/runners/spark/src/test/resources/metrics.properties
+++ b/runners/spark/src/test/resources/metrics.properties
@@ -15,12 +15,12 @@
 # limitations under the License.
 
 
-# The "org.apache.beam.runners.spark.aggregators.metrics.sink.XSink"
+# The "org.apache.beam.runners.spark.metrics.sink.XSink"
 # (a.k.a Beam.XSink) is only configured for the driver, the executors are set 
with a Spark native
 # implementation "org.apache.spark.metrics.sink.XSink" (a.k.a Spark.XSink).
 # This is due to sink class loading behavior, which is different on the driver 
and executors nodes.
-# Since Beam aggregator metrics are reported via Spark accumulators and thus 
make their way to the
-# driver, we only need the "Beam.XSink" on the driver side. Executor nodes can 
keep
+# Since Beam aggregators and metrics are reported via Spark accumulators and 
thus make their way to
+# the # driver, we only need the "Beam.XSink" on the driver side. Executor 
nodes can keep
 # reporting Spark native metrics using the traditional Spark.XSink.
 #
 # The the current sink configuration pattern is therefore:
@@ -36,7 +36,7 @@
 
 # * A sample configuration for outputting metrics to Graphite 
*
 
-#driver.sink.graphite.class=org.apache.beam.runners.spark.aggregators.metrics.sink.GraphiteSink
+#driver.sink.graphite.class=org.apache.beam.runners.spark.metrics.sink.GraphiteSink
 #driver.sink.graphite.host=YOUR_HOST
 #driver.sink.graphite.port=2003
 #driver.sink.graphite.prefix=spark
@@ -55,7 +55,7 @@
 
 # * A sample configuration for outputting metrics to a CSV file. 
*
 
-#driver.sink.csv.class=org.apache.beam.runners.spark.aggregators.metrics.sink.CsvSink
+#driver.sink.csv.class=org.apache.beam.runners.spark.metrics.sink.CsvSink
 #driver.sink.csv.directory=/tmp/spark-metrics
 #driver.sink.csv.period=1
 #driver.sink.graphite.unit=SECONDS

http://git-wip-us.apache.org/repos/asf/beam/blob/31624fed/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml
--
diff --git a/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml 
b/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml
index 0431252..edbdb14 100644
--- a/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml
+++ b/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml
@@ -155,13 +155,13 @@
   
 
   
-
+
 
 
   
 
   
-
+
 
 
   



[jira] [Resolved] (BEAM-774) Implement Metrics support for Spark runner

2017-02-15 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela resolved BEAM-774.

   Resolution: Fixed
Fix Version/s: 0.6.0

> Implement Metrics support for Spark runner
> --
>
> Key: BEAM-774
> URL: https://issues.apache.org/jira/browse/BEAM-774
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Ben Chambers
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #954

2017-02-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-831) ParDo Chaining

2017-02-15 Thread Chinmay Kolhatkar (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867722#comment-15867722
 ] 

Chinmay Kolhatkar commented on BEAM-831:


[~thw], [~jkff], I wet through the links provided and understood 
producer-consumer fusion and sibling fusion concepts.

I'm currently focusing on producer-consumer fusion optimization.
To do that here is the approach I'm considering (Still working on a POC yet, so 
might change):
1. Majority of the changes would go in TranslateContext of apex runner.
2. The streams variable can hold information about Locality as well which will 
later be used in TranslationContext.populateDAG method.
3. In populateDAG Api, before the streams are connected, We can traverse the 
streams/operators in topological order and find out adjacent ParDo stages for 
putting them in either thread local OR container local and update hte field in 
streams variable with right Locality.

Only thing that I'm not sure about is when to stop the merging of ParDos... 
i.e. if the DAG is like ParDo A -> ParDo B -> ParDo C -> ParDo D.
Then at time it might be efficient to merge only B & C and not merge all of 
them... Any thoughts on this?

Also, has any runner already done this?

I'm also considering to update ApexFlattenOperator and put the streams in 
ThreadLocal instead of Default locality. Might be a different Jira for that.

Please share your opinion.

> ParDo Chaining
> --
>
> Key: BEAM-831
> URL: https://issues.apache.org/jira/browse/BEAM-831
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex
>Reporter: Thomas Weise
>
> Current state of Apex runner creates a plan that will place each operator in 
> a separate container (which would be processes when running on a YARN 
> cluster). Often the ParDo operators can be collocated in same thread or 
> container. Use Apex affinity/stream locality attributes for more efficient 
> execution plan.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-831) ParDo Chaining

2017-02-15 Thread Chinmay Kolhatkar (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867722#comment-15867722
 ] 

Chinmay Kolhatkar edited comment on BEAM-831 at 2/15/17 12:17 PM:
--

[~thw], [~jkff], I went through the links provided and understood 
producer-consumer fusion and sibling fusion concepts.

I'm currently focusing on producer-consumer fusion optimization. I'm unsure how 
much good it is to do sibling fusion for Apex Runner.
To do that here is the approach I'm considering (Still working on a POC yet, so 
might change):
1. Majority of the changes would go in TranslateContext of apex runner.
2. The streams variable can hold information about Locality as well which will 
later be used in TranslationContext.populateDAG method.
3. In populateDAG Api, before the streams are connected, We can traverse the 
streams/operators in topological order and find out adjacent ParDo stages for 
putting them in either thread local OR container local and update hte field in 
streams variable with right Locality.

Only thing that I'm not sure about is when to stop the merging of ParDos... 
i.e. if the DAG is like ParDo A -> ParDo B -> ParDo C -> ParDo D.
Then at time it might be efficient to merge only B & C and not merge all of 
them... Any thoughts on this?

Also, has any runner already done this?

I'm also considering to update ApexFlattenOperator and put the streams in 
ThreadLocal instead of Default locality. Might be a different Jira for that.

Please share your opinion.


was (Author: chinmay):
[~thw], [~jkff], I wet through the links provided and understood 
producer-consumer fusion and sibling fusion concepts.

I'm currently focusing on producer-consumer fusion optimization.
To do that here is the approach I'm considering (Still working on a POC yet, so 
might change):
1. Majority of the changes would go in TranslateContext of apex runner.
2. The streams variable can hold information about Locality as well which will 
later be used in TranslationContext.populateDAG method.
3. In populateDAG Api, before the streams are connected, We can traverse the 
streams/operators in topological order and find out adjacent ParDo stages for 
putting them in either thread local OR container local and update hte field in 
streams variable with right Locality.

Only thing that I'm not sure about is when to stop the merging of ParDos... 
i.e. if the DAG is like ParDo A -> ParDo B -> ParDo C -> ParDo D.
Then at time it might be efficient to merge only B & C and not merge all of 
them... Any thoughts on this?

Also, has any runner already done this?

I'm also considering to update ApexFlattenOperator and put the streams in 
ThreadLocal instead of Default locality. Might be a different Jira for that.

Please share your opinion.

> ParDo Chaining
> --
>
> Key: BEAM-831
> URL: https://issues.apache.org/jira/browse/BEAM-831
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex
>Reporter: Thomas Weise
>
> Current state of Apex runner creates a plan that will place each operator in 
> a separate container (which would be processes when running on a YARN 
> cluster). Often the ParDo operators can be collocated in same thread or 
> container. Use Apex affinity/stream locality attributes for more efficient 
> execution plan.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam-site pull request #156: Fix some typos and small formatting issuesin th...

2017-02-15 Thread iemejia
GitHub user iemejia opened a pull request:

https://github.com/apache/beam-site/pull/156

Fix some typos and small formatting issuesin the last stateful processing 
blog post

- The first image in the section "Example: arbitrary-but-consistent index 
assignment" is really hard to see with the default settings. I made it bigger 
but probably it would be better to change the proportions of the elements of 
the diagram.
- There is a small typo I think (uninterseting => uninteresting).
- "You can provide the opportunity for parallelism by making sure that 
table has enough columns, either via many keys in few windows - for example, a 
globally windowed stateful computation keyed by user ID - or via many windows 
over few keys - for example, a fixed windowed stateful computation over a 
global key. Caveat: all Beam runners today parallelize only over the key." I 
suppose there is an issue with the list here in the markdown, or at least it 
seems more clear like this I think.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/beam-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #156


commit 8518baa70e1ec0dc609dbd36889e246752bc995e
Author: Ismaël Mejía 
Date:   2017-02-15T16:45:25Z

Fix some typos and small formatting issues.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/3] beam-site git commit: Touch up some punctuation in state blog post

2017-02-15 Thread kenn
Touch up some punctuation in state blog post


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f928b50e
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f928b50e
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f928b50e

Branch: refs/heads/asf-site
Commit: f928b50e1594cba453f07aa608ab28afb6329aff
Parents: 8518baa
Author: Kenneth Knowles 
Authored: Wed Feb 15 10:08:31 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 10:08:31 2017 -0800

--
 src/_posts/2017-02-13-stateful-processing.md | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/f928b50e/src/_posts/2017-02-13-stateful-processing.md
--
diff --git a/src/_posts/2017-02-13-stateful-processing.md 
b/src/_posts/2017-02-13-stateful-processing.md
index b3e0c63..fbbe76b 100644
--- a/src/_posts/2017-02-13-stateful-processing.md
+++ b/src/_posts/2017-02-13-stateful-processing.md
@@ -238,11 +238,12 @@ key+window pairs, like this:
 keys and windows are independent dimensions)
 
 You can provide the opportunity for parallelism by making sure that table has
-enough columns, either via:
+enough columns. You might have many keys and many windows, or you might have
+many of just one or the other:
 
-- Many keys in few windows for example, a globally windowed stateful 
computation
+- Many keys in few windows, for example a globally windowed stateful 
computation
   keyed by user ID.
-- Many windows over few keys for example, a fixed windowed stateful computation
+- Many windows over few keys, for example a fixed windowed stateful computation
   over a global key.
 
 Caveat: all Beam runners today parallelize only over the key.



[GitHub] beam-site pull request #156: Fix some typos and small formatting issuesin th...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/156


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/3] beam-site git commit: Fix some typos and small formatting issues.

2017-02-15 Thread kenn
Repository: beam-site
Updated Branches:
  refs/heads/asf-site f3c189568 -> f48e97f67


Fix some typos and small formatting issues.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8518baa7
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8518baa7
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8518baa7

Branch: refs/heads/asf-site
Commit: 8518baa70e1ec0dc609dbd36889e246752bc995e
Parents: f3c1895
Author: Ismaël Mejía 
Authored: Wed Feb 15 17:45:25 2017 +0100
Committer: Ismaël Mejía 
Committed: Wed Feb 15 17:45:25 2017 +0100

--
 src/_posts/2017-02-13-stateful-processing.md | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/8518baa7/src/_posts/2017-02-13-stateful-processing.md
--
diff --git a/src/_posts/2017-02-13-stateful-processing.md 
b/src/_posts/2017-02-13-stateful-processing.md
index b00361a..b3e0c63 100644
--- a/src/_posts/2017-02-13-stateful-processing.md
+++ b/src/_posts/2017-02-13-stateful-processing.md
@@ -196,7 +196,7 @@ want to write a transform that maps input to output like 
this:
 
+width="180">
 
 The order of the elements A, B, C, D, E is arbitrary, hence their assigned
 indices are arbitrary, but downstream transforms just need to be OK with this.
@@ -238,9 +238,13 @@ key+window pairs, like this:
 keys and windows are independent dimensions)
 
 You can provide the opportunity for parallelism by making sure that table has
-enough columns, either via many keys in few windows - for example, a globally
-windowed stateful computation keyed by user ID - or via many windows over few
-keys - for example, a fixed windowed stateful computation over a global key.
+enough columns, either via:
+
+- Many keys in few windows for example, a globally windowed stateful 
computation
+  keyed by user ID.
+- Many windows over few keys for example, a fixed windowed stateful computation
+  over a global key.
+
 Caveat: all Beam runners today parallelize only over the key.
 
 Most often your mental model of state can be focused on only a single column of
@@ -444,7 +448,7 @@ outputs from the `ParDo` that will be processed downstream. 
If the
 output, then you cannot use a `Filter` transform to reduce data volume 
downstream.
 
 Stateful processing lets you address both the latency problem of side inputs
-and the cost problem of excessive uninterseting output. Here is the code, using
+and the cost problem of excessive uninteresting output. Here is the code, using
 only features I have already introduced:
 
 ```java



[3/3] beam-site git commit: This closes #156: Fix some typos and small formatting issuesin the last stateful processing blog post

2017-02-15 Thread kenn
This closes #156: Fix some typos and small formatting issuesin the last 
stateful processing blog post

  Touch up some punctuation in state blog post
  Fix some typos and small formatting issues.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f48e97f6
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f48e97f6
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f48e97f6

Branch: refs/heads/asf-site
Commit: f48e97f6753f3cb00573a3e8a0f2768e67d6366c
Parents: f3c1895 f928b50
Author: Kenneth Knowles 
Authored: Wed Feb 15 10:08:58 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 10:08:58 2017 -0800

--
 src/_posts/2017-02-13-stateful-processing.md | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)
--




[1/2] beam git commit: Add unsigned 64 bit int read/write methods to cythonized stream

2017-02-15 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 24ecf6bbf -> 00ea3f7d7


Add unsigned 64 bit int read/write methods to cythonized stream


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/be911e88
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/be911e88
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/be911e88

Branch: refs/heads/master
Commit: be911e881e060c5695f7c15aebce0e542176b1ff
Parents: 24ecf6b
Author: Vikas Kedigehalli 
Authored: Mon Feb 13 22:11:25 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 15 09:58:45 2017 -0800

--
 sdks/python/apache_beam/coders/stream.pxd |  3 +++
 sdks/python/apache_beam/coders/stream.pyx | 10 +-
 sdks/python/apache_beam/coders/stream_test.py | 11 +++
 3 files changed, 23 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/be911e88/sdks/python/apache_beam/coders/stream.pxd
--
diff --git a/sdks/python/apache_beam/coders/stream.pxd 
b/sdks/python/apache_beam/coders/stream.pxd
index 16ea5d4..22ad8c1 100644
--- a/sdks/python/apache_beam/coders/stream.pxd
+++ b/sdks/python/apache_beam/coders/stream.pxd
@@ -27,6 +27,7 @@ cdef class OutputStream(object):
   cpdef write_byte(self, unsigned char val)
   cpdef write_var_int64(self, libc.stdint.int64_t v)
   cpdef write_bigendian_int64(self, libc.stdint.int64_t signed_v)
+  cpdef write_bigendian_uint64(self, libc.stdint.uint64_t signed_v)
   cpdef write_bigendian_int32(self, libc.stdint.int32_t signed_v)
   cpdef write_bigendian_double(self, double d)
 
@@ -41,6 +42,7 @@ cdef class ByteCountingOutputStream(OutputStream):
   cpdef write(self, bytes b, bint nested=*)
   cpdef write_byte(self, unsigned char val)
   cpdef write_bigendian_int64(self, libc.stdint.int64_t val)
+  cpdef write_bigendian_uint64(self, libc.stdint.uint64_t val)
   cpdef write_bigendian_int32(self, libc.stdint.int32_t val)
   cpdef size_t get_count(self)
   cpdef bytes get(self)
@@ -56,6 +58,7 @@ cdef class InputStream(object):
   cpdef long read_byte(self) except? -1
   cpdef libc.stdint.int64_t read_var_int64(self) except? -1
   cpdef libc.stdint.int64_t read_bigendian_int64(self) except? -1
+  cpdef libc.stdint.uint64_t read_bigendian_uint64(self) except? -1
   cpdef libc.stdint.int32_t read_bigendian_int32(self) except? -1
   cpdef double read_bigendian_double(self) except? -1
   cpdef bytes read_all(self, bint nested=*)

http://git-wip-us.apache.org/repos/asf/beam/blob/be911e88/sdks/python/apache_beam/coders/stream.pyx
--
diff --git a/sdks/python/apache_beam/coders/stream.pyx 
b/sdks/python/apache_beam/coders/stream.pyx
index cde900f..e29f121 100644
--- a/sdks/python/apache_beam/coders/stream.pyx
+++ b/sdks/python/apache_beam/coders/stream.pyx
@@ -63,7 +63,9 @@ cdef class OutputStream(object):
 break
 
   cpdef write_bigendian_int64(self, libc.stdint.int64_t signed_v):
-cdef libc.stdint.uint64_t v = signed_v
+self.write_bigendian_uint64(signed_v)
+
+  cpdef write_bigendian_uint64(self, libc.stdint.uint64_t v):
 if  self.size < self.pos + 8:
   self.extend(8)
 self.data[self.pos] = (v >> 56)
@@ -124,6 +126,9 @@ cdef class ByteCountingOutputStream(OutputStream):
   cpdef write_bigendian_int64(self, libc.stdint.int64_t _):
 self.count += 8
 
+  cpdef write_bigendian_uint64(self, libc.stdint.uint64_t _):
+self.count += 8
+
   cpdef write_bigendian_int32(self, libc.stdint.int32_t _):
 self.count += 4
 
@@ -182,6 +187,9 @@ cdef class InputStream(object):
 return result
 
   cpdef libc.stdint.int64_t read_bigendian_int64(self) except? -1:
+return self.read_bigendian_uint64()
+
+  cpdef libc.stdint.uint64_t read_bigendian_uint64(self) except? -1:
 self.pos += 8
 return (self.allc[self.pos - 1]
   | self.allc[self.pos - 2] <<  8

http://git-wip-us.apache.org/repos/asf/beam/blob/be911e88/sdks/python/apache_beam/coders/stream_test.py
--
diff --git a/sdks/python/apache_beam/coders/stream_test.py 
b/sdks/python/apache_beam/coders/stream_test.py
index cfd627f..e6108b6 100644
--- a/sdks/python/apache_beam/coders/stream_test.py
+++ b/sdks/python/apache_beam/coders/stream_test.py
@@ -106,6 +106,15 @@ class StreamTest(unittest.TestCase):
 for v in values:
   self.assertEquals(v, in_s.read_bigendian_int64())
 
+  def test_read_write_bigendian_uint64(self):
+values = 0, 1, 2**64-1, int(2**61 * math.pi)
+out_s = self.OutputStream()
+for v in values:
+  out_s.write_bigendian_uint64(v)
+in_s = self.InputStream(out_s.get())
+for v in values:
+  self.assertEquals(v, in_s.read_bigendian_uint64())
+
   de

[2/2] beam git commit: This closes #2004

2017-02-15 Thread altay
This closes #2004


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/00ea3f7d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/00ea3f7d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/00ea3f7d

Branch: refs/heads/master
Commit: 00ea3f7d7ffbca398f8deeb1feefc1e1d635ef2a
Parents: 24ecf6b be911e8
Author: Ahmet Altay 
Authored: Wed Feb 15 09:58:56 2017 -0800
Committer: Ahmet Altay 
Committed: Wed Feb 15 09:58:56 2017 -0800

--
 sdks/python/apache_beam/coders/stream.pxd |  3 +++
 sdks/python/apache_beam/coders/stream.pyx | 10 +-
 sdks/python/apache_beam/coders/stream_test.py | 11 +++
 3 files changed, 23 insertions(+), 1 deletion(-)
--




[GitHub] beam pull request #2004: Add unsigned 64 bit int read/write methods to cytho...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2004


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-865) Support set/delete of timer by ID in DirectRunner

2017-02-15 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-865:


Assignee: Kenneth Knowles  (was: Thomas Groh)

> Support set/delete of timer by ID in DirectRunner
> -
>
> Key: BEAM-865
> URL: https://issues.apache.org/jira/browse/BEAM-865
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> This is a prerequisite for full support of BEAM-27, and related to BEAM-35.
> (Since I am filing individual tickets for each runner, generalized discussion 
> should probably be on the dev list directly on or BEAM-35)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1300) The DirectRunner resumes from unfinalized checkpoints

2017-02-15 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1300.
---
   Resolution: Fixed
Fix Version/s: 0.5.0

> The DirectRunner resumes from unfinalized checkpoints
> -
>
> Key: BEAM-1300
> URL: https://issues.apache.org/jira/browse/BEAM-1300
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Thomas Groh
>Assignee: Thomas Groh
> Fix For: 0.5.0
>
>
> This is a bug in the order in which Source methods are invoked relative to 
> checkpoint methods. Sources may only resume from finalized checkpoints, but 
> the implementation of Unbounded Read in the DirectRunner finalizes a 
> checkpoint after a reader has been created but before it is started. It must 
> be finalized before the reader is created.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1399) Code coverage numbers are not accurate

2017-02-15 Thread Stas Levin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868375#comment-15868375
 ] 

Stas Levin commented on BEAM-1399:
--

Having spent a while researching this, I can share the following findings.

I believe you are correct in that the coverage is inaccurate, and the 
{{report}} goal does not capture coverage provided by external (to the tests) 
modules.
{{report-aggregate}} takes into account coverage provided by such modules, but 
it seems to assume that the external module (e.g. {{DirectRunner}}) is all 
about integration tests, despite it having unit test of its own which are not 
reported, and so the coverage is yet again, inaccurate.
In addition, as far as I could see {{report-aggregate}} does not provide a 
solution for the rest of the modules, assuming we would like to obtain the 
coverage across the entire Beam project and not just the SDK. This is also the 
assumption throughout the rest of this comment, so if it's wrong and we are 
fine with covering the SDK only, my conclusions below may not apply.

The general problem of obtaining code coverage for an arbitrary depth, 
multi-module maven project is heavily discussed ^1-3^ and is believed to be NP 
complete.
Nonetheless some heuristics are out there:
# Introduce a new module, which should be *dependent on all the modules in the 
project*, this will allow using {{Jacoco}}'s {{report-aggregate}} goal.
# Hack our way.
## Introduce a new reporter module
## Add {{maven-antrun-plugin}} to the new module and set up an {{Ant}} task to 
gain a more fine-grained control ^4^ over the coverage report generation.

I do not believe (1) is appropriate for Beam since it requires tons of manual 
labor, and does not scale well. This leaves us with (2).

To go with (2) we first need to address the following issues:
# We have test classes that essentially act as both unit tests, and integration 
tests (from {{maven}}'s perspective). Such classes are characterised by having 
both regular test methods, and test methods decorated with {{@NeedsRunner}} or 
{{@RunnableOnService}} which are run by {{maven-surefire-plugin}} during the 
{{integration-test}} phase (as opposed to the {{test}} phase for regular test 
methods). From my experiments this scenario is not well accommodated ^5^ in 
{{jacoco-maven-plugin}}.
# Reassigning {{@NeedsRunner}} and {{@RunnableOnService}}  based tests to be 
triggered during the {{test}} phase does not do the trick, probably because 
they scan dependencies (for tests) which fail to be ready when the {{test}} 
phase executes. I assume this was the reason they were configured to run as 
part of the {{verify}} phase in the first place.
# I believe these issues can be alleviated by moving {{@NeedsRunner}} and 
{{@RunnableOnService}} to separate *integration test* classes which will 
execute during the {{integration-test}} phase and thus avoid interfering with 
the tests that run during the {{test}} phase.

My impression on this is that it is doable, but the benefit-cost ratio seems to 
be quite questionable.
Perhaps we could look at external tools such as 
[SonarQube|https://www.sonarqube.org] which specialise in providing software 
metrics such as test coverage and the likes.
There is also a [Jenkins JaCoCo Plugin | 
https://wiki.jenkins-ci.org/display/JENKINS/JaCoCo+Plugin], which I have not 
looked into.

1. 
https://groups.google.com/forum/#!searchin/jacoco/%22multi$20module%22$20coverage%7Csort:relevance
2. https://github.com/jacoco/jacoco/wiki/MavenMultiModule
3. 
https://www.google.co.il/search?q=jacoco+multi+module+coverage&oq=jacoco+multi+module+coverage&aqs=chrome.0.0j69i57j69i60l3.7899j0j4&sourceid=chrome&ie=UTF-8
4. http://www.eclemma.org/jacoco/trunk/doc/ant.html
5. http://www.eclemma.org/jacoco/trunk/doc/classids.html

> Code coverage numbers are not accurate
> --
>
> Key: BEAM-1399
> URL: https://issues.apache.org/jira/browse/BEAM-1399
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-java-core, testing
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> We've started adding Java Code Coverage numbers to PRs using the jacoco 
> plugin. However, we are getting very low coverage reported for things like 
> the Java SDK core.
> My belief is that this is happening because we test the bulk of the SDK not 
> in the SDK module , but in fact in the DirectRunner and other similar modules.
> JaCoCo has a {{report:aggregate}} target that might do the trick, but with a 
> few minutes of playing with it I wasn't able to make it work satisfactorily. 
> Basic work in https://github.com/apache/beam/pull/1800
> This is a good "random improvement" issue for anyone to pick up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1399) Code coverage numbers are not accurate

2017-02-15 Thread Stas Levin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868375#comment-15868375
 ] 

Stas Levin edited comment on BEAM-1399 at 2/15/17 7:06 PM:
---

Having spent a while researching this, I can share the following findings.

I believe you are correct in that the coverage is inaccurate, and the 
{{report}} goal does not capture coverage provided by external (to the tests) 
modules.
{{report-aggregate}} takes into account coverage provided by such modules, but 
it seems to assume that the external module (e.g. {{DirectRunner}}) is all 
about integration tests, despite it having unit tests of its own which are not 
reported, and so the coverage is yet again, inaccurate.
In addition, as far as I could see {{report-aggregate}} does not provide a 
solution for the rest of the modules, assuming we would like to obtain the 
coverage across the entire Beam project and not just the SDK. This is also the 
assumption throughout the rest of this comment, so if it's wrong and we are 
fine with covering the SDK only, my conclusions below may not apply.

The general problem of obtaining code coverage for an arbitrary depth, 
multi-module maven project is heavily discussed ^1-3^ and is believed to be NP 
complete.
Nonetheless some heuristics are out there:
# Introduce a new module, which should be *dependent on all the modules in the 
project*, this will allow using {{Jacoco}}'s {{report-aggregate}} goal.
# Hack our way.
## Introduce a new reporter module
## Add {{maven-antrun-plugin}} to the new module and set up an {{Ant}} task to 
gain a more fine-grained control ^4^ over the coverage report generation.

I do not believe (1) is appropriate for Beam since it requires tons of manual 
labor, and does not scale well. This leaves us with (2).

To go with (2) we first need to address the following issues:
# We have test classes that essentially act as both unit tests, and integration 
tests (from {{maven}}'s perspective). Such classes are characterised by having 
both regular test methods, and test methods decorated with {{@NeedsRunner}} or 
{{@RunnableOnService}} which are run by {{maven-surefire-plugin}} during the 
{{integration-test}} phase (as opposed to the {{test}} phase for regular test 
methods). From my experiments this scenario is not well accommodated ^5^ in 
{{jacoco-maven-plugin}}.
# Reassigning {{@NeedsRunner}} and {{@RunnableOnService}}  based tests to be 
triggered during the {{test}} phase does not do the trick, probably because 
they scan dependencies (for tests) which fail to be ready when the {{test}} 
phase executes. I assume this was the reason they were configured to run as 
part of the {{verify}} phase in the first place.
# I believe these issues can be alleviated by moving {{@NeedsRunner}} and 
{{@RunnableOnService}} to separate *integration test* classes which will 
execute during the {{integration-test}} phase and thus avoid interfering with 
the tests that run during the {{test}} phase.

My impression on this is that it is doable, but the benefit-cost ratio seems to 
be quite questionable.
Perhaps we could look at external tools such as 
[SonarQube|https://www.sonarqube.org] which specialise in providing software 
metrics such as test coverage and the likes.
There is also a [Jenkins JaCoCo Plugin | 
https://wiki.jenkins-ci.org/display/JENKINS/JaCoCo+Plugin], which I have not 
looked into.

1. 
https://groups.google.com/forum/#!searchin/jacoco/%22multi$20module%22$20coverage%7Csort:relevance
2. https://github.com/jacoco/jacoco/wiki/MavenMultiModule
3. 
https://www.google.co.il/search?q=jacoco+multi+module+coverage&oq=jacoco+multi+module+coverage&aqs=chrome.0.0j69i57j69i60l3.7899j0j4&sourceid=chrome&ie=UTF-8
4. http://www.eclemma.org/jacoco/trunk/doc/ant.html
5. http://www.eclemma.org/jacoco/trunk/doc/classids.html


was (Author: staslev):
Having spent a while researching this, I can share the following findings.

I believe you are correct in that the coverage is inaccurate, and the 
{{report}} goal does not capture coverage provided by external (to the tests) 
modules.
{{report-aggregate}} takes into account coverage provided by such modules, but 
it seems to assume that the external module (e.g. {{DirectRunner}}) is all 
about integration tests, despite it having unit test of its own which are not 
reported, and so the coverage is yet again, inaccurate.
In addition, as far as I could see {{report-aggregate}} does not provide a 
solution for the rest of the modules, assuming we would like to obtain the 
coverage across the entire Beam project and not just the SDK. This is also the 
assumption throughout the rest of this comment, so if it's wrong and we are 
fine with covering the SDK only, my conclusions below may not apply.

The general problem of obtaining code coverage for an arbitrary depth, 
multi-module maven project is heavily discussed ^1-3^ and is

[1/2] beam git commit: [BEAM-79] Support merging windows in GearpumpRunner

2017-02-15 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/gearpump-runner 4001aeb19 -> 2d0aed922


[BEAM-79] Support merging windows in GearpumpRunner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7af64720
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7af64720
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7af64720

Branch: refs/heads/gearpump-runner
Commit: 7af6472082cbc7f3853e87831ed4bdc72978a3a3
Parents: 4001aeb
Author: manuzhang 
Authored: Tue Feb 7 22:14:18 2017 +0800
Committer: manuzhang 
Committed: Wed Feb 15 14:59:42 2017 +0800

--
 runners/gearpump/pom.xml|   5 -
 .../gearpump/GearpumpPipelineResult.java|   8 +-
 .../beam/runners/gearpump/GearpumpRunner.java   |  24 +---
 .../translators/GroupByKeyTranslator.java   | 133 +++
 .../translators/WindowBoundTranslator.java  |  53 +---
 .../gearpump/translators/io/GearpumpSource.java |   6 +-
 .../translators/utils/DoFnRunnerFactory.java|   1 +
 .../translators/utils/TranslatorUtils.java  |  20 +++
 .../translators/utils/TranslatorUtilsTest.java  |  75 +++
 9 files changed, 186 insertions(+), 139 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/7af64720/runners/gearpump/pom.xml
--
diff --git a/runners/gearpump/pom.xml b/runners/gearpump/pom.xml
index 7c6fa76..6f91c50 100644
--- a/runners/gearpump/pom.xml
+++ b/runners/gearpump/pom.xml
@@ -93,11 +93,6 @@
   org.apache.beam.sdk.transforms.ViewTest,
   org.apache.beam.sdk.transforms.join.CoGroupByKeyTest
 
-
-
-  org.apache.beam.sdk.transforms.windowing.WindowingTest,
-  org.apache.beam.sdk.util.ReshuffleTest
-
   
   
 

http://git-wip-us.apache.org/repos/asf/beam/blob/7af64720/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpPipelineResult.java
--
diff --git 
a/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpPipelineResult.java
 
b/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpPipelineResult.java
index 9e53517..a3740b7 100644
--- 
a/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpPipelineResult.java
+++ 
b/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpPipelineResult.java
@@ -27,7 +27,7 @@ import org.apache.beam.sdk.PipelineResult;
 import org.apache.beam.sdk.metrics.MetricResults;
 import org.apache.beam.sdk.transforms.Aggregator;
 
-import org.apache.gearpump.cluster.MasterToAppMaster;
+import org.apache.gearpump.cluster.ApplicationStatus;
 import org.apache.gearpump.cluster.MasterToAppMaster.AppMasterData;
 import org.apache.gearpump.cluster.client.ClientContext;
 import org.joda.time.Duration;
@@ -105,7 +105,7 @@ public class GearpumpPipelineResult implements 
PipelineResult {
   }
 
   private State getGearpumpState() {
-String status = null;
+ApplicationStatus status = null;
 List apps =
 JavaConverters.seqAsJavaListConverter(
 (Seq) client.listApps().appMasters()).asJava();
@@ -114,9 +114,9 @@ public class GearpumpPipelineResult implements 
PipelineResult {
 status = app.status();
   }
 }
-if (null == status || 
status.equals(MasterToAppMaster.AppMasterNonExist())) {
+if (null == status || status instanceof ApplicationStatus.NONEXIST$) {
   return State.UNKNOWN;
-} else if (status.equals(MasterToAppMaster.AppMasterActive())) {
+} else if (status instanceof ApplicationStatus.ACTIVE$) {
   return State.RUNNING;
 } else {
   return State.STOPPED;

http://git-wip-us.apache.org/repos/asf/beam/blob/7af64720/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpRunner.java
--
diff --git 
a/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpRunner.java
 
b/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpRunner.java
index 01fdb3b..9ca1eb2 100644
--- 
a/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpRunner.java
+++ 
b/runners/gearpump/src/main/java/org/apache/beam/runners/gearpump/GearpumpRunner.java
@@ -29,13 +29,8 @@ import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsValidator;
 import org.apache.beam.sdk.runners.PipelineRunner;
 import org.apache.beam.sdk.transforms.Create;
-import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.

[GitHub] beam pull request #1935: [BEAM-79] Support merging windows in GearpumpRunner

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1935


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #1935: Support merging windows in GearpumpRunner

2017-02-15 Thread kenn
This closes #1935: Support merging windows in GearpumpRunner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2d0aed92
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2d0aed92
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2d0aed92

Branch: refs/heads/gearpump-runner
Commit: 2d0aed9221b4e763917b136ce2ecc60d08621a07
Parents: 4001aeb 7af6472
Author: Kenneth Knowles 
Authored: Wed Feb 15 13:17:30 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 13:17:30 2017 -0800

--
 runners/gearpump/pom.xml|   5 -
 .../gearpump/GearpumpPipelineResult.java|   8 +-
 .../beam/runners/gearpump/GearpumpRunner.java   |  24 +---
 .../translators/GroupByKeyTranslator.java   | 133 +++
 .../translators/WindowBoundTranslator.java  |  53 +---
 .../gearpump/translators/io/GearpumpSource.java |   6 +-
 .../translators/utils/DoFnRunnerFactory.java|   1 +
 .../translators/utils/TranslatorUtils.java  |  20 +++
 .../translators/utils/TranslatorUtilsTest.java  |  75 +++
 9 files changed, 186 insertions(+), 139 deletions(-)
--




[jira] [Commented] (BEAM-79) Gearpump runner

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868581#comment-15868581
 ] 

ASF GitHub Bot commented on BEAM-79:


Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1935


> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-gearpump
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Gearpump #160

2017-02-15 Thread Apache Jenkins Server
See 


Changes:

[owenzhang1990] [BEAM-79] Support merging windows in GearpumpRunner

--
[...truncated 19443 lines...]
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at 
org.jvnet.hudson.maven3.launcher.Maven32Launcher.main(Maven32Launcher.java:132)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:330)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:238)
at jenkins.maven3.agent.Maven32Main.launch(Maven32Main.java:186)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at hudson.maven.Maven3Builder.call(Maven3Builder.java:136)
at hudson.maven.Maven3Builder.call(Maven3Builder.java:71)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:332)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor166.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
... 80 more
Caused by: java.lang.VerifyError: Inconsistent stackmap frames at branch target 
152
Exception Details:
  Location:
akka/dispatch/Mailbox.processAllSystemMessages()V @152: getstatic
  Reason:
Type top (current frame, locals[9]) is not assignable to 
'akka/dispatch/sysmsg/SystemMessage' (stack map, locals[9])
  Current Frame:
bci: @131
flags: { }
locals: { 'akka/dispatch/Mailbox', 'java/lang/InterruptedException', 
'akka/dispatch/sysmsg/SystemMessage', top, 'akka/dispatch/Mailbox', 
'java/lang/Throwable', 'java/lang/Throwable' }
stack: { integer }
  Stackmap Frame:
bci: @152
flags: { }
locals: { 'akka/dispatch/Mailbox', 'java/lang/InterruptedException', 
'akka/dispatch/sysmsg/SystemMessage', top, 'akka/dispatch/Mailbox', 
'java/lang/Throwable', 'java/lang/Throwable', top, top, 
'akka/dispatch/sysmsg/SystemMessage' }
stack: { }
  Bytecode:
0x000: 014c 2ab2 0132 b601 35b6 0139 4db2 013e
0x010: 2cb6 0142 9900 522a b600 c69a 004b 2c4e
0x020: b201 3e2c b601 454d 2db9 0148 0100 2ab6
0x030: 0052 2db6 014b b801 0999 000e bb00 e759
0x040: 1301 4db7 010f 4cb2 013e 2cb6 0150 99ff
0x050: bf2a b600 c69a ffb8 2ab2 0132 b601 35b6
0x060: 0139 4da7 ffaa 2ab6 0052 b600 56b6 0154
0x070: b601 5a3a 04a7 0091 3a05 1905 3a06 1906
0x080: c100 e799 0015 1906 c000 e73a 0719 074c
0x090: b200 f63a 08a7 0071 b201 5f19 06b6 0163
0x0a0: 3a0a 190a b601 6899 0006 1905 bf19 0ab6
0x0b0: 016c c000 df3a 0b2a b600 52b6 0170 b601
0x0c0: 76bb 000f 5919 0b2a b600 52b6 017a b601
0x0d0: 80b6 0186 2ab6 018a bb01 8c59 b701 8e13
0x0e0: 0190 b601 9419 09b6 0194 1301 96b6 0194
0x0f0: 190b b601 99b6 0194 b601 9ab7 019d b601
0x100: a3b2 00f6 3a08 b201 3e2c b601 4299 0026
0x110: 2c3a 09b2 013e 2cb6 0145 4d19 09b9 0148
0x120: 0100 1904 2ab6 0052 b601 7a19 09b6 01a7
0x130: a7ff d62b c600 09b8 0109 572b bfb1 
  Exception Handler Table:
bci [290, 307] => handler: 120
  Stackmap Table:
append_frame(@13,Object[#231],Object[#177])
append_frame(@71,Object[#177])
chop_frame(@102,1)


[jira] [Created] (BEAM-1492) Avoid potential issue in ASM 5.0

2017-02-15 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-1492:
-

 Summary: Avoid potential issue in ASM 5.0
 Key: BEAM-1492
 URL: https://issues.apache.org/jira/browse/BEAM-1492
 Project: Beam
  Issue Type: Task
  Components: sdk-java-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


There is a suspected bug in asm 5.0 that is considered the likely root cause of 
a [bug 
sbt-assembly|https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607]
 that carried over to [GEARPUMP-236]. I have not found a direct reference to 
what the issue is, precisely, but the dependency effect of this is extremely 
small and these are libraries that are useful to keep current. And if/when 
Gearpump runner lands on master this will avoid any diamond dep issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1493) runners/core-java should be a pre-execution and an execution-time module

2017-02-15 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-1493:
-

 Summary: runners/core-java should be a pre-execution and an 
execution-time module
 Key: BEAM-1493
 URL: https://issues.apache.org/jira/browse/BEAM-1493
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Thomas Groh
Assignee: Thomas Groh


This permits a runner to use an internal version of runners-core, but have 
utilities that interact with the Pipeline within the runner shim.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2013: [BEAM-1493] Add runners/core-pipeline-java

2017-02-15 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2013

[BEAM-1493] Add runners/core-pipeline-java

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This module contains pre-execution PipelineRunner utilities.

Move PTransformMatchers, ReplacementOutputs to core-pipeline-java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam split_runners_core

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2013


commit a3829af3e8e3e6da7a862016c4e9e1060e56c718
Author: Thomas Groh 
Date:   2017-02-15T20:57:52Z

Add runners/core-pipeline-java

This module contains pre-execution PipelineRunner utilities.

Move PTransformMatchers to core-pipeline-java




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1493) runners/core-java should be a pre-execution and an execution-time module

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868676#comment-15868676
 ] 

ASF GitHub Bot commented on BEAM-1493:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2013

[BEAM-1493] Add runners/core-pipeline-java

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This module contains pre-execution PipelineRunner utilities.

Move PTransformMatchers, ReplacementOutputs to core-pipeline-java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam split_runners_core

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2013


commit a3829af3e8e3e6da7a862016c4e9e1060e56c718
Author: Thomas Groh 
Date:   2017-02-15T20:57:52Z

Add runners/core-pipeline-java

This module contains pre-execution PipelineRunner utilities.

Move PTransformMatchers to core-pipeline-java




> runners/core-java should be a pre-execution and an execution-time module
> 
>
> Key: BEAM-1493
> URL: https://issues.apache.org/jira/browse/BEAM-1493
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This permits a runner to use an internal version of runners-core, but have 
> utilities that interact with the Pipeline within the runner shim.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2014: [BEAM-1492] Upgrade bytebuddy to 1.6.8 to jump past...

2017-02-15 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2014

[BEAM-1492] Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

There is a suspected bug in asm 5.0 that is considered the likely root 
cause of a bug sbt/sbt-assembly#205 (see [this 
comment](https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607))
 that carried over to 
[Gearpump](https://issues.apache.org/jira/browse/GEARPUMP-236). This commit 
upgrades us to depend on bytebuddy 1.6.8 that uses asm 5.2 in which those 
derivative bugs have cleared up.

I have not found a direct reference to what the issue is, precisely, but 
the dependency effect is nil because bytebuddy is fully shaded (no deps) so we 
might as well jump past any problematic version eagerly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam bytebuddy-asm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2014


commit 3e4c05c3449064e7f032a48b98551a73d71a5bbb
Author: Kenneth Knowles 
Date:   2017-02-15T22:02:38Z

Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

There is a suspected bug in asm 5.0 that is considered the likely root 
cause of
a bug sbt-assembly [1] that carried over to Gearpump [2]. This commit 
upgrades
us to depend on asm 5.2 in which those derivative bugs have cleared up.

I have not found a direct reference to what the issue is, precisely, but
the dependency effect of this is extremely small and these are libraries
that are useful to keep current.

[1] https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607
[2] https://issues.apache.org/jira/browse/GEARPUMP-236




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-1492) Avoid potential issue in ASM 5.0

2017-02-15 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-1492:
-

Assignee: Amit Sela  (was: Kenneth Knowles)

> Avoid potential issue in ASM 5.0
> 
>
> Key: BEAM-1492
> URL: https://issues.apache.org/jira/browse/BEAM-1492
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Amit Sela
>
> There is a suspected bug in asm 5.0 that is considered the likely root cause 
> of a [bug 
> sbt-assembly|https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607]
>  that carried over to [GEARPUMP-236]. I have not found a direct reference to 
> what the issue is, precisely, but the dependency effect of this is extremely 
> small and these are libraries that are useful to keep current. And if/when 
> Gearpump runner lands on master this will avoid any diamond dep issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1492) Avoid potential issue in ASM 5.0

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868687#comment-15868687
 ] 

ASF GitHub Bot commented on BEAM-1492:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2014

[BEAM-1492] Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

There is a suspected bug in asm 5.0 that is considered the likely root 
cause of a bug sbt/sbt-assembly#205 (see [this 
comment](https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607))
 that carried over to 
[Gearpump](https://issues.apache.org/jira/browse/GEARPUMP-236). This commit 
upgrades us to depend on bytebuddy 1.6.8 that uses asm 5.2 in which those 
derivative bugs have cleared up.

I have not found a direct reference to what the issue is, precisely, but 
the dependency effect is nil because bytebuddy is fully shaded (no deps) so we 
might as well jump past any problematic version eagerly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam bytebuddy-asm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2014


commit 3e4c05c3449064e7f032a48b98551a73d71a5bbb
Author: Kenneth Knowles 
Date:   2017-02-15T22:02:38Z

Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

There is a suspected bug in asm 5.0 that is considered the likely root 
cause of
a bug sbt-assembly [1] that carried over to Gearpump [2]. This commit 
upgrades
us to depend on asm 5.2 in which those derivative bugs have cleared up.

I have not found a direct reference to what the issue is, precisely, but
the dependency effect of this is extremely small and these are libraries
that are useful to keep current.

[1] https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607
[2] https://issues.apache.org/jira/browse/GEARPUMP-236




> Avoid potential issue in ASM 5.0
> 
>
> Key: BEAM-1492
> URL: https://issues.apache.org/jira/browse/BEAM-1492
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> There is a suspected bug in asm 5.0 that is considered the likely root cause 
> of a [bug 
> sbt-assembly|https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607]
>  that carried over to [GEARPUMP-236]. I have not found a direct reference to 
> what the issue is, precisely, but the dependency effect of this is extremely 
> small and these are libraries that are useful to keep current. And if/when 
> Gearpump runner lands on master this will avoid any diamond dep issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2015: [BEAM-59] Beam GcsFileSystem: port fileSizes() from...

2017-02-15 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/2015

[BEAM-59] Beam GcsFileSystem: port fileSizes() from GcsUtil for batch 
matchNonGlobs().


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam 
gcs-util-refacotr-StorageObjectOrIOException

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2015


commit 4602069dd1f546fc274335755c53baff7739c285
Author: Pei He 
Date:   2017-02-15T22:07:45Z

[BEAM-59] Beam GcsFileSystem: port fileSizes() from GcsUtil for batch get 
StorageObjects.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868692#comment-15868692
 ] 

ASF GitHub Bot commented on BEAM-59:


GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/2015

[BEAM-59] Beam GcsFileSystem: port fileSizes() from GcsUtil for batch 
matchNonGlobs().


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam 
gcs-util-refacotr-StorageObjectOrIOException

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2015


commit 4602069dd1f546fc274335755c53baff7739c285
Author: Pei He 
Date:   2017-02-15T22:07:45Z

[BEAM-59] Beam GcsFileSystem: port fileSizes() from GcsUtil for batch get 
StorageObjects.




> IOChannelFactory rethinking/redesign
> 
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1492) Avoid potential issue in ASM 5.0

2017-02-15 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868702#comment-15868702
 ] 

Kenneth Knowles commented on BEAM-1492:
---

[~amitsela] - our current version of Kryo uses ASM v 4.0 (via reflectasm) which 
I think is prior to the bug. To get to ASM 5.1 there is not a version yet 
released that includes the fixes; I don't know if it is safe to trust semantic 
versioning and override 5.0 deps to 5.1 or 5.2.

And the bug may not ever hit us. But I thought I would hand this off to you (or 
anyone) to track that we should be careful when upgrading Kryo and possibly 
just upgrade way past it as soon as possible.

> Avoid potential issue in ASM 5.0
> 
>
> Key: BEAM-1492
> URL: https://issues.apache.org/jira/browse/BEAM-1492
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Amit Sela
>
> There is a suspected bug in asm 5.0 that is considered the likely root cause 
> of a [bug 
> sbt-assembly|https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607]
>  that carried over to [GEARPUMP-236]. I have not found a direct reference to 
> what the issue is, precisely, but the dependency effect of this is extremely 
> small and these are libraries that are useful to keep current. And if/when 
> Gearpump runner lands on master this will avoid any diamond dep issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1494) GcsFileSystem should check content encoding when setting IsReadSeekEfficient

2017-02-15 Thread Pei He (JIRA)
Pei He created BEAM-1494:


 Summary: GcsFileSystem should check content encoding when setting 
IsReadSeekEfficient
 Key: BEAM-1494
 URL: https://issues.apache.org/jira/browse/BEAM-1494
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-extensions
Reporter: Pei He


It is incorrect to set IsReadSeekEfficient true for files with content encoding 
set to gzip. This is an inherited issue from GcsIOChannelFactory.

https://cloud.google.com/storage/docs/transcoding#content-type_vs_content-encoding



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/3] beam-site git commit: Clarify state further in capability matrix

2017-02-15 Thread davor
Repository: beam-site
Updated Branches:
  refs/heads/asf-site f48e97f67 -> b6675d294


Clarify state further in capability matrix


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/466edb35
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/466edb35
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/466edb35

Branch: refs/heads/asf-site
Commit: 466edb3592323b64cc7b580eb06e684af9b0ba2f
Parents: f48e97f
Author: Kenneth Knowles 
Authored: Tue Feb 14 19:14:32 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 15 14:53:50 2017 -0800

--
 src/_data/capability-matrix.yml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/466edb35/src/_data/capability-matrix.yml
--
diff --git a/src/_data/capability-matrix.yml b/src/_data/capability-matrix.yml
index f2e58ac..89f89a2 100644
--- a/src/_data/capability-matrix.yml
+++ b/src/_data/capability-matrix.yml
@@ -196,7 +196,7 @@ categories:
 l2: Not implemented in runner.
 l3: 
 
-  - name: Keyed State
+  - name: Stateful Processing
 values:
   - class: model
 l1: 'Yes'
@@ -205,19 +205,19 @@ categories:
   - class: dataflow
 l1: 'Partially'
 l2: non-merging windows
-l3: Keyed state is fully supported for non-merging windows.
+l3: State is supported for non-merging windows. SetState and 
MapState are not yet supported.
   - class: flink
 l1: 'Partially'
 l2: streaming, non-merging windows
-l3: Keyed state is supported in streaming mode for non-merging 
windows.
+l3: State is supported in streaming mode for non-merging windows. 
SetState and MapState are not yet supported.
   - class: spark
 l1: 'No'
 l2: not implemented
-l3: Spark supports keyed state with mapWithState() so support 
shuold be straight forward.
+l3: Spark supports per-key state with mapWithState() so 
support should be straightforward.
   - class: apex
 l1: 'No'
 l2: not implemented
-l3: Apex supports keyed state, so adding support for this should 
be easy.
+l3: Apex supports per-key state, so adding support for this should 
be easy.
 
   - description: Where in event time?
 anchor: where



[GitHub] beam-site pull request #154: Clarify state further in capability matrix

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/154


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam-site git commit: This closes #154

2017-02-15 Thread davor
This closes #154


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/b6675d29
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/b6675d29
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/b6675d29

Branch: refs/heads/asf-site
Commit: b6675d2945b8f8f3fe5ab2baaa228c4f05e70067
Parents: f48e97f 575e459
Author: Davor Bonaci 
Authored: Wed Feb 15 14:54:18 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 15 14:54:18 2017 -0800

--
 content/blog/2017/02/13/stateful-processing.html | 19 +--
 .../runners/capability-matrix/index.html | 12 ++--
 content/feed.xml | 19 +--
 src/_data/capability-matrix.yml  | 10 +-
 4 files changed, 37 insertions(+), 23 deletions(-)
--




[2/3] beam-site git commit: Regenerate website

2017-02-15 Thread davor
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/575e4598
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/575e4598
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/575e4598

Branch: refs/heads/asf-site
Commit: 575e45987aa00dca16ee562441dd071362c873b1
Parents: 466edb3
Author: Davor Bonaci 
Authored: Wed Feb 15 14:54:18 2017 -0800
Committer: Davor Bonaci 
Committed: Wed Feb 15 14:54:18 2017 -0800

--
 content/blog/2017/02/13/stateful-processing.html | 19 +--
 .../runners/capability-matrix/index.html | 12 ++--
 content/feed.xml | 19 +--
 3 files changed, 32 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/blog/2017/02/13/stateful-processing.html
--
diff --git a/content/blog/2017/02/13/stateful-processing.html 
b/content/blog/2017/02/13/stateful-processing.html
index 833b2fd..a936f4b 100644
--- a/content/blog/2017/02/13/stateful-processing.html
+++ b/content/blog/2017/02/13/stateful-processing.html
@@ -328,7 +328,7 @@ unique and consistent. Before diving into the code for how 
to do this in a Beam
 SDK, I’ll go over this example from the level of the model. In pictures, you
 want to write a transform that maps input to output like this:
 
-
+
 
 The order of the elements A, B, C, D, E is arbitrary, hence their assigned
 indices are arbitrary, but downstream transforms just need to be OK with this.
@@ -400,10 +400,17 @@ key+window pairs, like this:
 keys and windows are independent dimensions)
 
 You can provide the opportunity for parallelism by making sure that table 
has
-enough columns, either via many keys in few windows - for example, a globally
-windowed stateful computation keyed by user ID - or via many windows over few
-keys - for example, a fixed windowed stateful computation over a global key.
-Caveat: all Beam runners today parallelize only over the key.
+enough columns. You might have many keys and many windows, or you might have
+many of just one or the other:
+
+
+  Many keys in few windows, for example a globally windowed stateful 
computation
+keyed by user ID.
+  Many windows over few keys, for example a fixed windowed stateful 
computation
+over a global key.
+
+
+Caveat: all Beam runners today parallelize only over the key.
 
 Most often your mental model of state can be focused on only a single 
column of
 the table, a single key+window pair. Cross-column interactions do not occur
@@ -610,7 +617,7 @@ outputs from the ParDo that will be proce
 output, then you cannot use a Filter 
transform to reduce data volume downstream.
 
 Stateful processing lets you address both the latency problem of side inputs
-and the cost problem of excessive uninterseting output. Here is the code, using
+and the cost problem of excessive uninteresting output. Here is the code, using
 only features I have already introduced:
 
 new DoFn, KV>() {

http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/documentation/runners/capability-matrix/index.html
--
diff --git a/content/documentation/runners/capability-matrix/index.html 
b/content/documentation/runners/capability-matrix/index.html
index 60f62b2..88da8eb 100644
--- a/content/documentation/runners/capability-matrix/index.html
+++ b/content/documentation/runners/capability-matrix/index.html
@@ -441,7 +441,7 @@
   
   
   
-Keyed State
+Stateful Processing
 
 
 
@@ -1353,7 +1353,7 @@
   
   
   
-Keyed State
+Stateful Processing
 
 
 
@@ -1362,22 +1362,22 @@
 
 
 
-Partially: 
non-merging windowsKeyed state is fully supported for 
non-merging windows.
+Partially: 
non-merging windowsState is supported for non-merging 
windows. SetState and MapState are not yet supported.
 
 
 
 
-Partially: 
streaming, non-merging windowsKeyed state is supported in 
streaming mode for non-merging windows.
+Partially: 
streaming, non-merging windowsState is supported in 
streaming mode for non-merging windows. SetState and MapState are not yet 
supported.
 
 
 
 
-No: not 
implementedSpark supports keyed state with mapWithState() so 
support shuold be straight forward.
+No: not 
implementedSpark supports per-key state with 
mapWithState() so support should be straightforward.
 
 
 
 
-No: not 
implementedApex supports keyed state, so adding support for 
this should be easy.
+No: not 
implementedApex supports per-key state, so adding support 
for this should be easy.
 
 
   

http://git-wip-us.apache.org/r

[jira] [Commented] (BEAM-780) Add support for pipeline metrics

2017-02-15 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868816#comment-15868816
 ] 

Pablo Estrada commented on BEAM-780:


Should we mark this a resolved?

> Add support for pipeline metrics
> 
>
> Key: BEAM-780
> URL: https://issues.apache.org/jira/browse/BEAM-780
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Frances Perry
>  Labels: sdk-consistency
>
> Remove aggregators and replace them with the metrics API.
> See: https://issues.apache.org/jira/browse/BEAM-147 for the Java SDK.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2016: [BEAM-1381] Making counter metrics queriable from P...

2017-02-15 Thread pabloem
GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/2016

[BEAM-1381] Making counter metrics queriable from Python.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Also fixing a display data error.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam queriable-metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2016


commit 29180df7d43a34389ddb550caacf328976fda2c8
Author: Pablo 
Date:   2017-02-15T23:16:12Z

Making metrics queriable. Fixing display data issue.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1381) Implement DataflowMetrics.query method

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868821#comment-15868821
 ] 

ASF GitHub Bot commented on BEAM-1381:
--

GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/2016

[BEAM-1381] Making counter metrics queriable from Python.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Also fixing a display data error.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam queriable-metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2016


commit 29180df7d43a34389ddb550caacf328976fda2c8
Author: Pablo 
Date:   2017-02-15T23:16:12Z

Making metrics queriable. Fixing display data issue.




> Implement DataflowMetrics.query method
> --
>
> Key: BEAM-1381
> URL: https://issues.apache.org/jira/browse/BEAM-1381
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Ahmet Altay
>
> Once metrics can bequeried from the dataflow service, we must implement the 
> logic for this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/2] beam git commit: [BEAM-59] Beam GcsFileSystem: port expand() from GcsUtil for glob matching.

2017-02-15 Thread pei
Repository: beam
Updated Branches:
  refs/heads/master 00ea3f7d7 -> 013f11885


[BEAM-59] Beam GcsFileSystem: port expand() from GcsUtil for glob matching.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/993cd0c7
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/993cd0c7
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/993cd0c7

Branch: refs/heads/master
Commit: 993cd0c7bcd161cbb794651a5594499e1dbe0c47
Parents: 00ea3f7
Author: Pei He 
Authored: Mon Feb 13 17:17:55 2017 -0800
Committer: Pei He 
Committed: Wed Feb 15 16:31:16 2017 -0800

--
 .../DataflowPipelineTranslatorTest.java |   2 -
 .../runners/dataflow/DataflowRunnerTest.java|   2 -
 .../java/org/apache/beam/sdk/util/GcsUtil.java  | 104 +-
 .../beam/sdk/util/GcsPathValidatorTest.java |   2 -
 sdks/java/io/google-cloud-platform/pom.xml  |   5 +
 .../beam/sdk/io/gcp/storage/GcsFileSystem.java  |  62 ++
 .../sdk/io/gcp/storage/GcsFileSystemTest.java   | 189 +++
 7 files changed, 313 insertions(+), 53 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/993cd0c7/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
index 84b585a..2ff1032 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
@@ -30,7 +30,6 @@ import static org.junit.Assert.assertNull;
 import static org.junit.Assert.assertThat;
 import static org.junit.Assert.assertTrue;
 import static org.mockito.Matchers.any;
-import static org.mockito.Matchers.anyString;
 import static org.mockito.Matchers.argThat;
 import static org.mockito.Matchers.eq;
 import static org.mockito.Mockito.mock;
@@ -161,7 +160,6 @@ public class DataflowPipelineTranslatorTest implements 
Serializable {
   }
 });
 when(mockGcsUtil.bucketAccessible(any(GcsPath.class))).thenReturn(true);
-when(mockGcsUtil.isGcsPatternSupported(anyString())).thenCallRealMethod();
 
 DataflowPipelineOptions options = 
PipelineOptionsFactory.as(DataflowPipelineOptions.class);
 options.setRunner(DataflowRunner.class);

http://git-wip-us.apache.org/repos/asf/beam/blob/993cd0c7/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
index 4fff1c6..b2bc319 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowRunnerTest.java
@@ -168,7 +168,6 @@ public class DataflowRunnerTest {
 StandardOpenOption.CREATE, StandardOpenOption.DELETE_ON_CLOSE);
   }
 });
-when(mockGcsUtil.isGcsPatternSupported(anyString())).thenReturn(true);
 when(mockGcsUtil.expand(any(GcsPath.class))).then(new 
Answer>() {
   @Override
   public List answer(InvocationOnMock invocation) throws 
Throwable {
@@ -238,7 +237,6 @@ public class DataflowRunnerTest {
 StandardOpenOption.CREATE, StandardOpenOption.DELETE_ON_CLOSE);
   }
 });
-when(mockGcsUtil.isGcsPatternSupported(anyString())).thenReturn(true);
 when(mockGcsUtil.expand(any(GcsPath.class))).then(new 
Answer>() {
   @Override
   public List answer(InvocationOnMock invocation) throws 
Throwable {

http://git-wip-us.apache.org/repos/asf/beam/blob/993cd0c7/sdks/java/core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java
index 44c49bc..6345867 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java
@@ -160,15 +160,68 @@ public class GcsUtil {
* Returns true if the given GCS pattern is supported otherwise fails with an
* exception.

[2/2] beam git commit: This closes #2002

2017-02-15 Thread pei
This closes #2002


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/013f1188
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/013f1188
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/013f1188

Branch: refs/heads/master
Commit: 013f118855569be52f89f94c75e263271b42f030
Parents: 00ea3f7 993cd0c
Author: Pei He 
Authored: Wed Feb 15 16:32:16 2017 -0800
Committer: Pei He 
Committed: Wed Feb 15 16:32:16 2017 -0800

--
 .../DataflowPipelineTranslatorTest.java |   2 -
 .../runners/dataflow/DataflowRunnerTest.java|   2 -
 .../java/org/apache/beam/sdk/util/GcsUtil.java  | 104 +-
 .../beam/sdk/util/GcsPathValidatorTest.java |   2 -
 sdks/java/io/google-cloud-platform/pom.xml  |   5 +
 .../beam/sdk/io/gcp/storage/GcsFileSystem.java  |  62 ++
 .../sdk/io/gcp/storage/GcsFileSystemTest.java   | 189 +++
 7 files changed, 313 insertions(+), 53 deletions(-)
--




[GitHub] beam pull request #2002: [BEAM-59] Beam GcsFileSystem: port expand() from Gc...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2002


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1586#comment-1586
 ] 

ASF GitHub Bot commented on BEAM-59:


Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2002


> IOChannelFactory rethinking/redesign
> 
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1495) Move ParDo.Bound as ParDo.BoundMulti into the core runners construction module

2017-02-15 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-1495:
-

 Summary: Move ParDo.Bound as ParDo.BoundMulti into the core 
runners construction module
 Key: BEAM-1495
 URL: https://issues.apache.org/jira/browse/BEAM-1495
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Thomas Groh
Priority: Minor


This is a reasonable runner-independent override.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #2650

2017-02-15 Thread Apache Jenkins Server
See 


Changes:

[peihe] [BEAM-59] Beam GcsFileSystem: port expand() from GcsUtil for glob

--
[...truncated 4933 lines...]
2017-02-16T00:43:41.074 [INFO] --- findbugs-maven-plugin:3.0.4:findbugs 
(findbugs) @ beam-sdks-java-io-google-cloud-platform ---
2017-02-16T00:43:41.080 [INFO] Fork Value is true
2017-02-16T00:43:55.828 [INFO] Done FindBugs Analysis
2017-02-16T00:43:55.947 [INFO] 
2017-02-16T00:43:55.947 [INFO] <<< findbugs-maven-plugin:3.0.4:check (default) 
< :findbugs @ beam-sdks-java-io-google-cloud-platform <<<
2017-02-16T00:43:55.947 [INFO] 
2017-02-16T00:43:55.947 [INFO] --- findbugs-maven-plugin:3.0.4:check (default) 
@ beam-sdks-java-io-google-cloud-platform ---
2017-02-16T00:43:55.952 [INFO] BugInstance size is 0
2017-02-16T00:43:55.952 [INFO] Error size is 0
2017-02-16T00:43:55.955 [INFO] No errors/warnings found
2017-02-16T00:43:56.016 [INFO] 
2017-02-16T00:43:56.016 [INFO] --- maven-surefire-plugin:2.19.1:test 
(default-test) @ beam-sdks-java-io-google-cloud-platform ---
2017-02-16T00:43:56.020 [INFO] Surefire report directory: 

2017-02-16T00:43:56.020 [INFO] Using configured provider 
org.apache.maven.surefire.junitcore.JUnitCoreProvider
2017-02-16T00:43:56.020 [INFO] parallel='none', perCoreThreadCount=true, 
threadCount=0, useUnlimitedThreads=false, threadCountSuites=0, 
threadCountClasses=0, threadCountMethods=0, parallelOptimized=true

---
 T E S T S
---
Running org.apache.beam.sdk.io.gcp.GcpApiSurfaceTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.88 sec - in 
org.apache.beam.sdk.io.gcp.GcpApiSurfaceTest
Running org.apache.beam.sdk.io.gcp.bigquery.BigQueryTableRowIteratorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.451 sec - in 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryTableRowIteratorTest
Running org.apache.beam.sdk.io.gcp.bigquery.BigQueryAvroUtilsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.698 sec - in 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryAvroUtilsTest
Running org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOTest
Tests run: 75, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 6.087 sec - in 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOTest
Running org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.631 sec - in 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest
Running org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest
Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.19 sec - in 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest
Running org.apache.beam.sdk.io.gcp.bigtable.BigtableIOTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.488 sec - in 
org.apache.beam.sdk.io.gcp.bigtable.BigtableIOTest
Running org.apache.beam.sdk.io.gcp.datastore.DatastoreV1Test
Tests run: 43, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.265 sec - in 
org.apache.beam.sdk.io.gcp.datastore.DatastoreV1Test
Running org.apache.beam.sdk.io.gcp.storage.GcsFileSystemRegistrarTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in 
org.apache.beam.sdk.io.gcp.storage.GcsFileSystemRegistrarTest
Running org.apache.beam.sdk.io.gcp.storage.GcsResourceIdTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in 
org.apache.beam.sdk.io.gcp.storage.GcsResourceIdTest
Running org.apache.beam.sdk.io.gcp.storage.GcsFileSystemTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 sec - in 
org.apache.beam.sdk.io.gcp.storage.GcsFileSystemTest

Results :

Tests run: 194, Failures: 0, Errors: 0, Skipped: 4

[JENKINS] Recording test results
2017-02-16T00:44:21.409 [INFO] 
2017-02-16T00:44:21.409 [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ 
beam-sdks-java-io-google-cloud-platform ---
2017-02-16T00:44:21.420 [INFO] Building jar: 

2017-02-16T00:44:21.521 [INFO] 
2017-02-16T00:44:21.521 [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ beam-sdks-java-io-google-cloud-platform ---
2017-02-16T00:44:21.571 [INFO] 
2017-02-16T00:44:21.571 [INFO] --- maven-javadoc-plugin:2.10.4:jar (javadoc) @ 
beam-sdks-java-io-google-cloud-platform ---
2017-02-16T00:44:23.889 [INFO] Building jar: 

2017-02-16T00:44:23.956 [INFO] 
2017-02-1

[jira] [Commented] (BEAM-1490) master branch for apache/beam-site still contains the Beam code

2017-02-15 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868907#comment-15868907
 ] 

Davor Bonaci commented on BEAM-1490:


I think the right message should be "branch doesn't exist". I did a few things 
to try this out and didn't reproduce it. Could it be a configuration problem 
locally? Something stale? Stale remote? The branch is not visible in the UI 
either.

> master branch for apache/beam-site still contains the Beam code
> ---
>
> Key: BEAM-1490
> URL: https://issues.apache.org/jira/browse/BEAM-1490
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Davor Bonaci
>Priority: Trivial
>
> I accidentally typed {{git checkout -b mybranch github/master}} today instead 
> of {{git checkout -b mybranch github/asf-site}} and I was dropped into a 
> really old Beam codebase.
> Perhaps the best thing to have here would be just one empty file called 
> {{you-probably-meant-to-checkout-asf-site}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)
Younghee Kwon created BEAM-1496:
---

 Summary: pysdk's sideinputs_test requires nose, but not installed 
by default
 Key: BEAM-1496
 URL: https://issues.apache.org/jira/browse/BEAM-1496
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Younghee Kwon
Assignee: Ahmet Altay
Priority: Minor


$ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test  
   
No handlers could be found for logger "oauth2client.contrib.multistore_file"
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
  File 
"/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py",
 line 23, in 
from nose.plugins.attrib import attr
ImportError: No module named nose.plugins.attrib




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1490) master branch for apache/beam-site still contains the Beam code

2017-02-15 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868923#comment-15868923
 ] 

Kenneth Knowles commented on BEAM-1490:
---

Oh, I must be mistaking a local branch for a remote.

> master branch for apache/beam-site still contains the Beam code
> ---
>
> Key: BEAM-1490
> URL: https://issues.apache.org/jira/browse/BEAM-1490
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Davor Bonaci
>Priority: Trivial
>
> I accidentally typed {{git checkout -b mybranch github/master}} today instead 
> of {{git checkout -b mybranch github/asf-site}} and I was dropped into a 
> really old Beam codebase.
> Perhaps the best thing to have here would be just one empty file called 
> {{you-probably-meant-to-checkout-asf-site}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868926#comment-15868926
 ] 

Younghee Kwon commented on BEAM-1496:
-

I could add nose to setup.py, but the notice in the site discourages me..
https://nose.readthedocs.io/en/latest/

Note to Users

Nose has been in maintenance mode for the past several years and will likely 
cease without a new person/team to take over maintainership. New projects 
should consider using Nose2, py.test, or just plain unittest/unittest2.



> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Younghee Kwon
>Assignee: Ahmet Altay
>Priority: Minor
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
>   File 
> "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 23, in 
> from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-1490) master branch for apache/beam-site still contains the Beam code

2017-02-15 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles closed BEAM-1490.
-
   Resolution: Invalid
Fix Version/s: Not applicable

> master branch for apache/beam-site still contains the Beam code
> ---
>
> Key: BEAM-1490
> URL: https://issues.apache.org/jira/browse/BEAM-1490
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Davor Bonaci
>Priority: Trivial
> Fix For: Not applicable
>
>
> I accidentally typed {{git checkout -b mybranch github/master}} today instead 
> of {{git checkout -b mybranch github/asf-site}} and I was dropped into a 
> really old Beam codebase.
> Perhaps the best thing to have here would be just one empty file called 
> {{you-probably-meant-to-checkout-asf-site}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2017: [BEAM-1410] python-sdk: add stacked WindowedValues ...

2017-02-15 Thread yk5
GitHub user yk5 opened a pull request:

https://github.com/apache/beam/pull/2017

[BEAM-1410] python-sdk: add stacked WindowedValues in DirectRunner.Bundle.

It saves memory for the typical cases that timestamp/window info is shared.

This is on by default, but could be turned off by sending 
--no_direct_runner_use_stacked_bundle to the pipeline.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yk5/beam stacked_bundle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2017


commit f23f1b3af251c48789ff1afb9118b817b7d6fff4
Author: Younghee Kwon 
Date:   2017-02-16T01:23:34Z

python-sdk: add stacked WindowedValues in DirectRunner.Bundle.

It saves memory for the typical cases that timestamp/window info is shared.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1410) Reduce sdk-py DirectRunner running time and memory consumption

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868942#comment-15868942
 ] 

ASF GitHub Bot commented on BEAM-1410:
--

GitHub user yk5 opened a pull request:

https://github.com/apache/beam/pull/2017

[BEAM-1410] python-sdk: add stacked WindowedValues in DirectRunner.Bundle.

It saves memory for the typical cases that timestamp/window info is shared.

This is on by default, but could be turned off by sending 
--no_direct_runner_use_stacked_bundle to the pipeline.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yk5/beam stacked_bundle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2017


commit f23f1b3af251c48789ff1afb9118b817b7d6fff4
Author: Younghee Kwon 
Date:   2017-02-16T01:23:34Z

python-sdk: add stacked WindowedValues in DirectRunner.Bundle.

It saves memory for the typical cases that timestamp/window info is shared.




> Reduce sdk-py DirectRunner running time and memory consumption
> --
>
> Key: BEAM-1410
> URL: https://issues.apache.org/jira/browse/BEAM-1410
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Younghee Kwon
>Assignee: Ahmet Altay
>Priority: Minor
>  Labels: performance, python
>
> Some experimental benchmarks shows that DirectRunner can improve performance 
> in cpu and memory. 
> I will roll out some CLs to improve them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1495) Move ParDo.Bound as ParDo.BoundMulti into the core runners construction module

2017-02-15 Thread Thomas Groh (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868989#comment-15868989
 ] 

Thomas Groh commented on BEAM-1495:
---

It is worth instead considering implementing ParDo.Bound purely in terms of 
ParDo.BoundMulti, and having a PrimitiveParDoSingle override factory for a 
runner who wishes to implement it as a single-output primitive. This allows us 
to cut out ParDo.Bound out of the set of primitives.

[~kenn]

> Move ParDo.Bound as ParDo.BoundMulti into the core runners construction module
> --
>
> Key: BEAM-1495
> URL: https://issues.apache.org/jira/browse/BEAM-1495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Priority: Minor
>  Labels: starter
>
> This is a reasonable runner-independent override.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1467) Use well-known coder types for known window coders

2017-02-15 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1467:

Description: 
Known window types include:

* GlobalWindow
* IntervalWindow
* WindowedValueCoder

Standardizing the name and encodings of these windows will enable many more 
pipelines to work across the Fn API with low overhead.

  was:
Known window types include:

* GlobalWindow
* IntervalWindow

Standardizing the name and encodings of these windows will enable many more 
pipelines to work across the Fn API with low overhead.


> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2018: [BEAM-1467] dd cross-SDK implementations and tests ...

2017-02-15 Thread vikkyrk
GitHub user vikkyrk opened a pull request:

https://github.com/apache/beam/pull/2018

[BEAM-1467] dd cross-SDK implementations and tests of WindowedValueCoder

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

- Tests for GlobalWindowCoder
- Cross-sdk implementation / tests for WindowedValueCoder
- Note: PaneInfo isn't supported in python SDK yet, so we hard code the 
value to 0x0F which represents PaneInfo.NO_FIRING

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vikkyrk/incubator-beam 
common_windowed_value_coder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2018


commit e0c5025ecc6529e2db04aff9e892ef045f7a7451
Author: Vikas Kedigehalli 
Date:   2017-02-16T02:13:12Z

Add cross-SDK implementations and tests of WindowedValueCoder

commit aa9840fe7c616c9ed85b475684fff61205449502
Author: Vikas Kedigehalli 
Date:   2017-02-16T02:40:29Z

Fix lint errors




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1467) Use well-known coder types for known window coders

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869035#comment-15869035
 ] 

ASF GitHub Bot commented on BEAM-1467:
--

GitHub user vikkyrk opened a pull request:

https://github.com/apache/beam/pull/2018

[BEAM-1467] dd cross-SDK implementations and tests of WindowedValueCoder

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

- Tests for GlobalWindowCoder
- Cross-sdk implementation / tests for WindowedValueCoder
- Note: PaneInfo isn't supported in python SDK yet, so we hard code the 
value to 0x0F which represents PaneInfo.NO_FIRING

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vikkyrk/incubator-beam 
common_windowed_value_coder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2018


commit e0c5025ecc6529e2db04aff9e892ef045f7a7451
Author: Vikas Kedigehalli 
Date:   2017-02-16T02:13:12Z

Add cross-SDK implementations and tests of WindowedValueCoder

commit aa9840fe7c616c9ed85b475684fff61205449502
Author: Vikas Kedigehalli 
Date:   2017-02-16T02:40:29Z

Fix lint errors




> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1467) Use well-known coder types for known window coders

2017-02-15 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869039#comment-15869039
 ] 

Vikas Kedigehalli commented on BEAM-1467:
-

WindowedValueCoder implemenation and tests 
https://github.com/apache/beam/pull/2018

Note: PaneInfo isn't supported in python SDK yet, so we hard code the value to 
0x0F which represents PaneInfo.NO_FIRING

> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] beam git commit: Add runners/core-construction-java

2017-02-15 Thread tgroh
Add runners/core-construction-java

This module contains pre-execution PipelineRunner utilities.

Move PTransformMatchers to core-construction-java


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1ffbc689
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1ffbc689
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1ffbc689

Branch: refs/heads/master
Commit: 1ffbc6899c6c3109da19f505c94d22f267ee0974
Parents: 013f118
Author: Thomas Groh 
Authored: Wed Feb 15 12:57:52 2017 -0800
Committer: Thomas Groh 
Committed: Wed Feb 15 19:07:09 2017 -0800

--
 pom.xml |   6 +
 runners/core-construction-java/pom.xml  | 137 ++
 .../core/construction/PTransformMatchers.java   | 161 +++
 .../core/construction/ReplacementOutputs.java   | 105 +++
 .../runners/core/construction/package-info.java |  22 ++
 .../construction/PTransformMatchersTest.java| 273 +++
 .../construction/ReplacementOutputsTest.java| 254 +
 .../beam/runners/core/PTransformMatchers.java   | 142 --
 .../beam/runners/core/ReplacementOutputs.java   | 105 ---
 .../runners/core/PTransformMatchersTest.java| 273 ---
 .../runners/core/ReplacementOutputsTest.java| 254 -
 runners/direct-java/pom.xml |   7 +-
 ...ectGBKIntoKeyedWorkItemsOverrideFactory.java |   2 +-
 .../direct/DirectGroupByKeyOverrideFactory.java |   2 +-
 .../direct/ParDoMultiOverrideFactory.java   |   2 +-
 .../ParDoSingleViaMultiOverrideFactory.java |   2 +-
 .../direct/TestStreamEvaluatorFactory.java  |   2 +-
 .../runners/direct/ViewEvaluatorFactory.java|   2 +-
 runners/pom.xml |   1 +
 19 files changed, 971 insertions(+), 781 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/1ffbc689/pom.xml
--
diff --git a/pom.xml b/pom.xml
index d53502e..ac34016 100644
--- a/pom.xml
+++ b/pom.xml
@@ -421,6 +421,12 @@
 
   
 org.apache.beam
+beam-runners-core-construction-java
+${project.version}
+  
+
+  
+org.apache.beam
 beam-runners-core-java
 ${project.version}
   

http://git-wip-us.apache.org/repos/asf/beam/blob/1ffbc689/runners/core-construction-java/pom.xml
--
diff --git a/runners/core-construction-java/pom.xml 
b/runners/core-construction-java/pom.xml
new file mode 100644
index 000..868f743
--- /dev/null
+++ b/runners/core-construction-java/pom.xml
@@ -0,0 +1,137 @@
+
+
+
+http://maven.apache.org/POM/4.0.0";
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+
+  4.0.0
+
+  
+beam-runners-parent
+org.apache.beam
+0.6.0-SNAPSHOT
+../pom.xml
+  
+
+  beam-runners-core-construction-java
+  Apache Beam :: Runners :: Core Java Construction
+  Beam Runners Core provides utilities to aid runner authors 
interact with a Pipeline
+prior to execution.
+  
+
+  jar
+
+  
+
+  
+org.apache.maven.plugins
+maven-surefire-plugin
+
+  
+org.apache.beam.sdk.testing.NeedsRunner
+  
+  
+true
+  
+
+  
+
+  
+org.apache.maven.plugins
+maven-shade-plugin
+
+  
+bundle-and-repackage
+package
+
+  shade
+
+
+  true
+  
+
+  com.google.guava:guava
+
+  
+  
+
+  *:*
+  
+META-INF/*.SF
+META-INF/*.DSA
+META-INF/*.RSA
+  
+
+  
+  
+
+
+  com.google.common
+  
+
org.apache.beam.runners.core.construction.repackaged.com.google.common
+  
+
+
+  com.google.thirdparty
+  
+
org.apache.beam.runners.core.construction.repackaged.com.google.thirdparty
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  org.apache.beam
+  beam-sdks-java-core
+
+
+
+  joda-time
+  joda-time
+
+
+
+  com.google.guava
+  guava
+
+
+
+
+
+  org.hamcrest
+  hamcrest-all
+  test
+
+
+
+ 

[1/2] beam git commit: This closes #2013

2017-02-15 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 013f11885 -> 18f3767e3


This closes #2013


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/18f3767e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/18f3767e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/18f3767e

Branch: refs/heads/master
Commit: 18f3767e320b9fd9915d39d036102ad583e35dbd
Parents: 013f118 1ffbc68
Author: Thomas Groh 
Authored: Wed Feb 15 19:07:09 2017 -0800
Committer: Thomas Groh 
Committed: Wed Feb 15 19:07:09 2017 -0800

--
 pom.xml |   6 +
 runners/core-construction-java/pom.xml  | 137 ++
 .../core/construction/PTransformMatchers.java   | 161 +++
 .../core/construction/ReplacementOutputs.java   | 105 +++
 .../runners/core/construction/package-info.java |  22 ++
 .../construction/PTransformMatchersTest.java| 273 +++
 .../construction/ReplacementOutputsTest.java| 254 +
 .../beam/runners/core/PTransformMatchers.java   | 142 --
 .../beam/runners/core/ReplacementOutputs.java   | 105 ---
 .../runners/core/PTransformMatchersTest.java| 273 ---
 .../runners/core/ReplacementOutputsTest.java| 254 -
 runners/direct-java/pom.xml |   7 +-
 ...ectGBKIntoKeyedWorkItemsOverrideFactory.java |   2 +-
 .../direct/DirectGroupByKeyOverrideFactory.java |   2 +-
 .../direct/ParDoMultiOverrideFactory.java   |   2 +-
 .../ParDoSingleViaMultiOverrideFactory.java |   2 +-
 .../direct/TestStreamEvaluatorFactory.java  |   2 +-
 .../runners/direct/ViewEvaluatorFactory.java|   2 +-
 runners/pom.xml |   1 +
 19 files changed, 971 insertions(+), 781 deletions(-)
--




[jira] [Commented] (BEAM-1493) runners/core-java should be a pre-execution and an execution-time module

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869042#comment-15869042
 ] 

ASF GitHub Bot commented on BEAM-1493:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2013


> runners/core-java should be a pre-execution and an execution-time module
> 
>
> Key: BEAM-1493
> URL: https://issues.apache.org/jira/browse/BEAM-1493
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This permits a runner to use an internal version of runners-core, but have 
> utilities that interact with the Pipeline within the runner shim.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1467) Use well-known coder types for known window coders

2017-02-15 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869041#comment-15869041
 ] 

Kenneth Knowles commented on BEAM-1467:
---

{{WindowedValue}} is not intended to be user facing - it is slated for move to 
runners-core. I don't think that changes anything about what we'll do with this 
ticket, just noting.

> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2013: [BEAM-1493] Add runners/core-pipeline-java

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2013


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869046#comment-15869046
 ] 

Ahmet Altay commented on BEAM-1496:
---

We do not want to add test specific dependencies to the requirements in 
general. Tox will install nose before running tests.

If you want to run a single test, then you need to install it manually.

> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Younghee Kwon
>Assignee: Ahmet Altay
>Priority: Minor
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
>   File 
> "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 23, in 
> from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] beam git commit: This closes #2014: Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

2017-02-15 Thread kenn
This closes #2014: Upgrade bytebuddy to 1.6.8 to jump past asm 5.0


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2e766ced
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2e766ced
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2e766ced

Branch: refs/heads/master
Commit: 2e766ced5e1bf3a605e13da208091d170486f9a8
Parents: 18f3767 3e4c05c
Author: Kenneth Knowles 
Authored: Wed Feb 15 19:12:09 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 19:12:09 2017 -0800

--
 pom.xml  | 2 +-
 .../beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java | 4 ++--
 .../sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/2e766ced/pom.xml
--



[1/2] beam git commit: Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

2017-02-15 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 18f3767e3 -> 2e766ced5


Upgrade bytebuddy to 1.6.8 to jump past asm 5.0

There is a suspected bug in asm 5.0 that is considered the likely root cause of
a bug sbt-assembly [1] that carried over to Gearpump [2]. This commit upgrades
us to depend on asm 5.2 in which those derivative bugs have cleared up.

I have not found a direct reference to what the issue is, precisely, but
the dependency effect of this is extremely small and these are libraries
that are useful to keep current.

[1] https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607
[2] https://issues.apache.org/jira/browse/GEARPUMP-236


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3e4c05c3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3e4c05c3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3e4c05c3

Branch: refs/heads/master
Commit: 3e4c05c3449064e7f032a48b98551a73d71a5bbb
Parents: 00ea3f7
Author: Kenneth Knowles 
Authored: Wed Feb 15 14:02:38 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 14:15:36 2017 -0800

--
 pom.xml  | 2 +-
 .../beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java | 4 ++--
 .../sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3e4c05c3/pom.xml
--
diff --git a/pom.xml b/pom.xml
index d53502e..cc6de11 100644
--- a/pom.xml
+++ b/pom.xml
@@ -871,7 +871,7 @@
   
 net.bytebuddy
 byte-buddy
-1.5.5
+1.6.8
   
 
   

http://git-wip-us.apache.org/repos/asf/beam/blob/3e4c05c3/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java
index 46b21d6..8e3a37c 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyDoFnInvokerFactory.java
@@ -427,7 +427,7 @@ public class ByteBuddyDoFnInvokerFactory implements 
DoFnInvokerFactory {
   // Push "this" (DoFnInvoker on top of the stack)
   MethodVariableAccess.REFERENCE.loadFrom(0),
   // Access this.delegate (DoFn on top of the stack)
-  FieldAccess.forField(delegateField).getter(),
+  FieldAccess.forField(delegateField).read(),
   // Cast it to the more precise type
   TypeCasting.to(doFnType),
   // Run the beforeDelegation manipulations.
@@ -637,7 +637,7 @@ public class ByteBuddyDoFnInvokerFactory implements 
DoFnInvokerFactory {
   StackManipulation pushDelegate =
   new StackManipulation.Compound(
   MethodVariableAccess.REFERENCE.loadFrom(0),
-  FieldAccess.forField(delegateField).getter());
+  FieldAccess.forField(delegateField).read());
 
   StackManipulation pushExtraContextFactory = 
MethodVariableAccess.REFERENCE.loadFrom(1);
 

http://git-wip-us.apache.org/repos/asf/beam/blob/3e4c05c3/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java
index 786857a..123808c 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/ByteBuddyOnTimerInvokerFactory.java
@@ -239,7 +239,7 @@ class ByteBuddyOnTimerInvokerFactory implements 
OnTimerInvokerFactory {
   StackManipulation pushDelegate =
   new StackManipulation.Compound(
   MethodVariableAccess.REFERENCE.loadFrom(0),
-  FieldAccess.forField(delegateField).getter());
+  FieldAccess.forField(delegateField).read());
 
   StackManipulation pushExtraContextFactory = 
MethodVariableAccess.REFERENCE.loadFrom(1);
 
@@ -295,7 +295,7 @@ class ByteBuddyOnTimerInvokerFactory implements 
OnTimerInvokerFactory {
   .getDeclaredFields()
   

[GitHub] beam pull request #2014: [BEAM-1492] Upgrade bytebuddy to 1.6.8 to jump past...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2014


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1492) Avoid potential issue in ASM 5.0

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869060#comment-15869060
 ] 

ASF GitHub Bot commented on BEAM-1492:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2014


> Avoid potential issue in ASM 5.0
> 
>
> Key: BEAM-1492
> URL: https://issues.apache.org/jira/browse/BEAM-1492
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Amit Sela
>
> There is a suspected bug in asm 5.0 that is considered the likely root cause 
> of a [bug 
> sbt-assembly|https://github.com/sbt/sbt-assembly/issues/205#issuecomment-279964607]
>  that carried over to [GEARPUMP-236]. I have not found a direct reference to 
> what the issue is, precisely, but the dependency effect of this is extremely 
> small and these are libraries that are useful to keep current. And if/when 
> Gearpump runner lands on master this will avoid any diamond dep issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #2651

2017-02-15 Thread Apache Jenkins Server
See 




[1/2] beam git commit: Fix some DoFn javadoc literals

2017-02-15 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 2e766ced5 -> 5fe78440b


Fix some DoFn javadoc literals


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f360f47f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f360f47f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f360f47f

Branch: refs/heads/master
Commit: f360f47f9ca4f4054e9fb583c2a0f5dda9ee19ea
Parents: 18f3767
Author: Kenneth Knowles 
Authored: Tue Feb 7 14:11:13 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 19:18:42 2017 -0800

--
 .../org/apache/beam/sdk/transforms/DoFn.java| 66 ++--
 1 file changed, 32 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/f360f47f/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
index 2043ef0..6f88738 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
@@ -77,15 +77,15 @@ import org.joda.time.Instant;
  *
  * Example usage:
  *
- * {@code
- * PCollection lines = ... ;
- * PCollection words =
- * lines.apply(ParDo.of(new DoFn() {
- * {@literal @}ProcessElement
- * public void processElement(ProcessContext c, BoundedWindow window) {
- *
- * }}));
- * }
+ * 
+ * {@literal PCollection} lines = ... ;
+ * {@literal PCollection} words =
+ * {@literal lines.apply(ParDo.of(new DoFn())} {
+ * {@literal @ProcessElement}
+ *  public void processElement(ProcessContext c, BoundedWindow window) 
{
+ *...
+ *  }}));
+ * 
  *
  * @param  the type of the (main) input elements
  * @param  the type of the (main) output elements
@@ -385,21 +385,21 @@ public abstract class DoFn implements 
Serializable, HasDisplayD
* subclass to your {@link ProcessElement @ProcessElement} or {@link OnTimer 
@OnTimer} method, and
* annotate it with {@link StateId}. See the following code for an example:
*
-   * {@code
-   * new DoFn, Baz>() {
-   *   {@literal @}StateId("my-state-id")
-   *   private final StateSpec> myStateSpec =
+   * {@literal new DoFn, Baz>()} {
+   *
+   *  {@literal @StateId("my-state-id")}
+   *  {@literal private final StateSpec>} myStateSpec =
*   StateSpecs.value(new MyStateCoder());
*
-   *   {@literal @}ProcessElement
+   *  {@literal @ProcessElement}
*   public void processElement(
*   ProcessContext c,
-   *   {@literal @}StateId("my-state-id") ValueState myState) {
+   *  {@literal @StateId("my-state-id") ValueState myState}) {
* myState.read();
* myState.write(...);
*   }
* }
-   * }
+   * 
*
* State is subject to the following validity conditions:
*
@@ -429,24 +429,22 @@ public abstract class DoFn implements 
Serializable, HasDisplayD
* {@link ProcessElement @ProcessElement} or {@link OnTimer @OnTimer} 
method, and annotate it with
* {@link TimerId}. See the following code for an example:
*
-   * {@code
-   * new DoFn, Baz>() {
-   *   {@literal @}TimerId("my-timer-id")
-   *   private final TimerSpec myTimer = 
TimerSpecs.timerForDomain(TimeDomain.EVENT_TIME);
-   *
-   *   {@literal @}ProcessElement
-   *   public void processElement(
-   *   ProcessContext c,
-   *   {@literal @}TimerId("my-timer-id") Timer myTimer) {
-   * myTimer.setForNowPlus(Duration.standardSeconds(...));
-   *   }
-   *
-   *   {@literal @}OnTimer("my-timer-id")
-   *   public void onMyTimer() {
-   * ...
-   *   }
-   * }
-   * }
+   * {@literal new DoFn, Baz>()} {
+   *   {@literal @TimerId("my-timer-id")}
+   *private final TimerSpec myTimer = 
TimerSpecs.timerForDomain(TimeDomain.EVENT_TIME);
+   *
+   *   {@literal @ProcessElement}
+   *public void processElement(
+   *ProcessContext c,
+   *   {@literal @TimerId("my-timer-id") Timer myTimer}) {
+   *  myTimer.setForNowPlus(Duration.standardSeconds(...));
+   *}
+   *
+   *   {@literal @OnTimer("my-timer-id")}
+   *public void onMyTimer() {
+   *  ...
+   *}
+   * }
*
* Timers are subject to the following validity conditions:
*



[2/2] beam git commit: This closes #1940: Fix some DoFn javadoc literals

2017-02-15 Thread kenn
This closes #1940: Fix some DoFn javadoc literals


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5fe78440
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5fe78440
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5fe78440

Branch: refs/heads/master
Commit: 5fe78440bd603702661595d0d92c33c38d38a51a
Parents: 2e766ce f360f47
Author: Kenneth Knowles 
Authored: Wed Feb 15 20:17:18 2017 -0800
Committer: Kenneth Knowles 
Committed: Wed Feb 15 20:17:18 2017 -0800

--
 .../org/apache/beam/sdk/transforms/DoFn.java| 66 ++--
 1 file changed, 32 insertions(+), 34 deletions(-)
--




[GitHub] beam pull request #1940: [BEAM-1413] Fix some DoFn javadoc literals

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1940


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1413) DoFn javadoc has @literal where it shouldn't

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869131#comment-15869131
 ] 

ASF GitHub Bot commented on BEAM-1413:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1940


> DoFn javadoc has @literal where it shouldn't
> 
>
> Key: BEAM-1413
> URL: https://issues.apache.org/jira/browse/BEAM-1413
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> In particular, the docs for StateId and TimerId annotations have an extra 
> layer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2017-02-15 Thread Ibrahim Sharaf ElDen (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869155#comment-15869155
 ] 

Ibrahim Sharaf ElDen edited comment on BEAM-1440 at 2/16/17 4:38 AM:
-

Hello [~chamikara], I am interested in starting to contribute to this project, 
I am good Python and basic Java knowledge, how can I start?


was (Author: ibrahimsharaf):
Hello [~chamikara], I am interested to start contributing to this project, I am 
good Python and basic Java knowledge, how can I start?

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>  Labels: gsoc2017, mentor, python
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2017-02-15 Thread Ibrahim Sharaf ElDen (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869155#comment-15869155
 ] 

Ibrahim Sharaf ElDen commented on BEAM-1440:


Hello [~chamikara], I am interested to start contributing to this project, I am 
good Python and basic Java knowledge, how can I start?

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>  Labels: gsoc2017, mentor, python
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869200#comment-15869200
 ] 

Younghee Kwon commented on BEAM-1496:
-

I see; sorry for the noise. I thought it might have broken the automated tests, 
but I confirmed that travis-ci passes.

Closing..

> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Younghee Kwon
>Assignee: Ahmet Altay
>Priority: Minor
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
>   File 
> "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 23, in 
> from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-1496) pysdk's sideinputs_test requires nose, but not installed by default

2017-02-15 Thread Younghee Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Younghee Kwon closed BEAM-1496.
---
   Resolution: Not A Problem
Fix Version/s: Not applicable

> pysdk's sideinputs_test requires nose, but not installed by default
> ---
>
> Key: BEAM-1496
> URL: https://issues.apache.org/jira/browse/BEAM-1496
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Younghee Kwon
>Assignee: Ahmet Altay
>Priority: Minor
> Fix For: Not applicable
>
>
> $ PYTHONPATH= python -m apache_beam.transforms.sideinputs_test
>  
> No handlers could be found for logger "oauth2client.contrib.multistore_file"
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
>   File 
> "/usr/local/google/home/youngheek/work/github/beam3/sdks/python/apache_beam/transforms/sideinputs_test.py",
>  line 23, in 
> from nose.plugins.attrib import attr
> ImportError: No module named nose.plugins.attrib



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2307

2017-02-15 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-1497) Refactor Hadoop/HDFS IO

2017-02-15 Thread Rafal Wojdyla (JIRA)
Rafal Wojdyla created BEAM-1497:
---

 Summary: Refactor Hadoop/HDFS IO
 Key: BEAM-1497
 URL: https://issues.apache.org/jira/browse/BEAM-1497
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-extensions
Affects Versions: 0.5.0
Reporter: Rafal Wojdyla
Assignee: Rafal Wojdyla


[Scio|https://github.com/spotify/scio/tree/master/scio-hdfs/src] includes 
refactored version of Hadoop/HDFS IO, primarily:
 * single source/sink for all hadoop formats (no separate avro specific io)
 * no separate simple authentication - UGI in sink
 * use builder pattern instead of CTORs

This issue is to report push of this refactor to upstream/beam. Also tracked in 
[scio#416|https://github.com/spotify/scio/issues/416]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (BEAM-1497) Refactor Hadoop/HDFS IO

2017-02-15 Thread Rafal Wojdyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafal Wojdyla closed BEAM-1497.
---
   Resolution: Duplicate
Fix Version/s: Not applicable

> Refactor Hadoop/HDFS IO
> ---
>
> Key: BEAM-1497
> URL: https://issues.apache.org/jira/browse/BEAM-1497
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>Assignee: Rafal Wojdyla
> Fix For: Not applicable
>
>
> [Scio|https://github.com/spotify/scio/tree/master/scio-hdfs/src] includes 
> refactored version of Hadoop/HDFS IO, primarily:
>  * single source/sink for all hadoop formats (no separate avro specific io)
>  * no separate simple authentication - UGI in sink
>  * use builder pattern instead of CTORs
> This issue is to report push of this refactor to upstream/beam. Also tracked 
> in [scio#416|https://github.com/spotify/scio/issues/416]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (BEAM-1497) Refactor Hadoop/HDFS IO

2017-02-15 Thread Rafal Wojdyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafal Wojdyla reopened BEAM-1497:
-

> Refactor Hadoop/HDFS IO
> ---
>
> Key: BEAM-1497
> URL: https://issues.apache.org/jira/browse/BEAM-1497
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>Assignee: Rafal Wojdyla
> Fix For: Not applicable
>
>
> [Scio|https://github.com/spotify/scio/tree/master/scio-hdfs/src] includes 
> refactored version of Hadoop/HDFS IO, primarily:
>  * single source/sink for all hadoop formats (no separate avro specific io)
>  * no separate simple authentication - UGI in sink
>  * use builder pattern instead of CTORs
> This issue is to report push of this refactor to upstream/beam. Also tracked 
> in [scio#416|https://github.com/spotify/scio/issues/416]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Apex #516

2017-02-15 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Dataflow #2308

2017-02-15 Thread Apache Jenkins Server
See