[jira] [Comment Edited] (BEAM-612) Add BSP runner

2016-08-31 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454377#comment-15454377
 ] 

Edward J. Yoon edited comment on BEAM-612 at 9/1/16 5:42 AM:
-

Here's the link about BSP: http://hama.apache.org/hama_bsp_tutorial.html or 
https://people.apache.org/~tjungblut/downloads/hamadocs/ApacheHamaBSPProgrammingmodel_06.pdf


was (Author: udanax):
Here's the link about BSP: http://hama.apache.org/hama_bsp_tutorial.html 

> Add BSP runner
> --
>
> Key: BEAM-612
> URL: https://issues.apache.org/jira/browse/BEAM-612
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Edward J. Yoon
>
> We thinking about contributing the BSP computing engine to the Beam w/ few 
> examples e.g., wordcount and streaming similarity-join computation HAMA-983.
> It would be really helpful if someone can guide us on this idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-612) Add BSP runner

2016-08-31 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454377#comment-15454377
 ] 

Edward J. Yoon commented on BEAM-612:
-

Here's the link about BSP: http://hama.apache.org/hama_bsp_tutorial.html 

> Add BSP runner
> --
>
> Key: BEAM-612
> URL: https://issues.apache.org/jira/browse/BEAM-612
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Edward J. Yoon
>
> We thinking about contributing the BSP computing engine to the Beam w/ few 
> examples e.g., wordcount and streaming similarity-join computation HAMA-983.
> It would be really helpful if someone can guide us on this idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-612) Add BSP runner

2016-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454276#comment-15454276
 ] 

Jean-Baptiste Onofré commented on BEAM-612:
---

Do you have a link about BSP ?

> Add BSP runner
> --
>
> Key: BEAM-612
> URL: https://issues.apache.org/jira/browse/BEAM-612
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Edward J. Yoon
>Assignee: James Malone
>
> We thinking about contributing the BSP computing engine to the Beam w/ few 
> examples e.g., wordcount and streaming similarity-join computation HAMA-983.
> It would be really helpful if someone can guide us on this idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-612) Add BSP runner

2016-08-31 Thread Edward J. Yoon (JIRA)
Edward J. Yoon created BEAM-612:
---

 Summary: Add BSP runner
 Key: BEAM-612
 URL: https://issues.apache.org/jira/browse/BEAM-612
 Project: Beam
  Issue Type: New Feature
  Components: runner-ideas
Reporter: Edward J. Yoon
Assignee: James Malone


We thinking about contributing the BSP computing engine to the Beam w/ few 
examples e.g., wordcount and streaming similarity-join computation HAMA-983.

It would be really helpful if someone can guide us on this idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob

2016-08-31 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-604:
--
Description: 
Currently, streaming job with bounded input can't be terminated automatically 
and TestDataflowRunner can't handle this case. Need to update 
TestDataflowRunner so that streaming integration test such as 
WindowedWordCountIT can run with it.

Implementation:
Query watermark of each step and wait until all watermarks set to MAX then 
cancel the job.

Update:
Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in 
DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with 
bounded input will take advantage of this change and are canceled automatically 
when watermarks reach to max value. Also Dataflow runners can keep simple and 
free from handling batch and streaming two cases.

  was:
Currently, streaming job with bounded input can't be terminated automatically 
and TestDataflowRunner can't handle this case. Need to update 
TestDataflowRunner so that streaming integration test such as 
WindowedWordCountIT can run with it.

Implementation:
Query watermark of each step and wait until all watermarks set to MAX then 
cancel the job.

Update:
Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in 
DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with 
bounded input will take advantage of this change and are canceled automatically 
when watermarks reach to max value. Also 


> Use Watermark Check Streaming Job Finish in DataflowPipelineJob
> ---
>
> Key: BEAM-604
> URL: https://issues.apache.org/jira/browse/BEAM-604
> Project: Beam
>  Issue Type: Improvement
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Minor
>
> Currently, streaming job with bounded input can't be terminated automatically 
> and TestDataflowRunner can't handle this case. Need to update 
> TestDataflowRunner so that streaming integration test such as 
> WindowedWordCountIT can run with it.
> Implementation:
> Query watermark of each step and wait until all watermarks set to MAX then 
> cancel the job.
> Update:
> Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in 
> DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with 
> bounded input will take advantage of this change and are canceled 
> automatically when watermarks reach to max value. Also Dataflow runners can 
> keep simple and free from handling batch and streaming two cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[1/2] incubator-beam git commit: Remove empty unused method in TestStreamEvaluatorFactory

2016-08-31 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master a5320607a -> f65795661


Remove empty unused method in TestStreamEvaluatorFactory


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/ccfb78ea
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/ccfb78ea
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/ccfb78ea

Branch: refs/heads/master
Commit: ccfb78eac4d9b992d8694ad1f6347f50d80169c1
Parents: a532060
Author: Thomas Groh 
Authored: Wed Aug 31 15:34:21 2016 -0700
Committer: Thomas Groh 
Committed: Wed Aug 31 15:34:21 2016 -0700

--
 .../beam/runners/direct/TestStreamEvaluatorFactory.java   | 7 ---
 1 file changed, 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/ccfb78ea/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactory.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactory.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactory.java
index 3dbd886..5fe771c 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactory.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TestStreamEvaluatorFactory.java
@@ -84,13 +84,6 @@ class TestStreamEvaluatorFactory implements 
TransformEvaluatorFactory {
 .orNull();
   }
 
-  /**
-   * Release the provided {@link Evaluator} after completing an evaluation. 
The next call to {@link
-   * #createEvaluator(AppliedPTransform, EvaluationContext)} with the {@link 
AppliedPTransform} will
-   * return this evaluator.
-   */
-  private void completeEvaluation(Evaluator evaluator) {}
-
   private static class Evaluator implements TransformEvaluator {
 private final AppliedPTransform 
application;
 private final EvaluationContext context;



[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob

2016-08-31 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-604:
--
Description: 
Currently, streaming job with bounded input can't be terminated automatically 
and TestDataflowRunner can't handle this case. Need to update 
TestDataflowRunner so that streaming integration test such as 
WindowedWordCountIT can run with it.

Implementation:
Query watermark of each step and wait until all watermarks set to MAX then 
cancel the job.

Update:
Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in 
DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with 
bounded input will take advantage of this change and are canceled automatically 
when watermarks reach to max value. Also 

  was:
Currently, streaming job with bounded input can't be terminated automatically 
and TestDataflowRunner can't handle this case. Need to update 
TestDataflowRunner so that streaming integration test such as 
WindowedWordCountIT can run with it.

Implementation:
Query watermark of each step and wait until all watermarks set to MAX then 
cancel the job.


> Use Watermark Check Streaming Job Finish in DataflowPipelineJob
> ---
>
> Key: BEAM-604
> URL: https://issues.apache.org/jira/browse/BEAM-604
> Project: Beam
>  Issue Type: Improvement
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Minor
>
> Currently, streaming job with bounded input can't be terminated automatically 
> and TestDataflowRunner can't handle this case. Need to update 
> TestDataflowRunner so that streaming integration test such as 
> WindowedWordCountIT can run with it.
> Implementation:
> Query watermark of each step and wait until all watermarks set to MAX then 
> cancel the job.
> Update:
> Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in 
> DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with 
> bounded input will take advantage of this change and are canceled 
> automatically when watermarks reach to max value. Also 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-beam git commit: Closes #910

2016-08-31 Thread bchambers
Closes #910


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/f6579566
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/f6579566
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/f6579566

Branch: refs/heads/master
Commit: f65795661a9d9b20c757791b953bd343b461b582
Parents: a532060 ccfb78e
Author: bchambers 
Authored: Wed Aug 31 15:54:20 2016 -0700
Committer: bchambers 
Committed: Wed Aug 31 15:54:20 2016 -0700

--
 .../beam/runners/direct/TestStreamEvaluatorFactory.java   | 7 ---
 1 file changed, 7 deletions(-)
--




[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob

2016-08-31 Thread Mark Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu updated BEAM-604:
--
Summary: Use Watermark Check Streaming Job Finish in DataflowPipelineJob  
(was: Use Watermark Check Streaming Job Finish in TestDataflowRunner )

> Use Watermark Check Streaming Job Finish in DataflowPipelineJob
> ---
>
> Key: BEAM-604
> URL: https://issues.apache.org/jira/browse/BEAM-604
> Project: Beam
>  Issue Type: Improvement
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Minor
>
> Currently, streaming job with bounded input can't be terminated automatically 
> and TestDataflowRunner can't handle this case. Need to update 
> TestDataflowRunner so that streaming integration test such as 
> WindowedWordCountIT can run with it.
> Implementation:
> Query watermark of each step and wait until all watermarks set to MAX then 
> cancel the job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #910: Remove empty unused method in TestStreamEv...

2016-08-31 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/910

Remove empty unused method in TestStreamEvaluatorFactory

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam remove_unused_tsef

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #910






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request #907: Test that multiple instances of TestStream...

2016-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/907


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Test that multiple instances of TestStream are supported

2016-08-31 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master 7dcb4c72c -> a5320607a


Test that multiple instances of TestStream are supported

Add KeyedResourcePool

This interface represents some shared pool of values that may be used by
at most one caller at a time.

Add LockedKeyedResourcePool which has at most one value per key and
at most one user per value at a time.

Use KeyedResourcePool in TestStream


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/89680975
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/89680975
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/89680975

Branch: refs/heads/master
Commit: 89680975b5a89351ccc4bf99a3a6bd8772d87f40
Parents: 7dcb4c7
Author: Thomas Groh 
Authored: Tue Aug 30 14:17:50 2016 -0700
Committer: bchambers 
Committed: Wed Aug 31 15:00:39 2016 -0700

--
 .../beam/runners/direct/KeyedResourcePool.java  |  47 +
 .../runners/direct/LockedKeyedResourcePool.java |  95 +
 .../direct/TestStreamEvaluatorFactory.java  | 141 +++--
 .../direct/LockedKeyedResourcePoolTest.java | 163 +++
 .../direct/TestStreamEvaluatorFactoryTest.java  | 206 +++
 .../apache/beam/sdk/testing/TestStreamTest.java |  29 +++
 6 files changed, 623 insertions(+), 58 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/89680975/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedResourcePool.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedResourcePool.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedResourcePool.java
new file mode 100644
index 000..b976b69
--- /dev/null
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/KeyedResourcePool.java
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.direct;
+
+import com.google.common.base.Optional;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+
+/**
+ * A pool of resources associated with specific keys. Implementations enforce 
specific use patterns,
+ * such as limiting the the number of outstanding elements available per key.
+ */
+interface KeyedResourcePool {
+  /**
+   * Tries to acquire a value for the provided key, loading it via the 
provided loader if necessary.
+   *
+   * If the returned {@link Optional} contains a value, the caller obtains 
ownership of that
+   * value. The value should be released back to this {@link 
KeyedResourcePool} after the
+   * caller no longer has use of it using {@link #release(Object, Object)}.
+   *
+   * The provided {@link Callable} must not return null; it may 
either return a non-null
+   * value or throw an exception.
+   */
+  Optional tryAcquire(K key, Callable loader) throws ExecutionException;
+
+  /**
+   * Release the provided value, relinquishing ownership of it. Future calls to
+   * {@link #tryAcquire(Object, Callable)} may return the released value.
+   */
+  void release(K key, V value);
+}

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/89680975/runners/direct-java/src/main/java/org/apache/beam/runners/direct/LockedKeyedResourcePool.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/LockedKeyedResourcePool.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/LockedKeyedResourcePool.java
new file mode 100644
index 000..8b1e0b1
--- /dev/null
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/LockedKeyedResourcePool.java
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ 

[2/2] incubator-beam git commit: Closes #907

2016-08-31 Thread bchambers
Closes #907


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/a5320607
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/a5320607
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/a5320607

Branch: refs/heads/master
Commit: a5320607af10dd6b45440384b8afbbc8ad9889b7
Parents: 7dcb4c7 8968097
Author: bchambers 
Authored: Wed Aug 31 15:01:53 2016 -0700
Committer: bchambers 
Committed: Wed Aug 31 15:01:53 2016 -0700

--
 .../beam/runners/direct/KeyedResourcePool.java  |  47 +
 .../runners/direct/LockedKeyedResourcePool.java |  95 +
 .../direct/TestStreamEvaluatorFactory.java  | 141 +++--
 .../direct/LockedKeyedResourcePoolTest.java | 163 +++
 .../direct/TestStreamEvaluatorFactoryTest.java  | 206 +++
 .../apache/beam/sdk/testing/TestStreamTest.java |  29 +++
 6 files changed, 623 insertions(+), 58 deletions(-)
--




[jira] [Created] (BEAM-611) Add support for MapValues

2016-08-31 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-611:
--

 Summary: Add support for MapValues
 Key: BEAM-611
 URL: https://issues.apache.org/jira/browse/BEAM-611
 Project: Beam
  Issue Type: Improvement
  Components: sdk-ideas
Reporter: Luke Cwik
Priority: Minor


Filed from: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/412

Often I find myself needing to simply map a function over just the values of a 
key-valued PCollection. MapElements works for this, but suffers a small hit in 
readability (imho) and introduces some possibility for error.

I wanted to see if there is any bandwidth / interest in adding this as a 
standard transform to the SDK. If so, I have attached a gist with a basic spike 
I have been using in my flows:

https://gist.github.com/trentonstrong/8b60933dca545eb2138b72899195019e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-610) Enable spark's checkpointing mechanism for driver-failure recovery in streaming

2016-08-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453099#comment-15453099
 ] 

ASF GitHub Bot commented on BEAM-610:
-

GitHub user amitsela opened a pull request:

https://github.com/apache/incubator-beam/pull/909

[BEAM-610] Enable spark's checkpointing mechanism for driver-failure 
recovery in streaming

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Support basic functionality with GroupByKey and ParDo.

Added support for grouping operations.

Added checkpointDir option, using it before execution.

Support Accumulator recovery from checkpoint.

Streaming tests should use JUnit's TemporaryFolder Rule for checkpoint 
directory.

Support combine optimizations.

Support durable sideInput via Broadcast.

Branches in the pipeline are either Bounded or Unbounded and should be 
handles so.

Handle flatten/union of Bouned/Unbounded RDD/DStream.

JavaDoc

Rebased on master.

Reuse functionality between batch and streaming translators

Better implementation of streaming/batch pipeline-branch translation.

Move group/combine functions to their own wrapping class.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/incubator-beam BEAM-610

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #909


commit ec9cd0805c23afd792dcccf0f8fb268cdbb0e319
Author: Sela 
Date:   2016-08-25T20:49:01Z

Refactor translation mechanism to support checkpointing of DStream.

Support basic functionality with GroupByKey and ParDo.

Added support for grouping operations.

Added checkpointDir option, using it before execution.

Support Accumulator recovery from checkpoint.

Streaming tests should use JUnit's TemporaryFolder Rule for checkpoint 
directory.

Support combine optimizations.

Support durable sideInput via Broadcast.

Branches in the pipeline are either Bounded or Unbounded and should be 
handles so.

Handle flatten/union of Bouned/Unbounded RDD/DStream.

JavaDoc

Rebased on master.

Reuse functionality between batch and streaming translators

Better implementation of streaming/batch pipeline-branch translation.

Move group/combine functions to their own wrapping class.




> Enable spark's checkpointing mechanism for driver-failure recovery in 
> streaming
> ---
>
> Key: BEAM-610
> URL: https://issues.apache.org/jira/browse/BEAM-610
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> For streaming applications, Spark provides a checkpoint mechanism useful for 
> stateful processing and driver failures. See: 
> https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#checkpointing
> This requires the "lambdas", or the content of DStream/RDD functions to be 
> Serializable - currently, the runner a lot of the translation work in 
> streaming to the batch translator, which can no longer be the case because it 
> passes along non-serializables.
> This also requires wrapping the creation of the streaming application's graph 
> in a "getOrCreate" manner. See: 
> https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#how-to-configure-checkpointing
> Another limitation is the need to wrap Accumulators and Broadcast variables 
> in Singletons in order for them to be re-created once stale after recovery.
> This work is a prerequisite to support PerKey workflows, which will be 
> support via Spark's stateful operators such as mapWithState.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #909: [BEAM-610] Enable spark's checkpointing me...

2016-08-31 Thread amitsela
GitHub user amitsela opened a pull request:

https://github.com/apache/incubator-beam/pull/909

[BEAM-610] Enable spark's checkpointing mechanism for driver-failure 
recovery in streaming

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Support basic functionality with GroupByKey and ParDo.

Added support for grouping operations.

Added checkpointDir option, using it before execution.

Support Accumulator recovery from checkpoint.

Streaming tests should use JUnit's TemporaryFolder Rule for checkpoint 
directory.

Support combine optimizations.

Support durable sideInput via Broadcast.

Branches in the pipeline are either Bounded or Unbounded and should be 
handles so.

Handle flatten/union of Bouned/Unbounded RDD/DStream.

JavaDoc

Rebased on master.

Reuse functionality between batch and streaming translators

Better implementation of streaming/batch pipeline-branch translation.

Move group/combine functions to their own wrapping class.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/incubator-beam BEAM-610

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #909


commit ec9cd0805c23afd792dcccf0f8fb268cdbb0e319
Author: Sela 
Date:   2016-08-25T20:49:01Z

Refactor translation mechanism to support checkpointing of DStream.

Support basic functionality with GroupByKey and ParDo.

Added support for grouping operations.

Added checkpointDir option, using it before execution.

Support Accumulator recovery from checkpoint.

Streaming tests should use JUnit's TemporaryFolder Rule for checkpoint 
directory.

Support combine optimizations.

Support durable sideInput via Broadcast.

Branches in the pipeline are either Bounded or Unbounded and should be 
handles so.

Handle flatten/union of Bouned/Unbounded RDD/DStream.

JavaDoc

Rebased on master.

Reuse functionality between batch and streaming translators

Better implementation of streaming/batch pipeline-branch translation.

Move group/combine functions to their own wrapping class.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-610) Enable spark's checkpointing mechanism for driver-failure recovery in streaming

2016-08-31 Thread Amit Sela (JIRA)
Amit Sela created BEAM-610:
--

 Summary: Enable spark's checkpointing mechanism for driver-failure 
recovery in streaming
 Key: BEAM-610
 URL: https://issues.apache.org/jira/browse/BEAM-610
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela


For streaming applications, Spark provides a checkpoint mechanism useful for 
stateful processing and driver failures. See: 
https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#checkpointing

This requires the "lambdas", or the content of DStream/RDD functions to be 
Serializable - currently, the runner a lot of the translation work in streaming 
to the batch translator, which can no longer be the case because it passes 
along non-serializables.

This also requires wrapping the creation of the streaming application's graph 
in a "getOrCreate" manner. See: 
https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#how-to-configure-checkpointing

Another limitation is the need to wrap Accumulators and Broadcast variables in 
Singletons in order for them to be re-created once stale after recovery.

This work is a prerequisite to support PerKey workflows, which will be support 
via Spark's stateful operators such as mapWithState.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-609) Add Interface around Evaluator Caching

2016-08-31 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-609:


 Summary: Add Interface around Evaluator Caching
 Key: BEAM-609
 URL: https://issues.apache.org/jira/browse/BEAM-609
 Project: Beam
  Issue Type: Improvement
  Components: runner-direct
Reporter: Thomas Groh
Assignee: Davor Bonaci
Priority: Minor


The "acquire-use-release" pattern is relatively common throughout the 
TransformEvaluators ((Un)BoundedRead, TestStream), and as a result there's some 
code duplication.

Refactoring to use a common interface (among the lines of:

public static class ConcurrentSingleUseInstanceCache {
  public ConcurrentSingleUseInstanceCache(Function createInstance) { ... }
  public @Nullable V tryAcquire(K key) { ... };
  public void release(K key, V value) { ... }
}
)

would improve this abstraction boundary and get rid of some duplicate logic. We 
can also test the cache-and-hold implementations more easily.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #778: Correct some accidental renames

2016-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/778


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Correct some accidental renames

2016-08-31 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master 98da6e8fb -> 7dcb4c72c


Correct some accidental renames

IDE over-eagerly replaced some occurrences of createAggregator with
createAggregatorForDoFn. This corrects that.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/f70aa49e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/f70aa49e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/f70aa49e

Branch: refs/heads/master
Commit: f70aa49e2babc79a65a339309776837be2a45126
Parents: 98da6e8
Author: bchambers 
Authored: Wed Aug 3 13:38:43 2016 -0700
Committer: bchambers 
Committed: Wed Aug 31 10:23:29 2016 -0700

--
 .../main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java   | 2 +-
 .../main/java/org/apache/beam/sdk/transforms/Aggregator.java | 8 
 .../main/java/org/apache/beam/sdk/transforms/DoFnTester.java | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f70aa49e/runners/core-java/src/main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java 
b/runners/core-java/src/main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java
index 04a0978..f0cfd74 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java
@@ -344,7 +344,7 @@ public abstract class DoFnRunnerBase 
implements DoFnRunner Aggregator 
createAggregatorInternal(
 String name, CombineFn combiner) {
-  checkNotNull(combiner, "Combiner passed to createAggregatorForDoFn 
cannot be null");
+  checkNotNull(combiner, "Combiner passed to createAggregatorInternal 
cannot be null");
   return aggregatorFactory.createAggregatorForDoFn(fn.getClass(), 
stepContext, name, combiner);
 }
   }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f70aa49e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Aggregator.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Aggregator.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Aggregator.java
index 67d399f..e8f6247 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Aggregator.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Aggregator.java
@@ -25,7 +25,7 @@ import org.apache.beam.sdk.util.ExecutionContext;
  * to be combined across all bundles.
  *
  * Aggregators are created by calling
- * {@link DoFn#createAggregator DoFn.createAggregatorForDoFn},
+ * {@link DoFn#createAggregator DoFn.createAggregator},
  * typically from the {@link DoFn} constructor. Elements can be added to the
  * {@code Aggregator} by calling {@link Aggregator#addValue}.
  *
@@ -41,7 +41,7 @@ import org.apache.beam.sdk.util.ExecutionContext;
  *   private Aggregator myAggregator;
  *
  *   public MyDoFn() {
- * myAggregator = createAggregatorForDoFn("myAggregator", new 
Sum.SumIntegerFn());
+ * myAggregator = createAggregator("myAggregator", new Sum.SumIntegerFn());
  *   }
  *
  *   @ProcessElement
@@ -89,9 +89,9 @@ public interface Aggregator {
   }
 
   // TODO: Consider the following additional API conveniences:
-  // - In addition to createAggregatorForDoFn(), consider adding 
getAggregator() to
+  // - In addition to createAggregator(), consider adding getAggregator() to
   //   avoid the need to store the aggregator locally in a DoFn, i.e., create
   //   if not already present.
   // - Add a shortcut for the most common aggregator:
-  //   c.createAggregatorForDoFn("name", new Sum.SumIntegerFn()).
+  //   c.createAggregator("name", new Sum.SumIntegerFn()).
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f70aa49e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
index 6801768..b867a55 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnTester.java
@@ -667,7 +667,7 @@ public class DoFnTester {
 String name, CombineFn combiner) {
   throw new 

[2/2] incubator-beam git commit: Closes #778

2016-08-31 Thread bchambers
Closes #778


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/7dcb4c72
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/7dcb4c72
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/7dcb4c72

Branch: refs/heads/master
Commit: 7dcb4c72cd7e2be0ccaa30424226a3451e903f76
Parents: 98da6e8 f70aa49
Author: bchambers 
Authored: Wed Aug 31 10:23:30 2016 -0700
Committer: bchambers 
Committed: Wed Aug 31 10:23:30 2016 -0700

--
 .../main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java   | 2 +-
 .../main/java/org/apache/beam/sdk/transforms/Aggregator.java | 8 
 .../main/java/org/apache/beam/sdk/transforms/DoFnTester.java | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)
--




[jira] [Commented] (BEAM-572) Remove Spark references in WordCount

2016-08-31 Thread Mark Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452754#comment-15452754
 ] 

Mark Liu commented on BEAM-572:
---

PR is closed. This jira can be marked as resolved.

> Remove Spark references in WordCount
> 
>
> Key: BEAM-572
> URL: https://issues.apache.org/jira/browse/BEAM-572
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Pei He
>Assignee: Mark Liu
>
> Examples should be runner agnostics.
> We don't want to have Spark references in
> https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-608) Set EvaluationContext once in the TransformEvaluatorFactory instead of forApplication

2016-08-31 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-608:


 Summary: Set EvaluationContext once in the 
TransformEvaluatorFactory instead of forApplication
 Key: BEAM-608
 URL: https://issues.apache.org/jira/browse/BEAM-608
 Project: Beam
  Issue Type: Improvement
  Components: runner-direct
Reporter: Thomas Groh
Assignee: Thomas Groh
Priority: Minor


Remove the EvaluationContext parameter from 
TransformEvaluatorFactory.forApplication() and move it to 
TransformEvaluatorRegistry.defaultRegistry(), and pass it to all of the 
TransformEvaluatorFactories created in that constructor.

A single EvaluationContext is produced per pipeline run. 
TransformEvaluatorFactories are also produced once per pipeline run, as they 
are stateful (especially with regards to sources). This change will both ensure 
that Evaluators cannot be reused across pipelines, and cleans up the signature 
of the TransformEvaluatorFactory interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #908: Fix inconsistent in formatting flink logs

2016-08-31 Thread xhumanoid
GitHub user xhumanoid opened a pull request:

https://github.com/apache/incubator-beam/pull/908

Fix inconsistent in formatting flink logs

**leaveCompositeTransform** always decrement _this.depth_, 
but **enterCompositeTransform** increment _this.depth_ only on 
ENTER_TRANSFORM

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xhumanoid/incubator-beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #908


commit cea201eaaea24d8cc1e117645d1c81f379beeb41
Author: Alexey Diomin 
Date:   2016-08-31T14:17:01Z

Fix inconsistent in formatting logs: leaveCompositeTransform always 
decrement depth, but enterCompositeTransform increment depth only on 
ENTER_TRANSFORM




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


incubator-beam git commit: Fix condition in FlinkStreamingPipelineTranslator

2016-08-31 Thread aljoscha
Repository: incubator-beam
Updated Branches:
  refs/heads/master 1dc1f25b6 -> 98da6e8fb


Fix condition in FlinkStreamingPipelineTranslator


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/98da6e8f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/98da6e8f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/98da6e8f

Branch: refs/heads/master
Commit: 98da6e8fb014d2a93b7441f6b2b131968d874ab6
Parents: 1dc1f25
Author: Aljoscha Krettek 
Authored: Wed Aug 31 13:42:30 2016 +0200
Committer: Aljoscha Krettek 
Committed: Wed Aug 31 13:42:30 2016 +0200

--
 .../flink/translation/FlinkStreamingPipelineTranslator.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/98da6e8f/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingPipelineTranslator.java
--
diff --git 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingPipelineTranslator.java
 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingPipelineTranslator.java
index b127455..284cd23 100644
--- 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingPipelineTranslator.java
+++ 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingPipelineTranslator.java
@@ -84,7 +84,7 @@ public class FlinkStreamingPipelineTranslator extends 
FlinkPipelineTranslator {
 StreamTransformTranslator translator =
 FlinkStreamingTransformTranslators.getTranslator(transform);
 
-if (translator == null && applyCanTranslate(transform, node, translator)) {
+if (translator == null || !applyCanTranslate(transform, node, translator)) 
{
   LOG.info(node.getTransform().getClass().toString());
   throw new UnsupportedOperationException(
   "The transform " + transform + " is currently not supported.");



[2/2] incubator-beam git commit: Merge branch 'flink-fixes'

2016-08-31 Thread aljoscha
Merge branch 'flink-fixes'

This closes #883


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/1dc1f25b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/1dc1f25b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/1dc1f25b

Branch: refs/heads/master
Commit: 1dc1f25b64eed7679e5d30309d5634052d187814
Parents: 33d747e cae9638
Author: Aljoscha Krettek 
Authored: Wed Aug 31 11:05:18 2016 +0200
Committer: Aljoscha Krettek 
Committed: Wed Aug 31 11:05:18 2016 +0200

--
 .../beam/runners/core/SideInputHandler.java |  6 +-
 .../apache/beam/runners/flink/FlinkRunner.java  | 86 ++--
 .../wrappers/streaming/DoFnOperator.java| 13 ++-
 .../wrappers/streaming/WindowDoFnOperator.java  |  2 -
 4 files changed, 89 insertions(+), 18 deletions(-)
--




[1/2] incubator-beam git commit: Address comments of Flink Side-Input PR

2016-08-31 Thread aljoscha
Repository: incubator-beam
Updated Branches:
  refs/heads/master 33d747efa -> 1dc1f25b6


Address comments of Flink Side-Input PR


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/cae96380
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/cae96380
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/cae96380

Branch: refs/heads/master
Commit: cae96380f15fc293d00b444148e5d08c3f14d909
Parents: 33d747e
Author: Aljoscha Krettek 
Authored: Thu Aug 25 11:00:39 2016 +0200
Committer: Aljoscha Krettek 
Committed: Wed Aug 31 11:04:51 2016 +0200

--
 .../beam/runners/core/SideInputHandler.java |  6 +-
 .../apache/beam/runners/flink/FlinkRunner.java  | 86 ++--
 .../wrappers/streaming/DoFnOperator.java| 13 ++-
 .../wrappers/streaming/WindowDoFnOperator.java  |  2 -
 4 files changed, 89 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/cae96380/runners/core-java/src/main/java/org/apache/beam/runners/core/SideInputHandler.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/SideInputHandler.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/SideInputHandler.java
index a97d3f3..851ed37 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/SideInputHandler.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/SideInputHandler.java
@@ -60,7 +60,11 @@ public class SideInputHandler implements 
ReadyCheckingSideInputReader {
   /** The list of side inputs that we're handling. */
   protected final Collection sideInputs;
 
-  /** State internals that are scoped not to the key of a value but instead to 
one key group. */
+  /**
+   * State internals that are scoped not to the key of a value but are global. 
The state can still
+   * be keep locally but if side inputs are broadcast to all parallel 
operators then all will
+   * have the same view of the state.
+   */
   private final StateInternals stateInternals;
 
   /**

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/cae96380/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
--
diff --git 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
index 8b1f42e..d3c65c0 100644
--- 
a/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
+++ 
b/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkRunner.java
@@ -25,8 +25,13 @@ import java.net.URL;
 import java.net.URLClassLoader;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.CoderRegistry;
@@ -35,6 +40,7 @@ import org.apache.beam.sdk.coders.ListCoder;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsValidator;
 import org.apache.beam.sdk.runners.PipelineRunner;
+import org.apache.beam.sdk.runners.TransformTreeNode;
 import org.apache.beam.sdk.transforms.Combine;
 import org.apache.beam.sdk.transforms.OldDoFn;
 import org.apache.beam.sdk.transforms.PTransform;
@@ -47,6 +53,8 @@ import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionView;
 import org.apache.beam.sdk.values.PInput;
 import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.PValue;
+
 import org.apache.flink.api.common.JobExecutionResult;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -108,6 +116,7 @@ public class FlinkRunner extends 
PipelineRunner {
 
   private FlinkRunner(FlinkPipelineOptions options) {
 this.options = options;
+this.ptransformViewsWithNonDeterministicKeyCoders = new HashSet<>();
 
 ImmutableMap.Builder builder = ImmutableMap.builder();
 if (options.isStreaming()) {
@@ -124,6 +133,8 @@ public class FlinkRunner extends 
PipelineRunner {
 
   @Override
   public FlinkRunnerResult run(Pipeline pipeline) {
+logWarningIfPCollectionViewHasNonDeterministicKeyCoder(pipeline);
+
 LOG.info("Executing pipeline using FlinkRunner.");
 
 FlinkPipelineExecutionEnvironment env = new 
FlinkPipelineExecutionEnvironment(options);
@@ -176,6 +187,7 @@ public class FlinkRunner extends 
PipelineRunner {
 

[jira] [Updated] (BEAM-607) Add DistributedLog IO

2016-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-607:
--
Fix Version/s: (was: 0.3.0-incubating)

> Add DistributedLog IO
> -
>
> Key: BEAM-607
> URL: https://issues.apache.org/jira/browse/BEAM-607
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Khurrum Nasim
>
> I'd like to add an IO for the new DistributedLog streams - 
> http://distributedlog.io
> - bounded source and sink (sealed streams)
> - unbounded source and sink (unsealed streams)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-607) Add DistributedLog IO

2016-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451642#comment-15451642
 ] 

Jean-Baptiste Onofré commented on BEAM-607:
---

We don't assign Jira if you are not committer. So, please provide a PR, I will 
do the review.

On the other hand, the "Fix Version" tag should be set only when the feature is 
already merged on master (and so included in next release).

Let me know if I can help you in any way !

Thanks !

> Add DistributedLog IO
> -
>
> Key: BEAM-607
> URL: https://issues.apache.org/jira/browse/BEAM-607
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Khurrum Nasim
>
> I'd like to add an IO for the new DistributedLog streams - 
> http://distributedlog.io
> - bounded source and sink (sealed streams)
> - unbounded source and sink (unsealed streams)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request #883: Address comments of Flink Side-Input PR

2016-08-31 Thread aljoscha
Github user aljoscha closed the pull request at:

https://github.com/apache/incubator-beam/pull/883


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-607) Add DistributedLog IO

2016-08-31 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-607:
--
Assignee: (was: James Malone)

> Add DistributedLog IO
> -
>
> Key: BEAM-607
> URL: https://issues.apache.org/jira/browse/BEAM-607
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Khurrum Nasim
> Fix For: 0.3.0-incubating
>
>
> I'd like to add an IO for the new DistributedLog streams - 
> http://distributedlog.io
> - bounded source and sink (sealed streams)
> - unbounded source and sink (unsealed streams)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-607) Add DistributedLog IO

2016-08-31 Thread Khurrum Nasim (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451628#comment-15451628
 ] 

Khurrum Nasim commented on BEAM-607:


I'd like to contribute. Can you assign this to me?

> Add DistributedLog IO
> -
>
> Key: BEAM-607
> URL: https://issues.apache.org/jira/browse/BEAM-607
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Khurrum Nasim
>Assignee: James Malone
> Fix For: 0.3.0-incubating
>
>
> I'd like to add an IO for the new DistributedLog streams - 
> http://distributedlog.io
> - bounded source and sink (sealed streams)
> - unbounded source and sink (unsealed streams)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-607) Add DistributedLog IO

2016-08-31 Thread Khurrum Nasim (JIRA)
Khurrum Nasim created BEAM-607:
--

 Summary: Add DistributedLog IO
 Key: BEAM-607
 URL: https://issues.apache.org/jira/browse/BEAM-607
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-extensions
Reporter: Khurrum Nasim
Assignee: James Malone
 Fix For: 0.3.0-incubating


I'd like to add an IO for the new DistributedLog streams - 
http://distributedlog.io

- bounded source and sink (sealed streams)
- unbounded source and sink (unsealed streams)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)