[jira] [Created] (BEAM-1634) Flink: register IOChannelFactories on workers

2017-03-06 Thread Davor Bonaci (JIRA)
Davor Bonaci created BEAM-1634:
--

 Summary: Flink: register IOChannelFactories on workers
 Key: BEAM-1634
 URL: https://issues.apache.org/jira/browse/BEAM-1634
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Davor Bonaci
Assignee: Aljoscha Krettek
 Fix For: 0.6.0


Runners should register IOChannelFactories on each worker before executing a
task on that worker.

This ensures that transforms like TextIO and AvroIO work with custom file
systems such as GCS, HDFS, and others.
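
For illustration, a minimal sketch of what idempotent per-worker registration could look like, assuming the worker can access the deserialized PipelineOptions. IOChannelUtils.registerIOFactoriesAllowOverride is the utility the Spark fix later in this digest calls; the WorkerIOSetup hook itself is hypothetical:

{code:java}
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.util.IOChannelUtils;

/** Hypothetical worker-side hook; safe to call at the start of every task. */
public class WorkerIOSetup {
  private static volatile boolean registered = false;

  public static void ensureIOFactoriesRegistered(PipelineOptions options) {
    if (!registered) {
      synchronized (WorkerIOSetup.class) {
        if (!registered) {
          // Registers the known factories (gs://, file://, ...) from the options.
          IOChannelUtils.registerIOFactoriesAllowOverride(options);
          registered = true;
        }
      }
    }
  }
}
{code}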



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1634) Flink: register IOChannelFactories on workers

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898854#comment-15898854
 ] 

ASF GitHub Bot commented on BEAM-1634:
--

GitHub user davorbonaci opened a pull request:

https://github.com/apache/beam/pull/2176

[BEAM-1634] Flink: register known IOChannelFactories

R: @aljoscha

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davorbonaci/beam flink-gcs-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2176


commit 971335e4d85982b697e30a6ba331eabc7b38f3a0
Author: Davor Bonaci 
Date:   2017-03-07T00:51:04Z

Flink: register known IOChannelFactories




> Flink: register IOChannelFactories on workers
> -
>
> Key: BEAM-1634
> URL: https://issues.apache.org/jira/browse/BEAM-1634
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Aljoscha Krettek
> Fix For: 0.6.0
>
>
> Runners should register IOChannelFactories on each worker before executing a
> task on that worker.
> This ensures that transforms like TextIO and AvroIO work with custom file
> systems such as GCS, HDFS, and others.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2176: [BEAM-1634] Flink: register known IOChannelFactorie...

2017-03-06 Thread davorbonaci
GitHub user davorbonaci opened a pull request:

https://github.com/apache/beam/pull/2176

[BEAM-1634] Flink: register known IOChannelFactories

R: @aljoscha

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davorbonaci/beam flink-gcs-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2176


commit 971335e4d85982b697e30a6ba331eabc7b38f3a0
Author: Davor Bonaci 
Date:   2017-03-07T00:51:04Z

Flink: register known IOChannelFactories




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2832

2017-03-06 Thread Apache Jenkins Server
See 




[2/2] beam git commit: This closes #2169

2017-03-06 Thread davor
This closes #2169


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1fd52f53
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1fd52f53
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1fd52f53

Branch: refs/heads/master
Commit: 1fd52f53c92ece265ed41bd7f2a06b0cdf6f8afd
Parents: 410534b 4dda585
Author: Davor Bonaci 
Authored: Mon Mar 6 22:37:55 2017 -0800
Committer: Davor Bonaci 
Committed: Mon Mar 6 22:37:55 2017 -0800

--
 .../spark/translation/SparkRuntimeContext.java  | 29 
 1 file changed, 23 insertions(+), 6 deletions(-)
--




[GitHub] beam pull request #2169: [BEAM-1556] Make PipelineOptions a lazy-singleton a...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2169


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1556] Make PipelineOptions a lazy-singleton and init IOs as part of it.

2017-03-06 Thread davor
Repository: beam
Updated Branches:
  refs/heads/master 410534b1f -> 1fd52f53c


[BEAM-1556] Make PipelineOptions a lazy-singleton and init IOs as part of it.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/4dda585c
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/4dda585c
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/4dda585c

Branch: refs/heads/master
Commit: 4dda585cda61a775e2d616fa5c25698f490b9cd3
Parents: 410534b
Author: Sela 
Authored: Mon Mar 6 11:17:00 2017 +0200
Committer: Davor Bonaci 
Committed: Mon Mar 6 22:37:41 2017 -0800

--
 .../spark/translation/SparkRuntimeContext.java  | 29 
 1 file changed, 23 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/4dda585c/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkRuntimeContext.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkRuntimeContext.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkRuntimeContext.java
index 9c3d79f..4ccfead 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkRuntimeContext.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkRuntimeContext.java
@@ -32,6 +32,7 @@ import org.apache.beam.sdk.coders.CoderRegistry;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.transforms.Aggregator;
 import org.apache.beam.sdk.transforms.Combine;
+import org.apache.beam.sdk.util.IOChannelUtils;
 import org.apache.spark.Accumulator;
 
 /**
@@ -40,12 +41,10 @@ import org.apache.spark.Accumulator;
  */
 public class SparkRuntimeContext implements Serializable {
   private final String serializedPipelineOptions;
+  private transient CoderRegistry coderRegistry;
 
-  /**
-   * Map fo names to Beam aggregators.
-   */
+  // map for names to Beam aggregators.
   private final Map aggregators = new HashMap<>();
-  private transient CoderRegistry coderRegistry;
 
   SparkRuntimeContext(Pipeline pipeline) {
    this.serializedPipelineOptions = serializePipelineOptions(pipeline.getOptions());
@@ -67,8 +66,8 @@ public class SparkRuntimeContext implements Serializable {
 }
   }
 
-  public synchronized PipelineOptions getPipelineOptions() {
-    return deserializePipelineOptions(serializedPipelineOptions);
+  public PipelineOptions getPipelineOptions() {
+    return PipelineOptionsHolder.getOrInit(serializedPipelineOptions);
   }
 
   /**
@@ -118,6 +117,24 @@ public class SparkRuntimeContext implements Serializable {
 return coderRegistry;
   }
 
+  private static class PipelineOptionsHolder {
+    // on executors, this should deserialize once.
+    private static transient volatile PipelineOptions pipelineOptions = null;
+
+    static PipelineOptions getOrInit(String serializedPipelineOptions) {
+      if (pipelineOptions == null) {
+        synchronized (PipelineOptionsHolder.class) {
+          if (pipelineOptions == null) {
+            pipelineOptions = deserializePipelineOptions(serializedPipelineOptions);
+          }
+        }
+        // register IO factories.
+        IOChannelUtils.registerIOFactoriesAllowOverride(pipelineOptions);
+      }
+      return pipelineOptions;
+    }
+  }
+
   /**
* Initialize spark aggregators exactly once.
*



[jira] [Commented] (BEAM-1556) Spark executors need to register IO factories

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898830#comment-15898830
 ] 

ASF GitHub Bot commented on BEAM-1556:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2169


> Spark executors need to register IO factories
> -
>
> Key: BEAM-1556
> URL: https://issues.apache.org/jira/browse/BEAM-1556
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Frances Perry
>Assignee: Amit Sela
>
> The Spark executors need to call IOChannelUtils.registerIOFactories(options) 
> in order to support GCS files and make the default WordCount example work.
> Context in this thread: 
> https://lists.apache.org/thread.html/469a139c9eb07e64e514cdea42ab8000678ab743794a090c365205d7@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1546) Specify exact version for Python in the SDK

2017-03-06 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated BEAM-1546:
-
Description: 
Python SDK currently supports Python-2.7 only.

apache_beam package's init uses named component attribute 
(sys.version_info.*major*) to bail if unsupported Python is used.
The named component based version was introduced in Python 2.7 thus
if one uses older Python version (e.g. 2.6) an AttributeError will be thrown:
{noformat}
Traceback (most recent call last):
  File "apache_beam/examples/complete/autocomplete_test.py", line 22, in 

import apache_beam as beam
  File "/Users/tiborkiss/workspace/beam/sdks/python/apache_beam/__init__.py", 
line 69, in 
if sys.version_info.major != 2:
AttributeError: 'tuple' object has no attribute 'major'
{noformat}

To fix this problem the {{sys.version_info.major}} should be replaced by 
{{sys.version_info[0]}}.


  was:
Python SDK currently supports Python2.7 only.

There are two shortcomings with the version check/enforcement:
1) apache_beam package's init uses named component attribute 
(sys.version_info.*major*) to bail if unsupported Python is used.
The named component based version was introduced in Python 2.7 thus
if one uses older Python version (e.g. 2.6) an AttributeError will be thrown:
{noformat}
Traceback (most recent call last):
  File "apache_beam/examples/complete/autocomplete_test.py", line 22, in 

import apache_beam as beam
  File "/Users/tiborkiss/workspace/beam/sdks/python/apache_beam/__init__.py", 
line 69, in 
if sys.version_info.major != 2:
AttributeError: 'tuple' object has no attribute 'major'
{noformat}

To fix this problem the {{sys.version_info.major}} should be replaced by 
{{sys.version_info[0]}}.

2) The Python interpreter & pip definition in pom.xml defines that {{python2}} 
& {{pip2}} should be used. Users with multiple Python interpreters installed 
might end up having python2 and pip2 pointing to their 2.6 installation. 
Calling out {{python2.7}} and {{pip2.7}} explicitly would help to resolve this 
problem.


> Specify exact version for Python in the SDK
> ---
>
> Key: BEAM-1546
> URL: https://issues.apache.org/jira/browse/BEAM-1546
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.6.0
>Reporter: Tibor Kiss
>Assignee: Ahmet Altay
>Priority: Trivial
> Fix For: First stable release
>
>
> Python SDK currently supports Python-2.7 only.
> apache_beam package's init uses named component attribute 
> (sys.version_info.*major*) to bail if unsupported Python is used.
> The named component based version was introduced in Python 2.7 thus
> if one uses older Python version (e.g. 2.6) an AttributeError will be thrown:
> {noformat}
> Traceback (most recent call last):
>   File "apache_beam/examples/complete/autocomplete_test.py", line 22, in 
> 
> import apache_beam as beam
>   File "/Users/tiborkiss/workspace/beam/sdks/python/apache_beam/__init__.py", 
> line 69, in 
> if sys.version_info.major != 2:
> AttributeError: 'tuple' object has no attribute 'major'
> {noformat}
> To fix this problem the {{sys.version_info.major}} should be replaced by 
> {{sys.version_info[0]}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1633) Move .tox/ directory under target/ in Python SDK

2017-03-06 Thread Tibor Kiss (JIRA)
Tibor Kiss created BEAM-1633:


 Summary: Move .tox/ directory under target/ in Python SDK
 Key: BEAM-1633
 URL: https://issues.apache.org/jira/browse/BEAM-1633
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Tibor Kiss
Assignee: Tibor Kiss
Priority: Minor


Beam uses Maven to initiate the build / test process for all components and the 
Python SDK relies on Tox to create the test environment. 

Tox's workdir is currently placed under beam/sdks/python/.tox, which is not 
located in Maven's default build output (target/) directory.
 
Due to the current directory layout, Maven does not remove Tox's workdir during
{{mvn clean}}.

Proposed fix: move .tox directory under beam/sdks/python/target directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1439) Beam Example(s) exploring public document datasets

2017-03-06 Thread SungJunyoung (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891773#comment-15891773
 ] 

SungJunyoung edited comment on BEAM-1439 at 3/7/17 4:46 AM:


Hello, I am a third-year student in computer engineering at Kyunghee University 
in Korea. I came to know this project through the GSoC list. I am very 
interested in the Apache Beam project, and I have written a simple pipeline 
from the documentation. Contributing to the project by creating examples and 
datasets that use advanced pipelines seems very interesting. If there is a 
document or a contact email address, it would be a great help to me. Thank you!

P.S. I am trying to translate the Beam documents on GitHub: 
https://github.com/sungjunyoung/apache_beam_doc_ko


was (Author: wnsdud1861):
Hello, I am a third-year student in computer engineering at Kyunghee University 
in Korea. I came to know this project through the GSoC list. I am very 
interested in the Apache Beam project, and I have written a simple pipeline 
from the documentation. Contributing to the project by creating examples and 
datasets that use advanced pipelines seems very interesting. If there is a 
document or a contact email address, it would be a great help to me. Thank you!

> Beam Example(s) exploring public document datasets
> --
>
> Key: BEAM-1439
> URL: https://issues.apache.org/jira/browse/BEAM-1439
> Project: Beam
>  Issue Type: Wish
>  Components: examples-java
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>  Labels: gsoc2017, java, mentor, python
>
> In Beam, we have examples illustrating counting the occurrences of words and 
> performing a basic TF-IDF analysis on the works of Shakespeare (or whatever 
> you point it at). It would be even cooler to do these analyses, and more, on 
> a much larger data set that is really the subject of current investigations.
> In chatting with professors at the University of Washington, I've learned 
> that scholars of many fields would really like to explore new and highly 
> customized ways of processing the growing body of publicly-available 
> scholarly documents, such as PubMed Central. Queries like "show me documents 
> where chemical compounds X and Y were both used in the 'method' section" are 
> one example.
> So I propose a Google Summer of Code project wherein a student writes some 
> large-scale Beam pipelines to perform analyses such as term frequency, bigram 
> frequency, etc.
> Skills required:
>  - Java or Python
>  - (nice to have) Working through the Beam getting started materials
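
For illustration, a hedged sketch of the kind of analysis proposed, bigram counts over a corpus, in the Beam Java API of this era (given a Pipeline p; the corpus path is a placeholder):

{code:java}
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Count bigram occurrences across a text corpus.
PCollection<KV<String, Long>> bigramCounts =
    p.apply("ReadCorpus", TextIO.Read.from("gs://<bucket>/corpus/*"))
     .apply("Bigrams", ParDo.of(new DoFn<String, String>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         // Emit each adjacent word pair in the line.
         String[] words = c.element().toLowerCase().split("[^a-z']+");
         for (int i = 0; i + 1 < words.length; i++) {
           if (!words[i].isEmpty() && !words[i + 1].isEmpty()) {
             c.output(words[i] + " " + words[i + 1]);
           }
         }
       }
     }))
     .apply(Count.<String>perElement());
{code}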



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898601#comment-15898601
 ] 

ASF GitHub Bot commented on BEAM-59:


GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/2175

[BEAM-59] Beam FileSystems: add match(), copy(), rename(), delete() 
utilities.



Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam file-system-FileSystems

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2175


commit 5508baac909acb935c6be4794c1cc4909e072517
Author: Pei He 
Date:   2017-03-02T02:58:45Z

[BEAM-59] Beam FileSystems: add match(), copy(), rename(), delete() 
utilities.




> IOChannelFactory rethinking/redesign
> 
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
> Fix For: First stable release
>
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs
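
As a rough illustration of the per-scheme pluggability the redesign calls for (users adding 's3://' or 'hdfs://' backends without touching core), a minimal registry sketch; these names are illustrative, not the API proposed in #2175:

{code:java}
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface BeamFileSystem {
  void copy(URI src, URI dst) throws Exception;
  void delete(URI path) throws Exception;
}

final class FileSystemRegistry {
  private static final Map<String, BeamFileSystem> BY_SCHEME = new ConcurrentHashMap<>();

  /** Lets users plug in backends ("s3", "hdfs", ...) without touching core. */
  static void register(String scheme, BeamFileSystem fs) {
    BY_SCHEME.put(scheme, fs);
  }

  /** Dispatches on the URI scheme instead of one global factory. */
  static BeamFileSystem forPath(URI path) {
    BeamFileSystem fs = BY_SCHEME.get(path.getScheme());
    if (fs == null) {
      throw new IllegalArgumentException("No file system registered for " + path.getScheme());
    }
    return fs;
  }
}
{code}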



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2175: [BEAM-59] Beam FileSystems: add match(), copy(), re...

2017-03-06 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/beam/pull/2175

[BEAM-59] Beam FileSystems: add match(), copy(), rename(), delete() 
utilities.



Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam file-system-FileSystems

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2175


commit 5508baac909acb935c6be4794c1cc4909e072517
Author: Pei He 
Date:   2017-03-02T02:58:45Z

[BEAM-59] Beam FileSystems: add match(), copy(), rename(), delete() 
utilities.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #2831

2017-03-06 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #2830

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898462#comment-15898462
 ] 

ASF GitHub Bot commented on BEAM-1624:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2173


> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>   at 
> org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>   ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface 
> org.apache.beam.sdk.coders.Coder
>   at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>   ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2173: [BEAM-1624] Add tests for serialization of BigQuery...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2173


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Add tests for serialization of BigQueryIO.TableRowInfoCoder

2017-03-06 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 5d120bd3f -> 410534b1f


Add tests for serialization of BigQueryIO.TableRowInfoCoder


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/57886246
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/57886246
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/57886246

Branch: refs/heads/master
Commit: 57886246a08e82a2cff0d6b95816b76a5b912595
Parents: 5d120bd
Author: Kenneth Knowles 
Authored: Mon Mar 6 16:02:54 2017 -0800
Committer: Kenneth Knowles 
Committed: Mon Mar 6 16:02:54 2017 -0800

--
 .../beam/sdk/io/gcp/bigquery/BigQueryIO.java   |  3 ++-
 .../beam/sdk/io/gcp/bigquery/BigQueryIOTest.java   | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/57886246/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index be9a786..0e1c6fc 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -2887,7 +2887,8 @@ public class BigQueryIO {
 VarIntCoder shardNumberCoder;
   }
 
-  private static class TableRowInfoCoder extends AtomicCoder<TableRowInfo> {
+  @VisibleForTesting
+  static class TableRowInfoCoder extends AtomicCoder<TableRowInfo> {
 private static final TableRowInfoCoder INSTANCE = new TableRowInfoCoder();
 
 @JsonCreator

http://git-wip-us.apache.org/repos/asf/beam/blob/57886246/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java
index fe41703..c9061a3 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOTest.java
@@ -145,6 +145,7 @@ import org.apache.beam.sdk.transforms.display.DisplayData;
 import org.apache.beam.sdk.transforms.display.DisplayDataEvaluator;
 import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
+import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
 import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn;
 import org.apache.beam.sdk.transforms.windowing.Window;
 import org.apache.beam.sdk.transforms.windowing.WindowFn;
@@ -154,6 +155,7 @@ import org.apache.beam.sdk.util.IOChannelUtils;
 import org.apache.beam.sdk.util.MimeTypes;
 import org.apache.beam.sdk.util.PCollectionViews;
 import org.apache.beam.sdk.util.Transport;
+import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.util.WindowingStrategy;
 import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PCollection;
@@ -2475,6 +2477,21 @@ public class BigQueryIOTest implements Serializable {
   }
 
   @Test
+  public void testTableRowInfoCoderSerializable() {
+    CoderProperties.coderSerializable(BigQueryIO.TableRowInfoCoder.of());
+  }
+
+  @Test
+  public void testComplexCoderSerializable() {
+    CoderProperties.coderSerializable(
+        WindowedValue.getFullCoder(
+            KvCoder.of(
+                BigQueryIO.ShardedKeyCoder.of(StringUtf8Coder.of()),
+                BigQueryIO.TableRowInfoCoder.of()),
+            IntervalWindow.getCoder()));
+  }
+
+  @Test
   public void testUniqueStepIdRead() {
 RuntimeTestOptions options = 
PipelineOptionsFactory.as(RuntimeTestOptions.class);
 BigQueryOptions bqOptions = options.as(BigQueryOptions.class);
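
CoderProperties.coderSerializable, used in the tests above, asserts that a coder survives a Java-serialization round trip, the same property SerializableUtils.ensureSerializable enforces in the BEAM-1624 stack trace. A rough sketch of such a round trip, not the CoderProperties implementation:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationCheck {
  // Serialize to bytes and back; throws if the object (e.g. a Coder) is not
  // fully serializable, which is what the pipeline translator requires.
  static <T extends Serializable> T roundTrip(T value)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      @SuppressWarnings("unchecked")
      T copy = (T) in.readObject();
      return copy;
    }
  }
}
{code}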



[2/2] beam git commit: This closes #2173: Add tests for serialization of BigQueryIO.TableRowInfoCoder

2017-03-06 Thread kenn
This closes #2173: Add tests for serialization of BigQueryIO.TableRowInfoCoder


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/410534b1
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/410534b1
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/410534b1

Branch: refs/heads/master
Commit: 410534b1fe591210ccd310bb0a0262663887d161
Parents: 5d120bd 5788624
Author: Kenneth Knowles 
Authored: Mon Mar 6 16:13:48 2017 -0800
Committer: Kenneth Knowles 
Committed: Mon Mar 6 16:13:48 2017 -0800

--
 .../beam/sdk/io/gcp/bigquery/BigQueryIO.java   |  3 ++-
 .../beam/sdk/io/gcp/bigquery/BigQueryIOTest.java   | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)
--




[jira] [Commented] (BEAM-1310) Add running integration tests for JdbcIO

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898457#comment-15898457
 ] 

ASF GitHub Bot commented on BEAM-1310:
--

GitHub user ssisk opened a pull request:

https://github.com/apache/beam/pull/2174

[BEAM-1310] updates to JdbcIO k8 scripts & data loading

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

We have started setting up the k8 cluster & postgres instances, and I had a 
few small updates as I got things working. 

R @jbonofre 
cc @jasonkuster 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ssisk/beam jdbc-it-profiles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2174


commit a59aebad8481044714e0ed8532812de263793843
Author: Stephen Sisk 
Date:   2017-03-06T23:59:31Z

Jdbc k8 & data loading: add teardown and update names/docs

commit 156018bbdc7769de6095425504d38703a8e81762
Author: Stephen Sisk 
Date:   2017-03-07T00:01:26Z

Jdbc k8 script: postgres data store only accessible inside test project




> Add running integration tests for JdbcIO
> 
>
> Key: BEAM-1310
> URL: https://issues.apache.org/jira/browse/BEAM-1310
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: 0.6.0
>
>
> Jdbc IO could use some integration tests! We'd like to have them run against 
> a real live instance of Postgres.
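
For illustration, a sketch of the kind of read such an integration test would exercise, assuming the JdbcIO API of this timeframe and a Pipeline p; host, database, credentials, and table are placeholders:

{code:java}
import java.sql.ResultSet;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Read (name, id) rows from a test Postgres instance.
PCollection<KV<String, Integer>> rows = p.apply(
    JdbcIO.<KV<String, Integer>>read()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create(
                    "org.postgresql.Driver", "jdbc:postgresql://<host>/<db>")
                .withUsername("<user>")
                .withPassword("<password>"))
        .withQuery("select name, id from BEAM_TEST")
        .withRowMapper(new JdbcIO.RowMapper<KV<String, Integer>>() {
          public KV<String, Integer> mapRow(ResultSet resultSet) throws Exception {
            return KV.of(resultSet.getString(1), resultSet.getInt(2));
          }
        })
        .withCoder(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())));
{code}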



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2174: [BEAM-1310] updates to JdbcIO k8 scripts & data loa...

2017-03-06 Thread ssisk
GitHub user ssisk opened a pull request:

https://github.com/apache/beam/pull/2174

[BEAM-1310] updates to JdbcIO k8 scripts & data loading

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

We have started setting up the k8 cluster & postgres instances, and I had a 
few small updates as I got things working. 

R @jbonofre 
cc @jasonkuster 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ssisk/beam jdbc-it-profiles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2174


commit a59aebad8481044714e0ed8532812de263793843
Author: Stephen Sisk 
Date:   2017-03-06T23:59:31Z

Jdbc k8 & data loading: add teardown and update names/docs

commit 156018bbdc7769de6095425504d38703a8e81762
Author: Stephen Sisk 
Date:   2017-03-07T00:01:26Z

Jdbc k8 script: postgres data store only accessible inside test project




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898433#comment-15898433
 ] 

Kenneth Knowles commented on BEAM-1624:
---

Couldn't repro - possible that the jar was corrupted via some 
multiple-jobs-running race condition?

> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>   at 
> org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>   ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface 
> org.apache.beam.sdk.coders.Coder
>   at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>   ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898417#comment-15898417
 ] 

ASF GitHub Bot commented on BEAM-1624:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2173

[BEAM-1624] Add tests for serialization of BigQueryIO.TableRowInfoCoder

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

I have not yet reproduced the issue in BEAM-1624 (my LeaderBoard pipeline 
seems happy), but these tests are a good idea anyhow.

R: @tgroh 
CC: @francesperry 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam BQIO-Coder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2173


commit 57886246a08e82a2cff0d6b95816b76a5b912595
Author: Kenneth Knowles 
Date:   2017-03-07T00:02:54Z

Add tests for serialization of BigQueryIO.TableRowInfoCoder




> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> 

[GitHub] beam pull request #2173: [BEAM-1624] Add tests for serialization of BigQuery...

2017-03-06 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2173

[BEAM-1624] Add tests for serialization of BigQueryIO.TableRowInfoCoder

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

I have not yet reproduced the issue in BEAM-1624 (my LeaderBoard pipeline 
seems happy), but these tests are a good idea anyhow.

R: @tgroh 
CC: @francesperry 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam BQIO-Coder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2173


commit 57886246a08e82a2cff0d6b95816b76a5b912595
Author: Kenneth Knowles 
Date:   2017-03-07T00:02:54Z

Add tests for serialization of BigQueryIO.TableRowInfoCoder




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #1429: [BEAM-970] add side-input support to gearpump-runne...

2017-03-06 Thread manuzhang
Github user manuzhang closed the pull request at:

https://github.com/apache/beam/pull/1429


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898387#comment-15898387
 ] 

Kenneth Knowles commented on BEAM-1624:
---

False; the coder is serializable. The topmost error is a red herring - the 
inability to deserialize {{Coder}} is more curious.

> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>   at 
> org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>   ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface 
> org.apache.beam.sdk.coders.Coder
>   at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>   ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam-site pull request #171: Typo fix in "Reading Data Into Your Pipeline" e...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam-site/pull/171


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam-site git commit: This closes #171

2017-03-06 Thread davor
This closes #171


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/dd1ca3fd
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/dd1ca3fd
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/dd1ca3fd

Branch: refs/heads/asf-site
Commit: dd1ca3fd3f8d51be5debd623d05cda7f393ea1b3
Parents: bbe0283 8b84da8
Author: Davor Bonaci 
Authored: Mon Mar 6 15:21:07 2017 -0800
Committer: Davor Bonaci 
Committed: Mon Mar 6 15:21:07 2017 -0800

--
 content/contribute/contribution-guide/index.html   | 6 ++
 .../documentation/pipelines/create-your-pipeline/index.html| 2 +-
 src/documentation/pipelines/create-your-pipeline.md| 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)
--




[2/3] beam-site git commit: Regenerate website

2017-03-06 Thread davor
Regenerate website


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/8b84da8a
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/8b84da8a
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/8b84da8a

Branch: refs/heads/asf-site
Commit: 8b84da8ab6f210e90fcf53c125af4ce16cb787d7
Parents: c71edc8
Author: Davor Bonaci 
Authored: Mon Mar 6 15:21:07 2017 -0800
Committer: Davor Bonaci 
Committed: Mon Mar 6 15:21:07 2017 -0800

--
 content/contribute/contribution-guide/index.html   | 6 ++
 .../documentation/pipelines/create-your-pipeline/index.html| 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/8b84da8a/content/contribute/contribution-guide/index.html
--
diff --git a/content/contribute/contribution-guide/index.html 
b/content/contribute/contribution-guide/index.html
index 5f8ec98..92e009a 100644
--- a/content/contribute/contribution-guide/index.html
+++ b/content/contribute/contribution-guide/index.html
@@ -156,6 +156,7 @@
   Engage
  Mailing list(s)
   JIRA issue tracker
+  Online discussions
 
   
   Design
@@ -254,6 +255,11 @@
 
 For moderate or large contributions, you should not start coding or writing 
a design document unless there is a corresponding JIRA issue assigned to you 
for that work. Simple changes, like fixing typos, do not require an associated 
issue.
 
+Online discussions
+We don’t have an official IRC channel. Most of the online discussions 
happen in the Apache Beam Slack channel (https://apachebeam.slack.com/). If 
you want access, you need to send an email to the user mailing list 
(u...@beam.apache.org) to request access.
+
+Chat rooms are great for quick questions or discussions on specialized 
topics. Remember that we strongly encourage communication via the mailing 
lists, and we prefer to discuss more complex subjects by email. Developers 
should be careful to move or duplicate all the official or useful discussions 
to the issue tracking system and/or the dev mailing list.
+
 Design
 
 To avoid potential frustration during the code review cycle, we encourage 
you to clearly scope and design non-trivial contributions with the Beam 
community before you start coding.

http://git-wip-us.apache.org/repos/asf/beam-site/blob/8b84da8a/content/documentation/pipelines/create-your-pipeline/index.html
--
diff --git a/content/documentation/pipelines/create-your-pipeline/index.html 
b/content/documentation/pipelines/create-your-pipeline/index.html
index c7cf3fb..e360299 100644
--- a/content/documentation/pipelines/create-your-pipeline/index.html
+++ b/content/documentation/pipelines/create-your-pipeline/index.html
@@ -272,7 +272,7 @@
The following example code shows how to apply a TextIO.Read root transform to read data from a 
text file. The transform is applied to a Pipeline object p, and returns a pipeline data set in the form 
of a PCollection<String>:

 PCollection<String> lines = p.apply(
-  apply("ReadLines", TextIO.Read.from("gs://some/inputData.txt"));
+  "ReadLines", TextIO.Read.from("gs://some/inputData.txt"));
 
 
 



[1/3] beam-site git commit: Typo fix in "Reading Data Into Your Pipeline" example.

2017-03-06 Thread davor
Repository: beam-site
Updated Branches:
  refs/heads/asf-site bbe028320 -> dd1ca3fd3


Typo fix in "Reading Data Into Your Pipeline" example.


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/c71edc8d
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/c71edc8d
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/c71edc8d

Branch: refs/heads/asf-site
Commit: c71edc8d99fa293bd32359d60134f56593571cbe
Parents: bbe0283
Author: Mairbek Khadikov 
Authored: Mon Mar 6 14:41:24 2017 -0800
Committer: Mairbek Khadikov 
Committed: Mon Mar 6 14:41:24 2017 -0800

--
 src/documentation/pipelines/create-your-pipeline.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam-site/blob/c71edc8d/src/documentation/pipelines/create-your-pipeline.md
--
diff --git a/src/documentation/pipelines/create-your-pipeline.md 
b/src/documentation/pipelines/create-your-pipeline.md
index 05b1594..120ec35 100644
--- a/src/documentation/pipelines/create-your-pipeline.md
+++ b/src/documentation/pipelines/create-your-pipeline.md
@@ -107,7 +107,7 @@ The following example code shows how to `apply` a 
`TextIO.Read` root transform t
 
 ```java
 PCollection<String> lines = p.apply(
-  apply("ReadLines", TextIO.Read.from("gs://some/inputData.txt"));
+  "ReadLines", TextIO.Read.from("gs://some/inputData.txt"));
 ```
 
 ## Applying Transforms to Process Pipeline Data
@@ -158,4 +158,4 @@ p.run().waitUntilFinish();
 
 ## What's next
 
-*   [Test your pipeline]({{ site.baseurl 
}}/documentation/pipelines/test-your-pipeline).
\ No newline at end of file
+*   [Test your pipeline]({{ site.baseurl 
}}/documentation/pipelines/test-your-pipeline).
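
Put together, the corrected snippet reads as below; the surrounding options and run() setup is illustrative:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadLines {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // The corrected form: apply() takes the name and the transform directly.
    PCollection<String> lines =
        p.apply("ReadLines", TextIO.Read.from("gs://some/inputData.txt"));

    p.run().waitUntilFinish();
  }
}
{code}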



[jira] [Updated] (BEAM-1632) Current bigtable protos include google rpc generated classes in its jar

2017-03-06 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1632:

Description: bigtable-protos includes generated classes like 
'com.google.rpc.Code' in its jar rather than depending on 
'grpc-google-common-protos'. This conflicts with 'Datastore' dependencies. I am 
not sure what the right solution is, but for now the workaround is to tag the 
usage of 'grpc-google-common-protos' as 'usedDependency' to prevent 
'maven-dependency:analyze' issues.   (was: bigtable-protos include generated 
classes like 'com.google.rpc.Code' in its jar rather than depending on 
'grpc-google-common-protos'. I am not sure what the right solution is but for 
now the workaround is to tag the usage of 'grpc-google-common-protos' as 
'usedDependency' to prevent 'maven-dependency:analyze' issues. )

> Current bigtable protos include google rpc generated classes in its jar
> ---
>
> Key: BEAM-1632
> URL: https://issues.apache.org/jira/browse/BEAM-1632
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Daniel Halperin
>Priority: Minor
>
> bigtable-protos includes generated classes like 'com.google.rpc.Code' in its 
> jar rather than depending on 'grpc-google-common-protos'. This conflicts with 
> 'Datastore' dependencies. I am not sure what the right solution is, but for 
> now the workaround is to tag the usage of 'grpc-google-common-protos' as 
> 'usedDependency' to prevent 'maven-dependency:analyze' issues. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1632) Current bigtable protos includes google rpc generated classes that conflict with Datastore dependencies

2017-03-06 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1632:
---

 Summary: Current bigtable protos includes google rpc generated 
classes that conflict with Datastore dependencies
 Key: BEAM-1632
 URL: https://issues.apache.org/jira/browse/BEAM-1632
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
Reporter: Vikas Kedigehalli
Assignee: Daniel Halperin
Priority: Minor


bigtable-protos includes generated classes like 'com.google.rpc.Code' in its jar 
rather than depending on 'grpc-google-common-protos'. I am not sure what the 
right solution is, but for now the workaround is to tag the usage of 
'grpc-google-common-protos' as 'usedDependency' to prevent 
'maven-dependency:analyze' issues. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1632) Current bigtable protos include google rpc generated classes in its jar

2017-03-06 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1632:

Summary: Current bigtable protos include google rpc generated classes in 
its jar  (was: Current bigtable protos includes google rpc generated classes 
that conflict with Datastore dependencies)

> Current bigtable protos include google rpc generated classes in its jar
> ---
>
> Key: BEAM-1632
> URL: https://issues.apache.org/jira/browse/BEAM-1632
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Daniel Halperin
>Priority: Minor
>
> bigtable-protos includes generated classes like 'com.google.rpc.Code' in its 
> jar rather than depending on 'grpc-google-common-protos'. I am not sure what 
> the right solution is, but for now the workaround is to tag the usage of 
> 'grpc-google-common-protos' as 'usedDependency' to prevent 
> 'maven-dependency:analyze' issues. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam-site pull request #171: Typo fix in "Reading Data Into Your Pipeline" e...

2017-03-06 Thread mairbek
GitHub user mairbek opened a pull request:

https://github.com/apache/beam-site/pull/171

Typo fix in "Reading Data Into Your Pipeline" example.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mairbek/beam-site fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #171


commit c71edc8d99fa293bd32359d60134f56593571cbe
Author: Mairbek Khadikov 
Date:   2017-03-06T22:41:24Z

Typo fix in "Reading Data Into Your Pipeline" example.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1631) Flink runner: submit job to a Flink-on-YARN cluster

2017-03-06 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898281#comment-15898281
 ] 

Xu Mingmin commented on BEAM-1631:
--

Do I understand correctly that I can submit a job with the Flink runner on YARN, 
either directly on YARN or by creating a virtual cluster on YARN?

Flink version: 1.2.0
YARN: 2.6/2.7
Beam: 0.6.0-SNAPSHOT

> Flink runner: submit job to a Flink-on-YARN cluster
> ---
>
> Key: BEAM-1631
> URL: https://issues.apache.org/jira/browse/BEAM-1631
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Aljoscha Krettek
>
> As far as I understand, running Beam pipelines on a Flink cluster can be done 
> in two ways:
> * Run directly with a Flink runner, and specifying {{--flinkMaster}} pipeline 
> option via, say, {{mvn exec}}.
> * Produce a bundled JAR, and use {{bin/flink}} to submit the same pipeline.
> These two ways are equivalent, and work well on a standalone Flink cluster.
> Submitting to a Flink-on-YARN is more complicated. You can still produce a 
> bundled JAR, and use {{bin/flink -yid <applicationId>}} to submit such a job. 
> However, that seems impossible with a Flink runner directly.
> If so, we should add the ability to the Flink runner to submit a job to a 
> Flink-on-YARN cluster directly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1616) Gauge Metric type

2017-03-06 Thread Ben Chambers (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898267#comment-15898267
 ] 

Ben Chambers commented on BEAM-1616:


I believe the other metrics have solved/documented this:

Attempted = "Aggregate across all attempts at bundles"
Committed = "Aggregate across all committed attempts at bundles"

The point is that the general use case of a Gauge as an "I sampled this value at 
this point in time" doesn't fit that model of aggregation.

So maybe it is enough to say that for a gauge, the attempted = committed = the 
latest value observed?
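
To make that concrete, here is a minimal sketch of gauge semantics where attempted == committed == latest observed value. All names here are hypothetical; no such Gauge type exists in the SDK yet:

{code}
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch only: a gauge cell that keeps the most recent sample,
// so "aggregation" across bundle attempts reduces to keeping the latest value.
public class GaugeCell {
  public static final class GaugeData {
    final long value;
    final Instant timestamp;
    GaugeData(long value, Instant timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  private final AtomicReference<GaugeData> latest =
      new AtomicReference<>(new GaugeData(0L, Instant.MIN));

  // Record a sample; a later sample simply replaces an earlier one.
  public void set(long newValue) {
    latest.set(new GaugeData(newValue, Instant.now()));
  }

  // Merging two gauges keeps whichever sample is more recent.
  public GaugeData merge(GaugeData other) {
    GaugeData current = latest.get();
    return current.timestamp.isAfter(other.timestamp) ? current : other;
  }
}
{code}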

> Gauge Metric type
> -
>
> Key: BEAM-1616
> URL: https://issues.apache.org/jira/browse/BEAM-1616
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core, sdk-py
>Reporter: Aviem Zur
>Assignee: Ben Chambers
>
> Add support for Gauge metric type to the SDK.
> This will serve to get the last value reported.
> Interface should be along the lines of:
> {code}
> void set(long value);
> {code}
> Compare to 
> http://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Gauge.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2829

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898227#comment-15898227
 ] 

Kenneth Knowles commented on BEAM-1624:
---

My guess is wrong: this is actually a legitimate code path to make sure the 
coder is deserializable via its JSON. So it is likely an erroneous coder in 
BigQueryIO missing some Jackson annotations.
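
For context, the JSON round-trip in question relies on Jackson hooks on the coder class; a sketch of the shape that is expected (illustrative only, not the actual BigQueryIO source - omitting the annotation is exactly the kind of bug suspected above):

{code}
import com.fasterxml.jackson.annotation.JsonCreator;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch: a stateless coder with a @JsonCreator factory, so that
// Serializer.deserialize can re-instantiate it from its JSON cloud encoding.
public class TableRowInfoCoder extends AtomicCoder<TableRowInfo> {
  private static final TableRowInfoCoder INSTANCE = new TableRowInfoCoder();

  @JsonCreator
  public static TableRowInfoCoder of() {
    return INSTANCE;
  }

  private TableRowInfoCoder() {}

  @Override
  public void encode(TableRowInfo value, OutputStream outStream, Context context)
      throws IOException {
    // ... write the value's fields ...
  }

  @Override
  public TableRowInfo decode(InputStream inStream, Context context) throws IOException {
    // ... read the value's fields ...
    return null;
  }
}
{code}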

> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>   at 
> org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>   ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface 
> org.apache.beam.sdk.coders.Coder
>   at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>   ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1631) Flink runner: submit job to a Flink-on-YARN cluster

2017-03-06 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898205#comment-15898205
 ] 

Davor Bonaci commented on BEAM-1631:


[~aljoscha], is this analysis correct? Am I off-base here?

If this sounds reasonable, how hard would it be to add it?

> Flink runner: submit job to a Flink-on-YARN cluster
> ---
>
> Key: BEAM-1631
> URL: https://issues.apache.org/jira/browse/BEAM-1631
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Aljoscha Krettek
>
> As far as I understand, running Beam pipelines on a Flink cluster can be done 
> in two ways:
> * Run directly with a Flink runner, and specifying {{--flinkMaster}} pipeline 
> option via, say, {{mvn exec}}.
> * Produce a bundled JAR, and use {{bin/flink}} to submit the same pipeline.
> These two ways are equivalent, and work well on a standalone Flink cluster.
> Submitting to a Flink-on-YARN is more complicated. You can still produce a 
> bundled JAR, and use {{bin/flink -yid <applicationId>}} to submit such a job. 
> However, that seems impossible with a Flink runner directly.
> If so, we should add the ability to the Flink runner to submit a job to a 
> Flink-on-YARN cluster directly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1631) Flink runner: submit job to a Flink-on-YARN cluster

2017-03-06 Thread Davor Bonaci (JIRA)
Davor Bonaci created BEAM-1631:
--

 Summary: Flink runner: submit job to a Flink-on-YARN cluster
 Key: BEAM-1631
 URL: https://issues.apache.org/jira/browse/BEAM-1631
 Project: Beam
  Issue Type: New Feature
  Components: runner-flink
Reporter: Davor Bonaci
Assignee: Aljoscha Krettek


As far as I understand, running Beam pipelines on a Flink cluster can be done 
in two ways:
* Run directly with a Flink runner, and specifying {{--flinkMaster}} pipeline 
option via, say, {{mvn exec}}.
* Produce a bundled JAR, and use {{bin/flink}} to submit the same pipeline.

These two ways are equivalent, and work well on a standalone Flink cluster.

Submitting to a Flink-on-YARN is more complicated. You can still produce a 
bundled JAR, and use {{bin/flink -yid <applicationId>}} to submit such a job. 
However, that seems impossible with a Flink runner directly.

If so, we should add the ability to the Flink runner to submit a job to a 
Flink-on-YARN cluster directly.
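
For reference, the runner-direct path mentioned above looks roughly like this against a standalone cluster (a sketch; the missing piece is an equivalent way to point at a YARN session):

{code}
// Direct submission with the Flink runner to a standalone cluster.
// For Flink-on-YARN there is no fixed JobManager address to put here,
// which is the gap this issue describes.
FlinkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
options.setRunner(FlinkRunner.class);
options.setFlinkMaster("jobmanager-host:6123"); // standalone JobManager address

Pipeline p = Pipeline.create(options);
// ... apply transforms ...
p.run();
{code}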



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Dataflow #2482

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-68) Support for limiting parallelism of a step

2017-03-06 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898151#comment-15898151
 ] 

Xu Mingmin commented on BEAM-68:


This is required by some runners. With this parameter, runners like Flink/Storm 
can leverage it, while those like Dataflow can ignore it.
I'm not sure about the existing implementation in the Flink runner; it seems to 
be set at the job level, meaning the same parallelism for every step.

FYI Flink parallel 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html 
Storm parallel 
http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html
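
For the sink case specifically, the existing per-step knob is the shard count on the write (a sketch against the current Java API); a general per-step parallelism hint, as in the Flink/Storm links above, has no model-level equivalent yet:

{code}
// Caps output parallelism at 10 shards, at the cost of an implicit
// GroupByKey inside the write.
lines.apply(TextIO.Write.to("gs://my-bucket/output").withNumShards(10));
{code}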

> Support for limiting parallelism of a step
> --
>
> Key: BEAM-68
> URL: https://issues.apache.org/jira/browse/BEAM-68
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Daniel Halperin
>
> Users may want to limit the parallelism of a step. Two classic uses cases are:
> - User wants to produce at most k files, so sets 
> TextIO.Write.withNumShards(k).
> - External API only supports k QPS, so user sets a limit of k/(expected 
> QPS/step) on the ParDo that makes the API call.
> Unfortunately, there is no way to do this effectively within the Beam model. 
> A GroupByKey with exactly k keys will guarantee that only k elements are 
> produced, but runners are free to break fusion in ways such that each element 
> may be processed in parallel later.
> To implement this functionality, I believe we need to add this support to the 
> Beam Model.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2172: Revert "DataflowRunner: experimental support for is...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2172


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Revert "DataflowRunner: experimental support for issuing FnAPI based jobs"

2017-03-06 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 626bc38aa -> 5d120bd3f


Revert "DataflowRunner: experimental support for issuing FnAPI based jobs"

This reverts commit 131c9f916dae6345ec77a869112ae5901b568f23.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b31be1ca
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b31be1ca
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b31be1ca

Branch: refs/heads/master
Commit: b31be1caef6a691006082e3e440d0b42ff1d4165
Parents: 2f96bc3
Author: Kenneth Knowles 
Authored: Sun Mar 5 20:03:18 2017 -0800
Committer: Kenneth Knowles 
Committed: Sun Mar 5 20:03:18 2017 -0800

--
 runners/google-cloud-dataflow-java/pom.xml  |  3 +-
 .../dataflow/DataflowPipelineTranslator.java|  3 +-
 .../beam/runners/dataflow/DataflowRunner.java   | 46 
 .../runners/dataflow/DataflowRunnerInfo.java| 38 
 .../options/DataflowPipelineDebugOptions.java   |  2 -
 .../DataflowPipelineWorkerPoolOptions.java  | 10 ++---
 .../beam/runners/dataflow/dataflow.properties   |  8 ++--
 .../dataflow/DataflowRunnerInfoTest.java| 23 +-
 .../runners/dataflow/DataflowRunnerTest.java| 17 
 9 files changed, 58 insertions(+), 92 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b31be1ca/runners/google-cloud-dataflow-java/pom.xml
--
diff --git a/runners/google-cloud-dataflow-java/pom.xml 
b/runners/google-cloud-dataflow-java/pom.xml
index fb06797..fdd088f 100644
--- a/runners/google-cloud-dataflow-java/pom.xml
+++ b/runners/google-cloud-dataflow-java/pom.xml
@@ -34,8 +34,7 @@
 
   
 
beam-master-20170228
-
1
-
6
+6
   
 
   

http://git-wip-us.apache.org/repos/asf/beam/blob/b31be1ca/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
index ab4cb9c..7a78a4c 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
@@ -326,7 +326,8 @@ public class DataflowPipelineTranslator {
   workerPool.setNumWorkers(options.getNumWorkers());
 
   if (options.isStreaming()
-  && !DataflowRunner.hasExperiment(options, 
"enable_windmill_service")) {
+  && (options.getExperiments() == null
+  || 
!options.getExperiments().contains("enable_windmill_service"))) {
 // Use separate data disk for streaming.
 Disk disk = new Disk();
 disk.setDiskType(options.getWorkerDiskType());
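
For readers of the hunk above: the {{hasExperiment}} helper removed by this revert is equivalent to the null-safe check it is replaced with. A rough reconstruction from the inlined form (the parameter type is a guess, not the exact source):

{code}
// Sketch reconstructed from the inlined check in this diff.
public static boolean hasExperiment(DataflowPipelineDebugOptions options, String experiment) {
  List<String> experiments = options.getExperiments();
  return experiments != null && experiments.contains(experiment);
}
{code}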

http://git-wip-us.apache.org/repos/asf/beam/blob/b31be1ca/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
index c609b54..15147f1 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
@@ -51,6 +51,7 @@ import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
@@ -303,12 +304,14 @@ public class DataflowRunner extends 
PipelineRunner<DataflowPipelineJob> {
 PTransformMatchers.parDoWithFnType(unsupported),
 
UnsupportedOverrideFactory.withMessage(getUnsupportedMessage(unsupported, 
true)));
   }
-  if (!hasExperiment(options, "enable_custom_pubsub_source")) {
+  if (options.getExperiments() == null
+  || 
!options.getExperiments().contains("enable_custom_pubsub_source")) {
 ptoverrides.put(
 PTransformMatchers.classEqualTo(PubsubUnboundedSource.class),
 new ReflectiveRootOverrideFactory(StreamingPubsubIORead.class, 
this));
   }
-  if (!hasExperiment(options, "enable_custom_pubsub_sink")) {
+  if (options.getExperiments() == null
+  || 

[2/2] beam git commit: This closes #2172: Revert "DataflowRunner: experimental support for issuing FnAPI based jobs"

2017-03-06 Thread kenn
This closes #2172: Revert "DataflowRunner: experimental support for issuing 
FnAPI based jobs"


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5d120bd3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5d120bd3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5d120bd3

Branch: refs/heads/master
Commit: 5d120bd3f02214e7e88456ec735cd499be307ff4
Parents: 626bc38 b31be1c
Author: Kenneth Knowles 
Authored: Mon Mar 6 12:49:20 2017 -0800
Committer: Kenneth Knowles 
Committed: Mon Mar 6 12:49:20 2017 -0800

--
 runners/google-cloud-dataflow-java/pom.xml  |  3 +-
 .../dataflow/DataflowPipelineTranslator.java|  3 +-
 .../beam/runners/dataflow/DataflowRunner.java   | 46 
 .../runners/dataflow/DataflowRunnerInfo.java| 38 
 .../options/DataflowPipelineDebugOptions.java   |  2 -
 .../DataflowPipelineWorkerPoolOptions.java  | 10 ++---
 .../beam/runners/dataflow/dataflow.properties   |  8 ++--
 .../dataflow/DataflowRunnerInfoTest.java| 23 +-
 .../runners/dataflow/DataflowRunnerTest.java| 17 
 9 files changed, 58 insertions(+), 92 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/5d120bd3/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
--



[GitHub] beam pull request #2148: Fix tox warning for non-whitelisted find command

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2148


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Fix tox warning for non-whitelisted find command

2017-03-06 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 9cc8018b3 -> 626bc38aa


Fix tox warning for non-whitelisted find command


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a4e8bef4
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a4e8bef4
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a4e8bef4

Branch: refs/heads/master
Commit: a4e8bef4fee90ef4107282f1fcb14110c8fc38be
Parents: 9cc8018
Author: Ahmet Altay 
Authored: Thu Mar 2 16:15:38 2017 -0800
Committer: Ahmet Altay 
Committed: Mon Mar 6 11:57:01 2017 -0800

--
 sdks/python/tox.ini | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a4e8bef4/sdks/python/tox.ini
--
diff --git a/sdks/python/tox.ini b/sdks/python/tox.ini
index 927c211..3e4f2f2 100644
--- a/sdks/python/tox.ini
+++ b/sdks/python/tox.ini
@@ -44,6 +44,7 @@ platform = linux2
 deps =
   nose
   cython
+whitelist_externals=find
 commands =
   python --version
   pip install -e .[test]



[2/2] beam git commit: This closes #2148

2017-03-06 Thread altay
This closes #2148


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/626bc38a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/626bc38a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/626bc38a

Branch: refs/heads/master
Commit: 626bc38aa6f5583f0a813761afbb753c1766d178
Parents: 9cc8018 a4e8bef
Author: Ahmet Altay 
Authored: Mon Mar 6 11:57:06 2017 -0800
Committer: Ahmet Altay 
Committed: Mon Mar 6 11:57:06 2017 -0800

--
 sdks/python/tox.ini | 1 +
 1 file changed, 1 insertion(+)
--




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #2481

2017-03-06 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1153

2017-03-06 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2172: Revert "DataflowRunner: experimental support for is...

2017-03-06 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2172

Revert "DataflowRunner: experimental support for issuing FnAPI based jobs

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This reverts commit 131c9f916dae6345ec77a869112ae5901b568f23.

This actually doesn't have coverage in Jenkins, and we've discovered 
breakage. It should be easy to patch up and roll forward in a bit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam revert-fn-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2172


commit b31be1caef6a691006082e3e440d0b42ff1d4165
Author: Kenneth Knowles 
Date:   2017-03-06T04:03:18Z

Revert "DataflowRunner: experimental support for issuing FnAPI based jobs"

This reverts commit 131c9f916dae6345ec77a869112ae5901b568f23.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-632) Dataflow runner does not correctly flatten duplicate inputs

2017-03-06 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-632:
-
Priority: Critical  (was: Major)

> Dataflow runner does not correctly flatten duplicate inputs
> ---
>
> Key: BEAM-632
> URL: https://issues.apache.org/jira/browse/BEAM-632
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Kenneth Knowles
>Priority: Critical
>
> https://github.com/apache/incubator-beam/pull/960
> Builds #1148+ are failing the new test that [~tgroh] added in that PR.
> https://builds.apache.org/job/beam_PostCommit_RunnableOnService_GoogleCloudDataflow/changes



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-632) Dataflow runner does not correctly flatten duplicate inputs

2017-03-06 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-632:


Assignee: Kenneth Knowles

> Dataflow runner does not correctly flatten duplicate inputs
> ---
>
> Key: BEAM-632
> URL: https://issues.apache.org/jira/browse/BEAM-632
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Kenneth Knowles
>
> https://github.com/apache/incubator-beam/pull/960
> Builds #1148+ are failing the new test that [~tgroh] added in that PR.
> https://builds.apache.org/job/beam_PostCommit_RunnableOnService_GoogleCloudDataflow/changes



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-632) Dataflow runner does not correctly flatten duplicate inputs

2017-03-06 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897856#comment-15897856
 ] 

Kenneth Knowles commented on BEAM-632:
--

Bumping this because the test is wholly suppressed, so we've missed other 
issues. We need to surgically isolate the tests that don't pass and then we can 
proceed with workarounds in the translation.
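
For anyone reproducing this, the failing shape is a Flatten whose input list contains the same PCollection more than once; a minimal sketch:

{code}
// Each element of "words" should appear twice in "doubled"; the bug is that
// the duplicate input is not translated correctly.
PCollection<String> words = p.apply(Create.of("a", "b", "c"));
PCollection<String> doubled =
    PCollectionList.of(words).and(words)
        .apply(Flatten.<String>pCollections());
{code}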

> Dataflow runner does not correctly flatten duplicate inputs
> ---
>
> Key: BEAM-632
> URL: https://issues.apache.org/jira/browse/BEAM-632
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Kenneth Knowles
>Priority: Critical
>
> https://github.com/apache/incubator-beam/pull/960
> Builds #1148+ are failing the new test that [~tgroh] added in that PR.
> https://builds.apache.org/job/beam_PostCommit_RunnableOnService_GoogleCloudDataflow/changes



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1629) Make metrics/aggregators accumulators available on driver

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897853#comment-15897853
 ] 

ASF GitHub Bot commented on BEAM-1629:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/beam/pull/2171

[BEAM-1629] Make metrics/aggregators accumulators available on driver

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam 
fix-accumulator-singleton-instantiation-race

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2171


commit 5c4cb5ee153fd7ca069d90d3ac558bceaf82678c
Author: Aviem Zur 
Date:   2017-03-06T18:48:48Z

[BEAM-1629] Make metrics/aggregators accumulators available on driver




> Make metrics/aggregators accumulators available on driver
> -
>
> Key: BEAM-1629
> URL: https://issues.apache.org/jira/browse/BEAM-1629
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Today, aggregators and metrics accumulators are instantiated after the pipeline 
> is traversed, so if code that runs on the driver accesses these singletons but 
> is not part of a closure, a {{NullPointerException}} is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1152

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-1629) Make metrics/aggregators accumulators available on driver

2017-03-06 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1629:

Description: Today, aggregators and metrics accumulators are instantiated 
after the pipeline is traversed, so if code that runs on the driver accesses 
these singletons but is not part of a closure, a {{NullPointerException}} is thrown.  
(was: Today aggregators and metrics accumulators are instantiated after 
pipeline is traversed, and so, if transform translations access these 
singletons in code which runs on the driver, but is not part of a closure, a 
{{NullPointerException}} is thrown.)
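
A minimal sketch of the direction this implies (illustrative only, not the Spark runner's actual classes): create the accumulator eagerly on the driver when the runner starts, before traversal, instead of lazily inside a closure.

{code}
// Illustrative sketch: eager, idempotent driver-side initialization so that
// driver code outside closures never observes a null accumulator singleton.
public final class AggregatorsAccumulatorHolder {
  private static volatile Accumulator<NamedAggregators> instance;

  public static void init(JavaSparkContext jsc) {
    if (instance == null) {
      synchronized (AggregatorsAccumulatorHolder.class) {
        if (instance == null) {
          instance = jsc.accumulator(new NamedAggregators(), new AggAccumParam());
        }
      }
    }
  }

  public static Accumulator<NamedAggregators> get() {
    if (instance == null) {
      throw new IllegalStateException("init() must run on the driver before access");
    }
    return instance;
  }
}
{code}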

> Make metrics/aggregators accumulators available on driver
> -
>
> Key: BEAM-1629
> URL: https://issues.apache.org/jira/browse/BEAM-1629
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Today, aggregators and metrics accumulators are instantiated after the pipeline 
> is traversed, so if code that runs on the driver accesses these singletons but 
> is not part of a closure, a {{NullPointerException}} is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897824#comment-15897824
 ] 

Raghu Angadi commented on BEAM-1573:


The new API ({{withKeySerializer()}} & {{withValueSerializer()}}) sounds good. We can 
mark the old API as deprecated and also provide a Coder-based Kafka Serializer and 
Deserializer in case users still want to use the coders (say, for transition).

Implementation-wise, note that the Kafka deserializer would run on the Kafka 
consumer thread, which is outside the normal 'advance()' (invoked by the runner 
on the reader). That implies we need to propagate deserialization errors 
appropriately and throw them in advance(). Alternately, we could invoke the 
deserializer explicitly inside advance() rather than on the 'consumer poll 
thread'; I'm not sure if there are any drawbacks to that.

[~peay] A PR will be useful.
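
A sketch of the propagation idea (names are hypothetical, not KafkaIO's actual fields): the consumer poll thread records the first failure, and the reader rethrows it on the next advance() call.

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: surface a failure from the Kafka consumer poll thread
// into the reader's advance(), where the runner expects exceptions.
class KafkaReaderSketch {
  private final AtomicReference<Exception> consumerError = new AtomicReference<>();

  // Runs on the consumer poll thread.
  void pollLoopIteration() {
    try {
      // consumer.poll(...); deserialize; enqueue records ...
    } catch (Exception e) {
      consumerError.compareAndSet(null, e); // remember the first failure
    }
  }

  // Runs on the reader, invoked by the runner.
  boolean advance() throws IOException {
    Exception e = consumerError.get();
    if (e != null) {
      throw new IOException("Kafka consumer poll thread failed", e);
    }
    // ... dequeue the next record and return whether one was available ...
    return false;
  }
}
{code}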

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow using 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of unnecessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is inelegant and error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896948#comment-15896948
 ] 

Raghu Angadi edited comment on BEAM-1573 at 3/6/17 6:40 PM:


@peay, 
There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to use of custom Kafka deserializers & serializers
 ** This is very feasible now, including the case when you are reading from 
multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka 
consumer.
  ** We might end up doing this, not exactly how you proposed, but rather 
replacing coders with Kafka (de)serializers. There is no need to include both I 
think. 
  ** There is a discussion on Beam mailing lists about removing use of 
coders directly in sources and other places and that might be right time to add 
this support. (cc [~jkff])

Are you more interested 1 or 2? 

One way to use any Kafka serializer (for (1)): 
{code}
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords = // Note that KafkaRecord includes topic name, partition etc.
 pipeline
    .apply(KafkaIO.read()
        .withBootstrapServers("broker_1:9092,broker_2:9092")
        .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {

   private final Map<String, ?> config = // config
   private transient Deserializer<MyAvroRecord> kafkaDeserializer;

   @Setup
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
      kafkaDeserializer.configure(config, false); // kafka config (serializable map); false = value deserializer
   }

   @ProcessElement
   public void processElement(ProcessContext context) {
      MyAvroRecord record =
          kafkaDeserializer.deserialize(context.element().getTopic(), context.element().getValue());
      context.output(record);
   }

   @TearDown
   public void tearDown() {
      kafkaDeserializer.close();
   }
}));
{code}

   


was (Author: rangadi):
@peay, 
There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to use of custom Kafka deserializers & serializers
* This is very feasible now, including the case when you are reading from 
multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka 
consumer.
 * We might end up doing this, not exactly how you proposed, but rather 
replacing coders with Kafka (de)serializers. There is no need to include both I 
think. 
 * There is a discussion on Beam mailing lists about removing use of coders 
directly in sources and other places and that might be right time to add this 
support. (cc [~jkff])

Are you more interested 1 or 2? 

One way to use any Kafka serializer (for (1)): 
{code}
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords = // Note that KafkaRecord includes topic name, partition etc.
 pipeline
    .apply(KafkaIO.read()
        .withBootstrapServers("broker_1:9092,broker_2:9092")
        .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {

   private final Map<String, ?> config = // config
   private transient Deserializer<MyAvroRecord> kafkaDeserializer;

   @Setup
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
      kafkaDeserializer.configure(config, false); // kafka config (serializable map); false = value deserializer
   }

   @ProcessElement
   public void processElement(ProcessContext context) {
      MyAvroRecord record =
          kafkaDeserializer.deserialize(context.element().getTopic(), context.element().getValue());
      context.output(record);
   }

   @TearDown
   public void tearDown() {
      kafkaDeserializer.close();
   }
}));
{code}

   

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow using 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of unnecessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> 

[jira] [Comment Edited] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896948#comment-15896948
 ] 

Raghu Angadi edited comment on BEAM-1573 at 3/6/17 6:40 PM:


@peay, 
There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to make use of custom Kafka deserializers & serializers
 ** This is very feasible now, including the case when you are reading from 
multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka 
consumer.
  ** We might end up doing this, not exactly how you proposed, but rather 
replacing coders with Kafka (de)serializers. There is no need to include both, I 
think. 
  ** There is a discussion on the Beam mailing lists about removing the use of 
coders directly in sources and other places, and that might be the right time to 
add this support. (cc [~jkff])

Are you more interested in 1 or 2? 

One way to use any Kafka serializer (for (1)): 
{code}
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords = // Note that KafkaRecord includes topic name, partition etc.
 pipeline
    .apply(KafkaIO.read()
        .withBootstrapServers("broker_1:9092,broker_2:9092")
        .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {

   private final Map<String, ?> config = // config
   private transient Deserializer<MyAvroRecord> kafkaDeserializer;

   @Setup
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
      kafkaDeserializer.configure(config, false); // kafka config (serializable map); false = value deserializer
   }

   @ProcessElement
   public void processElement(ProcessContext context) {
      MyAvroRecord record =
          kafkaDeserializer.deserialize(context.element().getTopic(), context.element().getValue());
      context.output(record);
   }

   @TearDown
   public void tearDown() {
      kafkaDeserializer.close();
   }
}));
{code}

   


was (Author: rangadi):
@peay, 
There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to use of custom Kafka deserializers & serializers
 ** This is very feasible now, including the case when you are reading from 
multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka 
consumer.
  ** We might end up doing this, not exactly how you proposed, but rather 
replacing coders with Kafka (de)serializers. There is no need to include both I 
think. 
  ** There is a discussion on Beam mailing lists about removing use of 
coders directly in sources and other places and that might be right time to add 
this support. (cc [~jkff])

Are you more interested 1 or 2? 

One way to use any Kafka serializer (for (1)): 
{code}
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords = // Note that KafkaRecord includes topic name, partition etc.
 pipeline
    .apply(KafkaIO.read()
        .withBootstrapServers("broker_1:9092,broker_2:9092")
        .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {

   private final Map<String, ?> config = // config
   private transient Deserializer<MyAvroRecord> kafkaDeserializer;

   @Setup
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
      kafkaDeserializer.configure(config, false); // kafka config (serializable map); false = value deserializer
   }

   @ProcessElement
   public void processElement(ProcessContext context) {
      MyAvroRecord record =
          kafkaDeserializer.deserialize(context.element().getTopic(), context.element().getValue());
      context.output(record);
   }

   @TearDown
   public void tearDown() {
      kafkaDeserializer.close();
   }
}));
{code}

   

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow using 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of unnecessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to 

[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897813#comment-15897813
 ] 

Kenneth Knowles commented on BEAM-1624:
---

Most likely the coder is getting caught in an inner class that is getting 
serialized. I'll take this.

> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>   at 
> org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>   ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface 
> org.apache.beam.sdk.coders.Coder
>   at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>   ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1624) Unable to deserialize Coder in DataflowRunner

2017-03-06 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1624:
--

Assignee: Kenneth Knowles  (was: Davor Bonaci)

> Unable to deserialize Coder in DataflowRunner
> -
>
> Key: BEAM-1624
> URL: https://issues.apache.org/jira/browse/BEAM-1624
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.6.0
>
>
> To repro, sync to head and run the LeaderBoard example with the Dataflow 
> runner
> Does not repro in 0.5.
> Caused by: java.lang.RuntimeException: Unable to deserialize Coder: 
> WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder).
>  Check that a suitable constructor is defined.  See Coder for details.
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363)
>   at 
> org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505)
>   at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210)
>   at 
> org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340)
>   ... 6 more
> Caused by: java.lang.RuntimeException: Unable to deserialize class interface 
> org.apache.beam.sdk.coders.Coder
>   at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102)
>   at 
> org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112)
>   ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1589#comment-1589
 ] 

Eugene Kirpichov commented on BEAM-1573:


Yes, this would be a backward-incompatible change. We are sort of fine with 
making such changes if there is a good reason and it happens before we declare 
the Beam API stable (the first 1.x release), and this would be one of a family of 
changes bringing Beam into accordance with its own style guide - which we 
consider a necessary evil. We already removed coders from TextIO as part of that.

So yes, please feel free to proceed, and I'll be happy to review the PR.

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper class 
> that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent with the 
> rest of the system. However, is there a reason to completely disallow using 
> custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for instance, 
> which requires custom serializers. One can write a `Coder` that wraps a 
> custom Kafka serializer, but that means two levels of unnecessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting in the argument to 
> `KafkaIO` and when building the wrapper, which is inelegant and error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897752#comment-15897752
 ] 

peay edited comment on BEAM-1573 at 3/6/17 6:15 PM:


[~rangadi] OK, that makes sense. I was hoping to keep everything as a 
single step, for instance to be able to leverage {{withTimestampFn}}, but I'll 
go with (1) for now.

If the long-term plan is to remove the use of coders in read/write and allow 
passing in Kafka serializers directly, that was my original point, so all the 
better. I am happy to work on a PR for that if you want me to. I think 
{{KafkaIO}} can still provide a typed reader/writer with 
{{withCustomKafkaValueSerializer}}-like methods, to avoid the extraneous ParDo 
and having to call a utility to get something other than `byte[]`, which I 
assume is often going to be the case.

The main issue I see is that removing {{withValueCoder}} and so on will break 
API compatibility; I'm not sure what the project's policy is on that, [~jkff]? A 
deprecation phase of a couple of releases, and then breaking changes?
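
For concreteness, the API surface being discussed might look like this (method names are hypothetical until a PR lands):

{code}
// Hypothetical future KafkaIO surface: Kafka deserializers passed directly,
// replacing the current withKeyCoder/withValueCoder.
KafkaIO.<String, MyAvroRecord>read()
    .withBootstrapServers("broker_1:9092,broker_2:9092")
    .withTopics(ImmutableList.of("topic_a"))
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(MyAvroDeserializer.class);
{code}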


was (Author: peay):
[~rangadi] OK, that makes sense. I was hoping to keep everything as a 
single step, for instance to be able to leverage {code}withTimestampFn{code}, 
but I'll go with (1) for now.

If the long-term plan is to remove the use of coders in read/write and allow 
passing in Kafka serializers directly, that was my original point, so all the 
better. I am happy to work on a PR for that if you want me to. I think 
{code}KafkaIO{code} can still provide a typed reader/writer with 
{code}withCustomKafkaValueSerializer{code}-like methods, to avoid the 
extraneous ParDo and having to call a utility to get something other than 
`byte[]`, which I assume is often going to be the case.

The main issue I see is that removing {code}withValueCoder{code} and so on will 
break API compatibility; I'm not sure what the project's policy is on that, 
[~jkff]? A deprecation phase of a couple of releases, and then breaking changes?

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper 
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent 
> with the rest of the system. However, is there a reason to completely 
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for 
> instance, which requires custom serializers. One can write a `Coder` that 
> wraps a custom Kafka serializer, but that means two levels of unnecessary 
> wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting both in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and is 
> error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread peay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

peay updated BEAM-1573:
---
Issue Type: Improvement  (was: Bug)

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper 
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent 
> with the rest of the system. However, is there a reason to completely 
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for 
> instance, which requires custom serializers. One can write a `Coder` that 
> wraps a custom Kafka serializer, but that means two levels of unnecessary 
> wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting both in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and is 
> error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread peay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897752#comment-15897752
 ] 

peay commented on BEAM-1573:


[~rangadi] OK, that makes sense. I was hoping to keep everything as a single 
step, for instance to be able to leverage {code}withTimestampFn{code}, but 
I'll go with (1) for now.

If the long-term plan is to remove the use of coders in read/write and allow 
passing Kafka serializers directly, this was my original point, so all the 
better. I am happy to work on a PR for that if you want me to. I think 
{code}KafkaIO{code} can still provide a typed reader/writer with 
{code}withCustomKafkaValueSerializer{code}-like methods, to avoid the 
extraneous ParDo and having to call a utility to get something other than 
`byte[]`, which I assume is often going to be the case.

The main issue I see is that removing {code}withValueCoder{code} and so on 
will break API compatibility; I'm not sure what the project's policy is on 
that [~jkff]. A deprecation phase of a couple of releases, and then breaking 
changes?
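
For illustration, here is a sketch of the kind of typed read API suggested 
above, passing Kafka deserializer classes directly. The withKeyDeserializer 
and withValueDeserializer method names and MyAvroDeserializer are 
hypothetical, not an existing KafkaIO API:

{code}
// Hypothetical API shape: pass Kafka Deserializer classes directly to the
// read, so the pipeline gets typed records without an extra ParDo.
PCollection<KafkaRecord<String, MyAvroRecord>> records =
    pipeline.apply(
        KafkaIO.<String, MyAvroRecord>read()
            .withBootstrapServers("broker_1:9092,broker_2:9092")
            .withTopics(ImmutableList.of("topic_a"))
            .withKeyDeserializer(StringDeserializer.class)      // hypothetical
            .withValueDeserializer(MyAvroDeserializer.class));  // hypothetical
{code}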

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper 
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent 
> with the rest of the system. However, is there a reason to completely 
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for 
> instance, which requires custom serializers. One can write a `Coder` that 
> wraps a custom Kafka serializer, but that means two levels of unnecessary 
> wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting both in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and is 
> error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1630) Add Splittable DoFn to Python SDK

2017-03-06 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-1630:


 Summary: Add Splittable DoFn to Python SDK
 Key: BEAM-1630
 URL: https://issues.apache.org/jira/browse/BEAM-1630
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath


Splittable DoFn [1] is currently being implemented for the Java SDK [2]. We 
should add it to the Python SDK as well.

The following document proposes an API for this:

https://docs.google.com/document/d/1h_zprJrOilivK2xfvl4L42vaX4DMYGfH1YDmi-s_ozM/edit?usp=sharing

[1] https://s.apache.org/splittable-do-fn
[2] 
https://lists.apache.org/thread.html/0ce61ac162460a149d5c93cdface37cc383f8030fe86ca09e5699b18@%3Cdev.beam.apache.org%3E




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1627) Composite/DisplayData structure changed

2017-03-06 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh resolved BEAM-1627.
---
Resolution: Fixed

> Composite/DisplayData structure changed
> ---
>
> Key: BEAM-1627
> URL: https://issues.apache.org/jira/browse/BEAM-1627
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Thomas Groh
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: FixedWindows-0.5.png, 
> FixedWindows-snapshot-extraComposite-noDisplayData.png, ParseGame-0.5.png, 
> ParseGame-snapshot-extraComposite.png
>
>
> When running at head, the pipeline composite structure has changed. My guess 
> is this is related to pull/2145. 
> (1) Steps that used to be leaf nodes are now expandable composites with a 
> ParMultiDo inside them.
> (2) For some (but not all) steps, display data appears to be lost.
> This can be seen pretty clearly in the Dataflow monitoring UI. Attached 
> screenshots show:
> -- The ParseGameEvent transform leaks an extra level of composite.
> -- The FixedWindows transform leaks an extra composite and loses display data.
> [~tgroh] can you triage?
> [~altay] FYI potential 0.6 release blocker



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2826

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1627) Composite/DisplayData structure changed

2017-03-06 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897720#comment-15897720
 ] 

Davor Bonaci commented on BEAM-1627:


[~tgroh], now that the offending PR is reverted, this can be resolved?

> Composite/DisplayData structure changed
> ---
>
> Key: BEAM-1627
> URL: https://issues.apache.org/jira/browse/BEAM-1627
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Thomas Groh
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: FixedWindows-0.5.png, 
> FixedWindows-snapshot-extraComposite-noDisplayData.png, ParseGame-0.5.png, 
> ParseGame-snapshot-extraComposite.png
>
>
> When running at head, the pipeline composite structure has changed. My guess 
> is this is related to pull/2145. 
> (1) Steps that used to be leaf nodes are now expandable composites with a 
> ParMultiDo inside them.
> (2) For some (but not all) steps, display data appears to be lost.
> This can be seen pretty clearly in the Dataflow monitoring UI. Attached 
> screenshots show:
> -- The ParseGameEvent transform leaks an extra level of composite.
> -- The FixedWindows transform leaks an extra composite and loses display data.
> [~tgroh] can you triage?
> [~altay] FYI potential 0.6 release blocker



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-649) Cache any PCollection implementation if accessed more than once

2017-03-06 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-649:
--
Fix Version/s: 0.6.0

> Cache any PCollection implementation if accessed more than once
> ---
>
> Key: BEAM-649
> URL: https://issues.apache.org/jira/browse/BEAM-649
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.6.0
>
>
> Currently, the runner will cache any {{PCollection}} implementation - {{RDD}} 
> or {{DStream}} - only when it is accessed a second time.
> This can be further optimized to cache after the first evaluation if the 
> dataset is accessed again, which would also solve the issues in BEAM-1206.
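
As a rough illustration of the proposed behavior, a dataset wrapper could 
count consumers at translation time and persist up front when the dataset 
will be read more than once. This is only a sketch with hypothetical names, 
not the runner's actual dataset classes:

{code}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// Hypothetical wrapper: track consumers while traversing the pipeline and
// cache before the first evaluation when the dataset is used more than once.
class TranslatedDataset<T> {
  private final JavaRDD<T> rdd;
  private int consumers = 0;
  private boolean persisted = false;

  TranslatedDataset(JavaRDD<T> rdd) {
    this.rdd = rdd;
  }

  // Called once per downstream transform during pipeline translation.
  void registerConsumer() {
    consumers++;
  }

  // Called when the dataset is evaluated.
  JavaRDD<T> materialize() {
    if (consumers > 1 && !persisted) {
      rdd.persist(StorageLevel.MEMORY_ONLY()); // cache before first evaluation
      persisted = true;
    }
    return rdd;
  }
}
{code}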



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1627) Composite/DisplayData structure changed

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897661#comment-15897661
 ] 

ASF GitHub Bot commented on BEAM-1627:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2170


> Composite/DisplayData structure changed
> ---
>
> Key: BEAM-1627
> URL: https://issues.apache.org/jira/browse/BEAM-1627
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Frances Perry
>Assignee: Thomas Groh
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: FixedWindows-0.5.png, 
> FixedWindows-snapshot-extraComposite-noDisplayData.png, ParseGame-0.5.png, 
> ParseGame-snapshot-extraComposite.png
>
>
> When running at head, the pipeline composite structure has changed. My guess 
> is this is related to pull/2145. 
> (1) Steps that used to be leaf nodes are now expandable composites with a 
> ParMultiDo inside them.
> (2) For some (but not all) steps, display data appears to be lost.
> This can be seen pretty clearly in the Dataflow monitoring UI. Attached 
> screenshots show:
> -- The ParseGameEvent transform leaks an extra level of composite.
> -- The FixedWindows transform leaks an extra composite and loses display data.
> [~tgroh] can you triage?
> [~altay] FYI potential 0.6 release blocker



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[3/3] beam git commit: This closes #2170

2017-03-06 Thread tgroh
This closes #2170


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9cc8018b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9cc8018b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9cc8018b

Branch: refs/heads/master
Commit: 9cc8018b3ed945244bb311134ebd824016d1633f
Parents: 34b38ef 8766b03
Author: Thomas Groh 
Authored: Mon Mar 6 09:09:10 2017 -0800
Committer: Thomas Groh 
Committed: Mon Mar 6 09:09:10 2017 -0800

--
 .../translation/ApexPipelineTranslator.java |   3 +-
 .../translation/ParDoBoundMultiTranslator.java  | 185 ++
 .../apex/translation/ParDoBoundTranslator.java  |  95 +
 .../apex/translation/ParDoTranslator.java   | 185 --
 .../FlattenPCollectionTranslatorTest.java   |   3 +-
 .../translation/ParDoBoundTranslatorTest.java   | 344 +++
 .../apex/translation/ParDoTranslatorTest.java   | 344 ---
 .../beam/runners/direct/DirectRunner.java   |  18 +-
 .../ParDoSingleViaMultiOverrideFactory.java |  70 
 .../ParDoSingleViaMultiOverrideFactoryTest.java |  46 +++
 .../flink/FlinkBatchTransformTranslators.java   |  78 -
 .../FlinkStreamingTransformTranslators.java | 113 +-
 .../dataflow/DataflowPipelineTranslator.java|  29 ++
 .../DataflowPipelineTranslatorTest.java |   7 +-
 .../spark/translation/TransformTranslator.java  | 100 +++---
 .../streaming/StreamingTransformTranslator.java | 115 +++
 .../streaming/TrackStreamingSourcesTest.java|   4 +-
 .../org/apache/beam/sdk/transforms/ParDo.java   |   8 +-
 18 files changed, 1079 insertions(+), 668 deletions(-)
--




[GitHub] beam pull request #2170: [BEAM-1627] Revert "Implement Single-Output ParDo a...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2170


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/3] beam git commit: Revert "Implement Single-Output ParDo as a composite"

2017-03-06 Thread tgroh
Revert "Implement Single-Output ParDo as a composite"

This reverts commit 6253abaac62979e8496a828c18c7d1aa7214be6a.

The reverted commit breaks Dataflow DisplayData.

The actual fix will include a Dataflow override for single-output
ParDos.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8766b03e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8766b03e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8766b03e

Branch: refs/heads/master
Commit: 8766b03eb31b4f16de8fbf5a6902378a2c1151e0
Parents: 34b38ef
Author: Thomas Groh 
Authored: Mon Mar 6 08:55:13 2017 -0800
Committer: Thomas Groh 
Committed: Mon Mar 6 08:55:13 2017 -0800

--
 .../translation/ApexPipelineTranslator.java |   3 +-
 .../translation/ParDoBoundMultiTranslator.java  | 185 ++
 .../apex/translation/ParDoBoundTranslator.java  |  95 +
 .../apex/translation/ParDoTranslator.java   | 185 --
 .../FlattenPCollectionTranslatorTest.java   |   3 +-
 .../translation/ParDoBoundTranslatorTest.java   | 344 +++
 .../apex/translation/ParDoTranslatorTest.java   | 344 ---
 .../beam/runners/direct/DirectRunner.java   |  18 +-
 .../ParDoSingleViaMultiOverrideFactory.java |  70 
 .../ParDoSingleViaMultiOverrideFactoryTest.java |  46 +++
 .../flink/FlinkBatchTransformTranslators.java   |  78 -
 .../FlinkStreamingTransformTranslators.java | 113 +-
 .../dataflow/DataflowPipelineTranslator.java|  29 ++
 .../DataflowPipelineTranslatorTest.java |   7 +-
 .../spark/translation/TransformTranslator.java  | 100 +++---
 .../streaming/StreamingTransformTranslator.java | 115 +++
 .../streaming/TrackStreamingSourcesTest.java|   4 +-
 .../org/apache/beam/sdk/transforms/ParDo.java   |   8 +-
 18 files changed, 1079 insertions(+), 668 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/8766b03e/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
index 7eb9551..951a286 100644
--- 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ApexPipelineTranslator.java
@@ -59,7 +59,8 @@ public class ApexPipelineTranslator implements Pipeline.PipelineVisitor {
 
   static {
     // register TransformTranslators
-    registerTransformTranslator(ParDo.BoundMulti.class, new ParDoTranslator<>());
+    registerTransformTranslator(ParDo.Bound.class, new ParDoBoundTranslator());
+    registerTransformTranslator(ParDo.BoundMulti.class, new ParDoBoundMultiTranslator<>());
     registerTransformTranslator(Read.Unbounded.class, new ReadUnboundedTranslator());
     registerTransformTranslator(Read.Bounded.class, new ReadBoundedTranslator());
     registerTransformTranslator(GroupByKey.class, new GroupByKeyTranslator());

http://git-wip-us.apache.org/repos/asf/beam/blob/8766b03e/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoBoundMultiTranslator.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoBoundMultiTranslator.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoBoundMultiTranslator.java
new file mode 100644
index 000..f55b48c
--- /dev/null
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/ParDoBoundMultiTranslator.java
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.apex.translation;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import 

[1/3] beam git commit: Revert "Implement Single-Output ParDo as a composite"

2017-03-06 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 34b38ef95 -> 9cc8018b3


http://git-wip-us.apache.org/repos/asf/beam/blob/8766b03e/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
index 31307cc..ccf84b2 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java
@@ -332,20 +332,58 @@ final class StreamingTransformTranslator {
 };
   }
 
+  private static  TransformEvaluator> parDo() {
+return new TransformEvaluator>() {
+  @Override
+  public void evaluate(final ParDo.Bound transform,
+   final EvaluationContext context) {
+final DoFn doFn = transform.getFn();
+rejectSplittable(doFn);
+rejectStateAndTimers(doFn);
+final SparkRuntimeContext runtimeContext = context.getRuntimeContext();
+final WindowingStrategy windowingStrategy =
+context.getInput(transform).getWindowingStrategy();
+final SparkPCollectionView pviews = context.getPViews();
+
+@SuppressWarnings("unchecked")
+UnboundedDataset unboundedDataset =
+((UnboundedDataset) context.borrowDataset(transform));
+JavaDStream dStream = 
unboundedDataset.getDStream();
+
+final String stepName = context.getCurrentTransform().getFullName();
+
+JavaDStream outStream =
+dStream.transform(new Function,
+JavaRDD>() {
+  @Override
+  public JavaRDD 
call(JavaRDD rdd) throws
+  Exception {
+final JavaSparkContext jsc = new JavaSparkContext(rdd.context());
+final Accumulator aggAccum =
+SparkAggregators.getNamedAggregators(jsc);
+final Accumulator metricsAccum =
+MetricsAccumulator.getInstance();
+final Map> sideInputs =
+TranslationUtils.getSideInputs(transform.getSideInputs(),
+jsc, pviews);
+return rdd.mapPartitions(
+new DoFnFunction<>(aggAccum, metricsAccum, stepName, doFn, 
runtimeContext,
+sideInputs, windowingStrategy));
+  }
+});
+
+context.putDataset(transform,
+new UnboundedDataset<>(outStream, 
unboundedDataset.getStreamSources()));
+  }
+};
+  }
+
   private static  TransformEvaluator>
   multiDo() {
 return new TransformEvaluator>() {
-  public void evaluate(
-  final ParDo.BoundMulti transform, final 
EvaluationContext context) {
-if (transform.getSideOutputTags().size() == 0) {
-  evaluateSingle(transform, context);
-} else {
-  evaluateMulti(transform, context);
-}
-  }
-
-  private void evaluateMulti(
-  final ParDo.BoundMulti transform, final 
EvaluationContext context) {
+  @Override
+  public void evaluate(final ParDo.BoundMulti transform,
+   final EvaluationContext context) {
 final DoFn doFn = transform.getFn();
 rejectSplittable(doFn);
 rejectStateAndTimers(doFn);
@@ -389,60 +427,10 @@ final class StreamingTransformTranslator {
   JavaDStream values =
   (JavaDStream)
   (JavaDStream) TranslationUtils.dStreamValues(filtered);
-  context.putDataset(
-  e.getValue(), new UnboundedDataset<>(values, 
unboundedDataset.getStreamSources()));
+  context.putDataset(e.getValue(),
+  new UnboundedDataset<>(values, 
unboundedDataset.getStreamSources()));
 }
   }
-
-  private void evaluateSingle(
-  final ParDo.BoundMulti transform, final 
EvaluationContext context) {
-final DoFn doFn = transform.getFn();
-rejectSplittable(doFn);
-rejectStateAndTimers(doFn);
-final SparkRuntimeContext runtimeContext = context.getRuntimeContext();
-final WindowingStrategy windowingStrategy =
-context.getInput(transform).getWindowingStrategy();
-

[jira] [Created] (BEAM-1629) Make metrics/aggregators accumulators available on driver

2017-03-06 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1629:
---

 Summary: Make metrics/aggregators accumulators available on driver
 Key: BEAM-1629
 URL: https://issues.apache.org/jira/browse/BEAM-1629
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Aviem Zur
Assignee: Aviem Zur


Today, aggregator and metrics accumulators are instantiated after the pipeline 
is traversed, and so, if transform translations access these singletons in 
code that runs on the driver but is not part of a closure, a 
{{NullPointerException}} is thrown.
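
A minimal sketch of a lazily-initialized singleton that would avoid this NPE, 
assuming a serializable factory; the names are illustrative, not the Spark 
runner's actual classes:

{code}
import java.io.Serializable;

// Double-checked-locking lazy singleton: driver-side translation code can
// safely call get() before the pipeline has been traversed, because the
// first access creates the instance.
public final class LazySingleton<T> implements Serializable {
  public interface Factory<T> extends Serializable {
    T create();
  }

  private final Factory<T> factory;
  private transient volatile T instance;

  public LazySingleton(Factory<T> factory) {
    this.factory = factory;
  }

  public T get() {
    T result = instance;
    if (result == null) {
      synchronized (this) {
        result = instance;
        if (result == null) {
          instance = result = factory.create();
        }
      }
    }
    return result;
  }
}
{code}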



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2170: Revert "Implement Single-Output ParDo as a composit...

2017-03-06 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2170

Revert "Implement Single-Output ParDo as a composite"

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This reverts commit 6253abaac62979e8496a828c18c7d1aa7214be6a.

The reverted commit breaks Dataflow DisplayData.

The actual fix will include a Dataflow override for single-output
ParDos.

R: @davorbonaci @kennknowles @francesperry (AnyOf)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam rollback_par_do_conglomeration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2170






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1149

2017-03-06 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1148

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1556) Spark executors need to register IO factories

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897245#comment-15897245
 ] 

ASF GitHub Bot commented on BEAM-1556:
--

GitHub user amitsela opened a pull request:

https://github.com/apache/beam/pull/2169

[BEAM-1556] Make PipelineOptions a lazy-singleton and init IOs as par…

…t of it.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/beam BEAM-1556

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2169


commit caf781800b4ee3ee27f4c56f6c87f04ac46225f3
Author: Sela 
Date:   2017-03-06T09:17:00Z

[BEAM-1556] Make PipelineOptions a lazy-singleton and init IOs as part of 
it.




> Spark executors need to register IO factories
> -
>
> Key: BEAM-1556
> URL: https://issues.apache.org/jira/browse/BEAM-1556
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Frances Perry
>Assignee: Amit Sela
>
> The Spark executors need to call IOChannelUtils.registerIOFactories(options) 
> in order to support GCS files and make the default WordCount example work.
> Context in this thread: 
> https://lists.apache.org/thread.html/469a139c9eb07e64e514cdea42ab8000678ab743794a090c365205d7@%3Cuser.beam.apache.org%3E
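
A minimal sketch of what this could look like on the worker side. 
IOChannelUtils.registerIOFactories is the call named above, while the holder 
class and the once-per-JVM guard are hypothetical:

{code}
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.util.IOChannelUtils;

// Hypothetical executor-side guard: register the IO factories once per JVM,
// before any task touches file-based IO paths such as GCS.
public final class ExecutorIOSetup {
  private static volatile boolean registered = false;

  private ExecutorIOSetup() {}

  public static void ensureRegistered(PipelineOptions options) {
    if (!registered) {
      synchronized (ExecutorIOSetup.class) {
        if (!registered) {
          IOChannelUtils.registerIOFactories(options);
          registered = true;
        }
      }
    }
  }
}
{code}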



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2169: [BEAM-1556] Make PipelineOptions a lazy-singleton a...

2017-03-06 Thread amitsela
GitHub user amitsela opened a pull request:

https://github.com/apache/beam/pull/2169

[BEAM-1556] Make PipelineOptions a lazy-singleton and init IOs as par…

…t of it.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/beam BEAM-1556

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2169


commit caf781800b4ee3ee27f4c56f6c87f04ac46225f3
Author: Sela 
Date:   2017-03-06T09:17:00Z

[BEAM-1556] Make PipelineOptions a lazy-singleton and init IOs as part of 
it.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897224#comment-15897224
 ] 

ASF GitHub Bot commented on BEAM-1582:
--

GitHub user amitsela opened a pull request:

https://github.com/apache/beam/pull/2168

[BEAM-1582, BEAM-1562] Stop streaming tests on EOT Watermark.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/beam stop-streaming-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2168


commit 7d171bf5a69e6eb3f7ae58343a06c4a48865feaf
Author: Sela 
Date:   2017-03-04T19:04:02Z

Test runner to stop on EOT watermark, or timeout.

commit 899021fa4bf969c93288379bf847dd7c06ec5f09
Author: Sela 
Date:   2017-03-04T19:05:25Z

Remove timeout since it is already a pipeline option.

commit 6b8a37f66cd4a0ca29d5343f3dffbd95202ecb6f
Author: Sela 
Date:   2017-03-04T19:07:16Z

Advance to infinity at the end of pipelines.

commit 909201ae2f1bca762fcdd56c0db3bb1841965b54
Author: Sela 
Date:   2017-03-04T19:07:59Z

Add EOT watermark and expected assertions test options.

commit e4f66dfb13feea93b0b4c797620c3fa9080652f0
Author: Sela 
Date:   2017-03-04T19:09:38Z

SparkPipelineResult should avoid returning null, and handle exceptions 
better.

commit 73cfebd770fd4b5c6566051172b3e31faaf2c4e9
Author: Sela 
Date:   2017-03-04T19:11:36Z

Make ResumeFromCheckpointStreamingTest use TestSparkRunner and stop on EOT 
watermark.

commit d71207a75ef38dcdbf893fa032753db2875e4d3b
Author: Sela 
Date:   2017-03-04T20:08:05Z

fixup! post-rebase import order.

commit 4d1222f2cc653d31bc4cfa5e516af08b5e5e53a6
Author: Sela 
Date:   2017-03-05T10:52:58Z

Stop the context and update the state BEFORE throwing the exception.




> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> --
>
> Key: BEAM-1582
> URL: https://issues.apache.org/jira/browse/BEAM-1582
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> See: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging, it appears that a second firing occurs (though only one 
> is expected), but it doesn't come from stale state (the state is empty before 
> it fires).
> It might be a retry happening for some reason, which is OK in terms of 
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
> tests. 
> I'm looking into this, hoping to fix it ASAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2168: [BEAM-1582, BEAM-1562] Stop streaming tests on EOT ...

2017-03-06 Thread amitsela
GitHub user amitsela opened a pull request:

https://github.com/apache/beam/pull/2168

[BEAM-1582, BEAM-1562] Stop streaming tests on EOT Watermark.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/beam stop-streaming-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2168


commit 7d171bf5a69e6eb3f7ae58343a06c4a48865feaf
Author: Sela 
Date:   2017-03-04T19:04:02Z

Test runner to stop on EOT watermark, or timeout.

commit 899021fa4bf969c93288379bf847dd7c06ec5f09
Author: Sela 
Date:   2017-03-04T19:05:25Z

Remove timeout since it is already a pipeline option.

commit 6b8a37f66cd4a0ca29d5343f3dffbd95202ecb6f
Author: Sela 
Date:   2017-03-04T19:07:16Z

Advance to infinity at the end of pipelines.

commit 909201ae2f1bca762fcdd56c0db3bb1841965b54
Author: Sela 
Date:   2017-03-04T19:07:59Z

Add EOT watermark and expected assertions test options.

commit e4f66dfb13feea93b0b4c797620c3fa9080652f0
Author: Sela 
Date:   2017-03-04T19:09:38Z

SparkPipelineResult should avoid returning null, and handle exceptions 
better.

commit 73cfebd770fd4b5c6566051172b3e31faaf2c4e9
Author: Sela 
Date:   2017-03-04T19:11:36Z

Make ResumeFromCheckpointStreamingTest use TestSparkRunner and stop on EOT 
watermark.

commit d71207a75ef38dcdbf893fa032753db2875e4d3b
Author: Sela 
Date:   2017-03-04T20:08:05Z

fixup! post-rebase import order.

commit 4d1222f2cc653d31bc4cfa5e516af08b5e5e53a6
Author: Sela 
Date:   2017-03-05T10:52:58Z

Stop the context and update the state BEFORE throwing the exception.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897223#comment-15897223
 ] 

ASF GitHub Bot commented on BEAM-1582:
--

Github user amitsela closed the pull request at:

https://github.com/apache/beam/pull/2159


> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> --
>
> Key: BEAM-1582
> URL: https://issues.apache.org/jira/browse/BEAM-1582
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> See: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging, it appears that a second firing occurs (though only one 
> is expected), but it doesn't come from stale state (the state is empty before 
> it fires).
> It might be a retry happening for some reason, which is OK in terms of 
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
> tests. 
> I'm looking into this, hoping to fix it ASAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2825

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-425) Create Elasticsearch IO

2017-03-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897187#comment-15897187
 ] 

Ismaël Mejía commented on BEAM-425:
---

If not done yet, you should create a new JIRA for Elasticsearch 5 so that it 
gets into the release notes once merged; this one should not be reopened.

> Create Elasticsearch IO
> ---
>
> Key: BEAM-425
> URL: https://issues.apache.org/jira/browse/BEAM-425
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Etienne Chauchot
> Fix For: 0.5.0
>
>
> I'm working on a new ElasticsearchIO providing both bounded source and sink.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-425) Create Elasticsearch IO

2017-03-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897187#comment-15897187
 ] 

Ismaël Mejía edited comment on BEAM-425 at 3/6/17 12:15 PM:


If not done yet, you should create a new JIRA for Elasticsearch 5 so that it 
gets into the release notes once merged; this one should not be reopened.


was (Author: iemejia):
If not done yet, you should create a new JIRA for Elasticsearch 5, so this get 
into the release notes once merged, this one sholud not be reopened.

> Create Elasticsearch IO
> ---
>
> Key: BEAM-425
> URL: https://issues.apache.org/jira/browse/BEAM-425
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Etienne Chauchot
> Fix For: 0.5.0
>
>
> I'm working on a new ElasticsearchIO providing both bounded source and sink.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1184) Add integration tests for ElasticsearchIO

2017-03-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897008#comment-15897008
 ] 

Jean-Baptiste Onofré commented on BEAM-1184:


Let me share with you what I already did (I already started ITests in most of 
the IOs).

> Add integration tests for ElasticsearchIO
> -
>
> Key: BEAM-1184
> URL: https://issues.apache.org/jira/browse/BEAM-1184
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Jean-Baptiste Onofré
>
> In https://github.com/apache/incubator-beam/pull/1439#pullrequestreview, the 
> original PR included a Dockerfile and an integration test that ran against an 
> ES instance in that Dockerfile. 
> We are working on infrastructure to stand up Docker containers for this kind 
> of testing, but aren't quite ready for it, so we pulled the scripts and 
> Dockerfile+IT out of that PR. This issue tracks checking that work in once we 
> can take advantage of it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1184) Add integration tests for ElasticsearchIO

2017-03-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reassigned BEAM-1184:
--

Assignee: Jean-Baptiste Onofré  (was: Etienne Chauchot)

> Add integration tests for ElasticsearchIO
> -
>
> Key: BEAM-1184
> URL: https://issues.apache.org/jira/browse/BEAM-1184
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Jean-Baptiste Onofré
>
> In https://github.com/apache/incubator-beam/pull/1439#pullrequestreview, the 
> original PR included a Dockerfile and an integration test that ran against an 
> ES instance in that Dockerfile. 
> We are working on infrastructure to stand up Docker containers for this kind 
> of testing, but aren't quite ready for it, so we pulled the scripts and 
> Dockerfile+IT out of that PR. This issue tracks checking that work in once we 
> can take advantage of it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1184) Add integration tests for ElasticsearchIO

2017-03-06 Thread Etienne Chauchot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-1184:
--

Assignee: Etienne Chauchot  (was: Jean-Baptiste Onofré)

> Add integration tests for ElasticsearchIO
> -
>
> Key: BEAM-1184
> URL: https://issues.apache.org/jira/browse/BEAM-1184
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Etienne Chauchot
>
> In https://github.com/apache/incubator-beam/pull/1439#pullrequestreview, the 
> original PR included a Dockerfile and an integration test that ran against an 
> ES instance in that Dockerfile. 
> We are working on infrastructure to stand up Docker containers for this kind 
> of testing, but aren't quite ready for it, so we pulled the scripts and 
> Dockerfile+IT out of that PR. This issue tracks checking that work in once we 
> can take advantage of it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1623) Transform Reshuffle directly in Spark runner

2017-03-06 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela resolved BEAM-1623.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> Transform Reshuffle directly in Spark runner
> 
>
> Key: BEAM-1623
> URL: https://issues.apache.org/jira/browse/BEAM-1623
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> Transform {{Reshuffle}} directly in the Spark runner.
> Spark's {{repartition}} is logically equivalent to {{Reshuffle}} and will 
> cause the required fusion break without incurring the overhead of 
> {{groupByKey}} (which, in streaming pipelines, would use 
> {{updateStateByKey}}), as compared to a simple {{repartition}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1623) Transform Reshuffle directly in Spark runner

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896998#comment-15896998
 ] 

ASF GitHub Bot commented on BEAM-1623:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2160


> Transform Reshuffle directly in Spark runner
> 
>
> Key: BEAM-1623
> URL: https://issues.apache.org/jira/browse/BEAM-1623
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Transform {{Reshuffle}} directly in the Spark runner.
> Spark's {{repartition}} is logically equivalent to {{Reshuffle}} and will 
> cause the required fusion break without incurring the overhead of 
> {{groupByKey}} (which, in streaming pipelines, would use 
> {{updateStateByKey}}), as compared to a simple {{repartition}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] beam git commit: This closes #2160

2017-03-06 Thread amitsela
This closes #2160


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/34b38ef9
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/34b38ef9
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/34b38ef9

Branch: refs/heads/master
Commit: 34b38ef952b1981e82c0c74e0ee22b3d570169d4
Parents: 69d9512 d8bc618
Author: Sela 
Authored: Mon Mar 6 11:19:48 2017 +0200
Committer: Sela 
Committed: Mon Mar 6 11:19:48 2017 +0200

--
 .../translation/GroupCombineFunctions.java  | 22 
 .../spark/translation/TransformTranslator.java  | 38 +++-
 .../spark/translation/TranslationUtils.java | 10 ++
 .../streaming/StreamingTransformTranslator.java | 36 +++
 4 files changed, 97 insertions(+), 9 deletions(-)
--




[GitHub] beam pull request #2160: [BEAM-1623] Transform Reshuffle directly in Spark r...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2160


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1623] Transform Reshuffle directly in Spark runner

2017-03-06 Thread amitsela
Repository: beam
Updated Branches:
  refs/heads/master 69d951225 -> 34b38ef95


[BEAM-1623] Transform Reshuffle directly in Spark runner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d8bc618e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d8bc618e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d8bc618e

Branch: refs/heads/master
Commit: d8bc618edafd07ae8e0ec692fc7f3df7395b876e
Parents: 69d9512
Author: Aviem Zur 
Authored: Sun Mar 5 07:15:32 2017 +0200
Committer: Sela 
Committed: Mon Mar 6 11:19:22 2017 +0200

--
 .../translation/GroupCombineFunctions.java  | 22 
 .../spark/translation/TransformTranslator.java  | 38 +++-
 .../spark/translation/TranslationUtils.java | 10 ++
 .../streaming/StreamingTransformTranslator.java | 36 +++
 4 files changed, 97 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/d8bc618e/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
index 1e879ce..b2a589d 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java
@@ -203,4 +203,26 @@ public class GroupCombineFunctions {
 
 return accumulatedBytes.mapToPair(CoderHelpers.fromByteFunction(keyCoder, 
iterAccumCoder));
   }
+
+  /**
+   * An implementation of
+   * {@link org.apache.beam.sdk.util.Reshuffle} for the Spark runner.
+   */
+  public static <K, V> JavaRDD<WindowedValue<KV<K, V>>> reshuffle(
+      JavaRDD<WindowedValue<KV<K, V>>> rdd,
+      Coder<K> keyCoder,
+      WindowedValueCoder<V> wvCoder) {
+
+    // Use coders to convert objects in the PCollection to byte arrays, so they
+    // can be transferred over the network for the shuffle.
+    return rdd
+        .map(new ReifyTimestampsAndWindowsFunction<K, V>())
+        .map(WindowingHelpers.<KV<K, WindowedValue<V>>>unwindowFunction())
+        .mapToPair(TranslationUtils.toPairFunction())
+        .mapToPair(CoderHelpers.toByteFunction(keyCoder, wvCoder))
+        .repartition(rdd.getNumPartitions())
+        .mapToPair(CoderHelpers.fromByteFunction(keyCoder, wvCoder))
+        .map(TranslationUtils.fromPairFunction())
+        .map(TranslationUtils.toKVByWindowInValue());
+  }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/d8bc618e/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
index a4939b9..0ae7313 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java
@@ -69,6 +69,7 @@ import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.Window;
 import org.apache.beam.sdk.transforms.windowing.WindowFn;
 import org.apache.beam.sdk.util.CombineFnUtil;
+import org.apache.beam.sdk.util.Reshuffle;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.util.WindowingStrategy;
 import org.apache.beam.sdk.values.KV;
@@ -318,15 +319,7 @@ public final class TransformTranslator {
 return sparkCombineFn.extractOutput(iter);
   }
 }).map(TranslationUtils.fromPairFunction())
-      .map(new Function<KV<K, WindowedValue<OutputT>>,
-          WindowedValue<KV<K, OutputT>>>() {
-        @Override
-        public WindowedValue<KV<K, OutputT>> call(
-            KV<K, WindowedValue<OutputT>> kv) throws Exception {
-          WindowedValue<OutputT> wv = kv.getValue();
-          return wv.withValue(KV.of(kv.getKey(), wv.getValue()));
-        }
-      });
+      .map(TranslationUtils.toKVByWindowInValue());
 
 context.putDataset(transform, new BoundedDataset<>(outRdd));
   }
@@ -735,6 +728,32 @@ public final class TransformTranslator {

[jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896989#comment-15896989
 ] 

Eugene Kirpichov commented on BEAM-1573:


At a conference right now, but quick comment: yes, as Raghu said, we're getting 
rid of Coders as a general parsing mechanism. Use of coders for the purpose 
for which KafkaIO currently uses them is explicitly forbidden by the Beam 
PTransform Style Guide 
https://beam.apache.org/contribute/ptransform-style-guide/#coders .

We should replace that with having KafkaIO return byte[] and having convenience 
utilities for deserializing these byte[] using Kafka deserializers, e.g. by 
wrapping the code Raghu posted as a utility in the kafka module (packaged, say, 
as a SerializableFunction).

Raghu or @peay, perhaps consider sending a PR to fix this? It seems like it 
should be rather easy. Though it would merit a short discussion on 
d...@beam.apache.org first.
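
For illustration, a sketch of such a convenience utility under the assumptions 
above; the class name and the reflective instantiation are hypothetical, not 
an existing API in the kafka module. Note how the topic must be supplied 
separately, which is exactly the duplication this issue describes:

{code}
import java.util.HashMap;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.common.serialization.Deserializer;

// Hypothetical utility: adapt a Kafka Deserializer into a Beam
// SerializableFunction over the byte[] values a byte-oriented KafkaIO read
// would produce.
public class KafkaDeserializerFn<T> implements SerializableFunction<byte[], T> {
  private final Class<? extends Deserializer<T>> deserializerClass;
  private final HashMap<String, Object> config; // serializable config map
  private final String topic;
  private transient Deserializer<T> deserializer;

  public KafkaDeserializerFn(
      Class<? extends Deserializer<T>> deserializerClass,
      HashMap<String, Object> config,
      String topic) {
    this.deserializerClass = deserializerClass;
    this.config = config;
    this.topic = topic;
  }

  @Override
  public T apply(byte[] bytes) {
    if (deserializer == null) { // lazily instantiate on the worker
      try {
        deserializer = deserializerClass.newInstance();
      } catch (InstantiationException | IllegalAccessException e) {
        throw new RuntimeException(e);
      }
      deserializer.configure(config, /* isKey */ false);
    }
    return deserializer.deserialize(topic, bytes);
  }
}
{code}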

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper 
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent 
> with the rest of the system. However, is there a reason to completely 
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for 
> instance, which requires custom serializers. One can write a `Coder` that 
> wraps a custom Kafka serializer, but that means two levels of unnecessary 
> wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting both in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and is 
> error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers

2017-03-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896948#comment-15896948
 ] 

Raghu Angadi commented on BEAM-1573:


@peay, 
There are two levels of solutions for deserializers (and serializers): 
  # Reasonable ways to use custom Kafka deserializers & serializers.
  * This is very feasible now, including the case where you are reading from 
multiple topics.
  # Updating the KafkaIO API to pass Kafka (de)serializers directly to the 
Kafka consumer.
  * We might end up doing this, not exactly how you proposed, but rather by 
replacing coders with Kafka (de)serializers. There is no need to include both, 
I think. 
  * There is a discussion on the Beam mailing lists about removing the use of 
coders directly in sources and other places, and that might be the right time 
to add this support. (cc [~jkff])

Are you more interested in 1 or 2? 

One way to use any Kafka deserializer (for (1)): 
{code}
// Note that KafkaRecord includes topic name, partition, etc.
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords =
    pipeline
        .apply(KafkaIO.read()
            .withBootstrapServers("broker_1:9092,broker_2:9092")
            .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {

    private final Map<String, ?> config = // kafka config (serializable map)
    private transient Deserializer<MyAvroRecord> kafkaDeserializer;

    @Setup
    public void setup() {
      kafkaDeserializer = new MyDeserializer();
      kafkaDeserializer.configure(config, false); // false: configuring a value deserializer
    }

    @ProcessElement
    public void processElement(ProcessContext context) {
      MyAvroRecord record =
          kafkaDeserializer.deserialize(context.element().getTopic(),
              context.element().getValue());
      context.output(record);
    }

    @TearDown
    public void tearDown() {
      kafkaDeserializer.close();
    }
}));

{code}

   

> KafkaIO does not allow using Kafka serializers and deserializers
> 
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 0.4.0, 0.5.0
>Reporter: peay
>Assignee: Raghu Angadi
>Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings 
> of the Kafka consumers and producers it uses internally. Instead, it allows 
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper 
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent 
> with the rest of the system. However, is there a reason to completely 
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for 
> instance, which requires custom serializers. One can write a `Coder` that 
> wraps a custom Kafka serializer, but that means two levels of unnecessary 
> wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's 
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper 
> would require duplicating the output topic setting both in the argument to 
> `KafkaIO` and when building the wrapper, which is not elegant and is 
> error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1142

2017-03-06 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-1626) Remove caching of read MapWithStateDStream.

2017-03-06 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela resolved BEAM-1626.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> Remove caching of read MapWithStateDStream.
> ---
>
> Key: BEAM-1626
> URL: https://issues.apache.org/jira/browse/BEAM-1626
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
> Fix For: 0.6.0
>
>
> There's no real need for it since checkpointing caches as well, and from my 
> experiments I think it also has something to do with some of the flakes in 
> streaming tests.
> Anyway, I don't see a good reason to call {{cache()}} there, so let's remove 
> it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


  1   2   >