Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1006

2017-02-20 Thread Apache Jenkins Server
See 




[jira] [Assigned] (BEAM-1356) `mvn clean install -DskipTests` still runs python tests

2017-02-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1356:
---

Assignee: Aviem Zur  (was: Ahmet Altay)

> `mvn clean install -DskipTests` still runs python tests
> ---
>
> Key: BEAM-1356
> URL: https://issues.apache.org/jira/browse/BEAM-1356
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Daniel Halperin
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> e.g.
> Using 
> /Users/dhalperi/IdeaProjects/beam/sdks/python/.eggs/PyHamcrest-1.9.0-py2.7.egg
> running egg_info
> writing dependency_links to apache_beam_sdk.egg-info/dependency_links.txt
> writing top-level names to apache_beam_sdk.egg-info/top_level.txt
> writing requirements to apache_beam_sdk.egg-info/requires.txt
> writing entry points to apache_beam_sdk.egg-info/entry_points.txt
> writing apache_beam_sdk.egg-info/PKG-INFO
> reading manifest file 'apache_beam_sdk.egg-info/SOURCES.txt'
> reading manifest template 'MANIFEST.in'
> writing manifest file 'apache_beam_sdk.egg-info/SOURCES.txt'
> running build_ext
> test_str_utf8_coder (apache_beam.coders.coders_test.CodersTest) ... ok
> test_default_fallback_path (apache_beam.coders.coders_test.FallbackCoderTest)
> Test fallback path picks a matching coder if no coder is registered. ... 
> /Users/dhalperi/IdeaProjects/beam/sdks/python/apache_beam/coders/typecoders.py:136:
>  UserWarning: Using fallback coder for typehint: <class 'apache_beam.coders.coders_test.DummyClass'>.
>   warnings.warn('Using fallback coder for typehint: %r.' % typehint)
> ok
> test_basics (apache_beam.coders.coders_test.PickleCoderTest) ... ok
> test_equality (apache_beam.coders.coders_test.PickleCoderTest) ... ok
> test_proto_coder (apache_beam.coders.coders_test.ProtoCoderTest) ... 
> /Users/dhalperi/IdeaProjects/beam/sdks/python/apache_beam/coders/typecoders.py:136:
>  UserWarning: Using fallback coder for typehint: <class 'apache_beam.coders.proto2_coder_test_messages_pb2.MessageA'>.
>   warnings.warn('Using fallback coder for typehint: %r.' % typehint)
> ok
> test_base64_pickle_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_bytes_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_custom_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_deterministic_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_dill_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_fast_primitives_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_float_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_global_window_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_iterable_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_length_prefix_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_nested_observables (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_pickle_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_proto_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_singleton_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_timestamp_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_tuple_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_tuple_sequence_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_utf8_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_varint_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_windowed_value_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1513) Skip slower verifications if '-Dquick' specified. Enable them otherwise

2017-02-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Description: 
Skip slower verifications (checkstyle, rat and findbugs) if '-Dquick' was 
specified in the Maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.
It should still be possible to force these verifications with '-Prelease' as 
before, even if '-Dquick' is specified.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]

  was:
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the Maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.
It should still be possible to force these verifications with '-Prelease' as 
before, even if '-DskipTests' is specified.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]


> Skip slower verifications if '-Dquick' specified. Enable them otherwise
> ---
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-Dquick' was 
> specified in the Maven command. Enable them otherwise.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.
> It should still be possible to force these verifications with '-Prelease' as 
> before, even if '-Dquick' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1513) Skip slower verifications if '-Dquick' specified

2017-02-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Summary: Skip slower verifications if '-Dquick' specified  (was: Skip 
slower verifications if '-DskipTests' specified)

> Skip slower verifications if '-Dquick' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the Maven command. Enable them otherwise.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.
> It should still be possible to force these verifications with '-Prelease' as 
> before, even if '-DskipTests' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #1005

2017-02-20 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2051: [Beam-1218] Remove dependencies on GCP libraries fo...

2017-02-20 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2051

[Beam-1218] Remove dependencies on GCP libraries for using beam

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-1218-remove-gcp-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2051


commit 0e4889a0dffb304f0749121e0884e17351c0c829
Author: Sourabh Bajaj 
Date:   2017-02-21T00:58:55Z

Remove dependency on apitools

commit 9a7ff94e9e51a20291494b41efc2b4b9df82e74c
Author: Sourabh Bajaj 
Date:   2017-02-21T02:36:28Z

Remove dependency on datastore

commit 8deab113ba8b5a20e1c05100e833395da5eb2dfb
Author: Sourabh Bajaj 
Date:   2017-02-21T02:52:10Z

Update setup and tox configs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1004

2017-02-20 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-1516) null value is not supported in Map with AvroCoder

2017-02-20 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-1516:


 Summary: null value is not supported in Map with AvroCoder
 Key: BEAM-1516
 URL: https://issues.apache.org/jira/browse/BEAM-1516
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Xu Mingmin
Assignee: Davor Bonaci


A NullPointerException is thrown when there is a null value in a Map encoded with AvroCoder.

The domain class is defined as follows:
```
@DefaultCoder(AvroCoder.class)
public class RecordRow implements Serializable {
private Map<String, String> dataMap = new HashMap<>();
private RecordType dataType;
```

And the error message is:
```
Caused by: java.lang.NullPointerException: in *.beam.RecordRow in map in string 
null of string of map in field dataMap of *.beam.RecordRow
at 
org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:152)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
at org.apache.beam.sdk.coders.AvroCoder.encode(AvroCoder.java:296)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:663)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:603)
at 
org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:122)
at 
org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:106)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:185)
```
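
For reference, a minimal, self-contained sketch (class and field names are illustrative, only loosely based on the report) that should reproduce the NPE by round-tripping a record with a null map value through AvroCoder:

```
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.util.CoderUtils;

public class AvroNullMapRepro {

  /** Simplified stand-in for the reporter's RecordRow. */
  static class RecordRow implements Serializable {
    private Map<String, String> dataMap = new HashMap<>();
  }

  public static void main(String[] args) throws Exception {
    RecordRow row = new RecordRow();
    row.dataMap.put("someKey", null); // the null map value that triggers the NPE

    // clone() round-trips the value through encode/decode; the encode step is
    // where Avro's ReflectDatumWriter throws, matching the stack trace above.
    AvroCoder<RecordRow> coder = AvroCoder.of(RecordRow.class);
    CoderUtils.clone(coder, row);
  }
}
```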



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1515) unit KafkaIO with multiple Kafka client versions

2017-02-20 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-1515:


 Summary: unit KafkaIO with multiple Kafka client versions 
 Key: BEAM-1515
 URL: https://issues.apache.org/jira/browse/BEAM-1515
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-extensions
Reporter: Xu Mingmin
Assignee: Xu Mingmin


Add a JUnit test for KafkaIO, with Kafka client 0.10 specified.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2049: Remove unneeded plugins from IOs + some other janit...

2017-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2049


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/4] beam git commit: Fix bad indents and remove repeated dependencies for parent pom and javadoc pom

2017-02-20 Thread davor
Fix bad indents and remove repeated dependencies for parent pom and javadoc pom


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/44d39dc3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/44d39dc3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/44d39dc3

Branch: refs/heads/master
Commit: 44d39dc396709f637bb278a6c7a950f07682e4bd
Parents: 568efff
Author: Ismaël Mejía 
Authored: Mon Feb 20 11:20:40 2017 +0100
Committer: Davor Bonaci 
Committed: Mon Feb 20 13:27:36 2017 -0800

--
 pom.xml   | 17 -
 sdks/java/javadoc/pom.xml | 15 +--
 2 files changed, 13 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/44d39dc3/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 00a0f31..0688b73 100644
--- a/pom.xml
+++ b/pom.xml
@@ -334,7 +334,6 @@
        <version>${project.version}</version>
      </dependency>
 
-
      <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-flink_2.10-examples</artifactId>
@@ -348,32 +347,32 @@
      </dependency>
 
      <dependency>
-         <groupId>org.apache.beam</groupId>
-         <artifactId>beam-sdks-java-extensions-join-library</artifactId>
+        <groupId>org.apache.beam</groupId>
+        <artifactId>beam-sdks-java-extensions-join-library</artifactId>
        <version>${project.version}</version>
      </dependency>
 
      <dependency>
-         <groupId>org.apache.beam</groupId>
-         <artifactId>beam-sdks-java-extensions-sorter</artifactId>
+        <groupId>org.apache.beam</groupId>
+        <artifactId>beam-sdks-java-extensions-sorter</artifactId>
        <version>${project.version}</version>
      </dependency>
 
      <dependency>
-         <groupId>org.apache.beam</groupId>
-         <artifactId>beam-sdks-java-harness</artifactId>
+        <groupId>org.apache.beam</groupId>
+        <artifactId>beam-sdks-java-harness</artifactId>
        <version>${project.version}</version>
      </dependency>
 
      <dependency>
        <groupId>org.apache.beam</groupId>
-        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
+        <artifactId>beam-sdks-java-io-elasticsearch</artifactId>
        <version>${project.version}</version>
      </dependency>
 
      <dependency>
        <groupId>org.apache.beam</groupId>
-        <artifactId>beam-sdks-java-io-elasticsearch</artifactId>
+        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>${project.version}</version>
      </dependency>
 

http://git-wip-us.apache.org/repos/asf/beam/blob/44d39dc3/sdks/java/javadoc/pom.xml
--
diff --git a/sdks/java/javadoc/pom.xml b/sdks/java/javadoc/pom.xml
index d9832b6..d39958d 100644
--- a/sdks/java/javadoc/pom.xml
+++ b/sdks/java/javadoc/pom.xml
@@ -49,6 +49,11 @@
    </dependency>
 
    <dependency>
      <groupId>org.apache.beam</groupId>
+      <artifactId>beam-runners-core-construction-java</artifactId>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-core-java</artifactId>
    </dependency>
 
@@ -84,16 +89,6 @@
    </dependency>
 
    <dependency>
      <groupId>org.apache.beam</groupId>
-      <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.beam</groupId>
-      <artifactId>beam-sdks-java-core</artifactId>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-extensions-join-library</artifactId>
    </dependency>
 



[2/4] beam git commit: Remove maven javadoc/source plugin (inherited from the release profile)

2017-02-20 Thread davor
Remove maven javadoc/source plugin (inherited from the release profile)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/568efff8
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/568efff8
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/568efff8

Branch: refs/heads/master
Commit: 568efff858efb04a0364dd65bdbf7c2bebe040eb
Parents: 0ad1792
Author: Ismaël Mejía 
Authored: Mon Feb 20 10:35:01 2017 +0100
Committer: Davor Bonaci 
Committed: Mon Feb 20 13:27:36 2017 -0800

--
 sdks/java/io/elasticsearch/pom.xml | 8 
 sdks/java/io/mqtt/pom.xml  | 8 
 2 files changed, 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/568efff8/sdks/java/io/elasticsearch/pom.xml
--
diff --git a/sdks/java/io/elasticsearch/pom.xml 
b/sdks/java/io/elasticsearch/pom.xml
index 00f065b..da52fdd 100644
--- a/sdks/java/io/elasticsearch/pom.xml
+++ b/sdks/java/io/elasticsearch/pom.xml
@@ -38,20 +38,12 @@
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-source-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-javadoc-plugin</artifactId>
-      </plugin>
    </plugins>
  </build>
 

http://git-wip-us.apache.org/repos/asf/beam/blob/568efff8/sdks/java/io/mqtt/pom.xml
--
diff --git a/sdks/java/io/mqtt/pom.xml b/sdks/java/io/mqtt/pom.xml
index 7ac709e..baab614 100644
--- a/sdks/java/io/mqtt/pom.xml
+++ b/sdks/java/io/mqtt/pom.xml
@@ -38,20 +38,12 @@
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-source-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-javadoc-plugin</artifactId>
-      </plugin>
    </plugins>
  </build>
 



[4/4] beam git commit: This closes #2049

2017-02-20 Thread davor
This closes #2049


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8cfb3d12
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8cfb3d12
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8cfb3d12

Branch: refs/heads/master
Commit: 8cfb3d125d9a5b5950f487e2a31fa6d4a5096a04
Parents: 4e5a762 44d39dc
Author: Davor Bonaci 
Authored: Mon Feb 20 13:28:15 2017 -0800
Committer: Davor Bonaci 
Committed: Mon Feb 20 13:28:15 2017 -0800

--
 pom.xml| 17 -
 sdks/java/io/elasticsearch/pom.xml | 12 
 sdks/java/io/mqtt/pom.xml  | 12 
 sdks/java/javadoc/pom.xml  | 15 +--
 sdks/pom.xml   |  1 -
 5 files changed, 13 insertions(+), 44 deletions(-)
--




[1/4] beam git commit: Remove checkstyle dependency (it is inherited from the parent pom)

2017-02-20 Thread davor
Repository: beam
Updated Branches:
  refs/heads/master 4e5a762ef -> 8cfb3d125


Remove checkstyle dependency (it is inherited from the parent pom)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0ad17928
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0ad17928
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0ad17928

Branch: refs/heads/master
Commit: 0ad1792870791f13a8fd33a663e08abffb5e9f99
Parents: 4e5a762
Author: Ismaël Mejía 
Authored: Mon Feb 20 10:20:37 2017 +0100
Committer: Davor Bonaci 
Committed: Mon Feb 20 13:27:35 2017 -0800

--
 sdks/java/io/elasticsearch/pom.xml | 4 
 sdks/java/io/mqtt/pom.xml  | 4 
 sdks/pom.xml   | 1 -
 3 files changed, 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/0ad17928/sdks/java/io/elasticsearch/pom.xml
--
diff --git a/sdks/java/io/elasticsearch/pom.xml 
b/sdks/java/io/elasticsearch/pom.xml
index cdb096f..00f065b 100644
--- a/sdks/java/io/elasticsearch/pom.xml
+++ b/sdks/java/io/elasticsearch/pom.xml
@@ -50,10 +50,6 @@
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-checkstyle-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-javadoc-plugin</artifactId>
      </plugin>
    </plugins>

http://git-wip-us.apache.org/repos/asf/beam/blob/0ad17928/sdks/java/io/mqtt/pom.xml
--
diff --git a/sdks/java/io/mqtt/pom.xml b/sdks/java/io/mqtt/pom.xml
index 2e70e1d..7ac709e 100644
--- a/sdks/java/io/mqtt/pom.xml
+++ b/sdks/java/io/mqtt/pom.xml
@@ -50,10 +50,6 @@
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-checkstyle-plugin</artifactId>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-javadoc-plugin</artifactId>
      </plugin>
    </plugins>

http://git-wip-us.apache.org/repos/asf/beam/blob/0ad17928/sdks/pom.xml
--
diff --git a/sdks/pom.xml b/sdks/pom.xml
index 3d0b893..f130816 100644
--- a/sdks/pom.xml
+++ b/sdks/pom.xml
@@ -68,7 +68,6 @@
 
   
 
-
   
 
 



[jira] [Commented] (BEAM-1514) change default timestamp in KafkaIO

2017-02-20 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875035#comment-15875035
 ] 

Davor Bonaci commented on BEAM-1514:


SGTM!

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When the user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1514) change default timestamp in KafkaIO

2017-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875027#comment-15875027
 ] 

Jean-Baptiste Onofré commented on BEAM-1514:


It makes sense to use the default record timestamp (it's what I did in JMS with 
the message timestamp). However, it would also be useful for the user to be able 
to provide their own timestamp fn.

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When the user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1514) change default timestamp in KafkaIO

2017-02-20 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875015#comment-15875015
 ] 

Xu Mingmin commented on BEAM-1514:
--

[~davor], I will refer to the naming standard in Cloud PubsubIO.

The lines impacted would be here: 
https://github.com/XuMingmin/beam/blob/6aca4d5238165ead825ec6c55202cebc091e900d/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L962-L963
 

curTimestamp = (source.spec.getTimestampFn() == null)
    ? Instant.now() : source.spec.getTimestampFn().apply(record);

With Kafka client 0.10, ConsumerRecord.timestamp() is available and can be used 
instead of Instant.now().
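
A sketch of that proposal (an assumption about the eventual fix, not the merged code): with the 0.10 client, ConsumerRecord carries a timestamp, while records without one report a negative value, in which case the old Instant.now() behavior can be kept. A user-supplied KafkaIO.withTimestampFn() would still take precedence over this default.

```
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.joda.time.Instant;

class KafkaTimestamps {
  // Fall back to processing time only when no record timestamp is available.
  static <K, V> Instant defaultTimestamp(ConsumerRecord<K, V> record) {
    long ts = record.timestamp(); // create/log-append time with Kafka 0.10+
    return ts >= 0 ? new Instant(ts) : Instant.now();
  }
}
```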

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When the user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-1514) change default timestamp in KafkaIO

2017-02-20 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874972#comment-15874972
 ] 

Xu Mingmin edited comment on BEAM-1514 at 2/20/17 8:00 PM:
---

Will submit a PR after BEAM-1407; the priority would be:
1). KafkaClient0.9: 
KafkaIO.withTimestampFn() > Instant.now()
2). KafkaClient0.10:
KafkaIO.withTimestampFn() > ConsumerRecord.timestamp

-- updated the field name used in Kafka 0.10.



was (Author: mingmxu):
Will submit a PR after BEAM-1407; the priority would be:
1). KafkaClient0.9: 
KafkaIO.withTimestampFn() > Instant.now()
2). KafkaClient0.10:
KafkaIO.withTimestampFn() > KafkaRecord.timestamp


> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When the user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1466) JSON utils extension

2017-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875003#comment-15875003
 ] 

ASF GitHub Bot commented on BEAM-1466:
--

GitHub user aviemzur reopened a pull request:

https://github.com/apache/beam/pull/1983

[BEAM-1466] JSON utils extension

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam json-utils

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1983.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1983


commit c222abcddf07509b5e6d3a4003d9da086f809d94
Author: Aviem Zur 
Date:   2017-02-11T15:06:45Z

[BEAM-1466] JSON utils extension

commit bd82f2cd543559ceef73b6818828e2d54f77a370
Author: Aviem Zur 
Date:   2017-02-12T20:17:04Z

Javadoc improvements.

commit 3c5e0da7a6e570c58a105f37e3ddd374cb4cb60c
Author: Aviem Zur 
Date:   2017-02-13T05:12:58Z

Added test for writing with custom mapper.

commit d4d7173a592da6e91cd27b9c60951fb17c76c174
Author: Aviem Zur 
Date:   2017-02-20T19:47:08Z

Changes after review.




> JSON utils extension
> 
>
> Key: BEAM-1466
> URL: https://issues.apache.org/jira/browse/BEAM-1466
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Create a JSON extension module which will contain transforms to aid with 
> handling JSON.
> Suggested transforms:
> * Parse JSON strings to type OutputT.
> * Parse InputT to JSON strings.
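
As a hypothetical illustration of the first suggested transform (the class name and the use of Jackson are assumptions; the "custom mapper" commit above suggests Jackson's ObjectMapper, but the actual API in PR #1983 may differ):

```
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.beam.sdk.transforms.DoFn;

class ParseJsonFn<OutputT> extends DoFn<String, OutputT> {
  private final Class<OutputT> outputClass;
  private transient ObjectMapper mapper;

  ParseJsonFn(Class<OutputT> outputClass) {
    this.outputClass = outputClass;
  }

  @Setup
  public void setup() {
    mapper = new ObjectMapper(); // one mapper per DoFn instance
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    // Parse each JSON string element into the requested type.
    c.output(mapper.readValue(c.element(), outputClass));
  }
}
```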



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1466) JSON utils extension

2017-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875002#comment-15875002
 ] 

ASF GitHub Bot commented on BEAM-1466:
--

Github user aviemzur closed the pull request at:

https://github.com/apache/beam/pull/1983


> JSON utils extension
> 
>
> Key: BEAM-1466
> URL: https://issues.apache.org/jira/browse/BEAM-1466
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Create a JSON extension module which will contain transforms to aid with 
> handling JSON.
> Suggested transforms:
> * Parse JSON strings to type OutputT.
> * Parse InputT to JSON strings.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #1983: [BEAM-1466] JSON utils extension

2017-02-20 Thread aviemzur
Github user aviemzur closed the pull request at:

https://github.com/apache/beam/pull/1983


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #1983: [BEAM-1466] JSON utils extension

2017-02-20 Thread aviemzur
GitHub user aviemzur reopened a pull request:

https://github.com/apache/beam/pull/1983

[BEAM-1466] JSON utils extension

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/beam json-utils

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1983.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1983


commit c222abcddf07509b5e6d3a4003d9da086f809d94
Author: Aviem Zur 
Date:   2017-02-11T15:06:45Z

[BEAM-1466] JSON utils extension

commit bd82f2cd543559ceef73b6818828e2d54f77a370
Author: Aviem Zur 
Date:   2017-02-12T20:17:04Z

Javadoc improvements.

commit 3c5e0da7a6e570c58a105f37e3ddd374cb4cb60c
Author: Aviem Zur 
Date:   2017-02-13T05:12:58Z

Added test for writing with custom mapper.

commit d4d7173a592da6e91cd27b9c60951fb17c76c174
Author: Aviem Zur 
Date:   2017-02-20T19:47:08Z

Changes after review.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (BEAM-1514) change default timestamp in KafkaIO

2017-02-20 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874973#comment-15874973
 ] 

Davor Bonaci edited comment on BEAM-1514 at 2/20/17 7:12 PM:
-

Sorry, [~mingmxu], I'm a little uninformed here. Is it a Kafka convention to 
use that specific name? (Also, we should aim for consistency with Cloud Pubsub, 
which has a similar thing, i.e., both have defaults or neither has one.)

FYI [~rangadi].


was (Author: davor):
Sorry, [~mingmxu], I'm a little uninformed here. Is it a Kafka convention to 
use that specific name? (Also, we should aim for consistency with Cloud Pubsub, 
which has a similar thing.)

FYI [~rangadi].

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When the user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1514) change default timestamp in KafkaIO

2017-02-20 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874973#comment-15874973
 ] 

Davor Bonaci commented on BEAM-1514:


Sorry, [~mingmxu], I'm a little uninformed here. Is it a Kafka convention to 
use that specific name? (Also, we should aim for consistency with Cloud Pubsub, 
which has a similar thing.)

FYI [~rangadi].

> change default timestamp in KafkaIO
> ---
>
> Key: BEAM-1514
> URL: https://issues.apache.org/jira/browse/BEAM-1514
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> When the user uses Kafka 0.10, the 'timestamp' field from Kafka should be used 
> as the default event timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2050: [BEAM-920] Support triggers, panes and watermarks.

2017-02-20 Thread amitsela
GitHub user amitsela opened a pull request:

https://github.com/apache/beam/pull/2050

[BEAM-920] Support triggers, panes and watermarks.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/beam BEAM-920

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2050


commit f5d23531de20b64164496e39f01bc964176501c6
Author: Sela 
Date:   2017-02-13T14:30:16Z

Better name for batch implementation of GroupAlsoByWindow.

commit 9bb666f794b3179683f1e70e93fb4cf7e7ad7067
Author: Sela 
Date:   2017-02-13T14:32:21Z

Implementation of GroupAlsoByWindowViaWindowSet for the Spark runner.

commit 1bdc69ae84b39ef8b71f095a5ac14f0bfeca38ef
Author: Sela 
Date:   2017-02-13T14:33:14Z

Utils for SparkGroupAlsoByWindowViaWindowSet.

commit 7b93356216d174cb4e2ced46e7a58de9932ac8b6
Author: Sela 
Date:   2017-02-16T23:19:23Z

Refactor translators according to the new GroupAlsoByWindow implementation for 
the Spark runner.

commit 9a87206c1750b6c92f2787fd991d9d5b519e5a4f
Author: Sela 
Date:   2017-02-18T20:01:58Z

Fix streaming translation of Flatten and Window, make CreateStream eager.

commit 2a9fdd914e017a42b4b8a07937294d67324f4a3a
Author: Sela 
Date:   2017-02-18T20:05:19Z

Test triggers, panes and watermarks via CreateStream.

commit 9f15cf81c7469431d08c0f297d496edc6c85d86f
Author: Sela 
Date:   2017-02-18T20:06:51Z

Remove streaming tests that were needed before supporting the model.

commit 01a4da64ca328645e78c74edde3375326d8c
Author: Sela 
Date:   2017-02-18T20:10:50Z

Use TestSparkRunner in tests.

commit 700907ff0757b25f20d3acb83eff715359775c5e
Author: Sela 
Date:   2017-02-18T20:12:39Z

Handle test failures in "graceful stop period".

commit 1b21ed4594ccba1a6cb249433f577374f19dc531
Author: Sela 
Date:   2017-02-18T20:13:29Z

Further refactoring following changes.

commit bfe6594143a2cd830c5a7a2a2923a906c9855ac2
Author: Sela 
Date:   2017-02-19T22:13:23Z

Serialize state stream with coders for shuffle and checkpointing.

commit 173e29b0ad70b2f891ab6c74b740850eb6c01139
Author: Sela 
Date:   2017-02-19T22:14:39Z

Add multi stream and flattened stream tests.

commit 4cfa372927bc77c74469de702eb13efe70560650
Author: Sela 
Date:   2017-02-20T12:00:54Z

Misc. fixups.

commit 5788190aed2def9b8028f3759d160b2d38f32547
Author: Sela 
Date:   2017-02-20T17:45:03Z

Make TimestampTransform Serializable.

commit 462f7b22b94d3528fff0a086f87a25630b748ca1
Author: Sela 
Date:   2017-02-20T17:47:45Z

Rebase leftover fixes.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-920) Support triggers, panes and watermarks.

2017-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874874#comment-15874874
 ] 

ASF GitHub Bot commented on BEAM-920:
-

GitHub user amitsela opened a pull request:

https://github.com/apache/beam/pull/2050

[BEAM-920] Support triggers, panes and watermarks.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/beam BEAM-920

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2050


commit f5d23531de20b64164496e39f01bc964176501c6
Author: Sela 
Date:   2017-02-13T14:30:16Z

Better name for batch implementation of GroupAlsoByWindow.

commit 9bb666f794b3179683f1e70e93fb4cf7e7ad7067
Author: Sela 
Date:   2017-02-13T14:32:21Z

Implementation of GroupAlsoByWindowViaWindowSet for the Spark runner.

commit 1bdc69ae84b39ef8b71f095a5ac14f0bfeca38ef
Author: Sela 
Date:   2017-02-13T14:33:14Z

Utils for SparkGroupAlsoByWindowViaWindowSet.

commit 7b93356216d174cb4e2ced46e7a58de9932ac8b6
Author: Sela 
Date:   2017-02-16T23:19:23Z

Refactor translators according to the new GroupAlsoByWindow implementation for 
the Spark runner.

commit 9a87206c1750b6c92f2787fd991d9d5b519e5a4f
Author: Sela 
Date:   2017-02-18T20:01:58Z

Fix streaming translation of Flatten and Window, make CreateStream eager.

commit 2a9fdd914e017a42b4b8a07937294d67324f4a3a
Author: Sela 
Date:   2017-02-18T20:05:19Z

Test triggers, panes and watermarks via CreateStream.

commit 9f15cf81c7469431d08c0f297d496edc6c85d86f
Author: Sela 
Date:   2017-02-18T20:06:51Z

Remove streaming tests that were needed before supporting the model.

commit 01a4da64ca328645e78c74edde3375326d8c
Author: Sela 
Date:   2017-02-18T20:10:50Z

Use TestSparkRunner in tests.

commit 700907ff0757b25f20d3acb83eff715359775c5e
Author: Sela 
Date:   2017-02-18T20:12:39Z

Handle test failures in "graceful stop period".

commit 1b21ed4594ccba1a6cb249433f577374f19dc531
Author: Sela 
Date:   2017-02-18T20:13:29Z

Further refactoring following changes.

commit bfe6594143a2cd830c5a7a2a2923a906c9855ac2
Author: Sela 
Date:   2017-02-19T22:13:23Z

Serialize state stream with coders for shuffle and checkpointing.

commit 173e29b0ad70b2f891ab6c74b740850eb6c01139
Author: Sela 
Date:   2017-02-19T22:14:39Z

Add multi stream and flattened stream tests.

commit 4cfa372927bc77c74469de702eb13efe70560650
Author: Sela 
Date:   2017-02-20T12:00:54Z

Misc. fixups.

commit 5788190aed2def9b8028f3759d160b2d38f32547
Author: Sela 
Date:   2017-02-20T17:45:03Z

Make TimestampTransform Serializable.

commit 462f7b22b94d3528fff0a086f87a25630b748ca1
Author: Sela 
Date:   2017-02-20T17:47:45Z

Rebase leftover fixes.




> Support triggers, panes and watermarks.
> ---
>
> Key: BEAM-920
> URL: https://issues.apache.org/jira/browse/BEAM-920
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Implement event-time based aggregation using triggers, panes and watermarks.
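
For context, this is the standard Beam windowing/triggering API the runner needs to support; an illustrative snippet under assumed input types, not code from the PR:

```
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class WindowedCounts {
  // Event-time fixed windows: the watermark fires the on-time pane, an
  // early-firing trigger emits speculative panes, and late data is accepted
  // for five minutes past the watermark.
  static PCollection<KV<String, Long>> windowedCounts(PCollection<KV<String, String>> input) {
    return input
        .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(30))))
            .withAllowedLateness(Duration.standardMinutes(5))
            .accumulatingFiredPanes())
        .apply(Count.<String, String>perKey());
  }
}
```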



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_RunnableOnService_Spark #1001

2017-02-20 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Spark #1000

2017-02-20 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #2691

2017-02-20 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_RunnableOnService_Dataflow #2346

2017-02-20 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2690

2017-02-20 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-1512) Optimize leaf transforms materialization

2017-02-20 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela resolved BEAM-1512.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> Optimize leaf materialization in {{EvaluationContext}}. Use register() for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}}, which adds overhead.
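
To illustrate why (a sketch under assumed element types, not the actual patch): count() is an action that computes a count per partition and aggregates the results on the driver, while an empty foreachPartition() still runs every partition, which is enough to materialize a persisted RDD, but ships nothing back.

```
import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;

class RddMaterializer {
  // Cheaper alternative to rdd.count() when the only goal is to force
  // evaluation (e.g. to populate the block store for a persisted RDD).
  static <T> void materialize(JavaRDD<T> rdd) {
    rdd.foreachPartition(new VoidFunction<Iterator<T>>() {
      @Override
      public void call(Iterator<T> it) throws Exception {
        // Intentionally empty: scheduling the job is the point.
      }
    });
  }
}
```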



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1512) Optimize leaf transforms materialization

2017-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874318#comment-15874318
 ] 

ASF GitHub Bot commented on BEAM-1512:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2046


> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Optimize leaf materialization in {{EvaluationContext}}. Use register() for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}}, which adds overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[2/2] beam git commit: This closes #2046

2017-02-20 Thread amitsela
This closes #2046


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/4e5a762e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/4e5a762e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/4e5a762e

Branch: refs/heads/master
Commit: 4e5a762ef36336eeecb568f12c9f3b41fa683a97
Parents: aa45ccb a2f0615
Author: Sela 
Authored: Mon Feb 20 12:08:23 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 12:08:23 2017 +0200

--
 .../beam/runners/spark/translation/BoundedDataset.java| 10 +-
 .../spark/translation/streaming/UnboundedDataset.java |  9 ++---
 2 files changed, 11 insertions(+), 8 deletions(-)
--




[GitHub] beam pull request #2046: [BEAM-1512] Optimize leaf transforms materializatio...

2017-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2046


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1512] Optimize leaf transforms materialization

2017-02-20 Thread amitsela
Repository: beam
Updated Branches:
  refs/heads/master aa45ccb08 -> 4e5a762ef


[BEAM-1512] Optimize leaf transforms materialization


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a2f0615f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a2f0615f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a2f0615f

Branch: refs/heads/master
Commit: a2f0615f27315c3ee4cbe5f67e8c0018797a0e7f
Parents: aa45ccb
Author: Aviem Zur 
Authored: Sun Feb 19 19:52:22 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 12:07:54 2017 +0200

--
 .../beam/runners/spark/translation/BoundedDataset.java| 10 +-
 .../spark/translation/streaming/UnboundedDataset.java |  9 ++---
 2 files changed, 11 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a2f0615f/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
index 1cfb0e0..5e19846 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/BoundedDataset.java
@@ -20,6 +20,7 @@ package org.apache.beam.runners.spark.translation;
 
 import com.google.common.base.Function;
 import com.google.common.collect.Iterables;
+import java.util.Iterator;
 import java.util.List;
 import javax.annotation.Nullable;
 import org.apache.beam.runners.spark.coders.CoderHelpers;
@@ -32,6 +33,7 @@ import org.apache.beam.sdk.values.PCollection;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaRDDLike;
 import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.VoidFunction;
 import org.apache.spark.storage.StorageLevel;
 
 /**
@@ -104,7 +106,13 @@ public class BoundedDataset<T> implements Dataset {
 
   @Override
   public void action() {
-    rdd.count();
+    // Empty function to force computation of RDD.
+    rdd.foreachPartition(new VoidFunction<Iterator<WindowedValue<T>>>() {
+      @Override
+      public void call(Iterator<WindowedValue<T>> windowedValueIterator) throws Exception {
+        // Empty implementation.
+      }
+    });
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/beam/blob/a2f0615f/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
index 8b65dca..6f5fa93 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
@@ -32,7 +32,6 @@ import 
org.apache.beam.runners.spark.translation.WindowingHelpers;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.function.VoidFunction;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaStreamingContext;
 import org.slf4j.Logger;
@@ -115,12 +114,8 @@ public class UnboundedDataset<T> implements Dataset {
 
   @Override
   public void action() {
-    dStream.foreachRDD(new VoidFunction<JavaRDD<WindowedValue<T>>>() {
-      @Override
-      public void call(JavaRDD<WindowedValue<T>> rdd) throws Exception {
-        rdd.count();
-      }
-    });
+    // Force computation of DStream.
+    dStream.dstream().register();
   }
 
   @Override



Jenkins build is back to normal : beam_PostCommit_Python_Verify #1316

2017-02-20 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-920) Support triggers, panes and watermarks.

2017-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874291#comment-15874291
 ] 

ASF GitHub Bot commented on BEAM-920:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1987


> Support triggers, panes and watermarks.
> ---
>
> Key: BEAM-920
> URL: https://issues.apache.org/jira/browse/BEAM-920
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Implement event-time based aggregation using triggers, panes and watermarks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #1987: [BEAM-920] Add support for Watermarks in the Spark ...

2017-02-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1987


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/9] beam git commit: Advance watermarks onBatchCompleted hook.

2017-02-20 Thread amitsela
Advance watermarks onBatchCompleted hook.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fa31f18e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fa31f18e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fa31f18e

Branch: refs/heads/master
Commit: fa31f18e489d4cbe44fe4a9be7ba3d7dbee7c354
Parents: bbf3744
Author: Sela 
Authored: Sun Feb 12 18:31:14 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:14 2017 +0200

--
 .../main/java/org/apache/beam/runners/spark/SparkRunner.java   | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/fa31f18e/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
index ebac375..52a080b 100644
--- a/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
+++ b/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java
@@ -39,6 +39,7 @@ import 
org.apache.beam.runners.spark.translation.TransformEvaluator;
 import org.apache.beam.runners.spark.translation.TransformTranslator;
 import 
org.apache.beam.runners.spark.translation.streaming.Checkpoint.CheckpointDir;
 import 
org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory;
+import 
org.apache.beam.runners.spark.util.GlobalWatermarkHolder.WatermarksListener;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.io.Read;
 import org.apache.beam.sdk.metrics.MetricsEnvironment;
@@ -191,12 +192,15 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
   new JavaStreamingListenerWrapper(
   new MetricsAccumulator.AccumulatorCheckpointingSparkListener()));
 
-  // register listeners.
+  // register user-defined listeners.
   for (JavaStreamingListener listener: 
mOptions.as(SparkContextOptions.class).getListeners()) {
 LOG.info("Registered listener {}." + 
listener.getClass().getSimpleName());
 jssc.addStreamingListener(new JavaStreamingListenerWrapper(listener));
   }
 
+  // register Watermarks listener to broadcast the advanced WMs.
+  jssc.addStreamingListener(new JavaStreamingListenerWrapper(new 
WatermarksListener(jssc)));
+
   startPipeline = executorService.submit(new Runnable() {
 
 @Override



[7/9] beam git commit: Streaming sources tracking test.

2017-02-20 Thread amitsela
Streaming sources tracking test.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/add87166
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/add87166
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/add87166

Branch: refs/heads/master
Commit: add87166fd445d634f6faf0f838f234c3f908c2e
Parents: 9784f20
Author: Sela 
Authored: Sun Feb 12 19:23:28 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:16 2017 +0200

--
 .../translation/streaming/UnboundedDataset.java |   6 +
 .../streaming/TrackStreamingSourcesTest.java| 152 +++
 2 files changed, 158 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/add87166/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
index 80c0515..08d1ab6 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
@@ -18,6 +18,7 @@
 
 package org.apache.beam.runners.spark.translation.streaming;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
 
 import java.util.ArrayList;
@@ -72,6 +73,11 @@ public class UnboundedDataset<T> implements Dataset {
 this.streamingSources.add(queuedStreamIds.decrementAndGet());
   }
 
+  @VisibleForTesting
+  public static void resetQueuedStreamIds() {
+queuedStreamIds.set(0);
+  }
+
   @SuppressWarnings("ConstantConditions")
  JavaDStream<WindowedValue<T>> getDStream() {
 if (dStream == null) {

http://git-wip-us.apache.org/repos/asf/beam/blob/add87166/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java
--
diff --git 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java
 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java
new file mode 100644
index 000..f102ac8
--- /dev/null
+++ 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/translation/streaming/TrackStreamingSourcesTest.java
@@ -0,0 +1,152 @@
+package org.apache.beam.runners.spark.translation.streaming;
+
+import static org.hamcrest.Matchers.containsInAnyOrder;
+import static org.hamcrest.core.IsEqual.equalTo;
+import static org.junit.Assert.assertThat;
+
+import java.util.Collections;
+import java.util.List;
+import org.apache.beam.runners.spark.ReuseSparkContext;
+import org.apache.beam.runners.spark.SparkPipelineOptions;
+import org.apache.beam.runners.spark.SparkRunner;
+import org.apache.beam.runners.spark.io.CreateStream;
+import org.apache.beam.runners.spark.translation.Dataset;
+import org.apache.beam.runners.spark.translation.EvaluationContext;
+import org.apache.beam.runners.spark.translation.SparkContextFactory;
+import org.apache.beam.runners.spark.translation.TransformTranslator;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.runners.TransformHierarchy;
+import org.apache.beam.sdk.transforms.AppliedPTransform;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Flatten;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.spark.SparkStatusTracker;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.streaming.Duration;
+import org.apache.spark.streaming.api.java.JavaStreamingContext;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+
+
+/**
+ * A test suite that tests tracking of the streaming sources created in an
+ * {@link 
org.apache.beam.runners.spark.translation.streaming.UnboundedDataset}.
+ */
+public class TrackStreamingSourcesTest {
+
+  @Rule
+  public ReuseSparkContext reuseContext = ReuseSparkContext.yes();
+
+  private static final transient SparkPipelineOptions options =
+  PipelineOptionsFactory.create().as(SparkPipelineOptions.class);
+
+  @Before
+  public void before() {
+UnboundedDataset.resetQueuedStreamIds();
+

[8/9] beam git commit: This relied on a wrong functionality as described in BEAM-1444 and should be revisited there.

2017-02-20 Thread amitsela
This relied on a wrong functionality as described in BEAM-1444 and should be 
revisited there.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3d25b9cc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3d25b9cc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3d25b9cc

Branch: refs/heads/master
Commit: 3d25b9cc794ce39a0ac3051eacf28127da138449
Parents: add8716
Author: Sela 
Authored: Sun Feb 12 19:47:00 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:33 2017 +0200

--
 .../beam/runners/spark/TestSparkRunner.java | 11 +-
 .../translation/streaming/UnboundedDataset.java |  1 -
 .../beam/runners/spark/WatermarkTest.java   | 19 +++--
 .../streaming/FlattenStreamingTest.java | 22 
 .../streaming/TrackStreamingSourcesTest.java| 18 +++-
 .../apache/beam/sdk/testing/RegexMatcher.java   | 17 +++
 6 files changed, 57 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3d25b9cc/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
index a634dd4..8b8f9ba 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
@@ -23,8 +23,8 @@ import static org.hamcrest.Matchers.is;
 
 import org.apache.beam.runners.core.UnboundedReadFromBoundedSource;
 import org.apache.beam.runners.spark.aggregators.AggregatorsAccumulator;
-import org.apache.beam.runners.spark.util.GlobalWatermarkHolder;
 import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
+import org.apache.beam.runners.spark.util.GlobalWatermarkHolder;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.PipelineResult.State;
 import org.apache.beam.sdk.io.BoundedReadFromUnboundedSource;
@@ -108,14 +108,15 @@ public final class TestSparkRunner extends 
PipelineRunner<SparkPipelineResult> {
 
   @Override
   public SparkPipelineResult run(Pipeline pipeline) {
+// clear state of Aggregators, Metrics and Watermarks if they exist.
+AggregatorsAccumulator.clear();
+SparkMetricsContainer.clear();
+GlobalWatermarkHolder.clear();
+
 TestPipelineOptions testPipelineOptions = 
pipeline.getOptions().as(TestPipelineOptions.class);
 SparkPipelineResult result = delegate.run(pipeline);
 result.waitUntilFinish();
 
-// clear state of Aggregators, Metrics and Watermarks.
-AggregatorsAccumulator.clear();
-SparkMetricsContainer.clear();
-GlobalWatermarkHolder.clear();
 
 // make sure the test pipeline finished successfully.
 State resultState = result.getState();

http://git-wip-us.apache.org/repos/asf/beam/blob/3d25b9cc/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
index 08d1ab6..8b65dca 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/UnboundedDataset.java
@@ -20,7 +20,6 @@ package org.apache.beam.runners.spark.translation.streaming;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
-
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Queue;

http://git-wip-us.apache.org/repos/asf/beam/blob/3d25b9cc/runners/spark/src/test/java/org/apache/beam/runners/spark/WatermarkTest.java
--
diff --git 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/WatermarkTest.java 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/WatermarkTest.java
index 0b56403..e7a5481 100644
--- 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/WatermarkTest.java
+++ 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/WatermarkTest.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 

[3/9] beam git commit: Handle QueuedStream (for testing) and track sources upstream.

2017-02-20 Thread amitsela
Handle QueuedStream (for testing) and track sources upstream.

Refactor according to changes.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/705695eb
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/705695eb
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/705695eb

Branch: refs/heads/master
Commit: 705695eb726acf086915e610cb2304bd968e3682
Parents: fa31f18
Author: Sela 
Authored: Sun Feb 12 18:32:06 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:14 2017 +0200

--
 .../beam/runners/spark/TestSparkRunner.java | 11 ++-
 .../aggregators/AggregatorsAccumulator.java |  2 +-
 .../spark/translation/SparkContextFactory.java  |  2 +-
 .../streaming/StreamingTransformTranslator.java | 92 +---
 .../translation/streaming/UnboundedDataset.java | 29 --
 5 files changed, 88 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/705695eb/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
index e770164..a634dd4 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkRunner.java
@@ -22,6 +22,8 @@ import static org.hamcrest.MatcherAssert.assertThat;
 import static org.hamcrest.Matchers.is;
 
 import org.apache.beam.runners.core.UnboundedReadFromBoundedSource;
+import org.apache.beam.runners.spark.aggregators.AggregatorsAccumulator;
+import org.apache.beam.runners.spark.util.GlobalWatermarkHolder;
 import org.apache.beam.runners.spark.metrics.SparkMetricsContainer;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.PipelineResult.State;
@@ -107,13 +109,14 @@ public final class TestSparkRunner extends 
PipelineRunner<SparkPipelineResult> {
   @Override
   public SparkPipelineResult run(Pipeline pipeline) {
 TestPipelineOptions testPipelineOptions = 
pipeline.getOptions().as(TestPipelineOptions.class);
-
-// clear metrics singleton
-SparkMetricsContainer.clear();
-
 SparkPipelineResult result = delegate.run(pipeline);
 result.waitUntilFinish();
 
+// clear state of Aggregators, Metrics and Watermarks.
+AggregatorsAccumulator.clear();
+SparkMetricsContainer.clear();
+GlobalWatermarkHolder.clear();
+
 // make sure the test pipeline finished successfully.
 State resultState = result.getState();
 assertThat(

http://git-wip-us.apache.org/repos/asf/beam/blob/705695eb/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
index 1b49e91..a4dfda6 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/aggregators/AggregatorsAccumulator.java
@@ -89,7 +89,7 @@ public class AggregatorsAccumulator {
   }
 
   @VisibleForTesting
-  static void clear() {
+  public static void clear() {
 synchronized (AggregatorsAccumulator.class) {
   instance = null;
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/705695eb/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
index 326838a..cdeddad 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
@@ -38,7 +38,7 @@ public final class SparkContextFactory {
* {@code true} then the Spark context will be reused for beam pipelines.
* This property should only be enabled for tests.
*/
-  static final String TEST_REUSE_SPARK_CONTEXT = 
"beam.spark.test.reuseSparkContext";
+  public static final String TEST_REUSE_SPARK_CONTEXT = 
"beam.spark.test.reuseSparkContext";
 
   // Spark allows only one context for JVM so this can be static.
   private static JavaSparkContext sparkContext;


[9/9] beam git commit: This closes #1987

2017-02-20 Thread amitsela
This closes #1987


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/aa45ccb0
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/aa45ccb0
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/aa45ccb0

Branch: refs/heads/master
Commit: aa45ccb0800741ddf5ee7ecc7965c85c1914acf7
Parents: 92190ba 3d25b9c
Author: Sela 
Authored: Mon Feb 20 11:40:26 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:40:26 2017 +0200

--
 .../runners/dataflow/util/PackageUtilTest.java  |  30 +--
 .../apache/beam/runners/spark/SparkRunner.java  |   6 +-
 .../beam/runners/spark/TestSparkRunner.java |  10 +-
 .../aggregators/AggregatorsAccumulator.java |   2 +-
 .../runners/spark/io/SparkUnboundedSource.java  |  69 --
 .../spark/stateful/StateSpecFunctions.java  |  31 ++-
 .../spark/translation/SparkContextFactory.java  |   2 +-
 .../streaming/StreamingTransformTranslator.java |  92 +---
 .../translation/streaming/UnboundedDataset.java |  34 ++-
 .../spark/util/GlobalWatermarkHolder.java   | 200 
 .../beam/runners/spark/ClearWatermarksRule.java |  37 +++
 .../beam/runners/spark/ReuseSparkContext.java   |  46 
 .../beam/runners/spark/WatermarkTest.java   | 227 +++
 .../streaming/FlattenStreamingTest.java |  22 --
 .../streaming/TrackStreamingSourcesTest.java| 168 ++
 .../apache/beam/sdk/testing/RegexMatcher.java   |  49 
 16 files changed, 898 insertions(+), 127 deletions(-)
--




[4/9] beam git commit: Ingest the input watermarks into the GlobalWatermarkHolder.

2017-02-20 Thread amitsela
Ingest the input watermarks into the GlobalWatermarkHolder.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/bbf3744d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/bbf3744d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/bbf3744d

Branch: refs/heads/master
Commit: bbf3744d4cc1a6a58712b4c54c421b0009c5bb5e
Parents: a620653
Author: Sela 
Authored: Sun Feb 12 18:30:47 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:14 2017 +0200

--
 .../runners/spark/io/SparkUnboundedSource.java  | 69 ++--
 .../spark/stateful/StateSpecFunctions.java  | 31 ++---
 2 files changed, 72 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/bbf3744d/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java
index f03dc8c..354461f 100644
--- 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java
@@ -24,8 +24,13 @@ import org.apache.beam.runners.spark.SparkPipelineOptions;
 import org.apache.beam.runners.spark.coders.CoderHelpers;
 import org.apache.beam.runners.spark.stateful.StateSpecFunctions;
 import org.apache.beam.runners.spark.translation.SparkRuntimeContext;
+import org.apache.beam.runners.spark.translation.streaming.UnboundedDataset;
+import org.apache.beam.runners.spark.util.GlobalWatermarkHolder;
+import 
org.apache.beam.runners.spark.util.GlobalWatermarkHolder.SparkWatermarks;
 import org.apache.beam.sdk.io.Source;
 import org.apache.beam.sdk.io.UnboundedSource;
+import org.apache.beam.sdk.io.UnboundedSource.CheckpointMark;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.spark.api.java.JavaRDD;
@@ -55,7 +60,7 @@ import scala.runtime.BoxedUnit;
  * This read is a composite of the following steps:
  * 
  * Create a single-element (per-partition) stream, that contains the 
(partitioned)
- * {@link Source} and an optional {@link UnboundedSource.CheckpointMark} to 
start from.
+ * {@link Source} and an optional {@link CheckpointMark} to start from.
  * Read from within a stateful operation {@link 
JavaPairInputDStream#mapWithState(StateSpec)}
  * using the {@link StateSpecFunctions#mapSourceFunction(SparkRuntimeContext)} 
mapping function,
  * which manages the state of the CheckpointMark per partition.
@@ -65,10 +70,11 @@ import scala.runtime.BoxedUnit;
  */
 public class SparkUnboundedSource {
 
-  public static <T, CheckpointMarkT extends UnboundedSource.CheckpointMark>
-  JavaDStream<WindowedValue<T>> read(JavaStreamingContext jssc,
- SparkRuntimeContext rc,
- UnboundedSource<T, CheckpointMarkT> source) {
+  public static <T, CheckpointMarkT extends CheckpointMark> 
UnboundedDataset<T> read(
+  JavaStreamingContext jssc,
+  SparkRuntimeContext rc,
+  UnboundedSource<T, CheckpointMarkT> source) {
+
 SparkPipelineOptions options = 
rc.getPipelineOptions().as(SparkPipelineOptions.class);
 Long maxRecordsPerBatch = options.getMaxRecordsPerBatch();
 SourceDStream<T, CheckpointMarkT> sourceDStream = new 
SourceDStream<>(jssc.ssc(), source, rc);
@@ -82,7 +88,7 @@ public class SparkUnboundedSource {
 JavaSparkContext$.MODULE$.fakeClassTag());
 
 // call mapWithState to read from checkpointable sources.
-JavaMapWithStateDStream<Source<T>, CheckpointMarkT, byte[],
 Tuple2<Iterable<byte[]>, Metadata>> mapWithStateDStream = 
inputDStream.mapWithState(
 StateSpec.function(StateSpecFunctions.<T, CheckpointMarkT>mapSourceFunction(rc)));
 
@@ -109,13 +115,14 @@ public class SparkUnboundedSource {
 WindowedValue.FullWindowedValueCoder.of(
 source.getDefaultOutputCoder(),
 GlobalWindow.Coder.INSTANCE);
-return mapWithStateDStream.flatMap(
+JavaDStream<byte[]> readUnboundedStream = mapWithStateDStream.flatMap(
 new FlatMapFunction<Tuple2<Iterable<byte[]>, Metadata>, byte[]>() {
   @Override
   public Iterable<byte[]> call(Tuple2<Iterable<byte[]>, Metadata> t2) 
throws Exception {
 return t2._1();
   }
 }).map(CoderHelpers.fromByteFunction(coder));
+return 

[5/9] beam git commit: Watermark tests.

2017-02-20 Thread amitsela
Watermark tests.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9784f204
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9784f204
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9784f204

Branch: refs/heads/master
Commit: 9784f204793ab8e8ec3ec84e3c7c8d2ca4ddaf6a
Parents: c18f8a2
Author: Sela 
Authored: Sun Feb 12 18:34:30 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:15 2017 +0200

--
 .../beam/runners/spark/ClearWatermarksRule.java |  37 
 .../beam/runners/spark/ReuseSparkContext.java   |  46 
 .../beam/runners/spark/WatermarkTest.java   | 212 +++
 3 files changed, 295 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/9784f204/runners/spark/src/test/java/org/apache/beam/runners/spark/ClearWatermarksRule.java
--
diff --git 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/ClearWatermarksRule.java
 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/ClearWatermarksRule.java
new file mode 100644
index 000..4c0c99a
--- /dev/null
+++ 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/ClearWatermarksRule.java
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.spark;
+
+import org.apache.beam.runners.spark.util.GlobalWatermarkHolder;
+import org.junit.rules.ExternalResource;
+
+/**
+ * A rule that clears the {@link GlobalWatermarkHolder}.
+ */
+public class ClearWatermarksRule extends ExternalResource {
+
+  @Override
+  protected void before() throws Throwable {
+clearWatermarks();
+  }
+
+  public void clearWatermarks() {
+GlobalWatermarkHolder.clear();
+  }
+}

http://git-wip-us.apache.org/repos/asf/beam/blob/9784f204/runners/spark/src/test/java/org/apache/beam/runners/spark/ReuseSparkContext.java
--
diff --git 
a/runners/spark/src/test/java/org/apache/beam/runners/spark/ReuseSparkContext.java
 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/ReuseSparkContext.java
new file mode 100644
index 000..027f9fd
--- /dev/null
+++ 
b/runners/spark/src/test/java/org/apache/beam/runners/spark/ReuseSparkContext.java
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark;
+
+import org.apache.beam.runners.spark.translation.SparkContextFactory;
+import org.junit.rules.ExternalResource;
+
+/**
+ * Explicitly set {@link org.apache.spark.SparkContext} to be reused (or not) 
in tests.
+ */
+public class ReuseSparkContext extends ExternalResource {
+
+  private final boolean reuse;
+
+  private ReuseSparkContext(boolean reuse) {
+this.reuse = reuse;
+  }
+
+  public static ReuseSparkContext no() {
+return new ReuseSparkContext(false);
+  }
+
+  public static ReuseSparkContext yes() {
+return new ReuseSparkContext(true);
+  }
+
+  @Override
+  protected void before() throws Throwable {
+System.setProperty(SparkContextFactory.TEST_REUSE_SPARK_CONTEXT, 
Boolean.toString(reuse));
+  }
+}
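
Taken together, these rules slot into a JUnit test as ordinary @Rule fields. A 
minimal sketch follows, assuming a hypothetical test class that is not part of 
this commit:

import org.apache.beam.runners.spark.ClearWatermarksRule;
import org.apache.beam.runners.spark.ReuseSparkContext;
import org.junit.Rule;
import org.junit.Test;

public class HypotheticalStreamingTest {

  // Reuse the single JVM-wide SparkContext across the tests in this class.
  @Rule
  public final ReuseSparkContext reuseContext = ReuseSparkContext.yes();

  // Clear the GlobalWatermarkHolder before each test so watermarks cannot
  // leak from one test into the next.
  @Rule
  public final ClearWatermarksRule clearWatermarks = new ClearWatermarksRule();

  @Test
  public void testStreamingScenario() {
    // ... build and run a streaming pipeline against the reused context ...
  }
}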


[1/9] beam git commit: A global Watermark holder to update and broadcast to workers.

2017-02-20 Thread amitsela
Repository: beam
Updated Branches:
  refs/heads/master 92190ba5d -> aa45ccb08


A global Watermark holder to update and broadcast to workers.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a6206535
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a6206535
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a6206535

Branch: refs/heads/master
Commit: a6206535ec84d8c01695368c8973a6577ae6d953
Parents: 92190ba
Author: Sela 
Authored: Sun Feb 12 18:28:19 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:13 2017 +0200

--
 .../spark/util/GlobalWatermarkHolder.java   | 200 +++
 1 file changed, 200 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a6206535/runners/spark/src/main/java/org/apache/beam/runners/spark/util/GlobalWatermarkHolder.java
--
diff --git 
a/runners/spark/src/main/java/org/apache/beam/runners/spark/util/GlobalWatermarkHolder.java
 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/util/GlobalWatermarkHolder.java
new file mode 100644
index 000..b215b5f
--- /dev/null
+++ 
b/runners/spark/src/main/java/org/apache/beam/runners/spark/util/GlobalWatermarkHolder.java
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.spark.util;
+
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentLinkedQueue;
+
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.broadcast.Broadcast;
+import org.apache.spark.streaming.api.java.JavaStreamingContext;
+import org.apache.spark.streaming.api.java.JavaStreamingListener;
+import org.apache.spark.streaming.api.java.JavaStreamingListenerBatchCompleted;
+import org.joda.time.Instant;
+
+
+/**
+ * A {@link Broadcast} variable to hold the global watermarks for a 
micro-batch.
+ *
+ * For each source, holds a queue for the watermarks of each micro-batch 
that was read,
+ * and advances the watermarks according to the queue (first-in-first-out).
+ */
+public class GlobalWatermarkHolder {
+  // the broadcast is broadcast to the workers.
+  private static volatile Broadcast<Map<Integer, SparkWatermarks>> broadcast = 
null;
+  // this should only live in the driver, so it is transient.
+  private static final transient Map<Integer, Queue<SparkWatermarks>> 
sourceTimes = new HashMap<>();
+
+  public static void add(int sourceId, SparkWatermarks sparkWatermarks) {
+Queue<SparkWatermarks> timesQueue = sourceTimes.get(sourceId);
+if (timesQueue == null) {
+  timesQueue = new ConcurrentLinkedQueue<>();
+}
+timesQueue.offer(sparkWatermarks);
+sourceTimes.put(sourceId, timesQueue);
+  }
+
+  @VisibleForTesting
+  public static void addAll(Map<Integer, Queue<SparkWatermarks>> sourceTimes) {
+for (Map.Entry<Integer, Queue<SparkWatermarks>> en: 
sourceTimes.entrySet()) {
+  int sourceId = en.getKey();
+  Queue<SparkWatermarks> timesQueue = en.getValue();
+  while (!timesQueue.isEmpty()) {
+add(sourceId, timesQueue.poll());
+  }
+}
+  }
+
+  /**
+   * Returns the {@link Broadcast} containing the {@link SparkWatermarks} 
mapped
+   * to their sources.
+   */
+  public static Broadcast<Map<Integer, SparkWatermarks>> get() {
+return broadcast;
+  }
+
+  /**
+   * Advances the watermarks to the next-in-line watermarks.
+   * SparkWatermarks are monotonically increasing.
+   */
+  public static void advance(JavaSparkContext jsc) {
+synchronized (GlobalWatermarkHolder.class){
+  if (sourceTimes.isEmpty()) {
+return;
+  }
+
+  // update all sources' watermarks into the new broadcast.
+  Map<Integer, SparkWatermarks> newBroadcast = new HashMap<>();
+
+  for (Map.Entry<Integer, Queue<SparkWatermarks>> en: 
sourceTimes.entrySet()) {
+
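
A minimal driver-side sketch of the add/advance/get flow follows; it assumes 
(not shown in the excerpt above) that SparkWatermarks has a constructor taking 
a low watermark, a high watermark, and a synchronized processing time, all as 
joda-time Instants:

import java.util.Map;
import org.apache.beam.runners.spark.util.GlobalWatermarkHolder;
import org.apache.beam.runners.spark.util.GlobalWatermarkHolder.SparkWatermarks;
import org.apache.spark.api.java.JavaSparkContext;
import org.joda.time.Instant;

public class WatermarkHolderSketch {
  public static void main(String[] args) {
    // A local context stands in for the one the runner would normally own.
    JavaSparkContext jsc = new JavaSparkContext("local[*]", "watermark-sketch");
    Instant now = Instant.now();

    // Queue a watermark update for source 0; queues are first-in-first-out per source.
    GlobalWatermarkHolder.add(0, new SparkWatermarks(now, now.plus(500), now));

    // At the end of a micro-batch, publish the next-in-line watermarks to the workers.
    GlobalWatermarkHolder.advance(jsc);

    // Workers (and the driver) read the current watermarks from the broadcast.
    Map<Integer, SparkWatermarks> current = GlobalWatermarkHolder.get().getValue();
    System.out.println(current);
    jsc.stop();
  }
}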

[6/9] beam git commit: Expose RegexMatcher as part of the SDK.

2017-02-20 Thread amitsela
Expose RegexMatcher as part of the SDK.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c18f8a2c
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c18f8a2c
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c18f8a2c

Branch: refs/heads/master
Commit: c18f8a2c33a74ea00f8bbba6fb7cd766de9082f4
Parents: 705695e
Author: Sela 
Authored: Sun Feb 12 18:33:55 2017 +0200
Committer: Sela 
Committed: Mon Feb 20 11:30:15 2017 +0200

--
 .../runners/dataflow/util/PackageUtilTest.java  | 30 +-
 .../apache/beam/sdk/testing/RegexMatcher.java   | 32 
 2 files changed, 33 insertions(+), 29 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c18f8a2c/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
index 800c5a9..a11872f 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
@@ -66,7 +66,6 @@ import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
-import java.util.regex.Pattern;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipInputStream;
 import org.apache.beam.runners.dataflow.util.PackageUtil.PackageAttributes;
@@ -74,11 +73,10 @@ import org.apache.beam.sdk.options.GcsOptions;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.testing.ExpectedLogs;
 import org.apache.beam.sdk.testing.FastNanoClockAndSleeper;
+import org.apache.beam.sdk.testing.RegexMatcher;
 import org.apache.beam.sdk.util.GcsUtil;
 import org.apache.beam.sdk.util.IOChannelUtils;
 import org.apache.beam.sdk.util.gcsfs.GcsPath;
-import org.hamcrest.BaseMatcher;
-import org.hamcrest.Description;
 import org.hamcrest.Matchers;
 import org.junit.Before;
 import org.junit.Rule;
@@ -107,32 +105,6 @@ public class PackageUtilTest {
   // 128 bits, base64 encoded is 171 bits, rounds to 22 bytes
   private static final String HASH_PATTERN = "[a-zA-Z0-9+-]{22}";
 
-  // Hamcrest matcher to assert a string matches a pattern
-  private static class RegexMatcher extends BaseMatcher<String> {
-private final Pattern pattern;
-
-public RegexMatcher(String regex) {
-  this.pattern = Pattern.compile(regex);
-}
-
-@Override
-public boolean matches(Object o) {
-  if (!(o instanceof String)) {
-return false;
-  }
-  return pattern.matcher((String) o).matches();
-}
-
-@Override
-public void describeTo(Description description) {
-  description.appendText(String.format("matches regular expression %s", 
pattern));
-}
-
-public static RegexMatcher matches(String regex) {
-  return new RegexMatcher(regex);
-}
-  }
-
   @Before
   public void setUp() {
 MockitoAnnotations.initMocks(this);

http://git-wip-us.apache.org/repos/asf/beam/blob/c18f8a2c/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/RegexMatcher.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/RegexMatcher.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/RegexMatcher.java
new file mode 100644
index 000..a4b14fe
--- /dev/null
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/RegexMatcher.java
@@ -0,0 +1,32 @@
+package org.apache.beam.sdk.testing;
+
+import java.util.regex.Pattern;
+import org.hamcrest.BaseMatcher;
+import org.hamcrest.Description;
+
+
+/** Hamcrest matcher to assert a string matches a pattern. */
+public class RegexMatcher extends BaseMatcher<String> {
+  private final Pattern pattern;
+
+  public RegexMatcher(String regex) {
+this.pattern = Pattern.compile(regex);
+  }
+
+  @Override
+  public boolean matches(Object o) {
+if (!(o instanceof String)) {
+  return false;
+}
+return pattern.matcher((String) o).matches();
+  }
+
+  @Override
+  public void describeTo(Description description) {
+description.appendText(String.format("matches regular expression %s", 
pattern));
+  }
+
+  public static RegexMatcher matches(String regex) {
+return new RegexMatcher(regex);
+  }
+}
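
A small usage sketch of the newly exposed matcher (the asserted string and 
pattern are invented for illustration); note that RegexMatcher delegates to 
Matcher#matches, so the pattern must match the entire string:

import static org.hamcrest.MatcherAssert.assertThat;

import org.apache.beam.sdk.testing.RegexMatcher;

public class RegexMatcherSketch {
  public static void main(String[] args) {
    // Passes: "abc123" satisfies [a-z0-9]{6} and the whole string matches.
    assertThat("classes-abc123.jar", RegexMatcher.matches("classes-[a-z0-9]{6}\\.jar"));
  }
}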



[jira] [Comment Edited] (BEAM-1399) Code coverage numbers are not accurate

2017-02-20 Thread Stas Levin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873590#comment-15873590
 ] 

Stas Levin edited comment on BEAM-1399 at 2/20/17 8:04 AM:
---

Thanks for the tip [~dhalp...@google.com], I wasn't aware the 
{{beam-sdks-java-javadoc}} module depended on all the other ones.

From preliminary experiments, it looks like {{report-aggregate}} aggregates 
reports across dependent modules, so that the resulting report is as if you 
had executed the regular {{report}} and merged them all together (visually). 
In terms of coverage, I'm still seeing the 69% for {{beam-sdks-java-core}} both 
when using {{report}} specifically for {{beam-sdks-java-core}} and 
{{report-aggregate}} for the {{beam-sdks-java-javadoc}} module. This may 
indicate that {{report-aggregate}} does not capture the additional 
{{@RunnableOnService/@NeedsRunner}} tests we are after, at least not out of the 
box. 
In addition, test class coverage is not reported (neither by {{report}} nor by 
{{report-aggregate}}), only production code ^2^.

Essentially we have the following major issues:
# Record coverage across modules 
#* Might be possible using {{maven-antrun-plugin}} and/or copying {{.class}} / 
{{.java}} files of dependent modules to {{beam-sdks-java-javadoc}}, as 
described in ^1^.
# Report test class coverage 
#* Might be possible using {{maven-antrun-plugin}} since it gives fine-grained 
control over what's reported.
# Merge coverage provided by different executions of the same test class 
#* As I dive deeper into this, I'm inclined to think that the 
{{jacoco-maven-plugin}} does not support merging coverage for a class executed 
from different modules due to ^3^, which brings me back to separating 
{{@RunnableOnService/@NeedsRunner}} into their own test classes and the rest of 
the goodies behind curtain number (2) detailed in my first comment above.


1. 
https://groups.google.com/forum/#!searchin/jacoco/integration$20tests|sort:relevance/jacoco/_yuHGr_Kp6s/irM031B6KNAJ
2. 
http://stackoverflow.com/questions/34483904/jacoco-maven-plugin-include-test-sources
3.  http://www.eclemma.org/jacoco/trunk/doc/classids.html


was (Author: staslev):
Thanks for the tip [~dhalp...@google.com], I wasn't aware the 
{{beam-sdks-java-javadoc}} module depended on all the other ones.

From preliminary experiments, looks like {{report-aggregate}} aggregates 
reports across dependent modules, so that the resulting report is as if you 
had executed the regular {{report}} and merged them all together (visually). 
In terms of coverage, I'm still seeing the 69% for {{beam-sdks-java-core}} both 
when using {{report}} specifically for {{beam-sdks-java-core}} and 
{{report-aggregate}} for the {{beam-sdks-java-javadoc}} module. This may 
indicate that {{report-aggregate}} does not capture the additional 
{{@RunnableOnService/@NeedsRunner}} tests we are after, at least not out of the 
box. 
In addition, test class coverage is not reported (neither by {{report}} nor by 
{{report-aggregate}}), only production code ^2^.

Essentially we have the following major issues:
# Record coverage across modules 
#* Might be possible using {{maven-antrun-plugin}} and/or copying {{.class}} / 
{{.java}} files of dependent modules to {{beam-sdks-java-javadoc}}, as 
described in ^1^.
# Report test class coverage 
#* Might be possible using {{maven-antrun-plugin}} since it gives fine-grained 
control over what's reported.
# Merge coverage provided by different executions of the same test class 
#* As I dive deeper into this, I'm inclined to think that the 
{{jacoco-maven-plugin}} does not support merging coverage for a test class 
executed from different modules due to ^3^, which brings me back to separating 
{{@RunnableOnService/@NeedsRunner}} into their own test classes and the rest of 
the goodies behind curtain number (2) detailed in my first comment above.


1. 
https://groups.google.com/forum/#!searchin/jacoco/integration$20tests|sort:relevance/jacoco/_yuHGr_Kp6s/irM031B6KNAJ
2. 
http://stackoverflow.com/questions/34483904/jacoco-maven-plugin-include-test-sources
3.  http://www.eclemma.org/jacoco/trunk/doc/classids.html

> Code coverage numbers are not accurate
> --
>
> Key: BEAM-1399
> URL: https://issues.apache.org/jira/browse/BEAM-1399
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-java-core, testing
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> We've started adding Java Code Coverage numbers to PRs using the jacoco 
> plugin. However, we are getting very low coverage reported for things like 
> the Java SDK core.
> My belief is that this is happening because we test the bulk of the SDK not 
> in the SDK module, but in fact in the DirectRunner and other similar modules.
> JaCoCo has a 
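
As a sketch of the separation floated in item (3) of the comment above, the 
runner-dependent tests would move into classes of their own, keyed off the 
SDK's existing JUnit categories; class and test names here are hypothetical:

import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.TestPipeline;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

/** Hypothetical home for the runner-dependent tests of some transform. */
public class FooTransformNeedsRunnerTest {

  @Rule
  public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  @Category(NeedsRunner.class)
  public void testOnRunner() {
    // ... apply the transform, PAssert the output, then run the pipeline ...
    pipeline.run();
  }
}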
