from:"ASF GitHub Bot \(JIRA\)"

[jira] [Commented] (BEAM-117) Implement the API for Static Display Metadata

2016-05-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297137#comment-15297137
 ] 

ASF GitHub Bot commented on BEAM-117:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/375

[BEAM-117] Evaluate display data from InProcessPipelineRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Display data can be added to any PTransform to be used
for display from any runner. Runners are not required to
consume display data, and currently many don't.

This changes InProcessRunner to consumer display data (and then
discard it) in order to validate that display data is properly
implemented on transforms within a pipeline. Exceptions thrown
within HasDisplayData implementations will cause Pipeline.run()
to fail.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam 
displaydata-directrunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #375


commit 698a0af1553279ba48ad2f4768556fbd4810208a
Author: Scott Wegner 
Date:   2016-05-23T21:29:33Z

Evaluate display data from InProcessPipelineRunner

Display data can be added to any PTransform to be used
for display from any runner. Runners are not required to
consume display data, and currently many don't.

This changes InProcessRunner to consumer display data (and then
discard it) in order to validate that display data is properly
implemented on transforms within a pipeline. Exceptions thrown
within HasDisplayData implementations will cause Pipeline.run()
to fail.




> Implement the API for Static Display Metadata
> -
>
> Key: BEAM-117
> URL: https://issues.apache.org/jira/browse/BEAM-117
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Scott Wegner
>
> As described in the following doc, we would like the SDK to allow associating 
> display metadata with PTransforms.
> https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-53) PubSubIO: reimplement in Java

2016-05-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297103#comment-15297103
 ] 

ASF GitHub Bot commented on BEAM-53:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/371


> PubSubIO: reimplement in Java
> -
>
> Key: BEAM-53
> URL: https://issues.apache.org/jira/browse/BEAM-53
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Daniel Halperin
>Assignee: Mark Shields
>
> PubSubIO is currently only partially implemented in Java: the 
> DirectPipelineRunner uses a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path 
> implemented in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in 
> a scalable way.
> Additionally, we can take this opportunity to add new features:
> * getting timestamp from an arbitrary lambda in arbitrary formats rather than 
> from a message attribute in only 2 formats.
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-117) Implement the API for Static Display Metadata

2016-05-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291647#comment-15291647
 ] 

ASF GitHub Bot commented on BEAM-117:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/355

[BEAM-117] Fix bug in PipelineOptions DisplayData serialization

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

In the code for populating display data for `PipelineOptions` in the 
serialized JSON, we had a bug in `ProxyInvocationHandler.Serializer` where it 
was not checking for null values from previously-deserialized JSON values. As a 
result, a null value in the JSON would cause us to throw and block pipelines 
from running.

This PR fixes a number of issues:

* Properly handle null JSON values in `ProxyInvocatinHandler.Serializer`
* Establish a common pattern for handling errors during display data 
collection, via `DisplayData.from(Throwable)`
* Implement resiliency pattern in `ProxyInvocationHandler.Serializer` so 
bugs in `PipelineOptions` serialization do not prevent running pipelines.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam displaydata-resiliency

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/355.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #355


commit 8ffeb3da894ee5e25a54f7faa8faeff8fe1ae98f
Author: Scott Wegner 
Date:   2016-05-19T16:17:37Z

Fix bug in PipelineOptions DisplayData serialization




> Implement the API for Static Display Metadata
> -
>
> Key: BEAM-117
> URL: https://issues.apache.org/jira/browse/BEAM-117
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Scott Wegner
>
> As described in the following doc, we would like the SDK to allow associating 
> display metadata with PTransforms.
> https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-242) Enable Checkstyle check for the Flink Runner

2016-05-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295674#comment-15295674
 ] 

ASF GitHub Bot commented on BEAM-242:
-

GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/372

[BEAM-242] Enable and fix checkstyle on Flink runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Enable checkstyle plugin in Flink runner and fix checkstyle issues.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam 
BEAM-242-FLINK-CHECKSTYLE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #372


commit 01039c296f90be1544385454e1efb0d2398aaa35
Author: Jean-Baptiste Onofré 
Date:   2016-05-22T18:43:05Z

[BEAM-242] Enable and fix checkstyle on Flink runner




> Enable Checkstyle check for the Flink Runner 
> -
>
> Key: BEAM-242
> URL: https://issues.apache.org/jira/browse/BEAM-242
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> We don't have a Checkstyle check in place for the Flink Runner. I would like 
> to use the SDK's checkstyle rules.
> We could also think about a unified Checkstyle for all Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-297) version typo at README.md of flink runner

2016-05-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294900#comment-15294900
 ] 

ASF GitHub Bot commented on BEAM-297:
-

GitHub user JianfengQian opened a pull request:

https://github.com/apache/incubator-beam/pull/370

[BEAM-297] update flink README.md at line 145

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
[BEAM-297] update flink README.md at line 145,
version of org.apache.beam should be 0.1.0-incubating-SNAPSHOT now.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JianfengQian/incubator-beam patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/370.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #370


commit 671ac2e321ec07e511281af28138861dc56b9b22
Author: JianfengQian 
Date:   2016-05-21T10:32:29Z

[BEAM-297] update flink README.md at line 145,version of org.apache.beam 
should be 0.1.0-incubating-SNAPSHOT now.




> version typo at README.md of flink runner
> -
>
> Key: BEAM-297
> URL: https://issues.apache.org/jira/browse/BEAM-297
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Jianfeng Qian
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.1.0-incubating
>
>
> version typo at README.md of flink runner
> at line 145, the version should be "0.1.0-incubating-SNAPSHOT" instead of 
> "0.4-SNAPSHOT"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-103) Make UnboundedSourceWrapper parallel

2016-05-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298188#comment-15298188
 ] 

ASF GitHub Bot commented on BEAM-103:
-

GitHub user mxm opened a pull request:

https://github.com/apache/incubator-beam-site/pull/19

[BEAM-103] update capability matrix

This reflects the changes of BEAM-103 in the Capability Matrix.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/incubator-beam-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam-site/pull/19.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19


commit 0ee1aa0f08a941abd21a27ed3ab87d0bde1f6f14
Author: Maximilian Michels 
Date:   2016-05-24T13:41:21Z

[BEAM-103] update capability matrix

This reflects the changes of BEAM-103 in the Capability Matrix.

commit 07f63a1c88b94e9395faa8744ba171b7f5bcfc38
Author: Maximilian Michels 
Date:   2016-05-24T13:43:26Z

rebuild Beam web site




> Make UnboundedSourceWrapper parallel
> 
>
> Key: BEAM-103
> URL: https://issues.apache.org/jira/browse/BEAM-103
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
> Fix For: 0.1.0-incubating
>
>
> As of now {{UnboundedSource}} s are executed with a parallelism of 1 
> regardless of the splits which the source returns. The corresponding 
> {{UnboundedSourceWrapper}} should implement {{RichParallelSourceFunction}} 
> and deal with splits correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-242) Enable Checkstyle check and Javadoc build for the Flink Runner

2016-05-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298525#comment-15298525
 ] 

ASF GitHub Bot commented on BEAM-242:
-

GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/382

[BEAM-242] Fix javadoc on Flink runner.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Fixing Javadoc issue on Flink runner (core and examples) allowing to run a 
mvn clean install -Prelease.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam BEAM-242-JAVADOC

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/382.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #382


commit ecc96837495bbdc17f6b77899ac4d16c2d7ba839
Author: Jean-Baptiste Onofré 
Date:   2016-05-23T06:48:34Z

[BEAM-242] Fix javadoc on Flink runner.




> Enable Checkstyle check and Javadoc build for the Flink Runner 
> ---
>
> Key: BEAM-242
> URL: https://issues.apache.org/jira/browse/BEAM-242
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> We don't have a Checkstyle check in place for the Flink Runner. I would like 
> to use the SDK's checkstyle rules.
> We could also think about a unified Checkstyle for all Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-297) version typo at README.md of flink runner

2016-05-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294886#comment-15294886
 ] 

ASF GitHub Bot commented on BEAM-297:
-

Github user JianfengQian closed the pull request at:

https://github.com/apache/incubator-beam/pull/354


> version typo at README.md of flink runner
> -
>
> Key: BEAM-297
> URL: https://issues.apache.org/jira/browse/BEAM-297
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.1.0-incubating
>Reporter: Jianfeng Qian
>Priority: Trivial
>  Labels: easyfix
> Fix For: 0.1.0-incubating
>
>
> version typo at README.md of flink runner
> at line 145, the version should be "0.1.0-incubating-SNAPSHOT" instead of 
> "0.4-SNAPSHOT"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-53) PubSubIO: reimplement in Java

2016-05-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295352#comment-15295352
 ] 

ASF GitHub Bot commented on BEAM-53:


GitHub user mshields822 opened a pull request:

https://github.com/apache/incubator-beam/pull/371

[BEAM-53] Mirror some DataflowJavaSDK pubsub fixes

Forward port from
https://github.com/GoogleCloudPlatform/DataflowJavaSDK/pull/281

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mshields822/incubator-beam pubsub-fiddles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/371.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #371


commit c87a19b38e51b2e15689b154089ec93da1a8c7ae
Author: Mark Shields 
Date:   2016-05-21T04:22:08Z

s/Apiary/Json/g

commit 0a48cfa5479310fbdb212dce8c84a140517a7f66
Author: Mark Shields 
Date:   2016-05-21T04:23:02Z

wibble

commit 178f5dcef4bb22def2b818a1dd21e5952700c0e4
Author: Mark Shields 
Date:   2016-05-21T05:14:54Z

Mark as bounded

commit da3bfe464f0b2b69c826d1a7d72868211b448371
Author: Mark Shields 
Date:   2016-05-22T01:44:47Z

Fix warnings

commit 710fd2db6100d071392dd5fba28114ebe8548065
Author: Mark Shields 
Date:   2016-05-22T01:47:17Z

yawn




> PubSubIO: reimplement in Java
> -
>
> Key: BEAM-53
> URL: https://issues.apache.org/jira/browse/BEAM-53
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Daniel Halperin
>Assignee: Mark Shields
>
> PubSubIO is currently only partially implemented in Java: the 
> DirectPipelineRunner uses a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path 
> implemented in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in 
> a scalable way.
> Additionally, we can take this opportunity to add new features:
> * getting timestamp from an arbitrary lambda in arbitrary formats rather than 
> from a message attribute in only 2 formats.
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-270) Support Timestamps/Windows in Flink Batch

2016-05-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292003#comment-15292003
 ] 

ASF GitHub Bot commented on BEAM-270:
-

Github user aljoscha closed the pull request at:

https://github.com/apache/incubator-beam/pull/328


> Support Timestamps/Windows in Flink Batch
> -
>
> Key: BEAM-270
> URL: https://issues.apache.org/jira/browse/BEAM-270
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Right now, Flink Batch execution does not use {{WindowedValue}} internally, 
> this means that all programs that interact with timestamps/windows will not 
> work. We should just internally wrap everything in {{WindowedValue}} as we do 
> in Flink Streaming. This also makes it very straightforward to add support 
> for windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-115) Beam Runner API

2016-05-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291952#comment-15291952
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/268


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-272) Flink Runner depends on Dataflow Runner

2016-05-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279893#comment-15279893
 ] 

ASF GitHub Bot commented on BEAM-272:
-

GitHub user mxm opened a pull request:

https://github.com/apache/incubator-beam/pull/324

[BEAM-272][flink] remove dependency on Dataflow Runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/incubator-beam BEAM-272

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #324


commit abfe73a962624001022a83ce4dd5c1697037bdbf
Author: Maximilian Michels 
Date:   2016-05-11T09:57:44Z

[BEAM-272][flink] remove dependency on Dataflow Runner




> Flink Runner depends on Dataflow Runner
> ---
>
> Key: BEAM-272
> URL: https://issues.apache.org/jira/browse/BEAM-272
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> During restructuring of the modules, we have introduced a dependency of the 
> Flink Runner on the Dataflow Runner. The {{PipelineOptionsFactory}} used to 
> be part of the SDK core but moved to the Dataflow Runner. We should get rid 
> of this dependency to avoid classpath related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278646#comment-15278646
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/312


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-117) Implement the API for Static Display Metadata

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278633#comment-15278633
 ] 

ASF GitHub Bot commented on BEAM-117:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/315

[BEAM-117] Runners should be resilient to DisplayData failure

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Display data is collected from PTransforms at Pipeline construction
time. Collecting display data runs user code from provided transforms
and fn's. These components should be designed not to throw during
pipeline construction, however we also shouldn't fail a pipeline
if this code does fail.

This PR adds resiliency to the DataflowPipelineTranslator, where
we collect display data for the Dataflow runner, and also a
RunnableOnService test to verify that all runners are resilient to
display data failures. Other runners are not yet using display data,
but will get this validation for free when they do.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam displaydata-safe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/315.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #315


commit d6c5025937cbd0016d0a83d08e53902ae4d4519b
Author: Scott Wegner 
Date:   2016-05-10T18:19:14Z

Runners should be resilient to DisplayData failure

Display data is collected from PTransforms at Pipeline construction
time. Collecting display data runs user code from provided transforms
and fn's. These components should be designed not to throw during
pipeline construction, however we also shouldn't fail a pipeline
if this code does fail.

This PR adds resiliency to the DataflowPipelineTranslator, where
we collect display data for the Dataflow runner, and also a
RunnableOnService test to verify that all runners are resilient to
display data failures. Other runners are not yet using display data,
but will get this validation for free when they do.




> Implement the API for Static Display Metadata
> -
>
> Key: BEAM-117
> URL: https://issues.apache.org/jira/browse/BEAM-117
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Scott Wegner
>
> As described in the following doc, we would like the SDK to allow associating 
> display metadata with PTransforms.
> https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-103) Make UnboundedSourceWrapper parallel

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278810#comment-15278810
 ] 

ASF GitHub Bot commented on BEAM-103:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/274


> Make UnboundedSourceWrapper parallel
> 
>
> Key: BEAM-103
> URL: https://issues.apache.org/jira/browse/BEAM-103
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> As of now {{UnboundedSource}} s are executed with a parallelism of 1 
> regardless of the splits which the source returns. The corresponding 
> {{UnboundedSourceWrapper}} should implement {{RichParallelSourceFunction}} 
> and deal with splits correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278813#comment-15278813
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/318

[BEAM-22] Use an AtomicReference in InProcessSideInputContainer

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This fixes a TOCTOU race in the contents updating logic, where the
determination that the current pane should replace the contents of the
side input and the replacement is not a single atomic operation. Using
AtomicReference allows the use of compareAndSet to ensure that the
replacement can only occur on the pane that the decision to replace was
made with.

Fixes a race where a pane could be the latest, and replace a
pane, but would be lost due to an earlier pane being written between the
invalidation and loading of contents.

Fixes a race where a reader can incorrectly read an empty iterable as
the contents of a PCollectionView, due to occuring between the
invalidate and reload steps.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
atomic_reference_side_input_container

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #318






> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279008#comment-15279008
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/320

[BEAM-22] Reuse DoFns in ParDoEvaluators

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This allows the runner to avoid cloning DoFns for every input bundle.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
reuse_dofns_pardoevaluators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #320


commit f018c75ef3ca1b60fc990bd88031d5419c571b87
Author: Thomas Groh 
Date:   2016-05-10T21:06:27Z

Reuse DoFns in ParDoEvaluators

This allows the runner to avoid cloning DoFns for every input bundle.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-271) Option to configure remote Dataflow windmill service endpoint

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278839#comment-15278839
 ] 

ASF GitHub Bot commented on BEAM-271:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/314


> Option to configure remote Dataflow windmill service endpoint
> -
>
> Key: BEAM-271
> URL: https://issues.apache.org/jira/browse/BEAM-271
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Raghu Angadi
>Assignee: Davor Bonaci
>Priority: Minor
>
> Add two options to DataflowPipelineDebugOptions to configure Dataflow remove 
> windmill service. This lets Dataflow users to configure the streaming 
> pipelines to point to remote windmill service.
> https://github.com/apache/incubator-beam/pull/314



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278844#comment-15278844
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/319

[BEAM-22] Enable RunnableOnService Tests

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Not ready for review. Publishing PR to hook into Travis and Jenkins.

Update runners/direct-java/pom.xml to enable the RunnableOnService
tests phase.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam enable_ros_tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #319


commit d1796ba6fecb8423e563fcdf66946beda79e52c6
Author: Thomas Groh 
Date:   2016-05-09T22:47:27Z

Minor checkArgument style fix

commit f0e38fd170949f27d4794113e4bcb2077ffe88a6
Author: Thomas Groh 
Date:   2016-05-10T18:27:37Z

Use an AtomicReference in InProcessSideInputContainer

This fixes a TOCTOU race in the contents updating logic, where the
determination that the current pane should replace the contents of the
side input and the replacement is not a single atomic operation. Using
AtomicReference allows the use of compareAndSet to ensure that the
replacement can only occur on the pane that the decision to replace was
made with.

Fixes a race where a pane could be the latest, and replace a
pane, but would be lost due to an earlier pane being written between the
invalidation and loading of contents.

Fixes a race where a reader can incorrectly read an empty iterable as
the contents of a PCollectionView, due to occuring between the
invalidate and reload steps.

commit e06e449e3762a48404d0407babaff440ebfa416e
Author: Thomas Groh 
Date:   2016-05-10T20:22:20Z

Cache read SideInput Contents in the InProcessSideInputContainer

This ensures that while processing a bundle all elements see the same
contents for any SideInput Window.

commit 8ff1d79474f3d114381b924fa61aa46bd7b935db
Author: Thomas Groh 
Date:   2016-05-10T20:36:21Z

Enable RunnableOnService tests for the Direct Runner




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279326#comment-15279326
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/322

[BEAM-22] More regularly schedule additional roots

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This ensures that even when elements are pushed back into the Pipeline
Runner, roots are scheduled if necessary.

As elements may be rescheduled indefinitely, this is required to ensure
that unbounded roots are scheduled during pipeline execution when
existing elements are blocked on side inputs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
more_aggressively_add_roots

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #322


commit 1f5e399630e3ba71e2c025e7ab3f32c6c3ba9518
Author: Thomas Groh 
Date:   2016-05-11T00:11:23Z

More regularly schedule additional roots

This ensures that even when elements are pushed back into the Pipeline
Runner, roots are scheduled if necessary.

As elements may be rescheduled indefinitely, this is required to ensure
that unbounded roots are scheduled during pipeline execution when
existing elements are blocked on side inputs.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-225) Create Class for Common TypeDescriptors

2016-05-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280522#comment-15280522
 ] 

ASF GitHub Bot commented on BEAM-225:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/275


> Create Class for Common TypeDescriptors
> ---
>
> Key: BEAM-225
> URL: https://issues.apache.org/jira/browse/BEAM-225
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Assignee: Jesse Anderson
>Priority: Trivial
>  Labels: starter
>
> There should be a built-in class for common types like String, Float, etc.
> Right now, all types have to create an inline TypeDescriptor:
> {code:java}
> PCollection words = suits.apply(
> FlatMapElements.via(
> (String line) -> Arrays.asList(line.split(" "))
> ).withOutputType(new TypeDescriptor() {}));
> {code}
> The should be a built-in class with common types like String so you don't 
> have to create a TypeDescriptor each time like:
> {code:java}
>   PCollection words = suits.apply(
> FlatMapElements.via(
> (String line) -> Arrays.asList(line.split(" "))
> ).withOutputType(TypeDescriptors.STRINGS));
> {code}
> Another possibility is to make it a static method:
> {code:java}
>   PCollection words = suits.apply(
> FlatMapElements.via(
> (String line) -> Arrays.asList(line.split(" "))
> ).withOutputType(TypeDescriptors.strings()));
> {code}
> An example of this is Apache Crunch's Writables class 
> https://crunch.apache.org/apidocs/0.11.0/org/apache/crunch/types/writable/Writables.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-270) Support Timestamps/Windows in Flink Batch

2016-05-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286181#comment-15286181
 ] 

ASF GitHub Bot commented on BEAM-270:
-

GitHub user aljoscha opened a pull request:

https://github.com/apache/incubator-beam/pull/343

[BEAM-270] Support Timestamps/Windows in Flink Batch

This is a cleanup version of #328, this time for real.

The interesting things are in 
`FlinkPartialReduceFunction`/`FlinkReduceFunction`, 
`FlinkMergingPartialReduceFunction`/`FlinkMergingReduceFunction` and 
`FlinkMergingNonShuffleReduceFunction`. All of these implement special cases of 
windowing. The first two are for general, non-merging windows, the second set 
is for doing a `GroupByKey`, the last one is for merging windows. In the last 
case we cannot do a pre-shuffle combine step.

R: @kennknowles and @mxm for review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/incubator-beam 
flink-windowed-value-batch-cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/343.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #343


commit 93c3f99a6be44b7aad7859927c69974d368f9903
Author: Kenneth Knowles 
Date:   2016-05-02T20:11:12Z

Add TestFlinkPipelineRunner to FlinkRunnerRegistrar

This makes the runner available for selection by integration tests.

commit c48e1eaea4805359fdfc326d70b5d3c9964fe37f
Author: Kenneth Knowles 
Date:   2016-05-02T21:04:20Z

Configure RunnableOnService tests for Flink in batch mode

Today Flink batch supports only global windows. This is a situation we
intend our build to allow, eventually via JUnit category filtering.

For now all the test classes that use non-global windows are excluded
entirely via maven configuration. In the future, it should be on a
per-test-method basis.

commit 4cc1acc8630a2e436acd75f5aeb4ee6b01a38dc5
Author: Aljoscha Krettek 
Date:   2016-05-06T06:26:50Z

Fix Dangling Flink DataSets

commit 508eebafee0a762a59d5a21a07c26f43981c304f
Author: Aljoscha Krettek 
Date:   2016-05-06T07:38:55Z

Add hamcrest dependency to Flink Runner

Without it the RunnableOnService tests seem to not work

commit 3b1f064ca1f2985b6898d527f6174cc9055a1e4a
Author: Kenneth Knowles 
Date:   2016-05-06T17:54:41Z

Remove unused threadCount from integration tests

commit 99df86fc057d49fcf4e305d3523864d68cf5abd1
Author: Kenneth Knowles 
Date:   2016-05-06T17:55:16Z

Disable Flink streaming integration tests for now

commit 4b2eb1151e4cd7ef140a9e6e0eab251452ef7070
Author: Kenneth Knowles 
Date:   2016-05-06T19:49:55Z

Special casing job exec AssertionError in TestFlinkPipelineRunner

commit c45651fa434d91064a16f54d53b65f40eadad108
Author: Aljoscha Krettek 
Date:   2016-05-10T11:53:03Z

[BEAM-270] Support Timestamps/Windows in Flink Batch

With this change we always use WindowedValue for the underlying Flink
DataSets instead of just T. This allows us to support windowing as well.

This changes also a lot of other stuff enabled by the above:

 - Use WindowedValue throughout
 - Add proper translation for Window.into()
 - Make side inputs window aware
 - Make GroupByKey and Combine transformations window aware, this
   includes support for merging windows. GroupByKey is implemented as a
   Combine with a concatenating CombineFn, for simplicity

This removes Flink specific transformations for things that are handled
by builtin sources/sinks, among other things this:

 - Removes special translation for AvroIO.Read/Write and
   TextIO.Read/Write
 - Removes special support for Write.Bound, this was not working properly
   and is now handled by the Beam machinery that uses DoFns for this
 - Removes special translation for binary Co-Group, the code was still
   in there but was never used

With this change all RunnableOnService tests run on Flink Batch.

commit 863aa2cb2a207449e9a711c4a9e248ed134939d4
Author: Aljoscha Krettek 
Date:   2016-05-13T12:17:50Z

Fix faulty Flink Flatten when PCollectionList is empty

commit 5c58830c2da4c0b86d80f93251b001f96edeef35
Author: Aljoscha Krettek 
Date:   2016-05-13T12:41:20Z

Remove superfluous Flink Tests, Fix those that stay in

All of the stuff in the removed ITCases is covered (in more detail) by
the RunnableOnService tests.

commit 5e6be8c757f89d933a4e6818cf7ef6316b7195d6
Author: Aljoscha Krettek

[jira] [Commented] (BEAM-53) PubSubIO: reimplement in Java

2016-05-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287148#comment-15287148
 ] 

ASF GitHub Bot commented on BEAM-53:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/332


> PubSubIO: reimplement in Java
> -
>
> Key: BEAM-53
> URL: https://issues.apache.org/jira/browse/BEAM-53
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Daniel Halperin
>Assignee: Mark Shields
>
> PubSubIO is currently only partially implemented in Java: the 
> DirectPipelineRunner uses a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path 
> implemented in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in 
> a scalable way.
> Additionally, we can take this opportunity to add new features:
> * getting timestamp from an arbitrary lambda in arbitrary formats rather than 
> from a message attribute in only 2 formats.
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-53) PubSubIO: reimplement in Java

2016-05-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287905#comment-15287905
 ] 

ASF GitHub Bot commented on BEAM-53:


GitHub user mshields822 opened a pull request:

https://github.com/apache/incubator-beam/pull/346

[BEAM-53] Wire PubsubUnbounded{Source,Sink} into PubsubIO

This also refines the handling of record ids in the sink to be 
random-but-reused-on-failure, using the same trick as we do for the BigQuery 
sink.

Still need to re-do the load tests I did a few weeks back with the actual 
change.
Note that last time I tested the DataflowPipelineTranslator does not kick 
in and replace the new transforms with the correct native transforms. Need to 
dig deeper.

R: @dhalperi 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mshields822/incubator-beam pubsub-runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #346


commit ce655a6a7480147f7527fa23818e1d546abaa599
Author: Mark Shields 
Date:   2016-04-12T00:36:27Z

Make java unbounded pub/sub source the default.

commit aafcf5f9d0286ec5a6ed0d634df8ff0902897cdc
Author: Mark Shields 
Date:   2016-05-17T23:44:30Z

Refine record id calculation. Prepare for supporting unit tests with 
re-using record ids.




> PubSubIO: reimplement in Java
> -
>
> Key: BEAM-53
> URL: https://issues.apache.org/jira/browse/BEAM-53
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Daniel Halperin
>Assignee: Mark Shields
>
> PubSubIO is currently only partially implemented in Java: the 
> DirectPipelineRunner uses a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path 
> implemented in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in 
> a scalable way.
> Additionally, we can take this opportunity to add new features:
> * getting timestamp from an arbitrary lambda in arbitrary formats rather than 
> from a message attribute in only 2 formats.
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-117) Implement the API for Static Display Metadata

2016-05-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285185#comment-15285185
 ] 

ASF GitHub Bot commented on BEAM-117:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/338


> Implement the API for Static Display Metadata
> -
>
> Key: BEAM-117
> URL: https://issues.apache.org/jira/browse/BEAM-117
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Scott Wegner
>
> As described in the following doc, we would like the SDK to allow associating 
> display metadata with PTransforms.
> https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-289) Examples Use TypeDescriptors

2016-05-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285528#comment-15285528
 ] 

ASF GitHub Bot commented on BEAM-289:
-

GitHub user eljefe6a opened a pull request:

https://github.com/apache/incubator-beam/pull/342

[BEAM-289] Examples Use TypeDescriptors

Jira issue BEAM-289 Examples Use TypeDescriptors. Changed example code to 
use `TypeDescriptors`.

These could have used static imports to make the lines shorter, but we 
opted for the more understandable syntax.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eljefe6a/incubator-beam 
TypeDescriptorsExamples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #342


commit 72ec9b82fb21ee9040b1bee6615fecfb9916e470
Author: Jesse Anderson 
Date:   2016-05-02T18:39:26Z

Make Regex Transform

commit e6f8c958a2bb88d8b2582cb9d4391922c15b7141
Author: Jesse Anderson 
Date:   2016-05-02T22:34:08Z

Merge remote-tracking branch 'upstream/master'

commit 587eaaec106829002df5df1b38753f811649aa51
Author: Jesse Anderson 
Date:   2016-05-03T01:08:13Z

Fixing checkstyle issues. Added missing Apache license.

commit df3045f62c939ef3a777ffbf658088f193144983
Author: Jesse Anderson 
Date:   2016-05-05T15:46:14Z

Added distributed replacement functions. Add replaceAll and replaceFirst. 
Fixed some JavaDocs.

commit 793d22667f485a5cdd49a7d36553c96e6898391c
Author: Jesse Anderson 
Date:   2016-05-05T15:55:58Z

Whitespace fixes for check style.

commit 9e5a9971131721c988242400643712f5c9671b9e
Author: Jesse Anderson 
Date:   2016-05-09T15:47:56Z

Merge remote-tracking branch 'upstream/master'

commit 425a4d89692f33f99a68aed511270de0ff9db4ac
Author: Jesse Anderson 
Date:   2016-05-11T19:51:57Z

Merge remote-tracking branch 'upstream/master'

commit 225b2d0ac2d8808da7756f915cbeb7684e4951a4
Author: Jesse Anderson 
Date:   2016-05-14T01:20:12Z

Merge remote-tracking branch 'upstream/master'

commit c1a3bc55b47e5dbd11c2bcb360a2a7f2c4aacfb9
Author: Jesse Anderson 
Date:   2016-05-16T20:58:08Z

Changed Word Counts to use TypeDescriptors.

commit adfeb01bafb1c17c6c65b30c45ea3e42473a80e6
Author: Jesse Anderson 
Date:   2016-05-16T21:09:18Z

Updated complete examples to use TypeDescriptors.

commit 2a455f3ae029e30644cc7b6c96105f761f903e74
Author: Jesse Anderson 
Date:   2016-05-16T22:11:57Z

Removing Regex transforms from this branch.




> Examples Use TypeDescriptors
> 
>
> Key: BEAM-289
> URL: https://issues.apache.org/jira/browse/BEAM-289
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-java
>Reporter: Jesse Anderson
>Assignee: Frances Perry
>
> Change the Java and Java 8 examples to use TypeDescriptors instead of inline 
> TypeDescriptor creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-295) Flink Create Functions call Collector.close()

2016-05-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289185#comment-15289185
 ] 

ASF GitHub Bot commented on BEAM-295:
-

GitHub user aljoscha opened a pull request:

https://github.com/apache/incubator-beam/pull/347

[BEAM-295] Remove erroneous close() calls in Flink Create Sources

Collector.close() should only be called by internal Flink components,
not by user functions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/incubator-beam remove-collector-close

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #347


commit bd658bfb3d36e047eacecc91146b051b91eebf1b
Author: Aljoscha Krettek 
Date:   2016-05-18T15:46:34Z

[BEAM-295] Remove erroneous close() calls in Flink Create Sources

Collector.close() should only be called by internal Flink components,
not by user functions.




> Flink Create Functions call Collector.close()
> -
>
> Key: BEAM-295
> URL: https://issues.apache.org/jira/browse/BEAM-295
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> {{Collector.close()}} should only be called internally, by Flink. Calling 
> close() in the user function, as we do in {{FlinkCreateFunction}} and 
> {{FlinkStreamingCreateFunction}} will lead to downstream operations being 
> closed twice, which can lead to faulty behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-286) Reorganize flink runner directories

2016-05-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289295#comment-15289295
 ] 

ASF GitHub Bot commented on BEAM-286:
-

GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/348

[BEAM-286] Reorganize flink runner module to follow other runners str…

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Reorganize flink runner module to follow other runners structure.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam BEAM-286-REORG-FLINK

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/348.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #348


commit 5200ebf3586a0424093f160c287394c85f466d73
Author: Jean-Baptiste Onofré 
Date:   2016-05-18T16:46:34Z

[BEAM-286] Reorganize flink runner module to follow other runners structure




> Reorganize flink runner directories
> ---
>
> Key: BEAM-286
> URL: https://issues.apache.org/jira/browse/BEAM-286
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.1.0-incubating
>
>
> The flink runner Maven module uses two sub-modules: runner and examples. It's 
> the only one which use such layout (compare to spark, dataflow or 
> inprocess/direct runners).
> I will propose a PR to align flink runner module with the other, keeping the 
> examples in a sub-directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-228) Create a merge bot for Beam

2016-05-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289345#comment-15289345
 ] 

ASF GitHub Bot commented on BEAM-228:
-

Github user jasonkuster closed the pull request at:

https://github.com/apache/incubator-beam/pull/225


> Create a merge bot for Beam
> ---
>
> Key: BEAM-228
> URL: https://issues.apache.org/jira/browse/BEAM-228
> Project: Beam
>  Issue Type: New Feature
>  Components: project-management
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>
> This issue tracks the creation of a merge bot for Beam. This merge bot should 
> watch the Beam github repository and queue and merge pull requests which are 
> marked LGTM and good for merge by an approved Beam committer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-79) Gearpump runner

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279595#comment-15279595
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user manuzhang opened a pull request:

https://github.com/apache/incubator-beam/pull/323

[BEAM-79] add Gearpump runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This PR adds Gearpump runner to Beam meeting the goals of phase 1 in the 
[design 
document](https://docs.google.com/document/d/1nw64QUWVfT8L7FUprPGLEeNjSBpDMkn1otfLt2rHM5g/edit).

The Gearpump runner supports the following functionalities,

* Transformations: ParDo, GroupByKey, Flatten
* Windows: using Beam's window logic and code
* side outputs
* serialization/deserialization: using Gearpump's Kryo serializer
* sources: Beam's UnboundedSource
* message delivery guarantee: at-most-once
* tests: integration test for various translators

Here's a snapshot of running the following Beam example on Gearpump cluster

```java
PCollection> wordCounts =
p.apply(Read.from(new UnboundedTextSource()).named("WordStream"))
.apply(ParDo.of(new ExtractWordsFn()))

.apply(Window.into(FixedWindows.of(Duration.standardSeconds(10
.apply(Count.perElement());

wordCounts.apply(ParDo.of(new FormatAsStringFn()));
```


![snip20160511_4](https://cloud.githubusercontent.com/assets/1191767/15171197/fd6ffba0-177e-11e6-99a1-30c7c2597244.png)

Note that the Gearpump runner is still in early stage and lacking 
capabilities like trigger, side inputs, aggregator. However, I'd like to have 
the community to get a feel of what Gearpump is like, whether Beam and Gearpump 
go well, and gather ideas for improvements. 




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam gearpump_runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #323


commit 73e5978f599bcf32ed8c2f1d54b6bd3bd8350092
Author: manuzhang 
Date:   2016-03-15T08:15:16Z

[BEAM-79] add Gearpump runner




> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280718#comment-15280718
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/322


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-77) Reorganize Directory structure

2016-04-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264846#comment-15264846
 ] 

ASF GitHub Bot commented on BEAM-77:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/256


> Reorganize Directory structure
> --
>
> Key: BEAM-77
> URL: https://issues.apache.org/jira/browse/BEAM-77
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>
> Now that we've done the initial Dataflow code drop, we will restructure 
> directories to provide space for additional SDKs and Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-255) Understand and improve performance of Write/FileBasedSink

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269481#comment-15269481
 ] 

ASF GitHub Bot commented on BEAM-255:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/279

[BEAM-255] Write: add limited logging

This will help, for all sinks, users and developers gain insight into where 
time
is spent. (Enabling DEBUG level will provide more insight.)

R: @lukecwik 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam write-logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #279


commit d892f259cb9dd33b762aca416d9b616423c5fbae
Author: Dan Halperin 
Date:   2016-05-03T20:07:04Z

[BEAM-255] Write: add limited logging

This will help, for all sinks, users and developers gain insight into where 
time
is spent. (Enabling DEBUG level will provide more insight.)




> Understand and improve performance of Write/FileBasedSink
> -
>
> Key: BEAM-255
> URL: https://issues.apache.org/jira/browse/BEAM-255
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> The work by [~lcwik] to move TextIO from a Dataflow-specific implementation 
> to a FileBasedSink-based implementation may have caused performance 
> regressions -- which really means it has exposed opportunity for improvement 
> in the Beam combination of Write/FileBasedSink.
> This is a general tracking bug for SDK-side improvements (likely GCP-related).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-168) IntervalBoundedExponentialBackOff change broke Beam-on-Dataflow

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269369#comment-15269369
 ] 

ASF GitHub Bot commented on BEAM-168:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/278

[BEAM-168] IntervalBEB: remove deprecated function

The pre-commit wordcount test will confirm that this does not break the
Cloud Dataflow worker.

This resolves BEAM-168.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam remove-deprecated

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #278


commit fa2b18a92f98950ec401863610c2e126e1c62e5c
Author: Dan Halperin 
Date:   2016-05-03T19:04:13Z

[BEAM-168] IntervalBEB: remove deprecated function

The pre-commit wordcount test will confirm that this does not break the
Cloud Dataflow worker.




> IntervalBoundedExponentialBackOff change broke Beam-on-Dataflow
> ---
>
> Key: BEAM-168
> URL: https://issues.apache.org/jira/browse/BEAM-168
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> Changing the `int` to a `long` breaks ABI compatibility, which Dataflow 
> service uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-225) Create Class for Common TypeDescriptors

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269249#comment-15269249
 ] 

ASF GitHub Bot commented on BEAM-225:
-

GitHub user eljefe6a opened a pull request:

https://github.com/apache/incubator-beam/pull/275

[BEAM-225] Create Class for Common TypeDescriptors

Jira issue BEAM-225 Create Class for Common TypeDescriptors. Adding 
KVTypeDescriptors and TypeDescriptors to make primitives easier.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eljefe6a/incubator-beam TypeDescriptors

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #275


commit 8e89c7a9e390ec8026ee46d68d3a422f8917ef71
Author: Jesse Anderson 
Date:   2016-05-03T18:09:08Z

Adding KVTypeDescriptors and TypeDescriptors to make primitives easier.




> Create Class for Common TypeDescriptors
> ---
>
> Key: BEAM-225
> URL: https://issues.apache.org/jira/browse/BEAM-225
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Priority: Trivial
>  Labels: starter
>
> There should be a built-in class for common types like String, Float, etc.
> Right now, all types have to create an inline TypeDescriptor:
> {code:java}
> PCollection words = suits.apply(
> FlatMapElements.via(
> (String line) -> Arrays.asList(line.split(" "))
> ).withOutputType(new TypeDescriptor() {}));
> {code}
> The should be a built-in class with common types like String so you don't 
> have to create a TypeDescriptor each time like:
> {code:java}
>   PCollection words = suits.apply(
> FlatMapElements.via(
> (String line) -> Arrays.asList(line.split(" "))
> ).withOutputType(TypeDescriptors.STRINGS));
> {code}
> Another possibility is to make it a static method:
> {code:java}
>   PCollection words = suits.apply(
> FlatMapElements.via(
> (String line) -> Arrays.asList(line.split(" "))
> ).withOutputType(TypeDescriptors.strings()));
> {code}
> An example of this is Apache Crunch's Writables class 
> https://crunch.apache.org/apidocs/0.11.0/org/apache/crunch/types/writable/Writables.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267997#comment-15267997
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/265


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-154) Provide Maven BOM

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269550#comment-15269550
 ] 

ASF GitHub Bot commented on BEAM-154:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/267


> Provide Maven BOM
> -
>
> Key: BEAM-154
> URL: https://issues.apache.org/jira/browse/BEAM-154
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> When using the Java SDK (for instance to develop IO), the developer has to 
> add dependencies in his pom.xml (like junit, hamcrest, slf4j, ...).
> To simplify the way to define the dependencies, each Beam SDK could provide a 
> Maven BoM (Bill of Material) describing these dependencies. Then the 
> developer could simply define this BoM as pom.xml dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-168) IntervalBoundedExponentialBackOff change broke Beam-on-Dataflow

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269558#comment-15269558
 ] 

ASF GitHub Bot commented on BEAM-168:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/278


> IntervalBoundedExponentialBackOff change broke Beam-on-Dataflow
> ---
>
> Key: BEAM-168
> URL: https://issues.apache.org/jira/browse/BEAM-168
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> Changing the `int` to a `long` breaks ABI compatibility, which Dataflow 
> service uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-248) Register DisplayData from anonymous implementation PTransforms

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269544#comment-15269544
 ] 

ASF GitHub Bot commented on BEAM-248:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/280

[BEAM-248] Add display data to additional PTransforms

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam displaydata-leaves

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #280


commit 676f6fe8715444a8bf1c614de7322557a6f9de72
Author: Scott Wegner 
Date:   2016-05-03T16:28:21Z

Test utility for display data in a pipeline runner

DisplayDataEvaluator is useful for validating how PTransform
display data is surfaced in the context of a Pipeline and runner.

commit f5e167bbea1ffced370dffab8af2697fad157728
Author: Scott Wegner 
Date:   2016-05-03T16:33:55Z

Fix Combine transform primitive display data

commit 978068c2fb754986cb86f412c1e85eaf5ee30f7b
Author: Scott Wegner 
Date:   2016-05-03T18:16:51Z

Add display data for MapElements transform




> Register DisplayData from anonymous implementation PTransforms
> --
>
> Key: BEAM-248
> URL: https://issues.apache.org/jira/browse/BEAM-248
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>
> Most SDK PTransforms are implemented in terms of lower-level PTransforms, 
> often with anonymous user-fn implementations at the leaf-level. Currently 
> display data is only being registered on the composite node and not within 
> the anonymous implementation. As a result, the details are lost.
> We should register display data both in the composite and internal leaf 
> nodes, particularly when the implementation is anonymous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-255) Understand and improve performance of Write/FileBasedSink

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269549#comment-15269549
 ] 

ASF GitHub Bot commented on BEAM-255:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/279


> Understand and improve performance of Write/FileBasedSink
> -
>
> Key: BEAM-255
> URL: https://issues.apache.org/jira/browse/BEAM-255
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> The work by [~lcwik] to move TextIO from a Dataflow-specific implementation 
> to a FileBasedSink-based implementation may have caused performance 
> regressions for Dataflow -- which really means it has exposed an opportunity 
> for improvement in the Beam combination of Write/FileBasedSink.
> This is a general tracking bug for SDK-side improvements (likely GCP-related).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269636#comment-15269636
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/282

[BEAM-22] Refactor CompletionCallbacks

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The default and timerful completion callbacks are identical, excepting
their calls to evaluationContext.commitResult; factor that code into a
common location.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
completion_callback_refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/282.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #282






> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-256) Add lifecycle even verifiers for Beam pipelines.

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270093#comment-15270093
 ] 

ASF GitHub Bot commented on BEAM-256:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/273


> Add lifecycle even verifiers for Beam pipelines.
> 
>
> Key: BEAM-256
> URL: https://issues.apache.org/jira/browse/BEAM-256
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Davor Bonaci
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-259) Execute selected RunnableOnService tests with Spark runner

2016-05-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273223#comment-15273223
 ] 

ASF GitHub Bot commented on BEAM-259:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/294

[BEAM-259] Configure RunnableOnService tests for Spark runner, batch mode

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This PR demonstrates how to configure the integration tests. It has two 
categories of issue:

1. Transforms that are not supported. For these we can add surefire 
exclusions for now.
2. Runtime errors having to do with configuring Spark. I'm hoping someone 
with expertise in the runner can take quickly recommend the course of action.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam spark-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #294


commit d8a2a34f723ff4ca7fe841c8056706c19d37770d
Author: Kenneth Knowles 
Date:   2016-05-05T22:11:07Z

Configure RunnableOnService tests for Spark runner, batch mode




> Execute selected RunnableOnService tests with Spark runner
> --
>
> Key: BEAM-259
> URL: https://issues.apache.org/jira/browse/BEAM-259
> Project: Beam
>  Issue Type: Test
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-258) Execute selected RunnableOnService tests with Flink runner

2016-05-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272882#comment-15272882
 ] 

ASF GitHub Bot commented on BEAM-258:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/291

[BEAM-258] Configure RunnableOnService tests for Flink runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This is a sample configuration for now.

There are these kind of failures in the tests right now:

1. Since the batch runner only supports global windows, I've filtered those 
tests out. I added the `UnsupportedOperationException` to the windowing 
translator so I could distinguish them.
2. Those tests that have simple & supported pipelines succeed at building 
the pipeline, but somehow the graph is empty - I have checked and it seemed 
like translators are not even being invoked. This is beyond my current scope of 
digging in.
3. Some other tests fail with other misc errors. Perhaps they use state, 
which results in `NullPointerException`.
4. Pretty much all of the tests require side inputs since that is how 
`PAssert` works, so they cannot work in streaming mode.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam flink-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #291


commit 52198eb7e9c2df627871e7f96404d188dc4a4ee7
Author: Kenneth Knowles 
Date:   2016-05-02T21:29:30Z

Add Window.Bound translator to Flink batch

This adds a Window.Bound translator that allows only
GlobalWindows. It is a temporary measure, but one that
brings the Flink batch translator in line with the
Beam model - instead of "ignoring" windows, the GBK
is a perfectly valid GBK for GlobalWindows.

Previously, the SDK's runner test suite would fail
due to the lack of a translator - now some of them
will fail due to windowing support, but others have
a chance.

commit 095f9840dd0d8d78f041e494613375664f7d3eaa
Author: Kenneth Knowles 
Date:   2016-05-02T20:11:12Z

Add TestFlinkPipelineRunner to FlinkRunnerRegistrar

This makes the runner available for selection by integration tests.

commit 750a49d286f4d11d6ad63460d8b244a5ebde975e
Author: Kenneth Knowles 
Date:   2016-05-02T21:04:20Z

Configure RunnableOnService tests for Flink in batch mode

Today Flink batch supports only global windows. This is a situation we
intend our build to allow, eventually via JUnit category filtering.

For now all the test classes that use non-global windows are excluded
entirely via maven configuration. In the future, it should be on a
per-test-method basis.




> Execute selected RunnableOnService tests with Flink runner
> --
>
> Key: BEAM-258
> URL: https://issues.apache.org/jira/browse/BEAM-258
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-13) Create JMS IO

2016-05-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274324#comment-15274324
 ] 

ASF GitHub Bot commented on BEAM-13:


GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/299

[BEAM-13] Add JmsIO

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
JmsIO (unbounded source and sink).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam BEAM-13-JMSIO

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #299


commit 02b5b8327ee6ebe963e6d613fb5cfec6156762d3
Author: Jean-Baptiste Onofré 
Date:   2016-05-05T17:14:37Z

[BEAM-13] Add JmsIO




> Create JMS IO
> -
>
> Key: BEAM-13
> URL: https://issues.apache.org/jira/browse/BEAM-13
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Work in progress: https://github.com/jbonofre/DataflowJavaSDK/tree/IO-JMS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-240) Add display data link URLs for sources / sinks

2016-05-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274423#comment-15274423
 ] 

ASF GitHub Bot commented on BEAM-240:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/300

[BEAM-240] Display data links for input and output files

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
With display data, SDK authors have the ability to annotate display items 
with a link URLs. This adds  browse URLs for GCS and local files, and attach 
them to well-known source/sink display data.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam displaydata-urls

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #300


commit c95a61a0df01f93ebeeb897249535f160afdf9eb
Author: Scott Wegner 
Date:   2016-05-03T23:18:27Z

Add browse URL to GcsPath

commit 6ff5bae0e5bc1cdc2d10ef9e65b44039c8363407
Author: Scott Wegner 
Date:   2016-05-03T23:32:21Z

Add browse URL to File IO

commit ca40e25c87ddde4693b17169e40c00c0b62be170
Author: Scott Wegner 
Date:   2016-05-04T16:38:18Z

Add DisplayDataMatcher for linkUrl

commit 28739a73ec2510fc9b78fc878346f121e5bd7636
Author: Scott Wegner 
Date:   2016-05-04T16:39:04Z

Add linkUrl to AvroIO DisplayData

commit fefc3ef5b2a3b7c0178a216c7fdaaa4885893144
Author: Scott Wegner 
Date:   2016-05-04T16:51:20Z

Add linkUrl to FileBasedSource DisplayData

commit c28c4470c93b77096ea182ddc29813f7fe69db03
Author: Scott Wegner 
Date:   2016-05-04T17:01:39Z

Add linkUrl to FileBasedSink DisplayData

commit f369a3324a260346bd45c786bccfa8a2a1c15e64
Author: Scott Wegner 
Date:   2016-05-04T17:19:15Z

Add linkUrl to TextIO DisplayData




> Add display data link URLs for sources / sinks
> --
>
> Key: BEAM-240
> URL: https://issues.apache.org/jira/browse/BEAM-240
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-267) Enable Chekstyle check in Spark runner

2016-05-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274252#comment-15274252
 ] 

ASF GitHub Bot commented on BEAM-267:
-

GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/298

[BEAM-267] Enable checkstyle in Spark runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Enable checkstyle in Spark runner and fix checkstyle errors.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam 
BEAM-267-CHECKSTYLE-SPARK

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #298


commit 2fb4f96d5d24ac0aeb89e37e6bc55234ce735751
Author: Jean-Baptiste Onofré 
Date:   2016-05-06T15:47:46Z

[BEAM-267] Enable checkstyle in Spark runner




> Enable Chekstyle check in Spark runner
> --
>
> Key: BEAM-267
> URL: https://issues.apache.org/jira/browse/BEAM-267
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-201) Material page

2016-05-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267027#comment-15267027
 ] 

ASF GitHub Bot commented on BEAM-201:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam-site/pull/13


> Material page
> -
>
> Key: BEAM-201
> URL: https://issues.apache.org/jira/browse/BEAM-201
> Project: Beam
>  Issue Type: Improvement
>  Components: website
> Environment: Create a website page with logo and project material 
> content
>Reporter: James Malone
>Assignee: James Malone
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267190#comment-15267190
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/264


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-252) Make Regex Transform

2016-05-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267234#comment-15267234
 ] 

ASF GitHub Bot commented on BEAM-252:
-

GitHub user eljefe6a opened a pull request:

https://github.com/apache/incubator-beam/pull/269

[BEAM-252] Make Regex Transform

Jira issue BEAM-252 Make Regex Transform. Adding a pre-built transform for 
Regex.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eljefe6a/incubator-beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #269


commit 72ec9b82fb21ee9040b1bee6615fecfb9916e470
Author: Jesse Anderson 
Date:   2016-05-02T18:39:26Z

Make Regex Transform




> Make Regex Transform
> 
>
> Key: BEAM-252
> URL: https://issues.apache.org/jira/browse/BEAM-252
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Assignee: Davor Bonaci
>
> There needs to be an easier way to run Regular Expressions as part of a 
> transform. This will make string-based ETL much easier.
> The transform should support using the matches and find methods. The 
> transform should allow you to choose a group in the regex to output. The 
> transform should allow single strings to be output or KV's of strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-231) Remove ClassForDisplay infrastructure class.

2016-05-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267431#comment-15267431
 ] 

ASF GitHub Bot commented on BEAM-231:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/259


> Remove ClassForDisplay infrastructure class.
> 
>
> Key: BEAM-231
> URL: https://issues.apache.org/jira/browse/BEAM-231
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>
> See discussion here: 
> https://github.com/apache/incubator-beam/pull/247#discussion-diff-61184975
> This class should no longer be needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-77) Reorganize Directory structure

2016-05-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271033#comment-15271033
 ] 

ASF GitHub Bot commented on BEAM-77:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/281


> Reorganize Directory structure
> --
>
> Key: BEAM-77
> URL: https://issues.apache.org/jira/browse/BEAM-77
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>
> Now that we've done the initial Dataflow code drop, we will restructure 
> directories to provide space for additional SDKs and Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271078#comment-15271078
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/258


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271716#comment-15271716
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/289

[BEAM-22] Mark CheckpointMark as volatile in UnboundedReadEvaluator

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The evaluator may be reused in a different thread, and updates to the
checkpoint must be visible.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
unbounded_evaluator_close_before_requeue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #289


commit e6ba15d451c9e3574a29ac0d51e6275893c9ee60
Author: Thomas Groh 
Date:   2016-05-05T00:44:59Z

Mark CheckpointMark as volatile in UnboundedReadEvaluator

The evaluator may be reused in a different thread, and updates to the
checkpoint must be visible.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272951#comment-15272951
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/282


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-115) Beam Runner API

2016-05-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272941#comment-15272941
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/277


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278513#comment-15278513
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/309


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-103) Make UnboundedSourceWrapper parallel

2016-05-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268736#comment-15268736
 ] 

ASF GitHub Bot commented on BEAM-103:
-

GitHub user aljoscha opened a pull request:

https://github.com/apache/incubator-beam/pull/274

[BEAM-103][BEAM-130] Make Flink Source Parallel and Checkpointed

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ x] Replace `` in the title with the actual Jira issue
   number, if there is one
---

CC: @mxm for the Flink parts
CC: @tgroh for the tests, I thought you could have something to say about 
that since you started the doc about better testing

I created a new `TestCountingSource` in the `runner-flink` package because 
I didn't want to depend on the data flow runner. Maybe we should move this to a 
common package.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/incubator-beam 
flink-parallel-unbounded-source

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #274


commit 698f50bd7759e3a036e35ca214b1d727f9db4e51
Author: Aljoscha Krettek 
Date:   2016-05-03T11:35:35Z

[BEAM-103][BEAM-130] Make Flink Source Parallel and Checkpointed




> Make UnboundedSourceWrapper parallel
> 
>
> Key: BEAM-103
> URL: https://issues.apache.org/jira/browse/BEAM-103
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> As of now {{UnboundedSource}} s are executed with a parallelism of 1 
> regardless of the splits which the source returns. The corresponding 
> {{UnboundedSourceWrapper}} should implement {{RichParallelSourceFunction}} 
> and deal with splits correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-145) OutputTimeFn#assignOutputTime overrides WindowFn#getOutputTime in unfortunate ways

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276642#comment-15276642
 ] 

ASF GitHub Bot commented on BEAM-145:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/296


> OutputTimeFn#assignOutputTime overrides WindowFn#getOutputTime in unfortunate 
> ways
> --
>
> Key: BEAM-145
> URL: https://issues.apache.org/jira/browse/BEAM-145
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>  Labels: windowing
>
> Today the {{OutputTimeFn}} includes {{#assignOutputTime}}, {{#combine}}, and 
> {{#merge}}. Together these express the grouping of timestamps, analogous to 
> the grouping of values in a GBK / Combine, in a canonical way.
> The default {{OutputTimeFn}} is provided by the {{WindowFn}}. In particular, 
> {{SlidingWindows}} provides an {{OutputTimeFn}} that shifts input timestamps 
> later to avoid watermark stuckness and then takes the minimum to compute the 
> output timestamp.
> The SDK additionally provides instance for "min", "max" and "end of window" 
> output timestamps.
> Unfortunately,  if one overrides the {{OutputTimeFn}} to one of these, the 
> shifting done by {{SlidingWindows}} is lost.
> This is actually only a minor problem for now, since "min" is the default, 
> "end of window" is unaffected, and "max" has only esoteric uses.The fix is 
> easy:
> This is interrelated with another suggested change:  Since there are only 
> three common {{OutputTimeFn}} instances, and it is a high bandwidth API, it 
> does not seem worthwhile to leave it in userland. So it is proposed to reduce 
> it to an enum, which would leave only the {{WindowFn}} as a userland place 
> for timestamp adjustments. (requiring special casing for end-of-window, since 
> it cannot be implemented without owning {{#assignOutputTime}})



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-269) Create BigDecimal Coder

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276678#comment-15276678
 ] 

ASF GitHub Bot commented on BEAM-269:
-

GitHub user eljefe6a opened a pull request:

https://github.com/apache/incubator-beam/pull/307

[BEAM-269] Create BigDecimal Coder

Jira issue BEAM-269 Create BigDecimal Coder. Added coder for BigDecimal. 
Added unit tests for BigDecimal coder.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eljefe6a/incubator-beam BigDecimalCoder

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #307


commit 72ec9b82fb21ee9040b1bee6615fecfb9916e470
Author: Jesse Anderson 
Date:   2016-05-02T18:39:26Z

Make Regex Transform

commit e6f8c958a2bb88d8b2582cb9d4391922c15b7141
Author: Jesse Anderson 
Date:   2016-05-02T22:34:08Z

Merge remote-tracking branch 'upstream/master'

commit 587eaaec106829002df5df1b38753f811649aa51
Author: Jesse Anderson 
Date:   2016-05-03T01:08:13Z

Fixing checkstyle issues. Added missing Apache license.

commit df3045f62c939ef3a777ffbf658088f193144983
Author: Jesse Anderson 
Date:   2016-05-05T15:46:14Z

Added distributed replacement functions. Add replaceAll and replaceFirst. 
Fixed some JavaDocs.

commit 793d22667f485a5cdd49a7d36553c96e6898391c
Author: Jesse Anderson 
Date:   2016-05-05T15:55:58Z

Whitespace fixes for check style.

commit 9e5a9971131721c988242400643712f5c9671b9e
Author: Jesse Anderson 
Date:   2016-05-09T15:47:56Z

Merge remote-tracking branch 'upstream/master'

commit bdbed177c4fa8afed2a89f9c8c9be87ae93abeff
Author: Jesse Anderson 
Date:   2016-05-09T17:05:15Z

Added BigDecimal coder and tests.

commit 9615b1bc1eb3095d6469640b9d8e072013cceee5
Author: Jesse Anderson 
Date:   2016-05-09T17:31:11Z

Removing files from another branch.




> Create BigDecimal Coder
> ---
>
> Key: BEAM-269
> URL: https://issues.apache.org/jira/browse/BEAM-269
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Assignee: Jesse Anderson
>
> There isn't a coder for BigDecimal. This class is especially important for 
> financial companies to represent money.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276802#comment-15276802
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/283


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-146) WindowFn.AssingContext leaks implementation details about compressed WindowedValue representation

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276858#comment-15276858
 ] 

ASF GitHub Bot commented on BEAM-146:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/308

[BEAM-146] Remove references to multi-window representation from model

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Some areas of the Beam model in the SDK allude to the use of a
compressed representation of an element along with the set
of windows it is assigned to. However, the model itself views
elements in different windows as fully independent, so the SDK
should not place any obligation on the part of the runner or
user to use a particular representation.

This change removes those places in the SDK where an element
is treated in multiple windows at once.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam AssignContext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #308


commit 27588f1cccfc4daffc5d594e69cd0f94a225d22a
Author: Kenneth Knowles 
Date:   2016-05-09T19:17:09Z

Remove references to multi-window representation from model

Some areas of the Beam model in the SDK allude to the use of a
compressed representation of an element along with the set
of windows it is assigned to. However, the model itself views
elements in different windows as fully independent, so the SDK
should not place any obligation on the part of the runner or
user to use a particular representation.

This change removes those places in the SDK where an element
is treated in multiple windows at once.




> WindowFn.AssingContext leaks implementation details about compressed 
> WindowedValue representation
> -
>
> Key: BEAM-146
> URL: https://issues.apache.org/jira/browse/BEAM-146
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>
> Today, {{WindowFn.AssignContext}} provides simultaneous access to all of the 
> windows that a value has been placed in.
> Providing access to the current window for a value is convenient for, e.g. 
> converting day windows to hour windows for each hour of the assign day. But 
> providing access to all the assigned windows allows spooky action across 
> windows, and is generally not intended to be observable - elements are 
> semantically considered to be "duplicated" into each of the assigned windows.
> This ticket proposes that the {{AssignContext}} should provide only a single 
> window, and that windows should be "exploded" prior to window re-assignment 
> so that elements are only observed within one window at a time. This can be 
> accomplished trivially today via surgical insertion of 
> {{RequiresWindowAccess}} but the {{AssignContext}} should have its API 
> adjusted to be explicit about it, too.
> This will affect only pipelines for which _all_ of the following hold:
>  - assigns to sliding windows (or custom {{WindowFn}} that places each 
> element in multiple windows)
>  - re-assigns to different windows without a {{GroupByKey}} between.
>  - the new window assignment actually does depend on the full set of windows 
> assigned
> I hypothesize the number of such pipelines is zero.
> I expect to address this during the Beam Runner API design.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276999#comment-15276999
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/309

[BEAM-22] Use PushbackSideInputDoFnRunner in the InProcessRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This removes blocking behavior while retrieving Side Inputs in the 
InProcessRunner.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam use_pushback_runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #309


commit ffab4b6b22276bf41c9ce4b33d22c44e6d753268
Author: Thomas Groh 
Date:   2016-04-28T00:27:57Z

Use PushbackDoFnRunner in the ParDoInProcessEvaluator

This ensures that the evaluator does not block while processing an input
bundle.

commit b7d09e14ef749fcd8e773c82b3f9e73f1646eff8
Author: Thomas Groh 
Date:   2016-04-28T17:12:09Z

Limit the number of work schedules per MonitorRunnable run

This ensures that work readded to the queue will not cause the monitor 
runnable
to run forever before delivering timers




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277124#comment-15277124
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/312

[BEAM-22] Update Watermarks Outside of handleResult

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This removes excess mutual exclusion, and reduces the
eagerness of updating the value of a watermark.

The first commit in this CR reviewed separately in #310 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam ippr_wm_asynchronous

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #312


commit 2d94540ccd02b19a73410647e84486c008d9cdc5
Author: Thomas Groh 
Date:   2016-05-09T21:03:16Z

Return null evaluators from Unavailable Reads

Null TransformEvaluators for sources represent a source where all splits
are currently in use or completed.

Update TransformExecutor to handle null evaluators properly.

Change TransformExecutor to a Runnable.

commit 9ee1502e159cce49a8a4490593d896b34154e006
Author: Thomas Groh 
Date:   2016-05-03T00:26:47Z

Update Watermarks Outside of handleResult

Remove excess mutual exclusion




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277134#comment-15277134
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user tgroh closed the pull request at:

https://github.com/apache/incubator-beam/pull/304


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277296#comment-15277296
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/289


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241712#comment-15241712
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/178

[BEAM-22] Switch the Default PipelineRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Use the InProcessPiplineRunner (pending rename) as the default runner.
The InProcessPipelineRunner implements the beam model, including support
for Unbounded PCollections.

Tests will fail until #167 and #177 are merged, as this change modifies the
graph constructed by the Pipeline if the runner is not specified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam ippr_switch_default

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #178


commit 55ac23eb9bc184c5eda7c69ba5b70e436b665e09
Author: Thomas Groh 
Date:   2016-04-08T17:20:56Z

Switch the Default PipelineRunner

Use the InProcessPiplineRunner (pending rename) as the default runner.
The InProcessPipelineRunner implements the beam model, including support
for Unbounded PCollections.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-189) The Spark runner uses valueInEmptyWindow which causes values to be dropped

2016-04-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241913#comment-15241913
 ] 

ASF GitHub Bot commented on BEAM-189:
-

GitHub user amitsela opened a pull request:

https://github.com/apache/incubator-beam/pull/179

[BEAM-189] The Spark runner uses valueInEmptyWindow which causes values to 
be dropped

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/incubator-beam BEAM-189

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #179


commit d9ce6df3c41dcbc7d884d5b19841475fac330757
Author: Sela 
Date:   2016-04-14T19:01:15Z

Replace valueInEmptyWindows with valueInGlobalWindow

commit 2309ad6dbd74333808d6b993b43fe81791a19611
Author: Sela 
Date:   2016-04-14T19:02:20Z

Replace valueInEmptyWindows with valueInGlobalWindow in Spark Function, and 
add per-value (non-RDD)
windowing functions

commit 47fdf133da440a9ee8a6a1bc5ea34e59138f7c53
Author: Sela 
Date:   2016-04-14T19:05:14Z

Materialize PCollection/RDD as windowed values with the appropriate windows.

commit 06afe790c3e933f7a1a5ed26c3586c22dbaf750f
Author: Sela 
Date:   2016-04-14T20:18:24Z

Add unit test for TextIO output to support the mvn exec:exec example we 
provide in README

commit 51089e52474127e4dba21014ca79e1e8879b98cd
Author: Sela 
Date:   2016-04-14T20:58:14Z

Satisfy checkstyle




> The Spark runner uses valueInEmptyWindow which causes values to be dropped
> --
>
> Key: BEAM-189
> URL: https://issues.apache.org/jira/browse/BEAM-189
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Values in empty windowed may be dropped at anytime and so the default 
> windowing should be with GlobalWindow
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-155) Support asserting the contents of windows and panes in PAssert

2016-04-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242046#comment-15242046
 ] 

ASF GitHub Bot commented on BEAM-155:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/150


> Support asserting the contents of windows and panes in PAssert
> --
>
> Key: BEAM-155
> URL: https://issues.apache.org/jira/browse/BEAM-155
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This consists of reifying the output windows and panes, and running asserts 
> per-window about the contents of panes.
> This includes aggregated matching and final pane matching, e.g.
> PAssert.that(output).byOnTimePane().hasOutputElements(foo, bar);
> // For discarding mode - could have emitted (say) [spam, eggs], [spam], [], 
> [sausage], []
> PAssert.that(output).byFinalPane().hasOutputElements(spam, eggs, sausage, 
> spam);
> // For accumulating mode without late data
> PAssert.that(output).finalPane().containsInAnyOrder(spam, eggs, sausage, 
> spam);
> // For accumulating mode with late data
> PAssert.that(output).finalPane().containsInAnyOrder(foo, 
> bar).mayAlsoContain(baz, rab);
> See also: 
> https://docs.google.com/document/d/1fZUUbG2LxBtqCVabQshldXIhkMcXepsbv2vuuny8Ix4/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-121) Publish DisplayData from common PTransforms

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243430#comment-15243430
 ] 

ASF GitHub Bot commented on BEAM-121:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/124


> Publish DisplayData from common PTransforms
> ---
>
> Key: BEAM-121
> URL: https://issues.apache.org/jira/browse/BEAM-121
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-77) Reorganize Directory structure

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243263#comment-15243263
 ] 

ASF GitHub Bot commented on BEAM-77:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/96


> Reorganize Directory structure
> --
>
> Key: BEAM-77
> URL: https://issues.apache.org/jira/browse/BEAM-77
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>
> Now that we've done the initial Dataflow code drop, we will restructure 
> directories to provide space for additional SDKs and Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243267#comment-15243267
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/188

[BEAM-22] Improve ParDoEvaluator Factoring

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This moves shared code into a common location.

Clone DoFn instances before constructing the DoFnRunner to
 avoid races.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
ippr_better_ParDoEvaluator_factoring

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #188


commit ecc26d51cee4ea1568948d48cd3441594f638e39
Author: Thomas Groh 
Date:   2016-03-30T00:38:22Z

Move Shared construction code to ParDoInProcessEvaluator

Remove duplicate code in ParDo(Single/Multi)EvaluatorFactory; instead
only extract the appropriate elements and pass them to the
ParDoInProcessEvaluator.t log

commit e47aba0a7097cee8341369594e47e73b83029a50
Author: Thomas Groh 
Date:   2016-04-15T17:23:15Z

Clone DoFns before constructing a DoFnRunner

This ensures that each thread gets an individual copy of a DoFn, so
multiple threads do not interact.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-202) Remove YYYCoderBase

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243875#comment-15243875
 ] 

ASF GitHub Bot commented on BEAM-202:
-

GitHub user lukecwik opened a pull request:

https://github.com/apache/incubator-beam/pull/194

[BEAM-202] Clean-up *CoderBase classes

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

We are on Jackson 2.7.0 now.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam remove_coder_base

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 11e8bfed4c73ec21dcf5a671ff0381ec1f39d8d4
Author: Luke Cwik 
Date:   2016-04-15T23:53:23Z

[BEAM-202] Clean-up *CoderBase classes since we are on a newer version of 
Jackson




> Remove YYYCoderBase
> ---
>
> Key: BEAM-202
> URL: https://issues.apache.org/jira/browse/BEAM-202
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Jackson 2.4.5 had a bug where it couldn't deserialize types with multiple 
> generic parameters. Since we are on Jackson 2.7.0, we can remove KvCoderBase 
> and MapCoderBase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243825#comment-15243825
 ] 

ASF GitHub Bot commented on BEAM-50:


GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/192

[BEAM-50] Fix BigQuery.Write tempFilePrefix concatenation




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam fix-path-resolve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #192


commit 8024a85f05f568b83745d8f61c443bf8ca179421
Author: Pei He 
Date:   2016-04-15T23:15:11Z

Fix BigQuery.Write tempFilePrefix concatenation




> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243888#comment-15243888
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/178


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243900#comment-15243900
 ] 

ASF GitHub Bot commented on BEAM-50:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/193


> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-78) Rename Dataflow to Beam

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243807#comment-15243807
 ] 

ASF GitHub Bot commented on BEAM-78:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/191


> Rename Dataflow to Beam 
> 
>
> Key: BEAM-78
> URL: https://issues.apache.org/jira/browse/BEAM-78
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
>
> The initial code drop contains code that uses "Dataflow" to refer to the 
> SDK/model and Cloud Dataflow service. The first usage needs to be swapped to 
> Beam.
> This includes:
> - mentions throughout the javadoc
> - packages of classes that belong to the java sdk core
> And does not include:
> - the DataflowPipelineRunner
> We plan to postpone this rename until other code drops have been integrated 
> into the repository, and we have completed the refactoring that will separate 
> these two uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243836#comment-15243836
 ] 

ASF GitHub Bot commented on BEAM-50:


GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/193

[BEAM-50] Remove BigQueryIO.Write.Bound translator




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam 
remove-bq-write-translator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit 5dd12fe94a5ea2249cf5156c16df1b22aa663baf
Author: Pei He 
Date:   2016-04-15T23:19:54Z

[BEAM-50] Remove BigQueryIO.Write.Bound translator




> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-121) Publish DisplayData from common PTransforms

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243554#comment-15243554
 ] 

ASF GitHub Bot commented on BEAM-121:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/125


> Publish DisplayData from common PTransforms
> ---
>
> Key: BEAM-121
> URL: https://issues.apache.org/jira/browse/BEAM-121
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-121) Publish DisplayData from common PTransforms

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243723#comment-15243723
 ] 

ASF GitHub Bot commented on BEAM-121:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/126


> Publish DisplayData from common PTransforms
> ---
>
> Key: BEAM-121
> URL: https://issues.apache.org/jira/browse/BEAM-121
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-78) Rename Dataflow to Beam

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243743#comment-15243743
 ] 

ASF GitHub Bot commented on BEAM-78:


GitHub user lukecwik opened a pull request:

https://github.com/apache/incubator-beam/pull/191

[BEAM-78] Expose package private methods that Dataflow worker relies on

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The Beam rename caused package structure to change. This broke
some of the visiblity requirements inside Dataflow worker.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam beam_rename

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #191


commit 99020bc5a31669af191778c3a58d80a57263
Author: Luke Cwik 
Date:   2016-04-15T22:04:46Z

[BEAM-78] Expose package private methods that Dataflow worker relies on

The Beam rename caused package structure to change. This broke
some of the visiblity requirements inside Dataflow worker.

commit 70d7a7cc56fc4c353be2a64c35a38080cf125b69
Author: Luke Cwik 
Date:   2016-04-15T22:09:46Z

[BEAM-78] !fixup Fix package import order




> Rename Dataflow to Beam 
> 
>
> Key: BEAM-78
> URL: https://issues.apache.org/jira/browse/BEAM-78
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
>
> The initial code drop contains code that uses "Dataflow" to refer to the 
> SDK/model and Cloud Dataflow service. The first usage needs to be swapped to 
> Beam.
> This includes:
> - mentions throughout the javadoc
> - packages of classes that belong to the java sdk core
> And does not include:
> - the DataflowPipelineRunner
> We plan to postpone this rename until other code drops have been integrated 
> into the repository, and we have completed the refactoring that will separate 
> these two uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-155) Support asserting the contents of windows and panes in PAssert

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243665#comment-15243665
 ] 

ASF GitHub Bot commented on BEAM-155:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/180


> Support asserting the contents of windows and panes in PAssert
> --
>
> Key: BEAM-155
> URL: https://issues.apache.org/jira/browse/BEAM-155
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This consists of reifying the output windows and panes, and running asserts 
> per-window about the contents of panes.
> This includes aggregated matching and final pane matching, e.g.
> PAssert.that(output).byOnTimePane().hasOutputElements(foo, bar);
> // For discarding mode - could have emitted (say) [spam, eggs], [spam], [], 
> [sausage], []
> PAssert.that(output).byFinalPane().hasOutputElements(spam, eggs, sausage, 
> spam);
> // For accumulating mode without late data
> PAssert.that(output).finalPane().containsInAnyOrder(spam, eggs, sausage, 
> spam);
> // For accumulating mode with late data
> PAssert.that(output).finalPane().containsInAnyOrder(foo, 
> bar).mayAlsoContain(baz, rab);
> See also: 
> https://docs.google.com/document/d/1fZUUbG2LxBtqCVabQshldXIhkMcXepsbv2vuuny8Ix4/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-201) Material page

2016-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243674#comment-15243674
 ] 

ASF GitHub Bot commented on BEAM-201:
-

GitHub user evilsoapbox opened a pull request:

https://github.com/apache/incubator-beam-site/pull/13

Material page; logo fixes

- Added material page with project logos/materials
- Navigation fixes
- Logo fix for the main logo on all pages

In JIRA as [BEAM-201](https://issues.apache.org/jira/browse/BEAM-201)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/evilsoapbox/incubator-beam-site logo-files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam-site/pull/13.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13


commit a7cdd50326c5f14e7b578328001a710ba4672620
Author: James Malone 
Date:   2016-04-15T21:13:19Z

Addition of material page; nav fixes




> Material page
> -
>
> Key: BEAM-201
> URL: https://issues.apache.org/jira/browse/BEAM-201
> Project: Beam
>  Issue Type: Improvement
>  Components: website
> Environment: Create a website page with logo and project material 
> content
>Reporter: James Malone
>Assignee: James Malone
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-177) Integrate code coverage to build and review process

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246085#comment-15246085
 ] 

ASF GitHub Bot commented on BEAM-177:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/138


> Integrate code coverage to build and review process
> ---
>
> Key: BEAM-177
> URL: https://issues.apache.org/jira/browse/BEAM-177
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> We cannot use codecov, but we can use coveralls. We have the maven plugin 
> included in the pom and need to invoke it appropriately in our various 
> builds, and disseminate knowledge about browser extensions to get it into the 
> pull request UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-202) Remove YYYCoderBase

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245961#comment-15245961
 ] 

ASF GitHub Bot commented on BEAM-202:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/194


> Remove YYYCoderBase
> ---
>
> Key: BEAM-202
> URL: https://issues.apache.org/jira/browse/BEAM-202
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Jackson 2.4.5 had a bug where it couldn't deserialize types with multiple 
> generic parameters. Since we are on Jackson 2.7.0, we can remove KvCoderBase 
> and MapCoderBase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-196) Pipeline options must be available Context in DoFn.startBundle

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245934#comment-15245934
 ] 

ASF GitHub Bot commented on BEAM-196:
-

GitHub user mxm opened a pull request:

https://github.com/apache/incubator-beam/pull/200

[BEAM-196] Pipeline options must be available Context in DoFn.startBundle

This gets rid of the custom Java serialization code by defaulting to 
serialization of the `PipelineOptions` to a byte array. So far, this has been 
proven the most hassle-free method for the Flink Runner. For code reuse and 
avoiding multiple deserialization of the byte array, the 
`SerializedPipelineOptions` class has been introduced.

The changes also make the options accessible in the context of the `DoFn` 
function.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/incubator-beam BEAM-196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #200


commit 81577b31c2642522f7dd4ba8eba794df48a0ca56
Author: Maximilian Michels 
Date:   2016-04-18T15:40:38Z

[BEAM-196] abstraction for PipelineOptions serialization

commit 43b5ec743718e63c2d9d9532e3ca55bc87370290
Author: Maximilian Michels 
Date:   2016-04-18T15:40:50Z

[BEAM-196] make use of SerializedPipelineOptions




> Pipeline options must be available Context in DoFn.startBundle
> --
>
> Key: BEAM-196
> URL: https://issues.apache.org/jira/browse/BEAM-196
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Mark Shields
>Assignee: Maximilian Michels
>
> Our (not yet merged) Java Pubsub implementation has code like this in a DoFn:
> @Override
> public void startBundle(Context c) throws Exception {
>   Preconditions.checkState(pubsubClient == null);
>   pubsubClient = PubsubClient.newClient(transportType,
>   timestampLabel, idLabel, 
> c.getPipelineOptions().as(PubsubOptions.class));
>   super.startBundle(c);
> }
> This fails with NPE since the pipeline options are not conveyed via the 
> context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246471#comment-15246471
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/201

[BEAM-22] Remove isKeyed property of InProcess Bundles

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The property of keyedness belongs to a PCollection. A BundleFactory
propogates the key as far as possible, but does not track if a bundle is
keyed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
ippr_remove_bundle_iskeyed

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #201


commit 9644b7e20955603cb191f79f16b0f1f50b6497db
Author: Thomas Groh 
Date:   2016-04-18T19:59:24Z

Remove isKeyed property of InProcess Bundles

The property of keyedness belongs to a PCollection. A BundleFactory
propogates the key as far as possible, but does not track if a bundle is
keyed.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-208) Flaky test: org.apache.beam.runners.flink.streaming.GroupByNullKeyTest.testJob

2016-04-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248133#comment-15248133
 ] 

ASF GitHub Bot commented on BEAM-208:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/210

[BEAM-208] Remove use of System.currentTimeMillis in Flink Test

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This will stop tests failing if the DoFn executes within ~25 seconds of
an hour boundary.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
flink_deflake_groupByKeyNull

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #210


commit 416ca1f5e589b19e9e98e68fbb0e35349e19b82d
Author: Thomas Groh 
Date:   2016-04-19T16:39:38Z

Remove use of System.currentTimeMillis in Flink Test

This will stop tests failing if the DoFn executes within ~25 seconds of
an hour boundary.




> Flaky test: org.apache.beam.runners.flink.streaming.GroupByNullKeyTest.testJob
> --
>
> Key: BEAM-208
> URL: https://issues.apache.org/jira/browse/BEAM-208
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Maximilian Michels
>Priority: Minor
>
> org.apache.beam.runners.flink.streaming.GroupByNullKeyTest.testJob sometimes 
> flakes out.
> Error:
> Different number of lines in expected and obtained result. expected:<1> but 
> was:<2>
> Here's an example on Jenkins:
> https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$flink-runner_2.10/199/testReport/junit/org.apache.beam.runners.flink.streaming/GroupByNullKeyTest/testJob/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246581#comment-15246581
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user tgroh closed the pull request at:

https://github.com/apache/incubator-beam/pull/202


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246791#comment-15246791
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user tgroh closed the pull request at:

https://github.com/apache/incubator-beam/pull/204


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246956#comment-15246956
 ] 

ASF GitHub Bot commented on BEAM-50:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/205


> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246889#comment-15246889
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/207

[BEAM-22] Track Pending Elements via Exploded WindowedValues

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This allows the WindowedValues that are completed to be
removed from the set of pending elements, even if the actual object
is a different instance, by ensuring that all WindowedValues contain
only a single (element, window) pair.

Built on top of #206 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam ippr_exploded_wm_tracking

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #207


commit ac0696d8867d7a1706f1ff85c4b299c2b1779d02
Author: Thomas Groh 
Date:   2016-04-18T23:55:57Z

Add WindowedValue#explodeWindows

This takes an existing WindowedValue and returns a Collection of
WindowedValues, each of which is in exactly one window.

Use the explode implementation on DoFnRunnerBase

commit 725b2ddea58add3f583ed6f7c74f6ab4343cf292
Author: Thomas Groh 
Date:   2016-04-19T00:28:47Z

Track pending elements via exploded WindowedValues

This allows the WindowedValues to be partially completed while the
still holding the watermark.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248040#comment-15248040
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/203


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-189) The Spark runner uses valueInEmptyWindow which causes values to be dropped

2016-04-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248476#comment-15248476
 ] 

ASF GitHub Bot commented on BEAM-189:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/179


> The Spark runner uses valueInEmptyWindow which causes values to be dropped
> --
>
> Key: BEAM-189
> URL: https://issues.apache.org/jira/browse/BEAM-189
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Values in empty windowed may be dropped at anytime and so the default 
> windowing should be with GlobalWindow
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-210) Be consistent with emitting final empty panes

2016-04-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248488#comment-15248488
 ] 

ASF GitHub Bot commented on BEAM-210:
-

GitHub user bjchambers opened a pull request:

https://github.com/apache/incubator-beam/pull/211

[BEAM-210] Test that empty final panes are not produced.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/incubator-beam empty-final-panes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #211


commit cae541795c632f5ba4799b24a730018ec75ffb1b
Author: bchambers 
Date:   2016-04-19T18:34:25Z

Remove unused generic arguments in ReduceFnRunnerTest.

commit 4ab1c175f188e03f0ef2a5b6b2019c1e0ba27260
Author: bchambers 
Date:   2016-04-19T19:43:52Z

Add test for empty ON_TIME and no empty final pane

Add a test that we get an empty `ON_TIME` pane, and don't get the empty
final pane when using accumulation mode with the only if non-empty
`ClosingBehavior`.




> Be consistent with emitting final empty panes
> -
>
> Key: BEAM-210
> URL: https://issues.apache.org/jira/browse/BEAM-210
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Mark Shields
>
> Currently ReduceFnRunner.onTrigger uses shouldEmit to prevent empty final 
> panes unless the user has requested them.
> The same check needs to be done in ReduceFnRunner.onTimer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250092#comment-15250092
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/207


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-207) Flink test flake in ReadSourceStreamingITCase

2016-04-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250310#comment-15250310
 ] 

ASF GitHub Bot commented on BEAM-207:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/209


> Flink test flake in ReadSourceStreamingITCase
> -
>
> Key: BEAM-207
> URL: https://issues.apache.org/jira/browse/BEAM-207
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, testing
>Reporter: Daniel Halperin
>Assignee: Maximilian Michels
>
> Log from Travis: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/124066205/log.txt
> Snippet:
> {noformat}
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.792 sec - 
> in org.apache.beam.runners.flink.SideInputITCase
> Running org.apache.beam.runners.flink.ReadSourceStreamingITCase
> Pipeline execution failed
> java.lang.RuntimeException: Pipeline execution failed
>   at 
> org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:119)
>   at 
> org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:51)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:182)
>   at 
> org.apache.beam.runners.flink.ReadSourceStreamingITCase.runProgram(ReadSourceStreamingITCase.java:70)
>   at 
> org.apache.beam.runners.flink.ReadSourceStreamingITCase.testProgram(ReadSourceStreamingITCase.java:53)
>   at 
> org.apache.flink.streaming.util.StreamingProgramTestBase.testJob(StreamingProgramTestBase.java:85)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:714)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:660)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:660)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
>

[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250442#comment-15250442
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/201


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-188) Merging WindowFn + GBK + Write => InvalidWindows throws UnsupportedOperationException

2016-04-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242215#comment-15242215
 ] 

ASF GitHub Bot commented on BEAM-188:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/181

[BEAM-188] Write: apply GlobalWindows first

And do not supply a timestamp when outputting.

Note that this is safe because the functions in the Writer cannot access 
the window
or timestamp. When we add per-Window or similar functions to the sinks, we 
will
likely do so at a higher level.

Also testing and improving the existing tests. Note that the session test 
did fail without the accompanying changes to Write.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam write-global-windows

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #181


commit 303e14d508a45c848bafb227944e8b8a4078efb0
Author: Dan Halperin 
Date:   2016-04-15T00:23:42Z

[BEAM-188] Write: apply GlobalWindows first

And do not supply a timestamp when outputting.

Note that this is safe because the functions in the Writer cannot access 
the window
or timestamp. When we add per-Window or similar functions to the sinks, we 
will
likely do so at a higher level.




> Merging WindowFn + GBK + Write => InvalidWindows throws 
> UnsupportedOperationException
> -
>
> Key: BEAM-188
> URL: https://issues.apache.org/jira/browse/BEAM-188
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Daniel Halperin
>
> The {{Write}} transform performs {{outputWithTimestamp(..., Instant.now())}} 
> in the {{finishBundle}} of one of the encapsulated {{ParDo}} transforms. This 
> action causes the {{WindowFn}} to be invoked to assign a window to the output 
> value. But a merging {{WindowFn}} such as {{Sessions}} will be replaced by 
> {{InvalidWindows}} at the GBK where merging is performed, so this is destined 
> to crash.
> It is almost certain that the window is not relevant, so we can quickly fix 
> this by just windowing into the global window earlier and using vanilla 
> {{output(...)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (BEAM-188) Merging WindowFn + GBK + Write => InvalidWindows throws UnsupportedOperationException

2016-04-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242246#comment-15242246
 ] 

ASF GitHub Bot commented on BEAM-188:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/181


> Merging WindowFn + GBK + Write => InvalidWindows throws 
> UnsupportedOperationException
> -
>
> Key: BEAM-188
> URL: https://issues.apache.org/jira/browse/BEAM-188
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Daniel Halperin
>
> The {{Write}} transform performs {{outputWithTimestamp(..., Instant.now())}} 
> in the {{finishBundle}} of one of the encapsulated {{ParDo}} transforms. This 
> action causes the {{WindowFn}} to be invoked to assign a window to the output 
> value. But a merging {{WindowFn}} such as {{Sessions}} will be replaced by 
> {{InvalidWindows}} at the GBK where merging is performed, so this is destined 
> to crash.
> It is almost certain that the window is not relevant, so we can quickly fix 
> this by just windowing into the global window earlier and using vanilla 
> {{output(...)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

< 1 2 3 4 5 6 7 8 9 10 >

301 - 400 of 1640 matches

Mail list logo