[beam-site] 02/02: This closes #280

2017-08-24 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 6a6df3934f85779ec5967c2a8c926e92ab8cf309
Merge: 6558483 71bff15
Author: Mergebot 
AuthorDate: Fri Aug 25 03:48:47 2017 +

This closes #280

 src/documentation/programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[beam-site] 01/02: [BEAM-2726] Fix typo in programming guide

2017-08-24 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 71bff154c1aa2a52280673b112e26841a1af4376
Author: Lukasz Cwik 
AuthorDate: Fri Aug 4 06:52:30 2017 -0700

[BEAM-2726] Fix typo in programming guide
---
 src/documentation/programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index 70c3785..061a36a 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -670,7 +670,7 @@ pc = ...
 
 # Global windowing:
 
-If your input `PCollection` uses the default global windowing, the default behavior is to return a `PCollection` containing one item. That item's value comes from the accumulator in the combine function that you specified when applying `Combine`. For example, the Beam provided sum combine function returns a zero value (the sum of an empty input), while the min combine function returns a maximal or infinite value.
+If your input `PCollection` uses the default global windowing, the default behavior is to return a `PCollection` containing one item. That item's value comes from the accumulator in the combine function that you specified when applying `Combine`. For example, the Beam provided sum combine function returns a zero value (the sum of an empty input), while the max combine function returns a maximal or infinite value.
 
 To have `Combine` instead return an empty `PCollection` if the input is empty, 
specify `.withoutDefaults` when you apply your `Combine` transform, as in the 
following code example:
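
A minimal sketch of the `.withoutDefaults` usage described in the context above, assuming the Java SDK's standard `Combine.globally` and `Sum.ofIntegers` transforms (this snippet is illustrative only; it is not the guide's own example and not part of the quoted hunk):

{code}
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.PCollection;

// 'input' is assumed to be an existing globally-windowed PCollection<Integer>.
// By default a global Combine emits one item even for empty input;
// withoutDefaults() makes it emit an empty PCollection instead.
PCollection<Integer> sum =
    input.apply(Combine.globally(Sum.ofIntegers()).withoutDefaults());
{code}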
 

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" .


[beam-site] branch mergebot updated (d00f31d -> 6a6df39)

2017-08-24 Thread mergebot-role
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


from d00f31d  This closes #299
 add 6558483  Prepare repository for deployment.
 new 71bff15  [BEAM-2726] Fix typo in programming guide
 new 6a6df39  This closes #280

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/documentation/dsls/sql/index.html | 132 +-
 src/documentation/programming-guide.md|   2 +-
 2 files changed, 94 insertions(+), 40 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"commits@beam.apache.org" '].


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3832

2017-08-24 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-2624) File-based sinks should produce a PCollection of written filenames

2017-08-24 Thread Reuven Lax (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax updated BEAM-2624:
-
Fix Version/s: 2.2.0

> File-based sinks should produce a PCollection of written filenames
> --
>
> Key: BEAM-2624
> URL: https://issues.apache.org/jira/browse/BEAM-2624
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2601) FileBasedSink produces incorrect shards when writing to multiple destinations

2017-08-24 Thread Reuven Lax (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax closed BEAM-2601.

Resolution: Fixed

> FileBasedSink produces incorrect shards when writing to multiple destinations
> -
>
> Key: BEAM-2601
> URL: https://issues.apache.org/jira/browse/BEAM-2601
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>
> FileBasedSink now supports multiple dynamic destinations, however it 
> finalizes all files in a bundle without paying attention to destination. This 
> means that the shard counts will be incorrect across these destinations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3831

2017-08-24 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3760: [BEAM-1347] Add a BagUserState implementation over ...

2017-08-24 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3760

[BEAM-1347] Add a BagUserState implementation over the BeamFnStateClient

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3760


commit 3e601f0d5a49b013ca83c63712fafe039d9ea7bc
Author: Luke Cwik 
Date:   2017-08-25T01:34:47Z

[BEAM-1347] Add a BagUserState implementation over the BeamFnStateClient




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141046#comment-16141046
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3760

[BEAM-1347] Add a BagUserState implementation over the BeamFnStateClient

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3760


commit 3e601f0d5a49b013ca83c63712fafe039d9ea7bc
Author: Luke Cwik 
Date:   2017-08-25T01:34:47Z

[BEAM-1347] Add a BagUserState implementation over the BeamFnStateClient




> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1347) Basic Java harness capable of understanding process bundle tasks and sending data over the Fn Api

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141048#comment-16141048
 ] 

ASF GitHub Bot commented on BEAM-1347:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3761

[BEAM-1347] Remove unused FakeStepContext now that FnApiDoFnRunner has its 
own implementation

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3761


commit ed8eb70c32a9cb101ef51f756e0da7aaf2e23669
Author: Luke Cwik 
Date:   2017-08-25T01:31:50Z

[BEAM-1347] Remove unused FakeStepContext now that FnApiDoFnRunner has its 
own implementation




> Basic Java harness capable of understanding process bundle tasks and sending 
> data over the Fn Api
> -
>
> Key: BEAM-1347
> URL: https://issues.apache.org/jira/browse/BEAM-1347
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a basic Java harness capable of understanding process bundle requests 
> and able to stream data over the Fn Api.
> Overview: https://s.apache.org/beam-fn-api



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3761: [BEAM-1347] Remove unused FakeStepContext now that ...

2017-08-24 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3761

[BEAM-1347] Remove unused FakeStepContext now that FnApiDoFnRunner has its 
own implementation

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam state_api2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3761


commit ed8eb70c32a9cb101ef51f756e0da7aaf2e23669
Author: Luke Cwik 
Date:   2017-08-25T01:31:50Z

[BEAM-1347] Remove unused FakeStepContext now that FnApiDoFnRunner has its 
own implementation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4640

2017-08-24 Thread Apache Jenkins Server
See 


Changes:

[chamikara] Fix min_timestamp used for KafkaIO watermark.

--
[...truncated 1.12 MB...]
2017-08-25T01:25:43.220 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/github/jbellis/jamm/0.3.0/jamm-0.3.0.jar
2017-08-25T01:25:43.248 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/github/jbellis/jamm/0.3.0/jamm-0.3.0.jar
 (21 KB at 3.3 KB/sec)
2017-08-25T01:25:43.248 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final.jar
2017-08-25T01:25:43.256 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/googlecode/concurrent-trees/concurrent-trees/2.4.0/concurrent-trees-2.4.0.jar
 (116 KB at 18.5 KB/sec)
2017-08-25T01:25:43.256 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/fusesource/sigar/1.6.4/sigar-1.6.4.jar
2017-08-25T01:25:43.337 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/fusesource/sigar/1.6.4/sigar-1.6.4.jar 
(419 KB at 65.8 KB/sec)
2017-08-25T01:25:43.337 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.jar
2017-08-25T01:25:43.642 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final.jar
 (2219 KB at 333.0 KB/sec)
2017-08-25T01:25:43.643 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/caffinitas/ohc/ohc-core/0.4.3/ohc-core-0.4.3.jar
2017-08-25T01:25:43.682 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.jar
 (2257 KB at 336.6 KB/sec)
2017-08-25T01:25:43.682 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/2.2.6/caffeine-2.2.6.jar
2017-08-25T01:25:43.684 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/caffinitas/ohc/ohc-core/0.4.3/ohc-core-0.4.3.jar
 (125 KB at 18.5 KB/sec)
2017-08-25T01:25:43.685 [INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.1.1/cassandra-driver-core-3.1.1.jar
2017-08-25T01:25:43.831 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/2.2.6/caffeine-2.2.6.jar
 (926 KB at 135.1 KB/sec)
2017-08-25T01:25:43.849 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.1.1/cassandra-driver-core-3.1.1.jar
 (1029 KB at 149.7 KB/sec)
2017-08-25T01:25:43.995 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/storm/storm-core/1.0.1/storm-core-1.0.1.jar
 (19650 KB at 2800.7 KB/sec)
2017-08-25T01:25:44.327 [INFO] Downloaded: 
https://repo.maven.apache.org/maven2/it/unimi/dsi/fastutil/6.5.7/fastutil-6.5.7.jar
 (16508 KB at 2246.8 KB/sec)
2017-08-25T01:25:47.178 [INFO] Downloading: 
http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
2017-08-25T01:25:47.178 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
2017-08-25T01:25:47.179 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-core/2.6.3/cascading-core-2.6.3.jar
2017-08-25T01:25:47.179 [INFO] Downloading: 
http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar
2017-08-25T01:25:47.180 [INFO] Downloading: 
http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
2017-08-25T01:25:47.300 [INFO] Downloaded: 
http://conjars.org/repo/riffle/riffle/0.1-dev/riffle-0.1-dev.jar (12 KB at 91.6 
KB/sec)
2017-08-25T01:25:47.300 [INFO] Downloading: 
http://conjars.org/repo/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
2017-08-25T01:25:47.349 [INFO] Downloaded: 
http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar
 (48 KB at 277.3 KB/sec)
2017-08-25T01:25:47.462 [INFO] Downloaded: 
http://conjars.org/repo/thirdparty/jgrapht-jdk1.6/0.8.1/jgrapht-jdk1.6-0.8.1.jar
 (230 KB at 810.1 KB/sec)
2017-08-25T01:25:47.471 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-local/2.6.3/cascading-local-2.6.3.jar
 (43 KB at 145.0 KB/sec)
2017-08-25T01:25:47.492 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-hadoop/2.6.3/cascading-hadoop-2.6.3.jar
 (246 KB at 783.1 KB/sec)
2017-08-25T01:25:47.635 [INFO] Downloaded: 
http://conjars.org/repo/cascading/cascading-core/2.6.3/cascading-core-2.6.3.jar 
(680 KB at 1486.8 KB/sec)
[JENKINS] Archiving disabled
2017-08-25T01:25:48.544 [INFO]  
   
2017-08-25T01:25:48.544 [INFO] 

2017-08-25T01:25:48.544 [INFO] Skipping Apache Beam :: Parent
2017-08-25T01:25:48.544 [INFO] This project has been banned from the build due 
to previous failures.

[jira] [Commented] (BEAM-2703) KafkaIO: watermark outside the bounds of BoundedWindow

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141005#comment-16141005
 ] 

ASF GitHub Bot commented on BEAM-2703:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3758


> KafkaIO: watermark outside the bounds of BoundedWindow
> --
>
> Key: BEAM-2703
> URL: https://issues.apache.org/jira/browse/BEAM-2703
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Chris Pettitt
>Assignee: Raghu Angadi
>
> KafkaIO appears to use an incorrect lower bound for its initial watermark 
> with respect to BoundedWindow.TIMESTAMP_MIN_VALUE.
> KafkaIO's initial watermark:
> new Instant(Long.MIN_VALUE) -> -9223372036854775808
> BoundedWindow.TIMESTAMP_MIN_VALUE:
> new Instant(TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE)) -> 
> -9223372036854775
> The difference is that the last three digits have been truncated due to the 
> micro to millis conversion.
> This difference can cause errors in runners that assert that the input 
> watermark can never regress as KafkaIO gives a value below the lower bound 
> when no messages have been received yet. For consistency it would probably be 
> best for it to use BoundedWindow.TIMESTAMP_MIN_VALUE.
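
A small sketch illustrating the truncation described above, assuming only java.util.concurrent.TimeUnit and Joda-Time's Instant (the same conversion shown in the issue text):

{code}
import java.util.concurrent.TimeUnit;
import org.joda.time.Instant;

public class WatermarkBounds {
  public static void main(String[] args) {
    // KafkaIO's old initial watermark:
    Instant kafkaInitial = new Instant(Long.MIN_VALUE);
    // -9223372036854775808

    // BoundedWindow.TIMESTAMP_MIN_VALUE is derived from this conversion:
    Instant beamMin = new Instant(TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE));
    // -9223372036854775 (last three digits truncated by the micros-to-millis conversion)

    // The old initial watermark sits below the Beam lower bound, so a runner
    // asserting that the watermark never regresses below it can fail before
    // any records arrive.
    System.out.println(kafkaInitial.isBefore(beamMin)); // true
  }
}
{code}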



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3758: [BEAM-2703] Fix min_timestamp used for KafkaIO wate...

2017-08-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3758


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Fix min_timestamp used for KafkaIO watermark.

2017-08-24 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master c33cb0340 -> 20d88dbfc


Fix min_timestamp used for KafkaIO watermark.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3362d1f5
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3362d1f5
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3362d1f5

Branch: refs/heads/master
Commit: 3362d1f52bd2076908d74ff6643a483468630502
Parents: c33cb03
Author: Raghu Angadi 
Authored: Thu Aug 24 14:33:28 2017 -0700
Committer: chamik...@google.com 
Committed: Thu Aug 24 17:48:12 2017 -0700

--
 .../kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3362d1f5/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
--
diff --git 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
index 7fb4260..dae4c1d 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
@@ -82,6 +82,7 @@ import org.apache.beam.sdk.transforms.ParDo;
 import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.SimpleFunction;
 import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
@@ -899,7 +900,7 @@ public class KafkaIO {
 private transient ConsumerSpEL consumerSpEL;
 
 /** watermark before any records have been read. */
-private static Instant initialWatermark = new Instant(Long.MIN_VALUE);
+private static Instant initialWatermark = BoundedWindow.TIMESTAMP_MIN_VALUE;
 
 @Override
 public String toString() {



[2/2] beam git commit: This closes #3758

2017-08-24 Thread chamikara
This closes #3758


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/20d88dbf
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/20d88dbf
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/20d88dbf

Branch: refs/heads/master
Commit: 20d88dbfc40a6a8ed2fef7f43fb185913def92e1
Parents: c33cb03 3362d1f
Author: chamik...@google.com 
Authored: Thu Aug 24 17:49:25 2017 -0700
Committer: chamik...@google.com 
Committed: Thu Aug 24 17:49:25 2017 -0700

--
 .../kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3830

2017-08-24 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3759: Moves Match into FileIO.match()/matchAll()

2017-08-24 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/3759

Moves Match into FileIO.match()/matchAll()

FileIO will later gain other methods, such as read()/write().

Also introduces FileIO.match().filepattern(...) as a shorthand for 
Create.of(filepattern).apply(FileIO.matchAll()).

Also introduces FileIO.MatchConfiguration - a common type to use
by various file-based IOs to reduce boilerplate, and uses it in TextIO.
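
A hedged sketch of the filepattern shorthand mentioned above, using only the names given in this description; the element type and exact builder signatures are assumed here, not taken from the merged API:

{code}
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

// 'p' is assumed to be an existing Pipeline.
// Shorthand form described above:
PCollection<MatchResult.Metadata> matched =
    p.apply(FileIO.match().filepattern("gs://my-bucket/logs/*.txt"));

// Equivalent longhand form:
PCollection<MatchResult.Metadata> matchedViaMatchAll =
    p.apply(Create.of("gs://my-bucket/logs/*.txt"))
     .apply(FileIO.matchAll());
{code}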

R: @reuvenlax 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam file-io

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3759


commit bcad095f87436be7944dd7494ecad9b13af51557
Author: Eugene Kirpichov 
Date:   2017-08-24T23:31:41Z

Moves Match into FileIO.match()/matchAll()

FileIO will later gain other methods, such as read()/write().

Also introduces FileIO.MatchConfiguration - a common type to use
by various file-based IOs to reduce boilerplate, and uses it in TextIO.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2806) support View.CreatePCollectionView in FlinkRunner

2017-08-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2806:


 Summary: support View.CreatePCollectionView in FlinkRunner
 Key: BEAM-2806
 URL: https://issues.apache.org/jira/browse/BEAM-2806
 Project: Beam
  Issue Type: New Feature
  Components: runner-flink
Reporter: Xu Mingmin
Assignee: Aljoscha Krettek


Beam version: 2.2.0-SNAPSHOT

Here's the code
{code}
PCollectionView>> rowsView = rightRows
.apply(View.asMultimap());
{code}

And exception when running with {{FlinkRunner}}:
{code}
Exception in thread "main" java.lang.UnsupportedOperationException: The 
transform View.CreatePCollectionView is currently not supported.
at 
org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:113)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:594)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:586)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:586)
at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.access$500(TransformHierarchy.java:268)
at 
org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:202)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:440)
at 
org.apache.beam.runners.flink.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:38)
at 
org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.translate(FlinkStreamingPipelineTranslator.java:69)
at 
org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:104)
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:113)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2791) DirectRunner shuts down when there are pending event time timers

2017-08-24 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-2791:
-

Assignee: (was: Thomas Groh)

> DirectRunner shuts down when there are pending event time timers
> 
>
> Key: BEAM-2791
> URL: https://issues.apache.org/jira/browse/BEAM-2791
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Kenneth Knowles
>
> When there are pending event time timers for a window by definition they are 
> within the window so it should not be possible to GC the window nor to shut 
> down the pipeline.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1542) Need Source/Sink for Spanner

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140878#comment-16140878
 ] 

ASF GitHub Bot commented on BEAM-1542:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3696


> Need Source/Sink for Spanner
> 
>
> Key: BEAM-1542
> URL: https://issues.apache.org/jira/browse/BEAM-1542
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Guy Molinari
>Assignee: Mairbek Khadikov
>
> Is there a source/sink for Spanner in the works?   If not I would gladly give 
> this a shot.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[2/2] beam git commit: This closes #3696: [BEAM-1542] Add SpannerAccessor

2017-08-24 Thread jkff
This closes #3696: [BEAM-1542] Add SpannerAccessor


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c33cb034
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c33cb034
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c33cb034

Branch: refs/heads/master
Commit: c33cb03403c4f6fdbb49a525f66f12210c00ea0a
Parents: f3de736 c9c2e81
Author: Eugene Kirpichov 
Authored: Thu Aug 24 15:51:45 2017 -0700
Committer: Eugene Kirpichov 
Committed: Thu Aug 24 15:51:45 2017 -0700

--
 .../sdk/io/gcp/spanner/AbstractSpannerFn.java   | 71 
 .../sdk/io/gcp/spanner/CreateTransactionFn.java | 22 --
 .../sdk/io/gcp/spanner/NaiveSpannerReadFn.java  | 18 +++--
 .../sdk/io/gcp/spanner/SpannerAccessor.java | 43 
 .../beam/sdk/io/gcp/spanner/SpannerConfig.java  | 22 ++
 .../sdk/io/gcp/spanner/SpannerWriteGroupFn.java | 24 ---
 6 files changed, 111 insertions(+), 89 deletions(-)
--




[GitHub] beam pull request #3696: [BEAM-1542] Make AbstractSpannerFn public

2017-08-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3696


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Adds SpannerAccessor - a utility for DoFn's that use Spanner

2017-08-24 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master f3de7363c -> c33cb0340


Adds SpannerAccessor - a utility for DoFn's that use Spanner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c9c2e816
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c9c2e816
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c9c2e816

Branch: refs/heads/master
Commit: c9c2e81672676e3ec705269a94f11fb1a2596c48
Parents: f3de736
Author: Mairbek Khadikov 
Authored: Mon Aug 7 12:33:19 2017 -0700
Committer: Eugene Kirpichov 
Committed: Thu Aug 24 15:51:41 2017 -0700

--
 .../sdk/io/gcp/spanner/AbstractSpannerFn.java   | 71 
 .../sdk/io/gcp/spanner/CreateTransactionFn.java | 22 --
 .../sdk/io/gcp/spanner/NaiveSpannerReadFn.java  | 18 +++--
 .../sdk/io/gcp/spanner/SpannerAccessor.java | 43 
 .../beam/sdk/io/gcp/spanner/SpannerConfig.java  | 22 ++
 .../sdk/io/gcp/spanner/SpannerWriteGroupFn.java | 24 ---
 6 files changed, 111 insertions(+), 89 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c9c2e816/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
deleted file mode 100644
index 50efdea..000
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io.gcp.spanner;
-
-import com.google.cloud.spanner.DatabaseClient;
-import com.google.cloud.spanner.DatabaseId;
-import com.google.cloud.spanner.Spanner;
-import com.google.cloud.spanner.SpannerOptions;
-import org.apache.beam.sdk.transforms.DoFn;
-import org.apache.beam.sdk.util.ReleaseInfo;
-
-/**
- * Abstract {@link DoFn} that manages {@link Spanner} lifecycle. Use {@link
- * AbstractSpannerFn#databaseClient} to access the Cloud Spanner database 
client.
- */
-abstract class AbstractSpannerFn extends DoFn {
-  // A common user agent token that indicates that this request was originated 
from Apache Beam.
-  private static final String USER_AGENT_PREFIX = "Apache_Beam_Java";
-
-  private transient Spanner spanner;
-  private transient DatabaseClient databaseClient;
-
-  abstract SpannerConfig getSpannerConfig();
-
-  @Setup
-  public void setup() throws Exception {
-SpannerConfig spannerConfig = getSpannerConfig();
-SpannerOptions.Builder builder = SpannerOptions.newBuilder();
-if (spannerConfig.getProjectId() != null) {
-  builder.setProjectId(spannerConfig.getProjectId().get());
-}
-if (spannerConfig.getServiceFactory() != null) {
-  builder.setServiceFactory(spannerConfig.getServiceFactory());
-}
-ReleaseInfo releaseInfo = ReleaseInfo.getReleaseInfo();
-builder.setUserAgentPrefix(USER_AGENT_PREFIX + "/" + 
releaseInfo.getVersion());
-SpannerOptions options = builder.build();
-spanner = options.getService();
-databaseClient = spanner.getDatabaseClient(DatabaseId
-.of(options.getProjectId(), spannerConfig.getInstanceId().get(),
-spannerConfig.getDatabaseId().get()));
-  }
-
-  @Teardown
-  public void teardown() throws Exception {
-if (spanner == null) {
-  return;
-}
-spanner.close();
-spanner = null;
-  }
-
-  protected DatabaseClient databaseClient() {
-return databaseClient;
-  }
-}

http://git-wip-us.apache.org/repos/asf/beam/blob/c9c2e816/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/CreateTransactionFn.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/CreateTransactionFn.j

[jira] [Commented] (BEAM-2792) Populate All Runner API Components from the Python SDK

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140868#comment-16140868
 ] 

ASF GitHub Bot commented on BEAM-2792:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3746


> Populate All Runner API Components from the Python SDK
> --
>
> Key: BEAM-2792
> URL: https://issues.apache.org/jira/browse/BEAM-2792
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2302) WriteFiles with runner-determined sharding and large numbers of windows causes OOM errors

2017-08-24 Thread Reuven Lax (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax closed BEAM-2302.


> WriteFiles with runner-determined sharding and large numbers of windows 
> causes OOM errors
> -
>
> Key: BEAM-2302
> URL: https://issues.apache.org/jira/browse/BEAM-2302
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 2.1.0
>
>
> This is because the WriteWindowedBundles transform will create many file 
> writers, and the sheer number of file buffers (which defaults to 64mb per 
> writer) uses up all memory. The fix is the same as was done in BigQueryIO - 
> if too many writers are opened, spill into a shuffle, and write the files 
> after the shuffle



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3746: [BEAM-2792] Translate basic coders through the Runn...

2017-08-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3746


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[3/3] beam git commit: Add URN registration mechanism for coders.

2017-08-24 Thread robertwb
Add URN registration mechanism for coders.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ef4239ab
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ef4239ab
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ef4239ab

Branch: refs/heads/master
Commit: ef4239ab7928bfad95a4debb1517c2547473bf8f
Parents: d261d6b
Author: Robert Bradshaw 
Authored: Tue Aug 22 10:01:40 2017 -0700
Committer: Robert Bradshaw 
Committed: Thu Aug 24 15:41:46 2017 -0700

--
 sdks/python/apache_beam/coders/coders.py | 66 +--
 1 file changed, 53 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ef4239ab/sdks/python/apache_beam/coders/coders.py
--
diff --git a/sdks/python/apache_beam/coders/coders.py 
b/sdks/python/apache_beam/coders/coders.py
index 7ced5a9..0ea5f7c 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -202,25 +202,65 @@ class Coder(object):
 and self._dict_without_impl() == other._dict_without_impl())
 # pylint: enable=protected-access
 
-  def to_runner_api(self, context):
-"""For internal use only; no backwards-compatibility guarantees.
+  _known_urns = {}
+
+  @classmethod
+  def register_urn(cls, urn, parameter_type, fn=None):
+"""Registeres a urn with a constructor.
+
+For example, if 'beam:fn:foo' had paramter type FooPayload, one could
+write `RunnerApiFn.register_urn('bean:fn:foo', FooPayload, foo_from_proto)`
+where foo_from_proto took as arguments a FooPayload and a PipelineContext.
+This function can also be used as a decorator rather than passing the
+callable in as the final parameter.
+
+A corresponding to_runner_api_parameter method would be expected that
+returns the tuple ('beam:fn:foo', FooPayload)
 """
-# TODO(BEAM-115): Use specialized URNs and components.
-serialized_coder = serialize_coder(self)
+def register(fn):
+  cls._known_urns[urn] = parameter_type, fn
+  return staticmethod(fn)
+if fn:
+  # Used as a statement.
+  register(fn)
+else:
+  # Used as a decorator.
+  return register
+
+  def to_runner_api(self, context):
+from apache_beam.portability.api import beam_runner_api_pb2
+urn, typed_param, components = self.to_runner_api_parameter(context)
 return beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.SdkFunctionSpec(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=urns.PICKLED_CODER,
-any_param=proto_utils.pack_Any(
-google.protobuf.wrappers_pb2.BytesValue(
-value=serialized_coder)),
-payload=serialized_coder)))
+urn=urn,
+any_param=proto_utils.pack_Any(typed_param),
+payload=typed_param.SerializeToString()
+if typed_param is not None else None)),
+component_coder_ids=[context.coders.get_id(c) for c in components])
 
-  @staticmethod
-  def from_runner_api(proto, context):
-"""For internal use only; no backwards-compatibility guarantees.
+  @classmethod
+  def from_runner_api(cls, coder_proto, context):
+"""Converts from an SdkFunctionSpec to a Fn object.
+
+Prefer registering a urn with its parameter type and constructor.
 """
-return deserialize_coder(proto.spec.spec.payload)
+parameter_type, constructor = cls._known_urns[coder_proto.spec.spec.urn]
+return constructor(
+proto_utils.parse_Bytes(coder_proto.spec.spec.payload, parameter_type),
+[context.coders.get_by_id(c) for c in coder_proto.component_coder_ids],
+context)
+
+  def to_runner_api_parameter(self, context):
+return (
+urns.PICKLED_CODER,
+google.protobuf.wrappers_pb2.BytesValue(value=serialize_coder(self)),
+())
+
+
+@Coder.register_urn(urns.PICKLED_CODER, 
google.protobuf.wrappers_pb2.BytesValue)
+def _pickle_from_runner_api_parameter(payload, components, context):
+  return deserialize_coder(payload.value)
 
 
 class StrUtf8Coder(Coder):



[1/3] beam git commit: Closes #3746

2017-08-24 Thread robertwb
Repository: beam
Updated Branches:
  refs/heads/master d261d6bde -> f3de7363c


Closes #3746


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f3de7363
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f3de7363
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f3de7363

Branch: refs/heads/master
Commit: f3de7363c1908cf5e20e2c33f6fe290e44384c60
Parents: d261d6b 9cc004f
Author: Robert Bradshaw 
Authored: Thu Aug 24 15:41:46 2017 -0700
Committer: Robert Bradshaw 
Committed: Thu Aug 24 15:41:46 2017 -0700

--
 sdks/python/apache_beam/coders/coders.py| 100 ---
 .../apache_beam/coders/coders_test_common.py|   4 +-
 sdks/python/apache_beam/utils/urns.py   |  11 +-
 3 files changed, 101 insertions(+), 14 deletions(-)
--




[2/3] beam git commit: Runner API encoding of common coders.

2017-08-24 Thread robertwb
Runner API encoding of common coders.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9cc004fb
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9cc004fb
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9cc004fb

Branch: refs/heads/master
Commit: 9cc004fb0c32234b541cd622a0d0ab4c5c3d2389
Parents: ef4239a
Author: Robert Bradshaw 
Authored: Tue Aug 22 10:54:21 2017 -0700
Committer: Robert Bradshaw 
Committed: Thu Aug 24 15:41:46 2017 -0700

--
 sdks/python/apache_beam/coders/coders.py| 42 ++--
 .../apache_beam/coders/coders_test_common.py|  4 +-
 sdks/python/apache_beam/utils/urns.py   | 11 -
 3 files changed, 52 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/9cc004fb/sdks/python/apache_beam/coders/coders.py
--
diff --git a/sdks/python/apache_beam/coders/coders.py 
b/sdks/python/apache_beam/coders/coders.py
index 0ea5f7c..e204369 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -206,9 +206,9 @@ class Coder(object):
 
   @classmethod
   def register_urn(cls, urn, parameter_type, fn=None):
-"""Registeres a urn with a constructor.
+"""Registers a urn with a constructor.
 
-For example, if 'beam:fn:foo' had paramter type FooPayload, one could
+For example, if 'beam:fn:foo' had parameter type FooPayload, one could
 write `RunnerApiFn.register_urn('bean:fn:foo', FooPayload, foo_from_proto)`
 where foo_from_proto took as arguments a FooPayload and a PipelineContext.
 This function can also be used as a decorator rather than passing the
@@ -228,7 +228,6 @@ class Coder(object):
   return register
 
   def to_runner_api(self, context):
-from apache_beam.portability.api import beam_runner_api_pb2
 urn, typed_param, components = self.to_runner_api_parameter(context)
 return beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.SdkFunctionSpec(
@@ -257,6 +256,22 @@ class Coder(object):
 google.protobuf.wrappers_pb2.BytesValue(value=serialize_coder(self)),
 ())
 
+  @staticmethod
+  def register_structured_urn(urn, cls):
+"""Register a coder that's completely defined by its urn and its
+component(s), if any, which are passed to construct the instance.
+"""
+cls.to_runner_api_parameter = (
+lambda self, unused_context: (urn, None, self._get_component_coders()))
+
+# pylint: disable=unused-variable
+@Coder.register_urn(urn, None)
+def from_runner_api_parameter(unused_payload, components, unused_context):
+  if components:
+return cls(*components)
+  else:
+return cls()
+
 
 @Coder.register_urn(urns.PICKLED_CODER, 
google.protobuf.wrappers_pb2.BytesValue)
 def _pickle_from_runner_api_parameter(payload, components, context):
@@ -337,6 +352,9 @@ class BytesCoder(FastCoder):
 return hash(type(self))
 
 
+Coder.register_structured_urn(urns.BYTES_CODER, BytesCoder)
+
+
 class VarIntCoder(FastCoder):
   """Variable-length integer coder."""
 
@@ -353,6 +371,9 @@ class VarIntCoder(FastCoder):
 return hash(type(self))
 
 
+Coder.register_structured_urn(urns.VAR_INT_CODER, VarIntCoder)
+
+
 class FloatCoder(FastCoder):
   """A coder used for floating-point values."""
 
@@ -757,6 +778,9 @@ class IterableCoder(FastCoder):
 return hash((type(self), self._elem_coder))
 
 
+Coder.register_structured_urn(urns.ITERABLE_CODER, IterableCoder)
+
+
 class GlobalWindowCoder(SingletonCoder):
   """Coder for global windows."""
 
@@ -770,6 +794,9 @@ class GlobalWindowCoder(SingletonCoder):
 }
 
 
+Coder.register_structured_urn(urns.GLOBAL_WINDOW_CODER, GlobalWindowCoder)
+
+
 class IntervalWindowCoder(FastCoder):
   """Coder for an window defined by a start timestamp and a duration."""
 
@@ -791,6 +818,9 @@ class IntervalWindowCoder(FastCoder):
 return hash(type(self))
 
 
+Coder.register_structured_urn(urns.INTERVAL_WINDOW_CODER, IntervalWindowCoder)
+
+
 class WindowedValueCoder(FastCoder):
   """Coder for windowed values."""
 
@@ -847,6 +877,9 @@ class WindowedValueCoder(FastCoder):
 (self.wrapped_value_coder, self.timestamp_coder, self.window_coder))
 
 
+Coder.register_structured_urn(urns.WINDOWED_VALUE_CODER, WindowedValueCoder)
+
+
 class LengthPrefixCoder(FastCoder):
   """For internal use only; no backwards-compatibility guarantees.
 
@@ -886,3 +919,6 @@ class LengthPrefixCoder(FastCoder):
 
   def __hash__(self):
 return hash((type(self), self._value_coder))
+
+
+Coder.register_structured_urn(urns.LENGTH_PREFIX_CODER, LengthPrefixCoder)

http://git-wip-us.apache.org/repos/asf/beam/blob/9cc004fb/sdks/python/apache_beam/coders/coders_test_common.py

[jira] [Resolved] (BEAM-2797) Update unit tests for python mobile gaming examples

2017-08-24 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-2797.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

> Update unit tests for python mobile gaming examples
> ---
>
> Key: BEAM-2797
> URL: https://issues.apache.org/jira/browse/BEAM-2797
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Critical
> Fix For: 2.2.0
>
>
> Update for the changes that happened in 
> https://github.com/apache/beam/pull/3483



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2797) Update unit tests for python mobile gaming examples

2017-08-24 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140836#comment-16140836
 ] 

Ahmet Altay commented on BEAM-2797:
---

https://github.com/apache/beam/pull/3756 fixed this.

> Update unit tests for python mobile gaming examples
> ---
>
> Key: BEAM-2797
> URL: https://issues.apache.org/jira/browse/BEAM-2797
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Critical
> Fix For: 2.2.0
>
>
> Update for the changes that happened in 
> https://github.com/apache/beam/pull/3483



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3829

2017-08-24 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2805) Potential arithmetic overflow in Generator#nextAuctionLengthMs()

2017-08-24 Thread Ted Yu (JIRA)
Ted Yu created BEAM-2805:


 Summary: Potential arithmetic overflow in 
Generator#nextAuctionLengthMs()
 Key: BEAM-2805
 URL: https://issues.apache.org/jira/browse/BEAM-2805
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-extensions
Reporter: Ted Yu
Assignee: Reuven Lax
Priority: Minor


{code}
long numEventsForAuctions =
(config.configuration.numInFlightAuctions * 
GeneratorConfig.PROPORTION_DENOMINATOR)
/ GeneratorConfig.AUCTION_PROPORTION;
{code}
The multiplication is done on 32-bit integers while a long is expected 
(numEventsForAuctions).

There is a possibility of arithmetic overflow.
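
A self-contained sketch of the overflow, with made-up values standing in for the GeneratorConfig constants (the real values live in the quoted code):

{code}
public class OverflowDemo {
  public static void main(String[] args) {
    int numInFlightAuctions = 100_000;   // hypothetical stand-in value
    int proportionDenominator = 50_000;  // hypothetical stand-in value

    // int * int wraps around before being assigned to the long:
    long overflowed = numInFlightAuctions * proportionDenominator;
    // Widening one operand forces 64-bit multiplication:
    long widened = (long) numInFlightAuctions * proportionDenominator;

    System.out.println(overflowed); // 705032704 (wrapped)
    System.out.println(widened);    // 5000000000
  }
}
{code}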



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2804) support TIMESTAMP in Sort

2017-08-24 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2804:


 Summary: support TIMESTAMP in Sort
 Key: BEAM-2804
 URL: https://issues.apache.org/jira/browse/BEAM-2804
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Xu Mingmin
Assignee: Xu Mingmin
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2703) KafkaIO: watermark outside the bounds of BoundedWindow

2017-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140750#comment-16140750
 ] 

ASF GitHub Bot commented on BEAM-2703:
--

GitHub user rangadi opened a pull request:

https://github.com/apache/beam/pull/3758

[BEAM-2703] Fix min_timestamp used for KafkaIO watermark.

Use the correct negative-infinity timestamp in KafkaIO, as suggested in the 
description of [BEAM-2703](https://issues.apache.org/jira/browse/BEAM-2703). 

+R: @davorbonaci.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rangadi/beam fix_min_timestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3758


commit 8ed07054f78aa25f64e303ec5f13ffa6ae7e701c
Author: Raghu Angadi 
Date:   2017-08-24T21:33:28Z

Fix min_timestamp used for KafkaIO watermark.




> KafkaIO: watermark outside the bounds of BoundedWindow
> --
>
> Key: BEAM-2703
> URL: https://issues.apache.org/jira/browse/BEAM-2703
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Chris Pettitt
>Assignee: Raghu Angadi
>
> KafkaIO appears to use an incorrect lower bound for its initial watermark 
> with respect to BoundedWindow.TIMESTAMP_MIN_VALUE.
> KafkaIO's initial watermark:
> new Instant(Long.MIN_VALUE) -> -9223372036854775808
> BoundedWindow.TIMESTAMP_MIN_VALUE:
> new Instant(TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE)) -> 
> -9223372036854775
> The difference is that the last three digits have been truncated due to the 
> micro to millis conversion.
> This difference can cause errors in runners that assert that the input 
> watermark can never regress as KafkaIO gives a value below the lower bound 
> when no messages have been received yet. For consistency it would probably be 
> best for it to use BoundedWindow.TIMESTAMP_MIN_VALUE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3758: [BEAM-2703] Fix min_timestamp used for KafkaIO wate...

2017-08-24 Thread rangadi
GitHub user rangadi opened a pull request:

https://github.com/apache/beam/pull/3758

[BEAM-2703] Fix min_timestamp used for KafkaIO watermark.

Use the correct negative-infinity timestamp in KafkaIO, as suggested in the 
description of [BEAM-2703](https://issues.apache.org/jira/browse/BEAM-2703). 

+R: @davorbonaci.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rangadi/beam fix_min_timestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3758


commit 8ed07054f78aa25f64e303ec5f13ffa6ae7e701c
Author: Raghu Angadi 
Date:   2017-08-24T21:33:28Z

Fix min_timestamp used for KafkaIO watermark.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-998) Consider asking Apache to register Apache Beam trademark

2017-08-24 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140717#comment-16140717
 ] 

Davor Bonaci commented on BEAM-998:
---

Re-pinged.

> Consider asking Apache to register Apache Beam trademark
> 
>
> Key: BEAM-998
> URL: https://issues.apache.org/jira/browse/BEAM-998
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Davor Bonaci
>
> "Registered Trademarks If a PMC would like to request legal registration of 
> their project's trademarks, please registering their marks, please follow the 
> REGREQUEST instructions."
> http://www.apache.org/foundation/marks/pmcs#other
> The link to REGREQUEST: 
> http://www.apache.org/foundation/marks/register#register



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-2302) WriteFiles with runner-determined sharding and large numbers of windows causes OOM errors

2017-08-24 Thread Reuven Lax (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax resolved BEAM-2302.
--
   Resolution: Fixed
Fix Version/s: 2.1.0

> WriteFiles with runner-determined sharding and large numbers of windows 
> causes OOM errors
> -
>
> Key: BEAM-2302
> URL: https://issues.apache.org/jira/browse/BEAM-2302
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 2.1.0
>
>
> This is because the WriteWindowedBundles transform will create many file 
> writers, and the sheer number of file buffers (which defaults to 64mb per 
> writer) uses up all memory. The fix is the same as was done in BigQueryIO - 
> if too many writers are opened, spill into a shuffle, and write the files 
> after the shuffle



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1457) Enable rat plugin and findbugs plugin in default build

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1457:
--

Assignee: (was: Davor Bonaci)

> Enable rat plugin and findbugs plugin in default build
> --
>
> Key: BEAM-1457
> URL: https://issues.apache.org/jira/browse/BEAM-1457
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>
> Today, maven rat plugin and findbugs plugin only run when `release` profile 
> is specified.
> Since these plugins do not add a large amount of time compared to the normal 
> build, and their checks are required to pass to approve pull requests - let's 
> enable them by default.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1975) Documentation around logging in different runners

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1975:
--

Assignee: (was: Davor Bonaci)

> Documentation around logging in different runners
> -
>
> Key: BEAM-1975
> URL: https://issues.apache.org/jira/browse/BEAM-1975
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Aviem Zur
>
> Add documentation on how to configure logging in different runners, how it 
> relates to SLF4J and its bindings, and which binding is used in which runner.
> Add helpful links to the different logging configuration guides for the 
> bindings used in each runner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1189) Add guide for release verifiers in the release guide

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1189:
--

Assignee: (was: Davor Bonaci)

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1974) Metrics documentation

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1974:
--

Assignee: (was: Davor Bonaci)

> Metrics documentation
> -
>
> Key: BEAM-1974
> URL: https://issues.apache.org/jira/browse/BEAM-1974
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Aviem Zur
>
> Document metrics API and uses (make sure to remark that it is still 
> experimental).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1929) I/O Authoring Overview should discuss when to use source/Pardo/IOChannelFactory

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1929:
--

Assignee: Chamikara Jayalath  (was: Davor Bonaci)

> I/O Authoring Overview should discuss when to use 
> source/Pardo/IOChannelFactory
> ---
>
> Key: BEAM-1929
> URL: https://issues.apache.org/jira/browse/BEAM-1929
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Chamikara Jayalath
>
> In a recent discussion on the mailing list [1] it came up that it's not 
> always clear when reading from files with various file formats what exactly 
> is the right way to do so.
> Key quote: "To contribute a new IO connector, how can I determine whether it 
> should be implemented as a source transform or as a scheme for the TextIO?"
> cc [~jbonofre] [~dhalp...@google.com]
> [1] 
> https://lists.apache.org/thread.html/16188ab68e738846c1620552075ff18b90fa391a7210871e9a04778d@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1961) Standard IO metrics documentation

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1961:
--

Assignee: (was: Davor Bonaci)

> Standard IO metrics documentation
> -
>
> Key: BEAM-1961
> URL: https://issues.apache.org/jira/browse/BEAM-1961
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Aviem Zur
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2082) I/O Authoring overview - emphasize reading the PTransform style guide

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2082:
--

Assignee: Chamikara Jayalath  (was: Davor Bonaci)

> I/O Authoring overview - emphasize reading the PTransform style guide
> -
>
> Key: BEAM-2082
> URL: https://issues.apache.org/jira/browse/BEAM-2082
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Stephen Sisk
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> currently, the I/O Authoring style guide mentions the PTransform style guide, 
> but I think it underplays the value - I'd like to emphasize it a bit more 
> (probably make it its own section or at least make it the first bullet point :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1957) Missing DoFn annotations documentation

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1957:
--

Assignee: Kenneth Knowles  (was: Davor Bonaci)

> Missing DoFn annotations documentation
> --
>
> Key: BEAM-1957
> URL: https://issues.apache.org/jira/browse/BEAM-1957
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Aviem Zur
>Assignee: Kenneth Knowles
>
> Not all {{DoFn}} annotations are covered by the programming guide.
> Only {{@ProcessElement}} is currently covered.
> We should have documentation for the other (non-experimental, at least) 
> annotations:
> {code}
> public @interface Setup
> public @interface StartBundle
> public @interface FinishBundle
> public @interface Teardown
> {code}
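
For reference, a minimal sketch of a DoFn using these lifecycle annotations; it is
illustrative only (class and method bodies are not taken from the ticket):

{code}
import org.apache.beam.sdk.transforms.DoFn;

// Minimal lifecycle sketch: every annotated method except @ProcessElement is optional.
public class LifecycleDoFn extends DoFn<String, String> {

  @Setup
  public void setup() {
    // one-time initialization per DoFn instance, e.g. open a client
  }

  @StartBundle
  public void startBundle() {
    // per-bundle initialization, e.g. reset a buffer
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element());
  }

  @FinishBundle
  public void finishBundle() {
    // per-bundle cleanup, e.g. flush buffered output
  }

  @Teardown
  public void teardown() {
    // one-time cleanup, e.g. close the client
  }
}
{code}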



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2634) Add better documentation on testing unbounded I/O scenarios

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2634:
--

Assignee: (was: Davor Bonaci)

> Add better documentation on testing unbounded I/O scenarios
> ---
>
> Key: BEAM-2634
> URL: https://issues.apache.org/jira/browse/BEAM-2634
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Stephen Sisk
>
> The currently planned unit test & integration test docs will mostly cover 
> bounded I/O transforms - we'll need to add documentation on testing unbounded 
> I/O



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2358) "/test-your-pipeline" example code results in an exception

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2358:
--

Assignee: Jason Kuster  (was: Davor Bonaci)

> "/test-your-pipeline" example code results in an exception
> --
>
> Key: BEAM-2358
> URL: https://issues.apache.org/jira/browse/BEAM-2358
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Nicholas Ursa
>Assignee: Jason Kuster
>  Labels: documentation, easyfix
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> https://beam.apache.org/documentation/pipelines/test-your-pipeline/ has
> {code}
>  public void testCountWords() throws Exception {
>   Pipeline p = TestPipeline.create();
> {code}
> but this results in 
> {code}
> Exception in thread "main" java.lang.IllegalStateException: Is your 
> TestPipeline declaration missing a @Rule annotation? Usage: @Rule public 
> final transient TestPipeline pipeline = TestPipeline.Create();
>   at 
> org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:299)
>   at BasicPipelineTest.run(BasicPipelineTest.java:42)
>   at Main.main(Main.java:25)
> {code}
> In the [github 
> example|https://github.com/apache/beam/blob/master/examples/java8/src/test/java/org/apache/beam/examples/MinimalWordCountJava8Test.java#L56]
>  it's written as:
> {code}
> public TestPipeline p = 
> TestPipeline.create().enableAbandonedNodeEnforcement(false);
> {code}
> I'm using 2.0.0 from the maven repo.
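
A minimal sketch of the test skeleton the exception message asks for, assuming JUnit 4;
the class and test names are illustrative, not from the website example:

{code}
import org.apache.beam.sdk.testing.TestPipeline;
import org.junit.Rule;
import org.junit.Test;

public class CountWordsTest {

  // Declaring the pipeline as a JUnit @Rule avoids the IllegalStateException above.
  @Rule
  public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void testCountWords() throws Exception {
    // apply the transforms under test to p here, then run it
    p.run().waitUntilFinish();
  }
}
{code}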



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-686) Integration tests should create/destroy resources required for running

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-686:
-

Assignee: (was: Davor Bonaci)

> Integration tests should create/destroy resources required for running
> --
>
> Key: BEAM-686
> URL: https://issues.apache.org/jira/browse/BEAM-686
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Luke Cwik
>Priority: Minor
>
> By not creating/tearing down resources in integration tests, the tests are 
> not portable or easily runnable by a user who doesn't have access to all the 
> pre-created test artifacts.
> For example BigtableReadIT assumes that it is executing in a project which 
> has a specifically named Bigtable instance preloaded with testdata.
> Also, KinesisReaderIT assumes an already created Kinesis topic exists.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-601) Enable Kinesis integration tests

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-601:
-

Assignee: Chamikara Jayalath  (was: Davor Bonaci)

> Enable Kinesis integration tests
> 
>
> Key: BEAM-601
> URL: https://issues.apache.org/jira/browse/BEAM-601
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Affects Versions: 0.3.0-incubating
>Reporter: Przemyslaw Pastuszka
>Assignee: Chamikara Jayalath
>
> There's an integration test for KinesisIO called KinesisReaderIT, but it is 
> currently ignored, because it needs real Kinesis instance setup.
> As part of this task please:
> * setup real Kinesis environment on AWS for testing purposes
> * enable KinesisReaderIT test
> * setup jenkins, so that it passes all KinesisTestOptions when running 
> integration tests
> This is a follow up to BEAM-461 requested by [~dhalp...@google.com] in 
> https://github.com/apache/incubator-beam/pull/687/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1605) Add support for Apex cluster metrics to PerfKit Benchmarker

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1605:
--

Assignee: Jason Kuster  (was: Davor Bonaci)

> Add support for Apex cluster metrics to PerfKit Benchmarker
> ---
>
> Key: BEAM-1605
> URL: https://issues.apache.org/jira/browse/BEAM-1605
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex, testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>
> See 
> https://docs.google.com/document/d/1PsjGPSN6FuorEEPrKEP3u3m16tyOzph5FnL2DhaRDz0/edit?ts=58a78e73#heading=h.exn0s6jsm24q
>  for more details on what this entails. 
> Blocked on BEAM-1599, adding support for Apex to PKB



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-910) Always explicitly specify --output for ITs

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-910:
-

Assignee: (was: Davor Bonaci)

> Always explicitly specify --output for ITs
> --
>
> Key: BEAM-910
> URL: https://issues.apache.org/jira/browse/BEAM-910
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kenneth Knowles
>
> With a revamp of the examples in 
> [#1275|https://github.com/apache/incubator-beam/pull/1275] the {{--output}} 
> parameter becomes mandatory rather than defaulting to the temp location (this 
> was weird anyhow).
> The ITs all fail because they leave this value implicit, as in 
> https://builds.apache.org/job/beam_PreCommit_MavenVerify/4635/org.apache.beam$beam-examples-java/testReport/junit/org.apache.beam.examples/WindowedWordCountIT/testWindowedWordCountInBatch/
> I believe this requires Jenkins access. I could be wrong.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2650) migrate beam_it_args -> beam_it_options

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2650:
--

Assignee: Chamikara Jayalath

> migrate beam_it_args -> beam_it_options
> ---
>
> Key: BEAM-2650
> URL: https://issues.apache.org/jira/browse/BEAM-2650
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Stephen Sisk
>Assignee: Chamikara Jayalath
>
> When adding the mvn -> pkb -> mvn integration for the IO IT's usage of PKB, I 
> noticed that beam_it_args had two problems:
> 1) args had a format for passing options that was sub-optimal
> 2) the thing we are working with is options, not args, so it's mis-named.
> It was important to solve #1, but you can't just change the name since it'd 
> break with the currently checked in jenkins job. So I needed to migrate away 
> from it, and #2 presented an easy opportunity to do so, so I added 
> beam_it_options as the new option.
> We should remove usages of beam_it_args and migrate over to only 
> beam_it_options, then remove beam_it_args from pkb



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1654) Tests that UnboundedSources are executed correctly

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1654:
--

Assignee: (was: Davor Bonaci)

> Tests that UnboundedSources are executed correctly
> --
>
> Key: BEAM-1654
> URL: https://issues.apache.org/jira/browse/BEAM-1654
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ben Chambers
>
> Specifically, develop a set of RunnableOnService tests that validate runner 
> behavior when executing an Unbounded Source. Validations should include 
> behaviors such as finalizeCheckpoint being called at most once, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2650) migrate beam_it_args -> beam_it_options

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2650:
--

Assignee: (was: Davor Bonaci)

> migrate beam_it_args -> beam_it_options
> ---
>
> Key: BEAM-2650
> URL: https://issues.apache.org/jira/browse/BEAM-2650
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Stephen Sisk
>
> When adding the mvn -> pkb -> mvn integration for the IO IT's usage of PKB, I 
> noticed that beam_it_args had two problems:
> 1) args had a format for passing options that was sub-optimal
> 2) the thing we are working with is options, not args, so it's mis-named.
> It was important to solve #1, but you can't just change the name since it'd 
> break with the currently checked in jenkins job. So I needed to migrate away 
> from it, and #2 presented an easy opportunity to do so, so I added 
> beam_it_options as the new option.
> We should remove usages of beam_it_args and migrate over to only 
> beam_it_options, then remove beam_it_args from pkb



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2121) Tests on MacOS

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2121:
--

Assignee: (was: Davor Bonaci)

> Tests on MacOS
> --
>
> Key: BEAM-2121
> URL: https://issues.apache.org/jira/browse/BEAM-2121
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ahmet Altay
>
> After removal of Travis testing, we lost the ability to test on Macs. I am 
> wondering if this is possible on Jenkins. A simple web search for "cloud 
> macos" shows many promising results.
> cc: [~davor]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2220) Move org.apache.beam.sdk.util within google-cloud-platform-core to org.apache.beam.runners.dataflow

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2220:
--

Assignee: Luke Cwik  (was: Davor Bonaci)

> Move org.apache.beam.sdk.util within google-cloud-platform-core to 
> org.apache.beam.runners.dataflow
> ---
>
> Key: BEAM-2220
> URL: https://issues.apache.org/jira/browse/BEAM-2220
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Move org.apache.beam.sdk.util within google-cloud-platform-core to underneath 
> org.apache.beam.runners.dataflow



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1609) Add support for Beam Metrics API to PKB

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1609:
--

Assignee: Jason Kuster  (was: Davor Bonaci)

> Add support for Beam Metrics API to PKB
> ---
>
> Key: BEAM-1609
> URL: https://issues.apache.org/jira/browse/BEAM-1609
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>
> See 
> https://docs.google.com/document/d/1PsjGPSN6FuorEEPrKEP3u3m16tyOzph5FnL2DhaRDz0/edit?ts=58a78e73#heading=h.exn0s6jsm24q
>  for more details on what this entails.
> Blocked on BEAM-147 -- creation of metrics API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1271) Develop Apache Accumulo IO

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1271:
--

Assignee: Reuven Lax  (was: Davor Bonaci)

> Develop Apache Accumulo IO
> --
>
> Key: BEAM-1271
> URL: https://issues.apache.org/jira/browse/BEAM-1271
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Affects Versions: Not applicable
>Reporter: Wyatt Frelot
>Assignee: Reuven Lax
>Priority: Minor
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Develop the Apache Accumulo IO for write and read operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1271) Develop Apache Accumulo IO

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1271:
--

Assignee: Wyatt Frelot  (was: Reuven Lax)

> Develop Apache Accumulo IO
> --
>
> Key: BEAM-1271
> URL: https://issues.apache.org/jira/browse/BEAM-1271
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Affects Versions: Not applicable
>Reporter: Wyatt Frelot
>Assignee: Wyatt Frelot
>Priority: Minor
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Develop the Apache Accumulo IO for write and read operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1305) Support additional configuration in BigQueryServices.insertAll()

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1305:
--

Assignee: (was: Davor Bonaci)

> Support additional configuration in BigQueryServices.insertAll()
> 
>
> Key: BEAM-1305
> URL: https://issues.apache.org/jira/browse/BEAM-1305
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Pei He
>
> ignoreUnknownValues is requested in 
> https://issues.apache.org/jira/browse/BEAM-1267
> There are additional configurations that could be useful.
> TableDataInsertAllRequest content = new TableDataInsertAllRequest();
> content.setSkipInvalidRows();
> content.setTemplateSuffix();
> content.setKind();
> 
> I think we can improve the BigQueryServices interface by defining it as:
> void insertAll(TableReference ref, Collection 
> request);
> and provide a static method to prepare requests:
> List makeInsertBatches(List rowList, 
> @Nullable List insertIdList);
> Then, clients can set additional config in the returned list.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2752) Job fails to checkpoint with kinesis stream as an input for Flink job

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2752:
--

Assignee: Chamikara Jayalath  (was: Davor Bonaci)

> Job fails to checkpoint with kinesis stream as an input for Flink job
> -
>
> Key: BEAM-2752
> URL: https://issues.apache.org/jira/browse/BEAM-2752
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: Pawel Bartoszek
>Assignee: Chamikara Jayalath
>Priority: Minor
>
> Our job reads from a Kinesis stream as its input. Quite often, when the 
> job is checkpointing for the first time, the exception below is thrown.
> The scenario that produces the exception:
> # Upload a new jar file with job logic
> # Start new job
> # Stop the job with savepoint that is written to s3
> # Upload a new jar file with job logic(in this case the jar contains the same 
> code - but our pipeline generates new jar file name for every build)
> # Start a new job from savepoint
> # The first checkpoint fails causing the job to be cancelled
> If the job is started without passing savepoint the checkpointing works fine.
> Other information:
> Flink version 1.2.1
> Beam 2.0.0
> Flink Parallelism - 20 slots 
> Number of task managers - 4
> Number of kinesis shards - 8
> {code:java}
> java.lang.Exception: Error while triggering checkpoint 59 for Source: 
> Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) -> Flat Map 
> -> ParMultiDo(StringToRecord) -> Flat Map -> ParMultiDo(Anonymous) -> Flat 
> Map -> ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) -> Flat 
> Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE MINUTE 
> WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem, 
> ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) -> Flat Map 
> -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1136)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not perform checkpoint 59 for operator 
> Source: Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) -> 
> Flat Map -> ParMultiDo(StringToRecord) -> Flat Map -> ParMultiDo(Anonymous) 
> -> Flat Map -> ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) 
> -> Flat Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE 
> MINUTE WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem, 
> ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) -> Flat Map 
> -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:524)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1125)
>   ... 5 more
> Caused by: java.lang.Exception: Could not complete snapshot 59 for operator 
> Source: Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) -> 
> Flat Map -> ParMultiDo(StringToRecord) -> Flat Map -> ParMultiDo(Anonymous) 
> -> Flat Map -> ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) 
> -> Flat Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE 
> MINUTE WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem, 
> ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) -> Flat Map 
> -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20).
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:379)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1090)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:630)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:575)

[jira] [Assigned] (BEAM-1664) Support Kafka0.8.x client in KafkaIO

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1664:
--

Assignee: Raghu Angadi  (was: Davor Bonaci)

> Support  Kafka0.8.x client in KafkaIO
> -
>
> Key: BEAM-1664
> URL: https://issues.apache.org/jira/browse/BEAM-1664
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: JiJun Tang
>Assignee: Raghu Angadi
>
> Kafka 0.8 is not supported yet; there's a big change from 0.8 to 0.9. So we 
> need to create a specific KafkaIO module for 0.8. After completing this 
> module, we will consider extracting common code into a kafkaio-common module.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2639) Unbounded Source for MongoDB

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2639:
--

Assignee: Jean-Baptiste Onofré  (was: Davor Bonaci)

> Unbounded Source for MongoDB
> 
>
> Key: BEAM-2639
> URL: https://issues.apache.org/jira/browse/BEAM-2639
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: nevi_me
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> The current MongoDB source is bounded, which means that we can't build 
> streaming pipelines directly from MongoDB.
> MongoDB publishes changes in each collection through the oplog. Would it be 
> possible to create a connector that reads the oplog to create an unbounded 
> source?
> As an oplog is only available through replication, this creates that 
> dependency. We would need to also consider whether a polling method (using 
> the ObjectId) could be an appropriate fallback.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1879) PTransform style guide should discuss display data

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1879:
--

Assignee: Eugene Kirpichov  (was: Davor Bonaci)

> PTransform style guide should discuss display data
> --
>
> Key: BEAM-1879
> URL: https://issues.apache.org/jira/browse/BEAM-1879
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Eugene Kirpichov
>
> Currently, the PTransform style guide 
> (https://beam.apache.org/contribute/ptransform-style-guide/) does not discuss 
> display data at all.
> We should make sure to discuss testing display data - specifically  that 
> using DisplayDataEvaluator is a best practice for testing since without it, 
> you cannot tell whether or not the display data will actually be displayed.
> cc [~swegner] [~bchambers] [~jkff]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2582) KinesisIO incorrectly handles closed shards

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2582:
--

Assignee: Chamikara Jayalath  (was: Davor Bonaci)

> KinesisIO incorrectly handles closed shards
> ---
>
> Key: BEAM-2582
> URL: https://issues.apache.org/jira/browse/BEAM-2582
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: adam gray
>Assignee: Chamikara Jayalath
>
> The KinesisIO throws an exception when consuming closed Kinesis shards, which 
> return null from `GetShardIterator`, as it tries to call `GetRecords` with 
> the null `shardIterator` value instead of abandoning the closed shard.
> This means KinesisIO fails after re-sharding a stream with an exception like 
> the following:
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Kinesis client side 
> failure
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:151)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getRecords(SimplifiedKinesisClient.java:115)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getRecords(SimplifiedKinesisClient.java:102)
>   at 
> org.apache.beam.sdk.io.kinesis.ShardRecordsIterator.readMoreIfNecessary(ShardRecordsIterator.java:79)
>   at 
> org.apache.beam.sdk.io.kinesis.ShardRecordsIterator.next(ShardRecordsIterator.java:64)
>   at 
> org.apache.beam.sdk.io.kinesis.KinesisReader.advance(KinesisReader.java:86)
>   at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.startReader(UnboundedReadEvaluatorFactory.java:190)
>   at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:128)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: com.amazonaws.AmazonServiceException: 1 validation error detected: 
> Value null at 'shardIterator' failed to satisfy constraint: Member must not 
> be null (Service: AmazonKinesis; Status Code: 400; Error Code: 
> ValidationException; Request ID: d764e747-9616-5db3-86ba-08a0bc44cb39)
>   at 
> com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1378)
>   at 
> com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:924)
>   at 
> com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:702)
>   at 
> com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:454)
>   at 
> com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:416)
>   at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:365)
>   at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2016)
>   at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:1986)
>   at 
> com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:985)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient$3.call(SimplifiedKinesisClient.java:118)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient$3.call(SimplifiedKinesisClient.java:115)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:140)
>   ... 14 more
> {noformat}
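
A minimal sketch of the guard the report implies, written against the AWS SDK directly;
the class, method, and parameter names are illustrative, not KinesisIO's actual code:

{code}
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;

class ClosedShardGuard {
  // Returns null instead of calling GetRecords with a null iterator.
  static GetRecordsResult readIfOpen(AmazonKinesis kinesis, String shardIterator) {
    if (shardIterator == null) {
      // Per the report, GetShardIterator yields null for a closed shard
      // (e.g. after re-sharding), so the shard should be abandoned here.
      return null;
    }
    return kinesis.getRecords(new GetRecordsRequest().withShardIterator(shardIterator));
  }
}
{code}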



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2703) KafkaIO: watermark outside the bounds of BoundedWindow

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2703:
--

Assignee: Raghu Angadi  (was: Davor Bonaci)

> KafkaIO: watermark outside the bounds of BoundedWindow
> --
>
> Key: BEAM-2703
> URL: https://issues.apache.org/jira/browse/BEAM-2703
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Chris Pettitt
>Assignee: Raghu Angadi
>
> KafkaIO appears to use an incorrect lower bound for its initial watermark 
> with respect to BoundedWindow.TIMESTAMP_MIN_VALUE.
> KafkaIO's initial watermark:
> new Instant(Long.MIN_VALUE) -> -9223372036854775808
> BoundedWindow.TIMESTAMP_MIN_VALUE:
> new Instant(TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE)) -> 
> -9223372036854775
> The difference is that the last three digits have been truncated due to the 
> micro to millis conversion.
> This difference can cause errors in runners that assert that the input 
> watermark can never regress as KafkaIO gives a value below the lower bound 
> when no messages have been received yet. For consistency it would probably be 
> best for it to use BoundedWindow.TIMESTAMP_MIN_VALUE.
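
A minimal sketch reproducing the two values the report compares; the class name is
illustrative, and the equality on the last line is as stated in the report:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

public class WatermarkBounds {
  public static void main(String[] args) {
    // KafkaIO's initial watermark as described above
    Instant kafkaInitial = new Instant(Long.MIN_VALUE); // -9223372036854775808
    // The SDK's minimum timestamp, truncated by the micros-to-millis conversion
    Instant beamMin = new Instant(TimeUnit.MICROSECONDS.toMillis(Long.MIN_VALUE)); // -9223372036854775
    System.out.println(kafkaInitial.isBefore(beamMin));                    // true: below the allowed bound
    System.out.println(beamMin.equals(BoundedWindow.TIMESTAMP_MIN_VALUE)); // true, per the report
  }
}
{code}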



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2704) KafkaIO: NPE without key serializer set

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2704:
--

Assignee: Raghu Angadi  (was: Davor Bonaci)

> KafkaIO: NPE without key serializer set
> ---
>
> Key: BEAM-2704
> URL: https://issues.apache.org/jira/browse/BEAM-2704
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Chris Pettitt
>Assignee: Raghu Angadi
>
> The KafkaIO javadoc implies that you do not need to set a Serializer if you 
> only want to emit values:
> {code}
>  * Often you might want to write just values without any keys to Kafka. 
> Use {@code values()} to
>  * write records with default empty(null) key:
>  *
>  * {@code
>  *  PCollection strings = ...;
>  *  strings.apply(KafkaIO.write()
>  *  .withBootstrapServers("broker_1:9092,broker_2:9092")
>  *  .withTopic("results")
>  *  .withValueSerializer(new StringSerializer()) // just need serializer 
> for value
>  *  .values()
>  *);
>  * }
> {code}
> However, if you don't set the key serializer then Kafka blows up when trying 
> to instantiate the key serializer (in Kafka 0.10.1, at least). It would be 
> more convenient if KafkaIO worked as documented and assigned a null 
> serializer if values() is used.  
> Relevant stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:230)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:163)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaIO$KafkaWriter.setup(KafkaIO.java:1582)
>   at 
> org.apache.beam.sdk.io.kafka.KafkaIO$KafkaWriter$DoFnInvoker.invokeSetup(Unknown
>  Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2605) Exception raised when using MongoDBIO.Write in streaming mode

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2605:
--

Assignee: Jean-Baptiste Onofré  (was: Davor Bonaci)

> Exception raised when using MongoDBIO.Write in streaming mode
> -
>
> Key: BEAM-2605
> URL: https://issues.apache.org/jira/browse/BEAM-2605
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: Pascal Castéran
>Assignee: Jean-Baptiste Onofré
>
> In org.apache.beam.sdk.io.mongodb.MongoDbIO.Write.WriteFn#flush(), no check 
> is done on the size of the batch list of documents before executing the 
> _*insertMany*_ operation.
> In streaming mode, when processing an empty pane, an empty list of documents 
> can be passed to the MongoDB client which results in the following exception:
> {quote}java.lang.IllegalArgumentException: state should be: writes is not an 
> empty list
>   at com.mongodb.assertions.Assertions.isTrueArgument(Assertions.java:99)
>   at 
> com.mongodb.operation.MixedBulkWriteOperation.(MixedBulkWriteOperation.java:95)
>   at 
> com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:323)
>   at 
> com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:311)
>   at 
> org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.flush(MongoDbIO.java:513)
>   at 
> org.apache.beam.sdk.io.mongodb.MongoDbIO$Write$WriteFn.finishBundle(MongoDbIO.java:506){quote}
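
A minimal sketch of the missing guard, with illustrative names rather than MongoDbIO's
actual fields:

{code}
import java.util.List;
import org.bson.Document;
import com.mongodb.client.MongoCollection;

class SafeFlush {
  // Skip the insertMany call entirely when the pane produced no documents.
  static void flush(MongoCollection<Document> collection, List<Document> batch) {
    if (batch.isEmpty()) {
      return; // insertMany rejects an empty list, which causes the exception above
    }
    collection.insertMany(batch);
    batch.clear();
  }
}
{code}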



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2737) Test SSL/TLS and authentication in Elasticsearch integration tests

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2737:
--

Assignee: Etienne Chauchot  (was: Davor Bonaci)

> Test SSL/TLS and authentication in Elasticsearch integration tests
> --
>
> Key: BEAM-2737
> URL: https://issues.apache.org/jira/browse/BEAM-2737
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>
> Testing authentication and SSL/TLS communication requires setting up Shield and 
> certificates. This is not doable in the embedded Elasticsearch used for 
> unit tests, so do these tests as integration tests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2738) Setup shield and certificates in Elasticseach ITests backend server

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2738:
--

Assignee: Etienne Chauchot  (was: Davor Bonaci)

> Setup shield and certificates in Elasticseach ITests backend server
> ---
>
> Key: BEAM-2738
> URL: https://issues.apache.org/jira/browse/BEAM-2738
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>
> Links to official documentation:
> https://www.elastic.co/guide/en/shield/current/ssl-tls.html
> https://www.elastic.co/guide/en/shield/current/native-realm.html#managing-native-users



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2721) Augment BeamRecordType to do slicing and concatenation.

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci updated BEAM-2721:
---
Component/s: (was: sdk-java-extensions)
 dsl-sql

> Augment BeamRecordType to do slicing and concatenation.
> ---
>
> Key: BEAM-2721
> URL: https://issues.apache.org/jira/browse/BEAM-2721
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Robert Bradshaw
>Assignee: Xu Mingmin
>
> Currently in several places we cast to BeamSqlRecordType, extract the field 
> type ints, do the slicing, and then reconstruct a new BeamSqlRecordType. If 
> BeamRecordType had polymorphic methods to slice/concat this would be cleaner 
> and more flexible. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2577) IO tests should exercise Runtime Values where supported

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2577:
--

Assignee: (was: Davor Bonaci)

> IO tests should exercise Runtime Values where supported
> ---
>
> Key: BEAM-2577
> URL: https://issues.apache.org/jira/browse/BEAM-2577
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions, testing
>Reporter: Ben Chambers
>
> The only tests I have found for `ValueProvider`-parameterized methods check 
> that they are not evaluated during pipeline construction time. This misses 
> several important pieces:
> 1. 
> https://stackoverflow.com/questions/44967898/notify-when-textio-is-done-writing-a-file
>  seems to be a problem with an AvroIO write using a RuntimeValueProvider 
> being non-serializable (current theory is because of an anonymous inner class 
> capturing the enclosing AvroIO.Write instance which has non-serializable 
> fields).
> 2. Testing that the code paths that actually read the file do so correctly 
> when parameterized.
> We should update the developer documentation to describe what the 
> requirements are for a parameterized IO and provide guidance on what tests 
> are needed and how to write them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2721) Augment BeamRecordType to do slicing and concatenation.

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2721:
--

Assignee: Xu Mingmin  (was: Davor Bonaci)

> Augment BeamRecordType to do slicing and concatenation.
> ---
>
> Key: BEAM-2721
> URL: https://issues.apache.org/jira/browse/BEAM-2721
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Robert Bradshaw
>Assignee: Xu Mingmin
>
> Currently in several places we cast to BeamSqlRecordType, extract the field 
> type ints, do the slicing, and then reconstruct a new BeamSqlRecordType. If 
> BeamRecordType had polymorphic methods to slice/concat this would be cleaner 
> and more flexible. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-449) Support PCollectionList in PAssert

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-449:
-

Assignee: (was: Davor Bonaci)

> Support PCollectionList in PAssert
> --
>
> Key: BEAM-449
> URL: https://issues.apache.org/jira/browse/BEAM-449
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Priority: Minor
>
> The assertion takes an input PCollectionList and a list of matchers of 
> the same size, and applies each matcher to the corresponding index of the 
> PCollectionList,
> e.g. PAssert.that(PCollectionList[0]).satisfies(matchers[0])
> Potentially also worthwhile is a "PAssert.thatFlattened(PCollectionList)" 
> static constructor, that runs an assertion on the flattened contents of the 
> list.
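
Pending such support, a minimal sketch of the per-index assertion done by hand with the
existing API; the class and method names are illustrative:

{code}
import java.util.List;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

class PCollectionListAsserts {
  // Applies one containsInAnyOrder assertion per index of the list.
  static void assertEach(PCollectionList<String> list, List<List<String>> expected) {
    for (int i = 0; i < list.size(); i++) {
      PCollection<String> pc = list.get(i);
      PAssert.that(pc).containsInAnyOrder(expected.get(i));
    }
  }
}
{code}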



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-448) Print Properties on validation failures in PipelineOptionsValidator

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-448:
-

Assignee: (was: Davor Bonaci)

> Print Properties on validation failures in PipelineOptionsValidator
> ---
>
> Key: BEAM-448
> URL: https://issues.apache.org/jira/browse/BEAM-448
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Priority: Minor
>
> If Pipeline Validation fails in the pipeline options Validator, the methods 
> that failed validation are printed. Instead, the property names (passed on 
> the command line) should be printed for consistency with the 
> PipelineOptionsFactory.
> Currently methods are printed at 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsValidator.java#L72
> PipelineOptionsReflector currently extracts the property -> method mapping
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsReflector.java#L95



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1408) outputWithTimestamp() accepts timestamps that will fail preconditions

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1408:
--

Assignee: (was: Davor Bonaci)

> outputWithTimestamp() accepts timestamps that will fail preconditions
> -
>
> Key: BEAM-1408
> URL: https://issues.apache.org/jira/browse/BEAM-1408
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Andy Xu
>Priority: Minor
>
> We have accidentally created events with *wrong* timestamps in the future 
> which are accepted by
> outputWithTimestamp(), but will fail at a later step:
> java.lang.IllegalStateException: Timer 472976-06-15T20:09:57.269Z is beyond 
> end-of-time
> at Preconditions.checkState(Preconditions.java:199)
> at 
> ReduceFnRunner.scheduleEndOfWindowOrGarbageCollectionTimer(ReduceFnRunner.java:1050)
> [...]
> Would it make sense to implement a check already at outputWithTimestamp() 
> level to fail early?
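
A minimal sketch of such an early check; this is not existing SDK code, and the bound
and error message are illustrative:

{code}
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

class TimestampChecks {
  // Would let outputWithTimestamp() reject timestamps that later break timer scheduling.
  static Instant checkOutputTimestamp(Instant timestamp) {
    if (timestamp.isAfter(BoundedWindow.TIMESTAMP_MAX_VALUE)) {
      throw new IllegalArgumentException(
          "Timestamp " + timestamp + " is after the maximum allowed timestamp "
              + BoundedWindow.TIMESTAMP_MAX_VALUE);
    }
    return timestamp;
  }
}
{code}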



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1639) Catch bad FractionConsumed values in the beam SDK.

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1639:
--

Assignee: (was: Davor Bonaci)

> Catch bad FractionConsumed values in the beam SDK.
> --
>
> Key: BEAM-1639
> URL: https://issues.apache.org/jira/browse/BEAM-1639
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Stephen Sisk
>Priority: Minor
>
> getFractionConsumed in Sources is expected to return values between 0.0 and 
> 1.0.
> I recently encountered a bug where a bad value of fractionConsumed was sent 
> to a runner. Looking through the beam source, I couldn't find anywhere that 
> we validate the value before we send it (it could be there and I didn't see 
> it, but given that I saw this error with a real source, I don't believe it 
> is.)
> I think it'd be useful if the Beam SDK caught this error at the time the 
> value is returned and at least generated a useful error in the logs (thus 
> allowing the user to debug the issue more easily)
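
A minimal sketch of the range check being suggested; this is not existing SDK code, and
whether to throw or only log is left open by the ticket:

{code}
class FractionChecks {
  // Validates a value returned by getFractionConsumed() before it reaches the runner.
  static double checkFractionConsumed(double fraction) {
    if (fraction < 0.0 || fraction > 1.0) {
      throw new IllegalStateException(
          "getFractionConsumed() returned " + fraction + ", expected a value in [0.0, 1.0]");
    }
    return fraction;
  }
}
{code}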



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-728) Javadoc should clearly separate facts from runner requirements

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-728:
-

Assignee: (was: Davor Bonaci)

> Javadoc should clearly separate facts from runner requirements
> --
>
> Key: BEAM-728
> URL: https://issues.apache.org/jira/browse/BEAM-728
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Frances Perry
>
> The javadoc for View.asMap() says the map needs to fit in memory. That's not 
> true in all runners. (For example, Dataflow has distributed map support.) 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/View.java
> This is likely just one specific case of a more general issue -- different 
> runners will have common constraints on the scalability of portions of the 
> model. Currently these are documented in the capability matrix on the 
> website, but for usability we should consider surfacing these constraints on 
> particularly relevant methods. But keeping things in sync in multiple 
> locations is hard...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1662) Re. BEAM-974 - PubSub.read/write() .withCoder requirement should raise a more informative error in Dataflow

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1662:
--

Assignee: (was: Davor Bonaci)

> Re. BEAM-974 - PubSub.read/write() .withCoder requirement should raise a more 
> informative error in Dataflow
> ---
>
> Key: BEAM-1662
> URL: https://issues.apache.org/jira/browse/BEAM-1662
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-core
> Environment: google cloud / dataflow
>Reporter: G Money
>Priority: Minor
>
> Hello,
> I'm a Google Cloud Support Specialist who recently owned a case from a 
> platinum customer who was reporting "validation of workflow failed" internal 
> error when attempting to use new beta SDK.  Eventually, it was determined 
> that the problem was that they weren't using .withCoder, which became 
> required after BEAM-974[1].  They would like to request that a more 
> informative error be thrown, as the current one is far too vague to be able 
> to derive any useful information from.  Thank you.
> Regards,
> Garrett Anderson
> Cloud Support Specialist
> Google Cloud Platform Support 
> [1] https://issues.apache.org/jira/browse/BEAM-974



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1461) duplication with StartBundle and prepareForProcessing in DoFn

2017-08-24 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140695#comment-16140695
 ] 

Davor Bonaci commented on BEAM-1461:


Seems complete; I'd resolve. Please reactivate if there's more work to do.

> duplication with StartBundle and prepareForProcessing in DoFn
> -
>
> Key: BEAM-1461
> URL: https://issues.apache.org/jira/browse/BEAM-1461
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> There are one annotation, `StartBundle`, and one public function, 
> `prepareForProcessing`, in DoFn, both of which are called before 
> `ProcessElement`. It's confusing which one should be implemented in a subclass.
> The call sequence appears to be:
> prepareForProcessing -> StartBundle -> processElement



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-1461) duplication with StartBundle and prepareForProcessing in DoFn

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci resolved BEAM-1461.

   Resolution: Fixed
Fix Version/s: Not applicable

> duplication with StartBundle and prepareForProcessing in DoFn
> -
>
> Key: BEAM-1461
> URL: https://issues.apache.org/jira/browse/BEAM-1461
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> There are one annotation, `StartBundle`, and one public function, 
> `prepareForProcessing`, in DoFn, both of which are called before 
> `ProcessElement`. It's confusing which one should be implemented in a subclass.
> The call sequence appears to be:
> prepareForProcessing -> StartBundle -> processElement



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-1691) Dynamic properties supported in PipelineOptions

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci resolved BEAM-1691.

   Resolution: Not A Problem
Fix Version/s: Not applicable

> Dynamic properties supported in PipelineOptions
> ---
>
> Key: BEAM-1691
> URL: https://issues.apache.org/jira/browse/BEAM-1691
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Usually the two lines to create a new Beam pipeline are:
> {code}
> Options options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
> Pipeline pipeline = Pipeline.create(options);
> {code} 
> As each runner has its own PipelineOptions, one piece of code can hardly 
> run on different runners without code changes -- at least Options needs to be 
> updated.
> Dynamic properties could be a choice, similar to:
> {code}
> -D property1=value1 -D property2=value2 ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1691) Dynamic properties supported in PipelineOptions

2017-08-24 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140693#comment-16140693
 ] 

Davor Bonaci commented on BEAM-1691:


I'd resolve for now; if we want to change the behavior here, perhaps there 
should be a dev@ discussion first.

> Dynamic properties supported in PipelineOptions
> ---
>
> Key: BEAM-1691
> URL: https://issues.apache.org/jira/browse/BEAM-1691
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Davor Bonaci
> Fix For: Not applicable
>
>
> Usually the two lines to create a new Beam pipeline are:
> {code}
> Options options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
> Pipeline pipeline = Pipeline.create(options);
> {code} 
> As each runner has its own PipelineOptions, one piece of code can hardly 
> run on different runners without code changes -- at least Options needs to be 
> updated.
> Dynamic properties could be a choice, similar to:
> {code}
> -D property1=value1 -D property2=value2 ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1898) Need a SerializableThrowable for PAssert to capture the point of an assertion

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1898:
--

Assignee: (was: Davor Bonaci)

> Need a SerializableThrowable for PAssert to capture the point of an assertion
> -
>
> Key: BEAM-1898
> URL: https://issues.apache.org/jira/browse/BEAM-1898
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pablo Estrada
>
> For the regular Error class in Java, its stack trace is a transient 
> attribute, so it's not serialized by any framework. We need a class that 
> allows us to serialize these AssertionErrors to have them in PCollections.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1448) Coder encode/decode context documentation is lacking

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-1448:
--

Assignee: (was: Davor Bonaci)

> Coder encode/decode context documentation is lacking
> 
>
> Key: BEAM-1448
> URL: https://issues.apache.org/jira/browse/BEAM-1448
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>  Labels: documentation
>
> Coder encode/decode context documentation is lacking.
> * Documentation of the {{Coder}} methods {{encode}} and {{decode}} should include 
> a description of the {{context}} argument and explain how implementations should 
> handle it.
> * Consider renaming the static {{Context}} values {{NESTED}} and {{OUTER}} to 
> more accurate names.
> * Emphasize the use of CoderProperties as the best way to test a coder.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/fbd2d6b869ac2b0225ec39461b14158a03f304a930782d39ac9a60a6@%3Cdev.beam.apache.org%3E]
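
As a hedged illustration of the last point, a coder round-trip is usually exercised through CoderProperties rather than by calling encode/decode and choosing a Context by hand; the test class below is illustrative:

{code}
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.CoderProperties;
import org.junit.Test;

// Illustrative test: CoderProperties encodes the value, decodes it back,
// and asserts equality, which is the recommended way to test a coder.
public class StringUtf8CoderRoundTripTest {
  @Test
  public void encodeDecodeRoundTrips() throws Exception {
    CoderProperties.coderDecodeEncodeEqual(StringUtf8Coder.of(), "some value");
  }
}
{code}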



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2303) Add SpecificData to AvroCoder

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2303:
--

Assignee: (was: Davor Bonaci)

> Add SpecificData to AvroCoder
> -
>
> Key: BEAM-2303
> URL: https://issues.apache.org/jira/browse/BEAM-2303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.1.0
>Reporter: Arvid Heise
>
> The AvroCoder currently supports GenericData and ReflectData, but not 
> SpecificData.
> It should be relatively easy to incorporate by expanding the logic that 
> constructs the Reader and Writer to also check whether the type implements 
> the SpecificRecord interface. It would greatly speed up (de-)serialization of 
> Avro-generated Java classes.
> {code}
> return myCoder.getType().equals(GenericRecord.class)
> ? new GenericDatumReader(myCoder.getSchema())
> : new ReflectDatumReader(
> myCoder.getSchema(), myCoder.getSchema(), 
> myCoder.reflectData.get());
> {code}
> should be
> {code}
> if (myCoder.getType().equals(GenericRecord.class)) {
> return new 
> GenericDatumReader(myCoder.getSchema());
> }
> if 
> (SpecificRecord.class.isAssignableFrom(myCoder.getType())) {
> return new 
> SpecificDatumReader(myCoder.getType());
> }
> return new ReflectDatumReader(
> myCoder.getSchema(), myCoder.getSchema(), 
> myCoder.reflectData.get());
> {code}
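
For illustration, with a change along those lines a pipeline could keep using AvroCoder.of with an Avro-compiler-generated class and transparently get the SpecificDatumReader/Writer path; UserEvent below is a hypothetical generated class, not part of Beam:

{code}
import org.apache.beam.sdk.coders.AvroCoder;

// Hypothetical: UserEvent is generated by the Avro compiler and therefore
// implements SpecificRecord. With the proposed check in place, this coder
// would use the specific datum reader/writer instead of reflection.
AvroCoder<UserEvent> coder = AvroCoder.of(UserEvent.class);
events.setCoder(coder);  // events is an assumed PCollection<UserEvent>
{code}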



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2069) Remove ResourceId.getCurrentDirectory()?

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2069:
--

Assignee: (was: Davor Bonaci)

> Remove ResourceId.getCurrentDirectory()?
> 
>
> Key: BEAM-2069
> URL: https://issues.apache.org/jira/browse/BEAM-2069
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Stephen Sisk
>  Labels: backward-incompatible
>
> Beam ResourceId currently has a getCurrentDirectory method that returns the 
> current resource id if it's a directory, or the parent directory if it's a 
> file.
> To implement this you need to know whether a particular path is a directory.
> I'm trying to write the Hadoop ResourceId implementation, and it's not clear 
> whether that's possible. Hadoop's Paths do not end in a / if they are a 
> directory (trailing slashes are stripped), nor do Hadoop paths tell you if 
> something is a directory, so it's not possible to determine whether a given 
> path is a file without a suffix or a directory.
> It's not clear to me that all file systems can determine whether a path is a 
> directory, and thus I don't believe this method can be implemented reliably.
> The only usages of getCurrentDirectory that I could find are in tests, so it's 
> not clear we actually need this.
> I propose that we remove this method.
> cc [~davor]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2302) WriteFiles with runner-determined sharding and large numbers of windows causes OOM errors

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2302:
--

Assignee: Reuven Lax  (was: Davor Bonaci)

> WriteFiles with runner-determined sharding and large numbers of windows 
> causes OOM errors
> -
>
> Key: BEAM-2302
> URL: https://issues.apache.org/jira/browse/BEAM-2302
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>
> This is because the WriteWindowedBundles transform will create many file 
> writers, and the sheer number of file buffers (64 MB per writer by default) 
> uses up all memory. The fix is the same as was done in BigQueryIO: if too many 
> writers are opened, spill into a shuffle and write the files after the shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2302) WriteFiles with runner-determined sharding and large numbers of windows causes OOM errors

2017-08-24 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140689#comment-16140689
 ] 

Davor Bonaci commented on BEAM-2302:


Fixed?

> WriteFiles with runner-determined sharding and large numbers of windows 
> causes OOM errors
> -
>
> Key: BEAM-2302
> URL: https://issues.apache.org/jira/browse/BEAM-2302
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Davor Bonaci
>
> This is because the WriteWindowedBundles transform will create many file 
> writers, and the sheer number of file buffers (64 MB per writer by default) 
> uses up all memory. The fix is the same as was done in BigQueryIO: if too many 
> writers are opened, spill into a shuffle and write the files after the shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2615) Add ViewTests with SlidingWindows

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2615:
--

Assignee: Thomas Groh  (was: Davor Bonaci)

> Add ViewTests with SlidingWindows
> -
>
> Key: BEAM-2615
> URL: https://issues.apache.org/jira/browse/BEAM-2615
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> For both reading and writing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2601) FileBasedSink produces incorrect shards when writing to multiple destinations

2017-08-24 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140688#comment-16140688
 ] 

Davor Bonaci commented on BEAM-2601:


Fixed?

> FileBasedSink produces incorrect shards when writing to multiple destinations
> -
>
> Key: BEAM-2601
> URL: https://issues.apache.org/jira/browse/BEAM-2601
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>
> FileBasedSink now supports multiple dynamic destinations; however, it 
> finalizes all files in a bundle without paying attention to the destination. 
> This means that the shard counts will be incorrect across these destinations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2601) FileBasedSink produces incorrect shards when writing to multiple destinations

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2601:
--

Assignee: Reuven Lax  (was: Davor Bonaci)

> FileBasedSink produces incorrect shards when writing to multiple destinations
> -
>
> Key: BEAM-2601
> URL: https://issues.apache.org/jira/browse/BEAM-2601
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>
> FileBasedSink now supports multiple dynamic destinations; however, it 
> finalizes all files in a bundle without paying attention to the destination. 
> This means that the shard counts will be incorrect across these destinations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[2/2] beam git commit: This closes #3756

2017-08-24 Thread altay
This closes #3756


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d261d6bd
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d261d6bd
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d261d6bd

Branch: refs/heads/master
Commit: d261d6bde1a9fe128c7a5c56936c88caca7ca9eb
Parents: 73c2026 0f53e2a
Author: Ahmet Altay 
Authored: Thu Aug 24 13:59:48 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Aug 24 13:59:48 2017 -0700

--
 .../examples/complete/game/game_stats_test.py   | 81 
 .../examples/complete/game/leader_board_test.py | 69 +
 2 files changed, 150 insertions(+)
--




[1/2] beam git commit: Added tests for python gaming examples

2017-08-24 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 73c2026db -> d261d6bde


Added tests for python gaming examples


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0f53e2ad
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0f53e2ad
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0f53e2ad

Branch: refs/heads/master
Commit: 0f53e2adc7509cd8383341c2b2a8c0275b7f0816
Parents: 73c2026
Author: David Cavazos 
Authored: Thu Aug 24 11:43:23 2017 -0700
Committer: Ahmet Altay 
Committed: Thu Aug 24 13:59:16 2017 -0700

--
 .../examples/complete/game/game_stats_test.py   | 81 
 .../examples/complete/game/leader_board_test.py | 69 +
 2 files changed, 150 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/0f53e2ad/sdks/python/apache_beam/examples/complete/game/game_stats_test.py
--
diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats_test.py 
b/sdks/python/apache_beam/examples/complete/game/game_stats_test.py
new file mode 100644
index 000..971f9dc
--- /dev/null
+++ b/sdks/python/apache_beam/examples/complete/game/game_stats_test.py
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Test for the game_stats example."""
+
+import logging
+import unittest
+
+import apache_beam as beam
+from apache_beam.examples.complete.game import game_stats
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+
+class GameStatsTest(unittest.TestCase):
+
+  SAMPLE_DATA = [
+  'user1_team1,team1,18,1447686663000,2015-11-16 15:11:03.921',
+  'user1_team1,team1,18,1447690263000,2015-11-16 16:11:03.921',
+  'user2_team2,team2,2,1447690263000,2015-11-16 16:11:03.955',
+  'user3_team3,team3,8,1447690263000,2015-11-16 16:11:03.955',
+  'user4_team3,team3,5,1447690263000,2015-11-16 16:11:03.959',
+  'user1_team1,team1,14,1447697463000,2015-11-16 18:11:03.955',
+  'robot1_team1,team1,9000,1447697463000,2015-11-16 18:11:03.955',
+  'robot2_team2,team2,1,1447697463000,2015-11-16 20:11:03.955',
+  'robot2_team2,team2,9000,1447697463000,2015-11-16 21:11:03.955',
+  ]
+
+  def create_data(self, p):
+return (p
+| beam.Create(GameStatsTest.SAMPLE_DATA)
+| beam.ParDo(game_stats.ParseGameEventFn())
+| beam.Map(lambda elem:\
+   beam.window.TimestampedValue(elem, elem['timestamp'])))
+
+  def test_spammy_users(self):
+with TestPipeline() as p:
+  result = (
+  self.create_data(p)
+  | beam.Map(lambda elem: (elem['user'], elem['score']))
+  | game_stats.CalculateSpammyUsers())
+  assert_that(result, equal_to([
+  ('robot1_team1', 9000), ('robot2_team2', 9001)]))
+
+  def test_game_stats_sessions(self):
+session_gap = 5 * 60
+user_activity_window_duration = 30 * 60
+with TestPipeline() as p:
+  result = (
+  self.create_data(p)
+  | beam.Map(lambda elem: (elem['user'], elem['score']))
+  | 'WindowIntoSessions' >> beam.WindowInto(
+  beam.window.Sessions(session_gap),
+  timestamp_combiner=beam.window.TimestampCombiner.OUTPUT_AT_EOW)
+  | beam.CombinePerKey(lambda _: None)
+  | beam.ParDo(game_stats.UserSessionActivity())
+  | 'WindowToExtractSessionMean' >> beam.WindowInto(
+  beam.window.FixedWindows(user_activity_window_duration))
+  | beam.CombineGlobally(beam.combiners.MeanCombineFn())\
+  .without_defaults())
+  assert_that(result, equal_to([300.0, 300.0, 300.0]))
+
+
+if __name__ == '__main__':
+  logging.getLogger().setLevel(logging.INFO)
+  unittest.main()

http://git-wip-us.apache.org/repos/asf/beam/blob/0f53e2ad/sdks/python/apache_beam/examples/complete/game/leader_board_test.py
--
diff --git 
a/sdks/pytho

[jira] [Assigned] (BEAM-2624) File-based sinks should produce a PCollection of written filenames

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2624:
--

Assignee: Reuven Lax  (was: Davor Bonaci)

> File-based sinks should produce a PCollection of written filenames
> --
>
> Key: BEAM-2624
> URL: https://issues.apache.org/jira/browse/BEAM-2624
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2701) use a custom implementation of java.io.ObjectInputStream

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2701:
--

Assignee: Luke Cwik  (was: Davor Bonaci)

> use a custom implementation of java.io.ObjectInputStream
> 
>
> Key: BEAM-2701
> URL: https://issues.apache.org/jira/browse/BEAM-2701
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Luke Cwik
>
> A custom java.io.ObjectInputStream should override resolve[Proxy]Class using 
> the TCCL to support any classloader, rather than falling back to another 
> (default) classloader picked by the JVM. This will enable Beam to work with 
> any classloader instead of being tied to the JVM default when using Java 
> serialization.
> {code}
> @Override
> protected Class resolveClass(final ObjectStreamClass classDesc) throws 
> IOException, ClassNotFoundException {
> final String n = classDesc.getName();
> final ClassLoader classloader = getClassloader();
> try {
> return Class.forName(n, false, classloader);
> } catch (ClassNotFoundException e) {
> if (n.equals("boolean")) {
> return boolean.class;
> }
> if (n.equals("byte")) {
> return byte.class;
> }
> if (n.equals("char")) {
> return char.class;
> }
> if (n.equals("short")) {
> return short.class;
> }
> if (n.equals("int")) {
> return int.class;
> }
> if (n.equals("long")) {
> return long.class;
> }
> if (n.equals("float")) {
> return float.class;
> }
> if (n.equals("double")) {
> return double.class;
> }
> //Last try - Let runtime try and find it.
> return Class.forName(n, false, null);
> }
> }
> @Override
> protected Class resolveProxyClass(final String[] interfaces) throws 
> IOException, ClassNotFoundException {
> final Class[] cinterfaces = new Class[interfaces.length];
> for (int i = 0; i < interfaces.length; i++) {
> cinterfaces[i] = getClassloader().loadClass(interfaces[i]);
> }
> try {
> return Proxy.getProxyClass(getClassloader(), cinterfaces);
> } catch (IllegalArgumentException e) {
> throw new ClassNotFoundException(null, e);
> }
> }
> {code}
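
A minimal sketch of the enclosing stream those overrides would live in, assuming getClassloader() resolves to the thread context classloader; the class name and fallback are illustrative, not Beam's actual API:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;

// Illustrative enclosing class for the overrides quoted above: resolve classes
// through the thread context classloader (TCCL), falling back to this class's
// own loader when no TCCL is set.
public class ContextualObjectInputStream extends ObjectInputStream {

  public ContextualObjectInputStream(InputStream in) throws IOException {
    super(in);
  }

  private ClassLoader getClassloader() {
    ClassLoader tccl = Thread.currentThread().getContextClassLoader();
    return tccl != null ? tccl : getClass().getClassLoader();
  }

  // resolveClass(...) and resolveProxyClass(...) would be overridden here,
  // as in the snippet quoted above.
}
{code}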



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2705) DoFnTester supports for StateParameter

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2705:
--

Assignee: Kenneth Knowles  (was: Davor Bonaci)

> DoFnTester supports for StateParameter
> --
>
> Key: BEAM-2705
> URL: https://issues.apache.org/jira/browse/BEAM-2705
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Yihua Eric Fang
>Assignee: Kenneth Knowles
>
> Today DoFnTester does not support StateParameters such as ValueState. I 
> didn't see an issue being created on JIRA, so filing this one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2751) Write PCollection elements to individual files

2017-08-24 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci reassigned BEAM-2751:
--

Assignee: Eugene Kirpichov  (was: Davor Bonaci)

> Write PCollection elements to individual files
> --
>
> Key: BEAM-2751
> URL: https://issues.apache.org/jira/browse/BEAM-2751
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Christopher Hebert
>Assignee: Eugene Kirpichov
>
> I'd like to write elements as individual files.
> Rather than smashing thousands of outputs into a handful of files as TextIO 
> does (output-0-of-5, output-1-of-5, ...), I want to write each 
> element to its own file.
> So if I used WholeFileIO from [BEAM-2750] to read in three files (hi.txt, 
> what.txt, and yes.txt) then I'd like to write the processed files out to 
> individual files with user or data-defined filenames (like hi-modified.txt, 
> what-modified.txt, and yes-modified.txt).
> With a WholeFileIO, this would look like:
> {code:java}
> PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read", 
> WholeFileIO.read().from("/path/to/input/dir/*"));
> ...
> // Do stuff that changes contents and file names
> PCollection<KV<String, byte[]>> modifiedFileNamesAndBytes = ...
> ...
> modifiedFileNamesAndBytes.apply("Write", 
> WholeFileIO.write().to("/path/to/output/dir/"));
> {code}
> This ticket complements [BEAM-2750].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


  1   2   >