Jenkins build is back to stable : beam_PostCommit_Java_MavenInstall #4338

2017-07-10 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2429) Conflicting filesystems with use of HadoopFileSystem

2017-07-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079994#comment-16079994
 ] 

François Wagner commented on BEAM-2429:
---

Hi Manoj,

Here is the code that worked for me:

```
String[] args = new String[]{
    "--hdfsConfiguration=[{\"fs.defaultFS\": \"hdfs://host:port\"}]"};
HadoopFileSystemOptions options = PipelineOptionsFactory
    .fromArgs(args)
    .withValidation()
    .as(HadoopFileSystemOptions.class);
Pipeline pipeline = Pipeline.create(options);
```

Cheers,
François

> Conflicting filesystems with use of HadoopFileSystem
> -
>
> Key: BEAM-2429
> URL: https://issues.apache.org/jira/browse/BEAM-2429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: François Wagner
>Assignee: Flavio Fiszman
> Fix For: 2.0.0
>
>
> I'm facing an issue when trying to use HadoopFileSystem in my pipeline. It looks 
> like HadoopFileSystem is registering itself under the `file` scheme 
> (https://github.com/apache/beam/pull/2777/files#diff-330bd0854dcab6037ef0e52c05d68eb2L79),
>  hence the following exception is thrown when trying to register 
> HadoopFileSystem.
> java.lang.IllegalStateException: Scheme: [file] has conflicting filesystems: 
> [org.apache.beam.sdk.io.LocalFileSystem, 
> org.apache.beam.sdk.io.hdfs.HadoopFileSystem]
>   at 
> org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:498)
> What is the correct way to handle `hdfs` URLs out of the box with TextIO & 
> AvroIO?
> {code:java}
> String[] args = new String[]{
> "--hdfsConfiguration=[{\"dfs.client.use.datanode.hostname\": 
> \"true\"}]"};
> HadoopFileSystemOptions options = PipelineOptionsFactory
> .fromArgs(args)
> .withValidation()
> .as(HadoopFileSystemOptions.class);
> Pipeline pipeline = Pipeline.create(options); 
> {code}
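For readers unfamiliar with the registration mechanism: the uniqueness check that throws the exception above can be approximated as follows. This is a simplified, self-contained sketch with names of my choosing, not Beam's actual FileSystems implementation:

```java
import java.util.*;

public class SchemeRegistry {
    // Group registered filesystem class names by URI scheme and fail if any
    // scheme has more than one registrant -- the situation described above,
    // where both LocalFileSystem and HadoopFileSystem claimed "file".
    static void verifySchemesAreUnique(Map<String, List<String>> bySchema) {
        for (Map.Entry<String, List<String>> e : bySchema.entrySet()) {
            if (e.getValue().size() > 1) {
                throw new IllegalStateException(String.format(
                    "Scheme: [%s] has conflicting filesystems: %s",
                    e.getKey(), e.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> bySchema = new HashMap<>();
        bySchema.put("file", Arrays.asList(
            "org.apache.beam.sdk.io.LocalFileSystem",
            "org.apache.beam.sdk.io.hdfs.HadoopFileSystem"));
        try {
            verifySchemesAreUnique(bySchema);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```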



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3531: [BEAM-2306] Fail build when @Deprecated is used wit...

2017-07-10 Thread alex-filatov
GitHub user alex-filatov opened a pull request:

https://github.com/apache/beam/pull/3531

[BEAM-2306] Fail build when @Deprecated is used without @deprecated javadoc

Add a Checkstyle check that fails the build when @Deprecated is used without 
@deprecated javadoc (or vice versa).

The check is disabled for existing violations where reason for deprecation 
and/or alternative is not clear.
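For context, Checkstyle ships a built-in MissingDeprecated check that enforces exactly this pairing of the annotation and the Javadoc tag. A configuration along these lines would enable it (a sketch, not necessarily the exact setup used in this PR):

```xml
<module name="Checker">
  <module name="TreeWalker">
    <!-- Fail when @Deprecated and the @deprecated Javadoc tag
         do not appear together on the same element. -->
    <module name="MissingDeprecated"/>
  </module>
</module>
```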

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alex-filatov/beam 
beam-2306-check-missing-deprecated

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3531


commit 0694dcc62a28a98363cdd67a69e3f2c3ce82e072
Author: Alex Filatov 
Date:   2017-07-10T10:20:49Z

[BEAM-2306] Add checkstyle check to fail the build when @Deprecated is used 
without @deprecated javadoc (or vice versa).

The check is disabled for existing violations where reason for deprecation 
and/or alternative is not clear.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2306) @Deprecated without @deprecated javadoc explanation should cause build failure

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080128#comment-16080128
 ] 

ASF GitHub Bot commented on BEAM-2306:
--

GitHub user alex-filatov opened a pull request:

https://github.com/apache/beam/pull/3531

[BEAM-2306] Fail build when @Deprecated is used without @deprecated javadoc

Add a Checkstyle check that fails the build when @Deprecated is used without 
@deprecated javadoc (or vice versa).

The check is disabled for existing violations where reason for deprecation 
and/or alternative is not clear.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alex-filatov/beam 
beam-2306-check-missing-deprecated

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3531


commit 0694dcc62a28a98363cdd67a69e3f2c3ce82e072
Author: Alex Filatov 
Date:   2017-07-10T10:20:49Z

[BEAM-2306] Add checkstyle check to fail the build when @Deprecated is used 
without @deprecated javadoc (or vice versa).

The check is disabled for existing violations where reason for deprecation 
and/or alternative is not clear.




> @Deprecated without @deprecated javadoc explanation should cause build failure
> --
>
> Key: BEAM-2306
> URL: https://issues.apache.org/jira/browse/BEAM-2306
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Kenneth Knowles
>  Labels: starter
>
> We have a number of places with {{@Deprecated}} annotations on seemingly 
> innocuous methods, for example in {{CoderRegistry}}, with no accompanying 
> {{@deprecated}} javadoc.
>  - If there is a preferred alternative, it should be explicitly linked.
>  - If there is no alternative, that should be explained.
>  - The deprecation should indicate whether it is for removal at version 3.0.0 
> or whether it was deprecated prior to 2.0.0 and may be removed at some 
> increment 2.x.y.
> I believe javadoc or findbugs has the ability to enforce proper policy. This 
> ticket tracks getting that policy in place.





[GitHub] beam pull request #3532: [BEAM-2564] add integration test for string functio...

2017-07-10 Thread xumingming
GitHub user xumingming opened a pull request:

https://github.com/apache/beam/pull/3532

[BEAM-2564] add integration test for string functions

Adds an integration test for string functions and fixes a bug in 
`BeamSqlTrimExpression`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xumingming/beam 
BEAM-2564-integration-test-for-string-operators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3532


commit 709768f98fa1bd4daa3080bfe1aaec31b78f
Author: James Xu 
Date:   2017-07-10T11:58:21Z

[BEAM-2564] add integration test for string functions






[jira] [Assigned] (BEAM-2561) Add integration test for date functions

2017-07-10 Thread James Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Xu reassigned BEAM-2561:
--

Assignee: James Xu

> Add integration test for date functions
> ---
>
> Key: BEAM-2561
> URL: https://issues.apache.org/jira/browse/BEAM-2561
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>  Labels: dsl_sql_merge
>






[GitHub] beam pull request #3533: [Beam-2230] ApiSurface Refactoring

2017-07-10 Thread evindj
GitHub user evindj opened a pull request:

https://github.com/apache/beam/pull/3533

[Beam-2230] ApiSurface Refactoring

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/evindj/beam BEAM-2230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3533


commit a0a90f298a5c3131a887193cccb278fbf45e61b5
Author: Innocent Djiofack 
Date:   2017-07-10T04:30:46Z

Made ApiSurface abstract and created implementing subclasses

commit c800fb95b7f9a622f4a9dc2fa061764610739298
Author: Innocent Djiofack 
Date:   2017-07-10T05:34:18Z

Created the direct runner api surface sub class

commit a532663cf2650d0f5dfbc6ea7838e3c5da77a1bb
Author: Innocent Djiofack 
Date:   2017-07-10T05:52:22Z

Modified GcpApiSurface to reflect the correct Api Surface

commit d696b2323bcf89d9272bed94c860dac091bffe3d
Author: Innocent Djiofack 
Date:   2017-07-10T06:02:03Z

Fixed Core Google Cloud API surface.

commit 4a321dc4e07087d0fc4148268f83552fa5b65be6
Author: Innocent Djiofack 
Date:   2017-07-10T11:25:08Z

Fixed a bug in GcpApiSurface






[jira] [Commented] (BEAM-2564) Add integration test for string operators

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080224#comment-16080224
 ] 

ASF GitHub Bot commented on BEAM-2564:
--

GitHub user xumingming opened a pull request:

https://github.com/apache/beam/pull/3532

[BEAM-2564] add integration test for string functions

Adds an integration test for string functions and fixes a bug in 
`BeamSqlTrimExpression`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xumingming/beam 
BEAM-2564-integration-test-for-string-operators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3532


commit 709768f98fa1bd4daa3080bfe1aaec31b78f
Author: James Xu 
Date:   2017-07-10T11:58:21Z

[BEAM-2564] add integration test for string functions




> Add integration test for string operators
> -
>
> Key: BEAM-2564
> URL: https://issues.apache.org/jira/browse/BEAM-2564
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>  Labels: dsl_sql_merge
>






Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3361

2017-07-10 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-10 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080254#comment-16080254
 ] 

Aljoscha Krettek commented on BEAM-2571:


I'm looking into this.

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2595

2017-07-10 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3526: BEAM-934 Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread eralmas7
Github user eralmas7 closed the pull request at:

https://github.com/apache/beam/pull/3526




[GitHub] beam pull request #3526: BEAM-934 Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread eralmas7
GitHub user eralmas7 reopened a pull request:

https://github.com/apache/beam/pull/3526

BEAM-934 Findbugs doesn't pass in Java8 Examples

Maven Build result:

...
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
beam-examples-java8 ---
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT.jar
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\dependency-reduced-pom.xml to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT.pom
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 02:20 min
[INFO] Finished at: 2017-07-09T11:49:43+05:30
[INFO] Final Memory: 30M/109M
[INFO] 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eralmas7/beam bugFix-934

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3526


commit c909c9efe97632a9a9ab19ca6a6d99f3c8c5ffe0
Author: eralmas7 
Date:   2017-07-09T06:20:52Z

BEAM-934 Fixed build by fixing findbugs error.






[jira] [Commented] (BEAM-934) Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080595#comment-16080595
 ] 

ASF GitHub Bot commented on BEAM-934:
-

Github user eralmas7 closed the pull request at:

https://github.com/apache/beam/pull/3526


> Findbugs doesn't pass in Java8 Examples
> ---
>
> Key: BEAM-934
> URL: https://issues.apache.org/jira/browse/BEAM-934
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Daniel Halperin
>Assignee: Jean-Baptiste Onofré
>  Labels: newbie, starter
>
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.1:check (default) @ beam-examples-java8 
> ---
> [INFO] BugInstance size is 2
> [INFO] Error size is 0
> [INFO] Total bugs: 2
> [INFO] Result of integer multiplication cast to long in 
> org.apache.beam.examples.complete.game.injector.Injector$TeamInfo.getEndTimeInMillis()
>  [org.apache.beam.examples.complete.game.injector.Injector$TeamInfo] At 
> Injector.java:[line 170]
> [INFO] Format string should use %n rather than \n in 
> org.apache.beam.examples.complete.game.injector.InjectorUtils.createTopic(Pubsub,
>  String) [org.apache.beam.examples.complete.game.injector.InjectorUtils] At 
> InjectorUtils.java:[line 96]
> [INFO]  
> {code}
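Both warnings above have standard fixes. A self-contained illustration of the two FindBugs bug patterns and their corrections (the values are hypothetical, not taken from Injector.java):

```java
public class FindbugsFixes {
    public static void main(String[] args) {
        int minutes = 90;
        // Bug pattern: "Result of integer multiplication cast to long".
        // The multiplication is done entirely in int and overflows before
        // the result is widened to long.
        long overflows = minutes * 60_000 * 1_000_000;
        // Fix: widen one operand first so the arithmetic is done in long.
        long correct = (long) minutes * 60_000 * 1_000_000;
        System.out.println(overflows == correct); // false: the int math wrapped

        // Bug pattern: "Format string should use %n rather than \n".
        // %n expands to the platform line separator in format strings.
        String fixed = String.format("Topic created.%n");
        System.out.println(fixed.endsWith(System.lineSeparator())); // true
    }
}
```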





[jira] [Reopened] (BEAM-2404) BigQueryIO reading stalls if no data is returned by query

2017-07-10 Thread Andre (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andre reopened BEAM-2404:
-

reopening

> BigQueryIO reading stalls if no data is returned by query
> -
>
> Key: BEAM-2404
> URL: https://issues.apache.org/jira/browse/BEAM-2404
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Andre
>Assignee: Stephen Sisk
> Fix For: Not applicable
>
>
> When running a BigQueryIO query that doesn't return any rows (e.g. nothing 
> has changed in a delta job), the job seems to stall: no temp files are being 
> written, which I think is what it is waiting for. Just adding one row to the 
> source table makes the job run through successfully.
> Code:
> {code:java}
> PCollection  rows = p.apply("ReadFromBQ",
>  BigQueryIO.read()
>  .fromQuery("SELECT * FROM `myproject.dataset.table`")
>  .withoutResultFlattening().usingStandardSql());
> {code}
>   
> Log:
> {code:java}   
> Jun 02, 2017 9:00:36 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-query, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-query
> Jun 02, 2017 9:03:11 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: Starting BigQuery extract job: beam_job_batch-extract
> Jun 02, 2017 9:03:12 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-extract, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-extract
> Jun 02, 2017 9:04:06 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: BigQuery extract job completed: beam_job_batch-extract
> Jun 02, 2017 9:04:08 AM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern 
> gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/.avro
> Jun 02, 2017 9:04:09 AM org.apache.beam.sdk.io.FileBasedSource 
> getEstimatedSizeBytes
> INFO: Filepattern 
> gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/.avro
>  matched 1 files with total size 9750
> {code}





[jira] [Commented] (BEAM-2404) BigQueryIO reading stalls if no data is returned by query

2017-07-10 Thread Andre (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080563#comment-16080563
 ] 

Andre commented on BEAM-2404:
-

Sorry guys, got distracted with other stuff. It looks like the problem is 
introduced by the write step that follows reading nothing.

For example, the following code runs fine when data is read (WHERE clause 
removed) but fails with the exception below otherwise.

{code:java}
PCollection rows = p.apply("ReadFromBQ", BigQueryIO.read()
   .fromQuery("SELECT * FROM [project:dataset.table] WHERE 1 = 
2").withoutResultFlattening()); 

rows.apply("WriteToBQ", BigQueryIO.writeTableRows()
   .to(targetTable).withSchema(mySchema)
   .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
   .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
{code}


{code:java}
SEVERE: java.lang.NullPointerException
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.NullPointerException
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:322)
at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:292)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
at com.project.MyClass.main(MyClass.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.apache.beam.sdk.io.gcp.bigquery.WriteTables.processElement(WriteTables.java:97)
{code}


Now if I add a windowing strategy, the code doesn't fail anymore but never 
finishes even though no data is being read.

{code:java}
rows
.apply("AddTimestamp", ParDo.of(new OrderAddTimestampDoFn()))
.apply("WindowDaily", Window.into(CalendarWindows.days(1)))
.apply("WriteToBQ", BigQueryIO.writeTableRows()
.to(new SerializableFunction, 
TableDestination>() {
@Override
public TableDestination apply(ValueInSingleWindow 
value) {
String dayString = 
DateTimeFormat.forPattern("MMdd").withZone(DateTimeZone.UTC).print(((IntervalWindow)
 value.getWindow()).start());
TableDestination td = new TableDestination(targetTable 
+ "$" + dayString, null);
return td;
}
}).withSchema(mySchema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
{code}



> BigQueryIO reading stalls if no data is returned by query
> -
>
> Key: BEAM-2404
> URL: https://issues.apache.org/jira/browse/BEAM-2404
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Andre
>Assignee: Stephen Sisk
> Fix For: Not applicable
>
>
> When running a BigQueryIO query that doesn't return any rows (e.g. nothing 
> has changed in a delta job), the job seems to stall: no temp files are being 
> written, which I think is what it is waiting for. Just adding one row to the 
> source table makes the job run through successfully.
> Code:
> {code:java}
> PCollection  rows = p.apply("ReadFromBQ",
>  BigQueryIO.read()
>  .fromQuery("SELECT * FROM `myproject.dataset.table`")
>  .withoutResultFlattening().usingStandardSql());
> {code}
>   
> Log:
> {code:java}   
> Jun 02, 2017 9:00:36 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-query, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-query
> Jun 02, 2017 9:03:11 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: Starting BigQuery extract job: beam_job_batch-extract
> Jun 02, 2017 9:03:12 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-extract, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_ba

[jira] [Commented] (BEAM-934) Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080596#comment-16080596
 ] 

ASF GitHub Bot commented on BEAM-934:
-

GitHub user eralmas7 reopened a pull request:

https://github.com/apache/beam/pull/3526

BEAM-934 Findbugs doesn't pass in Java8 Examples

Maven Build result:

...
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
beam-examples-java8 ---
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT.jar
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\dependency-reduced-pom.xml to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT.pom
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 02:20 min
[INFO] Finished at: 2017-07-09T11:49:43+05:30
[INFO] Final Memory: 30M/109M
[INFO] 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eralmas7/beam bugFix-934

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3526


commit c909c9efe97632a9a9ab19ca6a6d99f3c8c5ffe0
Author: eralmas7 
Date:   2017-07-09T06:20:52Z

BEAM-934 Fixed build by fixing findbugs error.




> Findbugs doesn't pass in Java8 Examples
> ---
>
> Key: BEAM-934
> URL: https://issues.apache.org/jira/browse/BEAM-934
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Daniel Halperin
>Assignee: Jean-Baptiste Onofré
>  Labels: newbie, starter
>
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.1:check (default) @ beam-examples-java8 
> ---
> [INFO] BugInstance size is 2
> [INFO] Error size is 0
> [INFO] Total bugs: 2
> [INFO] Result of integer multiplication cast to long in 
> org.apache.beam.examples.complete.game.injector.Injector$TeamInfo.getEndTimeInMillis()
>  [org.apache.beam.examples.complete.game.injector.Injector$TeamInfo] At 
> Injector.java:[line 170]
> [INFO] Format string should use %n rather than \n in 
> org.apache.beam.examples.complete.game.injector.InjectorUtils.createTopic(Pubsub,
>  String) [org.apache.beam.examples.complete.game.injector.InjectorUtils] At 
> InjectorUtils.java:[line 96]
> [INFO]  
> {code}





[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-10 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080623#comment-16080623
 ] 

Aljoscha Krettek commented on BEAM-2571:


[~kenn] with some super ugly debugging I think I found at least one problem: 
the Flink and Beam watermarks have slightly different semantics. In Beam, a 
watermark {{t}} says there won't be elements with a timestamp {{< t}} in the 
future. In Flink, a watermark {{t}} says there won't be elements with a 
timestamp {{<= t}} in the future. This means that Flink will fire a timer for 
time {{t0}} when the current input watermark is at least {{t0}}. The default 
Trigger, however, has this piece of code: 
https://github.com/apache/beam/blob/ca41af8fe4711ab4a81c2a33746a64e64fb0ca37/runners/core-java/src/main/java/org/apache/beam/runners/core/triggers/DefaultTriggerStateMachine.java#L76-L76.
 Meaning it will only fire when the input watermark is past {{t0}}. What this 
means here is that we fire the only timer that was set, but the Trigger doesn't 
think it's time yet and so doesn't fire.

I'm not yet sure how to solve this, or whether it is the only problem. A 
solution might be to shift everything timestamp-related in the Flink runner by 
one; I'd have to find all the relevant places first, though, and put in very 
good tests.

This is the branch that I used for debugging: 
https://github.com/aljoscha/beam/tree/jira-2571-sleuthing-for-combinetest
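The semantic gap described above can be reduced to two one-line predicates. An illustrative, self-contained sketch (my own simplification, not Beam or Flink code):

```java
public class WatermarkSemantics {
    // Beam: watermark t promises no future elements with timestamp < t,
    // so Beam's DefaultTrigger considers a timer for t0 ready only once
    // the input watermark is strictly past t0.
    static boolean beamTimerEligible(long watermark, long timer) {
        return watermark > timer;
    }

    // Flink: watermark t promises no future elements with timestamp <= t,
    // so Flink fires a timer for t0 as soon as the watermark reaches t0.
    static boolean flinkTimerFires(long watermark, long timer) {
        return watermark >= timer;
    }

    public static void main(String[] args) {
        long t0 = 100;
        // At watermark == t0 the runner fires the timer, but the trigger
        // does not yet consider it time: the off-by-one described above.
        System.out.println(flinkTimerFires(t0, t0)
            && !beamTimerEligible(t0, t0)); // prints "true"
    }
}
```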

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[jira] [Created] (BEAM-2577) IO tests should exercise Runtime Values where supported

2017-07-10 Thread Ben Chambers (JIRA)
Ben Chambers created BEAM-2577:
--

 Summary: IO tests should exercise Runtime Values where supported
 Key: BEAM-2577
 URL: https://issues.apache.org/jira/browse/BEAM-2577
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-extensions, testing
Reporter: Ben Chambers
Assignee: Davor Bonaci


The only tests I have found for `ValueProvider`-parameterized methods check 
that they are not evaluated at pipeline construction time. This misses 
several important pieces:

1. 
https://stackoverflow.com/questions/44967898/notify-when-textio-is-done-writing-a-file
 seems to be a problem with an AvroIO write using a RuntimeValueProvider being 
non-serializable (current theory is because of an anonymous inner class 
capturing the enclosing AvroIO.Write instance which has non-serializable 
fields).

2. Testing that the code paths that actually read the file do so correctly when 
parameterized.

We should update the developer documentation to describe what the requirements 
are for a parameterized IO and provide guidance on what tests are needed and 
how to write them.
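The capture theory in point 1 can be reproduced outside Beam. A minimal sketch (all names are mine, not Beam's classes): an anonymous inner class declared in an instance method holds a hidden reference to its enclosing instance, so serializing it fails whenever the outer class is not serializable.

```java
import java.io.*;

public class CaptureDemo {
    // Stand-in for a serializable callback type such as a ValueProvider.
    interface SerializableRunnable extends Runnable, Serializable {}

    // Stand-in for the enclosing transform; deliberately NOT Serializable.
    static class Outer {
        SerializableRunnable makeTask() {
            // Anonymous inner class: capturing Outer.this drags the whole
            // non-serializable enclosing instance into the serialized graph.
            return new SerializableRunnable() {
                @Override public void run() {
                    System.out.println(Outer.this);
                }
            };
        }
    }

    public static void main(String[] args) throws Exception {
        Runnable task = new Outer().makeTask();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            System.out.println("serialized");
        } catch (NotSerializableException e) {
            // The failure names the captured enclosing class, not the task.
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

A static nested class (or lambda that does not touch the enclosing instance) avoids the capture and serializes cleanly.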





[jira] [Commented] (BEAM-2577) IO tests should exercise Runtime Values where supported

2017-07-10 Thread Ben Chambers (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080633#comment-16080633
 ] 

Ben Chambers commented on BEAM-2577:


I looked for other places in the codebase that did this testing to base things 
on and couldn't find any. I suspect one of the larger difficulties here is that 
there isn't a uniform API for providing values when running a pipeline, which 
makes it hard to write this test.

> IO tests should exercise Runtime Values where supported
> ---
>
> Key: BEAM-2577
> URL: https://issues.apache.org/jira/browse/BEAM-2577
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions, testing
>Reporter: Ben Chambers
>Assignee: Davor Bonaci
>
> The only tests I have found for `ValueProvider` parameterized methods verify 
> that they are not evaluated during pipeline construction time. This misses 
> several important pieces:
> 1. 
> https://stackoverflow.com/questions/44967898/notify-when-textio-is-done-writing-a-file
>  seems to be a problem with an AvroIO write using a RuntimeValueProvider 
> being non-serializable (current theory is because of an anonymous inner class 
> capturing the enclosing AvroIO.Write instance which has non-serializable 
> fields).
> 2. Testing that the code paths that actually read the file do so correctly 
> when parameterized.
> We should update the developer documentation to describe what the 
> requirements are for a parameterized IO and provide guidance on what tests 
> are needed and how to write them.





[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-10 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080644#comment-16080644
 ] 

Kenneth Knowles commented on BEAM-2571:
---

Ouch. I don't see any way around just always adapting it. Doing the +1 or -1 at 
key places in FlinkTimerInternals is at least a good start. Ideally it might be 
the only thing that adapts a runner's notion of time to Beam's notion of time.
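A hedged sketch of the "+1 or -1" adaptation discussed above, assuming Flink fires a timer once its watermark reaches the timer's timestamp while Beam fires once the watermark strictly exceeds it; the class and method names are invented and this is not FlinkTimerInternals code:

```java
class TimerTimeAdapter {
    // Shift a Beam timer timestamp so the runner fires it one millisecond
    // later, matching the "strictly after" semantics.
    static long toFlinkFiringTime(long beamTimestampMillis) {
        return beamTimestampMillis + 1;
    }

    // Recover the original Beam timestamp when the runner's timer fires.
    static long toBeamTimestamp(long flinkFiringTimeMillis) {
        return flinkFiringTimeMillis - 1;
    }
}
```

Keeping both shifts in one adapter makes the round trip an identity, which is the property that breaks when the +1 and -1 are scattered across call sites.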

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[jira] [Created] (BEAM-2576) Move non-core transform payloads out of Runner API proto

2017-07-10 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-2576:
-

 Summary: Move non-core transform payloads out of Runner API proto
 Key: BEAM-2576
 URL: https://issues.apache.org/jira/browse/BEAM-2576
 Project: Beam
  Issue Type: Bug
  Components: beam-model-runner-api
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles
 Fix For: 2.2.0


The presence of e.g. WriteFilesPayload in beam_runner_api.proto makes it 
appear as though this is a core part of the model. While it is a very 
important transform, it is actually just a payload for a composite, like any 
other, and should not be treated so specially.





[jira] [Commented] (BEAM-1939) Serialize more coders via URN + Class name

2017-07-10 Thread Innocent (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080705#comment-16080705
 ] 

Innocent commented on BEAM-1939:


Thank you for your comment; I will look at it later today.

> Serialize more coders via URN + Class name
> --
>
> Key: BEAM-1939
> URL: https://issues.apache.org/jira/browse/BEAM-1939
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Innocent
>Priority: Trivial
>
> If the serialized size of Standard Coders becomes too large, an arbitrary 
> Standard Coder can be encoded, alongside its components, via a URN, with the 
> class looked up when it is to be deserialized.
> See 
> https://github.com/tgroh/beam/commit/070854845346d8e4df824e4aa374688bd095c2c6 
> as an example
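A minimal sketch of the URN + class-name approach described above: instead of shipping a Java-serialized coder instance, ship a URN and re-instantiate the class on the other side. Every name here is illustrative, not Beam's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class UrnCoderRegistry {
    private final Map<String, Class<?>> coderClassByUrn = new HashMap<>();

    public void register(String urn, Class<?> coderClass) {
        coderClassByUrn.put(urn, coderClass);
    }

    // "Serialize": just the URN string, far smaller than an object graph.
    public String encode(Class<?> coderClass) {
        return coderClassByUrn.entrySet().stream()
            .filter(e -> e.getValue().equals(coderClass))
            .map(Map.Entry::getKey)
            .findFirst()
            .orElseThrow(() ->
                new IllegalArgumentException("Unregistered: " + coderClass));
    }

    // "Deserialize": look the class up by URN and instantiate it.
    public Object decode(String urn) {
        Class<?> clazz = coderClassByUrn.get(urn);
        if (clazz == null) {
            throw new IllegalArgumentException("Unknown URN: " + urn);
        }
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz, e);
        }
    }

    public static void main(String[] args) {
        UrnCoderRegistry registry = new UrnCoderRegistry();
        registry.register("urn:example:coder:v1", StringBuilder.class);
        System.out.println(registry.encode(StringBuilder.class));
        System.out.println(registry.decode("urn:example:coder:v1")
            .getClass().getName());
    }
}
```

A real implementation would also encode the coder's component coders recursively; this sketch covers only the URN-to-class lookup.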





[jira] [Commented] (BEAM-2230) Core SDK ApiSurface should be only org.apache.beam.sdk and should be defined outside of the general ApiSurface class

2017-07-10 Thread Innocent (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080710#comment-16080710
 ] 

Innocent commented on BEAM-2230:


Created a v0 pull request for this: https://github.com/apache/beam/pull/3533

> Core SDK ApiSurface should be only org.apache.beam.sdk and should be defined 
> outside of the general ApiSurface class
> 
>
> Key: BEAM-2230
> URL: https://issues.apache.org/jira/browse/BEAM-2230
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Innocent
>
> Currently, ApiSurface.getSdkApiSurface() is highly specialized and also not 
> correct.





[GitHub] beam pull request #3534: Beam-933 Findbugs doesn't pass in Java Examples

2017-07-10 Thread eralmas7
GitHub user eralmas7 opened a pull request:

https://github.com/apache/beam/pull/3534

Beam-933 Findbugs doesn't pass in Java Examples

[INFO] --- maven-failsafe-plugin:2.20:integration-test (default) @ 
beam-examples-java ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-dependency-plugin:3.0.1:analyze-only (default) @ 
beam-examples-java ---
[INFO] No dependency problems found
[INFO] 
[INFO] --- maven-failsafe-plugin:2.20:verify (default) @ beam-examples-java 
---
[INFO] Tests are skipped.
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 01:10 min
[INFO] Finished at: 2017-07-10T23:02:14+05:30
[INFO] Final Memory: 45M/129M
[INFO] 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eralmas7/beam BEAM-933

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3534


commit c909c9efe97632a9a9ab19ca6a6d99f3c8c5ffe0
Author: eralmas7 
Date:   2017-07-09T06:20:52Z

BEAM-934 Fixed build by fixing firebug error.

commit 82e5406897ec35d7dc2c3355e4b7304136814775
Author: eralmas7 
Date:   2017-07-10T17:34:13Z

BEAM-933 Enable firebug again for maven build.

commit 8820dbfff6b217ea8a1eb7958be5cfda2a59e93b
Author: eralmas7 
Date:   2017-07-10T17:39:06Z

BEAM-933 Reverted changes from BEAM-934




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2578) Unable to run DebuggingWordCountTest

2017-07-10 Thread Almas Shaikh (JIRA)
Almas Shaikh created BEAM-2578:
--

 Summary: Unable to run DebuggingWordCountTest
 Key: BEAM-2578
 URL: https://issues.apache.org/jira/browse/BEAM-2578
 Project: Beam
  Issue Type: Bug
  Components: examples-java
Reporter: Almas Shaikh
Assignee: Frances Perry
Priority: Minor


It fails with the below error on Windows Vista:

org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.IllegalStateException: Unable to find registrar for c
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
at 
org.apache.beam.examples.DebuggingWordCount.main(DebuggingWordCount.java:160)
at 
org.apache.beam.examples.DebuggingWordCountTest.testDebuggingWordCount(DebuggingWordCountTest.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.IllegalStateException: Unable to find registrar for c
at 
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
at 
org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:207)
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:207)
at 
org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:87)
at 
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:62)
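The "Unable to find registrar for c" message is consistent with a URI-scheme parse that mistakes the drive letter of a Windows path like C:\counts.txt for a filesystem scheme. An illustrative reconstruction (not Beam's actual parsing code) with one possible guard:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeParseDemo {
    private static final Pattern SCHEME =
        Pattern.compile("^(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):");

    // Naive parse: everything before the first ':' is the scheme, so the
    // Windows path "C:\counts.txt" yields the bogus scheme "c".
    static String parseScheme(String spec) {
        Matcher m = SCHEME.matcher(spec);
        return m.find() ? m.group("scheme").toLowerCase() : "file";
    }

    // One possible guard: a single-letter "scheme" is almost certainly a
    // Windows drive letter, so fall back to the local filesystem.
    static String parseSchemeGuarded(String spec) {
        String scheme = parseScheme(spec);
        return scheme.length() == 1 ? "file" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(parseScheme("C:\\counts.txt"));        // c
        System.out.println(parseSchemeGuarded("C:\\counts.txt")); // file
        System.out.println(parseSchemeGuarded("gs://bucket/f"));  // gs
    }
}
```

With the naive parse, the registrar lookup receives "c", no filesystem is registered for that scheme, and the IllegalStateException above follows.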





[jira] [Commented] (BEAM-2231) ApiSurface should be lazy

2017-07-10 Thread Innocent (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080713#comment-16080713
 ] 

Innocent commented on BEAM-2231:


Thank you for your reply; I will check further tonight.

> ApiSurface should be lazy
> -
>
> Key: BEAM-2231
> URL: https://issues.apache.org/jira/browse/BEAM-2231
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Innocent
>
> Currently, the ApiSurface loads classes recursively, when they should be 
> pruned before loading by the pruning pattern. This has caused crashes because 
> some classes that are never referenced in our code fail to load.
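A hedged sketch of pruning before loading as described above: test the class *name* against the prune patterns first, and only ask the ClassLoader for classes that survive, so never-referenced classes are never loaded. All names here are invented for illustration, not ApiSurface's API:

```java
import java.util.List;
import java.util.regex.Pattern;

class PrunedClassWalker {
    private final List<Pattern> prunePatterns;

    PrunedClassWalker(List<Pattern> prunePatterns) {
        this.prunePatterns = prunePatterns;
    }

    // Cheap string check; no Class.forName happens for pruned names.
    boolean shouldVisit(String className) {
        return prunePatterns.stream()
            .noneMatch(p -> p.matcher(className).matches());
    }

    Class<?> loadIfVisible(String className) throws ClassNotFoundException {
        // `false` => do not run static initializers during the surface walk.
        return shouldVisit(className)
            ? Class.forName(className, false, getClass().getClassLoader())
            : null;
    }

    public static void main(String[] args) {
        PrunedClassWalker walker = new PrunedClassWalker(
            List.of(Pattern.compile("com\\.thirdparty\\..*")));
        System.out.println(walker.shouldVisit("com.thirdparty.Internal"));   // false
        System.out.println(walker.shouldVisit("org.apache.beam.sdk.Pipeline")); // true
    }
}
```

Because pruned names never reach the ClassLoader, a broken or absent third-party class cannot crash the walk.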





[GitHub] beam pull request #3535: BEAM-2578 DebuggingWordCountTest Fails on Windows

2017-07-10 Thread eralmas7
GitHub user eralmas7 opened a pull request:

https://github.com/apache/beam/pull/3535

BEAM-2578 DebuggingWordCountTest Fails on Windows

[INFO] --- maven-failsafe-plugin:2.20:integration-test (default) @ 
beam-examples-java ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-dependency-plugin:3.0.1:analyze-only (default) @ 
beam-examples-java ---
[INFO] No dependency problems found
[INFO] 
[INFO] --- maven-failsafe-plugin:2.20:verify (default) @ beam-examples-java 
---
[INFO] Tests are skipped.
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 01:41 min
[INFO] Finished at: 2017-07-10T23:18:01+05:30
[INFO] Final Memory: 45M/129M
[INFO] 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eralmas7/beam BEAM-2578

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3535


commit c0eec0de3f7bc1a059660c9aefec6279ce4d1202
Author: eralmas7 
Date:   2017-07-10T17:44:24Z

BEAM-2578 Fixed test to run on Windows






[jira] [Commented] (BEAM-2578) DebuggingWordCountTest Fails on Windows

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080732#comment-16080732
 ] 

ASF GitHub Bot commented on BEAM-2578:
--

GitHub user eralmas7 opened a pull request:

https://github.com/apache/beam/pull/3535

BEAM-2578 DebuggingWordCountTest Fails on Windows

> DebuggingWordCountTest Fails on Windows
> ---
>
> Key: BEAM-2578
> URL: https://issues.apache.org/jira/browse/BEAM-2578
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Almas Shaikh
>Assignee: Frances Perry
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (BEAM-2578) Unable to run DebuggingWordCountTest on Windows

2017-07-10 Thread Almas Shaikh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Almas Shaikh updated BEAM-2578:
---
Summary: Unable to run DebuggingWordCountTest on Windows  (was: Unable to 
run DebuggingWordCountTest)

> Unable to run DebuggingWordCountTest on Windows
> ---
>
> Key: BEAM-2578
> URL: https://issues.apache.org/jira/browse/BEAM-2578
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Almas Shaikh
>Assignee: Frances Perry
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>





[jira] [Updated] (BEAM-2578) DebuggingWordCountTest Fails on Windows

2017-07-10 Thread Almas Shaikh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Almas Shaikh updated BEAM-2578:
---
Summary: DebuggingWordCountTest Fails on Windows  (was: Unable to run 
DebuggingWordCountTest on Windows)

> DebuggingWordCountTest Fails on Windows
> ---
>
> Key: BEAM-2578
> URL: https://issues.apache.org/jira/browse/BEAM-2578
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Almas Shaikh
>Assignee: Frances Perry
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>





[GitHub] beam pull request #3526: BEAM-934 Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread eralmas7
Github user eralmas7 closed the pull request at:

https://github.com/apache/beam/pull/3526




[GitHub] beam pull request #3526: BEAM-934 Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread eralmas7
GitHub user eralmas7 reopened a pull request:

https://github.com/apache/beam/pull/3526

BEAM-934 Findbugs doesn't pass in Java8 Examples

Maven Build result:

...
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
beam-examples-java8 ---
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT.jar
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\dependency-reduced-pom.xml to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT.pom
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
[INFO] Installing 
C:\workspace-apache\beam\examples\java8\target\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
 to 
C:\Users\almass\.m2\repository\org\apache\beam\beam-examples-java8\2.2.0-SNAPSHOT\beam-examples-java8-2.2.0-SNAPSHOT-tests.jar
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 02:20 min
[INFO] Finished at: 2017-07-09T11:49:43+05:30
[INFO] Final Memory: 30M/109M
[INFO] 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/eralmas7/beam bugFix-934

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3526


commit c909c9efe97632a9a9ab19ca6a6d99f3c8c5ffe0
Author: eralmas7 
Date:   2017-07-09T06:20:52Z

BEAM-934 Fixed build by fixing firebug error.






[jira] [Commented] (BEAM-934) Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080773#comment-16080773
 ] 

ASF GitHub Bot commented on BEAM-934:
-

Github user eralmas7 closed the pull request at:

https://github.com/apache/beam/pull/3526


> Findbugs doesn't pass in Java8 Examples
> ---
>
> Key: BEAM-934
> URL: https://issues.apache.org/jira/browse/BEAM-934
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Daniel Halperin
>Assignee: Jean-Baptiste Onofré
>  Labels: newbie, starter
>
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.1:check (default) @ beam-examples-java8 
> ---
> [INFO] BugInstance size is 2
> [INFO] Error size is 0
> [INFO] Total bugs: 2
> [INFO] Result of integer multiplication cast to long in 
> org.apache.beam.examples.complete.game.injector.Injector$TeamInfo.getEndTimeInMillis()
>  [org.apache.beam.examples.complete.game.injector.Injector$TeamInfo] At 
> Injector.java:[line 170]
> [INFO] Format string should use %n rather than \n in 
> org.apache.beam.examples.complete.game.injector.InjectorUtils.createTopic(Pubsub,
>  String) [org.apache.beam.examples.complete.game.injector.InjectorUtils] At 
> InjectorUtils.java:[line 96]
> [INFO]  
> {code}
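Both warnings quoted above have one-line fixes. The sketch below shows the pattern for each; the method bodies are invented stand-ins, not the actual Injector code:

```java
class FindbugsFixSketch {
    // ICAST_INTEGER_MULTIPLY_CAST_TO_LONG: the multiplication happens in
    // int and can overflow before the cast widens the result.
    static long endTimeBuggy(int minutes) {
        return (long) (minutes * 60 * 1000); // overflows for large minutes
    }

    // Fix: make one operand long so the whole product is computed in long.
    static long endTimeFixed(int minutes) {
        return minutes * 60L * 1000L;
    }

    // VA_FORMAT_STRING_USES_NEWLINE: format strings should use %n (the
    // platform line separator) rather than a literal \n.
    static String createdTopicMessage(String topic) {
        return String.format("Topic %s created.%n", topic);
    }

    public static void main(String[] args) {
        System.out.println(FindbugsFixSketch.endTimeBuggy(40000)); // negative: overflowed
        System.out.println(FindbugsFixSketch.endTimeFixed(40000)); // 2400000000
    }
}
```

40000 minutes is enough to push the int product past Integer.MAX_VALUE, so the buggy version returns a wrapped negative value while the fixed version returns the true product.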





[jira] [Commented] (BEAM-934) Findbugs doesn't pass in Java8 Examples

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080774#comment-16080774
 ] 

ASF GitHub Bot commented on BEAM-934:
-

GitHub user eralmas7 reopened a pull request:

https://github.com/apache/beam/pull/3526

BEAM-934 Findbugs doesn't pass in Java8 Examples

> Findbugs doesn't pass in Java8 Examples
> ---
>
> Key: BEAM-934
> URL: https://issues.apache.org/jira/browse/BEAM-934
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Reporter: Daniel Halperin
>Assignee: Jean-Baptiste Onofré
>  Labels: newbie, starter
>
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.1:check (default) @ beam-examples-java8 
> ---
> [INFO] BugInstance size is 2
> [INFO] Error size is 0
> [INFO] Total bugs: 2
> [INFO] Result of integer multiplication cast to long in 
> org.apache.beam.examples.complete.game.injector.Injector$TeamInfo.getEndTimeInMillis()
>  [org.apache.beam.examples.complete.game.injector.Injector$TeamInfo] At 
> Injector.java:[line 170]
> [INFO] Format string should use %n rather than \n in 
> org.apache.beam.examples.complete.game.injector.InjectorUtils.createTopic(Pubsub,
>  String) [org.apache.beam.examples.complete.game.injector.InjectorUtils] At 
> InjectorUtils.java:[line 96]
> [INFO]  
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3362

2017-07-10 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2596

2017-07-10 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080867#comment-16080867
 ] 

Dmitry Demeshchuk commented on BEAM-2573:
-

So, I tried to switch to the head version of the SDK and use the 
{{beam_plugins}} option.

Here are the steps I performed:

1. Did {{python setup.py sdist}} in the master branch of Beam. Also, did 
{{python setup.py install}}.
2. Created the following dictionary in Python (note the {{beam_plugins}} option 
present):
{code}
{'job_name': 's3-wordcount-example-2', 'staging_location': 
'gs://dataflow-test-gc-project-168517/s3-wordcount-example-2/staging_location', 
'runner': 'dataflow', 'streaming': False, 'runtime_type_check': False, 
'temp_location': 
'gs://dataflow-test-gc-project-168517/s3-wordcount-example-2/temporary_location',
 'setup_file': '/tmp/tmpEdRIo2/setup.py', 'dataflow_endpoint': 
'https://dataflow.googleapis.com', 'sdk_location': 
'/Users/dmitrydemeshchuk/beam/sdks/python/dist/apache-beam-2.2.0.dev0.tar.gz', 
'save_main_session': True, 'zone': 'us-west1-a', 'region': 'us-west1', 
'profile_cpu': False, 'bucket': 'gs://dataflow-test-gc-project-168517', 
'profile_memory': False, 'pipeline_type_check': True, 'project': 
'test-gc-project-168517', 'direct_runner_use_stacked_bundle': True, 
'type_check_strictness': 'DEFAULT_TO_ANY', 'beam_plugins': ' dataflow', 
'no_auth': False}
{code}
3. Created a {{PipelineOptions}} object and used it inside a pipeline: 
{{options = PipelineOptions.from_dictionary(options_dict)}}
4. Ran the pipeline in Dataflow.

Now, in the Dataflow UI I can see some of the pipeline options (for example, 
{{sdk_location}} is correct), but I'm not seeing {{beam_plugins}} anywhere. 
FWIW, the job's "Dataflow SDK version" is "Google Cloud Dataflow SDK for 
Python 2.0.0", while {{sdk_location}} is 
{{/Users/dmitrydemeshchuk/beam/sdks/python/dist/apache-beam-2.2.0.dev0.tar.gz}} 
(note the 2.2.0 version).

Needless to say, the {{beam_plugins}} option doesn't seem to get applied to my 
pipeline; at least, it fails as if the plugin hadn't been imported.

I'm almost sure this has something to do with the Dataflow SDK version, but so 
far I can't find a way to fix it. Any suggestions?
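
For reference, here is a minimal sketch of how a dictionary like the one in 
step 2 maps to command-line flags. This only mimics what 
{{PipelineOptions.from_dictionary}} does, for illustration; it is not the 
actual Beam implementation.

```python
# Sketch: convert an options dict into command-line flags, mimicking
# PipelineOptions.from_dictionary. Illustrative only -- not Beam internals.

def options_dict_to_flags(options):
    flags = []
    for name, value in sorted(options.items()):
        if value is True:
            flags.append('--%s' % name)        # True booleans become bare flags
        elif value is False or value is None:
            continue                           # falsy options are omitted
        else:
            flags.append('--%s=%s' % (name, value))
    return flags

print(options_dict_to_flags({
    'job_name': 's3-wordcount-example-2',
    'streaming': False,
    'save_main_session': True,
}))
# -> ['--job_name=s3-wordcount-example-2', '--save_main_session']
```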

> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 382, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:10156)
> self.proc

[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080889#comment-16080889
 ] 

Dmitry Demeshchuk commented on BEAM-2573:
-

Hmm, now it's getting a bit past that, but my pipeline enters an infinite 
crash-loop backoff. This is what Stackdriver prints out:
{code}
IFile "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
line 26, in  
I   
I  from dataflow_worker import windmillio 
IFile 
"/usr/local/lib/python2.7/dist-packages/dataflow_worker/windmillio.py", line 
41, in  
I  class PubSubWindmillSource(pubsub.PubSubSource): 
I  :  
I  'module' object has no attribute 'PubSubSource' 
I  checking backoff for container "python" in pod 
"dataflow-s3-wordcount-example-2-07101139-0c31-harness-3tf8" 
E  Error syncing pod e6169e8537b1bd83321865dafc047ba4, skipping: failed to 
"StartContainer" for "python" with CrashLoopBackOff: "Back-off 5m0s restarting 
failed container=python 
pod=dataflow-s3-wordcount-example-2-07101139-0c31-harness-3tf8_default(e6169e8537b1bd83321865dafc047ba4)"
 
I  Setting node annotation to enable volume controller attach/detach 
I  checking backoff for container "python" in pod 
"dataflow-s3-wordcount-example-2-07101139-0c31-harness-3tf8" 
I  Back-off 5m0s restarting failed container=python 
pod=dataflow-s3-wordcount-example-2-07101139-0c31-harness-3tf8_default(e6169e8537b1bd83321865dafc047ba4)
 
{code}

I'll try to clean up the local dependencies and see if that helps.


[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080933#comment-16080933
 ] 

Dmitry Demeshchuk commented on BEAM-2573:
-

Hmm, looks like it was indeed a dependency issue that setuptools was hiding.

I did {{pip install .}} from the beam repo, and the plugins made it in there.

Interestingly, Beam seems to have automatically converted the plugins list into 
a list of classes. In the Dataflow job's options, {{beam_plugins}} is 
{{['apache_beam.io.filesystem.FileSystem', 
'apache_beam.io.localfilesystem.LocalFileSystem', 
'apache_beam.io.gcp.gcsfilesystem.GCSFileSystem', 
'dataflow.aws.s3.S3FileSystem']}}.

However, the crash loop described in my previous comment still happens. 
I'll keep looking at the sources to see where I went wrong.
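
The conversion to a list of class names can be sketched as a toy plugin 
registry that walks a base class's subclass tree and emits dotted paths. This 
is only in the spirit of what Beam appears to do; the names here are 
illustrative, not Beam internals.

```python
# Toy plugin registry: serialize each subclass of a base class as a dotted
# 'module.Class' path, similar in spirit to the beam_plugins list above.

class PluginBase(object):
    @classmethod
    def _all_subclasses(cls):
        # Walk the subclass tree recursively so indirect subclasses count too.
        for sub in cls.__subclasses__():
            yield sub
            for nested in sub._all_subclasses():
                yield nested

    @classmethod
    def plugin_paths(cls):
        return ['%s.%s' % (sub.__module__, sub.__name__)
                for sub in cls._all_subclasses()]


class FileSystem(PluginBase):
    """Example plugin; a real one would implement filesystem operations."""


class S3FileSystem(FileSystem):
    """A third-party plugin becomes discoverable just by subclassing."""


print(PluginBase.plugin_paths())
```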


[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080974#comment-16080974
 ] 

Sourabh Bajaj commented on BEAM-2573:
-

I think the plugins are working correctly, since they pass a list of class 
names to be imported at startup. You might need to wait for the next release, 
though, as this required a change to the Dataflow workers: they need to start 
importing the paths specified in the beam_plugins list. A release is in 
progress right now, so that might happen in the next few days.

I'm not sure about the crash loop in windmillio; [~charleschen] might know 
more.
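
On the worker side, importing each entry of such a list is a one-liner with 
{{importlib}}. A hedged sketch (not the actual worker code):

```python
import importlib

def import_plugin(dotted_path):
    """Import a class from a 'package.module.Class' path, as a worker would
    for each beam_plugins entry. Sketch only -- not the actual worker code."""
    module_path, _, class_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# Resolve a stdlib class the same way a plugin entry would be resolved.
cls = import_plugin('collections.OrderedDict')
print(cls.__name__)  # OrderedDict
```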




[2/2] beam git commit: This closes #3514

2017-07-10 Thread jbonofre
This closes #3514


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7f2419f0
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7f2419f0
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7f2419f0

Branch: refs/heads/release-2.1.0
Commit: 7f2419f0963be14abdab8cffc4562036101fdbfd
Parents: de7652b 9ffaf18
Author: Jean-Baptiste Onofré 
Authored: Mon Jul 10 21:53:33 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Mon Jul 10 21:53:33 2017 +0200

--
 .../org/apache/beam/sdk/io/kafka/KafkaIO.java   | 49 
 1 file changed, 29 insertions(+), 20 deletions(-)
--




[GitHub] beam pull request #3514: [BEAM-2534] Cherry-pick #3461 into 2.1.0.

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3514


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-2534] Handle offset gaps in Kafka messages.

2017-07-10 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/release-2.1.0 de7652b2a -> 7f2419f09


[BEAM-2534] Handle offset gaps in Kafka messages.

KafkaIO logged a warning when there was a gap in offsets for messages. Kafka 
also supports 'KV store'-style topics (log compaction), where some messages 
are deleted, leading to gaps in offsets. This PR removes the log message and 
accounts for offset gaps in the backlog estimate.
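
A Python sketch of the averaging and backlog arithmetic introduced by this 
patch (the actual change is the Java code in KafkaIO.java below):

```python
import math

class MovingAvg(object):
    """Sketch of the MovingAvg helper from this patch: an incremental
    average over roughly the last WINDOW updates."""
    WINDOW = 1000

    def __init__(self):
        self.avg = 0.0
        self.num_updates = 0

    def update(self, quantity):
        self.num_updates += 1
        # Early on each sample is weighted 1/n; after WINDOW samples, 1/WINDOW.
        self.avg += (quantity - self.avg) / min(self.WINDOW, self.num_updates)

    def get(self):
        return self.avg


def backlog_message_count(latest_offset, next_offset, avg_offset_gap):
    """Backlog estimate that discounts offsets removed by log compaction,
    mirroring the new backlogMessageCount() arithmetic."""
    if latest_offset < 0 or next_offset < 0:
        return None  # unknown
    remaining = (latest_offset - next_offset) / (1.0 + avg_offset_gap)
    return max(0, int(math.ceil(remaining)))

# With no gaps the estimate is the raw offset difference; with an average
# gap of 1 per record, only about half the offsets are real messages.
print(backlog_message_count(100, 50, 0.0))  # 50
print(backlog_message_count(100, 50, 1.0))  # 25
```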


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9ffaf185
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9ffaf185
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9ffaf185

Branch: refs/heads/release-2.1.0
Commit: 9ffaf185ceefaf7de51b1197e00d2d54bdbb6760
Parents: 53b372b
Author: Raghu Angadi 
Authored: Wed Jun 28 12:07:06 2017 -0700
Committer: Raghu Angadi 
Committed: Thu Jul 6 22:22:13 2017 -0700

--
 .../org/apache/beam/sdk/io/kafka/KafkaIO.java   | 49 
 1 file changed, 29 insertions(+), 20 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/9ffaf185/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
--
diff --git 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
index 702bdd3..e520367 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
@@ -904,6 +904,22 @@ public class KafkaIO {
   return name;
 }
 
+// Maintains approximate average over last 1000 elements
+private static class MovingAvg {
+  private static final int MOVING_AVG_WINDOW = 1000;
+  private double avg = 0;
+  private long numUpdates = 0;
+
+  void update(double quantity) {
+numUpdates++;
+avg += (quantity - avg) / Math.min(MOVING_AVG_WINDOW, numUpdates);
+  }
+
+  double get() {
+return avg;
+  }
+}
+
 // maintains state of each assigned partition (buffered records, consumed 
offset, etc)
 private static class PartitionState {
   private final TopicPartition topicPartition;
@@ -911,9 +927,8 @@ public class KafkaIO {
   private long latestOffset;
   private Iterator> recordIter = 
Collections.emptyIterator();
 
-  // simple moving average for size of each record in bytes
-  private double avgRecordSize = 0;
-  private static final int movingAvgWindow = 1000; // very roughly avg of 
last 1000 elements
+  private MovingAvg avgRecordSize = new MovingAvg();
+  private MovingAvg avgOffsetGap = new MovingAvg(); // > 0 only when log 
compaction is enabled.
 
   PartitionState(TopicPartition partition, long nextOffset) {
 this.topicPartition = partition;
@@ -921,17 +936,13 @@ public class KafkaIO {
 this.latestOffset = UNINITIALIZED_OFFSET;
   }
 
-  // update consumedOffset and avgRecordSize
-  void recordConsumed(long offset, int size) {
+  // Update consumedOffset, avgRecordSize, and avgOffsetGap
+  void recordConsumed(long offset, int size, long offsetGap) {
 nextOffset = offset + 1;
 
-// this is always updated from single thread. probably not worth 
making it an AtomicDouble
-if (avgRecordSize <= 0) {
-  avgRecordSize = size;
-} else {
-  // initially, first record heavily contributes to average.
-  avgRecordSize += ((size - avgRecordSize) / movingAvgWindow);
-}
+// This is always updated from single thread. Probably not worth 
making atomic.
+avgRecordSize.update(size);
+avgOffsetGap.update(offsetGap);
   }
 
   synchronized void setLatestOffset(long latestOffset) {
@@ -944,14 +955,15 @@ public class KafkaIO {
 if (backlogMessageCount == UnboundedReader.BACKLOG_UNKNOWN) {
   return UnboundedReader.BACKLOG_UNKNOWN;
 }
-return (long) (backlogMessageCount * avgRecordSize);
+return (long) (backlogMessageCount * avgRecordSize.get());
   }
 
   synchronized long backlogMessageCount() {
 if (latestOffset < 0 || nextOffset < 0) {
   return UnboundedReader.BACKLOG_UNKNOWN;
 }
-return Math.max(0, (latestOffset - nextOffset));
+double remaining = (latestOffset - nextOffset) / (1 + 
avgOffsetGap.get());
+return Math.max(0, (long) Math.ceil(remaining));
   }
 }
 
@@ -1154,14 +1166,11 @@ public class KafkaIO {
 continue;
   }
 
-  // sanity check
-  if (offset != expected) {
-LOG.warn("{}: gap in offsets for {} at {}. {} records missing.",
-this, pState.topicPartition, expected, offset - expected);
-

[jira] [Resolved] (BEAM-2534) KafkaIO should allow gaps in message offsets

2017-07-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2534.

Resolution: Fixed

> KafkaIO should allow gaps in message offsets
> 
>
> Key: BEAM-2534
> URL: https://issues.apache.org/jira/browse/BEAM-2534
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Minor
> Fix For: 2.1.0
>
>
> KafkaIO reader logs a warning when it notices gaps in offsets for messages. 
> While such gaps are not expected for normal Kafka topics, there can be gaps 
> when log compaction is enabled (which deletes older messages for a key). 
> This warning log is not very useful. Also, we should take such gaps into 
> account while estimating the backlog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[1/2] beam git commit: Add timeout to initialization of partition in KafkaIO

2017-07-10 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/release-2.1.0 7f2419f09 -> 2351c7e33


Add timeout to initialization of partition in KafkaIO


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/38fc2b2e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/38fc2b2e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/38fc2b2e

Branch: refs/heads/release-2.1.0
Commit: 38fc2b2e76fd13c3edd304ffcb5378fb36ab48c0
Parents: 53b372b
Author: Raghu Angadi 
Authored: Mon Jul 3 23:54:10 2017 -0700
Committer: Raghu Angadi 
Committed: Thu Jul 6 22:02:40 2017 -0700

--
 .../org/apache/beam/sdk/io/kafka/KafkaIO.java   | 81 +++-
 .../apache/beam/sdk/io/kafka/KafkaIOTest.java   | 30 
 2 files changed, 92 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/38fc2b2e/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
--
diff --git 
a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java 
b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
index 702bdd3..28262c9 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
@@ -49,9 +49,11 @@ import java.util.Random;
 import java.util.Set;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.SynchronousQueue;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
 import java.util.concurrent.atomic.AtomicBoolean;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.annotations.Experimental;
@@ -1049,8 +1051,32 @@ public class KafkaIO {
   curBatch = Iterators.cycle(nonEmpty);
 }
 
+private void setupInitialOffset(PartitionState pState) {
+  Read spec = source.spec;
+
+  if (pState.nextOffset != UNINITIALIZED_OFFSET) {
+consumer.seek(pState.topicPartition, pState.nextOffset);
+  } else {
+// nextOffset is unininitialized here, meaning start reading from 
latest record as of now
+// ('latest' is the default, and is configurable) or 'look up offset 
by startReadTime.
+// Remember the current position without waiting until the first 
record is read. This
+// ensures checkpoint is accurate even if the reader is closed before 
reading any records.
+Instant startReadTime = spec.getStartReadTime();
+if (startReadTime != null) {
+  pState.nextOffset =
+  consumerSpEL.offsetForTime(consumer, pState.topicPartition, 
spec.getStartReadTime());
+  consumer.seek(pState.topicPartition, pState.nextOffset);
+} else {
+  pState.nextOffset = consumer.position(pState.topicPartition);
+}
+  }
+}
+
 @Override
 public boolean start() throws IOException {
+  final int defaultPartitionInitTimeout = 60 * 1000;
+  final int kafkaRequestTimeoutMultiple = 2;
+
   Read spec = source.spec;
   consumer = spec.getConsumerFactoryFn().apply(spec.getConsumerConfig());
   consumerSpEL.evaluateAssign(consumer, spec.getTopicPartitions());
@@ -1065,25 +1091,38 @@ public class KafkaIO {
   keyDeserializerInstance.configure(spec.getConsumerConfig(), true);
   valueDeserializerInstance.configure(spec.getConsumerConfig(), false);
 
-  for (PartitionState p : partitionStates) {
-    if (p.nextOffset != UNINITIALIZED_OFFSET) {
-      consumer.seek(p.topicPartition, p.nextOffset);
-    } else {
-      // nextOffset is uninitialized here, meaning start reading from latest record as of now
-      // ('latest' is the default, and is configurable) or 'look up offset by startReadTime.
-      // Remember the current position without waiting until the first record is read. This
-      // ensures checkpoint is accurate even if the reader is closed before reading any records.
-      Instant startReadTime = spec.getStartReadTime();
-      if (startReadTime != null) {
-        p.nextOffset =
-            consumerSpEL.offsetForTime(consumer, p.topicPartition, spec.getStartReadTime());
-        consumer.seek(p.topicPartition, p.nextOffset);
-      } else {
-        p.nextOffset = consumer.position(p.topicPartition);
+  // Seek to start offset for each partition. This is the first interaction with the server.
+  // Unfortunately it can block forever in case of network issues like incorrect ACLs.
+  // Initialize partition in a separate thread and cancel it if it takes longer than a minute.
+  for (
[jira] [Commented] (BEAM-2551) KafkaIO reader blocks indefinitely if servers are not reachable

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080995#comment-16080995
 ] 

ASF GitHub Bot commented on BEAM-2551:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3513


> KafkaIO reader blocks indefinitely if servers are not reachable
> ---
>
> Key: BEAM-2551
> URL: https://issues.apache.org/jira/browse/BEAM-2551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 2.1.0
>
>
> If the KafkaIO source reader on the worker can't reach the server, Kafka 
> consumer blocks forever inside {{UnboundedReader.start()}}. Users have no 
> indication of the error. It is better if start() fails with an error. 
> It is easy to reproduce in Kafka. I reported it on Kafka users list here : 
> https://lists.apache.org/thread.html/98cebefacbd65b0d6c6817fe0b5197e26bc60252e72d05fced91e628@%3Cusers.kafka.apache.org%3E
> It blocks inside Kafka client. Fortunately it can be unblocked with 
> KafkaConsumer.wakeup(). We could run initialization in another thread and 
> cancel it if it takes longer than a minute.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
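The fix described in BEAM-2551 above — run the blocking Kafka initialization in a separate thread and cancel it after a timeout — follows a standard JDK pattern. The sketch below is a minimal pure-JDK illustration of that pattern, not the actual KafkaIO code: `initWithTimeout` is an illustrative name, and a real reader would call `KafkaConsumer.wakeup()` in addition to interrupting the thread.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedInit {
  /** Runs a potentially-blocking initialization, cancelling it after timeoutMs. */
  static <T> T initWithTimeout(Callable<T> init, long timeoutMs) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<T> future = executor.submit(init);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Interrupt the stuck thread; KafkaIO would also call consumer.wakeup() here,
      // since the Kafka client does not always respond to interruption alone.
      future.cancel(true);
      throw new IOException("initialization timed out after " + timeoutMs + " ms", e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // A fast "connection" completes normally.
    System.out.println(initWithTimeout(() -> "connected", 1000));
    // A hung "connection" is cancelled instead of blocking start() forever.
    try {
      initWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 100);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape, `start()` fails with a clear error instead of hanging, which is exactly the behavior change the issue asks for.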


[2/2] beam git commit: This closes #3513

2017-07-10 Thread jbonofre
This closes #3513


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2351c7e3
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2351c7e3
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2351c7e3

Branch: refs/heads/release-2.1.0
Commit: 2351c7e33d2949392d3db3b1bf5a403b504824fa
Parents: 7f2419f 38fc2b2
Author: Jean-Baptiste Onofré 
Authored: Mon Jul 10 22:00:31 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Mon Jul 10 22:00:31 2017 +0200

--
 .../org/apache/beam/sdk/io/kafka/KafkaIO.java   | 81 +++-
 .../apache/beam/sdk/io/kafka/KafkaIOTest.java   | 30 
 2 files changed, 92 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/2351c7e3/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
--



[GitHub] beam pull request #3513: [BEAM-2551] Cherrypick #3492 to 2.1.0

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3513


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080997#comment-16080997
 ] 

Dmitry Demeshchuk commented on BEAM-2573:
-

[~sb2nov] Thanks for the explanation! I'll wait till the 2.1.0 release, then 
(or maybe release candidates would be good enough too).

By the way, am I right to assume that full Dataflow compatibility is only 
guaranteed for tagged Beam versions, then?

> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 382, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:10156)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 390, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10458)
> self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 431, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented 
> (apache_beam/runners/common.c:11673)
> raise new_exn, None, original_traceback
>   File "apache_beam/runners/common.py", line 388, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10371)
> self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 281, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> (apache_beam/runners/common.c:8626)
> self._invoke_per_window(windowed_value)
>   File "apache_beam/runners/common.py", line 307, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_per_window 
> (apache_beam/runners/common.c:9302)
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/transforms/core.py",
>  line 749, in 
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/io/iobase.py",
>  line 891, in 
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/options/value_provider.py",
>  line 109, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", 
> line 146, in initialize_write
> tmp_dir = self._create_temp_dir(file_path_prefix)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", 
> line 151, in _create_temp_dir
> base_path, last_component = FileSystems.split(file_path_prefix)
>   File 
> "/u

[2/3] beam git commit: Ignore processing time timers in expired windows

2017-07-10 Thread jbonofre
Ignore processing time timers in expired windows


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3d7b0098
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3d7b0098
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3d7b0098

Branch: refs/heads/release-2.1.0
Commit: 3d7b00983d2ef215639c3fefc3d8df487aac7b2e
Parents: 53b372b
Author: Kenneth Knowles 
Authored: Thu Jun 22 18:09:11 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jul 6 14:38:58 2017 -0700

--
 .../beam/runners/core/ReduceFnRunner.java   | 10 ++
 .../beam/runners/core/ReduceFnRunnerTest.java   | 32 
 2 files changed, 42 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/3d7b0098/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
index ef33bef..0632c05 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
@@ -693,6 +693,11 @@ public class ReduceFnRunner {
   @SuppressWarnings("unchecked")
    WindowNamespace<W> windowNamespace = (WindowNamespace<W>) timer.getNamespace();
   W window = windowNamespace.getWindow();
+
+      if (TimeDomain.PROCESSING_TIME == timer.getDomain() && windowIsExpired(window)) {
+        continue;
+      }
+
       ReduceFn<K, InputT, OutputT, W>.Context directContext =
           contextFactory.base(window, StateStyle.DIRECT);
       ReduceFn<K, InputT, OutputT, W>.Context renamedContext =
@@ -1090,4 +1095,9 @@ public class ReduceFnRunner {
 }
   }
 
+  private boolean windowIsExpired(BoundedWindow w) {
+    return timerInternals
+        .currentInputWatermarkTime()
+        .isAfter(w.maxTimestamp().plus(windowingStrategy.getAllowedLateness()));
+  }
 }
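The `windowIsExpired` predicate added above reduces to a single timestamp comparison: a window is expired once the input watermark is strictly past the window's maximum timestamp plus the allowed lateness. A standalone sketch of that check, using plain epoch-millis longs instead of Beam's `Instant`/`Duration` types (names are illustrative, not Beam API):

```java
public class WindowExpiry {
  /**
   * Mirrors windowIsExpired(): true once the input watermark is strictly
   * after maxTimestamp + allowedLateness, i.e. past the garbage-collection horizon.
   */
  static boolean windowIsExpired(
      long inputWatermarkMs, long windowMaxTimestampMs, long allowedLatenessMs) {
    return inputWatermarkMs > windowMaxTimestampMs + allowedLatenessMs;
  }

  public static void main(String[] args) {
    // A fixed window covering [0, 100) has maxTimestamp 99.
    System.out.println(windowIsExpired(100, 99, 0)); // watermark strictly past 99: expired
    System.out.println(windowIsExpired(99, 99, 0));  // not strictly after: still live
    // With 50 ms allowed lateness the window survives until the watermark passes 149.
    System.out.println(windowIsExpired(120, 99, 50));
  }
}
```

This is why the new check ignores processing-time timers for such windows: any processing-time firing past this point would emit output for a window whose final pane may already have been produced (BEAM-2502).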

http://git-wip-us.apache.org/repos/asf/beam/blob/3d7b0098/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
--
diff --git 
a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
 
b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
index 3a2c220..79ee91b 100644
--- 
a/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
+++ 
b/runners/core-java/src/test/java/org/apache/beam/runners/core/ReduceFnRunnerTest.java
@@ -286,6 +286,38 @@ public class ReduceFnRunnerTest {
 
   /**
* Tests that when a processing time timer comes in after a window is expired
+   * it is just ignored.
+   */
+  @Test
+  public void testLateProcessingTimeTimer() throws Exception {
+    WindowingStrategy<?, IntervalWindow> strategy =
+        WindowingStrategy.of((WindowFn<?, IntervalWindow>) FixedWindows.of(Duration.millis(100)))
+            .withTimestampCombiner(TimestampCombiner.EARLIEST)
+            .withMode(AccumulationMode.ACCUMULATING_FIRED_PANES)
+            .withAllowedLateness(Duration.ZERO)
+            .withTrigger(
+                Repeatedly.forever(
+                    AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.millis(10))));
+
+    ReduceFnTester<Integer, Integer, IntervalWindow> tester =
+        ReduceFnTester.combining(strategy, Sum.ofIntegers(), VarIntCoder.of());
+
+    tester.advanceProcessingTime(new Instant(5000));
+    injectElement(tester, 2); // processing timer @ 5000 + 10; EOW timer @ 100
+    injectElement(tester, 5);
+
+    // After this advancement, the window is expired and only the GC process
+    // should be allowed to touch it
+    tester.advanceInputWatermarkNoTimers(new Instant(100));
+
+    // This should not output
+    tester.advanceProcessingTime(new Instant(6000));
+
+    assertThat(tester.extractOutput(), emptyIterable());
+  }
+
+  /**
+   * Tests that when a processing time timer comes in after a window is expired
* but in the same bundle it does not cause a spurious output.
*/
   @Test



[1/3] beam git commit: Process timer firings for a window together

2017-07-10 Thread jbonofre
Repository: beam
Updated Branches:
  refs/heads/release-2.1.0 2351c7e33 -> 5c4ea884d


Process timer firings for a window together


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/b1e53d6d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/b1e53d6d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/b1e53d6d

Branch: refs/heads/release-2.1.0
Commit: b1e53d6d87a14e3116f996d44b14540b6b7bfa59
Parents: 3d7b009
Author: Kenneth Knowles 
Authored: Thu Jun 22 18:43:39 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jul 6 14:38:58 2017 -0700

--
 .../examples/complete/game/LeaderBoardTest.java |  2 +
 .../beam/runners/core/ReduceFnRunner.java   | 98 +---
 .../beam/runners/core/ReduceFnRunnerTest.java   | 49 +-
 3 files changed, 115 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/b1e53d6d/examples/java8/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java
--
diff --git 
a/examples/java8/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java
 
b/examples/java8/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java
index 745c210..611e2b3 100644
--- 
a/examples/java8/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java
+++ 
b/examples/java8/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardTest.java
@@ -276,6 +276,8 @@ public class LeaderBoardTest implements Serializable {
 .addElements(event(TestUser.RED_ONE, 4, Duration.standardMinutes(2)),
 event(TestUser.BLUE_TWO, 3, Duration.ZERO),
 event(TestUser.BLUE_ONE, 3, Duration.standardMinutes(3)))
+        // Move the watermark to the end of the window to output on time
+        .advanceWatermarkTo(baseTime.plus(TEAM_WINDOW_DURATION))
         // Move the watermark past the end of the allowed lateness plus the end of the window
         .advanceWatermarkTo(baseTime.plus(ALLOWED_LATENESS)
             .plus(TEAM_WINDOW_DURATION).plus(Duration.standardMinutes(1)))

http://git-wip-us.apache.org/repos/asf/beam/blob/b1e53d6d/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
--
diff --git 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
index 0632c05..634a2d1 100644
--- 
a/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
+++ 
b/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java
@@ -29,7 +29,6 @@ import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.HashSet;
-import java.util.LinkedList;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
@@ -638,11 +637,9 @@ public class ReduceFnRunner {
   }
 
   /**
-   * Enriches TimerData with state necessary for processing a timer as well as
-   * common queries about a timer.
+   * A descriptor of the activation for a window based on a timer.
*/
-  private class EnrichedTimerData {
-    public final Instant timestamp;
+  private class WindowActivation {
     public final ReduceFn<K, InputT, OutputT, W>.Context directContext;
     public final ReduceFn<K, InputT, OutputT, W>.Context renamedContext;
     // If this is an end-of-window timer then we may need to set a garbage collection timer
@@ -653,19 +650,34 @@ public class ReduceFnRunner {
 // end-of-window time to be a signal to garbage collect.
 public final boolean isGarbageCollection;
 
-    EnrichedTimerData(
-        TimerData timer,
+    WindowActivation(
         ReduceFn<K, InputT, OutputT, W>.Context directContext,
         ReduceFn<K, InputT, OutputT, W>.Context renamedContext) {
-      this.timestamp = timer.getTimestamp();
       this.directContext = directContext;
       this.renamedContext = renamedContext;
       W window = directContext.window();
-      this.isEndOfWindow = TimeDomain.EVENT_TIME == timer.getDomain()
-          && timer.getTimestamp().equals(window.maxTimestamp());
-      Instant cleanupTime = LateDataUtils.garbageCollectionTime(window, windowingStrategy);
+
+      // The output watermark is before the end of the window if it is either unknown
+      // or it is known to be before it. If it is unknown, that means that there hasn't been
+      // enough data to advance it.
+      boolean outputWatermarkBeforeEOW =
+          timerInternals.currentOutputWatermarkTime() == null
+              || !timerInternals.currentOutputWatermarkTime().isAfter(window.maxTimestamp());
+
+      // The "end of the window" is reached when the local input watermark (for this key) surpasses
+      // it but the local output watermark (al

[3/3] beam git commit: This closes #3511

2017-07-10 Thread jbonofre
This closes #3511


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5c4ea884
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5c4ea884
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5c4ea884

Branch: refs/heads/release-2.1.0
Commit: 5c4ea884dcbb87cacba5ac67de3e4162ed2ffeea
Parents: 2351c7e b1e53d6
Author: Jean-Baptiste Onofré 
Authored: Mon Jul 10 22:04:55 2017 +0200
Committer: Jean-Baptiste Onofré 
Committed: Mon Jul 10 22:04:55 2017 +0200

--
 .../examples/complete/game/LeaderBoardTest.java |   2 +
 .../beam/runners/core/ReduceFnRunner.java   | 106 +--
 .../beam/runners/core/ReduceFnRunnerTest.java   |  81 +-
 3 files changed, 156 insertions(+), 33 deletions(-)
--




[GitHub] beam pull request #3511: [BEAM-2505, BEAM-2502] Fixes to ReduceFnRunner.onTi...

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3511




[jira] [Resolved] (BEAM-2551) KafkaIO reader blocks indefinitely if servers are not reachable

2017-07-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2551.

Resolution: Fixed

> KafkaIO reader blocks indefinitely if servers are not reachable
> ---
>
> Key: BEAM-2551
> URL: https://issues.apache.org/jira/browse/BEAM-2551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 2.1.0
>
>
> If the KafkaIO source reader on the worker can't reach the server, Kafka 
> consumer blocks forever inside {{UnboundedReader.start()}}. Users have no 
> indication of the error. It is better if start() fails with an error. 
> It is easy to reproduce in Kafka. I reported it on Kafka users list here : 
> https://lists.apache.org/thread.html/98cebefacbd65b0d6c6817fe0b5197e26bc60252e72d05fced91e628@%3Cusers.kafka.apache.org%3E
> It blocks inside Kafka client. Fortunately it can be unblocked with 
> KafkaConsumer.wakeup(). We could run initialization in another thread and 
> cancel it if it takes longer than a minute.





[jira] [Resolved] (BEAM-2502) Processing time timers for expired windows are not ignored

2017-07-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2502.

Resolution: Fixed

> Processing time timers for expired windows are not ignored
> --
>
> Key: BEAM-2502
> URL: https://issues.apache.org/jira/browse/BEAM-2502
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
> Fix For: 2.1.0
>
>
> If the ReduceFnRunner receives a processing time timer for an expired window, 
> it may produce output even though the window is expired (and may have already 
> sent a final output!)





[jira] [Resolved] (BEAM-2505) When EOW != GC and the timers fire in together, the output is not marked as the final pane

2017-07-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-2505.

Resolution: Fixed

> When EOW != GC and the timers fire in together, the output is not marked as 
> the final pane
> --
>
> Key: BEAM-2505
> URL: https://issues.apache.org/jira/browse/BEAM-2505
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>






[jira] [Commented] (BEAM-2505) When EOW != GC and the timers fire in together, the output is not marked as the final pane

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081002#comment-16081002
 ] 

ASF GitHub Bot commented on BEAM-2505:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3511


> When EOW != GC and the timers fire in together, the output is not marked as 
> the final pane
> --
>
> Key: BEAM-2505
> URL: https://issues.apache.org/jira/browse/BEAM-2505
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
> Fix For: 2.1.0
>
>






[jira] [Commented] (BEAM-2530) Make Beam compatible with Java 9

2017-07-10 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081108#comment-16081108
 ] 

Kenneth Knowles commented on BEAM-2530:
---

This is still on 2.1.0 burndown. Since it is a meta-issue, is there anything 
actually blocking 2.1.0 that can be cherry picked?

> Make Beam compatible with Java 9
> 
>
> Key: BEAM-2530
> URL: https://issues.apache.org/jira/browse/BEAM-2530
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.1.0
>
>
> Java 9 seems to be finally been released this year, this is a JIRA to keep 
> track of the needed changes to support Beam on Java 9.





[GitHub] beam pull request #3124: Serialize all Primitive Transforms in the DirectRun...

2017-07-10 Thread tgroh
Github user tgroh closed the pull request at:

https://github.com/apache/beam/pull/3124




[jira] [Commented] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-10 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081106#comment-16081106
 ] 

Kenneth Knowles commented on BEAM-2571:
---

Since this is longstanding, and other tests of side inputs seem OK, I don't 
feel the same urgency to block 2.1.0 on it. But I am a little surprised we got 
this far. Perhaps it is just because that one test is tighter than others?

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[jira] [Comment Edited] (BEAM-2571) Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext

2017-07-10 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081106#comment-16081106
 ] 

Kenneth Knowles edited comment on BEAM-2571 at 7/10/17 8:40 PM:


Since this is longstanding, and other tests of side inputs seem OK, I don't 
feel the same urgency to block 2.1.0 on it because of this failure. We might 
consider blocking any release based on the danger here...

But I am a little surprised we got this far. Perhaps it is just because that 
one test is tighter than others?


was (Author: kenn):
Since this is longstanding, and other tests of side inputs seem OK, I don't 
feel the same urgency to block 2.1.0 on it. But I am a little surprised we got 
this far. Perhaps it is just because that one test is tighter than others?

> Flink ValidatesRunner failing CombineTest.testSlidingWindowsCombineWithContext
> --
>
> Key: BEAM-2571
> URL: https://issues.apache.org/jira/browse/BEAM-2571
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: 2.1.0
>
>
> This appears to have been caused by https://github.com/apache/beam/pull/3429 
> which fixes a couple errors in how trigger timers were processed / final 
> panes labeled.
> I am investigating, considering roll back vs forward fix. Since it is an 
> esoteric use case where I would advise users to use a stateful DoFn instead, 
> I think the bug fixed probably outweighs the bug introduced. I would like to 
> fix for 2.1.0 but will report back soon.





[GitHub] beam pull request #3049: Update signature for Deterministic Helper

2017-07-10 Thread tgroh
Github user tgroh closed the pull request at:

https://github.com/apache/beam/pull/3049




[GitHub] beam pull request #3402: Add a SlidingWindows Test for Combine without Conte...

2017-07-10 Thread tgroh
Github user tgroh closed the pull request at:

https://github.com/apache/beam/pull/3402




[jira] [Created] (BEAM-2579) Improve component determinism methods in StandardCoder

2017-07-10 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2579:
-

 Summary: Improve component determinism methods in StandardCoder
 Key: BEAM-2579
 URL: https://issues.apache.org/jira/browse/BEAM-2579
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh
Priority: Trivial


Effectively, add the methods that are introduced in 
https://github.com/apache/beam/pull/3049, while retaining (but deprecating) the 
static verifyDeterministic methods.





[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081145#comment-16081145
 ] 

Sourabh Bajaj commented on BEAM-2573:
-

I think the worker is specified in 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L81
 so it is related to the version of the sdk being used.

I might be wrong about needing the release if you're on master as I can run  
the following command and see logs about failing to import the dataflow package 
on head and success for the builtin plugins.

{code:shell}
python -m apache_beam.examples.wordcount --output $BUCKET/wc/output --project 
$PROJECT --staging_location $BUCKET/wc/staging --temp_location $BUCKET/wc/tmp  
--job_name "sourabhbajaj-wc-4" --runner DataflowRunner --sdk_location 
dist/apache-beam-2.1.0.dev0.tar.gz  --beam_plugin dataflow.s3.aws.S3FS
{code}


> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 382, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:10156)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 390, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10458)
> self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 431, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented 
> (apache_beam/runners/common.c:11673)
> raise new_exn, None, original_traceback
>   File "apache_beam/runners/common.py", line 388, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10371)
> self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 281, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> (apache_beam/runners/common.c:8626)
> self._invoke_per_window(windowed_value)
>   File "apache_beam/runners/common.py", line 307, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_per_window 
> (apache_beam/runners/common.c:9302)
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/transforms/core.py",
>  line 749, in 
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/io/iobase.py",
>  line 891, in 
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_b

[jira] [Comment Edited] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081145#comment-16081145
 ] 

Sourabh Bajaj edited comment on BEAM-2573 at 7/10/17 9:03 PM:
--

I think the worker is specified in 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L81
 so it is related to the version of the sdk being used.

I might be wrong about needing the release if you're on master: I can run the 
following command on head and see logs showing the dataflow package failing to 
import while the built-in plugins import successfully.

{code:none}
python -m apache_beam.examples.wordcount --output $BUCKET/wc/output --project 
$PROJECT --staging_location $BUCKET/wc/staging --temp_location $BUCKET/wc/tmp  
--job_name "sourabhbajaj-wc-4" --runner DataflowRunner --sdk_location 
dist/apache-beam-2.1.0.dev0.tar.gz  --beam_plugin dataflow.s3.aws.S3FS
{code}


{code:none}
13:54:41.574
Failed to import beam plugin dataflow.s3.aws.S3FS
13:54:41.573
Successfully imported beam plugin apache_beam.io.filesystem.FileSystem
13:54:41.573
Successfully imported beam plugin apache_beam.io.localfilesystem.LocalFileSystem
13:54:41.573
Successfully imported beam plugin apache_beam.io.gcp.gcsfilesystem.GCSFileSystem
{code}
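The success/failure pattern in these logs comes from the worker attempting to import each plugin by its dotted path at startup. A minimal sketch of such an import loop, using only the standard library (the function name and example paths are illustrative, not Beam's actual implementation):

```python
import importlib


def import_beam_plugins(plugin_paths):
    # Try each dotted path as "package.module.Attribute", falling back
    # to a plain module import. Returns path -> True/False, mirroring
    # the "Successfully imported" / "Failed to import" log lines.
    results = {}
    for path in plugin_paths:
        module_name, _, attr = path.rpartition(".")
        try:
            module = importlib.import_module(module_name or path)
            if module_name and attr:
                getattr(module, attr)  # e.g. a FileSystem subclass
            results[path] = True
        except (ImportError, AttributeError):
            results[path] = False
    return results


# A stdlib path imports fine; a package absent from the worker
# (like the hypothetical dataflow.s3.aws.S3FS above) fails.
print(import_beam_plugins(["json.JSONDecoder", "dataflow.s3.aws.S3FS"]))
```

This illustrates why the plugin must be installed on the worker: the import is attempted by name at runtime, so a package that only exists on the submitting machine can never be found.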




was (Author: sb2nov):
I think the worker is specified in 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L81
 so it is related to the version of the sdk being used.

I might be wrong about needing the release if you're on master as I can run  
the following command and see logs about failing to import the dataflow package 
on head and success for the builtin plugins.

{code:none}
python -m apache_beam.examples.wordcount --output $BUCKET/wc/output --project 
$PROJECT --staging_location $BUCKET/wc/staging --temp_location $BUCKET/wc/tmp  
--job_name "sourabhbajaj-wc-4" --runner DataflowRunner --sdk_location 
dist/apache-beam-2.1.0.dev0.tar.gz  --beam_plugin dataflow.s3.aws.S3FS
{code}


> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 382, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:10156)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 390, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10458)
> 

[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081160#comment-16081160
 ] 

Ahmet Altay commented on BEAM-2573:
---

[~demeshchuk], I believe you are mixing two SDK versions. The SDK version 
installed in your virtual environment should exactly match the SDK version you 
pass with the --sdk_location flag.

You have two options:
1. Use the released version: pip install the latest version and do not set the 
--sdk_location flag. You will not have the new features that will become 
available in the next version.
2. Use the head version: build an SDK from head, install it in a fresh 
virtual environment, and also point the --sdk_location flag at that SDK. You 
will have all the new features available at head.
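The key constraint in option 2 is that the tarball passed via --sdk_location must carry the same version string as the apache-beam package installed in the virtualenv. A small sketch of that sanity check; the filename pattern is an assumption based on the dist/apache-beam-2.1.0.dev0.tar.gz example and is not part of the SDK:

```python
import os
import re


def sdk_versions_match(installed_version, sdk_location):
    # Extract the version from a tarball name such as
    # apache-beam-2.1.0.dev0.tar.gz and compare it to the version
    # reported by the apache-beam package installed in the virtualenv.
    m = re.search(r"apache-beam-(.+)\.tar\.gz$", os.path.basename(sdk_location))
    return m is not None and m.group(1) == installed_version


# A matching head build:
print(sdk_versions_match("2.1.0.dev0", "dist/apache-beam-2.1.0.dev0.tar.gz"))
# Mixed versions (the situation described above):
print(sdk_versions_match("2.0.0", "dist/apache-beam-2.1.0.dev0.tar.gz"))
```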

> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 382, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:10156)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 390, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10458)
> self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 431, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented 
> (apache_beam/runners/common.c:11673)
> raise new_exn, None, original_traceback
>   File "apache_beam/runners/common.py", line 388, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10371)
> self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 281, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> (apache_beam/runners/common.c:8626)
> self._invoke_per_window(windowed_value)
>   File "apache_beam/runners/common.py", line 307, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_per_window 
> (apache_beam/runners/common.c:9302)
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/transforms/core.py",
>  line 749, in 
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/io/iobase.py",
>  line 891, in 
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/options/value_provider.py",
>  line 109, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-package

[jira] [Comment Edited] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Sourabh Bajaj (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081145#comment-16081145
 ] 

Sourabh Bajaj edited comment on BEAM-2573 at 7/10/17 9:03 PM:
--

I think the worker is specified in 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L81
 so it is related to the version of the sdk being used.

You don't need the release if you're on master: I can run the following 
command on head and see logs showing the dataflow package failing to import 
while the built-in plugins import successfully.

{code:none}
python -m apache_beam.examples.wordcount --output $BUCKET/wc/output --project 
$PROJECT --staging_location $BUCKET/wc/staging --temp_location $BUCKET/wc/tmp  
--job_name "sourabhbajaj-wc-4" --runner DataflowRunner --sdk_location 
dist/apache-beam-2.1.0.dev0.tar.gz  --beam_plugin dataflow.s3.aws.S3FS
{code}


{code:none}
13:54:41.574
Failed to import beam plugin dataflow.s3.aws.S3FS
13:54:41.573
Successfully imported beam plugin apache_beam.io.filesystem.FileSystem
13:54:41.573
Successfully imported beam plugin apache_beam.io.localfilesystem.LocalFileSystem
13:54:41.573
Successfully imported beam plugin apache_beam.io.gcp.gcsfilesystem.GCSFileSystem
{code}




was (Author: sb2nov):
I think the worker is specified in 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/dependency.py#L81
 so it is related to the version of the sdk being used.

I might be wrong about needing the release if you're on master as I can run  
the following command and see logs about failing to import the dataflow package 
on head and success for the builtin plugins.

{code:none}
python -m apache_beam.examples.wordcount --output $BUCKET/wc/output --project 
$PROJECT --staging_location $BUCKET/wc/staging --temp_location $BUCKET/wc/tmp  
--job_name "sourabhbajaj-wc-4" --runner DataflowRunner --sdk_location 
dist/apache-beam-2.1.0.dev0.tar.gz  --beam_plugin dataflow.s3.aws.S3FS
{code}


{code:none}
13:54:41.574
Failed to import beam plugin dataflow.s3.aws.S3FS
13:54:41.573
Successfully imported beam plugin apache_beam.io.filesystem.FileSystem
13:54:41.573
Successfully imported beam plugin apache_beam.io.localfilesystem.LocalFileSystem
13:54:41.573
Successfully imported beam plugin apache_beam.io.gcp.gcsfilesystem.GCSFileSystem
{code}



> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.

[jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-10 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081179#comment-16081179
 ] 

Dmitry Demeshchuk commented on BEAM-2573:
-

[~altay]: I was doing exactly option 2, yes.

My guess is that there are some cached dependency issues; I've stumbled upon 
cached dependencies before. I'll try cleaning the virtualenv, reinstalling 
everything from scratch, and seeing how it goes.

> Better filesystem discovery mechanism in Python SDK
> ---
>
> Key: BEAM-2573
> URL: https://issues.apache.org/jira/browse/BEAM-2573
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-py
>Affects Versions: 2.0.0
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>
> It looks like right now custom filesystem classes have to be imported 
> explicitly: 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filesystems.py#L30
> Seems like the current implementation doesn't allow discovering filesystems 
> that come from side packages, not from apache_beam itself. Even if I put a 
> custom FileSystem-inheriting class into a package and explicitly import it in 
> the root __init__.py of that package, it still doesn't make the class 
> discoverable.
> The problems I'm experiencing happen on Dataflow runner, while Direct runner 
> works just fine. Here's an example of Dataflow output:
> {code}
>   (320418708fe777d7): Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 581, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 166, in execute
> op.start()
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py",
>  line 54, in start
> self.output(windowed_value)
>   File "dataflow_worker/operations.py", line 138, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5768)
> def output(self, windowed_value, output_index=0):
>   File "dataflow_worker/operations.py", line 139, in 
> dataflow_worker.operations.Operation.output 
> (dataflow_worker/operations.c:5654)
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "dataflow_worker/operations.py", line 72, in 
> dataflow_worker.operations.ConsumerSet.receive 
> (dataflow_worker/operations.c:3506)
> cython.cast(Operation, consumer).process(windowed_value)
>   File "dataflow_worker/operations.py", line 328, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:11162)
> with self.scoped_process_state:
>   File "dataflow_worker/operations.py", line 329, in 
> dataflow_worker.operations.DoOperation.process 
> (dataflow_worker/operations.c:6)
> self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 382, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:10156)
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 390, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10458)
> self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 431, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented 
> (apache_beam/runners/common.c:11673)
> raise new_exn, None, original_traceback
>   File "apache_beam/runners/common.py", line 388, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:10371)
> self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 281, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> (apache_beam/runners/common.c:8626)
> self._invoke_per_window(windowed_value)
>   File "apache_beam/runners/common.py", line 307, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_per_window 
> (apache_beam/runners/common.c:9302)
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/transforms/core.py",
>  line 749, in 
>   File 
> "/Users/dmitrydemeshchuk/miniconda2/lib/python2.7/site-packages/apache_beam/io/iobase.py",
>  line 891, in 
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/options/value_provider.py",
>  line 109, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", 
> line 146, in initialize_write
> tmp_dir = self._create_temp_dir(file_path_prefix)
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", 
> line 151, in _create_temp_dir
> base_path, last_component = FileSystems.split(file_path_prefix)
>   File 
> "/usr/local/

[jira] [Commented] (BEAM-2530) Make Beam compatible with Java 9

2017-07-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081224#comment-16081224
 ] 

Ismaël Mejía commented on BEAM-2530:


No, it was marked 2.1.0 by some confusion; I just changed it. All the current 
changes are already in master, but there are upstream fixes (maven-enforcer + 
auto) that we also need first, so this is not blocking at all.

> Make Beam compatible with Java 9
> 
>
> Key: BEAM-2530
> URL: https://issues.apache.org/jira/browse/BEAM-2530
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> Java 9 finally seems set to be released this year; this is a JIRA to keep 
> track of the changes needed to support Beam on Java 9.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2530) Make Beam compatible with Java 9

2017-07-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2530:
---
Fix Version/s: (was: 2.1.0)
   Not applicable

> Make Beam compatible with Java 9
> 
>
> Key: BEAM-2530
> URL: https://issues.apache.org/jira/browse/BEAM-2530
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>
> Java 9 finally seems set to be released this year; this is a JIRA to keep 
> track of the changes needed to support Beam on Java 9.





[jira] [Commented] (BEAM-2534) KafkaIO should allow gaps in message offsets

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080984#comment-16080984
 ] 

ASF GitHub Bot commented on BEAM-2534:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3514


> KafkaIO should allow gaps in message offsets
> 
>
> Key: BEAM-2534
> URL: https://issues.apache.org/jira/browse/BEAM-2534
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Minor
> Fix For: 2.1.0
>
>
> KafkaIO's reader logs a warning when it notices gaps in the offsets of the 
> messages it reads. While such gaps are not expected in normal Kafka topics, 
> they can occur when log compaction is enabled (which deletes older messages 
> for a key). This warning log is not very useful. We should also take such 
> gaps into account when estimating the backlog.
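One way to account for offset gaps when estimating backlog is to scale the raw offset range by the observed message density rather than assuming one message per offset. A rough sketch of that idea; the function name and formula are illustrative, not KafkaIO's actual code:

```python
def estimate_backlog(latest_offset, consumed_offset, consumed_count):
    # With log compaction, offsets can be sparse: latest - consumed
    # overcounts the remaining messages. Scale the remaining offset
    # range by the density observed so far (messages per offset).
    remaining = latest_offset - consumed_offset
    if remaining <= 0:
        return 0
    if consumed_offset <= 0 or consumed_count <= 0:
        return remaining  # no history yet; fall back to the raw range
    density = consumed_count / float(consumed_offset + 1)
    return int(remaining * min(density, 1.0))
```

With a dense topic (one message per offset) this reduces to the plain `latest - consumed` estimate; with a compacted topic that has kept only half its messages, the estimate is roughly halved.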





[GitHub] beam pull request #3536: [BEAM-2371] Use URNs, not Java classes, in immutabi...

2017-07-10 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3536

[BEAM-2371] Use URNs, not Java classes, in immutability enforcements

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @tgroh peeled this off #3334 as I think it is separable.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam DirectRunner-enforcements

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3536


commit 311547aa561bb314a8fe743b6f4677a2eaaaca50
Author: Kenneth Knowles 
Date:   2017-07-10T22:25:11Z

Use URNs, not Java classes, in immutability enforcements




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
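The point of keying enforcements by URN rather than by Java class is that transforms produced by non-Java SDKs, which have no Java class at all, can still be matched. A toy sketch of a URN-keyed registry; the URN string and enforcement names are made up for illustration:

```python
PAR_DO_URN = "beam:transform:pardo:v1"  # illustrative URN string

# transform URN -> enforcements that apply to it
ENFORCEMENTS = {
    PAR_DO_URN: ["immutability"],
}


def enforcements_for(transform_urn):
    # Look up by URN, so the check works for any SDK's transforms,
    # not only those backed by a specific Java class.
    return ENFORCEMENTS.get(transform_urn, [])
```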


[jira] [Commented] (BEAM-2371) Make Java DirectRunner demonstrate language-agnostic Runner API translation wrappers

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081293#comment-16081293
 ] 

ASF GitHub Bot commented on BEAM-2371:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3536

[BEAM-2371] Use URNs, not Java classes, in immutability enforcements

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @tgroh peeled this off #3334 as I think it is separable.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam DirectRunner-enforcements

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3536


commit 311547aa561bb314a8fe743b6f4677a2eaaaca50
Author: Kenneth Knowles 
Date:   2017-07-10T22:25:11Z

Use URNs, not Java classes, in immutability enforcements




> Make Java DirectRunner demonstrate language-agnostic Runner API translation 
> wrappers
> 
>
> Key: BEAM-2371
> URL: https://issues.apache.org/jira/browse/BEAM-2371
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> This will complete the PoC for runners-core-construction-java and the Runner 
> API and show other runners the easy path to executing non-Java pipelines, 
> modulo Fn API.





[GitHub] beam pull request #3482: [BEAM-1348] Remove deprecated concepts in Fn API (n...

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3482




[2/2] beam git commit: [BEAM-1348] Remove deprecated concepts in Fn API (now replaced with Runner API concepts).

2017-07-10 Thread lcwik
[BEAM-1348] Remove deprecated concepts in Fn API (now replaced with Runner API 
concepts).

This closes #3482


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9d48bd5e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9d48bd5e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9d48bd5e

Branch: refs/heads/master
Commit: 9d48bd5e85f568fe0c00f304f9ae022181e25160
Parents: 9f904dc 521488f
Author: Luke Cwik 
Authored: Mon Jul 10 15:53:29 2017 -0700
Committer: Luke Cwik 
Committed: Mon Jul 10 15:53:29 2017 -0700

--
 .../fn-api/src/main/proto/beam_fn_api.proto | 151 +--
 .../harness/control/ProcessBundleHandler.java   |   4 +-
 .../fn/harness/control/RegisterHandler.java |   2 +-
 .../fn/harness/control/RegisterHandlerTest.java |   8 +-
 .../apache_beam/runners/pipeline_context.py |   2 +-
 .../runners/portability/fn_api_runner.py|   2 +-
 .../apache_beam/runners/worker/sdk_worker.py|   4 +-
 .../runners/worker/sdk_worker_test.py   |  16 +-
 8 files changed, 25 insertions(+), 164 deletions(-)
--




[1/2] beam git commit: [BEAM-1348] Remove deprecated concepts in Fn API (now replaced with Runner API concepts).

2017-07-10 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master 9f904dc00 -> 9d48bd5e8


[BEAM-1348] Remove deprecated concepts in Fn API (now replaced with Runner API 
concepts).


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/521488f8
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/521488f8
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/521488f8

Branch: refs/heads/master
Commit: 521488f8239547c7e93c30e75ecb2462ff114cb8
Parents: 9f904dc
Author: Luke Cwik 
Authored: Fri Jun 30 10:21:55 2017 -0700
Committer: Luke Cwik 
Committed: Mon Jul 10 15:53:07 2017 -0700

--
 .../fn-api/src/main/proto/beam_fn_api.proto | 151 +--
 .../harness/control/ProcessBundleHandler.java   |   4 +-
 .../fn/harness/control/RegisterHandler.java |   2 +-
 .../fn/harness/control/RegisterHandlerTest.java |   8 +-
 .../apache_beam/runners/pipeline_context.py |   2 +-
 .../runners/portability/fn_api_runner.py|   2 +-
 .../apache_beam/runners/worker/sdk_worker.py|   4 +-
 .../runners/worker/sdk_worker_test.py   |  16 +-
 8 files changed, 25 insertions(+), 164 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/521488f8/sdks/common/fn-api/src/main/proto/beam_fn_api.proto
--
diff --git a/sdks/common/fn-api/src/main/proto/beam_fn_api.proto 
b/sdks/common/fn-api/src/main/proto/beam_fn_api.proto
index 8162bc5..9da5afe 100644
--- a/sdks/common/fn-api/src/main/proto/beam_fn_api.proto
+++ b/sdks/common/fn-api/src/main/proto/beam_fn_api.proto
@@ -38,7 +38,6 @@ option java_package = "org.apache.beam.fn.v1";
 option java_outer_classname = "BeamFnApi";
 
 import "beam_runner_api.proto";
-import "google/protobuf/any.proto";
 import "google/protobuf/timestamp.proto";
 
 /*
@@ -67,129 +66,6 @@ message Target {
   string name = 2;
 }
 
-// (Deprecated) Information defining a PCollection
-//
-// Migrate to Runner API.
-message PCollection {
-  // (Required) A reference to a coder.
-  string coder_reference = 1 [deprecated = true];
-
-  // TODO: Windowing strategy, ...
-}
-
-// (Deprecated) A primitive transform within Apache Beam.
-//
-// Migrate to Runner API.
-message PrimitiveTransform {
-  // (Required) A pipeline level unique id which can be used as a reference to
-  // refer to this.
-  string id = 1 [deprecated = true];
-
-  // (Required) A function spec that is used by this primitive
-  // transform to process data.
-  FunctionSpec function_spec = 2 [deprecated = true];
-
-  // A map of distinct input names to target definitions.
-  // For example, in CoGbk this represents the tag name associated with each
-  // distinct input name and a list of primitive transforms that are associated
-  // with the specified input.
-  map inputs = 3 [deprecated = true];
-
-  // A map from local output name to PCollection definitions. For example, in
-  // DoFn this represents the tag name associated with each distinct output.
-  map outputs = 4 [deprecated = true];
-
-  // TODO: Should we model side inputs as a special type of input for a
-  // primitive transform or should it be modeled as the relationship that
-  // the predecessor input will be a view primitive transform.
-  // A map of from side input names to side inputs.
-  map side_inputs = 5 [deprecated = true];
-
-  // The user name of this step.
-  // TODO: This should really be in display data and not at this level
-  string step_name = 6 [deprecated = true];
-}
-
-/*
- * User Definable Functions
- *
- * This is still unstable mainly due to how we model the side input.
- */
-
-// (Deprecated) Defines the common elements of user-definable functions,
-// to allow the SDK to express the information the runner needs to execute 
work.
-//
-// Migrate to Runner API.
-message FunctionSpec {
-  // (Required) A pipeline level unique id which can be used as a reference to
-  // refer to this.
-  string id = 1 [deprecated = true];
-
-  // (Required) A globally unique name representing this user definable
-  // function.
-  //
-  // User definable functions use the urn encodings registered such that 
another
-  // may implement the user definable function within another language.
-  //
-  // For example:
-  //urn:org.apache.beam:coder:kv:1.0
-  string urn = 2 [deprecated = true];
-
-  // (Required) Reference to specification of execution environment required to
-  // invoke this function.
-  string environment_reference = 3 [deprecated = true];
-
-  // Data used to parameterize this function. Depending on the urn, this may be
-  // optional or required.
-  google.protobuf.Any data = 4 [deprecated = true];
-}
-
-// (Deprecated) Migrate to Runner API.
-message SideInput {
-  // TODO: Coder?
-
-  // For RunnerAPI.
-  Target input = 1 [deprecated = true];
-
-  // For FnAPI.

[jira] [Commented] (BEAM-1348) Model the Fn Api

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081311#comment-16081311
 ] 

ASF GitHub Bot commented on BEAM-1348:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3482


> Model the Fn Api
> 
>
> Key: BEAM-1348
> URL: https://issues.apache.org/jira/browse/BEAM-1348
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Create a proto representation of the services and data types required to 
> execute the Fn Api.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3363

2017-07-10 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2597

2017-07-10 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-10 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081374#comment-16081374
 ] 

Dmitry Demeshchuk commented on BEAM-2572:
-

How about the following plan, then?

1. Add an ability to hide pipeline options. For example, extend 
{{_BeamArgumentParser}} by overriding the {{add_argument}} method, adding a 
{{hidden=False}} parameter there.
2. Add an {{AWSOptions}} class that inherits from {{PipelineOptions}} and 
provides hidden options {{aws_access_key_id}}, {{aws_secret_access_key}} and 
{{aws_default_region}}.
3. Add an AWS extra package to {{apache_beam}} (similar to 
{{apache_beam[gcp]}}), which depends on boto and contains all the AWS-related 
code.
4. Add an ability for filesystems to be aware of the pipeline options.
5. Add the actual S3 filesystem.

I can make the corresponding tickets and start working on them.
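
Step 1 above could look roughly like the following sketch. Plain {{argparse}} stands in for Beam's internal {{_BeamArgumentParser}}, and the class and option names are illustrative, not existing Beam API:

```python
# Sketch only: models the proposed hidden=False flag using plain argparse.
# Beam's _BeamArgumentParser is internal; names here are illustrative.
import argparse

class HidingArgumentParser(argparse.ArgumentParser):
    def add_argument(self, *args, **kwargs):
        # Pop the custom flag so the base class never sees it.
        if kwargs.pop('hidden', False):
            # SUPPRESS keeps the option parseable but out of --help output.
            kwargs['help'] = argparse.SUPPRESS
        return super().add_argument(*args, **kwargs)

parser = HidingArgumentParser(prog='pipeline')
parser.add_argument('--aws_access_key_id', hidden=True)
parser.add_argument('--aws_default_region', help='AWS region to use')

ns = parser.parse_args(
    ['--aws_access_key_id', 'AKIA123', '--aws_default_region', 'us-east-1'])
```

The hidden option still parses and lands on the namespace as usual; it just never shows up in the generated help text.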

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Dmitry Demeshchuk
>Assignee: Ahmet Altay
>Priority: Minor
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides reasonably good logic for basics such as 
> retrying failed requests.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!
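
For the boto-based approach (option 2 above), one of the first things such a filesystem needs is to split {{s3://bucket/key}} URLs into a bucket and a key. A minimal sketch, with the function name being illustrative rather than any existing Beam API:

```python
# Minimal sketch of s3:// path handling for a boto-backed filesystem.
# The function name and error message are illustrative, not Beam API.
from urllib.parse import urlparse

def split_s3_path(path):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(path)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('not an S3 path: %r' % path)
    # netloc is the bucket; drop the leading '/' from the key.
    return parsed.netloc, parsed.path.lstrip('/')

bucket, key = split_s3_path('s3://my-bucket/logs/2017/07/10.txt')
```

The real filesystem would layer boto calls on top of this, but keeping URL parsing separate makes it easy to unit-test without AWS credentials.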



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3364

2017-07-10 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2598

2017-07-10 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3537: TestPipeline should support errors in expand

2017-07-10 Thread bjchambers
GitHub user bjchambers opened a pull request:

https://github.com/apache/beam/pull/3537

TestPipeline should support errors in expand

Writing a test that expects an exception during transform application is
currently not possible with TestPipeline in a NeedsRunner or
ValidatesRunner test. The exception causes the pipeline to be unrunnable.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [*] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [*] Make sure tests pass via `mvn clean verify`.
 - [*] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [*] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/beam 
test-pipeline-construction-errors

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3537


commit 5d820c690540322f2d3c5869061703b8e460069f
Author: bchambers 
Date:   2017-07-11T00:29:24Z

TestPipeline should support errors in expand

Writing a test that expects an exception during transform application is
currently not possible with TestPipeline in a NeedsRunner or
ValidatesRunner test. The exception causes the pipeline to be unrunnable.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3538: Add a test for Avro write with RVP; fix code

2017-07-10 Thread bjchambers
GitHub user bjchambers opened a pull request:

https://github.com/apache/beam/pull/3538

Add a test for Avro write with RVP; fix code

Add a test for AvroIO using RuntimeValueProvider

Make AvroIO actually work with RuntimeValueProvider. Previously it
caused the code to be non-serializable.


---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/beam avro-io

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3538.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3538


commit 08f1adafa2792d09eecc173d05cb2650a4f9eedb
Author: bchambers 
Date:   2017-07-10T23:41:15Z

Add a test for Avro write with RVP; fix code

Add a test for AvroIO using RuntimeValueProvider

Make AvroIO actually work with RuntimeValueProvider. Previously it
caused the code to be non-serializable.






Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #4342

2017-07-10 Thread Apache Jenkins Server
See 


--
[...truncated 1.47 MB...]
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.http-client:google-http-client:jar:1.22.0 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.apache.httpcomponents:httpclient:jar:4.0.1 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.apache.httpcomponents:httpcore:jar:4.0.1 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding commons-codec:commons-codec:jar:1.3 
from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.cloud.bigdataoss:util:jar:1.4.5 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.api-client:google-api-client-java6:jar:1.22.0 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.api-client:google-api-client-jackson2:jar:1.22.0 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.oauth-client:google-oauth-client-java6:jar:1.22.0 from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.auth:google-auth-library-oauth2-http:jar:0.6.1 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.auth:google-auth-library-credentials:jar:0.6.1 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding org.apache.avro:avro:jar:1.8.2 from 
the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.codehaus.jackson:jackson-core-asl:jar:1.9.13 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.thoughtworks.paranamer:paranamer:jar:2.7 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding org.tukaani:xz:jar:1.5 from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.apis:google-api-services-pubsub:jar:v1-rev10-1.22.0 from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Including com.google.guava:guava:jar:20.0 in the 
shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.http-client:google-http-client-protobuf:jar:1.22.0 from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.http-client:google-http-client-jackson:jar:1.22.0 from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.api.grpc:grpc-google-common-protos:jar:0.1.9 from the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding joda-time:joda-time:jar:2.4 from the 
shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.14 from 
the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding org.slf4j:slf4j-jdk14:jar:1.7.14 from 
the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.apache.beam:beam-sdks-common-runner-api:jar:2.2.0-SNAPSHOT from the shaded 
jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.apache.beam:beam-runners-core-construction-java:jar:2.2.0-SNAPSHOT from the 
shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
org.apache.beam:beam-runners-google-cloud-dataflow-java:jar:2.2.0-SNAPSHOT from 
the shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.apis:google-api-services-dataflow:jar:v1b3-rev196-1.22.0 from the 
shaded jar.
2017-07-11T00:39:25.361 [INFO] Excluding 
com.google.apis:google-api-services-clouddebugger:jar:v2-rev8-1.22.0 from the 
shaded jar.
2017-07-11T00:39:27.089 [INFO] Replacing original artifact with shaded artifact.
2017-07-11T00:39:27.089 [INFO] Replacing 

 with 

2017-07-11T00:39:27.089 [INFO] Replacing original test artifact with shaded 
test artifact.
2017-07-11T00:39:27.089 [INFO] Replacing 

 with 

2017-07-11T00:39:27.089 [INFO] Dependency-reduced POM written at: 

2017-07-11T00:39:28.144 [INFO] 
2017-07-11T00:39:28.144 [INFO] --- maven-javadoc-plugin:2.10.4:jar 
(attach-javadocs) @ beam-examples-java ---
2017-07-11T00:39:30.806 [INFO] Building jar: 

2017-07-11T00:39:31.016 [INFO] 
2017-07-11T00:39:31.016 [INFO] --- maven-sourc

[1/5] beam git commit: Adds DynamicDestinations support to FileBasedSink

2017-07-10 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master 9d48bd5e8 -> c14a3184e


http://git-wip-us.apache.org/repos/asf/beam/blob/77ba7a35/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOTest.java
index 9468893..8797ff7 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextIOTest.java
@@ -42,7 +42,9 @@ import static org.junit.Assert.assertThat;
 import static org.junit.Assert.assertTrue;
 
 import com.google.common.base.Function;
+import com.google.common.base.Functions;
 import com.google.common.base.Predicate;
+import com.google.common.base.Predicates;
 import com.google.common.collect.FluentIterable;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Iterables;
@@ -69,22 +71,31 @@ import java.util.zip.GZIPOutputStream;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipOutputStream;
 import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.AvroCoder;
 import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.DefaultCoder;
 import org.apache.beam.sdk.coders.StringUtf8Coder;
 import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
+import org.apache.beam.sdk.io.DefaultFilenamePolicy.Params;
+import org.apache.beam.sdk.io.FileBasedSink.DynamicDestinations;
+import org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy;
 import org.apache.beam.sdk.io.FileBasedSink.WritableByteChannelFactory;
 import org.apache.beam.sdk.io.TextIO.CompressionType;
 import org.apache.beam.sdk.io.fs.MatchResult;
 import org.apache.beam.sdk.io.fs.MatchResult.Metadata;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
 import org.apache.beam.sdk.testing.NeedsRunner;
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.SourceTestUtils;
 import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.testing.ValidatesRunner;
 import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.display.DisplayData;
 import org.apache.beam.sdk.transforms.display.DisplayDataEvaluator;
 import org.apache.beam.sdk.util.CoderUtils;
@@ -205,7 +216,7 @@ public class TextIOTest {
 });
   }
 
-  private  void runTestRead(String[] expected) throws Exception {
+  private void runTestRead(String[] expected) throws Exception {
 File tmpFile = Files.createTempFile(tempFolder, "file", "txt").toFile();
 String filename = tmpFile.getPath();
 
@@ -274,6 +285,213 @@ public class TextIOTest {
  displayData, hasItem(hasDisplayItem(hasValue(startsWith("foobar")))));
   }
 
+  static class TestDynamicDestinations extends DynamicDestinations {
+ResourceId baseDir;
+
+TestDynamicDestinations(ResourceId baseDir) {
+  this.baseDir = baseDir;
+}
+
+@Override
+public String getDestination(String element) {
+  // Destination is based on first character of string.
+  return element.substring(0, 1);
+}
+
+@Override
+public String getDefaultDestination() {
+  return "";
+}
+
+@Nullable
+@Override
+public Coder<String> getDestinationCoder() {
+  return StringUtf8Coder.of();
+}
+
+@Override
+public FilenamePolicy getFilenamePolicy(String destination) {
+  return DefaultFilenamePolicy.fromStandardParameters(
+  StaticValueProvider.of(
+  baseDir.resolve("file_" + destination + ".txt", 
StandardResolveOptions.RESOLVE_FILE)),
+  null,
+  null,
+  false);
+}
+  }
+
+  class StartsWith implements Predicate<String> {
+String prefix;
+
+StartsWith(String prefix) {
+  this.prefix = prefix;
+}
+
+@Override
+public boolean apply(@Nullable String input) {
+  return input.startsWith(prefix);
+}
+  }
+
+  @Test
+  @Category(NeedsRunner.class)
+  public void testDynamicDestinations() throws Exception {
+ResourceId baseDir =
+FileSystems.matchNewResource(
+Files.createTempDirectory(tempFolder, 
"testDynamicDestinations").toString(), true);
+
+List elements = Lists.newArrayList("", "aaab", "baaa", "baab", 
"caaa", "caab");
+PCollection input = 
p.apply(Create.of(elements).withCoder(StringUtf8Coder.of()));
+input.apply(
+TextIO.write()
+.to(new TestDynamicDestinations(baseDir))
+
.withTempDirectory(FileSystems.matchNewResource(baseDir.toString(), true))

[3/5] beam git commit: Adds DynamicDestinations support to FileBasedSink

2017-07-10 Thread jkff
http://git-wip-us.apache.org/repos/asf/beam/blob/77ba7a35/sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicFileDestinations.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicFileDestinations.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicFileDestinations.java
new file mode 100644
index 000..e7ef0f6
--- /dev/null
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicFileDestinations.java
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.io;
+
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.DefaultFilenamePolicy.Params;
+import org.apache.beam.sdk.io.DefaultFilenamePolicy.ParamsCoder;
+import org.apache.beam.sdk.io.FileBasedSink.DynamicDestinations;
+import org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+
+/** Some helper classes that derive from {@link 
FileBasedSink.DynamicDestinations}. */
+public class DynamicFileDestinations {
+  /** Always returns a constant {@link FilenamePolicy}. */
+  private static class ConstantFilenamePolicy extends 
DynamicDestinations {
+private final FilenamePolicy filenamePolicy;
+
+public ConstantFilenamePolicy(FilenamePolicy filenamePolicy) {
+  this.filenamePolicy = filenamePolicy;
+}
+
+@Override
+public Void getDestination(T element) {
+  return (Void) null;
+}
+
+@Override
+public Coder getDestinationCoder() {
+  return null;
+}
+
+@Override
+public Void getDefaultDestination() {
+  return (Void) null;
+}
+
+@Override
+public FilenamePolicy getFilenamePolicy(Void destination) {
+  return filenamePolicy;
+}
+
+@Override
+public void populateDisplayData(DisplayData.Builder builder) {
+  filenamePolicy.populateDisplayData(builder);
+}
+  }
+
+  /**
+   * A base class for a {@link DynamicDestinations} object that returns 
differently-configured
+   * instances of {@link DefaultFilenamePolicy}.
+   */
+  private static class DefaultPolicyDestinations extends 
DynamicDestinations {
+SerializableFunction destinationFunction;
+Params emptyDestination;
+
+public DefaultPolicyDestinations(
+SerializableFunction destinationFunction, Params 
emptyDestination) {
+  this.destinationFunction = destinationFunction;
+  this.emptyDestination = emptyDestination;
+}
+
+@Override
+public Params getDestination(UserT element) {
+  return destinationFunction.apply(element);
+}
+
+@Override
+public Params getDefaultDestination() {
+  return emptyDestination;
+}
+
+@Nullable
+@Override
+public Coder getDestinationCoder() {
+  return ParamsCoder.of();
+}
+
+@Override
+public FilenamePolicy getFilenamePolicy(DefaultFilenamePolicy.Params 
params) {
+  return DefaultFilenamePolicy.fromParams(params);
+}
+  }
+
+  /** Returns a {@link DynamicDestinations} that always returns the same 
{@link FilenamePolicy}. */
+  public static  DynamicDestinations constant(FilenamePolicy 
filenamePolicy) {
+return new ConstantFilenamePolicy<>(filenamePolicy);
+  }
+
+  /**
+   * Returns a {@link DynamicDestinations} that returns instances of {@link 
DefaultFilenamePolicy}
+   * configured with the given {@link Params}.
+   */
+  public static  DynamicDestinations toDefaultPolicies(
+  SerializableFunction destinationFunction, Params 
emptyDestination) {
+return new DefaultPolicyDestinations<>(destinationFunction, 
emptyDestination);
+  }
+}

http://git-wip-us.apache.org/repos/asf/beam/blob/77ba7a35/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
index 8102316..583af60 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io

[5/5] beam git commit: This closes #3356: [BEAM-92] Allow value-dependent files in FileBasedSink

2017-07-10 Thread jkff
This closes #3356: [BEAM-92] Allow value-dependent files in FileBasedSink


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c14a3184
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c14a3184
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c14a3184

Branch: refs/heads/master
Commit: c14a3184e69f3ba6d228a61b2218930537008da8
Parents: 9d48bd5 77ba7a3
Author: Eugene Kirpichov 
Authored: Mon Jul 10 18:10:19 2017 -0700
Committer: Eugene Kirpichov 
Committed: Mon Jul 10 18:10:19 2017 -0700

--
 .../examples/common/WriteOneFilePerWindow.java  |  52 +-
 .../beam/examples/WindowedWordCountIT.java  |   4 +-
 .../complete/game/utils/WriteToText.java|  43 +-
 .../construction/WriteFilesTranslation.java |  67 +-
 .../construction/PTransformMatchersTest.java|  22 +-
 .../construction/WriteFilesTranslationTest.java |  62 +-
 .../direct/WriteWithShardingFactory.java|   6 +-
 .../direct/WriteWithShardingFactoryTest.java|  18 +-
 .../beam/runners/dataflow/DataflowRunner.java   |  15 +-
 .../runners/dataflow/DataflowRunnerTest.java|  35 +-
 .../runners/spark/SparkRunnerDebuggerTest.java  |  26 +-
 .../src/main/proto/beam_runner_api.proto|   7 +-
 .../apache/beam/sdk/coders/ShardedKeyCoder.java |  66 ++
 .../java/org/apache/beam/sdk/io/AvroIO.java | 220 ---
 .../java/org/apache/beam/sdk/io/AvroSink.java   |  32 +-
 .../beam/sdk/io/DefaultFilenamePolicy.java  | 274 +---
 .../beam/sdk/io/DynamicFileDestinations.java| 115 
 .../org/apache/beam/sdk/io/FileBasedSink.java   | 513 ---
 .../java/org/apache/beam/sdk/io/TFRecordIO.java |  44 +-
 .../java/org/apache/beam/sdk/io/TextIO.java | 488 ++
 .../java/org/apache/beam/sdk/io/TextSink.java   |  22 +-
 .../java/org/apache/beam/sdk/io/WriteFiles.java | 640 +++
 .../sdk/transforms/SerializableFunctions.java   |  50 ++
 .../org/apache/beam/sdk/values/ShardedKey.java  |  65 ++
 .../java/org/apache/beam/sdk/io/AvroIOTest.java |  85 ++-
 .../beam/sdk/io/DefaultFilenamePolicyTest.java  | 135 ++--
 .../sdk/io/DrunkWritableByteChannelFactory.java |   2 +-
 .../apache/beam/sdk/io/FileBasedSinkTest.java   |  93 +--
 .../java/org/apache/beam/sdk/io/SimpleSink.java |  56 +-
 .../java/org/apache/beam/sdk/io/TextIOTest.java | 264 +++-
 .../org/apache/beam/sdk/io/WriteFilesTest.java  | 339 --
 .../beam/sdk/io/gcp/bigquery/BatchLoads.java|   2 +
 .../io/gcp/bigquery/DynamicDestinations.java|  29 +-
 .../io/gcp/bigquery/GenerateShardedTable.java   |   1 +
 .../beam/sdk/io/gcp/bigquery/ShardedKey.java|  67 --
 .../sdk/io/gcp/bigquery/ShardedKeyCoder.java|  74 ---
 .../sdk/io/gcp/bigquery/StreamingWriteFn.java   |   1 +
 .../io/gcp/bigquery/StreamingWriteTables.java   |   2 +
 .../sdk/io/gcp/bigquery/TagWithUniqueIds.java   |   1 +
 .../io/gcp/bigquery/WriteBundlesToFiles.java|   2 +
 .../bigquery/WriteGroupedRecordsToFiles.java|   1 +
 .../sdk/io/gcp/bigquery/WritePartition.java |   1 +
 .../beam/sdk/io/gcp/bigquery/WriteTables.java   |   1 +
 .../sdk/io/gcp/bigquery/BigQueryIOTest.java |   2 +
 .../java/org/apache/beam/sdk/io/xml/XmlIO.java  |   4 +-
 .../org/apache/beam/sdk/io/xml/XmlSink.java |  21 +-
 .../org/apache/beam/sdk/io/xml/XmlSinkTest.java |   4 +-
 47 files changed, 2710 insertions(+), 1363 deletions(-)
--




[4/5] beam git commit: Adds DynamicDestinations support to FileBasedSink

2017-07-10 Thread jkff
Adds DynamicDestinations support to FileBasedSink


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/77ba7a35
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/77ba7a35
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/77ba7a35

Branch: refs/heads/master
Commit: 77ba7a35cdae0b036791cce0682beefeb3fd809b
Parents: 9d48bd5
Author: Reuven Lax 
Authored: Fri Jun 9 17:11:32 2017 -0700
Committer: Eugene Kirpichov 
Committed: Mon Jul 10 18:05:33 2017 -0700

--
 .../examples/common/WriteOneFilePerWindow.java  |  52 +-
 .../beam/examples/WindowedWordCountIT.java  |   4 +-
 .../complete/game/utils/WriteToText.java|  43 +-
 .../construction/WriteFilesTranslation.java |  67 +-
 .../construction/PTransformMatchersTest.java|  22 +-
 .../construction/WriteFilesTranslationTest.java |  62 +-
 .../direct/WriteWithShardingFactory.java|   6 +-
 .../direct/WriteWithShardingFactoryTest.java|  18 +-
 .../beam/runners/dataflow/DataflowRunner.java   |  15 +-
 .../runners/dataflow/DataflowRunnerTest.java|  35 +-
 .../runners/spark/SparkRunnerDebuggerTest.java  |  26 +-
 .../src/main/proto/beam_runner_api.proto|   7 +-
 .../apache/beam/sdk/coders/ShardedKeyCoder.java |  66 ++
 .../java/org/apache/beam/sdk/io/AvroIO.java | 220 ---
 .../java/org/apache/beam/sdk/io/AvroSink.java   |  32 +-
 .../beam/sdk/io/DefaultFilenamePolicy.java  | 274 +---
 .../beam/sdk/io/DynamicFileDestinations.java| 115 
 .../org/apache/beam/sdk/io/FileBasedSink.java   | 513 ---
 .../java/org/apache/beam/sdk/io/TFRecordIO.java |  44 +-
 .../java/org/apache/beam/sdk/io/TextIO.java | 488 ++
 .../java/org/apache/beam/sdk/io/TextSink.java   |  22 +-
 .../java/org/apache/beam/sdk/io/WriteFiles.java | 640 +++
 .../sdk/transforms/SerializableFunctions.java   |  50 ++
 .../org/apache/beam/sdk/values/ShardedKey.java  |  65 ++
 .../java/org/apache/beam/sdk/io/AvroIOTest.java |  85 ++-
 .../beam/sdk/io/DefaultFilenamePolicyTest.java  | 135 ++--
 .../sdk/io/DrunkWritableByteChannelFactory.java |   2 +-
 .../apache/beam/sdk/io/FileBasedSinkTest.java   |  93 +--
 .../java/org/apache/beam/sdk/io/SimpleSink.java |  56 +-
 .../java/org/apache/beam/sdk/io/TextIOTest.java | 264 +++-
 .../org/apache/beam/sdk/io/WriteFilesTest.java  | 339 --
 .../beam/sdk/io/gcp/bigquery/BatchLoads.java|   2 +
 .../io/gcp/bigquery/DynamicDestinations.java|  29 +-
 .../io/gcp/bigquery/GenerateShardedTable.java   |   1 +
 .../beam/sdk/io/gcp/bigquery/ShardedKey.java|  67 --
 .../sdk/io/gcp/bigquery/ShardedKeyCoder.java|  74 ---
 .../sdk/io/gcp/bigquery/StreamingWriteFn.java   |   1 +
 .../io/gcp/bigquery/StreamingWriteTables.java   |   2 +
 .../sdk/io/gcp/bigquery/TagWithUniqueIds.java   |   1 +
 .../io/gcp/bigquery/WriteBundlesToFiles.java|   2 +
 .../bigquery/WriteGroupedRecordsToFiles.java|   1 +
 .../sdk/io/gcp/bigquery/WritePartition.java |   1 +
 .../beam/sdk/io/gcp/bigquery/WriteTables.java   |   1 +
 .../sdk/io/gcp/bigquery/BigQueryIOTest.java |   2 +
 .../java/org/apache/beam/sdk/io/xml/XmlIO.java  |   4 +-
 .../org/apache/beam/sdk/io/xml/XmlSink.java |  21 +-
 .../org/apache/beam/sdk/io/xml/XmlSinkTest.java |   4 +-
 47 files changed, 2710 insertions(+), 1363 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/77ba7a35/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
--
diff --git 
a/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
 
b/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
index 5e6df9c..49865ba 100644
--- 
a/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
+++ 
b/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
@@ -17,11 +17,12 @@
  */
 package org.apache.beam.examples.common;
 
-import static com.google.common.base.Verify.verifyNotNull;
+import static com.google.common.base.MoreObjects.firstNonNull;
 
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.io.FileBasedSink;
 import org.apache.beam.sdk.io.FileBasedSink.FilenamePolicy;
+import org.apache.beam.sdk.io.FileBasedSink.OutputFileHints;
 import org.apache.beam.sdk.io.TextIO;
 import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
 import org.apache.beam.sdk.io.fs.ResourceId;
@@ -53,22 +54,12 @@ public class WriteOneFilePerWindow extends 
PTransform, PDone
 
   @Override
   public PDone expand(PCollection input) {
-// filenamePrefix may contain a directory and a filename component. Pull 
out only the filename
-// component from that path for the PerWindowFi

[2/5] beam git commit: Adds DynamicDestinations support to FileBasedSink

2017-07-10 Thread jkff
http://git-wip-us.apache.org/repos/asf/beam/blob/77ba7a35/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSink.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSink.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSink.java
index 511d697..b57b28c 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSink.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSink.java
@@ -34,27 +34,29 @@ import org.apache.beam.sdk.util.MimeTypes;
 * '\n'} represented in {@code UTF-8} format as the record separator. Each record (including the last) is terminated.
  */
-class TextSink extends FileBasedSink {
+class TextSink extends FileBasedSink {
   @Nullable private final String header;
   @Nullable private final String footer;
 
   TextSink(
   ValueProvider baseOutputFilename,
-  FilenamePolicy filenamePolicy,
+  DynamicDestinations dynamicDestinations,
   @Nullable String header,
   @Nullable String footer,
   WritableByteChannelFactory writableByteChannelFactory) {
-super(baseOutputFilename, filenamePolicy, writableByteChannelFactory);
+super(baseOutputFilename, dynamicDestinations, writableByteChannelFactory);
 this.header = header;
 this.footer = footer;
   }
+
   @Override
-  public WriteOperation createWriteOperation() {
-return new TextWriteOperation(this, header, footer);
+  public WriteOperation createWriteOperation() {
+return new TextWriteOperation<>(this, header, footer);
   }
 
   /** A {@link WriteOperation WriteOperation} for text files. */
-  private static class TextWriteOperation extends WriteOperation {
+  private static class TextWriteOperation
+  extends WriteOperation {
 @Nullable private final String header;
 @Nullable private final String footer;
 
@@ -65,20 +67,20 @@ class TextSink extends FileBasedSink {
 }
 
 @Override
-public Writer createWriter() throws Exception {
-  return new TextWriter(this, header, footer);
+public Writer createWriter() throws Exception {
+  return new TextWriter<>(this, header, footer);
 }
   }
 
   /** A {@link Writer Writer} for text files. */
-  private static class TextWriter extends Writer {
+  private static class TextWriter extends Writer {
 private static final String NEWLINE = "\n";
 @Nullable private final String header;
 @Nullable private final String footer;
 private OutputStreamWriter out;
 
 public TextWriter(
-WriteOperation writeOperation,
+WriteOperation writeOperation,
 @Nullable String header,
 @Nullable String footer) {
   super(writeOperation, MimeTypes.TEXT);

http://git-wip-us.apache.org/repos/asf/beam/blob/77ba7a35/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
index a220eab..7013044 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
@@ -20,9 +20,12 @@ package org.apache.beam.sdk.io;
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;
 
+import com.google.common.base.Objects;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Maps;
+import com.google.common.hash.Hashing;
+import java.io.IOException;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
@@ -30,8 +33,11 @@ import java.util.concurrent.ThreadLocalRandom;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.CannotProvideCoderException;
 import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.Coder.NonDeterministicException;
 import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.ShardedKeyCoder;
 import org.apache.beam.sdk.coders.VarIntCoder;
 import org.apache.beam.sdk.coders.VoidCoder;
 import org.apache.beam.sdk.io.FileBasedSink.FileResult;
@@ -47,6 +53,7 @@ import org.apache.beam.sdk.transforms.Flatten;
 import org.apache.beam.sdk.transforms.GroupByKey;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.transforms.View;
 import org.apache.beam.sdk.transforms.WithKeys;
 import org.apache.beam.sdk.transforms.display.DisplayData;
@@ -55,6 +62,7 @@ import org.apache.beam.sdk.transforms.windowing.DefaultTrigger;
 import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
 import org.apache.beam.sdk.
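The TextSink hunks above thread a destination type parameter through the sink's WriteOperation and Writer so each element can be routed to a value-dependent output. As a rough standalone illustration of that routing idea (toy class and method names invented here; this is not Beam's actual FileBasedSink/DynamicDestinations API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy sketch of a destination-parameterized sink: each element is routed
// to a destination computed from its value, loosely mirroring the
// DynamicDestinations idea in the diff above. All names are invented.
class ToyDynamicSink<UserT, DestinationT> {
  private final Function<UserT, DestinationT> destinationFn;

  ToyDynamicSink(Function<UserT, DestinationT> destinationFn) {
    this.destinationFn = destinationFn;
  }

  // Group elements by their computed destination, as a stand-in for
  // opening one writer per destination.
  Map<DestinationT, List<UserT>> route(List<UserT> elements) {
    Map<DestinationT, List<UserT>> byDestination = new HashMap<>();
    for (UserT element : elements) {
      byDestination
          .computeIfAbsent(destinationFn.apply(element), d -> new ArrayList<>())
          .add(element);
    }
    return byDestination;
  }
}
```

For example, log lines could be routed to an "errors" destination or an "info" destination based on their prefix, with one output file per destination.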

[GitHub] beam pull request #3356: [BEAM-92] Allow value-dependent files in FileBasedS...

2017-07-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3356


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3365

2017-07-10 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2599

2017-07-10 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4343

2017-07-10 Thread Apache Jenkins Server
See 




[4/8] beam git commit: Enable SplittableParDo on rehydrated ParDo transform

2017-07-10 Thread kenn
Enable SplittableParDo on rehydrated ParDo transform


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/fa61ed17
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/fa61ed17
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/fa61ed17

Branch: refs/heads/master
Commit: fa61ed17424083fa53b8aa8e70908fb6194ad4ad
Parents: de39f32
Author: Kenneth Knowles 
Authored: Thu Jun 8 14:27:02 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 10 20:15:49 2017 -0700

--
 .../core/construction/SplittableParDo.java  | 25 ++
 .../direct/ParDoMultiOverrideFactory.java   | 36 ++--
 .../flink/FlinkStreamingPipelineTranslator.java |  2 +-
 3 files changed, 52 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/fa61ed17/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
--
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
index f31b495..e71187b 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
@@ -19,6 +19,7 @@ package org.apache.beam.runners.core.construction;
 
 import static com.google.common.base.Preconditions.checkArgument;
 
+import java.io.IOException;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
@@ -26,6 +27,7 @@ import org.apache.beam.runners.core.construction.PTransformTranslation.RawPTrans
 import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.runners.AppliedPTransform;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.ParDo;
@@ -103,6 +105,9 @@ public class SplittableParDo
  public static  SplittableParDo forJavaParDo(
   ParDo.MultiOutput parDo) {
 checkArgument(parDo != null, "parDo must not be null");
+checkArgument(
+        DoFnSignatures.getSignature(parDo.getFn().getClass()).processElement().isSplittable(),
+"fn must be a splittable DoFn");
 return new SplittableParDo(
 parDo.getFn(),
 parDo.getMainOutputTag(),
@@ -110,6 +115,26 @@ public class SplittableParDo
 parDo.getAdditionalOutputTags());
   }
 
+  /**
+   * Creates the transform for a {@link ParDo}-compatible {@link 
AppliedPTransform}.
+   *
+   * The input may generally be a deserialized transform so it may not actually be a {@link
+   * ParDo}. Instead {@link ParDoTranslation} will be used to extract fields.
+   */
+  public static SplittableParDo forAppliedParDo(AppliedPTransform parDo) {
+checkArgument(parDo != null, "parDo must not be null");
+
+try {
+  return new SplittableParDo<>(
+  ParDoTranslation.getDoFn(parDo),
+  (TupleTag) ParDoTranslation.getMainOutputTag(parDo),
+  ParDoTranslation.getSideInputs(parDo),
+  ParDoTranslation.getAdditionalOutputTags(parDo));
+} catch (IOException exc) {
+  throw new RuntimeException(exc);
+}
+  }
+
   @Override
   public PCollectionTuple expand(PCollection input) {
 Coder restrictionCoder =

http://git-wip-us.apache.org/repos/asf/beam/blob/fa61ed17/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
--
diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
index 2904bc1..8881967 100644
--- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
+++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
@@ -19,6 +19,7 @@ package org.apache.beam.runners.direct;
 
 import static com.google.common.base.Preconditions.checkState;
 
+import java.io.IOException;
 import java.util.List;
 import java.util.Map;
 import org.apache.beam.runners.core.KeyedWorkItem;
@@ -26,6 +27,7 @@ import org.apache.beam.runners.core.KeyedWorkItemCoder;
 import org.apache.beam.runners.core.KeyedWorkItems;
 import org.apache.beam.runners.core.construction.PTransformReplacements;
 import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.core.cons
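The forJavaParDo change above adds an up-front checkArgument that the DoFn's process method is splittable, rejecting bad inputs before any pipeline construction happens. A minimal standalone sketch of that fail-fast precondition style (an invented stand-in for Guava's Preconditions, not Beam code):

```java
// Fail-fast precondition helper in the style of Guava's checkArgument,
// as used by forJavaParDo above to reject non-splittable DoFns early.
// This class and its name are invented stand-ins for illustration.
final class Preconditions2 {
  private Preconditions2() {}

  // Throws IllegalArgumentException immediately when the condition fails,
  // so a misconfigured caller learns about it at construction time.
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }
}
```

Failing at construction time, rather than when the pipeline runs, keeps the stack trace pointing at the line that supplied the bad DoFn.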

[6/8] beam git commit: Fix misleading comment in TransformHierarchy

2017-07-10 Thread kenn
Fix misleading comment in TransformHierarchy


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/be9a3879
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/be9a3879
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/be9a3879

Branch: refs/heads/master
Commit: be9a387976adf3424d680778f92ce22f728ffa32
Parents: 1ac4b7e
Author: Kenneth Knowles 
Authored: Mon Jun 12 15:11:49 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 10 20:18:01 2017 -0700

--
 .../main/java/org/apache/beam/sdk/runners/TransformHierarchy.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/be9a3879/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java
--
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java
index 9c5f148..6f1ee94 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java
@@ -406,7 +406,7 @@ public class TransformHierarchy {
   return fullName;
 }
 
-/** Returns the transform input, in unexpanded form. */
+/** Returns the transform input, in fully expanded form. */
 public Map, PValue> getInputs() {
      return inputs == null ? Collections., PValue>emptyMap() : inputs;
 }



[5/8] beam git commit: Port DirectRunner ParDo overrides to SDK-agnostic APIs

2017-07-10 Thread kenn
Port DirectRunner ParDo overrides to SDK-agnostic APIs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1ac4b7e6
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1ac4b7e6
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1ac4b7e6

Branch: refs/heads/master
Commit: 1ac4b7e6f7dbbd68c27c6634cd52767885a42760
Parents: fa61ed1
Author: Kenneth Knowles 
Authored: Thu Jun 8 13:44:52 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 10 20:17:56 2017 -0700

--
 .../core/construction/ParDoTranslation.java | 16 ++---
 .../construction/RunnerPCollectionView.java | 16 +
 .../direct/ParDoMultiOverrideFactory.java   | 35 +---
 3 files changed, 43 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/1ac4b7e6/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java
--
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java
index fe8c5aa..90c9aad 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ParDoTranslation.java
@@ -19,6 +19,7 @@
 package org.apache.beam.runners.core.construction;
 
 import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
 import static com.google.common.base.Preconditions.checkState;
import static org.apache.beam.runners.core.construction.PTransformTranslation.PAR_DO_TRANSFORM_URN;
 
@@ -262,12 +263,19 @@ public class ParDoTranslation {
    ParDoPayload payload = parDoProto.getSpec().getParameter().unpack(ParDoPayload.class);
 
 List> views = new ArrayList<>();
-    for (Map.Entry sideInput : payload.getSideInputsMap().entrySet()) {
+    for (Map.Entry sideInputEntry : payload.getSideInputsMap().entrySet()) {
+  String sideInputTag = sideInputEntry.getKey();
+  RunnerApi.SideInput sideInput = sideInputEntry.getValue();
+  PCollection originalPCollection =
+  checkNotNull(
+              (PCollection) application.getInputs().get(new TupleTag<>(sideInputTag)),
+  "no input with tag %s",
+  sideInputTag);
   views.add(
   viewFromProto(
-  application.getPipeline(),
-  sideInput.getValue(),
-  sideInput.getKey(),
+  sideInput,
+  sideInputTag,
+  originalPCollection,
   parDoProto,
   sdkComponents.toComponents()));
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/1ac4b7e6/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/RunnerPCollectionView.java
--
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/RunnerPCollectionView.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/RunnerPCollectionView.java
index b275188..85139e8 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/RunnerPCollectionView.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/RunnerPCollectionView.java
@@ -19,6 +19,7 @@
 package org.apache.beam.runners.core.construction;
 
 import java.util.Map;
+import java.util.Objects;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.common.runner.v1.RunnerApi.SideInput;
@@ -94,4 +95,19 @@ class RunnerPCollectionView extends PValueBase implements PCollectionView
 throw new UnsupportedOperationException(String.format(
        "A %s cannot be expanded", RunnerPCollectionView.class.getSimpleName()));
   }
+
+  @Override
+  public boolean equals(Object other) {
+if (!(other instanceof PCollectionView)) {
+  return false;
+}
+@SuppressWarnings("unchecked")
+PCollectionView otherView = (PCollectionView) other;
+return tag.equals(otherView.getTagInternal());
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hash(tag);
+  }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/1ac4b7e6/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direc
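The equals/hashCode pair added to RunnerPCollectionView above compares views solely by their tag, so two rehydrated views with the same tag are treated as the same view regardless of any other state. A minimal standalone sketch of that identity-by-tag pattern (toy class, not Beam's PCollectionView):

```java
import java.util.Objects;

// Toy view identified solely by its tag, mirroring the equals/hashCode
// pattern added to RunnerPCollectionView above. Names are invented.
class TagView {
  private final String tag;

  TagView(String tag) {
    this.tag = tag;
  }

  String getTag() {
    return tag;
  }

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof TagView)) {
      return false;
    }
    // Two views are the same view iff they carry the same tag,
    // ignoring any other (non-identity) state they may hold.
    return tag.equals(((TagView) other).getTag());
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals: derived only from the tag.
    return Objects.hash(tag);
  }
}
```

Deriving hashCode from exactly the fields used in equals keeps the equals/hashCode contract intact, which matters when views are used as map keys during translation.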

[2/8] beam git commit: Rehydrate PCollections

2017-07-10 Thread kenn
Rehydrate PCollections


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/20ce0756
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/20ce0756
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/20ce0756

Branch: refs/heads/master
Commit: 20ce0756c97f5ed47ad9c8cb46da574c273b5b46
Parents: c14a318
Author: Kenneth Knowles 
Authored: Thu Jul 6 09:24:22 2017 -0700
Committer: Kenneth Knowles 
Committed: Mon Jul 10 20:04:14 2017 -0700

--
 .../construction/PCollectionTranslation.java| 16 ++
 .../PCollectionTranslationTest.java | 22 
 2 files changed, 38 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/20ce0756/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionTranslation.java
--
diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionTranslation.java
index 968966f..52526bb 100644
--- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionTranslation.java
+++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PCollectionTranslation.java
@@ -20,6 +20,7 @@ package org.apache.beam.runners.core.construction;
 
 import com.google.protobuf.InvalidProtocolBufferException;
 import java.io.IOException;
+import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.common.runner.v1.RunnerApi;
 import org.apache.beam.sdk.values.PCollection;
@@ -47,6 +48,21 @@ public class PCollectionTranslation {
 .build();
   }
 
+  public static PCollection fromProto(
+      Pipeline pipeline, RunnerApi.PCollection pCollection, RunnerApi.Components components)
+  throws IOException {
+return PCollection.createPrimitiveOutputInternal(
+pipeline,
+WindowingStrategyTranslation.fromProto(
+                components.getWindowingStrategiesOrThrow(pCollection.getWindowingStrategyId()),
+components),
+fromProto(pCollection.getIsBounded()))
+.setCoder(
+(Coder)
+CoderTranslation.fromProto(
+                components.getCodersOrThrow(pCollection.getCoderId()), components));
+  }
+
   public static IsBounded isBounded(RunnerApi.PCollection pCollection) {
 return fromProto(pCollection.getIsBounded());
   }

http://git-wip-us.apache.org/repos/asf/beam/blob/20ce0756/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java
--
diff --git a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java
index 3b94220..5c45487 100644
--- a/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java
+++ b/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PCollectionTranslationTest.java
@@ -113,6 +113,28 @@ public class PCollectionTranslationTest {
 
   @Test
   public void testEncodeDecodeCycle() throws Exception {
+// Encode
+SdkComponents sdkComponents = SdkComponents.create();
+RunnerApi.PCollection protoCollection =
+PCollectionTranslation.toProto(testCollection, sdkComponents);
+RunnerApi.Components protoComponents = sdkComponents.toComponents();
+
+// Decode
+Pipeline pipeline = Pipeline.create();
+PCollection decodedCollection =
+        PCollectionTranslation.fromProto(pipeline, protoCollection, protoComponents);
+
+// Verify
+    assertThat(decodedCollection.getCoder(), Matchers.>equalTo(testCollection.getCoder()));
+assertThat(
+decodedCollection.getWindowingStrategy(),
+Matchers.>equalTo(
+testCollection.getWindowingStrategy().fixDefaults()));
+    assertThat(decodedCollection.isBounded(), equalTo(testCollection.isBounded()));
+  }
+
+  @Test
+  public void testEncodeDecodeFields() throws Exception {
 SdkComponents sdkComponents = SdkComponents.create();
 RunnerApi.PCollection protoCollection = PCollectionTranslation
 .toProto(testCollection, sdkComponents);
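The testEncodeDecodeCycle test above follows the usual round-trip pattern: encode to a wire form, decode into a fresh context, then compare field by field. Stripped of Beam's proto machinery, the same pattern looks like this (toy type and wire format invented for illustration; not Beam code):

```java
// Toy round-trip test pattern: encode to a neutral wire form, decode into
// a fresh object, then compare field by field, as testEncodeDecodeCycle
// does with RunnerApi.PCollection. All names here are invented.
class ToyProto {
  final String coder;
  final boolean bounded;

  ToyProto(String coder, boolean bounded) {
    this.coder = coder;
    this.bounded = bounded;
  }

  // "Encode": a trivial textual wire form.
  String toWire() {
    return coder + "|" + bounded;
  }

  // "Decode": rebuild an equivalent object from the wire form alone.
  static ToyProto fromWire(String wire) {
    String[] parts = wire.split("\\|");
    return new ToyProto(parts[0], Boolean.parseBoolean(parts[1]));
  }
}
```

Comparing individual fields after the round trip, rather than relying on object identity, is what lets the test catch a coder or windowing strategy that was silently dropped during translation.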


