[jira] [Updated] (BEAM-1772) Support merging WindowFn other than IntervalWindow on Flink Runner

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1772:
--
Fix Version/s: First stable release

> Support merging WindowFn other than IntervalWindow on Flink Runner
> --
>
> Key: BEAM-1772
> URL: https://issues.apache.org/jira/browse/BEAM-1772
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Assignee: Jingsong Lee
> Fix For: First stable release
>
>
> Flink currently supports merging IntervalWindows; however, if you have a 
> WindowFn whose windows extend IntervalWindow, the execution breaks.
> I found this while executing a Pipeline in Flink's batch mode.
> This will probably involve changing the window merging logic in 
> `FlinkMergingReduceFunction.mergeWindows()` and other similar parts to really 
> use the merging logic of the `WindowFn`.
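
For context, a minimal sketch (illustrative names, not the actual runner code) of 
delegating the merge decision to the {{WindowFn}} itself instead of assuming 
{{IntervalWindow}}:

{code:java}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.transforms.windowing.WindowFn;

/** Computes a window -> merge-result mapping using the WindowFn's own logic. */
static <T, W extends BoundedWindow> Map<W, W> mergeViaWindowFn(
    final WindowFn<T, W> windowFn, final Collection<W> activeWindows) throws Exception {
  final Map<W, W> windowToMergeResult = new HashMap<>();
  windowFn.mergeWindows(
      windowFn.new MergeContext() {
        @Override
        public Collection<W> windows() {
          return activeWindows; // all windows currently known for the key
        }

        @Override
        public void merge(Collection<W> toBeMerged, W mergeResult) {
          for (W window : toBeMerged) {
            windowToMergeResult.put(window, mergeResult);
          }
        }
      });
  return windowToMergeResult;
}
{code}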



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-986) ReduceFnRunner doesn't batch prefetching pane firings

2017-04-19 Thread Sam Whittle (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Whittle resolved BEAM-986.
--
   Resolution: Fixed
Fix Version/s: 0.4.0

> ReduceFnRunner doesn't batch prefetching pane firings
> -
>
> Key: BEAM-986
> URL: https://issues.apache.org/jira/browse/BEAM-986
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Sam Whittle
>Assignee: Sam Whittle
> Fix For: 0.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Specifically:
> - in ProcessElements, if there are multiple windows to consider, each is 
> processed sequentially with sequential state fetches instead of a bulk 
> prefetch
> - the onTimer method doesn't evaluate multiple timers at a time, meaning that 
> if multiple timers fire at once, each is processed sequentially without 
> batched prefetching
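
For illustration, the batched-prefetch pattern in question, as a sketch against 
Beam's {{ReadableState}} API ({{readLater()}} is a non-blocking prefetch hint; 
the package location of {{ReadableState}} varies by version):

{code:java}
import java.util.List;
import org.apache.beam.sdk.state.ReadableState;

static <T> void readAllBatched(List<ReadableState<T>> states) {
  // Pass 1: announce every read up front so the runner can fetch in bulk.
  for (ReadableState<T> state : states) {
    state.readLater();
  }
  // Pass 2: the actual reads, ideally served from the batched prefetch.
  for (ReadableState<T> state : states) {
    T value = state.read();
    // ... process value ...
  }
}
{code}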



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2005) Add a Hadoop FileSystem implementation of Beam's FileSystem

2017-04-19 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976046#comment-15976046
 ] 

Aviem Zur commented on BEAM-2005:
-

Regarding {{core}} vs {{extensions}}: this can reside in a separate module from 
core, but I think that core should depend on it so users get this functionality 
out of the box.

For example, if a user uses {{TextIO}} and it works for them when passing 
{{"file://path/to/file"}}, changing this to {{"hdfs://path/to/file"}} should 
work without the need to add a new dependency to their project.
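
Concretely, something like this sketch (assuming the current {{TextIO.read()}} 
API; {{options}} and the paths are placeholders) should behave the same for both 
schemes:

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;

Pipeline p = Pipeline.create(options);

// Works out of the box today:
p.apply(TextIO.read().from("file://path/to/file"));

// Should work identically once a Hadoop FileSystem implementation is
// registered - and, per the above, without adding a new dependency:
p.apply(TextIO.read().from("hdfs://path/to/file"));
{code}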

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> ---
>
> Key: BEAM-2005
> URL: https://issues.apache.org/jira/browse/BEAM-2005
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: First stable release
>
>
> Beam's FileSystem provides an abstraction for reading files from many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html)
>  - that would enable us to read from any file system that implements 
> FileSystem (including HDFS, Azure, S3, etc.)
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2017-04-19 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976045#comment-15976045
 ] 

Kenneth Knowles commented on BEAM-1107:
---

Can we do the easy thing first? Right now the UI has a lot of anonymous things.

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Fix For: First stable release
>
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used in some places in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
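
For reference, the gist of the proposed change as a sketch (an illustrative 
helper, not the drafted commit above):

{code:java}
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.flink.api.java.operators.DataSource;

/** Names a Flink operator after the user-visible transform name. */
static void nameAfterUserTransform(TransformHierarchy.Node node, DataSource<?> source) {
  // node.getFullName() is the composite "user name" (e.g. "ReadLines/Read"),
  // whereas transform.getName() is the SDK name (e.g. "Read(CompressedSource)").
  source.name(node.getFullName());
}
{code}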



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1107) Display user names for steps in the Flink Web UI

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1107:
--
Fix Version/s: First stable release

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Fix For: First stable release
>
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used in some places in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1766) Remove Aggregators from Apex runner

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1766:
--
Fix Version/s: First stable release

> Remove Aggregators from Apex runner
> ---
>
> Key: BEAM-1766
> URL: https://issues.apache.org/jira/browse/BEAM-1766
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Pablo Estrada
> Fix For: First stable release
>
>
> I have started removing aggregators from the Java SDK, but runners use them 
> in different ways that I haven't fully worked out. This issue tracks the 
> independent effort in the Apex runner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1796) Apex ValidatesRunner fails on testShiftTimestampInvalid

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1796:
--
Summary: Apex ValidatesRunner fails on testShiftTimestampInvalid  (was: 
Apex RunnableOnService fails on testShiftTimestampInvalid)

> Apex ValidatesRunner fails on testShiftTimestampInvalid
> ---
>
> Key: BEAM-1796
> URL: https://issues.apache.org/jira/browse/BEAM-1796
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Thomas Groh
>
> The failing test should fail due to an exception, but the pipeline appears to 
> either succeed or fail silently. Once the pipeline fails appropriately, the 
> test should be moved to be {{RunnableOnService}}.
> The logs contain exceptions reporting element timestamps equal to now, which 
> should not be present - all elements should begin with timestamps equal to 
> BoundedWindow.TIMESTAMP_MIN_VALUE, and shift only when explicitly 
> specified.
> https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Apex/843/org.apache.beam$beam-runners-apex/testReport/org.apache.beam.sdk.transforms/ParDoTest/testParDoShiftTimestampInvalid/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1767) Remove Aggregators from Dataflow runner

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1767:
--
Fix Version/s: First stable release

> Remove Aggregators from Dataflow runner
> ---
>
> Key: BEAM-1767
> URL: https://issues.apache.org/jira/browse/BEAM-1767
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
> Fix For: First stable release
>
>
> I have started removing aggregators from the Java SDK, but runners use them 
> in different ways that I haven't fully worked out. This issue tracks the 
> independent effort in the Dataflow runner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1850) Improve interplay between PushbackSideInputRunner and GroupAlsoByWindowViaWindowSetDoFn

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1850:
--
Summary: Improve interplay between PushbackSideInputRunner and 
GroupAlsoByWindowViaWindowSetDoFn  (was: Improve interplay between 
PusbackSideInputRunner and GroupAlsoByWindowViaWindowSetDoFn)

> Improve interplay between PushbackSideInputRunner and 
> GroupAlsoByWindowViaWindowSetDoFn
> ---
>
> Key: BEAM-1850
> URL: https://issues.apache.org/jira/browse/BEAM-1850
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, runner-flink
>Reporter: Aljoscha Krettek
>
> This originated from a discussion on a PR: 
> https://github.com/apache/beam/pull/2235
> {{GroupAlsoByWindowViaWindowSetDoFn}}/{{GroupAlsoByWindowViaWindowSetNewDoFn}}
>  and {{PushbackSideInputDoFnRunner}} don't work well together and we manually 
> need to explode windows in 
> {{FlinkStreamingTransformTranslators.ToKeyedWorkItem}} because of this:
>  - {{GroupAlsoByWindowViaWindowSetDoFn}} is a {{DoFn<KeyedWorkItem<K, 
> InputT>, KV<K, OutputT>>}}, so you have to push in {{KeyedWorkItem}}s. These 
> themselves contain {{WindowedValue<InputT>}}s (or timers).
>  - For executing a {{DoFn}} we use a {{DoFnRunner}}. For our problem the 
> interesting case is using a {{PushbackSideInputDoFnRunner}}. The interesting 
> method is {{processElementInReadyWindows(WindowedValue<InputT> elem)}} where 
> {{InputT}} is the input type of the {{DoFn}}, which, for the windowing case, 
> is {{KeyedWorkItem<K, InputT>}} (from above). The actual expanded type 
> signature is thus 
> {{processElementInReadyWindows(WindowedValue<KeyedWorkItem<K, InputT>> 
> elem)}}, where the keyed work items again contain {{WindowedValue}}s (again, 
> from above).
> I think the {{PushbackSideInputDoFnRunner}} was not initially meant for 
> executing {{GroupAlsoByWindowViaWindowSetDoFns}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-311) Add 'garbage collection' hold when receive late element

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-311:


Assignee: Kenneth Knowles  (was: Mark Shields)

> Add 'garbage collection' hold when receive late element
> ---
>
> Key: BEAM-311
> URL: https://issues.apache.org/jira/browse/BEAM-311
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>Assignee: Kenneth Knowles
>
> We currently add a 'garbage collection' hold in WatermarkHold (invoked via 
> ReduceFnRunner) only if the closing behavior is FIRE_ALWAYS. This means an 
> element which has come in too late for a data hold and an end-of-window hold 
> may end up setting no hold at all. As a result, the eventual pane containing 
> that element may end up dropped as being too late.
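
For illustration, the intended fallback as a sketch (not the WatermarkHold code; 
it relies on the usual definition of garbage-collection time):

{code:java}
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Duration;
import org.joda.time.Instant;

/**
 * When an element arrives too late for both a data hold and an end-of-window
 * hold, fall back to a hold at the window's garbage-collection time so the
 * final pane containing it is not dropped as late.
 */
static Instant garbageCollectionHold(BoundedWindow window, Duration allowedLateness) {
  // GC time = end of window + allowed lateness.
  return window.maxTimestamp().plus(allowedLateness);
}
{code}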



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-311) Add 'garbage collection' hold when receive late element

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-311:


Assignee: (was: Kenneth Knowles)

> Add 'garbage collection' hold when receive late element
> ---
>
> Key: BEAM-311
> URL: https://issues.apache.org/jira/browse/BEAM-311
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Mark Shields
>
> We currently add a 'garbage collection' hold in WatermarkHold (invoked via 
> ReduceFnRunner) only if the closing behavior is FIRE_ALWAYS. This means an 
> element which has come in too late for a data hold and an end-of-window hold 
> may end up setting no hold at all. As a result, the eventual pane containing 
> that element may end up dropped as being too late.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1867) Element counts missing on Cloud Dataflow when PCollection has anything other than hardcoded name pattern

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-1867:
-

Assignee: Kenneth Knowles

> Element counts missing on Cloud Dataflow when PCollection has anything other 
> than hardcoded name pattern
> 
>
> Key: BEAM-1867
> URL: https://issues.apache.org/jira/browse/BEAM-1867
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: First stable release
>
>
> In 0.6.0 and 0.7.0-SNAPSHOT (and possibly all past versions; these are just 
> the ones where it is confirmed), element count and byte metrics are not 
> reported correctly when the output PCollection for a primitive transform is 
> not named {{transformname + ".out" + index}}.
> In 0.7.0-SNAPSHOT, the DataflowRunner uses pipeline surgery to replace the 
> composite {{ParDoSingle}} (that contains a {{ParDoMulti}}) with a 
> Dataflow-specific non-composite {{ParDoSingle}}. So metrics are reported for 
> names like {{"ParDoSingle(MyDoFn).out"}} when they should be reported for 
> {{"ParDoSingle/ParDoMulti(MyDoFn).out"}}. So all single-output ParDo 
> transforms lack these metrics on their outputs.
> In 0.6.0 the same problem occurs if the user ever uses 
> {{PCollection.setName}} to give their collection a meaningful name.
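
For illustration, a minimal sketch of the 0.6.0 case ({{ExtractWordsFn}} is a 
placeholder {{DoFn}}):

{code:java}
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

static PCollection<String> repro(PCollection<String> lines) {
  PCollection<String> words = lines.apply("ExtractWords", ParDo.of(new ExtractWordsFn()));
  // Renaming the output breaks the hardcoded "ExtractWords.out"-style pattern,
  // so element count and byte metrics for this collection stop being reported.
  words.setName("Words");
  return words;
}
{code}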



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1458) Checkpoint support in Beam

2017-04-19 Thread huangxiaofeng (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976030#comment-15976030
 ] 

huangxiaofeng commented on BEAM-1458:
-

[#Aljoscha Krettek] Do you have any plans for when this feature will be supported in Beam?

> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>Assignee: Rafal Wojdyla
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoints should provide a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.
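
For illustration only - nothing like this exists in Beam today - the closest 
manual equivalent is materializing and re-reading intermediate results, e.g. 
(sketch; {{expensive}}, {{p}}, and the paths are placeholders, and the TextIO 
API details vary by version):

{code:java}
// Pipeline run 1: materialize the intermediate result to durable storage.
expensive.apply(TextIO.write().to("/tmp/checkpoints/heavy"));

// Pipeline run 2: resume from the materialized result instead of recomputing
// all of the upstream steps.
PCollection<String> resumed = p.apply(TextIO.read().from("/tmp/checkpoints/heavy*"));
{code}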



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1970) Cannot run UserScore on Flink runner due to AvroCoder classload issues

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1970:
--
Summary: Cannot run UserScore on Flink runner due to AvroCoder classload 
issues  (was: Cannot run UserScore on Flink runner)

> Cannot run UserScore on Flink runner due to AvroCoder classload issues
> --
>
> Key: BEAM-1970
> URL: https://issues.apache.org/jira/browse/BEAM-1970
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ahmet Altay
>Assignee: Aljoscha Krettek
> Fix For: First stable release
>
>
> Fails with error:
> ClassCastException: 
> org.apache.beam.examples.complete.game.UserScore$GameActionInfo cannot be 
> cast to org.apache.beam.examples.complete.game.UserScore$GameActionInfo
> full stack:
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error.
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
> at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
> at 
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
> at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
> at 
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
> Caused by: java.lang.RuntimeException: Pipeline execution failed
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:119)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:265)
> at 
> org.apache.beam.examples.complete.game.UserScore.main(UserScore.java:238)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
> ... 13 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> program execution failed: Job execution failed.
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
> at 
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
> at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
> at 
> org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:111)
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:116)
> ... 20 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at 
> 

[jira] [Commented] (BEAM-1970) Cannot run UserScore on Flink runner due to AvroCoder classload issues

2017-04-19 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976028#comment-15976028
 ] 

Kenneth Knowles commented on BEAM-1970:
---

I've now paged out the details, but there is a class getting cached that should 
not be.

> Cannot run UserScore on Flink runner due to AvroCoder classload issues
> --
>
> Key: BEAM-1970
> URL: https://issues.apache.org/jira/browse/BEAM-1970
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ahmet Altay
>Assignee: Aljoscha Krettek
> Fix For: First stable release
>
>
> Fails with error:
> ClassCastException: 
> org.apache.beam.examples.complete.game.UserScore$GameActionInfo cannot be 
> cast to org.apache.beam.examples.complete.game.UserScore$GameActionInfo
> full stack:
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error.
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
> at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
> at 
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
> at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
> at 
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
> Caused by: java.lang.RuntimeException: Pipeline execution failed
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:119)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:265)
> at 
> org.apache.beam.examples.complete.game.UserScore.main(UserScore.java:238)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
> ... 13 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> program execution failed: Job execution failed.
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
> at 
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
> at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
> at 
> org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:111)
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:116)
> ... 20 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at 
> 

[jira] [Updated] (BEAM-1970) Cannot run UserScore on Flink runner due to AvroCoder classload issues

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1970:
--
Fix Version/s: First stable release

> Cannot run UserScore on Flink runner due to AvroCoder classload issues
> --
>
> Key: BEAM-1970
> URL: https://issues.apache.org/jira/browse/BEAM-1970
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Ahmet Altay
>Assignee: Aljoscha Krettek
> Fix For: First stable release
>
>
> Fails with error:
> ClassCastException: 
> org.apache.beam.examples.complete.game.UserScore$GameActionInfo cannot be 
> cast to org.apache.beam.examples.complete.game.UserScore$GameActionInfo
> full stack:
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error.
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
> at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
> at 
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
> at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
> at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
> at 
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
> Caused by: java.lang.RuntimeException: Pipeline execution failed
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:119)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:265)
> at 
> org.apache.beam.examples.complete.game.UserScore.main(UserScore.java:238)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
> ... 13 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> program execution failed: Job execution failed.
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
> at 
> org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:210)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
> at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
> at 
> org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:111)
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:116)
> ... 20 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> 

[jira] [Assigned] (BEAM-2022) ApexTimerInternals seems to treat processing time timers as event time timers

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-2022:
-

Assignee: Thomas Weise

> ApexTimerInternals seems to treat processing time timers as event time timers
> -
>
> Key: BEAM-2022
> URL: https://issues.apache.org/jira/browse/BEAM-2022
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Kenneth Knowles
>Assignee: Thomas Weise
> Fix For: First stable release
>
>
> I first noticed that {{currentProcessingTime()}} was using {{Instant.now()}}, 
> which has some bad issues in a distributed setting. But it seemed on 
> inspection that processing time timers are simply treated as event time. 
> Perhaps I am reading the code wrong?
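
For reference, the pattern being flagged (illustrative, not the literal 
ApexTimerInternals source):

{code:java}
import org.joda.time.Instant;

// Deriving processing time from the local wall clock means every worker has
// its own notion of "now" - exactly what goes wrong in a distributed setting.
Instant currentProcessingTime() {
  return Instant.now();
}
{code}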



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2022) ApexTimerInternals seems to treat processing time timers as event time timers

2017-04-19 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976025#comment-15976025
 ] 

Kenneth Knowles commented on BEAM-2022:
---

If this is the case, and it is not fixable quickly, we can add static validation 
of the graph to reject such timers.

> ApexTimerInternals seems to treat processing time timers as event time timers
> -
>
> Key: BEAM-2022
> URL: https://issues.apache.org/jira/browse/BEAM-2022
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Kenneth Knowles
>Assignee: Thomas Weise
> Fix For: First stable release
>
>
> I first noticed that {{currentProcessingTime()}} was using {{Instant.now()}}, 
> which has some bad issues in a distributed setting. But it seemed on 
> inspection that processing time timers are simply treated as event time. 
> Perhaps I am reading the code wrong?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2022) ApexTimerInternals seems to treat processing time timers as event time timers

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2022:
--
Fix Version/s: First stable release

> ApexTimerInternals seems to treat processing time timers as event time timers
> -
>
> Key: BEAM-2022
> URL: https://issues.apache.org/jira/browse/BEAM-2022
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Kenneth Knowles
> Fix For: First stable release
>
>
> I first noticed that {{currentProcessingTime()}} was using {{Instant.now()}}, 
> which has some bad issues in a distributed setting. But it seemed on 
> inspection that processing time timers are simply treated as event time. 
> Perhaps I am reading the code wrong?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-2022) ApexTimerInternals seems to treat processing time timers as event time timers

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2022:
--
Issue Type: Bug  (was: Improvement)

> ApexTimerInternals seems to treat processing time timers as event time timers
> -
>
> Key: BEAM-2022
> URL: https://issues.apache.org/jira/browse/BEAM-2022
> Project: Beam
>  Issue Type: Bug
>  Components: runner-apex
>Reporter: Kenneth Knowles
>
> I first noticed that {{currentProcessingTime()}} was using {{Instant.now()}}, 
> which has some bad issues in a distributed setting. But it seemed on 
> inspection that processing time timers are simply treated as event time. 
> Perhaps I am reading the code wrong?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-2022) ApexTimerInternals seems to treat processing time timers as event time timers

2017-04-19 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-2022:
-

 Summary: ApexTimerInternals seems to treat processing time timers 
as event time timers
 Key: BEAM-2022
 URL: https://issues.apache.org/jira/browse/BEAM-2022
 Project: Beam
  Issue Type: Improvement
  Components: runner-apex
Reporter: Kenneth Knowles


I first noticed that {{currentProcessingTime()}} was using {{Instant.now()}}, 
which has some bad issues in a distributed setting. But it seemed on inspection 
that processing time timers are simply treated as event time. Perhaps I am 
reading the code wrong?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1247) Session state should not be lost when discardingFiredPanes

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-1247:
-

Assignee: Kenneth Knowles

> Session state should not be lost when discardingFiredPanes
> --
>
> Key: BEAM-1247
> URL: https://issues.apache.org/jira/browse/BEAM-1247
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Critical
>  Labels: backward-incompatible
> Fix For: First stable release
>
>
> Today, when {{discardingFiredPanes}} is set, the entirety of state is 
> cleared, including the state of evolving sessions. This means that with 
> multiple triggerings a single session shows up as multiple sessions. This 
> also stymies downstream stateful computations.
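
For illustration, a windowing setup that hits this (sketch; {{input}} is a 
placeholder {{PCollection<KV<String, String>>}}):

{code:java}
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Merging session windows with early firings in discarding mode: clearing
// *all* state on each firing also clears the session-tracking state, so one
// logical session can surface as several.
input.apply(
    Window.<KV<String, String>>into(Sessions.withGapDuration(Duration.standardMinutes(10)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
{code}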



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-986) ReduceFnRunner doesn't batch prefetching pane firings

2017-04-19 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976014#comment-15976014
 ] 

Kenneth Knowles commented on BEAM-986:
--

This is fully resolved, yes?

> ReduceFnRunner doesn't batch prefetching pane firings
> -
>
> Key: BEAM-986
> URL: https://issues.apache.org/jira/browse/BEAM-986
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Specifically:
> - in ProcessElements, if there are multiple windows to consider, each is 
> processed sequentially with sequential state fetches instead of a bulk 
> prefetch
> - the onTimer method doesn't evaluate multiple timers at a time, meaning that 
> if multiple timers fire at once, each is processed sequentially without 
> batched prefetching



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1641) Support synchronized processing time in Flink runner

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1641:
--
Fix Version/s: First stable release

> Support synchronized processing time in Flink runner
> 
>
> Key: BEAM-1641
> URL: https://issues.apache.org/jira/browse/BEAM-1641
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Aljoscha Krettek
> Fix For: First stable release
>
>
> The "continuation trigger" for a processing time trigger is a synchronized 
> processing time trigger. Today, this throws an exception in the FlinkRunner.
> Synchronized processing time supports a pipeline shape like the following:
>  - GBK1
>  - GBK2
> When GBK1 fires due to processing time past the first element in the pane, and 
> that element arrives at GBK2, GBK2 will wait until all the other upstream keys 
> have also processed and emitted the corresponding data.
> Sorry for the terseness of explanation - writing quickly so I don't forget.
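
For reference, a sketch of that shape (placeholder types; the re-keying between 
the two GBKs is elided):

{code:java}
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

static void sketch(PCollection<KV<String, String>> input) {
  PCollection<KV<String, Iterable<String>>> gbk1 =
      input
          .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1)))
              .triggering(Repeatedly.forever(
                  AfterProcessingTime.pastFirstElementInPane()
                      .plusDelayOf(Duration.standardSeconds(10))))
              .withAllowedLateness(Duration.ZERO)
              .discardingFiredPanes())
          .apply("GBK1", GroupByKey.<String, String>create());
  // Downstream of GBK1 the windowing strategy carries the *continuation*
  // trigger - a synchronized processing-time trigger - so a second GroupByKey
  // ("GBK2") over re-keyed output is where the FlinkRunner currently throws.
}
{code}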



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Spark #1717

2017-04-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976003#comment-15976003
 ] 

tianyou commented on BEAM-1789:
---

I changed _Window.Bound<KV<String, String>> fixWindow = 
Window.<KV<String, String>>into(FixedWindows.of(size));_ to 
_Window.<KV<String, String>>into(FixedWindows.of(size)).triggering(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1)))).withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes()_
, but I observe the same behavior.

> window can't not use in spark cluster module
> 
>
> Key: BEAM-1789
> URL: https://issues.apache.org/jira/browse/BEAM-1789
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: tianyou
>Assignee: Amit Sela
>
>  I use Beam on a Spark cluster. The application is below.
> SparkPipelineOptions options = 
> PipelineOptionsFactory.as(SparkPipelineOptions.class);
> options.setRunner(SparkRunner.class);
> options.setEnableSparkMetricSinks(false);
> options.setStreaming(true);
> options.setSparkMaster("spark://10.100.124.205:6066");
> options.setAppName("Beam App Spark"+new Random().nextFloat());
> options.setJobName("Beam Job Spark"+new Random().nextFloat());
> System.out.println("App Name:"+options.getAppName());
> System.out.println("Job Name:"+options.getJobName());
> options.setMaxRecordsPerBatch(10L);
> 
> //  PipelineOptions options = PipelineOptionsFactory.create();
> Pipeline p = Pipeline.create(options);
> 
> //  Duration size = Duration.standardMinutes(4);
> long duration = 60;
> if(args!=null && args.length==1){
> duration = Integer.valueOf(args[0]);
> }
> Duration size = Duration.standardSeconds(duration);
> System.out.println("Time window: ["+duration+"] seconds");
> Window.Bound<KV<String, String>> fixWindow = 
> Window.<KV<String, String>>into(
> FixedWindows.of(size)
> );
> 
> String kafkaAddress = "10.100.124.208:9093";
> //  String kafkaAddress = "192.168.100.212:9092";
> 
> Map<String, Object> kfConsunmerConf = new HashMap<String, Object>();
> kfConsunmerConf.put("auto.offset.reset", "latest");
> PCollection<String> kafkaJsonPc = p.apply(KafkaIO.<String, String>read()
> .withBootstrapServers(kafkaAddress)
> .withTopics(ImmutableList.of("wypxx1"))
> .withKeyCoder(StringUtf8Coder.of()) 
> .withValueCoder(StringUtf8Coder.of())
> .updateConsumerProperties(kfConsunmerConf)
> .withoutMetadata() 
> ).apply(Values.<String>create());
> 
> 
> PCollection<KV<String, String>> totalPc = kafkaJsonPc.apply(
> "count line",
> ParDo.of(new DoFn<String, KV<String, String>>() {
> @ProcessElement
>   public void processElement(ProcessContext c) {
> String line = c.element();
> Instant is = c.timestamp();
> if(line.length()>2)
>   line = line.substring(0,2);
> System.out.println(line + " " +  is.toString());
> c.output(KV.of(line, line));
>   }
>  })
> );
> 
> 
> PCollection<KV<String, Iterable<String>>> itPc = 
> totalPc.apply(fixWindow).apply(
> "group by appKey",
> GroupByKey.<String, String>create()
> );
>   itPc.apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, Void>() {
> @ProcessElement
> public void processElement(ProcessContext c) {
> KV<String, Iterable<String>> keyIt = c.element();
> String key = keyIt.getKey();
> Iterable<String> itb = keyIt.getValue();
> Iterator<String> it = itb.iterator();
> StringBuilder sb = new StringBuilder();
> sb.append(key).append(":[");
> while(it.hasNext()){
> sb.append(it.next()).append(",");
> }
> String str = sb.toString();
> str = str.substring(0,str.length() -1) + "]";
> System.out.println(str);
> String filePath = "/data/wyp/sparktest.txt";
> String line = "word-->["+key+"]total 
> count="+str+"--->time+"+c.timestamp().toString();
> System.out.println("writefile->"+line);
> FileUtil.write(filePath, line, true, true);
> 

[jira] [Comment Edited] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976003#comment-15976003
 ] 

tianyou edited comment on BEAM-1789 at 4/20/17 3:31 AM:


I changed _*Window.Bound<KV<String, String>> fixWindow = 
Window.<KV<String, String>>into(FixedWindows.of(size));*_ to 
_*Window.<KV<String, String>>into(FixedWindows.of(size)).triggering(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1)))).withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes()*_
, but I observe the same behavior.


was (Author: tianyou):
I changed _Window.Bound<KV<String, String>> fixWindow = 
Window.<KV<String, String>>into(FixedWindows.of(size));_ to 
_Window.<KV<String, String>>into(FixedWindows.of(size)).triggering(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1)))).withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes()_
, but I observe the same behavior.

> window can't not use in spark cluster module
> 
>
> Key: BEAM-1789
> URL: https://issues.apache.org/jira/browse/BEAM-1789
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: tianyou
>Assignee: Amit Sela
>
>  I use Beam on a Spark cluster. The application is below.
> SparkPipelineOptions options = 
> PipelineOptionsFactory.as(SparkPipelineOptions.class);
> options.setRunner(SparkRunner.class);
> options.setEnableSparkMetricSinks(false);
> options.setStreaming(true);
> options.setSparkMaster("spark://10.100.124.205:6066");
> options.setAppName("Beam App Spark"+new Random().nextFloat());
> options.setJobName("Beam Job Spark"+new Random().nextFloat());
> System.out.println("App Name:"+options.getAppName());
> System.out.println("Job Name:"+options.getJobName());
> options.setMaxRecordsPerBatch(10L);
> 
> //  PipelineOptions options = PipelineOptionsFactory.create();
> Pipeline p = Pipeline.create(options);
> 
> //  Duration size = Duration.standardMinutes(4);
> long duration = 60;
> if(args!=null && args.length==1){
> duration = Integer.valueOf(args[0]);
> }
> Duration size = Duration.standardSeconds(duration);
> System.out.println("Time window: ["+duration+"] seconds");
> Window.Bound<KV<String, String>> fixWindow = 
> Window.<KV<String, String>>into(
> FixedWindows.of(size)
> );
> 
> String kafkaAddress = "10.100.124.208:9093";
> //  String kafkaAddress = "192.168.100.212:9092";
> 
> Map<String, Object> kfConsunmerConf = new HashMap<String, Object>();
> kfConsunmerConf.put("auto.offset.reset", "latest");
> PCollection<String> kafkaJsonPc = p.apply(KafkaIO.<String, String>read()
> .withBootstrapServers(kafkaAddress)
> .withTopics(ImmutableList.of("wypxx1"))
> .withKeyCoder(StringUtf8Coder.of()) 
> .withValueCoder(StringUtf8Coder.of())
> .updateConsumerProperties(kfConsunmerConf)
> .withoutMetadata() 
> ).apply(Values.<String>create());
> 
> 
> PCollection<KV<String, String>> totalPc = kafkaJsonPc.apply(
> "count line",
> ParDo.of(new DoFn<String, KV<String, String>>() {
> @ProcessElement
>   public void processElement(ProcessContext c) {
> String line = c.element();
> Instant is = c.timestamp();
> if(line.length()>2)
>   line = line.substring(0,2);
> System.out.println(line + " " +  is.toString());
> c.output(KV.of(line, line));
>   }
>  })
> );
> 
> 
> PCollection<KV<String, Iterable<String>>> itPc = 
> totalPc.apply(fixWindow).apply(
> "group by appKey",
> GroupByKey.<String, String>create()
> );
>   itPc.apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, Void>() {
> @ProcessElement
> public void processElement(ProcessContext c) {
> KV<String, Iterable<String>> keyIt = c.element();
> String key = keyIt.getKey();
> Iterable<String> itb = keyIt.getValue();
> Iterator<String> it = itb.iterator();
> StringBuilder sb = new StringBuilder();
> sb.append(key).append(":[");
> while(it.hasNext()){
> sb.append(it.next()).append(",");
>

[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: I changed *Window.Bound<KV<String, String>> fixWindow = 
Window.<KV<String, String>>into(FixedWindows.of(size));* to 
*Window.<KV<String, String>>into(FixedWindows.of(size)).triggering(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1)))).withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes()*
, but I observe the same behavior)

> window can't not use in spark cluster module
> 
>
> Key: BEAM-1789
> URL: https://issues.apache.org/jira/browse/BEAM-1789
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: tianyou
>Assignee: Amit Sela
>
>  I use Beam on a Spark cluster. The application is below.
> SparkPipelineOptions options = 
> PipelineOptionsFactory.as(SparkPipelineOptions.class);
> options.setRunner(SparkRunner.class);
> options.setEnableSparkMetricSinks(false);
> options.setStreaming(true);
> options.setSparkMaster("spark://10.100.124.205:6066");
> options.setAppName("Beam App Spark"+new Random().nextFloat());
> options.setJobName("Beam Job Spark"+new Random().nextFloat());
> System.out.println("App Name:"+options.getAppName());
> System.out.println("Job Name:"+options.getJobName());
> options.setMaxRecordsPerBatch(10L);
> 
> //  PipelineOptions options = PipelineOptionsFactory.create();
> Pipeline p = Pipeline.create(options);
> 
> //  Duration size = Duration.standardMinutes(4);
> long duration = 60;
> if(args!=null && args.length==1){
> duration = Integer.valueOf(args[0]);
> }
> Duration size = Duration.standardSeconds(duration);
> System.out.println("Time window: ["+duration+"] seconds");
> Window.Bound<KV<String, String>> fixWindow = 
> Window.<KV<String, String>>into(
> FixedWindows.of(size)
> );
> 
> String kafkaAddress = "10.100.124.208:9093";
> //  String kafkaAddress = "192.168.100.212:9092";
> 
> Map<String, Object> kfConsunmerConf = new HashMap<String, Object>();
> kfConsunmerConf.put("auto.offset.reset", "latest");
> PCollection<String> kafkaJsonPc = p.apply(KafkaIO.<String, String>read()
> .withBootstrapServers(kafkaAddress)
> .withTopics(ImmutableList.of("wypxx1"))
> .withKeyCoder(StringUtf8Coder.of()) 
> .withValueCoder(StringUtf8Coder.of())
> .updateConsumerProperties(kfConsunmerConf)
> .withoutMetadata() 
> ).apply(Values.<String>create());
> 
> 
> PCollection<KV<String, String>> totalPc = kafkaJsonPc.apply(
> "count line",
> ParDo.of(new DoFn<String, KV<String, String>>() {
> @ProcessElement
>   public void processElement(ProcessContext c) {
> String line = c.element();
> Instant is = c.timestamp();
> if(line.length()>2)
>   line = line.substring(0,2);
> System.out.println(line + " " +  is.toString());
> c.output(KV.of(line, line));
>   }
>  })
> );
> 
> 
> PCollection<KV<String, Iterable<String>>> itPc = 
> totalPc.apply(fixWindow).apply(
> "group by appKey",
> GroupByKey.<String, String>create()
> );
>   itPc.apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, Void>() {
> @ProcessElement
> public void processElement(ProcessContext c) {
> KV<String, Iterable<String>> keyIt = c.element();
> String key = keyIt.getKey();
> Iterable<String> itb = keyIt.getValue();
> Iterator<String> it = itb.iterator();
> StringBuilder sb = new StringBuilder();
> sb.append(key).append(":[");
> while(it.hasNext()){
> sb.append(it.next()).append(",");
> }
> String str = sb.toString();
> str = str.substring(0,str.length() -1) + "]";
> System.out.println(str);
> String filePath = "/data/wyp/sparktest.txt";
> String line = "word-->["+key+"]total 
> count="+str+"--->time+"+c.timestamp().toString();
> System.out.println("writefile->"+line);
> FileUtil.write(filePath, line, true, true);
> }
> 

[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976002#comment-15976002
 ] 

tianyou commented on BEAM-1789:
---

I changed *Window.Bound<KV<String, String>> fixWindow = 
Window.<KV<String, String>>into(FixedWindows.of(size));* to 
*Window.<KV<String, String>>into(FixedWindows.of(size)).triggering(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1)))).withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes()*
, but I observe the same behavior.

> window can't not use in spark cluster module
> 
>
> Key: BEAM-1789
> URL: https://issues.apache.org/jira/browse/BEAM-1789
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: tianyou
>Assignee: Amit Sela
>
>  I use Beam on a Spark cluster. The application is below.
> SparkPipelineOptions options = 
> PipelineOptionsFactory.as(SparkPipelineOptions.class);
> options.setRunner(SparkRunner.class);
> options.setEnableSparkMetricSinks(false);
> options.setStreaming(true);
> options.setSparkMaster("spark://10.100.124.205:6066");
> options.setAppName("Beam App Spark"+new Random().nextFloat());
> options.setJobName("Beam Job Spark"+new Random().nextFloat());
> System.out.println("App Name:"+options.getAppName());
> System.out.println("Job Name:"+options.getJobName());
> options.setMaxRecordsPerBatch(10L);
> 
> //  PipelineOptions options = PipelineOptionsFactory.create();
> Pipeline p = Pipeline.create(options);
> 
> //  Duration size = Duration.standardMinutes(4);
> long duration = 60;
> if(args!=null && args.length==1){
> duration = Integer.valueOf(args[0]);
> }
> Duration size = Duration.standardSeconds(duration);
> System.out.println("Time window: ["+duration+"] seconds");
> Window.Bound<KV<String, String>> fixWindow = 
> Window.<KV<String, String>>into(
> FixedWindows.of(size)
> );
> 
> String kafkaAddress = "10.100.124.208:9093";
> //  String kafkaAddress = "192.168.100.212:9092";
> 
> Map<String, Object> kfConsunmerConf = new HashMap<String, Object>();
> kfConsunmerConf.put("auto.offset.reset", "latest");
> PCollection<String> kafkaJsonPc = p.apply(KafkaIO.<String, String>read()
> .withBootstrapServers(kafkaAddress)
> .withTopics(ImmutableList.of("wypxx1"))
> .withKeyCoder(StringUtf8Coder.of()) 
> .withValueCoder(StringUtf8Coder.of())
> .updateConsumerProperties(kfConsunmerConf)
> .withoutMetadata() 
> ).apply(Values.<String>create());
> 
> 
> PCollection<KV<String, String>> totalPc = kafkaJsonPc.apply(
> "count line",
> ParDo.of(new DoFn<String, KV<String, String>>() {
> @ProcessElement
>   public void processElement(ProcessContext c) {
> String line = c.element();
> Instant is = c.timestamp();
> if(line.length()>2)
>   line = line.substring(0,2);
> System.out.println(line + " " +  is.toString());
> c.output(KV.of(line, line));
>   }
>  })
> );
> 
> 
> PCollection<KV<String, Iterable<String>>> itPc = 
> totalPc.apply(fixWindow).apply(
> "group by appKey",
> GroupByKey.<String, String>create()
> );
>   itPc.apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, Void>() {
> @ProcessElement
> public void processElement(ProcessContext c) {
> KV<String, Iterable<String>> keyIt = c.element();
> String key = keyIt.getKey();
> Iterable<String> itb = keyIt.getValue();
> Iterator<String> it = itb.iterator();
> StringBuilder sb = new StringBuilder();
> sb.append(key).append(":[");
> while(it.hasNext()){
> sb.append(it.next()).append(",");
> }
> String str = sb.toString();
> str = str.substring(0,str.length() -1) + "]";
> System.out.println(str);
> String filePath = "/data/wyp/sparktest.txt";
> String line = "word-->["+key+"]total 
> count="+str+"--->time+"+c.timestamp().toString();
> System.out.println("writefile->"+line);
> FileUtil.write(filePath, line, true, true);
> 

[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: I changed *Window.Bound<KV<String, String>> fixWindow = 
Window.<KV<String, String>>into(
FixedWindows.of(size)
);* to *Window.<KV<String, String>>into(FixedWindows.of(size))
.triggering(
AfterWatermark.pastEndOfWindow()
.withLateFirings(AfterProcessingTime
.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1))))
.withAllowedLateness(Duration.standardDays(1)).discardingFiredPanes()*  , but I 
observe the same behavior)

> window can't not use in spark cluster module
> 
>
> Key: BEAM-1789
> URL: https://issues.apache.org/jira/browse/BEAM-1789
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: tianyou
>Assignee: Amit Sela
>
>  I use Beam on a Spark cluster. The application is below.
> SparkPipelineOptions options = 
> PipelineOptionsFactory.as(SparkPipelineOptions.class);
> options.setRunner(SparkRunner.class);
> options.setEnableSparkMetricSinks(false);
> options.setStreaming(true);
> options.setSparkMaster("spark://10.100.124.205:6066");
> options.setAppName("Beam App Spark"+new Random().nextFloat());
> options.setJobName("Beam Job Spark"+new Random().nextFloat());
> System.out.println("App Name:"+options.getAppName());
> System.out.println("Job Name:"+options.getJobName());
> options.setMaxRecordsPerBatch(10L);
> 
> //  PipelineOptions options = PipelineOptionsFactory.create();
> Pipeline p = Pipeline.create(options);
> 
> //  Duration size = Duration.standardMinutes(4);
> long duration = 60;
> if(args!=null && args.length==1){
> duration = Integer.valueOf(args[0]);
> }
> Duration size = Duration.standardSeconds(duration);
> System.out.println("Time window: ["+duration+"] seconds");
> Window.Bound<KV<String, String>> fixWindow = 
> Window.<KV<String, String>>into(
> FixedWindows.of(size)
> );
> 
> String kafkaAddress = "10.100.124.208:9093";
> //  String kafkaAddress = "192.168.100.212:9092";
> 
> Map<String, Object> kfConsunmerConf = new HashMap<String, Object>();
> kfConsunmerConf.put("auto.offset.reset", "latest");
> PCollection<String> kafkaJsonPc = p.apply(KafkaIO.<String, String>read()
> .withBootstrapServers(kafkaAddress)
> .withTopics(ImmutableList.of("wypxx1"))
> .withKeyCoder(StringUtf8Coder.of()) 
> .withValueCoder(StringUtf8Coder.of())
> .updateConsumerProperties(kfConsunmerConf)
> .withoutMetadata() 
> ).apply(Values.<String>create());
> 
> 
> PCollection<KV<String, String>> totalPc = kafkaJsonPc.apply(
> "count line",
> ParDo.of(new DoFn<String, KV<String, String>>() {
> @ProcessElement
>   public void processElement(ProcessContext c) {
> String line = c.element();
> Instant is = c.timestamp();
> if(line.length()>2)
>   line = line.substring(0,2);
> System.out.println(line + " " +  is.toString());
> c.output(KV.of(line, line));
>   }
>  })
> );
> 
> 
> PCollection<KV<String, Iterable<String>>> itPc = 
> totalPc.apply(fixWindow).apply(
> "group by appKey",
> GroupByKey.<String, String>create()
> );
>   itPc.apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, Void>() {
> @ProcessElement
> public void processElement(ProcessContext c) {
> KV<String, Iterable<String>> keyIt = c.element();
> String key = keyIt.getKey();
> Iterable<String> itb = keyIt.getValue();
> Iterator<String> it = itb.iterator();
> StringBuilder sb = new StringBuilder();
> sb.append(key).append(":[");
> while(it.hasNext()){
> sb.append(it.next()).append(",");
> }
> String str = sb.toString();
> str = str.substring(0,str.length() -1) + "]";
> System.out.println(str);
> String filePath = "/data/wyp/sparktest.txt";
> String line = "word-->["+key+"]total 
> count="+str+"--->time+"+c.timestamp().toString();
> System.out.println("writefile->"+line);
> FileUtil.write(filePath, line, true, true);
> 

[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976000#comment-15976000
 ] 

tianyou commented on BEAM-1789:
---

I changed *Window.Bound<KV<String, String>> fixWindow =
Window.<KV<String, String>>into(FixedWindows.of(size));* to
*Window.<KV<String, String>>into(FixedWindows.of(size))
    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withLateFirings(AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardHours(1))))
    .withAllowedLateness(Duration.standardDays(1))
    .discardingFiredPanes()*, but the result is the same.
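
For reference, the change this comment describes corresponds to the following
transform (a cleaned-up sketch; the element type KV<String, String> and the
window Duration `size` are taken from the pipeline in the issue description):

    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;

    class WindowingSketch {
      // Fixed windows plus an explicit trigger: fire at the watermark, then
      // again for late data (one hour after the first late element in a
      // pane), keep late input for up to one day, discard fired panes.
      static Window.Bound<KV<String, String>> triggeredWindow(Duration size) {
        return Window.<KV<String, String>>into(FixedWindows.of(size))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    .withLateFirings(AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.standardHours(1))))
            .withAllowedLateness(Duration.standardDays(1))
            .discardingFiredPanes();
      }
    }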


[jira] [Created] (BEAM-2021) Fix Java's Coder class hierarchy

2017-04-19 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-2021:
-

 Summary: Fix Java's Coder class hierarchy
 Key: BEAM-2021
 URL: https://issues.apache.org/jira/browse/BEAM-2021
 Project: Beam
  Issue Type: Improvement
  Components: beam-model-runner-api, sdk-java-core
Affects Versions: First stable release
Reporter: Kenneth Knowles
Assignee: Thomas Groh


This is thoroughly out of hand. In the Runner API world, there are two paths:

1. URN plus component coders plus a custom payload (in the form of component
coders alongside an SdkFunctionSpec).
2. A custom coder: a single URN whose payload is serialized Java. I think this
never has component coders.

The other base classes have now been shown to be extraneous: they save roughly
three lines of boilerplate in rarely written code at the cost of readability.
They should simply be dropped.

The custom payload is an Any proto in the Runner API. But tying the Coder
interface to proto would be unfortunate from a design perspective, and cannot
be done anyway due to dependency hell.
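
For illustration only, a minimal sketch of the two payload shapes (the class
names and URNs below are hypothetical, not the actual Beam API):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.List;

    // Hypothetical stand-in for a coder description in the Runner API.
    interface CoderSpec {}

    // Path 1: URN plus component coders. The spec is fully described by its
    // URN and the specs of its components; no opaque payload is needed.
    final class ListCoderSpec implements CoderSpec {
      final String urn = "beam:coder:list:v1";  // hypothetical URN
      final List<CoderSpec> componentCoders;    // e.g. the element coder
      ListCoderSpec(List<CoderSpec> componentCoders) {
        this.componentCoders = componentCoders;
      }
    }

    // Path 2: custom coder. A single URN whose payload is the Java-serialized
    // coder instance; there are no component coders.
    final class CustomCoderSpec implements CoderSpec {
      final String urn = "beam:coder:custom:v1"; // hypothetical URN
      final byte[] payload;
      CustomCoderSpec(Serializable coder) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
          out.writeObject(coder);  // payload is serialized Java
        }
        this.payload = bytes.toByteArray();
      }
    }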



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: I changed _Window.Bound<KV<String, String>> fixWindow =
Window.<KV<String, String>>into(FixedWindows.of(size));_ to
_Window.<KV<String, String>>into(FixedWindows.of(size))
    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withLateFirings(AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardHours(1))))
    .withAllowedLateness(Duration.standardDays(1))
    .discardingFiredPanes()_, but the result is the same.)


[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975999#comment-15975999
 ] 

tianyou commented on BEAM-1789:
---

I changed _Window.Bound<KV<String, String>> fixWindow =
Window.<KV<String, String>>into(FixedWindows.of(size));_ to
_Window.<KV<String, String>>into(FixedWindows.of(size))
    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withLateFirings(AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardHours(1))))
    .withAllowedLateness(Duration.standardDays(1))
    .discardingFiredPanes()_, but the result is the same.


[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975997#comment-15975997
 ] 

tianyou commented on BEAM-1789:
---

bq ttt  bq




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975998#comment-15975998
 ] 

tianyou commented on BEAM-1789:
---

bq. Some block quoted text




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: bq. Some block quoted text)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: bq ttt  bq)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975996#comment-15975996
 ] 

tianyou commented on BEAM-1789:
---

??ffdsafds??




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: ??ffdsafds??)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Assigned] (BEAM-2020) Move CloudObject to Dataflow runner

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-2020:
-

Assignee: Luke Cwik  (was: Thomas Groh)

> Move CloudObject to Dataflow runner
> ---
>
> Key: BEAM-2020
> URL: https://issues.apache.org/jira/browse/BEAM-2020
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
> Fix For: First stable release
>
>
> This entails primarily eliminating Coder.asCloudObject() by adding the needed 
> accessors, and possibly a serialization registrar discipline, for coders in 
> the Runner API proto.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975992#comment-15975992
 ] 

tianyou commented on BEAM-1789:
---

*boldface*




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (BEAM-2020) Move CloudObject to Dataflow runner

2017-04-19 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-2020:
-

 Summary: Move CloudObject to Dataflow runner
 Key: BEAM-2020
 URL: https://issues.apache.org/jira/browse/BEAM-2020
 Project: Beam
  Issue Type: Improvement
  Components: beam-model-runner-api, sdk-java-core
Reporter: Kenneth Knowles
Assignee: Thomas Groh
 Fix For: First stable release


This entails primarily eliminating Coder.asCloudObject() by adding the needed 
accessors, and possibly a serialization registrar discipline, for coders in the 
Runner API proto.
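
As a rough illustration of the accessor-plus-registrar direction (all names
below are hypothetical sketches, not the real Dataflow runner API):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical: one translator per coder type lives in the Dataflow
    // runner and builds the CloudObject map from public accessors, instead
    // of the SDK's Coder exposing asCloudObject() itself.
    interface CloudObjectTranslator<C> {
      Map<String, Object> toCloudObject(C coder);
    }

    // Stand-in for an SDK coder that now only exposes an accessor.
    final class DelimitedCoder {
      private final byte delimiter;
      DelimitedCoder(byte delimiter) { this.delimiter = delimiter; }
      byte getDelimiter() { return delimiter; }
    }

    // The runner-side translator reads the accessor; the SDK type no longer
    // needs to know about CloudObject at all.
    final class DelimitedCoderTranslator
        implements CloudObjectTranslator<DelimitedCoder> {
      @Override
      public Map<String, Object> toCloudObject(DelimitedCoder coder) {
        Map<String, Object> cloudObject = new HashMap<>();
        cloudObject.put("@type", "DelimitedCoder");  // hypothetical type tag
        cloudObject.put("delimiter", (int) coder.getDelimiter());
        return cloudObject;
      }
    }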



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: *boldface*)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975989#comment-15975989
 ] 

tianyou commented on BEAM-1789:
---

Test _emphasis_ ttt _emphasis_




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Issue Comment Deleted] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianyou updated BEAM-1789:
--
Comment: was deleted

(was: Test _emphasis_ ttt _emphasis_)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (BEAM-1789) window can't not use in spark cluster module

2017-04-19 Thread tianyou (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975988#comment-15975988
 ] 

tianyou commented on BEAM-1789:
---

I changed *strong*Window.Bound<KV<String, String>> fixWindow =
Window.<KV<String, String>>into(FixedWindows.of(size));*strong*


[jira] [Commented] (BEAM-79) Gearpump runner

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975906#comment-15975906
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user huafengw opened a pull request:

https://github.com/apache/beam/pull/2607

[BEAM-79] Upgrade Gearpump to a stable version

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huafengw/beam upgrade

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2607


commit 3ba9b229a0a2cfcd84d5b83f208748c060caf20d
Author: huafengw 
Date:   2017-04-20T01:36:45Z

[BEAM-79] Upgrade Gearpump to a stable version




> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-gearpump
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2607: [BEAM-79] Upgrade Gearpump to a stable version

2017-04-19 Thread huafengw
GitHub user huafengw opened a pull request:

https://github.com/apache/beam/pull/2607

[BEAM-79] Upgrade Gearpump to a stable version

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huafengw/beam upgrade

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2607


commit 3ba9b229a0a2cfcd84d5b83f208748c060caf20d
Author: huafengw 
Date:   2017-04-20T01:36:45Z

[BEAM-79] Upgrade Gearpump to a stable version




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (BEAM-973) Add end user and developer documentation to gearpump-runner

2017-04-19 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang resolved BEAM-973.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Add end user and developer documentation to gearpump-runner
> ---
>
> Key: BEAM-973
> URL: https://issues.apache.org/jira/browse/BEAM-973
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump
>Reporter: Manu Zhang
>Assignee: Manu Zhang
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-972) Add basic level of unit testing to gearpump runner

2017-04-19 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang resolved BEAM-972.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Add basic level of unit testing to gearpump runner
> --
>
> Key: BEAM-972
> URL: https://issues.apache.org/jira/browse/BEAM-972
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-gearpump, testing
>Reporter: Manu Zhang
>Assignee: Huafeng Wang
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Flink #2414

2017-04-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1871) Thin Java SDK Core

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975855#comment-15975855
 ] 

ASF GitHub Bot commented on BEAM-1871:
--

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2606

[BEAM-1871] Remove DeterministicStandardCoder

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
This isn't a particularly useful Coder. It has no defined methods other
than verifyDeterministic, which has an empty implementation.
Additionally, there are no guarantees that a DeterministicStandardCoder
is deterministic.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam so_many_custom_coders

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2606.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2606


commit a52186cdd308f155eae34ca287b8bbaf9e95afc9
Author: Thomas Groh 
Date:   2017-04-20T00:58:58Z

Remove DeterministicStandardCoder

This isn't a particularly useful Coder. It has no defined methods other
than verifyDeterministic, which has an empty implementation.
Additionally, there are no guarantees that a DeterministicStandardCoder
is deterministic.




> Thin Java SDK Core
> --
>
> Key: BEAM-1871
> URL: https://issues.apache.org/jira/browse/BEAM-1871
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Luke Cwik
> Fix For: First stable release
>
>
> Before first stable release we need to thin out {{sdk-java-core}} module. 
> Some candidates for removal (a non-exhaustive list):
> {{sdk/io}}
> * anything BigQuery related
> * anything PubSub related
> * everything Protobuf related
> * TFRecordIO
> * XMLSink
> {{sdk/util}}
> * Everything GCS related
> * Everything Backoff related
> * Everything Google API related: ResponseInterceptors, RetryHttpBackoff, etc.
> * Everything CloudObject-related
> * Pubsub stuff
> {{sdk/coders}}
> * JAXBCoder
> * TableRowJsonCoder



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
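
For context, the PR description above implies the removed class is just a marker; a minimal sketch reconstructed from that description (not the actual source file):

```
import org.apache.beam.sdk.coders.StandardCoder;

// Reconstruction from the PR description, not the real source: the only method
// DeterministicStandardCoder defines is an empty verifyDeterministic(), so
// extending it asserts determinism without ever checking it.
public abstract class DeterministicStandardCoder<T> extends StandardCoder<T> {
  @Override
  public void verifyDeterministic() {
    // Intentionally empty in the original: never throws
    // NonDeterministicException, whether or not the concrete coder is
    // actually deterministic.
  }
}
```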


[GitHub] beam pull request #2606: [BEAM-1871] Remove DeterministicStandardCoder

2017-04-19 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2606

[BEAM-1871] Remove DeterministicStandardCoder

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
This isn't a particularly useful Coder. It has no defined methods other
than verifyDeterministic, which has an empty implementation.
Additionally, there are no guarantees that a DeterministicStandardCoder
is deterministic.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam so_many_custom_coders

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2606.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2606


commit a52186cdd308f155eae34ca287b8bbaf9e95afc9
Author: Thomas Groh 
Date:   2017-04-20T00:58:58Z

Remove DeterministicStandardCoder

This isn't a particularly useful Coder. It has no defined methods other
than verifyDeterministic, which has an empty implementation.
Additionally, there are no guarantees that a DeterministicStandardCoder
is deterministic.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-1860) SerializableCoder should not extend DeterministicStandardCoder

2017-04-19 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-1860:
-

Assignee: Thomas Groh  (was: Davor Bonaci)

> SerializableCoder should not extend DeterministicStandardCoder
> --
>
> Key: BEAM-1860
> URL: https://issues.apache.org/jira/browse/BEAM-1860
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 0.6.0
>Reporter: Wesley Tanaka
>Assignee: Thomas Groh
>
> Not sure if this is just a doc bug, but:
> https://beam.apache.org/documentation/sdks/javadoc/0.6.0/org/apache/beam/sdk/coders/SerializableCoder.html
>  says:
> SerializableCoder does not guarantee a deterministic encoding, as Java 
> serialization may produce different binary encodings for two equivalent 
> objects.
> Yet 
> https://beam.apache.org/documentation/sdks/javadoc/0.6.0/org/apache/beam/sdk/coders/DeterministicStandardCoder.html
>  says:
> A DeterministicStandardCoder is a StandardCoder that is deterministic, in the 
> sense that for objects considered equal according to Object.equals(Object), 
> the encoded bytes are also equal.
> These sound like they conflict, and thus that SerializableCoder should not 
> extend DeterministicStandardCoder



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
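
A minimal sketch of how the conflict shows up in user code, assuming the 0.6.0 behavior described above; Payload is a placeholder type:

```
import java.io.Serializable;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.SerializableCoder;

public class SerializableCoderDeterminismCheck {
  // Hypothetical payload type, used only for illustration.
  static class Payload implements Serializable {
    java.util.Map<String, Integer> counts = new java.util.HashMap<>();
  }

  public static void main(String[] args) {
    Coder<Payload> coder = SerializableCoder.of(Payload.class);
    try {
      // Because SerializableCoder extends DeterministicStandardCoder, whose
      // verifyDeterministic() is empty, this call passes silently even though
      // SerializableCoder's own Javadoc says the encoding is not deterministic.
      coder.verifyDeterministic();
      System.out.println("verifyDeterministic() passed (misleadingly)");
    } catch (Coder.NonDeterministicException e) {
      System.out.println("declared non-deterministic: " + e.getMessage());
    }
  }
}
```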


[GitHub] beam pull request #2605: [BEAM-1786, BEAM-1871] Add the ability to register ...

2017-04-19 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/2605

[BEAM-1786, BEAM-1871] Add the ability to register coder factories

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
Allowing TableRowJsonCoder to move to sdks/java/io/google-cloud-platform

I needed to make getEncodedSize on StringUtf8Coder public to move 
TableRowJsonCoder


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam thin_sdk_core2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2605


commit 459c3048e86ce31e9b2e1c98897ed98fe73e5d22
Author: Luke Cwik 
Date:   2017-04-20T00:59:15Z

[BEAM-1786, BEAM-1871] Add the ability to register coder factories for 
classes allowing TableRowJsonCoder to move to sdks/java/io/google-cloud-platform




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1786) AutoService registration of coders, like we do with PipelineRunners

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975846#comment-15975846
 ] 

ASF GitHub Bot commented on BEAM-1786:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/2605

[BEAM-1786, BEAM-1871] Add the ability to register coder factories

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
Allowing TableRowJsonCoder to move to sdks/java/io/google-cloud-platform

I needed to make getEncodedSize on StringUtf8Coder public to move 
TableRowJsonCoder


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam thin_sdk_core2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2605


commit 459c3048e86ce31e9b2e1c98897ed98fe73e5d22
Author: Luke Cwik 
Date:   2017-04-20T00:59:15Z

[BEAM-1786, BEAM-1871] Add the ability to register coder factories for 
classes allowing TableRowJsonCoder to move to sdks/java/io/google-cloud-platform




> AutoService registration of coders, like we do with PipelineRunners
> ---
>
> Key: BEAM-1786
> URL: https://issues.apache.org/jira/browse/BEAM-1786
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
> Fix For: First stable release
>
>
> Today, registering coders for auxiliary data types for a library transform is 
> not very convenient. If it appears in an output/covariant position then it 
> might be possible to use {{getDefaultOutputCoder}} to solve things. But for 
> writes/contravariant positions this is not applicable and the library 
> transform must contort itself to avoid requiring the user to come up with a 
> coder for a type they don't own.
> Probably the best case today is an explicit call to 
> {{LibraryTransform.registerCoders(Pipeline)}} which is far too manual.
> This could likely be solved quite easily with {{@AutoService}} and a static 
> global coder registry, as we do with pipeline runners.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
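
A hypothetical sketch of the proposal, modeled on how runners register via @AutoService(PipelineRunnerRegistrar.class); CoderRegistrar and the Aux* types are illustrative names, not existing Beam API:

```
import com.google.auto.service.AutoService;
import org.apache.beam.sdk.coders.CoderRegistry;

// Hypothetical registrar interface: Beam does not define CoderRegistrar today;
// the name and shape are illustrative only.
public interface CoderRegistrar {
  void registerCoders(CoderRegistry registry);
}

// A library would ship this alongside its auxiliary types, and the SDK would
// discover it via ServiceLoader at pipeline construction time, instead of
// requiring users to call LibraryTransform.registerCoders(Pipeline) manually.
@AutoService(CoderRegistrar.class)
class LibraryCoderRegistrar implements CoderRegistrar {
  @Override
  public void registerCoders(CoderRegistry registry) {
    // AuxType and AuxTypeCoder are hypothetical placeholder classes.
    registry.registerCoder(AuxType.class, AuxTypeCoder.class);
  }
}
```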


[jira] [Assigned] (BEAM-950) DoFn Setup and Teardown methods should have access to PipelineOptions

2017-04-19 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-950:


Assignee: (was: Thomas Groh)

> DoFn Setup and Teardown methods should have access to PipelineOptions
> -
>
> Key: BEAM-950
> URL: https://issues.apache.org/jira/browse/BEAM-950
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>
> This enables any options-relevant decisions to be made once per DoFn, without 
> having to lazily initialize in {{startBundle}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
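
A sketch of the lazy-initialization workaround this forces today, with the requested variant shown as a comment; the @Setup signature taking PipelineOptions is the proposal, not current API:

```
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.transforms.DoFn;

class OptionsDependentFn extends DoFn<String, String> {
  private transient String prefix;

  @StartBundle
  public void startBundle(Context c) {
    // Today's workaround: PipelineOptions are only reachable through a
    // context, so options-dependent setup happens lazily per bundle instead
    // of once per DoFn instance in @Setup.
    if (prefix == null) {
      prefix = c.getPipelineOptions().getJobName();
    }
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(prefix + ": " + c.element());
  }

  // What this issue asks for (hypothetical signature, not current API):
  // @Setup
  // public void setup(PipelineOptions options) {
  //   prefix = options.getJobName();
  // }
}
```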


[jira] [Assigned] (BEAM-210) Allow control of empty ON_TIME panes analogous to final panes

2017-04-19 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-210:


Assignee: (was: Thomas Groh)

> Allow control of empty ON_TIME panes analogous to final panes
> -
>
> Key: BEAM-210
> URL: https://issues.apache.org/jira/browse/BEAM-210
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Mark Shields
>
> Today, ON_TIME panes are emitted whether or not they are empty. We had 
> decided that for final panes the user would want to control this behavior, to 
> limit data volume. But for ON_TIME panes no such control exists. The 
> rationale is perhaps that the ON_TIME pane is a fundamental result that 
> should not be elided. To be considered: whether this is what we want.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
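
For comparison, the existing final-pane control is the ClosingBehavior argument of withAllowedLateness; a minimal sketch, assuming input is a PCollection<String>:

```
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// FIRE_IF_NON_EMPTY suppresses an empty final pane when the window closes;
// there is currently no analogous knob for the ON_TIME pane, which is the
// asymmetry this issue raises.
input.apply(
    Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .withAllowedLateness(Duration.standardMinutes(10),
            Window.ClosingBehavior.FIRE_IF_NON_EMPTY)
        .discardingFiredPanes());
```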


[jira] [Assigned] (BEAM-1028) Merge content from blog post into /documentation/pipelines/test-your-pipeline.md

2017-04-19 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-1028:
-

Assignee: (was: Thomas Groh)

> Merge content from blog post into 
> /documentation/pipelines/test-your-pipeline.md
> 
>
> Key: BEAM-1028
> URL: https://issues.apache.org/jira/browse/BEAM-1028
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Hadar Hod
>
> blog post: http://beam.incubator.apache.org/blog/2016/10/20/test-stream.html 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark #1716

2017-04-19 Thread Apache Jenkins Server
See 


--
[...truncated 232.19 KB...]
 x [deleted] (none) -> origin/pr/942/head
 x [deleted] (none) -> origin/pr/942/merge
 x [deleted] (none) -> origin/pr/943/head
 x [deleted] (none) -> origin/pr/943/merge
 x [deleted] (none) -> origin/pr/944/head
 x [deleted] (none) -> origin/pr/945/head
 x [deleted] (none) -> origin/pr/945/merge
 x [deleted] (none) -> origin/pr/946/head
 x [deleted] (none) -> origin/pr/946/merge
 x [deleted] (none) -> origin/pr/947/head
 x [deleted] (none) -> origin/pr/947/merge
 x [deleted] (none) -> origin/pr/948/head
 x [deleted] (none) -> origin/pr/948/merge
 x [deleted] (none) -> origin/pr/949/head
 x [deleted] (none) -> origin/pr/949/merge
 x [deleted] (none) -> origin/pr/95/head
 x [deleted] (none) -> origin/pr/95/merge
 x [deleted] (none) -> origin/pr/950/head
 x [deleted] (none) -> origin/pr/951/head
 x [deleted] (none) -> origin/pr/951/merge
 x [deleted] (none) -> origin/pr/952/head
 x [deleted] (none) -> origin/pr/952/merge
 x [deleted] (none) -> origin/pr/953/head
 x [deleted] (none) -> origin/pr/954/head
 x [deleted] (none) -> origin/pr/954/merge
 x [deleted] (none) -> origin/pr/955/head
 x [deleted] (none) -> origin/pr/955/merge
 x [deleted] (none) -> origin/pr/956/head
 x [deleted] (none) -> origin/pr/957/head
 x [deleted] (none) -> origin/pr/958/head
 x [deleted] (none) -> origin/pr/959/head
 x [deleted] (none) -> origin/pr/959/merge
 x [deleted] (none) -> origin/pr/96/head
 x [deleted] (none) -> origin/pr/96/merge
 x [deleted] (none) -> origin/pr/960/head
 x [deleted] (none) -> origin/pr/960/merge
 x [deleted] (none) -> origin/pr/961/head
 x [deleted] (none) -> origin/pr/962/head
 x [deleted] (none) -> origin/pr/962/merge
 x [deleted] (none) -> origin/pr/963/head
 x [deleted] (none) -> origin/pr/963/merge
 x [deleted] (none) -> origin/pr/964/head
 x [deleted] (none) -> origin/pr/965/head
 x [deleted] (none) -> origin/pr/965/merge
 x [deleted] (none) -> origin/pr/966/head
 x [deleted] (none) -> origin/pr/967/head
 x [deleted] (none) -> origin/pr/967/merge
 x [deleted] (none) -> origin/pr/968/head
 x [deleted] (none) -> origin/pr/968/merge
 x [deleted] (none) -> origin/pr/969/head
 x [deleted] (none) -> origin/pr/969/merge
 x [deleted] (none) -> origin/pr/97/head
 x [deleted] (none) -> origin/pr/97/merge
 x [deleted] (none) -> origin/pr/970/head
 x [deleted] (none) -> origin/pr/970/merge
 x [deleted] (none) -> origin/pr/971/head
 x [deleted] (none) -> origin/pr/971/merge
 x [deleted] (none) -> origin/pr/972/head
 x [deleted] (none) -> origin/pr/973/head
 x [deleted] (none) -> origin/pr/974/head
 x [deleted] (none) -> origin/pr/974/merge
 x [deleted] (none) -> origin/pr/975/head
 x [deleted] (none) -> origin/pr/975/merge
 x [deleted] (none) -> origin/pr/976/head
 x [deleted] (none) -> origin/pr/976/merge
 x [deleted] (none) -> origin/pr/977/head
 x [deleted] (none) -> origin/pr/977/merge
 x [deleted] (none) -> origin/pr/978/head
 x [deleted] (none) -> origin/pr/978/merge
 x [deleted] (none) -> origin/pr/979/head
 x [deleted] (none) -> origin/pr/979/merge
 x [deleted] (none) -> origin/pr/98/head
 x [deleted] (none) -> origin/pr/980/head
 x [deleted] (none) -> origin/pr/980/merge
 x [deleted] (none) -> origin/pr/981/head
 x [deleted] (none) -> origin/pr/982/head
 x [deleted] (none) -> origin/pr/982/merge
 x [deleted] (none) -> origin/pr/983/head
 x [deleted] (none) -> origin/pr/983/merge
 x [deleted] (none) -> origin/pr/984/head
 x [deleted] (none) -> origin/pr/984/merge
 x [deleted] (none) -> origin/pr/985/head
 x [deleted] (none) -> origin/pr/985/merge
 x [deleted] (none) -> origin/pr/986/head
 x [deleted] (none) -> origin/pr/986/merge
 x [deleted] (none) -> origin/pr/987/head
 x [deleted] (none) -> origin/pr/988/head
 x [deleted] (none) -> origin/pr/988/merge
 x [deleted] (none) -> 

[jira] [Commented] (BEAM-1909) BigQuery read transform fails for DirectRunner when querying non-US regions

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975838#comment-15975838
 ] 

ASF GitHub Bot commented on BEAM-1909:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2582


> BigQuery read transform fails for DirectRunner when querying non-US regions
> ---
>
> Key: BEAM-1909
> URL: https://issues.apache.org/jira/browse/BEAM-1909
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>
> See: 
> http://stackoverflow.com/questions/42135002/google-dataflow-cannot-read-and-write-in-different-locations-python-sdk-v0-5-5/42144748?noredirect=1#comment73621983_42144748
> This should be fixed by creating the temp dataset and table in the correct 
> region.
> cc: [~sb2nov]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/2] beam git commit: add temp dataset location for non-query BigQuerySource

2017-04-19 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master 3ef614c72 -> 104f98235


add temp dataset location for non-query BigQuerySource


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/83dc58ee
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/83dc58ee
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/83dc58ee

Branch: refs/heads/master
Commit: 83dc58ee95e96586266c72f29e9d7c55c8ca0739
Parents: 3ef614c
Author: Uwe Jugel 
Authored: Wed Apr 12 14:56:50 2017 +0200
Committer: chamik...@google.com 
Committed: Wed Apr 19 17:55:12 2017 -0700

--
 sdks/python/apache_beam/io/gcp/bigquery.py | 47 -
 1 file changed, 38 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/83dc58ee/sdks/python/apache_beam/io/gcp/bigquery.py
--
diff --git a/sdks/python/apache_beam/io/gcp/bigquery.py 
b/sdks/python/apache_beam/io/gcp/bigquery.py
index 25f544d..4686518 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery.py
@@ -605,9 +605,27 @@ class BigQueryReader(dataflow_io.NativeSourceReader):
     else:
       self.query = self.source.query
 
+  def _get_source_table_location(self):
+    tr = self.source.table_reference
+    if tr is None:
+      # TODO: implement location retrieval for query sources
+      return
+
+    if tr.projectId is None:
+      source_project_id = self.executing_project
+    else:
+      source_project_id = tr.projectId
+
+    source_dataset_id = tr.datasetId
+    source_table_id = tr.tableId
+    source_location = self.client.get_table_location(
+        source_project_id, source_dataset_id, source_table_id)
+    return source_location
+
   def __enter__(self):
     self.client = BigQueryWrapper(client=self.test_bigquery_client)
-    self.client.create_temporary_dataset(self.executing_project)
+    self.client.create_temporary_dataset(
+        self.executing_project, location=self._get_source_table_location())
     return self
 
   def __exit__(self, exception_type, exception_value, traceback):
@@ -801,7 +819,7 @@ class BigQueryWrapper(object):
   @retry.with_exponential_backoff(
       num_retries=MAX_RETRIES,
       retry_filter=retry.retry_on_server_errors_and_timeout_filter)
-  def get_or_create_dataset(self, project_id, dataset_id):
+  def get_or_create_dataset(self, project_id, dataset_id, location=None):
     # Check if dataset already exists otherwise create it
     try:
       dataset = self.client.datasets.Get(bigquery.BigqueryDatasetsGetRequest(
@@ -809,9 +827,11 @@ class BigQueryWrapper(object):
       return dataset
     except HttpError as exn:
       if exn.status_code == 404:
-        dataset = bigquery.Dataset(
-            datasetReference=bigquery.DatasetReference(
-                projectId=project_id, datasetId=dataset_id))
+        dataset_reference = bigquery.DatasetReference(
+            projectId=project_id, datasetId=dataset_id)
+        dataset = bigquery.Dataset(datasetReference=dataset_reference)
+        if location is not None:
+          dataset.location = location
         request = bigquery.BigqueryDatasetsInsertRequest(
             projectId=project_id, dataset=dataset)
         response = self.client.datasets.Insert(request)
@@ -867,7 +887,15 @@ class BigQueryWrapper(object):
   @retry.with_exponential_backoff(
       num_retries=MAX_RETRIES,
       retry_filter=retry.retry_on_server_errors_and_timeout_filter)
-  def create_temporary_dataset(self, project_id):
+  def get_table_location(self, project_id, dataset_id, table_id):
+    table = self._get_table(project_id, dataset_id, table_id)
+    return table.location
+
+  @retry.with_exponential_backoff(
+      num_retries=MAX_RETRIES,
+      retry_filter=retry.retry_on_server_errors_and_timeout_filter)
+  def create_temporary_dataset(self, project_id, location=None):
+    # TODO: make location required, once "query" locations can be determined
     dataset_id = BigQueryWrapper.TEMP_DATASET + self._temporary_table_suffix
     # Check if dataset exists to make sure that the temporary id is unique
     try:
@@ -881,9 +909,10 @@ class BigQueryWrapper(object):
     except HttpError as exn:
       if exn.status_code == 404:
         logging.warning(
-            'Dataset %s:%s does not exist so we will create it as temporary',
-            project_id, dataset_id)
-        self.get_or_create_dataset(project_id, dataset_id)
+            'Dataset %s:%s does not exist so we will create it as temporary '
+            'with location=%s',
+            project_id, dataset_id, location)
+        self.get_or_create_dataset(project_id, dataset_id, location=location)
       else:
         raise
 



[GitHub] beam pull request #2582: [BEAM-1909] BigQuery read transform fails for Direc...

2017-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2582


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2582

2017-04-19 Thread chamikara
This closes #2582


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/104f9823
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/104f9823
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/104f9823

Branch: refs/heads/master
Commit: 104f9823563c4a2cd2ef9495e9a16b3a219dd49b
Parents: 3ef614c 83dc58e
Author: chamik...@google.com 
Authored: Wed Apr 19 17:55:34 2017 -0700
Committer: chamik...@google.com 
Committed: Wed Apr 19 17:55:34 2017 -0700

--
 sdks/python/apache_beam/io/gcp/bigquery.py | 47 -
 1 file changed, 38 insertions(+), 9 deletions(-)
--




Jenkins build is back to normal : beam_PostCommit_Java_MavenInstall #3383

2017-04-19 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-1786) AutoService registration of coders, like we do with PipelineRunners

2017-04-19 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-1786:

Fix Version/s: First stable release

> AutoService registration of coders, like we do with PipelineRunners
> ---
>
> Key: BEAM-1786
> URL: https://issues.apache.org/jira/browse/BEAM-1786
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
> Fix For: First stable release
>
>
> Today, registering coders for auxiliary data types for a library transform is 
> not very convenient. If it appears in an output/covariant position then it 
> might be possible to use {{getDefaultOutputCoder}} to solve things. But for 
> writes/contravariant positions this is not applicable and the library 
> transform must contort itself to avoid requiring the user to come up with a 
> coder for a type they don't own.
> Probably the best case today is an explicit call to 
> {{LibraryTransform.registerCoders(Pipeline)}} which is far too manual.
> This could likely be solved quite easily with {{@AutoService}} and a static 
> global coder registry, as we do with pipeline runners.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1786) AutoService registration of coders, like we do with PipelineRunners

2017-04-19 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975814#comment-15975814
 ] 

Luke Cwik commented on BEAM-1786:
-

This is needed to thin out the Java SDK core BEAM-1871 otherwise we are stuck 
with coder classes like TableRowJsonCoder.

> AutoService registration of coders, like we do with PipelineRunners
> ---
>
> Key: BEAM-1786
> URL: https://issues.apache.org/jira/browse/BEAM-1786
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>
> Today, registering coders for auxiliary data types for a library transform is 
> not very convenient. If it appears in an output/covariant position then it 
> might be possible to use {{getDefaultOutputCoder}} to solve things. But for 
> writes/contravariant positions this is not applicable and the library 
> transform must contort itself to avoid requiring the user to come up with a 
> coder for a type they don't own.
> Probably the best case today is an explicit call to 
> {{LibraryTransform.registerCoders(Pipeline)}} which is far too manual.
> This could likely be solved quite easily with {{@AutoService}} and a static 
> global coder registry, as we do with pipeline runners.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1786) AutoService registration of coders, like we do with PipelineRunners

2017-04-19 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-1786:
---

Assignee: Luke Cwik

> AutoService registration of coders, like we do with PipelineRunners
> ---
>
> Key: BEAM-1786
> URL: https://issues.apache.org/jira/browse/BEAM-1786
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>
> Today, registering coders for auxiliary data types for a library transform is 
> not very convenient. If it appears in an output/covariant position then it 
> might be possible to use {{getDefaultOutputCoder}} to solve things. But for 
> writes/contravariant positions this is not applicable and the library 
> transform must contort itself to avoid requiring the user to come up with a 
> coder for a type they don't own.
> Probably the best case today is an explicit call to 
> {{LibraryTransform.registerCoders(Pipeline)}} which is far too manual.
> This could likely be solved quite easily with {{@AutoService}} and a static 
> global coder registry, as we do with pipeline runners.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Spark #1715

2017-04-19 Thread Apache Jenkins Server
See 




[jira] [Closed] (BEAM-919) Remove remaining old use/learn links from website src

2017-04-19 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin closed BEAM-919.

   Resolution: Fixed
 Assignee: Melissa Pashniak  (was: Frances Perry)
Fix Version/s: Not applicable

> Remove remaining old use/learn links from website src
> -
>
> Key: BEAM-919
> URL: https://issues.apache.org/jira/browse/BEAM-919
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Melissa Pashniak
>Priority: Minor
> Fix For: Not applicable
>
>
> We still have old links lingering after the website refactoring.
> For example, the release guide 
> (https://github.com/apache/incubator-beam-site/blob/asf-site/src/contribute/release-guide.md)
>  still links to "/use/..." in a bunch of places. 
> impact: links still work because of redirects, but it's tech debt we should 
> fix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2604: Remove bigshuffle from python examples

2017-04-19 Thread vikkyrk
GitHub user vikkyrk opened a pull request:

https://github.com/apache/beam/pull/2604

Remove bigshuffle from python examples

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vikkyrk/incubator-beam remove_bigshuffle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2604


commit 8fda8130bbbef5a5614428508c5a39640301ca8c
Author: Vikas Kedigehalli 
Date:   2017-04-20T00:12:52Z

Remove bigshuffle from python examples




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: beam_PerformanceTests_JDBC #135

2017-04-19 Thread Apache Jenkins Server
See 


Changes:

[kirpichov] Cache result of BigQuerySourceBase.split

[dhalperi] [BEAM-2015] Remove shared profile in runners/pom.xml and fix Dataflow

[lcwik] [BEAM-2014] Upgrade to Google Auth 0.6.1

[lcwik] [BEAM-2013] Upgrade to Jackson 2.8.8

[dhalperi] [BEAM-2017] Fix NPE in DataflowRunner when there are no metrics

[lcwik] [BEAM-1871] Remove another dependency by moving TestCredential

[lcwik] Modify types for input PCollections of Flatten transform to that of the

[altay] added module option, use more common zero test, show module name in log

--
[...truncated 1.01 MB...]
at 
com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:261)
at 
com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:55)
at 
com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:43)
at 
com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:78)
at 
com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory.create(MapTaskExecutorFactory.java:152)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:271)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:243)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:127)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:94)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
(1b4502fa479a3cc3): java.lang.RuntimeException: 
org.apache.beam.sdk.util.UserCodeException: org.postgresql.util.PSQLException: 
The connection attempt failed.
at 
com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:289)
at 
com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:261)
at 
com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:55)
at 
com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:43)
at 
com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:78)
at 
com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory.create(MapTaskExecutorFactory.java:152)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:271)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:243)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:127)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
at 
com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:94)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.beam.sdk.util.UserCodeException: 
org.postgresql.util.PSQLException: The connection attempt failed.
at 
org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
at 
org.apache.beam.sdk.io.jdbc.JdbcIO$Read$ReadFn$auxiliary$IMScrFw4.invokeSetup(Unknown
 Source)
at 
com.google.cloud.dataflow.worker.runners.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:66)
at 
com.google.cloud.dataflow.worker.runners.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:48)
at 
com.google.cloud.dataflow.worker.runners.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:99)
at 
com.google.cloud.dataflow.worker.runners.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:70)
at 

Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Flink #2413

2017-04-19 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_Dataflow #329

2017-04-19 Thread Apache Jenkins Server
See 


Changes:

[kirpichov] Cache result of BigQuerySourceBase.split

[dhalperi] [BEAM-2015] Remove shared profile in runners/pom.xml and fix Dataflow

[lcwik] [BEAM-2014] Upgrade to Google Auth 0.6.1

[lcwik] [BEAM-2013] Upgrade to Jackson 2.8.8

[dhalperi] [BEAM-2017] Fix NPE in DataflowRunner when there are no metrics

[lcwik] [BEAM-1871] Remove another dependency by moving TestCredential

[lcwik] Modify types for input PCollections of Flatten transform to that of the

[altay] added module option, use more common zero test, show module name in log

--
[...truncated 266.21 KB...]
 * [new ref] refs/pull/2544/merge -> origin/pr/2544/merge
 * [new ref] refs/pull/2545/head -> origin/pr/2545/head
 * [new ref] refs/pull/2545/merge -> origin/pr/2545/merge
 * [new ref] refs/pull/2546/head -> origin/pr/2546/head
 * [new ref] refs/pull/2546/merge -> origin/pr/2546/merge
 * [new ref] refs/pull/2547/head -> origin/pr/2547/head
 * [new ref] refs/pull/2547/merge -> origin/pr/2547/merge
 * [new ref] refs/pull/2548/head -> origin/pr/2548/head
 * [new ref] refs/pull/2548/merge -> origin/pr/2548/merge
 * [new ref] refs/pull/2550/head -> origin/pr/2550/head
 * [new ref] refs/pull/2550/merge -> origin/pr/2550/merge
 * [new ref] refs/pull/2551/head -> origin/pr/2551/head
 * [new ref] refs/pull/2551/merge -> origin/pr/2551/merge
 * [new ref] refs/pull/2552/head -> origin/pr/2552/head
 * [new ref] refs/pull/2552/merge -> origin/pr/2552/merge
 * [new ref] refs/pull/2553/head -> origin/pr/2553/head
 * [new ref] refs/pull/2553/merge -> origin/pr/2553/merge
 * [new ref] refs/pull/2554/head -> origin/pr/2554/head
 * [new ref] refs/pull/2554/merge -> origin/pr/2554/merge
 * [new ref] refs/pull/2555/head -> origin/pr/2555/head
 * [new ref] refs/pull/2555/merge -> origin/pr/2555/merge
 * [new ref] refs/pull/2556/head -> origin/pr/2556/head
 * [new ref] refs/pull/2556/merge -> origin/pr/2556/merge
 * [new ref] refs/pull/2557/head -> origin/pr/2557/head
 * [new ref] refs/pull/2557/merge -> origin/pr/2557/merge
 * [new ref] refs/pull/2558/head -> origin/pr/2558/head
 * [new ref] refs/pull/2558/merge -> origin/pr/2558/merge
 * [new ref] refs/pull/2559/head -> origin/pr/2559/head
 * [new ref] refs/pull/2559/merge -> origin/pr/2559/merge
 * [new ref] refs/pull/2560/head -> origin/pr/2560/head
 * [new ref] refs/pull/2560/merge -> origin/pr/2560/merge
 * [new ref] refs/pull/2561/head -> origin/pr/2561/head
 * [new ref] refs/pull/2561/merge -> origin/pr/2561/merge
 * [new ref] refs/pull/2562/head -> origin/pr/2562/head
 * [new ref] refs/pull/2562/merge -> origin/pr/2562/merge
 * [new ref] refs/pull/2563/head -> origin/pr/2563/head
 * [new ref] refs/pull/2563/merge -> origin/pr/2563/merge
 * [new ref] refs/pull/2564/head -> origin/pr/2564/head
 * [new ref] refs/pull/2564/merge -> origin/pr/2564/merge
 * [new ref] refs/pull/2565/head -> origin/pr/2565/head
 * [new ref] refs/pull/2565/merge -> origin/pr/2565/merge
 * [new ref] refs/pull/2566/head -> origin/pr/2566/head
 * [new ref] refs/pull/2566/merge -> origin/pr/2566/merge
 * [new ref] refs/pull/2567/head -> origin/pr/2567/head
 * [new ref] refs/pull/2567/merge -> origin/pr/2567/merge
 * [new ref] refs/pull/2568/head -> origin/pr/2568/head
 * [new ref] refs/pull/2569/head -> origin/pr/2569/head
 * [new ref] refs/pull/2569/merge -> origin/pr/2569/merge
 * [new ref] refs/pull/2570/head -> origin/pr/2570/head
 * [new ref] refs/pull/2570/merge -> origin/pr/2570/merge
 * [new ref] refs/pull/2571/head -> origin/pr/2571/head
 * [new ref] refs/pull/2571/merge -> origin/pr/2571/merge
 * [new ref] refs/pull/2572/head -> origin/pr/2572/head
 * [new ref] refs/pull/2572/merge -> origin/pr/2572/merge
 * [new ref] refs/pull/2573/head -> origin/pr/2573/head
 * [new ref] refs/pull/2573/merge -> origin/pr/2573/merge
 * [new ref] refs/pull/2574/head -> origin/pr/2574/head
 * [new ref] refs/pull/2575/head -> origin/pr/2575/head
 * [new ref] refs/pull/2575/merge -> origin/pr/2575/merge
 * [new ref] refs/pull/2576/head -> origin/pr/2576/head
 * [new ref] refs/pull/2577/head -> origin/pr/2577/head
 * [new ref] refs/pull/2577/merge -> origin/pr/2577/merge
 * [new ref] refs/pull/2578/head -> origin/pr/2578/head
 * [new ref] refs/pull/2578/merge -> origin/pr/2578/merge
 * [new ref] refs/pull/2579/head -> origin/pr/2579/head
 * [new ref] refs/pull/2579/merge -> 

Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark #1714

2017-04-19 Thread Apache Jenkins Server
See 


--
[...truncated 241.83 KB...]
 x [deleted] (none) -> origin/pr/942/head
 x [deleted] (none) -> origin/pr/942/merge
 x [deleted] (none) -> origin/pr/943/head
 x [deleted] (none) -> origin/pr/943/merge
 x [deleted] (none) -> origin/pr/944/head
 x [deleted] (none) -> origin/pr/945/head
 x [deleted] (none) -> origin/pr/945/merge
 x [deleted] (none) -> origin/pr/946/head
 x [deleted] (none) -> origin/pr/946/merge
 x [deleted] (none) -> origin/pr/947/head
 x [deleted] (none) -> origin/pr/947/merge
 x [deleted] (none) -> origin/pr/948/head
 x [deleted] (none) -> origin/pr/948/merge
 x [deleted] (none) -> origin/pr/949/head
 x [deleted] (none) -> origin/pr/949/merge
 x [deleted] (none) -> origin/pr/95/head
 x [deleted] (none) -> origin/pr/95/merge
 x [deleted] (none) -> origin/pr/950/head
 x [deleted] (none) -> origin/pr/951/head
 x [deleted] (none) -> origin/pr/951/merge
 x [deleted] (none) -> origin/pr/952/head
 x [deleted] (none) -> origin/pr/952/merge
 x [deleted] (none) -> origin/pr/953/head
 x [deleted] (none) -> origin/pr/954/head
 x [deleted] (none) -> origin/pr/954/merge
 x [deleted] (none) -> origin/pr/955/head
 x [deleted] (none) -> origin/pr/955/merge
 x [deleted] (none) -> origin/pr/956/head
 x [deleted] (none) -> origin/pr/957/head
 x [deleted] (none) -> origin/pr/958/head
 x [deleted] (none) -> origin/pr/959/head
 x [deleted] (none) -> origin/pr/959/merge
 x [deleted] (none) -> origin/pr/96/head
 x [deleted] (none) -> origin/pr/96/merge
 x [deleted] (none) -> origin/pr/960/head
 x [deleted] (none) -> origin/pr/960/merge
 x [deleted] (none) -> origin/pr/961/head
 x [deleted] (none) -> origin/pr/962/head
 x [deleted] (none) -> origin/pr/962/merge
 x [deleted] (none) -> origin/pr/963/head
 x [deleted] (none) -> origin/pr/963/merge
 x [deleted] (none) -> origin/pr/964/head
 x [deleted] (none) -> origin/pr/965/head
 x [deleted] (none) -> origin/pr/965/merge
 x [deleted] (none) -> origin/pr/966/head
 x [deleted] (none) -> origin/pr/967/head
 x [deleted] (none) -> origin/pr/967/merge
 x [deleted] (none) -> origin/pr/968/head
 x [deleted] (none) -> origin/pr/968/merge
 x [deleted] (none) -> origin/pr/969/head
 x [deleted] (none) -> origin/pr/969/merge
 x [deleted] (none) -> origin/pr/97/head
 x [deleted] (none) -> origin/pr/97/merge
 x [deleted] (none) -> origin/pr/970/head
 x [deleted] (none) -> origin/pr/970/merge
 x [deleted] (none) -> origin/pr/971/head
 x [deleted] (none) -> origin/pr/971/merge
 x [deleted] (none) -> origin/pr/972/head
 x [deleted] (none) -> origin/pr/973/head
 x [deleted] (none) -> origin/pr/974/head
 x [deleted] (none) -> origin/pr/974/merge
 x [deleted] (none) -> origin/pr/975/head
 x [deleted] (none) -> origin/pr/975/merge
 x [deleted] (none) -> origin/pr/976/head
 x [deleted] (none) -> origin/pr/976/merge
 x [deleted] (none) -> origin/pr/977/head
 x [deleted] (none) -> origin/pr/977/merge
 x [deleted] (none) -> origin/pr/978/head
 x [deleted] (none) -> origin/pr/978/merge
 x [deleted] (none) -> origin/pr/979/head
 x [deleted] (none) -> origin/pr/979/merge
 x [deleted] (none) -> origin/pr/98/head
 x [deleted] (none) -> origin/pr/980/head
 x [deleted] (none) -> origin/pr/980/merge
 x [deleted] (none) -> origin/pr/981/head
 x [deleted] (none) -> origin/pr/982/head
 x [deleted] (none) -> origin/pr/982/merge
 x [deleted] (none) -> origin/pr/983/head
 x [deleted] (none) -> origin/pr/983/merge
 x [deleted] (none) -> origin/pr/984/head
 x [deleted] (none) -> origin/pr/984/merge
 x [deleted] (none) -> origin/pr/985/head
 x [deleted] (none) -> origin/pr/985/merge
 x [deleted] (none) -> origin/pr/986/head
 x [deleted] (none) -> origin/pr/986/merge
 x [deleted] (none) -> origin/pr/987/head
 x [deleted] (none) -> origin/pr/988/head
 x [deleted] (none) -> origin/pr/988/merge
 x [deleted] (none) -> 

Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #3382

2017-04-19 Thread Apache Jenkins Server
See 


--
[...truncated 242.25 KB...]
 x [deleted] (none) -> origin/pr/942/head
 x [deleted] (none) -> origin/pr/942/merge
 x [deleted] (none) -> origin/pr/943/head
 x [deleted] (none) -> origin/pr/943/merge
 x [deleted] (none) -> origin/pr/944/head
 x [deleted] (none) -> origin/pr/945/head
 x [deleted] (none) -> origin/pr/945/merge
 x [deleted] (none) -> origin/pr/946/head
 x [deleted] (none) -> origin/pr/946/merge
 x [deleted] (none) -> origin/pr/947/head
 x [deleted] (none) -> origin/pr/947/merge
 x [deleted] (none) -> origin/pr/948/head
 x [deleted] (none) -> origin/pr/948/merge
 x [deleted] (none) -> origin/pr/949/head
 x [deleted] (none) -> origin/pr/949/merge
 x [deleted] (none) -> origin/pr/95/head
 x [deleted] (none) -> origin/pr/95/merge
 x [deleted] (none) -> origin/pr/950/head
 x [deleted] (none) -> origin/pr/951/head
 x [deleted] (none) -> origin/pr/951/merge
 x [deleted] (none) -> origin/pr/952/head
 x [deleted] (none) -> origin/pr/952/merge
 x [deleted] (none) -> origin/pr/953/head
 x [deleted] (none) -> origin/pr/954/head
 x [deleted] (none) -> origin/pr/954/merge
 x [deleted] (none) -> origin/pr/955/head
 x [deleted] (none) -> origin/pr/955/merge
 x [deleted] (none) -> origin/pr/956/head
 x [deleted] (none) -> origin/pr/957/head
 x [deleted] (none) -> origin/pr/958/head
 x [deleted] (none) -> origin/pr/959/head
 x [deleted] (none) -> origin/pr/959/merge
 x [deleted] (none) -> origin/pr/96/head
 x [deleted] (none) -> origin/pr/96/merge
 x [deleted] (none) -> origin/pr/960/head
 x [deleted] (none) -> origin/pr/960/merge
 x [deleted] (none) -> origin/pr/961/head
 x [deleted] (none) -> origin/pr/962/head
 x [deleted] (none) -> origin/pr/962/merge
 x [deleted] (none) -> origin/pr/963/head
 x [deleted] (none) -> origin/pr/963/merge
 x [deleted] (none) -> origin/pr/964/head
 x [deleted] (none) -> origin/pr/965/head
 x [deleted] (none) -> origin/pr/965/merge
 x [deleted] (none) -> origin/pr/966/head
 x [deleted] (none) -> origin/pr/967/head
 x [deleted] (none) -> origin/pr/967/merge
 x [deleted] (none) -> origin/pr/968/head
 x [deleted] (none) -> origin/pr/968/merge
 x [deleted] (none) -> origin/pr/969/head
 x [deleted] (none) -> origin/pr/969/merge
 x [deleted] (none) -> origin/pr/97/head
 x [deleted] (none) -> origin/pr/97/merge
 x [deleted] (none) -> origin/pr/970/head
 x [deleted] (none) -> origin/pr/970/merge
 x [deleted] (none) -> origin/pr/971/head
 x [deleted] (none) -> origin/pr/971/merge
 x [deleted] (none) -> origin/pr/972/head
 x [deleted] (none) -> origin/pr/973/head
 x [deleted] (none) -> origin/pr/974/head
 x [deleted] (none) -> origin/pr/974/merge
 x [deleted] (none) -> origin/pr/975/head
 x [deleted] (none) -> origin/pr/975/merge
 x [deleted] (none) -> origin/pr/976/head
 x [deleted] (none) -> origin/pr/976/merge
 x [deleted] (none) -> origin/pr/977/head
 x [deleted] (none) -> origin/pr/977/merge
 x [deleted] (none) -> origin/pr/978/head
 x [deleted] (none) -> origin/pr/978/merge
 x [deleted] (none) -> origin/pr/979/head
 x [deleted] (none) -> origin/pr/979/merge
 x [deleted] (none) -> origin/pr/98/head
 x [deleted] (none) -> origin/pr/980/head
 x [deleted] (none) -> origin/pr/980/merge
 x [deleted] (none) -> origin/pr/981/head
 x [deleted] (none) -> origin/pr/982/head
 x [deleted] (none) -> origin/pr/982/merge
 x [deleted] (none) -> origin/pr/983/head
 x [deleted] (none) -> origin/pr/983/merge
 x [deleted] (none) -> origin/pr/984/head
 x [deleted] (none) -> origin/pr/984/merge
 x [deleted] (none) -> origin/pr/985/head
 x [deleted] (none) -> origin/pr/985/merge
 x [deleted] (none) -> origin/pr/986/head
 x [deleted] (none) -> origin/pr/986/merge
 x [deleted] (none) -> origin/pr/987/head
 x [deleted] (none) -> origin/pr/988/head
 x [deleted] (none) -> origin/pr/988/merge
 x [deleted] (none) -> 

[jira] [Commented] (BEAM-2000) run pylint on specific module

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975755#comment-15975755
 ] 

ASF GitHub Bot commented on BEAM-2000:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2579


> run pylint on specific module
> -
>
> Key: BEAM-2000
> URL: https://issues.apache.org/jira/browse/BEAM-2000
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Uwe Jugel
>Assignee: Ahmet Altay
>Priority: Minor
>
> Since pylint takes quite some time to run on the whole project and since most 
> developers will touch only a part of the code, the lint script should have an 
> option to define which module is to be linted.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2579: [BEAM-2000] run pylint on specific module

2017-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2579


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #2579

2017-04-19 Thread altay
This closes #2579


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/3ef614c7
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/3ef614c7
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/3ef614c7

Branch: refs/heads/master
Commit: 3ef614c72d24f44a8cf463040a2bcf17b9445bc8
Parents: 6224556 28a2682
Author: Ahmet Altay 
Authored: Wed Apr 19 16:46:00 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Apr 19 16:46:00 2017 -0700

--
 sdks/python/run_pylint.sh | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)
--




[1/2] beam git commit: added module option, use more common zero test, show module name in log

2017-04-19 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 6224556be -> 3ef614c72


added module option, use more common zero test, show module name in log


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/28a2682b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/28a2682b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/28a2682b

Branch: refs/heads/master
Commit: 28a2682b4d1f63942f3c0e67458f5558c91c06a9
Parents: 6224556
Author: Uwe Jugel 
Authored: Tue Apr 18 19:41:15 2017 +0200
Committer: Ahmet Altay 
Committed: Wed Apr 19 16:45:57 2017 -0700

--
 sdks/python/run_pylint.sh | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/28a2682b/sdks/python/run_pylint.sh
--
diff --git a/sdks/python/run_pylint.sh b/sdks/python/run_pylint.sh
index 80cbe6e..f733a79 100755
--- a/sdks/python/run_pylint.sh
+++ b/sdks/python/run_pylint.sh
@@ -23,9 +23,20 @@
 #
 # The exit-code of the script indicates success or a failure.
 
-set -e
+set -o errexit
 set -o pipefail
 
+MODULE=apache_beam
+
+usage(){ echo "Usage: $0 [MODULE|--help]  # The default MODULE is $MODULE"; }
+
+if test $# -gt 0; then
+  case "$@" in
+    --help) usage; exit 1;;
+    *)      MODULE="$@";;
+  esac
+fi
+
 # Following generated files are excluded from lint checks.
 EXCLUDED_GENERATED_FILES=(
 "apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py"
@@ -39,14 +50,14 @@ EXCLUDED_GENERATED_FILES=(
 
 FILES_TO_IGNORE=""
 for file in "${EXCLUDED_GENERATED_FILES[@]}"; do
-  if [[ $FILES_TO_IGNORE ]]; then
-    FILES_TO_IGNORE="$FILES_TO_IGNORE, "
+  if test -z "$FILES_TO_IGNORE"
+    then FILES_TO_IGNORE="$(basename $file)"
+    else FILES_TO_IGNORE="$FILES_TO_IGNORE, $(basename $file)"
   fi
-  FILES_TO_IGNORE="$FILES_TO_IGNORE$(basename $file)"
 done
 echo "Skipping lint for generated files: $FILES_TO_IGNORE"
 
-echo "Running pylint:"
-pylint apache_beam --ignore-patterns="$FILES_TO_IGNORE"
-echo "Running pep8:"
-pep8 apache_beam --exclude="$FILES_TO_IGNORE"
+echo "Running pylint for module $MODULE:"
+pylint $MODULE --ignore-patterns="$FILES_TO_IGNORE"
+echo "Running pep8 for module $MODULE:"
+pep8 $MODULE --exclude="$FILES_TO_IGNORE"



[jira] [Commented] (BEAM-2019) Count.globally() requires default values for non-GlobalWindows

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975725#comment-15975725
 ] 

ASF GitHub Bot commented on BEAM-2019:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/2603

[BEAM-2019] Count.globally() requires default values for non-GlobalWindows

Fixes the error below from Count.globally() when not using `GlobalWindows`:
```
Exception in thread "main" java.lang.IllegalStateException: Default values 
are not supported in Combine.globally() \
if the output PCollection is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() \
to output an empty PCollection if the input PCollection is empty, or 
Combine.globally().asSingletonView() \
to get the default output of the CombineFn if the input PCollection is 
empty.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2603.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2603


commit 9e49ccc67bd7b0e36f8986d896087cc9250dbe13
Author: mingmxu 
Date:   2017-04-19T23:11:03Z

Use Combine.globally().withoutDefaults() in Count.globally()




> Count.globally() requires default values for non-GlobalWindows
> --
>
> Key: BEAM-2019
> URL: https://issues.apache.org/jira/browse/BEAM-2019
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>Priority: Minor
>
> Here's my code:
> {code}
> .apply(Window.into(FixedWindows.of(Duration.standardHours(1)))  
> .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1
>   .withAllowedLateness(Duration.standardMinutes(10))
>   .accumulatingFiredPanes()
>   )
> .apply(Count.globally());
> {code}
> And the error message as below:
> {code}
> Exception in thread "main" java.lang.IllegalStateException: Default values 
> are not supported in Combine.globally() if the output PCollection is not 
> windowed by GlobalWindows. Instead, use Combine.globally().withoutDefaults() 
> to output an empty PCollection if the input PCollection is empty, or 
> Combine.globally().asSingletonView() to get the default output of the 
> CombineFn if the input PCollection is empty.
>   at 
> org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1463)
>   at 
> org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1336)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
>   at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:154)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2603: [BEAM-2019] Count.globally() requires default value...

2017-04-19 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/2603

[BEAM-2019] Count.globally() requires default values for non-GlobalWindows

Fixes the following error from Count.globally() when not using `GlobalWindows`:
```
Exception in thread "main" java.lang.IllegalStateException: Default values 
are not supported in Combine.globally() \
if the output PCollection is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() \
to output an empty PCollection if the input PCollection is empty, or 
Combine.globally().asSingletonView() \
to get the default output of the CombineFn if the input PCollection is 
empty.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2603.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2603


commit 9e49ccc67bd7b0e36f8986d896087cc9250dbe13
Author: mingmxu 
Date:   2017-04-19T23:11:03Z

Use Combine.globally().withoutDefaults() in Count.globally()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #2602: Validates that input and output GCS paths specify a...

2017-04-19 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/2602

Validates that input and output GCS paths specify a bucket

Context: 
http://stackoverflow.com/questions/43505776/google-dataflow-workflow-error

To be backported into Dataflow SDK as well.

R: @dhalperi 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam gcs-bucket

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2602


commit 372357937a0fb6fbd72a11f7d0ddf70380811f3d
Author: Eugene Kirpichov 
Date:   2017-04-19T23:10:45Z

Validates that input and output GCS paths specify a bucket




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2019) Count.globally() requires default values for non-GlobalWindows

2017-04-19 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2019:


 Summary: Count.globally() requires default values for 
non-GlobalWindows
 Key: BEAM-2019
 URL: https://issues.apache.org/jira/browse/BEAM-2019
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Xu Mingmin
Assignee: Xu Mingmin
Priority: Minor


Here's my code:
{code}
.apply(Window.into(FixedWindows.of(Duration.standardHours(1)))  
.triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1
  .withAllowedLateness(Duration.standardMinutes(10))
  .accumulatingFiredPanes()
  )
.apply(Count.globally());
{code}

And the error message as below:
{code}
Exception in thread "main" java.lang.IllegalStateException: Default values are 
not supported in Combine.globally() if the output PCollection is not windowed 
by GlobalWindows. Instead, use Combine.globally().withoutDefaults() to output 
an empty PCollection if the input PCollection is empty, or 
Combine.globally().asSingletonView() to get the default output of the CombineFn 
if the input PCollection is empty.
at 
org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1463)
at 
org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1336)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:154)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1414) CountingInput should comply with PTransform style guide

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975690#comment-15975690
 ] 

ASF GitHub Bot commented on BEAM-1414:
--

GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/2601

[BEAM-1414] Replaces CountingInput with style guide compliant 
GenerateSequence

Migration guide:
`CountingInput.unbounded()` -> `GenerateSequence.from(0)`
`CountingInput.upTo(n)` -> `GenerateSequence.fromTo(0, n)`

Removing rogue direct uses of CountingSource is for another PR and less 
critical.
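
An illustrative sketch of the migration (assuming the transform lands in `org.apache.beam.sdk.io` and using the `fromTo` factory named in the guide above; not final API):

```
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class MigrationSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Previously CountingInput.unbounded(): an unbounded sequence 0, 1, 2, ...
    PCollection<Long> unbounded = p.apply("Unbounded", GenerateSequence.from(0));

    // Previously CountingInput.upTo(100): the bounded range [0, 100),
    // via the fromTo factory from the migration guide above.
    PCollection<Long> bounded = p.apply("Bounded", GenerateSequence.fromTo(0, 100));

    p.run();
  }
}
```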

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam count-style

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2601


commit d0d97f165d954471ae2e693efea5d65e44dfaaa2
Author: Eugene Kirpichov 
Date:   2017-04-18T23:48:38Z

Introduces GenerateSequence transform

It is a replacement for CountingInput, which will be deprecated.

commit 8199c5fb65807e062854952720d6dc2a751c1e56
Author: Eugene Kirpichov 
Date:   2017-04-19T22:32:08Z

Replaces all usages of CountingInput with GenerateSequence

commit 767af74f9d7105e25872aad873c3c70fe87f741c
Author: Eugene Kirpichov 
Date:   2017-04-19T22:36:42Z

Deletes CountingInput




> CountingInput should comply with PTransform style guide
> ---
>
> Key: BEAM-1414
> URL: https://issues.apache.org/jira/browse/BEAM-1414
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>  Labels: backward-incompatible, starter
> Fix For: First stable release
>
>
> Suggested changes:
> - Rename the whole class and its inner transforms to sound more verb-like, 
> e.g.: GenerateRange.Bounded/Unbounded (as opposed to current 
> CountingInput.BoundedCountingInput)
> - Provide a more unified API between bounded and unbounded cases: 
> GenerateRange.from(100) should return a GenerateRange.Unbounded; 
> GenerateRange.from(100).to(200) should return a GenerateRange.Bounded. They 
> both should accept a timestampFn. The unbounded one _should not_ have a 
> withMaxNumRecords builder - that's redundant with specifying the range.
> - (optional) Use AutoValue



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1283) DoFn finishBundle should be required to specify the window for output

2017-04-19 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-1283:
--
Summary: DoFn finishBundle should be required to specify the window for 
output  (was: DoFn finishBundle should be required to specify the window)

> DoFn finishBundle should be required to specify the window for output
> -
>
> Key: BEAM-1283
> URL: https://issues.apache.org/jira/browse/BEAM-1283
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-core, sdk-py
>Reporter: Kenneth Knowles
>  Labels: backward-incompatible
> Fix For: First stable release
>
>
> The spec is here in Javadoc: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L128
> "If invoked from {{@StartBundle}} or {{@FinishBundle}}, this will attempt to 
> use the {{WindowFn}} of the input {{PCollection}} to determine what windows 
> the element should be in, throwing an exception if the {{WindowFn}} attempts 
> to access any information about the input element. The output element will 
> have a timestamp of negative infinity."
> This is a collection of caveats that make this method not always technically 
> wrong, but quite a mess. Ideas that reasonable folks have suggested lately:
>  - The {{WindowFn}} cannot actually be applied because {{WindowFn}} is 
> allowed to see the element type. The spec just avoids this by limiting which 
> {{WindowFn}} can be used.
>  - There is no natural output timestamp, so it should always be provided. The 
> spec avoids this by specifying an arbitrary and fairly useless timestamp.
>  - If it is a merging {{WindowFn}} like sessions that has already been merged 
> then you'll just have a bogus proto window regardless of explicit timestamp 
> or not.
> The use cases for these methods are best addressed by state plus window 
> expiry callback, so we should revisit this spec and probably just wipe it.
> There are some rare cases where you might need to output from {{FinishBundle}} 
> in a way that is not _actually_ sensitive to bundling (perhaps modulo some 
> downstream notion of equivalence) in which case you had better know what 
> window you are outputting to. Often it should be the global window.
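
A hedged sketch of what window-explicit bundle-end output could look like under this proposal (the {{FinishBundleContext}} signature taking an explicit timestamp and window is assumed from the discussion above, not confirmed API; names are illustrative):

{code}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;

public class EmitAtBundleEndFn extends DoFn<String, String> {
  @ProcessElement
  public void process(ProcessContext c) {
    c.output(c.element());
  }

  @FinishBundle
  public void finishBundle(FinishBundleContext c) {
    // With no input element in scope, the window must be named explicitly;
    // here the global window, stamped with its maximum timestamp.
    c.output("bundle-finished", GlobalWindow.INSTANCE.maxTimestamp(),
        GlobalWindow.INSTANCE);
  }
}
{code}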



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2601: [BEAM-1414] Replaces CountingInput with style guide...

2017-04-19 Thread jkff
GitHub user jkff opened a pull request:

https://github.com/apache/beam/pull/2601

[BEAM-1414] Replaces CountingInput with style guide compliant 
GenerateSequence

Migration guide:
`CountingInput.unbounded()` -> `GenerateSequence.from(0)`
`CountingInput.upTo(n)` -> `GenerateSequence.fromTo(0, n)`

Removing rogue direct uses of CountingSource is for another PR and less 
critical.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam count-style

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2601


commit d0d97f165d954471ae2e693efea5d65e44dfaaa2
Author: Eugene Kirpichov 
Date:   2017-04-18T23:48:38Z

Introduces GenerateSequence transform

It is a replacement for CountingInput, which will be deprecated.

commit 8199c5fb65807e062854952720d6dc2a751c1e56
Author: Eugene Kirpichov 
Date:   2017-04-19T22:32:08Z

Replaces all usages of CountingInput with GenerateSequence

commit 767af74f9d7105e25872aad873c3c70fe87f741c
Author: Eugene Kirpichov 
Date:   2017-04-19T22:36:42Z

Deletes CountingInput




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Flink #2410

2017-04-19 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #2543: [BEAM-1956] Modify types for input PCollections of ...

2017-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2543


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Modify types for input PCollections of Flatten transform to that of the output PCollection

2017-04-19 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master 1721cea27 -> 6224556be


Modify types for input PCollections of Flatten transform to that of the output 
PCollection


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8c88d6ab
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8c88d6ab
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8c88d6ab

Branch: refs/heads/master
Commit: 8c88d6ab475db40afb99c08ea44f9a2c61d85862
Parents: 1721cea
Author: Vikas Kedigehalli 
Authored: Fri Apr 14 13:53:13 2017 -0700
Committer: Luke Cwik 
Committed: Wed Apr 19 15:35:06 2017 -0700

--
 .../runners/dataflow/dataflow_runner.py | 68 
 .../runners/dataflow/dataflow_runner_test.py| 65 ++-
 .../apache_beam/runners/direct/direct_runner.py |  2 -
 sdks/python/apache_beam/runners/runner.py   | 41 
 sdks/python/apache_beam/runners/runner_test.py  | 41 
 sdks/python/apache_beam/utils/proto_utils.py|  2 +-
 6 files changed, 133 insertions(+), 86 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/8c88d6ab/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py 
b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 779db8f..4534895 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -149,6 +149,66 @@ class DataflowRunner(PipelineRunner):
 result._job = response
 runner.last_error_msg = last_error_msg
 
+  @staticmethod
+  def group_by_key_input_visitor():
+# Imported here to avoid circular dependencies.
+from apache_beam.pipeline import PipelineVisitor
+
+class GroupByKeyInputVisitor(PipelineVisitor):
+  """A visitor that replaces `Any` element type for input `PCollection` of
+  a `GroupByKey` or `GroupByKeyOnly` with a `KV` type.
+
+  TODO(BEAM-115): Once Python SDk is compatible with the new Runner API,
+  we could directly replace the coder instead of mutating the element type.
+  """
+
+  def visit_transform(self, transform_node):
+# Imported here to avoid circular dependencies.
+# pylint: disable=wrong-import-order, wrong-import-position
+from apache_beam import GroupByKey, GroupByKeyOnly
+if isinstance(transform_node.transform, (GroupByKey, GroupByKeyOnly)):
+  pcoll = transform_node.inputs[0]
+  input_type = pcoll.element_type
+  # If input_type is not specified, then treat it as `Any`.
+  if not input_type:
+input_type = typehints.Any
+
+  if not isinstance(input_type, typehints.TupleHint.TupleConstraint):
+if isinstance(input_type, typehints.AnyTypeConstraint):
+  # `Any` type needs to be replaced with a KV[Any, Any] to
+  # force a KV coder as the main output coder for the pcollection
+  # preceding a GroupByKey.
+  pcoll.element_type = typehints.KV[typehints.Any, typehints.Any]
+else:
+  # TODO: Handle other valid types,
+  # e.g. Union[KV[str, int], KV[str, float]]
+  raise ValueError(
+  "Input to GroupByKey must be of Tuple or Any type. "
+  "Found %s for %s" % (input_type, pcoll))
+
+return GroupByKeyInputVisitor()
+
+  @staticmethod
+  def flatten_input_visitor():
+# Imported here to avoid circular dependencies.
+from apache_beam.pipeline import PipelineVisitor
+
+class FlattenInputVisitor(PipelineVisitor):
+  """A visitor that replaces the element type for input ``PCollections``s 
of
+   a ``Flatten`` transform with that of the output ``PCollection``.
+  """
+
+  def visit_transform(self, transform_node):
+# Imported here to avoid circular dependencies.
+# pylint: disable=wrong-import-order, wrong-import-position
+from apache_beam import Flatten
+if isinstance(transform_node.transform, Flatten):
+  output_pcoll = transform_node.outputs[None]
+  for input_pcoll in transform_node.inputs:
+input_pcoll.element_type = output_pcoll.element_type
+
+return FlattenInputVisitor()
+
   def run(self, pipeline):
 """Remotely executes entire pipeline or parts reachable from node."""
 # Import here to avoid adding the dependency for local running scenarios.
@@ -161,6 +221,14 @@ class DataflowRunner(PipelineRunner):
   'please install apache_beam[gcp]')
 self.job = apiclient.Job(pipeline.options)
 
+# Dataflow runner requires a KV type for GBK inputs, hence we 

[jira] [Commented] (BEAM-1871) Thin Java SDK Core

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975654#comment-15975654
 ] 

ASF GitHub Bot commented on BEAM-1871:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2598


> Thin Java SDK Core
> --
>
> Key: BEAM-1871
> URL: https://issues.apache.org/jira/browse/BEAM-1871
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Luke Cwik
> Fix For: First stable release
>
>
> Before first stable release we need to thin out {{sdk-java-core}} module. 
> Some candidates for removal (a non-exhaustive list):
> {{sdk/io}}
> * anything BigQuery related
> * anything PubSub related
> * everything Protobuf related
> * TFRecordIO
> * XMLSink
> {{sdk/util}}
> * Everything GCS related
> * Everything Backoff related
> * Everything Google API related: ResponseInterceptors, RetryHttpBackoff, etc.
> * Everything CloudObject-related
> * Pubsub stuff
> {{sdk/coders}}
> * JAXBCoder
> * TableRowJsonCoder



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2598: [BEAM-1871] Remove another dependency by moving T...

2017-04-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2598


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-1871] Remove another dependency by moving TestCredential

2017-04-19 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master 4ab74bf74 -> 1721cea27


[BEAM-1871] Remove another dependency by moving TestCredential


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/c0baf4ce
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/c0baf4ce
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/c0baf4ce

Branch: refs/heads/master
Commit: c0baf4ce2f9b520ac843e67fb25d0384f1cb3aef
Parents: 4ab74bf
Author: Luke Cwik 
Authored: Wed Apr 19 14:20:46 2017 -0700
Committer: Luke Cwik 
Committed: Wed Apr 19 15:29:51 2017 -0700

--
 sdks/java/core/pom.xml  |  5 --
 .../apache/beam/sdk/util/TestCredential.java| 59 
 .../org/apache/beam/SdkCoreApiSurfaceTest.java  |  1 -
 sdks/java/extensions/gcp-core/pom.xml   |  5 ++
 .../apache/beam/sdk/util/TestCredential.java| 59 
 5 files changed, 64 insertions(+), 65 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/c0baf4ce/sdks/java/core/pom.xml
--
diff --git a/sdks/java/core/pom.xml b/sdks/java/core/pom.xml
index dc80a2c..2860be2 100644
--- a/sdks/java/core/pom.xml
+++ b/sdks/java/core/pom.xml
@@ -140,11 +140,6 @@
     </dependency>
 
     <dependency>
-      <groupId>com.google.auth</groupId>
-      <artifactId>google-auth-library-credentials</artifactId>
-    </dependency>
-
-    <dependency>
       <groupId>com.google.apis</groupId>
      <artifactId>google-api-services-bigquery</artifactId>
    </dependency>

http://git-wip-us.apache.org/repos/asf/beam/blob/c0baf4ce/sdks/java/core/src/main/java/org/apache/beam/sdk/util/TestCredential.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/TestCredential.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/TestCredential.java
deleted file mode 100644
index f34527e..000
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/TestCredential.java
+++ /dev/null
@@ -1,59 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.util;
-
-import com.google.auth.Credentials;
-import java.io.IOException;
-import java.net.URI;
-import java.util.Collections;
-import java.util.List;
-import java.util.Map;
-
-/**
- * Fake credential, for use in testing.
- */
-public class TestCredential extends Credentials {
-  @Override
-  public String getAuthenticationType() {
-return "Test";
-  }
-
-  @Override
-  public Map<String, List<String>> getRequestMetadata() throws IOException {
-return Collections.emptyMap();
-  }
-
-  @Override
-  public Map<String, List<String>> getRequestMetadata(URI uri) throws IOException {
-return Collections.emptyMap();
-  }
-
-  @Override
-  public boolean hasRequestMetadata() {
-return false;
-  }
-
-  @Override
-  public boolean hasRequestMetadataOnly() {
-return true;
-  }
-
-  @Override
-  public void refresh() throws IOException {
-  }
-}

http://git-wip-us.apache.org/repos/asf/beam/blob/c0baf4ce/sdks/java/core/src/test/java/org/apache/beam/SdkCoreApiSurfaceTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/SdkCoreApiSurfaceTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/SdkCoreApiSurfaceTest.java
index 66c1393..eed4457 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/SdkCoreApiSurfaceTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/SdkCoreApiSurfaceTest.java
@@ -41,7 +41,6 @@ public class SdkCoreApiSurfaceTest {
 "com.google.api.client",
 "com.google.api.services.bigquery",
 "com.google.api.services.storage",
-"com.google.auth",
 "com.google.protobuf",
 "com.fasterxml.jackson.annotation",
 "com.fasterxml.jackson.core",

http://git-wip-us.apache.org/repos/asf/beam/blob/c0baf4ce/sdks/java/extensions/gcp-core/pom.xml
--
diff --git a/sdks/java/extensions/gcp-core/pom.xml 

[2/2] beam git commit: [BEAM-1871] Remove another dependency by moving TestCredential

2017-04-19 Thread lcwik
[BEAM-1871] Remove another dependency by moving TestCredential

This closes #2598


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/1721cea2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/1721cea2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/1721cea2

Branch: refs/heads/master
Commit: 1721cea270296348e39fcff1ca27c2604eafaf3c
Parents: 4ab74bf c0baf4c
Author: Luke Cwik 
Authored: Wed Apr 19 15:30:19 2017 -0700
Committer: Luke Cwik 
Committed: Wed Apr 19 15:30:19 2017 -0700

--
 sdks/java/core/pom.xml  |  5 --
 .../apache/beam/sdk/util/TestCredential.java| 59 
 .../org/apache/beam/SdkCoreApiSurfaceTest.java  |  1 -
 sdks/java/extensions/gcp-core/pom.xml   |  5 ++
 .../apache/beam/sdk/util/TestCredential.java| 59 
 5 files changed, 64 insertions(+), 65 deletions(-)
--




[jira] [Commented] (BEAM-1283) DoFn finishBundle should be required to specify the window

2017-04-19 Thread Robert Bradshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975650#comment-15975650
 ] 

Robert Bradshaw commented on BEAM-1283:
---

Python has the same spec. "State plus window expiry callback" doesn't address 
the keyless case. 

> DoFn finishBundle should be required to specify the window
> --
>
> Key: BEAM-1283
> URL: https://issues.apache.org/jira/browse/BEAM-1283
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-core, sdk-py
>Reporter: Kenneth Knowles
>  Labels: backward-incompatible
> Fix For: First stable release
>
>
> The spec is here in Javadoc: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L128
> "If invoked from {{@StartBundle}} or {{@FinishBundle}}, this will attempt to 
> use the {{WindowFn}} of the input {{PCollection}} to determine what windows 
> the element should be in, throwing an exception if the {{WindowFn}} attempts 
> to access any information about the input element. The output element will 
> have a timestamp of negative infinity."
> This is a collection of caveats that make this method not always technically 
> wrong, but quite a mess. Ideas that reasonable folks have suggested lately:
>  - The {{WindowFn}} cannot actually be applied because {{WindowFn}} is 
> allowed to see the element type. The spec just avoids this by limiting which 
> {{WindowFn}} can be used.
>  - There is no natural output timestamp, so it should always be provided. The 
> spec avoids this by specifying an arbitrary and fairly useless timestamp.
>  - If it is a merging {{WindowFn}} like sessions that has already been merged 
> then you'll just have a bogus proto window regardless of explicit timestamp 
> or not.
> The use cases for these methods are best addressed by state plus window 
> expiry callback, so we should revisit this spec and probably just wipe it.
> There are some rare cases where you might need to output from {{FinishBundle}} 
> in a way that is not _actually_ sensitive to bundling (perhaps modulo some 
> downstream notion of equivalence) in which case you had better know what 
> window you are outputting to. Often it should be the global window.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #2885

2017-04-19 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-662) SlidingWindows should support sub-second periods

2017-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975642#comment-15975642
 ] 

ASF GitHub Bot commented on BEAM-662:
-

GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2600

[BEAM-662] Fix for allowing floating point periods in windows

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @robertwb, @aaltay PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-662-sliding-windows-rounding-off

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2600


commit 2252dca066e3bd8712bfef1cacf493e3d99c5ecb
Author: Sourabh Bajaj 
Date:   2017-04-19T22:13:26Z

[BEAM-662] Fix for allowing floating point periods in windows




> SlidingWindows should support sub-second periods
> 
>
> Key: BEAM-662
> URL: https://issues.apache.org/jira/browse/BEAM-662
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Daniel Mills
>Assignee: Sourabh Bajaj
>Priority: Minor
> Fix For: First stable release
>
>
> SlidingWindows periods are being rounded to seconds, see 
> http://stackoverflow.com/questions/39604646/can-slidingwindows-have-half-second-periods-in-python-apache-beam



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] beam pull request #2600: [BEAM-662] Fix for allowing floating point periods ...

2017-04-19 Thread sb2nov
GitHub user sb2nov opened a pull request:

https://github.com/apache/beam/pull/2600

[BEAM-662] Fix for allowing floating point periods in windows

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @robertwb, @aaltay PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/beam BEAM-662-sliding-windows-rounding-off

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2600


commit 2252dca066e3bd8712bfef1cacf493e3d99c5ecb
Author: Sourabh Bajaj 
Date:   2017-04-19T22:13:26Z

[BEAM-662] Fix for allowing floating point periods in windows




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (BEAM-919) Remove remaining old use/learn links from website src

2017-04-19 Thread Melissa Pashniak (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Melissa Pashniak reassigned BEAM-919:
-

Assignee: Frances Perry  (was: Melissa Pashniak)

> Remove remaining old use/learn links from website src
> -
>
> Key: BEAM-919
> URL: https://issues.apache.org/jira/browse/BEAM-919
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Frances Perry
>Priority: Minor
>
> We still have old links lingering after the website refactoring.
> For example, the release guide 
> (https://github.com/apache/incubator-beam-site/blob/asf-site/src/contribute/release-guide.md)
>  still links to "/use/..." in a bunch of places. 
> Impact: the links still work because of redirects, but it's tech debt we
> should fix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

