[jira] [Updated] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Moravek updated BEAM-8568: Fix Version/s: 2.17.0 Affects Version/s: 2.16.0 Description: CWD structure: {code} src/test/resources/input/sometestfile.txt {code} Code: {code:java} input .apply(Create.of("src/test/resources/input/*")) .apply(FileIO.matchAll()) .apply(FileIO.readMatches()) {code} The code above doesn't match any file starting with Beam 2.16.0. The regression was introduced by BEAM-7854. was: Relative path in project: {code:java} src/test/resources/input/sometestfile.txt{code} Code: {code:java} input .apply(Create.of("src/test/resources/input/*")) .apply(FileIO.matchAll()) .apply(FileIO.readMatches()) {code} This code does not match any file. It does in Beam version 2.16.0. The bug was introduced by BEAM-7854. > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Priority: Major > Fix For: 2.17.0 > > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > was introduced by BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
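To make the report reproducible end to end, here is a minimal self-contained sketch of the pattern being reported; only the three .apply calls come from the issue, and the wrapping pipeline boilerplate is added for completeness:
{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class RelativeWildcardMatch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<FileIO.ReadableFile> files =
        p.apply(Create.of("src/test/resources/input/*")) // relative glob against the CWD
            .apply(FileIO.matchAll()) // expands the glob into MatchResult.Metadata
            .apply(FileIO.readMatches()); // opens each matched file for reading
    p.run().waitUntilFinish();
  }
}
{code}
Per the updated description, this glob matched files under the working directory before Beam 2.16.0 and returns no matches from 2.16.0 on.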
[jira] [Comment Edited] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969012#comment-16969012 ] David Moravek edited comment on BEAM-8568 at 11/7/19 7:52 AM: -- LocalFileSystem no longer supports wildcard relative paths. was (Author: davidmoravek): LocalFileSystem no longer support wildcard relative paths. > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ondrej Cerny >Priority: Major > > Relative path in project: > {code:java} > src/test/resources/input/sometestfile.txt{code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > This code does not match any file. It does in Beam version 2.16.0. > > The bug was introduced by BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969012#comment-16969012 ] David Moravek commented on BEAM-8568: - LocalFileSystem no longer support wildcard relative paths. > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ondrej Cerny >Priority: Major > > Relative path in project: > {code:java} > src/test/resources/input/sometestfile.txt{code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > This code does not match any file. It does in Beam version 2.16.0. > > The bug was introduced by BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968989#comment-16968989 ] Amogh Tiwari commented on BEAM-8564: [~kenn], please assign this issue to me; I have already started working on it and would love to contribute. You can find a concise plan in the description. > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Priority: Minor > > LZO is a lossless data compression algorithm focused on compression > and decompression speed. > This will enable the Apache Beam SDK to compress/decompress files using the LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate input and output streams will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
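A hypothetical sketch of the stream-based shape the description proposes; the class name LzoIO, the Codec seam, and both method signatures are illustrative assumptions, not an existing Beam API, and a real implementation would delegate to an actual LZO library:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class LzoIO {
  /** Pluggable seam so the sketch stays self-contained; a real version would wrap an LZO codec. */
  public interface Codec {
    OutputStream compressing(OutputStream out) throws IOException;
    InputStream decompressing(InputStream in) throws IOException;
  }

  private final Codec codec;

  public LzoIO(Codec codec) {
    this.codec = codec;
  }

  /** Copies {@code raw} into {@code archive}, compressing on the way through. */
  public void compress(InputStream raw, OutputStream archive) throws IOException {
    try (OutputStream out = codec.compressing(archive)) {
      raw.transferTo(out);
    }
  }

  /** Copies {@code archive} into {@code raw}, decompressing on the way through. */
  public void decompress(InputStream archive, OutputStream raw) throws IOException {
    try (InputStream in = codec.decompressing(archive)) {
      in.transferTo(raw);
    }
  }
}
{code}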
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=339752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339752 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 07/Nov/19 05:09 Start Date: 07/Nov/19 05:09 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#issuecomment-550135736 R: @lukecwik R: @mxm R: @robertwb This is a follow-up to https://github.com/apache/beam/pull/9969 that corrects python-based runners to behave the same as Java-based runners wrt STOPPED: it should be the initial state and not a terminal state. Tests pass, but I'm not sure if this could cause problems with Dataflow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339752) Time Spent: 3h 20m (was: 3h 10m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make Python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
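Since the issue proposes a shared utility for terminal-state checks, here is a minimal sketch of what that could look like in Java; the class and method names are illustrative, and the state list is the one named in this issue rather than the full beam_job_api.proto enum:
{code:java}
import java.util.EnumSet;
import java.util.Set;

public final class JobStates {
  /** Mirrors the states discussed in this issue; the real definition lives in beam_job_api.proto. */
  public enum State { STOPPED, STARTING, RUNNING, DONE, FAILED, CANCELLED, DRAINED }

  // One shared answer to "is this state terminal?", instead of each runner re-deriving it.
  private static final Set<State> TERMINAL =
      EnumSet.of(State.DONE, State.FAILED, State.CANCELLED, State.DRAINED);

  public static boolean isTerminal(State state) {
    return TERMINAL.contains(state);
  }

  private JobStates() {}
}
{code}
Note that whether STOPPED belongs in the terminal set is exactly the Java/Python discrepancy described in the issue.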
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=339748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339748 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 04:42 Start Date: 07/Nov/19 04:42 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#discussion_r343474271 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1953,8 +1953,10 @@ class BeamModulePlugin implements Plugin<Project> { } project.ext.addPortableWordCountTasks = { -> -addPortableWordCountTask(false) -addPortableWordCountTask(true) +addPortableWordCountTask(false, "PortableRunner") Review comment: Is there anything that we are testing with PortableRunner that isn't included in FlinkRunner? If not, perhaps just replace those? Side question for follow-up: Since these tasks are Flink-specific, should they be renamed and possibly moved out of BeamModulePlugin? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339748) Time Spent: 1h (was: 50m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}}, which currently uses the > PortableRunner rather than the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339746 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 03:57 Start Date: 07/Nov/19 03:57 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#issuecomment-550675399 Abandoned node enforcement was the culprit, since I was deliberately not running the pipeline. https://scans.gradle.com/s/ie7pt3dazwpbm/tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339752) Time Spent: 3h 20m (was: 3h 10m) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . However, it is a huge model change, > and meanwhile we have to do something. 
The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
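As a concrete illustration of the terminating-trigger hazard described in this issue, here is a sketch of the trigger shape such a change rejects, next to its safe equivalent (window size, lateness, and element type are illustrative):
{code:java}
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Unsafe: AfterPane.elementCountAtLeast(1) finishes after its first firing, so
// any data arriving later in the window is silently dropped by a downstream GBK.
Window<KV<String, Integer>> unsafe =
    Window.<KV<String, Integer>>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterPane.elementCountAtLeast(1))
        .withAllowedLateness(Duration.standardMinutes(10))
        .discardingFiredPanes();

// Safe: wrapping the same condition in Repeatedly.forever(...) never finishes,
// so every pane that arrives is eventually emitted.
Window<KV<String, Integer>> safe =
    Window.<KV<String, Integer>>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
        .withAllowedLateness(Duration.standardMinutes(10))
        .discardingFiredPanes();
{code}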
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339743 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 03:51 Start Date: 07/Nov/19 03:51 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#issuecomment-550667460 Ah, this is of course part of `./gradlew :runners:direct-java:needsRunnerTests` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339743) Time Spent: 3h 10m (was: 3h) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . However, it is a huge model change, > and meanwhile we have to do something. The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. 
Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339740 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 03:44 Start Date: 07/Nov/19 03:44 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#discussion_r343465429 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query10.java ## @@ -314,7 +315,7 @@ public void processElement(ProcessContext c, BoundedWindow window) name + ".WindowLogFiles", Window.>into( FixedWindows.of(Duration.standardSeconds(configuration.windowSizeSec))) -.triggering(AfterWatermark.pastEndOfWindow()) +.triggering(DefaultTrigger.of()) Review comment: (did nothing - the formatting is correct) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339740) Time Spent: 3h (was: 2h 50m) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . 
However, it is a huge model change, > and meanwhile we have to do something. The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339738 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 03:41 Start Date: 07/Nov/19 03:41 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#issuecomment-550653454 Curiously, `./gradlew -p sdks/java/core test` does not repro the failure locally. I have noticed some inconsistencies between `./gradlew -p ` and `./gradlew :`. I thought these were the same. So we probably have some issues in our gradle configs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339738) Time Spent: 2h 50m (was: 2h 40m) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . However, it is a huge model change, > and meanwhile we have to do something. 
The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339733 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 03:27 Start Date: 07/Nov/19 03:27 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#discussion_r343462469 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java ## @@ -162,6 +164,45 @@ public static void applicableTo(PCollection<?> input) { throw new IllegalStateException( "GroupByKey must have a valid Window merge function. " + "Invalid because: " + cause); } + +// Validate that the trigger does not finish before garbage collection time +if (!triggerIsSafe(windowingStrategy)) { + throw new IllegalArgumentException( + String.format( + "Unsafe trigger may lose data, see" + + " https://s.apache.org/finishing-triggers-drop-data: %s", + windowingStrategy.getTrigger())); +} + } + + // Note that Never trigger finishes *at* GC time so it is OK, and + // AfterWatermark.fromEndOfWindow() finishes at end-of-window time so it is + // OK if there is no allowed lateness. + private static boolean triggerIsSafe(WindowingStrategy<?, ?> windowingStrategy) { +if (!windowingStrategy.getTrigger().mayFinish()) { Review comment: I agree exactly with all of your analysis. This pull request will forbid triggers of type a). The meaning of `Trigger.mayFinish()` is to answer this question "is this a trigger that might finish?" If the answer to that question is "yes" then it might be a type a) trigger. It depends on allowed lateness. There are a couple of very special triggers where the answer "yes" is OK as long as allowed lateness is zero. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339733) Time Spent: 2.5h (was: 2h 20m) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. 
The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . However, it is a huge model change, > and meanwhile we have to do something. The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339734 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 03:27 Start Date: 07/Nov/19 03:27 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#discussion_r343462578 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query10.java ## @@ -314,7 +315,7 @@ public void processElement(ProcessContext c, BoundedWindow window) name + ".WindowLogFiles", Window.>into( FixedWindows.of(Duration.standardSeconds(configuration.windowSizeSec))) -.triggering(AfterWatermark.pastEndOfWindow()) +.triggering(DefaultTrigger.of()) Review comment: I will run spotlessApply. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339734) Time Spent: 2h 40m (was: 2.5h) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . 
However, it is a huge model change, > and meanwhile we have to do something. The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339712 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 07/Nov/19 02:45 Start Date: 07/Nov/19 02:45 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-550595649 > @chamikaramj are there up-to-date docs anywhere on how to run the BigQueryIOIT tests? The example args in the javadoc are super out of date. Never mind, I hacked it up enough to get it to run. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339712) Time Spent: 3h 10m (was: 3h) > Implement and use an Avro coder rather than the JSON one for intermediary > files to be loaded in BigQuery > > > Key: BEAM-2879 > URL: https://issues.apache.org/jira/browse/BEAM-2879 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Black Phoenix >Assignee: Steve Niemitz >Priority: Minor > Labels: starter > Time Spent: 3h 10m > Remaining Estimate: 0h > > Before being loaded in BigQuery, temporary files are created and encoded in > JSON, which is a costly solution compared to an Avro alternative. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339698 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 02:14 Start Date: 07/Nov/19 02:14 Worklog Time Spent: 10m Work Description: jfarr commented on pull request #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#discussion_r343429384 ## File path: sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardReadersPool.java ## @@ -149,10 +157,20 @@ private void readLoop(ShardRecordsIterator shardRecordsIterator) { recordsQueue.put(kinesisRecord); numberOfRecordsInAQueueByShard.get(kinesisRecord.getShardId()).incrementAndGet(); } +rateLimiter.onSuccess(kinesisRecords); + } catch (KinesisClientThrottledException e) { +try { + rateLimiter.onThrottle(e); +} catch (InterruptedException ex) { + LOG.warn("Thread was interrupted, finishing the read loop", ex); + Thread.currentThread().interrupt(); Review comment: Thread.interrupt() does not interrupt the current thread; it sets the thread's interrupt status. The short answer is that catching InterruptedException clears the interrupt status, so in order to handle it properly you should always either rethrow the exception or reset the thread's interrupt status so any code higher up the call stack can still detect the interruption. There is a good article by Brian Goetz that goes into the details: https://www.ibm.com/developerworks/java/library/j-jtp05236/index.html. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339698) Time Spent: 9.5h (was: 9h 20m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 9.5h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
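The pattern described in that review comment, as a small self-contained sketch (doOnePoll is a hypothetical blocking call standing in for the Kinesis read):
{code:java}
// Catching InterruptedException clears the thread's interrupt status, so a
// method that cannot rethrow it should restore the status before returning;
// callers higher up the stack can then still observe the interruption.
Runnable readLoop =
    () -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          doOnePoll(); // hypothetical blocking work, e.g. a getRecords() call
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // restore the cleared status
          return; // then finish the loop promptly
        }
      }
    };
{code}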
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339693 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 07/Nov/19 02:05 Start Date: 07/Nov/19 02:05 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-550586372 @chamikaramj are there up-to-date docs anywhere on how to run the BigQueryIOIT tests? The example args in the javadoc are super out of date. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339693) Time Spent: 3h (was: 2h 50m) > Implement and use an Avro coder rather than the JSON one for intermediary > files to be loaded in BigQuery > > > Key: BEAM-2879 > URL: https://issues.apache.org/jira/browse/BEAM-2879 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Black Phoenix >Assignee: Steve Niemitz >Priority: Minor > Labels: starter > Time Spent: 3h > Remaining Estimate: 0h > > Before being loaded in BigQuery, temporary files are created and encoded in > JSON, which is a costly solution compared to an Avro alternative. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=339692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339692 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 07/Nov/19 02:04 Start Date: 07/Nov/19 02:04 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#discussion_r343423684 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java ## @@ -162,6 +164,45 @@ public static void applicableTo(PCollection<?> input) { throw new IllegalStateException( "GroupByKey must have a valid Window merge function. " + "Invalid because: " + cause); } + +// Validate that the trigger does not finish before garbage collection time +if (!triggerIsSafe(windowingStrategy)) { Review comment: Combine is not a primitive operation. If you look at Combine.java you can see that it expands to include a GroupByKey. When a runner sees it in a pipeline, it will usually replace the operation. But the validation happens on the raw graph. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339692) Time Spent: 2h 20m (was: 2h 10m) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There are multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 element in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto a coarser key (e.g. 
if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . However, it is a huge model change, > and meanwhile we have to do something. The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on > the trigger of the PCollection input into a first GBK. This is especially > true for terminating triggers. > - Prohibit top-level terminating triggers entirely. This will ensure that the > only data that ever gets dropped is "droppably late" data. > CC: [~bchambers] [~kenn] [~tgroh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339691 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 02:04 Start Date: 07/Nov/19 02:04 Worklog Time Spent: 10m Work Description: jfarr commented on pull request #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#discussion_r343423203 ## File path: sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/RateLimitPolicyFactory.java ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.kinesis; + +import java.io.IOException; +import java.io.Serializable; +import java.util.List; +import org.apache.beam.sdk.util.BackOff; +import org.apache.beam.sdk.util.FluentBackoff; +import org.joda.time.Duration; + +/** + * Implement this interface to create a {@code RateLimitPolicy}. Used to create a rate limiter for + * each shard. 
+ */ +public interface RateLimitPolicyFactory extends Serializable { + + RateLimitPolicy getRateLimitPolicy(); + + static RateLimitPolicyFactory withoutLimiter() { +return () -> new RateLimitPolicy() {}; + } + + static RateLimitPolicyFactory withBackoff(FluentBackoff fluentBackoff) { +return () -> new BackoffRateLimiter(fluentBackoff); + } + + static RateLimitPolicyFactory withFixedDelay() { +return FixedDelayRateLimiter::new; + } + + static RateLimitPolicyFactory withFixedDelay(Duration delay) { +return () -> new FixedDelayRateLimiter(delay); + } + + class BackoffRateLimiter implements RateLimitPolicy { + +private final BackOff backoff; + +public BackoffRateLimiter(FluentBackoff fluentBackoff) { + this.backoff = + fluentBackoff + // never stop retrying + .withMaxRetries(Integer.MAX_VALUE) + .withMaxCumulativeBackoff(Duration.standardDays(1000)) + .backoff(); +} + +@Override +public void onThrottle(KinesisClientThrottledException t) throws InterruptedException { + try { +long backOffMillis = backoff.nextBackOffMillis(); +if (backOffMillis != BackOff.STOP) { + Thread.sleep(backOffMillis); +} + } catch (IOException e) { +// do nothing + } +} + +@Override +public void onSuccess(List<KinesisRecord> records) { + try { +backoff.reset(); + } catch (IOException e) { +// do nothing + } +} + } + + class FixedDelayRateLimiter implements RateLimitPolicy { + +private static final Duration DEFAULT_DELAY = Duration.standardSeconds(1); + +private final Duration delay; + +public FixedDelayRateLimiter() { + this(DEFAULT_DELAY); +} + +public FixedDelayRateLimiter(Duration delay) { + this.delay = delay; +} + +@Override +public void onSuccess(List<KinesisRecord> records) throws InterruptedException { Review comment: It's hard to anticipate what state a custom rate limit policy will want to keep, but since we have the list of records and it might want them, I just passed them in here. Same with passing the KinesisClientThrottledException in onThrottle(). Currently, no, there are no implementations using these. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339691) Time Spent: 9h 20m (was: 9h 10m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 9h 20m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
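For orientation, a usage sketch of how the proposed factory methods would be wired into a read, assuming the withFixedDelayRateLimitPolicy overload from this PR's diff lands as shown (stream name and delay are placeholders, and credentials setup via withAWSClientsProvider is elided):
{code:java}
PCollection<KinesisRecord> records =
    p.apply(
        KinesisIO.read()
            .withStreamName("my-stream") // placeholder
            .withInitialPositionInStream(InitialPositionInStream.LATEST)
            // proposed in this PR: sleep 1s between getRecords() calls per shard
            .withFixedDelayRateLimitPolicy(Duration.standardSeconds(1)));
{code}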
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339690 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 07/Nov/19 02:03 Start Date: 07/Nov/19 02:03 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r343423145 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java ## @@ -148,16 +148,14 @@ public void onNext(BeamFnApi.InstructionResponse response) { LOG.debug("Received InstructionResponse {}", response); CompletableFuture<BeamFnApi.InstructionResponse> responseFuture = outstandingRequests.remove(response.getInstructionId()); - if (responseFuture != null) { -if (response.getError().isEmpty()) { - responseFuture.complete(response); -} else { - responseFuture.completeExceptionally( - new RuntimeException( - String.format( - "Error received from SDK harness for instruction %s: %s", - response.getInstructionId(), response.getError(; -} + if (response.getError().isEmpty()) { Review comment: Totally agree that this may mean a bug in the SDK harness if it happens. I'm just thinking that a log message may not be enough, as it is easy to ignore. An exception makes sure the problem is discovered in time. But I'm fine if you think a log is enough, as users will investigate the log if there is a serious problem for them after all. So we only need to add a WARNING log for this case, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339690) Time Spent: 1h 40m (was: 1.5h) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > I think we do not need the null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Before the `onNext` call, the `Future` has already been put into the queue > in the `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need to test this case. I think it would be better if we > threw an exception for an unknown response; otherwise, this may hide a > potential bug. > Please correct me if there is anything I misunderstand, @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
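The warning-log variant being discussed, sketched against the diff above (simplified; the log message wording is an assumption):
{code:java}
CompletableFuture<BeamFnApi.InstructionResponse> responseFuture =
    outstandingRequests.remove(response.getInstructionId());
if (responseFuture == null) {
  // An unknown instruction id likely means a bug in the SDK harness;
  // surface it loudly instead of dropping the response silently.
  LOG.warn("Dropped response for unknown instruction {}", response.getInstructionId());
  return;
}
if (response.getError().isEmpty()) {
  responseFuture.complete(response);
} else {
  responseFuture.completeExceptionally(
      new RuntimeException(
          String.format(
              "Error received from SDK harness for instruction %s: %s",
              response.getInstructionId(), response.getError())));
}
{code}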
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339689 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 02:00 Start Date: 07/Nov/19 02:00 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-550585285 > 2\. From a user perspective, what if users set all 3 or 4 withRateLimitXXX methods? Which one will override what, or do we apply all of them? Would it be helpful if we provide an enum? If you call one of these methods again, it will override the previous call. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339689) Time Spent: 9h 10m (was: 9h) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 9h 10m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339688 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 01:56 Start Date: 07/Nov/19 01:56 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-550584252 > 1. Have we tried to use Kinesis's RetryPolicy and BackoffStrategy? This looks like we are reengineering a feature that Kinesis's API already supports. Here is an example: https://www.programcreek.com/java-api-examples/?api=com.amazonaws.retry.RetryPolicy. If we already tried it, what were the reasons not to use Kinesis's? I'm not sure whether RetryPolicy and BackoffStrategy apply to LimitExceededException / ProvisionedThroughputExceededException, but I can look into that. If so, I think it makes sense to just configure this in the Kinesis client instead of having a BackoffRateLimitPolicy. What do you think @aromanenko-dev and @lukecwik ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339688) Time Spent: 9h (was: 8h 50m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
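For reference, a hedged sketch of what configuring retries on the Kinesis client itself could look like with the AWS SDK v1 ClientConfiguration. Whether this path actually covers ProvisionedThroughputExceededException is exactly the open question in the comment above; the retry count here is illustrative, not a recommendation.

{code:java}
import com.amazonaws.ClientConfiguration;
import com.amazonaws.retry.PredefinedRetryPolicies;
import com.amazonaws.retry.RetryPolicy;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;

// Sketch only: lean on the AWS SDK's own retry policy instead of layering
// a BackoffRateLimitPolicy on top.
public class KinesisClientRetrySketch {
  public static AmazonKinesis buildClient() {
    RetryPolicy retryPolicy =
        new RetryPolicy(
            PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION,
            PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,
            10, // maxErrorRetry, illustrative
            true); // honor maxErrorRetry set in ClientConfiguration
    ClientConfiguration config = new ClientConfiguration().withRetryPolicy(retryPolicy);
    return AmazonKinesisClientBuilder.standard().withClientConfiguration(config).build();
  }
}
{code}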
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339687 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 01:56 Start Date: 07/Nov/19 01:56 Worklog Time Spent: 10m Work Description: jfarr commented on pull request #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#discussion_r343418039
## File path: sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
## @@ -420,6 +426,47 @@ public Read withCustomWatermarkPolicy(WatermarkPolicyFactory watermarkPolicyFactory) {
   return toBuilder().setWatermarkPolicyFactory(watermarkPolicyFactory).build();
 }
+/**
+ * Specifies the rate limit policy as BackoffRateLimiter.
+ *
+ * @param fluentBackoff The {@code FluentBackoff} used to create the backoff policy.
+ */
+public Read withBackoffRateLimitPolicy(FluentBackoff fluentBackoff) {
+  checkArgument(fluentBackoff != null, "fluentBackoff cannot be null");
+  return toBuilder()
+      .setRateLimitPolicyFactory(RateLimitPolicyFactory.withBackoff(fluentBackoff))
+      .build();
+}
+
+/**
+ * Specifies the rate limit policy as FixedDelayRateLimiter with the default delay of 1 second.
+ */
+public Read withFixedDelayRateLimitPolicy() {
+  return toBuilder().setRateLimitPolicyFactory(RateLimitPolicyFactory.withFixedDelay()).build();
+}
+
+/**
+ * Specifies the rate limit policy as FixedDelayRateLimiter with the given delay.
+ *
+ * @param delay Denotes the fixed delay duration.
+ */
+public Read withFixedDelayRateLimitPolicy(Duration delay) {
Review comment: > 1. Have we tried to use Kinesis's RetryPolicy and BackoffStrategy? This looks like we are reengineering a feature that Kinesis's API already supports. Here is an example: https://www.programcreek.com/java-api-examples/?api=com.amazonaws.retry.RetryPolicy. If we already tried it, what were the reasons not to use Kinesis's? I'm not sure whether RetryPolicy and BackoffStrategy apply to LimitExceededException / ProvisionedThroughputExceededException, but I can look into that. If so, I think it makes sense to just configure this in the Kinesis client instead of having a BackoffRateLimitPolicy. What do you think @aromanenko-dev and @lukecwik ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339687) Time Spent: 8h 50m (was: 8h 40m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
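To make the proposed surface concrete, here is a usage sketch of the API shown in the hunk above, assuming the PR lands as written. The withFixedDelayRateLimitPolicy method comes from the diff under review, and "my-stream" is a placeholder stream name.

{code:java}
import org.apache.beam.sdk.io.kinesis.KinesisIO;
import org.joda.time.Duration;

public class KinesisRateLimitUsageSketch {
  public static KinesisIO.Read readWithFixedDelay() {
    return KinesisIO.read()
        .withStreamName("my-stream") // placeholder
        // Sleep a fixed 1 second between getRecords calls, matching the KDS
        // guidance quoted in the issue description.
        .withFixedDelayRateLimitPolicy(Duration.standardSeconds(1));
  }
}
{code}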
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339679 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 01:30 Start Date: 07/Nov/19 01:30 Worklog Time Spent: 10m Work Description: jfarr commented on pull request #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#discussion_r343412074
## File path: sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
## @@ -420,6 +426,47 @@ public Read withCustomWatermarkPolicy(WatermarkPolicyFactory watermarkPolicyFactory) {
   return toBuilder().setWatermarkPolicyFactory(watermarkPolicyFactory).build();
 }
+/**
+ * Specifies the rate limit policy as BackoffRateLimiter.
+ *
+ * @param fluentBackoff The {@code FluentBackoff} used to create the backoff policy.
+ */
+public Read withBackoffRateLimitPolicy(FluentBackoff fluentBackoff) {
Review comment: I was considering exposing a wrapper object like SnsIO does. I didn't realize FluentBackoff was internal; I think that approach makes a lot of sense here too, then. What does it mean to give up retrying with an unbounded source, though? What should we do in that case? Throw an unchecked exception and let the pipeline crash? That's why I explicitly overrode max retries and max cumulative backoff: I didn't think giving up made sense. But I can change that if you want. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339679) Time Spent: 8.5h (was: 8h 20m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
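On the "what does giving up mean for an unbounded source" question above, one option is a backoff that caps the per-attempt delay but never exhausts its retries. A sketch under that assumption follows; note FluentBackoff is a Beam-internal utility (org.apache.beam.sdk.util), which is the reviewer's point about not exposing it directly.

{code:java}
import org.apache.beam.sdk.util.FluentBackoff;
import org.joda.time.Duration;

public class EndlessBackoffSketch {
  // Caps the per-attempt delay at 30s but effectively never gives up,
  // since "giving up" has no sensible meaning for an unbounded source.
  static final FluentBackoff ENDLESS =
      FluentBackoff.DEFAULT
          .withInitialBackoff(Duration.standardSeconds(1))
          .withMaxBackoff(Duration.standardSeconds(30))
          .withMaxRetries(Integer.MAX_VALUE);
}
{code}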
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339675 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 07/Nov/19 01:23 Start Date: 07/Nov/19 01:23 Worklog Time Spent: 10m Work Description: jfarr commented on pull request #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#discussion_r343410781
## File path: sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardReadersPool.java
## @@ -51,15 +51,13 @@
 /**
  * Executor service for running the threads that read records from shards handled by this pool.
- * Each thread runs the {@link ShardReadersPool#readLoop(ShardRecordsIterator)} method and handles
- * exactly one shard.
+ * Each thread runs the {@link ShardReadersPool#readLoop} method and handles exactly one shard.
Review comment: Sure, I'll change that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339675) Time Spent: 8h 20m (was: 8h 10m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339673 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 07/Nov/19 01:22 Start Date: 07/Nov/19 01:22 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-550576486 > Thanks. Looks great to me. > > To make sure I understood correctly, this will not be enabled for existing users by default, and to enable it users have to specify withAvroFormatFunction(), correct? Correct. With schemas I think we could enable this transparently, but for now it's opt-in only. > Also, can we add a version of BigQueryIOIT so that we can continue to monitor both Avro and JSON based BQ write transforms? Yeah, I can add that in there. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339673) Time Spent: 2h 50m (was: 2h 40m) > Implement and use an Avro coder rather than the JSON one for intermediary > files to be loaded in BigQuery > > > Key: BEAM-2879 > URL: https://issues.apache.org/jira/browse/BEAM-2879 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Black Phoenix >Assignee: Steve Niemitz >Priority: Minor > Labels: starter > Time Spent: 2h 50m > Remaining Estimate: 0h > > Before being loaded in BigQuery, temporary files are created and encoded in > JSON, which is a costly solution compared to an Avro alternative. -- This message was sent by Atlassian Jira (v8.3.4#803005)
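A hedged usage sketch of the opt-in discussed above: existing pipelines keep the JSON path unless they supply an Avro format function. The exact shape of the callback (an AvroWriteRequest carrying the element and its Avro schema) is taken from the PR under review, and MyRow plus the "name" field are placeholders.

{code:java}
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

public class AvroWriteSketch {
  // Hypothetical element type used only for illustration.
  static class MyRow {
    String name;
  }

  static BigQueryIO.Write<MyRow> write() {
    return BigQueryIO.<MyRow>write()
        .to("project:dataset.table") // placeholder table spec
        // Opt in to the Avro-encoded load files; without this call the
        // existing JSON encoding is used.
        .withAvroFormatFunction(
            request ->
                new GenericRecordBuilder(request.getSchema())
                    .set("name", request.getElement().name)
                    .build());
  }
}
{code}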
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=339672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339672 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 01:22 Start Date: 07/Nov/19 01:22 Worklog Time Spent: 10m Work Description: liumomo315 commented on issue #9957: [BEAM-8575] Add validates runner tests for 1. Custom window fn: Test a customized window fn work as expected; 2. Windows idempotency: Applying the same window fn (or window fn + GBK) to the input multiple times will have the same effect as applying it once. URL: https://github.com/apache/beam/pull/9957#issuecomment-550576421 > Welcome to Beam, all non-trivial PRs are meant to be accompanied by a JIRA in the title as per the [contribution guide](https://beam.apache.org/contribute/#share-your-intent). Thanks for the pointer. Added the JIRA :) > Also, please address the lint issue: > > ``` > Task :sdks:python:test-suites:tox:py37:lintPy37 > 15:45:38 * Module apache_beam.transforms.window_test > 15:45:38 apache_beam/transforms/window_test.py:283:48: E0110: Abstract class 'TestCustomWindows' with abstract methods instantiated (abstract-class-instantiated) > ``` I am done fully implementing it, but I have a question in another thread about whether I could make it simpler without changing other files. Also cc'ed you there. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339672) Time Spent: 40m (was: 0.5h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=339668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339668 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 01:17 Start Date: 07/Nov/19 01:17 Worklog Time Spent: 10m Work Description: liumomo315 commented on pull request #9957: [BEAM-8575] Add validates runner tests for 1. Custom window fn: Test a customized window fn work as expected; 2. Windows idempotency: Applying the same window fn (or window fn + GBK) to the input multiple times will have the same effect as applying it once. URL: https://github.com/apache/beam/pull/9957#discussion_r343409486
## File path: sdks/python/apache_beam/transforms/window_test.py
## @@ -65,6 +76,44 @@ def process(self, element, window=core.DoFn.WindowParam):
 reify_windows = core.ParDo(ReifyWindowsFn())
+class TestCustomWindows(NonMergingWindowFn):
+  """A custom non merging window fn which assigns elements into interval windows
+  based on the element timestamps.
+  """
+
+  def __init__(self, first_window_end, second_window_end):
+    self.first_window_end = Timestamp.of(first_window_end)
+    self.second_window_end = Timestamp.of(second_window_end)
+
+  def assign(self, context):
+    timestamp = context.timestamp
+    if timestamp < self.first_window_end:
+      return [IntervalWindow(0, self.first_window_end)]
+    elif timestamp < self.second_window_end:
+      return [IntervalWindow(self.first_window_end, self.second_window_end)]
+    else:
+      return [IntervalWindow(self.second_window_end, timestamp)]
+
+  def get_window_coder(self):
+    return IntervalWindowCoder()
+
+  def to_runner_api_parameter(self, context):
Review comment: This custom window fn is added only for a test, to test a non-standard window fn. To fully implement it, I need to also implement to_runner_api_parameter and from_runner_api_parameter. Those require me to also change the standard_window_fns.proto and common_urns.py. Are these changes necessary? If not, is there a better way to do this? Thanks! @lukecwik @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339668) Time Spent: 0.5h (was: 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968834#comment-16968834 ] Eugene Kirpichov commented on BEAM-8561: Please above all make sure that this connector provides APIs compatible with FileIO, i.e. ThriftIO.readFiles() and ThriftIO.sink(). Providing ThriftIO.read() and write() for basic use cases also makes sense, but there's no need to make them super customizable or to provide readAll(): all advanced use cases can be handled by the two methods above plus FileIO. > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > > Similar to AvroIO, it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
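To illustrate why a readFiles()-style entry point keeps the simple read() small: advanced matching is delegated to FileIO, as in the hypothetical composition below. ThriftIO.readFiles() is the proposed, not-yet-merged API, and MyThriftRecord plus the file pattern are placeholders.

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ThriftReadFilesSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // FileIO handles matching and file access; the proposed ThriftIO.readFiles()
    // only has to decode matched files into Thrift records.
    PCollection<MyThriftRecord> records =
        pipeline
            .apply(FileIO.match().filepattern("gs://bucket/path/*.thrift"))
            .apply(FileIO.readMatches())
            .apply(ThriftIO.readFiles(MyThriftRecord.class)); // hypothetical API
    pipeline.run();
  }
}
{code}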
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=339663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339663 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 01:06 Start Date: 07/Nov/19 01:06 Worklog Time Spent: 10m Work Description: liumomo315 commented on pull request #9957: [BEAM-8575] Add validates runner tests for 1. Custom window fn: Test a customized window fn work as expected; 2. Windows idempotency: Applying the same window fn (or window fn + GBK) to the input multiple times will have the same effect as applying it once. URL: https://github.com/apache/beam/pull/9957#discussion_r343407270
## File path: sdks/python/apache_beam/transforms/window_test.py
## @@ -252,6 +276,50 @@ def test_timestamped_with_combiners(self):
   assert_that(mean_per_window, equal_to([(0, 2.0), (1, 7.0)]),
               label='assert:mean')
+
+  @attr('ValidatesRunner')
+  def test_custom_windows(self):
+    with TestPipeline() as p:
+      result = (p | Create([TimestampedValue(x, x * 100) for x in range(7)])
+                | 'custom window' >> WindowInto(TestCustomWindows(250, 200))
+                | 'insert key' >> Map(lambda v: ('key', v.value))
+                | GroupByKey())
+
+      assert_that(result, equal_to([('key', [0, 1, 2]),
+                                    ('key', [3, 4]),
+                                    ('key', [5]),
+                                    ('key', [6])]))
+
+  @attr('ValidatesRunner')
+  def test_windows_idempotency(self):
+    with TestPipeline() as p:
+      result = (p | Create([(x, x * 2) for x in range(5)])
+                | Map(lambda item: TimestampedValue(item[0], item[1]))
+                | Map(lambda v: ('key', v))
+                | 'window' >> WindowInto(FixedWindows(4))
+                | 'same window' >> WindowInto(FixedWindows(4))
+                | 'same window again' >> WindowInto(FixedWindows(4))
+                | GroupByKey())
+
+      assert_that(result, equal_to([('key', [0, 1]),
+                                    ('key', [2, 3]),
+                                    ('key', [4])]))
+
+  @attr('ValidatesRunner')
+  def test_windows_gbk_idempotency(self):
+    with TestPipeline() as p:
+      result = (p | Create([(x, x * 2) for x in range(5)])
Review comment: Same for this one :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339663) Time Spent: 20m (was: 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=339662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339662 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 01:06 Start Date: 07/Nov/19 01:06 Worklog Time Spent: 10m Work Description: liumomo315 commented on pull request #9957: [BEAM-8575] Add validates runner tests for 1. Custom window fn: Test a customized window fn work as expected; 2. Windows idempotency: Applying the same window fn (or window fn + GBK) to the input multiple times will have the same effect as applying it once. URL: https://github.com/apache/beam/pull/9957#discussion_r343407158
## File path: sdks/python/apache_beam/transforms/window_test.py
## @@ -252,6 +276,50 @@ def test_timestamped_with_combiners(self):
   assert_that(mean_per_window, equal_to([(0, 2.0), (1, 7.0)]),
               label='assert:mean')
+
+  @attr('ValidatesRunner')
+  def test_custom_windows(self):
+    with TestPipeline() as p:
+      result = (p | Create([TimestampedValue(x, x * 100) for x in range(7)])
+                | 'custom window' >> WindowInto(TestCustomWindows(250, 200))
+                | 'insert key' >> Map(lambda v: ('key', v.value))
+                | GroupByKey())
+
+      assert_that(result, equal_to([('key', [0, 1, 2]),
+                                    ('key', [3, 4]),
+                                    ('key', [5]),
+                                    ('key', [6])]))
+
+  @attr('ValidatesRunner')
+  def test_windows_idempotency(self):
+    with TestPipeline() as p:
+      result = (p | Create([(x, x * 2) for x in range(5)])
Review comment: I tried that, but it failed. I dug into this more and found it is a known issue with Python timestamped values; see https://github.com/apache/beam/pull/8206. In short, for the downstream ptransforms to treat a timestamped element as an element with a timestamp instead of as a timestamped value object, a DoFn (it could be a simple Map(lambda x: x)) must be applied to the timestamped elements. (This should be fixed, but probably not in this PR.) I found this test class already has a helper function timestamped_key_values, so I reuse it in the new tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339662) Remaining Estimate: 0h Time Spent: 10m > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339653 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 07/Nov/19 00:55 Start Date: 07/Nov/19 00:55 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#issuecomment-550570058 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339653) Time Spent: 19h 10m (was: 19h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 19h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339655 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 07/Nov/19 00:55 Start Date: 07/Nov/19 00:55 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#issuecomment-550570123 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339655) Time Spent: 19.5h (was: 19h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 19.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339654 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 07/Nov/19 00:55 Start Date: 07/Nov/19 00:55 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#issuecomment-550570078 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339654) Time Spent: 19h 20m (was: 19h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 19h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339650 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 07/Nov/19 00:44 Start Date: 07/Nov/19 00:44 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r343401907
## File path: sdks/python/apache_beam/pipeline.py
## @@ -396,28 +400,57 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)
-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
Review comment: > Do you think we should put it into a separate PR, Yes. At least let's have a separate discussion for API changes like this. > or simply not support it at all? Maybe not. This could be just an override for the interactive runner's run() (e.g. run_with(NewRunner, NewOptions)). At least, let's discuss with all stakeholders. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339650) Time Spent: 8h (was: 7h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled with an interactive runner can > be directly run by other runners from a notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339649 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 07/Nov/19 00:41 Start Date: 07/Nov/19 00:41 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r343401313
## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 def run_pipeline(self, pipeline, options):
   """Remotely executes entire pipeline or parts reachable from node."""
+  # Label goog-dataflow-notebook if pipeline is initiated from interactive
+  # runner.
+  if pipeline.interactive:
Review comment: Could we move https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L100:L102 to a common utility function, so that each runner that wants this could call it without worrying about requiring additional imports? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339649) Time Spent: 7h 50m (was: 7h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled with an interactive runner can > be directly run by other runners from a notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8575) Add more Python validates runner tests
wendy liu created BEAM-8575: --- Summary: Add more Python validates runner tests Key: BEAM-8575 URL: https://issues.apache.org/jira/browse/BEAM-8575 Project: Beam Issue Type: Test Components: sdk-py-core, testing Reporter: wendy liu This is the umbrella issue to track the work of adding more Python tests to improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=339648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339648 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 07/Nov/19 00:29 Start Date: 07/Nov/19 00:29 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases URL: https://github.com/apache/beam/pull/9778#issuecomment-550564200 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339648) Time Spent: 37h 10m (was: 37h) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 37h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339641 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 06/Nov/19 23:39 Start Date: 06/Nov/19 23:39 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339641) Time Spent: 15h 50m (was: 15h 40m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 15h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339640 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 06/Nov/19 23:35 Start Date: 06/Nov/19 23:35 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#issuecomment-550550057 `PushDownRule` should not be applied to the same BeamIOSourceRel more than once. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339640) Time Spent: 3h 50m (was: 3h 40m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863], > support for predicate push-down is added, but only for IOs that support > project push-down. > To accomplish that, some checks need to be added so that certain Calc and IO > manipulations are not performed when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?focusedWorklogId=339639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339639 ] ASF GitHub Bot logged work on BEAM-8574: Author: ASF GitHub Bot Created on: 06/Nov/19 23:33 Start Date: 06/Nov/19 23:33 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #10018: [BEAM-8574] Revert "[BEAM-8427] Create a table and a table provider for MongoDB" URL: https://github.com/apache/beam/pull/10018 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339639) Time Spent: 50m (was: 40m) > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?focusedWorklogId=339636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339636 ] ASF GitHub Bot logged work on BEAM-8570: Author: ASF GitHub Bot Created on: 06/Nov/19 23:22 Start Date: 06/Nov/19 23:22 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10017: [BEAM-8570] Use SDK version in default Java container tag URL: https://github.com/apache/beam/pull/10017#issuecomment-550546532 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339636) Time Spent: 0.5h (was: 20m) > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). > [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
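A sketch of the idea behind this fix: derive the default container tag from the SDK version rather than "latest", so locally built images are not shadowed by automatic pulls of the release image. ReleaseInfo is assumed to be the Beam utility exposing the version string; the image naming below is illustrative, and the real wiring lives in Environments.java.

{code:java}
import org.apache.beam.sdk.util.ReleaseInfo;

public class DefaultContainerTagSketch {
  // Illustrative only; constant names in Environments.java may differ.
  public static String defaultJavaSdkImage() {
    String version = ReleaseInfo.getReleaseInfo().getVersion(); // e.g. "2.17.0"
    return "apachebeam/java_sdk:" + version;
  }
}
{code}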
[jira] [Work logged] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?focusedWorklogId=339634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339634 ] ASF GitHub Bot logged work on BEAM-8574: Author: ASF GitHub Bot Created on: 06/Nov/19 23:14 Start Date: 06/Nov/19 23:14 Worklog Time Spent: 10m Work Description: apilloud commented on issue #10018: [BEAM-8574] Revert "[BEAM-8427] Create a table and a table provider for MongoDB" URL: https://github.com/apache/beam/pull/10018#issuecomment-550544180 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339634) Time Spent: 40m (was: 0.5h) > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968788#comment-16968788 ] Amit Kumar commented on BEAM-1438: -- I have also recently seen failures with withNumShards(0) for an unbounded source. > The default behavior for the Write transform doesn't work well with the > Dataflow streaming runner > - > > Key: BEAM-1438 > URL: https://issues.apache.org/jira/browse/BEAM-1438 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Fix For: 2.5.0 > > > If a Write specifies 0 output shards, that implies the runner should pick an > appropriate sharding. The default behavior is to write one shard per input > bundle. This works well with the Dataflow batch runner, but not with the > streaming runner, which produces large numbers of small bundles. -- This message was sent by Atlassian Jira (v8.3.4#803005)
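A common mitigation for the behavior described in this issue is to pin an explicit shard count instead of asking the runner to choose one. A minimal sketch using TextIO follows; the output prefix and shard count are placeholders, not recommendations.

{code:java}
import org.apache.beam.sdk.io.TextIO;

public class ExplicitShardingSketch {
  // With an unbounded input, use windowed writes and an explicit shard count
  // rather than withNumShards(0), which leaves sharding to the runner and
  // triggers the one-shard-per-bundle behavior described above.
  static TextIO.Write write() {
    return TextIO.write()
        .to("gs://bucket/output/part") // placeholder output prefix
        .withWindowedWrites()
        .withNumShards(10); // 10 is illustrative
  }
}
{code}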
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=339633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339633 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 06/Nov/19 23:12 Start Date: 06/Nov/19 23:12 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-550543641 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339633) Time Spent: 50m (was: 40m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=339632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339632 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 06/Nov/19 23:11 Start Date: 06/Nov/19 23:11 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-550543546 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339632) Time Spent: 40m (was: 0.5h) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8570: -- Status: Open (was: Triage Needed) > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). > [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8571) Use SDK version in default Go container tag
[ https://issues.apache.org/jira/browse/BEAM-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8571: -- Status: Open (was: Triage Needed) > Use SDK version in default Go container tag > --- > > Key: BEAM-8571 > URL: https://issues.apache.org/jira/browse/BEAM-8571 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > Currently, the Go SDK uses container `apachebeam/go_sdk:latest` by default > [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). > [1] > [https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/sdks/go/pkg/beam/options/jobopts/options.go#L111] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?focusedWorklogId=339630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339630 ] ASF GitHub Bot logged work on BEAM-8574: Author: ASF GitHub Bot Created on: 06/Nov/19 23:07 Start Date: 06/Nov/19 23:07 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10018: [BEAM-8574] Revert "[BEAM-8427] Create a table and a table provider for MongoDB" URL: https://github.com/apache/beam/pull/10018#issuecomment-550542272 R: @apilloud This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339630) Time Spent: 0.5h (was: 20m) > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574: Description: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] Cause: [https://github.com/apache/beam/pull/9806] Revert PR: [https://github.com/apache/beam/pull/10018] was: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] Cause: [https://github.com/apache/beam/pull/9806] > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] > Revert PR: [https://github.com/apache/beam/pull/10018] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574: Description: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] Cause: [https://github.com/apache/beam/pull/9806] was: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] Cause: [https://github.com/apache/beam/pull/9806] Revert PR: [https://github.com/apache/beam/pull/10018] > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?focusedWorklogId=339627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339627 ] ASF GitHub Bot logged work on BEAM-8574: Author: ASF GitHub Bot Created on: 06/Nov/19 22:52 Start Date: 06/Nov/19 22:52 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10018: [BEAM-8574] Revert "[BEAM-8427] Create a table and a table provider for MongoDB" URL: https://github.com/apache/beam/pull/10018#issuecomment-550537682 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339627) Time Spent: 20m (was: 10m) > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?focusedWorklogId=339626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339626 ] ASF GitHub Bot logged work on BEAM-8574: Author: ASF GitHub Bot Created on: 06/Nov/19 22:52 Start Date: 06/Nov/19 22:52 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10018: [BEAM-8574] Revert "[BEAM-8427] Create a table and a table provider for MongoDB" URL: https://github.com/apache/beam/pull/10018#issuecomment-550537682 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339626) Remaining Estimate: 0h Time Spent: 10m > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=339624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339624 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 06/Nov/19 22:50 Start Date: 06/Nov/19 22:50 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-550536929 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339624) Time Spent: 0.5h (was: 20m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}}, which currently uses the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574: Description: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] Cause: [https://github.com/apache/beam/pull/9806] was: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574: Description: Integration test for Sql MongoDb table read and write fails. Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] was: Integration test for Sql MongoDb table read and write fails. [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574: Description: Integration test for Sql MongoDb table read and write fails. [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] was:Integration test for Sql MongoDb table read and write fails. > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > > Integration test for Sql MongoDb table read and write fails. > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=339622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339622 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 06/Nov/19 22:43 Start Date: 06/Nov/19 22:43 Worklog Time Spent: 10m Work Description: violalyu commented on issue #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#issuecomment-550534787 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339622) Time Spent: 2h 50m (was: 2h 40m) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exist solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
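To make the proposed hierarchy concrete, here is a hedged sketch. The actual work targets the Python SDK, so this Java rendering is purely illustrative; the class shapes are assumptions, while the URN strings are Beam's standard portable environment URNs.

{code:java}
// Illustrative sketch of "native" classes mirroring the portable
// environment protos (docker, process, external).
abstract class Environment {
  abstract String urn();
}

final class DockerEnvironment extends Environment {
  final String containerImage;
  DockerEnvironment(String containerImage) { this.containerImage = containerImage; }
  @Override String urn() { return "beam:env:docker:v1"; }
}

final class ProcessEnvironment extends Environment {
  final String command;
  ProcessEnvironment(String command) { this.command = command; }
  @Override String urn() { return "beam:env:process:v1"; }
}

final class ExternalEnvironment extends Environment {
  final String serviceAddress;
  ExternalEnvironment(String serviceAddress) { this.serviceAddress = serviceAddress; }
  @Override String urn() { return "beam:env:external:v1"; }
}
{code}

Users could then instantiate, say, {{new DockerEnvironment("my-registry/my-sdk-image:tag")}} (a hypothetical image name) and attach it to part of a pipeline once the assignment mechanism lands.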
[jira] [Work logged] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?focusedWorklogId=339623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339623 ] ASF GitHub Bot logged work on BEAM-8570: Author: ASF GitHub Bot Created on: 06/Nov/19 22:43 Start Date: 06/Nov/19 22:43 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10017: [BEAM-8570] Use SDK version in default Java container tag URL: https://github.com/apache/beam/pull/10017#issuecomment-550534969 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339623) Time Spent: 20m (was: 10m) > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). > [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
Kirill Kozlov created BEAM-8574: --- Summary: [SQL] MongoDb PostCommit_SQL fails Key: BEAM-8574 URL: https://issues.apache.org/jira/browse/BEAM-8574 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov Integration test for Sql MongoDb table read and write fails. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=339621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339621 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 06/Nov/19 22:37 Start Date: 06/Nov/19 22:37 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10018: Revert "[BEAM-8427] Create a table and a table provider for MongoDB" URL: https://github.com/apache/beam/pull/10018 Reverts apache/beam#9806 Post Commit tests break. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339621) Time Spent: 4.5h (was: 4h 20m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?focusedWorklogId=339620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339620 ] ASF GitHub Bot logged work on BEAM-8570: Author: ASF GitHub Bot Created on: 06/Nov/19 22:36 Start Date: 06/Nov/19 22:36 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10017: [BEAM-8570] Use SDK version in default Java container tag URL: https://github.com/apache/beam/pull/10017 One small concern I have is the provenance of the release version. It ultimately comes from `sdk.properties`, which is generated by [`./gradlew :sdks:java:core:processResources`](https://github.com/ibzib/beam/blob/java-container-version/sdks/java/core/build.gradle#L46-L52), which appears to do nothing unless it's a clean build. [Post-Commit Tests Status badge table omitted: empty build-badge links to builds.apache.org Jenkins jobs for the Go, Java, and Python post-commits; the message is truncated in the archive.]
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339618 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 22:32 Start Date: 06/Nov/19 22:32 Worklog Time Spent: 10m Work Description: stevekoonce commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343363529 ## File path: runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java ## @@ -949,63 +949,14 @@ public void testKeyCommitTooLargeException() throws Exception { assertEquals(2, result.size()); assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L)); assertTrue(result.containsKey(1L)); -assertEquals("large_key", result.get(1L).getKey().toStringUtf8()); -assertTrue(result.get(1L).getSerializedSize() > 1000); -// Spam worker updates a few times. -int maxTries = 10; -while (--maxTries > 0) { - worker.reportPeriodicWorkerUpdates(); - Uninterruptibles.sleepUninterruptibly(1000, TimeUnit.MILLISECONDS); -} +WorkItemCommitRequest largeCommit = result.get(1L); +assertEquals("large_key", largeCommit.getKey().toStringUtf8()); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339618) Time Spent: 1h 40m (was: 1.5h) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
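As a hedged sketch of the proposal (an assumed helper, not the actual worker code), the worker would swap an oversized commit for a small stub that keeps only the routing fields and sets the overflow flag. The field and method names follow the {{WorkItemCommitRequest}} proto and test diff quoted in the review comments below; treat the helper itself as an assumption.

{code:java}
// Sketch: instead of sending (or dropping) a doomed oversized commit,
// send a stub flagged so the service breaks the work item up.
static Windmill.WorkItemCommitRequest maybeFlagForBreakup(
    Windmill.WorkItemCommitRequest commit, long maxCommitBytes) {
  if (commit.getSerializedSize() <= maxCommitBytes) {
    return commit;
  }
  // Keep only what the service needs to identify the work item; all payload
  // (output messages, timers, state updates) is intentionally dropped.
  return Windmill.WorkItemCommitRequest.newBuilder()
      .setKey(commit.getKey())
      .setShardingKey(commit.getShardingKey())
      .setWorkToken(commit.getWorkToken())
      .setExceedsMaxWorkItemCommitBytes(true)
      .build();
}
{code}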
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339617 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 22:32 Start Date: 06/Nov/19 22:32 Worklog Time Spent: 10m Work Description: stevekoonce commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343363496 ## File path: runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java ## @@ -949,63 +949,14 @@ public void testKeyCommitTooLargeException() throws Exception { assertEquals(2, result.size()); assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L)); assertTrue(result.containsKey(1L)); -assertEquals("large_key", result.get(1L).getKey().toStringUtf8()); -assertTrue(result.get(1L).getSerializedSize() > 1000); -// Spam worker updates a few times. -int maxTries = 10; -while (--maxTries > 0) { - worker.reportPeriodicWorkerUpdates(); - Uninterruptibles.sleepUninterruptibly(1000, TimeUnit.MILLISECONDS); -} +WorkItemCommitRequest largeCommit = result.get(1L); +assertEquals("large_key", largeCommit.getKey().toStringUtf8()); -// We should see an exception reported for the large commit but not the small one. -ArgumentCaptor workItemStatusCaptor = -ArgumentCaptor.forClass(WorkItemStatus.class); -verify(mockWorkUnitClient, atLeast(2)).reportWorkItemStatus(workItemStatusCaptor.capture()); -List capturedStatuses = workItemStatusCaptor.getAllValues(); -boolean foundErrors = false; -for (WorkItemStatus status : capturedStatuses) { - if (!status.getErrors().isEmpty()) { -assertFalse(foundErrors); -foundErrors = true; -String errorMessage = status.getErrors().get(0).getMessage(); -assertThat(errorMessage, Matchers.containsString("KeyCommitTooLargeException")); - } -} -assertTrue(foundErrors); - } - - @Test - public void testKeyCommitTooLargeException_StreamingEngine() throws Exception { -KvCoder kvCoder = KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()); - -List instructions = -Arrays.asList( -makeSourceInstruction(kvCoder), -makeDoFnInstruction(new LargeCommitFn(), 0, kvCoder), -makeSinkInstruction(kvCoder, 1)); - -FakeWindmillServer server = new FakeWindmillServer(errorCollector); -server.setExpectedExceptionCount(1); - -StreamingDataflowWorkerOptions options = -createTestingPipelineOptions(server, "--experiments=enable_streaming_engine"); -StreamingDataflowWorker worker = makeWorker(instructions, options, true /* publishCounters */); -worker.setMaxWorkItemCommitBytes(1000); -worker.start(); - -server.addWorkToOffer(makeInput(1, 0, "large_key")); -server.addWorkToOffer(makeInput(2, 0, "key")); -server.waitForEmptyWorkQueue(); - -Map result = server.waitForAndGetCommits(1); - -assertEquals(2, result.size()); -assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L)); -assertTrue(result.containsKey(1L)); -assertEquals("large_key", result.get(1L).getKey().toStringUtf8()); -assertTrue(result.get(1L).getSerializedSize() > 1000); +// The large commit should have its flags set marking it for truncation +assertTrue(largeCommit.getExceedsMaxWorkItemCommitBytes()); +assertTrue(largeCommit.getSerializedSize() < 100); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339617) Time Spent: 1.5h (was: 1h 20m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339614 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 22:23 Start Date: 06/Nov/19 22:23 Worklog Time Spent: 10m Work Description: stevekoonce commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343360175 ## File path: runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto ## @@ -290,12 +293,19 @@ message WorkItemCommitRequest { optional SourceState source_state_updates = 12; optional int64 source_watermark = 13 [default=-0x8000]; optional int64 source_backlog_bytes = 17 [default=-1]; + optional int64 source_bytes_processed = 22 [default = 0]; + repeated WatermarkHold watermark_holds = 14; + repeated int64 finalize_ids = 19 [packed = true]; + + optional int64 testonly_fake_clock_time_usec = 23; + // DEPRECATED repeated GlobalDataId global_data_id_requests = 9; reserved 6; + Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339614) Time Spent: 1h 20m (was: 1h 10m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339612 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 22:23 Start Date: 06/Nov/19 22:23 Worklog Time Spent: 10m Work Description: stevekoonce commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343360096 ## File path: runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto ## @@ -290,12 +293,19 @@ message WorkItemCommitRequest { optional SourceState source_state_updates = 12; optional int64 source_watermark = 13 [default=-0x8000]; optional int64 source_backlog_bytes = 17 [default=-1]; + optional int64 source_bytes_processed = 22 [default = 0]; Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339612) Time Spent: 1h (was: 50m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339613 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 22:23 Start Date: 06/Nov/19 22:23 Worklog Time Spent: 10m Work Description: stevekoonce commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343360142 ## File path: runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto ## @@ -290,12 +293,19 @@ message WorkItemCommitRequest { optional SourceState source_state_updates = 12; optional int64 source_watermark = 13 [default=-0x8000]; optional int64 source_backlog_bytes = 17 [default=-1]; + optional int64 source_bytes_processed = 22 [default = 0]; + repeated WatermarkHold watermark_holds = 14; + repeated int64 finalize_ids = 19 [packed = true]; + + optional int64 testonly_fake_clock_time_usec = 23; Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339613) Time Spent: 1h 10m (was: 1h) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8573) @SplitRestriction's documented signature is incorrect
[ https://issues.apache.org/jira/browse/BEAM-8573?focusedWorklogId=339603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339603 ] ASF GitHub Bot logged work on BEAM-8573: Author: ASF GitHub Bot Created on: 06/Nov/19 21:55 Start Date: 06/Nov/19 21:55 Worklog Time Spent: 10m Work Description: jagthebeetle commented on pull request #10016: [BEAM-8573] Updates documented signature for @SplitRestriction URL: https://github.com/apache/beam/pull/10016 Updating the documentation for DoFn.SplitRestriction. The signature does not match the compiler-enforced signature actually required when writing a Splittable DoFn. R: @kennknowles Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [X] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [X] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). [Post-Commit Tests Status badge table omitted: empty build-badge links to builds.apache.org Jenkins jobs for the Go, Java, and Python post-commits; the message is truncated in the archive.]
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339601 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 06/Nov/19 21:44 Start Date: 06/Nov/19 21:44 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-550514005 > @cmachgodaddy > > 1. Good point, but we use `AmazonKinesis` as a client for Kinesis. Can we leverage `RetryPolicy` in this case? > 2. I believe the last one should win but it would make sense to add more checks to avoid an ambiguity. --> #1 Yes, every AWS client (Dynamo, Sns, Sqs, Kinesis, ...) can take in ClientConfiguration as an argument, and we can set any configuration with this object. --> #2 Of course, we can add a ton of checkers, but users will be confused about which one of the withRateLimitXXXs to use. I would rather have just one withRateLimitXXX and let the user pass in an Enum. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339601) Time Spent: 8h (was: 7h 50m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=339602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339602 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 06/Nov/19 21:44 Start Date: 06/Nov/19 21:44 Worklog Time Spent: 10m Work Description: cmachgodaddy commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-550514005 > 1. Good point, but we use `AmazonKinesis` as a client for Kinesis. Can we leverage `RetryPolicy` in this case? > 2. I believe the last one should win but it would make sense to add more checks to avoid an ambiguity. --> #1 Yes, every AWS client (Dynamo, Sns, Sqs, Kinesis, ...) can take in ClientConfiguration as an argument, and we can set any configuration with this object. --> #2 Of course, we can add a ton of checkers, but users will be confused about which one of the withRateLimitXXXs to use. I would rather have just one withRateLimitXXX and let the user pass in an Enum. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339602) Time Spent: 8h 10m (was: 8h) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
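As a concrete reading of the KDS guidance quoted in the issue, a fixed-delay policy for the shard read loop might look like the sketch below. The class and hook names are hypothetical; the exact API shape (the withRateLimitXXX methods) is precisely what the PR discussion above is still settling.

{code:java}
import java.util.concurrent.TimeUnit;

// Hedged sketch of a fixed-delay rate limit policy for ShardReadersPool's
// read loop; the names here are assumptions, not KinesisIO's actual API.
class FixedDelayRateLimitPolicy {
  private final long delayMillis;

  FixedDelayRateLimitPolicy(long delayMillis) {
    this.delayMillis = delayMillis;
  }

  // Called by the read loop after each getRecords() call, so each shard is
  // polled at most once per interval (>= 1,000 ms per the AWS guidance).
  void onBatchFetched() throws InterruptedException {
    TimeUnit.MILLISECONDS.sleep(delayMillis);
  }
}
{code}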
[jira] [Assigned] (BEAM-8338) Support ES 7.x for ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8338: --- Assignee: Jing Chen > Support ES 7.x for ElasticsearchIO > -- > > Key: BEAM-8338 > URL: https://issues.apache.org/jira/browse/BEAM-8338 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Michal Brunát >Assignee: Jing Chen >Priority: Major > > Elasticsearch has released 7.4 but ElasticsearchIO only supports 2.x, 5.x, and 6.x. > We should support ES 7.x for ElasticsearchIO. > [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html] > > [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str
[ https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=339594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339594 ] ASF GitHub Bot logged work on BEAM-8452: Author: ASF GitHub Bot Created on: 06/Nov/19 21:15 Start Date: 06/Nov/19 21:15 Worklog Time Spent: 10m Work Description: noah-goodrich commented on pull request #1: BEAM-8452 - TriggerLoadJobs.process in bigquery_file_loads schema is type str URL: https://github.com/apache/beam/pull/1#discussion_r343330994 ## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ## @@ -387,6 +387,14 @@ def process(self, element, load_job_name_prefix, *schema_side_inputs): else: schema = self.schema +import unicode +import json + +if isinstance(schema, (str, unicode)): + schema = bigquery_tools.parse_table_schema_from_json(schema) +elif isinstance(schema, dict): Review comment: Apologies if the issue as described was not clear. Basically - I was trying to create a dataflow template, which requires using a ValueProvider argument. I do not believe it is possible to pass a serialized object as the argument in this case. I'm actually borrowing the logic from here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L757 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339594) Time Spent: 1h 20m (was: 1h 10m) > TriggerLoadJobs.process in bigquery_file_loads schema is type str > - > > Key: BEAM-8452 > URL: https://issues.apache.org/jira/browse/BEAM-8452 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.16.0 >Reporter: Noah Goodrich >Assignee: Noah Goodrich >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > I've found a first issue with the BigQueryFileLoads Transform and the type > of the schema parameter. > {code:java} > Triggering job > beam_load_2019_10_11_140829_19_157670e4d458f0ff578fbe971a91b30a_1570802915 to > load data to BigQuery table <TableReference datasetId: 'pyr_monat_dev' > projectId: 'icentris-ml-dev' > tableId: 'tree_user_types'>. Schema: {"fields": [{"name": "id", "type": > "INTEGER", "mode": "required"}, {"name": "description", "type": "STRING", > "mode": "nullable"}]}. 
Additional parameters: {} > Retry with exponential backoff: waiting for 4.875033410381894 seconds before > retrying _insert_load_job because we caught exception: > apitools.base.protorpclite.messages.ValidationError: Expected type <class > 'apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema'> > for field schema, found {"fields": [{"name": "id", "type": "INTEGER", > "mode": "required"}, {"name": "description", "type": "STRING", > "mode": "nullable"}]} (type <class 'str'>) > Traceback for above exception (most recent call last): > File "/opt/conda/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 206, in wrapper > return fun(*args, **kwargs) > File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", > line 344, in _insert_load_job > **additional_load_parameters > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 791, in __init__ > setattr(self, name, value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 973, in __setattr__ > object.__setattr__(self, name, value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1652, in __set__ > super(MessageField, self).__set__(message_instance, value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1293, in __set__ > value = self.validate(value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1400, in validate > return self.__validate(value, self.validate_element) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1358, in __validate > return validate_element(value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1340, in validate_element > (self.type, name, value, type(value))) > > {code} > > The triggering code looks like this: > > options.view_as(D
[jira] [Assigned] (BEAM-8573) @SplitRestriction's documented signature is incorrect
[ https://issues.apache.org/jira/browse/BEAM-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-8573: -- Assignee: Jonathan Alvarez-Gutierrez > @SplitRestriction's documented signature is incorrect > - > > Key: BEAM-8573 > URL: https://issues.apache.org/jira/browse/BEAM-8573 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Jonathan Alvarez-Gutierrez >Assignee: Jonathan Alvarez-Gutierrez >Priority: Minor > Original Estimate: 48h > Remaining Estimate: 48h > > The documented signature for DoFn.SplitRestriction is > {code:java} > List<RestrictionT> splitRestriction(InputT element, RestrictionT > restriction);{code} > This fails to compile with, for example: > {{@SplitRestriction split(long, OffsetRange): Must return void}} > It looks like the correct signature is: > {code:java} > void splitRestriction(InputT element, RestrictionT restriction, > OutputReceiver<RestrictionT> restriction);{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8573) @SplitRestriction's documented signature is incorrect
[ https://issues.apache.org/jira/browse/BEAM-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-8573: --- Status: Open (was: Triage Needed) > @SplitRestriction's documented signature is incorrect > - > > Key: BEAM-8573 > URL: https://issues.apache.org/jira/browse/BEAM-8573 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Jonathan Alvarez-Gutierrez >Priority: Minor > Original Estimate: 48h > Remaining Estimate: 48h > > The documented signature for DoFn.SplitRestriction is > {code:java} > List<RestrictionT> splitRestriction(InputT element, RestrictionT > restriction);{code} > This fails to compile with, for example: > {{@SplitRestriction split(long, OffsetRange): Must return void}} > It looks like the correct signature is: > {code:java} > void splitRestriction(InputT element, RestrictionT restriction, > OutputReceiver<RestrictionT> restriction);{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
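For reference, a minimal example of the compiler-enforced shape described in the issue. Using {{OffsetRange}} as the restriction type is an assumption for illustration, and the split strategy is deliberately naive.

{code:java}
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;

class SplitExample extends DoFn<Long, Long> {
  // Returns void and emits sub-restrictions through the OutputReceiver,
  // matching the signature the annotation checker actually enforces.
  @SplitRestriction
  public void splitRestriction(
      Long element, OffsetRange restriction, OutputReceiver<OffsetRange> receiver) {
    long size = restriction.getTo() - restriction.getFrom();
    if (size < 2) {
      receiver.output(restriction); // too small to split further
      return;
    }
    long mid = restriction.getFrom() + size / 2;
    receiver.output(new OffsetRange(restriction.getFrom(), mid));
    receiver.output(new OffsetRange(mid, restriction.getTo()));
  }
}
{code}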
[jira] [Commented] (BEAM-8572) tox environment: assert on Cython source file presence
[ https://issues.apache.org/jira/browse/BEAM-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968716#comment-16968716 ] Udi Meiri commented on BEAM-8572: - Ideas: 1a. Migrate precommit unit testing to pytest, mark Cython-requiring tests as such (pytest.mark.cython), and only run them when 1b. Same migration, but add a flag to conftest.py that makes it verify that cythonize has been run. (utils.check_compiled('apache_beam.coders')?) > tox environment: assert on Cython source file presence > -- > > Key: BEAM-8572 > URL: https://issues.apache.org/jira/browse/BEAM-8572 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > Add an assertion somewhere that checks if Cythonized files are present in the > sdist tarball in use. That is, for "tox -e py27" assert that these files are > not present, and for "tox -e py27-cython" assert that they are present. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339593 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 21:12 Start Date: 06/Nov/19 21:12 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343328612 ## File path: runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java ## @@ -949,63 +949,14 @@ public void testKeyCommitTooLargeException() throws Exception { assertEquals(2, result.size()); assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L)); assertTrue(result.containsKey(1L)); -assertEquals("large_key", result.get(1L).getKey().toStringUtf8()); -assertTrue(result.get(1L).getSerializedSize() > 1000); -// Spam worker updates a few times. -int maxTries = 10; -while (--maxTries > 0) { - worker.reportPeriodicWorkerUpdates(); - Uninterruptibles.sleepUninterruptibly(1000, TimeUnit.MILLISECONDS); -} +WorkItemCommitRequest largeCommit = result.get(1L); +assertEquals("large_key", largeCommit.getKey().toStringUtf8()); -// We should see an exception reported for the large commit but not the small one. -ArgumentCaptor workItemStatusCaptor = -ArgumentCaptor.forClass(WorkItemStatus.class); -verify(mockWorkUnitClient, atLeast(2)).reportWorkItemStatus(workItemStatusCaptor.capture()); -List capturedStatuses = workItemStatusCaptor.getAllValues(); -boolean foundErrors = false; -for (WorkItemStatus status : capturedStatuses) { - if (!status.getErrors().isEmpty()) { -assertFalse(foundErrors); -foundErrors = true; -String errorMessage = status.getErrors().get(0).getMessage(); -assertThat(errorMessage, Matchers.containsString("KeyCommitTooLargeException")); - } -} -assertTrue(foundErrors); - } - - @Test - public void testKeyCommitTooLargeException_StreamingEngine() throws Exception { -KvCoder kvCoder = KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()); - -List instructions = -Arrays.asList( -makeSourceInstruction(kvCoder), -makeDoFnInstruction(new LargeCommitFn(), 0, kvCoder), -makeSinkInstruction(kvCoder, 1)); - -FakeWindmillServer server = new FakeWindmillServer(errorCollector); -server.setExpectedExceptionCount(1); - -StreamingDataflowWorkerOptions options = -createTestingPipelineOptions(server, "--experiments=enable_streaming_engine"); -StreamingDataflowWorker worker = makeWorker(instructions, options, true /* publishCounters */); -worker.setMaxWorkItemCommitBytes(1000); -worker.start(); - -server.addWorkToOffer(makeInput(1, 0, "large_key")); -server.addWorkToOffer(makeInput(2, 0, "key")); -server.waitForEmptyWorkQueue(); - -Map result = server.waitForAndGetCommits(1); - -assertEquals(2, result.size()); -assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L)); -assertTrue(result.containsKey(1L)); -assertEquals("large_key", result.get(1L).getKey().toStringUtf8()); -assertTrue(result.get(1L).getSerializedSize() > 1000); +// The large commit should have its flags set marking it for truncation +assertTrue(largeCommit.getExceedsMaxWorkItemCommitBytes()); +assertTrue(largeCommit.getSerializedSize() < 100); Review comment: verify the timers and output messages repeated fields are empty This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339593) Time Spent: 50m (was: 40m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request
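The additional assertions the reviewer asks for could look like the sketch below; the repeated-field accessor names (getOutputMessagesCount, getOutputTimersCount) are assumptions about protobuf's generated API for the Windmill fields, not code from the PR.
{code:java}
// Hypothetical follow-up assertions for the truncated commit: the
// replacement request should carry no output messages and no timers.
assertEquals(0, largeCommit.getOutputMessagesCount());
assertEquals(0, largeCommit.getOutputTimersCount());
{code}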
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339590 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 21:12 Start Date: 06/Nov/19 21:12 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343329744 ## File path: runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java ## @@ -949,63 +949,14 @@ public void testKeyCommitTooLargeException() throws Exception { assertEquals(2, result.size()); assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L)); assertTrue(result.containsKey(1L)); -assertEquals("large_key", result.get(1L).getKey().toStringUtf8()); -assertTrue(result.get(1L).getSerializedSize() > 1000); -// Spam worker updates a few times. -int maxTries = 10; -while (--maxTries > 0) { - worker.reportPeriodicWorkerUpdates(); - Uninterruptibles.sleepUninterruptibly(1000, TimeUnit.MILLISECONDS); -} +WorkItemCommitRequest largeCommit = result.get(1L); +assertEquals("large_key", largeCommit.getKey().toStringUtf8()); Review comment: verify sharding_key, work_token, cache_token also (verified in other cases with makeExpectedOutput) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339590) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
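A sketch of the identity checks requested above; in the real test the expected values would come from the makeExpectedOutput(...) helpers, so the EXPECTED_* constants here are purely illustrative.
{code:java}
// Illustrative only: the truncated commit must still identify the work item
// it replaces. EXPECTED_* constants are hypothetical stand-ins.
assertEquals(EXPECTED_SHARDING_KEY, largeCommit.getShardingKey());
assertEquals(EXPECTED_WORK_TOKEN, largeCommit.getWorkToken());
assertEquals(EXPECTED_CACHE_TOKEN, largeCommit.getCacheToken());
{code}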
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339592 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 21:12 Start Date: 06/Nov/19 21:12 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343327998 ## File path: runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto ## @@ -290,12 +293,19 @@ message WorkItemCommitRequest { optional SourceState source_state_updates = 12; optional int64 source_watermark = 13 [default=-0x8000]; optional int64 source_backlog_bytes = 17 [default=-1]; + optional int64 source_bytes_processed = 22 [default = 0]; + repeated WatermarkHold watermark_holds = 14; + repeated int64 finalize_ids = 19 [packed = true]; + + optional int64 testonly_fake_clock_time_usec = 23; Review comment: rm and instead have in reserved field list below This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339592) Time Spent: 50m (was: 40m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
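For reference, the review suggestion above is the standard proto2 idiom for retiring a field: drop the declaration and reserve its tag number so it can never be reused with a different type. A minimal sketch (tag numbers taken from the diff, everything else illustrative):
{code}
message WorkItemCommitRequest {
  optional int64 source_bytes_processed = 22;
  repeated int64 finalize_ids = 19 [packed = true];

  // Tag 23 (testonly_fake_clock_time_usec) joins the already-reserved tag 6,
  // so neither number can be reassigned with a different type later.
  reserved 6, 23;
}
{code}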
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339591 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 21:12 Start Date: 06/Nov/19 21:12 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343328116 ## File path: runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto ## @@ -290,12 +293,19 @@ message WorkItemCommitRequest { optional SourceState source_state_updates = 12; optional int64 source_watermark = 13 [default=-0x8000]; optional int64 source_backlog_bytes = 17 [default=-1]; + optional int64 source_bytes_processed = 22 [default = 0]; + repeated WatermarkHold watermark_holds = 14; + repeated int64 finalize_ids = 19 [packed = true]; + + optional int64 testonly_fake_clock_time_usec = 23; + // DEPRECATED repeated GlobalDataId global_data_id_requests = 9; reserved 6; + Review comment: rm blank line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339591) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339589 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 21:12 Start Date: 06/Nov/19 21:12 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#discussion_r343327746 ## File path: runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto ## @@ -290,12 +293,19 @@ message WorkItemCommitRequest { optional SourceState source_state_updates = 12; optional int64 source_watermark = 13 [default=-0x8000]; optional int64 source_backlog_bytes = 17 [default=-1]; + optional int64 source_bytes_processed = 22 [default = 0]; Review comment: rm default=0, it's the default :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339589) Time Spent: 40m (was: 0.5h) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
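The point of the comment above: proto2 numeric fields already default to 0, so the explicit clause is redundant.
{code}
message Example {
  // Behaves exactly like
  //   optional int64 source_bytes_processed = 22 [default = 0];
  // because 0 is the implicit default for proto2 int64 fields.
  optional int64 source_bytes_processed = 22;
}
{code}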
[jira] [Assigned] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-8561: -- Assignee: Chris Larsen > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > > Similar to AvroIO, it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
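A sketch of what the requested connector's surface could look like, modeled on AvroIO; ThriftIO and every name below are hypothetical, since the connector does not exist yet.
{code:java}
// Hypothetical read/write usage, mirroring AvroIO's shape.
PCollection<MyThriftRecord> records =
    pipeline.apply(ThriftIO.read(MyThriftRecord.class).from("input/*.thrift"));
records.apply(ThriftIO.write(MyThriftRecord.class).to("output/part"));
{code}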
[jira] [Updated] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-8561: --- Status: Open (was: Triage Needed) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Priority: Minor > > Similar to AvroIO, it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8572) tox environment: assert on Cython source file presence
[ https://issues.apache.org/jira/browse/BEAM-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968713#comment-16968713 ] Udi Meiri commented on BEAM-8572: - Looks like some Cython-requiring tests are always skipped. {code} $ curl -s https://builds.apache.org/job/beam_PreCommit_Python_Phrase/900/consoleText | grep fast_coders_test.FastCoders consoleText test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed test_using_fast_impl (apache_beam.coders.fast_coders_test.FastCoders) ... SKIP: Cython is not installed {code} > tox environment: assert on Cython source file presence > -- > > Key: BEAM-8572 > URL: https://issues.apache.org/jira/browse/BEAM-8572 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > Add an assertion somewhere that checks if Cythonized files are present in the > sdist tarball in use. That is for "tox -e py27" assert that these files are > not present, for "tox -e py27-cython" assert that they are present. -- This message was sent by Atlassian Jira (v8.3.4#803005)
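One way to implement the requested assertion, sketched as a unit test; the TOX_ENV_NAME check and the choice of apache_beam.coders.stream as the compiled-module probe are assumptions, not Beam's actual fix.
{code}
# Sketch: fail when the Cython state of the installed SDK does not match
# what the tox environment expects. TOX_ENV_NAME is set by tox itself.
import os
import unittest


class CythonPresenceTest(unittest.TestCase):

  def test_cython_matches_tox_env(self):
    expect_cython = 'cython' in os.environ.get('TOX_ENV_NAME', '')
    try:
      # stream.pyx is only importable from a Cythonized build or sdist.
      from apache_beam.coders import stream  # pylint: disable=unused-import
      has_cython = True
    except ImportError:
      has_cython = False
    self.assertEqual(expect_cython, has_cython)
{code}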
[jira] [Updated] (BEAM-8572) tox environment: assert on Cython source file presence
[ https://issues.apache.org/jira/browse/BEAM-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-8572: Status: Open (was: Triage Needed) > tox environment: assert on Cython source file presence > -- > > Key: BEAM-8572 > URL: https://issues.apache.org/jira/browse/BEAM-8572 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > Add an assertion somewhere that checks if Cythonized files are present in the > sdist tarball in use. That is for "tox -e py27" assert that these files are > not present, for "tox -e py27-cython" assert that they are present. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str
[ https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=339588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339588 ] ASF GitHub Bot logged work on BEAM-8452: Author: ASF GitHub Bot Created on: 06/Nov/19 21:06 Start Date: 06/Nov/19 21:06 Worklog Time Spent: 10m Work Description: noah-goodrich commented on pull request #1: BEAM-8452 - TriggerLoadJobs.process in bigquery_file_loads schema is type str URL: https://github.com/apache/beam/pull/1#discussion_r343327032 ## File path: sdks/python/tox.ini ## @@ -20,6 +20,9 @@ envlist = py27,py35,py36,py37,py27-{gcp,cython,lint,lint3},py35-{gcp,cython},py36-{gcp,cython},py37-{gcp,cython,lint},docs toxworkdir = {toxinidir}/target/{env:ENV_NAME:.tox} +[flake8] +ignore = E111,E114,E121,E125,E127,E129,E226,E302,E41,E502,W503,W504 Review comment: Locally flake8 identified all of these as errors in the file I am trying to edit. Rather than refactor the file to match all of these rules, I thought it was simpler for myself and anyone else coming behind if flake8 knows that those rules aren't enforced. For example, a lot of the code was indented by multiples of 2 instead of 4, etc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339588) Time Spent: 1h 10m (was: 1h) > TriggerLoadJobs.process in bigquery_file_loads schema is type str > - > > Key: BEAM-8452 > URL: https://issues.apache.org/jira/browse/BEAM-8452 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.16.0 >Reporter: Noah Goodrich >Assignee: Noah Goodrich >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > I've found a first issue with the BigQueryFileLoads Transform and the type > of the schema parameter. > {code:java} > Triggering job > beam_load_2019_10_11_140829_19_157670e4d458f0ff578fbe971a91b30a_1570802915 to > load data to BigQuery table datasetId: 'pyr_monat_dev' > projectId: 'icentris-ml-dev' > tableId: 'tree_user_types'>.Schema: {"fields": [{"name": "id", "type": > "INTEGER", "mode": "required"}, {"name": "description", "type": "STRING", > "mode": "nullable"}]}. 
Additional parameters: {} > Retry with exponential backoff: waiting for 4.875033410381894 seconds before > retrying _insert_load_job because we caught exception: > apitools.base.protorpclite.messages.ValidationError: Expected type <class > 'apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema'> > for field schema, found {"fields": [{"name": "id", "type": "INTEGER", > "mode": "required"}, {"name": "description", "type" > : "STRING", "mode": "nullable"}]} (type <class 'str'>) > Traceback for above exception (most recent call last): > File "/opt/conda/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 206, in wrapper > return fun(*args, **kwargs) > File > "/opt/conda/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", > line 344, in _insert_load_job > **additional_load_parameters > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 791, in __init__ > setattr(self, name, value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 973, in __setattr__ > object.__setattr__(self, name, value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1652, in __set__ > super(MessageField, self).__set__(message_instance, value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1293, in __set__ > value = self.validate(value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1400, in validate > return self.__validate(value, self.validate_element) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1358, in __validate > return validate_element(value) > File > "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py", > line 1340, in validate_element > (self.type, name, value, type(value))) > > {code} > > The triggering code looks like this: > > options.view_as(DebugOptions).experiments = ['use_beam_bq_sink'] > # Save main session state so pickled functions and classes > # defined
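The ValidationError above happens because a plain JSON string reaches a field that requires the generated TableSchema message. A hedged sketch of the conversion the fix has to perform somewhere before the load job is triggered (whether the PR uses exactly this helper is not shown in this thread):
{code}
# Convert the JSON schema string into the bigquery_v2_messages.TableSchema
# that the apitools client validates against, instead of passing a raw str.
from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json

schema_str = ('{"fields": [{"name": "id", "type": "INTEGER", "mode": "required"}, '
              '{"name": "description", "type": "STRING", "mode": "nullable"}]}')
table_schema = parse_table_schema_from_json(schema_str)
{code}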
[jira] [Updated] (BEAM-8573) @SplitRestriction's documented signature is incorrect
[ https://issues.apache.org/jira/browse/BEAM-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Alvarez-Gutierrez updated BEAM-8573: - Description: The documented signature for DoFn.SplitRestriction is {code:java} List<RestrictionT> splitRestriction( InputT element, RestrictionT restriction);{code} This fails to compile with, for example: {{@SplitRestriction split(long, OffsetRange): Must return void}} It looks like the correct signature is: {code:java} void splitRestriction(InputT element, RestrictionT restriction, OutputReceiver<RestrictionT> receiver);{code} was: The documented signature for DoFn.SplitRestriction is {code:java} List<RestrictionT> splitRestriction( InputT element, RestrictionT restriction);{code} This fails to compile with: {{@SplitRestriction split(long, OffsetRange): Must return void}} It looks like the correct signature is: {code:java} void splitRestriction(InputT element, RestrictionT restriction, OutputReceiver<RestrictionT> receiver);{code} > @SplitRestriction's documented signature is incorrect > - > > Key: BEAM-8573 > URL: https://issues.apache.org/jira/browse/BEAM-8573 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Jonathan Alvarez-Gutierrez >Priority: Minor > Original Estimate: 48h > Remaining Estimate: 48h > > The documented signature for DoFn.SplitRestriction is > {code:java} > List<RestrictionT> splitRestriction( InputT element, RestrictionT > restriction);{code} > This fails to compile with, for example: > {{@SplitRestriction split(long, OffsetRange): Must return void}} > It looks like the correct signature is: > {code:java} > void splitRestriction(InputT element, RestrictionT restriction, > OutputReceiver<RestrictionT> receiver);{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
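For concreteness, a minimal splittable DoFn using the corrected signature; this is a sketch against the 2.15-era SDF API, and the split sizes passed to OffsetRange.split are arbitrary.
{code:java}
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

// Emits 0..element-1, splitting its restriction into ~1000-offset chunks.
class EmitOffsetsFn extends DoFn<Long, Long> {

  @ProcessElement
  public void process(ProcessContext c, RestrictionTracker<OffsetRange, Long> tracker) {
    for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); ++i) {
      c.output(i);
    }
  }

  @GetInitialRestriction
  public OffsetRange getInitialRestriction(Long element) {
    return new OffsetRange(0, element);
  }

  // The corrected shape: returns void and emits sub-restrictions through the
  // OutputReceiver instead of returning a List.
  @SplitRestriction
  public void splitRestriction(
      Long element, OffsetRange restriction, OutputReceiver<OffsetRange> receiver) {
    for (OffsetRange part : restriction.split(1000, 100)) {
      receiver.output(part);
    }
  }
}
{code}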
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339584 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 20:58 Start Date: 06/Nov/19 20:58 Worklog Time Spent: 10m Work Description: stevekoonce commented on issue #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#issuecomment-550497575 R: @scwhittle This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339584) Time Spent: 0.5h (was: 20m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339583 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 20:58 Start Date: 06/Nov/19 20:58 Worklog Time Spent: 10m Work Description: stevekoonce commented on issue #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013#issuecomment-550497470 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339583) Time Spent: 20m (was: 10m) > Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to > be broken up > - > > Key: BEAM-8554 > URL: https://issues.apache.org/jira/browse/BEAM-8554 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Steve Koonce >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > +Background:+ > When a WorkItemCommitRequest is generated that's bigger than the permitted > size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and > the request is still sent to the service. The service rejects the commit, > but breaks up input messages that were bundled together and adds them to new, > smaller work items that will later be pulled and re-tried - likely without > generating another commit that is too large. > When a WorkItemCommitRequest is generated that's too large to be sent back to > the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is > sent back to the service. > > +Proposed Improvement+ > In both cases, prevent the doomed, large commit item from being sent back to > the service. Instead send flags in the commit request signaling that the > current work item led to a commit that is too large and the work item should > be broken up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8573) @SplitRestriction's documented signature is incorrect
Jonathan Alvarez-Gutierrez created BEAM-8573: Summary: @SplitRestriction's documented signature is incorrect Key: BEAM-8573 URL: https://issues.apache.org/jira/browse/BEAM-8573 Project: Beam Issue Type: Improvement Components: sdk-java-core Affects Versions: 2.15.0 Reporter: Jonathan Alvarez-Gutierrez The documented signature for DoFn.SplitRestriction is {code:java} List<RestrictionT> splitRestriction( InputT element, RestrictionT restriction);{code} This fails to compile with: {{@SplitRestriction split(long, OffsetRange): Must return void}} It looks like the correct signature is: {code:java} void splitRestriction(InputT element, RestrictionT restriction, OutputReceiver<RestrictionT> receiver);{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968671#comment-16968671 ] Brian Hulette commented on BEAM-8368: - Awesome, thank you! > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. > 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up
[ https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339570 ] ASF GitHub Bot logged work on BEAM-8554: Author: ASF GitHub Bot Created on: 06/Nov/19 20:07 Start Date: 06/Nov/19 20:07 Worklog Time Spent: 10m Work Description: stevekoonce commented on pull request #10013: [BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that … URL: https://github.com/apache/beam/pull/10013 …a WorkItem needs to be broken up This implements the improvement described in [BEAM-8554](https://issues.apache.org/jira/browse/BEAM-8554): when the serialized size of a WorkItemCommitRequest proto is larger than the maximum size, the commit request will be replaced by a request for a server-side 'truncation' which will cause the WorkItem itself to be broken up and, after reprocessing, result in multiple WorkItemCommitRequests that are each smaller and can be successfully submitted. I updated an existing unit test and removed a redundant one - the StreamingDataflowWorkerTest is already configured to run all tests with and without StreamingEngine and Windmill, so separate, otherwise-identical tests are not necessary.
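A sketch of the commit-path change the PR describes; the builder and flag names are inferred from the test diff earlier in this thread (getExceedsMaxWorkItemCommitBytes and the key/sharding/work-token fields), not copied from the PR itself.
{code:java}
// Illustrative only: replace an oversized commit with a minimal request that
// keeps the identifying fields and asks the service to break the item up.
WorkItemCommitRequest buildCommitRequest(WorkItemCommitRequest commit, long maxCommitBytes) {
  if (commit.getSerializedSize() <= maxCommitBytes) {
    return commit;
  }
  return WorkItemCommitRequest.newBuilder()
      .setKey(commit.getKey())
      .setShardingKey(commit.getShardingKey())
      .setWorkToken(commit.getWorkToken())
      .setCacheToken(commit.getCacheToken())
      .setExceedsMaxWorkItemCommitBytes(true)
      .build();
}
{code}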
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=339569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339569 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 06/Nov/19 20:06 Start Date: 06/Nov/19 20:06 Worklog Time Spent: 10m Work Description: violalyu commented on pull request #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#discussion_r343302106 ## File path: sdks/python/apache_beam/transforms/environments.py ## @@ -0,0 +1,394 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Environments concepts. + +For internal use only. No backwards compatibility guarantees.""" + +from __future__ import absolute_import + +import json + +from google.protobuf import message + +from apache_beam.portability import common_urns +from apache_beam.portability import python_urns +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.portability.api import endpoints_pb2 +from apache_beam.utils import proto_utils + +__all__ = ['Environment', + 'DockerEnvironment', 'ProcessEnvironment', 'ExternalEnvironment', + 'EmbeddedPythonEnvironment', 'EmbeddedPythonGrpcEnvironment', + 'SubprocessSDKEnvironment', 'RunnerAPIEnvironmentHolder'] + + +class Environment(object): + """Abstract base class for environments. + + Represents a type and configuration of environment. + Each type of Environment should have a unique urn. + """ Review comment: Updated! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339569) Time Spent: 2h 40m (was: 2.5h) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
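Based on the __all__ list in the diff above, usage of the new hierarchy would presumably look like the sketch below; the constructor keyword and the to_runner_api serialization hook are assumptions about the PR's final shape.
{code}
# Hypothetical usage of the new environment classes; parameter names are
# assumed, since the PR was still under review at this point.
from apache_beam.transforms import environments

docker_env = environments.DockerEnvironment(
    container_image='apache/beam_python3.7_sdk')

# Each subclass carries a unique urn and serializes itself into the
# portability proto (beam_runner_api_pb2.Environment).
env_proto = docker_env.to_runner_api(None)
{code}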
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=339568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339568 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 06/Nov/19 19:56 Start Date: 06/Nov/19 19:56 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#discussion_r343297664 ## File path: sdks/python/apache_beam/transforms/environments.py ## @@ -0,0 +1,394 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Environments concepts. + +For internal use only. No backwards compatibility guarantees.""" + +from __future__ import absolute_import + +import json + +from google.protobuf import message + +from apache_beam.portability import common_urns +from apache_beam.portability import python_urns +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.portability.api import endpoints_pb2 +from apache_beam.utils import proto_utils + +__all__ = ['Environment', + 'DockerEnvironment', 'ProcessEnvironment', 'ExternalEnvironment', + 'EmbeddedPythonEnvironment', 'EmbeddedPythonGrpcEnvironment', + 'SubprocessSDKEnvironment', 'RunnerAPIEnvironmentHolder'] + + +class Environment(object): + """Abstract base class for environments. + + Represents a type and configuration of environment. + Each type of Environment should have a unique urn. + """ Review comment: let's add the internal warning here too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339568) Time Spent: 2.5h (was: 2h 20m) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=339566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339566 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 06/Nov/19 19:54 Start Date: 06/Nov/19 19:54 Worklog Time Spent: 10m Work Description: violalyu commented on issue #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#issuecomment-550474432 @mxm updated! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339566) Time Spent: 2h 20m (was: 2h 10m) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas
[ https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=339565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339565 ] ASF GitHub Bot logged work on BEAM-6756: Author: ASF GitHub Bot Created on: 06/Nov/19 19:54 Start Date: 06/Nov/19 19:54 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #10003: [BEAM-6756] Create Iterable type for Schema URL: https://github.com/apache/beam/pull/10003#issuecomment-550474375 run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339565) Time Spent: 1h (was: 50m) > Support lazy iterables in schemas > - > > Key: BEAM-6756 > URL: https://issues.apache.org/jira/browse/BEAM-6756 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a > runner to page data into memory if the full iterable is too large. We > currently don't support this in Schemas, so the Schema Group and CoGroup > transforms materialize all data into memory. We should add support for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
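The laziness at issue: the Iterable a runner hands to a GroupByKey consumer may page values in on demand, so schema transforms should not force it into memory. An illustrative consumer (input is assumed to be a PCollection<KV<String, Long>>):
{code:java}
PCollection<KV<String, Iterable<Long>>> grouped = input.apply(GroupByKey.create());
grouped.apply(ParDo.of(new DoFn<KV<String, Iterable<Long>>, Long>() {
  @ProcessElement
  public void process(ProcessContext c) {
    long sum = 0;
    // Iterating streams values in; copying to e.g. an ArrayList here would
    // materialize the whole (possibly huge) iterable, which is what the
    // schema Group/CoGroup transforms currently do.
    for (long v : c.element().getValue()) {
      sum += v;
    }
    c.output(sum);
  }
}));
{code}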
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=339559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339559 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 06/Nov/19 19:25 Start Date: 06/Nov/19 19:25 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] buildIOWrite from MongoDb Table URL: https://github.com/apache/beam/pull/9892#issuecomment-550462776 R: @TheNeuralBit cc: @apilloud cc: @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339559) Time Spent: 4h 20m (was: 4h 10m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8376) Add FirestoreIO connector to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968627#comment-16968627 ] Jing Chen commented on BEAM-8376: - Chatted with [~chamikara] offline; there are ongoing efforts on this issue, so I am unassigning myself. > Add FirestoreIO connector to Java SDK > - > > Key: BEAM-8376 > URL: https://issues.apache.org/jira/browse/BEAM-8376 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Stefan Djelekar >Priority: Major > > Motivation: > There is no Firestore connector for Java SDK at the moment. > Having it will enhance the integrations with database options on the Google > Cloud Platform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8343: Description: The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down. A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to IO layer. Also, adding following methods to a BeamSqlTable interface to pass necessary parameters to IO APIs: - BeamSqlTableFilter constructFilter(List<RexNode> filter) - ProjectSupport supportsProjects() - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) ProjectSupport is an enum with the following options: * NONE * WITHOUT_FIELD_REORDERING * WITH_FIELD_REORDERING Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. was: The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down. A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to IO layer. Also, adding following methods to a BeamSqlTable interface to pass necessary parameters to IO APIs: - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) - ProjectSupport supportsProjects() - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) * ProjectSupport is an enum with the following options: * NONE * WITHOUT_FIELD_REORDERING * WITH_FIELD_REORDERING Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter constructFilter(List<RexNode> filter) > - ProjectSupport supportsProjects() > - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List<String> fieldNames) > > ProjectSupport is an enum with the following options: > * NONE > * WITHOUT_FIELD_REORDERING > * WITH_FIELD_REORDERING > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
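Rendered as code, the proposed additions look roughly like this, with the generic parameters (stripped by Jira's formatting) restored; a sketch of the proposal, not the merged interface.
{code:java}
public interface BeamSqlTable {
  // Which parts of a Calc's filter this IO can absorb.
  BeamSqlTableFilter constructFilter(List<RexNode> filter);

  // Whether, and how, this IO can handle field projection.
  ProjectSupport supportsProjects();

  // Read with the pushed-down filters and projected field names applied.
  PCollection<Row> buildIOReader(
      PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);
}

enum ProjectSupport { NONE, WITHOUT_FIELD_REORDERING, WITH_FIELD_REORDERING }
{code}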
[jira] [Assigned] (BEAM-8376) Add FirestoreIO connector to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8376: --- Assignee: (was: Jing Chen) > Add FirestoreIO connector to Java SDK > - > > Key: BEAM-8376 > URL: https://issues.apache.org/jira/browse/BEAM-8376 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Stefan Djelekar >Priority: Major > > Motivation: > There is no Firestore connector for Java SDK at the moment. > Having it will enhance the integrations with database options on the Google > Cloud Platform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8343: Description: The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down. A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to IO layer. Also, adding following methods to a BeamSqlTable interface to pass necessary parameters to IO APIs: - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) - ProjectSupport supportsProjects() - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, List fieldNames) * ProjectSupport is an enum with the following options: * NONE * WITHOUT_FIELD_REORDERING * WITH_FIELD_REORDERING Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. was: The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down. A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to IO layer. Also, adding following methods to a BeamSqlTable interface to pass necessary parameters to IO APIs: - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) - Boolean supportsProjects() - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, List fieldNames) Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter) > - ProjectSupport supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > * ProjectSupport is an enum with the following options: > * NONE > * WITHOUT_FIELD_REORDERING > * WITH_FIELD_REORDERING > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6756) Support lazy iterables in schemas
[ https://issues.apache.org/jira/browse/BEAM-6756?focusedWorklogId=339556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339556 ] ASF GitHub Bot logged work on BEAM-6756: Author: ASF GitHub Bot Created on: 06/Nov/19 19:17 Start Date: 06/Nov/19 19:17 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #10003: [BEAM-6756] Create Iterable type for Schema URL: https://github.com/apache/beam/pull/10003#discussion_r343279150 ## File path: learning/katas/java/.idea/study_project.xml ## @@ -0,0 +1,3151 @@ + Review comment: No. Ignore that file - I will remove it from the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339556) Time Spent: 50m (was: 40m) > Support lazy iterables in schemas > - > > Key: BEAM-6756 > URL: https://issues.apache.org/jira/browse/BEAM-6756 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a > runner to page data into memory if the full iterable is too large. We > currently don't support this in Schemas, so the Schema Group and CoGroup > transforms materialize all data into memory. We should add support for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)