[jira] [Created] (BEAM-7666) Pipe memory thrashing signal to Dataflow
Dustin Rhodes created BEAM-7666:

Summary: Pipe memory thrashing signal to Dataflow
Key: BEAM-7666
URL: https://issues.apache.org/jira/browse/BEAM-7666
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Reporter: Dustin Rhodes

For autoscaling we would like to know if the user worker is spending too much time garbage collecting. Pipe this signal through counters to DF.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
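One way a worker could derive the signal described in BEAM-7666 is sketched below in Java, using the standard `java.lang.management` API. This is only an illustration under stated assumptions: the class name `GcThrashingSketch`, the 0.5 threshold, and the sampling interval are hypothetical, and the issue does not say how the Dataflow worker measures GC time or how the counter is reported.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: estimates the fraction of wall-clock time spent in GC. */
class GcThrashingSketch {
  // Assumed threshold; the issue does not specify one.
  static final double THRASHING_THRESHOLD = 0.5;

  /** Sums the cumulative collection time (ms) across all collectors in this JVM. */
  static long totalGcMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = gc.getCollectionTime(); // -1 if undefined for this collector
      if (t > 0) {
        total += t;
      }
    }
    return total;
  }

  /** Fraction of an interval spent in GC, clamped to [0, 1]. */
  static double gcFraction(long gcMillisDelta, long wallMillisDelta) {
    if (wallMillisDelta <= 0) {
      return 0.0;
    }
    double fraction = (double) gcMillisDelta / wallMillisDelta;
    return Math.max(0.0, Math.min(1.0, fraction));
  }

  /** True when the GC fraction for the interval reaches the assumed threshold. */
  static boolean isThrashing(long gcMillisDelta, long wallMillisDelta) {
    return gcFraction(gcMillisDelta, wallMillisDelta) >= THRASHING_THRESHOLD;
  }

  public static void main(String[] args) throws InterruptedException {
    long gcStart = totalGcMillis();
    long wallStart = System.nanoTime();
    Thread.sleep(100); // stand-in for one sampling interval of real work
    long gcDelta = totalGcMillis() - gcStart;
    long wallDelta = (System.nanoTime() - wallStart) / 1_000_000;
    System.out.println("gcFraction=" + gcFraction(gcDelta, wallDelta)
        + " thrashing=" + isThrashing(gcDelta, wallDelta));
  }
}
```

A boolean or fractional value like this could then be exported through whatever counter mechanism the worker already uses; that wiring is outside the scope of this sketch.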
[jira] [Updated] (BEAM-7666) Pipe memory thrashing signal to Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes updated BEAM-7666:
Priority: Critical (was: Major)

> Pipe memory thrashing signal to Dataflow
>
> Key: BEAM-7666
> URL: https://issues.apache.org/jira/browse/BEAM-7666
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Dustin Rhodes
> Priority: Critical
>
> For autoscaling we would like to know if the user worker is spending too much time garbage collecting. Pipe this signal through counters to DF.
[jira] [Assigned] (BEAM-7666) Pipe memory thrashing signal to Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes reassigned BEAM-7666:
Assignee: Dustin Rhodes

> Pipe memory thrashing signal to Dataflow
>
> Key: BEAM-7666
> URL: https://issues.apache.org/jira/browse/BEAM-7666
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Critical
>
> For autoscaling we would like to know if the user worker is spending too much time garbage collecting. Pipe this signal through counters to DF.
[jira] [Closed] (BEAM-6540) Autoscaling should be aware of Streaming RPC Quota
[ https://issues.apache.org/jira/browse/BEAM-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes closed BEAM-6540.
Resolution: Fixed

> Autoscaling should be aware of Streaming RPC Quota
>
> Key: BEAM-6540
> URL: https://issues.apache.org/jira/browse/BEAM-6540
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.11.0
> Reporter: Dustin Rhodes
> Assignee: Tyler Akidau
> Priority: Major
> Labels: triaged
> Fix For: 2.11.0
>
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> Streaming Windmill Service introduces quota for the shared windmill workers. Autoscaling needs to be aware of throttling due to this quota in order to not upscale. This PR adds in that reporting.
>
> It also introduces the flag --EnableStreamingEngine.
[jira] [Closed] (BEAM-6571) Flag for streaming engine
[ https://issues.apache.org/jira/browse/BEAM-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes closed BEAM-6571.
Resolution: Fixed
Fix Version/s: 2.11.0

> Flag for streaming engine
>
> Key: BEAM-6571
> URL: https://issues.apache.org/jira/browse/BEAM-6571
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Major
> Labels: triaged
> Fix For: 2.11.0
>
> Adds the --enableStreamingEngine for Java and Python
[jira] [Comment Edited] (BEAM-6561) WorkProgressUpdater synchronizes on java.util.concurrent classes
[ https://issues.apache.org/jira/browse/BEAM-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758558#comment-16758558 ]

Dustin Rhodes edited comment on BEAM-6561 at 2/1/19 6:51 PM:

Whoops, meant to put this on BEAM-6562.

was (Author: dustin12):
I believe this is because there is synchronization on batches, a ConcurrentLinkedDeque, to enforce the atomicity of a sequence of method calls on it. As far as I'm aware, this pattern is error-prone (or at least confusing), as ConcurrentLinkedDeque makes no guarantee that it uses itself as the lock for its internal operations, so there is no guarantee that the synchronized blocks run atomically.

Is the best practice here to switch to a standard, non-concurrent deque and do all the synchronization on it manually, so that we are sure the synchronized blocks are atomic?

Is there an easy way (a Gradle command) to run Findbugs to confirm that this is the bug it's finding (although I'm pretty sure)?

> WorkProgressUpdater synchronizes on java.util.concurrent classes
>
> Key: BEAM-6561
> URL: https://issues.apache.org/jira/browse/BEAM-6561
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Priority: Major
>
> Findbugs caught this. There seems to be synchronization on variables where the classes themselves have their own mechanisms. If intended, it should be made more clear by simple mutexes.
[jira] [Commented] (BEAM-6562) GrpcWindmillServer synchronizes on java.util.concurrent classes
[ https://issues.apache.org/jira/browse/BEAM-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758573#comment-16758573 ]

Dustin Rhodes commented on BEAM-6562:

I believe this is because there is synchronization on batches, a ConcurrentLinkedDeque, to enforce the atomicity of a sequence of method calls on it. As far as I'm aware, this pattern is error-prone (or at least confusing), as ConcurrentLinkedDeque makes no guarantee that it uses itself as the lock for its internal operations, so there is no guarantee that the synchronized blocks run atomically.

Is the best practice here to switch to a standard, non-concurrent deque and do all the synchronization on it manually, so that we are sure the synchronized blocks are atomic?

Is there an easy way (a Gradle command) to run Findbugs to confirm that this is the bug it's finding (although I'm pretty sure)?

> GrpcWindmillServer synchronizes on java.util.concurrent classes
>
> Key: BEAM-6562
> URL: https://issues.apache.org/jira/browse/BEAM-6562
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Priority: Major
>
> Findbugs caught this. There seems to be synchronization on variables where the classes themselves have their own mechanisms. If intended, it should be made more clear by simple mutexes.
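The alternative raised in the comment above (a plain deque guarded by one explicit mutex) might look roughly like the following Java sketch. It is not the GrpcWindmillServer code: the class `BatchQueueSketch`, its methods, and the use of `ArrayDeque` are all hypothetical choices made for illustration. The point is that every access goes through the same private lock, so a multi-step sequence such as "copy everything, then clear" cannot interleave with a concurrent add.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical batch queue; names are illustrative, not taken from Beam. */
class BatchQueueSketch {
  private final Object lock = new Object();                 // explicit mutex
  private final Deque<String> batches = new ArrayDeque<>(); // guarded by lock

  void add(String batch) {
    synchronized (lock) {
      batches.addLast(batch);
    }
  }

  /** Atomically removes and returns everything currently queued. */
  List<String> drainAll() {
    synchronized (lock) {
      List<String> out = new ArrayList<>(batches);
      batches.clear();
      return out;
    }
  }

  /** Runs 4 producers against a concurrent drainer; returns the total drained. */
  static int runDemo() {
    BatchQueueSketch queue = new BatchQueueSketch();
    List<Thread> producers = new ArrayList<>();
    for (int p = 0; p < 4; p++) {
      final int id = p;
      Thread t = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          queue.add("p" + id + "-" + i);
        }
      });
      producers.add(t);
      t.start();
    }
    int drained = 0;
    try {
      for (Thread t : producers) {
        drained += queue.drainAll().size(); // may overlap with running producers
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    drained += queue.drainAll().size(); // pick up anything left after all joins
    return drained;
  }

  public static void main(String[] args) {
    System.out.println("drained=" + runDemo());
  }
}
```

Because no element can be added between the copy and the clear inside `drainAll`, the total drained should always equal the number of elements added. Synchronizing on a `ConcurrentLinkedDeque` instead would only give the same property if every caller also took that monitor, which is easy to get wrong since the class's own methods do not require it.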
[jira] [Commented] (BEAM-6561) WorkProgressUpdater synchronizes on java.util.concurrent classes
[ https://issues.apache.org/jira/browse/BEAM-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758558#comment-16758558 ]

Dustin Rhodes commented on BEAM-6561:

I believe this is because there is synchronization on batches, a ConcurrentLinkedDeque, to enforce the atomicity of a sequence of method calls on it. As far as I'm aware, this pattern is error-prone (or at least confusing), as ConcurrentLinkedDeque makes no guarantee that it uses itself as the lock for its internal operations, so there is no guarantee that the synchronized blocks run atomically.

Is the best practice here to switch to a standard, non-concurrent deque and do all the synchronization on it manually, so that we are sure the synchronized blocks are atomic?

Is there an easy way (a Gradle command) to run Findbugs to confirm that this is the bug it's finding (although I'm pretty sure)?

> WorkProgressUpdater synchronizes on java.util.concurrent classes
>
> Key: BEAM-6561
> URL: https://issues.apache.org/jira/browse/BEAM-6561
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Priority: Major
>
> Findbugs caught this. There seems to be synchronization on variables where the classes themselves have their own mechanisms. If intended, it should be made more clear by simple mutexes.
[jira] [Assigned] (BEAM-6571) Flag for streaming engine
[ https://issues.apache.org/jira/browse/BEAM-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes reassigned BEAM-6571:
Assignee: Dustin Rhodes (was: Tyler Akidau)

> Flag for streaming engine
>
> Key: BEAM-6571
> URL: https://issues.apache.org/jira/browse/BEAM-6571
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Major
>
> Adds the --enableStreamingEngine for Java and Python
[jira] [Created] (BEAM-6571) Flag for streaming engine
Dustin Rhodes created BEAM-6571:

Summary: Flag for streaming engine
Key: BEAM-6571
URL: https://issues.apache.org/jira/browse/BEAM-6571
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Reporter: Dustin Rhodes
Assignee: Tyler Akidau

Adds the --enableStreamingEngine for Java and Python
[jira] [Closed] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
[ https://issues.apache.org/jira/browse/BEAM-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes closed BEAM-6190.

Verified it is now reported in prod.

> "Processing stuck" messages should be visible on Pantheon
>
> Key: BEAM-6190
> URL: https://issues.apache.org/jira/browse/BEAM-6190
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.8.0
> Environment: Running on Google Cloud Dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Minor
> Fix For: Not applicable
>
> Original Estimate: 24h
> Time Spent: 1h 40m
> Remaining Estimate: 22h 20m
>
> When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication, even though the worker logs it. Most users don't check worker logs, and doing so is not convenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page.
[jira] [Resolved] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
[ https://issues.apache.org/jira/browse/BEAM-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes resolved BEAM-6190.
Resolution: Fixed

> "Processing stuck" messages should be visible on Pantheon
>
> Key: BEAM-6190
> URL: https://issues.apache.org/jira/browse/BEAM-6190
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.8.0
> Environment: Running on Google Cloud Dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Minor
> Fix For: Not applicable
>
> Original Estimate: 24h
> Time Spent: 1h 40m
> Remaining Estimate: 22h 20m
>
> When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication, even though the worker logs it. Most users don't check worker logs, and doing so is not convenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page.
[jira] [Created] (BEAM-6540) Autoscaling should be aware of Streaming RPC Quota
Dustin Rhodes created BEAM-6540:

Summary: Autoscaling should be aware of Streaming RPC Quota
Key: BEAM-6540
URL: https://issues.apache.org/jira/browse/BEAM-6540
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Affects Versions: 2.11.0
Reporter: Dustin Rhodes
Assignee: Tyler Akidau
Fix For: 2.11.0

Streaming Windmill Service introduces quota for the shared windmill workers. Autoscaling needs to be aware of throttling due to this quota in order to not upscale. This PR adds in that reporting.

It also introduces the flag --EnableStreamingEngine.
[jira] [Assigned] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
[ https://issues.apache.org/jira/browse/BEAM-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dustin Rhodes reassigned BEAM-6190:
Assignee: Dustin Rhodes (was: Tyler Akidau)

> "Processing stuck" messages should be visible on Pantheon
>
> Key: BEAM-6190
> URL: https://issues.apache.org/jira/browse/BEAM-6190
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: 2.8.0
> Environment: Running on Google Cloud Dataflow
> Reporter: Dustin Rhodes
> Assignee: Dustin Rhodes
> Priority: Minor
> Fix For: Not applicable
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication, even though the worker logs it. Most users don't check worker logs, and doing so is not convenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page.
[jira] [Created] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
Dustin Rhodes created BEAM-6190:

Summary: "Processing stuck" messages should be visible on Pantheon
Key: BEAM-6190
URL: https://issues.apache.org/jira/browse/BEAM-6190
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Affects Versions: 2.8.0
Environment: Running on Google Cloud Dataflow
Reporter: Dustin Rhodes
Assignee: Tyler Akidau
Fix For: Not applicable

When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication, even though the worker logs it. Most users don't check worker logs, and doing so is not convenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page.