[jira] [Commented] (FLINK-2097) Add support for JobSessions
[ https://issues.apache.org/jira/browse/FLINK-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637045#comment-14637045 ]

ASF GitHub Bot commented on FLINK-2097:
---

Github user mxm commented on the pull request: https://github.com/apache/flink/pull/858#issuecomment-123755792

It should just add more nodes to the ExecutionGraph. Existing ones should not be modified. For batch, I think the assumption is that it needs to be finished. For streaming, I could also picture attaching nodes at runtime, but this has to be carefully implemented.

Add support for JobSessions
---
Key: FLINK-2097
URL: https://issues.apache.org/jira/browse/FLINK-2097
Project: Flink
Issue Type: Sub-task
Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Maximilian Michels
Fix For: 0.9

Sessions make sure that the JobManager does not immediately discard a JobGraph after execution, but keeps it around for further operations to be attached to the graph. By keeping the JobGraph around, the cached streams (intermediate data) are also kept. That is the way of realizing interactive sessions on top of a streaming dataflow abstraction.

ExecutionGraphs should be kept as long as
- no timeout occurred or
- the session has not been explicitly ended

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
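The retention rule in the issue description (keep the graph while the session has neither timed out nor been explicitly ended) can be sketched roughly as follows. This is only an illustration: `JobSessionSketch` and all of its methods are hypothetical names, not Flink classes, and time is passed in explicitly to keep the example deterministic.

```java
/**
 * Hypothetical sketch (not Flink code): a session's ExecutionGraph is kept
 * while the session has neither timed out nor been explicitly ended.
 */
final class JobSessionSketch {
    private final long timeoutMillis;
    private long lastAccessMillis;
    private boolean ended;

    JobSessionSketch(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastAccessMillis = nowMillis;
    }

    /** Called whenever the client attaches further operations to the session. */
    void touch(long nowMillis) {
        lastAccessMillis = nowMillis;
    }

    /** Called when the client ends the session explicitly. */
    void end() {
        ended = true;
    }

    /** True while the graph (and its cached intermediate data) should be retained. */
    boolean shouldKeepGraph(long nowMillis) {
        return !ended && nowMillis - lastAccessMillis < timeoutMillis;
    }

    public static void main(String[] args) {
        JobSessionSketch session = new JobSessionSketch(1_000L, 0L);
        System.out.println(session.shouldKeepGraph(500L));   // still inside the timeout window
        session.touch(900L);                                 // new operations restart the window
        System.out.println(session.shouldKeepGraph(1_500L));
        session.end();                                       // explicit end overrides the timeout
        System.out.println(session.shouldKeepGraph(1_500L));
    }
}
```

Whether attaching operations should restart the timeout is an assumption of this sketch; the issue text does not say.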
[GitHub] flink pull request: [FLINK-2097] Implement job session management
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/858#issuecomment-123755792 It should just add more nodes to the ExecutionGraph. Existing ones should not be modified. For batch, I think the assumption is that it needs to be finished. For streaming, I could also picture attaching nodes at runtime, but this has to be carefully implemented. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2393) Add a stateless at-least-once mode for streaming
Stephan Ewen created FLINK-2393:
---

Summary: Add a stateless at-least-once mode for streaming
Key: FLINK-2393
URL: https://issues.apache.org/jira/browse/FLINK-2393
Project: Flink
Issue Type: New Feature
Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen

Currently, the checkpointing mechanism provides exactly-once guarantees. Part of that is the step that temporarily aligns the data streams. This step increases the tuple latency temporarily.

By offering a version that does not provide exactly-once, but only at-least-once, we can avoid the latency increase. For super-low-latency applications that tolerate duplicates, this may be an interesting option.

To realize that, we would use a slightly modified version of the checkpointing algorithm. Effectively, the streams would not be aligned, but tasks would only count the received barriers and emit their own barrier as soon as they have seen a barrier from all inputs.

My feeling is that it makes no sense to implement state backups when being concerned with this super-low latency. The mode would hence be a purely stateless at-least-once mode.
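The barrier-counting idea described above might look roughly like this. It is a sketch under stated assumptions, not the Flink implementation: `BarrierCounterSketch` and `onBarrier` are hypothetical names, and input channels are assumed to be numbered from 0.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch (not Flink code): counts checkpoint barriers per input
 * channel without ever blocking a channel. Records that arrive behind an early
 * barrier are processed immediately, which is why the mode is only at-least-once.
 */
final class BarrierCounterSketch {
    private final int numInputs;
    private final BitSet seen;
    private long pendingCheckpointId = -1L;
    private long lastCompletedCheckpointId = -1L;

    BarrierCounterSketch(int numInputs) {
        this.numInputs = numInputs;
        this.seen = new BitSet(numInputs);
    }

    /**
     * Records a barrier for the given checkpoint on the given input channel.
     * Returns true exactly when barriers from all inputs have been seen,
     * i.e. when the task should emit its own barrier downstream.
     */
    boolean onBarrier(int channel, long checkpointId) {
        if (checkpointId <= lastCompletedCheckpointId || checkpointId < pendingCheckpointId) {
            return false; // late or duplicate barrier, nothing to do
        }
        if (checkpointId > pendingCheckpointId) {
            // A newer checkpoint supersedes the one that was still pending.
            pendingCheckpointId = checkpointId;
            seen.clear();
        }
        seen.set(channel);
        if (seen.cardinality() == numInputs) {
            lastCompletedCheckpointId = pendingCheckpointId;
            seen.clear();
            return true;
        }
        return false;
    }
}
```

A task with two inputs would call `onBarrier(0, id)` and `onBarrier(1, id)` as barriers arrive, forwarding its own barrier on the call that returns true. Letting a newer checkpoint supersede a pending one is a design choice of this sketch.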
[jira] [Commented] (FLINK-2097) Add support for JobSessions
[ https://issues.apache.org/jira/browse/FLINK-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637038#comment-14637038 ] ASF GitHub Bot commented on FLINK-2097: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/858#issuecomment-123754538 I think right now, it pretty much behaves as if someone started a new job, with the grown execution graph.
[GitHub] flink pull request: [FLINK-2097] Implement job session management
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/858#issuecomment-123754538 I think right now, it pretty much behaves as if someone started a new job, with the grown execution graph.
[jira] [Created] (FLINK-2394) HadoopOutputFormat OutputCommitter is default to FileOutputCommitter
Stefano Bortoli created FLINK-2394:
---

Summary: HadoopOutputFormat OutputCommitter is default to FileOutputCommitter
Key: FLINK-2394
URL: https://issues.apache.org/jira/browse/FLINK-2394
Project: Flink
Issue Type: Bug
Components: Hadoop Compatibility
Affects Versions: 0.9.0
Reporter: Stefano Bortoli

MongoOutputFormat does not write back to the collection because the HadoopOutputFormat wrapper does not allow setting the MongoOutputCommitter; the committer defaults to FileOutputCommitter. Therefore, when close() and finalizeGlobal() execute, the commit does not happen and the Mongo collection stays untouched.

A simple solution would be to:
1 - create a constructor of HadoopOutputFormatBase and HadoopOutputFormat that takes the OutputCommitter as a parameter
2 - change the outputCommitter field of HadoopOutputFormatBase to be a generic OutputCommitter
3 - remove the default assignment of FileOutputCommitter() to the outputCommitter in open() and finalizeGlobal(), or keep it as a default in case of no specific assignment.
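The three proposed steps could be sketched as below. To stay self-contained, the example uses hypothetical stand-ins (`CommitterSketch`, `FileCommitterSketch`, `MongoCommitterSketch`, `OutputFormatWrapperSketch`) rather than the real Hadoop and Flink classes, so it shows only the shape of the change: an injectable committer with the file-based one kept as fallback.

```java
/** Hypothetical stand-in for Hadoop's OutputCommitter; only what the sketch needs. */
interface CommitterSketch {
    String commit();
}

/** Stand-in for the file-based default committer. */
final class FileCommitterSketch implements CommitterSketch {
    @Override
    public String commit() {
        return "committed to file system";
    }
}

/** Stand-in for a store-specific committer such as the Mongo one. */
final class MongoCommitterSketch implements CommitterSketch {
    @Override
    public String commit() {
        return "committed to collection";
    }
}

/** Stand-in for the output format wrapper described in the issue. */
final class OutputFormatWrapperSketch {
    // Step 2: the field has the generic committer type, not the file-based one.
    private final CommitterSketch outputCommitter;

    /** Existing behaviour: no committer given, so the file-based default is used. */
    OutputFormatWrapperSketch() {
        this(null);
    }

    /** Step 1: a constructor that accepts the committer as a parameter. */
    OutputFormatWrapperSketch(CommitterSketch outputCommitter) {
        // Step 3: the default is only a fallback, never an unconditional assignment.
        this.outputCommitter = outputCommitter != null ? outputCommitter : new FileCommitterSketch();
    }

    /** Mirrors the global finalization step, where the commit has to happen. */
    String finalizeGlobal() {
        return outputCommitter.commit();
    }
}
```

With this shape, `new OutputFormatWrapperSketch(new MongoCommitterSketch()).finalizeGlobal()` runs the store-specific commit, while the no-argument constructor keeps the old default.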
[GitHub] flink pull request: [FLINK-2385] [scala api] Add parenthesis to Sc...
GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/933

    [FLINK-2385] [scala api] Add parenthesis to Scala 'distinct' transformation.

    The operation is not side-effect free. It does not mutate the original DataSet, but defines distributed computation.

    This pull request also corrects some comment markup.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink scala_distinct

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/933.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #933

commit 828761adaeff7a08e0927170e61b1f46c77b55d4
Author: Stephan Ewen se...@apache.org
Date: 2015-07-21T14:55:44Z

    [FLINK-2385] [scala api] Add parenthesis to 'distinct' transformation.
[GitHub] flink pull request: [FLINK-2332] [runtime] Adds leader session IDs...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123755971

Addressed the following comments:
- Corrected order of visibility and abstract modifiers.
- Removed the lazy log field from `FlinkActor`. Now all implementing subclasses have to implement it.
- Made `RequiresLeaderSessionID` a Java interface.

All other comments have been resolved by discussion.
[jira] [Commented] (FLINK-2332) Assign session IDs to JobManager and TaskManager messages
[ https://issues.apache.org/jira/browse/FLINK-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637059#comment-14637059 ]

ASF GitHub Bot commented on FLINK-2332:
---

Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123755971

Addressed the following comments:
- Corrected order of visibility and abstract modifiers.
- Removed the lazy log field from `FlinkActor`. Now all implementing subclasses have to implement it.
- Made `RequiresLeaderSessionID` a Java interface.

All other comments have been resolved by discussion.

Assign session IDs to JobManager and TaskManager messages
---
Key: FLINK-2332
URL: https://issues.apache.org/jira/browse/FLINK-2332
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 0.10

In order to support true high availability, {{TaskManager}} and {{JobManager}} have to be able to distinguish whether a message was sent from the leader or from a former leader. Messages which come from a former leader have to be discarded in order to guarantee a consistent state.

A way to achieve this is to assign a leader session ID to a {{JobManager}} once it is elected as leader. This leader session ID is sent to the {{TaskManager}} upon registration at the {{JobManager}}. All subsequent messages should then be decorated with this leader session ID to mark them as valid. On the {{TaskManager}} side, the leader session ID received in response to the registration message can then be used to validate incoming messages.

The same holds true for registration messages, which should have a registration session ID, too. That way, it is possible to distinguish invalid registration messages from valid ones. The registration session ID can be assigned once the TaskManager is notified about the new leader.
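The validation scheme in the issue description (decorate messages with the leader session ID, discard those from a former leader) could be sketched like this. All names here (`LeaderSessionMessageSketch`, `LeaderSessionFilterSketch`) are hypothetical and stand in for whatever message wrapper and actor logic the pull request actually uses.

```java
import java.util.UUID;

/** Hypothetical message wrapper: a payload decorated with the sender's leader session ID. */
final class LeaderSessionMessageSketch {
    final UUID leaderSessionId;
    final Object payload;

    LeaderSessionMessageSketch(UUID leaderSessionId, Object payload) {
        this.leaderSessionId = leaderSessionId;
        this.payload = payload;
    }
}

/** Hypothetical receiver-side check, as a TaskManager might perform it. */
final class LeaderSessionFilterSketch {
    private UUID currentLeaderSessionId; // null until registration has succeeded

    /** Stores the session ID received in response to the registration message. */
    void onRegistered(UUID leaderSessionId) {
        this.currentLeaderSessionId = leaderSessionId;
    }

    /** Accepts only messages that carry the current leader's session ID. */
    boolean isValid(LeaderSessionMessageSketch message) {
        return currentLeaderSessionId != null
                && currentLeaderSessionId.equals(message.leaderSessionId);
    }
}
```

A message stamped with a former leader's ID fails `isValid` and would simply be dropped; using `UUID` for the session ID is an assumption of this sketch.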
[jira] [Commented] (FLINK-2393) Add a stateless at-least-once mode for streaming
[ https://issues.apache.org/jira/browse/FLINK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637083#comment-14637083 ] Aljoscha Krettek commented on FLINK-2393: - I guess you meant no state backup for operators other than sources. Otherwise you wouldn't need to state barriers at all since they don't do anything.
[GitHub] flink pull request: Collect(): Fixing the akka.framesize size limi...
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/887#issuecomment-123752371 Hi @mxm. Thanks a lot! I don't have your email unfortunately. Could you somehow send it to me?
[GitHub] flink pull request: [FLINK-2332] [runtime] Adds leader session IDs...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123751721 Do you mean `JobManager.getJobManagerGateway`? This is only a temporary solution to obtain an `ActorGateway` for the JobManager for which you have to know the current leader session ID. This will be changed once HA with ZooKeeper is introduced.
[GitHub] flink pull request: Collect(): Fixing the akka.framesize size limi...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/887#issuecomment-123756678 @kl0u Sure, I've sent you an email.
[GitHub] flink pull request: [FLINK-2332] [runtime] Adds leader session IDs...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123757309 Looks good. +1 to merge this...
[jira] [Commented] (FLINK-2332) Assign session IDs to JobManager and TaskManager messages
[ https://issues.apache.org/jira/browse/FLINK-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637064#comment-14637064 ] ASF GitHub Bot commented on FLINK-2332: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123757309 Looks good. +1 to merge this...
[jira] [Commented] (FLINK-2385) Scala DataSet.distinct should have parenthesis
[ https://issues.apache.org/jira/browse/FLINK-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637068#comment-14637068 ]

ASF GitHub Bot commented on FLINK-2385:
---

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/933

    [FLINK-2385] [scala api] Add parenthesis to Scala 'distinct' transformation.

    The operation is not side-effect free. It does not mutate the original DataSet, but defines distributed computation.

    This pull request also corrects some comment markup.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink scala_distinct

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/933.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #933

commit 828761adaeff7a08e0927170e61b1f46c77b55d4
Author: Stephan Ewen se...@apache.org
Date: 2015-07-21T14:55:44Z

    [FLINK-2385] [scala api] Add parenthesis to 'distinct' transformation.

Scala DataSet.distinct should have parenthesis
---
Key: FLINK-2385
URL: https://issues.apache.org/jira/browse/FLINK-2385
Project: Flink
Issue Type: Bug
Components: Scala API
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 0.10

The method is not a side-effect-free accessor, but defines heavy computation, even if it does not mutate the original data set. This is a somewhat API-breaking change.
[jira] [Commented] (FLINK-2332) Assign session IDs to JobManager and TaskManager messages
[ https://issues.apache.org/jira/browse/FLINK-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637031#comment-14637031 ] ASF GitHub Bot commented on FLINK-2332: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123751721 Do you mean `JobManager.getJobManagerGateway`? This is only a temporary solution to obtain an `ActorGateway` for the JobManager for which you have to know the current leader session ID. This will be changed once HA with ZooKeeper is introduced.
[GitHub] flink pull request: Collect(): Fixing the akka.framesize size limi...
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/887#issuecomment-123756855 Thanks a lot!
[GitHub] flink pull request: [FLINK-1818] Added api to cancel job from clie...
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240445

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    +     *
    +     * @param Accepts job id and cancels the job
    +     *
    +     */
    +
    +
    +    public int cancel(JobID jobId) throws ProgramInvocationException {
    --- End diff --

    You only return 0 or 1, so you might also return a Boolean. Or just return nothing and throw an Exception in case of an error.
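The two alternatives suggested in this review comment (return a boolean, or return nothing and throw on error) can be contrasted in a small sketch. It does not use the real `Client`, `JobID`, or `ProgramInvocationException`; `CancelApiSketch`, `JobCancellationExceptionSketch`, and the string job IDs are hypothetical stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical checked exception signalling that a cancel request failed. */
class JobCancellationExceptionSketch extends Exception {
    JobCancellationExceptionSketch(String message) {
        super(message);
    }
}

/** Hypothetical client that only tracks which job IDs are currently running. */
final class CancelApiSketch {
    private final Set<String> runningJobs = new HashSet<>();

    void submit(String jobId) {
        runningJobs.add(jobId);
    }

    /** Alternative 1: report success or failure as a boolean instead of 0 or 1. */
    boolean tryCancel(String jobId) {
        return runningJobs.remove(jobId);
    }

    /** Alternative 2: return nothing and throw an exception in case of an error. */
    void cancel(String jobId) throws JobCancellationExceptionSketch {
        if (!runningJobs.remove(jobId)) {
            throw new JobCancellationExceptionSketch("No running job with id " + jobId);
        }
    }
}
```

The exception variant leaves room for a failure reason in the message, which a bare 0 or 1 cannot carry.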
[jira] [Commented] (FLINK-1818) Provide API to cancel running job
[ https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637250#comment-14637250 ]

ASF GitHub Bot commented on FLINK-1818:
---

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240445

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    +     *
    +     * @param Accepts job id and cancels the job
    +     *
    +     */
    +
    +
    +    public int cancel(JobID jobId) throws ProgramInvocationException {
    --- End diff --

    You only return 0 or 1, so you might also return a Boolean. Or just return nothing and throw an Exception in case of an error.

Provide API to cancel running job
---
Key: FLINK-1818
URL: https://issues.apache.org/jira/browse/FLINK-1818
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: niraj rai
Labels: starter

http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Canceling-a-Cluster-Job-from-Java-td4897.html
[GitHub] flink pull request: [FLINK-1818] Added api to cancel job from clie...
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240432

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    --- End diff --

    Could be a bit more explanatory :)
[jira] [Commented] (FLINK-1818) Provide API to cancel running job
[ https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637248#comment-14637248 ]

ASF GitHub Bot commented on FLINK-1818:
---

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240432

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    --- End diff --

    Could be a bit more explanatory :)
[jira] [Commented] (FLINK-1818) Provide API to cancel running job
[ https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637249#comment-14637249 ]

ASF GitHub Bot commented on FLINK-1818:
---

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240436

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    +     *
    +     * @param Accepts job id and cancels the job
    --- End diff --

    JavaDoc is not correct.
[jira] [Commented] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method
[ https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637329#comment-14637329 ]

Stephan Ewen commented on FLINK-2373:
---

That is definitely a good idea. Would you be up for creating a patch?

Add configuration parameter to createRemoteEnvironment method
---
Key: FLINK-2373
URL: https://issues.apache.org/jira/browse/FLINK-2373
Project: Flink
Issue Type: Bug
Components: other
Reporter: Andreas Kunft
Priority: Minor
Original Estimate: 24h
Remaining Estimate: 24h

Currently there is no way to provide a custom configuration upon creation of a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)). This leads to errors when the submitted job exceeds the default maximum payload size in Akka, as we cannot increase the configuration value (akka.remote.OversizedPayloadException: Discarding oversized payload...).

Providing an overloaded method with a configuration parameter for the remote environment fixes that.
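The proposed overload might be shaped roughly like the sketch below. It is not the Flink API: `RemoteEnvironmentSketch` is a hypothetical stand-in, a plain `Map` replaces Flink's configuration class, and the host, port, and frame-size values in the usage note are illustrative assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote execution environment that carries client-side settings. */
final class RemoteEnvironmentSketch {
    final String host;
    final int port;
    private final Map<String, String> clientConfig;

    private RemoteEnvironmentSketch(String host, int port, Map<String, String> clientConfig) {
        this.host = host;
        this.port = port;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.clientConfig = Collections.unmodifiableMap(new HashMap<>(clientConfig));
    }

    /** Existing style of factory method: no way to pass custom settings. */
    static RemoteEnvironmentSketch createRemoteEnvironment(String host, int port) {
        return createRemoteEnvironment(host, port, Collections.<String, String>emptyMap());
    }

    /** Proposed overload: the caller supplies a configuration for the client side. */
    static RemoteEnvironmentSketch createRemoteEnvironment(
            String host, int port, Map<String, String> clientConfig) {
        return new RemoteEnvironmentSketch(host, port, clientConfig);
    }

    /** Looks up a setting, falling back to the given default when it is not set. */
    String getString(String key, String defaultValue) {
        String value = clientConfig.get(key);
        return value != null ? value : defaultValue;
    }
}
```

A caller hitting the oversized-payload error could then pass a map containing a larger `akka.framesize` value to the three-argument method, while existing two-argument calls keep their current behaviour.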
[jira] [Commented] (FLINK-1818) Provide API to cancel running job
[ https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637385#comment-14637385 ] niraj rai commented on FLINK-1818: -- Thanks, Max, for mentoring me. I really appreciate your help and look forward to contributing more.
[GitHub] flink pull request: [FLINK-1818] Added api to cancel job from clie...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/642#issuecomment-123815171 @rainiraj Thanks for your contribution!
[GitHub] flink pull request: [FLINK-1818] Added api to cancel job from clie...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/642#issuecomment-123802868 Thanks @rainiraj. I think we can merge your changes with some small adjustments.
[GitHub] flink pull request: [FLINK-1818] Added api to cancel job from clie...
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240453

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    +     *
    +     * @param Accepts job id and cancels the job
    +     *
    +     */
    +
    +
    +    public int cancel(JobID jobId) throws ProgramInvocationException {
    +        LOG.info("Executing 'cancel' command.");
    +        final FiniteDuration timeout = AkkaUtils.getTimeout(configuration);
    +        //String address = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null);
    --- End diff --

    This comment can probably be removed.
[jira] [Commented] (FLINK-1818) Provide API to cancel running job
[ https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637251#comment-14637251 ]

ASF GitHub Bot commented on FLINK-1818:
---

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240453

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }
    +    /**
    +     * Executes the CANCEL action through Client API.
    +     *
    +     * @param Accepts job id and cancels the job
    +     *
    +     */
    +
    +
    +    public int cancel(JobID jobId) throws ProgramInvocationException {
    +        LOG.info("Executing 'cancel' command.");
    +        final FiniteDuration timeout = AkkaUtils.getTimeout(configuration);
    +        //String address = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null);
    --- End diff --

    This comment can probably be removed.
[GitHub] flink pull request: [FLINK-1818] Added api to cancel job from clie...
Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/642#discussion_r35240436

    --- Diff: flink-clients/src/main/java/org/apache/flink/client/program/Client.java ---
    @@ -421,6 +426,50 @@ public JobSubmissionResult run(JobGraph jobGraph, boolean wait) throws ProgramIn
             }
         }

    +    /**
    +     * Executes the CANCEL action through Client API.
    +     *
    +     * @param Accepts job id and cancels the job
    --- End diff --

    JavaDoc is not correct.
[jira] [Closed] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels closed FLINK-2388. - Resolution: Fixed Thanks for your help, [~ebautistabar]! JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/930
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637274#comment-14637274 ] ASF GitHub Bot commented on FLINK-2388: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/930 JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1818) Provide API to cancel running job
[ https://issues.apache.org/jira/browse/FLINK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637347#comment-14637347 ] ASF GitHub Bot commented on FLINK-1818: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/642#issuecomment-123815171 @rainiraj Thanks for your contribution! Provide API to cancel running job - Key: FLINK-1818 URL: https://issues.apache.org/jira/browse/FLINK-1818 Project: Flink Issue Type: Improvement Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: niraj rai Labels: starter http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Canceling-a-Cluster-Job-from-Java-td4897.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2304) Add named attribute access to Storm compatibility layer
[ https://issues.apache.org/jira/browse/FLINK-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636427#comment-14636427 ] ASF GitHub Bot commented on FLINK-2304: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/878#issuecomment-123589583 Thanks I will merge it later today! :) Add named attribute access to Storm compatibility layer --- Key: FLINK-2304 URL: https://issues.apache.org/jira/browse/FLINK-2304 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, Bolts running in Flink can access fields only by index. Enabling named index access is possible for whole topologies and Tuple-type as well as POJO type inputs for embedded Bolts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2304] Add named attribute access to Sto...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/878#issuecomment-123589583 Thanks I will merge it later today! :)
[jira] [Commented] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636520#comment-14636520 ] ASF GitHub Bot commented on FLINK-1658: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/929#issuecomment-123620462 No, +1. Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent -- Key: FLINK-1658 URL: https://issues.apache.org/jira/browse/FLINK-1658 Project: Flink Issue Type: Improvement Components: Distributed Runtime, Local Runtime Reporter: Gyula Fora Assignee: Matthias J. Sax Priority: Trivial The same name is used for different event classes in the runtime which can cause confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1658] Rename AbstractEvent to AbstractT...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/929#issuecomment-123620462 No, +1.
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636531#comment-14636531 ] Maximilian Michels commented on FLINK-2388: --- Hi [~ebautistabar], thanks for finding and even looking into the problem. It's wrong to return a null value if the accumulators are not found. Instead, we should send a {{AccumulatorNotFound}} message. Then, the {{JobManagerInfoServlet}} doesn't have to deal with null values. I've prepared a fix in the pull request. I will add another commit to the pull request that forwards the message to the {{MemoryArchivist}} that in turn replies to the {{JobManagerInfoServlet}}. JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
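The reply protocol proposed in this comment (answer with an explicit message instead of null, and consult the archive before giving up) can be sketched as follows. This is a stand-alone Python illustration, not Flink's Scala code: the class names `AccumulatorResultsFound` and `AccumulatorResultsNotFound` follow the message names used in the pull request, while `handle_request`, `current_jobs`, and `archive` are hypothetical names assumed for the sketch, and the archive is modelled as a plain dictionary rather than a separate actor.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class AccumulatorResultsFound:
    job_id: str
    results: Dict[str, str]


@dataclass(frozen=True)
class AccumulatorResultsNotFound:
    job_id: str


Reply = Union[AccumulatorResultsFound, AccumulatorResultsNotFound]


def handle_request(
    job_id: str,
    current_jobs: Dict[str, Dict[str, str]],
    archive: Optional[Dict[str, Dict[str, str]]] = None,
) -> Reply:
    """Answer an accumulator request with an explicit message, never None."""
    results = current_jobs.get(job_id)
    if results is None and archive is not None:
        # Hypothetical archive lookup, standing in for forwarding the request.
        results = archive.get(job_id)
    if results is None:
        return AccumulatorResultsNotFound(job_id)
    return AccumulatorResultsFound(job_id, results)
```

With this shape, a caller such as the servlet only has to distinguish two message types and never needs a null check.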
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636533#comment-14636533 ] ASF GitHub Bot commented on FLINK-2388: --- GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/930 [FLINK-2388] return AccumulatorResultsNotFound until archive is checked You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink accumulator-archive-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/930.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #930 commit 0df9996bc2c054d11e837f1d1556ee14f76240f9 Author: Maximilian Michels m...@apache.org Date: 2015-07-22T08:38:43Z [JobManager] return AccumulatorResultsNotFound until archive is checked JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. 
If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/930

    [FLINK-2388] return AccumulatorResultsNotFound until archive is checked

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink accumulator-archive-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/930.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #930

commit 0df9996bc2c054d11e837f1d1556ee14f76240f9
Author: Maximilian Michels m...@apache.org
Date: 2015-07-22T08:38:43Z

    [JobManager] return AccumulatorResultsNotFound until archive is checked
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636549#comment-14636549 ] ASF GitHub Bot commented on FLINK-2388: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35193777 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -688,16 +688,15 @@ class JobManager( case RequestAccumulatorResults(jobID) = try { - val accumulatorValues: java.util.Map[String, SerializedValue[Object]] = { -currentJobs.get(jobID) match { - case Some((graph, jobInfo)) = -graph.getAccumulatorsSerialized - case None = -null // TODO check also archive -} + currentJobs.get(jobID) match { +case Some((graph, jobInfo)) = + val accumulatorValues = graph.getAccumulatorsSerialized + sender() ! AccumulatorResultsFound(jobID, accumulatorValues) +case None = + // TODO check also archive + sender() ! AccumulatorResultsNotFound(jobID) --- End diff -- Why not properly resolving the TODO? JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. 
If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636551#comment-14636551 ] ASF GitHub Bot commented on FLINK-2388: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35193782 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -707,33 +706,33 @@ class JobManager( case RequestAccumulatorResultsStringified(jobId) = try { - val accumulatorValues: Array[StringifiedAccumulatorResult] = { -currentJobs.get(jobId) match { - case Some((graph, jobInfo)) = -val accumulators = graph.aggregateUserAccumulators() - -val result: Array[StringifiedAccumulatorResult] = new -Array[StringifiedAccumulatorResult](accumulators.size) - -var i = 0 -accumulators foreach { - case (name, accumulator) = -val (typeString, valueString) = - if (accumulator != null) { -(accumulator.getClass.getSimpleName, accumulator.toString) - } else { -(null, null) - } -result(i) = new StringifiedAccumulatorResult(name, typeString, valueString) -i += 1 -} -result - case None = -null // TODO check also archive -} + currentJobs.get(jobId) match { +case Some((graph, jobInfo)) = + val accumulators = graph.aggregateUserAccumulators() + + val result: Array[StringifiedAccumulatorResult] = new + Array[StringifiedAccumulatorResult](accumulators.size) + + var i = 0 + accumulators foreach { +case (name, accumulator) = + val (typeString, valueString) = +if (accumulator != null) { + (accumulator.getClass.getSimpleName, accumulator.toString) +} else { + (null, null) +} + result(i) = new StringifiedAccumulatorResult(name, typeString, valueString) + i += 1 + } + + sender() ! AccumulatorResultStringsFound(jobId, result) + +case None = + // TODO check also archive + sender() ! AccumulatorResultsNotFound(jobId) --- End diff -- Same here. 
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/930#discussion_r35193782

    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -707,33 +706,33 @@ class JobManager(
         case RequestAccumulatorResultsStringified(jobId) =>
           try {
    -        val accumulatorValues: Array[StringifiedAccumulatorResult] = {
    -          currentJobs.get(jobId) match {
    -            case Some((graph, jobInfo)) =>
    -              val accumulators = graph.aggregateUserAccumulators()
    -
    -              val result: Array[StringifiedAccumulatorResult] = new
    -                  Array[StringifiedAccumulatorResult](accumulators.size)
    -
    -              var i = 0
    -              accumulators foreach {
    -                case (name, accumulator) =>
    -                  val (typeString, valueString) =
    -                    if (accumulator != null) {
    -                      (accumulator.getClass.getSimpleName, accumulator.toString)
    -                    } else {
    -                      (null, null)
    -                    }
    -                  result(i) = new StringifiedAccumulatorResult(name, typeString, valueString)
    -                  i += 1
    -              }
    -              result
    -            case None =>
    -              null // TODO check also archive
    -          }
    +        currentJobs.get(jobId) match {
    +          case Some((graph, jobInfo)) =>
    +            val accumulators = graph.aggregateUserAccumulators()
    +
    +            val result: Array[StringifiedAccumulatorResult] = new
    +                Array[StringifiedAccumulatorResult](accumulators.size)
    +
    +            var i = 0
    +            accumulators foreach {
    +              case (name, accumulator) =>
    +                val (typeString, valueString) =
    +                  if (accumulator != null) {
    +                    (accumulator.getClass.getSimpleName, accumulator.toString)
    +                  } else {
    +                    (null, null)
    +                  }
    +                result(i) = new StringifiedAccumulatorResult(name, typeString, valueString)
    +                i += 1
    +            }
    +
    +            sender() ! AccumulatorResultStringsFound(jobId, result)
    +
    +          case None =>
    +            // TODO check also archive
    +            sender() ! AccumulatorResultsNotFound(jobId)
    --- End diff --

    Same here.
[jira] [Commented] (FLINK-2363) Add an end-to-end overview of program execution in Flink to the docs
[ https://issues.apache.org/jira/browse/FLINK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636431#comment-14636431 ] ASF GitHub Bot commented on FLINK-2363: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/913#issuecomment-123590116 @fhueske Naa, I just want to see it finished. So I'll probably have to write some other parts myself but it would be cool if you could contribute this part. I'll open my own branch and then people can do pull requests against that, I will try to write the stuff myself or delegate to others for stuff that I don't know very well. Add an end-to-end overview of program execution in Flink to the docs Key: FLINK-2363 URL: https://issues.apache.org/jira/browse/FLINK-2363 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Stephan Ewen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2363] [docs] First part of internals - ...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/913#issuecomment-123590116 @fhueske Naa, I just want to see it finished. So I'll probably have to write some other parts myself but it would be cool if you could contribute this part. I'll open my own branch and then people can do pull requests against that, I will try to write the stuff myself or delegate to others for stuff that I don't know very well.
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636451#comment-14636451 ] Till Rohrmann commented on FLINK-1901: -- If you use the sampling operator this way, it works. However, usually your iteration data set is something like the weight vector of your model and you have another training dataset from which you want to take a small sample to update your weight vector in each iteration (e.g. SGD). When you write a program like that, then you'll see that the output of the sampling operator will always be the same (for every iteration). The reason is that the sampling no longer is on the dynamic path of the iteration and thus it is only once calculated and then cached. This is not the intended behaviour, though. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
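The caching effect described in this comment can be illustrated with a small stand-alone sketch. It uses plain Python rather than the Flink DataSet API, so it only mimics the behaviour: `sgd_static_sample` and `sgd_fresh_sample` are hypothetical names, and the "weight update" is reduced to recording which sample each iteration saw.

```python
import random


def sgd_static_sample(data, iterations, k, seed):
    """Sample once before the loop: every iteration sees the same sample.

    This mirrors a sampling operator that is off the dynamic path of the
    iteration and is therefore computed once and then reused.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, k)
    return [tuple(sample) for _ in range(iterations)]


def sgd_fresh_sample(data, iterations, k, seed):
    """Sample inside the loop: each iteration draws a new sample.

    This is the behaviour an SGD-style algorithm actually needs.
    """
    rng = random.Random(seed)
    return [tuple(rng.sample(data, k)) for _ in range(iterations)]
```

Comparing the two return values for the same training data shows the difference: the first list repeats one sample, the second contains a new draw per iteration.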
[jira] [Commented] (FLINK-2390) Replace iteration timeout with algorithm for detecting termination
[ https://issues.apache.org/jira/browse/FLINK-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636497#comment-14636497 ]

Aljoscha Krettek commented on FLINK-2390:
-----------------------------------------

Yes, this could be a good idea. How would the head operator figure out that no more input is arriving? I'm asking because I expect streaming jobs to be long-running, so there might not arrive input for a while before more input arrives.

Replace iteration timeout with algorithm for detecting termination
------------------------------------------------------------------

Key: FLINK-2390
URL: https://issues.apache.org/jira/browse/FLINK-2390
Project: Flink
Issue Type: New Feature
Components: Streaming
Reporter: Gyula Fora
Fix For: 0.10

Currently the user can set a timeout which will shut down the iteration source/sink nodes if no new data is received during that time to allow program termination in iterative streaming jobs. This method is used due to the non-trivial nature of termination in iterative streaming jobs. While termination is not a main concern in long running streaming jobs, this behaviour makes iterative tests non-deterministic and they often fail on travis due to the timeout. Also setting a timeout can cause jobs to terminate prematurely.

I propose to remove iteration timeouts and replace it with the following algorithm for detecting termination:

- We first identify loop edges in the jobgraph (the channels from the iteration sources to the head operators)
- Once the head operators (the ones with loop input) finish with all their non-loop inputs they broadcast a marker to their outputs.
- Each operator will broadcast a marker once it received a marker from all its non-finished inputs
- Iteration sources are terminated when they receive 2 consecutive markers without receiving any record in-between

The idea behind the algorithm is to find out when no more outputs are generated from the operators inside an iteration after their normal inputs are finished.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
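The marker rules proposed in the issue can be sketched in a few lines. This is a simplified, single-threaded Python model under stated assumptions, not Flink code: `MARKER`, `IterationSource`, and `ready_to_broadcast` are hypothetical names, and channels are reduced to direct method calls.

```python
MARKER = object()  # hypothetical sentinel standing in for the broadcast marker


def ready_to_broadcast(marked_inputs, non_finished_inputs):
    """An operator forwards a marker once every non-finished input sent one."""
    return set(non_finished_inputs) <= set(marked_inputs)


class IterationSource:
    """Terminates after two consecutive markers with no record in between."""

    def __init__(self):
        self.consecutive_markers = 0
        self.terminated = False

    def receive(self, element):
        if self.terminated:
            return
        if element is MARKER:
            self.consecutive_markers += 1
            if self.consecutive_markers == 2:
                self.terminated = True
        else:
            # A record between markers means the loop is still producing data.
            self.consecutive_markers = 0
```

Requiring two markers in a row is what separates "the loop happened to be idle for a moment" from "a full round trip produced no records", which is the condition the proposal uses in place of a timeout.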
[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework
GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/931

    [FLINK-1927][py] Operator distribution rework

    Python operators are no longer serialized and shipped across the cluster. Instead the plan file is executed on each node, followed by usage of the respective operator object.

    removed dill library

    also fixed [FLINK-2173] by always passing file paths explicitly to python

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink python_operator4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/931.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #931

commit 40fd3501cacb7b382c7265f0370d0f94887b7e85
Author: zentol s.mo...@web.de
Date: 2015-07-21T19:22:19Z

    [FLINK-1927][py] Operator distribution rework

    Python operators are no longer serialized and shipped across the cluster. Instead the plan file is executed on each node, followed by usage of the respective operator object.

    removed dill library

    [FLINK-2173] filepaths are always explicitly passed to python
[jira] [Assigned] (FLINK-2173) Python uses different tmp file than Flink
[ https://issues.apache.org/jira/browse/FLINK-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler reassigned FLINK-2173: --- Assignee: Chesnay Schepler Python uses different tmp file than Flink - Key: FLINK-2173 URL: https://issues.apache.org/jira/browse/FLINK-2173 Project: Flink Issue Type: Bug Components: Python API Environment: Debian Linux Reporter: Matthias J. Sax Assignee: Chesnay Schepler Priority: Critical Flink gets the temp file path using System.getProperty("java.io.tmpdir") while Python uses the tempfile.gettempdir() method. However, the two can resolve to different directories. On my system Flink uses /tmp while Python uses /tmp/users/1000 (1000 is my Linux user ID). This issue leads (at least) to failing tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
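The mismatch reported here arises because each runtime resolves its temporary directory on its own. A minimal Python-side sketch of the alternative, handing the path over explicitly, could look like this; `resolve_tmp_dir` and its `explicit_path` parameter are hypothetical names assumed for illustration and are not part of the Flink Python API.

```python
import os
import tempfile


def resolve_tmp_dir(explicit_path=None):
    """Prefer a path handed over by the caller; fall back to Python's default."""
    if explicit_path:
        return os.path.normpath(explicit_path)
    # tempfile.gettempdir() consults the TMPDIR, TEMP and TMP environment
    # variables before platform defaults, so it may differ from the JVM's
    # java.io.tmpdir property.
    return tempfile.gettempdir()
```

When the Java side always supplies the directory it used, both processes agree on the location regardless of how the environment is configured.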
[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution
[ https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636537#comment-14636537 ]

ASF GitHub Bot commented on FLINK-1927:
---------------------------------------

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/931

    [FLINK-1927][py] Operator distribution rework

    Python operators are no longer serialized and shipped across the cluster. Instead the plan file is executed on each node, followed by usage of the respective operator object.

    removed dill library

    also fixed [FLINK-2173] by always passing file paths explicitly to python

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink python_operator4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/931.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #931

commit 40fd3501cacb7b382c7265f0370d0f94887b7e85
Author: zentol s.mo...@web.de
Date: 2015-07-21T19:22:19Z

    [FLINK-1927][py] Operator distribution rework

    Python operators are no longer serialized and shipped across the cluster. Instead the plan file is executed on each node, followed by usage of the respective operator object.

    removed dill library

    [FLINK-2173] filepaths are always explicitly passed to python

[Py] Rework operator distribution
---------------------------------

Key: FLINK-1927
URL: https://issues.apache.org/jira/browse/FLINK-1927
Project: Flink
Issue Type: Improvement
Components: Python API
Affects Versions: 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor
Fix For: 0.9

Currently, the Python operator is created when executing the Python plan file, serialized using dill and saved as a byte[] in the Java function. It is then deserialized at runtime on each node. The current implementation is fairly hacky, and imposes certain limitations that make it hard to work with. Chaining, or generally saving other user-code, always requires a separate deserialization step after deserializing the operator.

These issues can be easily circumvented by rebuilding the (Python) plan on each node, instead of serializing the operator. The plan creation is deterministic, and every operator is uniquely identified by an ID that is already known to the Java function. This change will allow us to easily support custom serializers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
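The idea of rebuilding the plan on every node instead of shipping serialized operators can be sketched as follows. This is a toy model under stated assumptions, not the Flink Python API: `build_plan` stands in for executing the user's plan file, and `run_operator_on_node` is a hypothetical name for the per-node lookup.

```python
def build_plan():
    """Hypothetical stand-in for executing the user's plan file.

    Operators are registered in a fixed order, so every run of this
    function assigns the same ID to the same operator.
    """
    operators = {}

    def register(func):
        operators[len(operators)] = func
        return func

    register(lambda x: x + 1)  # operator 0
    register(lambda x: x * 2)  # operator 1
    return operators


def run_operator_on_node(operator_id, value):
    """Each node rebuilds the plan locally and picks its operator by ID."""
    plan = build_plan()
    return plan[operator_id](value)
```

Because plan construction is deterministic, only the integer ID has to cross process boundaries, which is why a serialization library for the operator objects is no longer needed in this scheme.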
[jira] [Closed] (FLINK-2346) Mesos clustering
[ https://issues.apache.org/jira/browse/FLINK-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger closed FLINK-2346. - Resolution: Duplicate I'm closing this issue; it's a duplicate of FLINK-1984. Mesos clustering Key: FLINK-2346 URL: https://issues.apache.org/jira/browse/FLINK-2346 Project: Flink Issue Type: New Feature Reporter: Suminda Dharmasena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/930#discussion_r35193777

    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -688,16 +688,15 @@ class JobManager(
         case RequestAccumulatorResults(jobID) =>
           try {
    -        val accumulatorValues: java.util.Map[String, SerializedValue[Object]] = {
    -          currentJobs.get(jobID) match {
    -            case Some((graph, jobInfo)) =>
    -              graph.getAccumulatorsSerialized
    -            case None =>
    -              null // TODO check also archive
    -          }
    +        currentJobs.get(jobID) match {
    +          case Some((graph, jobInfo)) =>
    +            val accumulatorValues = graph.getAccumulatorsSerialized
    +            sender() ! AccumulatorResultsFound(jobID, accumulatorValues)
    +          case None =>
    +            // TODO check also archive
    +            sender() ! AccumulatorResultsNotFound(jobID)
    --- End diff --

    Why not properly resolving the TODO?
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636575#comment-14636575 ] ASF GitHub Bot commented on FLINK-2200: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/885#discussion_r35194868
--- Diff: flink-contrib/flink-storm-compatibility/flink-storm-compatibility-examples/src/assembly/word-count-storm.xml ---
@@ -36,7 +36,7 @@ under the License.
 <includes>
 <!-- need to be added explicitly to get 'defaults.yaml' -->
 <include>org.apache.storm:storm-core:jar</include>
-<include>org.apache.flink:flink-storm-examples:jar</include>
+<include>org.apache.flink:flink-storm-compatibility-examples${scala.suffix}:jar</include>
--- End diff --
good finding ;)
Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/885#discussion_r35194868
--- Diff: flink-contrib/flink-storm-compatibility/flink-storm-compatibility-examples/src/assembly/word-count-storm.xml ---
@@ -36,7 +36,7 @@ under the License.
 <includes>
 <!-- need to be added explicitly to get 'defaults.yaml' -->
 <include>org.apache.storm:storm-core:jar</include>
-<include>org.apache.flink:flink-storm-examples:jar</include>
+<include>org.apache.flink:flink-storm-compatibility-examples${scala.suffix}:jar</include>
--- End diff --
good finding ;)
[jira] [Commented] (FLINK-2218) Web client cannot distinguesh between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636627#comment-14636627 ] ASF GitHub Bot commented on FLINK-2218: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123648113 This only occurs with your changes, right? Then let's first fix the issue of the failing tests before merging. Web client cannot distinguesh between Flink options and program arguments - Key: FLINK-2218 URL: https://issues.apache.org/jira/browse/FLINK-2218 Project: Flink Issue Type: Improvement Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Matthias J. Sax WebClient has only one input field for arguments. This field is used for Flink options (e.g., `-p`) and program arguments. Thus, supported Flink options restrict the possible program arguments. CliFrontend in contrast can distinguish both and thus `-p` can also be used as an program argument. Solution: add a second input field for Flink options to WebClient -- This message was sent by Atlassian JIRA (v6.3.4#6332)
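The two-field solution proposed in the issue can be illustrated with a minimal Java sketch. All names here (`WebClientArgsSketch`, `Submission`, `parse`) are hypothetical and not part of Flink's WebClient; the sketch also assumes that `-p` (parallelism) is the only supported Flink option and that 1 is the default. It only shows why keeping the two inputs separate lets `-p` appear as a program argument.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Flink code: Flink options and program arguments
// arrive in two separate input fields and are never mixed.
class WebClientArgsSketch {

    // Result of parsing both fields.
    static final class Submission {
        final int parallelism;
        final List<String> programArgs;

        Submission(int parallelism, List<String> programArgs) {
            this.parallelism = parallelism;
            this.programArgs = programArgs;
        }
    }

    // Only the options field is interpreted; the program arguments are
    // passed through untouched, so "-p" is allowed there.
    static Submission parse(String flinkOptionsField, String programArgsField) {
        int parallelism = 1; // assumed default for this sketch
        String[] options = splitOnWhitespace(flinkOptionsField);
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("-p") && i + 1 < options.length) {
                parallelism = Integer.parseInt(options[++i]);
            } else {
                throw new IllegalArgumentException("unsupported Flink option: " + options[i]);
            }
        }
        return new Submission(parallelism, Arrays.asList(splitOnWhitespace(programArgsField)));
    }

    static String[] splitOnWhitespace(String field) {
        String trimmed = field.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        Submission s = parse("-p 4", "-p input.txt");
        System.out.println(s.parallelism + " " + s.programArgs);
    }
}
```

With a single shared field, the second `-p` in this example would be indistinguishable from the Flink option.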
[GitHub] flink pull request: [FLINK-2218] Web client cannot distinguesh bet...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123648113 This only occurs with your changes, right? Then let's first fix the issue of the failing tests before merging.
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636740#comment-14636740 ] ASF GitHub Bot commented on FLINK-2388: --- Github user ebautistabar commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35202991
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -687,61 +687,23 @@ class JobManager(
   message match {
     case RequestAccumulatorResults(jobID) =>
-      try {
--- End diff --
Are the try/catch unnecessary?
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
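The lookup order described in the issue (running jobs first, then the archive, and an explicit "not found" answer instead of `null`) might look roughly like the following Java sketch. It is not the JobManager code, which is a Scala actor; the class, its two maps, and the use of plain strings for job IDs and accumulator values are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Flink code.
class AccumulatorLookupSketch {

    // Stand-in for the JobManager's currentJobs: accumulators of running jobs.
    private final Map<String, Map<String, String>> currentJobs = new HashMap<>();
    // Stand-in for the archive: accumulators of finished jobs.
    private final Map<String, Map<String, String>> archive = new HashMap<>();

    void addRunning(String jobId, Map<String, String> accumulators) {
        currentJobs.put(jobId, accumulators);
    }

    // Moves a job from the running set to the archive.
    void archiveJob(String jobId) {
        Map<String, String> accumulators = currentJobs.remove(jobId);
        if (accumulators != null) {
            archive.put(jobId, accumulators);
        }
    }

    // Checks the running jobs first, then the archive. An empty Optional plays
    // the role of the "not found" reply, so callers never receive null.
    Optional<Map<String, String>> findAccumulators(String jobId) {
        Map<String, String> running = currentJobs.get(jobId);
        if (running != null) {
            return Optional.of(running);
        }
        return Optional.ofNullable(archive.get(jobId));
    }

    public static void main(String[] args) {
        AccumulatorLookupSketch jobs = new AccumulatorLookupSketch();
        jobs.addRunning("job-1", Collections.singletonMap("word-count", "42"));
        jobs.archiveJob("job-1");
        System.out.println(jobs.findAccumulators("job-1").isPresent());
        System.out.println(jobs.findAccumulators("job-2").isPresent());
    }
}
```

Returning an explicit empty result forces the caller to handle the missing-job case, which is the failure mode behind the NullPointerException reported above.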
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636601#comment-14636601 ] Maximilian Michels commented on FLINK-2388: --- [~ebautistabar] Can you take a look at the pull request and see if you're satisfied with that one. I just took a look at your commit as well and I think it's similar. I tried to clean up where possible. JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636598#comment-14636598 ] ASF GitHub Bot commented on FLINK-2200: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-123643760 No need to hurry .. I needed 10 days to look at it .. so a few hours don't matter ;) I tried to install the artifacts to my local repository, but the _2.11 suffix is not added to the artifacts. There are a few points still missing: the parent pom probably also needs the suffix, + all the parent definitions in the modules and the <module>...</module> definitions in the pom projects. In this commit you can see what I mean: https://github.com/rmetzger/flink/commit/77622269a50babfdc85f46a4235bcdf093cb8e50 Please let me know if you have further questions. Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-123643760 No need to hurry .. I needed 10 days to look at it .. so a few hours don't matter ;) I tried to install the artifacts to my local repository, but the _2.11 suffix is not added to the artifacts. There are a few points still missing: the parent pom probably also needs the suffix, + all the parent definitions in the modules and the <module>...</module> definitions in the pom projects. In this commit you can see what I mean: https://github.com/rmetzger/flink/commit/77622269a50babfdc85f46a4235bcdf093cb8e50 Please let me know if you have further questions.
[jira] [Commented] (FLINK-2218) Web client cannot distinguesh between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636606#comment-14636606 ] ASF GitHub Bot commented on FLINK-2218: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123645039 +1 for merging it! Web client cannot distinguesh between Flink options and program arguments - Key: FLINK-2218 URL: https://issues.apache.org/jira/browse/FLINK-2218 Project: Flink Issue Type: Improvement Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Matthias J. Sax WebClient has only one input field for arguments. This field is used for Flink options (e.g., `-p`) and program arguments. Thus, supported Flink options restrict the possible program arguments. CliFrontend in contrast can distinguish both and thus `-p` can also be used as an program argument. Solution: add a second input field for Flink options to WebClient -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2218] Web client cannot distinguesh bet...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123645039 +1 for merging it!
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/930#issuecomment-123649614 LGTM
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636741#comment-14636741 ] ASF GitHub Bot commented on FLINK-2388: --- Github user ebautistabar commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35202994
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
@@ -580,6 +581,34 @@ public ExecutionContext getExecutionContext() {
 		return result;
 	}
 
+	/**
+	 * Returns the a stringified version of the user-defined accumulators.
+	 * @return an Array containing the StringifiedAccumulatorResult objects
+	 */
+	public StringifiedAccumulatorResult[] getAccumulatorResultsStringified() {
+
+		Map<String, Accumulator<?, ?>> accumulatorMap = aggregateUserAccumulators();
+
+		int num = accumulatorMap.size();
+		StringifiedAccumulatorResult[] resultStrings = new StringifiedAccumulatorResult[num];
+
+		int i = 0;
+		for (Map.Entry<String, Accumulator<?, ?>> entry : accumulatorMap.entrySet()) {
+
+			StringifiedAccumulatorResult result;
+			Accumulator<?, ?> value = entry.getValue();
+			if (value != null) {
+				result = new StringifiedAccumulatorResult(entry.getKey(), value.getClass().getSimpleName(), value.toString());
+			} else {
+				result = new StringifiedAccumulatorResult(entry.getKey(), "null", "null");
--- End diff --
Why do you use the string `"null"` instead of `null`?
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. 
There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
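The stringification step discussed in this review can be sketched in a self-contained way as follows. The `Result` class is a hypothetical stand-in for `StringifiedAccumulatorResult`, and plain objects stand in for accumulators. Using placeholder text for a missing value is one possible answer to the reviewer's question: it keeps consumers that render the fields directly from dereferencing `null`. That rationale is an assumption, not something stated in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Flink code.
class StringifiedResultsSketch {

    // Stand-in for StringifiedAccumulatorResult: name, type and value as text.
    static final class Result {
        final String name;
        final String type;
        final String value;

        Result(String name, String type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    // Converts every entry to strings; a null accumulator becomes the
    // placeholder text "null" rather than a null reference.
    static Result[] stringify(Map<String, ?> accumulators) {
        Result[] results = new Result[accumulators.size()];
        int i = 0;
        for (Map.Entry<String, ?> entry : accumulators.entrySet()) {
            Object value = entry.getValue();
            if (value != null) {
                results[i++] = new Result(entry.getKey(), value.getClass().getSimpleName(), value.toString());
            } else {
                results[i++] = new Result(entry.getKey(), "null", "null");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> accumulators = new LinkedHashMap<>();
        accumulators.put("lines", 3);
        accumulators.put("missing", null);
        for (Result r : stringify(accumulators)) {
            System.out.println(r.name + " " + r.type + " " + r.value);
        }
    }
}
```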
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user ebautistabar commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35202994
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
@@ -580,6 +581,34 @@ public ExecutionContext getExecutionContext() {
 		return result;
 	}
 
+	/**
+	 * Returns the a stringified version of the user-defined accumulators.
+	 * @return an Array containing the StringifiedAccumulatorResult objects
+	 */
+	public StringifiedAccumulatorResult[] getAccumulatorResultsStringified() {
+
+		Map<String, Accumulator<?, ?>> accumulatorMap = aggregateUserAccumulators();
+
+		int num = accumulatorMap.size();
+		StringifiedAccumulatorResult[] resultStrings = new StringifiedAccumulatorResult[num];
+
+		int i = 0;
+		for (Map.Entry<String, Accumulator<?, ?>> entry : accumulatorMap.entrySet()) {
+
+			StringifiedAccumulatorResult result;
+			Accumulator<?, ?> value = entry.getValue();
+			if (value != null) {
+				result = new StringifiedAccumulatorResult(entry.getKey(), value.getClass().getSimpleName(), value.toString());
+			} else {
+				result = new StringifiedAccumulatorResult(entry.getKey(), "null", "null");
--- End diff --
Why do you use the string `"null"` instead of `null`?
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user ebautistabar commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35202991
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -687,61 +687,23 @@ class JobManager(
   message match {
     case RequestAccumulatorResults(jobID) =>
-      try {
--- End diff --
Are the try/catch unnecessary?
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636591#comment-14636591 ] ASF GitHub Bot commented on FLINK-2388: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35195856
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -688,16 +688,15 @@ class JobManager(
     case RequestAccumulatorResults(jobID) =>
       try {
-        val accumulatorValues: java.util.Map[String, SerializedValue[Object]] = {
-          currentJobs.get(jobID) match {
-            case Some((graph, jobInfo)) =>
-              graph.getAccumulatorsSerialized
-            case None =>
-              null // TODO check also archive
-          }
+        currentJobs.get(jobID) match {
+          case Some((graph, jobInfo)) =>
+            val accumulatorValues = graph.getAccumulatorsSerialized
+            sender() ! AccumulatorResultsFound(jobID, accumulatorValues)
+          case None =>
+            // TODO check also archive
+            sender() ! AccumulatorResultsNotFound(jobID)
--- End diff --
I've updated the pull request.
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. 
If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35195856
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -688,16 +688,15 @@ class JobManager(
     case RequestAccumulatorResults(jobID) =>
       try {
-        val accumulatorValues: java.util.Map[String, SerializedValue[Object]] = {
-          currentJobs.get(jobID) match {
-            case Some((graph, jobInfo)) =>
-              graph.getAccumulatorsSerialized
-            case None =>
-              null // TODO check also archive
-          }
+        currentJobs.get(jobID) match {
+          case Some((graph, jobInfo)) =>
+            val accumulatorValues = graph.getAccumulatorsSerialized
+            sender() ! AccumulatorResultsFound(jobID, accumulatorValues)
+          case None =>
+            // TODO check also archive
+            sender() ! AccumulatorResultsNotFound(jobID)
--- End diff --
I've updated the pull request.
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636603#comment-14636603 ] ASF GitHub Bot commented on FLINK-2391: --- GitHub user ffbin opened a pull request: https://github.com/apache/flink/pull/932 [FLINK-2391]Fix Storm-compatibility FlinkTopologyBuilder.createTopology bug
1. Error scenario: the error occurs in a program like this:
builder.setSpout(source0, new Generator(pt), pt.getInt(sourceParallelism));
builder.setBolt(sa, new RepartPassThroughBolt(pt), pt.getInt(sinkParallelism)).fieldsGrouping(source0, new Fields(id));
builder.setBolt(sink, new Sink(pt), pt.getInt(sinkParallelism)).fieldsGrouping(sa, new Fields(id));
final FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
cluster.submitTopology(throughput, conf, builder.createTopology());
If the last bolt uses fieldsGrouping, createTopology throws a NullPointerException.
2. Reason: when determining the indexes of the grouping attributes, the code reads the outputFields of the downstream operator, which is wrong. Because the last bolt has no outputFields, the outputSchema of its declarer is null and a NullPointerException is thrown.
3. Fix: when determining the indexes of the grouping attributes, read the outputFields of the upstream operator instead.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ffbin/flink FFB_Flink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/932.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #932 commit e9c098670a14020d38f9b5e05442a2a606d90dcb Author: ffbin 869218...@qq.com Date: 2015-07-22T09:04:45Z modify Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException -- Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Labels: features Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h core dumped at FlinkOutputFieldsDeclarer.java : 160(package FlinkOutputFieldsDeclarer). code : fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); in this line, the var this.outputSchema may be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
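The fix described in this pull request, resolving the grouping fields against the upstream operator's output fields, can be sketched without any Storm or Flink dependency. The class and method names below are hypothetical, and plain string lists stand in for the declared output schema; the explicit null check is an addition for illustration and is not claimed to be part of the actual patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Storm-compatibility code.
class GroupingIndexSketch {

    // Maps each grouping field to its position in the UPSTREAM operator's
    // output fields. A sink bolt declares no output fields of its own, so
    // looking them up on the downstream side is what produced the
    // NullPointerException.
    static int[] fieldIndexes(List<String> upstreamOutputFields, List<String> groupingFields) {
        if (upstreamOutputFields == null) {
            throw new IllegalArgumentException("upstream operator declares no output fields");
        }
        int[] indexes = new int[groupingFields.size()];
        for (int i = 0; i < indexes.length; i++) {
            int index = upstreamOutputFields.indexOf(groupingFields.get(i));
            if (index < 0) {
                throw new IllegalArgumentException("unknown grouping field: " + groupingFields.get(i));
            }
            indexes[i] = index;
        }
        return indexes;
    }

    public static void main(String[] args) {
        int[] indexes = fieldIndexes(Arrays.asList("id", "payload"), Arrays.asList("payload", "id"));
        System.out.println(Arrays.toString(indexes));
    }
}
```

Failing with a descriptive exception when the schema is missing makes a misconfigured topology easier to diagnose than a bare NullPointerException.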
[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...
GitHub user ffbin opened a pull request: https://github.com/apache/flink/pull/932 [FLINK-2391]Fix Storm-compatibility FlinkTopologyBuilder.createTopology bug
1. Error scenario: the error occurs in a program like this:
builder.setSpout(source0, new Generator(pt), pt.getInt(sourceParallelism));
builder.setBolt(sa, new RepartPassThroughBolt(pt), pt.getInt(sinkParallelism)).fieldsGrouping(source0, new Fields(id));
builder.setBolt(sink, new Sink(pt), pt.getInt(sinkParallelism)).fieldsGrouping(sa, new Fields(id));
final FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
cluster.submitTopology(throughput, conf, builder.createTopology());
If the last bolt uses fieldsGrouping, createTopology throws a NullPointerException.
2. Reason: when determining the indexes of the grouping attributes, the code reads the outputFields of the downstream operator, which is wrong. Because the last bolt has no outputFields, the outputSchema of its declarer is null and a NullPointerException is thrown.
3. Fix: when determining the indexes of the grouping attributes, read the outputFields of the upstream operator instead.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ffbin/flink FFB_Flink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/932.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #932 commit e9c098670a14020d38f9b5e05442a2a606d90dcb Author: ffbin 869218...@qq.com Date: 2015-07-22T09:04:45Z modify
[GitHub] flink pull request: [FLINK-2218] Web client cannot distinguesh bet...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123646720 You are right. I just updated it. What about the failing YARN test. Is there already a JIRA for it? https://travis-ci.org/apache/flink/builds/72019686
[GitHub] flink pull request: [FLINK-1658] Rename AbstractEvent to AbstractT...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/929#issuecomment-123646781 +1
[jira] [Commented] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636621#comment-14636621 ] ASF GitHub Bot commented on FLINK-1658: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/929#issuecomment-123646781 +1 Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent -- Key: FLINK-1658 URL: https://issues.apache.org/jira/browse/FLINK-1658 Project: Flink Issue Type: Improvement Components: Distributed Runtime, Local Runtime Reporter: Gyula Fora Assignee: Matthias J. Sax Priority: Trivial The same name is used for different event classes in the runtime which can cause confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2218) Web client cannot distinguesh between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636620#comment-14636620 ] ASF GitHub Bot commented on FLINK-2218: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123646720 You are right. I just updated it. What about the failing YARN test. Is there already a JIRA for it? https://travis-ci.org/apache/flink/builds/72019686 Web client cannot distinguesh between Flink options and program arguments - Key: FLINK-2218 URL: https://issues.apache.org/jira/browse/FLINK-2218 Project: Flink Issue Type: Improvement Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Matthias J. Sax WebClient has only one input field for arguments. This field is used for Flink options (e.g., `-p`) and program arguments. Thus, supported Flink options restrict the possible program arguments. CliFrontend in contrast can distinguish both and thus `-p` can also be used as an program argument. Solution: add a second input field for Flink options to WebClient -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636727#comment-14636727 ] Enrique Bautista Barahona edited comment on FLINK-2388 at 7/22/15 11:10 AM: Sure [~mxm]. As far as I can see it's very similar. I have some comments/doubts that I will post in the PR. was (Author: ebautistabar): Sure [~mxm], I'll take a look. JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636727#comment-14636727 ] Enrique Bautista Barahona commented on FLINK-2388: -- Sure [~mxm], I'll take a look. JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2218] Web client cannot distinguesh bet...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123674841 I don't know exactly. My changes should not influence the YARN component (as far as I can tell). The YARN tests have been unstable a lot -- I would guess it is a general issue.
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636578#comment-14636578 ] Robert Metzger commented on FLINK-2200: --- [~pgoetze], can you check if Chiwan's pull request is resolving your issue? Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35195608
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -688,16 +688,15 @@ class JobManager(
       case RequestAccumulatorResults(jobID) =>
         try {
-          val accumulatorValues: java.util.Map[String, SerializedValue[Object]] = {
-            currentJobs.get(jobID) match {
-              case Some((graph, jobInfo)) =>
-                graph.getAccumulatorsSerialized
-              case None =>
-                null // TODO check also archive
-            }
+          currentJobs.get(jobID) match {
+            case Some((graph, jobInfo)) =>
+              val accumulatorValues = graph.getAccumulatorsSerialized
+              sender() ! AccumulatorResultsFound(jobID, accumulatorValues)
+            case None =>
+              // TODO check also archive
+              sender() ! AccumulatorResultsNotFound(jobID)
--- End diff --
See the JIRA for an explanation.
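The "TODO check also archive" in the diff above can be illustrated with a small lookup that tries the running jobs first and falls back to an archive. This is only a sketch in plain Java: `AccumulatorLookupSketch`, its two maps and the `archive` method are hypothetical stand-ins, not the JobManager's actual data structures or the approach taken in the pull request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AccumulatorLookupSketch {

    // Hypothetical stand-ins for the running jobs and the job archive.
    private final Map<String, Map<String, String>> currentJobs = new HashMap<>();
    private final Map<String, Map<String, String>> archivedJobs = new HashMap<>();

    /** Registers a running job together with its accumulator values. */
    public void addCurrent(String jobId, Map<String, String> accumulators) {
        currentJobs.put(jobId, accumulators);
    }

    /** Moves a finished job from the running set into the archive. */
    public void archive(String jobId) {
        Map<String, String> accumulators = currentJobs.remove(jobId);
        if (accumulators != null) {
            archivedJobs.put(jobId, accumulators);
        }
    }

    /**
     * Looks in the running jobs first, then in the archive. An empty Optional
     * plays the role of the "not found" reply, so callers never see null.
     */
    public Optional<Map<String, String>> findAccumulators(String jobId) {
        Map<String, String> running = currentJobs.get(jobId);
        if (running != null) {
            return Optional.of(running);
        }
        return Optional.ofNullable(archivedJobs.get(jobId));
    }
}
```

Returning an explicit "found" or "not found" result, rather than null, is what keeps a caller such as the info servlet from running into a NullPointerException.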
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-123640287 Sorry for the delay. I will deploy the artifacts from this branch to the maven snapshot repository to see if everything works as expected.
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636589#comment-14636589 ] ASF GitHub Bot commented on FLINK-2200: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-123640287 Sorry for the delay. I will deploy the artifacts from this branch to the maven snapshot repository to see if everything works as expected. Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-123642852 Hi, currently this PR is not ready to merge, because this PR doesn't contain changes for #677. I'll update soon. Unfortunately I'm outside now. Maybe I can update this PR in 4-5 hours.
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636595#comment-14636595 ] ASF GitHub Bot commented on FLINK-2200: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-123642852 Hi, currently this PR is not ready to merge, because this PR doesn't contain changes for #677. I'll update soon. Unfortunately I'm outside now. Maybe I can update this PR in 4-5 hours. Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax closed FLINK-2371. -- The fixed test passed 10 Travis runs for me without failing. It seems to be stable now. AccumulatorLiveITCase fails --- Key: FLINK-2371 URL: https://issues.apache.org/jira/browse/FLINK-2371 Project: Flink Issue Type: Bug Components: Tests Reporter: Matthias J. Sax Assignee: Maximilian Michels AccumulatorLiveITCase fails regularly (though not in every run). The test relies on timing (via sleep), which does not work well on Travis. See dev-list for more details: https://mail-archives.apache.org/mod_mbox/flink-dev/201507.mbox/browser AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-685) Add support for semi-joins
[ https://issues.apache.org/jira/browse/FLINK-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636714#comment-14636714 ] Stephan Ewen commented on FLINK-685: So far, we have had the paradigm that any operation in Flink's batch API must be size safe, so it has to scale to groups beyond main memory without an {{OutOfMemoryError}}. This has been a bit of a hurdle for implementing certain operations. How about we start a class for extended operations, where we add simpler implementations? We would not give these guarantees to handle super large groups, but that is probably still good enough for a big part of the use cases. We could add your implementation to that class of extended operations. Add support for semi-joins -- Key: FLINK-685 URL: https://issues.apache.org/jira/browse/FLINK-685 Project: Flink Issue Type: New Feature Reporter: GitHub Import Assignee: pietro pinoli Priority: Minor Labels: github-import Fix For: pre-apache A semi-join is basically a join filter. One input is filtering and the other one is filtered. A tuple of the filtered input is emitted exactly once if the filtering input has one (or more) tuples with matching join keys. That means that the output of a semi-join has the same type as the filtered input and the filtering input is completely discarded. In order to support a semi-join, we need to add an additional physical execution strategy that ensures that a tuple of the filtered input is emitted only once if the filtering input has more than one tuple with matching keys. Furthermore, a couple of optimizations compared to standard joins can be done, such as storing only keys and not the full tuple of the filtering input in a hash table. 
Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/685 Created by: [fhueske|https://github.com/fhueske] Labels: enhancement, java api, runtime, Milestone: Release 0.6 (unplanned) Created at: Mon Apr 14 12:05:29 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
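The semi-join semantics described in this issue can be sketched in a few lines of plain Java. This is an in-memory illustration only, not a Flink operator: `SemiJoinSketch` and `semiJoin` are hypothetical names, and because the key set must fit in memory, the sketch is not "size safe" in the sense discussed in the comment above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class SemiJoinSketch {

    /**
     * Emits each element of the filtered input once, and only if the
     * filtering input contains at least one element with the same join key.
     */
    public static <L, R, K> List<L> semiJoin(
            List<L> filtered, Function<L, K> filteredKey,
            List<R> filtering, Function<R, K> filteringKey) {

        // Build side: store only the keys of the filtering input, not full tuples.
        Set<K> keys = new HashSet<>();
        for (R r : filtering) {
            keys.add(filteringKey.apply(r));
        }

        // Probe side: one membership test per tuple, so duplicate keys on the
        // filtering side cannot cause a tuple to be emitted more than once.
        List<L> out = new ArrayList<>();
        for (L l : filtered) {
            if (keys.contains(filteredKey.apply(l))) {
                out.add(l);
            }
        }
        return out;
    }
}
```

The output has the element type of the filtered input, and the filtering input contributes nothing but its keys, which matches the two properties called out in the description.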
[jira] [Commented] (FLINK-2371) AccumulatorLiveITCase fails
[ https://issues.apache.org/jira/browse/FLINK-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636632#comment-14636632 ] Maximilian Michels commented on FLINK-2371: --- Good to hear :) AccumulatorLiveITCase fails --- Key: FLINK-2371 URL: https://issues.apache.org/jira/browse/FLINK-2371 Project: Flink Issue Type: Bug Components: Tests Reporter: Matthias J. Sax Assignee: Maximilian Michels AccumulatorLiveITCase fails regularly (though not in every run). The test relies on timing (via sleep), which does not work well on Travis. See dev-list for more details: https://mail-archives.apache.org/mod_mbox/flink-dev/201507.mbox/browser AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636635#comment-14636635 ] ASF GitHub Bot commented on FLINK-2388: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/930#issuecomment-123649614 LGTM JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2388) JobManager should try retrieving jobs from archive
[ https://issues.apache.org/jira/browse/FLINK-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636764#comment-14636764 ] ASF GitHub Bot commented on FLINK-2388: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35204559
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
@@ -580,6 +581,34 @@ public ExecutionContext getExecutionContext() {
         return result;
     }

+    /**
+     * Returns the a stringified version of the user-defined accumulators.
+     * @return an Array containing the StringifiedAccumulatorResult objects
+     */
+    public StringifiedAccumulatorResult[] getAccumulatorResultsStringified() {
+
+        Map<String, Accumulator<?, ?>> accumulatorMap = aggregateUserAccumulators();
+
+        int num = accumulatorMap.size();
+        StringifiedAccumulatorResult[] resultStrings = new StringifiedAccumulatorResult[num];
+
+        int i = 0;
+        for (Map.Entry<String, Accumulator<?, ?>> entry : accumulatorMap.entrySet()) {
+
+            StringifiedAccumulatorResult result;
+            Accumulator<?, ?> value = entry.getValue();
+            if (value != null) {
+                result = new StringifiedAccumulatorResult(entry.getKey(), value.getClass().getSimpleName(), value.toString());
+            } else {
+                result = new StringifiedAccumulatorResult(entry.getKey(), null, null);
--- End diff --
StringifiedAccumulatorResult expects a String. I thought it's better to pass a null String instead of null.
JobManager should try retrieving jobs from archive -- Key: FLINK-2388 URL: https://issues.apache.org/jira/browse/FLINK-2388 Project: Flink Issue Type: Task Components: JobManager Reporter: Enrique Bautista Barahona I was following the quickstart guide with the WordCount example and, when I entered the analyze page for the job, nothing came up. Apparently the JobManagerInfoServlet fails with a NullPointerException. 
I've been reading the code and I've seen the problem is in the processing of the RequestAccumulatorResultsStringified message in JobManager. There's a TODO where the accumulators should be retrieved from the archive. As I wanted to know more about Flink internals, I decided to try and fix it. I've later seen that there's currently ongoing work in that part of the code, so I guess maybe it's not needed, but if you want I could submit a PR. If you have already taken it into account and will solve it shortly, please feel free to close the issue. If you want to take a look, the commit is here: https://github.com/ebautistabar/flink/commit/8536352b21fb6c78ad8840b0397509df04358c6b -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35204559
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
@@ -580,6 +581,34 @@ public ExecutionContext getExecutionContext() {
         return result;
     }

+    /**
+     * Returns the a stringified version of the user-defined accumulators.
+     * @return an Array containing the StringifiedAccumulatorResult objects
+     */
+    public StringifiedAccumulatorResult[] getAccumulatorResultsStringified() {
+
+        Map<String, Accumulator<?, ?>> accumulatorMap = aggregateUserAccumulators();
+
+        int num = accumulatorMap.size();
+        StringifiedAccumulatorResult[] resultStrings = new StringifiedAccumulatorResult[num];
+
+        int i = 0;
+        for (Map.Entry<String, Accumulator<?, ?>> entry : accumulatorMap.entrySet()) {
+
+            StringifiedAccumulatorResult result;
+            Accumulator<?, ?> value = entry.getValue();
+            if (value != null) {
+                result = new StringifiedAccumulatorResult(entry.getKey(), value.getClass().getSimpleName(), value.toString());
+            } else {
+                result = new StringifiedAccumulatorResult(entry.getKey(), null, null);
--- End diff --
StringifiedAccumulatorResult expects a String. I thought it's better to pass a null String instead of null.
[GitHub] flink pull request: [FLINK-2388] return AccumulatorResultsNotFound...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/930#discussion_r35204984
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -687,61 +687,23 @@ class JobManager(
     message match {
       case RequestAccumulatorResults(jobID) =>
-        try {
--- End diff --
You're right, this should stay in there.
[jira] [Commented] (FLINK-2205) Confusing entries in JM Webfrontend Job Configuration section
[ https://issues.apache.org/jira/browse/FLINK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636798#comment-14636798 ] ASF GitHub Bot commented on FLINK-2205: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/927#issuecomment-123700358 The job manager UI is under rework right now, there is a new version coming up, see here: https://github.com/apache/flink/tree/master/flink-runtime-web This fix (if going to the old UI) would be only temporary... Confusing entries in JM Webfrontend Job Configuration section - Key: FLINK-2205 URL: https://issues.apache.org/jira/browse/FLINK-2205 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Labels: starter The Job Configuration section of the job history / analyze page of the JobManager webinterface contains two confusing entries:
- {{Number of execution retries}} is actually the maximum number of retries and should be renamed accordingly. The default value is -1 and should be changed to "deactivated" (or 0).
- {{Job parallelism}}, which is -1 by default. A parallelism of -1 is not very meaningful. It would be better to show something like "auto".
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2205] Fix confusing entries in JM UI Jo...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/927#issuecomment-123700358 The job manager UI is under rework right now, there is a new version coming up, see here: https://github.com/apache/flink/tree/master/flink-runtime-web This fix (if going to the old UI) would be only temporary...
[GitHub] flink pull request: FLINK-2322 Unclosed stream may leak resource
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/928#issuecomment-123700349 Maybe your vim is set to replace tabs with spaces, but in the files that you changed there are definitely spaces now.
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636797#comment-14636797 ] ASF GitHub Bot commented on FLINK-2322: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/928#issuecomment-123700349 Maybe your vim is set to replace tabs with spaces, but in the files that you changed there are definitely spaces now. Unclosed stream may leak resource - Key: FLINK-2322 URL: https://issues.apache.org/jira/browse/FLINK-2322 Project: Flink Issue Type: Bug Reporter: Ted Yu Labels: starter In UdfAnalyzerUtils.java : {code} ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader() .getResourceAsStream(internalClassName.replace('.', '/') + ".class")); {code} The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode(). In ParameterTool#fromPropertiesFile(): {code} props.load(new FileInputStream(propertiesFile)); {code} The FileInputStream should be closed before returning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
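One common way to close such a stream on every exit path is try-with-resources. The sketch below shows the pattern for the properties-file case; `ClosePropertiesStream` and `loadProperties` are hypothetical names used for illustration, and this is not necessarily the change made in the pull request.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ClosePropertiesStream {

    /** Loads a properties file; the stream is closed even if load() throws. */
    public static Properties loadProperties(String path) throws IOException {
        Properties props = new Properties();
        // try-with-resources calls in.close() when the block exits,
        // whether it exits normally or with an exception.
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```

The same pattern applies to the stream returned by getResourceAsStream(), with an additional null check, since that method returns null when the resource is not found.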
[GitHub] flink pull request: [FLINK-1658] Rename AbstractEvent to AbstractT...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/929
[jira] [Commented] (FLINK-1658) Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent
[ https://issues.apache.org/jira/browse/FLINK-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636828#comment-14636828 ] ASF GitHub Bot commented on FLINK-1658: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/929 Rename AbstractEvent to AbstractTaskEvent and AbstractJobEvent -- Key: FLINK-1658 URL: https://issues.apache.org/jira/browse/FLINK-1658 Project: Flink Issue Type: Improvement Components: Distributed Runtime, Local Runtime Reporter: Gyula Fora Assignee: Matthias J. Sax Priority: Trivial The same name is used for different event classes in the runtime which can cause confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2218) Web client cannot distinguesh between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636854#comment-14636854 ] ASF GitHub Bot commented on FLINK-2218: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/904#issuecomment-123707491 Yes, please open a JIRA. Merging. Web client cannot distinguesh between Flink options and program arguments - Key: FLINK-2218 URL: https://issues.apache.org/jira/browse/FLINK-2218 Project: Flink Issue Type: Improvement Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Matthias J. Sax WebClient has only one input field for arguments. This field is used for both Flink options (e.g., `-p`) and program arguments. Thus, the supported Flink options restrict the possible program arguments. CliFrontend, in contrast, can distinguish the two, so `-p` can also be used as a program argument. Solution: add a second input field for Flink options to WebClient -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2218) Web client cannot distinguesh between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-2218. --- Resolution: Fixed Web client cannot distinguesh between Flink options and program arguments - Key: FLINK-2218 URL: https://issues.apache.org/jira/browse/FLINK-2218 Project: Flink Issue Type: Improvement Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Matthias J. Sax WebClient has only one input field for arguments. This field is used for both Flink options (e.g., `-p`) and program arguments. Thus, the supported Flink options restrict the possible program arguments. CliFrontend, in contrast, can distinguish the two, so `-p` can also be used as a program argument. Solution: add a second input field for Flink options to WebClient -- This message was sent by Atlassian JIRA (v6.3.4#6332)
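The proposed solution, two separate input fields, can be sketched as follows. The idea is that tokens from the options field are placed before the jar and tokens from the program-arguments field after it, so a `-p` in the second field is never interpreted as a Flink option. `SubmissionArgsSketch`, `buildCommand` and `tokenize` are hypothetical names, and the whitespace tokenizer is a simplification that ignores quoting; this is not the WebClient's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SubmissionArgsSketch {

    /** Splits one input field on whitespace; an empty or null field yields no tokens. */
    static List<String> tokenize(String field) {
        String trimmed = field == null ? "" : field.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /**
     * Builds an argument list of the form [options..., jar, programArgs...].
     * Only tokens before the jar are meant to be treated as Flink options.
     */
    public static List<String> buildCommand(String flinkOptions, String jarPath, String programArgs) {
        List<String> command = new ArrayList<>();
        command.addAll(tokenize(flinkOptions));
        command.add(jarPath);
        command.addAll(tokenize(programArgs));
        return command;
    }
}
```

For example, `buildCommand("-p 4", "job.jar", "-p input.txt")` keeps the first `-p` as an option and passes the second one through to the program untouched.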
[GitHub] flink pull request: [FLINK-2332] [runtime] Adds leader session IDs...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/917#issuecomment-123720593 The mechanism looks good, all in all. Some comments:
- I think it makes the code more understandable, if the `decorateMessage()` method would be called something like `attachSession()`, or so. Is the decoration used
- We have decided to gradually transition the runtime to Java, as this mixture of languages is making it very clumsy in many parts. All other changes followed the pattern to add new classes only in Java. Are there principle reasons to not do this here as well? Especially by adding classes that are at the core of this new mechanism (like `RequiresLeaderSessionID`) in Scala, we effectively cement this language blend.
- In prior refactoring, we changed it such that JobManager, TaskManager, etc do not use mixins any more. A big part of the decision were clean logs and Java interoperability. This patch reverts this effort. Is there any principle reason for that?
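For readers unfamiliar with the idea behind `decorateMessage()` / `attachSession()`, the following is a loose sketch of what attaching a leader session ID to a message could look like. It is written in plain Java under assumptions: `LeaderSessionSketch`, `SessionMessage` and `unwrap` are hypothetical names, and the pull request's actual classes and behavior may differ.

```java
import java.util.UUID;

public class LeaderSessionSketch {

    /** Envelope pairing a payload with the leader session it was sent under. */
    public static final class SessionMessage {
        public final UUID sessionId;
        public final Object payload;

        public SessionMessage(UUID sessionId, Object payload) {
            this.sessionId = sessionId;
            this.payload = payload;
        }
    }

    /** Sender side: wrap the payload with the currently known session ID. */
    public static SessionMessage attachSession(UUID currentSession, Object payload) {
        return new SessionMessage(currentSession, payload);
    }

    /**
     * Receiver side: returns the payload if the session IDs match, or null
     * when the message belongs to a different (for example, stale) session.
     */
    public static Object unwrap(UUID currentSession, SessionMessage message) {
        return currentSession.equals(message.sessionId) ? message.payload : null;
    }
}
```

A receiver that checks the ID this way can drop messages addressed to a previous leader instead of acting on them.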