[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335824 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java --- @@ -131,6 +133,35 @@ private SerializedValue<StateHandle<?>> operatorState; + /* Lock for updating the accumulators atomically. */ --- End diff -- If you consider comment lines as empty (non-code) lines, then it follows the class's style. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335830 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java --- @@ -172,13 +173,20 @@ /** The library cache, from which the task can request its required JAR files */ private final LibraryCacheManager libraryCache; - + /** The cache for user-defined files that the invokable requires */ private final FileCache fileCache; - + /** The gateway to the network stack, which handles inputs and produced results */ private final NetworkEnvironment network; + /** The registry of this task which enables live reporting of accumulators */ + private final AccumulatorRegistry accumulatorRegistry; --- End diff -- Yes absolutely. This is an artifact of a former design.
[jira] [Commented] (FLINK-2292) Report accumulators periodically while job is running
[ https://issues.apache.org/jira/browse/FLINK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621978#comment-14621978 ] ASF GitHub Bot commented on FLINK-2292: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335830 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java --- @@ -172,13 +173,20 @@ /** The library cache, from which the task can request its required JAR files */ private final LibraryCacheManager libraryCache; - + /** The cache for user-defined files that the invokable requires */ private final FileCache fileCache; - + /** The gateway to the network stack, which handles inputs and produced results */ private final NetworkEnvironment network; + /** The registry of this task which enables live reporting of accumulators */ + private final AccumulatorRegistry accumulatorRegistry; --- End diff -- Yes absolutely. This is an artifact of a former design.

Report accumulators periodically while job is running
Key: FLINK-2292
URL: https://issues.apache.org/jira/browse/FLINK-2292
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Fix For: 0.10

Accumulators should be sent periodically, as part of the heartbeat that sends metrics. This allows them to be updated in real time.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
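The heartbeat-piggybacking idea in the issue description can be sketched as below. All class and method names here (`HeartbeatReporting`, `AccumulatorSnapshot`, `nextHeartbeat`, ...) are hypothetical illustrations and do not correspond to Flink's actual API; the point is only that each heartbeat carries one immutable snapshot of the live counters, so no extra reporting messages are needed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of reporting accumulators as part of a periodic
 * heartbeat message. The names are illustrative, not Flink's classes.
 */
public class HeartbeatReporting {

    /** An immutable snapshot of the current accumulator values. */
    static final class AccumulatorSnapshot {
        final Map<String, Long> values;
        AccumulatorSnapshot(Map<String, Long> values) {
            // copy, so later updates don't mutate an in-flight message
            this.values = new HashMap<>(values);
        }
    }

    /** A heartbeat message that carries the accumulator snapshot. */
    static final class Heartbeat {
        final String taskManagerId;
        final AccumulatorSnapshot accumulators;
        Heartbeat(String taskManagerId, AccumulatorSnapshot accumulators) {
            this.taskManagerId = taskManagerId;
            this.accumulators = accumulators;
        }
    }

    private final Map<String, Long> liveAccumulators = new HashMap<>();

    /** Tasks update counters as they run. */
    void add(String name, long delta) {
        liveAccumulators.merge(name, delta, Long::sum);
    }

    /** Called on every heartbeat tick: snapshot once, send one message. */
    Heartbeat nextHeartbeat(String taskManagerId) {
        return new Heartbeat(taskManagerId, new AccumulatorSnapshot(liveAccumulators));
    }
}
```

Because the snapshot rides on a message that is sent anyway, the update rate is naturally tied to the heartbeat interval, which matches the design discussed later in this thread.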
[jira] [Commented] (FLINK-2292) Report accumulators periodically while job is running
[ https://issues.apache.org/jira/browse/FLINK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621981#comment-14621981 ] ASF GitHub Bot commented on FLINK-2292: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335963 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/accumulators/AccumulatorRegistry.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.accumulators; + +import org.apache.flink.api.common.JobID; +import org.apache.flink.api.common.accumulators.Accumulator; +import org.apache.flink.api.common.accumulators.LongCounter; +import org.apache.flink.runtime.executiongraph.ExecutionAttemptID; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + + +/** + * Main accumulator registry which encapsulates internal and user-defined accumulators. + */ +public class AccumulatorRegistry { --- End diff -- True. Originally, those two were a bit different. 
The API can be changed to be the same, but I find keeping the two classes useful because it helps to differentiate accumulator context by type.
[jira] [Commented] (FLINK-2293) Division by Zero Exception
[ https://issues.apache.org/jira/browse/FLINK-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621986#comment-14621986 ] Andra Lungu commented on FLINK-2293: Hi [~StephanEwen], Thank you for your patience. Indeed I had the wrong update for Flink and then I had to fix some minor glitches in my code :) But everything is perfect now! I tested it on all the jobs that previously failed with this exception.

Division by Zero Exception
Key: FLINK-2293
URL: https://issues.apache.org/jira/browse/FLINK-2293
Project: Flink
Issue Type: Bug
Components: Local Runtime
Affects Versions: 0.9, 0.10
Reporter: Andra Lungu
Assignee: Stephan Ewen
Priority: Critical
Fix For: 0.10, 0.9.1

I am basically running an algorithm that simulates a Gather Sum Apply Iteration that performs Triangle Count (Why simulate it? Because you just need a superstep - useless overhead if you use the runGatherSumApply function in Graph). What happens, at a high level:
1). Select neighbors with ID greater than the one corresponding to the current vertex;
2). Propagate the received values to neighbors with higher ID;
3). Compute the number of triangles by checking whether trgVertex.getValue().get(srcVertex.getId());
As you can see, I *do not* perform any division at all; code is here: https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java
Now for small graphs, 50MB max, the computation finishes nicely with the correct result. For a 10GB graph, however, I got this:

java.lang.ArithmeticException: / by zero
at org.apache.flink.runtime.operators.hash.MutableHashTable.insertIntoTable(MutableHashTable.java:836)
at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:819)
at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:544)
at org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104)
at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:173)
at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:722)

see the full log here: https://gist.github.com/andralungu/984774f6348269df7951

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
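The report above is a good reminder that a "/ by zero" can come from runtime internals rather than user code. The following is a hypothetical illustration (not Flink's actual `MutableHashTable` code) of the general failure mode: when a bucket count is derived from available memory, memory pressure can drive it to zero, and a later modulo on it throws exactly this exception. A `Math.max(1, ...)` style guard is one simple defense.

```java
/**
 * Hypothetical sketch (NOT Flink's actual hash-table code) of how sizing a
 * bucket array from available memory can yield zero buckets and make a
 * subsequent modulo throw ArithmeticException, even though the user's job
 * performs no division at all.
 */
public class BucketSizing {

    /** Naive sizing: breaks when availableBytes < bucketBytes. */
    static int bucketForNaive(int hashCode, int availableBytes, int bucketBytes) {
        int numBuckets = availableBytes / bucketBytes; // 0 under memory pressure
        return (hashCode & 0x7fffffff) % numBuckets;   // "/ by zero" here
    }

    /** Guarded sizing: always keeps at least one bucket. */
    static int bucketForGuarded(int hashCode, int availableBytes, int bucketBytes) {
        int numBuckets = Math.max(1, availableBytes / bucketBytes);
        return (hashCode & 0x7fffffff) % numBuckets;
    }
}
```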
[jira] [Resolved] (FLINK-2293) Division by Zero Exception
[ https://issues.apache.org/jira/browse/FLINK-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andra Lungu resolved FLINK-2293. Resolution: Fixed Sorry for reopening it when it was already fixed. Never hurts to make sure that it also works in the environment where it previously failed :). Thanks again!
[jira] [Commented] (FLINK-2293) Division by Zero Exception
[ https://issues.apache.org/jira/browse/FLINK-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621997#comment-14621997 ] Stephan Ewen commented on FLINK-2293: - No worries, just glad the bug is fixed :-)
[jira] [Commented] (FLINK-2292) Report accumulators periodically while job is running
[ https://issues.apache.org/jira/browse/FLINK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621969#comment-14621969 ] ASF GitHub Bot commented on FLINK-2292: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335420 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -1017,6 +1040,9 @@ object TaskManager { val HEARTBEAT_INTERVAL: FiniteDuration = 5000 milliseconds + /* Interval to send accumulators to the job manager */ + val ACCUMULATOR_REPORT_INTERVAL: FiniteDuration = 10 seconds --- End diff -- Removed. The update interval corresponds to the heartbeat interval.
[jira] [Commented] (FLINK-2292) Report accumulators periodically while job is running
[ https://issues.apache.org/jira/browse/FLINK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621965#comment-14621965 ] ASF GitHub Bot commented on FLINK-2292: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335345 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/runtime/io/StreamingAbstractRecordReader.java --- @@ -57,6 +58,12 @@ private final BarrierBuffer barrierBuffer; + /** +* Counters for the number of bytes read and records processed. +*/ + private LongCounter numRecordsRead = null; --- End diff -- Thanks, I didn't know.
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/896#issuecomment-120302065

Looks like this change breaks the YARN integration. The YARN WordCount no longer works.
Should be working again now.

It would be good if the accumulator update interval was configurable. Edit: Is that the same value as the heartbeats?
Yes, that was a design rationale to keep the message count low. We could only send the accumulators in every Nth heartbeat and let it be configurable.

There is a potential modification conflict: drawing a snapshot for serialization and registering a new accumulator can lead to a ConcurrentModificationException in the drawing of the snapshot.
I conducted tests with concurrent insertions and deletions and found that only concurrent removals cause ConcurrentModificationExceptions. Removals are not allowed for accumulators. Anyways, we could switch to a synchronized or copy-on-write hash map. If we do, I would opt for the latter.

The naming of the accumulators refers sometimes to flink vs. user-defined, and sometimes to internal vs. external. Can we make this consistent? I actually like the flink vs. user-defined naming better.
Then let's stick to the flink vs. user-defined naming scheme.

I think the code would be simpler if the registry simply always had a created map for internal and external accumulators. Also, a reporter object would help.
I agree that would be a nicer way of dealing with the API.
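The snapshot-vs-registration conflict discussed in this comment can be reproduced deterministically. The sketch below stands in for the real concurrent case with a single-threaded interleaving: registering a new entry in a plain `HashMap` while its iterator is mid-snapshot throws `ConcurrentModificationException`, whereas a `ConcurrentHashMap` iterator is weakly consistent and tolerates the put. The map contents and the notion of "registration during snapshot" are illustrative only.

```java
import java.util.Iterator;
import java.util.Map;

/**
 * Demonstrates why drawing a snapshot of a plain HashMap can fail when a new
 * accumulator is registered mid-iteration, and why a concurrent map avoids
 * the problem. Single-threaded stand-in for the real concurrent scenario.
 */
public class SnapshotDemo {

    /**
     * Iterates the map and registers a new entry mid-iteration.
     * Returns true if the snapshot failed with a CME.
     */
    static boolean snapshotFails(Map<String, Long> accumulators) {
        accumulators.put("numRecordsRead", 1L);
        accumulators.put("numBytesRead", 2L);
        try {
            Iterator<Map.Entry<String, Long>> it = accumulators.entrySet().iterator();
            it.next();
            // a "registration" arriving while the snapshot is being drawn
            accumulators.put("userCounter", 3L);
            it.next(); // HashMap detects the structural change here
            return false;
        } catch (java.util.ConcurrentModificationException e) {
            return true;
        }
    }
}
```

This supports the point made in the thread: since removals never happen for accumulators, either a synchronized map or a concurrent map is sufficient to make snapshotting safe.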
[jira] [Resolved] (FLINK-2339) Prevent asynchronous checkpoint calls from overtaking each other
[ https://issues.apache.org/jira/browse/FLINK-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2339.
Resolution: Fixed

Fixed in cbde2c2a3d71e17990d76d603e1bb6d275c888be

Prevent asynchronous checkpoint calls from overtaking each other
Key: FLINK-2339
URL: https://issues.apache.org/jira/browse/FLINK-2339
Project: Flink
Issue Type: Improvement
Components: TaskManager
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 0.10

Currently, when checkpoint state materialization takes very long, and the checkpoint interval is low, the asynchronous calls to trigger checkpoints (on the sources) could overtake prior calls. We can fix that by making sure that all calls are dispatched in order by the same thread, rather than spawning a new thread for each call.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
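The fix described in the issue (dispatch all trigger calls in order on one thread instead of spawning a thread per call) can be sketched with a single-threaded executor. The class and method names below are illustrative, not Flink's actual checkpointing code; the ordering guarantee comes from `Executors.newSingleThreadExecutor`, which executes submitted tasks sequentially in submission order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of in-order dispatch of asynchronous checkpoint-trigger calls.
 * One dedicated thread processes all triggers, so a slow checkpoint cannot
 * be overtaken by a later one. Names are illustrative, not Flink's code.
 */
public class OrderedDispatch {

    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    private final List<Long> triggeredCheckpoints = new CopyOnWriteArrayList<>();

    /** Instead of spawning a new thread per call, enqueue on the one thread. */
    void triggerCheckpoint(long checkpointId) {
        dispatcher.execute(() -> triggeredCheckpoints.add(checkpointId));
    }

    /** Drain the dispatcher and return the order in which triggers ran. */
    List<Long> shutdownAndGet() throws InterruptedException {
        dispatcher.shutdown();
        dispatcher.awaitTermination(10, TimeUnit.SECONDS);
        return triggeredCheckpoints;
    }
}
```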
[jira] [Closed] (FLINK-2339) Prevent asynchronous checkpoint calls from overtaking each other
[ https://issues.apache.org/jira/browse/FLINK-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2339.
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/896#issuecomment-120305327 The reporter object has the advantage that it is more easily extensible. At some point we will want to differentiate between locally received bytes, and remotely received bytes, for example. Or include wait/stall times to detect whether a task is throttled by a slower predecessor.
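A reporter object along the lines Stephan describes might look like the sketch below. The class and method names (`TaskIoReporter`, `reportStallNanos`, ...) are hypothetical, chosen only to show the extensibility argument: new dimensions such as local vs. remote bytes or stall time can be added as methods without touching the registry API that tasks already use.

```java
/**
 * Hypothetical sketch of an extensible I/O reporter object, as suggested in
 * the review thread above. All names here are illustrative, not Flink's API.
 */
public class TaskIoReporter {

    private long localBytesRead;
    private long remoteBytesRead;
    private long stallNanos;

    /** Bytes received from a local channel. */
    void reportLocalBytes(long n) { localBytesRead += n; }

    /** Bytes received over the network. */
    void reportRemoteBytes(long n) { remoteBytesRead += n; }

    /** Time spent waiting on a slower predecessor (throttling indicator). */
    void reportStallNanos(long n) { stallNanos += n; }

    long totalBytesRead() { return localBytesRead + remoteBytesRead; }

    long totalStallNanos() { return stallNanos; }
}
```

The design point is that consumers of the aggregate view (e.g. `totalBytesRead()`) are unaffected when a new, finer-grained counter is introduced.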
[jira] [Commented] (FLINK-2180) Streaming iterate test fails spuriously
[ https://issues.apache.org/jira/browse/FLINK-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621938#comment-14621938 ] ASF GitHub Bot commented on FLINK-2180: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/802#issuecomment-120276079 I am currently preparing a new PR with a rework of how the iterations are constructed in the StreamGraph; it will be ready today. It will solve many issues that we currently have and will make the iterations much more robust. I am also adding a bunch of new tests. I think this is still a good addition because some of my tests will depend on the maxWaitTime. I propose to add your changes afterwards. I volunteer to port your commits to my changes :)

Streaming iterate test fails spuriously
Key: FLINK-2180
URL: https://issues.apache.org/jira/browse/FLINK-2180
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Gábor Hermann

Following output seen occasionally:

Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.667 sec FAILURE! - in org.apache.flink.streaming.api.IterateTest
test(org.apache.flink.streaming.api.IterateTest) Time elapsed: 3.662 sec FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.flink.streaming.api.IterateTest.test(IterateTest.java:154)

See: https://travis-ci.org/mbalassi/flink/jobs/65803465

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335325 --- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java --- @@ -60,6 +60,7 @@ public static void main(String[] args) throws Exception { // set up the execution environment final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + env.setParallelism(1); --- End diff -- Thanks for spotting it.
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335337 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java --- @@ -172,13 +173,20 @@ /** The library cache, from which the task can request its required JAR files */ private final LibraryCacheManager libraryCache; - + /** The cache for user-defined files that the invokable requires */ private final FileCache fileCache; - + /** The gateway to the network stack, which handles inputs and produced results */ private final NetworkEnvironment network; + /** The registry of this task which enables live reporting of accumulators */ + private final AccumulatorRegistry accumulatorRegistry; + + public AccumulatorRegistry getAccumulatorRegistry() { --- End diff -- I agree.
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335380 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -306,99 +308,100 @@ extends Actor with ActorLogMessages with ActorSynchronousLogging {

 if (!isConnected) {
   log.debug(s"Dropping message $message because the TaskManager is currently " +
     "not connected to a JobManager.")
-}
+} else {
+
+  // we order the messages by frequency, to make sure the code paths for matching
+  // are as short as possible
+  message match {
+
+    // tell the task about the availability of a new input partition
+    case UpdateTaskSinglePartitionInfo(executionID, resultID, partitionInfo) =>
+      updateTaskInputPartitions(executionID, List((resultID, partitionInfo)))
+
+    // tell the task about the availability of some new input partitions
+    case UpdateTaskMultiplePartitionInfos(executionID, partitionInfos) =>
+      updateTaskInputPartitions(executionID, partitionInfos)
+
+    // discards intermediate result partitions of a task execution on this TaskManager
+    case FailIntermediateResultPartitions(executionID) =>
+      log.info("Discarding the results produced by task execution " + executionID)
+      if (network.isAssociated) {
+        try {
+          network.getPartitionManager.releasePartitionsProducedBy(executionID)
+        } catch {
+          case t: Throwable => killTaskManagerFatal(
+            "Fatal leak: Unable to release intermediate result partition data", t)
+        }
+      }
+
+    // notifies the TaskManager that the state of a task has changed.
+    // the TaskManager informs the JobManager and cleans up in case the transition
+    // was into a terminal state, or in case the JobManager cannot be informed of the
+    // state transition
+    case updateMsg @ UpdateTaskExecutionState(taskExecutionState: TaskExecutionState) =>
+
+      // we receive these from our tasks and forward them to the JobManager

--- End diff -- This is a bug I discovered while reading through the code. The change prevents messages from being processed when the TaskManager is not connected to the JobManager. If you look at line 307, the code logs that it will skip the message, but then continues to process it anyway. If you want, I can open a separate pull request.
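The shape of the guard bug described above is easy to reproduce outside of Flink. The following plain-Java sketch (all names hypothetical, not the actual TaskManager code) shows why the missing `else` matters: the `if` branch only logs a warning, so without an `else` the message handling falls through and runs regardless of the connection state.

```java
public class GuardDemo {
    static boolean connected = false;
    static int processed = 0;

    // Buggy version: logs that the message is dropped, but falls through
    // and processes it anyway.
    static void onMessageBuggy(String msg) {
        if (!connected) {
            System.out.println("Dropping message " + msg);
        }
        processed++; // bug: runs even when not connected
    }

    // Fixed version mirroring the patch: processing moves into the else branch,
    // so a disconnected TaskManager really does drop the message.
    static void onMessageFixed(String msg) {
        if (!connected) {
            System.out.println("Dropping message " + msg);
        } else {
            processed++;
        }
    }

    public static void main(String[] args) {
        onMessageBuggy("msg");
        System.out.println("processed despite warning: " + processed);
    }
}
```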
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335330 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/RuntimeEnvironment.java --- @@ -212,19 +218,19 @@ public InputGate getInputGate(int index) { return inputGates; }

- @Override
- public void reportAccumulators(Map<String, Accumulator<?, ?>> accumulators) {
-   AccumulatorEvent evt;
-   try {
-     evt = new AccumulatorEvent(getJobID(), accumulators);
-   }
-   catch (IOException e) {
-     throw new RuntimeException("Cannot serialize accumulators to send them to JobManager", e);
-   }
-
-   ReportAccumulatorResult accResult = new ReportAccumulatorResult(jobId, executionId, evt);
-   jobManagerActor.tell(accResult, ActorRef.noSender());
- }
+// @Override

--- End diff -- Yes.
[GitHub] flink pull request: [FLINK-2292][FLINK-1573] add live per-task acc...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335342 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java --- @@ -237,12 +246,13 @@ public Task(TaskDeploymentDescriptor tdd, this.memoryManager = checkNotNull(memManager); this.ioManager = checkNotNull(ioManager); - this.broadcastVariableManager =checkNotNull(bcVarManager); + this.broadcastVariableManager = checkNotNull(bcVarManager); + this.accumulatorRegistry = accumulatorRegistry; this.jobManager = checkNotNull(jobManagerActor); this.taskManager = checkNotNull(taskManagerActor); this.actorAskTimeout = new Timeout(checkNotNull(actorAskTimeout)); - --- End diff -- No auto-formatting; it just removes redundant tabs on empty lines and adds a missing space.
[jira] [Commented] (FLINK-2292) Report accumulators periodically while job is running
[ https://issues.apache.org/jira/browse/FLINK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621964#comment-14621964 ] ASF GitHub Bot commented on FLINK-2292: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/896#discussion_r34335342 Report accumulators periodically while job is running - Key: FLINK-2292 URL: https://issues.apache.org/jira/browse/FLINK-2292 Project: Flink Issue Type: Sub-task Components: JobManager, TaskManager Reporter: Maximilian Michels Assignee: Maximilian Michels Fix For: 0.10 Accumulators should be sent periodically, as part of the heartbeat that sends metrics. This allows them to be updated in real time.
[GitHub] flink pull request: [doc] Fix wordcount example in YARN setup
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/897#issuecomment-120279506 +1 to merge
[GitHub] flink pull request: [FLINK-2310] Add an Adamic Adar Similarity exa...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/892#issuecomment-120319905 Hi @shghatge! I agree, let's deal with the approximate version as a separate issue. In the end though, it would be nice to have a single library method and an input parameter to decide whether the computation should be exact or approximate. Regarding the bloom filter, the idea is for each vertex to build a bloom filter with its neighbors and send it to its neighbors. Then, each vertex can compare its own neighborhood (the exact one) with the received bloom filter neighborhoods. Take a look at how approximate Jaccard is computed in the okapi library [here](https://github.com/grafos-ml/okapi/blob/master/src/main/java/ml/grafos/okapi/graphs/similarity/Jaccard.java) (class `JaccardApproximation`). Let me know if you have more questions :)
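To make the bloom-filter idea above concrete, here is a minimal self-contained Java sketch (the class, constants, and hash scheme are all hypothetical, not Okapi's or Gelly's API): one vertex summarizes its neighbor set in a bloom filter and sends it along; the receiving vertex tests its own exact neighbors against that filter to estimate the size of the common neighborhood. Bloom filters produce only false positives, so the estimate may overcount but never undercount.

```java
import java.util.*;

public class NeighborhoodBloom {
    static final int BITS = 1024;  // filter size (bits)
    static final int HASHES = 3;   // hash functions per element

    // Build a bloom filter over a vertex's neighbor ids.
    static BitSet buildFilter(Set<Long> neighbors) {
        BitSet bits = new BitSet(BITS);
        for (long n : neighbors) {
            for (int i = 0; i < HASHES; i++) {
                bits.set(hash(n, i));
            }
        }
        return bits;
    }

    // Simple seeded mixing hash mapped into the filter's bit range.
    static int hash(long value, int seed) {
        long h = value * 0x9E3779B97F4A7C15L + seed * 0xC2B2AE3D27D4EB4FL;
        h ^= h >>> 32;
        return (int) Math.floorMod(h, (long) BITS);
    }

    // Estimate |A ∩ B| by testing A's exact neighbors against B's bloom filter.
    static int estimateCommonNeighbors(Set<Long> exactNeighbors, BitSet otherFilter) {
        int count = 0;
        for (long n : exactNeighbors) {
            boolean maybeContained = true;
            for (int i = 0; i < HASHES; i++) {
                if (!otherFilter.get(hash(n, i))) {
                    maybeContained = false;
                    break;
                }
            }
            if (maybeContained) count++; // may overcount due to false positives
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet filter = buildFilter(new HashSet<>(Arrays.asList(2L, 3L, 4L)));
        Set<Long> exact = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        System.out.println("estimated common neighbors: "
            + estimateCommonNeighbors(exact, filter));
    }
}
```

In a Gelly implementation, `buildFilter` would run in the vertex value computation and the filter would travel on the out-edges in place of the full neighbor list.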
[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example
[ https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622039#comment-14622039 ] ASF GitHub Bot commented on FLINK-2310: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/892#issuecomment-120319905 Add an Adamic-Adar Similarity example - Key: FLINK-2310 URL: https://issues.apache.org/jira/browse/FLINK-2310 Project: Flink Issue Type: Task Components: Gelly Reporter: Andra Lungu Assignee: Shivani Ghatge Priority: Minor Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a set of nodes. However, instead of counting the common neighbors and dividing them by the total number of neighbors, the similarity is weighted according to the vertex degrees. In particular, it's equal to log(1/numberOfEdges). The Adamic-Adar algorithm can be broken into three steps: 1). For each vertex, compute the log of its inverse degrees (with the formula above) and set it as the vertex value. 2). Each vertex will then send this new computed value along with a list of neighbors to the targets of its out-edges 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of log(1/k_n) (CN is the set of all common neighbors of two vertices x, y. k_n is the degree of node n). See [2] Prerequisites: - Full understanding of the Jaccard Similarity Measure algorithm - Reading the associated literature: [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf [2] http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar
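For the exact variant, the per-pair index from step 3 can be sketched over a plain adjacency map (a hypothetical, non-Gelly representation; the sample graph and method names are illustrative only). Note that the literature linked from the issue weights each common neighbor n by 1/log(k_n), which is what this sketch uses; the issue text's log(1/k_n) reads like a transcription of the same formula.

```java
import java.util.*;

public class AdamicAdar {
    // Small undirected sample graph as an adjacency map (hypothetical data).
    static Map<String, Set<String>> sampleGraph() {
        Map<String, Set<String>> adj = new HashMap<>();
        adj.put("a", new HashSet<>(Arrays.asList("c", "d")));
        adj.put("b", new HashSet<>(Arrays.asList("c", "d")));
        adj.put("c", new HashSet<>(Arrays.asList("a", "b", "d")));
        adj.put("d", new HashSet<>(Arrays.asList("a", "b", "c")));
        return adj;
    }

    // Exact Adamic-Adar index of x and y:
    // sum over common neighbors n of 1 / log(degree(n)).
    static double index(Map<String, Set<String>> adj, String x, String y) {
        Set<String> common = new HashSet<>(adj.get(x));
        common.retainAll(adj.get(y)); // CN: common neighbors of x and y
        double sum = 0.0;
        for (String n : common) {
            sum += 1.0 / Math.log(adj.get(n).size());
        }
        return sum;
    }

    public static void main(String[] args) {
        // a and b share neighbors c and d, each of degree 3
        System.out.println(index(sampleGraph(), "a", "b"));
    }
}
```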
[GitHub] flink pull request: [Flink-gelly] Added local clustering coefficie...
GitHub user samk3211 opened a pull request: https://github.com/apache/flink/pull/898 [Flink-gelly] Added local clustering coefficient (for directed graph) example in the tutorialExamples subfolder. You can merge this pull request into a Git repository by running: $ git pull https://github.com/samk3211/flink tutorial Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/898.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #898 commit 1cacbd8ab58e431c32db0c2bef2538df81e088b1 Author: Samia samia Date: 2015-07-10T09:25:46Z [Flink-gelly] Added local clustering coefficient (directed graph) example in the tutorialExamples subfolder.
[GitHub] flink pull request: [Flink-gelly] Added local clustering coefficie...
Github user samk3211 closed the pull request at: https://github.com/apache/flink/pull/898
[jira] [Created] (FLINK-2342) Add new fit operation and more tests for StandardScaler
Theodore Vasiloudis created FLINK-2342: -- Summary: Add new fit operation and more tests for StandardScaler Key: FLINK-2342 URL: https://issues.apache.org/jira/browse/FLINK-2342 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Theodore Vasiloudis Priority: Minor Fix For: 0.10 StandardScaler currently has a transform operation for types (Vector, Double), but no corresponding fit operation. The test cases also do not cover all the possible types that we can call fit and transform on.
[jira] [Commented] (FLINK-2342) Add new fit operation and more tests for StandardScaler
[ https://issues.apache.org/jira/browse/FLINK-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622075#comment-14622075 ] ASF GitHub Bot commented on FLINK-2342: --- GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/899 [FLINK-2342] [ml] Add new fit operation and more tests for StandardScaler StandardScaler currently has a transform operation for types (Vector, Double), but no corresponding fit operation. The test cases also do not cover all the possible types that we can call fit and transform on. This PR addresses this, and removes some unused code. You can merge this pull request into a Git repository by running: $ git pull https://github.com/thvasilo/flink scaler-extra-tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/899.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #899 commit aa18b3dc82475c179f56fad25363184a47c36c2f Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-10T09:34:32Z More tests and a new operation for StandardScaler Add new fit operation and more tests for StandardScaler --- Key: FLINK-2342 URL: https://issues.apache.org/jira/browse/FLINK-2342 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Theodore Vasiloudis Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.10 StandardScaler currently has a transform operation for types (Vector, Double), but no corresponding fit operation. The test cases also do not cover all the possible types that we can call fit and transform on.
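Independent of FlinkML's pipeline types, the fit/transform split that the StandardScaler issue is about can be sketched in a few lines of plain Java (hypothetical class and method names, not the actual `StandardScaler` implementation): fit learns the per-feature mean and standard deviation from the training data, and transform applies (x - mean) / std to any sample.

```java
import java.util.*;

public class StandardScalerSketch {
    double[] mean, std;

    // fit: learn per-feature mean and (population) standard deviation.
    void fit(List<double[]> data) {
        int d = data.get(0).length;
        mean = new double[d];
        std = new double[d];
        for (double[] x : data)
            for (int j = 0; j < d; j++) mean[j] += x[j];
        for (int j = 0; j < d; j++) mean[j] /= data.size();
        for (double[] x : data)
            for (int j = 0; j < d; j++) std[j] += (x[j] - mean[j]) * (x[j] - mean[j]);
        for (int j = 0; j < d; j++) std[j] = Math.sqrt(std[j] / data.size());
    }

    // transform: standardize a sample with the fitted statistics.
    double[] transform(double[] x) {
        double[] out = new double[x.length];
        for (int j = 0; j < x.length; j++)
            out[j] = std[j] == 0 ? 0 : (x[j] - mean[j]) / std[j];
        return out;
    }
}
```

The PR's point is that each input type the pipeline accepts for `transform` should have a matching `fit` operation, and that both directions deserve test coverage.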
[GitHub] flink pull request: Make streaming iterations more robust
GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/900 Make streaming iterations more robust This PR reworks the way iterations are constructed in the streamgraph, from the previous eager approach to doing it when we actually create the jobgraph. This new approach solves many known and unknown issues that the previous approach had, such as issues with multiple tail and head operators (FLINK-2328), issues with partitioning on the feedback stream, issues with ConnectedIterations, immutability of the IterativeDataStream, etc. I also included a new set of tests to validate all the expected functionality: - Tests for simple and connected iterations with multiple heads and tails - Parallelism, partitioning and co-location checks - Tests for the exceptions thrown during job creation for invalid iterative topologies You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/900.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #900 commit 9fef88306bc79f1f2e0520ddeb2ef10be8e4de09 Author: Gyula Fora gyf...@apache.org Date: 2015-07-10T11:26:44Z [FLINK-2335] [streaming] Lazy iteration construction in StreamGraph
[GitHub] flink pull request: [FLINK-2310] Add an Adamic Adar Similarity exa...
Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/892#issuecomment-120366455 Hello @vasia Thanks for the clarification. I was only thinking of changing the data structure :) But this method seems much more sensible. In any case... Should I close this pull request? Or should I commit the changes made to it according to your first suggestion?
[jira] [Commented] (FLINK-2106) Add outer joins to API, Optimizer, and Runtime
[ https://issues.apache.org/jira/browse/FLINK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622235#comment-14622235 ] Johann Kovacs commented on FLINK-2106: -- [~r-pogalz] and I would like to start working on this as part of our project in the next couple of weeks. Can you please assign (one of) us? Add outer joins to API, Optimizer, and Runtime -- Key: FLINK-2106 URL: https://issues.apache.org/jira/browse/FLINK-2106 Project: Flink Issue Type: Sub-task Components: Java API, Local Runtime, Optimizer, Scala API Reporter: Fabian Hueske Priority: Minor Fix For: pre-apache Add left/right/full outer join methods to the DataSet APIs (Java, Scala), to the optimizer, and the runtime of Flink. Initially, the execution strategy should be a sort-merge outer join (FLINK-2105) but can later be extended to hash joins for left/right outer joins.
[GitHub] flink pull request: [FLINK-2111] Add stop signal to cleanly shut...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-120408249 Changes applied: - renamed the signal from TERMINATE to STOP (as decided by community vote) - reverted changes to the execution graph state machine - added interface `Stopable` and reverted method `AbstractInvocable.stop()` - added `JobType` enum (Batching / Streaming) [as discussed at the last meetup with @StephanEwen] - extended test coverage Please review again.
[GitHub] flink pull request: [doc] Fix wordcount example in YARN setup
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/897
[jira] [Commented] (FLINK-2106) Add outer joins to API, Optimizer, and Runtime
[ https://issues.apache.org/jira/browse/FLINK-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622248#comment-14622248 ] Fabian Hueske commented on FLINK-2106: -- Having a working outer join algorithm (FLINK-2105 or FLINK-2107) is a prerequisite for this issue. I would strongly recommend solving FLINK-2105 before tackling this issue. Add outer joins to API, Optimizer, and Runtime -- Key: FLINK-2106 URL: https://issues.apache.org/jira/browse/FLINK-2106 Project: Flink Issue Type: Sub-task Components: Java API, Local Runtime, Optimizer, Scala API Reporter: Fabian Hueske Priority: Minor Fix For: pre-apache Add left/right/full outer join methods to the DataSet APIs (Java, Scala), to the optimizer, and the runtime of Flink. Initially, the execution strategy should be a sort-merge outer join (FLINK-2105) but can later be extended to hash joins for left/right outer joins.
[GitHub] flink pull request: Make streaming iterations more robust
Github user senorcarbone commented on a diff in the pull request: https://github.com/apache/flink/pull/900#discussion_r34352021 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/datastream/IterativeDataStream.java --- @@ -201,14 +204,16 @@ public ConnectedIterativeDataStream(IterativeDataStream<I> input, TypeInformatio * @return The feedback stream. * */ + @SuppressWarnings({ "rawtypes", "unchecked" }) public DataStream<F> closeWith(DataStream<F> feedbackStream) { - DataStream<F> iterationSink = new DataStreamSink<F>(input.environment, "Iteration Sink", - null, null); + if (input.closed) { + throw new IllegalStateException( --- End diff -- Since now we build the loops at the job graph generation, wouldn't it be reasonable to do an implicit union there?
[GitHub] flink pull request: [doc] Fix wordcount example in YARN setup
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/897#issuecomment-120384704 Merging...
[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example
[ https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622253#comment-14622253 ] ASF GitHub Bot commented on FLINK-2310: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/892#issuecomment-120404920 You can use this PR for the exact computation, no need to open a new one!
[GitHub] flink pull request: Make streaming iterations more robust
Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/900#issuecomment-120405183 There are quite a few changes in this PR! In general I am very keen on eliminating DataStream mutations and moving that logic into the jobgraph building phase. The code is clear enough, I didn't find any errors so far, and the tests are use-case complete and pass, so :+1: from me. So again, what actually happens now (correct me if I am wrong) is that we reuse the same src-sink pair for loops with the same parallelism and create additional src-sinks otherwise, correct?
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-120554092 If I get 1 more LGTM, I'll merge this.
[GitHub] flink pull request: [FLINK-2108] [ml] [WIP] Add score function for...
GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/902 [FLINK-2108] [ml] [WIP] Add score function for Predictors This PR builds upon the evaluation PR currently under review (#871) and adds two new operations to the Predictor class: one that takes a scorer and a test set and produces a score as an evaluation of the Predictor's performance using the provided scorer, and one that takes only a test set and uses a default score. This PR includes implementations of custom scores and simple scores for all Predictor implementations, either through the Classifier and Regressor base classes or through specific ones, like the one provided for ALS. The custom score operation currently expects DataSet[LabeledVector] as the type of the test set and Double as the type of the prediction. TODO: Docs and code cleanup You can merge this pull request into a Git repository by running: $ git pull https://github.com/thvasilo/flink score-operation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/902.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #902 commit 305b43a451af3d8bc859671476c215308fbfc7fc Author: mikiobraun mikiobr...@gmail.com Date: 2015-06-22T15:04:42Z Adding some first loss functions for the evaluation framework commit bdb1a6912d2bcec29446ca4a9fbc550f2ecb8f4a Author: Theodore Vasiloudis t...@sics.se Date: 2015-06-23T14:07:48Z Scorer for evaluation commit 4a7593ade68f43d444a6b289191f053a4ea8b031 Author: Theodore Vasiloudis t...@sics.se Date: 2015-06-25T09:41:10Z Adds accuracy score and R^2 score. Also trying out Scores as classes instead of functions. Not too happy with the extra boilerplate of Scores as classes; will probably revert, and have objects like RegressionScores, ClassificationScores that contain the definitions of the relevant scores.
commit 5c89c478bd00f168bfe48954d06367b28f948571 Author: Theodore Vasiloudis t...@sics.se Date: 2015-06-26T11:30:56Z Adds a evaluate operation for LabeledVector input commit e7bb4b42424641d640df370cd6ace71f7f42ee8d Author: Theodore Vasiloudis t...@sics.se Date: 2015-06-26T11:32:13Z Adds Regressor interface, and a score function for regression algorithms. commit 3d8a6928b02b30c732f282df61613561dbf8d4fc Author: Theodore Vasiloudis t...@sics.se Date: 2015-06-30T14:04:58Z Added Classifier intermediate class, and default score function for classifiers. commit e1a26ed30bb784633685703892f67d51136f6060 Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-01T08:20:41Z Going back to having scores defined in objects instead of their own classes. commit 0dd251a5a59cd610c4df3e9a1ea3921b1a9cc2e0 Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-01T13:00:37Z Removed ParameterMap from predict function of PredictOperation commit 492e9a383af6285f0fdca5031d2bd7bdfe3cd511 Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-02T10:21:28Z Reworked score functionality allow chained Predictors. All predictors must now implement a calculateScore function. We are for now assuming that predictors are supervised learning algorithms, once unsupervised learning algorithms are added this will need to be reworked. Also added an evaluate dataset operation to ALS, to allow for scoring of the algorithm. Default performance measure for ALS is RMSE. 
commit d9715ed3a6faba78e0b34368425768e826d5a736 Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-06T08:50:59Z Made calculateScore only take DataSet[(Double, Double)] commit 7f1a6da52dfcd47d39c39cee2141112e5c10ddad Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-07T08:15:58Z Added test for DataSet.mean() commit edbe3dd9ea48d168f67a9ff231f8373a6aaee38d Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-07T12:11:45Z Switched from cross to mapWithBcVariable commit e840c14032f5fea3b476e1a99122eb9125ba5a4f Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-08T16:48:27Z Addressed multiple PR comments. commit 6e48b612f0a367e798e40590f6921d4dc242f2aa Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-08T17:06:42Z Add approximatelyEquals for Doubles, used for score calculation. commit 57d0ef2c4bc268d1d870c7aab537dd611f464fcf Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-08T17:23:14Z Improved dostrings for Score commit eb66de590947ca8f887a8e52f8c66ec860b82af3 Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-08T17:27:32Z Removed score function from Predictor. commit 13053ef358091427fa89c66305d602a84a819c87 Author: Theodore Vasiloudis t...@sics.se Date: 2015-07-09T13:52:18Z Added score operation. This operation is similar to the predict and evaluate operations, allowing type-dependent
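One of the commits above narrows calculateScore to DataSet[(Double, Double)] pairs, with RMSE named as the default performance measure for ALS. A minimal plain-Java sketch of that kind of score over (truth, prediction) pairs — illustrative only, not the actual FlinkML implementation, and the class/method names are assumptions:

```java
// Illustrative only: RMSE over (truth, prediction) pairs, the default
// performance measure the PR mentions for ALS. Not the FlinkML code.
class Rmse {
    static double score(double[][] pairs) {
        double sumSq = 0.0;
        for (double[] p : pairs) {
            double err = p[0] - p[1];   // truth - prediction
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / pairs.length);
    }

    public static void main(String[] args) {
        double[][] pairs = { {1.0, 1.0}, {2.0, 4.0} };
        // errors 0 and -2 -> mean squared error 2 -> RMSE = sqrt(2)
        System.out.println(Rmse.score(pairs));
    }
}
```

In the PR itself the pairs live in a distributed DataSet and the mean is taken with a broadcast variable (the "Switched from cross to mapWithBcVariable" commit), but the arithmetic is the same.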
[GitHub] flink pull request: [FLINK-2337] Multiple SLF4J bindings using Sto...
GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/903 [FLINK-2337] Multiple SLF4J bindings using Storm compatibility layer exclude logback from Storm dependencies You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/flink flink-2337-slf4j-bindings Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/903.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #903 commit 9bc3e974e7833fd92a2251a38b54ad9d6e09d759 Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-07-10T13:43:21Z excluded 'logback' from Storm dependencies fixes FLINK-2337
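The fix described above boils down to a Maven dependency exclusion along these lines (a sketch — the exact artifact coordinates and version in the Flink pom may differ):

```xml
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <!-- Exclude Storm's logback binding so only Flink's log4j binding
       remains on the classpath (avoids the multiple-bindings warning). -->
  <exclusions>
    <exclusion>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```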
[jira] [Commented] (FLINK-2108) Add score function for Predictors
[ https://issues.apache.org/jira/browse/FLINK-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622406#comment-14622406 ] ASF GitHub Bot commented on FLINK-2108: --- GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/902 [FLINK-2108] [ml] [WIP] Add score function for Predictors
[jira] [Commented] (FLINK-2337) Multiple SLF4J bindings using Storm compatibility layer
[ https://issues.apache.org/jira/browse/FLINK-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622407#comment-14622407 ] ASF GitHub Bot commented on FLINK-2337: --- GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/903 [FLINK-2337] Multiple SLF4J bindings using Storm compatibility layer Multiple SLF4J bindings using Storm compatibility layer --- Key: FLINK-2337 URL: https://issues.apache.org/jira/browse/FLINK-2337 Project: Flink Issue Type: Bug Components: flink-contrib Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Storm depends on logback as its slf4j implementation, but Flink uses log4j. The log shows the following conflict: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/cicero/.m2/repository/ch/qos/logback/logback-classic/1.0.13/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/cicero/.m2/repository/org/slf4j/slf4j-log4j12/1.7.7/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Need to exclude logback from the Storm dependencies to fix this.
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622381#comment-14622381 ] ASF GitHub Bot commented on FLINK-2200: --- Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-120423639 Looks very neat! Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository.
[jira] [Commented] (FLINK-2328) Applying more than one transformation on an IterativeDataStream fails
[ https://issues.apache.org/jira/browse/FLINK-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622166#comment-14622166 ] ASF GitHub Bot commented on FLINK-2328: --- GitHub user gyfora opened a pull request: https://github.com/apache/flink/pull/900 Make streaming iterations more robust This PR reworks the way iterations are constructed in the streamgraph, from the previous eager approach to constructing them when we actually create the jobgraph. This new approach solves many known and unknown issues that the previous approach had, such as issues with multiple tail and head operators (FLINK-2328), issues with partitioning on the feedback stream, issues with ConnectedIterations, immutability of the IterativeDataStream, etc. I also included a new set of tests to validate all the expected functionality: - Tests for simple and connected iterations with multiple heads and tails - Parallelism, partitioning and co-location checks - Tests for the exceptions thrown during job creation for invalid iterative topologies You can merge this pull request into a Git repository by running: $ git pull https://github.com/gyfora/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/900.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #900 commit 9fef88306bc79f1f2e0520ddeb2ef10be8e4de09 Author: Gyula Fora gyf...@apache.org Date: 2015-07-10T11:26:44Z [FLINK-2335] [streaming] Lazy iteration construction in StreamGraph Applying more than one transformation on an IterativeDataStream fails - Key: FLINK-2328 URL: https://issues.apache.org/jira/browse/FLINK-2328 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Currently the user cannot apply more than one transformation on the IterativeDataStream directly.
It fails because, instead of creating one iteration source and connecting it to the operators, it will try to create two iteration sources, which fails on the shared broker slot. A workaround is to use a no-op mapper on the iterative stream and apply the two transformations on that.
[jira] [Commented] (FLINK-2343) Change default garbage collector in streaming environments
[ https://issues.apache.org/jira/browse/FLINK-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622388#comment-14622388 ] ASF GitHub Bot commented on FLINK-2343: --- GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/901 [FLINK-2343] [scripts] Make CMS the default GC for streaming setups When the system is started with `start-cluster-streaming.sh`, this patch sets the Garbage Collector to Concurrent Mark Sweep, to keep a low and smooth latency. Without these options, the parallel bulk GC introduces frequent latency spikes. The GC option is currently only set if no other java option is set via `env.java.opts`. Otherwise, if users define their favorite GC in these options, the JVM may not start due to option conflicts. When setting the CMS GC, we also set the `CMSClassUnloadingEnabled` option, to make sure that user-code classes are removed once the classloader expires. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink gc_option Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/901.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #901 commit 8840d7508668504bade7f28ca128389d5e1392d1 Author: Stephan Ewen se...@apache.org Date: 2015-07-10T14:37:28Z [FLINK-2343] [scripts] Make CMS the default GC for streaming setups Change default garbage collector in streaming environments -- Key: FLINK-2343 URL: https://issues.apache.org/jira/browse/FLINK-2343 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 When starting Flink, we don't pass any particular GC related JVM flags to the system. 
That means it uses the default garbage collectors, which are the bulk parallel GCs for both the old gen and the new gen. For streaming applications, this results in vastly fluctuating latencies. Latencies are much more constant with either the {{CMS}} or {{G1}} GC. I propose to make CMS the default GC for streaming setups. G1 may become the GC of choice in the future, but from various articles I found, it is still somewhat in beta status (see for example here: http://jaxenter.com/kirk-pepperdine-on-the-g1-for-java-9-118190.html )
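The guard the PR describes — add the CMS flags only when the user supplied no JVM options of their own — can be sketched in shell roughly as follows (variable names are assumptions modeled on Flink's startup scripts, not the actual patch):

```shell
# Sketch only: add the CMS GC flags only when the user has not set their own
# JVM options, so the JVM never sees conflicting GC choices.
FLINK_ENV_JAVA_OPTS=""   # simulating env.java.opts being unset by the user
JVM_ARGS=""

if [ -z "${FLINK_ENV_JAVA_OPTS}" ]; then
  # CMS for low, smooth latency; unload user-code classes when their
  # classloader expires.
  JVM_ARGS="${JVM_ARGS} -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled"
fi

echo "JVM_ARGS:${JVM_ARGS}"
```

If the user does define options in `env.java.opts`, the `if` branch is skipped and their GC choice is passed through untouched.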
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-120411775 Hi, I updated PR :)
[GitHub] flink pull request: [FLINK-2008][FLINK-2296] Fix checkpoint commit...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/895#issuecomment-120421266 :+1:
[jira] [Commented] (FLINK-2008) PersistentKafkaSource is sometimes emitting tuples multiple times
[ https://issues.apache.org/jira/browse/FLINK-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622359#comment-14622359 ] ASF GitHub Bot commented on FLINK-2008: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/895#issuecomment-120421266 :+1: PersistentKafkaSource is sometimes emitting tuples multiple times - Key: FLINK-2008 URL: https://issues.apache.org/jira/browse/FLINK-2008 Project: Flink Issue Type: Bug Components: Kafka Connector, Streaming Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger The PersistentKafkaSource is expected to emit records exactly once. Two test cases of the KafkaITCase are sporadically failing because records are emitted multiple times. Affected tests: {{testPersistentSourceWithOffsetUpdates()}}, after the offsets have been changed manually in ZK: {code} java.lang.RuntimeException: Expected v to be 3, but was 4 on element 0 array=[4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2] {code} {{brokerFailureTest()}} also fails: {code} 05/13/2015 08:13:16 Custom source - Stream Sink(1/1) switched to FAILED java.lang.AssertionError: Received tuple with value 21 twice at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.apache.flink.streaming.connectors.kafka.KafkaITCase$15.invoke(KafkaITCase.java:877) at org.apache.flink.streaming.connectors.kafka.KafkaITCase$15.invoke(KafkaITCase.java:859) at org.apache.flink.streaming.api.operators.StreamSink.callUserFunction(StreamSink.java:39) at org.apache.flink.streaming.api.operators.StreamOperator.callUserFunctionAndLogException(StreamOperator.java:137) at org.apache.flink.streaming.api.operators.ChainableStreamOperator.collect(ChainableStreamOperator.java:54) at 
org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:39) at org.apache.flink.streaming.connectors.kafka.api.persistent.PersistentKafkaSource.run(PersistentKafkaSource.java:173) at org.apache.flink.streaming.api.operators.StreamSource.callUserFunction(StreamSource.java:40) at org.apache.flink.streaming.api.operators.StreamOperator.callUserFunctionAndLogException(StreamOperator.java:137) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:34) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:139) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) at java.lang.Thread.run(Thread.java:745) {code}
[jira] [Created] (FLINK-2343) Change default garbage collector in streaming environments
Stephan Ewen created FLINK-2343: --- Summary: Change default garbage collector in streaming environments Key: FLINK-2343 URL: https://issues.apache.org/jira/browse/FLINK-2343 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10
[jira] [Closed] (FLINK-2328) Applying more than one transformation on an IterativeDataStream fails
[ https://issues.apache.org/jira/browse/FLINK-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-2328. - Resolution: Duplicate This issue will be solved by this: https://issues.apache.org/jira/browse/FLINK-2335
[GitHub] flink pull request: Make streaming iterations more robust
Github user gyfora commented on a diff in the pull request: https://github.com/apache/flink/pull/900#discussion_r34357746 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/datastream/IterativeDataStream.java --- End diff -- I think this is a fair restriction, the user can always union.
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-120423639 Looks very neat!
[jira] [Commented] (FLINK-2111) Add stop signal to cleanly shutdown streaming jobs
[ https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622264#comment-14622264 ] ASF GitHub Bot commented on FLINK-2111: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-120408249 Changes applied: - renamed signal from TERMINATE to STOP (as decided by community vote) - reverted changes to the execution graph state machine - added interface `Stopable` and reverted method `AbstractInvocable.stop()` - added `JobType` enum (Batching / Streaming) [as discussed at the last Meetup with @StephanEwen ] - extended test coverage Please review again. Add stop signal to cleanly shutdown streaming jobs Key: FLINK-2111 URL: https://issues.apache.org/jira/browse/FLINK-2111 Project: Flink Issue Type: Improvement Components: Distributed Runtime, JobManager, Local Runtime, Streaming, TaskManager, Webfrontend Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, streaming jobs can only be stopped using the cancel command, which is a hard stop with no clean shutdown. The newly introduced stop signal will only affect streaming source tasks, so that the sources can stop emitting data and shut down cleanly, resulting in a clean shutdown of the whole streaming job. This feature is a prerequisite for https://issues.apache.org/jira/browse/FLINK-1929
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622276#comment-14622276 ] ASF GitHub Bot commented on FLINK-2200: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-120411775 Hi, I updated PR :)
[GitHub] flink pull request: Make streaming iterations more robust
Github user senorcarbone commented on a diff in the pull request: https://github.com/apache/flink/pull/900#discussion_r34359293 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/datastream/IterativeDataStream.java --- End diff -- sure
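The restriction discussed in this review thread — an iteration may only be closed once, and multiple feedback streams must be unioned by the user before `closeWith` — can be sketched as follows (a simplified, hypothetical class, not the actual IterativeDataStream code, which also wires the feedback edge into the stream graph):

```java
// Simplified sketch of the closeWith guard discussed above: closing an
// already-closed iteration throws, pointing the user at union instead.
class Iteration {
    private boolean closed = false;

    void closeWith(Object feedbackStream) {
        if (closed) {
            throw new IllegalStateException(
                "An iterative data stream can only be closed once. "
                + "Use union to close with multiple feedback streams.");
        }
        closed = true;
        // ... wire the feedback stream back to the iteration head ...
    }

    public static void main(String[] args) {
        Iteration it = new Iteration();
        it.closeWith("feedback-1");          // first close succeeds
        try {
            it.closeWith("feedback-2");      // second close is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The alternative floated in the thread — an implicit union at job graph generation — would move this decision out of the API and into the graph builder.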
[GitHub] flink pull request: [FLINK-2343] [scripts] Make CMS the default GC...
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/901

[FLINK-2343] [scripts] Make CMS the default GC for streaming setups

When the system is started with `start-cluster-streaming.sh`, this patch sets the garbage collector to Concurrent Mark Sweep, to keep latency low and smooth. Without these options, the parallel bulk GC introduces frequent latency spikes.

The GC option is currently only set if no other Java option is set via `env.java.opts`. Otherwise, if users define their favorite GC in these options, the JVM may not start due to option conflicts.

When setting the CMS GC, we also set the `CMSClassUnloadingEnabled` option, to make sure that user-code classes are removed once the classloader expires.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink gc_option
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/901.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #901

commit 8840d7508668504bade7f28ca128389d5e1392d1
Author: Stephan Ewen se...@apache.org
Date: 2015-07-10T14:37:28Z
[FLINK-2343] [scripts] Make CMS the default GC for streaming setups
[jira] [Created] (FLINK-2344) Deprecate/Remove the old Pact Pair type
Stephan Ewen created FLINK-2344:
---
Summary: Deprecate/Remove the old Pact Pair type
Key: FLINK-2344
URL: https://issues.apache.org/jira/browse/FLINK-2344
Project: Flink
Issue Type: Improvement
Components: Core
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 0.10
[jira] [Commented] (FLINK-2337) Multiple SLF4J bindings using Storm compatibility layer
[ https://issues.apache.org/jira/browse/FLINK-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622669#comment-14622669 ]

ASF GitHub Bot commented on FLINK-2337:
---

Github user mjsax commented on the pull request:
https://github.com/apache/flink/pull/903#issuecomment-120479371

Travis failed due to an unrelated error:
Failed tests: ProcessFailureStreamingRecoveryITCase>AbstractProcessFailureRecoveryTest.testTaskManagerProcessFailure:198 The program encountered a ProgramInvocationException: The program execution failed: Job execution failed.
Does anyone know anything about this failing test? The log warning `SLF4J: Class path contains multiple SLF4J bindings.` is gone for the 4 successful Travis runs. This PR should be ready to be merged.

Multiple SLF4J bindings using Storm compatibility layer
---
Key: FLINK-2337
URL: https://issues.apache.org/jira/browse/FLINK-2337
Project: Flink
Issue Type: Bug
Components: flink-contrib
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

Storm depends on logback as its SLF4J implementation, but Flink uses log4j. The log shows the following conflict:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/cicero/.m2/repository/ch/qos/logback/logback-classic/1.0.13/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cicero/.m2/repository/org/slf4j/slf4j-log4j12/1.7.7/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
We need to exclude logback from the Storm dependencies to fix this.
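When multiple bindings are on the classpath, SLF4J picks one of them; a common diagnostic is to locate the `org.slf4j.impl.StaticLoggerBinder` class at runtime to see which jar actually won. A self-contained sketch (demoed on a JDK class so it runs even without SLF4J present):

```java
// Diagnostic sketch: report where a class was loaded from. Pointing it at
// org.slf4j.impl.StaticLoggerBinder shows which SLF4J binding jar was chosen.
import java.security.CodeSource;

public class BindingLocator {
    /** Returns the jar/path a class was loaded from, or a note if unavailable. */
    public static String locate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            CodeSource src = clazz.getProtectionDomain().getCodeSource();
            return src != null ? src.getLocation().toString() : "bootstrap classloader";
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // In a Flink+Storm job you would pass "org.slf4j.impl.StaticLoggerBinder".
        System.out.println(locate("org.slf4j.impl.StaticLoggerBinder"));
        System.out.println(locate("java.lang.String")); // JDK class, bootstrap-loaded
    }
}
```

The actual fix in the PR is a Maven dependency exclusion of logback from the Storm dependencies; this sketch only helps confirm which binding is active before and after.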
[jira] [Updated] (FLINK-2218) Web client cannot distinguish between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated FLINK-2218:
---
Summary: Web client cannot distinguish between Flink options and program arguments (was: Web client cannot specify parallelism)

Web client cannot distinguish between Flink options and program arguments
---
Key: FLINK-2218
URL: https://issues.apache.org/jira/browse/FLINK-2218
Project: Flink
Issue Type: Improvement
Components: Webfrontend
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Matthias J. Sax

Programs submitted via the CLI can set the parallelism of a program ({{-p}}). This is not possible for programs submitted via the web client. With the web client, the program needs to take care of setting the parallelism (by having an extra program argument), or the default parallelism is used.
[jira] [Updated] (FLINK-2218) Web client cannot distinguish between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated FLINK-2218:
---
Description: The WebClient has only one input field for arguments. This field is used for both Flink options (e.g., `-p`) and program arguments. Thus, the supported Flink options restrict the possible program arguments. The CliFrontend, in contrast, can distinguish both, so `-p` can also be used as a program argument. Solution: add a second input field for Flink options to the WebClient.
was: Programs submitted via the CLI can set the parallelism of a program ({{-p}}). This is not possible for programs submitted via the web client. With the web client, the program needs to take care of setting the parallelism (by having an extra program argument), or the default parallelism is used.

Web client cannot distinguish between Flink options and program arguments
---
Key: FLINK-2218
URL: https://issues.apache.org/jira/browse/FLINK-2218
Project: Flink
Issue Type: Improvement
Components: Webfrontend
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Matthias J. Sax

The WebClient has only one input field for arguments. This field is used for both Flink options (e.g., `-p`) and program arguments. Thus, the supported Flink options restrict the possible program arguments. The CliFrontend, in contrast, can distinguish both, so `-p` can also be used as a program argument. Solution: add a second input field for Flink options to the WebClient.
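The CliFrontend-style separation the ticket alludes to amounts to: consume recognized option tokens until the first non-option, and hand everything after that to the user program verbatim, so the program may itself use `-p`. A minimal illustrative sketch (the option set and class names are made up here, not Flink's actual parser):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not Flink's actual CLI parser: Flink options like -p
// are consumed until the first non-option token; everything after is passed
// to the user program untouched, so "-p" can also be a program argument.
public class ArgSplitter {
    public final Map<String, String> flinkOptions = new HashMap<>();
    public final List<String> programArgs = new ArrayList<>();

    public ArgSplitter(String[] args) {
        int i = 0;
        // Hypothetical option set; the real frontend recognizes more flags.
        while (i + 1 < args.length && args[i].equals("-p")) {
            flinkOptions.put("parallelism", args[i + 1]);
            i += 2;
        }
        programArgs.addAll(Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] args) {
        ArgSplitter s = new ArgSplitter(new String[] {"-p", "4", "input.txt", "-p", "kept"});
        System.out.println(s.flinkOptions); // {parallelism=4}
        System.out.println(s.programArgs);  // [input.txt, -p, kept]
    }
}
```

The WebClient fix in this ticket achieves the same separation through a second input field rather than positional parsing.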
[jira] [Commented] (FLINK-2292) Report accumulators periodically while job is running
[ https://issues.apache.org/jira/browse/FLINK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622590#comment-14622590 ]

ASF GitHub Bot commented on FLINK-2292:
---

Github user mxm commented on the pull request:
https://github.com/apache/flink/pull/896#issuecomment-120460741

I've addressed your comments in a new commit.

Report accumulators periodically while job is running
---
Key: FLINK-2292
URL: https://issues.apache.org/jira/browse/FLINK-2292
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Fix For: 0.10

Accumulators should be sent periodically, as part of the heartbeat that sends metrics. This allows them to be updated in real time.
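Live accumulator reporting of this kind boils down to taking a consistent snapshot of task-local counters on each heartbeat; the reviewed diff adds a lock precisely so these updates happen atomically. A minimal sketch of that pattern (class and method names are illustrative, not Flink's `AccumulatorRegistry` API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (names are illustrative, not Flink's actual API): task
// threads bump counters concurrently, and the heartbeat thread takes an
// atomic snapshot under the same lock, so a periodic report never observes
// a half-applied update.
public class LiveAccumulators {
    private final Object lock = new Object();
    private final Map<String, Long> counters = new HashMap<>();

    public void add(String name, long delta) {
        synchronized (lock) {
            counters.merge(name, delta, Long::sum);
        }
    }

    /** Called periodically, e.g. from the heartbeat, to report current values. */
    public Map<String, Long> snapshot() {
        synchronized (lock) {
            return new HashMap<>(counters); // copy: caller gets a frozen view
        }
    }

    public static void main(String[] args) {
        LiveAccumulators acc = new LiveAccumulators();
        acc.add("records-read", 10);
        acc.add("records-read", 5);
        System.out.println(acc.snapshot()); // {records-read=15}
    }
}
```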
[jira] [Commented] (FLINK-2218) Web client cannot distinguish between Flink options and program arguments
[ https://issues.apache.org/jira/browse/FLINK-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622775#comment-14622775 ]

ASF GitHub Bot commented on FLINK-2218:
---

GitHub user mjsax opened a pull request:
https://github.com/apache/flink/pull/904

[FLINK-2218] Web client cannot distinguish between Flink options and program arguments

- added a new input field 'options' to the WebClient
- adapted the WebClient-to-JobManager job submission logic
- updated documentation (including new screenshots)
- (some additional minor cleanup in launch.html and program.js)

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mjsax/flink flink-2218-WebClientDOP
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/904.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #904

commit 3f8a02b8233a136b5b5eb890ed11f7861b346bbc
Author: mjsax mj...@informatik.hu-berlin.de
Date: 2015-07-10T14:53:13Z
[FLINK-2218] Web client cannot distinguish between Flink options and program arguments

Web client cannot distinguish between Flink options and program arguments
---
Key: FLINK-2218
URL: https://issues.apache.org/jira/browse/FLINK-2218
Project: Flink
Issue Type: Improvement
Components: Webfrontend
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Matthias J. Sax

The WebClient has only one input field for arguments. This field is used for both Flink options (e.g., `-p`) and program arguments. Thus, the supported Flink options restrict the possible program arguments. The CliFrontend, in contrast, can distinguish both, so `-p` can also be used as a program argument.
Solution: add a second input field for Flink options to the WebClient.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2138) PartitionCustom for streaming
[ https://issues.apache.org/jira/browse/FLINK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622813#comment-14622813 ]

ASF GitHub Bot commented on FLINK-2138:
---

Github user gyfora commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-120506018

Looks good to merge. If there are no objections, I will merge it tomorrow. :+1:

PartitionCustom for streaming
---
Key: FLINK-2138
URL: https://issues.apache.org/jira/browse/FLINK-2138
Project: Flink
Issue Type: New Feature
Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Gábor Hermann
Priority: Minor

The batch API has support for custom partitioning; this should be added for streaming with a similar signature.
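Custom partitioning, in both the batch API and the proposed streaming variant, means letting user code map a record's key to an output channel index. A self-contained sketch of the idea (the `Partitioner` interface below mirrors the key-to-channel shape of Flink's partitioner but is defined locally for the demo; the routing rule is made up):

```java
// Self-contained sketch of custom partitioning: a user-supplied function
// maps a key to a channel index in [0, numPartitions). The interface is
// defined locally here; it only mirrors the shape of Flink's Partitioner.
public class CustomPartitioningSketch {

    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Example rule: negative keys go to partition 0, the rest by modulo. */
    static final Partitioner<Integer> SIGN_AWARE = (key, n) ->
            key < 0 ? 0 : 1 + (key % (n - 1));

    public static int route(int key, int numPartitions) {
        return SIGN_AWARE.partition(key, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(route(-7, 4)); // 0
        System.out.println(route(5, 4));  // 1 + (5 % 3) = 3
    }
}
```

In a DataStream setting, the same function would decide which downstream parallel subtask receives each record, analogous to the batch `partitionCustom` the ticket references.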