[jira] [Created] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()
Ted Yu created FLINK-2827: - Summary: Potential resource leak in TwitterSource#loadAuthenticationProperties() Key: FLINK-2827 URL: https://issues.apache.org/jira/browse/FLINK-2827 Project: Flink Issue Type: Bug Reporter: Ted Yu Here is the related code: {code} Properties properties = new Properties(); try { InputStream input = new FileInputStream(authPath); properties.load(input); input.close(); } catch (Exception e) { throw new RuntimeException("Cannot open .properties file: " + authPath, e); } {code} If an exception comes out of the properties.load() call, input would be left open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
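A minimal, self-contained sketch of the fix (the method name and the `authPath` parameter follow the issue's snippet, not Flink's actual code): try-with-resources closes the stream even when properties.load() throws.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class AuthPropertiesDemo {

    // Sketch of the fixed loader: try-with-resources guarantees that
    // 'input' is closed even if properties.load() throws.
    static Properties loadAuthenticationProperties(String authPath) {
        Properties properties = new Properties();
        try (InputStream input = new FileInputStream(authPath)) {
            properties.load(input);
        } catch (IOException e) {
            throw new RuntimeException("Cannot open .properties file: " + authPath, e);
        }
        return properties;
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate on a temporary file so the example is runnable as-is.
        Path tmp = Files.createTempFile("auth", ".properties");
        Files.write(tmp, "twitter-source.consumerKey=abc".getBytes());
        Properties p = loadAuthenticationProperties(tmp.toString());
        System.out.println(p.getProperty("twitter-source.consumerKey"));
    }
}
```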
[jira] [Created] (FLINK-2826) transformed is modified in BroadcastVariableMaterialization#decrementReferenceInternal without proper locking
Ted Yu created FLINK-2826: - Summary: transformed is modified in BroadcastVariableMaterialization#decrementReferenceInternal without proper locking Key: FLINK-2826 URL: https://issues.apache.org/jira/browse/FLINK-2826 Project: Flink Issue Type: Bug Reporter: Ted Yu Here is the related code: {code} if (references.isEmpty()) { disposed = true; data = null; transformed = null; {code} Elsewhere, transformed is modified while holding the lock on BroadcastVariableMaterialization.this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
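The fix suggested by the report is to take the same monitor that the other writers of transformed use. A toy, self-contained sketch of that pattern (class, method, and field names mimic the issue text, not Flink's real implementation):

```java
import java.util.HashSet;
import java.util.Set;

public class MaterializationSketch {

    private boolean disposed;
    private Object data = new Object();
    private Object transformed = new Object();

    // Hypothetical fixed version: the null-ing of 'transformed' happens
    // under the same monitor ('this') that all other writers use, so a
    // concurrent reader never observes a half-disposed materialization.
    void decrementReferenceInternal(Set<Object> references) {
        synchronized (this) {
            if (references.isEmpty()) {
                disposed = true;
                data = null;
                transformed = null;
            }
        }
    }

    // Reads take the same monitor as the writes above.
    synchronized boolean isDisposed() {
        return disposed;
    }

    public static void main(String[] args) {
        MaterializationSketch m = new MaterializationSketch();
        m.decrementReferenceInternal(new HashSet<>());
        System.out.println(m.isDisposed());
    }
}
```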
[jira] [Created] (FLINK-2825) FlinkClient.killTopology fails due to missing leader session ID
Matthias J. Sax created FLINK-2825: -- Summary: FlinkClient.killTopology fails due to missing leader session ID Key: FLINK-2825 URL: https://issues.apache.org/jira/browse/FLINK-2825 Project: Flink Issue Type: Bug Components: Storm Compatibility Reporter: Matthias J. Sax Assignee: Matthias J. Sax Calling FlinkClient.killTopology(...) fails with exception: {noformat} java.lang.Exception: Received a message CancelJob(e56e680ab6781ce3a16bdac7bcdb4dd7) without a leader session ID, even though the message requires a leader session ID. at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:41) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:107) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} -- This message was 
sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2824) Iteration feedback partitioning does not work as expected
Gyula Fora created FLINK-2824: - Summary: Iteration feedback partitioning does not work as expected Key: FLINK-2824 URL: https://issues.apache.org/jira/browse/FLINK-2824 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Priority: Blocker Iteration feedback partitioning is not handled transparently and can cause serious issues if the user does not know the specific implementation details of streaming iterations (which is not a realistic expectation). Example: IterativeStream it = ... (parallelism 1) DataStream mapped = it.map(...) (parallelism 2) // this does not work as the feedback has parallelism 2 != 1 // it.closeWith(mapped.partitionByHash(someField)) // so we need to rebalance the data it.closeWith(mapped.map(NoOpMap).setParallelism(1).partitionByHash(someField)) This program will execute but the feedback will not be partitioned by hash to the mapper instances: The partitioning will be set from the noOpMap to the iteration sink which has parallelism different from the mapper (1 vs 2) and then the iteration source forwards the element to the mapper (always to 0). So the problem is basically that the iteration source/sink pair gets the parallelism of the input stream (p=1) not the head operator (p = 2) which leads to incorrect partitioning. Workaround: Set input parallelism to the same as the head operator Suggested solution: The iteration construction should be reworked to set the parallelism of the source/sink to the parallelism of the head operator (and validate that all heads have the same parallelism) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
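Why the parallelism mismatch matters can be seen from how hash partitioning picks a target channel: the channel index is taken modulo the parallelism of the operator actually being fed. The sketch below is a simplified stand-alone model, not Flink's real channel selector; it shows that a sink at parallelism 1 collapses every record onto channel 0, while the head operator at parallelism 2 needs the keys spread over two channels.

```java
public class ChannelSelectionSketch {

    // Simplified model of hash partitioning: channel = |hash(key)| mod p.
    static int targetChannel(Object key, int downstreamParallelism) {
        return Math.abs(key.hashCode() % downstreamParallelism);
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c", "d"};
        for (String key : keys) {
            // Feedback routed through the iteration sink (p = 1): always channel 0.
            int viaSink = targetChannel(key, 1);
            // What the head operator (p = 2) needs: keys spread over 2 channels.
            int wanted = targetChannel(key, 2);
            System.out.println(key + " -> sink channel " + viaSink
                    + ", head channel " + wanted);
        }
    }
}
```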
[jira] [Created] (FLINK-2823) YARN client should report a proper exception if Hadoop Env variables are not set
Stephan Ewen created FLINK-2823: --- Summary: YARN client should report a proper exception if Hadoop Env variables are not set Key: FLINK-2823 URL: https://issues.apache.org/jira/browse/FLINK-2823 Project: Flink Issue Type: Bug Components: YARN Client Affects Versions: 0.10 Reporter: Stephan Ewen Fix For: 0.10 The important variables are {{HADOOP_HOME}} and {{HADOOP_CONF_DIR}}. If none of the two environment variables are set, the YARN client does not pick the correct settings / file system to successfully deploy a job. The YARN client should throw a good exception. Right now, this requires some tricky debugging to find the cause of the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
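A hedged sketch of the proposed check (the method name and error message are illustrative, not Flink's actual code). The environment is passed in as a map so the fail-fast behavior is easy to demonstrate and test:

```java
import java.util.Collections;
import java.util.Map;

public class HadoopEnvCheck {

    // Fail fast with a descriptive message when neither variable is set,
    // instead of failing later with an obscure deployment error.
    static void checkHadoopEnvironment(Map<String, String> env) {
        if (env.get("HADOOP_HOME") == null && env.get("HADOOP_CONF_DIR") == null) {
            throw new RuntimeException(
                "Neither the HADOOP_HOME nor the HADOOP_CONF_DIR environment "
                + "variable is set. The YARN client needs one of them to locate "
                + "the Hadoop configuration.");
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate a shell without any Hadoop variables.
            checkHadoopEnvironment(Collections.emptyMap());
        } catch (RuntimeException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In production code the map would simply be `System.getenv()`.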
Re: Failing test
If there is none yet, then we do. Label it with "test-stability". I think the consensus was also to mark it as critical. Otherwise, just add the log to the JIRA. On Tue, Oct 6, 2015 at 2:57 PM, Matthias J. Sax wrote: > Hi, > > One test just failed on current master: > https://travis-ci.org/apache/flink/jobs/83871008 > > Do we need a JIRA? > > > LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:245 » > Timeout Futu... > > > -Matthias > >
Failing test
Hi, One test just failed on current master: https://travis-ci.org/apache/flink/jobs/83871008 Do we need a JIRA? > LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:245 » Timeout > Futu... -Matthias
[jira] [Created] (FLINK-2822) Windowing classes incorrectly import scala.Serializable
Gyula Fora created FLINK-2822: - Summary: Windowing classes incorrectly import scala.Serializable Key: FLINK-2822 URL: https://issues.apache.org/jira/browse/FLINK-2822 Project: Flink Issue Type: Bug Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Windowing classes incorrectly pull in the Scala serializable interface. grep -lr scala.Serializable * flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/windowing/assigners/WindowAssigner.java flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/windowing/evictors/Evictor.java flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/windowing/triggers/Trigger.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
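The fix is a one-line import change in each listed file: depend on the JDK marker interface rather than the Scala alias, so the Java streaming classes do not drag in the Scala library. A minimal stand-alone illustration (the stub interface below mirrors the shape of the affected classes; it is not Flink's real WindowAssigner):

```java
// Correct: the JDK marker interface, available without any Scala dependency.
import java.io.Serializable;   // not: import scala.Serializable;

public class SerializableImportDemo {

    // Stub mirroring the affected classes: a serializable SAM interface.
    interface WindowAssignerStub extends Serializable {
        String name();
    }

    public static void main(String[] args) {
        WindowAssignerStub assigner = () -> "tumbling";
        // The instance is Serializable via the interface, with no Scala on
        // the classpath.
        System.out.println(assigner instanceof Serializable);
    }
}
```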
Re: Towards Flink 0.10
@Stephan: There is an issue for implementing a detached job submission: https://issues.apache.org/jira/browse/FLINK-2797. That has the advantage that the execution mode and control flow is not hard-coded in the user job. I think it makes more sense to integrate methods like (cancel/getAccumulators/scale) in the Client class and then also make them accessible via the command-line client. IMHO we could think about a special executeWithControl() method for these things but I would not break the existing execute(). On Tue, Oct 6, 2015 at 12:18 PM, Alexey Sapozhnikov wrote: > Hello everyone. > > Stephan, will 0.10 include the cache issues for createRemoteEnvironment? > That not every addressing of environment will retransmit all the jars? > > On Tue, Oct 6, 2015 at 1:16 PM, Stephan Ewen wrote: > > > How about making a quick effort for this: > > https://issues.apache.org/jira/browse/FLINK-2313 > > > > It introduces an execution control handle (returned by > > StreamExecutionEnvironment.execute()) for operations like cancel(), > > getAccumulators(), scaleIn/Out(), ... > > It would be big time API breaking, so would be good to have it before the > > release. > > > > As a first version we could return a control handle that has only a > single > > method "waitForCompletion()" to emulate the current behavior. > > Or we postpone it and later add a method "executeWithControl()" or so, > that > > returns the control handle. > > > > > > On Mon, Oct 5, 2015 at 6:34 PM, Vasiliki Kalavri < > > vasilikikala...@gmail.com> > > wrote: > > > > > Yes, FLINK-2785 that's right! > > > Alright, thanks a lot! > > > > > > On 5 October 2015 at 18:31, Fabian Hueske wrote: > > > > > > > Hi Vasia, > > > > > > > > I guess you are referring to FLINK-2785. Should be fine, as there is > a > > PR > > > > already. > > > > I'll add it to the list. > > > > > > > > Would be nice if you could take care of FLINK-2786 (remove Spargel). 
> > > > > > > > Cheers, Fabian > > > > > > > > 2015-10-05 18:25 GMT+02:00 Vasiliki Kalavri < > vasilikikala...@gmail.com > > >: > > > > > > > > > Thank you Max for putting the list together and to whomever added > > > > > FLINK-2561 to the list :) > > > > > I would also add FLINK-2561 (pending PR #1205). It's a sub-task of > > > > > FLINK-2561, so maybe it's covered as is. > > > > > > > > > > If we go for Gelly graduation, I can take care of FLINK-2786 > "Remove > > > > > Spargel from source code and update documentation in favor of > Gelly", > > > but > > > > > maybe it makes sense to wait for the restructuring, since we're > > getting > > > > rid > > > > > of staging altogether? > > > > > > > > > > -V. > > > > > > > > > > On 5 October 2015 at 17:51, Fabian Hueske > wrote: > > > > > > > > > > > Thanks Max. > > > > > > I extended the list of issues to fix for the release. > > > > > > > > > > > > 2015-10-05 17:10 GMT+02:00 Maximilian Michels : > > > > > > > > > > > > > Thanks Greg, we have added that to the list of API breaking > > > changes. > > > > > > > > > > > > > > On Mon, Oct 5, 2015 at 4:36 PM, Greg Hogan > > > > > wrote: > > > > > > > > > > > > > > > Max, > > > > > > > > > > > > > > > > Stephan noted that FLINK-2723 is an API breaking change. The > > > > > > > CopyableValue > > > > > > > > interface has a new method "T copy()". Commit > > > > > > > > e727355e42bd0ad7d403aee703aaf33a68a839d2 > > > > > > > > > > > > > > > > Greg > > > > > > > > > > > > > > > > On Mon, Oct 5, 2015 at 10:20 AM, Maximilian Michels < > > > > m...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Flinksters, > > > > > > > > > > > > > > > > > > After a lot of development effort in the past months, it is > > > about > > > > > > time > > > > > > > > > to move towards the next major release. We decided to move > > > > towards > > > > > > > > > 0.10 instead of a milestone release. 
This release will > > probably > > > > be > > > > > > the > > > > > > > > > last release before 1.0. > > > > > > > > > > > > > > > > > > For 0.10 we most noticeably have the new Streaming API > which > > > > comes > > > > > > > > > with an improved runtime including exactly-once sources and > > > > sinks. > > > > > > > > > Additionally, we have a new web interface with support for > > > > > > > > > live-monitoring. Not to mention the countless fixes and > > > > > improvements. > > > > > > > > > > > > > > > > > > I've been ransacking the JIRA issues to find out what > issues > > we > > > > > have > > > > > > > > > to fix before we can release. I've put these issues on the > > 0.10 > > > > > wiki > > > > > > > > > page: > > > > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release > > > > > > > > > > > > > > > > > > It would be great to fix all of these issues. However, I > > think > > > we > > > > > > need > > > > > > > > > to be pragmatic and pick the most pressing ones. Let's do > > that > > > on > > > > > the > > > > > > > > > wiki page on the "To Be Fixed" section. > > > > > > > > > > > > > > > > > > Cheers, > > > > > > >
Re: Towards Flink 0.10
Hello everyone. Stephan, will 0.10 include the cache issues for createRemoteEnvironment? That not every addressing of environment will retransmit all the jars? On Tue, Oct 6, 2015 at 1:16 PM, Stephan Ewen wrote: > How about making a quick effort for this: > https://issues.apache.org/jira/browse/FLINK-2313 > > It introduces an execution control handle (returned by > StreamExecutionEnvironment.execute()) for operations like cancel(), > getAccumulators(), scaleIn/Out(), ... > It would be big time API breaking, so would be good to have it before the > release. > > As a first version we could return a control handle that has only a single > method "waitForCompletion()" to emulate the current behavior. > Or we postpone it and later add a method "executeWithControl()" or so, that > returns the control handle. > > > On Mon, Oct 5, 2015 at 6:34 PM, Vasiliki Kalavri < > vasilikikala...@gmail.com> > wrote: > > > Yes, FLINK-2785 that's right! > > Alright, thanks a lot! > > > > On 5 October 2015 at 18:31, Fabian Hueske wrote: > > > > > Hi Vasia, > > > > > > I guess you are referring to FLINK-2785. Should be fine, as there is a > PR > > > already. > > > I'll add it to the list. > > > > > > Would be nice if you could take care of FLINK-2786 (remove Spargel). > > > > > > Cheers, Fabian > > > > > > 2015-10-05 18:25 GMT+02:00 Vasiliki Kalavri >: > > > > > > > Thank you Max for putting the list together and to whomever added > > > > FLINK-2561 to the list :) > > > > I would also add FLINK-2561 (pending PR #1205). It's a sub-task of > > > > FLINK-2561, so maybe it's covered as is. > > > > > > > > If we go for Gelly graduation, I can take care of FLINK-2786 "Remove > > > > Spargel from source code and update documentation in favor of Gelly", > > but > > > > maybe it makes sense to wait for the restructuring, since we're > getting > > > rid > > > > of staging altogether? > > > > > > > > -V. > > > > > > > > On 5 October 2015 at 17:51, Fabian Hueske wrote: > > > > > > > > > Thanks Max. 
> > > > > I extended the list of issues to fix for the release. > > > > > > > > > > 2015-10-05 17:10 GMT+02:00 Maximilian Michels : > > > > > > > > > > > Thanks Greg, we have added that to the list of API breaking > > changes. > > > > > > > > > > > > On Mon, Oct 5, 2015 at 4:36 PM, Greg Hogan > > > wrote: > > > > > > > > > > > > > Max, > > > > > > > > > > > > > > Stephan noted that FLINK-2723 is an API breaking change. The > > > > > > CopyableValue > > > > > > > interface has a new method "T copy()". Commit > > > > > > > e727355e42bd0ad7d403aee703aaf33a68a839d2 > > > > > > > > > > > > > > Greg > > > > > > > > > > > > > > On Mon, Oct 5, 2015 at 10:20 AM, Maximilian Michels < > > > m...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Flinksters, > > > > > > > > > > > > > > > > After a lot of development effort in the past months, it is > > about > > > > > time > > > > > > > > to move towards the next major release. We decided to move > > > towards > > > > > > > > 0.10 instead of a milestone release. This release will > probably > > > be > > > > > the > > > > > > > > last release before 1.0. > > > > > > > > > > > > > > > > For 0.10 we most noticeably have the new Streaming API which > > > comes > > > > > > > > with an improved runtime including exactly-once sources and > > > sinks. > > > > > > > > Additionally, we have a new web interface with support for > > > > > > > > live-monitoring. Not to mention the countless fixes and > > > > improvements. > > > > > > > > > > > > > > > > I've been ransacking the JIRA issues to find out what issues > we > > > > have > > > > > > > > to fix before we can release. I've put these issues on the > 0.10 > > > > wiki > > > > > > > > page: > > > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release > > > > > > > > > > > > > > > > It would be great to fix all of these issues. However, I > think > > we > > > > > need > > > > > > > > to be pragmatic and pick the most pressing ones. 
Let's do > that > > on > > > > the > > > > > > > > wiki page on the "To Be Fixed" section. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Max > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- *Regards* *Alexey Sapozhnikov* CTO& Co-Founder Scalabillit Inc Aba Even 10-C, Herzelia, Israel M : +972-52-2363823 E : ale...@scalabill.it W : http://www.scalabill.it YT - https://youtu.be/9Rj309PTOFA Map:http://mapta.gs/Scalabillit Revolutionizing Proof-of-Concept
Re: Towards Flink 0.10
How about making a quick effort for this: https://issues.apache.org/jira/browse/FLINK-2313 It introduces an execution control handle (returned by StreamExecutionEnvironment.execute()) for operations like cancel(), getAccumulators(), scaleIn/Out(), ... It would be big time API breaking, so would be good to have it before the release. As a first version we could return a control handle that has only a single method "waitForCompletion()" to emulate the current behavior. Or we postpone it and later add a method "executeWithControl()" or so, that returns the control handle. On Mon, Oct 5, 2015 at 6:34 PM, Vasiliki Kalavri wrote: > Yes, FLINK-2785 that's right! > Alright, thanks a lot! > > On 5 October 2015 at 18:31, Fabian Hueske wrote: > > > Hi Vasia, > > > > I guess you are referring to FLINK-2785. Should be fine, as there is a PR > > already. > > I'll add it to the list. > > > > Would be nice if you could take care of FLINK-2786 (remove Spargel). > > > > Cheers, Fabian > > > > 2015-10-05 18:25 GMT+02:00 Vasiliki Kalavri : > > > > > Thank you Max for putting the list together and to whomever added > > > FLINK-2561 to the list :) > > > I would also add FLINK-2561 (pending PR #1205). It's a sub-task of > > > FLINK-2561, so maybe it's covered as is. > > > > > > If we go for Gelly graduation, I can take care of FLINK-2786 "Remove > > > Spargel from source code and update documentation in favor of Gelly", > but > > > maybe it makes sense to wait for the restructuring, since we're getting > > rid > > > of staging altogether? > > > > > > -V. > > > > > > On 5 October 2015 at 17:51, Fabian Hueske wrote: > > > > > > > Thanks Max. > > > > I extended the list of issues to fix for the release. > > > > > > > > 2015-10-05 17:10 GMT+02:00 Maximilian Michels : > > > > > > > > > Thanks Greg, we have added that to the list of API breaking > changes. 
> > > > > > > > > > On Mon, Oct 5, 2015 at 4:36 PM, Greg Hogan > > wrote: > > > > > > > > > > > Max, > > > > > > > > > > > > Stephan noted that FLINK-2723 is an API breaking change. The > > > > > CopyableValue > > > > > > interface has a new method "T copy()". Commit > > > > > > e727355e42bd0ad7d403aee703aaf33a68a839d2 > > > > > > > > > > > > Greg > > > > > > > > > > > > On Mon, Oct 5, 2015 at 10:20 AM, Maximilian Michels < > > m...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > Hi Flinksters, > > > > > > > > > > > > > > After a lot of development effort in the past months, it is > about > > > > time > > > > > > > to move towards the next major release. We decided to move > > towards > > > > > > > 0.10 instead of a milestone release. This release will probably > > be > > > > the > > > > > > > last release before 1.0. > > > > > > > > > > > > > > For 0.10 we most noticeably have the new Streaming API which > > comes > > > > > > > with an improved runtime including exactly-once sources and > > sinks. > > > > > > > Additionally, we have a new web interface with support for > > > > > > > live-monitoring. Not to mention the countless fixes and > > > improvements. > > > > > > > > > > > > > > I've been ransacking the JIRA issues to find out what issues we > > > have > > > > > > > to fix before we can release. I've put these issues on the 0.10 > > > wiki > > > > > > > page: > > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release > > > > > > > > > > > > > > It would be great to fix all of these issues. However, I think > we > > > > need > > > > > > > to be pragmatic and pick the most pressing ones. Let's do that > on > > > the > > > > > > > wiki page on the "To Be Fixed" section. > > > > > > > > > > > > > > Cheers, > > > > > > > Max > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: Iteration feedback partitioning does not work properly
Hi, This is just a workaround, which actually breaks input order from my source. I think the iteration construction should be reworked to set the parallelism of the source/sink to the parallelism of the head operator (and validate that all heads have the same parallelism). I thought this was the solution that you described with Stephan in some older discussion before the rewrite. Cheers, Gyula Aljoscha Krettek ezt írta (időpont: 2015. okt. 6., K, 9:15): > Hi, > I think what you would like to to can be achieved by: > > IterativeStream it = in.map(IdentityMap).setParallelism(2).iterate() > DataStream mapped = it.map(...) > it.closeWith(mapped.partitionByHash(someField)) > > The input is rebalanced to the map inside the iteration as in your example > and the feedback should be partitioned by hash. > > Cheers, > Aljoscha > > > On Tue, 6 Oct 2015 at 00:11 Gyula Fóra wrote: > > > Hey, > > > > This question is mainly targeted towards Aljoscha but maybe someone can > > help me out here: > > > > I think the way feedback partitioning is handled does not work, let me > > illustrate with a simple example: > > > > IterativeStream it = ... (parallelism 1) > > DataStream mapped = it.map(...) (parallelism 2) > > // this does not work as the feedback has parallelism 2 != 1 > > // it.closeWith(mapped.partitionByHash(someField)) > > // so we need rebalance the data > > > > > it.closeWith(mapped.map(NoOpMap).setParallelism(1).partitionByHash(someField)) > > > > This program will execute but the feedback will not be partitioned by > hash > > to the mapper instances: > > The partitioning will be set from the noOpMap to the iteration sink which > > has parallelism different from the mapper (1 vs 2) and then the iteration > > source forwards the element to the mapper (always to 0). > > > > So the problem is basically that the iteration source/sink pair gets the > > parallelism of the input stream (p=1) not the head operator (p = 2) which > > leads to incorrect partitioning. 
> > > > Did I miss something here? > > > > Cheers, > > Gyula > > > > >
Re: streaming GroupBy + Fold
The window is actually part of the workaround we are currently using (should have commented it out) where we use a window and a MapFunction instead of a Fold. Originally, I was running fold without a window and facing the same problems. The workaround works for now so there is no urgency on that one. I just wanted to make sure I was not doing something stupid and it was a bug that you guys were aware of. cheers Martin On Tue, Oct 6, 2015 at 8:09 AM, Aljoscha Krettek wrote: > Hi, > If you are using a fold you are using none of the new code paths. I will > add support for Fold to the new windowing implementation today, though. > > Cheers, > Aljoscha > > On Mon, 5 Oct 2015 at 23:49 Márton Balassi > wrote: > > > Martin, I have looked at your code and you are running a fold in a > window, > > that is a very important distinction - the code paths are separate. > > Those code paths have been recently touched by Aljoscha if I am not > > mistaken. > > > > I have mocked up a simple example and could not reproduce your problem > > unfortunately. [1] Could you maybe produce a minimalistic example that we > > can actually execute? :) > > > > [1] > > > > > https://github.com/mbalassi/flink/commit/9f1f02d05e2bc2043a8f514d39fbf7753ea7058d > > > > On Mon, Oct 5, 2015 at 10:06 PM, Márton Balassi < > balassi.mar...@gmail.com> > > wrote: > > > > > Thanks, I am checking it out tomorrow morning. > > > > > > On Mon, Oct 5, 2015 at 9:59 PM, Martin Neumann > wrote: > > > > > >> Hej, > > >> > > >> Sorry it took so long to respond I needed to check if I was actually > > >> allowed to share the code since it uses internal datasets. > > >> > > >> In the appendix of this email you will find the main class of this job > > >> without the supporting classes or the actual dataset. If you want to > > run it > > >> you need to replace the dataset by something else but that should be > > >> trivial.
> > >> If you just want to see the problem itself, have a look at the > appended > > >> log in conjunction with the code. Each ERROR printout in the log > > relates to > > >> an accumulator receiving wrong values. > > >> > > >> cheers Martin > > >> > > >> On Sat, Oct 3, 2015 at 11:29 AM, Márton Balassi < > > balassi.mar...@gmail.com > > >> > wrote: > > >> > > >>> Hey, > > >>> > > >>> Thanks for reporting the problem, Martin. I have not merged the PR > > >>> Stephan > > >>> is referring to yet. [1] There I am cleaning up some of the internals > > >>> too. > > >>> Just out of curiosity, could you share the code for the failing test > > >>> please? > > >>> > > >>> [1] https://github.com/apache/flink/pull/1155 > > >>> > > >>> On Fri, Oct 2, 2015 at 8:26 PM, Martin Neumann > > wrote: > > >>> > > >>> > One of my colleagues found it today when we where hunting bugs > today. > > >>> We > > >>> > where using the latest 0.10 version pulled from maven this morning. > > >>> > The program we where testing is new code so I cant tell you if the > > >>> behavior > > >>> > has changed or if it was always like this. > > >>> > > > >>> > On Fri, Oct 2, 2015 at 7:46 PM, Stephan Ewen > > wrote: > > >>> > > > >>> > > I think these operations were recently moved to the internal > state > > >>> > > interface. Did the behavior change then? > > >>> > > > > >>> > > @Marton or Gyula, can you comment? Is it per chance not mapped to > > the > > >>> > > partitioned state? > > >>> > > > > >>> > > On Fri, Oct 2, 2015 at 6:37 PM, Martin Neumann > > > >>> wrote: > > >>> > > > > >>> > > > Hej, > > >>> > > > > > >>> > > > In one of my Programs I run a Fold on a GroupedDataStream. The > > aim > > >>> is > > >>> > to > > >>> > > > aggregate the values in each group. > > >>> > > > It seems the aggregator in the Fold function is shared on > > operator > > >>> > level, > > >>> > > > so all groups that end up on the same operator get mashed > > together. 
> > >>> > > > > > >>> > > > Is this the wanted behavior? If so, what do I have to do to > > >>> separate > > >>> > > them? > > >>> > > > > > >>> > > > > > >>> > > > cheers Martin > > >>> > > > > > >>> > > > > >>> > > > >>> > > >> > > >> > > > > > >
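The behavior described in this thread, one accumulator per operator instance instead of one per group, can be contrasted with the expected per-key semantics in a few lines of plain Java. This is a model of the semantics only, not Flink code: a keyed fold should behave like the per-key map below, whereas the reported bug behaves like the single shared accumulator.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedFoldSemantics {

    public static void main(String[] args) {
        String[][] events = {{"a", "1"}, {"b", "10"}, {"a", "2"}, {"b", "20"}};

        // Expected keyed-fold semantics: one accumulator per key.
        Map<String, Integer> perKey = new HashMap<>();
        // Reported behavior: one accumulator per operator instance, so all
        // groups that land on that instance get mashed together.
        int shared = 0;

        for (String[] e : events) {
            int value = Integer.parseInt(e[1]);
            perKey.merge(e[0], value, Integer::sum);
            shared += value;
        }

        System.out.println("per-key: " + perKey);  // groups kept separate
        System.out.println("shared: " + shared);   // groups mashed together
    }
}
```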
Re: [DISCUSS] Introducing a review process for pull requests
One problem that we are seeing with FlinkML PRs is that there are simply not enough committers to "shepherd" all of them. While I think this process would help generally, I don't think it would solve this kind of problem. Regards, Theodore On Mon, Oct 5, 2015 at 3:28 PM, Matthias J. Sax wrote: > One comment: > We should ensure that contributors follow discussions on the dev mailing > list. Otherwise, they might miss important discussions regarding their > PR (what happened already). Thus, the contributor was waiting for > feedback on the PR, while the reviewer(s) waited for the PR to be > updated according to the discussion consensus, resulting in unnecessary > delays. > > -Matthias > > > > On 10/05/2015 02:18 PM, Fabian Hueske wrote: > > Hi everybody, > > > > Along with our efforts to improve the “How to contribute” guide, I would > > like to start a discussion about a setting up a review process for pull > > requests. > > > > Right now, I feel that our PR review efforts are often a bit unorganized. > > This leads to situation such as: > > > > - PRs which are lingering around without review or feedback > > - PRs which got a +1 for merging but which did not get merged > > - PRs which have been rejected after a long time > > - PRs which became irrelevant because some component was rewritten > > - PRs which are lingering around and have been abandoned by their > > contributors > > > > To address these issues I propose to define a pull request review process > > as follows: > > > > 1. [Get a Shepherd] Each pull request is taken care of by a shepherd. > > Shepherds are committers that voluntarily sign up and *feel responsible* > > for helping the PR through the process until it is merged (or discarded). > > The shepherd is also the person to contact for the author of the PR. A > > committer becomes the shepherd of a PR by dropping a comment on Github like > > “I would like to shepherd this PR”. A PR can be reassigned with lazy > > consensus. > > > > 2. 
[Accept Decision] The first decision for a PR should be whether it is > > accepted or not. This depends on a) whether it is a desired feature or > > improvement for Flink and b) whether the higher-level solution design is > > appropriate. In many cases such as bug fixes or discussed features or > > improvements, this should be an easy decision. In case of more a complex > > feature, the discussion should have been started when the mandatory JIRA > > was created. If it is still not clear whether the PR should be accepted > or > > not, a discussion should be started in JIRA (a JIRA issue needs to be > > created if none exists). The acceptance decision should be recorded by a > > “+1 to accept” message in Github. If the PR is not accepted, it should be > > closed in a timely manner. > > > > 3. [Merge Decision] Once a PR has been “accepted”, it should be brought > > into a mergeable state. This means the community should quickly react on > > contributor questions or PR updates. Everybody is encouraged to review > pull > > requests and give feedback. Ideally, the PR author does not have to wait > > for a long time to get feedback. The shepherd of a PR should feel > > responsible to drive this process forward. Once the PR is considered to > be > > mergeable, this should be recorded by a “+1 to merge” message in Github. > If > > the pull request becomes abandoned at some point in time, it should be > > either taken over by somebody else or be closed after a reasonable time. > > Again, this can be done by anybody, but the shepherd should feel > > responsible to resolve the issue. > > > > 4. Once, the PR is in a mergeable state, it should be merged. This can be > > done by anybody, however the shepherd should make sure that it happens > in a > > timely manner. > > > > IMPORTANT: Everybody is encouraged to discuss pull requests, give > feedback, > > and merge pull requests which are in a good shape. 
However, the shepherd > > should feel responsible to drive a PR forward if nothing happens. > > > > By defining a review process for pull requests, I hope we can > > > > - Improve the satisfaction of and interaction with contributors > > - Improve and speed-up the review process of pull requests. > > - Close irrelevant and stale PRs more timely > > - Reduce the effort for code reviewing > > > > The review process can be supported by some tooling: > > > > - A QA bot that checks the quality of pull requests such as increase of > > compiler warnings, code style, API changes, etc. > > - A PR management tool such as Sparks PR dashboard (see: > > https://spark-prs.appspot.com/) > > > > I would like to start discussions about using such tools later as > separate > > discussions. > > > > Looking forward to your feedback, > > Fabian > > > >
Re: Iteration feedback partitioning does not work properly
Hi, I think what you would like to do can be achieved by: IterativeStream it = in.map(IdentityMap).setParallelism(2).iterate() DataStream mapped = it.map(...) it.closeWith(mapped.partitionByHash(someField)) The input is rebalanced to the map inside the iteration as in your example and the feedback should be partitioned by hash. Cheers, Aljoscha On Tue, 6 Oct 2015 at 00:11 Gyula Fóra wrote: > Hey, > > This question is mainly targeted towards Aljoscha but maybe someone can > help me out here: > > I think the way feedback partitioning is handled does not work, let me > illustrate with a simple example: > > IterativeStream it = ... (parallelism 1) > DataStream mapped = it.map(...) (parallelism 2) > // this does not work as the feedback has parallelism 2 != 1 > // it.closeWith(mapped.partitionByHash(someField)) > // so we need rebalance the data > > it.closeWith(mapped.map(NoOpMap).setParallelism(1).partitionByHash(someField)) > > This program will execute but the feedback will not be partitioned by hash > to the mapper instances: > The partitioning will be set from the noOpMap to the iteration sink which > has parallelism different from the mapper (1 vs 2) and then the iteration > source forwards the element to the mapper (always to 0). > > So the problem is basically that the iteration source/sink pair gets the > parallelism of the input stream (p=1) not the head operator (p = 2) which > leads to incorrect partitioning. > > Did I miss something here? > > Cheers, > Gyula > >