Re: Implementing a list accumulator

2015-01-21 Thread Stephan Ewen
True, that is tricky. The user code does not necessarily respect the non-reuse mode. That may be true for any user code. Can the list accumulator immediately serialize the objects and send over a byte array? That should fix it reliably without adding overhead (serialization will happen anyway).
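
A minimal sketch of the idea above, assuming plain Java serialization as a stand-in for whatever serializer the runtime would actually use. The class SerializingListAccumulator and its methods are hypothetical names for illustration and are not Flink's accumulator interface. It shows why storing bytes at add() time protects the list from a reused, mutable holder object:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical list accumulator (not a Flink class): it keeps serialized
// snapshots instead of object references.
final class SerializingListAccumulator {
    private final List<byte[]> serialized = new ArrayList<>();

    // Serialize immediately, so later mutation of `value` cannot change what was stored.
    void add(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        serialized.add(bytes.toByteArray());
    }

    // Deserialize every stored snapshot into a fresh object.
    List<Object> values() {
        List<Object> result = new ArrayList<>();
        for (byte[] entry : serialized) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(entry))) {
                result.add(in.readObject());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] holder = new long[1];              // one mutable object, reused for every record
        List<long[]> naive = new ArrayList<>();   // stores references only
        SerializingListAccumulator accumulator = new SerializingListAccumulator();
        for (long i = 1; i <= 3; i++) {
            holder[0] = i;
            naive.add(holder);
            accumulator.add(holder);
        }
        for (long[] entry : naive) {
            System.out.print(entry[0] + " ");     // every entry is the same object
        }
        System.out.println();
        for (Object entry : accumulator.values()) {
            System.out.print(((long[]) entry)[0] + " ");  // distinct snapshots
        }
        System.out.println();
    }
}
```

The naive list ends up holding three references to the same holder, while the serializing variant returns three independent values. The byte arrays would have to be produced anyway when the accumulator is shipped, which is the "no added overhead" argument made in the message.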

Re: Very strange behaviour of groupBy() -> sort() -> first()

2015-01-21 Thread Fabian Hueske
This should directly go into the API, IMO. As I said, there are several open JIRAs for this issue. 2015-01-21 22:29 GMT+01:00 Felix Neutatz : > Thanks, @Fabian, your workaround works :) > > But I think this feature is really missing. Shall we add this functionality > natively or via the proposed

Re: Very strange behaviour of groupBy() -> sort() -> first()

2015-01-21 Thread Felix Neutatz
Thanks, @Fabian, your workaround works :) But I think this feature is really missing. Shall we add this functionality natively or via the proposed lib package? 2015-01-21 20:38 GMT+01:00 Fabian Hueske : > Chesnay is right. > Right now, it is not possible to do what you want in a straightforward

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Robert Metzger
I'm actually quite happy with the travis setup. Since most of the contributors/committers have travis activated for their forks, it's very convenient to run tests on travis, without polluting any public apache infrastructure. Also, it would add another big dependency to a (perceived) unstable apache

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Henry Saputra
Would it be better to use Github Jenkins plugin [1] to connect to ASF Jenkins cluster? [1] https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin [2] http://events.linuxfoundation.org/sites/events/files/slides/Jenkins_at_ASF_2014.pdf On Tue, Jan 20, 2015 at 2:57 PM, Robe

Re: Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Stephan Ewen
I think that this is a fairly delicate thing. The execution graph / scheduling is the most delicate part of the system. I would not feel too well about a quick fix there, so let's think this through a little bit. The logic currently does the following: 1) It schedules the sources (see "Execution

[jira] [Created] (FLINK-1431) Add news mailing list to website

2015-01-21 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1431: Summary: Add news mailing list to website Key: FLINK-1431 URL: https://issues.apache.org/jira/browse/FLINK-1431 Project: Flink Issue Type: Task Com

Re: Very strange behaviour of groupBy() -> sort() -> first()

2015-01-21 Thread Fabian Hueske
Chesnay is right. Right now, it is not possible to do what you want in a straightforward way because Flink does not support fully sorting a data set (there are several related issues in JIRA). A workaround would be to attach a constant value to each tuple, group on that (all tuples are sent to the
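
The effect of the constant-key workaround can be sketched with plain Java collections. This is an illustration of the semantics only, not Flink API code: GroupedFirstSketch and firstPerGroup are hypothetical names, and the sample rows are made up. The helper mimics "group, sort each group descending by value, take the first n of each group":

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration (hypothetical names, no Flink classes involved).
final class GroupedFirstSketch {

    // Groups rows by keyOf, sorts each group by value (descending), keeps the first n per group.
    static <K> List<Map.Entry<String, Long>> firstPerGroup(
            List<Map.Entry<String, Long>> rows,
            Function<Map.Entry<String, Long>, K> keyOf,
            int n) {
        Map<K, List<Map.Entry<String, Long>>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            groups.computeIfAbsent(keyOf.apply(row), k -> new ArrayList<>()).add(row);
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        for (List<Map.Entry<String, Long>> group : groups.values()) {
            group.sort(Map.Entry.<String, Long>comparingByValue().reversed());
            result.addAll(group.subList(0, Math.min(n, group.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> sums = new ArrayList<>();
        sums.add(new AbstractMap.SimpleImmutableEntry<>("a", 5L));
        sums.add(new AbstractMap.SimpleImmutableEntry<>("b", 9L));
        sums.add(new AbstractMap.SimpleImmutableEntry<>("c", 7L));

        // Grouping on the real key: every group has one row, so "first 2" keeps all rows.
        System.out.println(firstPerGroup(sums, row -> row.getKey(), 2));

        // Grouping on a constant: a single group, so "first 2" is a global top 2.
        System.out.println(firstPerGroup(sums, row -> 0, 2));
    }
}
```

With the real key, each aggregated row forms its own group and nothing is filtered out, which matches the behaviour reported in this thread. With a constant key there is only one group, so the per-group sort and first(n) act on the whole data set, at the cost of sending all rows to a single group.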

Re: Very strange behaviour of groupBy() -> sort() -> first()

2015-01-21 Thread Stephan Ewen
Chesnay is right. What you want is a non-grouped sort/first, which would need to be added... Stephan On 21.01.2015 11:25, "Chesnay Schepler" < chesnay.schep...@fu-berlin.de> wrote: > If I remember correctly, first() returns the first n values for every > group. The javadocs actually don't make t

Re: Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Gyula Fóra
Thank you! I will play around with it. On Wed, Jan 21, 2015 at 3:50 PM, Ufuk Celebi wrote: > Hey Gyula, > > On 21 Jan 2015, at 15:41, Gyula Fóra wrote: > > > Hey Guys, > > > > I think it would make sense to turn lazy operator execution off for > > streaming programs because it would make life s

Re: Very strange behaviour of groupBy() -> sort() -> first()

2015-01-21 Thread Chesnay Schepler
If I remember correctly, first() returns the first n values for every group. The javadocs actually don't make this behaviour very clear. On 21.01.2015 19:18, Felix Neutatz wrote: Hi, my use case is the following: I have a Tuple2. I want to group by the String and sum up the Long values accordi

[jira] [Created] (FLINK-1430) Add test for streaming scala api completeness

2015-01-21 Thread Márton Balassi (JIRA)
Márton Balassi created FLINK-1430: Summary: Add test for streaming scala api completeness Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Tes

[jira] [Created] (FLINK-1429) Add documentation for streaming scala api

2015-01-21 Thread Márton Balassi (JIRA)
Márton Balassi created FLINK-1429: Summary: Add documentation for streaming scala api Key: FLINK-1429 URL: https://issues.apache.org/jira/browse/FLINK-1429 Project: Flink Issue Type: Task

Re: [flink-streaming] Regarding loops in the Job Graph

2015-01-21 Thread Stephan Ewen
Hi Paris! The Streaming API allows you to define iterations, where parts of the stream are fed back. Do those work for you? In general, cyclic flows are a tricky thing, as the topological order of operators is needed for scheduling (may not be important for continuous streams) but also for a clea

Very strange behaviour of groupBy() -> sort() -> first()

2015-01-21 Thread Felix Neutatz
Hi, my use case is the following: I have a Tuple2. I want to group by the String and sum up the Long values accordingly. This works fine with these lines: DataSet lineitems = getLineitemDataSet(env); lineitems.project(new int []{3,0}).groupBy(0).aggregate(Aggregations.SUM, 1); After the aggrega

Re: Implementing a list accumulator

2015-01-21 Thread Ufuk Celebi
The thing is that the DefaultCrossFunction always uses the same holder Tuple2 object, which is then handed over to the chained collect helper flatMap(). I can see why it is OK to keep the default functions to reuse "holder" objects, but when they are chained to an operator it becomes problematic

Re: Implementing a list accumulator

2015-01-21 Thread Ufuk Celebi
I just checked out your branch and ran it with a break point set at the CollectHelper. If you look into the (list) accumulator you see that always the same object is added to it. Strangely enough, object re-use is disabled in the config. I don't have time to look further into it now, but it seem

[jira] [Created] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction

2015-01-21 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1428: Summary: Typos in Java code example for RichGroupReduceFunction Key: FLINK-1428 URL: https://issues.apache.org/jira/browse/FLINK-1428 Project: Flink Issue Ty

[jira] [Created] (FLINK-1427) Configuration through environment variables

2015-01-21 Thread Max Michels (JIRA)
Max Michels created FLINK-1427: Summary: Configuration through environment variables Key: FLINK-1427 URL: https://issues.apache.org/jira/browse/FLINK-1427 Project: Flink Issue Type: Improvement

Re: Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Ufuk Celebi
Hey Gyula, On 21 Jan 2015, at 15:41, Gyula Fóra wrote: > Hey Guys, > > I think it would make sense to turn lazy operator execution off for > streaming programs because it would make life simpler for windowing. I > also created a JIRA issue here >

Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Gyula Fóra
Hey Guys, I think it would make sense to turn lazy operator execution off for streaming programs because it would make life simpler for windowing. I also created a JIRA issue here. Can anyone give me some quick pointers on how to do this? It's proba

Re: [jira] [Created] (FLINK-1424) bin/flink run does not recognize -c parameter anymore

2015-01-21 Thread Alexander Alexandrov
We verified this today in the IMPRO-3 lecture. We also had problems passing parameters like "-c XXX" via the JobSubmission tool. 2015-01-21 2:46 GMT+01:00 Carsten Brandt (JIRA) : > Carsten Brandt created FLINK-1424: > - > > Summary: bin/flink run d

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Ufuk Celebi
Thanks for the nice script. I've just installed it :-) On 21 Jan 2015, at 13:57, Max Michels wrote: > I've created a pre-push hook that does what I described (and a bit > more). It does only enforce a check for the remote flink master branch > and doesn't disturb you on your pushes. > > https:/

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Max Michels
I've created a pre-push hook that does what I described (and a bit more). It only enforces a check for the remote flink master branch and doesn't disturb you on your other pushes. https://gist.github.com/mxm/4d1e26b901c66a682e4a Just put the file in the .git/hooks/ directory of your repository

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Max Michels
@Robert The pre-push hook only resides in your local repository. It cannot be pushed. Thus, we cannot enforce this check but it certainly helps to prevent mistakes. As Ufuk mentioned, you can then even skip the check with the --no-verify option if you're really sure. On Wed, Jan 21, 2015 at 11:45

Re: [streaming] Regarding loops in the job graph

2015-01-21 Thread Márton Balassi
As far as I remember when we were implementing the iteration for streaming we deliberately "went off" the jobgraph and implemented the backward edge as an in-memory hook (practically the same in batch but not blocking in the backward channel). When we asked the reasoning here on the dev list we go

[jira] [Created] (FLINK-1426) JobManager AJAX requests sometimes fail

2015-01-21 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1426: Summary: JobManager AJAX requests sometimes fail Key: FLINK-1426 URL: https://issues.apache.org/jira/browse/FLINK-1426 Project: Flink Issue Type: Bug

[streaming] Regarding loops in the job graph

2015-01-21 Thread Paris Carbone
Hello, While implementing the SAMOA adapter for Flink-Streaming we stumbled upon the need to allow loops (or circular dependencies) in the job graph. Many incremental machine learning tasks define loops already and there is no trivial way of getting around it. In the streaming job graph builde

[flink-streaming] Regarding loops in the Job Graph

2015-01-21 Thread Paris Carbone
Hello, While implementing the SAMOA adapter for Flink-Streaming we stumbled upon the need to allow loops (or circular dependencies) in the job graph. Many incremental machine learning tasks define loops already and there is no trivial way of getting around it. In the streaming job graph builde

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Ufuk Celebi
On 21 Jan 2015, at 11:40, Robert Metzger wrote: > Is the git hook something we can control for everybody? I thought it's more > like a personal thing everybody can set up if wanted? > > I'm against enforcing something like this for every committer. I don't want > to wait for 15 minutes for pushi

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Robert Metzger
Is the git hook something we can control for everybody? I thought it's more like a personal thing everybody can set up if wanted? I'm against enforcing something like this for every committer. I don't want to wait for 15 minutes for pushing a typo fix to the documentation. On Wed, Jan 21, 2015 at

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Ufuk Celebi
On 21 Jan 2015, at 10:40, Till Rohrmann wrote: > I like both approaches because maven won't necessarily find all problems if > executed locally. Especially concurrency problems seem to occur more often > on travis than on my local machine. I agree with Till. Let's add a git hook to catch most o

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Till Rohrmann
I like both approaches because maven won't necessarily find all problems if executed locally. Especially concurrency problems seem to occur more often on travis than on my local machine. Greets, Till On Wed, Jan 21, 2015 at 9:59 AM, Max Michels wrote: > Hi Robert, > > I like your solution usin

[jira] [Created] (FLINK-1425) Turn lazy operator execution off for streaming programs

2015-01-21 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-1425: Summary: Turn lazy operator execution off for streaming programs Key: FLINK-1425 URL: https://issues.apache.org/jira/browse/FLINK-1425 Project: Flink Issue Type: T

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Max Michels
Hi Robert, I like your solution using Travis and Google App Engine. However, I think there's a much simpler solution which can prevent committers from pushing code that does not even compile or that fails tests to the master in the first place. Committers could simply install a git pre-push hook in their git r

Re: Keeping around temp datasets

2015-01-21 Thread Stephan Ewen
I agree, the type serializer IO formats should be the best match. They would also work rather efficiently. On Tue, Jan 20, 2015 at 2:18 PM, Robert Metzger wrote: > I would not recommend using the AvroInput/Output format because it's meant > to be used with Avro types (usually POJOs generated from a

Re: Representing Scala base types in the Flink RT

2015-01-21 Thread Stephan Ewen
Hi! The Basic types hold the Java primitives and their boxed versions. Since the "scala.Int" boils down to the Java int primitive type at runtime, they should be interoperable. I would guess that the problem is that "classOf[Int]" actually returns (in Java terms) "scala.Int.class", which is not r