Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-15 Thread Martin Neumann
I think this is easier done in a straw poll than in an email conversation. I created one at: http://www.strawpoll.me/12535073 (Note that you can select multiple choices.) Though I prefer Java 8, most of the time I have to work on Java 7. A lot of the infrastructure I work on still runs Java 7, one of the

Re: Some thoughts about the lower-level Flink APIs

2016-08-17 Thread Martin Neumann
I agree with Vasia that for data scientists it's likely easier to learn the high-level API. I like the material from http://dataartisans.github.io/flink-training/ but all of it focuses on the high-level API. Maybe we could have a guide (blog post, lecture, whatever) on how to get into Flink as a

Re: Opening a discussion on FlinkML

2016-02-14 Thread Martin Neumann
I think the focus of this discussion should be how we proceed, not what to do. The what comes from the committers anyway. There are several people who would like to commit, including people from the Streamline project. Having pull requests that are older than 6 months is not good for any project. The

User Feedback

2016-02-09 Thread Martin Neumann
During this year's FOSDEM Martin Junghans and I sat down together and gathered some feedback for the Flink project. It is based on our personal experience as well as the feedback and questions from people we taught the system to. This is going to be a longer email, so I have split things into

Re: maven dependency problem when building stream job

2016-01-20 Thread Martin Neumann
ugin to enforce a > minimum version of netty? > We recently downgraded (a minor version) of netty because of an issue. > Maybe that's the issue. > > Can you check the enforcer rules of your project? > > On Wed, Jan 20, 2016 at 1:48 PM, Martin Neumann <mneum...@sics.se> wrote

maven dependency problem when building stream job

2016-01-20 Thread Martin Neumann
Hi, I have a weird problem. Yesterday I had to clean my local maven cache for a different application. Since then, one of my Flink streaming jobs does not compile anymore. I didn't change any code, I just made maven pull all dependencies again. I'm totally stumped by this, please help me!

Re: maven dependency problem when building stream job

2016-01-20 Thread Martin Neumann
org> wrote: > Hi Martin. > > can you try to exclude the netty dependency from your Flink dependencies? > Another approach would be to disable the check, or add an exception to it > ;) > > Why did you add the check in the first place? > > > On Wed, Jan 20, 20

Streaming statefull operator with hashmap

2015-11-11 Thread Martin Neumann
Hej, What is the correct way of initializing a stateful operator that is using a hashmap? modelMapInit.getClass() does not work, and neither does HashMap.class. Do I have to implement my own TypeInformation class or is there a simpler way? cheers Martin private
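
A note on why a class token carries so little information here: Java erases generic type parameters at runtime, so getClass() on a HashMap<String, Double> and HashMap.class both describe a raw java.util.HashMap. Whether this is the actual cause of the problem in the question is an assumption; the thread does not say. The sketch below is plain Java, not Flink API, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it only shows Java type erasure and uses no Flink code.
public class ErasureSketch {

    /** Returns true when both objects have the same runtime class. */
    public static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        Map<String, Double> model = new HashMap<>();
        Map<Integer, Long> counts = new HashMap<>();

        // The key and value types differ at compile time only.
        System.out.println(sameRuntimeClass(model, counts)); // prints "true"
        System.out.println(model.getClass().getName());      // prints "java.util.HashMap"
    }
}
```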

Re: Streaming statefull operator with hashmap

2015-11-11 Thread Martin Neumann
> > Yes what you wrote should work. You can alternatively use > > TypeExtractor.getForObject(modelMapInit) to extract the type information. > > > > I also like to implement my custom type info for Hashmaps and the other > > types and use that. > > > >

0.10 streaming state documentation out of date

2015-11-05 Thread Martin Neumann
Hej, I'm working with some stateful streaming operators at the moment and I noticed that the documentation is out of date. The documentation says: @Override public void open(Configuration config) { counter = getRuntimeContext().getOperatorState("counter", 0L, false); }

Re: [gelly] Spargel model rework

2015-11-03 Thread Martin Neumann
I tried out Spargel during my work with Spotify and have implemented several algorithms using it. In all implementations I ended up storing additional data and flags on the vertex to carry them over from one UDF to the next one. It definitely makes the code harder to write and maintain. I wonder

Re: [gelly] Spargel model rework

2015-11-03 Thread Martin Neumann
The problem with having many different graph models in Gelly is that it might get quite confusing for a user. Maybe this can be fixed with good documentation so that it's clear how each model works and what its benefits are (and maybe when it's better to use it over a different model). On Tue, Nov

Re: streaming GroupBy + Fold

2015-10-06 Thread Martin Neumann
@gmail.com > > >> > wrote: > > >> > > >>> Hey, > > >>> > > >>> Thanks for reporting the problem, Martin. I have not merged the PR > > >>> Stephan > > >>> is referring to yet. [1] There I am cleaning

Re: streaming GroupBy + Fold

2015-10-05 Thread Martin Neumann
test > please? > > [1] https://github.com/apache/flink/pull/1155 > > On Fri, Oct 2, 2015 at 8:26 PM, Martin Neumann <mneum...@sics.se> wrote: > > > One of my colleagues found it today when we were hunting bugs. We > > were using the latest 0.10 version

Re: Rethink the "always copy" policy for streaming topologies

2015-10-02 Thread Martin Neumann
It seems like I'm one of the few people who run into the mutable elements trap on the Batch API from time to time. At the moment I always clone when I'm not 100% sure, to avoid hunting the bugs later. So far I was happy to learn that this is not a problem in Streaming, but that's just me. When

streaming GroupBy + Fold

2015-10-02 Thread Martin Neumann
Hej, In one of my programs I run a Fold on a GroupedDataStream. The aim is to aggregate the values in each group. It seems the aggregator in the Fold function is shared at the operator level, so all groups that end up on the same operator get mashed together. Is this the intended behavior? If so, what
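
To make the difference concrete, here is a minimal plain-Java sketch (not Flink API) contrasting a single accumulator shared by every record that reaches an operator with one accumulator per group key. The Event type, the class name, and the use of a sum as the fold function are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Flink.
public class FoldPerKeySketch {

    /** A record with a grouping key and a value to aggregate. */
    public static final class Event {
        final String key;
        final long value;

        public Event(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /** One accumulator for all records: every group is mixed into one result. */
    public static long foldShared(List<Event> events) {
        long acc = 0L;
        for (Event e : events) {
            acc += e.value;
        }
        return acc;
    }

    /** One accumulator per key: each group is aggregated on its own. */
    public static Map<String, Long> foldPerKey(List<Event> events) {
        Map<String, Long> accs = new HashMap<>();
        for (Event e : events) {
            accs.merge(e.key, e.value, Long::sum);
        }
        return accs;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("a", 1L), new Event("b", 10L), new Event("a", 2L));

        System.out.println(foldShared(events));           // 13: groups a and b mixed
        System.out.println(foldPerKey(events).get("a"));  // 3
        System.out.println(foldPerKey(events).get("b"));  // 10
    }
}
```

The per-key variant is what one would expect from a fold over a grouped stream; the shared variant matches the behavior described above.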

Re: streaming GroupBy + Fold

2015-10-02 Thread Martin Neumann
, Stephan Ewen <se...@apache.org> wrote: > I think these operations were recently moved to the internal state > interface. Did the behavior change then? > > @Marton or Gyula, can you comment? Is it per chance not mapped to the > partitioned state? > > On Fri, Oct 2, 2015 at 6:3

EventTime in streaming

2015-09-17 Thread Martin Neumann
After some work experience with the current solution I want to give some feedback and maybe start a discussion about event time in streaming. This is not about watermarks or any of the incoming improvements, just some observations from the current code. *Start time for EventTime:* In the current

broadcast set sizes

2015-04-09 Thread Martin Neumann
Hej, Up to what sizes are broadcast sets a good idea? I have a large dataset (~5 GB) and I'm only interested in lines with a certain ID that I have in a file. The file has ~10 k entries. I could either join the dataset with the ID list or I could broadcast the ID list and do the filtering in a
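
As a rough sketch of the second option: once the ID list is available on each worker, the filter is just a hash-set lookup per line. The code below is plain Java rather than Flink API, and it assumes (this is not stated in the question) that each line starts with its ID followed by a comma. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; the "id,payload" line format is an assumption.
public class IdFilterSketch {

    /** Keeps only the lines whose leading ID field is contained in wantedIds. */
    public static List<String> filterByIds(Iterable<String> lines, Set<String> wantedIds) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int comma = line.indexOf(',');
            // A line without a comma is treated as consisting of the ID only.
            String id = comma < 0 ? line : line.substring(0, comma);
            if (wantedIds.contains(id)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(Arrays.asList("17", "42"));
        List<String> lines = Arrays.asList("17,foo", "18,bar", "42,baz", "421,qux");

        System.out.println(filterByIds(lines, ids)); // prints "[17,foo, 42,baz]"
    }
}
```

A set of roughly 10 k short IDs is small enough to hold in memory, so the lookup itself is cheap; how it compares to a join in a real job would have to be measured.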

gelli graph algorithm

2015-02-26 Thread Martin Neumann
Hej, I was busy with other stuff for a while but I hope I will have more time to work on Flink and graphs again now. I need to do some basic analytics on a large graph set (stuff like degree distribution, triangle count, component size distribution etc.) Is there anything implemented in Gelly

Re: Stale Synchronous Parallel iterations in Flink

2015-02-25 Thread Martin Neumann
Hej, Very interesting discussion. I hadn't heard of the SSP model before, looks like something I want to look into. I wonder if any of the algorithms that would work in that model would not work in an asynchronous model, since asynchronous is basically an SSP model with infinite slack. Iterative
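
The "infinite slack" remark can be written down as a simplified rule: a worker may start its next iteration only if it is at most `slack` iterations ahead of the slowest worker. The sketch below is a plain-Java illustration under that simplification, not Flink code; the class and method names are hypothetical.

```java
// Hypothetical illustration of a bounded-staleness check.
public class SlackSketch {

    /**
     * Returns true if a worker at iteration workerClock may continue,
     * given the iteration of the slowest worker and the allowed slack.
     */
    public static boolean mayProceed(long workerClock, long slowestClock, long slack) {
        return workerClock - slowestClock <= slack;
    }

    public static void main(String[] args) {
        // slack = 0: fully synchronous, nobody may run ahead of the slowest worker.
        System.out.println(mayProceed(5, 4, 0));              // false
        // slack = 2: a worker may be up to two iterations ahead.
        System.out.println(mayProceed(5, 4, 2));              // true
        System.out.println(mayProceed(8, 4, 2));              // false
        // Unbounded slack: the check never blocks, i.e. asynchronous execution.
        System.out.println(mayProceed(8, 4, Long.MAX_VALUE)); // true
    }
}
```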