Re: Apache Flink

2016-04-18 Thread Jörn Franke
What is your exact set of requirements for algo trading? Is it reacting in real time or analysis over a longer time? In the first case, I do not think a framework such as Spark or Flink makes sense. They are generic, but in order to compete with other usually custom-developed, highly specialized

Re: Apache Flink

2016-04-18 Thread Mich Talebzadeh
Thanks Todd, I will have a look. Regards, Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 18 April

Re: Apache Flink

2016-04-18 Thread Mich Talebzadeh
Please forgive me for going off on a tangent. Well, there may be many SQL engines, but only 5-6 are in the league. Oracle has Oracle and MySQL plus TimesTen from various acquisitions. SAP has Hana, SAP ASE, SAP IQ and a few others, again from acquiring Sybase. So very few big players. Cars, Fiat owns many

Re: Apache Flink

2016-04-18 Thread Sean Owen
Interesting tangent. I think there will never be a time when an interesting area is covered only by one project, or product. Why are there 30 SQL engines? Or 50 car companies? It's a feature, not a bug. To the extent they provide different tradeoffs or functionality, they're not entirely

Re: Apache Flink

2016-04-17 Thread Todd Nist
So there is an offering from Stratio, https://github.com/Stratio/Decision Decision CEP engine is a Complex Event Processing platform built on Spark Streaming. It is the result of combining the power of Spark Streaming as a continuous computing framework and Siddhi CEP engine as complex

Re: Apache Flink

2016-04-17 Thread Corey Nolet
Peyman, I'm sorry, I missed the comment that microbatching was a waste of time. Did someone mention this? I know this thread got pretty long so I may have missed it somewhere above. My comment about Spark's microbatching being a downside is strictly in reference to CEP. Complex CEP flows are

Re: Apache Flink

2016-04-17 Thread Peyman Mohajerian
Microbatching is certainly not a waste of time; you are making way too strong a statement. In fact, in certain cases one tuple at a time makes no sense; it all depends on the use case. In fact, if you understand the history of the Storm project, you would know that microbatching was added

Re: Apache Flink

2016-04-17 Thread Otis Gospodnetić
010 USENIX paper on Spark. Nephele is the execution engine of >> Stratosphere, which became Flink. >> >> >> -- >> *From:* Mark Hamstra <m...@clearstorydata.com> >> *To:* Mich Talebzadeh <mich.talebza...@gmail.com> >> *Cc:* Corey Nolet &

Re: Apache Flink

2016-04-17 Thread Michael Malak
Malak <michaelma...@yahoo.com>; "user @spark" <user@spark.apache.org> Sent: Sunday, April 17, 2016 3:55 PM Subject: Re: Apache Flink Assuming that both Spark and Flink are contemporaries what are the reasons that Flink has not been adopted widely? (this may sound obviou

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
lebza...@gmail.com> > *Cc:* Corey Nolet <cjno...@gmail.com>; "user @spark" < > user@spark.apache.org> > *Sent:* Sunday, April 17, 2016 3:30 PM > *Subject:* Re: Apache Flink > > To be fair, the Stratosphere project from which Flink springs was started >

Re: Apache Flink

2016-04-17 Thread Michael Malak
nt: Sunday, April 17, 2016 3:48 PM Subject: Re: Apache Flink The problem is that the strength and wider acceptance of a typical Open source project is its sizeable user and development community. When the community is small like Flink, then it is not a viable solution to adopt I am rather dis

Re: Apache Flink

2016-04-17 Thread Michael Malak
; Cc: Corey Nolet <cjno...@gmail.com>; "user @spark" <user@spark.apache.org> Sent: Sunday, April 17, 2016 3:30 PM Subject: Re: Apache Flink To be fair, the Stratosphere project from which Flink springs was started as a collaborative university research project in Germa

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
The problem is that the strength and wider acceptance of a typical open source project lies in its sizeable user and development community. When the community is small, like Flink's, it is not a viable solution to adopt. I am rather disappointed that no big data project can be used for Complex Event

Re: Apache Flink

2016-04-17 Thread Mark Hamstra
To be fair, the Stratosphere project from which Flink springs was started as a collaborative university research project in Germany about the same time that Spark was first released as Open Source, so they are near contemporaries rather than Flink having been started only well after Spark was an

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Also, it always amazes me why there are so many tangential projects in the Big Data space. Would it not be easier if efforts were spent on adding to Spark functionality rather than creating a new product like Flink? Dr Mich Talebzadeh LinkedIn *

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Thanks Corey for the useful info. I have used Sybase Aleri and StreamBase as commercial CEP engines. However, there does not seem to be anything close to these products in the Hadoop ecosystem. So I guess there is nothing there? Regards. Dr Mich Talebzadeh LinkedIn *

Re: Apache Flink

2016-04-17 Thread Corey Nolet
I have not been intrigued at all by the microbatching concept in Spark. I am used to CEP in real stream processing environments like Infosphere Streams & Storm, where the granularity of processing is at the level of each individual tuple and processing units (workers) can react immediately to

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Hi Corey, Can you please point me to docs on using Spark for CEP? Do we have a set of CEP libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, something like below. Thanks Dr Mich Talebzadeh LinkedIn *

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
The Streaming use case is important IMO, as Spark (like Flink) advocates for the unification of analytics tools, so having all in one, batch and graph processing, sql, ml and streaming. > On 17 Apr 2016, at 17:07, Corey Nolet wrote: > > One thing I've noticed about Flink in

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
For the streaming case (DataStream API) Flink is fault tolerant; for the batch case (DataSet API) not yet, as far as I can tell from my research on their platform. > On 17 Apr 2016, at 17:03, Koert Kuipers wrote: > > i never found much info that flink was actually designed to be fault

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
Hi Mich, IMO one will try to see if there is an alternative, a better one at least. This benchmark could be a good starting point. Best, Ovidiu > On 17 Apr 2016, at 15:52, Mich Talebzadeh wrote: > > Hi, > > I read the benchmark published by Yahoo. Obviously they

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
<mailto:ascot.m...@gmail.com>; Ted Yu <mailto:yuzhih...@gmail.com>; user > @spark <mailto:user@spark.apache.org> > Subject: Re: Apache Flink > > You probably read this benchmark at Yahoo, any comments from Spark? > https://yahooeng.tumblr.com/post/135321837876

Re: Apache Flink

2016-04-17 Thread Corey Nolet
One thing I've noticed about Flink in my following of the project has been that it has established, in a few cases, some novel ideas and improvements over Spark. The problem with it, however, is that both the development team and the community around it are very small and many of those novel

Re: Apache Flink

2016-04-17 Thread Koert Kuipers
i never found much info that flink was actually designed to be fault tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that doesn't bode well for large scale data processing. spark was designed with fault tolerance in mind from the beginning. On Sun, Apr 17, 2016 at 9:52 AM,

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Hi, I read the benchmark published by Yahoo. Obviously they already use Storm and are inevitably very familiar with that tool. To start with, although these benchmarks were somewhat interesting IMO, they lend themselves to an assurance that the tool chosen for their platform is still the best choice. So

RE: Apache Flink

2016-04-17 Thread Silvio Fiorito
gmail.com> Cc: Mich Talebzadeh<mailto:mich.talebza...@gmail.com>; Ascot Moss<mailto:ascot.m...@gmail.com>; Ted Yu<mailto:yuzhih...@gmail.com>; user @spark<mailto:user@spark.apache.org> Subject: Re: Apache Flink You probably read this benchmark at Yahoo, any comments fro

Re: Apache Flink

2016-04-17 Thread Igor Berman
Latency in Flink is not eliminated, but it might be smaller, since Flink processes each event one by one while Spark does microbatching (so you can't achieve latency lower than your microbatch config). Spark will probably have better throughput due to this microbatching. On 17 April 2016 at 14:47,
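
The latency point above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how Spark's or Flink's schedulers actually work: processing cost is taken as zero, an event is assumed to be handled only when the batch window containing it closes, and an event arriving exactly on a boundary is assumed to fall into the next window. The function names (`microbatch_wait_ms`, `per_event_wait_ms`, `summarize`) are hypothetical and come from neither project.

```python
from typing import Dict, Iterable


def microbatch_wait_ms(arrival_ms: int, batch_interval_ms: int) -> int:
    """Wait until the batch window containing the event closes (toy model)."""
    # Windows are [k * interval, (k + 1) * interval); the event is handled
    # when its window ends. Integer milliseconds avoid float rounding.
    batch_end_ms = (arrival_ms // batch_interval_ms + 1) * batch_interval_ms
    return batch_end_ms - arrival_ms


def per_event_wait_ms(arrival_ms: int) -> int:
    """Per-event processing adds no scheduling wait in this toy model."""
    return 0


def summarize(arrivals_ms: Iterable[int], batch_interval_ms: int) -> Dict[str, float]:
    """Mean and worst-case micro-batch wait over a set of arrival times."""
    waits = [microbatch_wait_ms(t, batch_interval_ms) for t in arrivals_ms]
    return {"mean_ms": sum(waits) / len(waits), "max_ms": max(waits)}


if __name__ == "__main__":
    # One event every 100 ms for 10 s, with a 1 s micro-batch interval.
    arrivals = range(0, 10_000, 100)
    print(summarize(arrivals, 1_000))  # {'mean_ms': 550.0, 'max_ms': 1000}
    print(max(per_event_wait_ms(t) for t in arrivals))  # 0
```

In this model the worst-case wait equals the batch interval and the mean is roughly half of it, whereas per-event processing adds no scheduling wait. Real engines add processing and network time on top of both, and batching can amortize per-record overhead, which is the throughput side of the trade-off.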

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
You probably read this benchmark at Yahoo, any comments from Spark? https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at > On 17 Apr 2016, at 12:41, andy

Re: Apache Flink

2016-04-17 Thread andy petrella
Just adding one thing to the mix: `that the latency for streaming data is eliminated` is insane :-D On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh wrote: > It seems that Flink argues that the latency for streaming data is > eliminated whereas with Spark RDD there

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
It seems that Flink argues that the latency for streaming data is eliminated, whereas with Spark RDD there is this latency. I noticed that Flink does not support an interactive shell like the Spark shell, where you can add jars to it to do Kafka testing. The advice was to add the streaming Kafka jar

Re: Apache Flink

2016-04-16 Thread Ascot Moss
I compared both last month, seems to me that Flink's MLLib is not yet ready. On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh wrote: > Thanks Ted. I was wondering if someone is using both :) > > Dr Mich Talebzadeh > > > > LinkedIn * >

Re: Apache Flink

2016-04-16 Thread Mich Talebzadeh
Thanks Ted. I was wondering if someone is using both :) Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com

Re: Apache Flink

2016-04-16 Thread Ted Yu
Looks like this question is more relevant on flink mailing list :-) On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh wrote: > Hi, > > Has anyone used Apache Flink instead of Spark by any chance > > I am interested in its set of libraries for Complex Event Processing.