Thanks Todd, I will have a look.

Regards,
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 18 April 2016 at 01:58, Todd Nist <tsind...@gmail.com> wrote:

> So there is an offering from Stratio, https://github.com/Stratio/Decision
>
>> Decision CEP engine is a Complex Event Processing platform built on
>> Spark Streaming.
>>
>> It is the result of combining the power of Spark Streaming as a
>> continuous computing framework and the Siddhi CEP engine as a complex
>> event processing engine.
>
> https://stratio.atlassian.net/wiki/display/DECISION0x9/Home
>
> I have not used it, only read about it, but it may be of some interest
> to you.
>
> -Todd
>
> On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian <mohaj...@gmail.com>
> wrote:
>
>> Microbatching is certainly not a waste of time; you are making far too
>> strong a statement. In fact, in certain cases one-tuple-at-a-time
>> processing makes no sense; it all depends on the use case. If you look
>> at the history of the Storm project, you will know that microbatching
>> was added to Storm later, in the form of Trident, specifically for
>> microbatching/windowing.
>> In certain cases you are doing aggregation/windowing, throughput is the
>> dominant design consideration, and you don't care what each individual
>> event/tuple does: e.g. if you push different event types to separate
>> Kafka topics and all you care about is a count, what is the need for
>> single-event processing?
>>
>> On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet <cjno...@gmail.com> wrote:
>>
>>> I have not been intrigued at all by the microbatching concept in
>>> Spark.
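The per-topic windowed count described above needs no per-event logic; a minimal plain-Python sketch of the idea, simulating micro-batches with ordinary lists (these helper names are illustrative, not part of any Spark or Kafka API):

```python
# Model of micro-batched counting: each "micro-batch" is just a list of
# (topic, payload) pairs. The job only needs per-topic counts, so no
# per-event handling is required -- the case described above.
from collections import Counter

def count_by_topic(batch):
    """Count events per topic within one micro-batch."""
    return Counter(topic for topic, _payload in batch)

def merge_counts(total, batch_counts):
    """Fold one micro-batch's counts into the running totals."""
    merged = Counter(total)
    merged.update(batch_counts)
    return dict(merged)
```

Throughput-oriented jobs like this only ever look at aggregates per batch, which is exactly where micro-batching is a good fit.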
>>> I am used to CEP in real streams processing environments like
>>> InfoSphere Streams & Storm, where the granularity of processing is at
>>> the level of each individual tuple, and processing units (workers) can
>>> react immediately to events being received and processed. The closest
>>> Spark Streaming comes to this concept is the notion of "state" that can
>>> be updated via the "updateStateByKey()" functions, which can only be
>>> run in a microbatch. Looking at the expected design changes to Spark
>>> Streaming in Spark 2.0.0, it also does not look like tuple-at-a-time
>>> processing is on the radar for Spark, though I have seen articles
>>> stating that more effort is going to go into the Spark SQL layer in
>>> Spark Streaming, which may make it more reminiscent of Esper.
>>>
>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>> Without this, they avoid the whole problem of "back pressure" (though
>>> keep in mind, it is still very possible to overload the Spark Streaming
>>> layer with stages that will continue to pile up and never get worked
>>> off), but they lose the granular control that you get in CEP
>>> environments by allowing the rules & processors to react to the receipt
>>> of each tuple, right away.
>>>
>>> A while back, I did attempt to implement an InfoSphere Streams-like API
>>> [1] on top of Apache Storm as an example of what such a design might
>>> look like. It looks like Storm is going to be replaced in the
>>> not-so-distant future by Twitter's new design called Heron. IIRC, Heron
>>> does not have an open source implementation as of yet.
>>>
>>> [1] https://github.com/calrissian/flowmix
>>>
>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Corey,
>>>>
>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>> set of CEP libraries somewhere?
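The `updateStateByKey()` behaviour discussed above, where per-key state can only change once per micro-batch, can be modeled in a few lines of plain Python. This is a sketch of the semantics only, not the Spark API:

```python
# Plain-Python model of Spark Streaming's updateStateByKey(): once per
# micro-batch, the update function sees all new values for a key plus the
# previous state, and returns the new state. State only changes at batch
# boundaries -- the limitation Corey points out versus tuple-at-a-time CEP.
def update_state_by_key(state, batch, update_fn):
    """Apply update_fn(new_values, old_state) for every key seen in the
    batch, given as a list of (key, value) pairs. Returns the new state."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = dict(state)
    for key, values in grouped.items():
        new_state[key] = update_fn(values, state.get(key))
    return new_state

# Example update function: a running count per key.
def running_count(new_values, old_count):
    return (old_count or 0) + len(new_values)
```

A CEP-style rule, by contrast, would fire inside the loop over individual events rather than once per batch, which is the granularity difference at issue.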
>>>> I am keen on getting hold of adaptor libraries for Spark, something
>>>> like below.
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>>>
>>>>> One thing I've noticed about Flink in my following of the project is
>>>>> that it has established, in a few cases, some novel ideas and
>>>>> improvements over Spark. The problem, however, is that both the
>>>>> development team and the community around it are very small, and many
>>>>> of those novel improvements have been rolled directly into Spark in
>>>>> subsequent versions. I was considering changing my architecture over
>>>>> to Flink at one point to get better, more real-time CEP streaming
>>>>> support, but in the end I decided to stick with Spark and just watch
>>>>> Flink continue to pressure it into improvement.
>>>>>
>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> I never found much info showing that Flink was actually designed to
>>>>>> be fault tolerant. If fault tolerance is more of a
>>>>>> bolt-on/add-on/afterthought, that doesn't bode well for large-scale
>>>>>> data processing. Spark was designed with fault tolerance in mind
>>>>>> from the beginning.
>>>>>>
>>>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I read the benchmark published by Yahoo. Obviously they already use
>>>>>>> Storm and are inevitably very familiar with that tool.
>>>>>>> To start with, although these benchmarks were somewhat interesting,
>>>>>>> IMO they lent themselves to an assurance that the tool chosen for
>>>>>>> their platform is still the best choice. So inevitably the
>>>>>>> benchmarks and the tests were done primarily to support their
>>>>>>> approach.
>>>>>>>
>>>>>>> In general, anything which is not done through the TPC Council or a
>>>>>>> similar body is questionable.
>>>>>>> Their argument is that because Spark handles data streaming in
>>>>>>> micro-batches, it inevitably introduces this built-in latency by
>>>>>>> design. In contrast, both Storm and Flink do not (at face value)
>>>>>>> have this issue.
>>>>>>>
>>>>>>> In addition, as we already know, Spark has far more capabilities
>>>>>>> compared to Flink (I know nothing about Storm). So really it boils
>>>>>>> down to the business SLA to choose which tool to deploy for your
>>>>>>> use case. IMO Spark's micro-batching approach is probably OK for
>>>>>>> 99% of use cases. If we had built-in CEP libraries for Spark (I am
>>>>>>> searching for them), I would not bother with Flink.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
>>>>>>> ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>>>
>>>>>>>> You probably read this benchmark at Yahoo; any comments from
>>>>>>>> Spark?
>>>>>>>>
>>>>>>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Just adding one thing to the mix: `that the latency for streaming
>>>>>>>> data is eliminated` is insane :-D
>>>>>>>>
>>>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It seems that Flink argues that the latency for streaming data is
>>>>>>>>> eliminated, whereas with Spark RDDs there is this latency.
>>>>>>>>>
>>>>>>>>> I noticed that Flink does not support an interactive shell much
>>>>>>>>> like the Spark shell, where you can add jars to it to do Kafka
>>>>>>>>> testing. The advice was to add the streaming Kafka jar file to
>>>>>>>>> the CLASSPATH, but that does not work.
>>>>>>>>>
>>>>>>>>> Most Flink documentation is also rather sparse, with the usual
>>>>>>>>> word-count example, which is not exactly what you want.
>>>>>>>>>
>>>>>>>>> Anyway, I will have a look at it further. I have a Spark Scala
>>>>>>>>> streaming Kafka program that works fine in Spark, and I want to
>>>>>>>>> recode it in Scala for Flink with Kafka, but I am having
>>>>>>>>> difficulty importing and testing the libraries.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I compared both last month; it seems to me that Flink's ML
>>>>>>>>>> library (its counterpart to Spark's MLlib) is not yet ready.
>>>>>>>>>>
>>>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Ted. I was wondering if someone is using both :)
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Looks like this question is more relevant on the Flink mailing
>>>>>>>>>>>> list :-)
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <
>>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Has anyone used Apache Flink instead of Spark by any chance?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>>>>>>> Processing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Frankly, I don't know if it offers far more than Spark
>>>>>>>>>>>>> offers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> --
>>>>>>>> andy