Thanks Todd, I will have a look.

Regards,
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 18 April 2016 at 01:58, Todd Nist <tsind...@gmail.com> wrote:

> So there is an offering from Stratio, https://github.com/Stratio/Decision
>
>> Decision CEP engine is a Complex Event Processing platform built on
>> Spark Streaming.
>>
>> It is the result of combining the power of Spark Streaming as a
>> continuous computing framework and the Siddhi CEP engine as a complex
>> event processing engine.
>
> https://stratio.atlassian.net/wiki/display/DECISION0x9/Home
>
> I have not used it, only read about it, but it may be of some interest
> to you.
>
> -Todd
>
> On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian <mohaj...@gmail.com>
> wrote:
>
>> Microbatching is certainly not a waste of time; you are making far too
>> strong a statement. In fact, in certain cases one-tuple-at-a-time
>> processing makes no sense; it all depends on the use case. If you look
>> at the history of the Storm project, you will know that microbatching
>> was added to Storm later, in the form of Trident, specifically for
>> microbatching/windowing.
>> In certain cases you are doing aggregation/windowing, throughput is the
>> dominant design consideration, and you don't care what each individual
>> event/tuple does: e.g. if you push different event types to separate
>> Kafka topics and all you care about is a count, what is the need for
>> single-event processing?
>>
>> On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet <cjno...@gmail.com> wrote:
>>
>>> I have not been intrigued at all by the microbatching concept in
>>> Spark.
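The per-topic windowed count described above needs no per-event logic; a minimal plain-Python sketch of the idea, simulating micro-batches with ordinary lists (these helper names are illustrative, not part of any Spark or Kafka API):

```python
# Model of micro-batched counting: each "micro-batch" is just a list of
# (topic, payload) pairs. The job only needs per-topic counts, so no
# per-event handling is required -- the case described above.
from collections import Counter

def count_by_topic(batch):
    """Count events per topic within one micro-batch."""
    return Counter(topic for topic, _payload in batch)

def merge_counts(total, batch_counts):
    """Fold one micro-batch's counts into the running totals."""
    merged = Counter(total)
    merged.update(batch_counts)
    return dict(merged)
```

Throughput-oriented jobs like this only ever look at aggregates per batch, which is exactly where micro-batching is a good fit.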
>>> I am used to CEP in real streams processing environments like
>>> InfoSphere Streams & Storm, where the granularity of processing is at
>>> the level of each individual tuple, and processing units (workers) can
>>> react immediately to events being received and processed. The closest
>>> Spark Streaming comes to this concept is the notion of "state" that can
>>> be updated via the "updateStateByKey()" functions, which can only be
>>> run in a microbatch. Looking at the expected design changes to Spark
>>> Streaming in Spark 2.0.0, it also does not look like tuple-at-a-time
>>> processing is on the radar for Spark, though I have seen articles
>>> stating that more effort is going to go into the Spark SQL layer in
>>> Spark Streaming, which may make it more reminiscent of Esper.
>>>
>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>> Without this, they avoid the whole problem of "back pressure" (though
>>> keep in mind, it is still very possible to overload the Spark Streaming
>>> layer with stages that will continue to pile up and never get worked
>>> off), but they lose the granular control that you get in CEP
>>> environments by allowing the rules & processors to react to the receipt
>>> of each tuple, right away.
>>>
>>> A while back, I did attempt to implement an InfoSphere Streams-like API
>>> [1] on top of Apache Storm as an example of what such a design might
>>> look like. It looks like Storm is going to be replaced in the
>>> not-so-distant future by Twitter's new design called Heron. IIRC, Heron
>>> does not have an open source implementation as of yet.
>>>
>>> [1] https://github.com/calrissian/flowmix
>>>
>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Corey,
>>>>
>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>> set of CEP libraries somewhere?
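The `updateStateByKey()` behaviour discussed above, where per-key state can only change once per micro-batch, can be modeled in a few lines of plain Python. This is a sketch of the semantics only, not the Spark API:

```python
# Plain-Python model of Spark Streaming's updateStateByKey(): once per
# micro-batch, the update function sees all new values for a key plus the
# previous state, and returns the new state. State only changes at batch
# boundaries -- the limitation Corey points out versus tuple-at-a-time CEP.
def update_state_by_key(state, batch, update_fn):
    """Apply update_fn(new_values, old_state) for every key seen in the
    batch, given as a list of (key, value) pairs. Returns the new state."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = dict(state)
    for key, values in grouped.items():
        new_state[key] = update_fn(values, state.get(key))
    return new_state

# Example update function: a running count per key.
def running_count(new_values, old_count):
    return (old_count or 0) + len(new_values)
```

A CEP-style rule, by contrast, would fire inside the loop over individual events rather than once per batch, which is the granularity difference at issue.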
>>>> I am keen on getting hold of adaptor libraries for Spark, something
>>>> like below.
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>>>>
>>>>> One thing I've noticed about Flink in my following of the project is
>>>>> that it has established, in a few cases, some novel ideas and
>>>>> improvements over Spark. The problem, however, is that both the
>>>>> development team and the community around it are very small, and many
>>>>> of those novel improvements have been rolled directly into Spark in
>>>>> subsequent versions. I was considering changing my architecture over
>>>>> to Flink at one point to get better, more real-time CEP streaming
>>>>> support, but in the end I decided to stick with Spark and just watch
>>>>> Flink continue to pressure it into improvement.
>>>>>
>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> I never found much info showing that Flink was actually designed to
>>>>>> be fault tolerant. If fault tolerance is more of a
>>>>>> bolt-on/add-on/afterthought, that doesn't bode well for large-scale
>>>>>> data processing. Spark was designed with fault tolerance in mind
>>>>>> from the beginning.
>>>>>>
>>>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I read the benchmark published by Yahoo. Obviously they already use
>>>>>>> Storm and are inevitably very familiar with that tool.
>>>>>>> To start with, although these benchmarks were somewhat interesting,
>>>>>>> IMO they lent themselves to an assurance that the tool chosen for
>>>>>>> their platform is still the best choice. So inevitably the
>>>>>>> benchmarks and the tests were done primarily to support their
>>>>>>> approach.
>>>>>>>
>>>>>>> In general, anything which is not done through the TPC Council or a
>>>>>>> similar body is questionable.
>>>>>>> Their argument is that because Spark handles data streaming in
>>>>>>> micro-batches, it inevitably introduces this built-in latency by
>>>>>>> design. In contrast, both Storm and Flink do not (at face value)
>>>>>>> have this issue.
>>>>>>>
>>>>>>> In addition, as we already know, Spark has far more capabilities
>>>>>>> compared to Flink (I know nothing about Storm). So really it boils
>>>>>>> down to the business SLA to choose which tool to deploy for your
>>>>>>> use case. IMO Spark's micro-batching approach is probably OK for
>>>>>>> 99% of use cases. If we had built-in CEP libraries for Spark (I am
>>>>>>> searching for them), I would not bother with Flink.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
>>>>>>> ovidiu-cristian.ma...@inria.fr> wrote:
>>>>>>>
>>>>>>>> You probably read this benchmark at Yahoo; any comments from
>>>>>>>> Spark?
>>>>>>>>
>>>>>>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Just adding one thing to the mix: `that the latency for streaming
>>>>>>>> data is eliminated` is insane :-D
>>>>>>>>
>>>>>>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It seems that Flink argues that the latency for streaming data is
>>>>>>>>> eliminated, whereas with Spark RDDs there is this latency.
>>>>>>>>>
>>>>>>>>> I noticed that Flink does not support an interactive shell much
>>>>>>>>> like the Spark shell, where you can add jars to it to do Kafka
>>>>>>>>> testing. The advice was to add the streaming Kafka jar file to
>>>>>>>>> the CLASSPATH, but that does not work.
>>>>>>>>>
>>>>>>>>> Most Flink documentation is also rather sparse, with the usual
>>>>>>>>> word-count example, which is not exactly what you want.
>>>>>>>>>
>>>>>>>>> Anyway, I will have a look at it further. I have a Spark Scala
>>>>>>>>> streaming Kafka program that works fine in Spark, and I want to
>>>>>>>>> recode it in Scala for Flink with Kafka, but I am having
>>>>>>>>> difficulty importing and testing the libraries.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I compared both last month; it seems to me that Flink's ML
>>>>>>>>>> library (its counterpart to Spark's MLlib) is not yet ready.
>>>>>>>>>>
>>>>>>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Ted. I was wondering if someone is using both :)
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Looks like this question is more relevant on the Flink mailing
>>>>>>>>>>>> list :-)
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <
>>>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Has anyone used Apache Flink instead of Spark by any chance?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>>>>>>> Processing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Frankly, I don't know if it offers far more than Spark
>>>>>>>>>>>>> offers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> --
>>>>>>>> andy