Re: Apache Flink

Corey Nolet Sun, 17 Apr 2016 08:08:14 -0700

One thing I've noticed about Flink in my following of the project has been
that it has established, in a few cases, some novel ideas and improvements
over Spark. The problem with it, however, is that both the development team
and the community around it are very small and many of those novel
improvements have been rolled directly into Spark in subsequent versions. I
was considering changing over my architecture to Flink at one point to get
better, more real-time CEP streaming support, but in the end I decided to
stick with Spark and just watch Flink continue to pressure it into
improvement.


On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:

> i never found much info that flink was actually designed to be fault
> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
> doesn't bode well for large scale data processing. spark was designed with
> fault tolerance in mind from the beginning.
>
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I read the benchmark published by Yahoo. Obviously they already use Storm
>> and are inevitably very familiar with that tool. To start with, although
>> these benchmarks were somewhat interesting IMO, they lend themselves to an
>> assurance that the tool chosen for their platform is still the best choice.
>> So inevitably the benchmarks and the tests were done primarily to support
>> their approach.
>>
>> In general, any benchmark not run through the TPC (Transaction Processing
>> Performance Council) or a similar body is questionable.
>> Their argument is that because Spark handles streaming data in micro
>> batches, it inevitably introduces built-in latency by design. In
>> contrast, neither Storm nor Flink has this issue (at face value).
>>
>> In addition, as we already know, Spark has far more capabilities than
>> Flink (I know nothing about Storm). So it really boils down to the
>> business SLA when choosing which tool to deploy for your use case.
>> IMO Spark's micro-batching approach is probably OK for 99% of use cases. If
>> we had built-in libraries for CEP in Spark (I am searching for them), I
>> would not bother with Flink.
>>
>> HTH
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
>> ovidiu-cristian.ma...@inria.fr> wrote:
>>
>>> You probably read this benchmark at Yahoo, any comments from Spark?
>>>
>>> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>>>
>>>
>>> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>>>
>>> Just adding one thing to the mix: `that the latency for streaming data
>>> is eliminated` is insane :-D
>>>
>>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> It seems that Flink argues that the latency for streaming data is
>>>> eliminated, whereas with Spark RDDs this latency remains.
>>>>
>>>> I noticed that Flink does not support an interactive shell much like the
>>>> Spark shell, where you can add jars for Kafka testing. The advice was to
>>>> add the streaming Kafka jar file to CLASSPATH, but that does not work.
>>>>
>>>> Most Flink documentation is also rather sparse, with the usual word count
>>>> example, which is not exactly what you want.
>>>>
>>>> Anyway, I will look into it further. I have a Scala streaming Kafka
>>>> program that works fine in Spark, and I want to recode it in Scala
>>>> for Flink with Kafka, but I am having difficulty importing and testing
>>>> the libraries.
>>>>
>>>> Cheers
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>>>>
>>>>> I compared both last month; it seems to me that Flink's ML library
>>>>> (its counterpart to MLlib) is not yet ready.
>>>>>
>>>>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Ted. I was wondering if someone is using both :)
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> Looks like this question is more relevant on flink mailing list :-)
>>>>>>>
>>>>>>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Has anyone used Apache Flink instead of Spark, by any chance?
>>>>>>>>
>>>>>>>> I am interested in its set of libraries for Complex Event
>>>>>>>> Processing.
>>>>>>>>
>>>>>>>> Frankly I don't know if it offers far more than Spark offers.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> --
>>> andy
>>>
>>>
>>>
>>
>
