Re: Apache Flink

2016-04-18 Thread Jörn Franke

What is your exact set of requirements for algo trading? Is it reacting in
real time, or analysis over a longer period? In the first case, I do not think a
framework such as Spark or Flink makes sense. They are generic, but in order to
compete with other, usually custom-developed, highly specialized engines written
in a low-level language, you need something else.

> On 18 Apr 2016, at 09:19, Mich Talebzadeh  wrote:
> 
> Please forgive me for going in tangent
> 
> Well there may be many SQL engines but only 5-6 are in the league. Oracle has 
> Oracle and MySQL plus TimesTen from various acquisitions. SAP has Hana, SAP 
> ASE, SAP IQ and few others again from acquiring Sybase . So very few big 
> players.
> 
> Cars, Fiat owns many groups including Ferrari and Maserati. The beloved 
> Jaguar belongs to Tata Motors and Rolls Royce belongs to BMW and actually 
> uses BMW engines. VW has many companies from Seat to Skoda etc.
> 
> However, that is the results of Markets getting too fragmented when 
> consolidation happens. Big data world is quite young but I gather it will go 
> the same way as most go, consolidation.
> 
> Anyway my original point was finding a tool that allows me to do CEP on Algo 
> trading using Kafka + another. As of now there is really none. I am still 
> exploring if Flink can do the job.
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 18 April 2016 at 07:46, Sean Owen  wrote:
>> Interesting tangent. I think there will never be a time when an
>> interesting area is covered only by one project, or product. Why are
>> there 30 SQL engines? or 50 car companies? it's a feature not a bug.
>> To the extent they provide different tradeoffs or functionality,
>> they're not entirely duplicative; to the extent they compete directly,
>> it's a win for the user.
>> 
>> As others have said, Flink (née Stratosphere) started quite a while
>> ago. But you can draw lines of influence back earlier than Spark. I
>> presume MS Dryad is the forerunner of all these.
>> 
>> And in case you wanted a third option, Google's DataFlow (now Apache
>> Beam) is really a reinvention of FlumeJava (nothing to do with Apache
>> Flume) from Google, in a way that Crunch was a port and minor update
>> of FlumeJava earlier. And it claims to run on Flink/Spark if you want.
>> 
>> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>> 
>> 
>> On Sun, Apr 17, 2016 at 10:25 PM, Mich Talebzadeh
>>  wrote:
>> > Also it always amazes me why they are so many tangential projects in Big
>> > Data space? Would not it be easier if efforts were spent on adding to Spark
>> > functionality rather than creating a new product like Flink?
> 


Re: Apache Flink

2016-04-18 Thread Mich Talebzadeh
Thanks Todd, I will have a look.

Regards

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 18 April 2016 at 01:58, Todd Nist  wrote:

> So there is an offering from Stratio, https://github.com/Stratio/Decision
>
>> Decision CEP engine is a Complex Event Processing platform built on Spark
>> Streaming.
>>
>> It is the result of combining the power of Spark Streaming as a
>> continuous computing framework and Siddhi CEP engine as complex event
>> processing engine.
>
> https://stratio.atlassian.net/wiki/display/DECISION0x9/Home
>
> I have not used it, only read about it but it may be of some interest to
> you.
>
> -Todd
>
> On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian 
> wrote:
>
>> Microbatching is certainly not a waste of time, you are making way too
>> strong of an statement. In fact in certain cases one tuple at the time
>> makes no sense, it all depends on the use cases. In fact if you understand
>> the history of the project Storm you would know that microbatching was
>> added later in Storm, Trident, and it is specifically for
>> microbatching/windowing.
>> In certain cases you are doing aggregation/windowing and throughput is
>> the dominant design consideration and you don't care what each individual
>> event/tuple does, e.g. of you push different event types to separate kafka
>> topics and all you care is to do a count, what is the need for single event
>> processing.
>>
>> On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet  wrote:
>>
>>> i have not been intrigued at all by the microbatching concept in Spark.
>>> I am used to CEP in real streams processing environments like Infosphere
>>> Streams & Storm where the granularity of processing is at the level of each
>>> individual tuple and processing units (workers) can react immediately to
>>> events being received and processed. The closest Spark streaming comes to
>>> this concept is the notion of "state" that can be updated via the
>>> "updateStateByKey()" functions which are only able to be run in a
>>> microbatch. Looking at the expected design changes to Spark Streaming in
>>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>>> the radar for Spark, though I have seen articles stating that more effort
>>> is going to go into the Spark SQL layer in Spark streaming which may make
>>> it more reminiscent of Esper.
>>>
>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>> Without this, they avoid the whole problem of "back pressure" (though keep
>>> in mind, it is still very possible to overload the Spark streaming layer
>>> with stages that will continue to pile up and never get worked off) but
>>> they lose the granular control that you get in CEP environments by allowing
>>> the rules & processors to react with the receipt of each tuple, right away.
>>>
>>> Awhile back, I did attempt to implement an InfoSphere Streams-like API
>>> [1] on top of Apache Storm as an example of what such a design may look
>>> like. It looks like Storm is going to be replaced in the not so distant
>>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>>> open source implementation as of yet.
>>>
>>> [1] https://github.com/calrissian/flowmix
>>>
>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Corey,
>>>>
>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>> set of CEP libraries somewhere. I am keen on getting hold of adaptor
>>>> libraries for Spark something like below
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 17 April 2016 at 16:07, Corey Nolet wrote:
>>>>
>>>>> One thing I've noticed about Flink in my following of the project has
>>>>> been that it has established, in a few cases, some novel ideas and
>>>>> improvements over Spark. The problem with it, however, is that both the
>>>>> development team and the community around it are very small and many of
>>>>> those novel improvements have been rolled directly into Spark in
>>>>> subsequent versions. I was considering changing over my architecture to
>>>>> Flink at one point to get better, more real-time CEP streaming support,
>>>>> but in the end I decided to stick with Spark and just watch Flink
>>>>> continue to pressure it into improvement.
>>>>>
>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers wrote:
>>>>>
>>>>>> i never found much info that flink w

Re: Apache Flink

2016-04-18 Thread Mich Talebzadeh
Please forgive me for going off on a tangent.

Well, there may be many SQL engines, but only 5-6 are in the league. Oracle
has Oracle and MySQL plus TimesTen from various acquisitions. SAP has Hana,
SAP ASE, SAP IQ and a few others, again from acquiring Sybase. So very few
big players.

Cars: Fiat owns many groups including Ferrari and Maserati. The beloved
Jaguar belongs to Tata Motors, and Rolls Royce belongs to BMW and actually
uses BMW engines. VW has many companies, from Seat to Skoda etc.

However, that is the result of markets getting too fragmented, at which
point consolidation happens. The Big Data world is quite young, but I gather
it will go the same way as most: consolidation.

Anyway, my original point was finding a tool that allows me to do CEP on
algo trading using Kafka plus another tool. As of now there is really none.
I am still exploring whether Flink can do the job.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 18 April 2016 at 07:46, Sean Owen  wrote:

> Interesting tangent. I think there will never be a time when an
> interesting area is covered only by one project, or product. Why are
> there 30 SQL engines? or 50 car companies? it's a feature not a bug.
> To the extent they provide different tradeoffs or functionality,
> they're not entirely duplicative; to the extent they compete directly,
> it's a win for the user.
>
> As others have said, Flink (née Stratosphere) started quite a while
> ago. But you can draw lines of influence back earlier than Spark. I
> presume MS Dryad is the forerunner of all these.
>
> And in case you wanted a third option, Google's DataFlow (now Apache
> Beam) is really a reinvention of FlumeJava (nothing to do with Apache
> Flume) from Google, in a way that Crunch was a port and minor update
> of FlumeJava earlier. And it claims to run on Flink/Spark if you want.
>
> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>
>
> On Sun, Apr 17, 2016 at 10:25 PM, Mich Talebzadeh
>  wrote:
> > Also it always amazes me why they are so many tangential projects in Big
> > Data space? Would not it be easier if efforts were spent on adding to
> Spark
> > functionality rather than creating a new product like Flink?
>


Re: Apache Flink

2016-04-17 Thread Sean Owen
Interesting tangent. I think there will never be a time when an
interesting area is covered only by one project, or product. Why are
there 30 SQL engines? Or 50 car companies? It's a feature, not a bug.
To the extent they provide different tradeoffs or functionality,
they're not entirely duplicative; to the extent they compete directly,
it's a win for the user.

As others have said, Flink (née Stratosphere) started quite a while
ago. But you can draw lines of influence back earlier than Spark. I
presume MS Dryad is the forerunner of all these.

And in case you wanted a third option, Google's DataFlow (now Apache
Beam) is really a reinvention of FlumeJava (nothing to do with Apache
Flume) from Google, in a way that Crunch was a port and minor update
of FlumeJava earlier. And it claims to run on Flink/Spark if you want.

https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison


On Sun, Apr 17, 2016 at 10:25 PM, Mich Talebzadeh
 wrote:
> Also it always amazes me why they are so many tangential projects in Big
> Data space? Would not it be easier if efforts were spent on adding to Spark
> functionality rather than creating a new product like Flink?

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Apache Flink

2016-04-17 Thread Todd Nist
So there is an offering from Stratio, https://github.com/Stratio/Decision

> Decision CEP engine is a Complex Event Processing platform built on Spark
> Streaming.
>
> It is the result of combining the power of Spark Streaming as a continuous
> computing framework and Siddhi CEP engine as complex event processing
> engine.


https://stratio.atlassian.net/wiki/display/DECISION0x9/Home

I have not used it, only read about it but it may be of some interest to
you.

-Todd

On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian 
wrote:

> Microbatching is certainly not a waste of time, you are making way too
> strong of an statement. In fact in certain cases one tuple at the time
> makes no sense, it all depends on the use cases. In fact if you understand
> the history of the project Storm you would know that microbatching was
> added later in Storm, Trident, and it is specifically for
> microbatching/windowing.
> In certain cases you are doing aggregation/windowing and throughput is the
> dominant design consideration and you don't care what each individual
> event/tuple does, e.g. of you push different event types to separate kafka
> topics and all you care is to do a count, what is the need for single event
> processing.
>
> On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet  wrote:
>
>> i have not been intrigued at all by the microbatching concept in Spark. I
>> am used to CEP in real streams processing environments like Infosphere
>> Streams & Storm where the granularity of processing is at the level of each
>> individual tuple and processing units (workers) can react immediately to
>> events being received and processed. The closest Spark streaming comes to
>> this concept is the notion of "state" that can be updated via the
>> "updateStateByKey()" functions which are only able to be run in a
>> microbatch. Looking at the expected design changes to Spark Streaming in
>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>> the radar for Spark, though I have seen articles stating that more effort
>> is going to go into the Spark SQL layer in Spark streaming which may make
>> it more reminiscent of Esper.
>>
>> For these reasons, I have not even tried to implement CEP in Spark. I
>> feel it's a waste of time without immediate tuple-at-a-time processing.
>> Without this, they avoid the whole problem of "back pressure" (though keep
>> in mind, it is still very possible to overload the Spark streaming layer
>> with stages that will continue to pile up and never get worked off) but
>> they lose the granular control that you get in CEP environments by allowing
>> the rules & processors to react with the receipt of each tuple, right away.
>>
>> Awhile back, I did attempt to implement an InfoSphere Streams-like API
>> [1] on top of Apache Storm as an example of what such a design may look
>> like. It looks like Storm is going to be replaced in the not so distant
>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>> open source implementation as of yet.
>>
>> [1] https://github.com/calrissian/flowmix
>>
>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Corey,
>>>
>>> Can you please point me to docs on using Spark for CEP? Do we have a set
>>> of CEP libraries somewhere. I am keen on getting hold of adaptor libraries
>>> for Spark something like below
>>>
>>>
>>>
>>> ​
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>>
>>>> One thing I've noticed about Flink in my following of the project has
>>>> been that it has established, in a few cases, some novel ideas and
>>>> improvements over Spark. The problem with it, however, is that both the
>>>> development team and the community around it are very small and many of
>>>> those novel improvements have been rolled directly into Spark in
>>>> subsequent versions. I was considering changing over my architecture to
>>>> Flink at one point to get better, more real-time CEP streaming support,
>>>> but in the end I decided to stick with Spark and just watch Flink
>>>> continue to pressure it into improvement.
>>>>
>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers wrote:
>>>>
>>>>> i never found much info that flink was actually designed to be fault
>>>>> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
>>>>> doesn't bode well for large scale data processing. spark was designed with
>>>>> fault tolerance in mind from the beginning.
>>>>>
>>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I read the benchmark published by Yahoo. Obviously they already use
>>>>>> Storm a

Re: Apache Flink

2016-04-17 Thread Corey Nolet
Peyman,

I'm sorry, I missed the comment that microbatching was a waste of time. Did
someone mention this? I know this thread got pretty long so I may have
missed it somewhere above.

My comment about Spark's microbatching being a downside is strictly in
reference to CEP. Complex CEP flows are reactive, and the batched streaming
technique that Spark's architecture uses does not make it easy to program
real-time reactive designs. The thing is, many good streaming engines start
with just that, the streaming engine. They start at the core with an
architecture that generally promotes tuple-at-a-time processing. Whatever
they build on top of that is strictly there to make other use cases easier
to implement, hence the main difference between Flink and Spark.

Storm, Esper and InfoSphere Streams are three examples of this that come to
mind very quickly. All three are powerful tuple-at-a-time stream processing
engines under the hood, and all three also have abstractions built on top of
that core that make it easier to implement more specific and more
batch-oriented processing paradigms. Flink is similar.

I hope you didn't take my comment as an attack. My point is that Spark's
microbatching does not follow the traditional design at its core that most
well-accepted stream processing frameworks have followed in the past. I am
not implying that microbatching is not useful in some use cases. What I am
implying is that it does not make real-time reactive environments very easy
to implement.
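
To make the difference concrete, here is a minimal sketch in plain Python
(not Spark or Storm code). The event shape, the rule, the function names and
the 1000 ms batch interval are all assumptions made up for illustration, and
processing time is ignored. It only shows when a rule match can first be
acted on: at the event itself with tuple-at-a-time processing, or at the
close of the batch with microbatching.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical event shape: (arrival time in ms, price).
Event = Tuple[int, float]


def tuple_at_a_time(events: Iterable[Event],
                    rule: Callable[[float], bool]) -> List[Tuple[int, int]]:
    """Evaluate the rule on each event as it arrives.

    Returns (event_time, reaction_time) pairs; the reaction happens at the
    event itself because processing time is ignored in this sketch.
    """
    return [(t, t) for t, price in events if rule(price)]


def micro_batch(events: Iterable[Event],
                rule: Callable[[float], bool],
                interval_ms: int) -> List[Tuple[int, int]]:
    """Evaluate the rule only when the batch containing the event closes."""
    reactions = []
    for t, price in events:
        if rule(price):
            # The batch holding time t closes at the next interval boundary.
            batch_close = (t // interval_ms + 1) * interval_ms
            reactions.append((t, batch_close))
    return reactions


# Made-up price events and a made-up rule: react when the price exceeds 100.
events = [(120, 99.5), (480, 101.2), (1510, 100.4), (1990, 98.0)]


def above_100(price: float) -> bool:
    return price > 100.0


streaming = tuple_at_a_time(events, above_100)
batched = micro_batch(events, above_100, interval_ms=1000)

for (t, reacted_at), (_, batch_reacted_at) in zip(streaming, batched):
    print(f"event at {t} ms: tuple-at-a-time reacts at {reacted_at} ms, "
          f"micro-batch reacts at {batch_reacted_at} ms")
```

With these made-up events, the tuple-at-a-time path reacts at 480 ms and
1510 ms, while the micro-batch path cannot react before 1000 ms and 2000 ms.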



On Sun, Apr 17, 2016 at 8:49 PM, Peyman Mohajerian 
wrote:

> Microbatching is certainly not a waste of time, you are making way too
> strong of an statement. In fact in certain cases one tuple at the time
> makes no sense, it all depends on the use cases. In fact if you understand
> the history of the project Storm you would know that microbatching was
> added later in Storm, Trident, and it is specifically for
> microbatching/windowing.
> In certain cases you are doing aggregation/windowing and throughput is the
> dominant design consideration and you don't care what each individual
> event/tuple does, e.g. of you push different event types to separate kafka
> topics and all you care is to do a count, what is the need for single event
> processing.
>
> On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet  wrote:
>
>> i have not been intrigued at all by the microbatching concept in Spark. I
>> am used to CEP in real streams processing environments like Infosphere
>> Streams & Storm where the granularity of processing is at the level of each
>> individual tuple and processing units (workers) can react immediately to
>> events being received and processed. The closest Spark streaming comes to
>> this concept is the notion of "state" that can be updated via the
>> "updateStateByKey()" functions which are only able to be run in a
>> microbatch. Looking at the expected design changes to Spark Streaming in
>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>> the radar for Spark, though I have seen articles stating that more effort
>> is going to go into the Spark SQL layer in Spark streaming which may make
>> it more reminiscent of Esper.
>>
>> For these reasons, I have not even tried to implement CEP in Spark. I
>> feel it's a waste of time without immediate tuple-at-a-time processing.
>> Without this, they avoid the whole problem of "back pressure" (though keep
>> in mind, it is still very possible to overload the Spark streaming layer
>> with stages that will continue to pile up and never get worked off) but
>> they lose the granular control that you get in CEP environments by allowing
>> the rules & processors to react with the receipt of each tuple, right away.
>>
>> Awhile back, I did attempt to implement an InfoSphere Streams-like API
>> [1] on top of Apache Storm as an example of what such a design may look
>> like. It looks like Storm is going to be replaced in the not so distant
>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>> open source implementation as of yet.
>>
>> [1] https://github.com/calrissian/flowmix
>>
>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Corey,
>>>
>>> Can you please point me to docs on using Spark for CEP? Do we have a set
>>> of CEP libraries somewhere. I am keen on getting hold of adaptor libraries
>>> for Spark something like below
>>>
>>>
>>>
>>> ​
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>>
>>>> One thing I've noticed about Flink in my following of the project has
>>>> been that it has established, in a few cases, some novel ideas and
>>>> improvements over Spark. The problem with it, however, is th

Re: Apache Flink

2016-04-17 Thread Peyman Mohajerian
Microbatching is certainly not a waste of time; you are making far too
strong a statement. In certain cases one tuple at a time makes no sense; it
all depends on the use case. If you know the history of the Storm project,
you will know that microbatching was added to Storm later, as Trident,
specifically for microbatching/windowing.
In certain cases you are doing aggregation/windowing, throughput is the
dominant design consideration, and you don't care what each individual
event/tuple does. E.g. if you push different event types to separate Kafka
topics and all you care about is a count, what is the need for single-event
processing?
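
A count per topic per window is the kind of job meant here. A rough sketch
in plain Python follows; the topic names, the event shape and the 1000 ms
window are hypothetical and chosen only for illustration, and this is not
Kafka or Spark API code. Nothing in it needs to look at an event on its own;
only the per-window totals matter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (arrival time in ms, topic name).
TopicEvent = Tuple[int, str]


def count_per_window(events: Iterable[TopicEvent],
                     window_ms: int) -> Dict[int, Counter]:
    """Count events per topic in fixed (tumbling) windows.

    The result maps each window start time to a Counter of topic -> count.
    """
    counts: Dict[int, Counter] = {}
    for t, topic in events:
        window_start = (t // window_ms) * window_ms
        counts.setdefault(window_start, Counter())[topic] += 1
    return counts


# Made-up events spread over two 1000 ms windows.
events = [
    (100, "orders"),
    (250, "quotes"),
    (900, "orders"),
    (1200, "quotes"),
    (1800, "quotes"),
]

result = count_per_window(events, window_ms=1000)

for window_start in sorted(result):
    print(window_start, dict(result[window_start]))
```

Here the whole window can be handled as one batch, so throughput rather than
per-event latency is what matters.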

On Sun, Apr 17, 2016 at 12:43 PM, Corey Nolet  wrote:

> i have not been intrigued at all by the microbatching concept in Spark. I
> am used to CEP in real streams processing environments like Infosphere
> Streams & Storm where the granularity of processing is at the level of each
> individual tuple and processing units (workers) can react immediately to
> events being received and processed. The closest Spark streaming comes to
> this concept is the notion of "state" that can be updated via the
> "updateStateByKey()" functions which are only able to be run in a
> microbatch. Looking at the expected design changes to Spark Streaming in
> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
> the radar for Spark, though I have seen articles stating that more effort
> is going to go into the Spark SQL layer in Spark streaming which may make
> it more reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> this, they avoid the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark streaming layer with stages
> that will continue to pile up and never get worked off) but they lose the
> granular control that you get in CEP environments by allowing the rules &
> processors to react with the receipt of each tuple, right away.
>
> Awhile back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not so distant future by
> Twitter's new design called Heron. IIRC, Heron does not have an open source
> implementation as of yet.
>
> [1] https://github.com/calrissian/flowmix
>
> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi Corey,
>>
>> Can you please point me to docs on using Spark for CEP? Do we have a set
>> of CEP libraries somewhere. I am keen on getting hold of adaptor libraries
>> for Spark something like below
>>
>>
>>
>> ​
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>
>>> One thing I've noticed about Flink in my following of the project has
>>> been that it has established, in a few cases, some novel ideas and
>>> improvements over Spark. The problem with it, however, is that both the
>>> development team and the community around it are very small and many of
>>> those novel improvements have been rolled directly into Spark in subsequent
>>> versions. I was considering changing over my architecture to Flink at one
>>> point to get better, more real-time CEP streaming support, but in the end I
>>> decided to stick with Spark and just watch Flink continue to pressure it
>>> into improvement.
>>>
>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers 
>>> wrote:
>>>
>>>> i never found much info that flink was actually designed to be fault
>>>> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
>>>> doesn't bode well for large scale data processing. spark was designed with
>>>> fault tolerance in mind from the beginning.
>>>>
>>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I read the benchmark published by Yahoo. Obviously they already use
>>>>> Storm and are inevitably very familiar with that tool. To start with,
>>>>> although these benchmarks were somewhat interesting IMO, they lend
>>>>> themselves to an assurance that the tool chosen for their platform is
>>>>> still the best choice. So inevitably the benchmarks and the tests were
>>>>> done primarily to support their approach.
>>>>>
>>>>> In general anything which is not done through the TPC Council or a
>>>>> similar body is questionable.
>>>>> Their argument is that because Spark handles data streaming in micro
>>>>> batches, it inevitably introduces this in-built latency by design.
>>>>> In contrast, both Storm and Flink do not (a

Re: Apache Flink

2016-04-17 Thread Otis Gospodnetić
While Flink may not be younger than Spark, Spark came to Apache first,
which always helps.  Plus, there was already a lot of buzz around Spark
before it came to Apache.  Coming from Berkeley also helps.

That said, Flink seems decently healthy to me:
- http://search-hadoop.com/?fc_project=Flink&fc_type=mail+_hash_+user&q=
- http://search-hadoop.com/?fc_project=Flink&fc_type=mail+_hash_+dev&q=
- http://search-hadoop.com/?fc_project=Flink&fc_type=issue&q=&startDate=144547200&endDate=146102400

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Sun, Apr 17, 2016 at 5:55 PM, Mich Talebzadeh 
wrote:

> Assuming that both Spark and Flink are contemporaries what are the reasons
> that Flink has not been adopted widely? (this may sound obvious and or
> prejudged). I mean Spark has surged in popularity in the past year if I am
> correct
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 April 2016 at 22:49, Michael Malak  wrote:
>
>> In terms of publication date, a paper on Nephele was published in 2009,
>> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
>> Stratosphere, which became Flink.
>>
>>
>> --
>> *From:* Mark Hamstra 
>> *To:* Mich Talebzadeh 
>> *Cc:* Corey Nolet ; "user @spark" <
>> user@spark.apache.org>
>> *Sent:* Sunday, April 17, 2016 3:30 PM
>> *Subject:* Re: Apache Flink
>>
>> To be fair, the Stratosphere project from which Flink springs was started
>> as a collaborative university research project in Germany about the same
>> time that Spark was first released as Open Source, so they are near
>> contemporaries rather than Flink having been started only well after Spark
>> was an established and widely-used Apache project.
>>
>> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>> Also it always amazes me why they are so many tangential projects in Big
>> Data space? Would not it be easier if efforts were spent on adding to Spark
>> functionality rather than creating a new product like Flink?
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 17 April 2016 at 21:08, Mich Talebzadeh 
>> wrote:
>>
>> Thanks Corey for the useful info.
>>
>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>> However, there does not seem to be anything close to these products in
>> Hadoop Ecosystem. So I guess there is nothing there?
>>
>> Regards.
>>
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>>
>> i have not been intrigued at all by the microbatching concept in Spark. I
>> am used to CEP in real streams processing environments like Infosphere
>> Streams & Storm where the granularity of processing is at the level of each
>> individual tuple and processing units (workers) can react immediately to
>> events being received and processed. The closest Spark streaming comes to
>> this concept is the notion of "state" that can be updated via the
>> "updateStateByKey()" functions which are only able to be run in a
>> microbatch. Looking at the expected design changes to Spark Streaming in
>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>> the radar for Spark, though I have seen articles stating that more effort
>> is going to go into the Spark SQL layer in Spark streaming which may make
>> it more reminiscent of Esper.
>>
>> For these reasons, I have not even tried to implement CEP in Spark. I
>> feel it's a waste of time without immediate tuple-at-a-time processing.
>> Without this, they avoid the whole problem of "back pressure" (though keep
>> in mind, it is still 

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
I have to agree with you, Michael, on the resources and availability in the
US versus everywhere else. It is a fact.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 17 April 2016 at 22:59, Michael Malak  wrote:

> As with all history, "what if"s are not scientifically testable
> hypotheses, but my speculation is the energy (VCs, startups, big Internet
> companies, universities) within Silicon Valley contrasted to Germany.
>
>
> --
> *From:* Mich Talebzadeh 
> *To:* Michael Malak ; "user @spark" <
> user@spark.apache.org>
> *Sent:* Sunday, April 17, 2016 3:55 PM
> *Subject:* Re: Apache Flink
>
> Assuming that both Spark and Flink are contemporaries what are the reasons
> that Flink has not been adopted widely? (this may sound obvious and or
> prejudged). I mean Spark has surged in popularity in the past year if I am
> correct
>
> Dr Mich Talebzadeh
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
>
> On 17 April 2016 at 22:49, Michael Malak  wrote:
>
> In terms of publication date, a paper on Nephele was published in 2009,
> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
> Stratosphere, which became Flink.
>
>
> --
> *From:* Mark Hamstra 
> *To:* Mich Talebzadeh 
> *Cc:* Corey Nolet ; "user @spark" <
> user@spark.apache.org>
> *Sent:* Sunday, April 17, 2016 3:30 PM
> *Subject:* Re: Apache Flink
>
> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany about the same
> time that Spark was first released as Open Source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely-used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
> Also, it always amazes me why there are so many tangential projects in the Big
> Data space. Would it not be easier if efforts were spent on adding to Spark
> functionality rather than creating a new product like Flink?
>
> Dr Mich Talebzadeh
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
>
> On 17 April 2016 at 21:08, Mich Talebzadeh 
> wrote:
>
> Thanks Corey for the useful info.
>
> I have used Sybase Aleri and StreamBase as commercial CEP engines.
> However, there does not seem to be anything close to these products in
> the Hadoop ecosystem. So I guess there is nothing there?
>
> Regards.
>
>
> Dr Mich Talebzadeh
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
>
> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>
> i have not been intrigued at all by the microbatching concept in Spark. I
> am used to CEP in real streams processing environments like Infosphere
> Streams & Storm where the granularity of processing is at the level of each
> individual tuple and processing units (workers) can react immediately to
> events being received and processed. The closest Spark streaming comes to
> this concept is the notion of "state" that can be updated via the
> "updateStateByKey()" functions which are only able to be run in a
> microbatch. Looking at the expected design changes to Spark Streaming in
> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
> the radar for Spark, though I have seen articles stating that more effort
> is going to go into the Spark SQL layer in Spark streaming which may make
> it more reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> this, they avoid the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark streaming layer with stages
> that will continue to pile up and never get worked off) but they lose the
> granular control that yo

Re: Apache Flink

2016-04-17 Thread Michael Malak
As with all history, "what if"s are not scientifically testable hypotheses, but 
my speculation is that it comes down to the energy (VCs, startups, big Internet 
companies, universities) within Silicon Valley compared with Germany.

  From: Mich Talebzadeh 
 To: Michael Malak ; "user @spark" 
 
 Sent: Sunday, April 17, 2016 3:55 PM
 Subject: Re: Apache Flink
   
Assuming that both Spark and Flink are contemporaries, what are the reasons that 
Flink has not been adopted widely? (This may sound obvious and/or prejudged.) I 
mean, Spark has surged in popularity in the past year, if I am correct.
Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 22:49, Michael Malak  wrote:

In terms of publication date, a paper on Nephele was published in 2009, prior 
to the 2010 USENIX paper on Spark. Nephele is the execution engine of 
Stratosphere, which became Flink.

  From: Mark Hamstra 
 To: Mich Talebzadeh  
Cc: Corey Nolet ; "user @spark" 
 Sent: Sunday, April 17, 2016 3:30 PM
 Subject: Re: Apache Flink
  
To be fair, the Stratosphere project from which Flink springs was started as a 
collaborative university research project in Germany about the same time that 
Spark was first released as Open Source, so they are near contemporaries rather 
than Flink having been started only well after Spark was an established and 
widely-used Apache project.
On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh  
wrote:

Also, it always amazes me why there are so many tangential projects in the Big 
Data space. Would it not be easier if efforts were spent on adding to Spark 
functionality rather than creating a new product like Flink?
Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 21:08, Mich Talebzadeh  wrote:

Thanks Corey for the useful info.
I have used Sybase Aleri and StreamBase as commercial CEP engines. However, 
there does not seem to be anything close to these products in the Hadoop 
ecosystem. So I guess there is nothing there?
Regards.

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 20:43, Corey Nolet  wrote:

i have not been intrigued at all by the microbatching concept in Spark. I am 
used to CEP in real streams processing environments like Infosphere Streams & 
Storm where the granularity of processing is at the level of each individual 
tuple and processing units (workers) can react immediately to events being 
received and processed. The closest Spark streaming comes to this concept is 
the notion of "state" that can be updated via the "updateStateByKey()" 
functions which are only able to be run in a microbatch. Looking at the 
expected design changes to Spark Streaming in Spark 2.0.0, it also does not 
look like tuple-at-a-time processing is on the radar for Spark, though I have 
seen articles stating that more effort is going to go into the Spark SQL layer 
in Spark streaming which may make it more reminiscent of Esper.
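The contrast drawn above between tuple-at-a-time and micro-batch processing can be sketched in plain Python. This is only a toy model, not Spark or Storm code: the event list, the 1-second batch interval, and the function names `tuple_at_a_time` and `micro_batch` are all assumptions made for illustration. It shows that per-key state updated once per batch (in the spirit of updateStateByKey()) ends with the same totals, but can only react at the end of each batch interval.

```python
from collections import defaultdict

# Hypothetical events: (arrival_time_seconds, key, value)
EVENTS = [(0.1, "AAPL", 5), (0.4, "IBM", 2), (1.2, "AAPL", 3), (2.7, "IBM", 4)]


def tuple_at_a_time(events):
    """Update per-key state and react as each event arrives."""
    state = defaultdict(int)
    reactions = []  # (reaction_time, key, running_total)
    for ts, key, value in events:
        state[key] += value
        reactions.append((ts, key, state[key]))
    return dict(state), reactions


def micro_batch(events, interval=1.0):
    """Group events into fixed intervals and update state once per batch."""
    batches = defaultdict(list)
    for ts, key, value in events:
        batches[int(ts // interval)].append((key, value))

    state = defaultdict(int)
    reactions = []
    for batch_id in sorted(batches):
        # A batch can only be processed once its interval has closed.
        batch_end = (batch_id + 1) * interval
        per_key = defaultdict(int)
        for key, value in batches[batch_id]:
            per_key[key] += value
        for key, delta in per_key.items():
            state[key] += delta
            reactions.append((batch_end, key, state[key]))
    return dict(state), reactions
```

Calling both functions on `EVENTS` gives identical final state. The difference is in the reaction times: the first reacts at each event's arrival time, while the second reacts only at the batch boundaries, so an event arriving early in an interval waits for the rest of that interval.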
For these reasons, I have not even tried to implement CEP in Spark. I feel it's 
a waste of time without immediate tuple-at-a-time processing. Without this, 
they avoid the whole problem of "back pressure" (though keep in mind, it is 
still very possible to overload the Spark streaming layer with stages that will 
continue to pile up and never get worked off) but they lose the granular 
control that you get in CEP environments by allowing the rules & processors to 
react with the receipt of each tuple, right away. 
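The point about work piling up can be illustrated with a small simulation. This is a toy model under stated assumptions, not a description of Spark's scheduler: the per-tick arrival and capacity numbers, the `buffer_size` parameter, and both function names are hypothetical. It compares a pipeline whose source is never slowed down with one where the source may only emit into free buffer space, which is one simple form of back pressure.

```python
def backlog_without_back_pressure(arrivals_per_tick, capacity_per_tick, ticks):
    """Pending work per tick when the source is never slowed down."""
    pending = 0
    history = []
    for _ in range(ticks):
        pending += arrivals_per_tick
        pending -= min(pending, capacity_per_tick)
        history.append(pending)
    return history


def backlog_with_back_pressure(arrivals_per_tick, capacity_per_tick, ticks,
                               buffer_size):
    """Pending work per tick when the source can only fill free buffer space.

    Returns the backlog history and the number of events the source is
    still holding back at the end of the run.
    """
    pending = 0
    deferred = 0  # events the source had to hold back so far
    history = []
    for _ in range(ticks):
        offered = arrivals_per_tick + deferred
        accepted = min(offered, buffer_size - pending)
        deferred = offered - accepted
        pending += accepted
        pending -= min(pending, capacity_per_tick)
        history.append(pending)
    return history, deferred
```

With 10 arrivals and a capacity of 6 per tick, the first function's backlog grows by 4 every tick and is never worked off. The second keeps the backlog inside the buffer, at the cost of delaying events at the source, so the overload is made visible upstream rather than hidden in a growing queue.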
Awhile back, I did attempt to implement an InfoSphere Streams-like API [1] on 
top of Apache Storm as an example of what such a design may look like. It looks 
like Storm is going to be replaced in the not so distant future by Twitter's 
new design called Heron. IIRC, Heron does not have an open source 
implementation as of yet. 
[1] https://github.com/calrissian/flowmix
On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh  
wrote:

Hi Corey,
Can you please point me to docs on using Spark for CEP? Do we have a set of CEP 
libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, 
something like below.


Thanks

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 16:07, Corey Nolet  wrote:

One thing I've noticed about Flink in my following of the project has been that 
it has established, in a few cases, some novel ideas and improvements over 
Spark. The problem with it, however, is that both the development team and the 
community around it are very small and many of those novel improvements have 
been rolled directly into Spark in subseque

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Assuming that both Spark and Flink are contemporaries, what are the reasons
that Flink has not been adopted widely? (This may sound obvious and/or
prejudged.) I mean, Spark has surged in popularity in the past year, if I am
correct.

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 17 April 2016 at 22:49, Michael Malak  wrote:

> In terms of publication date, a paper on Nephele was published in 2009,
> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
> Stratosphere, which became Flink.
>
>
> --
> *From:* Mark Hamstra 
> *To:* Mich Talebzadeh 
> *Cc:* Corey Nolet ; "user @spark" <
> user@spark.apache.org>
> *Sent:* Sunday, April 17, 2016 3:30 PM
> *Subject:* Re: Apache Flink
>
> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany about the same
> time that Spark was first released as Open Source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely-used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
> Also, it always amazes me why there are so many tangential projects in the Big
> Data space. Would it not be easier if efforts were spent on adding to Spark
> functionality rather than creating a new product like Flink?
>
> Dr Mich Talebzadeh
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
>
> On 17 April 2016 at 21:08, Mich Talebzadeh 
> wrote:
>
> Thanks Corey for the useful info.
>
> I have used Sybase Aleri and StreamBase as commercial CEP engines.
> However, there does not seem to be anything close to these products in
> the Hadoop ecosystem. So I guess there is nothing there?
>
> Regards.
>
>
> Dr Mich Talebzadeh
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> http://talebzadehmich.wordpress.com
>
>
> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>
> i have not been intrigued at all by the microbatching concept in Spark. I
> am used to CEP in real streams processing environments like Infosphere
> Streams & Storm where the granularity of processing is at the level of each
> individual tuple and processing units (workers) can react immediately to
> events being received and processed. The closest Spark streaming comes to
> this concept is the notion of "state" that can be updated via the
> "updateStateByKey()" functions which are only able to be run in a
> microbatch. Looking at the expected design changes to Spark Streaming in
> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
> the radar for Spark, though I have seen articles stating that more effort
> is going to go into the Spark SQL layer in Spark streaming which may make
> it more reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> this, they avoid the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark streaming layer with stages
> that will continue to pile up and never get worked off) but they lose the
> granular control that you get in CEP environments by allowing the rules &
> processors to react with the receipt of each tuple, right away.
>
> Awhile back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not so distant future by
> Twitter's new design called Heron. IIRC, Heron does not have an open source
> implementation as of yet.
>
> [1] https://github.com/calrissian/flowmix
>
> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
> Hi Corey,
>
> Can you please point me to docs on using Spark for CEP? Do we have a set
> of CEP libraries somewhere? I am keen on getting hold of adaptor libraries
> for Spark, something like below.
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
> LinkedIn * 
> https://ww

Re: Apache Flink

2016-04-17 Thread Michael Malak
There have been commercial CEP solutions for decades, including from my 
employer.

  From: Mich Talebzadeh 
 To: Mark Hamstra  
Cc: Corey Nolet ; "user @spark" 
 Sent: Sunday, April 17, 2016 3:48 PM
 Subject: Re: Apache Flink
   
The problem is that the strength and wider acceptance of a typical open source 
project is its sizeable user and development community. When the community is 
small, as with Flink, it is not a viable solution to adopt.
I am rather disappointed that no big data project can be used for Complex Event 
Processing, as it has wide use in algorithmic trading, among others.

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 22:30, Mark Hamstra  wrote:

To be fair, the Stratosphere project from which Flink springs was started as a 
collaborative university research project in Germany about the same time that 
Spark was first released as Open Source, so they are near contemporaries rather 
than Flink having been started only well after Spark was an established and 
widely-used Apache project.
On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh  
wrote:

Also, it always amazes me why there are so many tangential projects in the Big 
Data space. Would it not be easier if efforts were spent on adding to Spark 
functionality rather than creating a new product like Flink?
Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 21:08, Mich Talebzadeh  wrote:

Thanks Corey for the useful info.
I have used Sybase Aleri and StreamBase as commercial CEP engines. However, 
there does not seem to be anything close to these products in the Hadoop 
ecosystem. So I guess there is nothing there?
Regards.

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 20:43, Corey Nolet  wrote:

i have not been intrigued at all by the microbatching concept in Spark. I am 
used to CEP in real streams processing environments like Infosphere Streams & 
Storm where the granularity of processing is at the level of each individual 
tuple and processing units (workers) can react immediately to events being 
received and processed. The closest Spark streaming comes to this concept is 
the notion of "state" that can be updated via the "updateStateByKey()" 
functions which are only able to be run in a microbatch. Looking at the 
expected design changes to Spark Streaming in Spark 2.0.0, it also does not 
look like tuple-at-a-time processing is on the radar for Spark, though I have 
seen articles stating that more effort is going to go into the Spark SQL layer 
in Spark streaming which may make it more reminiscent of Esper.
For these reasons, I have not even tried to implement CEP in Spark. I feel it's 
a waste of time without immediate tuple-at-a-time processing. Without this, 
they avoid the whole problem of "back pressure" (though keep in mind, it is 
still very possible to overload the Spark streaming layer with stages that will 
continue to pile up and never get worked off) but they lose the granular 
control that you get in CEP environments by allowing the rules & processors to 
react with the receipt of each tuple, right away. 
Awhile back, I did attempt to implement an InfoSphere Streams-like API [1] on 
top of Apache Storm as an example of what such a design may look like. It looks 
like Storm is going to be replaced in the not so distant future by Twitter's 
new design called Heron. IIRC, Heron does not have an open source 
implementation as of yet. 
[1] https://github.com/calrissian/flowmix
On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh  
wrote:

Hi Corey,
Can you please point me to docs on using Spark for CEP? Do we have a set of CEP 
libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, 
something like below.


Thanks

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 16:07, Corey Nolet  wrote:

One thing I've noticed about Flink in my following of the project has been that 
it has established, in a few cases, some novel ideas and improvements over 
Spark. The problem with it, however, is that both the development team and the 
community around it are very small and many of those novel improvements have 
been rolled directly into Spark in subsequent versions. I was considering 
changing over my architecture to Flink at one point to get better, more 
real-time CEP streaming support, but in the end I decided to stick with Spark 
and just watch Flink continue to pressure it into improvement.
On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers  wrote:

i never found much info t

Re: Apache Flink

2016-04-17 Thread Michael Malak
In terms of publication date, a paper on Nephele was published in 2009, prior 
to the 2010 USENIX paper on Spark. Nephele is the execution engine of 
Stratosphere, which became Flink.

  From: Mark Hamstra 
 To: Mich Talebzadeh  
Cc: Corey Nolet ; "user @spark" 
 Sent: Sunday, April 17, 2016 3:30 PM
 Subject: Re: Apache Flink
   
To be fair, the Stratosphere project from which Flink springs was started as a 
collaborative university research project in Germany about the same time that 
Spark was first released as Open Source, so they are near contemporaries rather 
than Flink having been started only well after Spark was an established and 
widely-used Apache project.
On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh  
wrote:

Also, it always amazes me why there are so many tangential projects in the Big 
Data space. Would it not be easier if efforts were spent on adding to Spark 
functionality rather than creating a new product like Flink?
Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 21:08, Mich Talebzadeh  wrote:

Thanks Corey for the useful info.
I have used Sybase Aleri and StreamBase as commercial CEP engines. However, 
there does not seem to be anything close to these products in the Hadoop 
ecosystem. So I guess there is nothing there?
Regards.

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 20:43, Corey Nolet  wrote:

i have not been intrigued at all by the microbatching concept in Spark. I am 
used to CEP in real streams processing environments like Infosphere Streams & 
Storm where the granularity of processing is at the level of each individual 
tuple and processing units (workers) can react immediately to events being 
received and processed. The closest Spark streaming comes to this concept is 
the notion of "state" that can be updated via the "updateStateByKey()" 
functions which are only able to be run in a microbatch. Looking at the 
expected design changes to Spark Streaming in Spark 2.0.0, it also does not 
look like tuple-at-a-time processing is on the radar for Spark, though I have 
seen articles stating that more effort is going to go into the Spark SQL layer 
in Spark streaming which may make it more reminiscent of Esper.
For these reasons, I have not even tried to implement CEP in Spark. I feel it's 
a waste of time without immediate tuple-at-a-time processing. Without this, 
they avoid the whole problem of "back pressure" (though keep in mind, it is 
still very possible to overload the Spark streaming layer with stages that will 
continue to pile up and never get worked off) but they lose the granular 
control that you get in CEP environments by allowing the rules & processors to 
react with the receipt of each tuple, right away. 
Awhile back, I did attempt to implement an InfoSphere Streams-like API [1] on 
top of Apache Storm as an example of what such a design may look like. It looks 
like Storm is going to be replaced in the not so distant future by Twitter's 
new design called Heron. IIRC, Heron does not have an open source 
implementation as of yet. 
[1] https://github.com/calrissian/flowmix
On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh  
wrote:

Hi Corey,
Can you please point me to docs on using Spark for CEP? Do we have a set of CEP 
libraries somewhere? I am keen on getting hold of adaptor libraries for Spark, 
something like below.


Thanks

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 17 April 2016 at 16:07, Corey Nolet  wrote:

One thing I've noticed about Flink in my following of the project has been that 
it has established, in a few cases, some novel ideas and improvements over 
Spark. The problem with it, however, is that both the development team and the 
community around it are very small and many of those novel improvements have 
been rolled directly into Spark in subsequent versions. I was considering 
changing over my architecture to Flink at one point to get better, more 
real-time CEP streaming support, but in the end I decided to stick with Spark 
and just watch Flink continue to pressure it into improvement.
On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers  wrote:

i never found much info that flink was actually designed to be fault tolerant. 
if fault tolerance is more bolt-on/add-on/afterthought then that doesn't bode 
well for large scale data processing. spark was designed with fault tolerance 
in mind from the beginning.

On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh  
wrote:

Hi,
I read the benchmark published by Yahoo. Obviously they already use Storm and 
are inevitably very familiar with that tool. To start with although these 
benchmar

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
The problem is that the strength and wider acceptance of a typical open
source project is its sizeable user and development community. When the
community is small, as with Flink, it is not a viable solution to adopt.

I am rather disappointed that no big data project can be used for Complex
Event Processing, as it has wide use in algorithmic trading, among others.


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 17 April 2016 at 22:30, Mark Hamstra  wrote:

> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany about the same
> time that Spark was first released as Open Source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely-used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Also, it always amazes me why there are so many tangential projects in the Big
>> Data space. Would it not be easier if efforts were spent on adding to Spark
>> functionality rather than creating a new product like Flink?
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 April 2016 at 21:08, Mich Talebzadeh 
>> wrote:
>>
>>> Thanks Corey for the useful info.
>>>
>>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>>> However, there does not seem to be anything close to these products in
>>> the Hadoop ecosystem. So I guess there is nothing there?
>>>
>>> Regards.
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>>>
>>>> i have not been intrigued at all by the microbatching concept in Spark.
>>>> I am used to CEP in real streams processing environments like Infosphere
>>>> Streams & Storm where the granularity of processing is at the level of each
>>>> individual tuple and processing units (workers) can react immediately to
>>>> events being received and processed. The closest Spark streaming comes to
>>>> this concept is the notion of "state" that can be updated via the
>>>> "updateStateByKey()" functions which are only able to be run in a
>>>> microbatch. Looking at the expected design changes to Spark Streaming in
>>>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>>>> the radar for Spark, though I have seen articles stating that more effort
>>>> is going to go into the Spark SQL layer in Spark streaming which may make
>>>> it more reminiscent of Esper.
>>>>
>>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>>> Without this, they avoid the whole problem of "back pressure" (though keep
>>>> in mind, it is still very possible to overload the Spark streaming layer
>>>> with stages that will continue to pile up and never get worked off) but
>>>> they lose the granular control that you get in CEP environments by allowing
>>>> the rules & processors to react with the receipt of each tuple, right away.
>>>>
>>>> Awhile back, I did attempt to implement an InfoSphere Streams-like API
>>>> [1] on top of Apache Storm as an example of what such a design may look
>>>> like. It looks like Storm is going to be replaced in the not so distant
>>>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>>>> open source implementation as of yet.
>>>>
>>>> [1] https://github.com/calrissian/flowmix
>>>>
>>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Corey,
>>>>>
>>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>>> set of CEP libraries somewhere? I am keen on getting hold of adaptor
>>>>> libraries for Spark, something like below.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn *
>>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> *
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>>>>
>>>>>> One thing I've noticed about Flink in my following of the project has
>>>>>> been that it has established, in a few ca

Re: Apache Flink

2016-04-17 Thread Mark Hamstra
To be fair, the Stratosphere project from which Flink springs was started
as a collaborative university research project in Germany about the same
time that Spark was first released as Open Source, so they are near
contemporaries rather than Flink having been started only well after Spark
was an established and widely-used Apache project.

On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh 
wrote:

> Also, it always amazes me why there are so many tangential projects in the Big
> Data space. Would it not be easier if efforts were spent on adding to Spark
> functionality rather than creating a new product like Flink?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 April 2016 at 21:08, Mich Talebzadeh 
> wrote:
>
>> Thanks Corey for the useful info.
>>
>> I have used Sybase Aleri and StreamBase as commercial CEP engines.
>> However, there does not seem to be anything close to these products in
>> the Hadoop ecosystem. So I guess there is nothing there?
>>
>> Regards.
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>>
>>> i have not been intrigued at all by the microbatching concept in Spark.
>>> I am used to CEP in real streams processing environments like Infosphere
>>> Streams & Storm where the granularity of processing is at the level of each
>>> individual tuple and processing units (workers) can react immediately to
>>> events being received and processed. The closest Spark streaming comes to
>>> this concept is the notion of "state" that can be updated via the
>>> "updateStateByKey()" functions which are only able to be run in a
>>> microbatch. Looking at the expected design changes to Spark Streaming in
>>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>>> the radar for Spark, though I have seen articles stating that more effort
>>> is going to go into the Spark SQL layer in Spark streaming which may make
>>> it more reminiscent of Esper.
>>>
>>> For these reasons, I have not even tried to implement CEP in Spark. I
>>> feel it's a waste of time without immediate tuple-at-a-time processing.
>>> Without this, they avoid the whole problem of "back pressure" (though keep
>>> in mind, it is still very possible to overload the Spark streaming layer
>>> with stages that will continue to pile up and never get worked off) but
>>> they lose the granular control that you get in CEP environments by allowing
>>> the rules & processors to react with the receipt of each tuple, right away.
>>>
>>> Awhile back, I did attempt to implement an InfoSphere Streams-like API
>>> [1] on top of Apache Storm as an example of what such a design may look
>>> like. It looks like Storm is going to be replaced in the not so distant
>>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>>> open source implementation as of yet.
>>>
>>> [1] https://github.com/calrissian/flowmix
>>>
>>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Corey,
>>>>
>>>> Can you please point me to docs on using Spark for CEP? Do we have a
>>>> set of CEP libraries somewhere? I am keen on getting hold of adaptor
>>>> libraries for Spark, something like below.
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn *
>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> *
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>>>
>>>>> One thing I've noticed about Flink in my following of the project has
>>>>> been that it has established, in a few cases, some novel ideas and
>>>>> improvements over Spark. The problem with it, however, is that both the
>>>>> development team and the community around it are very small and many of
>>>>> those novel improvements have been rolled directly into Spark in subsequent
>>>>> versions. I was considering changing over my architecture to Flink at one
>>>>> point to get better, more real-time CEP streaming support, but in the end I
>>>>> decided to stick with Spark and just watch Flink continue to pressure it
>>>>> into improvement.
>>>>>
>>>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers
>>>>> wrote:
>>>>>
>>>>>> i never found much info that flink was actually designed to be fault
>>>>>> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
>>>>>> doesn't bode w

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Also, it always amazes me why there are so many tangential projects in the Big
Data space. Would it not be easier if efforts were spent on adding to Spark
functionality rather than creating a new product like Flink?

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 17 April 2016 at 21:08, Mich Talebzadeh 
wrote:

> Thanks Corey for the useful info.
>
> I have used Sybase Aleri and StreamBase as commercial CEP engines.
> However, there does not seem to be anything close to these products in
> the Hadoop ecosystem. So I guess there is nothing there?
>
> Regards.
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 April 2016 at 20:43, Corey Nolet  wrote:
>
>> i have not been intrigued at all by the microbatching concept in Spark. I
>> am used to CEP in real streams processing environments like Infosphere
>> Streams & Storm where the granularity of processing is at the level of each
>> individual tuple and processing units (workers) can react immediately to
>> events being received and processed. The closest Spark streaming comes to
>> this concept is the notion of "state" that can be updated via the
>> "updateStateByKey()" functions which are only able to be run in a
>> microbatch. Looking at the expected design changes to Spark Streaming in
>> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
>> the radar for Spark, though I have seen articles stating that more effort
>> is going to go into the Spark SQL layer in Spark streaming which may make
>> it more reminiscent of Esper.
>>
>> For these reasons, I have not even tried to implement CEP in Spark. I
>> feel it's a waste of time without immediate tuple-at-a-time processing.
>> Without this, they avoid the whole problem of "back pressure" (though keep
>> in mind, it is still very possible to overload the Spark streaming layer
>> with stages that will continue to pile up and never get worked off) but
>> they lose the granular control that you get in CEP environments by allowing
>> the rules & processors to react with the receipt of each tuple, right away.
>>
>> Awhile back, I did attempt to implement an InfoSphere Streams-like API
>> [1] on top of Apache Storm as an example of what such a design may look
>> like. It looks like Storm is going to be replaced in the not so distant
>> future by Twitter's new design called Heron. IIRC, Heron does not have an
>> open source implementation as of yet.
>>
>> [1] https://github.com/calrissian/flowmix
>>
>> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Corey,
>>>
>>> Can you please point me to docs on using Spark for CEP? Do we have a set
>>> of CEP libraries somewhere. I am keen on getting hold of adaptor libraries
>>> for Spark something like below
>>>
>>>
>>>
>>> ​
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>>
 One thing I've noticed about Flink in my following of the project has
 been that it has established, in a few cases, some novel ideas and
 improvements over Spark. The problem with it, however, is that both the
 development team and the community around it are very small and many of
 those novel improvements have been rolled directly into Spark in subsequent
 versions. I was considering changing over my architecture to Flink at one
 point to get better, more real-time CEP streaming support, but in the end I
 decided to stick with Spark and just watch Flink continue to pressure it
 into improvement.

 On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers 
 wrote:

> i never found much info that flink was actually designed to be fault
> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
> doesn't bode well for large scale data processing. spark was designed with
> fault tolerance in mind from the beginning.
>
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I read the benchmark published by Yahoo. Obviously they already use
>> Storm and inevitably very familiar with that tool. To start with although
>> these benchmarks were somehow interesting IMO, it lend itself to an
>> assurance that the tool chosen for their platform is still the best 
>> ch

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Thanks Corey for the useful info.

I have used Sybase Aleri and StreamBase as commercial CEP engines.
However, there does not seem to be anything close to these products in
the Hadoop ecosystem. So I guess there is nothing there?

Regards.


Dr Mich Talebzadeh

On 17 April 2016 at 20:43, Corey Nolet  wrote:

> i have not been intrigued at all by the microbatching concept in Spark. I
> am used to CEP in real streams processing environments like Infosphere
> Streams & Storm where the granularity of processing is at the level of each
> individual tuple and processing units (workers) can react immediately to
> events being received and processed. The closest Spark streaming comes to
> this concept is the notion of "state" that can be updated via the
> "updateStateByKey()" functions which are only able to be run in a
> microbatch. Looking at the expected design changes to Spark Streaming in
> Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
> the radar for Spark, though I have seen articles stating that more effort
> is going to go into the Spark SQL layer in Spark streaming which may make
> it more reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> this, they avoid the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark streaming layer with stages
> that will continue to pile up and never get worked off) but they lose the
> granular control that you get in CEP environments by allowing the rules &
> processors to react with the receipt of each tuple, right away.
>
> Awhile back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not so distant future by
> Twitter's new design called Heron. IIRC, Heron does not have an open source
> implementation as of yet.
>
> [1] https://github.com/calrissian/flowmix
>
> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi Corey,
>>
>> Can you please point me to docs on using Spark for CEP? Do we have a set
>> of CEP libraries somewhere. I am keen on getting hold of adaptor libraries
>> for Spark something like below
>>
>>
>>
>> ​
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>>
>>> One thing I've noticed about Flink in my following of the project has
>>> been that it has established, in a few cases, some novel ideas and
>>> improvements over Spark. The problem with it, however, is that both the
>>> development team and the community around it are very small and many of
>>> those novel improvements have been rolled directly into Spark in subsequent
>>> versions. I was considering changing over my architecture to Flink at one
>>> point to get better, more real-time CEP streaming support, but in the end I
>>> decided to stick with Spark and just watch Flink continue to pressure it
>>> into improvement.
>>>
>>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers 
>>> wrote:
>>>
 i never found much info that flink was actually designed to be fault
 tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
 doesn't bode well for large scale data processing. spark was designed with
 fault tolerance in mind from the beginning.

 On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> Hi,
>
> I read the benchmark published by Yahoo. Obviously they already use
> Storm and inevitably very familiar with that tool. To start with although
> these benchmarks were somehow interesting IMO, it lend itself to an
> assurance that the tool chosen for their platform is still the best 
> choice.
> So inevitably the benchmarks and the tests were done to support
> primary their approach.
>
> In general, anything that is not done through the TPC council or a similar
> body is questionable.
> Their argument is that because Spark handles data streaming in micro
> batches then inevitably it introduces this in-built latency as per design.
> In contrast, both Storm and Flink do not (at the face value) have this
> issue.
>
> In addition as we already know Spark has far more capabilities
> compared to Flink (know nothing about Storm). So really it boils down to
> the bu

Re: Apache Flink

2016-04-17 Thread Corey Nolet
I have not been intrigued at all by the micro-batching concept in Spark. I
am used to CEP in real stream-processing environments like InfoSphere
Streams & Storm, where the granularity of processing is at the level of each
individual tuple and processing units (workers) can react immediately to
events being received and processed. The closest Spark Streaming comes to
this concept is the notion of "state" that can be updated via the
"updateStateByKey()" functions, which can only be run in a
micro-batch. Looking at the expected design changes to Spark Streaming in
Spark 2.0.0, it also does not look like tuple-at-a-time processing is on
the radar for Spark, though I have seen articles stating that more effort
is going to go into the Spark SQL layer in Spark Streaming, which may make
it more reminiscent of Esper.
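
To make the contrast concrete, here is a minimal sketch in plain Python (no
Spark or Storm involved; the function names, the event tuples and the
1-second batch interval are all assumptions made for illustration). It
compares the time at which keyed state becomes visible when every tuple is
handled on arrival with the time at which it becomes visible when state is
only updated once per batch, in the spirit of updateStateByKey():

```python
from collections import defaultdict

def per_tuple(events):
    """React to each (timestamp, key, value) event as it arrives."""
    state = defaultdict(int)
    reactions = []
    for ts, key, value in events:
        state[key] += value
        # The rule can fire at the event's own arrival time.
        reactions.append((ts, key, state[key]))
    return reactions

def micro_batch(events, batch_interval):
    """Update keyed state once per batch window (hypothetical model)."""
    state = defaultdict(int)
    reactions = []
    batches = defaultdict(list)
    for ts, key, value in events:
        # Integer index of the batch window this event falls into.
        batches[int(ts // batch_interval)].append((key, value))
    for index in sorted(batches):
        # The updated state is visible only when the batch closes.
        fire_time = (index + 1) * batch_interval
        updates = defaultdict(int)
        for key, value in batches[index]:
            updates[key] += value
        for key in sorted(updates):
            state[key] += updates[key]
            reactions.append((fire_time, key, state[key]))
    return reactions

# Three trades on one symbol; timestamps are in seconds.
events = [(0.1, "AAPL", 1), (0.4, "AAPL", 1), (1.2, "AAPL", 1)]
tuple_reactions = per_tuple(events)
batch_reactions = micro_batch(events, batch_interval=1.0)
```

In this toy model the per-tuple path produces three reactions at 0.1, 0.4
and 1.2 seconds, while the batched path produces only two, at 1.0 and 2.0
seconds, and the intermediate count of 1 is never observable.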

For these reasons, I have not even tried to implement CEP in Spark. I feel
it's a waste of time without immediate tuple-at-a-time processing. Without
it, Spark avoids the whole problem of "back pressure" (though keep in mind,
it is still very possible to overload the Spark Streaming layer with stages
that will continue to pile up and never get worked off), but it loses the
granular control that you get in CEP environments, where the rules &
processors can react to the receipt of each tuple right away.
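
The "pile up" case can be illustrated with a little arithmetic. The sketch
below is a simplified model, not Spark's scheduler: it assumes a single
processor that handles batches strictly in order, and the function name and
the numbers are made up for the example. Whenever the processing time per
batch exceeds the batch interval, the scheduling delay grows with every
batch and is never worked off:

```python
def backlog_after(num_batches, batch_interval, processing_time):
    """Return the scheduling delay (seconds) of each batch.

    Assumes one sequential processor; batch i becomes ready when its
    window closes at (i + 1) * batch_interval.
    """
    delays = []
    free_at = 0.0  # time at which the processor finishes its current batch
    for i in range(num_batches):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + processing_time
    return delays

# Processing keeps up with a 1-second interval: no delay accumulates.
stable = backlog_after(5, batch_interval=1.0, processing_time=0.5)
# Processing takes 1.5 seconds per 1-second batch: delay grows by 0.5 each time.
overloaded = backlog_after(5, batch_interval=1.0, processing_time=1.5)
```

With these numbers the stable case stays at zero delay, while the overloaded
case falls a further half second behind on every batch.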

A while back, I did attempt to implement an InfoSphere Streams-like API [1]
on top of Apache Storm as an example of what such a design may look like.
It looks like Storm is going to be replaced in the not-so-distant future by
Twitter's new design called Heron. IIRC, Heron does not have an open source
implementation as of yet.

[1] https://github.com/calrissian/flowmix

On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh 
wrote:

> Hi Corey,
>
> Can you please point me to docs on using Spark for CEP? Do we have a set
> of CEP libraries somewhere. I am keen on getting hold of adaptor libraries
> for Spark something like below
>
>
>
> ​
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 April 2016 at 16:07, Corey Nolet  wrote:
>
>> One thing I've noticed about Flink in my following of the project has
>> been that it has established, in a few cases, some novel ideas and
>> improvements over Spark. The problem with it, however, is that both the
>> development team and the community around it are very small and many of
>> those novel improvements have been rolled directly into Spark in subsequent
>> versions. I was considering changing over my architecture to Flink at one
>> point to get better, more real-time CEP streaming support, but in the end I
>> decided to stick with Spark and just watch Flink continue to pressure it
>> into improvement.
>>
>> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers 
>> wrote:
>>
>>> i never found much info that flink was actually designed to be fault
>>> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
>>> doesn't bode well for large scale data processing. spark was designed with
>>> fault tolerance in mind from the beginning.
>>>
>>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Hi,

 I read the benchmark published by Yahoo. Obviously they already use
 Storm and inevitably very familiar with that tool. To start with although
 these benchmarks were somehow interesting IMO, it lend itself to an
 assurance that the tool chosen for their platform is still the best choice.
 So inevitably the benchmarks and the tests were done to support
 primary their approach.

 In general, anything that is not done through the TPC council or a similar
 body is questionable.
 Their argument is that because Spark handles data streaming in micro
 batches then inevitably it introduces this in-built latency as per design.
 In contrast, both Storm and Flink do not (at the face value) have this
 issue.

 In addition as we already know Spark has far more capabilities compared
 to Flink (know nothing about Storm). So really it boils down to the
 business SLA to choose which tool one wants to deploy for your use case.
 IMO Spark micro batching approach is probably OK for 99% of use cases. If
 we had in built libraries for CEP for Spark (I am searching for it), I
 would not bother with Flink.

 HTH


 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



 http://talebzadehmich.wordpress.com



 On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
 ovidiu-cristian.ma...@inria.fr> wrote:

Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Hi Corey,

Can you please point me to docs on using Spark for CEP? Do we have a set of
CEP libraries somewhere? I am keen on getting hold of adaptor libraries for
Spark, something like below.



​
Thanks


Dr Mich Talebzadeh

On 17 April 2016 at 16:07, Corey Nolet  wrote:

> One thing I've noticed about Flink in my following of the project has been
> that it has established, in a few cases, some novel ideas and improvements
> over Spark. The problem with it, however, is that both the development team
> and the community around it are very small and many of those novel
> improvements have been rolled directly into Spark in subsequent versions. I
> was considering changing over my architecture to Flink at one point to get
> better, more real-time CEP streaming support, but in the end I decided to
> stick with Spark and just watch Flink continue to pressure it into
> improvement.
>
> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers  wrote:
>
>> i never found much info that flink was actually designed to be fault
>> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
>> doesn't bode well for large scale data processing. spark was designed with
>> fault tolerance in mind from the beginning.
>>
>> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I read the benchmark published by Yahoo. Obviously they already use
>>> Storm and inevitably very familiar with that tool. To start with although
>>> these benchmarks were somehow interesting IMO, it lend itself to an
>>> assurance that the tool chosen for their platform is still the best choice.
>>> So inevitably the benchmarks and the tests were done to support
>>> primary their approach.
>>>
>>> In general, anything that is not done through the TPC council or a similar
>>> body is questionable.
>>> Their argument is that because Spark handles data streaming in micro
>>> batches then inevitably it introduces this in-built latency as per design.
>>> In contrast, both Storm and Flink do not (at the face value) have this
>>> issue.
>>>
>>> In addition as we already know Spark has far more capabilities compared
>>> to Flink (know nothing about Storm). So really it boils down to the
>>> business SLA to choose which tool one wants to deploy for your use case.
>>> IMO Spark micro batching approach is probably OK for 99% of use cases. If
>>> we had in built libraries for CEP for Spark (I am searching for it), I
>>> would not bother with Flink.
>>>
>>> HTH
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
>>> ovidiu-cristian.ma...@inria.fr> wrote:
>>>
 You probably read this benchmark at Yahoo, any comments from Spark?

 https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at


 On 17 Apr 2016, at 12:41, andy petrella 
 wrote:

 Just adding one thing to the mix: `that the latency for streaming data
 is eliminated` is insane :-D

 On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

>  It seems that Flink argues that the latency for streaming data is
> eliminated whereas with Spark RDD there is this latency.
>
> I noticed that Flink does not support interactive shell much like
> Spark shell where you can add jars to it to do kafka testing. The advice
> was to add the streaming Kafka jar file to CLASSPATH but that does not 
> work.
>
> Most Flink documentation is also rather sparse, with the usual example of
> word count, which is not exactly what you want.
>
> Anyway I will have a look at it further. I have a Spark Scala
> streaming Kafka program that works fine in Spark and I want to recode it
> using Scala for Flink with Kafka but have difficulty importing and testing
> libraries.
>
> Cheers
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 17 April 2016 at 02:41, Ascot Moss  wrote:
>
>> I compared both last month, seems to me that Flink's MLLib is not yet
>> ready.
>>
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks Ted. I was wondering if someone is using

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
The streaming use case is important IMO, as Spark (like Flink) advocates the
unification of analytics tools: having batch and graph processing, SQL, ML and
streaming all in one.

> On 17 Apr 2016, at 17:07, Corey Nolet  wrote:
> 
> One thing I've noticed about Flink in my following of the project has been 
> that it has established, in a few cases, some novel ideas and improvements 
> over Spark. The problem with it, however, is that both the development team 
> and the community around it are very small and many of those novel 
> improvements have been rolled directly into Spark in subsequent versions. I 
> was considering changing over my architecture to Flink at one point to get 
> better, more real-time CEP streaming support, but in the end I decided to 
> stick with Spark and just watch Flink continue to pressure it into 
> improvement.
> 
> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers wrote:
> i never found much info that flink was actually designed to be fault 
> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that 
> doesn't bode well for large scale data processing. spark was designed with 
> fault tolerance in mind from the beginning.
> 
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh wrote:
> Hi,
> 
> I read the benchmark published by Yahoo. Obviously they already use Storm and 
> inevitably very familiar with that tool. To start with although these 
> benchmarks were somehow interesting IMO, it lend itself to an assurance that 
> the tool chosen for their platform is still the best choice. So inevitably 
> the benchmarks and the tests were done to support primary their approach.
> 
> In general, anything that is not done through the TPC council or a similar
> body is questionable.
> Their argument is that because Spark handles data streaming in micro batches 
> then inevitably it introduces this in-built latency as per design. In 
> contrast, both Storm and Flink do not (at the face value) have this issue.
> 
> In addition as we already know Spark has far more capabilities compared to 
> Flink (know nothing about Storm). So really it boils down to the business SLA 
> to choose which tool one wants to deploy for your use case. IMO Spark micro 
> batching approach is probably OK for 99% of use cases. If we had in built 
> libraries for CEP for Spark (I am searching for it), I would not bother with 
> Flink.
> 
> HTH
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
>  
> 
> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
> You probably read this benchmark at Yahoo, any comments from Spark?
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>  
> 
> 
> 
>> On 17 Apr 2016, at 12:41, andy petrella > > wrote:
>> 
>> Just adding one thing to the mix: `that the latency for streaming data is 
>> eliminated` is insane :-D
>> 
>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh > > wrote:
>>  It seems that Flink argues that the latency for streaming data is 
>> eliminated whereas with Spark RDD there is this latency.
>> 
>> I noticed that Flink does not support interactive shell much like Spark 
>> shell where you can add jars to it to do kafka testing. The advice was to 
>> add the streaming Kafka jar file to CLASSPATH but that does not work.
>> 
>> Most Flink documentation is also rather sparse, with the usual example of
>> word count, which is not exactly what you want.
>> 
>> Anyway I will have a look at it further. I have a Spark Scala streaming 
>> Kafka program that works fine in Spark and I want to recode it using Scala 
>> for Flink with Kafka but have difficulty importing and testing libraries.
>> 
>> Cheers
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 
>>  
>> http://talebzadehmich.wordpress.com 
>>  
>> 
>> On 17 April 2016 at 02:41, Ascot Moss > > wrote:
>> I compared both last month, seems to me that Flink's MLLib is not yet ready.
>> 
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh > > wrote:
>> Thanks Ted. I was wondering if someone is using both :)
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
From my research into the platform, Flink is fault tolerant for the streaming
case (DataStream API), but not yet for the batch case (DataSet API).

> On 17 Apr 2016, at 17:03, Koert Kuipers  wrote:
> 
> i never found much info that flink was actually designed to be fault 
> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that 
> doesn't bode well for large scale data processing. spark was designed with 
> fault tolerance in mind from the beginning.
> 
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh wrote:
> Hi,
> 
> I read the benchmark published by Yahoo. Obviously they already use Storm and 
> inevitably very familiar with that tool. To start with although these 
> benchmarks were somehow interesting IMO, it lend itself to an assurance that 
> the tool chosen for their platform is still the best choice. So inevitably 
> the benchmarks and the tests were done to support primary their approach.
> 
> In general, anything that is not done through the TPC council or a similar
> body is questionable.
> Their argument is that because Spark handles data streaming in micro batches 
> then inevitably it introduces this in-built latency as per design. In 
> contrast, both Storm and Flink do not (at the face value) have this issue.
> 
> In addition as we already know Spark has far more capabilities compared to 
> Flink (know nothing about Storm). So really it boils down to the business SLA 
> to choose which tool one wants to deploy for your use case. IMO Spark micro 
> batching approach is probably OK for 99% of use cases. If we had in built 
> libraries for CEP for Spark (I am searching for it), I would not bother with 
> Flink.
> 
> HTH
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
>  
> 
> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
> You probably read this benchmark at Yahoo, any comments from Spark?
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>  
> 
> 
> 
>> On 17 Apr 2016, at 12:41, andy petrella > > wrote:
>> 
>> Just adding one thing to the mix: `that the latency for streaming data is 
>> eliminated` is insane :-D
>> 
>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh > > wrote:
>>  It seems that Flink argues that the latency for streaming data is 
>> eliminated whereas with Spark RDD there is this latency.
>> 
>> I noticed that Flink does not support interactive shell much like Spark 
>> shell where you can add jars to it to do kafka testing. The advice was to 
>> add the streaming Kafka jar file to CLASSPATH but that does not work.
>> 
>> Most Flink documentation is also rather sparse, with the usual example of
>> word count, which is not exactly what you want.
>> 
>> Anyway I will have a look at it further. I have a Spark Scala streaming 
>> Kafka program that works fine in Spark and I want to recode it using Scala 
>> for Flink with Kafka but have difficulty importing and testing libraries.
>> 
>> Cheers
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 
>>  
>> http://talebzadehmich.wordpress.com 
>>  
>> 
>> On 17 April 2016 at 02:41, Ascot Moss > > wrote:
>> I compared both last month, seems to me that Flink's MLLib is not yet ready.
>> 
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh > > wrote:
>> Thanks Ted. I was wondering if someone is using both :)
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 
>>  
>> http://talebzadehmich.wordpress.com 
>>  
>> 
>> On 16 April 2016 at 17:08, Ted Yu > > wrote:
>> Looks like this question is more relevant on flink mailing list :-)
>> 
>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh > > wrote:
>> Hi,
>> 
>> Has anyone used Apache Flink instead of Spark by any chance
>> 
>> I am interested in its set of libraries for Complex Event Processing.
>> 
>> Frankly I don't know if it offers far more than Spark offers.
>> 
>> Thanks
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> 

Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
Hi Mich,

IMO one will try to see if there is an alternative, a better one at least.
This benchmark could be a good starting point.

Best,
Ovidiu
> On 17 Apr 2016, at 15:52, Mich Talebzadeh  wrote:
> 
> Hi,
> 
> I read the benchmark published by Yahoo. Obviously they already use Storm and 
> inevitably very familiar with that tool. To start with although these 
> benchmarks were somehow interesting IMO, it lend itself to an assurance that 
> the tool chosen for their platform is still the best choice. So inevitably 
> the benchmarks and the tests were done to support primary their approach.
> 
> In general, anything that is not done through the TPC council or a similar
> body is questionable.
> Their argument is that because Spark handles data streaming in micro batches 
> then inevitably it introduces this in-built latency as per design. In 
> contrast, both Storm and Flink do not (at the face value) have this issue.
> 
> In addition as we already know Spark has far more capabilities compared to 
> Flink (know nothing about Storm). So really it boils down to the business SLA 
> to choose which tool one wants to deploy for your use case. IMO Spark micro 
> batching approach is probably OK for 99% of use cases. If we had in built 
> libraries for CEP for Spark (I am searching for it), I would not bother with 
> Flink.
> 
> HTH
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
>  
> 
> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
> You probably read this benchmark at Yahoo, any comments from Spark?
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>  
> 
> 
> 
>> On 17 Apr 2016, at 12:41, andy petrella > > wrote:
>> 
>> Just adding one thing to the mix: `that the latency for streaming data is 
>> eliminated` is insane :-D
>> 
>> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh > > wrote:
>>  It seems that Flink argues that the latency for streaming data is 
>> eliminated whereas with Spark RDD there is this latency.
>> 
>> I noticed that Flink does not support interactive shell much like Spark 
>> shell where you can add jars to it to do kafka testing. The advice was to 
>> add the streaming Kafka jar file to CLASSPATH but that does not work.
>> 
>> Most Flink documentation is also rather sparse, with the usual example of
>> word count, which is not exactly what you want.
>> 
>> Anyway I will have a look at it further. I have a Spark Scala streaming 
>> Kafka program that works fine in Spark and I want to recode it using Scala 
>> for Flink with Kafka but have difficulty importing and testing libraries.
>> 
>> Cheers
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 
>>  
>> http://talebzadehmich.wordpress.com 
>>  
>> 
>> On 17 April 2016 at 02:41, Ascot Moss > > wrote:
>> I compared both last month, seems to me that Flink's MLLib is not yet ready.
>> 
>> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh > > wrote:
>> Thanks Ted. I was wondering if someone is using both :)
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 
>>  
>> http://talebzadehmich.wordpress.com 
>>  
>> 
>> On 16 April 2016 at 17:08, Ted Yu > > wrote:
>> Looks like this question is more relevant on flink mailing list :-)
>> 
>> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh > > wrote:
>> Hi,
>> 
>> Has anyone used Apache Flink instead of Spark by any chance
>> 
>> I am interested in its set of libraries for Complex Event Processing.
>> 
>> Frankly I don't know if it offers far more than Spark offers.
>> 
>> Thanks
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> 
>>  
>> http://talebzadehmich.wordpress.com 
>>  
>> 
>> 
>> 
>> 
>> -- 
>> andy
> 
> 



Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
Yes, mostly regarding Spark partitioning and the use of groupByKey instead of
reduceByKey.
However, Flink extended the benchmark here:
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
So I was curious about an answer from the Spark team: do they plan to do
something similar?
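
For readers unfamiliar with why groupByKey versus reduceByKey matters, here
is a rough sketch in plain Python (not Spark code; the partition contents,
the ad keys and the function names are invented for illustration). It models
the general difference: groupByKey ships every record across the shuffle,
whereas reduceByKey combines values per key inside each partition first and
ships one record per key per partition. It does not reproduce the benchmark
code itself.

```python
from collections import defaultdict

def group_by_key_shuffle(partitions):
    """Ship every (key, value) record across the shuffle, then sum per key."""
    shuffled = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

def reduce_by_key_shuffle(partitions):
    """Combine per key inside each partition, then ship one record per key."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)

# Two partitions of (ad id, count) records.
partitions = [
    [("ad1", 1), ("ad1", 1), ("ad2", 1)],
    [("ad1", 1), ("ad2", 1), ("ad2", 1)],
]
grouped_totals, grouped_records = group_by_key_shuffle(partitions)
reduced_totals, reduced_records = reduce_by_key_shuffle(partitions)
```

Both paths arrive at the same totals, but in this toy input the first moves
six records through the shuffle and the second only four; the gap widens as
the number of records per key grows.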

> On 17 Apr 2016, at 15:33, Silvio Fiorito  
> wrote:
> 
> Actually there were multiple responses to it on the GitHub project, including 
> a PR to improve the Spark code, but they weren’t acknowledged.



Re: Apache Flink

2016-04-17 Thread Corey Nolet
One thing I've noticed about Flink in my following of the project has been
that it has established, in a few cases, some novel ideas and improvements
over Spark. The problem with it, however, is that both the development team
and the community around it are very small and many of those novel
improvements have been rolled directly into Spark in subsequent versions. I
was considering changing over my architecture to Flink at one point to get
better, more real-time CEP streaming support, but in the end I decided to
stick with Spark and just watch Flink continue to pressure it into
improvement.

On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers  wrote:

> i never found much info that flink was actually designed to be fault
> tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
> doesn't bode well for large scale data processing. spark was designed with
> fault tolerance in mind from the beginning.

Re: Apache Flink

2016-04-17 Thread Koert Kuipers
i never found much info that flink was actually designed to be fault
tolerant. if fault tolerance is more bolt-on/add-on/afterthought then that
doesn't bode well for large scale data processing. spark was designed with
fault tolerance in mind from the beginning.

On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh 
wrote:

> Hi,
>
> I read the benchmark published by Yahoo. Obviously they already use Storm
> and are inevitably very familiar with that tool. To start with, although these
> benchmarks were somewhat interesting IMO, they lend themselves to an assurance
> that the tool chosen for their platform is still the best choice. So
> inevitably the benchmarks and the tests were done primarily to support their
> approach.
>
> In general, anything which is not done through the TPC Council or a similar
> body is questionable.
> Their argument is that because Spark handles data streaming in micro-batches,
> it inevitably introduces this built-in latency by design.
> In contrast, both Storm and Flink do not (at face value) have this
> issue.
>
> In addition, as we already know, Spark has far more capabilities compared to
> Flink (I know nothing about Storm). So really it boils down to the business
> SLA when choosing which tool to deploy for your use case. IMO Spark's
> micro-batching approach is probably OK for 99% of use cases. If we had
> built-in libraries for CEP for Spark (I am searching for them), I would not
> bother with Flink.
>
> HTH


Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
Hi,

I read the benchmark published by Yahoo. Obviously they already use Storm
and are inevitably very familiar with that tool. To start with, although these
benchmarks were somewhat interesting IMO, they lend themselves to an assurance
that the tool chosen for their platform is still the best choice. So
inevitably the benchmarks and the tests were done primarily to support their
approach.

In general, anything which is not done through the TPC Council or a similar
body is questionable.
Their argument is that because Spark handles data streaming in micro-batches,
it inevitably introduces this built-in latency by design.
In contrast, both Storm and Flink do not (at face value) have this
issue.

In addition, as we already know, Spark has far more capabilities compared to
Flink (I know nothing about Storm). So really it boils down to the business
SLA when choosing which tool to deploy for your use case. IMO Spark's
micro-batching approach is probably OK for 99% of use cases. If we had
built-in libraries for CEP for Spark (I am searching for them), I would not
bother with Flink.

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> You probably read this benchmark at Yahoo, any comments from Spark?
>
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at


RE: Apache Flink

2016-04-17 Thread Silvio Fiorito
Actually there were multiple responses to it on the GitHub project, including a 
PR to improve the Spark code, but they weren’t acknowledged.


From: Ovidiu-Cristian MARCU
Sent: Sunday, April 17, 2016 7:48 AM
To: andy petrella
Cc: Mich Talebzadeh; Ascot Moss; Ted Yu; user@spark.apache.org
Subject: Re: Apache Flink

You probably read this benchmark at Yahoo, any comments from Spark?
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at





Re: Apache Flink

2016-04-17 Thread Igor Berman
Latency in Flink is not eliminated, but it might be smaller since Flink
processes each event one by one while Spark does micro-batching (so you can't
achieve latency lower than your micro-batch interval).
Spark will probably have better throughput due to this micro-batching.
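
To make the latency point concrete, here is a rough back-of-the-envelope model in plain Python. It is not Spark or Flink code, and the arrival times, the 1000 ms batch interval and the 10 ms processing cost are invented numbers. It assumes a micro-batch is only processed once its interval closes, ignores queueing and scheduling overhead, and charges the same processing cost in both models.

```python
def per_event_latency(arrival_times_ms, processing_ms):
    """Event-at-a-time model: each event is handled as soon as it arrives."""
    return [processing_ms for _ in arrival_times_ms]

def micro_batch_latency(arrival_times_ms, batch_interval_ms, processing_ms):
    """Micro-batch model: an event waits until its batch interval closes."""
    latencies = []
    for t in arrival_times_ms:
        # End of the interval [k * interval, (k + 1) * interval) containing t.
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - t + processing_ms)
    return latencies

# Hypothetical workload: five events arriving within one second.
arrivals = [0, 250, 500, 750, 999]
per_event = per_event_latency(arrivals, processing_ms=10)
micro = micro_batch_latency(arrivals, batch_interval_ms=1000, processing_ms=10)

print("per-event latencies (ms):  ", per_event)
print("micro-batch latencies (ms):", micro)
print("worst micro-batch latency: ", max(micro))
```

Under these assumptions an event that arrives just after a batch boundary waits almost the whole interval, so the worst-case latency is bounded below by the batch interval, while the event-at-a-time model pays only the processing cost. The throughput side of the trade-off is not modelled here.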



On 17 April 2016 at 14:47, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> You probably read this benchmark at Yahoo, any comments from Spark?
>
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at


Re: Apache Flink

2016-04-17 Thread Ovidiu-Cristian MARCU
You probably read this benchmark at Yahoo, any comments from Spark?
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
 



> On 17 Apr 2016, at 12:41, andy petrella  wrote:
> 
> Just adding one thing to the mix: `that the latency for streaming data is 
> eliminated` is insane :-D



Re: Apache Flink

2016-04-17 Thread andy petrella
Just adding one thing to the mix: `that the latency for streaming data is
eliminated` is insane :-D

On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh 
wrote:

> It seems that Flink argues that the latency for streaming data is
> eliminated whereas with Spark RDD there is this latency.
>
> I noticed that Flink does not support an interactive shell much like the
> Spark shell, where you can add jars to it to do Kafka testing. The advice was
> to add the streaming Kafka jar file to CLASSPATH but that does not work.
>
> Most Flink documentation is also rather sparse, with the usual example of
> word count, which is not exactly what you want.
>
> Anyway I will have a look at it further. I have a Spark Scala streaming
> Kafka program that works fine in Spark and I want to recode it using Scala
> for Flink with Kafka, but I have difficulty importing and testing libraries.
>
> Cheers
>
--
andy


Re: Apache Flink

2016-04-17 Thread Mich Talebzadeh
It seems that Flink argues that the latency for streaming data is
eliminated whereas with Spark RDD there is this latency.

I noticed that Flink does not support an interactive shell much like the
Spark shell, where you can add jars to it to do Kafka testing. The advice was
to add the streaming Kafka jar file to CLASSPATH but that does not work.

Most Flink documentation is also rather sparse, with the usual example of
word count, which is not exactly what you want.

Anyway I will have a look at it further. I have a Spark Scala streaming
Kafka program that works fine in Spark and I want to recode it using Scala
for Flink with Kafka, but I have difficulty importing and testing libraries.

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 17 April 2016 at 02:41, Ascot Moss  wrote:

> I compared both last month, seems to me that Flink's MLLib is not yet
> ready.


Re: Apache Flink

2016-04-16 Thread Ascot Moss
I compared both last month; it seems to me that Flink's ML library (FlinkML) is not yet ready.

On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh  wrote:

> Thanks Ted. I was wondering if someone is using both :)


Re: Apache Flink

2016-04-16 Thread Mich Talebzadeh
Thanks Ted. I was wondering if someone is using both :)

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 16 April 2016 at 17:08, Ted Yu  wrote:

> Looks like this question is more relevant on flink mailing list :-)


Re: Apache Flink

2016-04-16 Thread Ted Yu
Looks like this question is more relevant on flink mailing list :-)

On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh 
wrote:

> Hi,
>
> Has anyone used Apache Flink instead of Spark by any chance?
>
> I am interested in its set of libraries for Complex Event Processing.
>
> Frankly I don't know if it offers far more than Spark offers.
>
> Thanks