Re: Comparison of storm and flink

2016-01-23 Thread Vinaya M S
Hi Slim Baltagi,

   Thank you for the list; it will be really helpful. I have gone through
a few of the materials you mentioned:

1. Benchmarking Streaming Computation Engines at Yahoo!
2. The Capital One slides on SlideShare.
3. The data Artisans article.

Based on these, I have identified a few metrics:

1. Number of tuples processed per second.
2. Throughput measured while the input rate (tuples/second) is held constant.

I'm thinking of comparing:
Read/write throughput: I have to figure out a way to compare storm::spout
~ flink::env.getstream and storm::ReportBolt ~ flink::sink.

I'm not sure about this yet.
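
A minimal sketch of how the two metrics above could be computed, written in
Python and independent of either engine. The names measure_throughput and
latency_percentile are hypothetical helpers, not part of Storm or Flink, and
the latency percentile is an added assumption: it is the kind of figure the
Yahoo! benchmark reports when the input rate is held fixed. A real comparison
would take timestamps at the spout/source and at the bolt/sink inside each
engine rather than in a local loop.

```python
import math
import time
from typing import Callable, Iterable, Sequence, Tuple


def measure_throughput(
    records: Iterable[str],
    process: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Run every record through `process`; return (count, tuples per second)."""
    start = clock()
    count = 0
    for record in records:
        process(record)
        count += 1
    elapsed = clock() - start
    # Guard against a zero-length interval on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


def latency_percentile(latencies_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of per-tuple latencies, for 0 < pct <= 100."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

The clock parameter is injectable only so that the calculation can be checked
with fixed timestamps; by default it uses time.perf_counter.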

During the seven-week Insight Data Engineering Fellows program, we aim to
build a data platform to handle large, real-time datasets. Considering the
short period we spend at Insight working on a project, I don't consider it
to be a full-blown benchmark study. But I want to be careful, and I would be
willing to work further along those lines.

I have signed up for the meetup in NYC, as I consider it a great place to
learn about Flink. I am looking forward to your talk, as well as to meeting
you and discussing the questions I have.


Thank you,
Vinaya M S






Re: Comparison of storm and flink

2016-01-23 Thread Slim Baltagi
Hi Vinaya

1. Comparing streaming tools (in this case Storm and Flink) should not be
based on performance benchmarks only! For example, slides 16-36 list over 96
criteria that we identified at Capital One for comparing two streaming tools:
http://www.slideshare.net/sbaltagi/flink-vs-spark/17

2. Now, if you are focusing on performance only, I'll suggest a few related
resources: 

- Benchmarking Streaming Computation Engines at Yahoo!
http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
December 16, 2015. Code on GitHub:
https://github.com/yahoo/streaming-benchmarks

- Some Flink contributors have started work on performance scripts for
Flink, Spark, and MapReduce, under the title "Apache Flink: Performance and
Testing":
https://github.com/project-flink/flink-perf

- Some first numbers on the performance of streaming jobs with Apache Flink
are here:
http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
under the section 'Show me the numbers'. The code used is at:
https://github.com/dataArtisans/performance

- Yangjun Wang is currently working on his master's thesis at Aalto
University in Helsinki, Finland. His thesis is about building a standard
benchmark system for stream processing systems such as Apache Storm, Spark,
and Flink. Code on GitHub:
https://github.com/wangyangjun/StreamBench/tree/master/StreamBench

3. I am giving a talk on Apache Flink in NYC on Tuesday, February 2nd, 2016,
and I will touch a bit on benchmarks:
http://www.meetup.com/New-York-City-NYC-Apache-Flink-Meetup/events/228113118/
You are welcome to attend.

Thanks

Slim Baltagi 



--
View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Comparison-of-storm-and-flink-tp4468p4469.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.


Comparison of storm and flink

2016-01-23 Thread Vinaya M S
Hi Flink user group,

I am working on a project for the Insight Data Engineering Program in New
York to compare streaming tools. The program is designed to help software
engineers and recent university graduates transition to a data engineering
role. After completing the project, we present demos of it to several
companies in NYC that we are interested in working for (including top
companies like NY Times, Capital One, Bloomberg, etc.).

I have decided to work on a project to compare streaming tools, namely
Flink and Storm. I already have Twitter data stored and would like to
design tests to benchmark the two tools if possible.

I want to be extra careful in constructing a benchmark to work on and
present at companies here in NY. Do you have any recommendations for tests
to run on the Twitter data I have that would show when to use, and when not
to use, Flink rather than Storm?

Thanks!
Vinaya