RE: Benchmarking Streaming Computation Engines at Yahoo!

刘键(Basti Liu) Fri, 18 Dec 2015 00:44:07 -0800

Hi Jerry,

Thanks for the clarification.
But just to check my understanding: the reason we got the lower latency is the
"window" mechanism in Flink. I guess a stream in Flink is flushed as one or
several batches per window, so at lower throughputs this leads to extra waiting
at the source component. If so, it should be possible to lower Flink's latency
by adjusting its configuration.
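
To make the guess above concrete, here is a toy model in Python. It is only a
sketch under the assumption that records sit in an output buffer until the
buffer is full or a flush timeout expires; the function name and its parameters
are hypothetical and are not Flink's actual API or implementation.

```python
def worst_case_buffer_wait_ms(rate_per_sec, buffer_records, timeout_ms):
    """Worst-case time (ms) the first record in a buffer waits for a flush.

    Assumption: the buffer is flushed when it holds `buffer_records` records
    or when `timeout_ms` has passed since the first record arrived,
    whichever happens first.
    """
    if rate_per_sec <= 0:
        # Nothing else arrives, so only the timeout can trigger the flush.
        return float(timeout_ms)
    # Time needed for the remaining records to arrive and fill the buffer.
    fill_ms = (buffer_records - 1) / rate_per_sec * 1000.0
    return min(fill_ms, float(timeout_ms))


# Low rate: the buffer never fills in time, so the timeout dominates.
print(worst_case_buffer_wait_ms(100, 1000, 100))
# High rate: the buffer fills long before the timeout expires.
print(worst_case_buffer_wait_ms(1_000_000, 1000, 100))
```

In this model a smaller timeout directly lowers the added latency at low
throughput, which is the kind of configuration change meant above.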
Actually, my point here is that if we want to compete with Flink or Spark
Streaming on at-least-once or exactly-once processing (high throughput and low
latency), Storm's acking mechanism needs to be improved. Currently, the acking
mechanism in Storm generates too many extra messages. Sometimes the throughput
of a topology depends on the throughput of the acker.
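
To illustrate the extra messages, here is a small Python simulation of an
XOR-based acker for a linear topology (one spout followed by a chain of bolts).
It is a simplified sketch, not Storm's code: the class and function names are
made up for this example, and it assumes each bolt emits exactly one anchored
tuple and acks its input.

```python
import random


class Acker:
    """Toy acker: keeps one XOR checksum per spout tuple (root id)."""

    def __init__(self):
        self.pending = {}    # root id -> running XOR of outstanding tuple ids
        self.messages = 0    # number of messages the acker has received
        self.completed = []  # root ids whose tuple tree is fully acked

    def init(self, root_id, tuple_id):
        # Sent by the spout when it emits a new tuple.
        self.messages += 1
        self.pending[root_id] = tuple_id

    def ack(self, root_id, xor_value):
        # Sent by a bolt: XOR of the acked tuple id and any newly emitted ids.
        self.messages += 1
        self.pending[root_id] ^= xor_value
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            self.completed.append(root_id)


def run_linear_topology(num_bolts, seed=0):
    """Returns (data messages, acker messages, tuple tree completed?)."""
    rng = random.Random(seed)

    def new_id():
        # Random non-zero 64-bit id.
        return rng.getrandbits(64) or 1

    acker = Acker()
    root_id = new_id()
    current = new_id()
    acker.init(root_id, current)
    data_messages = 0
    for i in range(num_bolts):
        data_messages += 1  # the tuple is delivered to bolt i
        is_last = i == num_bolts - 1
        emitted = 0 if is_last else new_id()  # the last bolt emits nothing
        acker.ack(root_id, current ^ emitted)
        current = emitted
    return data_messages, acker.messages, root_id in acker.completed


print(run_linear_topology(3))  # (3, 4, True)
```

With three bolts, the acker receives four messages for three data transfers,
and this count does not include the completion notice sent back to the spout.
In this model every spout tuple also passes through the acker, so the acker's
own throughput can become the limit.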

Regards
Basti

-----Original Message-----
From: Boyang(Jerry) Peng [mailto:[email protected]] 
Sent: Friday, December 18, 2015 7:08 AM
To: [email protected]
Subject: Re: Benchmarking Streaming Computation Engines at Yahoo!

Hello Satish,
One of the experiments we wish to do in the future is to compare Flink with
checkpointing against Storm with acking. If you look at our results, Storm with
acking does have lower latency than Flink without checkpointing at lower
throughputs. The key phrase here is lower throughputs. What we were trying to
say is that Storm, with the optimizations we proposed, can be comparable to
Flink without checkpointing at higher throughputs even with acking turned on.
Best,
Jerry

    On Thursday, December 17, 2015 1:27 PM, Satish Duggana 
<[email protected]> wrote:

Hi Jerry,
Thanks for updating the blog.

Storm with acking should be compared with a similar configuration on Flink,
which may be with checkpointing enabled or some other configuration that
gives an at-least-once guarantee. But the paragraph below gives the impression
that Storm with acking is equivalent to Flink without checkpointing, which
is not right.

"Without acking, Storm even beat Flink at very high throughput, and we
expect that with further optimizations like combining bolts, more
intelligent routing of tuples, and improved acking, Storm with acking
enabled would compete with Flink at very high throughput too."

Thanks,
Satish.

On Thu, Dec 17, 2015 at 10:47 PM, Boyang(Jerry) Peng <
[email protected]> wrote:

> Hello Satish,
> You are correct, there was a typo.  The sentence should be:
> "Flink uses a mechanism called checkpointing to guarantee processing.
> Unless checkpointing is used in the Flink job, Flink offers at most once
> processing similar to Storm with acking turned OFF.  For the Flink
> benchmark we did not use checkpointing."
>
> We have already fixed the typo on the blog.  Thanks!
> Best,
> Boyang Jerry Peng
>
>
>    On Thursday, December 17, 2015 4:12 AM, Satish Duggana <
> [email protected]> wrote:
>
>
> Hi Bobby et al.,
> Thanks for publishing the blog post on “Benchmarking streaming computation
> engines”
> <http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at>.
> It gives good insights into how different streaming engines perform with the
> use case mentioned.
>
> “Flink uses a mechanism called checkpointing to guarantee processing.
> Unless checkpointing is used in the Flink job, Flink offers at most once
> processing similar to Storm with acking turned on.  For the Flink benchmark
> we did not use checkpointing.”
>
> The above snippet in your blog was confusing regarding the at-most-once
> guarantee. My understanding is that Storm gives at-most-once without acking,
> but an at-least-once guarantee requires acking to be on. So Storm’s acking
> should be compared with Flink’s at-least-once guarantee, which may be
> obtained by enabling checkpointing or any other required configuration.
> Am I missing anything here?
>
> Thanks,
> Satish.
>
>
>
>
