Re: Is "spark streaming" streaming or mini-batch?

Matei Zaharia Tue, 23 Aug 2016 16:16:33 -0700

I think people explained this pretty well, but in practice, this distinction is 
also partly a marketing one, because every system will perform some kind 
of batching. For example, every time you use TCP, the OS and network stack may 
buffer multiple messages together and send them at once; and likewise, 
virtually all streaming engines can batch data internally to achieve higher 
throughput. Furthermore, in all APIs, you can see individual records and 
respond to them one by one. The main question is just what overall performance 
you get (throughput and latency).
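
To make the point concrete, here is a minimal sketch in plain Python (not Spark code) of an engine that batches internally while the user code still sees records one by one. The names micro_batches, process, and handler are hypothetical and exist only for this illustration; a real engine would typically also flush on a timer, which is left out here.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream into batches of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left when the input ends.
        yield batch


def process(records: Iterable[T], batch_size: int,
            handler: Callable[[T], None]) -> int:
    """Call handler on every record individually; return how many batches were used."""
    n_batches = 0
    for batch in micro_batches(records, batch_size):
        n_batches += 1
        for record in batch:
            # The user-facing callback still sees one record at a time.
            handler(record)
    return n_batches


seen: List[int] = []
batches_used = process(range(10), batch_size=4, handler=seen.append)
print(seen, batches_used)
```

With batch_size=1 the same handler runs unchanged and only the number of internal batches differs, which is why the practical question comes down to the throughput and latency you measure.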


Matei

> On Aug 23, 2016, at 4:08 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> 
> Thanks everyone for clarifying.
> 
> On Tue, Aug 23, 2016 at 9:11 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/ and 
> it mentioned that Spark Streaming is actually mini-batch, not actual streaming.
> 
> I have not used streaming and I am not sure what the difference between the 
> two terms is, so I could not make a judgement myself.
>
