I think people explained this pretty well, but in practice the distinction is also partly a marketing term, because every system performs some kind of batching. For example, every time you use TCP, the OS and network stack may buffer multiple messages and send them together; likewise, virtually all streaming engines can batch data internally to achieve higher throughput. Furthermore, all of these APIs let you see individual records and respond to them one by one. The main question is just what overall performance you get (throughput and latency).
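To make that concrete, here is a toy Python sketch. It is only an illustration of the idea, not how Spark Streaming or any other engine is implemented, and the names `micro_batches` and `process_stream` are made up for this example. The handler sees records one at a time whether the data moves in batches of 1 or 1000; the batch size only changes how the records are grouped in transit.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream into lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def process_stream(
    records: Iterable[T], handle: Callable[[T], R], batch_size: int
) -> List[R]:
    """Move data in batches, but still call handle on each record one by one."""
    results: List[R] = []
    for batch in micro_batches(records, batch_size):
        for record in batch:
            results.append(handle(record))
    return results


def double(x: int) -> int:
    return x * 2


# batch_size=1 looks like "record at a time"; batch_size=3 is micro-batching.
one_at_a_time = process_stream(range(5), double, batch_size=1)
micro_batched = process_stream(range(5), double, batch_size=3)
```

Both calls produce the same per-record results. In a real system the larger batch would typically amortize per-message overhead (better throughput), while each record may wait longer for its batch to fill (worse latency).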
Matei

> On Aug 23, 2016, at 4:08 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>
> Thanks everyone for clarifying.
>
> On Tue, Aug 23, 2016 at 9:11 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/ and it mentioned that spark
> streaming actually mini-batch not actual streaming.
>
> I have not used streaming and I am not sure what is the difference in the 2
> terms. Hence could not make a judgement myself.