Is "spark streaming" streaming or mini-batch?
I look at something like Complex Event Processing (CEP), which is a leading
use case for data streaming (and I am experimenting with Spark for it), and
in the realm of CEP there is really no such thing as continuous data
streaming. The point is that when
On 23 Aug 2016, at 17:58, Mich Talebzadeh wrote:
In general, depending on what you are doing, you can tighten the above parameters. For
example, if you are using Spark Streaming for anti-fraud detection, you may
stream data in at 2 seconds
I think people explained this pretty well, but in practice, this distinction is
also somewhat of a marketing term, because every system will perform some kind
of batching. For example, every time you use TCP, the OS and network stack may
buffer multiple messages together and send them at once;
Thanks everyone for clarifying.
On Tue, Aug 23, 2016 at 9:11 PM, Aseem Bansal wrote:
> I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/
> and it mentioned that spark streaming actually mini-batch not actual
> streaming.
>
> I have not used streaming
Russell is correct here.
Micro-batch means that processing happens within a window. In general, there are
three things here.
batch window
This is the basic interval at which the system will receive the data in
batches. This is the interval set when creating a StreamingContext. For
example, if you set
Spark Streaming does not process one event at a time, which I think is what
people generally call "streaming." It instead processes groups of events.
Each group is a "MicroBatch" that gets processed at the same time.
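
To make the grouping concrete, here is a minimal plain-Python sketch (not Spark code) of how events might be bucketed by a fixed batch interval. The names `micro_batches`, `batch_interval`, and the sample `events` list are all made up for illustration; a real engine would of course do this incrementally as data arrives.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: List[Event], batch_interval: float) -> Dict[int, List[str]]:
    """Group events into fixed-length batches keyed by batch index.

    An event arriving at time t lands in batch floor(t / batch_interval),
    so every event in the same interval is handled together.
    """
    batches: Dict[int, List[str]] = defaultdict(list)
    for arrival, payload in events:
        batches[int(arrival // batch_interval)].append(payload)
    return dict(batches)


# Illustrative sample data, not taken from any real stream.
events = [(0.2, "a"), (1.5, "b"), (2.1, "c"), (3.9, "d"), (4.0, "e")]

for index, payloads in sorted(micro_batches(events, batch_interval=2.0).items()):
    # Each iteration stands in for one micro-batch being processed as a unit.
    print(index, payloads)
```

With a 2-second interval, "a" and "b" end up in one batch, "c" and "d" in the next, and "e" starts a third; a one-event-at-a-time system would instead handle each of the five events on arrival.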
Streaming theoretically always has better latency because the event is
processed
It's based on a "micro-batching" model.
> On Aug 23, 2016, at 8:41 AM, Aseem Bansal wrote:
>
> I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/ and
> it mentioned that spark streaming actually mini-batch not actual streaming.
>