Hi,

It's still a micro-batching architecture, with triggers playing the role of batch intervals. It's just faster by default and the API is more pleasant, i.e. Dataset-driven. A small sketch of what that looks like is below.
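As a minimal sketch against the Spark 2.0 API (the socket source, host/port, and the 1-second interval are just placeholders for illustration, not anything from this thread):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.ProcessingTime

object TriggerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trigger-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read lines from a socket source (host/port are placeholders)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()
      .as[String]

    val counts = lines.groupBy("value").count()

    // The trigger plays the role of the old batchInterval: a new
    // micro-batch starts every second. If no trigger is given, the
    // next batch starts as soon as the previous one has finished.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .trigger(ProcessingTime("1 second"))
      .start()

    query.awaitTermination()
  }
}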
Jacek

On 13 Jul 2016 10:35 p.m., "Matthias Niehoff" <matthias.nieh...@codecentric.de> wrote:

Hi everybody,

as far as I understand, with the new Structured Streaming API the output will not get processed every x seconds anymore. Instead, the data will be processed as soon as it arrives, though there might be a delay due to the processing time of earlier data.

A small example: data comes in and its processing takes 1 second (quite long). During this 1 second a lot of new data comes in, which will be processed after the processing of the first data has finished.

My questions are:

Is the data for each processing run, i.e. all the data collected during that 1 second, still processed as a micro-batch (including reprocessing in case of failure on another worker, etc.)? Or is the bulk of data processed one record at a time?

With regards to the processing time: is the behavior under high processing times the same as in Spark 1.x, meaning we get a scheduling delay and data is buffered by a receiver? (Is there even a concept of a receiver in Spark 2? Is a source in streaming basically a receiver?)

Hope those questions aren't too confusing :-)

Thank you!
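On the receiver question: in 2.0 the role of receivers is taken over by sources, which the engine pulls from rather than being pushed to. Paraphrased, the internal Source abstraction looks roughly like the sketch below (exact signatures may differ between releases; this is an illustration, not the real file):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

trait Offset // an opaque position in the stream

trait Source {
  def schema: StructType        // schema of the rows this source produces
  def getOffset: Option[Offset] // latest offset available, if any data arrived
  // Returns the data in the range (start, end] as a DataFrame; it must be
  // re-computable so that a failed micro-batch can be replayed
  // deterministically, whereas a 1.x Receiver pushed blocks into the
  // block manager and relied on them being stored.
  def getBatch(start: Option[Offset], end: Offset): DataFrame
  def stop(): Unit              // release resources
}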