Hi,

It's still a micro-batching architecture, with triggers playing the role of batch intervals. It's just faster by default and the API is more pleasant, i.e. Dataset-driven. A small sketch of what that looks like is below.
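As a minimal sketch against the Spark 2.0 API (the socket source, host/port, and the 1-second interval are just placeholders for illustration, not anything from this thread):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.ProcessingTime

object TriggerExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trigger-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read lines from a socket source (host/port are placeholders)
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()
      .as[String]

    val counts = lines.groupBy("value").count()

    // The trigger plays the role of the old batchInterval: a new
    // micro-batch starts every second. If no trigger is given, the
    // next batch starts as soon as the previous one has finished.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .trigger(ProcessingTime("1 second"))
      .start()

    query.awaitTermination()
  }
}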
Jacek

On 13 Jul 2016 10:35 p.m., "Matthias Niehoff" <matthias.nieh...@codecentric.de> wrote:

Hi everybody,

as far as I understand, with the new Structured Streaming API the output will not get processed every x seconds anymore. Instead, the data will be processed as soon as it arrives, though there might be a delay due to the processing time of earlier data.

A small example: data comes in and its processing takes 1 second (quite long). During this 1 second a lot of new data comes in, which will be processed after the processing of the first data has finished.

My questions are:

Is the data for each processing run, i.e. all the data collected during that 1 second, still processed as a micro-batch (including reprocessing in case of failure on another worker, etc.)? Or is the bulk of data processed one record at a time?

With regards to the processing time: is the behavior under high processing times the same as in Spark 1.x, meaning we get a scheduling delay and data is buffered by a receiver? (Is there even a concept of a receiver in Spark 2? Is a source in streaming basically a receiver?)

Hope those questions aren't too confusing :-)

Thank you!
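On the receiver question: in 2.0 the role of receivers is taken over by sources, which the engine pulls from rather than being pushed to. Paraphrased, the internal Source abstraction looks roughly like the sketch below (exact signatures may differ between releases; this is an illustration, not the real file):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

trait Offset // an opaque position in the stream

trait Source {
  def schema: StructType        // schema of the rows this source produces
  def getOffset: Option[Offset] // latest offset available, if any data arrived
  // Returns the data in the range (start, end] as a DataFrame; it must be
  // re-computable so that a failed micro-batch can be replayed
  // deterministically, whereas a 1.x Receiver pushed blocks into the
  // block manager and relied on them being stored.
  def getBatch(start: Option[Offset], end: Offset): DataFrame
  def stop(): Unit              // release resources
}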