Re: [Structured Streaming]Data processing and output trigger should be decoupled

Shixiong(Ryan) Zhu Wed, 30 Aug 2017 10:59:26 -0700

I don't think that's a good idea. If the engine keeps processing data
but doesn't output anything, where would it keep the intermediate data?


On Wed, Aug 30, 2017 at 9:26 AM, KevinZwx <kevinzwx1...@gmail.com> wrote:

> Hi,
>
> I'm working with Structured Streaming, and I'm wondering whether
> triggers could be improved.
>
> Currently, when I specify a trigger, e.g.
> trigger(Trigger.ProcessingTime("10 minutes")), the engine begins processing
> data when each trigger fires: 10:00:00, 10:10:00, 10:20:00, and so on. If
> the engine takes 10s to process a batch, we get the output at 10:00:10, and
> then the engine just waits without processing any data. When the next
> trigger fires, the engine processes the data that arrived during the
> interval, and if that batch takes 15s, we get the result at 10:10:15. This
> is the problem.
>
> In my understanding, the trigger and data processing should be decoupled:
> the engine should keep processing data as fast as possible, but only
> generate output at each trigger, so that we get the results at
> 10:00:00, 10:10:00, 10:20:00, ... Is there any solution for this, or a
> plan to work on it?
>
>
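For illustration, the timing described in the quoted message can be worked through with a small simulation. This is a sketch in plain Python, not Spark code: the function name `coupled_output_times` and the date are made up for the example, and it assumes every batch finishes well within the trigger interval, as in the scenario above.

```python
from datetime import datetime, timedelta

def coupled_output_times(first_trigger, interval, processing_durations):
    """Return the time each batch's output becomes available when
    processing starts only at the trigger (hypothetical helper).

    Assumes each batch finishes before the next trigger fires.
    """
    times = []
    for i, duration in enumerate(processing_durations):
        trigger_time = first_trigger + i * interval  # 10:00:00, 10:10:00, ...
        times.append(trigger_time + duration)        # output lags by the processing time
    return times

start = datetime(2017, 8, 30, 10, 0, 0)
interval = timedelta(minutes=10)
# Processing times taken from the example in the message: 10s, then 15s.
durations = [timedelta(seconds=10), timedelta(seconds=15)]

for t in coupled_output_times(start, interval, durations):
    print(t.strftime("%H:%M:%S"))
```

With these inputs the outputs land at 10:00:10 and 10:10:15 rather than on the trigger boundaries, which is the lag the original poster is asking about.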
