I don't think that's a good idea. If the engine keeps processing data but doesn't output anything until the next trigger, where would it keep the intermediate data?
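To make the concern concrete, here is a toy sketch in plain Python. It is not Spark code and not how Spark works internally; the class and method names (DecoupledCounter, process, fire_trigger) are made up for illustration. It models an engine that processes every record as soon as it arrives but only emits at a trigger, so the running result has to be held somewhere in between.

```python
from collections import defaultdict


class DecoupledCounter:
    """Toy model (hypothetical): count keys continuously, emit only on a trigger."""

    def __init__(self):
        # Intermediate data: held in memory until the next trigger fires.
        self.pending = defaultdict(int)

    def process(self, key):
        # Called for every record as soon as it arrives.
        self.pending[key] += 1

    def fire_trigger(self):
        # Called at each trigger boundary: emit the buffered counts, then reset.
        result = dict(self.pending)
        self.pending.clear()
        return result


counter = DecoupledCounter()
for key in ["a", "b", "a"]:
    counter.process(key)

first_output = counter.fire_trigger()   # counts buffered since the last trigger
second_output = counter.fire_trigger()  # nothing arrived in between
print(first_output, second_output)
```

In this sketch the buffer is just a dict in memory. A real engine would presumably also have to make that buffer survive failures, which is the part that worries me.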
On Wed, Aug 30, 2017 at 9:26 AM, KevinZwx <kevinzwx1...@gmail.com> wrote:
> Hi,
>
> I'm working with Structured Streaming, and I'm wondering whether the
> trigger behavior could be improved.
>
> Currently, when I specify a trigger, e.g.
> trigger(Trigger.ProcessingTime("10 minutes")), the engine begins
> processing data when the trigger fires: 10:00:00, 10:10:00, 10:20:00,
> and so on. If the engine takes 10s to process a batch, we get the
> output at 10:00:10, and then the engine just waits without processing
> any data. When the next trigger fires, the engine processes the data
> that arrived during the interval, and if this batch takes 15s, we get
> the result at 10:10:15. This is the problem.
>
> In my understanding, the trigger and data processing should be
> decoupled: the engine should keep processing data as fast as possible,
> but only generate output at each trigger, so that we get the results at
> 10:00:00, 10:10:00, 10:20:00, ... Is there any solution or plan to work
> on this?
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
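The timing described in the quoted message can be worked through with a few lines of plain Python. This is only arithmetic on the numbers from the message (10-minute trigger, batches taking 10s and 15s), not a model of Spark itself. The function name batch_timeline is hypothetical, the calendar date is arbitrary, and the sketch assumes every batch starts exactly at its trigger and finishes before the next one.

```python
from datetime import datetime, timedelta


def batch_timeline(first_trigger, interval, processing_times):
    """Return (trigger_time, output_time) pairs.

    Assumes each batch starts exactly at its trigger and finishes
    before the next trigger fires.
    """
    timeline = []
    for i, took in enumerate(processing_times):
        start = first_trigger + i * interval
        timeline.append((start, start + took))
    return timeline


# Numbers from the message; the date itself is arbitrary.
timeline = batch_timeline(
    datetime(2017, 8, 30, 10, 0, 0),
    timedelta(minutes=10),
    [timedelta(seconds=10), timedelta(seconds=15)],
)

for trigger_time, output_time in timeline:
    print(trigger_time.strftime("%H:%M:%S"), "->", output_time.strftime("%H:%M:%S"))
# 10:00:00 -> 10:00:10
# 10:10:00 -> 10:10:15
```

The output time always lags the trigger by however long that batch took, which is the delay the original poster would like to avoid.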