Hi, I'm working with Structured Streaming, and I'm wondering whether the trigger behaviour could be improved.
Currently, when I specify a trigger, e.g. trigger(Trigger.ProcessingTime("10 minutes")), the engine begins processing data when each trigger fires: 10:00:00, 10:10:00, 10:20:00, and so on. If the engine takes 10s to process a batch, we get the output at 10:00:10, and then the engine just waits without processing any data. When the next trigger fires, the engine processes the data that arrived during the interval, and if this batch takes 15s, we get the result at 10:10:15.

This is the problem. In my understanding, the trigger and data processing should be decoupled: the engine should keep processing data as fast as possible but only emit output at each trigger, so that we get results at 10:00:00, 10:10:00, 10:20:00, ...

So I'm wondering if there is any solution or plan to work on this?
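
To make the timing concrete, here is a small Python sketch that models the behaviour I described. It does not use Spark at all: `result_times` is a hypothetical helper, the calendar date is arbitrary, and it assumes each batch finishes before the next trigger fires.

```python
from datetime import datetime, timedelta


def result_times(first_trigger, interval, processing_durations):
    """Model of the described behaviour: batch i starts at the i-th trigger
    boundary, and its result is available once processing finishes."""
    times = []
    for i, duration in enumerate(processing_durations):
        batch_start = first_trigger + i * interval  # 10:00:00, 10:10:00, ...
        times.append(batch_start + duration)        # result lags by the processing time
    return times


# Arbitrary date (an assumption); only the time of day matters here.
first_trigger = datetime(2018, 1, 1, 10, 0, 0)
interval = timedelta(minutes=10)
durations = [timedelta(seconds=10), timedelta(seconds=15)]

observed = [t.strftime("%H:%M:%S")
            for t in result_times(first_trigger, interval, durations)]
print(observed)  # ['10:00:10', '10:10:15']
```

What I would like instead is for the results to be available at the trigger boundaries themselves, i.e. with the per-batch processing delay hidden from the output time.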