Hello there, I have a quick question about the following case:
Situation: a Spark consumer is able to process 5 batches in 10 seconds (where the batch interval is zero by default - correct me if this is wrong). The window size is 10 seconds (a tumbling window, i.e. zero-overlap sliding). There are some fluctuations in the incoming message arrival rate, resulting in a slightly higher incoming rate than the consumer is able to handle - say, sometimes 6 batches' worth of data arrives in 10 seconds, for 5 minutes. (A minimal sketch of the setup I have in mind is at the end of this mail.)

Question: would Spark 2.2 drop the 6th batch when the 10-second window moves on, or do unprocessed batches keep accumulating? If batches are dropped, would a warning be dumped in the log? I can see a warning (attached below) when batch processing takes more time than an explicitly set batch interval (which is not the case here). I would expect a similar warning in the log when batches have to be dropped in the case above, but I can't find this type of warning - maybe I was just looking for the wrong text in the log? In general, is the expectation reasonable? (I can't find anything here: https://spark.apache.org/docs/2.2.1/streaming-programming-guide.html, or from general googling.)

Any comments/suggestions would be very much appreciated!

Thanks,
Peter

18/08/03 00:33:43 WARN streaming.ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 10000 milliseconds, but spent 11965 milliseconds
18/08/03 00:33:55 WARN streaming.ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 10000 milliseconds, but spent 11266 milliseconds
