Sorry, I should have mentioned that Spark only seems reluctant to take the last windowed, groupBy batch from Kafka when using OutputMode.Append.
I've asked on StackOverflow:

https://stackoverflow.com/questions/62915922/spark-structured-streaming-wont-pull-the-final-batch-from-kafka

but am still struggling. Can anybody please help?

How do people test their SSS code if you have to put a message on Kafka to get Spark to consume a batch?

Kind regards,

Phillip

On Sun, Jul 12, 2020 at 4:55 PM Phillip Henry <londonjava...@gmail.com> wrote:

> Hi, folks.
>
> I noticed that SSS won't process a waiting batch if there are no batches
> after that. To put it another way, Spark must always leave one batch on
> Kafka waiting to be consumed.
>
> There is a JIRA for this at:
>
> https://issues.apache.org/jira/browse/SPARK-24156
>
> that says it's resolved in 2.4.0, but my code
> <https://github.com/PhillHenry/SSSPlayground/blob/Spark2/src/test/scala/uk/co/odinconsultants/sssplayground/windows/TimestampedStreamingSpec.scala>
> is using 2.4.2, yet I still see Spark reluctant to consume another batch
> from Kafka if it means there is nothing else waiting to be processed in the
> topic.
>
> Do I have to do something special to exploit the behaviour that
> SPARK-24156 says it has addressed?
>
> Regards,
>
> Phillip
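For context on why the last batch appears stuck: in OutputMode.Append, a windowed aggregation is only emitted once the event-time watermark passes the window's end, and the watermark only advances when newer events arrive. So the final window stays open until something later lands on the topic (which is why tests often push a dummy "heartbeat" message). Below is a minimal sketch of that watermark arithmetic in plain Scala. It is not the Spark API itself, and the delay and window sizes are hypothetical, chosen only to illustrate the rule:

```scala
// Sketch of the event-time watermark rule used by Append mode (not Spark API code).
// watermark = max(event time seen so far) - delay, and a window [start, end)
// is emitted only once watermark >= end. Hence the last window is not emitted
// until a LATER event pushes the watermark past the window's end.
object WatermarkSketch {
  val delayMs: Long  = 10000L // as if withWatermark("ts", "10 seconds")
  val windowMs: Long = 60000L // as if window($"ts", "1 minute")

  // End of the tumbling window containing an event at eventMs.
  def windowEnd(eventMs: Long): Long = (eventMs / windowMs + 1) * windowMs

  // Watermark after seeing a maximum event time of maxEventMs.
  def watermark(maxEventMs: Long): Long = maxEventMs - delayMs

  // Append mode emits a window only when the watermark has passed its end.
  def isEmitted(winEndMs: Long, maxEventMs: Long): Boolean =
    watermark(maxEventMs) >= winEndMs
}
```

With these numbers, an event at t=59000ms falls in the window ending at 60000ms, but the watermark is only 49000ms, so the window is held back; a later heartbeat at t=70001ms moves the watermark to 60001ms and the window is finally emitted.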