As Spark uses micro-batching for streaming, you'll inevitably have to tune the batch size to hit the throughput vs. latency trade-off you expect. In particular, since Spark uses a global watermark that doesn't advance (change) during a micro-batch, you'd want to keep the batch relatively small so that the watermark can advance more frequently between batches.
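To make that concrete, here is a rough sketch of a bounded-batch setup; the broker address, topic, schema, column names, and the specific numbers (500k offsets per trigger, 30-second trigger) are illustrative assumptions, not values from your job:

// Rough sketch, not your actual job: broker, topic, schema and numbers are assumed.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("bounded-batch-example").getOrCreate()
import spark.implicits._

val schema = new StructType()
  .add("userId", StringType)
  .add("eventTime", TimestampType)

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // assumed broker
  .option("subscribe", "events")                       // assumed topic
  // Cap how many Kafka offsets a single micro-batch may consume, so each
  // batch finishes quickly and the global watermark can advance between batches.
  .option("maxOffsetsPerTrigger", "500000")
  .load()
  .select(from_json($"value".cast("string"), schema).as("e"))
  .select("e.*")

val counts = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window($"eventTime", "5 minutes"), $"userId")
  .count()

counts.writeStream
  .outputMode("update")
  .format("console")
  // A short trigger interval plus a bounded batch keeps latency low; raise
  // maxOffsetsPerTrigger if throughput matters more than latency.
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .start()
  .awaitTermination()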
While running my stateful Spark Structured Streaming job, I am setting the
'maxOffsetsPerTrigger' value to 10 million. I've noticed that messages are
processed faster if I use a large value for this property.
What I am also noticing is that until the batch is completely processed, no
messages are written out to the sink.
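That observation is consistent with how the micro-batch engine commits output: the sink only sees a batch's results once the whole batch has been processed. Here is a small, self-contained sketch of that behavior (it uses the built-in 'rate' source rather than your Kafka topic, so the source, rate and sink here are purely illustrative):

// Self-contained sketch using the synthetic "rate" source, for illustration only.
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("per-batch-commit-example").getOrCreate()

val stream = spark.readStream
  .format("rate")                    // built-in synthetic source
  .option("rowsPerSecond", "1000")
  .load()

stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Runs once per completed micro-batch; with maxOffsetsPerTrigger set to
    // 10 million on a Kafka source, nothing reaches the sink until the whole
    // batch has been processed.
    println(s"batch $batchId committed ${batch.count()} rows")
  }
  .start()
  .awaitTermination()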