Re: Question about 'maxOffsetsPerTrigger'

2020-06-30 Thread Jungtaek Lim
As Spark uses micro-batches for streaming, it's unavoidable to tune the batch size to reach your desired trade-off between throughput and latency. In particular, since Spark uses a global watermark which doesn't advance (change) during a micro-batch, you'd want to keep the batch relatively small to make …
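For illustration, here is a minimal sketch (Scala) of how a smaller per-trigger cap, a processing-time trigger, and an event-time watermark fit together; the broker address, topic, column names, and window sizes are placeholder assumptions, not details from this thread:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder().appName("watermark-batch-size").getOrCreate()
    import spark.implicits._

    // Cap each micro-batch at a smaller number of offsets so batches finish
    // sooner and the global watermark can advance more frequently.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder
      .option("subscribe", "events")                        // placeholder topic
      .option("maxOffsetsPerTrigger", "1000000")            // smaller cap => smaller, more frequent batches
      .load()

    // Attach an event-time watermark; it is global and only moves at
    // micro-batch boundaries.
    val events = input
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")
      .withWatermark("timestamp", "10 minutes")

    val counts = events.groupBy(window($"timestamp", "5 minutes")).count()

    counts.writeStream
      .outputMode("append")
      .format("console")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()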

Question about 'maxOffsetsPerTrigger'

2020-06-30 Thread Eric Beabes
While running my Spark (stateful) Structured Streaming job I am setting the 'maxOffsetsPerTrigger' value to 10 million. I've noticed that messages are processed faster when I use a large value for this property. What I am also noticing is that until the batch is completely processed, no messages are …
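For context, a minimal sketch (Scala) of the kind of setup described above; the broker addresses, topic, and checkpoint path are placeholders I've assumed, not details from this thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-max-offsets").getOrCreate()

    // Kafka source with maxOffsetsPerTrigger capping each micro-batch at
    // roughly 10 million offsets across all subscribed partitions.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  // placeholder
      .option("subscribe", "input-topic")                               // placeholder
      .option("startingOffsets", "latest")
      .option("maxOffsetsPerTrigger", "10000000")
      .load()

    // Output for a micro-batch is emitted only after that batch completes,
    // which is why a very large cap delays visible results.
    stream.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/chk")  // placeholder path
      .start()
      .awaitTermination()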