We can certainly help you debug this further. A few questions:

1. Are the "suffering" containers processing messages at all?
(You can verify this from their metrics and logs, e.g. whether the
messages-behind-high-watermark gauges are moving.)

2. If you are indeed processing messages, is it possible that the impacted
containers are simply not able to keep up with a surge in data? You can try
re-partitioning your input topics and increasing the number of containers
(see the sketch after this list).

3. If you are not processing messages, could you share a stack trace from
one of the stuck containers (see the jstack sketch after this list)? That
would be super helpful for figuring out whether, and where, the containers
are stuck.
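
For (2), assuming your input topics live in Kafka and you are on the
ZooKeeper-based admin tooling, increasing the partition count looks roughly
like this (the topic name, partition count, and ZooKeeper address below are
placeholders for your setup):

    # Hypothetical topic/host names; adjust to your deployment.
    bin/kafka-topics.sh --zookeeper zk-host:2181 --alter \
        --topic my-input-topic --partitions 8

Keep in mind that adding partitions changes the key-to-partition mapping, so
if your join depends on the two streams being co-partitioned by key, you will
want to grow both input topics to the same partition count (and possibly
reprocess), then raise the container count in your job config to match.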
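
For (3), the quickest way to grab a stack trace is jstack against the
container JVM (the pid below is a placeholder; you can find the real one
with ps or via the YARN NodeManager UI):

    # Replace 12345 with the Samza container's JVM pid.
    jstack 12345 > samza-container-threads.txt

    # Alternatively, kill -3 makes the JVM print the thread dump to its
    # stdout log without stopping the process.
    kill -3 12345

A couple of dumps taken a few seconds apart will show whether the task
threads are blocked and, if so, on what.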

Thanks,
Jagadish


On Wed, Mar 8, 2017 at 1:05 PM, Ankit Malhotra <amalho...@appnexus.com>
wrote:

> Hi,
>
> While joining 2 streams across 2 partitions, we see that some containers
> start suffering: the lag (messages behind the high watermark) for one of
> the tasks starts skyrocketing while the other stays at ~0.
>
> We are using the default values for the buffer sizes and fetch threshold,
> 4 threads in the thread pool, and the default RoundRobinMessageChooser.
>
> Happy to share more details or config if that helps debug this further.
>
> Thanks
> Ankit


-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
