Replies inline.

--
Ankit

> On Mar 9, 2017, at 12:34 AM, Jagadish Venkatraman <jagadish1...@gmail.com> 
> wrote:
> 
> We can certainly help you debug this more. Some questions:
> 
> 1. Are you processing messages (at all) from the "suffering" containers?
> (You can verify that by observing metrics/ logging etc.)
We are definitely processing messages, but mainly from one of the 2 partitions 
that the container is reading from.

The overall number of process calls for task 1 is much higher than for task 2, 
and the difference is approximately the lag on the container, all of which 
comes from task 2.
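For what it's worth, below is roughly how we observe per-task counts with a 
custom counter (a minimal sketch against the Samza task API; the group and 
counter names are just illustrative):

    import org.apache.samza.config.Config;
    import org.apache.samza.metrics.Counter;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;

    public class CountingTask implements StreamTask, InitableTask {
        private Counter processed;

        @Override
        public void init(Config config, TaskContext context) {
            // One counter per task instance; the metrics reporters (JMX or the
            // snapshot reporter) expose it per task, which is how we compared
            // task 1 against task 2.
            processed = context.getMetricsRegistry().newCounter("app", "process-calls");
        }

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector, TaskCoordinator coordinator) {
            processed.inc();
            // ... actual processing ...
        }
    }
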
> 
> 2. If you are indeed processing messages, is it possible the impacted
> containers not able to keep up with the surge in the data? You can try
> re-partitioning your input topics (and increasing the number of containers)
We were trying the async event loop by having 2 tasks on the same container and 
multiple threads processing messages. The processing is a simple inner join with 
a get and a put against the RocksDB store.

We saw that both get and put latencies for the suffering task were higher than 
for the task that was chugging along.
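For context, the join is essentially the shape below (a minimal sketch; the 
store name and String serdes are placeholders for what the job actually 
configures):

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;

    public class InnerJoinTask implements StreamTask, InitableTask {
        private KeyValueStore<String, String> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(Config config, TaskContext context) {
            // "join-store" is a placeholder for the store configured under
            // stores.<name> in the job properties.
            store = (KeyValueStore<String, String>) context.getStore("join-store");
        }

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector, TaskCoordinator coordinator) {
            String key = (String) envelope.getKey();
            String value = (String) envelope.getMessage();

            // Inner join: one get to look up the other stream's row for this key.
            String other = store.get(key);
            if (other != null) {
                // ... emit the joined record via the collector (omitted) ...
            }
            // One put to remember this side's row for future matches.
            store.put(key, value);
        }
    }
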
> 
> 3. If you are not processing messages, maybe can you provide us with your
> stack trace? It will be super-helpful to find out if (or where) containers
> are stuck.
> 

Another thing to add: the amount of time we spend blocked is generally quite 
high, which we suspect is mainly because our processing is not fast enough. To 
add more color, our RocksDB store's average get latency across all tasks is 
around 20,000 ns, but the average put is 5x that or more. We have the object 
cache enabled.
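For reference, the store-level knobs involved look roughly like this (key names 
as documented for Samza's RocksDB store; the store name and sizes shown are 
illustrative, not our production values):

    # Hypothetical store name and illustrative sizes.
    stores.join-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
    stores.join-store.key.serde=string
    stores.join-store.msg.serde=string
    # Deserialized object cache in front of RocksDB (the "object cache" above).
    stores.join-store.object.cache.size=10000
    # RocksDB's own block cache and write buffer, per container.
    stores.join-store.container.cache.size.bytes=104857600
    stores.join-store.container.write.buffer.size.bytes=33554432
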

> Thanks,
> Jagadish
> 
> 
> On Wed, Mar 8, 2017 at 1:05 PM, Ankit Malhotra <amalho...@appnexus.com>
> wrote:
> 
>> Hi,
>> 
>> While joining 2 streams, each read from 2 partitions, we see that some
>> containers start suffering: the lag (messages behind the high watermark)
>> for one of the tasks starts skyrocketing while the other stays at ~0.
>> 
>> We are using default values for the buffer sizes and fetch threshold, 4
>> threads in the pool, and the default RoundRobinMessageChooser.
>> 
>> Happy to share more details/config if it can help to debug this further.
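
Concretely, these are the settings in question (a sketch using the documented 
config keys; "kafka" as the system name and the pool size reflect our setup, 
everything else is the default):

    # Consumer buffering: default fetch threshold per container.
    systems.kafka.samza.fetch.threshold=50000
    # Async event loop: 4 worker threads in the container pool.
    job.container.thread.pool.size=4
    # Default chooser: round robin across the input stream partitions.
    task.chooser.class=org.apache.samza.system.chooser.RoundRobinChooserFactory
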
>> 
>> Thanks
>> Ankit
>> 
> 
> 
> -- 
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
