Grid hang on compute

Dmitry Karachentsev Wed, 07 Dec 2016 05:36:21 -0800

Igniters!

I recently ran into a debatable issue that looks like a bug. The scenario is as follows:


1) Start two data nodes with some cache.

2) From one node, asynchronously submit a large number of jobs to the other. The jobs perform some cache operations.

3) The grid hangs almost immediately: all threads are sleeping except the public pool threads, which are waiting for responses.

This happens because all cache and job messages are queued in the communication layer, and the queue is limited to a default size (1024). It looks like the jobs are waiting for cache responses that cannot be received because of this limit. This is hard to diagnose and inconvenient (as far as I know, the docs state no limitation on using cache operations from compute jobs).

So, my question is: should we try to solve this, or is it perhaps enough to update the documentation with a recommendation to disable the queue limit for such cases?
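
For reference, the documentation workaround could look roughly like the sketch below. It assumes the limit in question is the one exposed by TcpCommunicationSpi.setMessageQueueLimit(int), where 0 removes the size limit; the class name NoQueueLimitNode is only illustrative, and the rest of the node configuration (caches, discovery) is left at defaults.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

// Hypothetical example class: starts a node with the communication
// message queue limit disabled.
public class NoQueueLimitNode {
    public static void main(String[] args) {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Assumption: this is the limit described above (default 1024).
        // A value of 0 disables the size limit on the message queue.
        commSpi.setMessageQueueLimit(0);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(commSpi);

        // Start the node with the adjusted communication SPI.
        Ignite ignite = Ignition.start(cfg);
    }
}
```

Note that removing the limit also removes the back-pressure it provides, so queue memory is no longer bounded under heavy load.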
