[ https://issues.apache.org/jira/browse/IGNITE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853652#comment-15853652 ]
Dmitry Karachentsev commented on IGNITE-4395:
---------------------------------------------

[review|http://reviews.ignite.apache.org/ignite/review/IGNT-CR-89]
[PR#1495|https://github.com/apache/ignite/pull/1495]

> Implement communication backpressure per policy - SYSTEM or PUBLIC
> ------------------------------------------------------------------
>
>                 Key: IGNITE-4395
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4395
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache, compute
>    Affects Versions: 1.7
>            Reporter: Dmitry Karachentsev
>            Assignee: Dmitry Karachentsev
>             Fix For: 1.9
>
>
> 1) Start two data nodes with some cache.
> 2) From one node, asynchronously submit a large number of jobs to the
> other. Those jobs perform some cache operations.
> 3) The grid hangs almost immediately: all threads are sleeping except the
> public ones, which are waiting for responses.
> This happens because all cache and job messages are queued in the same
> communication queue, which is limited by a default number (1024). It looks
> like the jobs are waiting for cache responses that cannot be received
> because of this limit.
> The proper solution here is to have communication backpressure per policy -
> SYSTEM or PUBLIC - rather than a single shared point as it is now. This
> could be achieved by having two queues per communication session or (which
> looks a bit easier to implement) by using separate connections.
> [PR#1331|https://github.com/apache/ignite/pull/1331] with a test that leads
> to the grid hang.
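
To illustrate the "two queues per communication session" idea from the description, here is a minimal sketch of per-policy backpressure. It is not the code from the PR or Ignite's actual implementation: the class `PerPolicyBackpressure` and its methods `tryEnqueue` and `onAcknowledged` are hypothetical names chosen for this example, and each policy is assumed to get its own limit (1024, the number mentioned in the issue) instead of sharing one.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of per-policy backpressure; not Ignite's real API. */
public class PerPolicyBackpressure {
    /** The two policies named in the issue. */
    public enum Policy { SYSTEM, PUBLIC }

    /** One independent budget of unacknowledged messages per policy. */
    private final Map<Policy, Semaphore> permits = new EnumMap<>(Policy.class);

    public PerPolicyBackpressure(int limitPerPolicy) {
        for (Policy p : Policy.values())
            permits.put(p, new Semaphore(limitPerPolicy));
    }

    /**
     * Tries to reserve a slot for one outgoing message.
     * Returns false when this policy's budget is exhausted,
     * which means the sender has to back off.
     */
    public boolean tryEnqueue(Policy policy) {
        return permits.get(policy).tryAcquire();
    }

    /** Frees a slot once the remote node has acknowledged a message. */
    public void onAcknowledged(Policy policy) {
        permits.get(policy).release();
    }

    public static void main(String[] args) {
        PerPolicyBackpressure bp = new PerPolicyBackpressure(1024);

        // Saturate the PUBLIC budget with job messages.
        for (int i = 0; i < 1024; i++)
            bp.tryEnqueue(Policy.PUBLIC);

        // PUBLIC is now throttled, but cache (SYSTEM) traffic still flows,
        // so jobs waiting for cache responses can make progress.
        System.out.println("PUBLIC accepts more: " + bp.tryEnqueue(Policy.PUBLIC));
        System.out.println("SYSTEM accepts more: " + bp.tryEnqueue(Policy.SYSTEM));
    }
}
```

With a single shared budget, the 1024 job messages would also block the cache responses those jobs are waiting for. Splitting the budget per policy is what breaks that dependency. The alternative suggested in the description, separate connections, would achieve the same isolation at the transport level.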