Alex, thanks for the review! Sure, this is just a local fix. Recently I've detected and fixed several issues in the TCP communication SPI that were caused by an invalidated cache context. In addition, Andrey Gura mentioned that he periodically reproduces hangs in cache get operations that are most likely caused by an invalidated cache context as well.
It seems it's time to fix the situation with the invalidated cache context globally. I'll create a JIRA task with extensive details in several days, when I return from a short vacation. Then someone from the community, or I, will have a chance to get their hands dirty with this :)

As for this deadlock, I'll merge the changes in any case, because we need them in the code to guard against other RuntimeExceptions that may be thrown for any other reason.

The threads that led to the deadlock were threads from the partition supply pool or some internal worker pool.

Regards,
Denis

On Aug 4, 2015, at 22:09, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

The change by itself looks right and can be merged; however, I do not think this is a complete fix. What kind of running threads were using the invalidated cache context? These threads may raise plenty of other exceptions if an invalid context is used. I think the proper solution should block a guard (I am sure we already have a guard that we can reuse) and wait for all threads to release this guard before cleaning up the context.

2015-08-04 8:28 GMT-07:00 Denis Magda <dma...@gridgain.com>:

Hi Alex, Igniters,

I've fixed a deadlock in GridDhtAtomicCache that was causing the "Cache Restart" test suite to hang frequently. In short, the deadlock happened because a cache was already stopped, but some running threads performing cache-related operations kept using the invalidated GridCacheContext.

All the details are described here: https://issues.apache.org/jira/browse/IGNITE-1189

Alex, as one of the earlier implementers of this code, please review the changes.

Regards,
Denis
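
For reference, the guard approach Alexey suggests above (block a guard, wait for all threads to release it, then clean up the context) could be sketched roughly as follows. This is only an illustration under assumptions: the class ContextGuard and its methods enter, leave and stop are hypothetical names, not existing Ignite code, and the sketch uses the JDK's ReentrantReadWriteLock rather than whatever guard the code base already has.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical guard for a cache context; not an actual Ignite class. */
public class ContextGuard {
    /** Operations hold the read lock; stop takes the write lock. */
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Set once the context has been cleaned up. Accessed only under the lock. */
    private boolean stopped;

    /**
     * Enters the guard before touching the context.
     *
     * @return false if the context is already stopped; the caller must then
     *         skip the operation and must not call leave().
     */
    public boolean enter() {
        lock.readLock().lock();

        if (stopped) {
            lock.readLock().unlock();

            return false;
        }

        return true;
    }

    /** Leaves the guard; call only after a successful enter(). */
    public void leave() {
        lock.readLock().unlock();
    }

    /**
     * Blocks until every thread that entered has left, then runs the cleanup
     * exactly once. Must not be called by a thread that is inside the guard,
     * because a read lock cannot be upgraded to a write lock.
     */
    public void stop(Runnable cleanup) {
        lock.writeLock().lock();

        try {
            if (stopped)
                return;

            stopped = true;

            cleanup.run();
        }
        finally {
            lock.writeLock().unlock();
        }
    }
}
```

A worker thread would wrap each cache operation as `if (guard.enter()) { try { ... } finally { guard.leave(); } }`, so that once stop has returned no thread can still be using the invalidated context.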