Denis, https://issues.apache.org/jira/browse/IGNITE-1135 can be related with this problem (linked with https://issues.apache.org/jira/browse/IGNITE-1221).
On Mon, Aug 10, 2015 at 3:16 PM, Denis Magda <dma...@gridgain.com> wrote: > Andrey Gura, > > Could you put the info on the errors you observed with cache read > operations in the ticket below? > > -- > Denis > > > On 8/10/2015 3:13 PM, Denis Magda wrote: > >> What do you mean under the cleanup on a higher level? >> >> Do you consider setting all cache context references to null when >> required letting a garbage collector to deallocate context's internals when >> it's time for that? >> >> In any case I've created a ticker where we can put all the useful >> thoughts/ideas that should help an implementor. >> https://issues.apache.org/jira/browse/IGNITE-1221 >> >> -- >> Denis >> >> On 8/5/2015 5:48 PM, Yakov Zhdanov wrote: >> >>> Guys, what about not invalidating cache contexts on stop? Let's cleanup >>> on >>> higher level. >>> >>> --Yakov >>> >>> 2015-08-04 22:48 GMT+03:00 Denis Magda <dma...@gridgain.com>: >>> >>> Alex, thanks for the review! >>>> >>>> Sure, this is just a local fix. >>>> Recently I've detected and fixed several issues in TCP communication SPI >>>> that happened because of invalidated cache context. In addition, Andrey >>>> Gura mentioned that periodically he reproduces hangs in cache get >>>> operations that most likely to happen because of invalidated cache >>>> context >>>> as well. >>>> >>>> Seems that it's time to fix the situation with invalidated cache context >>>> globally. I'll create a task in JIRA in several days when return from a >>>> short vacation putting extensive details. Then someone from the >>>> community >>>> or me will have a chance to makes his/her hands dirty with this :) >>>> >>>> As for this deadlock I'll merge that changes in any case because we >>>> need to >>>> have them in the code to omit other RuntimeExceptions that may happen >>>> because of any other reason. The threads that led to the deadlock were >>>> threads from partitions supply pool or some internal workers pool. >>>> >>>> Regards, >>>> Denis >>>> >>>> On 4 авг. 2015 г., at 22:09, Alexey Goncharuk < >>>> alexey.goncha...@gmail.com> >>>> wrote: >>>> >>>> The change by itself looks right and can be merged, however I do not >>>> think >>>> this is a complete fix. What kind of running threads were using >>>> invalidated >>>> cache context? These threads may raise plenty of other exceptions if >>>> invalid context is used. I think the proper solution should block a >>>> guard >>>> (I am sure we already have a guard that we can reuse) and wait for all >>>> threads to release this guard before cleaning up the context. >>>> >>>> 2015-08-04 8:28 GMT-07:00 Denis Magda <dma...@gridgain.com>: >>>> >>>> Hi Alex, Igniters, >>>> >>>> I've fixed a deadlock in GridDhtAtomicCache that was a reason of >>>> frequent >>>> hanging of "Cache Restart" test suite. >>>> >>>> In short, the deadlock happened because a cache was already stopped but >>>> some running threads, that perform cache related operations, keep using >>>> invalidated GridCacheContext. >>>> All the details are described here: >>>> https://issues.apache.org/jira/browse/IGNITE-1189 < >>>> https://issues.apache.org/jira/browse/IGNITE-1189> >>>> >>>> Alex, as one of earlier implementers of this code, please review the >>>> changes. >>>> >>>> Regards, >>>> Denis >>>> >>>> >> > -- Andrey Gura GridGain Systems, Inc. www.gridgain.com