Re: Fixed deadlock in GridDhtAtomicCache (Alex G. your review is needed)

Andrey Gura Tue, 11 Aug 2015 03:18:05 -0700

Denis,

https://issues.apache.org/jira/browse/IGNITE-1135 can be related with this
problem (linked with https://issues.apache.org/jira/browse/IGNITE-1221).



On Mon, Aug 10, 2015 at 3:16 PM, Denis Magda <[email protected]> wrote:

> Andrey Gura,
>
> Could you put the info on the errors you observed with cache read
> operations in the ticket below?
>
> --
> Denis
>
>
> On 8/10/2015 3:13 PM, Denis Magda wrote:
>
>> What do you mean under the cleanup on a higher level?
>>
>> Do you consider setting all cache context references to null when
>> required letting a garbage collector to deallocate context's internals when
>> it's time for that?
>>
>> In any case I've created a ticker where we can put all the useful
>> thoughts/ideas that should help an implementor.
>> https://issues.apache.org/jira/browse/IGNITE-1221
>>
>> --
>> Denis
>>
>> On 8/5/2015 5:48 PM, Yakov Zhdanov wrote:
>>
>>> Guys, what about not invalidating cache contexts on stop? Let's cleanup
>>> on
>>> higher level.
>>>
>>> --Yakov
>>>
>>> 2015-08-04 22:48 GMT+03:00 Denis Magda <[email protected]>:
>>>
>>> Alex, thanks for the review!
>>>>
>>>> Sure, this is just a local fix.
>>>> Recently I've detected and fixed several issues in TCP communication SPI
>>>> that happened because of invalidated cache context. In addition, Andrey
>>>> Gura mentioned that periodically he reproduces hangs in cache get
>>>> operations that most likely to happen because of invalidated cache
>>>> context
>>>> as well.
>>>>
>>>> Seems that it's time to fix the situation with invalidated cache context
>>>> globally. I'll create a task in JIRA in several days when return from a
>>>> short vacation putting extensive details. Then someone from the
>>>> community
>>>> or me will have a chance to makes his/her hands dirty with this :)
>>>>
>>>> As for this deadlock I'll merge that changes in any case because we
>>>> need to
>>>> have them in the code to omit other RuntimeExceptions that may happen
>>>> because of any other reason. The threads that led to the deadlock were
>>>> threads from partitions supply pool or some internal workers pool.
>>>>
>>>> Regards,
>>>> Denis
>>>>
>>>> On 4 авг. 2015 г., at 22:09, Alexey Goncharuk <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>> The change by itself looks right and can be merged, however I do not
>>>> think
>>>> this is a complete fix. What kind of running threads were using
>>>> invalidated
>>>> cache context? These threads may raise plenty of other exceptions if
>>>> invalid context is used. I think the proper solution should block a
>>>> guard
>>>> (I am sure we already have a guard that we can reuse) and wait for all
>>>> threads to release this guard before cleaning up the context.
>>>>
>>>> 2015-08-04 8:28 GMT-07:00 Denis Magda <[email protected]>:
>>>>
>>>> Hi Alex, Igniters,
>>>>
>>>> I've fixed a deadlock in GridDhtAtomicCache that was a reason of
>>>> frequent
>>>> hanging of "Cache Restart" test suite.
>>>>
>>>> In short, the deadlock happened because a cache was already stopped but
>>>> some running threads, that perform cache related operations, keep using
>>>> invalidated GridCacheContext.
>>>> All the details are described here:
>>>> https://issues.apache.org/jira/browse/IGNITE-1189 <
>>>> https://issues.apache.org/jira/browse/IGNITE-1189>
>>>>
>>>> Alex, as one of earlier implementers of this code, please review the
>>>> changes.
>>>>
>>>> Regards,
>>>> Denis
>>>>
>>>>
>>
>


-- 
Andrey Gura
GridGain Systems, Inc.
www.gridgain.com

Re: Fixed deadlock in GridDhtAtomicCache (Alex G. your review is needed)

Reply via email to