Looks ok to me.

/Staffan

On 3 jun 2014, at 15:49, Aleksej Efimov <aleksej.efi...@oracle.com> wrote:

> Staffan, David,
> 
> Returning back to our WaitForMultipleObjects()/WAIT_ABANDONED discussions:
> Thank you for your comments and remarks. I can't disagree with motivation 
> that it's better to have a fatal error during the incorrect mutex handling 
> then data corruption (the consequence of the previous fix). In case of such 
> error it'll be much more easier to debug/catch it (but as Staffan said - we 
> have tried to check all call paths and don't think that we'll encounter such 
> behavior).
> I have modified the discussed code according to your suggestions: 
> http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.01/
> To abort the process the 'exitTransportWithError' function was utilized.
> Also I have tried to check that behavior isn't changed by running "svc" 
> regression tests set. There was no related test failures observed.
> 
> Best Regards,
> Aleksej
> 
> On 05/15/2014 01:11 PM, Staffan Larsen wrote:
>> On 15 maj 2014, at 03:48, David Holmes <david.hol...@oracle.com> wrote:
>> 
>>> On 14/05/2014 11:18 PM, Aleksej Efimov wrote:
>>>> David, Vitaly,
>>>> 
>>>> I totally agree with Vitaly's explanation (Vitaly - thank you for that)
>>>> and code in shmemBase.c (the usage of enterMutex() function for
>>>> sending/receiving bytes through shared memory connection) illustrates on
>>>> how the connection shutdown event is used as a "cancellation token".
>>> Thanks for clarifying that.
>>> 
>>> So if we were to encounter an abandoned mutex the code would presently have 
>>> acquired the mutex but return an error, thus preventing a subsequent 
>>> release, and preventing other threads from acquiring (but allowing current 
>>> thread to recurisevely acquire. So this could both hang and cause data 
>>> corruption.
>>> 
>>> The new code will still return an error but release the mutex. So no more 
>>> hangs (other than by conditions caused by data corruption) but more 
>>> opportunity for data corruption.
>>> 
>>> Obviously it depends on exactly what happens in the critical sections 
>>> guarded by this mutex, but in general I don't agree with this rationale for 
>>> making the change:
>>> 
>>> 204     /* If the mutex is abandoned we want to return an error
>>> 205      * and also release the mutex so that we don't cause
>>> 206      * other threads to be blocked. If a mutex was abandoned
>>> 207      * we are in scary state. Data may be corrupted or inconsistent
>>> 208      * but it is still better to let other threads run (and possibly
>>> 209      * crash) than having them blocked (on the premise that a crash
>>> 210      * is always easier to debug than a hang).
>>> 
>>> Considering something has to have gone drastically wrong for the mutex to 
>>> become abandoned, I'm more inclined to consider this a fatal error and 
>>> abort.
>>> 
>>> But I'll let the serviceability folk chime in here.
>> I was involved in fixing this and writing the comment, so obviously I 
>> thought it a good solution :-)
>> 
>> I do agree that it would probably be a good idea to consider this a fatal 
>> error and abort. At that point in the code we don’t have any really nice 
>> ways of doing that, though. We could just print an error and call abort(). 
>> What we are doing now is returning an error from sysIPMutexEnter() and 
>> delegating the error handling to the caller. We have tried to check all call 
>> paths to verify that they do “the right thing” in the face of an error. It 
>> is obviously hard to verify, but it looks like they all terminate the 
>> connection with some kind of error message.
>> 
>> /Staffan
>> 
>> 
>>> Thanks,
>>> David
>>> 
>>> 
>>>> Thank you,
>>>> -Aleksej
>>>> 
>>>> 
>>>> On 05/14/2014 01:05 PM, David Holmes wrote:
>>>>> On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:
>>>>>> In windows, you acquire a mutex by waiting on it using one of the wait
>>>>>> functions, one of them employed in the code in question.  If
>>>>>> WaitForMultipleObjects succeeds and returns the index of the mutex,
>>>>>> current thread has ownership now.
>>>>> Yes I understand the basic mechanics :)
>>>>> 
>>>>>> It's also common to use multi wait functions where the event is a
>>>>>> "cancelation token", e.g. manual reset event; this allows someone to
>>>>>> cancel waiting on mutex acquisition and return from the wait function.
>>>>>> Presumably that's the case here, but I'll let Aleksej confirm; just
>>>>>> wanted to throw this out there in the meantime :).
>>>>> Ah I see - yes cancellable lock acquisition would make sense.
>>>>> 
>>>>> Thanks,
>>>>> David
>>>>> 
>>>>>> Sent from my phone
>>>>>> 
>>>>>> On May 13, 2014 6:46 PM, "David Holmes" <david.hol...@oracle.com
>>>>>> <mailto:david.hol...@oracle.com>> wrote:
>>>>>> 
>>>>>>    Hi Aleksej,
>>>>>> 
>>>>>>    Thanks for the doc references regarding abandonment.
>>>>>> 
>>>>>>    Let me rephrase my question. What is this logic trying to achieve by
>>>>>>    waiting on both a mutex and an event? Do we already own the mutex
>>>>>>    when this function is called?
>>>>>> 
>>>>>>    David
>>>>>> 
>>>>>>    On 13/05/2014 11:19 PM, Aleksej Efimov wrote:
>>>>>> 
>>>>>>        David,
>>>>>> 
>>>>>>        The Windows has a different terminology for mutex objects (much
>>>>>>        differs
>>>>>>        from the POSIX one). This one link gave me some understanding of
>>>>>>        it [1].
>>>>>> 
>>>>>>        Here is the MSDN [1] description of what "abandoned mutex" is:
>>>>>>        " If a thread terminates without releasing its ownership of a
>>>>>> mutex
>>>>>>        object, the mutex object is considered to be abandoned. A
>>>>>>        waiting thread
>>>>>>        can acquire ownership of an abandoned mutex object, but the wait
>>>>>>        function will return*WAIT_ABANDONED*to indicate that the mutex
>>>>>>        object is
>>>>>>        abandoned. An abandoned mutex object indicates that an error has
>>>>>>        occurred and that any shared resource being protected by the
>>>>>> mutex
>>>>>>        object is in an undefined state. If the thread proceeds as
>>>>>>        though the
>>>>>>        mutex object had not been abandoned, it is no longer considered
>>>>>>        abandoned after the thread releases its ownership. This restores
>>>>>>        normal
>>>>>>        behavior if a handle to the mutex object is subsequently
>>>>>>        specified in a
>>>>>>        wait function."
>>>>>> 
>>>>>> 
>>>>>>        What does it mean to wait on mutex and ownership of the mutex
>>>>>>        object:
>>>>>>        "Any thread with a handle to a mutex object can use one of
>>>>>> thewait
>>>>>>        functions
>>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx
>>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>to
>>>>>>        request ownership of the mutex object. If the mutex object is
>>>>>>        owned by
>>>>>>        another thread, the wait function blocks the requesting thread
>>>>>>        until the
>>>>>>        owning thread releases the mutex object using the*ReleaseMutex*
>>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx
>>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__function."
>>>>>> 
>>>>>>        How we can release mutex and wait on already owned mutex:
>>>>>>        " After a thread obtains ownership of a mutex, it can specify
>>>>>>        the same
>>>>>>        mutex in repeated calls to the wait-functions
>>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx
>>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>__without
>>>>>>        blocking its execution. This prevents a thread from deadlocking
>>>>>>        itself
>>>>>>        while waiting for a mutex that it already owns. To release its
>>>>>>        ownership
>>>>>>        under such circumstances, the thread must call*ReleaseMutex*
>>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx
>>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__once
>>>>>>        for each time that the mutex satisfied the conditions of a wait
>>>>>>        function."
>>>>>> 
>>>>>>        [1]
>>>>>> http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms684266(v=vs.85).aspx
>>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms684266(v=vs.85).aspx>
>>>>>> 
>>>>>>        -Aleksej
>>>>>> 
>>>>>>        On 05/13/2014 04:00 PM, David Holmes wrote:
>>>>>> 
>>>>>>            I don't understand this one at all. What is an "abandoned
>>>>>>            mutex"? For
>>>>>>            that matter why does the code wait on a mutex and an event?
>>>>>>            Do we
>>>>>>            already own the mutex? If so what does it mean to wait on
>>>>>>            it? If not
>>>>>>            then how can we release it?
>>>>>> 
>>>>>>            ???
>>>>>> 
>>>>>>            Thanks,
>>>>>>            David
>>>>>> 
>>>>>> 
>>>>>>            On 13/05/2014 8:57 PM, Alan Bateman wrote:
>>>>>> 
>>>>>> 
>>>>>>                This is debugger's shared memory transport so cc'ing
>>>>>>                serviceability-dev
>>>>>>                as that is there this code is maintained.
>>>>>> 
>>>>>>                Is there a test case or any outline of the conditions
>>>>>>                that cause this? I
>>>>>>                think that would be useful to understand the issue
>>>>>> further.
>>>>>> 
>>>>>>                -Alan
>>>>>> 
>>>>>>                On 13/05/2014 11:46, Aleksej Efimov wrote:
>>>>>> 
>>>>>>                    Hi,
>>>>>> 
>>>>>>                    Can I have a review for 8032901 bug [1] fix [2].
>>>>>>                    There is a possible
>>>>>>                    case when 'WaitForMultipleObjects' function can
>>>>>>                    return the
>>>>>>                    WAIT_ABANDONED_0 [3] error value.
>>>>>>                    In such case it's better to release the mutex and
>>>>>>                    return error value.
>>>>>>                    This will prevent other threads to be blocked on
>>>>>>                    abandoned mutex.
>>>>>> 
>>>>>>                    Thank you,
>>>>>>                    Aleksej
>>>>>> 
>>>>>>                    [1]
>>>>>> https://bugs.openjdk.java.net/__browse/JDK-8032901
>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8032901>
>>>>>>                    [2]
>>>>>> http://cr.openjdk.java.net/~__aefimov/8032901/9/webrev.00/
>>>>>> <http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/>
>>>>>>                    [3]
>>>>>> http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687025(v=vs.85).aspx
>>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
> 

Reply via email to