Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-13 Thread Alan Bateman


This is debugger's shared memory transport so cc'ing serviceability-dev 
as that is there this code is maintained.


Is there a test case or any outline of the conditions that cause this? I 
think that would be useful to understand the issue further.


-Alan

On 13/05/2014 11:46, Aleksej Efimov wrote:

Hi,

Can I have a review for 8032901 bug [1] fix [2]. There is a possible 
case when 'WaitForMultipleObjects' function can return the 
WAIT_ABANDONED_0 [3] error value.
In such case it's better to release the mutex and return error value. 
This will prevent other threads to be blocked on abandoned mutex.


Thank you,
Aleksej

[1] https://bugs.openjdk.java.net/browse/JDK-8032901
[2] http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/
[3] 
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx 





Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-13 Thread Aleksej Efimov

Alan,
There is no test case for this issue and also the attempt to outline the 
conditions ended with no result. The report is based only on 
'WaitForMultipleObjects' code analysis and this fix is an attempt to 
make the code looks a little more correct.


-Aleksej

On 05/13/2014 02:57 PM, Alan Bateman wrote:


This is debugger's shared memory transport so cc'ing 
serviceability-dev as that is there this code is maintained.


Is there a test case or any outline of the conditions that cause this? 
I think that would be useful to understand the issue further.


-Alan

On 13/05/2014 11:46, Aleksej Efimov wrote:

Hi,

Can I have a review for 8032901 bug [1] fix [2]. There is a possible 
case when 'WaitForMultipleObjects' function can return the 
WAIT_ABANDONED_0 [3] error value.
In such case it's better to release the mutex and return error value. 
This will prevent other threads to be blocked on abandoned mutex.


Thank you,
Aleksej

[1] https://bugs.openjdk.java.net/browse/JDK-8032901
[2] http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/
[3] 
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx 







Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-13 Thread David Holmes
I don't understand this one at all. What is an "abandoned mutex"? For 
that matter why does the code wait on a mutex and an event? Do we 
already own the mutex? If so what does it mean to wait on it? If not 
then how can we release it?


???

Thanks,
David


On 13/05/2014 8:57 PM, Alan Bateman wrote:


This is debugger's shared memory transport so cc'ing serviceability-dev
as that is there this code is maintained.

Is there a test case or any outline of the conditions that cause this? I
think that would be useful to understand the issue further.

-Alan

On 13/05/2014 11:46, Aleksej Efimov wrote:

Hi,

Can I have a review for 8032901 bug [1] fix [2]. There is a possible
case when 'WaitForMultipleObjects' function can return the
WAIT_ABANDONED_0 [3] error value.
In such case it's better to release the mutex and return error value.
This will prevent other threads to be blocked on abandoned mutex.

Thank you,
Aleksej

[1] https://bugs.openjdk.java.net/browse/JDK-8032901
[2] http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/
[3]
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx





Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-13 Thread Aleksej Efimov

David,

The Windows has a different terminology for mutex objects (much differs 
from the POSIX one). This one link gave me some understanding of it [1].


Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a mutex 
object, the mutex object is considered to be abandoned. A waiting thread 
can acquire ownership of an abandoned mutex object, but the wait 
function will return*WAIT_ABANDONED*to indicate that the mutex object is 
abandoned. An abandoned mutex object indicates that an error has 
occurred and that any shared resource being protected by the mutex 
object is in an undefined state. If the thread proceeds as though the 
mutex object had not been abandoned, it is no longer considered 
abandoned after the thread releases its ownership. This restores normal 
behavior if a handle to the mutex object is subsequently specified in a 
wait function."



What does it mean to wait on mutex and ownership of the mutex object:
"Any thread with a handle to a mutex object can use one of thewait 
functions 
to 
request ownership of the mutex object. If the mutex object is owned by 
another thread, the wait function blocks the requesting thread until the 
owning thread releases the mutex object using the*ReleaseMutex* 
function."


How we can release mutex and wait on already owned mutex:
" After a thread obtains ownership of a mutex, it can specify the same 
mutex in repeated calls to the wait-functions 
without 
blocking its execution. This prevents a thread from deadlocking itself 
while waiting for a mutex that it already owns. To release its ownership 
under such circumstances, the thread must call*ReleaseMutex* 
once 
for each time that the mutex satisfied the conditions of a wait function."


[1] 
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms684266(v=vs.85).aspx


-Aleksej

On 05/13/2014 04:00 PM, David Holmes wrote:
I don't understand this one at all. What is an "abandoned mutex"? For 
that matter why does the code wait on a mutex and an event? Do we 
already own the mutex? If so what does it mean to wait on it? If not 
then how can we release it?


???

Thanks,
David


On 13/05/2014 8:57 PM, Alan Bateman wrote:


This is debugger's shared memory transport so cc'ing serviceability-dev
as that is there this code is maintained.

Is there a test case or any outline of the conditions that cause this? I
think that would be useful to understand the issue further.

-Alan

On 13/05/2014 11:46, Aleksej Efimov wrote:

Hi,

Can I have a review for 8032901 bug [1] fix [2]. There is a possible
case when 'WaitForMultipleObjects' function can return the
WAIT_ABANDONED_0 [3] error value.
In such case it's better to release the mutex and return error value.
This will prevent other threads to be blocked on abandoned mutex.

Thank you,
Aleksej

[1] https://bugs.openjdk.java.net/browse/JDK-8032901
[2] http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/
[3]
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx 









Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-13 Thread David Holmes

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to achieve by 
waiting on both a mutex and an event? Do we already own the mutex when 
this function is called?


David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex objects (much differs
from the POSIX one). This one link gave me some understanding of it [1].

Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a mutex
object, the mutex object is considered to be abandoned. A waiting thread
can acquire ownership of an abandoned mutex object, but the wait
function will return*WAIT_ABANDONED*to indicate that the mutex object is
abandoned. An abandoned mutex object indicates that an error has
occurred and that any shared resource being protected by the mutex
object is in an undefined state. If the thread proceeds as though the
mutex object had not been abandoned, it is no longer considered
abandoned after the thread releases its ownership. This restores normal
behavior if a handle to the mutex object is subsequently specified in a
wait function."


What does it mean to wait on mutex and ownership of the mutex object:
"Any thread with a handle to a mutex object can use one of thewait
functions
to
request ownership of the mutex object. If the mutex object is owned by
another thread, the wait function blocks the requesting thread until the
owning thread releases the mutex object using the*ReleaseMutex*
function."

How we can release mutex and wait on already owned mutex:
" After a thread obtains ownership of a mutex, it can specify the same
mutex in repeated calls to the wait-functions
without
blocking its execution. This prevents a thread from deadlocking itself
while waiting for a mutex that it already owns. To release its ownership
under such circumstances, the thread must call*ReleaseMutex*
once
for each time that the mutex satisfied the conditions of a wait function."

[1]
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms684266(v=vs.85).aspx

-Aleksej

On 05/13/2014 04:00 PM, David Holmes wrote:

I don't understand this one at all. What is an "abandoned mutex"? For
that matter why does the code wait on a mutex and an event? Do we
already own the mutex? If so what does it mean to wait on it? If not
then how can we release it?

???

Thanks,
David


On 13/05/2014 8:57 PM, Alan Bateman wrote:


This is debugger's shared memory transport so cc'ing serviceability-dev
as that is there this code is maintained.

Is there a test case or any outline of the conditions that cause this? I
think that would be useful to understand the issue further.

-Alan

On 13/05/2014 11:46, Aleksej Efimov wrote:

Hi,

Can I have a review for 8032901 bug [1] fix [2]. There is a possible
case when 'WaitForMultipleObjects' function can return the
WAIT_ABANDONED_0 [3] error value.
In such case it's better to release the mutex and return error value.
This will prevent other threads to be blocked on abandoned mutex.

Thank you,
Aleksej

[1] https://bugs.openjdk.java.net/browse/JDK-8032901
[2] http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/
[3]
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx








Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-14 Thread David Holmes

On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:

In windows, you acquire a mutex by waiting on it using one of the wait
functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the mutex,
current thread has ownership now.


Yes I understand the basic mechanics :)


It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows someone to
cancel waiting on mutex acquisition and return from the wait function.
Presumably that's the case here, but I'll let Aleksej confirm; just
wanted to throw this out there in the meantime :).


Ah I see - yes cancellable lock acquisition would make sense.

Thanks,
David


Sent from my phone

On May 13, 2014 6:46 PM, "David Holmes" mailto:david.hol...@oracle.com>> wrote:

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to achieve by
waiting on both a mutex and an event? Do we already own the mutex
when this function is called?

David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex objects (much
differs
from the POSIX one). This one link gave me some understanding of
it [1].

Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a mutex
object, the mutex object is considered to be abandoned. A
waiting thread
can acquire ownership of an abandoned mutex object, but the wait
function will return*WAIT_ABANDONED*to indicate that the mutex
object is
abandoned. An abandoned mutex object indicates that an error has
occurred and that any shared resource being protected by the mutex
object is in an undefined state. If the thread proceeds as
though the
mutex object had not been abandoned, it is no longer considered
abandoned after the thread releases its ownership. This restores
normal
behavior if a handle to the mutex object is subsequently
specified in a
wait function."


What does it mean to wait on mutex and ownership of the mutex
object:
"Any thread with a handle to a mutex object can use one of thewait
functions

>to
request ownership of the mutex object. If the mutex object is
owned by
another thread, the wait function blocks the requesting thread
until the
owning thread releases the mutex object using the*ReleaseMutex*

>__function."

How we can release mutex and wait on already owned mutex:
" After a thread obtains ownership of a mutex, it can specify
the same
mutex in repeated calls to the wait-functions

>__without
blocking its execution. This prevents a thread from deadlocking
itself
while waiting for a mutex that it already owns. To release its
ownership
under such circumstances, the thread must call*ReleaseMutex*

>__once
for each time that the mutex satisfied the conditions of a wait
function."

[1]

http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms684266(v=vs.85).aspx



-Aleksej

On 05/13/2014 04:00 PM, David Holmes wrote:

I don't understand this one at all. What is an "abandoned
mutex"? For
that matter why does the code wait on a mutex and an event?
Do we
already own the mutex? If so what does it mean to wait on
it? If not
then how can we release it?

???

Thanks,
David


On 13/05/2014 8:57 PM, Alan Bateman wrote:


This is debugger's shared memory transport so cc'ing
serviceability-dev
as that is there this code is maintained.

Is there a test case or any outline of the conditions
that cause this? I
 

Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-14 Thread Aleksej Efimov

David, Vitaly,

I totally agree with Vitaly's explanation (Vitaly - thank you for that) 
and code in shmemBase.c (the usage of enterMutex() function for 
sending/receiving bytes through shared memory connection) illustrates on 
how the connection shutdown event is used as a "cancellation token".


Thank you,
-Aleksej


On 05/14/2014 01:05 PM, David Holmes wrote:

On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:

In windows, you acquire a mutex by waiting on it using one of the wait
functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the mutex,
current thread has ownership now.


Yes I understand the basic mechanics :)


It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows someone to
cancel waiting on mutex acquisition and return from the wait function.
Presumably that's the case here, but I'll let Aleksej confirm; just
wanted to throw this out there in the meantime :).


Ah I see - yes cancellable lock acquisition would make sense.

Thanks,
David


Sent from my phone

On May 13, 2014 6:46 PM, "David Holmes" mailto:david.hol...@oracle.com>> wrote:

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to achieve by
waiting on both a mutex and an event? Do we already own the mutex
when this function is called?

David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex objects (much
differs
from the POSIX one). This one link gave me some understanding of
it [1].

Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a 
mutex

object, the mutex object is considered to be abandoned. A
waiting thread
can acquire ownership of an abandoned mutex object, but the wait
function will return*WAIT_ABANDONED*to indicate that the mutex
object is
abandoned. An abandoned mutex object indicates that an error has
occurred and that any shared resource being protected by the 
mutex

object is in an undefined state. If the thread proceeds as
though the
mutex object had not been abandoned, it is no longer considered
abandoned after the thread releases its ownership. This restores
normal
behavior if a handle to the mutex object is subsequently
specified in a
wait function."


What does it mean to wait on mutex and ownership of the mutex
object:
"Any thread with a handle to a mutex object can use one of 
thewait

functions
>to
request ownership of the mutex object. If the mutex object is
owned by
another thread, the wait function blocks the requesting thread
until the
owning thread releases the mutex object using the*ReleaseMutex*
>__function."

How we can release mutex and wait on already owned mutex:
" After a thread obtains ownership of a mutex, it can specify
the same
mutex in repeated calls to the wait-functions
>__without
blocking its execution. This prevents a thread from deadlocking
itself
while waiting for a mutex that it already owns. To release its
ownership
under such circumstances, the thread must call*ReleaseMutex*
>__once
for each time that the mutex satisfied the conditions of a wait
function."

[1]
http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms684266(v=vs.85).aspx


-Aleksej

On 05/13/2014 04:00 PM, David Holmes wrote:

I don't understand this one at all. What is an "abandoned
mutex"? For
that matter why does the code wait on a mutex and an event?
Do we
already own the mutex? If so what does it mean to wait on
it? If not
then how can we release it?

???

Thanks,
David


On 13/05/2014 8:57 PM, Alan Bateman wrote:


Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-14 Thread David Holmes

On 14/05/2014 11:18 PM, Aleksej Efimov wrote:

David, Vitaly,

I totally agree with Vitaly's explanation (Vitaly - thank you for that)
and code in shmemBase.c (the usage of enterMutex() function for
sending/receiving bytes through shared memory connection) illustrates on
how the connection shutdown event is used as a "cancellation token".


Thanks for clarifying that.

So if we were to encounter an abandoned mutex the code would presently 
have acquired the mutex but return an error, thus preventing a 
subsequent release, and preventing other threads from acquiring (but 
allowing current thread to recurisevely acquire. So this could both hang 
and cause data corruption.


The new code will still return an error but release the mutex. So no 
more hangs (other than by conditions caused by data corruption) but more 
opportunity for data corruption.


Obviously it depends on exactly what happens in the critical sections 
guarded by this mutex, but in general I don't agree with this rationale 
for making the change:


 204 /* If the mutex is abandoned we want to return an error
 205  * and also release the mutex so that we don't cause
 206  * other threads to be blocked. If a mutex was abandoned
 207  * we are in scary state. Data may be corrupted or inconsistent
 208  * but it is still better to let other threads run (and possibly
 209  * crash) than having them blocked (on the premise that a crash
 210  * is always easier to debug than a hang).

Considering something has to have gone drastically wrong for the mutex 
to become abandoned, I'm more inclined to consider this a fatal error 
and abort.


But I'll let the serviceability folk chime in here.

Thanks,
David



Thank you,
-Aleksej


On 05/14/2014 01:05 PM, David Holmes wrote:

On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:

In windows, you acquire a mutex by waiting on it using one of the wait
functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the mutex,
current thread has ownership now.


Yes I understand the basic mechanics :)


It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows someone to
cancel waiting on mutex acquisition and return from the wait function.
Presumably that's the case here, but I'll let Aleksej confirm; just
wanted to throw this out there in the meantime :).


Ah I see - yes cancellable lock acquisition would make sense.

Thanks,
David


Sent from my phone

On May 13, 2014 6:46 PM, "David Holmes" mailto:david.hol...@oracle.com>> wrote:

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to achieve by
waiting on both a mutex and an event? Do we already own the mutex
when this function is called?

David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex objects (much
differs
from the POSIX one). This one link gave me some understanding of
it [1].

Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a
mutex
object, the mutex object is considered to be abandoned. A
waiting thread
can acquire ownership of an abandoned mutex object, but the wait
function will return*WAIT_ABANDONED*to indicate that the mutex
object is
abandoned. An abandoned mutex object indicates that an error has
occurred and that any shared resource being protected by the
mutex
object is in an undefined state. If the thread proceeds as
though the
mutex object had not been abandoned, it is no longer considered
abandoned after the thread releases its ownership. This restores
normal
behavior if a handle to the mutex object is subsequently
specified in a
wait function."


What does it mean to wait on mutex and ownership of the mutex
object:
"Any thread with a handle to a mutex object can use one of
thewait
functions
>to
request ownership of the mutex object. If the mutex object is
owned by
another thread, the wait function blocks the requesting thread
until the
owning thread releases the mutex object using the*ReleaseMutex*
>__function."

How we can release mutex and wait on already owned mutex:
" After a thread obtains ownership of a mutex, it can specify
the same
mutex in 

Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-14 Thread Vitaly Davidovich
In windows, you acquire a mutex by waiting on it using one of the wait
functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the mutex, current
thread has ownership now.

It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows someone to cancel
waiting on mutex acquisition and return from the wait function.  Presumably
that's the case here, but I'll let Aleksej confirm; just wanted to throw
this out there in the meantime :).

Sent from my phone
On May 13, 2014 6:46 PM, "David Holmes"  wrote:

> Hi Aleksej,
>
> Thanks for the doc references regarding abandonment.
>
> Let me rephrase my question. What is this logic trying to achieve by
> waiting on both a mutex and an event? Do we already own the mutex when this
> function is called?
>
> David
>
> On 13/05/2014 11:19 PM, Aleksej Efimov wrote:
>
>> David,
>>
>> The Windows has a different terminology for mutex objects (much differs
>> from the POSIX one). This one link gave me some understanding of it [1].
>>
>> Here is the MSDN [1] description of what "abandoned mutex" is:
>> " If a thread terminates without releasing its ownership of a mutex
>> object, the mutex object is considered to be abandoned. A waiting thread
>> can acquire ownership of an abandoned mutex object, but the wait
>> function will return*WAIT_ABANDONED*to indicate that the mutex object is
>> abandoned. An abandoned mutex object indicates that an error has
>> occurred and that any shared resource being protected by the mutex
>> object is in an undefined state. If the thread proceeds as though the
>> mutex object had not been abandoned, it is no longer considered
>> abandoned after the thread releases its ownership. This restores normal
>> behavior if a handle to the mutex object is subsequently specified in a
>> wait function."
>>
>>
>> What does it mean to wait on mutex and ownership of the mutex object:
>> "Any thread with a handle to a mutex object can use one of thewait
>> functions
>> > ms687069%28v=vs.85%29.aspx>to
>> request ownership of the mutex object. If the mutex object is owned by
>> another thread, the wait function blocks the requesting thread until the
>> owning thread releases the mutex object using the*ReleaseMutex*
>> > ms685066%28v=vs.85%29.aspx>function."
>>
>> How we can release mutex and wait on already owned mutex:
>> " After a thread obtains ownership of a mutex, it can specify the same
>> mutex in repeated calls to the wait-functions
>> > ms687069%28v=vs.85%29.aspx>without
>> blocking its execution. This prevents a thread from deadlocking itself
>> while waiting for a mutex that it already owns. To release its ownership
>> under such circumstances, the thread must call*ReleaseMutex*
>> > ms685066%28v=vs.85%29.aspx>once
>> for each time that the mutex satisfied the conditions of a wait function."
>>
>> [1]
>> http://msdn.microsoft.com/en-gb/library/windows/desktop/
>> ms684266(v=vs.85).aspx
>>
>> -Aleksej
>>
>> On 05/13/2014 04:00 PM, David Holmes wrote:
>>
>>> I don't understand this one at all. What is an "abandoned mutex"? For
>>> that matter why does the code wait on a mutex and an event? Do we
>>> already own the mutex? If so what does it mean to wait on it? If not
>>> then how can we release it?
>>>
>>> ???
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 13/05/2014 8:57 PM, Alan Bateman wrote:
>>>

 This is debugger's shared memory transport so cc'ing serviceability-dev
 as that is there this code is maintained.

 Is there a test case or any outline of the conditions that cause this? I
 think that would be useful to understand the issue further.

 -Alan

 On 13/05/2014 11:46, Aleksej Efimov wrote:

> Hi,
>
> Can I have a review for 8032901 bug [1] fix [2]. There is a possible
> case when 'WaitForMultipleObjects' function can return the
> WAIT_ABANDONED_0 [3] error value.
> In such case it's better to release the mutex and return error value.
> This will prevent other threads to be blocked on abandoned mutex.
>
> Thank you,
> Aleksej
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8032901
> [2] http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/
> [3]
> http://msdn.microsoft.com/en-gb/library/windows/desktop/
> ms687025(v=vs.85).aspx
>
>
>

>>


Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-05-15 Thread Staffan Larsen

On 15 maj 2014, at 03:48, David Holmes  wrote:

> On 14/05/2014 11:18 PM, Aleksej Efimov wrote:
>> David, Vitaly,
>> 
>> I totally agree with Vitaly's explanation (Vitaly - thank you for that)
>> and code in shmemBase.c (the usage of enterMutex() function for
>> sending/receiving bytes through shared memory connection) illustrates on
>> how the connection shutdown event is used as a "cancellation token".
> 
> Thanks for clarifying that.
> 
> So if we were to encounter an abandoned mutex the code would presently have 
> acquired the mutex but return an error, thus preventing a subsequent release, 
> and preventing other threads from acquiring (but allowing current thread to 
> recurisevely acquire. So this could both hang and cause data corruption.
> 
> The new code will still return an error but release the mutex. So no more 
> hangs (other than by conditions caused by data corruption) but more 
> opportunity for data corruption.
> 
> Obviously it depends on exactly what happens in the critical sections guarded 
> by this mutex, but in general I don't agree with this rationale for making 
> the change:
> 
> 204 /* If the mutex is abandoned we want to return an error
> 205  * and also release the mutex so that we don't cause
> 206  * other threads to be blocked. If a mutex was abandoned
> 207  * we are in scary state. Data may be corrupted or inconsistent
> 208  * but it is still better to let other threads run (and possibly
> 209  * crash) than having them blocked (on the premise that a crash
> 210  * is always easier to debug than a hang).
> 
> Considering something has to have gone drastically wrong for the mutex to 
> become abandoned, I'm more inclined to consider this a fatal error and abort.
> 
> But I'll let the serviceability folk chime in here.

I was involved in fixing this and writing the comment, so obviously I thought 
it a good solution :-)

I do agree that it would probably be a good idea to consider this a fatal error 
and abort. At that point in the code we don’t have any really nice ways of 
doing that, though. We could just print an error and call abort(). What we are 
doing now is returning an error from sysIPMutexEnter() and delegating the error 
handling to the caller. We have tried to check all call paths to verify that 
they do “the right thing” in the face of an error. It is obviously hard to 
verify, but it looks like they all terminate the connection with some kind of 
error message.

/Staffan


> 
> Thanks,
> David
> 
> 
>> Thank you,
>> -Aleksej
>> 
>> 
>> On 05/14/2014 01:05 PM, David Holmes wrote:
>>> On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:
 In windows, you acquire a mutex by waiting on it using one of the wait
 functions, one of them employed in the code in question.  If
 WaitForMultipleObjects succeeds and returns the index of the mutex,
 current thread has ownership now.
>>> 
>>> Yes I understand the basic mechanics :)
>>> 
 It's also common to use multi wait functions where the event is a
 "cancelation token", e.g. manual reset event; this allows someone to
 cancel waiting on mutex acquisition and return from the wait function.
 Presumably that's the case here, but I'll let Aleksej confirm; just
 wanted to throw this out there in the meantime :).
>>> 
>>> Ah I see - yes cancellable lock acquisition would make sense.
>>> 
>>> Thanks,
>>> David
>>> 
 Sent from my phone
 
 On May 13, 2014 6:46 PM, "David Holmes" >>> > wrote:
 
Hi Aleksej,
 
Thanks for the doc references regarding abandonment.
 
Let me rephrase my question. What is this logic trying to achieve by
waiting on both a mutex and an event? Do we already own the mutex
when this function is called?
 
David
 
On 13/05/2014 11:19 PM, Aleksej Efimov wrote:
 
David,
 
The Windows has a different terminology for mutex objects (much
differs
from the POSIX one). This one link gave me some understanding of
it [1].
 
Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a
 mutex
object, the mutex object is considered to be abandoned. A
waiting thread
can acquire ownership of an abandoned mutex object, but the wait
function will return*WAIT_ABANDONED*to indicate that the mutex
object is
abandoned. An abandoned mutex object indicates that an error has
occurred and that any shared resource being protected by the
 mutex
object is in an undefined state. If the thread proceeds as
though the
mutex object had not been abandoned, it is no longer considered
abandoned after the thread releases its ownership. This restores
normal
  

Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-06-03 Thread Aleksej Efimov

Staffan, David,

Returning back to our WaitForMultipleObjects()/WAIT_ABANDONED discussions:
Thank you for your comments and remarks. I can't disagree with 
motivation that it's better to have a fatal error during the incorrect 
mutex handling then data corruption (the consequence of the previous 
fix). In case of such error it'll be much more easier to debug/catch it 
(but as Staffan said - we have tried to check all call paths and don't 
think that we'll encounter such behavior).
I have modified the discussed code according to your suggestions: 
http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.01/

To abort the process the 'exitTransportWithError' function was utilized.
Also I have tried to check that behavior isn't changed by running "svc" 
regression tests set. There was no related test failures observed.


Best Regards,
Aleksej

On 05/15/2014 01:11 PM, Staffan Larsen wrote:

On 15 maj 2014, at 03:48, David Holmes  wrote:


On 14/05/2014 11:18 PM, Aleksej Efimov wrote:

David, Vitaly,

I totally agree with Vitaly's explanation (Vitaly - thank you for that)
and code in shmemBase.c (the usage of enterMutex() function for
sending/receiving bytes through shared memory connection) illustrates on
how the connection shutdown event is used as a "cancellation token".

Thanks for clarifying that.

So if we were to encounter an abandoned mutex the code would presently have 
acquired the mutex but return an error, thus preventing a subsequent release, 
and preventing other threads from acquiring (but allowing current thread to 
recurisevely acquire. So this could both hang and cause data corruption.

The new code will still return an error but release the mutex. So no more hangs 
(other than by conditions caused by data corruption) but more opportunity for 
data corruption.

Obviously it depends on exactly what happens in the critical sections guarded 
by this mutex, but in general I don't agree with this rationale for making the 
change:

204 /* If the mutex is abandoned we want to return an error
205  * and also release the mutex so that we don't cause
206  * other threads to be blocked. If a mutex was abandoned
207  * we are in scary state. Data may be corrupted or inconsistent
208  * but it is still better to let other threads run (and possibly
209  * crash) than having them blocked (on the premise that a crash
210  * is always easier to debug than a hang).

Considering something has to have gone drastically wrong for the mutex to 
become abandoned, I'm more inclined to consider this a fatal error and abort.

But I'll let the serviceability folk chime in here.

I was involved in fixing this and writing the comment, so obviously I thought 
it a good solution :-)

I do agree that it would probably be a good idea to consider this a fatal error 
and abort. At that point in the code we don’t have any really nice ways of 
doing that, though. We could just print an error and call abort(). What we are 
doing now is returning an error from sysIPMutexEnter() and delegating the error 
handling to the caller. We have tried to check all call paths to verify that 
they do “the right thing” in the face of an error. It is obviously hard to 
verify, but it looks like they all terminate the connection with some kind of 
error message.

/Staffan



Thanks,
David



Thank you,
-Aleksej


On 05/14/2014 01:05 PM, David Holmes wrote:

On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:

In windows, you acquire a mutex by waiting on it using one of the wait
functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the mutex,
current thread has ownership now.

Yes I understand the basic mechanics :)


It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows someone to
cancel waiting on mutex acquisition and return from the wait function.
Presumably that's the case here, but I'll let Aleksej confirm; just
wanted to throw this out there in the meantime :).

Ah I see - yes cancellable lock acquisition would make sense.

Thanks,
David


Sent from my phone

On May 13, 2014 6:46 PM, "David Holmes" mailto:david.hol...@oracle.com>> wrote:

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to achieve by
waiting on both a mutex and an event? Do we already own the mutex
when this function is called?

David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex objects (much
differs
from the POSIX one). This one link gave me some understanding of
it [1].

Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without releasing its ownership of a
mutex
object, the mutex object is considered to be abandoned. A
waiting thread
can acquire

Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-06-04 Thread Staffan Larsen
Looks ok to me.

/Staffan

On 3 jun 2014, at 15:49, Aleksej Efimov  wrote:

> Staffan, David,
> 
> Returning back to our WaitForMultipleObjects()/WAIT_ABANDONED discussions:
> Thank you for your comments and remarks. I can't disagree with motivation 
> that it's better to have a fatal error during the incorrect mutex handling 
> then data corruption (the consequence of the previous fix). In case of such 
> error it'll be much more easier to debug/catch it (but as Staffan said - we 
> have tried to check all call paths and don't think that we'll encounter such 
> behavior).
> I have modified the discussed code according to your suggestions: 
> http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.01/
> To abort the process the 'exitTransportWithError' function was utilized.
> Also I have tried to check that behavior isn't changed by running "svc" 
> regression tests set. There was no related test failures observed.
> 
> Best Regards,
> Aleksej
> 
> On 05/15/2014 01:11 PM, Staffan Larsen wrote:
>> On 15 maj 2014, at 03:48, David Holmes  wrote:
>> 
>>> On 14/05/2014 11:18 PM, Aleksej Efimov wrote:
 David, Vitaly,
 
 I totally agree with Vitaly's explanation (Vitaly - thank you for that)
 and code in shmemBase.c (the usage of enterMutex() function for
 sending/receiving bytes through shared memory connection) illustrates on
 how the connection shutdown event is used as a "cancellation token".
>>> Thanks for clarifying that.
>>> 
>>> So if we were to encounter an abandoned mutex the code would presently have 
>>> acquired the mutex but return an error, thus preventing a subsequent 
>>> release, and preventing other threads from acquiring (but allowing current 
>>> thread to recurisevely acquire. So this could both hang and cause data 
>>> corruption.
>>> 
>>> The new code will still return an error but release the mutex. So no more 
>>> hangs (other than by conditions caused by data corruption) but more 
>>> opportunity for data corruption.
>>> 
>>> Obviously it depends on exactly what happens in the critical sections 
>>> guarded by this mutex, but in general I don't agree with this rationale for 
>>> making the change:
>>> 
>>> 204 /* If the mutex is abandoned we want to return an error
>>> 205  * and also release the mutex so that we don't cause
>>> 206  * other threads to be blocked. If a mutex was abandoned
>>> 207  * we are in scary state. Data may be corrupted or inconsistent
>>> 208  * but it is still better to let other threads run (and possibly
>>> 209  * crash) than having them blocked (on the premise that a crash
>>> 210  * is always easier to debug than a hang).
>>> 
>>> Considering something has to have gone drastically wrong for the mutex to 
>>> become abandoned, I'm more inclined to consider this a fatal error and 
>>> abort.
>>> 
>>> But I'll let the serviceability folk chime in here.
>> I was involved in fixing this and writing the comment, so obviously I 
>> thought it a good solution :-)
>> 
>> I do agree that it would probably be a good idea to consider this a fatal 
>> error and abort. At that point in the code we don’t have any really nice 
>> ways of doing that, though. We could just print an error and call abort(). 
>> What we are doing now is returning an error from sysIPMutexEnter() and 
>> delegating the error handling to the caller. We have tried to check all call 
>> paths to verify that they do “the right thing” in the face of an error. It 
>> is obviously hard to verify, but it looks like they all terminate the 
>> connection with some kind of error message.
>> 
>> /Staffan
>> 
>> 
>>> Thanks,
>>> David
>>> 
>>> 
 Thank you,
 -Aleksej
 
 
 On 05/14/2014 01:05 PM, David Holmes wrote:
> On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:
>> In windows, you acquire a mutex by waiting on it using one of the wait
>> functions, one of them employed in the code in question.  If
>> WaitForMultipleObjects succeeds and returns the index of the mutex,
>> current thread has ownership now.
> Yes I understand the basic mechanics :)
> 
>> It's also common to use multi wait functions where the event is a
>> "cancelation token", e.g. manual reset event; this allows someone to
>> cancel waiting on mutex acquisition and return from the wait function.
>> Presumably that's the case here, but I'll let Aleksej confirm; just
>> wanted to throw this out there in the meantime :).
> Ah I see - yes cancellable lock acquisition would make sense.
> 
> Thanks,
> David
> 
>> Sent from my phone
>> 
>> On May 13, 2014 6:46 PM, "David Holmes" > > wrote:
>> 
>>Hi Aleksej,
>> 
>>Thanks for the doc references regarding abandonment.
>> 
>>Let me rephrase my question. What is this logic trying to achieve by
>>waiting on both a mutex and an event? Do we already own the mutex
>>when this 

Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-06-04 Thread David Holmes

Me too.

Thanks,
David

On 4/06/2014 5:23 PM, Staffan Larsen wrote:

Looks ok to me.

/Staffan

On 3 jun 2014, at 15:49, Aleksej Efimov  wrote:


Staffan, David,

Returning back to our WaitForMultipleObjects()/WAIT_ABANDONED discussions:
Thank you for your comments and remarks. I can't disagree with motivation that 
it's better to have a fatal error during the incorrect mutex handling then data 
corruption (the consequence of the previous fix). In case of such error it'll 
be much more easier to debug/catch it (but as Staffan said - we have tried to 
check all call paths and don't think that we'll encounter such behavior).
I have modified the discussed code according to your suggestions: 
http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.01/
To abort the process the 'exitTransportWithError' function was utilized.
Also I have tried to check that behavior isn't changed by running "svc" 
regression tests set. There was no related test failures observed.

Best Regards,
Aleksej

On 05/15/2014 01:11 PM, Staffan Larsen wrote:

On 15 maj 2014, at 03:48, David Holmes  wrote:


On 14/05/2014 11:18 PM, Aleksej Efimov wrote:

David, Vitaly,

I totally agree with Vitaly's explanation (Vitaly - thank you for that)
and code in shmemBase.c (the usage of enterMutex() function for
sending/receiving bytes through shared memory connection) illustrates on
how the connection shutdown event is used as a "cancellation token".

Thanks for clarifying that.

So if we were to encounter an abandoned mutex the code would presently have 
acquired the mutex but return an error, thus preventing a subsequent release, 
and preventing other threads from acquiring (but allowing current thread to 
recurisevely acquire. So this could both hang and cause data corruption.

The new code will still return an error but release the mutex. So no more hangs 
(other than by conditions caused by data corruption) but more opportunity for 
data corruption.

Obviously it depends on exactly what happens in the critical sections guarded 
by this mutex, but in general I don't agree with this rationale for making the 
change:

204 /* If the mutex is abandoned we want to return an error
205  * and also release the mutex so that we don't cause
206  * other threads to be blocked. If a mutex was abandoned
207  * we are in scary state. Data may be corrupted or inconsistent
208  * but it is still better to let other threads run (and possibly
209  * crash) than having them blocked (on the premise that a crash
210  * is always easier to debug than a hang).

Considering something has to have gone drastically wrong for the mutex to 
become abandoned, I'm more inclined to consider this a fatal error and abort.

But I'll let the serviceability folk chime in here.

I was involved in fixing this and writing the comment, so obviously I thought 
it a good solution :-)

I do agree that it would probably be a good idea to consider this a fatal error 
and abort. At that point in the code we don’t have any really nice ways of 
doing that, though. We could just print an error and call abort(). What we are 
doing now is returning an error from sysIPMutexEnter() and delegating the error 
handling to the caller. We have tried to check all call paths to verify that 
they do “the right thing” in the face of an error. It is obviously hard to 
verify, but it looks like they all terminate the connection with some kind of 
error message.

/Staffan



Thanks,
David



Thank you,
-Aleksej


On 05/14/2014 01:05 PM, David Holmes wrote:

On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:

In windows, you acquire a mutex by waiting on it using one of the wait
functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the mutex,
current thread has ownership now.

Yes I understand the basic mechanics :)


It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows someone to
cancel waiting on mutex acquisition and return from the wait function.
Presumably that's the case here, but I'll let Aleksej confirm; just
wanted to throw this out there in the meantime :).

Ah I see - yes cancellable lock acquisition would make sense.

Thanks,
David


Sent from my phone

On May 13, 2014 6:46 PM, "David Holmes" mailto:david.hol...@oracle.com>> wrote:

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to achieve by
waiting on both a mutex and an event? Do we already own the mutex
when this function is called?

David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex objects (much
differs
from the POSIX one). This one link gave me some understanding of
it [1].

Here is the MSDN [1] description of what "abandoned mutex" is:
" If a thread terminates without 

Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately

2014-06-04 Thread Aleksej Efimov

Thank you David.
Thank you Staffan.

On 06/04/2014 01:46 PM, David Holmes wrote:

Me too.

Thanks,
David

On 4/06/2014 5:23 PM, Staffan Larsen wrote:

Looks ok to me.

/Staffan

On 3 jun 2014, at 15:49, Aleksej Efimov  
wrote:



Staffan, David,

Returning back to our WaitForMultipleObjects()/WAIT_ABANDONED 
discussions:
Thank you for your comments and remarks. I can't disagree with 
motivation that it's better to have a fatal error during the 
incorrect mutex handling then data corruption (the consequence of 
the previous fix). In case of such error it'll be much more easier 
to debug/catch it (but as Staffan said - we have tried to check all 
call paths and don't think that we'll encounter such behavior).
I have modified the discussed code according to your suggestions: 
http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.01/
To abort the process the 'exitTransportWithError' function was 
utilized.
Also I have tried to check that behavior isn't changed by running 
"svc" regression tests set. There was no related test failures 
observed.


Best Regards,
Aleksej

On 05/15/2014 01:11 PM, Staffan Larsen wrote:
On 15 maj 2014, at 03:48, David Holmes  
wrote:



On 14/05/2014 11:18 PM, Aleksej Efimov wrote:

David, Vitaly,

I totally agree with Vitaly's explanation (Vitaly - thank you for 
that)

and code in shmemBase.c (the usage of enterMutex() function for
sending/receiving bytes through shared memory connection) 
illustrates on

how the connection shutdown event is used as a "cancellation token".

Thanks for clarifying that.

So if we were to encounter an abandoned mutex the code would 
presently have acquired the mutex but return an error, thus 
preventing a subsequent release, and preventing other threads from 
acquiring (but allowing current thread to recurisevely acquire. So 
this could both hang and cause data corruption.


The new code will still return an error but release the mutex. So 
no more hangs (other than by conditions caused by data corruption) 
but more opportunity for data corruption.


Obviously it depends on exactly what happens in the critical 
sections guarded by this mutex, but in general I don't agree with 
this rationale for making the change:


204 /* If the mutex is abandoned we want to return an error
205  * and also release the mutex so that we don't cause
206  * other threads to be blocked. If a mutex was abandoned
207  * we are in scary state. Data may be corrupted or 
inconsistent
208  * but it is still better to let other threads run (and 
possibly
209  * crash) than having them blocked (on the premise that a 
crash

210  * is always easier to debug than a hang).

Considering something has to have gone drastically wrong for the 
mutex to become abandoned, I'm more inclined to consider this a 
fatal error and abort.


But I'll let the serviceability folk chime in here.
I was involved in fixing this and writing the comment, so obviously 
I thought it a good solution :-)


I do agree that it would probably be a good idea to consider this a 
fatal error and abort. At that point in the code we don’t have any 
really nice ways of doing that, though. We could just print an 
error and call abort(). What we are doing now is returning an error 
from sysIPMutexEnter() and delegating the error handling to the 
caller. We have tried to check all call paths to verify that they 
do “the right thing” in the face of an error. It is obviously hard 
to verify, but it looks like they all terminate the connection with 
some kind of error message.


/Staffan



Thanks,
David



Thank you,
-Aleksej


On 05/14/2014 01:05 PM, David Holmes wrote:

On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:
In windows, you acquire a mutex by waiting on it using one of 
the wait

functions, one of them employed in the code in question.  If
WaitForMultipleObjects succeeds and returns the index of the 
mutex,

current thread has ownership now.

Yes I understand the basic mechanics :)


It's also common to use multi wait functions where the event is a
"cancelation token", e.g. manual reset event; this allows 
someone to
cancel waiting on mutex acquisition and return from the wait 
function.
Presumably that's the case here, but I'll let Aleksej confirm; 
just

wanted to throw this out there in the meantime :).

Ah I see - yes cancellable lock acquisition would make sense.

Thanks,
David


Sent from my phone

On May 13, 2014 6:46 PM, "David Holmes" mailto:david.hol...@oracle.com>> wrote:

Hi Aleksej,

Thanks for the doc references regarding abandonment.

Let me rephrase my question. What is this logic trying to 
achieve by
waiting on both a mutex and an event? Do we already own the 
mutex

when this function is called?

David

On 13/05/2014 11:19 PM, Aleksej Efimov wrote:

David,

The Windows has a different terminology for mutex 
objects (much

differs
from the POSIX one). This one link gave me some 
understanding of

i