[OMPI devel] opal_condition_wait

2007-12-06 Thread Tim Prins

Hi,

A couple of questions.

First, in opal_condition_wait (condition.h:97) we do not release the 
passed mutex if opal_using_threads() is not set. Is there a reason for 
this? I ask since this violates the way condition variables are supposed 
to work, and it seems like there are situations where this could cause 
deadlock.


Also, when we are using threads, there is a case where we do not 
decrement the signaled count, in condition.h:84. Gleb put this in in 
r9451, however the change does not make sense to me. I think that the 
signal count should always be decremented.


Can anyone shine any light on these issues?

Thanks,

Tim


Re: [OMPI devel] opal_condition_wait

2007-12-06 Thread Tim Prins

Tim Prins wrote:

Hi,

A couple of questions.

First, in opal_condition_wait (condition.h:97) we do not release the 
passed mutex if opal_using_threads() is not set. Is there a reason for 
this? I ask since this violates the way condition variables are supposed 
to work, and it seems like there are situations where this could cause 
deadlock.
So in (partial) answer to my own email, this is because throughout the 
code we do:

OPAL_THREAD_LOCK(m)
opal_condition_wait(cond, m);
OPAL_THREAD_UNLOCK(m)

So this relies on opal_condition_wait not touching the lock. This 
explains it, but it still seems very wrong.




Also, when we are using threads, there is a case where we do not 
decrement the signaled count, in condition.h:84. Gleb put this in in 
r9451, however the change does not make sense to me. I think that the 
signal count should always be decremented.


Can anyone shine any light on these issues?

Thanks,

Tim
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] opal_condition_wait

2007-12-06 Thread Brian W. Barrett

On Thu, 6 Dec 2007, Tim Prins wrote:


Tim Prins wrote:

First, in opal_condition_wait (condition.h:97) we do not release the
passed mutex if opal_using_threads() is not set. Is there a reason for
this? I ask since this violates the way condition variables are supposed
to work, and it seems like there are situations where this could cause
deadlock.

So in (partial) answer to my own email, this is because throughout the
code we do:
OPAL_THREAD_LOCK(m)
opal_condition_wait(cond, m);
OPAL_THREAD_UNLOCK(m)

So this relies on opal_condition_wait not touching the lock. This
explains it, but it still seems very wrong.


Yes, this is correct.  The assumption is that you are using the 
conditional macro lock/unlock with the condition variables.  I personally 
don't like this (I think we should have had macro conditional condition 
variables), but that obviously isn't how it works today.


The problem with always holding the lock when you enter the condition 
variable is that even when threading is disabled, calling a lock is at 
least as expensive as an add, possibly including a cache miss.  So from a 
performance standpoint, this would be a no-go.



Also, when we are using threads, there is a case where we do not
decrement the signaled count, in condition.h:84. Gleb put this in in
r9451, however the change does not make sense to me. I think that the
signal count should always be decremented.

Can anyone shine any light on these issues?


Unfortunately, I can't add much on this front.

Brian


Re: [OMPI devel] opal_condition_wait

2007-12-06 Thread Gleb Natapov
On Thu, Dec 06, 2007 at 09:46:45AM -0500, Tim Prins wrote:
> Also, when we are using threads, there is a case where we do not 
> decrement the signaled count, in condition.h:84. Gleb put this in in 
> r9451, however the change does not make sense to me. I think that the 
> signal count should always be decremented.
> 
> Can anyone shine any light on these issues?
> 
I made this change a long time ago (I wander why I even tested threaded
build back then), but what I recall looking into the code and log message
there was a deadlock when signal broadcast doesn't wake up all thread
that are waiting on a conditional variable. Suppose two threads wait on
a condition C, third thread does broadcast. This makes C->c_signaled to
be equal 2. Now one thread wakes up and decrement C->c_signaled by one.
And before other thread is starting to run it calls condition_wait on C
one more time. Because c_signaled is 1 it doesn't sleep and decrement
c_signaled one more time. Now c_signaled is zero and when second thread
wakes up it see this and go to sleep again. The solution was to check in
condition_wait if condition is already signaled before go to sleep and
if yes exit immediately.

--
Gleb.


Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Tim Prins
Ok, I think I am understanding this a bit now. By not decrementing the 
signaled count, we are allowing a single broadcast to wake up the same 
thread multiple times, and are allowing a single cond_signal to wake up 
multiple threads.


My understanding was that this behavior was not right, but upon further 
inspection of the pthreads documentation this behavior seems to be 
allowable.


Thanks for the clarifications,

Tim

Gleb Natapov wrote:

On Thu, Dec 06, 2007 at 09:46:45AM -0500, Tim Prins wrote:
Also, when we are using threads, there is a case where we do not 
decrement the signaled count, in condition.h:84. Gleb put this in in 
r9451, however the change does not make sense to me. I think that the 
signal count should always be decremented.


Can anyone shine any light on these issues?


I made this change a long time ago (I wander why I even tested threaded
build back then), but what I recall looking into the code and log message
there was a deadlock when signal broadcast doesn't wake up all thread
that are waiting on a conditional variable. Suppose two threads wait on
a condition C, third thread does broadcast. This makes C->c_signaled to
be equal 2. Now one thread wakes up and decrement C->c_signaled by one.
And before other thread is starting to run it calls condition_wait on C
one more time. Because c_signaled is 1 it doesn't sleep and decrement
c_signaled one more time. Now c_signaled is zero and when second thread
wakes up it see this and go to sleep again. The solution was to check in
condition_wait if condition is already signaled before go to sleep and
if yes exit immediately.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Tim Prins
Well, this makes some sense, although it still seems like this violates 
the spirit of condition variables.


Thanks,

Tim

Brian W. Barrett wrote:

On Thu, 6 Dec 2007, Tim Prins wrote:


Tim Prins wrote:

First, in opal_condition_wait (condition.h:97) we do not release the
passed mutex if opal_using_threads() is not set. Is there a reason for
this? I ask since this violates the way condition variables are supposed
to work, and it seems like there are situations where this could cause
deadlock.

So in (partial) answer to my own email, this is because throughout the
code we do:
OPAL_THREAD_LOCK(m)
opal_condition_wait(cond, m);
OPAL_THREAD_UNLOCK(m)

So this relies on opal_condition_wait not touching the lock. This
explains it, but it still seems very wrong.


Yes, this is correct.  The assumption is that you are using the 
conditional macro lock/unlock with the condition variables.  I personally 
don't like this (I think we should have had macro conditional condition 
variables), but that obviously isn't how it works today.


The problem with always holding the lock when you enter the condition 
variable is that even when threading is disabled, calling a lock is at 
least as expensive as an add, possibly including a cache miss.  So from a 
performance standpoint, this would be a no-go.



Also, when we are using threads, there is a case where we do not
decrement the signaled count, in condition.h:84. Gleb put this in in
r9451, however the change does not make sense to me. I think that the
signal count should always be decremented.

Can anyone shine any light on these issues?


Unfortunately, I can't add much on this front.

Brian
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 10:27:55AM -0500, Tim Prins wrote:
> My understanding was that this behavior was not right, but upon further 
> inspection of the pthreads documentation this behavior seems to be 
> allowable.
> 
I think that Open MPI does not implement condition variable in the strict
sense. Open MPI condition variable has to progress devices and wait for a
condition simultaneously and not just wait till a condition is satisfied.

--
Gleb.