Re: race condition destroying condition variables

2017-12-27 Thread Samuel Thibault
Brent W. Baccala, on mer. 27 déc. 2017 15:07:51 -0500, wrote:
> On Wed, Dec 27, 2017 at 2:31 PM, Samuel Thibault
> <samuel.thiba...@gnu.org> wrote:
> 
> Ok, so even if Posix explicitly says that it has undefined behavior,
> since nptl behaves fine we should probably behave fine too.
> 
> My point was that since Linux waits for the other threads in
> pthread_cond_destroy(), we should too. 

Well, that's exactly what I wrote :)

Samuel



Re: race condition destroying condition variables

2017-12-27 Thread Brent W. Baccala
On Wed, Dec 27, 2017 at 2:31 PM, Samuel Thibault wrote:

> Hello,
>
> Brent W. Baccala, on mar. 26 déc. 2017 23:06:13 -0500, wrote:
> > Also, the Linux source code in nptl/ includes the following comment:
> >
> >   /* If there are waiters which have been already signalled or
> >      broadcasted, but still are using the pthread_cond_t structure,
> >      pthread_cond_destroy needs to wait for them.  */
>
> Ok, so even if Posix explicitly says that it has undefined behavior,
> since nptl behaves fine we should probably behave fine too.
>

Let me clarify - that comment precedes a block of code in
pthread_cond_destroy() that waits for the other threads.

See glibc-2.23/nptl/pthread_cond_destroy.c, around line 50.

That code's been rewritten in glibc-2.25, but the requirement is still
there.

glibc-2.25/nptl/pthread_cond_destroy.c, lines 38-40:

> Thus, we can assume that all waiters that are still accessing the condvar
> have been woken.  We wait until they have confirmed to have woken up by
> decrementing __wrefs.


... and then it waits for "wrefs >> 3" to become zero.
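
Roughly, that wait has the following shape; this is a standalone sketch
using C11 atomics instead of glibc's internal futex helpers, so the
details are approximate:

    #include <sched.h>
    #include <stdatomic.h>

    /* Sketch of the glibc-2.25 idea, not the actual source: the low
       3 bits of __wrefs are flags, the rest counts threads that are
       still accessing the condvar.  glibc blocks on a futex here;
       this sketch just yields.  */
    static void
    wait_for_waiters (atomic_uint *wrefs)
    {
      while (atomic_load_explicit (wrefs, memory_order_acquire) >> 3 != 0)
        sched_yield ();
    }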

My point was that since Linux waits for the other threads in
pthread_cond_destroy(), we should too.

agape
brent


Re: race condition destroying condition variables

2017-12-27 Thread Samuel Thibault
Hello,

Brent W. Baccala, on mar. 26 déc. 2017 23:06:13 -0500, wrote:
> Also, the Linux source code in nptl/ includes the following comment:
> 
>   /* If there are waiters which have been already signalled or
>      broadcasted, but still are using the pthread_cond_t structure,
>      pthread_cond_destroy needs to wait for them.  */

Ok, so even if Posix explicitly says that it has undefined behavior,
since nptl behaves fine we should probably behave fine too.

> The problem that I've got now is that I've changed the size of the condition
> variables by adding an extra field,

Not a good idea, for the reasons you mention :)

> First, I'd like to build the packages using the same scripts used to build the
> Debian distribution packages.  Is this available somewhere?

Just run dpkg-buildpackage from the package source, or use pdebuild, or
use a wanna-build infrastructure. Nothing simple, at any rate.

> Next, how will modifying the memory layout of condition variables affect our
> packaging?  I figure that we'll have to bump libc0.3 to libc0.4, and that
> should pretty much do the trick, right?

Yes, but that's a *HUGE* transition, which is painful. Better to use
symbol versioning, so that old non-recompiled programs use the old
implementation, and recompiled programs use the new one.
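
For illustration, a sketch of what that could look like; the GLIBC_0.3
and GLIBC_0.4 version names are invented here, and the library's linker
version script would need matching nodes:

    /* Sketch only.  Old, already-linked binaries keep resolving the
       non-default version; freshly linked programs get the '@@'
       default.  Both versions must also appear in the version script. */

    int
    __old_pthread_cond_destroy (void *cond)   /* old, smaller layout */
    {
      (void) cond;   /* ... old implementation elided ... */
      return 0;
    }
    __asm__ (".symver __old_pthread_cond_destroy,"
             "pthread_cond_destroy@GLIBC_0.3");

    int
    __new_pthread_cond_destroy (void *cond)   /* new, larger layout */
    {
      (void) cond;   /* ... new implementation elided ... */
      return 0;
    }
    __asm__ (".symver __new_pthread_cond_destroy,"
             "pthread_cond_destroy@@GLIBC_0.4");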

> Finally, I'm wondering if we need to change this at all.  The "__data"
> field in struct __pthread_cond seems unused, at least in the libpthreads
> directory.  Is there any use for this pointer?  Or can I use those bits?

AFAICT it was used for compatibility with libcthreads, which we don't
care about any more, so I'd say feel free to use it, and avoid the
transition altogether.
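
Something like this simplified stand-in for struct __pthread_cond (not
the real definition; the point is just that the pointer-sized slot can
hold a counter without changing the structure's size):

    #include <stdatomic.h>

    /* Simplified stand-in, not the real libpthread layout.  */
    struct cond_sketch
    {
      int lock;                /* placeholder for the existing fields */
      union
      {
        void *data;            /* the old, unused __data pointer */
        atomic_uint wrefs;     /* reused: threads still using the condvar */
      } u;
    };

    /* The counter must fit in the pointer slot on all supported ABIs. */
    _Static_assert (sizeof (atomic_uint) <= sizeof (void *),
                    "counter fits in the old pointer slot");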

Samuel



Re: race condition destroying condition variables

2017-12-26 Thread Brent W. Baccala
Well, I've tried both Samuel's (swap space) and Svante's (nocheck)
suggestions and have found that both allow me to successfully build the
glibc packages!

The problem that I've got now is that I've changed the size of the
condition variables by adding an extra field, an integer that tracks the
number of waiting threads.  This means that I have to recompile anything
that depends on the size of a condition variable, which is potentially
everything on the system.

First, I'd like to build the packages using the same scripts used to build
the Debian distribution packages.  Is this available somewhere?

Next, how will modifying the memory layout of condition variables affect
our packaging?  I figure that we'll have to bump libc0.3 to libc0.4, and
that should pretty much do the trick, right?

Finally, I'm wondering if we need to change this at all.  The "__data"
field in struct __pthread_cond seems unused, at least in the libpthreads
directory.  Is there any use for this pointer?  Or can I use those bits?

Also, the Linux source code in nptl/ includes the following comment:

  /* If there are waiters which have been already signalled or
     broadcasted, but still are using the pthread_cond_t structure,
     pthread_cond_destroy needs to wait for them.  */

...which is the conclusion I've come to as well.

agape
brent


Re: race condition destroying condition variables

2017-12-21 Thread Svante Signell
On Fri, 2017-12-22 at 00:22 +0100, Samuel Thibault wrote:
> 
> > 
> > The testsuite for glibc is known to be broken. Just
> > export DEB_BUILD_OPTIONS=nocheck before building
> > with dpkg-buildpackage -b ...
> 
> Broken tests are marked as such, and there aren't many of them.
> 
> Using nocheck is indeed a way to avoid the issue.
> 
> The build boxes have 3GB of memory and 4GB of swap, and do pass the
> testsuite fine (minus the marked tests).

Well, I've never managed to build glibc with tests enabled. And it
takes much longer to build. But of course YMMV.



Re: race condition destroying condition variables

2017-12-21 Thread Samuel Thibault
Svante Signell, on jeu. 21 déc. 2017 23:58:20 +0100, wrote:
> On Thu, 2017-12-21 at 17:32 -0500, Brent W. Baccala wrote:
> > Well, I've got a patch that might work, but I'm having a lot of
> > trouble testing it.
> > 
> > I can't dpkg-buildpackage the Debian glibc package.
> > 
> > It gets into the test routines, then a bunch of the math tests crash
> > with SIGSEGVs and SIGILLs, then I get a bunch of kernel errors:
> > 
> > no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
> > no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
> > no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
> > no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
> > no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
> > no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
> > 
> > This happens in either tst-vfprintf-width-prec or tst-mallocfork2,
> > depending on whether I allocate 2 GB or 4 GB to the virtual machine.
> > 
> > Any ideas what the problem might be?
> > 
> > And how are the Debian packages built for downloading?  It must be
> > something different than what I'm doing...
> 
> Hi Brent,
> 
> The testsuite for glibc is known to be broken. Just
> export DEB_BUILD_OPTIONS=nocheck before building
> with dpkg-buildpackage -b ...

Broken tests are marked as such, and there aren't many of them.

Using nocheck is indeed a way to avoid the issue.

The build boxes have 3GB of memory and 4GB of swap, and do pass the
testsuite fine (minus the marked tests).

Samuel



Re: race condition destroying condition variables

2017-12-21 Thread Svante Signell
On Thu, 2017-12-21 at 17:32 -0500, Brent W. Baccala wrote:
> Well, I've got a patch that might work, but I'm having a lot of
> trouble testing it.
> 
> I can't dpkg-buildpackage the Debian glibc package.
> 
> It gets into the test routines, then a bunch of the math tests crash
> with SIGSEGVs and SIGILLs, then I get a bunch of kernel errors:
> 
> no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
> no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
> no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
> no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
> no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
> no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
> 
> This happens in either tst-vfprintf-width-prec or tst-mallocfork2,
> depending on whether I allocate 2 GB or 4 GB to the virtual machine.
> 
> Any ideas what the problem might be?
> 
> And how are the Debian packages built for downloading?  It must be
> something different than what I'm doing...

Hi Brent,

The testsuite for glibc is known to be broken. Just
export DEB_BUILD_OPTIONS=nocheck before building
with dpkg-buildpackage -b ...

HTH



Re: race condition destroying condition variables

2017-12-21 Thread Brent W. Baccala
Well, I've got a patch that might work, but I'm having a lot of trouble
testing it.

I can't dpkg-buildpackage the Debian glibc package.

It gets into the test routines, then a bunch of the math tests crash with
SIGSEGVs and SIGILLs, then I get a bunch of kernel errors:

no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))

This happens in either tst-vfprintf-width-prec or tst-mallocfork2,
depending on whether I allocate 2 GB or 4 GB to the virtual machine.

Any ideas what the problem might be?

And how are the Debian packages built for downloading?  It must be
something different than what I'm doing...

agape
brent


Re: race condition destroying condition variables

2017-12-19 Thread Brent W. Baccala
On Tue, Dec 19, 2017 at 3:25 AM, Samuel Thibault wrote:

> Brent W. Baccala, on mar. 19 déc. 2017 00:08:44 -0500, wrote:
> > Looks like there's a race condition when we destroy a condition
> > variable.  My understanding of the expected behavior is that once all
> > the threads have been signaled (i.e., pthread_cond_broadcast is
> > called), the condition variable can be safely destroyed with
> > pthread_cond_destroy.
>
> Err, I don't think that POSIX allows one to assume that. The fact that
> pthread_cond_broadcast has returned doesn't mean that other threads have
> finished with pthread_cond_wait.
>
>
POSIX seems a little unclear on that, but C++ seems to require it.

POSIX:

"It shall be safe to destroy an initialized condition variable upon which
no threads are currently blocked." [1]

and

"The *pthread_cond_broadcast*() function shall unblock all threads
currently blocked on the specified condition variable *cond*." [2]

A little further down on [1], in an "informative" section, is the following
snippet of code:

(A) pthread_cond_broadcast(&ep->notbusy);
    pthread_mutex_unlock(&lp->lm);
(B) pthread_cond_destroy(&rp->notbusy);

...accompanied by the following comment:

"In this example, the condition variable and its list element may be freed
(line B) immediately after all threads waiting for it are awakened (line
A)..."

which is what makes sense to me.  Of course, our pthread_cond_destroy()
does nothing, but assuming that the condvar is now freed in a step (C),
this code won't work reliably in our current implementation.

I found a discussion of this on stackexchange [3], where the top answer
observes that 'The standard only says "shall unblock" and not "on
successful return from this call they will be unblocked".'

The C++ standard, however, is more explicit, in section 30.5.1:

"Only the notification to unblock the wait must happen before destruction".

cppreference.com [4] says:

"It is only safe to invoke the destructor if all threads have been
notified. It is not required that they have exited their respective wait
functions: some threads may still be waiting to reacquire the associated
lock, or may be waiting to be scheduled to run after reacquiring it."

That's the behavior I was counting on.  I'm using C++, and have found that
I can't just notify_all() a condition variable, then destroy it.

And, yes, I've got everything locked so that another thread can't jump in
there with a wait() between those two events.

On Tue, Dec 19, 2017 at 4:17 AM, Richard Braun wrote:

> Besides, the threads should also all go through reacquiring the associated
> mutex, usually sitting right next to the condition variable, and usually
> both embedded in a parent object. What you're normally really interested
> in is releasing this parent object, including destroying the mutex, which
> means you also have to wait for all threads to unlock it. One common way
> to deal with this is reference counters on the parent object.


In my case, I've got a lot more condition variables than mutexes, because I
don't want threads waking up for events other than the one they're waiting
for.  One mutex for the entire pager, and a condition variable for every
outstanding lock and write.  So, for example, if a thread is waiting for a
particular write to finish, it waits on that write's condition variable,
which gets signaled and destroyed when the write finishes, while the pager
object and the mutex continue to exist.
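
Roughly this layout (names invented; the real pager structures carry
more state):

    #include <pthread.h>

    /* One mutex for the whole pager; one condvar per outstanding
       write, signaled and then destroyed when that write finishes. */
    struct write_op
    {
      pthread_cond_t done;         /* waiters for this write only */
      int finished;                /* predicate, guarded by pager lock */
      struct write_op *next;
    };

    struct pager
    {
      pthread_mutex_t lock;        /* the single pager-wide mutex */
      struct write_op *writes;     /* outstanding writes */
    };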

I was thinking about wrapping such a counter as you suggest into the
condition variable structure, to ensure that the POSIX informative behavior
and the C++ behavior work reliably.

Increment the counter (atomically) when we're about to block on a condition
variable, and decrement it when we're done with the post-block processing.
Then pthread_cond_destroy() can first check that the wait queue is empty
(returning EBUSY if not), and otherwise spin-wait for the counter to reach
zero.  I think that should ensure that we can free() the condition variable
after pthread_cond_destroy() has returned, and that, in turn, will ensure
that the C++ destructor works reliably, too.
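
A minimal sketch of that scheme, using C11 atomics, with the real
wait-queue test reduced to a flag and all names invented:

    #include <errno.h>
    #include <sched.h>
    #include <stdatomic.h>

    typedef struct
    {
      atomic_uint nrefs;    /* threads between "about to block" and
                               "done with post-block processing" */
      /* ... wait queue, lock word, etc. ... */
    } cond_sketch_t;

    void
    cond_wait_sketch (cond_sketch_t *cv)
    {
      atomic_fetch_add (&cv->nrefs, 1);   /* entering: still using cv */
      /* ... enqueue, release mutex, block, reacquire mutex ... */
      atomic_fetch_sub (&cv->nrefs, 1);   /* done touching cv */
    }

    int
    cond_destroy_sketch (cond_sketch_t *cv, int queue_nonempty)
    {
      if (queue_nonempty)                 /* threads still blocked */
        return EBUSY;
      while (atomic_load (&cv->nrefs) != 0)
        sched_yield ();                   /* spin until waiters drain */
      return 0;                           /* now safe to free() cv */
    }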

agape
brent

[1]
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html
[2]
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_broadcast.html
[3]
https://stackoverflow.com/questions/7598457/when-can-a-cond-var-be-used-to-synchronize-its-own-destruction-unmapping
[4]
http://en.cppreference.com/w/cpp/thread/condition_variable/~condition_variable


Re: race condition destroying condition variables

2017-12-19 Thread Richard Braun
On Tue, Dec 19, 2017 at 09:25:16AM +0100, Samuel Thibault wrote:
> Brent W. Baccala, on mar. 19 déc. 2017 00:08:44 -0500, wrote:
> > Looks like there's a race condition when we destroy a condition
> > variable.  My understanding of the expected behavior is that once all
> > the threads have been signaled (i.e., pthread_cond_broadcast is
> > called), the condition variable can be safely destroyed with
> > pthread_cond_destroy.
> 
> Err, I don't think that POSIX allows one to assume that. The fact that
> pthread_cond_broadcast has returned doesn't mean that other threads have
> finished with pthread_cond_wait.

Besides, the threads should also all go through reacquiring the associated
mutex, usually sitting right next to the condition variable, and usually
both embedded in a parent object. What you're normally really interested
in is releasing this parent object, including destroying the mutex, which
means you also have to wait for all threads to unlock it. One common way
to deal with this is reference counters on the parent object.
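
A sketch of that last pattern (names invented): the condvar and mutex
live in a reference-counted parent, and the last thread to drop its
reference destroys both and frees the object:

    #include <pthread.h>
    #include <stdlib.h>

    struct parent
    {
      pthread_mutex_t lock;
      pthread_cond_t cond;
      unsigned refs;               /* guarded by lock */
    };

    static void
    parent_release (struct parent *p)
    {
      pthread_mutex_lock (&p->lock);
      unsigned remaining = --p->refs;
      pthread_mutex_unlock (&p->lock);
      if (remaining == 0)
        {
          /* No thread can still touch p: every user held a reference. */
          pthread_cond_destroy (&p->cond);
          pthread_mutex_destroy (&p->lock);
          free (p);
        }
    }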

-- 
Richard Braun



Re: race condition destroying condition variables

2017-12-19 Thread Samuel Thibault
Brent W. Baccala, on mar. 19 déc. 2017 00:08:44 -0500, wrote:
> Looks like there's a race condition when we destroy a condition variable.
> My understanding of the expected behavior is that once all the threads
> have been signaled (i.e., pthread_cond_broadcast is called), the
> condition variable can be safely destroyed with pthread_cond_destroy.

Err, I don't think that POSIX allows one to assume that. The fact that
pthread_cond_broadcast has returned doesn't mean that other threads have
finished with pthread_cond_wait.

Samuel