On Fri, Jan 28, 2011 at 09:32:42AM -0700, Eric Blake wrote:
> On 01/24/2011 08:13 AM, Daniel P. Berrange wrote:
> > Some functionality run in virExec hooks may do I/O which
> > can trigger SIGPIPE. Renable SIGPIPE blocking around the
> > hook function
> 
> > * src/util/util.c: Block SIGPIPE around hooks
> > -    if (hook)
> > +    if (hook) {
> > +        /* virFork reset all signal handlers to the defaults.
> > +         * This is good for the child process, but our hook
> > +         * risks running something that generates SIGPIPE,
> > +         * so we need to temporarily block that again
> > +         */
> > +        struct sigaction waxon, waxoff;
> 
> Cute.
> 
> > +        waxoff.sa_handler = SIG_IGN;
> > +        waxoff.sa_flags = 0;
> > +        memset(&waxon, 0, sizeof(waxon));
> > +        if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) {
> > +            virReportSystemError(errno, "%s",
> > +                                 _("Could not disable SIGPIPE"));
> 
> Yikes.  We have a potential deadlock problem.  See this bug report
> against GNU sort:
> 
> http://lists.gnu.org/archive/html/coreutils/2011-01/msg00085.html
> 
> In sort, any program that mixes pthread_create with fork (and libvirt
> falls into that category) must obey this section of POSIX:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
> 
> "If a multi-threaded process calls fork(), the new process shall contain
> a replica of the calling thread and its entire address space, possibly
> including the states of mutexes and other resources. Consequently, to
> avoid errors, the child process may only execute async-signal-safe
> operations until such time as one of the exec functions is called."
> 
> malloc() (which is called by virReportSystemError(), as well as by _())
> is NOT async-signal-safe; therefore, it is quite possible that we fork()
> in one thread while another thread is in the middle of holding the
> malloc() mutex, and the child process will deadlock because it no longer
> has a secondary thread available to release the malloc() mutex.
> 
> Ultimately, we need to refactor and audit the code so that only
> async-signal-safe functions are allowed between fork() and exec(); which
> means that virExec needs to be taught how to hand all errors back to the
> parent over a secondary pipe for the parent to issue (rather than the
> child attempting to issue any errors on its own).
> 
> However, that problem is pre-existing; so your patch, while adding
> another instance of a violation, is not adding a regression, so:

Hmm, we have had this style of problem before, but with libvirt
internal APIs. eg, this is why we do virLogLock() and Unlock
across the fork() call.

I didn't occur to me that we could get hit at the POSIX level
with this. This is a collosal PITA because we do quite alot
of work inbetween fork+exec() in QEMU, including calling out
to library APIs we don't control to the extent that I doubt
we can practically audit it, or easily address it :-(

Daniel

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Reply via email to