On Fri, Jan 28, 2011 at 09:32:42AM -0700, Eric Blake wrote: > On 01/24/2011 08:13 AM, Daniel P. Berrange wrote: > > Some functionality run in virExec hooks may do I/O which > > can trigger SIGPIPE. Renable SIGPIPE blocking around the > > hook function > > > * src/util/util.c: Block SIGPIPE around hooks > > - if (hook) > > + if (hook) { > > + /* virFork reset all signal handlers to the defaults. > > + * This is good for the child process, but our hook > > + * risks running something that generates SIGPIPE, > > + * so we need to temporarily block that again > > + */ > > + struct sigaction waxon, waxoff; > > Cute. > > > + waxoff.sa_handler = SIG_IGN; > > + waxoff.sa_flags = 0; > > + memset(&waxon, 0, sizeof(waxon)); > > + if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { > > + virReportSystemError(errno, "%s", > > + _("Could not disable SIGPIPE")); > > Yikes. We have a potential deadlock problem. See this bug report > against GNU sort: > > http://lists.gnu.org/archive/html/coreutils/2011-01/msg00085.html > > In sort, any program that mixes pthread_create with fork (and libvirt > falls into that category) must obey this section of POSIX: > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html > > "If a multi-threaded process calls fork(), the new process shall contain > a replica of the calling thread and its entire address space, possibly > including the states of mutexes and other resources. Consequently, to > avoid errors, the child process may only execute async-signal-safe > operations until such time as one of the exec functions is called." > > malloc() (which is called by virReportSystemError(), as well as by _()) > is NOT async-signal-safe; therefore, it is quite possible that we fork() > in one thread while another thread is in the middle of holding the > malloc() mutex, and the child process will deadlock because it no longer > has a secondary thread available to release the malloc() mutex. > > Ultimately, we need to refactor and audit the code so that only > async-signal-safe functions are allowed between fork() and exec(); which > means that virExec needs to be taught how to hand all errors back to the > parent over a secondary pipe for the parent to issue (rather than the > child attempting to issue any errors on its own). > > However, that problem is pre-existing; so your patch, while adding > another instance of a violation, is not adding a regression, so:
Hmm, we have had this style of problem before, but with libvirt internal APIs. eg, this is why we do virLogLock() and Unlock across the fork() call. I didn't occur to me that we could get hit at the POSIX level with this. This is a collosal PITA because we do quite alot of work inbetween fork+exec() in QEMU, including calling out to library APIs we don't control to the extent that I doubt we can practically audit it, or easily address it :-( Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list