On Mon, Mar 30, 2015 at 8:31 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, Mar 18, 2015 at 11:43 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >> I think I figured out the problem. That fix only helps in the case
> >> where the postmaster noticed the new registration previously but
> >> didn't start the worker, and then later notices the termination.
> >> What's much more likely to happen is that the worker is started and
> >> terminated so quickly that both happen before we create a
> >> RegisteredBgWorker for it. The attached patch fixes that case, too.
> >
> > Patch fixes the problem and now for Rescan, we don't need to Wait
> > for workers to finish.
>
> I realized that there is a problem with this. If an error occurs in
> one of the workers just as we're deciding to kill them all, then the
> error won't be reported.
We are sending SIGTERM to the worker to terminate it, so if the error
occurs before the signal is received, it should still be sent to the
master backend. Am I missing something here?

> Also, the new code to propagate
> XactLastRecEnd won't work right, either.

Since we generate a FATAL error on termination of the worker
(bgworker_die()), won't this be handled in the AbortTransaction path by
the code below from the parallel-mode patch?

+	if (!parallel)
+		latestXid = RecordTransactionAbort(false);
+	else
+	{
+		latestXid = InvalidTransactionId;
+
+		/*
+		 * Since the parallel master won't get our value of XactLastRecEnd in this
+		 * case, we nudge WAL-writer ourselves in this case. See related comments in
+		 * RecordTransactionAbort for why this matters.
+		 */
+		XLogSetAsyncXactLSN(XactLastRecEnd);
+	}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com