[HACKERS] relaying errors from background workers

Robert Haas Wed, 21 May 2014 21:22:22 -0700

Suppose a user backend starts a background worker for some purpose;
the background worker dies with an error.  The infrastructure we have
today is sufficient for the user backend to discover that the worker
backend has died, but not why.  There might be an error in the server
log, but the error information won't be transmitted back to the user
backend in any way.  I think it would be nice to fix that.  I also
think it would be nice to be able to relay not only errors, but also
messages logged via ereport() or elog() at lower log levels (WARNING,
NOTICE, INFO, DEBUG).


The design I have in mind is to teach elog.c how to write such
messages to a shm_mq.  This is in fact one of the major use cases I
had in mind when I designed the shm_mq infrastructure, because it
seems to me that almost anything we want to do in parallel is likely
to want to do this.  Even aside from parallelism, it's not too hard to
imagine wanting to use background workers to launch a job in the
background and then come back later and see what happened.  If there
was an error, you're going to want to know specifically what went
wrong, not just that something went wrong.

The main thing I'm not sure about is how to format the message that we
write to the shm_mq.  One option is to try to use the good old FEBE
protocol.  This doesn't look entirely straightforward, because
send_message_to_frontend() assembles the message using pq_sendbyte()
and pq_sendstring(), and then sends it out to the client using
pq_endmessage() and pq_flush().  This infrastructure assumes that the
connection to the frontend is a socket.  It doesn't seem impossible to
kludge that infrastructure to be able to send to either a socket or a
shm_mq, but I'm not sure whether it's a good idea.  Alternatively, we
could devise some other message format specific to this problem; it
would probably look a lot like an ErrorResponse protocol message, but
maybe that doesn't really matter.  Any thoughts?
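
To make the "looks a lot like FEBE" option concrete, here is a minimal
sketch of laying out the error fields the way the body of an
ErrorResponse message does: a one-byte field code, a NUL-terminated
string, repeated, then a single zero byte as terminator.  The field
codes 'S', 'C', 'M' and 'W' are the protocol's; RelayedError,
FieldWriter, put_field() and relay_encode() are hypothetical names
invented for illustration, and the plain char buffer merely stands in
for whatever would actually be handed to the shm_mq.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the fields elog.c tracks. */
typedef struct RelayedError
{
    const char *severity;   /* e.g. "ERROR" */
    const char *sqlstate;   /* five-character SQLSTATE */
    const char *message;    /* primary message text */
    const char *context;    /* context lines, or NULL if none */
} RelayedError;

/* Hypothetical bounded writer over a caller-supplied buffer. */
typedef struct FieldWriter
{
    char   *buf;
    size_t  cap;
    size_t  len;
    int     overflow;
} FieldWriter;

/* Append one field: code byte, then the string including its NUL. */
static void
put_field(FieldWriter *w, char code, const char *val)
{
    size_t n;

    if (val == NULL || w->overflow)
        return;                 /* absent fields are simply omitted */
    n = strlen(val) + 1;
    if (w->len + 1 + n > w->cap)
    {
        w->overflow = 1;
        return;
    }
    w->buf[w->len++] = code;
    memcpy(w->buf + w->len, val, n);
    w->len += n;
}

/*
 * Encode e into buf.  Returns the number of bytes written, or 0 if
 * buf is too small (a valid encoding is always at least one byte).
 */
size_t
relay_encode(const RelayedError *e, char *buf, size_t cap)
{
    FieldWriter w = {buf, cap, 0, 0};

    put_field(&w, 'S', e->severity);
    put_field(&w, 'C', e->sqlstate);
    put_field(&w, 'M', e->message);
    put_field(&w, 'W', e->context);
    if (w.overflow || w.len + 1 > cap)
        return 0;
    buf[w.len++] = '\0';        /* terminator: no more fields */
    return w.len;
}
```

Whether the real thing should reuse the pq_* routines or carry its own
encoder like this is exactly the open question; the sketch only shows
that the layout itself is simple.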

A third alternative is to say, OK, we really ought to have an actual
socket connection between those backends, so that using FEBE just
works.  I don't think that's a good idea.  It would require passing a
socket descriptor from the user backend up to the postmaster and then
back down to the background worker, or else using some kind of FIFO.
That's a set of portability problems I'd rather not deal with.  I
think the shm_mq infrastructure is also better in that it gives us the
ability to use a very large queue size if, for example, we discover
that we need that in order to avoid having the client block because
the queue is full.  Socket buffer sizes can be adjusted at the OS
level, of course, but the details are different on every platform and
the upper limits tend not to be too large.

Another thing to think about is that, most likely, many users of the
background worker facility will want to respond to a relayed error by
rethrowing it.  That means that whatever format we use to send the
error from one process to the other has to be able to be decoded by
the receiving process.  That process will probably want to do
something like add a bit more to the context (e.g. "in background
worker PID %d") and throw the resulting error preserving the rest of
the original fields.  I'm not sure exactly what makes sense here, but
the point is that ideally the message format should be something that
the receiver can rethrow, possibly after tweaking it a bit.
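
As a rough illustration of the receiving side, the sketch below
decodes a buffer of (field code, NUL-terminated string) pairs ending
in a zero byte, which is an assumed wire layout modeled on
ErrorResponse, and then builds an extended context string.
DecodedError, relay_decode() and relay_add_context() are hypothetical
names; in a real backend the last step would presumably feed the
fields back through ereport() rather than format into a buffer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded form; pointers refer into the input buffer. */
typedef struct DecodedError
{
    const char *severity;
    const char *sqlstate;
    const char *message;
    const char *context;
} DecodedError;

/* Returns 0 on success, -1 if the buffer is truncated or malformed. */
int
relay_decode(const char *buf, size_t len, DecodedError *out)
{
    size_t off = 0;

    out->severity = out->sqlstate = NULL;
    out->message = out->context = NULL;

    while (off < len)
    {
        char        code = buf[off++];
        const char *val;
        const char *end;

        if (code == '\0')
            return 0;           /* terminator reached */
        val = buf + off;
        end = memchr(val, '\0', len - off);
        if (end == NULL)
            return -1;          /* string runs past end of buffer */
        off += (size_t) (end - val) + 1;

        switch (code)
        {
            case 'S': out->severity = val; break;
            case 'C': out->sqlstate = val; break;
            case 'M': out->message = val; break;
            case 'W': out->context = val; break;
            default:  break;    /* skip fields we don't know about */
        }
    }
    return -1;                  /* ran out of bytes before terminator */
}

/*
 * Build the context to rethrow with: the worker's own context lines
 * first, then one extra line naming the worker.  Returns 0 on
 * success, -1 if dst is too small.
 */
int
relay_add_context(const DecodedError *e, int worker_pid,
                  char *dst, size_t cap)
{
    int n;

    if (e->context != NULL)
        n = snprintf(dst, cap, "%s\nin background worker PID %d",
                     e->context, worker_pid);
    else
        n = snprintf(dst, cap, "in background worker PID %d",
                     worker_pid);
    return (n >= 0 && (size_t) n < cap) ? 0 : -1;
}
```

The severity, SQLSTATE and message pass through untouched, which is
the "preserving the rest of the original fields" part; only the
context is tweaked.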

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

