Suppose a user backend starts a background worker for some purpose; the background worker dies with an error. The infrastructure we have today is sufficient for the user backend to discover that the worker backend has died, but not why. There might be an error in the server log, but the error information won't be transmitted back to the user backend in any way. I think it would be nice to fix that. I also think it would be nice to be able to relay not only errors, but also messages logged via ereport() or elog() at lower log levels (WARNING, NOTICE, INFO, DEBUG).
The design I have in mind is to teach elog.c how to write such messages to a shm_mq. This is in fact one of the major use cases I had in mind when I designed the shm_mq infrastructure, because it seems to me that almost anything we want to do in parallel is likely to want to do this. Even aside from parallelism, it's not too hard to imagine wanting to use background workers to launch a job in the background and then come back later and see what happened. If there was an error, you're going to want to know specifically what went wrong, not just that something went wrong.
The main thing I'm not sure about is how to format the message that we write to the shm_mq. One option is to try to use the good old FEBE protocol. This doesn't look entirely straightforward, because send_message_to_frontend() assembles the message using pq_sendbyte() and pq_sendstring(), and then sends it out to the client using pq_endmessage() and pq_flush(). This infrastructure assumes that the connection to the frontend is a socket. It doesn't seem impossible to kludge that infrastructure to be able to send to either a socket or a shm_mq, but I'm not sure whether it's a good idea. Alternatively, we could devise some other message format specific to this problem; it would probably look a lot like an ErrorData protocol message, but maybe that doesn't really matter. Any thoughts?
A third alternative is to say, OK, we really ought to have an actual socket connection between those backends, so that using FEBE just works. I don't think that's a good idea. It would require passing a socket descriptor from the user backend up to the postmaster and then back down to the background worker, or else using some kind of FIFO. That's a set of portability problems I'd rather not deal with.
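To make the format question a bit more concrete, here is a rough, standalone sketch of the "FEBE-like" option: each field is written as a one-byte code followed by a NUL-terminated string, and a lone zero byte ends the message. The field codes S, C, M and W are the ones the existing ErrorResponse message uses for severity, SQLSTATE, message and context; everything else (the RelayedError struct, the relay_* function names, the fixed buffer sizes) is made up for illustration and is not existing PostgreSQL code. A real version would work on ErrorData and write into a shm_mq rather than a flat buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the few ErrorData fields we want to relay. */
typedef struct RelayedError
{
    char severity[16];
    char sqlstate[6];
    char message[128];
    char context[128];
} RelayedError;

/*
 * Serialize as a sequence of (code byte, NUL-terminated string) pairs,
 * followed by a single zero byte.  Empty fields are skipped.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t
relay_encode(const RelayedError *e, char *buf, size_t cap)
{
    const struct { char code; const char *val; } fields[] = {
        {'S', e->severity}, {'C', e->sqlstate},
        {'M', e->message},  {'W', e->context},
    };
    size_t off = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
    {
        size_t len = strlen(fields[i].val) + 1;    /* include the NUL */

        if (len == 1)
            continue;                              /* empty field */
        if (off + 1 + len + 1 > cap)               /* keep room for terminator */
            return 0;
        buf[off++] = fields[i].code;
        memcpy(buf + off, fields[i].val, len);
        off += len;
    }
    if (off >= cap)
        return 0;
    buf[off++] = '\0';                             /* end-of-message marker */
    return off;
}

/* Decode a message produced by relay_encode().  Returns 0 on success, -1 if malformed. */
int
relay_decode(const char *buf, size_t len, RelayedError *out)
{
    size_t off = 0;

    memset(out, 0, sizeof *out);
    while (off < len && buf[off] != '\0')
    {
        char        code = buf[off++];
        const char *val = buf + off;
        const char *end = memchr(val, '\0', len - off);

        if (end == NULL)
            return -1;                             /* unterminated string */
        switch (code)
        {
            case 'S': snprintf(out->severity, sizeof out->severity, "%s", val); break;
            case 'C': snprintf(out->sqlstate, sizeof out->sqlstate, "%s", val); break;
            case 'M': snprintf(out->message, sizeof out->message, "%s", val); break;
            case 'W': snprintf(out->context, sizeof out->context, "%s", val); break;
            default:  break;                       /* ignore unknown field codes */
        }
        off = (size_t) (end - buf) + 1;
    }
    return (off < len) ? 0 : -1;                   /* terminator must be present */
}

/* Append a line identifying the worker to the context before rethrowing. */
void
relay_add_worker_context(RelayedError *e, int pid)
{
    size_t used = strlen(e->context);

    snprintf(e->context + used, sizeof e->context - used,
             "%sin background worker PID %d", used > 0 ? "\n" : "", pid);
}
```

The one property I'd want from any format we pick shows up here: the receiver gets the individual fields back, not a preformatted string, so it can adjust one of them (the context, in this sketch) and leave the rest alone.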
I think the shm_mq infrastructure is also better in that it gives us the ability to use a very large queue size if, for example, we discover that we need that in order to avoid having the client block because the queue is full. Socket buffer sizes can be adjusted at the OS level, of course, but the details are different on every platform and the upper limits tend not to be too large.
Another thing to think about is that, most likely, many users of the background worker facility will want to respond to a relayed error by rethrowing it. That means that whatever format we use to send the error from one process to the other has to be something the receiving process can decode. That process will probably want to do something like add a bit more to the context (e.g. "in background worker PID %d") and throw the resulting error, preserving the rest of the original fields. I'm not sure exactly what makes sense here, but the point is that ideally the message format should be something that the receiver can rethrow, possibly after tweaking it a bit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company