On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> Hmm. Those two use cases are quite different. For message-passing, you want
> a lot of small queues, but for parallel sort, you want one huge allocation.
> I wonder if we shouldn't even try a one-size-fits-all solution.
>
> For message-passing, there isn't much need to even use dynamic shared
> memory. You could just assign one fixed-sized, single-reader multiple-writer
> queue for each backend.
True, although if the queue needs to be 1MB, or even 128kB, that would bloat the static shared-memory footprint of the server pretty significantly. And I don't know that we know that a small queue will be adequate in all cases. If you've got a worker backend feeding data back to the user backend, the size of the queue limits how far ahead of the user backend that worker can get. Big is good, because then the user backend won't stall on read, but small is also good, in case the query is cancelled or hits an error. It is far from obvious to me that one-size-fits-all is the right solution.

> For parallel sort, you'll want to utilize all the available memory and all
> CPUs for one huge sort. So all you really need is a single huge shared
> memory segment. If one process is already using that 512GB segment to do a
> sort, you do *not* want to allocate a second 512GB segment. You'll want to
> wait for the first operation to finish first. Or maybe you'll want to have
> 3-4 somewhat smaller segments in use at the same time, but not more than
> that.

This is all true, but it has basically nothing to do with parallelism. work_mem is a poor model, but I didn't invent it. Hopefully some day someone will fix it, maybe even me, but that's a separate project.

> I really think we need to do something about it. To use your earlier example
> of parallel sort, it's not acceptable to permanently leak a 512 GB segment
> on a system with 1 TB of RAM.
>
> One idea is to create the shared memory object with shm_open, and wait until
> all the worker processes that need it have attached to it. Then,
> shm_unlink() it, before using it for anything. That way the segment will be
> automatically released once all the processes close() it, or die. In
> particular, kill -9 will release it. (This is a variant of my earlier idea
> to create a small number of anonymous shared memory file descriptors in
> postmaster startup with shm_open(), and pass them down to child processes
> with fork()). I think you could use that approach with SysV shared memory as
> well, by destroying the segment with shmctl(IPC_RMID) immediately after all
> processes have attached to it.

That's a very interesting idea. I've been thinking that we needed to preserve the property that new workers could attach to the shared memory segment at any time, but that might not be necessary in all cases. We could introduce a new dsm operation that means "I promise no one else needs to attach to this segment". Further attachments would be disallowed by dsm.c regardless of the implementation in use, and dsm_impl.c would also be given a chance to perform implementation-specific operations, like shm_unlink and shmctl(IPC_RMID). This new operation, when used, would help to reduce the chance of leaks and perhaps catch other programming errors as well.

What should we call it? dsm_finalize() is the first thing that comes to mind, but I'm not sure I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
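
P.S. For anyone who hasn't used the trick Heikki describes, here is a minimal standalone sketch of the attach-then-unlink pattern using POSIX shared memory and a fork()ed "worker". The segment name and size are arbitrary placeholders, and this is only an illustration of the kernel behavior, not of how dsm.c or dsm_impl.c would actually be structured:

/*
 * Minimal sketch of the attach-then-unlink pattern (illustration only;
 * the segment name and size below are arbitrary placeholders).
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEG_NAME    "/dsm_unlink_demo"  /* placeholder name */
#define SEG_SIZE    (1024 * 1024)       /* 1MB, arbitrary */

int
main(void)
{
    /* Creator: make the segment and size it. */
    int fd = shm_open(SEG_NAME, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0 || ftruncate(fd, SEG_SIZE) < 0)
    {
        perror("shm_open/ftruncate");
        return 1;
    }

    char *base = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    if (base == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    pid_t worker = fork();

    if (worker == 0)
    {
        /*
         * "Worker": inherits the mapping across fork() here; an unrelated
         * process would instead shm_open() the same name before the
         * creator unlinks it.
         */
        strcpy(base, "hello from worker");
        _exit(0);
    }

    /*
     * Every process that needs the segment has now attached, so unlink
     * the name.  No new process can attach, and the kernel frees the
     * memory as soon as the last attached process unmaps it or dies --
     * including on kill -9.
     */
    shm_unlink(SEG_NAME);

    waitpid(worker, NULL, 0);
    printf("worker wrote: %s\n", base);

    munmap(base, SEG_SIZE);
    close(fd);
    return 0;
}

(Depending on the platform you may need to link with -lrt for shm_open/shm_unlink.)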