On 12/10/2013 07:27 PM, Noah Misch wrote:
On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
On 11/20/2013 09:58 PM, Robert Haas wrote:
On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
* As discussed in the "Something fishy happening on frogmouth" thread, I
don't like the fact that the dynamic shared memory segments will be
permanently leaked if you kill -9 postmaster and destroy the data directory.

Your test elicited different behavior for the dsm code vs. the main
shared memory segment because it involved running a new postmaster
with a different data directory but the same port number on the same
machine, and expecting that that new - and completely unrelated -
postmaster would clean up the resources left behind by the old,
now-destroyed cluster.  I tend to view that as a defect in your test
case more than anything else, but as I suggested previously, we could
potentially change the code to use something like 1000000 + (port *
100) with a forward search for the control segment identifier, instead
of using a state file, mimicking the behavior of the main shared
memory segment.  I'm not sure we ever reached consensus on whether
that was overall better than what we have now.
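
To make that concrete, here is a rough sketch of a port-derived identifier with a forward search (illustration only, not PostgreSQL code; the "/PostgreSQL.<id>" name format and the segment_is_orphaned() liveness check are made-up placeholders):

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

extern bool segment_is_orphaned(const char *name);  /* hypothetical liveness check */

int
choose_control_segment(int port, char *name, size_t namelen)
{
    unsigned long id = 1000000UL + (unsigned long) port * 100;

    for (;;)
    {
        int     fd;

        snprintf(name, namelen, "/PostgreSQL.%lu", id);
        fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd >= 0)
            return fd;          /* got a fresh segment at this identifier */
        if (errno != EEXIST)
            return -1;          /* unexpected failure */
        if (segment_is_orphaned(name))
            shm_unlink(name);   /* reclaim a segment left by a dead cluster, retry */
        else
            id++;               /* in use by someone else; search forward */
    }
}

The point of deriving the starting identifier from the port number is that a new postmaster on the same port starts its search at the same place, which is how the main shared memory segment's key search already behaves.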

I really think we need to do something about it. To use your earlier
example of parallel sort, it's not acceptable to permanently leak a 512
GB segment on a system with 1 TB of RAM.

I don't.  Erasing your data directory after an unclean shutdown voids any
expectations for a thorough, automatic release of system resources.  Don't do
that.  The next time some new use of a persistent resource violates your hope
for this scenario, there may be no remedy.

Well, the point of erasing the data directory is to release system resources. I would normally expect "killall -9 <process>; rm -rf <data dir>" to thoroughly get rid of the running program and all the resources it holds. It's surprising enough that the regular shared memory segment is left behind, but at least that one gets cleaned up when you start a new server on the same port. Let's not add more cases like that, if we can avoid it.

BTW, what if the data directory is seriously borked, and the server won't start? Sure, don't do that, but it would be nice to have a way to recover if you do anyway. (docs?)

One idea is to create the shared memory object with shm_open, and wait
until all the worker processes that need it have attached to it. Then,
shm_unlink() it, before using it for anything. That way the segment will
be automatically released once all the processes close() it, or die. In
particular, kill -9 will release it. (This is a variant of my earlier
idea to create a small number of anonymous shared memory file
descriptors in postmaster startup with shm_open(), and pass them down to
child processes with fork()). I think you could use that approach with
SysV shared memory as well, by destroying the segment with
shmctl(IPC_RMID) immediately after all processes have attached to it.
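
For illustration, a rough sketch of that create/attach/unlink lifecycle (not actual dsm.c code; the segment name, size, and the wait_for_all_workers_attached() barrier are hypothetical placeholders):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SEG_NAME "/example.parallel.seg"   /* hypothetical name */
#define SEG_SIZE (16 * 1024 * 1024)

extern void wait_for_all_workers_attached(void);   /* hypothetical barrier */

void *
create_segment(void)
{
    int     fd;
    void   *addr;

    fd = shm_open(SEG_NAME, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, SEG_SIZE) < 0)
    {
        close(fd);
        shm_unlink(SEG_NAME);
        return NULL;
    }

    addr = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping keeps the memory alive */
    if (addr == MAP_FAILED)
    {
        shm_unlink(SEG_NAME);
        return NULL;
    }

    /* Block until every worker has shm_open()'d and mmap()'d the segment. */
    wait_for_all_workers_attached();

    /*
     * Remove the name.  The memory now exists only as long as some process
     * keeps it mapped, so even kill -9 of all the processes releases it.
     */
    shm_unlink(SEG_NAME);

    return addr;
}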

That leaves a window in which we still leak the segment,

A small window is better than a large one.

Another refinement is to wait for all the processes to attach before setting the segment's size with ftruncate(). That way, when the window is open for leaking the segment, it's still 0-sized so leaking it is not a big deal.
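
A sketch of that refined ordering, reusing the hypothetical names from the sketch above; the only change is that ftruncate() is deferred until all the workers have attached:

void *
create_segment_deferred_size(void)
{
    int     fd;
    void   *addr;

    fd = shm_open(SEG_NAME, O_CREAT | O_EXCL | O_RDWR, 0600);  /* 0 bytes */
    if (fd < 0)
        return NULL;

    /* Workers shm_open() the same name here, so they hold it open too. */
    wait_for_all_workers_attached();

    if (ftruncate(fd, SEG_SIZE) < 0)    /* allocate the memory only now */
    {
        close(fd);
        shm_unlink(SEG_NAME);
        return NULL;
    }

    addr = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    shm_unlink(SEG_NAME);               /* from here on, kill -9 releases it */
    return (addr == MAP_FAILED) ? NULL : addr;
}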

and it is less
general: not every use of DSM is conducive to having all processes attach in a
short span of time.

Let's cross that bridge when we get there. AFAICS it fits all the use cases discussed so far.

- Heikki


