Re: [HACKERS] Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?

2014-12-15 Thread Robert Haas
On Sat, Dec 13, 2014 at 4:13 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 While working on BDR, I've run into a situation that I think highlights
 a limitation of the dynamic bgworker API that should be fixed.
 Specifically, when the postmaster crashes and restarts shared memory
 gets cleared but dynamic bgworkers don't get unregistered, and that's a
 mess.

I've noticed this as well.  What I was thinking of proposing is that
we change things so that a BGW_NEVER_RESTART worker is unregistered
when a crash-and-restart cycle happens, but workers with any other
restart time are retained. What's happened to me a few times is that
the database crashes after registering BGW_NO_RESTART workers but
before the postmaster launches them; the postmaster fires them up
after completing the crash-and-restart cycle, but by then the dynamic
shared memory segments they are supposed to map are gone, so they just
start up uselessly and then die.

 The latest BDR extension has a single static bgworker registered at
 shared_preload_libraries time. This worker launches one dynamic bgworker
 per database. Those dynamic bgworkers are in turn responsible for
 launching workers for each connection to another node in the mesh
 topology (and for various other tasks). They communicate via shared
 memory blocks, where each worker has an offset into a shared memory array.

 That's all fine until the postmaster crashes and restarts, zeroing
 shared memory. The dynamic background workers are killed by the
 postmaster, but *not* unregistered. Workers only get unregistered if
 they exit with exit code 0, which isn't the case when they're killed, or
 when their restart interval is BGW_NO_RESTART .

Maybe it would be best to make the per-database workers BGW_NO_RESTART
and have the static bgworker, rather than the postmaster, be
responsible for starting them.  Then the fix mentioned above would
suffice.

If that's not good for some reason, my second choice is adding a
BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
less cumbersome than your other proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?

2014-12-15 Thread Craig Ringer
On 12/16/2014 12:12 AM, Robert Haas wrote:
 On Sat, Dec 13, 2014 at 4:13 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 While working on BDR, I've run into a situation that I think highlights
 a limitation of the dynamic bgworker API that should be fixed.
 Specifically, when the postmaster crashes and restarts shared memory
 gets cleared but dynamic bgworkers don't get unregistered, and that's a
 mess.
 
 I've noticed this as well.  What I was thinking of proposing is that
 we change things so that a BGW_NEVER_RESTART worker is unregistered
 when a crash-and-restart cycle happens, but workers with any other
 restart time are retained.

Personally I need workers that get restarted, but are discarded on
crash. They access shared memory, so when shmem is cleared I need them
to be discarded too, but otherwise I wish them to be persistent
until/unless they're intentionally unregistered.

If I have to use BGW_NO_RESTART then I land up having to implement
monitoring of worker status and manual re-registration in a supervisor
static worker. Which is a pain, since it's duplicating work the
postmaster would otherwise just be doing for me.

I'd really rather a separate flag.

 Maybe it would be best to make the per-database workers BGW_NO_RESTART
 and have the static bgworker, rather than the postmaster, be
 responsible for starting them.  Then the fix mentioned above would
 suffice.

Yeah... it would, but it involves a bunch of code that duplicates
process management the postmaster already does.

More importantly, if the supervisor worker crashes / is killed it loses
its handles to the other workers and the signals they send no longer go
to the right worker. So it's not robust.

 If that's not good for some reason, my second choice is adding a
 BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
 less cumbersome than your other proposal.

That'd be my preference.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?

2014-12-15 Thread Robert Haas
On Mon, Dec 15, 2014 at 11:28 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 If that's not good for some reason, my second choice is adding a
 BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
 less cumbersome than your other proposal.

 That'd be my preference.

OK, let's do that, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?

2014-12-15 Thread Craig Ringer
On 12/16/2014 12:31 AM, Robert Haas wrote:
 On Mon, Dec 15, 2014 at 11:28 AM, Craig Ringer cr...@2ndquadrant.com wrote:
  If that's not good for some reason, my second choice is adding a
  BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
  less cumbersome than your other proposal.
 
  That'd be my preference.
 OK, let's do that, then.

Right-o.

I had an earlier patch that added unregistration on exit(0) and also
added a flag like this. Only the first part got committed. I'll
resurrect it and rebase it on top of master.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers