Hi all While working on BDR, I've run into a situation that I think highlights a limitation of the dynamic bgworker API that should be fixed. Specifically, when the postmaster crashes and restarts shared memory gets cleared but dynamic bgworkers don't get unregistered, and that's a mess.
I have a few suggestions and would like your thoughts / preferences. If I don't hear anything I'll post a patch to add a BGW_UNREGISTER_ON_POSTMASTER_RESTART flag soon. Details: I think we need one of: * An API to enumerate registered BackgroundWorker s and get their BackgroundWorkerHandle s so a restarting extension can clean up its old workers from before the restart by killing and unregistering them using their handles; * Make background workers unconditionally unregistered on postmaster restart. This would probably have been the correct design, but I think it's too late to change as an unconditional default now. * Add a BGW_UNREGISTER_ON_POSTMASTER_CRASH (or whatever) flag that lets extensions tell the postmaster to discard a dynamic bgworker when it restarts and clears shmem. There was some earlier discussion (see below) on that when unregistration on exit code 0 got added, but it didn't make it in the final patch. An API to enumerate bgworkers seems pretty easy to add, and generally useful. The worker lib + funcname + argument + name should be sufficient to identify a worker usefully. Worth having? When it came up before it was bounced because we were too close to releasing 9.4 (heh) for proper API discussion and review. Overall I'd prefer to have a simple flag to unregister workers on postmaster restart, but I think an enumeration API could be generally a useful thing. Note that in the prior thread related to this: http://www.postgresql.org/message-id/534e250f.2060...@2ndquadrant.com I was at that time passing pointers into postmaster-allocated memory directly to bgworkers. That's no longer the case. The same problem exists when you pass an offset into a shared memory array if you can't guarantee that the array is always initialized with the same entries in the same order when the postmaster restarts, though. See in particular: http://www.postgresql.org/message-id/20140416112144.gd17...@awork2.anarazel.de --- Explanation --- The latest BDR extension has a single static bgworker registered at shared_preload_libraries time. This worker launches one dynamic bgworker per database. Those dynamic bgworkers are in turn responsible for launching workers for each connection to another node in the mesh topology (and for various other tasks). They communicate via shared memory blocks, where each worker has an offset into a shared memory array. That's all fine until the postmaster crashes and restarts, zeroing shared memory. The dynamic background workers are killed by the postmaster, but *not* unregistered. Workers only get unregistered if they exit with exit code 0, which isn't the case when they're killed, or when their restart interval is BGW_NO_RESTART . This means that when the workers start they try to access their shared memory blocks via the offsets passed as arguments in the BackgroundWorker struct and find that their shared memory block is zeroed out, so they can do nothing but exit. Or worse, their shared memory block might've since been initialized with data for some other unrelated worker launched after the postmaster restart. Meanwhile the static worker that manages BDR has been restarted and has to set up shared memory and register workers. It doesn't have any way of knowing which workers from before the postmaster restart were registered or what their offsets into shared memory are. So it has to launch a new batch of workers, but it has no way to stop the old workers from seeing the new workers' shared memory blocks as their own. To work around this, at Andres's suggestion I added a "worker generation number" as a postmaster var that gets copied into the BDR shared memory control segment at startup. The generation number is passed to workers as the high 16 bits of their int32 worker argument, with the low 16 bits being their offset into the shared memory array. Workers check the generation number before trying to access their shared memory blocks and proc_exit(0) if they see they're from a previous generation or if the global generation number is zero (indicating shmem got zeroed and hasn't been set up again yet). This works, but it's ugly, especially the need to jam the generation number into the worker argument. I think this problem will be common as adoption of dynamic bgworkers increases and should be fixed in 9.5 if possible. So: comments, preferences? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers