<http://www.keithf4.com>
On Tue, Apr 21, 2015 at 5:47 PM, Keith Fiske <[email protected]> wrote: > > > > <http://www.keithf4.com> > > On Fri, Apr 10, 2015 at 1:00 PM, Keith Fiske <[email protected]> wrote: > >> >> On Thu, Apr 9, 2015 at 11:56 PM, Craig Ringer <[email protected]> >> wrote: >> >>> >>> >>> On 9 April 2015 at 05:35, Keith Fiske <[email protected]> wrote: >>> >>>> I'm working on a background worker (BGW) for my pg_partman extension. >>>> I've gotten the basics of it working for my first round, but there's two >>>> features I'm missing that I'd like to add before release: >>>> >>>> 1) Only allow one instance of this BGW to run >>>> >>> >>> Load your extension in shared_preload_libraries, so that _PG_init runs >>> in the postmaster. Register a static background worker then. >>> >>> If you need one worker per database (because it needs to access the DB) >>> this won't work for you, though. What we do in BDR is have a single static >>> background worker that's launched by the postmaster, which then launches >>> and terminates per-database workers that do the "real work". >>> >>> Because of a limitation in the bgworker API in releases 9.4 and older, >>> the static worker has to connect to a database if it wants to access shared >>> catalogs like pg_database. This limitation has been lifted in 9.5 though, >>> along with the need to use the database name instead of its oid to connect >>> (which left bgworkers unable to handle RENAME DATABASE). >>> >>> (We still really need a hook on CREATE DATABASE too) >>> >>> 2) Create a bgw_terminate_partman() function to stop it more intuitively >>>> than doing a pg_cancel_backend() on the PID >>>> >>> >>> If you want it to be able to be started/stopped dynamically, you should >>> probably use RequestAddinShmemSpace to allocate a small shared memory >>> block. Use that to register the PGPROC for the current worker when the >>> worker starts, and add a boolean field you can use to ask it to terminate >>> its self. You'll also need a LWLock to protect access to the segment, so >>> you don't have races between a worker starting and the user asking to >>> cancel it, etc. >>> >>> Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot >>> store it in shared memory when it's returned by >>> RegisterDynamicBackgroundWorker() and use it to later check the worker's >>> status or ask it to exit. You have to use regular backend manipulation >>> functions and PGPROC instead. >>> >>> Personally, I suggest that you leave the worker as a static worker, and >>> leave it always running when the extension is active. If it isn't doing >>> anything, have it sleep on its latch, then set its latch from other >>> processes when something interesting happens. (You can put the process >>> latch from PGPROC into your shmem segment so you can set it from elsewhere, >>> or allocate a new latch). >>> >>> This is my first venture into writing C code for postgres, so I'm not >>>> familiar with a lot of the internals yet. I read >>>> http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it >>>> mentioning how you can check the status of a BGW launched dynamically and >>>> the function to terminate one, but I'm not clear how how you can get the >>>> information on a currently running BGW to do these things. >>>> >>> >>> You can't. It's a pretty significant limitation in the current API. >>> There's no way to enumerate bgworkers via the bgworker API, only via PGPROC. >>> >>> >>>> I used the worker_spi example for a lot of this, so if there's any >>>> additional guidance for a better way to do what I've done, I'd appreciate >>>> it. All I really have it doing now is calling the run_maintenance() >>>> function at a defined interval and don't need it doing more than that yet. >>>> <http://www.keithf4.com> >>>> >>> >>> The BDR project has an extension with much more in-depth use of >>> background workers, but it's probably *too* complicated. We have a static >>> bgworker that launches and terminates dynamic bgworkers (per-database) that >>> in turn launch and terminate more dynamic background workers >>> (per-connection to peer databases). >>> >>> If you're interested, all the code is mirrored on github: >>> >>> https://github.com/2ndquadrant/bdr/tree/bdr-plugin/next >>> >>> and the relevant parts are: >>> >>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.c#L640 >>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_perdb.c >>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_supervisor.c >>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_shmem.c >>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401 >>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h >>> >>> ... but there's a *lot* of code there. >>> >>> -- >>> Craig Ringer http://www.2ndQuadrant.com/ >>> PostgreSQL Development, 24x7 Support, Training & Services >>> >> >> >> Craig, >> >> Thanks for the response! Definitely cleared up a lot of questions I had >> regarding how to interact with currently running BGWs. Glad to know I can >> at least stop banging my head against the desk about that. I've still got a >> lot to learn as far as how to interact with shared memory, but now that I >> know that's the path I have to go down, I'm fine with that. >> >> My current plan now after your response this this: >> >> - Statically launch master BGW with shared_preload_libraries >> - Use dynamically launched BGW for each database that pg_partman will run >> on in the cluster. My previous idea of restricting it to one BGW would >> likely have stopped it from ever working on more than one database in a >> cluster. >> - Will see if I can create a function that polls the cluster for >> currently existing databases that actually have pg_partman installed. This >> should eliminate the need for a GUC naming the databases to run for. Should >> allow handling if a database is renamed as well. This way, as soon as the >> extension is created on a database, it should hopefully "just work" and >> start managing it. >> >> 9.4 is my targeted release to support for a while, so I'll just have to >> deal with the shortcomings you mentioned. Does the above sound like it >> could work then? >> >> -- >> Keith Fiske >> Database Administrator >> OmniTI Computer Consulting, Inc. >> http://www.keithf4.com >> >> > So I've made some progress and thought I'd found a way around requiring > the use of shared memory (for now). > > https://gist.github.com/keithf4/3dbda40f7f941ada7ffe > > The idea is there will be a main BGW that is always running in a loop. At > the interval time period, the main loop will parse a GUC variable to get a > list of databases and launch a dynamic BGW for each one to run the > partition maintenance if pg_partman is installed. > > I looked up the method that the shared_preload_libraries GUC uses to parse > a csv list of values starting on line 238 with making a copy of the GUC > variable to parse out > > https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L238 > > Everything within the pg_partman_bgw_main() function can read the value > fine, all the way to the point where it assigns the dbname variable here > > https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L251 > > However, once I try and pass this variable to the dynamic BGW function, it > seems to lose access to it. An example from my postgres logs: > > 2015-04-21 17:18:57 EDT [] [6866]: [6-1] user=,db=,e=00000 LOG: starting > background worker process "pg_partman master background worker" > 2015-04-21 17:18:57 EDT [] [6866]: [7-1] user=,db=,e=00000 LOG: database > system is ready to accept connections > 2015-04-21 17:18:57 EDT [] [6875]: [1-1] user=,db=,e=00000 LOG: > pg_partman master background worker master process initialized with role > keith on database template1 > 2015-04-21 17:18:57 EDT [] [6872]: [1-1] user=,db=,e=00000 LOG: > autovacuum launcher started > 2015-04-21 17:19:17 EDT [] [6875]: [2-1] user=,db=,e=00000 DEBUG: Dynamic > bgw launch begun for keith > 2015-04-21 17:19:17 EDT [] [6866]: [8-1] user=,db=,e=00000 LOG: > registering background worker "pg_partman dynamic background worker > (dbname=keith)" > 2015-04-21 17:19:17 EDT [] [6866]: [9-1] user=,db=,e=00000 LOG: starting > background worker process "pg_partman dynamic background worker > (dbname=keith)" > 2015-04-21 17:19:17 EDT [] [6877]: [1-1] user=,db=,e=00000 DEBUG: Before > run_maint initialize connection for db @òö > 2015-04-21 17:19:17 EDT [] [6877]: [2-1] user=,db=,e=3D000 FATAL: > database " @òö " does not exist > 2015-04-21 17:19:17 EDT [] [6866]: [10-1] user=,db=,e=00000 LOG: worker > process: pg_partman dynamic background worker (dbname=keith) (PID 6877) > exited with exit code 1 > 2015-04-21 17:19:17 EDT [] [6866]: [11-1] user=,db=,e=00000 LOG: > unregistering background worker "pg_partman dynamic background worker > (dbname=keith)" > > You can see the first line of the pg_partman_bgw_run_maint() function just > sees the variable as junk > > https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L312 > > Doing some debugging, I've narrowed down the issue back to where the > pstrdup() makes a copy of the GUC variable. If I just have a single, valid > database set for pg_partman_bgw.dbname like this in postgresql.conf > > pg_partman_bgw.dbname = 'keith' > > And I then replace this line > > https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L262 > with > worker.bgw_main_arg = CStringGetDatum(pg_partman_bgw_dbname); > everything works fine > > 2015-04-21 17:40:18 EDT [] [7113]: [8-1] user=,db=,e=00000 LOG: > registering background worker "pg_partman dynamic background worker > (dbname=keith)" > 2015-04-21 17:40:18 EDT [] [7113]: [9-1] user=,db=,e=00000 LOG: starting > background worker process "pg_partman dynamic background worker > (dbname=keith)" > 2015-04-21 17:40:18 EDT [] [7127]: [1-1] user=,db=,e=00000 DEBUG: Before > run_maint initialize connection for db keith > 2015-04-21 17:40:18 EDT [] [7127]: [2-1] user=,db=,e=00000 DEBUG: After > run_maint initialize connection for db keith > 2015-04-21 17:40:18 EDT [] [7127]: [3-1] user=,db=,e=00000 DEBUG: > Checking if pg_partman extension is installed in database: keith > 2015-04-21 17:40:18 EDT [] [7127]: [4-1] user=,db=,e=00000 LOG: > pg_partman dynamic background worker (dbname=keith) dynamic background > worker initialized with role keith on database keith > 2015-04-21 17:40:18 EDT [] [7127]: [5-1] user=,db=,e=00000 LOG: > pg_partman dynamic background worker (dbname=keith): SELECT > partman.run_maintenance(p_analyze := true, p_jobmon := false) called by > role keith on database keith > 2015-04-21 17:40:18 EDT [] [7113]: [10-1] user=,db=,e=00000 LOG: worker > process: pg_partman dynamic background worker (dbname=keith) (PID 7127) > exited with exit code 0 > 2015-04-21 17:40:18 EDT [] [7113]: [11-1] user=,db=,e=00000 LOG: > unregistering background worker "pg_partman dynamic background worker > (dbname=keith)" > > Obviously I want to be able to support more than a single database so I > need a method of splitting the GUC variable. I even tried using a simpler > method a coworker showed me using strtok_r() instead of this List method, > but had the same result. All the examples in other tools I've seen for the > bgw_main_arg value have been Int32 values (including your BDR tool). Is > there any way to pass a string as the argument bgw_main_arg? > > -- > Keith Fiske > Database Administrator > OmniTI Computer Consulting, Inc. > http://www.keithf4.com > Actually, to clarify that last question, is there any way to pass a non-global string variable, or one defined in the main BGW function to another BGW? I tested passing a string I defined in pg_partman_bgw_main() to the bgw_main_arg and the same junk resulted (and yes, I'm sure I passed-by-reference). I can only guess that the reason the pg_partman_bgw_dbname GUC works is because of the context that GUC's are defined in? Keith
