<http://www.keithf4.com>
On Fri, Apr 10, 2015 at 1:00 PM, Keith Fiske <ke...@omniti.com> wrote: > > On Thu, Apr 9, 2015 at 11:56 PM, Craig Ringer <cr...@2ndquadrant.com> > wrote: > >> >> >> On 9 April 2015 at 05:35, Keith Fiske <ke...@omniti.com> wrote: >> >>> I'm working on a background worker (BGW) for my pg_partman extension. >>> I've gotten the basics of it working for my first round, but there's two >>> features I'm missing that I'd like to add before release: >>> >>> 1) Only allow one instance of this BGW to run >>> >> >> Load your extension in shared_preload_libraries, so that _PG_init runs in >> the postmaster. Register a static background worker then. >> >> If you need one worker per database (because it needs to access the DB) >> this won't work for you, though. What we do in BDR is have a single static >> background worker that's launched by the postmaster, which then launches >> and terminates per-database workers that do the "real work". >> >> Because of a limitation in the bgworker API in releases 9.4 and older, >> the static worker has to connect to a database if it wants to access shared >> catalogs like pg_database. This limitation has been lifted in 9.5 though, >> along with the need to use the database name instead of its oid to connect >> (which left bgworkers unable to handle RENAME DATABASE). >> >> (We still really need a hook on CREATE DATABASE too) >> >> 2) Create a bgw_terminate_partman() function to stop it more intuitively >>> than doing a pg_cancel_backend() on the PID >>> >> >> If you want it to be able to be started/stopped dynamically, you should >> probably use RequestAddinShmemSpace to allocate a small shared memory >> block. Use that to register the PGPROC for the current worker when the >> worker starts, and add a boolean field you can use to ask it to terminate >> its self. You'll also need a LWLock to protect access to the segment, so >> you don't have races between a worker starting and the user asking to >> cancel it, etc. >> >> Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot >> store it in shared memory when it's returned by >> RegisterDynamicBackgroundWorker() and use it to later check the worker's >> status or ask it to exit. You have to use regular backend manipulation >> functions and PGPROC instead. >> >> Personally, I suggest that you leave the worker as a static worker, and >> leave it always running when the extension is active. If it isn't doing >> anything, have it sleep on its latch, then set its latch from other >> processes when something interesting happens. (You can put the process >> latch from PGPROC into your shmem segment so you can set it from elsewhere, >> or allocate a new latch). >> >> This is my first venture into writing C code for postgres, so I'm not >>> familiar with a lot of the internals yet. I read >>> http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it >>> mentioning how you can check the status of a BGW launched dynamically and >>> the function to terminate one, but I'm not clear how how you can get the >>> information on a currently running BGW to do these things. >>> >> >> You can't. It's a pretty significant limitation in the current API. >> There's no way to enumerate bgworkers via the bgworker API, only via PGPROC. >> >> >>> I used the worker_spi example for a lot of this, so if there's any >>> additional guidance for a better way to do what I've done, I'd appreciate >>> it. All I really have it doing now is calling the run_maintenance() >>> function at a defined interval and don't need it doing more than that yet. >>> <http://www.keithf4.com> >>> >> >> The BDR project has an extension with much more in-depth use of >> background workers, but it's probably *too* complicated. We have a static >> bgworker that launches and terminates dynamic bgworkers (per-database) that >> in turn launch and terminate more dynamic background workers >> (per-connection to peer databases). >> >> If you're interested, all the code is mirrored on github: >> >> https://github.com/2ndquadrant/bdr/tree/bdr-plugin/next >> >> and the relevant parts are: >> >> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.c#L640 >> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_perdb.c >> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_supervisor.c >> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_shmem.c >> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401 >> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h >> >> ... but there's a *lot* of code there. >> >> -- >> Craig Ringer http://www.2ndQuadrant.com/ >> PostgreSQL Development, 24x7 Support, Training & Services >> > > > Craig, > > Thanks for the response! Definitely cleared up a lot of questions I had > regarding how to interact with currently running BGWs. Glad to know I can > at least stop banging my head against the desk about that. I've still got a > lot to learn as far as how to interact with shared memory, but now that I > know that's the path I have to go down, I'm fine with that. > > My current plan now after your response this this: > > - Statically launch master BGW with shared_preload_libraries > - Use dynamically launched BGW for each database that pg_partman will run > on in the cluster. My previous idea of restricting it to one BGW would > likely have stopped it from ever working on more than one database in a > cluster. > - Will see if I can create a function that polls the cluster for currently > existing databases that actually have pg_partman installed. This should > eliminate the need for a GUC naming the databases to run for. Should allow > handling if a database is renamed as well. This way, as soon as the > extension is created on a database, it should hopefully "just work" and > start managing it. > > 9.4 is my targeted release to support for a while, so I'll just have to > deal with the shortcomings you mentioned. Does the above sound like it > could work then? > > -- > Keith Fiske > Database Administrator > OmniTI Computer Consulting, Inc. > http://www.keithf4.com > > So I've made some progress and thought I'd found a way around requiring the use of shared memory (for now). https://gist.github.com/keithf4/3dbda40f7f941ada7ffe The idea is there will be a main BGW that is always running in a loop. At the interval time period, the main loop will parse a GUC variable to get a list of databases and launch a dynamic BGW for each one to run the partition maintenance if pg_partman is installed. I looked up the method that the shared_preload_libraries GUC uses to parse a csv list of values starting on line 238 with making a copy of the GUC variable to parse out https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L238 Everything within the pg_partman_bgw_main() function can read the value fine, all the way to the point where it assigns the dbname variable here https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L251 However, once I try and pass this variable to the dynamic BGW function, it seems to lose access to it. An example from my postgres logs: 2015-04-21 17:18:57 EDT [] [6866]: [6-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman master background worker" 2015-04-21 17:18:57 EDT [] [6866]: [7-1] user=,db=,e=00000 LOG: database system is ready to accept connections 2015-04-21 17:18:57 EDT [] [6875]: [1-1] user=,db=,e=00000 LOG: pg_partman master background worker master process initialized with role keith on database template1 2015-04-21 17:18:57 EDT [] [6872]: [1-1] user=,db=,e=00000 LOG: autovacuum launcher started 2015-04-21 17:19:17 EDT [] [6875]: [2-1] user=,db=,e=00000 DEBUG: Dynamic bgw launch begun for keith 2015-04-21 17:19:17 EDT [] [6866]: [8-1] user=,db=,e=00000 LOG: registering background worker "pg_partman dynamic background worker (dbname=keith)" 2015-04-21 17:19:17 EDT [] [6866]: [9-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman dynamic background worker (dbname=keith)" 2015-04-21 17:19:17 EDT [] [6877]: [1-1] user=,db=,e=00000 DEBUG: Before run_maint initialize connection for db @òö 2015-04-21 17:19:17 EDT [] [6877]: [2-1] user=,db=,e=3D000 FATAL: database " @òö " does not exist 2015-04-21 17:19:17 EDT [] [6866]: [10-1] user=,db=,e=00000 LOG: worker process: pg_partman dynamic background worker (dbname=keith) (PID 6877) exited with exit code 1 2015-04-21 17:19:17 EDT [] [6866]: [11-1] user=,db=,e=00000 LOG: unregistering background worker "pg_partman dynamic background worker (dbname=keith)" You can see the first line of the pg_partman_bgw_run_maint() function just sees the variable as junk https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L312 Doing some debugging, I've narrowed down the issue back to where the pstrdup() makes a copy of the GUC variable. If I just have a single, valid database set for pg_partman_bgw.dbname like this in postgresql.conf pg_partman_bgw.dbname = 'keith' And I then replace this line https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L262 with worker.bgw_main_arg = CStringGetDatum(pg_partman_bgw_dbname); everything works fine 2015-04-21 17:40:18 EDT [] [7113]: [8-1] user=,db=,e=00000 LOG: registering background worker "pg_partman dynamic background worker (dbname=keith)" 2015-04-21 17:40:18 EDT [] [7113]: [9-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman dynamic background worker (dbname=keith)" 2015-04-21 17:40:18 EDT [] [7127]: [1-1] user=,db=,e=00000 DEBUG: Before run_maint initialize connection for db keith 2015-04-21 17:40:18 EDT [] [7127]: [2-1] user=,db=,e=00000 DEBUG: After run_maint initialize connection for db keith 2015-04-21 17:40:18 EDT [] [7127]: [3-1] user=,db=,e=00000 DEBUG: Checking if pg_partman extension is installed in database: keith 2015-04-21 17:40:18 EDT [] [7127]: [4-1] user=,db=,e=00000 LOG: pg_partman dynamic background worker (dbname=keith) dynamic background worker initialized with role keith on database keith 2015-04-21 17:40:18 EDT [] [7127]: [5-1] user=,db=,e=00000 LOG: pg_partman dynamic background worker (dbname=keith): SELECT partman.run_maintenance(p_analyze := true, p_jobmon := false) called by role keith on database keith 2015-04-21 17:40:18 EDT [] [7113]: [10-1] user=,db=,e=00000 LOG: worker process: pg_partman dynamic background worker (dbname=keith) (PID 7127) exited with exit code 0 2015-04-21 17:40:18 EDT [] [7113]: [11-1] user=,db=,e=00000 LOG: unregistering background worker "pg_partman dynamic background worker (dbname=keith)" Obviously I want to be able to support more than a single database so I need a method of splitting the GUC variable. I even tried using a simpler method a coworker showed me using strtok_r() instead of this List method, but had the same result. All the examples in other tools I've seen for the bgw_main_arg value have been Int32 values (including your BDR tool). Is there any way to pass a string as the argument bgw_main_arg? -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com