Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Alvaro Herrera
Here's a first attempt at a new documentation chapter.  This goes in
part Server Programming, just after the SPI chapter.

I just noticed that worker_spi could use some more sample code, for
example auth_counter was getting its own LWLock and also its own shmem
area, which would be helpful to demonstrate I think.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
!-- doc/src/sgml/bgworker.sgml --

chapter id=bgworker
 titleBackground Worker Processes/title

 indexterm zone=bgworker
  primaryProcesses/primary
  secondaryAdditional/secondary
 /indexterm

 para
  PostgreSQL can be extended to run user-supplied code in separate processes.
  Such processes are started, stopped and monitored by 
commandpostgres/command,
  which permits them have a lifetime closely linked to the server's status.
  These processes have the option to attach to productnamePostgreSQL/'s
  shared memory area and connect to databases internally.
 /para

 warning
  para
   There are considerable robustness and security risks in using background
   worker processes, because them being written in the literalC/ language
   gives them unrestricted access to data.  Administrators wishing to enable
   modules that include background worker process should exercise extreme
   caution.  Only carefully audited modules should be permitted to run
   background worker processes.
  /para
 /warning

 para
  Only modules listed in varnameshared_preload_libraries/ can run
  background workers.  A module wishing to register a background worker needs
  to register it by calling
  functionRegisterBackgroundWorker(typeBackgroundWorker 
*worker/type)/function.
  The structure structnameBackgroundWorker/structname is defined thus:
programlisting
typedef struct BackgroundWorker
{
char   *bgw_name;
int bgw_flags;
BgWorkerStartTime bgw_start_time;
int bgw_restart_time;   /* in seconds, or BGW_NEVER_RESTART */
bgworker_main_type  bgw_main;
void   *bgw_main_arg;
bgworker_sighdlr_type bgw_sighup;
bgworker_sighdlr_type bgw_sigterm;
} BackgroundWorker;
/programlisting
  /para

  para
   structfieldbgw_name/ is a string to be used in log messages, process
   lists and similar contexts.
  /para

  para
   structfieldbgw_flags/ is a bitwise-or'd bitmask indicating the
   capabilities that the module would like to have.  Possible values are
   literalBGWORKER_SHMEM_ACCESS/literal (requesting shared memory access)
   and literalBGWORKER_BACKEND_DATABASE_CONNECTION/literal (requesting the
   ability to establish a database connection, through which it can later run
   transactions and queries).
  /para

  para
   structfieldbgw_start_time/structfield is the server state during which
   commandpostgres/ should start the process; it can be one of
   literalBgWorkerStart_PostmasterStart/ (start as soon as
   commandpostgres/ itself has finished its own initialization; processes
   requesting this are not eligible for database connections),
   literalBgWorkerStart_ConsistentState/ (start as soon as consistent state
   has been reached in a HOT standby, allowing processes to connect to
   databases and run read-only queries), and
   literalBgWorkerStart_RecoveryFinished/ (start as soon as the system has
   entered normal read-write state).  Note the last two values are equivalent
   in a server that's not a HOT standby.
  /para
  
  para
   structfieldbgw_restart_time/structfield is the interval, in seconds, that
   commandpostgres/command should wait before restarting the process,
   in case it crashes.  It can be any positive value, or 
literalBGW_NEVER_RESTART/literal, indicating not to restart the process in 
case of a crash.
  /para

  para
   structfieldbgw_main/structfield is a pointer to the function to run once
   the process is started.  structfieldbgw_main_arg/structfield will be
   passed to it as its only argument.  Note that
   literalMyBgworkerEntry/literal is a pointer to a copy of the
   structnameBackgroundWorker/structname structure passed
   at registration time.
  /para

  para
   structfieldbgw_sighup/structfield and structfieldbgw_sigterm/ are
   pointers to functions that will be installed as signal handlers for the new
   process.
  /para

  paraOnce running, the process can connect to a database by calling
   functionBackgroundWorkerInitializeConnection(parameterchar 
*dbname/parameter, parameterchar *username/parameter)/function.
   This allows the process to run transactions and queries using the
   literalSPI/literal interface.  If varnamedbname/ is NULL,
   the session is not connected to any particular database, but shared catalogs
   can be accessed.  If varnameusername/ is NULL, the process will run as
   the superuser created during commandinitdb/.
  /para

  para
   Signals are initially blocked when control reaches the
   structfieldbgw_main/ function, and must be unblocked by it; this is to
   allow the process to 

Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Andres Freund
On 2012-12-05 12:09:58 -0300, Alvaro Herrera wrote:
 Here's a first attempt at a new documentation chapter.  This goes in
 part Server Programming, just after the SPI chapter.

 I just noticed that worker_spi could use some more sample code, for
 example auth_counter was getting its own LWLock and also its own shmem
 area, which would be helpful to demonstrate I think.

I am not exactly renowned for my english skills, but I have made a pass
over the file made some slight changes and extended it in two places.
I've also added a comment with a question that came to my mind when
reading the docs...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
--- /tmp/bgworker.old.sgml	2012-12-05 22:12:00.220609454 +0100
+++ /tmp/bgworker.sgml	2012-12-05 22:26:11.604712943 +0100
@@ -11,16 +11,16 @@
  para
   PostgreSQL can be extended to run user-supplied code in separate processes.
   Such processes are started, stopped and monitored by commandpostgres/command,
-  which permits them have a lifetime closely linked to the server's status.
+  which permits them to have a lifetime closely linked to the server's status.
   These processes have the option to attach to productnamePostgreSQL/'s
-  shared memory area and connect to databases internally.
+  shared memory area and to connect to databases internally.
  /para
 
  warning
   para
There are considerable robustness and security risks in using background
-   worker processes, because them being written in the literalC/ language
-   gives them unrestricted access to data.  Administrators wishing to enable
+   worker processes because, being written in the literalC/ language,
+   they have unrestricted access to data.  Administrators wishing to enable
modules that include background worker process should exercise extreme
caution.  Only carefully audited modules should be permitted to run
background worker processes.
@@ -31,7 +31,8 @@
   Only modules listed in varnameshared_preload_libraries/ can run
   background workers.  A module wishing to register a background worker needs
   to register it by calling
-  functionRegisterBackgroundWorker(typeBackgroundWorker *worker/type)/function.
+  functionRegisterBackgroundWorker(typeBackgroundWorker *worker/type)/function
+  from it's function_PG_Init()/.
   The structure structnameBackgroundWorker/structname is defined thus:
 programlisting
 typedef struct BackgroundWorker
@@ -50,12 +51,12 @@
 
   para
structfieldbgw_name/ is a string to be used in log messages, process
-   lists and similar contexts.
+   listings and similar contexts.
   /para
 
   para
structfieldbgw_flags/ is a bitwise-or'd bitmask indicating the
-   capabilities that the module would like to have.  Possible values are
+   capabilities that the module wants.  Possible values are
literalBGWORKER_SHMEM_ACCESS/literal (requesting shared memory access)
and literalBGWORKER_BACKEND_DATABASE_CONNECTION/literal (requesting the
ability to establish a database connection, through which it can later run
@@ -68,33 +69,35 @@
literalBgWorkerStart_PostmasterStart/ (start as soon as
commandpostgres/ itself has finished its own initialization; processes
requesting this are not eligible for database connections),
-   literalBgWorkerStart_ConsistentState/ (start as soon as consistent state
+   literalBgWorkerStart_ConsistentState/ (start as soon as a consistent state
has been reached in a HOT standby, allowing processes to connect to
databases and run read-only queries), and
literalBgWorkerStart_RecoveryFinished/ (start as soon as the system has
entered normal read-write state).  Note the last two values are equivalent
in a server that's not a HOT standby.
   /para
-  
+
   para
structfieldbgw_restart_time/structfield is the interval, in seconds, that
-   commandpostgres/command should wait before restarting the process,
-   in case it crashes.  It can be any positive value, or literalBGW_NEVER_RESTART/literal, indicating not to restart the process in case of a crash.
+   commandpostgres/command should wait before restarting the process, in
+   case it crashes.  It can be any positive value,
+   or literalBGW_NEVER_RESTART/literal, indicating not to restart the
+   process in case of a crash.
   /para
 
   para
structfieldbgw_main/structfield is a pointer to the function to run once
the process is started.  structfieldbgw_main_arg/structfield will be
-   passed to it as its only argument.  Note that
-   literalMyBgworkerEntry/literal is a pointer to a copy of the
-   structnameBackgroundWorker/structname structure passed
-   at registration time.
+   passed to it as its only argument.  Note that the global variable
+   literalMyBgworkerEntry/literal points to a copy of the
+   structnameBackgroundWorker/structname structure passed at registration
+   time.
   /para
 
   para

Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Alvaro Herrera
Andres Freund wrote:
 On 2012-12-05 12:09:58 -0300, Alvaro Herrera wrote:
  Here's a first attempt at a new documentation chapter.  This goes in
  part Server Programming, just after the SPI chapter.
 
  I just noticed that worker_spi could use some more sample code, for
  example auth_counter was getting its own LWLock and also its own shmem
  area, which would be helpful to demonstrate I think.
 
 I am not exactly renowned for my english skills, but I have made a pass
 over the file made some slight changes and extended it in two places.

Thanks, I have applied it.

 I've also added a comment with a question that came to my mind when
 reading the docs...

para
 structfieldbgw_sighup/structfield and structfieldbgw_sigterm/ are
 pointers to functions that will be installed as signal handlers for the 
 new
 -   process.
 +   process. XXX: Can they be NULL?
/para

Hm.  The code doesn't check, so what happens is probably a bug anyhow.
I don't know whether sigaction crashes in this case; its manpage doesn't
say.  I guess the right thing to do is have RegisterBackgroundWorker
check for a NULL sighandler, and set it to something standard if so (say
SIG_IGN for SIGHUP and maybe quickdie() or similar for SIGTERM).

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Simon Riggs
On 5 December 2012 15:09, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Here's a first attempt at a new documentation chapter.  This goes in
 part Server Programming, just after the SPI chapter.

 I just noticed that worker_spi could use some more sample code, for
 example auth_counter was getting its own LWLock and also its own shmem
 area, which would be helpful to demonstrate I think.

to run once - to run when

Prefer
BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode

presumably a process will shutdown if (BgWorkerRun_InHotStandby 
!BgWorkerRun_InNormalMode)


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Alvaro Herrera
Simon Riggs wrote:
 On 5 December 2012 15:09, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
  Here's a first attempt at a new documentation chapter.  This goes in
  part Server Programming, just after the SPI chapter.
 
  I just noticed that worker_spi could use some more sample code, for
  example auth_counter was getting its own LWLock and also its own shmem
  area, which would be helpful to demonstrate I think.
 
 to run once - to run when

Thanks.

 Prefer
 BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
 BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode
 
 presumably a process will shutdown if (BgWorkerRun_InHotStandby 
 !BgWorkerRun_InNormalMode)

Hmm, no, I haven't considered that.  You mean that a bgworker that
specifies to start at BgWorkerStart_ConsistentState will stop once
normal mode is reached?  Currently they don't do that.  And since we
don't have the notion that workers stop working, it wouldn't work --
postmaster would start them back up immediately.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Andres Freund
On 2012-12-05 19:03:44 -0300, Alvaro Herrera wrote:
 Simon Riggs wrote:
  Prefer
  BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
  BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode
 
  presumably a process will shutdown if (BgWorkerRun_InHotStandby 
  !BgWorkerRun_InNormalMode)

 Hmm, no, I haven't considered that.  You mean that a bgworker that
 specifies to start at BgWorkerStart_ConsistentState will stop once
 normal mode is reached?  Currently they don't do that.  And since we
 don't have the notion that workers stop working, it wouldn't work --
 postmaster would start them back up immediately.

I personally don't see too much demand for this from a use-case
perspective. Simon, did you have anything in mind that made you ask
this?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Andres Freund
On 2012-12-05 18:42:42 -0300, Alvaro Herrera wrote:
 para
  structfieldbgw_sighup/structfield and structfieldbgw_sigterm/ 
  are
  pointers to functions that will be installed as signal handlers for the 
  new
  -   process.
  +   process. XXX: Can they be NULL?
 /para

 Hm.  The code doesn't check, so what happens is probably a bug anyhow.
 I don't know whether sigaction crashes in this case; its manpage doesn't
 say.  I guess the right thing to do is have RegisterBackgroundWorker
 check for a NULL sighandler, and set it to something standard if so (say
 SIG_IGN for SIGHUP and maybe quickdie() or similar for SIGTERM).

Afair a NULL sigaction is used to query the current handler. Which
indirectly might lead to problems due to the wrong handler being called.

Setting up SIG_IGN and quickdie in that case seems to be sensible.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-05 Thread Simon Riggs
On 5 December 2012 22:22, Andres Freund and...@2ndquadrant.com wrote:
 On 2012-12-05 19:03:44 -0300, Alvaro Herrera wrote:
 Simon Riggs wrote:
  Prefer
  BgWorkerStart_ConsistentState to be renamed to BgWorkerRun_InHotStandby
  BgWorkerStart_RecoveryFinished to be renamed to BgWorkerRun_InNormalMode
 
  presumably a process will shutdown if (BgWorkerRun_InHotStandby 
  !BgWorkerRun_InNormalMode)

 Hmm, no, I haven't considered that.  You mean that a bgworker that
 specifies to start at BgWorkerStart_ConsistentState will stop once
 normal mode is reached?  Currently they don't do that.  And since we
 don't have the notion that workers stop working, it wouldn't work --
 postmaster would start them back up immediately.

 I personally don't see too much demand for this from a use-case
 perspective. Simon, did you have anything in mind that made you ask
 this?

Just clarifying how it worked, for the docs; happy with the way it is.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-04 Thread Markus Wanner
On 12/03/2012 10:35 PM, Alvaro Herrera wrote:
 So here's version 8.  This fixes a couple of bugs and most notably
 creates a separate PGPROC list for bgworkers, so that they don't
 interfere with client-connected backends.

Nice, thanks. I'll try to review this again, shortly.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-04 Thread Alvaro Herrera
Alvaro Herrera wrote:

 One notable thing is that I had to introduce this in the postmaster
 startup sequence:
 
 /*
  * process any libraries that should be preloaded at postmaster start
  */
 process_shared_preload_libraries();
 
 /*
  * If loadable modules have added background workers, MaxBackends needs to
  * be updated.  Do so now.
  */
 // RerunAssignHook(max_connections);
 if (GetNumShmemAttachedBgworkers()  0)
 SetConfigOption(max_connections,
 GetConfigOption(max_connections, false, false),
 PGC_POSTMASTER, PGC_S_OVERRIDE);
 
 Note the intention here is to re-run the GUC assign hook for
 max_connections (hence the commented out hypothetical call to do so).
 Obviously, having to go through GetConfigOption and SetConfigOption is
 not a nice thing to do; we'll have to add some new entry point to guc.c
 for this to have a nicer interface.

After fooling with guc.c to create such a new entry point, I decided
that it looks too ugly.  guc.c is already complex enough with the API we
have today that I don't want to be seen creating a partially-duplicate
interface, even if it is going to result in simplified processing of
this one place.  If we ever get around to needing another place to
require rerunning a variable's assign_hook we can discuss it; for now it
doesn't seem worth it.

This is only called at postmaster start time, so it's not too
performance-sensitive, hence SetConfigOption( .., GetConfigOption(), ..)
seems acceptable from that POV.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Alvaro Herrera
Markus Wanner wrote:
 On 11/28/2012 03:51 PM, Alvaro Herrera wrote:
  I remember your patchset.  I didn't look at it for this round, for no
  particular reason.  I did look at KaiGai's submission from two
  commitfests ago, and also at a patch from Simon which AFAIK was never
  published openly.  Simon's patch merged the autovacuum code to make
  autovac workers behave like bgworkers as handled by his patch, just like
  you suggest.  To me it didn't look all that good; I didn't have the guts
  for that, hence the separation.  If later bgworkers are found to work
  well enough, we can port autovac workers to use this framework, but
  for now it seems to me that the conservative thing is to leave autovac
  untouched.
 
 Hm.. interesting to see how the same idea grows into different patches
 and gets refined until we end up with something considered committable.
 
 Do you remember anything in particular that didn't look good? Would you
 help reviewing a patch on top of bgworker-7 that changed autovacuum into
 a bgworker?

I'm not really that interested in it currently ... and there are enough
other patches to review.  I would like bgworkers to mature a bit more
and get some heavy real world testing before we change autovacuum.

  How are you envisioning that the requests would occur?
 
 Just like av_launcher does it now: set a flag in shared memory and
 signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).

I'm not sure how this works.  What process is in charge of setting such
a flag?

  In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
  used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
  to leak the bgworkers that registered with neither
  BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
  is there any reason to discount such extra daemon processes?
  
  No, I purposefully let those out, because those don't need a PGPROC.  In
  fact that seems to me to be the whole point of non-shmem-connected
  workers: you can have as many as you like and they won't cause a
  backend-side impact.  You can use such a worker to connect via libpq to
  the server, for example.
 
 Hm.. well, in that case, the shmem-connected ones are *not* counted. If
 I create and run an extensions that registers 100 shmem-connected
 bgworkers, I cannot connect to a system with max_connections=100
 anymore, because bgworkers then occupy all of the connections, already.

This is actually a very silly bug: it turns out that guc.c obtains the
count of workers before workers have actually registered.  So this
necessitates some reshuffling.

 Or put another way: max_connections should always be the
 max number of *client* connections the DBA wants to allow.

Completely agreed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
On 12/03/2012 03:28 PM, Alvaro Herrera wrote:
 I'm not really that interested in it currently ... and there are enough
 other patches to review.  I would like bgworkers to mature a bit more
 and get some heavy real world testing before we change autovacuum.

Understood.

 Just like av_launcher does it now: set a flag in shared memory and
 signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).
 
 I'm not sure how this works.  What process is in charge of setting such
 a flag?

The only process that currently starts background workers ... ehm ...
autovacuum workers is the autovacuum launcher. It uses the above
Postmaster Signal in autovacuum.c:do_start_autovacuum_worker() to have
the postmaster launch bg/autovac workers on demand.

(And yes, this kind of is an exception to the rule that the postmaster
must not rely on shared memory. However, these are just flags we write
atomically, see pmsignal.[ch])

I'd like to extend that, so that other processes can request to start
(pre-registered) background workers more dynamically. I'll wait for an
update of your patch, though.

 This is actually a very silly bug: it turns out that guc.c obtains the
 count of workers before workers have actually registered.  So this
 necessitates some reshuffling.

Okay, thanks.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Simon Riggs
On 3 December 2012 15:17, Markus Wanner mar...@bluegap.ch wrote:

 Just like av_launcher does it now: set a flag in shared memory and
 signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).

 I'm not sure how this works.  What process is in charge of setting such
 a flag?

 The only process that currently starts background workers ... ehm ...
 autovacuum workers is the autovacuum launcher. It uses the above
 Postmaster Signal in autovacuum.c:do_start_autovacuum_worker() to have
 the postmaster launch bg/autovac workers on demand.

My understanding was that the patch keep autovac workers and
background workers separate at this point.

Is there anything to be gained *now* from merging those two concepts?
I saw that as refactoring that can occur once we are happy it should
take place, but isn't necessary.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Alvaro Herrera
Markus Wanner wrote:
 On 12/03/2012 03:28 PM, Alvaro Herrera wrote:

  Just like av_launcher does it now: set a flag in shared memory and
  signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).
  
  I'm not sure how this works.  What process is in charge of setting such
  a flag?
 
 The only process that currently starts background workers ... ehm ...
 autovacuum workers is the autovacuum launcher. It uses the above
 Postmaster Signal in autovacuum.c:do_start_autovacuum_worker() to have
 the postmaster launch bg/autovac workers on demand.

Oh, I understand that.  My question was more about what mechanism are
you envisioning for new bgworkers.  What process would be in charge of
sending such signals?  Would it be a persistent bgworker which would
decide to start up other bgworkers based on external conditions such as
timing?  And how would postmaster know which bgworker to start -- would
it use the bgworker's name to find it in the registered workers list?

The trouble with the above rough sketch is how does the coordinating
bgworker communicate with postmaster.  Autovacuum has a very, um,
peculiar mechanism to make this work: avlauncher sends a signal (which
has a hardcoded-in-core signal number) and postmaster knows to start a
generic avworker; previously avlauncher has set things up in shared
memory, and the generic avworker knows where to look to morph into the
specific bgworker required.  So postmaster never really looks at shared
memory other than the signal number (which is the only use of shmem in
postmaster that's acceptable, because it's been carefully coded to be
robust).  This doesn't work for generic modules because we don't have a
hardcoded signal number; if two modules wanted to start up generic
bgworkers, how would postmaster know which worker-main function to call?

Now, maybe we can make this work in some robust fashion (robust
meaning we don't put postmaster at risk, which is of utmost importance;
and this in turn means don't trust anything in shared memory.)  I don't
say it's impossible; only that it needs some more thought and careful
design.

As posted, the feature is already useful and it'd be good to have it
committed soon so that others can experiment with whatever sample
bgworkers they see fit.  That will give us more insight on other API
refinements we might need.

I'm going to disappear on paternity leave, most likely later this week,
or early next week; I would like to commit this patch before that.  When
I'm back we can discuss other improvements.  That will give us several
weeks before the end of the 9.3 development period to get these in.  Of
course, other committers are welcome to improve the code in my absence.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Alvaro Herrera
Simon Riggs wrote:
 On 3 December 2012 15:17, Markus Wanner mar...@bluegap.ch wrote:

  The only process that currently starts background workers ... ehm ...
  autovacuum workers is the autovacuum launcher. It uses the above
  Postmaster Signal in autovacuum.c:do_start_autovacuum_worker() to have
  the postmaster launch bg/autovac workers on demand.
 
 My understanding was that the patch keep autovac workers and
 background workers separate at this point.

That is correct.

 Is there anything to be gained *now* from merging those two concepts?
 I saw that as refactoring that can occur once we are happy it should
 take place, but isn't necessary.

IMO it's a net loss in robustness of the autovac implementation.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
On 12/03/2012 04:27 PM, Simon Riggs wrote:
 My understanding was that the patch keep autovac workers and
 background workers separate at this point.

I agree to that, for this patch. However, that might not keep me from
trying to (re-)sumbit some of the bgworker patches in my queue. :-)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
On 12/03/2012 04:44 PM, Alvaro Herrera wrote:
 Simon Riggs wrote:
 Is there anything to be gained *now* from merging those two concepts?
 I saw that as refactoring that can occur once we are happy it should
 take place, but isn't necessary.
 
 IMO it's a net loss in robustness of the autovac implementation.

Agreed.

That's only one aspect of it, though. From the other point of view, it
would be a proof of stability for the bgworker implementation if
autovacuum worked on top of it.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
Alvaro,

in general, please keep in mind that there are two aspects that I tend
to mix and confuse: one is what's implemented and working for
Postgres-R. The other this is how I envision (parts of it) to be merged
back into Postgres, itself. Sorry if that causes confusion.

On 12/03/2012 04:43 PM, Alvaro Herrera wrote:
 Oh, I understand that.  My question was more about what mechanism are
 you envisioning for new bgworkers.  What process would be in charge of
 sending such signals?  Would it be a persistent bgworker which would
 decide to start up other bgworkers based on external conditions such as
 timing?  And how would postmaster know which bgworker to start -- would
 it use the bgworker's name to find it in the registered workers list?

Well, in the Postgres-R case, I've extended the autovacuum launcher to a
more general purpose job scheduler, named coordinator. Lacking your
bgworker patch, it is the only process that is able to launch background
workers.

Your work now allows extensions to register background workers. If I
merge the two concepts, I can easily imagine allowing other (bgworker)
processes to launch bgworkers.

Another thing I can imagine is turning that coordinator into something
that can schedule and trigger jobs registered by extensions (or even
user defined ones). Think: cron daemon for SQL jobs in the background.
(After all, that's pretty close to what the autovacuum launcher does,
already.)

 The trouble with the above rough sketch is how does the coordinating
 bgworker communicate with postmaster.

I know. I've gone through that pain.

In Postgres-R, I've solved this with imessages (which was the primary
reason for rejection of the bgworker patches back in 2010).

The postmaster only needs to starts *a* background worker process. That
process itself then has to figure out (by checking its imessage queue)
what job it needs to perform. Or if you think in terms of bgworker
types: what type of bgworker it needs to be.

 Autovacuum has a very, um,
 peculiar mechanism to make this work: avlauncher sends a signal (which
 has a hardcoded-in-core signal number) and postmaster knows to start a
 generic avworker; previously avlauncher has set things up in shared
 memory, and the generic avworker knows where to look to morph into the
 specific bgworker required.  So postmaster never really looks at shared
 memory other than the signal number (which is the only use of shmem in
 postmaster that's acceptable, because it's been carefully coded to be
 robust).

In Postgres-R, I've extended exactly that mechanism to work for other
jobs that autovacuum.

 This doesn't work for generic modules because we don't have a
 hardcoded signal number; if two modules wanted to start up generic
 bgworkers, how would postmaster know which worker-main function to call?

You've just described above how this works for autovacuum: the
postmaster doesn't *need* to know. Leave it to the bgworker to determine
what kind of task it needs to perform.

 As posted, the feature is already useful and it'd be good to have it
 committed soon so that others can experiment with whatever sample
 bgworkers they see fit.  That will give us more insight on other API
 refinements we might need.

I completely agree. I didn't ever intend to provide an alternative patch
or hold you back. (Except for the extra daemon issue, where we disagree,
but that's not a reason to keep this feature back). So please, go ahead
and commit this feature (once the issues I've mentioned in the review
are fixed).

Please consider all of these plans or ideas in here (or in Postgres-R)
as extending on your work, rather than competing against.

 I'm going to disappear on paternity leave, most likely later this week,
 or early next week; I would like to commit this patch before that.  When
 I'm back we can discuss other improvements.  That will give us several
 weeks before the end of the 9.3 development period to get these in.  Of
 course, other committers are welcome to improve the code in my absence.

Okay, thanks for sharing that. I'd certainly appreciate your inputs on
further refinements for bgworkers and/or autovacuum.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes:
 However, as you say, maybe we need more coding examples.

 Maybe a minimally usable extra daemon example? Showing how to avoid
 common pitfalls? Use cases, anybody? :-)

What about the PGQ ticker, pgqd?

  https://github.com/markokr/skytools/tree/master/sql/ticker
  https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c

Or maybe pgAgent, which seems to live there, but is in C++ so might need
a rewrite to the specs:

  
http://git.postgresql.org/gitweb/?p=pgadmin3.git;a=tree;f=pgadmin/agent;h=ebbcf71bd918efdc82466785ffac6f2ac3443847;hb=HEAD

Maybe it would be easier to have a version of GNU mcron as an extension,
with the abitity to fire PostgreSQL stored procedures directly?  (That
way the cron specific parts of the logic are already implemented)

  http://www.gnu.org/software/mcron/

Another idea would be to have a pgbouncer extension. We would still need
of course to have pgbouncer as a separate component so that client
connection can outlive a postmaster crash, but that would still be very
useful as a first step into admission control. Let's not talk about the
feedback loop and per-cluster resource usage monitoring yet, but I guess
that you can see the drift.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Pavel Stehule
2012/11/30 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Markus Wanner mar...@bluegap.ch writes:
 However, as you say, maybe we need more coding examples.

 Maybe a minimally usable extra daemon example? Showing how to avoid
 common pitfalls? Use cases, anybody? :-)

 What about the PGQ ticker, pgqd?

   https://github.com/markokr/skytools/tree/master/sql/ticker
   https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c

 Or maybe pgAgent, which seems to live there, but is in C++ so might need
 a rewrite to the specs:

   
 http://git.postgresql.org/gitweb/?p=pgadmin3.git;a=tree;f=pgadmin/agent;h=ebbcf71bd918efdc82466785ffac6f2ac3443847;hb=HEAD

 Maybe it would be easier to have a version of GNU mcron as an extension,
 with the abitity to fire PostgreSQL stored procedures directly?  (That
 way the cron specific parts of the logic are already implemented)

   http://www.gnu.org/software/mcron/

 Another idea would be to have a pgbouncer extension. We would still need
 of course to have pgbouncer as a separate component so that client
 connection can outlive a postmaster crash, but that would still be very
 useful as a first step into admission control. Let's not talk about the
 feedback loop and per-cluster resource usage monitoring yet, but I guess
 that you can see the drift.

both will be nice

Pavel


 Regards,
 --
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 11:09 AM, Dimitri Fontaine wrote:
 Markus Wanner mar...@bluegap.ch writes:
 However, as you say, maybe we need more coding examples.

 Maybe a minimally usable extra daemon example? Showing how to avoid
 common pitfalls? Use cases, anybody? :-)
 
 What about the PGQ ticker, pgqd?
 
   https://github.com/markokr/skytools/tree/master/sql/ticker
   https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c

AFAICS pgqd currently uses libpq, so I think it would rather turn into
what I call a background worker, with a connection to Postgres shared
memory. I perfectly well see use cases (plural!) for those.

What I'm questioning is the use for what I rather call extra daemons,
that is, additional processes started by the postmaster without a
connection to Postgres shared memory (and thus without a database
connection).

I was asking for a minimal example of such an extra daemon, similar to
worker_spi, showing how to properly write such a thing. Ideally showing
how to avoid common pitfalls.

 Maybe it would be easier to have a version of GNU mcron as an extension,
 with the abitity to fire PostgreSQL stored procedures directly?  (That
 way the cron specific parts of the logic are already implemented)
 
   http://www.gnu.org/software/mcron/

Again, that's something that would eventually require a database
connection. Or at least access to Postgres shared memory to eventually
start the required background jobs. (Something Alvaro's patch doesn't
implement, yet. But which exactly matches what the coordinator and
bgworkers in Postgres-R do.)

For ordinary extra daemons, I'm worried about things like an OOM killer
deciding to kill the postmaster, being its parent. Or (io)niceness
settings. Or Linux cgroups. IMO all of these things just get harder to
administrate for extra daemons, when they move under the hat of the
postmaster.

Without access to shared memory, the only thing an extra daemon would
gain by moving under postmaster is the knowledge of the start, stop
and restart (crash) events of the database. And that in a very
inflexible way: the extra process is forced to start, stop and restart
together with the database to learn about these events.

Using a normal client connection arguably is a better way to learn about
crash events - it doesn't have the above limitation. And the start and
stop events, well, the DBA or sysadmin is under control of these,
already. We can possibly improve on that, yes. But extra daemons are not
the way to go, IMO.

 Another idea would be to have a pgbouncer extension. We would still need
 of course to have pgbouncer as a separate component so that client
 connection can outlive a postmaster crash, but that would still be very
 useful as a first step into admission control. Let's not talk about the
 feedback loop and per-cluster resource usage monitoring yet, but I guess
 that you can see the drift.

Sorry, I don't. Especially not with something like pgbouncer, because I
definitely *want* to control and administrate that separately.

So I guess this is a vote to disallow `worker.bgw_flags = 0` on the
basis that everything a process, which doesn't need to access Postgres
shared memory, better does whatever it does outside of Postgres. For the
benefit of the stability of Postgres and for ease of administration of
the two.

Or, maybe, rather drop the BGWORKER_SHMEM_ACCESS flag and imply that
every registered process wants to have access to Postgres shared memory.
Then document the gotchas and requirements of living under the
Postmaster, as I've requested before. (If you want a foot gun, you can
still write an extension that doesn't need to access Postgres shared
memory, but hey.. I we can't help those who desperately try to shoot
their foot).

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Dimitri Fontaine
Markus Wanner mar...@bluegap.ch writes:
 AFAICS pgqd currently uses libpq, so I think it would rather turn into
 what I call a background worker, with a connection to Postgres shared
 memory. I perfectly well see use cases (plural!) for those.

 What I'm questioning is the use for what I rather call extra daemons,
 that is, additional processes started by the postmaster without a
 connection to Postgres shared memory (and thus without a database
 connection).

I totally missed the need to connect to shared memory to be able to
connect to a database and query it. Can't we just link the bgworkder
code to the client libpq library, just as plproxy is doing I believe?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Alvaro Herrera
Dimitri Fontaine wrote:
 Markus Wanner mar...@bluegap.ch writes:
  AFAICS pgqd currently uses libpq, so I think it would rather turn into
  what I call a background worker, with a connection to Postgres shared
  memory. I perfectly well see use cases (plural!) for those.
 
  What I'm questioning is the use for what I rather call extra daemons,
  that is, additional processes started by the postmaster without a
  connection to Postgres shared memory (and thus without a database
  connection).
 
 I totally missed the need to connect to shared memory to be able to
 connect to a database and query it. Can't we just link the bgworkder
 code to the client libpq library, just as plproxy is doing I believe?

One of the uses for bgworkers that don't have shmem connection is to
have them use libpq connections instead.  I don't really see the point
of forcing everyone to use backend connections when libpq connections
are enough.  In particular, they are easier to port from existing code;
and they make it easier to share code with systems that still have to
support older PG versions.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Andres Freund
On 2012-11-30 09:57:20 -0300, Alvaro Herrera wrote:
 Dimitri Fontaine wrote:
  Markus Wanner mar...@bluegap.ch writes:
   AFAICS pgqd currently uses libpq, so I think it would rather turn into
   what I call a background worker, with a connection to Postgres shared
   memory. I perfectly well see use cases (plural!) for those.
  
   What I'm questioning is the use for what I rather call extra daemons,
   that is, additional processes started by the postmaster without a
   connection to Postgres shared memory (and thus without a database
   connection).
 
  I totally missed the need to connect to shared memory to be able to
  connect to a database and query it. Can't we just link the bgworkder
  code to the client libpq library, just as plproxy is doing I believe?

 One of the uses for bgworkers that don't have shmem connection is to
 have them use libpq connections instead.  I don't really see the point
 of forcing everyone to use backend connections when libpq connections
 are enough.  In particular, they are easier to port from existing code;
 and they make it easier to share code with systems that still have to
 support older PG versions.

They also can get away with a lot more crazy stuff without corrupting
the database. You better know something about what youre doing before
doing something with direct shared memory access.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Dimitri Fontaine
Andres Freund and...@2ndquadrant.com writes:
 One of the uses for bgworkers that don't have shmem connection is to
 have them use libpq connections instead.  I don't really see the point
 of forcing everyone to use backend connections when libpq connections
 are enough.  In particular, they are easier to port from existing code;
 and they make it easier to share code with systems that still have to
 support older PG versions.

Exactly, I think most bgworker would just use libpq if that's available,
using a backend's infrastructure is not that good a fit here. I mean,
connect from your worker to a database using libpq and call a backend's
function (provided by the same extension I guess) in there.

That's how I think pgqd would get integrated into the worker
infrastructure, right?

 They also can get away with a lot more crazy stuff without corrupting
 the database. You better know something about what youre doing before
 doing something with direct shared memory access.

And there's a whole lot you can already do just with a C coded stored
procedure already.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Kohei KaiGai
2012/11/30 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Andres Freund and...@2ndquadrant.com writes:
 One of the uses for bgworkers that don't have shmem connection is to
 have them use libpq connections instead.  I don't really see the point
 of forcing everyone to use backend connections when libpq connections
 are enough.  In particular, they are easier to port from existing code;
 and they make it easier to share code with systems that still have to
 support older PG versions.

 Exactly, I think most bgworker would just use libpq if that's available,
 using a backend's infrastructure is not that good a fit here. I mean,
 connect from your worker to a database using libpq and call a backend's
 function (provided by the same extension I guess) in there.

 That's how I think pgqd would get integrated into the worker
 infrastructure, right?

One thing we have to pay attention is, the backend code cannot distinguish
connection from pgworker via libpq from other regular connections, from
perspective of access control.
Even if we implement major portion with C-function, do we have a reliable way
to prohibit C-function being invoked with user's query?

I also plan to use bgworker to load data chunk from shared_buffer to GPU
device in parallel, it shall be performed under the regular access control
stuff.

I think, using libpq is a good option for 95% of development, however, it
also should be possible to use SPI interface for corner case usage.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 01:40 PM, Dimitri Fontaine wrote:
 I totally missed the need to connect to shared memory to be able to
 connect to a database and query it. Can't we just link the bgworkder
 code to the client libpq library, just as plproxy is doing I believe?

Well, sure you can use libpq. But so can external processes. So that's
no benefit of running under the postmaster.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 01:59 PM, Andres Freund wrote:
 On 2012-11-30 09:57:20 -0300, Alvaro Herrera wrote:
 One of the uses for bgworkers that don't have shmem connection is to
 have them use libpq connections instead.  I don't really see the point
 of forcing everyone to use backend connections when libpq connections
 are enough.

Requiring a libpq connection is a good indication for *not* wanting the
process to run under the postmaster, IMO.

 In particular, they are easier to port from existing code;
 and they make it easier to share code with systems that still have to
 support older PG versions.
 
 They also can get away with a lot more crazy stuff without corrupting
 the database.

Exactly. That's a good reason to *not* tie that to the postmaster, then.
Please keep as much of the potentially dangerous stuff separate (and
advice developers to do so as well, instead of offering them a foot
gun). So that our postmaster can do its job. And do it reliably, without
trying to be a general purpose start/stop daemon. There are better and
well established tools for that.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Dimitri Fontaine
Kohei KaiGai kai...@kaigai.gr.jp writes:
 One thing we have to pay attention is, the backend code cannot distinguish
 connection from pgworker via libpq from other regular connections, from
 perspective of access control.
 Even if we implement major portion with C-function, do we have a reliable way
 to prohibit C-function being invoked with user's query?

Why would you want to do that? And maybe the way to enforce that would
be by having your extension do its connecting using SPI to be able to
place things in known pieces of memory for the function to check?

 I also plan to use bgworker to load data chunk from shared_buffer to GPU
 device in parallel, it shall be performed under the regular access control
 stuff.

That sounds like a job where you need shared memory access but maybe not
a database connection?

 I think, using libpq is a good option for 95% of development, however, it
 also should be possible to use SPI interface for corner case usage.

+1, totally agreed.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Alvaro Herrera
Markus Wanner wrote:
 On 11/30/2012 01:40 PM, Dimitri Fontaine wrote:
  I totally missed the need to connect to shared memory to be able to
  connect to a database and query it. Can't we just link the bgworkder
  code to the client libpq library, just as plproxy is doing I believe?
 
 Well, sure you can use libpq. But so can external processes. So that's
 no benefit of running under the postmaster.

No, it's not a benefit of that; but such a process would get started up
when postmaster is started, and shut down when postmaster stops.  So it
makes easier to have processes that need to run alongside postmaster.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Kohei KaiGai
2012/11/30 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Kohei KaiGai kai...@kaigai.gr.jp writes:
 One thing we have to pay attention is, the backend code cannot distinguish
 connection from pgworker via libpq from other regular connections, from
 perspective of access control.
 Even if we implement major portion with C-function, do we have a reliable way
 to prohibit C-function being invoked with user's query?

 Why would you want to do that? And maybe the way to enforce that would
 be by having your extension do its connecting using SPI to be able to
 place things in known pieces of memory for the function to check?

As long as SPI is an option of bgworker, I have nothing to argue here.
If the framework enforced extension performing as background worker using
libpq for database connection, a certain kind of works being tied with internal
stuff has to be implemented as C-functions. Thus, I worried about it.

 I also plan to use bgworker to load data chunk from shared_buffer to GPU
 device in parallel, it shall be performed under the regular access control
 stuff.

 That sounds like a job where you need shared memory access but maybe not
 a database connection?

Both of them are needed in this case. This io-worker will perform according
to the request from regular backend process, and fetch tuples from the heap
to the DMA buffer being on shared memory.

 I think, using libpq is a good option for 95% of development, however, it
 also should be possible to use SPI interface for corner case usage.

 +1, totally agreed.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
Alvaro,

On 11/30/2012 02:44 PM, Alvaro Herrera wrote:
 So it
 makes easier to have processes that need to run alongside postmaster.

That's where we are in respectful disagreement, then. As I don't think
that's easier, overall, but in my eye, this looks like a foot gun.

As long as things like pgbouncer, pgqd, etc.. keep to be available as
separate daemons, I'm happy, though.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 03:16 PM, Kohei KaiGai wrote:
 This feature does not enforce them to implement with this new framework.
 If they can perform as separate daemons, it is fine enough.

I'm not clear on what exactly you envision, but if a process needs
access to shared buffers, it sounds like it should be a bgworker. I
don't quite understand why that process also wants a libpq connection,
but that's certainly doable.

 But it is not all the cases where we want background workers being tied
 with postmaster's duration.

Not wanting a process to be tied to postmaster's duration is a strong
indication that it better not be a bgworker, I think.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Kohei KaiGai
2012/11/30 Markus Wanner mar...@bluegap.ch:
 On 11/30/2012 03:16 PM, Kohei KaiGai wrote:
 This feature does not enforce them to implement with this new framework.
 If they can perform as separate daemons, it is fine enough.

 I'm not clear on what exactly you envision, but if a process needs
 access to shared buffers, it sounds like it should be a bgworker. I
 don't quite understand why that process also wants a libpq connection,
 but that's certainly doable.

It seemed to me you are advocating that any use case of background-
worker can be implemented with existing separate daemon approach.
What I wanted to say is, we have some cases that background-worker
framework allows to implement such kind of extensions with more
reasonable design.

Yes, one reason I want to use background-worker is access to shared-
memory segment. Also, it want to connect databases simultaneously
out of access controls; like as autovacuum. It is a reason why I'm saying
SPI interface should be only an option, not only libpq.
(If extension choose libpq, it does not take anything special, does it?)

 But it is not all the cases where we want background workers being tied
 with postmaster's duration.

 Not wanting a process to be tied to postmaster's duration is a strong
 indication that it better not be a bgworker, I think.

It also probably means, in case when a process whose duration wants to
be tied with duration of postmaster, its author can consider to implement
it as background worker.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Review: Extra Daemons / bgworker

2012-11-28 Thread Markus Wanner
Hello Álvaro,

first of all, thank you for bringing this up again and providing a
patch. My first attempt on that was more than two years ago [1]. As the
author of a former bgworker patch, I'd like to provide an additional
review - KaiGai was simply faster to sing up as a reviewer on the
commitfest app...

I started my review is based on rev 6d7e51.. from your bgworker branch
on github, only to figure out later that it wasn't quite up to date. I
upgraded to bgworker-7.patch. I hope that's the most recent version.


# Concept

I appreciate the ability to run daemons that are not connected to
Postgres shared memory. It satisfies a user request that came up several
times when I talked about the bgworkers in Postgres-R.

Another good point is the flexible interface via extensions, even
allowing different starting points for such background workers.

One thing I'd miss (for use of that framework in Postgres-R) is the
ability to start a registered bgworker only upon request, way after the
system started. So that the bgworker process doesn't even exist until it
is really needed. I realize that RegisterBackgroundWorker() is still
necessary in advance to reserve the slots in BackgroundWorkerList and
shared memory.

As a use case independent of Postgres-R, think of something akin to a
worker_spi, but wanting that to perform a task every 24h on hundreds of
databases. You don't want to keep hundreds of processes occupying PGPROC
slots just perform a measly task every 24h.

(We've discussed the ability to let bgworkers re-connect to another
database back then. For one, you'd still have currently unneeded worker
processes around all the time. And second, I think that's hard to get
right - after all, a terminated process is guaranteed to not leak any
stale data into a newly started one, no matter what.)

From my point of view, autovacuum is the very first example of a
background worker process. And I'm a bit puzzled about it not being
properly integrated into this framework. Once you think of autovacuum as
a background job which needs access to Postgres shared memory and a
database, but no client connection, it looks like a bit of code
duplication (and not using code we already have). I realize this kind of
needs the above feature being able to request the (re)start of bgworkers
at arbitrary points in time. However, it would also be a nice test case
for the bgworker infrastructure.

I'd be happy to help with extending the current patch into that
direction, if you agree it's generally useful. Or adapt my bgworker code
accordingly.


# Minor technical issues or questions

In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
to leak the bgworkers that registered with neither
BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
is there any reason to discount such extra daemon processes?

The additional contrib modules auth_counter and worker_spi are missing
from the contrib/Makefile. If that's intentional, they should probably
be listed under Missing.

The auth_counter module leaves worker.bgw_restart_time uninitialized.

Being an example, it might make sense for auth_counter to provide a
signal that just calls SetLatch() to interrupt WaitLatch() and make
auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.

The bgw_restart_time doesn't always work (certainly not the way I'm
expecting it to). For example, if you forget to pass the
BGWORKER_SHMEM_ACCESS flag, trying to LWLockAcquire() leads to the
worker being restarted immediately and repeatedly - independent of the
bgw_restart_time setting. The same holds true if the bgworker exits with
status 0 or in case it segfaults. Not when exiting with code 1, though.
Why is that? Especially in case of a segfault or equally hard errors
that can be expected to occur repeatedly, I don't want the worker to be
restarted that frequently.


# Documentation

There are two working examples in contrib. The auth_counter could use a
header comment similar to worker_spi, quickly describing what it does.
There's no example of a plain extra daemon, without shmem access.

Coding guidelines for bgworker / extra daemon writers are missing. I
read these must not use sleep(), with an explanation in both examples.
Other questions that come to mind: what about signal handlers? fork()?
threads? Can/should it use PG_TRY/PG_CATCH or setjmp()/longjmp(). How to
best load other 3rd party libraries, i.e. for OpenGL processing?

Especially for extra daemons (i.e. bgw_flags = 0), differences to
running an external daemon should be documented. I myself am unclear
about all of the implications that running as a child of the postmaster
has (OOM and cgroups come to mind - there certainly are other aspects).
(I myself still have a hard time finding a proper use case for extra
daemons. I don't want the postmaster to turn into a general purpose
watchdog for everything and the kitchen 

Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-28 Thread Alvaro Herrera
Markus Wanner wrote:

Hi Markus,

Many thanks for your review.

 first of all, thank you for bringing this up again and providing a
 patch. My first attempt on that was more than two years ago [1]. As the
 author of a former bgworker patch, I'd like to provide an additional
 review - KaiGai was simply faster to sing up as a reviewer on the
 commitfest app...

I remember your patchset.  I didn't look at it for this round, for no
particular reason.  I did look at KaiGai's submission from two
commitfests ago, and also at a patch from Simon which AFAIK was never
published openly.  Simon's patch merged the autovacuum code to make
autovac workers behave like bgworkers as handled by his patch, just like
you suggest.  To me it didn't look all that good; I didn't have the guts
for that, hence the separation.  If later bgworkers are found to work
well enough, we can port autovac workers to use this framework, but
for now it seems to me that the conservative thing is to leave autovac
untouched.

(As an example, we introduced ilist some commits back and changed some
uses to it; but there are still many places where we're using SHM_QUEUE,
or List, or open-coded lists, which we could port to the new
infrastructure, but there's no pressing need to do it.)

 I started my review is based on rev 6d7e51.. from your bgworker branch
 on github, only to figure out later that it wasn't quite up to date. I
 upgraded to bgworker-7.patch. I hope that's the most recent version.

Sorry about that -- forgot to push to github.  bgworker-7 corresponds to
commit 0a49a540b which I have just pushed to github.

 # Concept
 
 I appreciate the ability to run daemons that are not connected to
 Postgres shared memory. It satisfies a user request that came up several
 times when I talked about the bgworkers in Postgres-R.
 
 Another good point is the flexible interface via extensions, even
 allowing different starting points for such background workers.

Great.

 One thing I'd miss (for use of that framework in Postgres-R) is the
 ability to start a registered bgworker only upon request, way after the
 system started. So that the bgworker process doesn't even exist until it
 is really needed. I realize that RegisterBackgroundWorker() is still
 necessary in advance to reserve the slots in BackgroundWorkerList and
 shared memory.
 
 As a use case independent of Postgres-R, think of something akin to a
 worker_spi, but wanting that to perform a task every 24h on hundreds of
 databases. You don't want to keep hundreds of processes occupying PGPROC
 slots just perform a measly task every 24h.

Yeah, this is something I specifically kept out initially to keep things
simple.

Maybe one thing to do in this area would be to ensure that there is a
certain number of PGPROC elements reserved specifically for bgworkers;
kind of like autovacuum workers have.  That would avoid having regular
clients exhausting slots for bgworkers, and vice versa.

How are you envisioning that the requests would occur?

 # Minor technical issues or questions
 
 In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
 used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
 to leak the bgworkers that registered with neither
 BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
 is there any reason to discount such extra daemon processes?

No, I purposefully let those out, because those don't need a PGPROC.  In
fact that seems to me to be the whole point of non-shmem-connected
workers: you can have as many as you like and they won't cause a
backend-side impact.  You can use such a worker to connect via libpq to
the server, for example.

 The additional contrib modules auth_counter and worker_spi are missing
 from the contrib/Makefile. If that's intentional, they should probably
 be listed under Missing.
 
 The auth_counter module leaves worker.bgw_restart_time uninitialized.
 
 Being an example, it might make sense for auth_counter to provide a
 signal that just calls SetLatch() to interrupt WaitLatch() and make
 auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.

KaiGai proposed that we remove auth_counter.  I don't see why not; I
mean, worker_spi is already doing most of what auth_counter is doing, so
why not?  However, as you say, maybe we need more coding examples.

 The bgw_restart_time doesn't always work (certainly not the way I'm
 expecting it to). For example, if you forget to pass the
 BGWORKER_SHMEM_ACCESS flag, trying to LWLockAcquire() leads to the
 worker being restarted immediately and repeatedly - independent of the
 bgw_restart_time setting. The same holds true if the bgworker exits with
 status 0 or in case it segfaults. Not when exiting with code 1, though.
 Why is that? Especially in case of a segfault or equally hard errors
 that can be expected to occur repeatedly, I don't want the worker to be
 restarted that frequently.

Ah, that's just a bug, of course.

-- 
Álvaro Herrera 

Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-28 Thread Markus Wanner
On 11/28/2012 03:51 PM, Alvaro Herrera wrote:
 I remember your patchset.  I didn't look at it for this round, for no
 particular reason.  I did look at KaiGai's submission from two
 commitfests ago, and also at a patch from Simon which AFAIK was never
 published openly.  Simon's patch merged the autovacuum code to make
 autovac workers behave like bgworkers as handled by his patch, just like
 you suggest.  To me it didn't look all that good; I didn't have the guts
 for that, hence the separation.  If later bgworkers are found to work
 well enough, we can port autovac workers to use this framework, but
 for now it seems to me that the conservative thing is to leave autovac
 untouched.

Hm.. interesting to see how the same idea grows into different patches
and gets refined until we end up with something considered committable.

Do you remember anything in particular that didn't look good? Would you
help reviewing a patch on top of bgworker-7 that changed autovacuum into
a bgworker?

 (As an example, we introduced ilist some commits back and changed some
 uses to it; but there are still many places where we're using SHM_QUEUE,
 or List, or open-coded lists, which we could port to the new
 infrastructure, but there's no pressing need to do it.)

Well, I usually like cleaning things up earlier rather than later (my
desk clearly being an exception to that rule, though). But yeah, new
code usually brings new bugs with it.

 Sorry about that -- forgot to push to github.  bgworker-7 corresponds to
 commit 0a49a540b which I have just pushed to github.

Thanks.

 Maybe one thing to do in this area would be to ensure that there is a
 certain number of PGPROC elements reserved specifically for bgworkers;
 kind of like autovacuum workers have.  That would avoid having regular
 clients exhausting slots for bgworkers, and vice versa.

Yeah, I think that's mandatory, anyways, see below.

 How are you envisioning that the requests would occur?

Just like av_launcher does it now: set a flag in shared memory and
signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).

(That's why I'm so puzzled: it looks like it's pretty much all there,
already. I even remember a discussion about that mechanism probably not
being fast enough to spawn bgworkers. And a proposal to add multiple
such flags, so an avlauncher-like daemon could ask for multiple
bgworkers to be started in parallel. I've even measured the serial
bgworker fork rate back then, IIRC it was in the hundreds of forks per
second...)

 # Minor technical issues or questions

 In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
 used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
 to leak the bgworkers that registered with neither
 BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
 is there any reason to discount such extra daemon processes?
 
 No, I purposefully let those out, because those don't need a PGPROC.  In
 fact that seems to me to be the whole point of non-shmem-connected
 workers: you can have as many as you like and they won't cause a
 backend-side impact.  You can use such a worker to connect via libpq to
 the server, for example.

Hm.. well, in that case, the shmem-connected ones are *not* counted. If
I create and run an extensions that registers 100 shmem-connected
bgworkers, I cannot connect to a system with max_connections=100
anymore, because bgworkers then occupy all of the connections, already.

Please add the registered shmem-connected bgworkers to the
max_connections limit. I think it's counter intuitive to have autovacuum
workers reserved separately, but extension's bgworkers eat (client)
connections. Or put another way: max_connections should always be the
max number of *client* connections the DBA wants to allow.

(Or, if that's in some way complicated, please give the DBA an
additional GUC akin to max_background_workers. That can be merged with
the current max_autovacuum_workers, once/if we rebase autovaccum onto
bgworkers).

 The additional contrib modules auth_counter and worker_spi are missing
 from the contrib/Makefile. If that's intentional, they should probably
 be listed under Missing.

 The auth_counter module leaves worker.bgw_restart_time uninitialized.

 Being an example, it might make sense for auth_counter to provide a
 signal that just calls SetLatch() to interrupt WaitLatch() and make
 auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.
 
 KaiGai proposed that we remove auth_counter.  I don't see why not; I
 mean, worker_spi is already doing most of what auth_counter is doing, so
 why not?

Agreed.

 However, as you say, maybe we need more coding examples.

Maybe a minimally usable extra daemon example? Showing how to avoid
common pitfalls? Use cases, anybody? :-)

 Ah, that's just a bug, of course.

I see. Glad my review found it.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your