I originally sent this a week ago, but there was no response and I do not
see it at:
  http://momjian.postgresql.org/cgi-bin/pgpatches
or
  http://momjian.postgresql.org/cgi-bin/pgpatches_hold
so I assume it got missed in all the excitement about the psql banner.

-----

Subject: [PATCHES] pg_dump lock timeout
Date: Sun, 11 May 2008 04:30:47 -0700

Attached is a patch to add a commandline option to pg_dump to limit how long
pg_dump will wait for locks during startup.

The intent of this patch is to allow pg_dump to fail if a table lock cannot
be taken in a reasonable time. This allows the caller of pg_dump to retry or
otherwise correct the situation, without having locks held for long periods,
and without pg_dump having a long window during which catalog changes can
occur.

It works by setting statement_timeout to the user specified delay during
the startup phase where it is taking access share locks on all the tables.
Once all the locks are taken, it sets statement_timeout back to the default.
If a lock table statement times out, the dump fails with the statement timed
out error.

The orginal motivation was a client who runs heavy batch workloads and uses
truncate table and other DML in long transactions. This has created some
unhappy interaction scenarios with pg_dump:

  - pg_dump ends up waiting hours on a DML table lock that is part of a long
    transaction. Once the lock is released, pg_dump runs only to find
    some table later in the list has been dropped. So pg_dump fails.

  - pg_dump waits on a lock while holding access share locks on most of the
    tables. Other processes that want to do DML wait on pg_dump. After a
    while, large parts of the application are blocked while pg_dump waits
    on locks. Eventually the operations staff notice that pg_dump is
    blocking production and kill the dump.

Please have a look and consider it for merging.


-----

I'll even include the patch in the original mail this time, instead of a
hurried followup.

Thanks again,

-dg

-- 
David Gould       [EMAIL PROTECTED]      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
*** pgsql/src/bin/pg_dump/pg_dump.c.orig        2008-05-11 03:23:06.000000000 
-0700
--- pgsql/src/bin/pg_dump/pg_dump.c     2008-05-11 03:44:58.000000000 -0700
***************
*** 71,76 ****
--- 71,77 ----
  bool          schemaOnly;
  bool          dataOnly;
  bool          aclsSkip;
+ const char    *lockWaitTimeout;
  
  /* subquery used to convert user ID (eg, datdba) to user name */
  static const char *username_subquery;
***************
*** 238,243 ****
--- 239,245 ----
                {"column-inserts", no_argument, NULL, 'D'},
                {"host", required_argument, NULL, 'h'},
                {"ignore-version", no_argument, NULL, 'i'},
+               {"lock-wait-timeout", required_argument, NULL, 'l'},
                {"no-reconnect", no_argument, NULL, 'R'},
                {"oids", no_argument, NULL, 'o'},
                {"no-owner", no_argument, NULL, 'O'},
***************
*** 278,283 ****
--- 280,286 ----
        strcpy(g_opaque_type, "opaque");
  
        dataOnly = schemaOnly = dumpInserts = attrNames = false;
+       lockWaitTimeout = NULL;
  
        progname = get_progname(argv[0]);
  
***************
*** 299,305 ****
                }
        }
  
!       while ((c = getopt_long(argc, argv, 
"abcCdDE:f:F:h:in:N:oOp:RsS:t:T:U:vWxX:Z:",
                                                        long_options, 
&optindex)) != -1)
        {
                switch (c)
--- 302,308 ----
                }
        }
  
!       while ((c = getopt_long(argc, argv, 
"abcCdDE:f:F:h:il:n:N:oOp:RsS:t:T:U:vWxX:Z:",
                                                        long_options, 
&optindex)) != -1)
        {
                switch (c)
***************
*** 350,355 ****
--- 353,362 ----
                                /* ignored, deprecated option */
                                break;
  
+                       case 'l':                       /* lock wait time */
+                               lockWaitTimeout = optarg;
+                               break;
+ 
                        case 'n':                       /* include schema(s) */
                                
simple_string_list_append(&schema_include_patterns, optarg);
                                include_everything = false;
***************
*** 755,760 ****
--- 762,769 ----
        printf(_("\nGeneral options:\n"));
        printf(_("  -f, --file=FILENAME      output file name\n"));
        printf(_("  -F, --format=c|t|p       output file format (custom, tar, 
plain text)\n"));
+       printf(_("  -l, --lock-wait-timeout=DELAY\n"
+                         "                           timeout and fail after 
delay waiting for a table share lock\n"));
        printf(_("  -v, --verbose            verbose mode\n"));
        printf(_("  -Z, --compress=0-9       compression level for compressed 
formats\n"));
        printf(_("  --help                   show this help, then exit\n"));
***************
*** 3191,3196 ****
--- 3200,3213 ----
        i_reltablespace = PQfnumber(res, "reltablespace");
        i_reloptions = PQfnumber(res, "reloptions");
  
+       if (lockWaitTimeout)
+       {
+               /* Abandon the dump instead of waiting forever for a table lock 
*/
+               resetPQExpBuffer(lockquery);
+               appendPQExpBuffer(lockquery, "SET statement_timeout = ");
+               appendStringLiteralConn(lockquery, lockWaitTimeout, g_conn);
+               do_sql_command(g_conn, lockquery->data);
+       }
        for (i = 0; i < ntups; i++)
        {
                tblinfo[i].dobj.objType = DO_TABLE;
***************
*** 3259,3264 ****
--- 3276,3285 ----
                                          tblinfo[i].dobj.name);
        }
  
+       if (lockWaitTimeout)
+       {
+               do_sql_command(g_conn, "SET statement_timeout = default");
+       }
        PQclear(res);
  
        /*
*** pgsql/doc/src/sgml/ref/pg_dump.sgml.orig    2008-05-11 03:38:05.000000000 
-0700
--- pgsql/doc/src/sgml/ref/pg_dump.sgml 2008-05-11 03:38:56.000000000 -0700
***************
*** 302,307 ****
--- 302,320 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>-l <replaceable 
class="parameter">wait_time</replaceable></option></term>
+       <term><option>--lock-wait-timeout=<replaceable 
class="parameter">wait_time</replaceable></option></term>
+       <listitem>
+        <para>
+         Do not wait forever for table locks at the start of the dump. Instead
+         time out and abandon the dump if unable to lock a table within the
+         specified wait time. The wait time is specified with the same formats
+         as accepted for intervals by the SET command.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-n <replaceable 
class="parameter">schema</replaceable></option></term>
        <term><option>--schema=<replaceable 
class="parameter">schema</replaceable></option></term>
        <listitem>
-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Reply via email to