On Thursday 12 February 2009 17:41:01 Peter Eisentraut wrote:
> I know we've already had a discussion on the naming of the pg_restore -m
> option, but in any case this description in pg_restore --help is confusing:
>
>   -m, --multi-thread=NUM   use this many parallel connections to restore
>
> Either it is using that many threads in the client, or it is using that
> many connections to the server. I assume the implementation does
> approximately both, but we should be clear about what we promise to the
> user. Either: Reserve this many connections on the server. Or: Reserve
> this many threads in the kernel of the client. The documentation in the
> reference/man page is equally confused.
>
> Also, the term "multi" is redundant, because whether it is multi or single
> is obviously determined by the value of NUM.

After reviewing the discussion and the implementation, I would say "workers" would be the best description of the feature, but unfortunately the options -w and -W are not available. I'd also avoid -n or -N for "num..." because pg_dump already uses -n and -N for something else, and we are now trying to avoid inconsistent options between these programs. Also, option names usually don't start with a unit or count prefix (imagine --num-shared-buffers or --num-port).

While I think "jobs" isn't a totally accurate description, I would still propose to use -j/--jobs for the option name, because it is neutral about the implementation and has a strong precedent of being used to increase parallelization to get the work done faster. I also noticed that Andrew D. used "jobs" in his own emails to comment on the feature. :-)

The attached patch also updates the documentation to give some additional advice about which numbers to use.

Index: doc/src/sgml/ref/pg_restore.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/pg_restore.sgml,v
retrieving revision 1.80
diff -u -3 -p -r1.80 pg_restore.sgml
--- doc/src/sgml/ref/pg_restore.sgml	26 Feb 2009 16:02:37 -0000	1.80
+++ doc/src/sgml/ref/pg_restore.sgml	19 Mar 2009 21:18:32 -0000
@@ -216,6 +216,46 @@
      </varlistentry>
 
      <varlistentry>
+      <term><option>-j <replaceable class="parameter">number-of-jobs</replaceable></option></term>
+      <term><option>--jobs=<replaceable class="parameter">number-of-jobs</replaceable></option></term>
+      <listitem>
+       <para>
+        Run the most time-consuming parts
+        of <application>pg_restore</> &mdash; those which load data,
+        create indexes, or create constraints &mdash; using multiple
+        concurrent jobs.  This option can dramatically reduce the time
+        to restore a large database to a server running on a
+        multi-processor machine.
+       </para>
+
+       <para>
+        Each job is one process or one thread, depending on the
+        operating system, and uses a separate connection to the
+        server.
+       </para>
+
+       <para>
+        The optimal value for this option depends on the hardware
+        setup of the server, of the client, and of the network.
+        Factors include the number of CPU cores and the disk setup.  A
+        good place to start is the number of CPU cores on the server,
+        but values larger than that can also lead to faster restore
+        times in many cases.  Of course, values that are too high will
+        lead to decreasing performance because of thrashing.
+       </para>
+
+       <para>
+        Only the custom archive format is supported with this option.
+        The input file must be a regular file (not, for example, a
+        pipe).  This option is ignored when emitting a script rather
+        than connecting directly to a database server.  Also, multiple
+        jobs cannot be used together with the
+        option <option>--single-transaction</option>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-l</option></term>
       <term><option>--list</option></term>
       <listitem>
@@ -242,28 +282,6 @@
      </varlistentry>
 
      <varlistentry>
-      <term><option>-m <replaceable class="parameter">number-of-threads</replaceable></option></term>
-      <term><option>--multi-thread=<replaceable class="parameter">number-of-threads</replaceable></option></term>
-      <listitem>
-       <para>
-        Run the most time-consuming parts of <application>pg_restore</>
-        &mdash; those which load data, create indexes, or create
-        constraints &mdash; using multiple concurrent connections to the
-        database.  This option can dramatically reduce the time to restore a
-        large database to a server running on a multi-processor machine.
-       </para>
-
-       <para>
-        This option is ignored when emitting a script rather than connecting
-        directly to a database server.  Multiple threads cannot be used
-        together with <option>--single-transaction</option>.  Also, the input
-        must be a plain file (not, for example, a pipe), and at present only
-        the custom archive format is supported.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry>
       <term><option>-n <replaceable class="parameter">namespace</replaceable></option></term>
       <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
       <listitem>
Index: src/bin/pg_dump/pg_backup.h
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.50
diff -u -3 -p -r1.50 pg_backup.h
--- src/bin/pg_dump/pg_backup.h	26 Feb 2009 16:02:37 -0000	1.50
+++ src/bin/pg_dump/pg_backup.h	19 Mar 2009 21:18:32 -0000
@@ -139,7 +139,7 @@ typedef struct _restoreOptions
 	int			suppressDumpWarnings;	/* Suppress output of WARNING entries
 										 * to stderr */
 	bool		single_txn;
-	int			number_of_threads;
+	int			number_of_jobs;
 
 	bool	   *idWanted;		/* array showing which dump IDs to emit */
 } RestoreOptions;
Index: src/bin/pg_dump/pg_backup_archiver.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.167
diff -u -3 -p -r1.167 pg_backup_archiver.c
--- src/bin/pg_dump/pg_backup_archiver.c	13 Mar 2009 22:50:44 -0000	1.167
+++ src/bin/pg_dump/pg_backup_archiver.c	19 Mar 2009 21:18:32 -0000
@@ -354,7 +354,7 @@ RestoreArchive(Archive *AHX, RestoreOpti
 	 *
 	 * In parallel mode, turn control over to the parallel-restore logic.
 	 */
-	if (ropt->number_of_threads > 1 && ropt->useDB)
+	if (ropt->number_of_jobs > 1 && ropt->useDB)
 		restore_toc_entries_parallel(AH);
 	else
 	{
@@ -3061,7 +3061,7 @@ static void
 restore_toc_entries_parallel(ArchiveHandle *AH)
 {
 	RestoreOptions *ropt = AH->ropt;
-	int			n_slots = ropt->number_of_threads;
+	int			n_slots = ropt->number_of_jobs;
 	ParallelSlot *slots;
 	int			work_status;
 	int			next_slot;
Index: src/bin/pg_dump/pg_restore.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_restore.c,v
retrieving revision 1.95
diff -u -3 -p -r1.95 pg_restore.c
--- src/bin/pg_dump/pg_restore.c	11 Mar 2009 03:33:29 -0000	1.95
+++ src/bin/pg_dump/pg_restore.c	19 Mar 2009 21:18:32 -0000
@@ -93,8 +93,8 @@ main(int argc, char **argv)
 		{"host", 1, NULL, 'h'},
 		{"ignore-version", 0, NULL, 'i'},
 		{"index", 1, NULL, 'I'},
+		{"jobs", 1, NULL, 'j'},
 		{"list", 0, NULL, 'l'},
-		{"multi-thread", 1, NULL, 'm'},
 		{"no-privileges", 0, NULL, 'x'},
 		{"no-acl", 0, NULL, 'x'},
 		{"no-owner", 0, NULL, 'O'},
@@ -146,7 +146,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "acCd:ef:F:h:iI:lL:m:n:Op:P:RsS:t:T:U:vwWxX:1",
+	while ((c = getopt_long(argc, argv, "acCd:ef:F:h:iI:j:lL:n:Op:P:RsS:t:T:U:vwWxX:1",
 							cmdopts, NULL)) != -1)
 	{
 		switch (c)
@@ -181,6 +181,10 @@ main(int argc, char **argv)
 				/* ignored, deprecated option */
 				break;
 
+			case 'j':			/* number of restore jobs */
+				opts->number_of_jobs = atoi(optarg);
+				break;
+
 			case 'l':			/* Dump the TOC summary */
 				opts->tocSummary = 1;
 				break;
@@ -189,10 +193,6 @@ main(int argc, char **argv)
 				opts->tocFile = strdup(optarg);
 				break;
 
-			case 'm':			/* number of restore threads */
-				opts->number_of_threads = atoi(optarg);
-				break;
-
 			case 'n':			/* Dump data for this schema only */
 				opts->schemaNames = strdup(optarg);
 				break;
@@ -318,9 +318,9 @@ main(int argc, char **argv)
 	}
 
 	/* Can't do single-txn mode with multiple connections */
-	if (opts->single_txn && opts->number_of_threads > 1)
+	if (opts->single_txn && opts->number_of_jobs > 1)
 	{
-		fprintf(stderr, _("%s: cannot specify both --single-transaction and multiple threads\n"),
+		fprintf(stderr, _("%s: cannot specify both --single-transaction and multiple jobs\n"),
 				progname);
 		exit(1);
 	}
@@ -417,9 +417,9 @@ usage(const char *progname)
 	printf(_("  -C, --create             create the target database\n"));
 	printf(_("  -e, --exit-on-error      exit on error, default is to continue\n"));
 	printf(_("  -I, --index=NAME         restore named index\n"));
+	printf(_("  -j, --jobs=NUM           use this many parallel jobs to restore\n"));
 	printf(_("  -L, --use-list=FILENAME  use table of contents from this file for\n"
 			 "                           selecting/ordering output\n"));
-	printf(_("  -m, --multi-thread=NUM   use this many parallel connections to restore\n"));
 	printf(_("  -n, --schema=NAME        restore only objects in this schema\n"));
 	printf(_("  -O, --no-owner           skip restoration of object ownership\n"));
 	printf(_("  -P, --function=NAME(args)\n"
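
For readers who want to try the option handling in isolation, here is a minimal standalone sketch. It is not part of the patch above. It shows how a -j/--jobs argument could be read with getopt_long() and then checked against --single-transaction, which is the same conflict the patch rejects. The names DemoOptions, parse_job_count and parse_demo_options are made up for this illustration, and the sketch validates the number with strtol() whereas the patch itself simply calls atoi().

```c
#include <errno.h>
#include <getopt.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): parse a job count that must be
 * a whole number >= 1.  Returns true and stores the value in *out.
 */
static bool
parse_job_count(const char *arg, int *out)
{
    char   *end;
    long    val;

    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || val < 1 || val > INT_MAX)
        return false;
    *out = (int) val;
    return true;
}

/* Hypothetical stand-in for the two RestoreOptions fields involved. */
typedef struct
{
    bool    single_txn;
    int     number_of_jobs;
} DemoOptions;

/* Returns 0 on success, 1 on a usage error (message goes to stderr). */
static int
parse_demo_options(int argc, char **argv, DemoOptions *opts)
{
    static const struct option cmdopts[] = {
        {"jobs", required_argument, NULL, 'j'},
        {"single-transaction", no_argument, NULL, '1'},
        {NULL, 0, NULL, 0}
    };
    int     c;

    opts->single_txn = false;
    opts->number_of_jobs = 1;
    optind = 1;                 /* start scanning at the first argument */

    while ((c = getopt_long(argc, argv, "j:1", cmdopts, NULL)) != -1)
    {
        switch (c)
        {
            case 'j':           /* number of restore jobs */
                if (!parse_job_count(optarg, &opts->number_of_jobs))
                {
                    fprintf(stderr, "%s: invalid number of jobs: \"%s\"\n",
                            argv[0], optarg);
                    return 1;
                }
                break;
            case '1':           /* restore as a single transaction */
                opts->single_txn = true;
                break;
            default:            /* getopt_long already printed a message */
                return 1;
        }
    }

    /* Same rule as in the patch: one transaction implies one connection. */
    if (opts->single_txn && opts->number_of_jobs > 1)
    {
        fprintf(stderr,
                "%s: cannot specify both --single-transaction and multiple jobs\n",
                argv[0]);
        return 1;
    }
    return 0;
}
```

Called from a main() with the program's argc and argv, parse_demo_options() would accept "-j 4" or "--jobs=4" and reject "-j 0", "-j 4x", or "-j 2 -1". Whether pg_restore itself should reject such values this strictly is a separate question that the patch does not address.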