v13 has added a proper test comparing original and restored table data
On Tue, Jan 27, 2026 at 11:43 PM Hannu Krosing <[email protected]> wrote:
>
> Hi David
>
> Thanks for reviewing.
>
> Please hold back reviewing this v12 patch until I have verified that
> it passes CfBot and until I have finished my testing with a 17TB table.
>
> I would appreciate pointers or help on adding the data correctness
> tests to the tap tests as real tests.
> For now I put them as DO$$ ... $$ blocks in the parallel test .pl and
> I check manually that the table data checksums match.
> If we add them we should add them to other places in pg_dump tests as
> well. Currently we just test that dump and restore do not fail, not
> that the restored data is correct.
>
> On Fri, Jan 23, 2026 at 3:15 AM David Rowley <[email protected]> wrote:
> >
> > On Fri, 23 Jan 2026 at 06:05, Hannu Krosing <[email protected]> wrote:
> > >
> > > Fixing all the warnings
> >
> > I think overall this needs significantly more care and precision than
> > what you've given it so far. For example, you have:
> >
> > + if(dopt->max_table_segment_pages != InvalidBlockNumber)
> > + appendPQExpBufferStr(query,
> > "pg_relation_size(c.oid)/current_setting('block_size')::int AS
> > relpages, ");
> > + else
> > + appendPQExpBufferStr(query, "c.relpages, ");
> >
> > Note that pg_class.relpages is "int". Later the code in master does:
> >
> > tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
>
> I have now fixed the base issue by changing the data type of
> TableInfo.relpages to BlockNumber, and also changing the way we get it
> by
>
> 1. converting it to unsigned int ( c.relpages::oid ) in the query
> 2. reading it from the result using strtoul()
>
> (technically it should have been enough to just use strtoul() as it
> already wraps signed ints to unsigned ones, but having it converted in
> the query seems cleaner)
>
> This allowed removing the casts to (BlockNumber) everywhere where
> .relpages was used.
>
> Functionally the value was only ever used for ordering, and even that
> loosely, which explains why patch v10 did not break anything.
>
> I also changed the data type of TocEntry.dataLength from pgoff_t to
> uint64. The current code clearly had an overflow in the case when off_t
> was 32 bit and the sum of relpages from heap and toast was larger than
> allowed for it.
>
> > If you look in vacuum.c, you'll see "pgcform->relpages = (int32)
> > num_pages;" that the value stored in relpages will be negative when
> > the table is >= 16TB (assuming 8k pages). Your pg_relation_size
> > expression is not going to produce an INT. It'll produce a BIGINT, per
> > "select pg_typeof(pg_relation_size('pg_class') /
> > current_setting('block_size')::int);". So the atoi() can receive a
> > string of digits representing an integer larger than INT_MAX in this
> > case. Looking at [1], I see:
>
> As said above, this should be fixed now by using the correct type in the
> struct and strtoul().
> To be sure I have now created a 17TB table and am running some tests on
> this as well.
> Will let you know here when done.
>
> > "7.22.1 Numeric conversion functions 1 The functions atof, atoi, atol,
> > and atoll need not affect the value of the integer expression errno on
> > an error. If the value of the result cannot be represented, *the
> > behavior is undefined.*"
> >
> > And testing locally, I see that my Microsoft compiler will just return
> > INT_MAX on overflow, whereas I see gcc does nothing to prevent
> > overflows and just continues to multiply by 10 regardless of what
> > overflows occur, which I think would just make the code work by
> > accident.
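
To spell out the parsing side of this, here is a minimal sketch in plain C
(not the v13 code; parse_pages() and its name are made up for illustration)
of reading an unsigned 32-bit page count with strtoul() and explicit checks
instead of atoi(), whose behaviour on out-of-range input is undefined:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: parse a page count into a uint32 without relying on atoi(). */
static bool
parse_pages(const char *str, uint32_t *result)
{
    char       *endptr;
    unsigned long val;

    /* strtoul() would silently wrap a leading minus sign, so reject it */
    if (*str == '-')
        return false;

    errno = 0;
    val = strtoul(str, &endptr, 10);

    /* reject empty input, trailing garbage, and values above UINT32_MAX */
    if (endptr == str || *endptr != '\0' || errno == ERANGE || val > UINT32_MAX)
        return false;

    *result = (uint32_t) val;
    return true;
}

The v13 patch below does the equivalent in option_parse_uint32() for the
command line option, and uses strtoul() when reading relpages from the
query result.
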
>
> As .relpages was only ever used for ordering parallel copies, it does
> work, just not optimally.
>
> The old code has a similar overflow/wraparound for the case when off_t
> is a 32 bit int and the sum of relpages from the heap and toast table
> is above INT_MAX.
>
> I have removed the whole part where this was partially fixed for the
> case when one of them was > 0x7fffffff (i.e. negative) by pinning the
> dataLength to INT_MAX in that case.
>
> > Aside from that, nothing in the documentation mentions that this is
> > for "heap" tables only. That should be mentioned as it'll just result
> > in people posting questions about why it's not working for some other
> > table access method. There's also not much care for white space.
> > You've introduced a bunch of whitespace changes unrelated to code
> > changes you've made, plus there's not much regard for following
> > project standard. For example, you commonly do "if(" and don't
> > consistently follow the bracing rules, e.g:
> >
> > + for(chkptr = optarg; *chkptr != '\0'; chkptr++)
> > + if(*chkptr == '-')
>
> I assumed that it is the classical "single statement -- no braces" case.
>
> Do we have a writeup of our coding standards somewhere?
>
> Now this specific case is rewritten using while() so it should be ok.
>
> > Things like the following help convey the level of care that's gone into
> > this:
> >
> > +/*
> > + * option_parse_int
> > + *
> > + * Parse integer value for an option. If the parsing is successful,
> > returns
> > + * true and stores the result in *result if that's given; if parsing fails,
> > + * returns false.
> > + */
> > +bool
> > +option_parse_uint32(const char *optarg, const char *optname,
> >
> > i.e zero effort gone in to modify the comments after pasting them from
> > option_parse_int().
> >
> > Another example:
> >
> > + pg_log_error("%s musst be in range %lu..%lu",
> >
> > Also, I have no comprehension of why you'd use uint64 for the valid
> > range when the function is for processing uint32 types in:
>
> The uint64 there I picked up from the referenced long unsigned usage
> in pg_resetwal, after I managed to get pg_log_warning to print out -1
> for format %u and did not want to go and debug why that happens.
>
> I have now made all the arguments uint32.
>
> > +bool
> > +option_parse_uint32(const char *optarg, const char *optname,
> > + uint64 min_range, uint64 max_range,
> > + uint32 *result)
> >
> > In its current state, it's quite hard to take this patch seriously.
> > Please spend longer self-reviewing it before posting. You could
> > temporarily hard-code something for testing which makes at least 1
> > table appear to be larger than 16TB and ensure your code works. What
> > you have is visually broken and depends on whatever the atoi
> > implementation opts to do in the overflow case. These are all things
> > diligent commiters will be testing and it's sad to see how little
> > effort you're putting into this. How do you expect this community to
> > scale with this quality level of patch submissions? You've been around
> > long enough and should know and do better. Are you just expecting the
> > committer to fix these things for you? That work does not get done via
> > magic wand. Being on v10 already, I'd have expected the patch to be
> > far beyond proof of concept grade. If you're withholding investing
> > time on this until you see more community buy-in, then I'd suggest you
> > write that and withhold further revisions until you're happy with the
> > level of buy-in.
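
Coming back to the dataLength overflow mentioned above, a small
self-contained illustration (the page counts are made up) of why summing
the heap and toast page counts needs a 64-bit accumulator:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /* hypothetical sizes of a large heap plus a large toast table, in 8kB pages */
    uint32_t    relpages = 2300000000u;
    uint32_t    toastpages = 2200000000u;

    uint32_t    wrapped = relpages + toastpages;            /* wraps past UINT32_MAX */
    uint64_t    summed = (uint64_t) relpages + toastpages;  /* no wraparound */

    printf("32-bit sum: %" PRIu32 ", 64-bit sum: %" PRIu64 "\n", wrapped, summed);
    return 0;
}

With a 32-bit dataLength the first sum silently wraps; with uint64 the sum
of any two BlockNumbers always fits.
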
>
> > I'm also still not liking your de-normalised TableInfo representation
> > for "is_segment". IMO, InvalidBlockNumber should be used to represent
> > open bounded ranges, and if there's no chunking, then startPage and
> > endPage will both be InvalidBlockNumber.
>
> That's what I ended up doing.
>
> I switched to using startPage = InvalidBlockNumber to indicate that no
> chunking is in effect.
>
> This is safe because when chunking is in use I always set both chunk
> boundary pages, and I can always set the lower bound.
>
> Only for the last chunk is the endPage left as InvalidBlockNumber.
>
> > IMO, what you have now
> > needlessly allows invalid states where is_segment == true and
> > startPage, endPage are not set correctly. If you want to keep the code
> > simple, hide the complexity in a macro or an inline function. There's
> > just no performance reason to materialise the more complex condition
> > into a dedicated boolean flag.
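
On the last point: agreed that the check can live in one place. A minimal
sketch of such a helper, with stand-in types so it compiles on its own (the
actual v13 change uses an is_segment() macro in pg_dump.h, included in the
patch below):

#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions, only to keep the sketch self-contained */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

typedef struct
{
    BlockNumber startPage;      /* InvalidBlockNumber means "no chunking" */
    BlockNumber endPage;        /* InvalidBlockNumber means "open-ended last chunk" */
} TableDataInfoStub;

/*
 * Hiding the sentinel test behind one helper avoids carrying a separate
 * is_segment boolean that could disagree with startPage/endPage.
 */
static inline bool
is_segment(const TableDataInfoStub *tdinfo)
{
    return tdinfo->startPage != InvalidBlockNumber;
}
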
From e598191f7464ca2ecfa9779a823d1aa8a409cdf7 Mon Sep 17 00:00:00 2001
From: Hannu Krosing <[email protected]>
Date: Wed, 28 Jan 2026 18:24:19 +0100
Subject: [PATCH v13] * changed flag name to max-table-segment-pages

* added check for amname = "heap"
* added simple chunked dump and restore test
* changed the data type of TableInfo.relpages to BlockNumber,
* select it using relpages::oid to get an unsigned int out
* read it in from the query result using strtoul()
* removed a bunch of casts from .relpages to (BlockNumber)
* changed the data type of TocEntry.dataLength to uint64;
  the current pgoff_t certainly had an overflow in the 32bit case when
  heap relpages + toast relpages > INT_MAX
* switched to using pg_relation_size(c.oid)/current_setting('block_size')::int
  when --max-table-segment-pages is set
* added documentation
* added option_parse_uint32(...) to be used for the full range of page numbers
* TESTS: added test to compare original and restored table contents
---
 doc/src/sgml/ref/pg_dump.sgml             |  24 +++
 src/bin/pg_dump/pg_backup.h               |   2 +
 src/bin/pg_dump/pg_backup_archiver.c      |   2 +
 src/bin/pg_dump/pg_backup_archiver.h      |   2 +-
 src/bin/pg_dump/pg_dump.c                 | 169 +++++++++++++++++-----
 src/bin/pg_dump/pg_dump.h                 |  22 ++-
 src/bin/pg_dump/t/004_pg_dump_parallel.pl |  31 ++++
 src/fe_utils/option_utils.c               |  55 +++++++
 src/include/fe_utils/option_utils.h       |   3 +
 9 files changed, 268 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 688e23c0e90..1811c67d141 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1088,6 +1088,30 @@ PostgreSQL documentation
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><option>--max-table-segment-pages=<replaceable class="parameter">npages</replaceable></option></term>
+     <listitem>
+      <para>
+       Dump table data in segments based on the number of pages in the main relation.
+       If the number of data pages in the relation is more than <replaceable class="parameter">npages</replaceable>,
+       the data is split into segments of that many pages.
+       Individual segments can be dumped in parallel.
+      </para>
+
+      <note>
+       <para>
+        The option <option>--max-table-segment-pages</option> applies only to pages
+        in the main heap, and if the table has a large TOASTed part this has to be
+        taken into account when deciding on the number of pages to use.
+        In the extreme case a single 8kB heap page can hold ~200 TOAST pointers, each
+        corresponding to 1GB of data.  If this data is also non-compressible then a
+        single-page segment can dump as a 200GB file.
+       </para>
+      </note>
+
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><option>--no-comments</option></term>
      <listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index d9041dad720..b63ae05d895 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -27,6 +27,7 @@
 #include "common/file_utils.h"
 #include "fe_utils/simple_list.h"
 #include "libpq-fe.h"
+#include "storage/block.h"
 
 
 typedef enum trivalue
@@ -178,6 +179,7 @@ typedef struct _dumpOptions
     bool        aclsSkip;
     const char *lockWaitTimeout;
     int         dump_inserts;   /* 0 = COPY, otherwise rows per INSERT */
+    BlockNumber max_table_segment_pages;    /* chunk when relpages is above this */
 
     /* flags for various command-line long options */
     int         disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 4a63f7392ae..ed1913d66bc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -44,6 +44,7 @@
 #include "pg_backup_archiver.h"
 #include "pg_backup_db.h"
 #include "pg_backup_utils.h"
+#include "storage/block.h"
 
 #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
 #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
@@ -154,6 +155,7 @@ InitDumpOptions(DumpOptions *opts)
     opts->dumpSchema = true;
     opts->dumpData = true;
     opts->dumpStatistics = false;
+    opts->max_table_segment_pages = InvalidBlockNumber;
 }
 
 /*
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 325b53fc9bd..b6a9f16a122 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -377,7 +377,7 @@ struct _tocEntry
     size_t      defnLen;        /* length of dumped definition */
 
     /* working state while dumping/restoring */
-    pgoff_t     dataLength;     /* item's data size; 0 if none or unknown */
+    uint64      dataLength;     /* item's data size; 0 if none or unknown */
     int         reqs;           /* do we need schema and/or data of object
                                  * (REQ_* bit mask) */
     bool        created;        /* set for DATA member if TABLE was created */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 687dc98e46d..0badb245b55 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -539,6 +539,7 @@ main(int argc, char **argv)
         {"exclude-extension", required_argument, NULL, 17},
         {"sequence-data", no_argument, &dopt.sequence_data, 1},
         {"restrict-key", required_argument, NULL, 25},
+        {"max-table-segment-pages", required_argument, NULL, 26},
 
         {NULL, 0, NULL, 0}
     };
@@ -803,6 +804,13 @@ main(int argc, char **argv)
                 dopt.restrict_key = pg_strdup(optarg);
                 break;
 
+            case 26:
+                if (!option_parse_uint32(optarg, "--max-table-segment-pages", 1, MaxBlockNumber,
+                                         &dopt.max_table_segment_pages))
+                    exit_nicely(1);
+                pg_log_warning("CHUNKING: set dopt.max_table_segment_pages to [%u]", dopt.max_table_segment_pages);
+                break;
+
             default:
                 /* getopt_long already emitted a complaint */
                 pg_log_error_hint("Try \"%s --help\" for more information.",
                                   progname);
@@ -1372,6 +1380,9 @@ help(const char *progname)
     printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
     printf(_("  --filter=FILENAME            include or exclude objects and data from dump\n"
              "                               based on expressions in FILENAME\n"));
+    printf(_("  --max-table-segment-pages=NUMPAGES\n"
+             "                               number of main table pages above which data is\n"
+             "                               copied out in chunks, also determines the chunk size\n"));
     printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
     printf(_("  --include-foreign-data=PATTERN\n"
              "                               include data of foreign tables on foreign\n"
@@ -2412,7 +2423,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
      * a filter condition was specified.  For other cases a simple COPY
      * suffices.
      */
-    if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
+    if (tdinfo->filtercond || is_segment(tdinfo) || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
     {
         /* Temporary allows to access to foreign tables to dump data */
         if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
@@ -2428,9 +2439,23 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
         else
             appendPQExpBufferStr(q, "* ");
 
-        appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
+        appendPQExpBuffer(q, "FROM %s %s",
                           fmtQualifiedDumpable(tbinfo),
                           tdinfo->filtercond ? tdinfo->filtercond : "");
+        if (is_segment(tdinfo))
+        {
+            appendPQExpBufferStr(q, tdinfo->filtercond ? " AND " : " WHERE ");
+            if (tdinfo->startPage == 0)
+                appendPQExpBuffer(q, "ctid <= '(%u,32000)'", tdinfo->endPage);
+            else if (tdinfo->endPage != InvalidBlockNumber)
+                appendPQExpBuffer(q, "ctid BETWEEN '(%u,1)' AND '(%u,32000)'",
+                                  tdinfo->startPage, tdinfo->endPage);
+            else
+                appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
+            pg_log_warning("CHUNKING: pages [%u:%u]", tdinfo->startPage, tdinfo->endPage);
+        }
+
+        appendPQExpBuffer(q, ") TO stdout;");
     }
     else
     {
@@ -2438,6 +2463,9 @@
                           fmtQualifiedDumpable(tbinfo),
                           column_list);
     }
+
+    pg_log_warning("CHUNKING: data query: %s", q->data);
+
     res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
     PQclear(res);
     destroyPQExpBuffer(clistBuf);
@@ -2933,42 +2961,95 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
     {
         TocEntry   *te;
 
-        te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
-                          ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
-                                       .namespace = tbinfo->dobj.namespace->dobj.name,
-                                       .owner = tbinfo->rolname,
-                                       .description = "TABLE DATA",
-                                       .section = SECTION_DATA,
-                                       .createStmt = tdDefn,
-                                       .copyStmt = copyStmt,
-                                       .deps = &(tbinfo->dobj.dumpId),
-                                       .nDeps = 1,
-                                       .dumpFn = dumpFn,
-                                       .dumpArg = tdinfo));
-
-        /*
-         * Set the TocEntry's dataLength in case we are doing a parallel dump
-         * and want to order dump jobs by table size.  We choose to measure
-         * dataLength in table pages (including TOAST pages) during dump, so
-         * no scaling is needed.
-         *
-         * However, relpages is declared as "integer" in pg_class, and hence
-         * also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
-         * Cast so that we get the right interpretation of table sizes
-         * exceeding INT_MAX pages.
+        /* data chunking works off relpages, which are computed exactly using
+         * pg_relation_size() when --max-table-segment-pages was set
+         *
+         * We also don't chunk if the table access method is not "heap"
+         * TODO: we may add chunking for other access methods later, maybe
+         * based on primary key ranges
          */
-        te->dataLength = (BlockNumber) tbinfo->relpages;
-        te->dataLength += (BlockNumber) tbinfo->toastpages;
+        if (tbinfo->relpages <= dopt->max_table_segment_pages ||
+            strcmp(tbinfo->amname, "heap") != 0)
+        {
+            te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
+                              ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+                                           .namespace = tbinfo->dobj.namespace->dobj.name,
+                                           .owner = tbinfo->rolname,
+                                           .description = "TABLE DATA",
+                                           .section = SECTION_DATA,
+                                           .createStmt = tdDefn,
+                                           .copyStmt = copyStmt,
+                                           .deps = &(tbinfo->dobj.dumpId),
+                                           .nDeps = 1,
+                                           .dumpFn = dumpFn,
+                                           .dumpArg = tdinfo));
 
-        /*
-         * If pgoff_t is only 32 bits wide, the above refinement is useless,
-         * and instead we'd better worry about integer overflow.  Clamp to
-         * INT_MAX if the correct result exceeds that.
-         */
-        if (sizeof(te->dataLength) == 4 &&
-            (tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
-             te->dataLength < 0))
-            te->dataLength = INT_MAX;
+            /*
+             * Set the TocEntry's dataLength in case we are doing a parallel dump
+             * and want to order dump jobs by table size. We choose to measure
+             * dataLength in table pages (including TOAST pages) during dump, so
+             * no scaling is needed.
+             *
+             * While pg_class.relpages, which stores BlockNumber, a/k/a unsigned int,
+             * is declared as "integer", we convert it back and store it as
+             * BlockNumber in TableInfo.
+             * And dataLength is now uint64, so it does not overflow for
+             * 2 x UINT32_MAX.
+             */
+            te->dataLength = tbinfo->relpages;
+            te->dataLength += tbinfo->toastpages;
+        }
+        else
+        {
+            uint64      current_chunk_start = 0;
+            PQExpBuffer chunk_desc = createPQExpBuffer();
+
+            pg_log_warning("CHUNKING: toc for chunked relpages [%u]", tbinfo->relpages);
+
+            /* TODO - use uint64 for current_chunk_start to avoid wraparound */
+            while (current_chunk_start < tbinfo->relpages)
+            {
+                TableDataInfo *chunk_tdinfo = (TableDataInfo *) pg_malloc(sizeof(TableDataInfo));
+
+                memcpy(chunk_tdinfo, tdinfo, sizeof(TableDataInfo));
+                AssignDumpId(&chunk_tdinfo->dobj);
+                //addObjectDependency(&chunk_tdinfo->dobj, tbinfo->dobj.dumpId); /* do we need this here */
+//              chunk_tdinfo->is_segment = true;
+                chunk_tdinfo->startPage = (BlockNumber) current_chunk_start;
+                chunk_tdinfo->endPage = chunk_tdinfo->startPage + dopt->max_table_segment_pages - 1;
+
+                pg_log_warning("CHUNKING: toc for pages [%u:%u]", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+                current_chunk_start += dopt->max_table_segment_pages;
+                if (current_chunk_start >= tbinfo->relpages)
+                    chunk_tdinfo->endPage = InvalidBlockNumber; /* last chunk is for "all the rest" */
+
+                printfPQExpBuffer(chunk_desc, "TABLE DATA (pages %u:%u)", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+                te = ArchiveEntry(fout, chunk_tdinfo->dobj.catId, chunk_tdinfo->dobj.dumpId,
+                                  ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+                                               .namespace = tbinfo->dobj.namespace->dobj.name,
+                                               .owner = tbinfo->rolname,
+                                               .description = chunk_desc->data,
+                                               .section = SECTION_DATA,
+                                               .createStmt = tdDefn,
+                                               .copyStmt = copyStmt,
+                                               .deps = &(tbinfo->dobj.dumpId),
+                                               .nDeps = 1,
+                                               .dumpFn = dumpFn,
+                                               .dumpArg = chunk_tdinfo));
+
+                if (chunk_tdinfo->endPage == InvalidBlockNumber)
+                    te->dataLength = tbinfo->relpages - chunk_tdinfo->startPage;
+                else
+                    te->dataLength = dopt->max_table_segment_pages;
+                /* let's assume toast pages distribute evenly among chunks */
+                if (tbinfo->relpages)
+                    te->dataLength += te->dataLength * tbinfo->toastpages / tbinfo->relpages;
+            }
+
+            destroyPQExpBuffer(chunk_desc);
+        }
     }
 
     destroyPQExpBuffer(copyBuf);
@@ -3092,6 +3173,8 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
     tdinfo->dobj.namespace = tbinfo->dobj.namespace;
     tdinfo->tdtable = tbinfo;
     tdinfo->filtercond = NULL;  /* might get set later */
+    tdinfo->startPage = InvalidBlockNumber;    /* we use this as an indication that no chunking is needed */
+    tdinfo->endPage = InvalidBlockNumber;
     addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
 
     /* A TableDataInfo contains data, of course */
@@ -7254,8 +7337,16 @@ getTables(Archive *fout, int *numTables)
                          "c.relnamespace, c.relkind, c.reltype, "
                          "c.relowner, "
                          "c.relchecks, "
-                         "c.relhasindex, c.relhasrules, c.relpages, "
-                         "c.reltuples, c.relallvisible, ");
+                         "c.relhasindex, c.relhasrules, ");
+
+    /* fetch current relation size if chunking is requested */
+    if (dopt->max_table_segment_pages != InvalidBlockNumber)
+        appendPQExpBufferStr(query, "pg_relation_size(c.oid)/current_setting('block_size')::int AS relpages, ");
+    else
+        /* pg_class.relpages stores BlockNumber (uint32) in an int field, convert to oid to get an unsigned int out */
+        appendPQExpBufferStr(query, "c.relpages::oid, ");
+
+    appendPQExpBufferStr(query, "c.reltuples, c.relallvisible, ");
 
     if (fout->remoteVersion >= 180000)
         appendPQExpBufferStr(query, "c.relallfrozen, ");
@@ -7495,7 +7586,7 @@ getTables(Archive *fout, int *numTables)
         tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks));
         tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
         tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
-        tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
+        tblinfo[i].relpages = strtoul(PQgetvalue(res, i, i_relpages), NULL, 10);
         if (PQgetisnull(res, i, i_toastpages))
             tblinfo[i].toastpages = 0;
         else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 4c4b14e5fc7..be71661ac41 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -16,6 +16,7 @@
 
 #include "pg_backup.h"
 #include "catalog/pg_publication_d.h"
+#include "storage/block.h"
 
 #define oidcmp(x,y) ( ((x) < (y) ? -1 : ((x) > (y)) ? 1 : 0) )
 
@@ -335,7 +336,11 @@ typedef struct _tableInfo
     Oid         owning_tab;     /* OID of table owning sequence */
     int         owning_col;     /* attr # of column owning sequence */
     bool        is_identity_sequence;
-    int32       relpages;       /* table's size in pages (from pg_class) */
+    BlockNumber relpages;       /* table's size in pages (from pg_class),
                                  * converted to unsigned integer;
                                  * when --max-table-segment-pages is set
                                  * it is computed from pg_relation_size()
                                  */
     int         toastpages;     /* toast table's size in pages, if any */
 
     bool        interesting;    /* true if need to collect more data */
@@ -413,8 +418,21 @@ typedef struct _tableDataInfo
     DumpableObject dobj;
     TableInfo  *tdtable;        /* link to table to dump */
     char       *filtercond;     /* WHERE condition to limit rows dumped */
+    /* startPage and endPage to support segmented dump */
+    BlockNumber startPage;      /* As we always know the lowest segment page
+                                 * number we can use InvalidBlockNumber here
+                                 * to recognize the no-segmenting case.
+                                 * When it is 0 (the first page of the first
+                                 * segment) we can omit it in the range query */
+    BlockNumber endPage;        /* last page in segment for page-range dump,
+                                 * startPage+max_table_segment_pages-1 for
+                                 * most segments, but InvalidBlockNumber for
+                                 * the last one to indicate an open range
+                                 */
 } TableDataInfo;
 
+#define is_segment(tdiptr) (tdiptr->startPage != InvalidBlockNumber)
+
 typedef struct _indxInfo
 {
     DumpableObject dobj;
@@ -448,7 +466,7 @@ typedef struct _indexAttachInfo
 typedef struct _relStatsInfo
 {
     DumpableObject dobj;
-    int32       relpages;
+    BlockNumber relpages;
     char       *reltuples;
     int32       relallvisible;
     int32       relallfrozen;
diff --git a/src/bin/pg_dump/t/004_pg_dump_parallel.pl b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
index 738f34b1c1b..4f35aeed9b9 100644
--- a/src/bin/pg_dump/t/004_pg_dump_parallel.pl
+++ b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
@@ -11,6 +11,7 @@ use Test::More;
 my $dbname1 = 'regression_src';
 my $dbname2 = 'regression_dest1';
 my $dbname3 = 'regression_dest2';
+my $dbname4 = 'regression_dest3';
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
@@ -21,6 +22,7 @@ my $backupdir = $node->backup_dir;
 $node->run_log([ 'createdb', $dbname1 ]);
 $node->run_log([ 'createdb', $dbname2 ]);
 $node->run_log([ 'createdb', $dbname3 ]);
+$node->run_log([ 'createdb', $dbname4 ]);
 
 $node->safe_psql(
 	$dbname1,
@@ -87,4 +89,33 @@ $node->command_ok(
 	],
 	'parallel restore as inserts');
 
+$node->command_ok(
+	[
+		'pg_dump',
+		'--format' => 'directory',
+		'--max-table-segment-pages' => 2,
+		'--no-sync',
+		'--jobs' => 2,
+		'--file' => "$backupdir/dump3",
+		$node->connstr($dbname1),
+	],
+	'parallel dump with chunks of two heap pages');
+
+$node->command_ok(
+	[
+		'pg_restore', '--verbose',
+		'--dbname' => $node->connstr($dbname4),
+		'--jobs' => 3,
+		"$backupdir/dump3",
+	],
+	'parallel restore with chunks of two heap pages');
+
+my $table = 'tplain';
+my $tablehash_query = "SELECT '$table', sum(hashtext(t::text)), count(*) FROM $table AS t";
+
+my $result_1 = $node->safe_psql($dbname1, $tablehash_query);
+my $result_4 = $node->safe_psql($dbname4, $tablehash_query);
+
+is($result_4, $result_1, "Hash check for $table: restored db ($result_4) vs original db ($result_1)");
+
 done_testing();
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index cc483ae176c..aff1fbd31a3 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -83,6 +83,61 @@ option_parse_int(const char *optarg, const char *optname,
 	return true;
 }
 
+/*
+ * option_parse_uint32
+ *
+ * Parse unsigned integer value for an option.  If the parsing is successful,
+ * returns true and stores the result in *result if that's given;
+ * if parsing fails, returns false.
+ */
+bool
+option_parse_uint32(const char *optarg, const char *optname,
+					uint32 min_range, uint32 max_range,
+					uint32 *result)
+{
+	char	   *endptr;
+	unsigned long val;
+
+	/* Fail if there is a minus sign at the start of the value */
+	while (isspace((unsigned char) *optarg))
+		optarg++;
+	if (*optarg == '-')
+	{
+		pg_log_error("value \"%s\" for option %s can not be negative",
+					 optarg, optname);
+		return false;
+	}
+
+	errno = 0;
+	val = strtoul(optarg, &endptr, 10);
+
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail.
+	 */
+	while (*endptr != '\0' && isspace((unsigned char) *endptr))
+		endptr++;
+
+	if (*endptr != '\0')
+	{
+		pg_log_error("invalid value \"%s\" for option %s",
+					 optarg, optname);
+		return false;
+	}
+
+	/* as min_range and max_range are uint32, the range check will also
+	 * catch the case where the unsigned long val is outside the 32 bit range */
+	if (errno == ERANGE || val < min_range || val > max_range)
+	{
+		pg_log_error("%s not in range %u..%u", optname, min_range, max_range);
+		return false;
+	}
+
+	if (result)
+		*result = (uint32) val;
+	return true;
+}
+
 /*
  * Provide strictly harmonized handling of the --sync-method option.
  */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index 0db6e3b6e91..c74cd1fb595 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -22,6 +22,9 @@ extern void handle_help_version_opts(int argc, char *argv[],
 extern bool option_parse_int(const char *optarg, const char *optname,
 							 int min_range, int max_range,
 							 int *result);
+extern bool option_parse_uint32(const char *optarg, const char *optname,
+								uint32 min_range, uint32 max_range,
+								uint32 *result);
 extern bool parse_sync_method(const char *optarg,
 							  DataDirSyncMethod *sync_method);
 
-- 
2.43.0
