Re: Joins on TID
BTW, if we're to start taking joins on TID seriously, we should also add the missing hash opclass for TID, so that you can do hash joins when dealing with a lot of rows. (In principle this also enables things like hash aggregation, though I'm not very clear on a use-case for grouping by TID.) regards, tom lane diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c index 41d540b..7b25947 100644 *** a/src/backend/utils/adt/tid.c --- b/src/backend/utils/adt/tid.c *** *** 20,25 --- 20,26 #include #include + #include "access/hash.h" #include "access/heapam.h" #include "access/sysattr.h" #include "catalog/namespace.h" *** tidsmaller(PG_FUNCTION_ARGS) *** 239,244 --- 240,272 PG_RETURN_ITEMPOINTER(ItemPointerCompare(arg1, arg2) <= 0 ? arg1 : arg2); } + Datum + hashtid(PG_FUNCTION_ARGS) + { + ItemPointer key = PG_GETARG_ITEMPOINTER(0); + + /* + * While you'll probably have a lot of trouble with a compiler that + * insists on appending pad space to struct ItemPointerData, we can at + * least make this code work, by not using sizeof(ItemPointerData). + * Instead rely on knowing the sizes of the component fields. + */ + return hash_any((unsigned char *) key, + sizeof(BlockIdData) + sizeof(OffsetNumber)); + } + + Datum + hashtidextended(PG_FUNCTION_ARGS) + { + ItemPointer key = PG_GETARG_ITEMPOINTER(0); + uint64 seed = PG_GETARG_INT64(1); + + /* As above */ + return hash_any_extended((unsigned char *) key, + sizeof(BlockIdData) + sizeof(OffsetNumber), + seed); + } + /* * Functions to get latest tid of a specified tuple. 
diff --git a/src/include/catalog/pg_amop.dat b/src/include/catalog/pg_amop.dat index e689c9b..436f1bd 100644 *** a/src/include/catalog/pg_amop.dat --- b/src/include/catalog/pg_amop.dat *** *** 1013,1018 --- 1013,1022 { amopfamily => 'hash/cid_ops', amoplefttype => 'cid', amoprighttype => 'cid', amopstrategy => '1', amopopr => '=(cid,cid)', amopmethod => 'hash' }, + # tid_ops + { amopfamily => 'hash/tid_ops', amoplefttype => 'tid', amoprighttype => 'tid', + amopstrategy => '1', amopopr => '=(tid,tid)', amopmethod => 'hash' }, + # text_pattern_ops { amopfamily => 'hash/text_pattern_ops', amoplefttype => 'text', amoprighttype => 'text', amopstrategy => '1', amopopr => '=(text,text)', diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat index bbcee26..8ddb699 100644 *** a/src/include/catalog/pg_amproc.dat --- b/src/include/catalog/pg_amproc.dat *** *** 340,345 --- 340,349 amprocrighttype => 'cid', amprocnum => '1', amproc => 'hashint4' }, { amprocfamily => 'hash/cid_ops', amproclefttype => 'cid', amprocrighttype => 'cid', amprocnum => '2', amproc => 'hashint4extended' }, + { amprocfamily => 'hash/tid_ops', amproclefttype => 'tid', + amprocrighttype => 'tid', amprocnum => '1', amproc => 'hashtid' }, + { amprocfamily => 'hash/tid_ops', amproclefttype => 'tid', + amprocrighttype => 'tid', amprocnum => '2', amproc => 'hashtidextended' }, { amprocfamily => 'hash/text_pattern_ops', amproclefttype => 'text', amprocrighttype => 'text', amprocnum => '1', amproc => 'hashtext' }, { amprocfamily => 'hash/text_pattern_ops', amproclefttype => 'text', diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat index 5178d04..c451d36 100644 *** a/src/include/catalog/pg_opclass.dat --- b/src/include/catalog/pg_opclass.dat *** *** 167,172 --- 167,174 opcintype => 'xid' }, { opcmethod => 'hash', opcname => 'cid_ops', opcfamily => 'hash/cid_ops', opcintype => 'cid' }, + { opcmethod => 'hash', opcname => 'tid_ops', opcfamily => 
'hash/tid_ops', + opcintype => 'tid' }, { opcmethod => 'hash', opcname => 'text_pattern_ops', opcfamily => 'hash/text_pattern_ops', opcintype => 'text', opcdefault => 'f' }, diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat index 2abd531..e8452e1 100644 *** a/src/include/catalog/pg_operator.dat --- b/src/include/catalog/pg_operator.dat *** *** 204,212 oprrest => 'eqsel', oprjoin => 'eqjoinsel' }, { oid => '387', oid_symbol => 'TIDEqualOperator', descr => 'equal', ! oprname => '=', oprcanmerge => 't', oprleft => 'tid', oprright => 'tid', ! oprresult => 'bool', oprcom => '=(tid,tid)', oprnegate => '<>(tid,tid)', ! oprcode => 'tideq', oprrest => 'eqsel', oprjoin => 'eqjoinsel' }, { oid => '402', descr => 'not equal', oprname => '<>', oprleft => 'tid', oprright => 'tid', oprresult => 'bool', oprcom => '<>(tid,tid)', oprnegate => '=(tid,tid)', oprcode => 'tidne', --- 204,213 oprrest => 'eqsel', oprjoin => 'eqjoinsel' }, { oid => '387', oid_symbol => 'TIDEqualOperator', descr => 'equal', ! oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'tid', ! oprright =>
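Once such an opclass exists, a join on TIDs over many rows could pick a hash join. A hypothetical example (table and column names invented for illustration):

```sql
-- A table of captured ctids, e.g. collected by some maintenance script.
-- With hash/tid_ops available, the planner can hash the TID column
-- instead of being limited to merge-join or nested-loop strategies.
EXPLAIN
SELECT t.*
FROM some_table t
JOIN captured_tids c ON t.ctid = c.tid;
```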
Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()
On Fri, Dec 21, 2018 at 7:09 PM Tom Lane wrote: > Alvaro Herrera writes: > >> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this > >> change. Otherwise, what is going to happen to an unpatched standby (of > >> released versions) that receives the new WAL record from a patched > >> primary? > > Oh, and if the answer to your question is not "it fails with an > intelligible error about an unrecognized WAL record type", then we > need to adjust what is emitted so that that will be what happens. > Crashing, or worse silently misprocessing the record, will not do. Please note that the backpatched version takes special care not to introduce a new WAL record type. An unpatched standby applies the WAL stream of a patched primary without any errors, but keeps ignoring conflicts (as it did before) [1]. A patched standby applies the same WAL stream with conflict handling. I briefly mentioned that in the commit message: "On stable releases we've to be tricky to keep WAL compatibility. Information required for conflict processing is just appended to data of XLOG_GIST_PAGE_UPDATE record. So, PostgreSQL version, which doesn't know about conflict processing, will just ignore that." The thing we can mention in the release notes is that both primary and standby should be upgraded to get conflict handling. If one of them is not upgraded, conflicts will still be missed. Links: 1. https://www.postgresql.org/message-id/CAPpHfdsKS0K8q1sJ-XyMrU%3DL%2Be6XSAOgS09NXp1bQDQts%2Bqz%2Bg%40mail.gmail.com -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Change pgarch_readyXlog() to return .history files first
On Fri, Dec 21, 2018 at 08:17:12AM +0200, David Steele wrote: > I thought about doing that, but wanted to focus on the task at hand. It > does save a strcpy and a bit of stack space, so seems like a win. > > Overall, the patch looks good to me. I think breaking up the if does make > the code more readable. Thanks for the lookups. I can see that the patch applies without conflicts down to 9.4, and the opinions gathered on this thread make back-patching this stuff the consensus, with input from Kyotaro Horiguchi and David Steele (I don't mind much myself). So, any objections from others to doing so? -- Michael signature.asc Description: PGP signature
Re: [PATCH] Improve tab completion for CREATE TABLE
On Fri, Dec 21, 2018 at 03:14:36PM +, Dagfinn Ilmari Mannsåker wrote: > Here's a patch that does this (and in passing alphabetises the list of > options). Cool, thanks. The position of the option list is fine. However, list_TABLEOPTIONS is not a name consistent with the surroundings, so could we just use table_level_option instead? Reordering them is a good idea, log_autovacuum_min_duration being the out-of-place entry. -- Michael signature.asc Description: PGP signature
Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup
On Fri, Dec 21, 2018 at 10:43:57AM -0300, Alvaro Herrera wrote: > errhint("Check that your archive_command is executing properly. " > + "Backup can be canceled safely, " > "but the database backup will not be usable without all the WAL > segments."))) > > I think repeating the same term in the third line is not great. Some > ideas: > > Backups can be canceled safely, but they will not be usable without all the > WAL segments. > The backup can be canceled safely, but it will not be usable without all the > WAL segments. > Database backups can be canceled safely, but the current backup will not be > usable without all the WAL segments. > Database backups can be canceled safely, but no backup will be usable without > all the WAL segments. Yes, I agree that repeating the word backup twice is not great. What about the following then? This is your second proposal, except that the sentence refers to the backup currently running using "this", which shows the context better in my opinion: "This backup can be canceled safely, but it will not be usable without all the WAL segments." -- Michael signature.asc Description: PGP signature
Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
On 12/20/18, Tom Lane wrote: > I'd be inclined to put the script in src/tools, I think. IMO src/common > is for code that actually gets built into our executables. Done. >> which takes >> pl_unreserved_kwlist.h as input and outputs >> pl_unreserved_kwlist_offset.h and pl_unreserved_kwlist_string.h. > > I wonder whether we'd not be better off producing just one output > file, in which we have the offsets emitted as PG_KEYWORD macros > and then the giant string emitted as a macro definition, ie > something like > > #define PG_KEYWORD_STRING \ > "absolute\0" \ > "alias\0" \ > ... > > That simplifies the Makefile-hacking, at least, and it possibly gives > callers more flexibility about what they actually want to do with the > string. Okay, I tried that. Since the script is turning one header into another, I borrowed the "*_d.h" nomenclature from the catalogs. Using a single file required some #ifdef hacks in the output file. Maybe there's a cleaner way to do this, but I don't know what it is. Using a single file also gave me another idea: Take value and category out of ScanKeyword, and replace them with an index into another array containing those, which will only be accessed in the event of a hit. That would shrink ScanKeyword to 4 bytes (offset, index), further increasing locality of reference. Might not be worth it, but I can try it after moving on to the core scanner. > I'm for "not change things unnecessarily". People might well be > scraping the keyword list out of parser/kwlist.h for other purposes > right now --- indeed, it's defined the way it is exactly to let > people do that. I don't see a good reason to force them to redo > whatever tooling they have that depends on that. So let's build > kwlist_offsets.h alongside that, but not change kwlist.h itself. Done. >> I used the global .gitignore, but maybe that's an abuse of it. > > Yeah, I'd say it is. Moved. >> +# TODO: Error out if the keyword names are not in ASCII order. 
> > +many for including such a check. Done. > Also note that we don't require people to have Perl installed when > building from a tarball. Therefore, these derived headers must get > built during "make distprep" and removed by maintainer-clean but > not distclean. I think this also has some implications for VPATH > builds, but as long as you follow the pattern used for other > derived header files (e.g. fmgroids.h), you should be fine. Done. I also blindly added support for MSVC. -John Naylor src/common/keywords.c | 55 src/include/common/keywords.h | 14 +++ src/pl/plpgsql/src/.gitignore | 1 + src/pl/plpgsql/src/Makefile | 9 +- src/pl/plpgsql/src/pl_scanner.c | 119 ++ src/pl/plpgsql/src/pl_unreserved_kwlist.h | 107 +++ src/tools/gen_keywords.pl | 136 ++ src/tools/msvc/Solution.pm| 10 +++ 8 files changed, 356 insertions(+), 95 deletions(-) diff --git a/src/common/keywords.c b/src/common/keywords.c index 0c0c794c68..0fb14a0372 100644 --- a/src/common/keywords.c +++ b/src/common/keywords.c @@ -112,3 +112,58 @@ ScanKeywordLookup(const char *text, return NULL; } + +/* Like ScanKeywordLookup, but uses offsets into a keyword string. */ +const ScanKeywordOffset * +ScanKeywordLookupOffset(const char *text, + const ScanKeywordOffset *keywords, + int num_keywords, + const char *keyword_string) +{ + int len, +i; + char word[NAMEDATALEN]; + const ScanKeywordOffset *low; + const ScanKeywordOffset *high; + + len = strlen(text); + /* We assume all keywords are shorter than NAMEDATALEN. */ + if (len >= NAMEDATALEN) + return NULL; + + /* + * Apply an ASCII-only downcasing. We must not use tolower() since it may + * produce the wrong translation in some locales (eg, Turkish). + */ + for (i = 0; i < len; i++) + { + char ch = text[i]; + + if (ch >= 'A' && ch <= 'Z') + ch += 'a' - 'A'; + word[i] = ch; + } + word[len] = '\0'; + + /* + * Now do a binary search using plain strcmp() comparison. 
+ */ + low = keywords; + high = keywords + (num_keywords - 1); + while (low <= high) + { + const ScanKeywordOffset *middle; + int difference; + + middle = low + (high - low) / 2; + difference = strcmp(keyword_string + middle->offset, word); + if (difference == 0) + return middle; + else if (difference < 0) + low = middle + 1; + else + high = middle - 1; + } + + return NULL; +} diff --git a/src/include/common/keywords.h b/src/include/common/keywords.h index 0b31505b66..7cd7bf4461 100644 --- a/src/include/common/keywords.h +++ b/src/include/common/keywords.h @@ -28,11 +28,20 @@ typedef struct ScanKeyword int16 category; /* see codes above */ } ScanKeyword; +typedef struct ScanKeywordOffset +{ + int32 offset; /* offset into a keyword string */ + int16 value; /* grammar's token code */ + int16 category; /* see codes
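To make the shape of this concrete, here is a toy, self-contained sketch of the offset-based lookup (the kw_string/kw_offsets names and contents are invented for illustration; they stand in for the generated headers):

```c
#include <assert.h>
#include <string.h>

/* All keywords packed into one NUL-separated string, in ASCII order,
 * plus an array of offsets into it.  This replaces the per-keyword
 * char pointers with plain integers. */
static const char kw_string[] =
	"absolute\0"
	"alias\0"
	"begin\0"
	"while";

static const int kw_offsets[] = {0, 9, 15, 21};
#define KW_COUNT 4

/* Binary search comparing the probe word against kw_string + offset;
 * returns the keyword's index, or -1 if it is not a keyword. */
static int
kw_lookup(const char *word)
{
	int			low = 0;
	int			high = KW_COUNT - 1;

	while (low <= high)
	{
		int			middle = low + (high - low) / 2;
		int			diff = strcmp(kw_string + kw_offsets[middle], word);

		if (diff == 0)
			return middle;
		else if (diff < 0)
			low = middle + 1;
		else
			high = middle - 1;
	}
	return -1;
}
```

The returned index could then select value and category from a separate array, as suggested above, keeping the hot search structure as small as possible.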
Re: could recovery_target_timeline=latest be the default in standby mode?
On Fri, Dec 21, 2018 at 01:54:20PM +0300, Sergei Kornilov wrote: > I am +1 for recovery_target_timeline=latest by default. This is > common case in my opinion. I also agree that switching to the latest timeline should be the default. People get confused because of the current default. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Improve tab completion for CREATE TABLE
On Fri, Dec 21, 2018 at 01:57:40PM +, Dagfinn Ilmari Mannsåker wrote: > Yeah, because of that we can't do the obvious HeadMatches("CREATE", > "TABLE") && (TailMatches(...) || TailMatches(...) || ...). I believe > this would require extending the match syntax with regex-like grouping, > alternation and repetition operators, which I'm not volunteering to > do. That seems to be a lot of work for little benefit as few queries are able to work within CREATE SCHEMA, so I would not take this road. -- Michael signature.asc Description: PGP signature
Joins on TID
I decided to spend an afternoon seeing exactly how much work would be needed to support parameterized TID scans, ie nestloop-with-inner-TID- scan joins, as has been speculated about before, most recently here: https://www.postgresql.org/message-id/flat/CAMqTPq%3DhNg0GYFU0X%2BxmuKy8R2ARk1%2BA_uQpS%2BMnf71MYpBKzg%40mail.gmail.com It turns out it's not that bad, less than 200 net new lines of code (all of it in the planner; the executor seems to require no work). Much of the code churn is because tidpath.c is so ancient and crufty. It was mostly ignoring the RestrictInfo infrastructure, in particular emitting the list of tidquals as just bare clauses not RestrictInfos. I had to change that in order to avoid inefficiencies in some places. I haven't really looked at how much of a merge problem there'll be with Edmund Horner's work for TID range scans. My feeling about it is that we might be best off treating that as a totally separate code path, because the requirements are significantly different (for instance, a range scan needs AND semantics not OR semantics for the list of quals to apply). regards, tom lane diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 480fd25..88780c0 100644 *** a/src/backend/optimizer/path/costsize.c --- b/src/backend/optimizer/path/costsize.c *** cost_tidscan(Path *path, PlannerInfo *ro *** 1202,1216 ntuples = 0; foreach(l, tidquals) { ! if (IsA(lfirst(l), ScalarArrayOpExpr)) { /* Each element of the array yields 1 tuple */ ! ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) lfirst(l); Node *arraynode = (Node *) lsecond(saop->args); ntuples += estimate_array_length(arraynode); } ! else if (IsA(lfirst(l), CurrentOfExpr)) { /* CURRENT OF yields 1 tuple */ isCurrentOf = true; --- 1202,1219 ntuples = 0; foreach(l, tidquals) { ! RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! Expr *qual = rinfo->clause; ! ! if (IsA(qual, ScalarArrayOpExpr)) { /* Each element of the array yields 1 tuple */ ! 
ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) qual; Node *arraynode = (Node *) lsecond(saop->args); ntuples += estimate_array_length(arraynode); } ! else if (IsA(qual, CurrentOfExpr)) { /* CURRENT OF yields 1 tuple */ isCurrentOf = true; diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c index 3bb5b8d..335de0a 100644 *** a/src/backend/optimizer/path/tidpath.c --- b/src/backend/optimizer/path/tidpath.c *** *** 12,29 * this allows * WHERE ctid IN (tid1, tid2, ...) * * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr), * which amount to "CTID = run-time-determined-TID". These could in * theory be translated to a simple comparison of CTID to the result of * a function, but in practice it works better to keep the special node * representation all the way through to execution. * - * There is currently no special support for joins involving CTID; in - * particular nothing corresponding to best_inner_indexscan(). Since it's - * not very useful to store TIDs of one table in another table, there - * doesn't seem to be enough use-case to justify adding a lot of code - * for that. - * * * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California --- 12,28 * this allows * WHERE ctid IN (tid1, tid2, ...) * + * As with indexscans, our definition of "pseudoconstant" is pretty liberal: + * we allow anything that doesn't involve a volatile function or a Var of + * the relation under consideration. Vars belonging to other relations of + * the query are allowed, giving rise to parameterized TID scans. + * * We also support "WHERE CURRENT OF cursor" conditions (CurrentOfExpr), * which amount to "CTID = run-time-determined-TID". These could in * theory be translated to a simple comparison of CTID to the result of * a function, but in practice it works better to keep the special node * representation all the way through to execution. 
* * * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California *** *** 44,125 #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "optimizer/restrictinfo.h" ! static bool IsTidEqualClause(OpExpr *node, int varno); ! static bool IsTidEqualAnyClause(ScalarArrayOpExpr *node, int varno); ! static List *TidQualFromExpr(Node *expr, int varno); ! static List *TidQualFromBaseRestrictinfo(RelOptInfo *rel); ! /* ! * Check to see if an opclause is of the form * CTID = pseudoconstant * or * pseudoconstant = CTID ! * ! * We check that the CTID Var belongs to relation "varno". That is
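For illustration, the kind of query this lets run as a nestloop with a parameterized inner TID scan (table and column names invented; the expected plan shape is an assumption about planner behavior, not guaranteed output):

```sql
-- t1 stores the ctid of a matching t2 row in t1.t2_ctid.  With
-- parameterized TID scans, each outer t1 row can drive a direct
-- TID fetch on t2 instead of a full scan of t2 per outer row.
EXPLAIN
SELECT *
FROM t1
JOIN t2 ON t2.ctid = t1.t2_ctid;
```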
Re: Offline enabling/disabling of data checksums
On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote: > It adds an (now mandatory) --action parameter that takes either verify, > enable or disable as argument. There are two discussion points which deserve attention here: 1) Do we want to rename pg_verify_checksums to something else, like pg_checksums? I would very much like a simple renaming of the tool, which should be the first step taken. 2) Which kind of interface do we want to use? When I did my own flavor of pg_checksums, I used an --action switch accepting one of the values enable, disable or verify. The switch cannot be specified twice (perhaps we could enforce the last value as other binaries in the tree do, though I am not sure that's appropriate here). A second type of interface is to use one switch per action. With either interface, if no action is specified then the tool fails. The vote is open. > This is basically meant as a stop-gap measure in case online activation > of checksums won't make it for v12, but maybe it is independently > useful? I think that this is independently useful; I have seen this as part of an upgrade workflow where the user is ready to accept some extra one-time offline time so that checksums get enabled. > Things I have not done so far: > > 1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer > only verify checksums. Check. That sounds right to me. > 2. Rename the scan_* functions (Michael renamed them to operate_file and > operate_directory but I am not sure it is worth it. The renaming makes sense, as scan implies only reading, while enabling checksums causes writes. > 3. Once that patch is in, there would be a way to disable checksums so > there'd be a case to also change the initdb default to enabled, but that required further discussion (and maybe another round of benchmarks). Perhaps, though that's unrelated to this thread. I am not sure that all users would be ready to pay the extra cost of checksums enabled by default.
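For concreteness, the two proposed interfaces would look roughly like this (the pg_checksums name and exact switch spellings are assumptions pending the vote):

```
# Interface 1: one mandatory --action switch
pg_checksums --action=verify  -D $PGDATA
pg_checksums --action=enable  -D $PGDATA
pg_checksums --action=disable -D $PGDATA

# Interface 2: one switch per action
pg_checksums --verify  -D $PGDATA
pg_checksums --enable  -D $PGDATA
pg_checksums --disable -D $PGDATA
```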
-- Michael signature.asc Description: PGP signature
Re: ATTACH/DETACH PARTITION CONCURRENTLY
On Fri, 21 Dec 2018 at 10:05, Robert Haas wrote: > On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera > wrote: > > Namely: how does this handle the case of partition pruning structure > > being passed from planner to executor, if an attach happens in the > > middle of it and puts a partition in between existing partitions? Array > > indexes of any partitions that appear later in the partition descriptor > > will change. > > > > This is the reason I used the query snapshot rather than EState. > > I didn't handle that. If partition pruning relies on nothing changing > between planning and execution, isn't that broken regardless of any of > this? It's true that with the simple query protocol we'll hold locks > continuously from planning into execution, and therefore with the > current locking regime we couldn't really have a problem. But unless > I'm confused, with the extended query protocol it's quite possible to > generate a plan, release locks, and then reacquire locks at execution > time. Unless we have some guarantee that a new plan will always be > generated if any DDL has happened in the middle, I think we've got > trouble, and I don't think that is guaranteed in all cases. Today the plan would be invalidated if a partition was ATTACHED or DETACHED. The newly built plan would get the updated list of partitions. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Compiling on Termux
On Sat, Dec 22, 2018 at 9:19 AM David Fetter wrote: > On Fri, Dec 21, 2018 at 04:24:20PM -0500, Andrew Dunstan wrote: > > On 12/21/18 4:04 PM, Thomas Munro wrote: > > > On Sat, Dec 22, 2018 at 7:32 AM David Fetter wrote: > > >> On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote: > > >>> That talks about using -D__ANDROID_API__=23 (or presumably higher) to > > >>> make sure that sigtimedwait is exposed by signal.h. Something similar > > >>> may be afoot here. > > >> That worked perfectly...to expose the next issue, namely that at least > > >> part of the System-V IPC mechanisms have been left out. Do we support > > >> other systems where this is true? > > > Interesting, they left that stuff out deliberately out basically > > > because it all sucks: > > > > > > https://android.googlesource.com/platform/ndk/+/4e159d95ebf23b5f72bb707b0cb1518ef96b3d03/docs/system/libc/SYSV-IPC.TXT > > > > > > PostgreSQL currently needs SysV shm just for the small segment that's > > > used for preventing multiple copies running in the same pgdata. You'd > > > need to find a new way to do that. We don't use SysV semaphores > > > (though we can) or message queues. > > > > That doc says: > > > > Killing processes automatically to make room for new ones is an > > important part of Android's application lifecycle implementation. > > > > How's that going to play with Postgres? Sounds like the OOM killer on > > steroids. > > I don't know precisely how it's going to play with Postgres, but > Termux does supply a Postgres in its native packages. That package > appears to work, at least in the single-connection case, so they're > doing something somehow to get it up and running. > > It could be that Android won't be a platform we recommend for high > workloads, at least as long as these things remain true. They use libandroid-shmem which emulates SysV shmem. 
https://github.com/termux/termux-packages/blob/master/packages/postgresql/src-backend-Makefile.patch https://github.com/termux/libandroid-shmem -- Thomas Munro http://www.enterprisedb.com
Re: Compiling on Termux
On Fri, Dec 21, 2018 at 04:24:20PM -0500, Andrew Dunstan wrote: > On 12/21/18 4:04 PM, Thomas Munro wrote: > > On Sat, Dec 22, 2018 at 7:32 AM David Fetter wrote: > >> On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote: > >>> That talks about using -D__ANDROID_API__=23 (or presumably higher) to > >>> make sure that sigtimedwait is exposed by signal.h. Something similar > >>> may be afoot here. > >> That worked perfectly...to expose the next issue, namely that at least > >> part of the System-V IPC mechanisms have been left out. Do we support > >> other systems where this is true? > > Interesting, they left that stuff out deliberately out basically > > because it all sucks: > > > > https://android.googlesource.com/platform/ndk/+/4e159d95ebf23b5f72bb707b0cb1518ef96b3d03/docs/system/libc/SYSV-IPC.TXT > > > > PostgreSQL currently needs SysV shm just for the small segment that's > > used for preventing multiple copies running in the same pgdata. You'd > > need to find a new way to do that. We don't use SysV semaphores > > (though we can) or message queues. > > That doc says: > > Killing processes automatically to make room for new ones is an > important part of Android's application lifecycle implementation. > > How's that going to play with Postgres? Sounds like the OOM killer on > steroids. I don't know precisely how it's going to play with Postgres, but Termux does supply a Postgres in its native packages. That package appears to work, at least in the single-connection case, so they're doing something somehow to get it up and running. It could be that Android won't be a platform we recommend for high workloads, at least as long as these things remain true. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Use an enum for RELKIND_*?
Out of curiosity I built with -Wswitch-enum to see if it would be possible to just enable it. It looks like the main culprits are the node types, and if those were switched to #defines it might be feasible to do so, though it would still be a lot of hassle to add case labels all over the place. But I had a second look to see if the output pointed out any actual bugs. I found one, though it's pretty minor:

lockfuncs.c:234:3: warning: enumeration value ‘LOCKTAG_SPECULATIVE_TOKEN’ not handled in switch [-Wswitch-enum]
   switch ((LockTagType) instance->locktag.locktag_type)
   ^~

It just causes speculative locks to be printed wrong in the pg_lock_status view. And speculative locks are only held transiently, so unless you're doing a bulk load using ON CONFLICT (and also care about pg_lock_status, perhaps in some monitoring tool) you wouldn't even notice this:

postgres***=# select * from pg_lock_status() where locktype = 'speculative token';
┌───────────────────┬──────────┬──────────┬──────┬───────┬────────────┬───────────────┬─────────┬───────┬──────────┬────────────────────┬───────┬───────────────┬─────────┬──────────┐
│     locktype      │ database │ relation │ page │ tuple │ virtualxid │ transactionid │ classid │ objid │ objsubid │ virtualtransaction │  pid  │     mode      │ granted │ fastpath │
├───────────────────┼──────────┼──────────┼──────┼───────┼────────────┼───────────────┼─────────┼───────┼──────────┼────────────────────┼───────┼───────────────┼─────────┼──────────┤
│ speculative token │      634 │          │      │       │            │               │       1 │     0 │        0 │ 4/32               │ 18652 │ ExclusiveLock │ t       │ f        │
└───────────────────┴──────────┴──────────┴──────┴───────┴────────────┴───────────────┴─────────┴───────┴──────────┴────────────────────┴───────┴───────────────┴─────────┴──────────┘

The speculative lock is actually on a transaction ID and "speculative lock token", so the "database", "objid", and "objsubid" columns are bogus here.
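As a minimal reproduction of the pattern -Wswitch-enum catches (the enum and function here are invented for illustration, not the real lockfuncs.c code):

```c
#include <assert.h>
#include <string.h>

/* Invented miniature of the bug class: a default: clause silences
 * plain -Wswitch, but -Wswitch-enum still warns that the enum value
 * without an explicit case label is unhandled. */
typedef enum DemoLockTagType
{
	DEMO_LOCKTAG_RELATION,
	DEMO_LOCKTAG_TRANSACTION,
	DEMO_LOCKTAG_SPECULATIVE_TOKEN
} DemoLockTagType;

static const char *
demo_field_name(DemoLockTagType t)
{
	switch (t)
	{
		case DEMO_LOCKTAG_RELATION:
			return "relation";
		case DEMO_LOCKTAG_TRANSACTION:
			return "transactionid";

			/* No case for DEMO_LOCKTAG_SPECULATIVE_TOKEN: at runtime it
			 * silently falls through to the default, which is exactly how
			 * the bogus pg_lock_status columns arise. */
		default:
			return "objid";
	}
}
```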
Re: [HACKERS] ArrayLists instead of List (for some things)
On Fri, 3 Nov 2017 at 03:17, Tom Lane wrote: > > David Rowley writes: > > Comments on the design are welcome, but I was too late to the > > commitfest, so there are other priorities. However, if you have a > > strong opinion, feel free to voice it. > > I do not like replacing Lists piecemeal; that's a recipe for ongoing > API breakage and back-patching pain. Plus we'll then have *four* > different linked-list implementations in the backend, which sure > seems like too many. > > We've jacked up the List API and driven a new implementation underneath > once before. Maybe it's time to do that again. (reviving this old thread as it's still on my list of things to work on) How would you feel if I submitted a patch that changed all locations where we test for an empty list with: if (list) or if (list == NIL) to a more standard form of if (list_is_empty(list)) ? The idea here is that I'd like a way to preallocate memory for a list to a given size before we start populating it, and that flies in the face of how we currently test for list emptiness. Such a macro could be backpatched to assist extensions. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
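A sketch of what such a macro could look like (the List struct here is a simplified stand-in for the backend's pg_list.h, purely for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the backend's List; NIL is the empty list. */
typedef struct List
{
	int			length;
} List;

#define NIL ((List *) NULL)

/*
 * The proposed test: callers stop writing "if (list)" / "if (list == NIL)"
 * and use one macro, so an array-based implementation could later make a
 * preallocated-but-unpopulated list (length == 0) count as empty without
 * touching every call site.
 */
#define list_is_empty(l) ((l) == NIL || (l)->length == 0)

static int
list_is_empty_demo(void)
{
	List		preallocated = {0};	/* room reserved, nothing added yet */
	List		populated = {3};

	return list_is_empty(NIL) &&
		list_is_empty(&preallocated) &&
		!list_is_empty(&populated);
}
```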
Re: Performance issue in foreign-key-aware join estimation
On Sat, 22 Dec 2018 at 09:28, David Rowley wrote: > Going by my profiler this drops match_eclasses_to_foreign_key_col() > down to just 10% of total planner time for this query. The new > bms_is_member() call is pretty hot inside that function though. I should have said 28% instead of 10%. 10% is the time spent exclusively in that function (not in functions called by it). The bms_is_member() call I mention above is about 20% of the total time, which likely includes some other call sites too. Back in [1], I mentioned that I'd like to start moving away from our linked-list-based implementation of List and start using an array-based version instead. If we had this, we could easily further improve this fkey matching code to not even look near equivalence classes that don't contain members for the relations in question, with a design something like:

1. Make PlannerInfo->eq_classes an array list instead of a linked list; this will significantly improve the performance of list_nth().
2. Have a Bitmapset per relation that indexes which items in eq_classes have ec_members for the given relation.
3. In match_eclasses_to_foreign_key_col(), perform a bms_overlap() on the eq_classes index bitmapsets for the relations at either side of the foreign key, then perform a bms_next_member()-type loop over the result of that in order to skip over the eq_classes items that can't match.

Since match_foreign_keys_to_quals() calls match_eclasses_to_foreign_key_col() once for each item in PlannerInfo->fkey_list (167815 items in this case) and again for each foreign key column in those keys, this should significantly reduce the effort required, since currently we make a pass over *every* equivalence class each time match_eclasses_to_foreign_key_col() gets called. This array list implementation is something I wanted to get done for PG12. The height of the bar for doing it is pretty high given what was mentioned in [2].
[1] https://www.postgresql.org/message-id/CAKJS1f_2SnXhPVa6eWjzy2O9A%3Docwgd0Cj-LQeWpGtrWqbUSDA%40mail.gmail.com [2] https://www.postgresql.org/message-id/21592.1509632225%40sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
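As a toy sketch of steps 2 and 3, with a 64-bit integer standing in for a Bitmapset over eq_classes indexes (the names and the 64-entry limit are artifacts of the sketch, not the real bms_* API):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ec_index_mask;	/* bit i set => eq_classes[i] mentions the rel */

/*
 * Visit only the equivalence classes that mention both sides of the
 * foreign key: intersect the two per-relation masks (bms_overlap-style),
 * then walk the surviving bits (bms_next_member-style).  Returns how
 * many classes would actually need to be examined.
 */
static int
candidate_ec_count(ec_index_mask con_rel_ecs, ec_index_mask ref_rel_ecs)
{
	ec_index_mask both = con_rel_ecs & ref_rel_ecs;
	int			n = 0;

	while (both != 0)
	{
		both &= both - 1;		/* clear lowest set bit = next member */
		n++;
	}
	return n;
}
```

With match_eclasses_to_foreign_key_col() called once per fkey_list item (167815 here) and again per key column, shrinking each call's work to the intersection of the two masks is where the win would come from.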
Re: Compiling on Termux
On Sat, Dec 22, 2018 at 7:32 AM David Fetter wrote: > On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote: > > That talks about using -D__ANDROID_API__=23 (or presumably higher) to > > make sure that sigtimedwait is exposed by signal.h. Something similar > > may be afoot here. > > That worked perfectly...to expose the next issue, namely that at least > part of the System-V IPC mechanisms have been left out. Do we support > other systems where this is true? Interesting, they left that stuff out deliberately, basically because it all sucks: https://android.googlesource.com/platform/ndk/+/4e159d95ebf23b5f72bb707b0cb1518ef96b3d03/docs/system/libc/SYSV-IPC.TXT PostgreSQL currently needs SysV shm just for the small segment that's used for preventing multiple copies running in the same pgdata. You'd need to find a new way to do that. We don't use SysV semaphores (though we can) or message queues. -- Thomas Munro http://www.enterprisedb.com
Re: Compiling on Termux
On Sat, Dec 22, 2018 at 07:03:32AM +1100, Thomas Munro wrote: > On Sat, Dec 22, 2018 at 5:56 AM David Fetter wrote: > > > > Folks, > > > > I'm trying to compile master (c952eae52a33069e2e92d34f217b43d0eca3d7de) > > on Termux, using the supplied settings, as follows. > > > > pg_config --configure | xargs ./configure > configure.out 2>configure.err > > make -j 4 > make.out 2> make.err > > > > There appears to be some confusion somewhere about sync_file_range, > > namely that it's found by ./configure and then not found in make. > > > > What should I be poking at to make this work? > > Apparently your libc (or something else) defines the function so the > configure test passes, but your doesn't declare it so we > can't use it. I guess Termux supplies the headers but your Android > supplies the libraries, so there may be sync issues. I'd try hunting > around for declarations with something like find /usr/include -name > '*.h' | xargs grep sync_file_range. Here's an interesting similar > case: > > https://github.com/termux/termux-packages/issues/899 > > That talks about using -D__ANDROID_API__=23 (or presumably higher) to > make sure that sigtimedwait is exposed by signal.h. Something similar > may be afoot here. That worked perfectly...to expose the next issue, namely that at least part of the System-V IPC mechanisms have been left out. Do we support other systems where this is true? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Performance issue in foreign-key-aware join estimation
On Fri, 21 Dec 2018 at 06:44, Tom Lane wrote:
> I was distressed to discover via perf that 69% of the runtime of this
> test now goes into match_eclasses_to_foreign_key_col(). That seems
> clearly unacceptable.

Agreed. That's pretty terrible.

I looked at this a bit and see that match_eclasses_to_foreign_key_col() is missing any smarts to skip equivalence classes that don't have ec_relids bits for both rels. With that added the run-time is reduced pretty dramatically. I've only tested with a debug build as of now, but I get:

Unpatched:

$ pgbench -n -T 60 -f query.sql postgres
latency average = 18411.604 ms

Patched:

latency average = 8748.177 ms

Going by my profiler this drops match_eclasses_to_foreign_key_col() down to just 10% of total planner time for this query. The new bms_is_member() call is pretty hot inside that function though.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

speed_up_matching_fkeys_to_eclasses.patch
Description: Binary data
Offline enabling/disabling of data checksums
Hi,

the attached patch adds offline enabling/disabling of checksums to pg_verify_checksums. It is based on independent work both Michael (Paquier) and I did earlier this year and takes changes from both, see

https://github.com/credativ/pg_checksums and
https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums

It adds a (now mandatory) --action parameter that takes either verify, enable or disable as argument.

This is basically meant as a stop-gap measure in case online activation of checksums won't make it for v12, but maybe it is independently useful?

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums, as it will no longer only verify checksums.

2. Rename the scan_* functions (Michael renamed them to operate_file and operate_directory, but I am not sure it is worth it).

3. Once that patch is in, there would be a way to disable checksums, so there'd be a case for also changing the initdb default to enabled, but that requires further discussion (and maybe another round of benchmarks).

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach

Geschäftsführung: Dr.
Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 6444fc9ca4..65d6195509 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies/enables/disables page level checksums in an offline cluster
  *
  * Copyright (c) 2010-2018, PostgreSQL Global Development Group
  *
@@ -13,15 +13,16 @@
 #include
 #include
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
-
 
 static int64 files = 0;
 static int64 blocks = 0;
@@ -31,16 +32,32 @@ static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_NONE,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE,
+	PG_ACTION_VERIFY
+} ChecksumAction;
+
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_NONE;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables/disables/verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -A, --action           action to take on the cluster, can be set as\n"));
+	printf(_("                         \"verify\", \"enable\" and \"disable\"\n"));
 	printf(_(" -v, --verbose           output verbose messages\n"));
 	printf(_(" -r RELFILENODE          check only relation with specified relfilenode\n"));
 	printf(_(" -V, --version           output version information, then exit\n"));
@@ -80,6 +97,80 @@ skipfile(const char *fn)
 }
 
 static void
+updateControlFile(char *DataDir, ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_DISABLE);
+
+	/*
+	 * For good luck, apply the same static assertions as in backend's
+	 * WriteControlFile().
+	 */
+#if PG_VERSION_NUM >= 10
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+					 "pg_control is too large for atomic disk writes");
+#endif
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+				(char *) ControlFile,
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+	unlink(ControlFilePath);
+
+	fd = open(ControlFilePath,
Re: Compiling on Termux
On Sat, Dec 22, 2018 at 5:56 AM David Fetter wrote: > > Folks, > > I'm trying to compile master (c952eae52a33069e2e92d34f217b43d0eca3d7de) > on Termux, using the supplied settings, as follows. > > pg_config --configure | xargs ./configure > configure.out 2>configure.err > make -j 4 > make.out 2> make.err > > There appears to be some confusion somewhere about sync_file_range, > namely that it's found by ./configure and then not found in make. > > What should I be poking at to make this work? Apparently your libc (or something else) defines the function so the configure test passes, but your doesn't declare it so we can't use it. I guess Termux supplies the headers but your Android supplies the libraries, so there may be sync issues. I'd try hunting around for declarations with something like find /usr/include -name '*.h' | xargs grep sync_file_range. Here's an interesting similar case: https://github.com/termux/termux-packages/issues/899 That talks about using -D__ANDROID_API__=23 (or presumably higher) to make sure that sigtimedwait is exposed by signal.h. Something similar may be afoot here. -- Thomas Munro http://www.enterprisedb.com
Re: slow queries over information schema.tables
Just brainstorming here. Another crazy idea would be to get rid of the "name" data type, at least from the user-visible planner point of view. It would probably have to be stored as a fixed-length data type like today, but with a one-byte length header. That would make it possible for the planner to use it as if it were just another varchar.
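[Editor's illustration] The storage shape being suggested above (fixed-length slot, one-byte length header, varchar-like semantics) can be sketched like this. The helpers and the NAMEDATALEN value are illustrative only; this is a toy model, not PostgreSQL's actual name/varlena representation.

```python
NAMEDATALEN = 64   # mirrors PostgreSQL's compile-time name slot size

def name_store(s: str) -> bytes:
    """Pack a string into a fixed 64-byte slot: a one-byte length header,
    the payload, then zero padding.  The short length header is what would
    let the planner treat the value like a short varchar."""
    data = s.encode()
    if len(data) > NAMEDATALEN - 1:
        raise ValueError("name too long")
    return bytes([len(data)]) + data + b"\x00" * (NAMEDATALEN - 1 - len(data))

def name_load(slot: bytes) -> str:
    """Read back only the bytes the header says are meaningful."""
    return slot[1:1 + slot[0]].decode()

slot = name_store("pg_class")
print(len(slot), name_load(slot))  # → 64 pg_class
```

The point of the sketch: the on-disk width stays constant (so catalog layout is unchanged), while comparisons and pattern matching could operate on just the length-prefixed payload.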
Re: Tid scan improvements
BTW, with respect to this bit in 0001:

@@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
 			return (Selectivity) 0; /* keep compiler quiet */
 		}
 	}
+	else if (vardata.var && IsA(vardata.var, Var) &&
+			 ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
+	{
+		/*
+		 * There are no stats for system columns, but we know CTID is never
+		 * NULL.
+		 */
+		selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
+	}
 	else
 	{

I'm not entirely sure why you're bothering; surely nulltestsel is unrelated to what this patch is about? And would anybody really write "WHERE ctid IS NULL"?

However, if we do think it's worth adding code to cover this case, I wouldn't make it specific to CTID. *All* system columns can be assumed not null, see heap_getsysattr().

regards, tom lane
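[Editor's illustration] Tom's generalization, keyed on any negative (system) attribute number rather than CTID specifically, amounts to something like this toy estimator. Python stand-in only: the attribute numbers and the fallback constants here are illustrative, not the real selfuncs.c values.

```python
IS_NULL, IS_NOT_NULL = "IS_NULL", "IS_NOT_NULL"

def nulltest_selectivity(varattno, nulltesttype, stats_nullfrac=None):
    """Estimate the fraction of rows passing `col IS [NOT] NULL`."""
    if varattno < 0:
        # System columns (negative attnos, cf. heap_getsysattr) have no
        # pg_statistic entries, but they can never be NULL.
        return 0.0 if nulltesttype == IS_NULL else 1.0
    if stats_nullfrac is None:
        # No stats available for a user column: fall back to a guess
        # (0.005 is an arbitrary placeholder here).
        return 0.005 if nulltesttype == IS_NULL else 0.995
    return stats_nullfrac if nulltesttype == IS_NULL else 1.0 - stats_nullfrac

print(nulltest_selectivity(-1, IS_NULL),      # ctid IS NULL     → 0.0
      nulltest_selectivity(-1, IS_NOT_NULL))  # ctid IS NOT NULL → 1.0
```

Handling all negative attnos in one branch avoids a CTID-only special case while giving the same answer for CTID.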
Re: A few new options for vacuumdb
On 12/21/18, 10:51 AM, "Robert Haas" wrote: > On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan wrote: >> Either way, we'll still have to decide whether to fail or to silently >> skip the option if you do something like specify --min-mxid-age for a >> 9.4 server. > > +1 for fail. Sounds good. I'll keep all the version checks and will fail for unsupported options in the next patch set. Nathan
Re: [suggestion]support UNICODE host variables in ECPG
"Nagaura, Ryohei" writes: > Tsunakawa-san >> * What's the benefit of supporting UTF16 in host variables? > 1) As byte per character is constant in UTF16 encoding, it can process > strings more efficiently than other encodings. I don't think I buy that argument; it falls down as soon as you consider characters above U+. I worry that by supporting UTF16, we'd basically be encouraging users to write code that fails on such characters, which doesn't seem like good project policy. regards, tom lane
Re: Tid scan improvements
Edmund Horner writes: > Ok. I think that will simplify things. So if I follow you correctly, > we should do: > 1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC), > and use truncate_useless_pathkeys on them. > 2. If we have tid quals or pathkeys, emit a TID scan path. Check. > For the (optional) backwards scan support patch, should we separately > emit another path, in the reverse direction? What indxpath.c does is, if has_useful_pathkeys is true, to generate pathkeys both ways and then build paths if the pathkeys get past truncate_useless_pathkeys. That seems sufficient in this case too. There are various heuristics about whether it's really useful to consider both sort directions, but that intelligence is already built into truncate_useless_pathkeys. tid quals with no pathkeys would be reason to generate a forward path, but not reason to generate a reverse path, because then that would be duplicative. regards, tom lane
Re: A few new options for vacuumdb
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan wrote: > Either way, we'll still have to decide whether to fail or to silently > skip the option if you do something like specify --min-mxid-age for a > 9.4 server. +1 for fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Improve selectivity estimate for range queries
"Yuzuko Hosoya" writes: > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp] >> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" >> wrote in >> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp> >>> To handle such cases I've thought up of an idea based on a previous >>> commit[1] which solved a similar problem related to >>> DEFAULT_NUM_DISTINCT. Just like this modification, I added a flag >>> which shows whether the selectivity >> The commit [1] added almost no complexity, but this does. Affects many >> functions to introduce tighter coupling between operator-selectivity >> functions and clause selectivity functions. isdefault travells alone >> too long just to bind remote functions. We might need more pricipled >> way to handle that. Yeah, I don't think that this approach is a reasonable tradeoff to eliminate a narrow case. In particular, what's been done here to clause_selectivity is totally unprincipled and poorly thought out: you can't add an output parameter that's set in only some cases. Yet, there's no good way to decide how to set it in many cases. For instance, what would you do for an AND or OR where some of the branches are default estimates and some not? >> Around the [1], there was a discussion about introducing the notion of >> uncertainty to selectivity. The isdefualt is a kind of uncertainty >> value indicating '0/100% uncertain'. So my feeling is saying that >> it's better that Selectivity has certinty component then building a >> selectivity arithmetics involving uncertainty, though I don't have a >> clear picture. > I see. Indeed if we would change Selectivity so that it includes > information about default/non-default, such problems I mentioned > would be solved. But I'm afraid that this modification would lead > to a big impact, since there are lots of functions that calculate > selectivity. I also don't have a concreate idea, so I will think > about it. 
> Besides that, I'd like to ask the community whether such a
> modification would be acceptable or not.

The planner definitely has a lot of problems that could be addressed with some sort of uncertainty framework ... but it'd be a huge undertaking, which is why nobody's tried yet.

A smaller-footprint way to fix the immediate problem might be to change the values of DEFAULT_INEQ_SEL and friends so that they're even less likely to be matched by accident. That is, instead of 0.3333333333333333, use 0.342 or some such. It might seem that that's just moving the problem around, but I think it might be possible to show that such a value couldn't be computed by scalarltsel given a histogram with no more than 10000 members. (I haven't tried to actually prove that, but it seems intuitive that the set of possible results would be quantized with no more than about 5 digits precision.)

regards, tom lane
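[Editor's illustration] The sentinel-collision argument above can be checked mechanically, at least for the exact-match case. A toy model follows; 0.3425678 is an arbitrary stand-in for whatever "ragged" constant would actually be picked, and the model only covers estimates of the exact form i/n (real scalarltsel also interpolates within buckets).

```python
from fractions import Fraction

# DEFAULT_INEQ_SEL is currently one third; a genuine histogram result of
# the form i/n can reproduce it exactly (i=1, n=3), so an equality check
# against the sentinel can misfire.
DEFAULT_INEQ_SEL = 0.3333333333333333
print(1 / 3 == DEFAULT_INEQ_SEL)   # → True

# A deliberately ragged constant reduces to denominator 5_000_000, so no
# fraction i/n with n <= 10_000 histogram entries can equal it exactly.
RAGGED_SEL = Fraction(3425678, 10**7)   # arbitrary stand-in, ~0.3425678
collision = any((RAGGED_SEL * n).denominator == 1
                for n in range(1, 10_001))
print(collision)                   # → False
```

This is exactly the "make the sentinel unreachable by accident" intuition: pick a constant whose reduced denominator exceeds any plausible histogram size.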
Re: Remove Deprecated Exclusive Backup Mode
On Fri, Dec 21, 2018 at 1:18 AM David Steele wrote:
> On 12/21/18 2:01 AM, Michael Paquier wrote:
> > On Thu, Dec 20, 2018 at 12:29:48PM +0200, David Steele wrote:
> >> Cannot move patch to the same commitfest, and multiple future commitfests
> >> exist!
> >
> > I am not sure what it means either. What if you just mark the existing
> > entry as returned with feedback, and create a new one ahead?
>
> Yeah, I can do that, but it would be nice to know why this is not
> working. It would also be nice to preserve the history.
>
> Magnus?

I think what it means is that it doesn't know which CommitFest is the "next" one. If the patch were currently in the in-progress CommitFest, the Open one (if any) would be the next one. Otherwise, if there were only one Future CommitFest, that would be the next one. But right now you're moving a patch that is already in the Open CommitFest and there are two Future CommitFests, so it doesn't know which one to pick.

I think the old CommitFest application let you set the CommitFest via the Edit screen also, so you could pick a specific CommitFest to target. But that doesn't seem to exist in Magnus's version. If you just wait until the Open CommitFest is marked In Progress and the next one is marked Open, it should work in any case. I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: ATTACH/DETACH PARTITION CONCURRENTLY
On Thu, Dec 20, 2018 at 4:38 PM Robert Haas wrote:
> Lowering the lock level might also make something that was previously
> safe into something unsafe, because now there's no longer a guarantee
> that invalidation messages are received soon enough. With
> AccessExclusiveLock, we'll send invalidation messages before releasing
> the lock, and other processes will acquire the lock and then
> AcceptInvalidationMessages(). But with ShareUpdateExclusiveLock the
> locks can coexist, so now there might be trouble. I think this is an
> area where we need to do some more investigation.

So there are definitely problems here. With my patch:

create table tab (a int, b text) partition by range (a);
create table tab1 partition of tab for values from (0) to (10);
prepare t as select * from tab;
begin;
explain execute t; -- seq scan on tab1
execute t; -- no rows

Then, in another session:

alter table tab detach partition tab1;
insert into tab1 values (300, 'oops');

Back to the first session:

execute t; -- shows (300, 'oops')
explain execute t; -- still planning to scan tab1
commit;
explain execute t; -- now it got the memo, and plans to scan nothing
execute t; -- no rows

Well, that's not good. We're showing a value that was never within the partition bounds of any partition of tab. The problem is that, since we already have locks on all relevant objects, nothing triggers the second 'explain execute' to process invalidation messages, so we don't update the plan.

Generally, any DDL with less than AccessExclusiveLock has this issue. On another thread, I was discussing with Tom and Peter the possibility of trying to rejigger things so that we always AcceptInvalidationMessages() at least once per command, but I think that just turns this into a race: if a concurrent commit happens after 'explain execute t' decides not to re-plan but before it begins executing, we have the same problem.
This example doesn't involve partition pruning, and in general I don't think that the problem is confined to partition pruning. It's rather that if there's no conflict between the lock that is needed to change the set of partitions and the lock that is needed to run a query, then there's no way to guarantee that the query runs with the same set of partitions for which it was planned. Unless I'm mistaken, which I might be, this is also a problem with your approach -- if you repeat the same prepared query in the same transaction, the transaction snapshot will be updated, and thus the PartitionDesc will be expanded differently at execution time, but the plan will not have changed, because invalidation messages have not been processed. Anyway, I think the only fix here is likely to be making the executor resilient against concurrent changes in the PartitionDesc. I don't think there's going to be any easy way to compensate for added partitions without re-planning, but maybe we could find a way to flag detached partitions so that they return no rows without actually touching the underlying relation. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
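[Editor's illustration] The closing idea, flagging detached partitions so a stale plan yields no rows from them, might look roughly like this toy executor model. Plain Python, with made-up names; it only models the control flow, not PostgreSQL's PartitionDesc machinery.

```python
# Toy model: the plan captures a snapshot of the partition list, but each
# partition's current status is rechecked at execution time, so a
# concurrently detached partition contributes no rows instead of stale ones.
partitions = {
    "tab1": {"rows": [(1, "a"), (5, "b")], "detached": False},
    "tab2": {"rows": [(12, "c")], "detached": False},
}

def plan_scan(table_partitions):
    """Plan time: remember which partitions existed when we planned."""
    return list(table_partitions)

def execute_scan(plan, table_partitions):
    """Execution time: skip partitions that have since been detached,
    without touching their underlying data."""
    for name in plan:
        part = table_partitions.get(name)
        if part is None or part["detached"]:
            continue            # flagged partition returns no rows
        yield from part["rows"]

plan = plan_scan(partitions)             # planned with both attached
partitions["tab1"]["detached"] = True    # concurrent DETACH after planning
print(list(execute_scan(plan, partitions)))  # → [(12, 'c')]
```

The asymmetry Robert notes survives in the model: a partition *added* after planning is simply absent from `plan` and would still require re-planning to be scanned.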
Re: START/END line number for COPY FROM
Surafel Temesgen writes:
> Currently we can skip the header line on COPY FROM, but having the ability
> to skip and stop copying at any line could be used to divide a long copy
> operation, to copy a subset of the file, and to skip a footer. Attached is
> a patch for it.

I do not think this is a good idea. We have resisted attempts to add ETL-like features to COPY on the grounds that it would add complexity and cost performance, and that that's not what COPY is for. This seems to fall squarely in the domain of something you should be doing with another tool.

regards, tom lane
Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()
On 2018-Dec-21, Tom Lane wrote:
> Alvaro Herrera writes:
> > Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this
> > change. Otherwise, what is going to happen to an unpatched standby (of
> > released versions) that receives the new WAL record from a patched
> > primary?
>
> We can't change XLOG_PAGE_MAGIC in released branches, surely.
>
> I think the correct thing is just for the release notes to warn people
> to upgrade standby servers first.

You're right. My memory is playing tricks on me. I recalled that we had done it to prevent WAL replay in nonpatched standbys in some backpatched commit, but I can't find any evidence of this :-(

The commit message for 8e9a16ab8f7f (in the 9.3 branch after it was released) says:

    In replication scenarios using the 9.3 branch, standby servers must be
    upgraded before their master, so that they are prepared to deal with the
    new WAL record once the master is upgraded; failure to do so will cause
    WAL replay to die with a PANIC message. Later upgrade of the standby
    will allow the process to continue where it left off, so there's no
    disruption of the data in the standby in any case. Standbys know how to
    deal with the old WAL record, so it's okay to keep the master running
    the old code for a while.

Stupidly, I checked the 9.4 version of that commit (then the master branch) and it does indeed contain the XLOG_PAGE_MAGIC change, but the 9.3 commit doesn't.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()
Alvaro Herrera writes: >> Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this >> change. Otherwise, what is going to happen to an unpatched standby (of >> released versions) that receives the new WAL record from a patched >> primary? Oh, and if the answer to your question is not "it fails with an intelligible error about an unrecognized WAL record type", then we need to adjust what is emitted so that that will be what happens. Crashing, or worse silently misprocessing the record, will not do. regards, tom lane
Re: Changes to pg_dump/psql following collation "C" in the catalog
"Daniel Verite" writes: > One consequence of using the "C" collation in the catalog versus > the db collation is that pg_dump -t with a regexp may not find > the same tables as before. It happens when these conditions are > all met: > - the collation of the database is not "C" > - the regexp has locale-dependant parts > - the names to match include characters that are sensitive to > locale-dependant matching Hm, interesting. > It seems that to fix that, we could qualify the references to columns such > as "relname" and "schema_name" with COLLATE "default" clauses in the > queries that use pattern-matching in client-side tools, AFAICS > pg_dump and psql. Seems reasonable. I was initially worried that this might interfere with query optimization, but some experimentation says that the planner successfully derives prefix index clauses anyway (which is correct, because matching a fixed regex prefix doesn't depend on locale). It might be better to attach the COLLATE clause to the pattern constant instead of the column name; that'd be less likely to break if sent to an older server. > Before going any further with this idea, is there agreement that it's an > issue to address and does this look like the best way to do that? That is a question worth asking. We're going to be forcing people to get used to this when working directly in SQL, so I don't know if masking it in a subset of tools is really a win or not. regards, tom lane
Re: pgsql: Check for conflicting queries during replay of gistvacuumpage()
Hmmm, I'm fairly sure you should have bumped XLOG_PAGE_MAGIC for this change. Otherwise, what is going to happen to an unpatched standby (of released versions) that receives the new WAL record from a patched primary?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: tickling the lesser contributor's withering ego
On 2018-12-21 16:17, Alvaro Herrera wrote:
> On 2018-Dec-21, Tom Lane wrote:
> > Alvaro Herrera writes:
> > > I propose the following patch, which will make those links stable --
> > > then we can add the following links to the contributors page:
> > > https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
> > > https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS
> >
> > Seems reasonable, but note the lag time --- unless somebody does
> > something out of the ordinary, those pages won't actually have
> > such tags till after the February minor releases.
>
> Good point. That seems acceptable to me.
>
> Erik, are you willing to propose a patch for the pgweb side of things?

Yes, sounds reasonable. I think I can cobble something together.
Re: tickling the lesser contributor's withering ego
On 2018-Dec-21, Tom Lane wrote:
> Alvaro Herrera writes:
> > I propose the following patch, which will make those links stable --
> > then we can add the following links to the contributors page:
> > https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
> > https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS
>
> Seems reasonable, but note the lag time --- unless somebody does
> something out of the ordinary, those pages won't actually have
> such tags till after the February minor releases.

Good point. That seems acceptable to me.

Erik, are you willing to propose a patch for the pgweb side of things?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Changes to pg_dump/psql following collation "C" in the catalog
Hi,

One consequence of using the "C" collation in the catalog versus the db collation is that pg_dump -t with a regexp may not find the same tables as before. It happens when these conditions are all met:

- the collation of the database is not "C"
- the regexp has locale-dependent parts
- the names to match include characters that are sensitive to locale-dependent matching

For instance a table named "rapport_journée_12" in an fr_FR.UTF-8 db used to be found by pg_dump -t 'rapport_\w+_\d+', and is now missed in the devel version.

It seems that to fix that, we could qualify the references to columns such as "relname" and "schema_name" with COLLATE "default" clauses in the queries that use pattern-matching in client-side tools, AFAICS pg_dump and psql.

Before going any further with this idea, is there agreement that it's an issue to address, and does this look like the best way to do that?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Re: [PATCH] Improve tab completion for CREATE TABLE
I wrote:
> Another omission I just realised is that it doesn't complete the list
> of table storage options after "WITH (". That should be fairly easy
> to add (we already have the list for completing after ALTER TABLE
> SET|RESET), but it's getting late here now.

Here's a patch that does this (and in passing alphabetises the list of options).

- ilmari
--
- Twitter seems more influential [than blogs] in the 'gets reported in the mainstream press' sense at least. - Matt McLeod
- That'd be because the content of a tweet is easier to condense down to a mainstream media article. - Calle Dybedahl

>From 5f948f2ebff03a89d18ab621bc21d03cfaa07529 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Fri, 21 Dec 2018 15:03:19 +
Subject: [PATCH] Add completion for CREATE TABLE ... WITH options

Move the list of options from the "ALTER TABLE SET|RESET (" block to the top-level, and reuse it after "CREATE TABLE ( ... ) WITH (". In passing, alphabetise the option list properly.
---
 src/bin/psql/tab-complete.c | 74 ++---
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5ba6ffba8c..074c88378b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1005,6 +1005,40 @@ static const pgsql_thing_t words_after_create[] = {
 	{NULL}	/* end of list */
 };
 
+static const char *const list_TABLEOPTIONS[] = {
+	"autovacuum_analyze_scale_factor",
+	"autovacuum_analyze_threshold",
+	"autovacuum_enabled",
+	"autovacuum_freeze_max_age",
+	"autovacuum_freeze_min_age",
+	"autovacuum_freeze_table_age",
+	"autovacuum_multixact_freeze_max_age",
+	"autovacuum_multixact_freeze_min_age",
+	"autovacuum_multixact_freeze_table_age",
+	"autovacuum_vacuum_cost_delay",
+	"autovacuum_vacuum_cost_limit",
+	"autovacuum_vacuum_scale_factor",
+	"autovacuum_vacuum_threshold",
+	"fillfactor",
+	"log_autovacuum_min_duration",
+	"parallel_workers",
+	"toast.autovacuum_enabled",
+	"toast.autovacuum_freeze_max_age",
+	"toast.autovacuum_freeze_min_age",
+	"toast.autovacuum_freeze_table_age",
+	"toast.autovacuum_multixact_freeze_max_age",
+	"toast.autovacuum_multixact_freeze_min_age",
+	"toast.autovacuum_multixact_freeze_table_age",
+	"toast.autovacuum_vacuum_cost_delay",
+	"toast.autovacuum_vacuum_cost_limit",
+	"toast.autovacuum_vacuum_scale_factor",
+	"toast.autovacuum_vacuum_threshold",
+	"toast.log_autovacuum_min_duration",
+	"toast_tuple_target",
+	"user_catalog_table",
+	NULL
+};
+
 /* Forward declaration of functions */
 static char **psql_completion(const char *text, int start, int end);
 
@@ -1904,44 +1938,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("(");
 	/* ALTER TABLE SET|RESET ( */
 	else if (Matches("ALTER", "TABLE", MatchAny, "SET|RESET", "("))
-	{
-		static const char *const list_TABLEOPTIONS[] =
-		{
-			"autovacuum_analyze_scale_factor",
-			"autovacuum_analyze_threshold",
-			"autovacuum_enabled",
-			"autovacuum_freeze_max_age",
-			"autovacuum_freeze_min_age",
-			"autovacuum_freeze_table_age",
-			"autovacuum_multixact_freeze_max_age",
-			"autovacuum_multixact_freeze_min_age",
-			"autovacuum_multixact_freeze_table_age",
-			"autovacuum_vacuum_cost_delay",
-			"autovacuum_vacuum_cost_limit",
-			"autovacuum_vacuum_scale_factor",
-			"autovacuum_vacuum_threshold",
-			"fillfactor",
-			"parallel_workers",
-			"log_autovacuum_min_duration",
-			"toast_tuple_target",
-			"toast.autovacuum_enabled",
-			"toast.autovacuum_freeze_max_age",
-			"toast.autovacuum_freeze_min_age",
-			"toast.autovacuum_freeze_table_age",
-			"toast.autovacuum_multixact_freeze_max_age",
-			"toast.autovacuum_multixact_freeze_min_age",
-			"toast.autovacuum_multixact_freeze_table_age",
-			"toast.autovacuum_vacuum_cost_delay",
-			"toast.autovacuum_vacuum_cost_limit",
-			"toast.autovacuum_vacuum_scale_factor",
-			"toast.autovacuum_vacuum_threshold",
-			"toast.log_autovacuum_min_duration",
-			"user_catalog_table",
-			NULL
-		};
-		COMPLETE_WITH_LIST(list_TABLEOPTIONS);
-	}
 	else if (Matches("ALTER", "TABLE", MatchAny, "REPLICA", "IDENTITY", "USING", "INDEX"))
 	{
 		completion_info_charp = prev5_wd;
@@ -2439,6 +2436,9 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY", "TABLE", MatchAny, "(*)"))
 		COMPLETE_WITH("INHERITS (", "ON COMMIT", "PARTITION BY", "TABLESPACE", "WITH (");
+	else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)", "WITH", "(") ||
+			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "(*)", "WITH", "("))
+		COMPLETE_WITH_LIST(list_TABLEOPTIONS);
 	/* Complete CREATE TABLE ON COMMIT with actions */
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY", "TABLE", MatchAny, "(*)", "ON", "COMMIT"))
 		COMPLETE_WITH("DELETE ROWS", "DROP", "PRESERVE ROWS");
-- 
2.20.1
Re: tickling the lesser contributor's withering ego
Alvaro Herrera writes:
> I propose the following patch, which will make those links stable --
> then we can add the following links to the contributors page:
> https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
> https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS

Seems reasonable, but note the lag time --- unless somebody does something out of the ordinary, those pages won't actually have such tags till after the February minor releases.

regards, tom lane
Re: tickling the lesser contributor's withering ego
On 2018-Nov-04, Erik Rijkers wrote:
> I wouldn't mind if this page:
> https://www.postgresql.org/community/contributors/
>
> contained a link to (contributors v11):
> https://www.postgresql.org/docs/11/static/release-11.html#id-1.11.6.5.6
>
> and to (contributors v10)
> https://www.postgresql.org/docs/current/static/release-11.html#id-1.11.6.5.6

I propose the following patch, which will make those links stable --
then we can add the following links to the contributors page:
https://www.postgresql.org/docs/10/release-10.html#RELEASE-10-ACKNOWLEDGEMENTS
https://www.postgresql.org/docs/11/release-11.html#RELEASE-11-ACKNOWLEDGEMENTS

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/release-10.sgml b/doc/src/sgml/release-10.sgml
index aacdd360e0..c72b4c931e 100644
--- a/doc/src/sgml/release-10.sgml
+++ b/doc/src/sgml/release-10.sgml
@@ -8990,7 +8990,7 @@ This was disabled in the PG 9.6 branch so there is no commit here.
 
-  <sect2>
+  <sect2 id="release-10-acknowledgements">
   <title>Acknowledgments</title>
diff --git a/doc/src/sgml/release-11.sgml b/doc/src/sgml/release-11.sgml
index f35b0d8cc9..faa5835de4 100644
--- a/doc/src/sgml/release-11.sgml
+++ b/doc/src/sgml/release-11.sgml
@@ -3617,7 +3617,7 @@ same commits as above
 
-  <sect2>
+  <sect2 id="release-11-acknowledgements">
   <title>Acknowledgments</title>
Re: [PATCH] Improve tab completion for CREATE TABLE
Michael Paquier writes:
> On Thu, Dec 20, 2018 at 12:02:52AM +, Dagfinn Ilmari Mannsåker wrote:
>> Point, fixed in the attached v4.  OTOH, as I mentioned in my other
>> email, that runs into the problem that it won't complete the actions
>> after e.g. "CREATE TEMP TABLE FOO () WITH () ON COMMIT".
>
> I am fine to do that later on if that's adapted, one complication being
> that the completion is dependent on the order of the clauses for CREATE
> TABLE as we need something compatible with CREATE TABLE commands bundled
> with CREATE SCHEMA calls so only tail checks can be done.

Yeah, because of that we can't do the obvious

    HeadMatches("CREATE", "TABLE") && (TailMatches(...) || TailMatches(...) || ...)

I believe this would require extending the match syntax with regex-like
grouping, alternation and repetition operators, which I'm not volunteering
to do.

> So committed your last version after some comment tweaks and reordering
> the entries in alphabetical order.

Thanks!

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of." -- Skud's Meta-Law
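As a rough illustration of the tail-only matching the committed completion relies on, here is a small Python sketch (the helper is a simplified model, not psql's actual C macro machinery in tab-complete.c):

```python
def tail_matches(words, *pattern):
    """Simplified model of psql's TailMatches: compare only the last
    len(pattern) words of the buffer; "A|B" means alternation and
    "MatchAny" matches any single word."""
    if len(words) < len(pattern):
        return False
    for word, pat in zip(words[-len(pattern):], pattern):
        if pat != "MatchAny" and word.upper() not in pat.split("|"):
            return False
    return True

# The committed rule: offer storage parameters after CREATE TABLE ... WITH (
buf = ["CREATE", "UNLOGGED", "TABLE", "foo", "(*)", "WITH", "("]
print(tail_matches(buf, "CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE",
                   "MatchAny", "(*)", "WITH", "("))  # True
```

Because only the tail of the buffer is examined, the same rule fires whether the statement stands alone or is embedded in CREATE SCHEMA, which is exactly the constraint Michael describes.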
Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup
On 2018-Dec-21, Michael Paquier wrote:
> The thing is that the current messages are actually misleading, because
> for base backups taken by the replication protocol pg_stop_backup is
> never called, which is I think confusing.  While looking around I have
> also noticed that the top comments of do_pg_start_backup and
> do_pg_stop_backup also say that they are used with BASE_BACKUP.
>
> Attached is a patch to reduce the confusion and improve the related
> comments and messages.

Looks like good cleanup.

This phrase: "Backup can be canceled safely, " seems a bit odd.  It would
work either in plural "Backups can" or maybe be specific to the current
backup, "The backup can".

But looking at the bigger picture,

	errhint("Check that your archive_command is executing properly.  "
+			"Backup can be canceled safely, "
			"but the database backup will not be usable without all the WAL segments.")))

I think repeating the same term in the third line is not great.  Some ideas:

  Backups can be canceled safely, but they will not be usable without all
  the WAL segments.

  The backup can be canceled safely, but it will not be usable without all
  the WAL segments.

  Database backups can be canceled safely, but the current backup will not
  be usable without all the WAL segments.

  Database backups can be canceled safely, but no backup will be usable
  without all the WAL segments.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: could recovery_target_timeline=latest be the default in standby mode?
Hello

I am +1 for recovery_target_timeline=latest by default.  This is the common
case in my opinion, and the first release without recovery.conf is a
reasonable time to change the default value.

But I doubt we can simply ignore recovery_target_timeline in standby mode
and always use the latest timeline there.  I have no use case for this, but
I prefer changing only the default value.

regards, Sergei
pg_upgrade: Pass -j down to vacuumdb
Hi Hackers,

Here is a patch that passes the -j option from pg_upgrade down to vacuumdb
if supported.  I'll add it to the January 'Fest.

Thanks for considering!

Best regards,
Jesper

>From ea941f942830589469281e0d5c17740469c6aebc Mon Sep 17 00:00:00 2001
From: jesperpedersen
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Pass the -j option down to vacuumdb if supported.

Author: Jesper Pedersen
---
 src/bin/pg_upgrade/check.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index f2215b7095..102221a201 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,26 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+		/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	else
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s-j %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+		/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
-			new_cluster.bindir, user_specification.data);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+		fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+				new_cluster.bindir, user_specification.data);
+	else
+		fprintf(script, "\"%s/vacuumdb\" %s-j %d --all --analyze-in-stages\n",
+				new_cluster.bindir, user_specification.data, user_opts.jobs);
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2
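The version gate in the patch can be sketched as follows (a Python illustration of the decision only, not the C in check.c; 905 and 804 are the GET_MAJOR_VERSION encodings of 9.5 and 8.4):

```python
def vacuumdb_command(jobs, new_major, old_major):
    """Only add -j when the user requested parallel jobs and the new
    cluster's vacuumdb is >= 9.5, the release that introduced -j.
    The --analyze-only/--analyze choice mirrors the existing free-space
    check against the old cluster's version."""
    analyze = "--analyze-only" if old_major >= 804 else "--analyze"
    if jobs <= 1 or new_major < 905:
        return f"vacuumdb --all {analyze}"
    return f"vacuumdb -j {jobs} --all {analyze}"

print(vacuumdb_command(jobs=4, new_major=1100, old_major=1000))
# -> vacuumdb -j 4 --all --analyze-only
```

So a pre-9.5 new cluster silently falls back to the old single-process command rather than emitting an option its vacuumdb would reject.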
Re: Progress reporting for pg_verify_checksums
Hi,

On Wed, Oct 03, 2018 at 11:50:36AM +1300, Thomas Munro wrote:
> On Sat, Sep 29, 2018 at 1:07 AM Michael Banck wrote:
> Windows doesn't like sigaction:
>
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

Thanks for the report and sorry for taking so long to reply.

> I'm not sure if we classify this as a "frontend" program.  Should it
> be using pqsignal() from src/port/pqsignal.c?  Or perhaps just
> sigaction as you have it (pqsignal.c says that we require sigaction on
> all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
> never going to work anyway.

I've used pqsignal now, and disabled that feature on Windows.  I've also
updated one comment and added an additional regression test.

V5 attached.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt folgenden
Bestimmungen: https://www.credativ.de/datenschutz

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..b470c19c08 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -80,6 +80,19 @@ PostgreSQL documentation
+      -P
+      --progress
+
+
+       Enable progress reporting. Turning this on will deliver an approximate
+       progress report during the checksum verification. Sending the
+       SIGUSR1 signal will toggle progress reporting
+       on or off during the verification run (not available on Windows).
+
+
+
+
+
      -V
      --version
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 6444fc9ca4..b049ea17b3 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -10,6 +10,8 @@
 #include "postgres_fe.h"
 
 #include
+#include
+#include
 #include
 #include
@@ -30,9 +32,18 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool show_progress = false;
 
 static const char *progname;
 
+/*
+ * Progress status information.
+ */
+int64		total_size = 0;
+int64		current_size = 0;
+pg_time_t	last_progress_update;
+pg_time_t	scan_started;
+
 static void
 usage(void)
 {
@@ -43,6 +54,7 @@ usage(void)
 	printf(_("  [-D, --pgdata=]DATADIR  data directory\n"));
 	printf(_("  -v, --verbose           output verbose messages\n"));
 	printf(_("  -r RELFILENODE          check only relation with specified relfilenode\n"));
+	printf(_("  -P, --progress          show progress information\n"));
 	printf(_("  -V, --version           output version information, then exit\n"));
 	printf(_("  -?, --help              show this help, then exit\n"));
 	printf(_("\nIf no data directory (DATADIR) is specified, "
@@ -67,6 +79,69 @@ static const char *const skip[] = {
 	NULL,
 };
 
+static void
+toggle_progress_report(int signum)
+{
+
+	/* we handle SIGUSR1 only, and toggle the value of show_progress */
+	if (signum == SIGUSR1)
+		show_progress = !show_progress;
+
+}
+
+/*
+ * Report current progress status. Parts borrowed from
+ * PostgreSQL's src/bin/pg_basebackup.c
+ */
+static void
+report_progress(bool force)
+{
+	pg_time_t	now = time(NULL);
+	int			total_percent = 0;
+
+	char		totalstr[32];
+	char		currentstr[32];
+	char		currspeedstr[32];
+
+	/* Make sure we report at most once a second */
+	if ((now == last_progress_update) && !force)
+		return;
+
+	/* Save current time */
+	last_progress_update = now;
+
+	/*
+	 * Calculate current percent done, based on KiB...
+	 */
+	total_percent = total_size ?
+		(int64) ((current_size / 1024) * 100 / (total_size / 1024)) : 0;
+
+	/* Don't display larger than 100% */
+	if (total_percent > 100)
+		total_percent = 100;
+
+	/* The same for total size */
+	if (current_size > total_size)
+		total_size = current_size / 1024;
+
+	snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
+			 total_size / 1024);
+	snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
+			 current_size / 1024);
+	snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
+			 (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
+	fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
+			currentstr, totalstr, total_percent, currspeedstr);
+
+	/*
+	 * If we are reporting to a terminal, send a carriage return so that we
+	 * stay on the same line. If not, send a newline.
+	 */
+	if (isatty(fileno(stderr)))
+		fprintf(stderr, "\r");
+	else
+		fprintf(stderr, "\n");
+}
+
 static bool
 skipfile(const char *fn)
 {
@@ -117,6 +192,7 @@ scan_file(const char *fn, BlockNumber segmentno)
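The throttling and terminal handling in report_progress can be modeled briefly (a Python sketch of the same idea with hypothetical parameter names; the patch itself works in KiB and pg_time_t):

```python
def progress_line(current_kb, total_kb, elapsed_secs, tty):
    """One progress report: percent capped at 100, average speed since the
    scan started, and a carriage return on a terminal so the next report
    overwrites the same line (a newline otherwise)."""
    percent = min(100, current_kb * 100 // total_kb) if total_kb else 0
    speed = current_kb // max(1, elapsed_secs)   # avoid division by zero
    end = "\r" if tty else "\n"
    return f"{current_kb}/{total_kb} kB ({percent}%, {speed} kB/s)" + end

print(progress_line(512, 2048, elapsed_secs=2, tty=False), end="")
# -> 512/2048 kB (25%, 256 kB/s)
```

The once-per-second check in the patch (comparing whole-second timestamps) is what keeps this cheap enough to call from the per-block scan loop.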
Re: Online verification of checksums
Hi,

On Thu, Dec 20, 2018 at 04:19:11PM +0100, Michael Banck wrote:
> Yeah, new rebased version attached.

By the way, one thing that this patch also fixes is checksum verification
on basebackups (as pointed out the other day by my colleague Bernd Helmle):

postgres@kohn:~$ initdb -k data
postgres@kohn:~$ pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
postgres@kohn:~$ pg_verify_checksums -D data
pg_verify_checksums: cluster must be shut down to verify checksums
postgres@kohn:~$ pg_basebackup -h /tmp -D backup1
postgres@kohn:~$ pg_verify_checksums -D backup1
pg_verify_checksums: cluster must be shut down to verify checksums
postgres@kohn:~$ pg_checksums -c -D backup1
Checksum scan completed
Files scanned:  1094
Blocks scanned: 2867
Bad checksums:  0
Data checksum version: 1

where pg_checksums has the online verification patch applied.

As I don't think many people will take down their production servers in
order to verify checksums, verifying them on basebackups looks like a
useful use-case that is currently broken with pg_verify_checksums.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt folgenden
Bestimmungen: https://www.credativ.de/datenschutz
could recovery_target_timeline=latest be the default in standby mode?
Is there ever a reason why you would *not* want recovery_target_timeline=latest in standby mode? pg_basebackup -R doesn't set it. That seems suboptimal. Perhaps this could be the default in standby mode, or even the implicit default, ignoring the recovery_target_timeline setting altogether. How could we make things simpler here? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
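For concreteness, this is what pg_basebackup -R writes today (hostname and user here are placeholders), plus the line one currently has to add by hand; the proposal amounts to making that last setting the implied standby default:

```
# recovery.conf as generated by pg_basebackup -R on PG 11 and earlier
standby_mode = 'on'
primary_conninfo = 'host=primary port=5432 user=replicator'

# not written by -R today; following timeline switches after a
# failover of the upstream requires adding this manually:
recovery_target_timeline = 'latest'
```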
Re: Speeding up creating UPDATE/DELETE generic plan for partitioned table into a lot
Kato-san,

On 2018/12/21 15:36, Kato, Sho wrote:
> Hi,
> I want to speed up the creation of UPDATE/DELETE generic plans for tables
> partitioned into many partitions.
>
> Currently, when creating a generic plan of UPDATE/DELETE for such a table,
> the planner creates a plan that scans all partitions, so it takes a very
> long time.  I tried with a table partitioned into 8192 partitions, and it
> took 12 seconds.
>
> In most cases, since the partitions to access are partial, I think planner
> does not need to create a Scan path for every partition.

What do you mean by "since the partitions to access are partial"?

> Is there any better way?  For example, can planner create generic plans
> from the parameters specified for EXECUTE?

Well, a generic plan is, by definition, *not* specific to the values of
parameters, so it's not clear what you're suggesting here.

Thanks,
Amit
Performance of SELECT in a table partitioned into a lot
Hi,

I compared INSERT/UPDATE/DELETE/SELECT throughput between PostgreSQL and
another DBMS.  For INSERT/DELETE/UPDATE, PostgreSQL performance is superior,
but for SELECT, PostgreSQL performance is slightly lower than the other
DBMS.  Some information may be missing, so it may be difficult to say, but
do you know the reason for this?  Also, if you know where I should look,
please let me know.

*table information (e.g. 8192 partitions, each partition has 1,000 records)*

testdb=# \d test.accounts
            Partitioned table "test.accounts"
  Column  |  Type   | Collation | Nullable |                  Default
----------+---------+-----------+----------+--------------------------------------------
 aid      | integer |           | not null | nextval('test.accounts_aid_seq'::regclass)
 abalance | integer |           |          |
Partition key: RANGE (aid)
Indexes:
    "accounts_ix" btree (aid)
Number of partitions: 8192 (Use \d+ to list them.)

*interface*

JDBC (postgresql-42.2.4.jar), using PreparedStatement

*database tuning*

plan_cache_mode = force_custom_plan
max_worker_processes = 0
max_parallel_workers_per_gather = 0
max_parallel_workers = 0

*SQL*

SELECT abalance FROM test.accounts WHERE aid = $1;

$1 is random(npart * 1000)

*Other settings*

The benchmark is executed with a single session.

*Benchmark results*

I use master (commit 71a05b2232, Wed Dec 5) + v8 patch[1] + v1 patch[2].

 npart   PostgreSQL   another DBMS
 -----   ----------   ------------
     0       6314.7         6580.3
     2       5761.9         6390.6
     4       5916           6279.3
     8       5884.1         6000.7
    16       5887.7         6296
    32       5868.3         6274.4
    64       5826.5         6248.6
   128       5807.4         6208.9
   256       5748.7         6241.4
   512       5699.8         6204.6
  1024       5625.9         6174.1
  2048       5540.5         6159.3
  4096       5393.3         6060
  8192       5251.3         6093.4

[1]: https://www.postgresql.org/message-id/9d7c5112-cb99-6a47-d3be-cf1ee6862...@lab.ntt.co.jp
[2]: https://www.postgresql.org/message-id/CAKJS1f-=fnmqmqp6qitkd+xeddxw22yslp-0xfk3jaqux2y...@mail.gmail.com

regards,
Sho Kato
RE: [suggestion]support UNICODE host variables in ECPG
Matsumura-san, Tsunakawa-san,

Thank you for the replies.

Tsunakawa-san

> * What's the benefit of supporting UTF16 in host variables?

There are two benefits.
1) As the number of bytes per character is constant in the UTF16 encoding,
   strings can be processed more efficiently than in other encodings.
2) This enables C programmers to use wide characters.

> * Does your proposal comply with the SQL standard?  If not, what does the
> SQL standard say about support for UTF16?

I referred to the document, but I could not find it.  Does anyone know
about this?

> * Why only Windows?

It should be implemented on other OSes if needed.

Matsumura-san

> I understand that the previous discussion pointed out that the feature had
> better be implemented more simply or step-by-step, and that more
> description of the implementation is needed.
> I also think it prevented the discussion from reaching the details of the
> feature.  What is your opinion about it?

I wanted to discuss the necessity first, so I did not answer.  I'm very
sorry for not having mentioned it.  If it is judged that this function is
necessary, I'll redo the design.

Best regards,
-
Ryohei Nagaura
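A quick illustration of the fixed-width point (with the caveat that it strictly holds for BMP characters only; supplementary-plane characters still take a surrogate pair, i.e. 4 bytes, in UTF-16):

```python
# BMP text: every character is exactly 2 bytes in UTF-16
s = "Postgres"
b = s.encode("utf-16-le")
print(len(s), len(b))   # 8 characters -> 16 bytes

# A supplementary-plane character needs a surrogate pair (4 bytes)
print(len("\U0001d54f".encode("utf-16-le")))   # 4
```

This constant width for BMP text is what makes indexing into a UTF-16 host-variable buffer an O(1) operation, which is presumably the efficiency benefit being claimed.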
RE: Improve selectivity estimate for range queries
Hi,

Thanks for the comments.  I attach the v2 patch.

> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Friday, December 21, 2018 12:25 PM
>
> Hello.
>
> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" wrote in
> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp>
> > In my environment, the selectivity for id > 0 was 0.1, and the
> > selectivity for id < 1 was 0.1.  Then, the values of rqlist->hibound
> > and rqlist->lobound are set respectively.  On the other hand,
> > DEFAULT_INEQ_SEL is 0..  As a result, the final selectivity is computed
> > according to DEFAULT_RANGE_INEQ_SEL, since the condition of the second
> > if statement was satisfied.  However, these selectivities were computed
> > according to the statistics, so the final selectivity had to be
> > calculated from rqlist->hibound + rqlist->lobound - 1.0.  My test case
> > might be uncommon, but I think it would cause some problems.
>
> Yeah, it's known behavior, as the comment just above says.  If in your
> example query the table were very large and had an index, it could
> faultily run an index scan over a fair amount of tuples.  But I'm not
> sure how much it matters, and your example doesn't show that.
>
> The behavior discussed here was introduced by this commit.
>
> | commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a
> | Author: Tom Lane
> | Date:   Tue Nov 9 00:34:46 2004 +
> |
> |     Use a hopefully-more-reliable method of detecting default selectivity
> |     estimates when combining the estimates for a range query.  As pointed out
> |     by Miquel van Smoorenburg, the existing check for an impossible combined
> |     result would quite possibly fail to detect one default and one non-default
> |     input.  It seems better to use the default range query estimate in such
> |     cases.  To do so, add a check for an estimate of exactly DEFAULT_INEQ_SEL.
> |     This is a bit ugly because it introduces additional coupling between
> |     clauselist_selectivity and scalarltsel/scalargtsel, but it's not like
> |     there wasn't plenty already...
>
> > To handle such cases I've thought up of an idea based on a previous
> > commit[1] which solved a similar problem related to
> > DEFAULT_NUM_DISTINCT.  Just like this modification, I added a flag
> > which shows whether the selectivity
>
> The commit [1] added almost no complexity, but this does.  It affects
> many functions to introduce tighter coupling between operator-selectivity
> functions and clause selectivity functions.  isdefault travels alone too
> long just to bind remote functions.  We might need a more principled way
> to handle that.

Yes, indeed.  But I have not found a better way than this approach yet.

> > was calculated according to the statistics or not to
> > clauselist_selectivity, and used it as a condition of the if statement
> > instead of rqlist->hibound == DEFAULT_INEQ_SEL and
> > rqlist->lobound == DEFAULT_INEQ_SEL.  I am afraid that the changes were
> > more than I had expected, but we can estimate selectivity correctly.
> >
> > Patch attached.  Do you have any thoughts?
>
> I've just run over the patch, but have some comments.
>
> +	if (isdefault)
> +		rqelem->lobound_isdefault = true;
>
> This is taking the disjunction of lobounds ever added; I suppose it
> should be the last lobound's isdefault.

Indeed.  I fixed it.

> As you see, four other instances of "selec ==/!= DEFAULT_*" exist in
> mergejoinsel.  Don't they lead to a similar kind of problem?

I didn't encounter similar problems, but as you said, such problems could
occur due to the condition selec != DEFAULT_INEQ_SEL.  I changed these
conditions into "!isdefault".

>  	Selectivity	lobound;	/* Selectivity of a var > something clause */
>  	Selectivity	hibound;	/* Selectivity of a var < something clause */
> +	bool		lobound_isdefault;	/* lobound is a default selectivity? */
> +	bool		hibound_isdefault;	/* hibound is a default selectivity? */
>  } RangeQueryClause;
>
> Around the [1], there was a discussion about introducing the notion of
> uncertainty to selectivity.  The isdefault is a kind of uncertainty value
> indicating '0/100% uncertain'.  So my feeling is that it's better for
> Selectivity to have a certainty component, and then to build selectivity
> arithmetic involving uncertainty, though I don't have a clear picture.

I see.  Indeed, if we changed Selectivity so that it includes information
about default/non-default, such problems as I mentioned would be solved.
But I'm afraid that this modification would have a big impact, since there
are lots of functions that calculate selectivity.  I also don't have a
concrete idea, so I will think about it.  Besides that, I'd like to ask
the community whether such a modification would be acceptable or not.

Best regards,
Yuzuko Hosoya
NTT
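Numerically, the point under discussion is which fallback clauselist_selectivity takes when combining a '>' and a '<' estimate. A small Python model (illustrative only; the constants come from selfuncs.h, and the real logic lives in clauselist_selectivity):

```python
DEFAULT_INEQ_SEL = 0.3333333333333333    # 1/3, used when stats are missing
DEFAULT_RANGE_INEQ_SEL = 0.005           # blanket guess for a range query

def combine_range(lobound, hibound, lobound_isdefault, hibound_isdefault):
    """With the proposed flags, fall back to the blanket range default only
    when an input really is a default estimate, instead of testing numeric
    equality with DEFAULT_INEQ_SEL (which misfires when a statistics-based
    estimate happens to equal exactly 1/3)."""
    if lobound_isdefault or hibound_isdefault:
        return DEFAULT_RANGE_INEQ_SEL
    s = hibound + lobound - 1.0
    # clamp an empty/negative combined range in this simplified model
    return max(s, 1.0e-10)

# Two statistics-based estimates of 0.6 each combine to 0.2:
print(f"{combine_range(0.6, 0.6, False, False):.3f}")        # 0.200
# One genuine default forces the blanket range default:
print(combine_range(DEFAULT_INEQ_SEL, 0.6, True, False))     # 0.005
```

The pre-patch behavior corresponds to deriving the two booleans from `x == DEFAULT_INEQ_SEL`, which is exactly the coincidence the reported test case triggers.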
Re: Tid scan improvements
On Fri, 21 Dec 2018 at 16:31, Tom Lane wrote:
>
> Edmund Horner writes:
> > For the forward scan, I seem to recall, from your merge join example,
> > that it's useful to set the pathkeys even when there are no
> > query_pathkeys.  We just have to unconditionally set them so that the
> > larger plan can make use of them.
>
> No.  Look at indxpath.c: it does not worry about pathkeys unless
> has_useful_pathkeys is true, and it definitely does not generate
> pathkeys that don't get past truncate_useless_pathkeys.  Those
> functions are responsible for worrying about whether mergejoin
> can use the pathkeys.  It's not tidpath.c's job to outthink them.

Ok.  I think that will simplify things.  So if I follow you correctly, we
should do:

1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC), and
   use truncate_useless_pathkeys on them.
2. If we have tid quals or pathkeys, emit a TID scan path.

For the (optional) backwards-scan support patch, should we separately emit
another path in the reverse direction?  (My current patch only creates one
path, and tries to decide on the best direction by looking at
query_pathkeys.  This doesn't fit into the above algorithm.)
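The two steps above could be sketched like this (Python pseudocode of the proposed planner flow; the function and field names are illustrative, not the actual tidpath.c API):

```python
def build_tidscan_paths(rel, tidquals, has_useful_pathkeys,
                        build_ctid_pathkeys, truncate_useless_pathkeys):
    """Step 1: only build CTID ASC pathkeys if the query can use some
    ordering, and let truncate_useless_pathkeys filter them.
    Step 2: emit a TID scan path if either quals or pathkeys survived."""
    pathkeys = []
    if has_useful_pathkeys:
        pathkeys = truncate_useless_pathkeys(build_ctid_pathkeys(rel))
    if tidquals or pathkeys:
        return {"rel": rel, "quals": tidquals, "pathkeys": pathkeys}
    return None

# With no quals and no useful ordering, no TID path is generated:
print(build_tidscan_paths("t", [], False,
                          lambda r: ["ctid ASC"], lambda pk: pk))  # None
```

Following indxpath.c's lead, the decision about whether a merge join could exploit the ordering lives entirely in has_useful_pathkeys/truncate_useless_pathkeys, not here.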