Re: convert elog(LOG) calls to ereport
On Wed, Dec 02, 2020 at 02:26:24PM +0100, Peter Eisentraut wrote:
> There are a number of elog(LOG) calls that appear to be user-facing, so they
> should be ereport()s.

> @@ -8591,25 +8604,46 @@ LogCheckpointEnd(bool restartpoint)
>             CheckpointStats.ckpt_sync_rels;
>         average_msecs = (long) ((average_sync_time + 999) / 1000);
>
> -       elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
> -            "%d WAL file(s) added, %d removed, %d recycled; "
> -            "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
> -            "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
> -            "distance=%d kB, estimate=%d kB",

+1 to this change.

> @@ -1763,7 +1771,8 @@ pq_setkeepalivesidle(int idle, Port *port)
> #else
>     if (idle != 0)
>     {
> -       elog(LOG, "setting the keepalive idle time is not supported");
> +       ereport(LOG,
> +               (errmsg("setting the keepalive idle time is not supported")));

+1

> --- a/src/backend/optimizer/geqo/geqo_erx.c
> +++ b/src/backend/optimizer/geqo/geqo_erx.c
> @@ -420,7 +420,8 @@ edge_failure(PlannerInfo *root, Gene *gene, int index, Edge *edge_table, int num
>         }
>     }
>
> -   elog(LOG, "no edge found via random decision and total_edges == 4");
> +   ereport(LOG,
> +           (errmsg("no edge found via random decision and total_edges == 4")));

The user can't act upon this without reading the source code. This and the
other messages proposed in this file are better as elog().

> @@ -343,7 +346,8 @@ PGSharedMemoryCreate(Size size,
>      * care.
>      */
>     if (!CloseHandle(hmap))
> -       elog(LOG, "could not close handle to shared memory: error code %lu", GetLastError());
> +       ereport(LOG,
> +               (errmsg("could not close handle to shared memory: error code %lu", GetLastError())));

The numerous messages proposed in src/backend/port/ files are can't-happen
events, and there's little a DBA can do without reading the source code.
They're better as elog().

> @@ -6108,8 +6111,9 @@ backend_read_statsfile(void)
>                 /* Copy because timestamptz_to_str returns a static buffer */
>                 filetime = pstrdup(timestamptz_to_str(file_ts));
>                 mytime = pstrdup(timestamptz_to_str(cur_ts));
> -               elog(LOG, "stats collector's time %s is later than backend local time %s",
> -                    filetime, mytime);
> +               ereport(LOG,
> +                       (errmsg("statistics collector's time %s is later than backend local time %s",
> +                               filetime, mytime)));

+1

> @@ -769,10 +769,11 @@ StartupReplicationOrigin(void)
>         replication_states[last_state].remote_lsn = disk_state.remote_lsn;
>         last_state++;
>
> -       elog(LOG, "recovered replication state of node %u to %X/%X",
> -            disk_state.roident,
> -            (uint32) (disk_state.remote_lsn >> 32),
> -            (uint32) disk_state.remote_lsn);
> +       ereport(LOG,
> +               (errmsg("recovered replication state of node %u to %X/%X",
> +                       disk_state.roident,
> +                       (uint32) (disk_state.remote_lsn >> 32),
> +                       (uint32) disk_state.remote_lsn)));

+1

> @@ -1914,7 +1914,8 @@ FileClose(File file)
>
>     /* in any case do the unlink */
>     if (unlink(vfdP->fileName))
> -       elog(LOG, "could not unlink file \"%s\": %m", vfdP->fileName);
> +       ereport(LOG,
> +               (errmsg("could not unlink file \"%s\": %m", vfdP->fileName)));

+1

>     /* and last report the stat results */
>     if (stat_errno == 0)
> @@ -1922,7 +1923,8 @@ FileClose(File file)
>     else
>     {
>         errno = stat_errno;
> -       elog(LOG, "could not stat file \"%s\": %m", vfdP->fileName);
> +       ereport(LOG,
> +               (errmsg("could not stat file \"%s\": %m", vfdP->fileName)));

+1

For the changes I didn't mention explicitly (most of them), I'm -0.5. Many of
them "can't happen", use source code terms of art, and/or breach the guideline
"Avoid mentioning called function names, either; instead say what the code was
trying to do" (https://www.postgresql.org/docs/devel/error-style-guide.html).
Re: Two fsync related performance issues?
On Wed, 2 Dec 2020 at 15:41, Michael Paquier wrote:
> On Tue, Dec 01, 2020 at 07:39:30PM +0800, Craig Ringer wrote:
> > On Wed, 14 Oct 2020, 13:06 Michael Paquier, wrote:
> >> Yeah, it is safer to assume that it is the responsibility of the
> >> backup tool to ensure that because it could be possible that a host is
> >> unplugged just after taking a backup, and having Postgres do this work
> >> at the beginning of archive recovery would not help in most cases.
> >
> > Let's document that assumption in the docs for pg_basebackup and the file
> > system copy based replica creation docs. With a reference to initdb's
> > datadir sync option.
>
> Do you have any suggestion about that, in the shape of a patch perhaps?

I'll try to get to that, but I have some other docs patches outstanding that
I'd like to resolve first.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
On Tue, 27 Oct 2020 at 16:34, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 2020-09-25 09:40, Craig Ringer wrote:
> > While working on extensions I've often wanted to enable cache clobbering
> > for a targeted piece of code, without paying the price of running it for
> > all backends during postgres startup and any initial setup tasks that
> > are required.
> >
> > So here's a patch that, when CLOBBER_CACHE_ALWAYS or
> > CLOBBER_CACHE_RECURSIVE are defined, adds a GUC named
> > clobber_cache_depth. It defaults to 1 for CLOBBER_CACHE_ALWAYS or 3 for
> > CLOBBER_CACHE_RECURSIVE to match the existing compiled-in behaviour. But
> > with this change it's now possible to run Pg with clobber_cache_depth=0
> > then set it to 1 only for targeted tests.
>
> I think it would be great if the cache clobber code is always compiled
> in (but turned off) when building with assertions. That would reduce the
> number of hoops to jump through to actually use this locally and reduce
> the number of surprises from the build farm.

I implemented something a bit like that for a patched postgres where it
became an additional configure option. It's easy enough to enable it
automatically for USING_CASSERT instead, and I think that's sensible, so
I've adapted the patch to do so.

As initially written the patch defined the clobber control GUC only if cache
clobber control is compiled in. But we don't do that for anything else, so
I've changed it to always define the GUC, which will be ignored without
effect if no cache clobber control is compiled in. I also renamed the GUC to
debug_clobber_cache_depth to match other debug GUCs.

> For backward compatibility, you can arrange it so that the built-in
> default of clobber_cache_depth is 1 or 3 if CLOBBER_CACHE_ALWAYS or
> CLOBBER_CACHE_RECURSIVE are defined.

It already does that - see the diff hunk for src/include/utils/inval.h in
the prior patch.
I've just changed it to default to 0 if neither are defined and moved it to
pg_config_manual.h.

I noticed that I missed handling of RECOVER_RELATION_BUILD_MEMORY in the
earlier patch - the relcache was eagerly freeing relation descriptor memory
without regard for the current clobber_cache_depth value. I've changed that
in the updated patch so that RelationBuildDesc only frees memory eagerly if
clobber_cache_depth is actually > 0 or if RECOVER_RELATION_BUILD_MEMORY has
been explicitly defined to 1. It also preserves the original behaviour of
not eagerly freeing memory even when cache clobber is enabled if
RECOVER_RELATION_BUILD_MEMORY is explicitly defined to 0.

Both CLOBBER_CACHE_ENABLED and RECOVER_RELATION_BUILD_MEMORY are now shown
in pg_config_manual.h. A small entry has been added to the docs too, under
developer options.

The changes add a little more noise than I'd prefer, but I didn't want to
drop support for the compile-time constants or introduce surprising
behaviour.

To try it out, apply the patch (git am), build with --enable-cassert, then
compare:

    make -C src/test/regress check

and

    PGOPTIONS="-c debug_clobber_cache_depth=1" \
        make -C src/test/regress check

The speed difference will be obvious if nothing else! It's much more
practical using make check-tests and a specific TESTS= list.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

From fed26086ca2d5840ac288846b7b81b0d74af6949 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Tue, 22 Sep 2020 09:51:00 +0800
Subject: [PATCH v2] Replace CLOBBER_CACHE_ALWAYS with
 debug_clobber_cache_depth GUC

Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical to use
for testing in PostgreSQL because it's so slow and because it's toggled
on/off only at build time.
It is helpful when hunting bugs in any code that uses the syscache/relcache
because it causes cache invalidations to be injected whenever it would be
possible for an invalidation to occur, whether or not one was really
pending.

Address this by providing runtime control over cache clobber behaviour using
the new debug_clobber_cache_depth GUC. Support is not compiled in at all
unless assertions are enabled or CLOBBER_CACHE_ENABLED is explicitly defined
at compile time. It defaults to 0 if compiled in, so it has negligible
effect on assert build performance by default.

When support is compiled in, test code can now set
debug_clobber_cache_depth=1 locally to a backend to test specific queries,
functions, extensions, etc. Or tests can toggle it globally for a specific
test case while retaining normal performance during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
clobber_cache_depth defaults to 1 if CLOBBER_CACHE_ALWAYS is defined, and to
3 if CLOBBER_CACHE_RECURSIVE is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.
---
doc/src/sgml/config.sgml| 34 +
Re: Huge memory consumption on partitioned table with FKs
On Thu, Dec 3, 2020 at 2:29 PM Kyotaro Horiguchi wrote:
> At Thu, 3 Dec 2020 12:27:53 +0900, Amit Langote wrote in
> > On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi wrote:
> > > I don't think you're missing something. As I (tried to..) mentioned in
> > > another branch of this thread, you're right. On the other hand
> > > fk_attnums is actually used to generate the query, thus the *issue*
> > > potentially affects the query. Although that might be paranoic, that
> > > possibility is eliminated by checking the congruency(?) of fk_attnums
> > > (or other members). We might even be able to share a riinfo among
> > > such children.
> >
> > Just to be sure I've not misunderstood you, let me mention why I think
> > the queries used by different partitions all turn out to be the same
> > despite attribute number differences among partitions. The places
> > that use fk_attnums when generating a query are these among others:
> >
> > Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> > ...
> > Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
> > ...
> > quoteOneName(attname,
> >              RIAttName(fk_rel, riinfo->fk_attnums[i]));
> >
> > For the queries on the referencing side ("check" side),
> > type/collation/attribute name determined using the above are going to
> > be the same for all partitions in a given tree irrespective of the
> > attribute number, because they're logically the same column. On the
>
> Yes, I know that, which is what I meant by "practically" or
> "actually", but it is not explicitly defined AFAICS.

Well, I think it's great that we don't have to worry *in this part of the
code* about a partition's fk_attnums not being congruent with the root
parent's, because ensuring that is the responsibility of the other parts of
the system such as DDL. If we have any problems in this area, they should be
dealt with by ensuring that there are no bugs in those other parts.
> Thus that would be no longer an issue if we explicitly define that
> "When conparentid stores a valid value, each element of fk_attnums
> points to logically the same attribute with the RI_ConstraintInfo for
> the parent constraint." Or I'd be happy if we have such a comment
> there instead.

I saw a comment in Kuroda-san's v2 patch that is perhaps meant to address
this point, but the placement needs to be reconsidered:

@@ -366,6 +368,14 @@ RI_FKey_check(TriggerData *trigdata)
    querysep = "WHERE";
    for (int i = 0; i < riinfo->nkeys; i++)
    {
+
+       /*
+        * We share the same plan among all relations in a partition
+        * hierarchy. The plan is guaranteed to be compatible since all of
+        * the member relations are guaranteed to have the equivalent set
+        * of foreign keys in fk_attnums[].
+        */
+
        Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
        Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);

A more appropriate place for this kind of comment would be where fk_attnums
is defined or in ri_BuildQueryKey() that is shared by different RI query
issuing functions.

> > referenced side ("restrict", "cascade", "set" side), as you already
> > mentioned, fk_attnums refers to the top parent table of the
> > referencing side, so no possibility of their being different in the
> > various referenced partitions' RI_ConstraintInfos.
>
> Right. (I'm not sure I have mentioned that here, though :p)

Maybe I misread but I think you did in your email dated Dec 1 where you
said: "After an off-list discussion, we confirmed that even in that case the
patch works as is because fk_attnum (or contuple.conkey) always stores key
attnums compatible to the topmost parent when conparent has a valid value
(assuming the current usage of fk_attnum), but I still feel uneasy to rely
on that unclear behavior."

> > On the topic of how we'd be able to share even the RI_ConstraintInfos
> > among partitions, that would indeed look a bit more elaborate than the
> > patch we have right now.
> Maybe just letting the hash entry for the child riinfo point to the
> parent riinfo if all members (other than constraint_id, of course)
> share exactly the same values. No need to count references since we
> aren't going to remove riinfos.

Ah, something maybe worth trying. Although the memory we'd save by sharing
the RI_ConstraintInfos would not add that much to the savings we're having
by sharing the plan, because it's the plans that are a memory hog AFAIK.

> > > About your patch, it calculates the root constrid at the time an
> > > riinfo is created, but when the root-partition is further attached to
> > > another partitioned-table after the riinfo creation,
> > > constraint_root_id gets stale. Of course that doesn't matter
> > > practically, though.
> >
> > Maybe we could also store the hash value of the root constraint OID as
> > rootHashValue and check for that one too in
> > InvalidateConstraintCacheCallBack().
Re: [HACKERS] logical decoding of two-phase transactions
On Thu, Dec 3, 2020 at 5:34 PM Masahiko Sawada wrote:
> While looking at the patch set I found that the tests in
> src/test/subscription don't work with this patch. I got the following
> error:
>
> 2020-12-03 15:18:12.666 JST [44771] tap_sub ERROR: unrecognized
> pgoutput option: two_phase
> 2020-12-03 15:18:12.666 JST [44771] tap_sub CONTEXT: slot "tap_sub",
> output plugin "pgoutput", in the startup callback
> 2020-12-03 15:18:12.666 JST [44771] tap_sub STATEMENT:
> START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
> two_phase 'on', publication_names '"tap_pub","tap_pub_ins_only"')
>
> In v29-0009 patch "two_phase" option is added on the subscription side
> (i.e., libpqwalreceiver) but it seems not on the publisher side
> (pgoutput).

The v29-0009 patch is still a WIP for a new SUBSCRIPTION "two_phase" option
so it is not yet fully implemented.

I did run the following prior to upload but somehow did not see those
failures yesterday:

    cd src/test/subscription
    make check

Anyway, as 0009 is the last of the set please just git apply --reverse that
one if it is causing a problem. Sorry for any inconvenience. I will add the
missing functionality to 0009 as soon as I can.

Kind Regards,
Peter Smith.
Fujitsu Australia
RE: [Patch] Optimize dropping of relation buffers using dlist
From: Jamison, Kirk/ジャミソン カーク
> Apologies for the delay, but attached are the updated versions to
> simplify the patches.

Looks good to me. Thanks to Horiguchi-san and Andres-san, the code became
more compact and easier to read. I've marked this ready for committer.

To the committer:
I don't think it's necessary to refer to COMMIT/ROLLBACK PREPARED in the
following part of the 0003 commit message. They surely call
DropRelFileNodesAllBuffers(), but COMMIT/ROLLBACK also call it.

    the full scan threshold. This improves the DropRelationFiles()
    performance when the TRUNCATE command truncated off any of the empty
    pages at the end of relation, and when dropping relation buffers if a
    commit/rollback transaction has been prepared in
    FinishPreparedTransaction().

Regards
Takayuki Tsunakawa
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Wed, Dec 02, 2020 at 10:30:08PM -0600, Justin Pryzby wrote:
> Good idea. I think you mean like this.

Yes, something like that. Thanks.

> +typedef struct ReindexParams {
> +   bool concurrently;
> +   bool verbose;
> +   bool missingok;
> +
> +   int options;    /* bitmask of lowlevel REINDEXOPT_* */
> +} ReindexParams;
> +

By moving everything into indexcmds.c, keeping ReindexParams within it makes
sense to me. Now, there is no need for the three booleans because options
stores the same information, no?

> struct ReindexIndexCallbackState
> {
> -   int options;    /* options from statement */
> +   bool concurrently;
>     Oid locked_table_oid;   /* tracks previously locked table */
> };

Here also, I think that we should just pass down the full ReindexParams set.
--
Michael
Re: convert elog(LOG) calls to ereport
On Wed, Dec 02, 2020 at 11:04:45AM -0300, Alvaro Herrera wrote:
> Please take the opportunity to move the flag name out of the message in
> this one, also. I do wonder if it'd be a good idea to move the syscall
> name itself out of the message, too; that would reduce the number of
> messages to translate 50x to just "%s(%s) failed: %m" instead of one
> message per distinct syscall.

+1.

+   else
+       ereport(LOG,
+               (errmsg("checkpoint starting:%s%s%s%s%s%s%s%s",

Would it be better to add a note for translators here, in short that all
those %s are options related to checkpoint/restartpoints?

The ones in geqo_erx.c look like debugging information, and the ones in
win32_shmem.c for segment creation are code paths only used by the
postmaster.
--
Michael
Re: Commitfest 2020-11 is closed
On Thu, Dec 3, 2020 at 12:02 PM Josef Šimánek wrote:
> On Wed, 2 Dec 2020 at 23:36, Andrew Dunstan wrote:
> > On 12/2/20 5:13 PM, Thomas Munro wrote:
> > > I'm experimenting with Github's built in CI. All other ideas welcome.
> >
> > I'd look very closely at gitlab.
>
> I was about to respond before with the same idea. Feel free to ping
> regarding another CI. Also there is the possibility to move to GitHub
> Actions (free open source CI). I got some experience with that as well.

I spent today getting something working on Github just to try it out, and
started a new thread [1] about that. I've not tried Gitlab and have no
opinion about that; if someone has a working patch similar to that, I'd
definitely be interested to take a look.

I've looked at some others. For what cfbot is doing (namely: concentrating
the work of hundreds into one "account" via hundreds of branches), the
spin-up times and concurrency limits are a bit of a problem on many of them.
FWIW I think they're all wonderful for offering this service to open source
projects and I'm grateful that they do it!

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2By_SHVQcU3CPokmJxuHp1niebCjq4XzZizf8SR9ZdQRQ%40mail.gmail.com
Re: Printing LSN made easy
On Mon, Nov 30, 2020 at 8:07 PM Alvaro Herrera wrote:
> On 2020-Nov-30, Ashutosh Bapat wrote:
> > Peter Eisentraut explained that the translation system can not handle
> > strings broken by macros, e.g. errmsg("foo" MACRO "bar"), since it
> > doesn't know statically what the macro resolves to. But I am not
> > familiar with our translation system myself. But if we allow
> > INT64_FORMAT, LSN_FORMAT should be allowed. That makes life much
> > easier. We do not need other functions at all.
> >
> > Peter E or somebody familiar with translations can provide more
> > information.
>
> We don't allow INT64_FORMAT in translatable strings, period. (See
> commit 6a1cd8b9236d which undid some hacks we had to use to work around
> the lack of it, by allowing %llu to take its place.) For the same
> reason, we cannot allow LSN_FORMAT or similar constructions in
> translatable strings either.

> If the solution to ugliness of LSN printing is going to require that we
> add a "%s" which prints a previously formatted string with LSN_FORMAT,
> it won't win us anything.

Thanks for the commit. The commit reverted code which uses a char array to
print int64. On the same lines, using a character array to print an LSN
won't be acceptable. I agree.

Maybe we should update the section "Message-Writing Guidelines" in
doc/src/sgml/nls.sgml. I think it's implied by "Do not construct sentences
at run-time, like ..." but using a macro is not really constructing
sentences at run time. Hopefully, some day, we will fix the translation
system to handle strings with macros and make it easier to print PG-specific
objects in strings easily.

> As annoyed as I am about our LSN printing, I don't think this patch
> has the solutions we need.

I think we could at least accept the macros so that they can be used in
non-translatable strings, i.e. use them with sprintf, appendStringInfo
variants, etc. The macros will also serve to document the string format of
an LSN.
Developers can then copy from the macros rather than copying from "some other" (erroneous) usage in the code. -- Best Wishes, Ashutosh
Re: [HACKERS] logical decoding of two-phase transactions
On Wed, Dec 2, 2020 at 8:24 PM Peter Smith wrote:
> I have rebased the v28 patch set (made necessary due to recent commit [1])
> [1] https://github.com/postgres/postgres/commit/0926e96c493443644ba8e96b5d96d013a9ffaf64
>
> And at the same time I have added patch 0009 to this set - This is for
> the new SUBSCRIPTION option "two_phase" (0009 is still WIP but
> stable).
>
> PSA new patch set with version bumped to v29.

Thank you for updating the patch!

While looking at the patch set I found that the tests in
src/test/subscription don't work with this patch. I got the following error:

2020-12-03 15:18:12.666 JST [44771] tap_sub ERROR: unrecognized
pgoutput option: two_phase
2020-12-03 15:18:12.666 JST [44771] tap_sub CONTEXT: slot "tap_sub",
output plugin "pgoutput", in the startup callback
2020-12-03 15:18:12.666 JST [44771] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2',
two_phase 'on', publication_names '"tap_pub","tap_pub_ins_only"')

In the v29-0009 patch the "two_phase" option is added on the subscription
side (i.e., libpqwalreceiver) but it seems not on the publisher side
(pgoutput).

Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Github Actions (CI)
Hi hackers, I'm looking for more horsepower for testing commitfest entries automatically, and today I tried out $SUBJECT. The attached is a rudimentary first attempt, for show-and-tell. If you have a Github account, you just have to push it to a branch there and look at the Actions tab on the web page for the results. Does anyone else have .github files and want to share, to see if we can combine efforts here? The reason for creating three separate "workflows" for Linux, Windows and macOS rather than three separate "jobs" inside one workflow is so that cfbot.cputube.org could potentially get separate pass/fail results for each OS out of the API rather than one combined result. I rather like that feature of cfbot's results. (I could be wrong about needing to do that, this is the first time I've ever looked at this stuff.) The Windows test actually fails right now, exactly as reported by Ranier[1]. It is a release build on a recent MSVC, so I guess that is expected and off-topic for this thread. But generally, .github/workflows/ci-windows.yml is the weakest part of this. It'd be great to get a debug/assertion build, show backtraces when it crashes, run more of the tests, etc etc, but I don't know nearly enough about Windows to do that myself. Another thing is that it uses Choco for flex and bison; it'd be better to find those on the image, if possible. Also, for all 3 OSes, it's not currently attempting to cache build results or anything like that. I'm a bit sad that GH doesn't have FreeBSD build runners. Those are now popping up on other CIs, but I'm not sure if their free/open source tiers have enough resources for cfbot. 
[1] https://www.postgresql.org/message-id/flat/CAEudQArhn8bH836OB%2B3SboiaeEcgOtrJS58Bki4%3D5yeVqToxgw%40mail.gmail.com From 424bcae7f1fcb1ada0e7046bfc5e0c5254c6f439 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 3 Dec 2020 10:58:16 +1300 Subject: [PATCH] Github CI WIP --- .github/workflows/ci-linux.yml | 43 ++ .github/workflows/ci-macos.yml | 38 +++ .github/workflows/ci-windows-buildsetup.pl | 35 ++ .github/workflows/ci-windows-dumpregr.pl | 22 +++ .github/workflows/ci-windows.yml | 32 5 files changed, 170 insertions(+) create mode 100644 .github/workflows/ci-linux.yml create mode 100644 .github/workflows/ci-macos.yml create mode 100644 .github/workflows/ci-windows-buildsetup.pl create mode 100644 .github/workflows/ci-windows-dumpregr.pl create mode 100644 .github/workflows/ci-windows.yml diff --git a/.github/workflows/ci-linux.yml b/.github/workflows/ci-linux.yml new file mode 100644 index 00..8ee32770cd --- /dev/null +++ b/.github/workflows/ci-linux.yml @@ -0,0 +1,43 @@ +name: ci-linux +on: [push] +jobs: + ci-linux: +runs-on: ubuntu-latest +steps: +- uses: actions/checkout@v2 +- name: Install packages + run: | +sudo apt-get --yes update +sudo apt-get --yes install gcc libreadline-dev flex bison make perl libipc-run-perl clang llvm-dev libperl-dev libpython-dev tcl-dev libldap2-dev libicu-dev docbook-xml docbook-xsl fop libxml2-utils xsltproc krb5-admin-server krb5-kdc krb5-user slapd ldap-utils libssl-dev pkg-config locales-all gdb + env: +DEBIAN_FRONTEND: noninteractive +- name: Configure + run: ./configure --enable-cassert --enable-debug --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap --with-openssl --with-icu --with-llvm +- name: Build + run: | +echo "COPT=-Wall -Werror" > src/Makefile.custom +make -s -j3 +- name: Check world + run: | +echo '/tmp/%e-%s-%p.core' | sudo tee /proc/sys/kernel/core_pattern +ulimit -c unlimited +make -s check-world + env: +PG_TEST_EXTRA: "ssl kerberos" +#PG_TEST_EXTRA: "ssl ldap kerberos" why does slapd 
fail to start? +- name: Look for clues + if: ${{ failure() }} + run: | +# dump useful logs +for F in ` find . -name initdb.log -o -name regression.diffs ` ; do + echo === $F === + head -1000 $F +done +# look for core files and spit out backtraces +for corefile in $(find /tmp/ -name '*.core' 2>/dev/null) ; do + binary=$(gdb -quiet -core $corefile -batch -ex 'info auxv' | grep AT_EXECFN | perl -pe "s/^.*\"(.*)\"\$/\$1/g") + echo dumping $corefile for $binary + gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" $binary $corefile +done +- name: Documentation + run: make -s docs diff --git a/.github/workflows/ci-macos.yml b/.github/workflows/ci-macos.yml new file mode 100644 index 00..2c20a5a279 --- /dev/null +++ b/.github/workflows/ci-macos.yml @@ -0,0 +1,38 @@ +name: ci-macos +on: [push] +jobs: + ci-macos: +runs-on: macos-latest +steps: +- uses: actions/checkout@v2 +- name: Install packages +
Re: Printing LSN made easy
On Mon, Nov 30, 2020 at 7:38 PM Li Japin wrote:
> Hi,
>
> On Nov 30, 2020, at 9:06 PM, Ashutosh Bapat wrote:
> > On Fri, Nov 27, 2020 at 9:51 PM Li Japin wrote:
> > > Hi,
> > > Here, we cannot use sizeof(buf) to get the buf size, because it is a
> > > pointer, so it is always 8 bytes on a 64-bit or 4 bytes on a 32-bit
> > > machine.
> >
> > For an array, sizeof() returns the size of memory consumed by the
> > array. See section "Application to arrays" at
> > https://en.wikipedia.org/wiki/Sizeof.
>
> That’s true! However, in pg_lsn_out_buffer(), it converts to a pointer,
> not an array. See the following test:

Ah! Thanks for pointing that out. I have fixed this in my repository.
However, from Alvaro's reply it looks like the approach is not acceptable,
so I am not posting the fixed version here.

--
Best Wishes,
Ashutosh
Re: should INSERT SELECT use a BulkInsertState?
On Wed, Dec 2, 2020 at 10:24 PM Justin Pryzby wrote:
> One loose end in this patch is how to check for volatile default
> expressions.
>
> copyfrom.c is a utility statement, so it can look at the parser's column
> list:
> COPY table(c1,c2)...
>
> However, for INSERT, in nodeModifyTable.c, it looks like parsing,
> rewriting, and planning are done, at which point I don't know if there's
> a good way to find that. The default expressions will have been rewritten
> into the planned statement.
>
> We need the list of columns whose default is volatile, excluding columns
> for which a non-default value is specified.
>
> INSERT INTO table (c1,c2) VALUES (1,default);
>
> We'd want the list of any column in the table with a volatile default,
> excluding column c1, but not-excluding explicit default columns c2 or any
> implicit default columns (c3, etc).
>
> Any idea?

I think we should be doing all the necessary checks in the planner and have
a flag in the planned stmt to indicate whether to go with multi insert or
not. For the required checks, we can have a look at how the existing COPY
decides to go with either CIM_MULTI or CIM_SINGLE. As for how we can get to
know whether a given relation has default expressions or volatile
expressions, it is worth looking at build_column_default() and
contain_volatile_functions().

I prefer to have the multi insert deciding code for COPY and INSERT SELECT
in a single common function which can be reused. Though COPY has some
things like default expressions and others ready, unlike INSERT SELECT, we
can try to keep them under a common function and say for COPY we can skip
some code and for INSERT SELECT we can do extra work to find default
expressions.

Although unrelated, for parallel inserts in INSERT SELECT [1], in the
planner there are some checks to see if the parallelism is safe or not.
Check max_parallel_hazard_for_modify() in
v8-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch from [1].
On similar lines, we can also have the multi insert deciding code.

[1] https://www.postgresql.org/message-id/CAJcOf-fy3P%2BkDArvmbEtdQTxFMf7Rn2%3DV-sqCnMmKO3QKBsgPA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Re: Huge memory consumption on partitioned table with FKs
At Thu, 3 Dec 2020 12:27:53 +0900, Amit Langote wrote in
> Hello,
>
> On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi wrote:
> > At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote wrote in
> > > Hmm, I don't see what the problem is here, because it's not
> > > RI_ConstraintInfo that is being shared among partitions of the same
> > > parent, but the RI query (and the SPIPlanPtr for it) that is issued
> > > through SPI. The query (and the plan) turns out to be the same no
> > > matter what partition's RI_ConstraintInfo is first used to generate
> > > it. What am I missing?
> >
> > I don't think you're missing something. As I (tried to..) mentioned in
> > another branch of this thread, you're right. On the other hand
> > fk_attnums is actually used to generate the query, thus the *issue*
> > potentially affects the query. Although that might be paranoic, that
> > possibility is eliminated by checking the congruency(?) of fk_attnums
> > (or other members). We might even be able to share a riinfo among
> > such children.
>
> Just to be sure I've not misunderstood you, let me mention why I think
> the queries used by different partitions all turn out to be the same
> despite attribute number differences among partitions. The places
> that use fk_attnums when generating a query are these among others:
>
> Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
> ...
> Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
> ...
> quoteOneName(attname,
>              RIAttName(fk_rel, riinfo->fk_attnums[i]));
>
> For the queries on the referencing side ("check" side),
> type/collation/attribute name determined using the above are going to
> be the same for all partitions in a given tree irrespective of the
> attribute number, because they're logically the same column. On the

Yes, I know that, which is what I meant by "practically" or "actually", but
it is not explicitly defined AFAICS.
Thus that would no longer be an issue if we explicitly define that "When conparentid stores a valid value, each element of fk_attnums points to logically the same attribute as in the RI_ConstraintInfo for the parent constraint." Or I'd be happy if we have such a comment there instead. > referenced side ("restrict", "cascade", "set" side), as you already > mentioned, fk_attnums refers to the top parent table of the > referencing side, so no possibility of they being different in the > various referenced partitions' RI_ConstraintInfos. Right. (I'm not sure I mentioned that here, though :p) > On the topic of how we'd be able to share even the RI_ConstraintInfos > among partitions, that would indeed look a bit more elaborate than the > patch we have right now. Maybe just letting the hash entry for the child riinfo point to the parent riinfo if all members (other than constraint_id, of course) share exactly the same values. No need to count references since we aren't going to remove riinfos. > > > BTW, one problem with Kuroda-san's patch is that it's using > > > conparentid as the shared key, which can still lead to multiple > > > queries being generated when using multi-level partitioning, because > > > there would be many (intermediate) parent constraints in that case. > > > We should really be using the "root" constraint OID as the key, > > > because there would only be one root from which all constraints in a > > > given partition hierarchy would've originated. Actually, I had > > > written a patch a few months back to do exactly that, a polished > > > version of which I've attached with this email. Please take a look. > > > > I don't like that the function self-recurses but > > ri_GetParentConstrOid() actually climbs up to the root constraint, so > > the patch covers the multilevel partitioning cases. > > Sorry, I got too easily distracted by the name Kuroda-san chose for > the field in RI_ConstraintInfo and the function.
I thought the same on first look at the function. > > About your patch, it calculates the root constrid at the time an > > riinfo is created, but when the root-partition is further attached to > > another partitioned-table after the riinfo creation, > > constraint_root_id gets stale. Of course that doesn't matter > > practically, though. > > Maybe we could also store the hash value of the root constraint OID as > rootHashValue and check for that one too in > InvalidateConstraintCacheCallBack(). That would take care of this > unless I'm missing something. Seems sound. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] [PATCH] Generic type subscripting
st 2. 12. 2020 v 21:02 odesílatel Tom Lane napsal: > Dmitry Dolgov <9erthali...@gmail.com> writes: > >> On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote: > >> So ... one of the things that's been worrying me about this patch > >> from day one is whether it would create a noticeable performance > >> penalty for existing use-cases. I did a small amount of experimentation > >> about that with the v35 patchset, and it didn't take long at all to > >> find that this: > >> ... > >> is about 15% slower with the patch than with HEAD. I'm not sure > >> what an acceptable penalty might be, but 15% is certainly not it. > > > I've tried to reproduce that, but get ~2-4% slowdown (with a pinned > > backend, no turbo etc). Are there any special steps I've probably > > missed? > > Hmm, no, I just built with --disable-cassert and otherwise my usual > development options. > > I had experimented with some other variants of the test case, > where the repeated statement is > > a[i] := i; -- about the same > a[i] := a[i-1] + 1; -- 7% slower > a[i] := a[i-1] - a[i-2]; -- 15% slower > > so it seems clear that the penalty is on the array fetch not array > assign side. This isn't too surprising now that I think about it, > because plpgsql's array assignment code is untouched by the patch > (which is a large feature omission BTW: you still can't write > jsonb['x'] := y; > Refactoring the left side of the assignment statement in plpgsql will probably be harder work than this patch, but it should be the next step. in plpgsql). > > I tested the last patch on my FC33 Lenovo T520 (i7) and I don't see a 15% slowdown either. On my machine there is a slowdown of about 1.5-3%. I used your arraytest function. Regards Pavel > regards, tom lane >
pg_ctl.exe file deleted automatically
Hi, We are using Windows Server 2019. Sometimes we see pg_ctl.exe getting deleted automatically. Because of this, we get the following error while starting the Postgres Windows service: "Error 2: The system cannot find the file specified" Can you let us know the potential causes of this pg_ctl.exe file deletion? Regards, Joel
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Thu, Dec 03, 2020 at 10:19:43AM +0900, Michael Paquier wrote: > OK, this one is now committed. As of this thread, I think that we are > going to need to do a bit more once we add options that are not just > booleans for both commands (the grammar rules do not need to be > changed now): > - Have a ReindexParams, similarly to VacuumParams except that we store > the results of the parsing in a single place. With the current HEAD, > I did not see yet the point in doing so because we just need an > integer that maps to a bitmask made of ReindexOption. > - The part related to ReindexStmt in utility.c is getting more and > more complicated, so we could move most of the execution into > indexcmds.c with some sort of ExecReindex() doing the option parsing > job, and go to the correct code path depending on the object type > dealt with. Good idea. I think you mean like this. I don't know where to put the struct. I thought maybe the lowlevel, integer options should live in the struct, in addition to bools, but it's not important. -- Justin >From 520d1c54d6988d69768178b4abf03c5837654b9a Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Wed, 23 Sep 2020 18:21:16 +0300 Subject: [PATCH v31 3/5] Refactor and reuse set_rel_tablespace() --- src/backend/catalog/index.c | 74 src/backend/commands/indexcmds.c | 35 --- src/include/catalog/index.h | 2 + 3 files changed, 49 insertions(+), 62 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 532c11e9dd..b317f556df 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -3607,7 +3607,6 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, Relation iRel, heapRelation; Oid heapId; - Oid oldTablespaceOid; IndexInfo *indexInfo; volatile bool skipped_constraint = false; PGRUsage ru0; @@ -3723,41 +3722,17 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, tablespaceOid = InvalidOid; /* - * Set the new tablespace for the relation. 
Do that only in the - * case where the reindex caller wishes to enforce a new tablespace. + * Set the new tablespace for the relation if requested. */ - oldTablespaceOid = iRel->rd_rel->reltablespace; if (set_tablespace && - (tablespaceOid != oldTablespaceOid || - (tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid + set_rel_tablespace(indexId, tablespaceOid)) { - Relation pg_class; - Form_pg_class rd_rel; - HeapTuple tuple; - - /* First get a modifiable copy of the relation's pg_class row */ - pg_class = table_open(RelationRelationId, RowExclusiveLock); - - tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(indexId)); - if (!HeapTupleIsValid(tuple)) - elog(ERROR, "cache lookup failed for relation %u", indexId); - rd_rel = (Form_pg_class) GETSTRUCT(tuple); - /* * Mark the relation as ready to be dropped at transaction commit, * before making visible the new tablespace change so as this won't * miss things. */ RelationDropStorage(iRel); - - /* Update the pg_class row */ - rd_rel->reltablespace = tablespaceOid; - CatalogTupleUpdate(pg_class, &tuple->t_self, tuple); - - heap_freetuple(tuple); - - table_close(pg_class, RowExclusiveLock); - RelationAssumeNewRelfilenode(iRel); /* Make sure the reltablespace change is visible */ @@ -4063,6 +4038,51 @@ reindex_relation(Oid relid, int flags, int options, Oid tablespaceOid) return result; } +/* + * set_rel_tablespace - modify relation tablespace in the pg_class entry. + * + * 'reloid' is an Oid of relation to be modified. + * 'tablespaceOid' is an Oid of new tablespace. + * + * Catalog modification is done only if tablespaceOid is different from + * the currently set. Returned bool value is indicating whether any changes + * were made or not. + */ +bool +set_rel_tablespace(Oid reloid, Oid tablespaceOid) +{ + Relation pg_class; + HeapTuple tuple; + Form_pg_class rd_rel; + bool changed = false; + Oid oldTablespaceOid; + + /* Get a modifiable copy of the relation's pg_class row. 
*/ + pg_class = table_open(RelationRelationId, RowExclusiveLock); + + tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid)); + if (!HeapTupleIsValid(tuple)) + elog(ERROR, "cache lookup failed for relation %u", reloid); + rd_rel = (Form_pg_class) GETSTRUCT(tuple); + + /* No work if no change in tablespace. */ + oldTablespaceOid = rd_rel->reltablespace; + if (tablespaceOid != oldTablespaceOid || + (tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid))) + { + /* Update the pg_class row. */ + rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) ? + InvalidOid : tablespaceOid; + CatalogTupleUpdate(pg_class, &tuple->t_self, tuple); + + changed = true; + } + + heap_freetuple(tuple); + table_close(pg_class, RowExclusiveLock); + + return changed; +} /*
Re: Deprecate custom encoding conversions
On 2020/12/03 13:07, Tom Lane wrote: Fujii Masao writes: On 2020/12/03 1:04, Heikki Linnakangas wrote: We never use non-default conversions for anything. All code that performs encoding conversions only cares about the default ones. Yes. I had to update pg_conversion.condefault directly so that we could use the custom encoding conversion I registered. The direct update of pg_conversion is of course not a good thing. So I was wondering if we should have something like ALTER CONVERSION SET DEFAULT. Perhaps, but what I'd think is more urgent is having some way for a session to select a non-default conversion for client I/O. +1 What about adding a new libpq parameter like encoding_conversion (analogous to client_encoding) so that a client can specify which conversion to use? That setting would be sent to the server while establishing the connection. We would probably need to verify that the setting of this new parameter is consistent with that of client_encoding. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Deprecate custom encoding conversions
Fujii Masao writes: > On 2020/12/03 1:04, Heikki Linnakangas wrote: >> We never use non-default conversions for anything. All code that performs >> encoding conversions only cares about the default ones. > Yes. I had to update pg_conversion.condefault directly so that we can > use custom encoding when I registered it. The direct update of > pg_conversion is of course not good thing. So I was wondering > if we should have something like ALTER CONVERSION SET DEFAULT. Perhaps, but what I'd think is more urgent is having some way for a session to select a non-default conversion for client I/O. regards, tom lane
Re: Deprecate custom encoding conversions
On 2020/12/03 1:04, Heikki Linnakangas wrote: Hi, PostgreSQL allows writing custom encoding conversion functions between any character encodings, using the CREATE CONVERSION command. It's pretty flexible, you can define default and non-default conversions, and the conversions live in schemas so you can have multiple conversions installed in a system and you can switch between them by changing search_path. However: We never use non-default conversions for anything. All code that performs encoding conversions only cares about the default ones. Yes. I had to update pg_conversion.condefault directly so that we could use the custom encoding conversion I registered. The direct update of pg_conversion is of course not a good thing. So I was wondering if we should have something like ALTER CONVERSION SET DEFAULT. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Deprecate custom encoding conversions
On 2020/12/03 11:48, tsunakawa.ta...@fujitsu.com wrote: From: Michael Paquier Tsunakawa-san, could you post a link to this article, if possible? I am curious about their problem and why they used CREATE CONVERSION as a way to solve it. That's fine even if it is in Japanese. I just pulled info from my old memory in my previous mail. Now the following links seem to be relevant. (They should be different from what I saw in the past.) https://ml.postgresql.jp/pipermail/pgsql-jp/2007-January/013103.html https://teratail.com/questions/295988 IIRC, the open source Postgres extension EUDC also uses CREATE CONVERSION. FWIW, about four years ago, for a real project, I wrote an extension [1] that provides yet another encoding conversion from UTF-8 to EUC_JP, to handle some external characters. This extension uses CREATE CONVERSION. [1] https://github.com/MasaoFujii/pg_fallback_utf8_to_euc_jp Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: A new function to wait for the backend exit after termination
Thanks for the review. On Thu, Dec 3, 2020 at 7:24 AM Hou, Zhijie wrote: > > 1. > + > + ereport(WARNING, > + (errmsg("could not wait for the termination of the > backend with PID %d within %ld milliseconds", > + pid, timeout))); > + > > The code use %ld to print int64 type. > How about use INT64_FORMAT, which looks more appropriate. > Changed it to use %lld and typecasting timeout to (long long int) as suggested by Tom. > > 2. > + if (timeout <= 0) > + { > + ereport(WARNING, > + (errmsg("timeout cannot be negative or zero: > %ld", timeout))); > + PG_RETURN_BOOL(r); > + } > > The same as 1. > Changed. > > 3. > +pg_terminate_backend_and_wait(PG_FUNCTION_ARGS) > +{ > + int pid = PG_GETARG_DATUM(0); > > +pg_wait_backend(PG_FUNCTION_ARGS) > +{ > + int pid = PG_GETARG_INT32(0); > > The code use different macro to get pid, > How about use PG_GETARG_INT32(0) for each one. > Changed. > I am Sorry I forgot a possible typo comment. > > +{ oid => '16386', descr => 'terminate a backend process and wait for it\'s > exit or until timeout occurs' > > Does the following change looks better? > > Wait for it\'s exit => Wait for its exit > Changed. > > I changed the status to 'wait on anthor'. > The others of the patch LGTM, > I think it can be changed to Ready for Committer again, when this comment is > confirmed. > Attaching v4 patch. Please have a look. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From e3f364e8a5a06b3e122b657e668146f220549acb Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Thu, 3 Dec 2020 09:09:13 +0530 Subject: [PATCH v4] pg_terminate_backend with wait, timeout and pg_wait_backend with timeout This patch adds two new functions: 1. pg_terminate_backend(pid, wait, timeout) - terminate and wait or timeout for a given backend. 2. pg_wait_backend(pid, timeout) - check the existence of the backend with a given PID and wait or timeout until it goes away. 
--- doc/src/sgml/func.sgml| 28 +- src/backend/catalog/system_views.sql | 10 ++ src/backend/postmaster/pgstat.c | 3 + src/backend/storage/ipc/signalfuncs.c | 137 ++ src/include/catalog/pg_proc.dat | 9 ++ src/include/pgstat.h | 3 +- 6 files changed, 187 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df29af6371..6b77b80336 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23954,7 +23954,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); pg_terminate_backend -pg_terminate_backend ( pid integer ) +pg_terminate_backend ( pid integer, wait boolean, timeout bigint DEFAULT 100 ) boolean @@ -23962,7 +23962,31 @@ SELECT collation for ('foo' COLLATE "de_DE"); specified process ID. This is also allowed if the calling role is a member of the role whose backend is being terminated or the calling role has been granted pg_signal_backend, -however only superusers can terminate superuser backends. +however only superusers can terminate superuser backends. When no +wait and timeout are +provided, only SIGTERM is sent to the backend with the given process +ID and false is returned immediately. But the +backend may still exist. With wait specified as +true, the function waits for the backend termination +until the given timeout or the default 100 +milliseconds. On timeout false is returned. + + + + + + + pg_wait_backend + +pg_wait_backend ( pid integer, timeout bigint DEFAULT 100 ) +boolean + + +Check the existence of the session whose backend process has the +specified process ID. On success return true. This +function waits until the given timeout or the +default 100 milliseconds. On timeout false is +returned. 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index b140c210bc..2d1907b4a8 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1246,6 +1246,16 @@ CREATE OR REPLACE FUNCTION RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote' PARALLEL SAFE; +CREATE OR REPLACE FUNCTION + pg_terminate_backend(pid integer, wait boolean, timeout int8 DEFAULT 100) + RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend_and_wait' + PARALLEL UNSAFE; + +CREATE OR REPLACE FUNCTION + pg_wait_backend(pid integer, timeout int8 DEFAULT 100) + RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_backend' + PARALLEL UNSAFE; + -- legacy definition for compatibility with 9.3
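The waiting behavior the patch documents above, poll until the backend with the given PID goes away or the timeout elapses, can be illustrated with a standalone sketch. The helper name, the polling interval, and the use of kill(pid, 0) are illustrative assumptions, not the patch's actual implementation (which also applies PostgreSQL's permission checks and reports a WARNING on timeout):

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * wait_for_pid_exit -- hypothetical helper, not the patch's code.
 *
 * Poll until the process with the given PID disappears or timeout_ms
 * milliseconds elapse.  Returns true if the process is gone, false on
 * timeout.  kill(pid, 0) delivers no signal; it only checks whether the
 * process exists (failure with ESRCH means "no such process").
 */
static bool
wait_for_pid_exit(pid_t pid, long timeout_ms)
{
	const long	interval_ms = 10;	/* polling granularity, illustrative */
	long		waited_ms = 0;

	for (;;)
	{
		if (kill(pid, 0) != 0 && errno == ESRCH)
			return true;		/* no such process: it has exited */
		if (waited_ms >= timeout_ms)
			return false;		/* still alive when the timeout expired */

		struct timespec ts = {0, interval_ms * 1000000L};

		nanosleep(&ts, NULL);
		waited_ms += interval_ms;
	}
}
```

This only shows the poll-with-timeout shape; the real functions must additionally handle the case where the PID is reused by an unrelated process, which is why waiting on the backend's internal state is more robust than a bare PID check.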
RE: Disable WAL logging to speed up data loading
From: Osumi, Takamichi/大墨 昂道 > I've made a new patch v05 that took in comments to filter out WALs more > strictly > and addressed some minor fixes that were discussed within past few days. > Also, I changed the documentations, considering those modifications. The code looks good, and the performance seems to be nice, so I marked this ready for committer. Regards Takayuki Tsunakawa
RE: [Patch] Optimize dropping of relation buffers using dlist
On Thursday, November 26, 2020 4:19 PM, Horiguchi-san wrote: > Hello, Kirk. Thank you for the new version. Apologies for the delay, but attached are the updated versions to simplify the patches. The changes reflected most of your comments/suggestions. Summary of changes in the latest versions. 1. Updated the function description of DropRelFileNodeBuffers in 0003. 2. Updated the commit logs of 0003 and 0004. 3, FindAndDropRelFileNodeBuffers is now called for each relation fork, instead of for all involved forks. 4. Removed the unnecessary palloc() and subscripts like forks[][], firstDelBlock[], nforks, as advised by Horiguchi-san. The memory allocation for block[][] was also simplified. So 0004 became simpler and more readable. > At Thu, 26 Nov 2020 03:04:10 +, "k.jami...@fujitsu.com" > wrote in > > On Thursday, November 19, 2020 4:08 PM, Tsunakawa, Takayuki wrote: > > > From: Andres Freund > > > > DropRelFileNodeBuffers() in recovery? The most common path is > > > > DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers > > > > (), which 3/4 doesn't address and 4/4 doesn't mention. > > > > > > > > 4/4 seems to address DropRelationFiles(), but only talks about > > > TRUNCATE? > > > > > > Yes. DropRelationFiles() is used in the following two paths: > > > > > > [Replay of TRUNCATE during recovery] > > > xact_redo_commit/abort() -> DropRelationFiles() -> > > > smgrdounlinkall() -> > > > DropRelFileNodesAllBuffers() > > > > > > [COMMIT/ROLLBACK PREPARED] > > > FinishPreparedTransaction() -> DropRelationFiles() -> > > > smgrdounlinkall() > > > -> DropRelFileNodesAllBuffers() > > > > Yes. The concern is that it was not clear in the function descriptions > > and commit logs what the optimizations for the > > DropRelFileNodeBuffers() and DropRelFileNodesAllBuffers() are for. So > > I revised only the function description of DropRelFileNodeBuffers() and the > commit logs of the 0003-0004 patches. Please check if the brief descriptions > would suffice. 
> > I read the commit message of 3/4. (Though this is not involved literally in > the > final commit.) > > > While recovery, when WAL files of XLOG_SMGR_TRUNCATE from vacuum > or > > autovacuum are replayed, the buffers are dropped when the sizes of all > > involved forks of a relation are already "cached". We can get > > This sentence seems missing "dropped by (or using) what". > > > a reliable size of nblocks for supplied relation's fork at that time, > > and it's safe because DropRelFileNodeBuffers() relies on the behavior > > that cached nblocks will not be invalidated by file extension during > > recovery. Otherwise, or if not in recovery, proceed to sequential > > search of the whole buffer pool. > > This sentence seems involving confusion. It reads as if "we can rely on it > because we're relying on it". And "the cached value won't be invalidated" > doesn't explain the reason precisely. The reason I think is that the cached > value is guaranteed to be the maximum page we have in shared buffer at least > while recovery, and that guarantee is holded by not asking fseek once we > cached the value. Fixed the commit log of 0003. > > > > I also don't get why 4/4 would be a good idea on its own. It uses > > > > BUF_DROP_FULL_SCAN_THRESHOLD to guard > > > > FindAndDropRelFileNodeBuffers() on a per relation basis. But since > > > > DropRelFileNodesAllBuffers() can be used for many relations at > > > > once, this could end up doing BUF_DROP_FULL_SCAN_THRESHOLD > - 1 > > > lookups a lot > > > > of times, once for each of nnodes relations? > > > > > > So, the threshold value should be compared with the total number of > > > blocks of all target relations, not each relation. You seem to be right, > > > got > it. > > > > Fixed this in 0004 patch. Now we compare the total number of > > buffers-to-be-invalidated For ALL relations to the > BUF_DROP_FULL_SCAN_THRESHOLD. 
> > I didn't see the previous version, but the row of additional palloc/pfree's in > this version looks uneasy. Fixed this too. > int i, > + j, > + *nforks, > n = 0; > > Perhaps I think we don't define variable in different types at once. > (I'm not sure about defining multple variables at once.) Fixed this too. > @@ -3110,7 +3125,10 @@ DropRelFileNodesAllBuffers(RelFileNodeBackend > *rnodes, int nnodes) > > DropRelFileNodeAllLocalBuffers(rnodes[i].node); > } > else > + { > + rels[n] = smgr_reln[i]; > nodes[n++] = rnodes[i].node; > + } > } > > We don't need to remember nodes and rnodes here since rnodes[n] is > rels[n]->smgr_rnode here. Or we don't even need to store rels since we can > scan the smgr_reln later again. > > nodes is needed in the full-scan path but it is enough to collect it after > finding > that we do full-scan. I followe
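The threshold decision discussed above, compare the combined number of to-be-invalidated blocks across all target relations against BUF_DROP_FULL_SCAN_THRESHOLD rather than checking each relation separately, can be sketched as follows. The type aliases and the helper are illustrative simplifications, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * use_full_scan -- illustrative stand-in for the decision in
 * DropRelFileNodesAllBuffers(); not the actual PostgreSQL code.
 *
 * cached_nblocks[] holds the cached size of every fork of every target
 * relation, or InvalidBlockNumber when a size is not cached.  A full
 * buffer-pool scan is required when any size is unknown (we cannot
 * enumerate the blocks to drop), or when the combined total reaches the
 * threshold (per-block targeted lookups would no longer beat one
 * sequential scan of the buffer pool).
 */
static bool
use_full_scan(const BlockNumber *cached_nblocks, size_t nforks_total,
			  uint64_t threshold)
{
	uint64_t	total = 0;
	size_t		i;

	for (i = 0; i < nforks_total; i++)
	{
		if (cached_nblocks[i] == InvalidBlockNumber)
			return true;		/* size unknown: must scan everything */
		total += cached_nblocks[i];
		if (total >= threshold)
			return true;		/* too many blocks for targeted drops */
	}
	return false;				/* cheap enough to probe each block */
}
```

Summing first and deciding once is what avoids the problem Andres raised: with a per-relation check, dropping many relations at once could pay nearly the threshold's worth of hash lookups for each one.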
RE: A new function to wait for the backend exit after termination
> > I changed the status to 'wait on anthor'. > The others of the patch LGTM, > I think it can be changed to Ready for Committer again, when this comment > is confirmed. > I am sorry, I forgot a possible typo comment. +{ oid => '16386', descr => 'terminate a backend process and wait for it\'s exit or until timeout occurs' Does the following change look better? Wait for it\'s exit => Wait for its exit Best regards, houzj
Re: Huge memory consumption on partitioned table with FKs
Hello, On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi wrote: > At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote > wrote in > > Hello, > > > > On Tue, Dec 1, 2020 at 9:40 AM Kyotaro Horiguchi > > wrote: > > > > > > At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera > > > wrote in > > > > On 2020-Nov-26, Kyotaro Horiguchi wrote: > > > > > > > > > This shares RI_ConstraintInfo cache by constraints that shares the > > > > > same parent constraints. But you forgot that the cache contains some > > > > > members that can differ among partitions. > > > > > > > > > > Consider the case of attaching a partition that have experienced a > > > > > column deletion. > > > > > > > > I think this can be solved easily in the patch, by having > > > > ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if > > > > they are equal then use the parent's constaint_id, otherwise use the > > > > child constraint. That way, the cache entry is reused in the common > > > > case where they are identical. > > > > > > *I* think it's the direction. After an off-list discussion, we > > > confirmed that even in that case the patch works as is because > > > fk_attnum (or contuple.conkey) always stores key attnums compatible > > > to the topmost parent when conparent has a valid value (assuming the > > > current usage of fk_attnum), but I still feel uneasy to rely on that > > > unclear behavior. > > > > Hmm, I don't see what the problem is here, because it's not > > RI_ConstraintInfo that is being shared among partitions of the same > > parent, but the RI query (and the SPIPlanPtr for it) that is issued > > through SPI. The query (and the plan) turns out to be the same no > > matter what partition's RI_ConstraintInfo is first used to generate > > it. What am I missing? > > I don't think you're missing something. As I (tried to..) mentioned in > another branch of this thread, you're right. 
On the otherhand > fk_attnums is actually used to generate the query, thus *issue* > potentially affects the query. Although that might be paranoic, that > possibility is eliminated by checking the congruency(?) of fk_attnums > (or other members). We might even be able to share a riinfo among > such children. Just to be sure I've not misunderstood you, let me mention why I think the queries used by different partitions all turn out to be the same despite attribute number differences among partitions. The places that uses fk_attnums when generating a query are these among others: Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]); ... Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]); ... quoteOneName(attname, RIAttName(fk_rel, riinfo->fk_attnums[i])); For the queries on the referencing side ("check" side), type/collation/attribute name determined using the above are going to be the same for all partitions in a given tree irrespective of the attribute number, because they're logically the same column. On the referenced side ("restrict", "cascade", "set" side), as you already mentioned, fk_attnums refers to the top parent table of the referencing side, so no possibility of they being different in the various referenced partitions' RI_ConstraintInfos. On the topic of how we'd be able to share even the RI_ConstraintInfos among partitions, that would indeed look a bit more elaborate than the patch we have right now. > > BTW, one problem with Kuroda-san's patch is that it's using > > conparentid as the shared key, which can still lead to multiple > > queries being generated when using multi-level partitioning, because > > there would be many (intermediate) parent constraints in that case. > > We should really be using the "root" constraint OID as the key, > > because there would only be one root from which all constraints in a > > given partition hierarchy would've originated. 
Actually, I had > > written a patch a few months back to do exactly that, a polished > > version of which I've attached with this email. Please take a look. > > I don't like that the function self-recurses but > ri_GetParentConstrOid() actually climbs up to the root constraint, so > the patch covers the multilevel partitioning cases. Sorry, I got too easily distracted by the name Kuroda-san chose for the field in RI_ConstraintInfo and the function. > About your patch, it calculates the root constrid at the time an > riinfo is created, but when the root-partition is further attached to > another partitioned-table after the riinfo creation, > constraint_root_id gets stale. Of course that dones't matter > practically, though. Maybe we could also store the hash value of the root constraint OID as rootHashValue and check for that one too in InvalidateConstraintCacheCallBack(). That would take care of this unless I'm missing something. -- Amit Langote EDB: http://www.enterprisedb.com
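The "use the root constraint OID as the cache key" idea above can be sketched with a toy parent map standing in for the pg_constraint syscache lookups. The array, the function name, and the iterative walk are illustrative; the real code resolves conparentid through the syscache:

```c
#include <assert.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * get_root_constraint_oid -- toy model, not PostgreSQL code.
 *
 * parent_of[oid] gives the parent constraint's OID (conparentid), or
 * InvalidOid when the constraint is the root of its hierarchy.  Walking
 * all the way up yields a single shared cache key per partition tree,
 * no matter how many intermediate parents a multi-level hierarchy has,
 * which is why keying on the root avoids generating one query per
 * intermediate parent.
 */
static Oid
get_root_constraint_oid(const Oid *parent_of, Oid constroid)
{
	while (parent_of[constroid] != InvalidOid)
		constroid = parent_of[constroid];
	return constroid;
}
```

As noted in the thread, a cached root OID can go stale if the root partition is later attached elsewhere, hence the suggestion to also track the root's hash value for cache invalidation.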
RE: Disable WAL logging to speed up data loading
Hello I've made a new patch v05 that took in the comments to filter out WAL records more strictly and addressed some minor fixes that were discussed within the past few days. Also, I updated the documentation to reflect those modifications. Best, Takamichi Osumi disable_WAL_logging_v05.patch Description: disable_WAL_logging_v05.patch
RE: Deprecate custom encoding conversions
From: Michael Paquier > Tsunakawa-san, could you post a link to this article, if possible? I am > curious > about their problem and why they used CREATE CONVERSION as a way to > solve it. That's fine even if it is in Japanese. I just pulled info from my old memory in my previous mail. Now the following links seem to be relevant. (They should be different from what I saw in the past.) https://ml.postgresql.jp/pipermail/pgsql-jp/2007-January/013103.html https://teratail.com/questions/295988 IIRC, the open source Postgres extension EUDC also uses CREATE CONVERSION. Regards Takayuki Tsunakawa
Re: autovac issue with large number of tables
On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada wrote: > > On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao > wrote: > > > > > > > > On 2020/12/02 12:53, Masahiko Sawada wrote: > > > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada > > > wrote: > > >> > > >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao > > >> wrote: > > >>> > > >>> > > >>> > > >>> On 2020/12/01 16:23, Masahiko Sawada wrote: > > On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito > > wrote: > > > > > > Hi, > > > > > > On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao > > > wrote: > > >> > > >> > > >> > > >> On 2020/11/30 10:43, Masahiko Sawada wrote: > > >>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito > > >>> wrote: > > > > Hi, Thanks for you comments. > > > > On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao > > wrote: > > > > > > > > > > > > On 2020/11/27 18:38, Kasahara Tatsuhito wrote: > > >> Hi, > > >> > > >> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao > > >> wrote: > > >>> > > >>> > > >>> > > >>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote: > > On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada > > wrote: > > > > > > On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito > > > wrote: > > >> > > >> Hi, > > >> > > >> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada > > >> wrote: > > >>> > > >>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito > > >>> wrote: > > > > Hi, > > > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito > > wrote: > > >> I wonder if we could have table_recheck_autovac do two > > >> probes of the stats > > >> data. First probe the existing stats data, and if it > > >> shows the table to > > >> be already vacuumed, return immediately. If not, *then* > > >> force a stats > > >> re-read, and check a second time. > > > Does the above mean that the second and subsequent > > > table_recheck_autovac() > > > will be improved to first check using the previous > > > refreshed statistics? > > > I think that certainly works. 
> > > > > > If that's correct, I'll try to create a patch for the PoC > > > > I still don't know how to reproduce Jim's troubles, but I > > was able to reproduce > > what was probably a very similar problem. > > > > This problem seems to be more likely to occur in cases > > where you have > > a large number of tables, > > i.e., a large amount of stats, and many small tables need > > VACUUM at > > the same time. > > > > So I followed Tom's advice and created a patch for the PoC. > > This patch will enable a flag in the table_recheck_autovac > > function to use > > the existing stats next time if VACUUM (or ANALYZE) has > > already been done > > by another worker on the check after the stats have been > > updated. > > If the tables continue to require VACUUM after the > > refresh, then a refresh > > will be required instead of using the existing statistics. > > > > I did simple test with HEAD and HEAD + this PoC patch. > > The tests were conducted in two cases. > > (I changed few configurations. see attached scripts) > > > > 1. Normal VACUUM case > > - SET autovacuum = off > > - CREATE tables with 100 rows > > - DELETE 90 rows for each tables > > - SET autovacuum = on and restart PostgreSQL > > - Measure the time it takes for all tables to be > > VACUUMed > > > > 2. Anti wrap round VACUUM case > > - CREATE brank tables > > - SELECT all of these tables (for generate stats) > > - SET autovacuum_freeze_max_age to low values and > > restart PostgreSQL > > - Consumes a lot of XIDs by using txid_curent() > > - Measure the time it takes for all tables to be > > VA
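The PoC's control flow described above, probe the stats snapshot we already have first, and pay for a forced stats-file refresh only when that first probe still says the table needs vacuuming, can be sketched like this. The callback shape and all names are illustrative, not the actual table_recheck_autovac() interface:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Probe callback: does this table still need vacuum, optionally after
 * forcing a fresh read of the stats file?  (Illustrative signature.)
 */
typedef bool (*needs_vacuum_fn) (int table_id, bool force_refresh);

/*
 * recheck_needs_vacuum -- toy model of the PoC's two-probe recheck.
 *
 * If the stats we already have show the table was vacuumed by another
 * worker, return immediately without re-reading the stats file; only a
 * "still needs vacuum" answer justifies the expensive refresh, followed
 * by a second check against the fresh data.
 */
static bool
recheck_needs_vacuum(int table_id, needs_vacuum_fn probe, int *refreshes)
{
	if (!probe(table_id, false))	/* cheap probe of existing stats */
		return false;
	(*refreshes)++;				/* stats may be stale: force a re-read */
	return probe(table_id, true);
}

/* Demo probe: odd-numbered tables still need vacuum in both probes. */
static bool
demo_probe(int table_id, bool force_refresh)
{
	(void) force_refresh;
	return (table_id % 2) == 1;
}
```

With many small tables, most rechecks hit the cheap first probe, which is where the reported speedup comes from: workers stop re-reading a large stats file for tables some other worker already handled.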
Re: Commitfest 2020-11 is closed
On 12/02/20 16:51, David Steele wrote: > Depending on how you have GitHub organized, migrating to travis-ci.com may be > a bit tricky because it requires full access to all private repositories in > your account and orgs administrated by your account. PL/Java just began using travis-ci.com this summer at the conclusion of our GSoC project, and Thomas had been leery of the full-access-to-everything requirement, but that turned out to have been an old way it once worked. The more current way involves installation as a GitHub app into a particular repository, and it did not come with excessive access requirements. That being said, we got maybe three months of use out of it all told. On 2 November, they announced a "new pricing model"[1],[2], and since 24 November it has not run PL/Java tests, logging a "does not have enough credits" message[3] instead. So I rather hastily put a GitHub Actions workflow together to plug the hole. Apparently there is a way for OSS projects to ask nicely for an allocation of some credits that might be available. Regards, -Chap [1] https://www.jeffgeerling.com/blog/2020/travis-cis-new-pricing-plan-threw-wrench-my-open-source-works [2] https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing [3] https://travis-ci.com/github/tada/pljava/requests
Re: Deprecate custom encoding conversions
> I recall a discussion in which someone explained that things are messy for > Japanese because there's not really just one version of SJIS; there are > several, because various groups extended the original standard in > not-quite-compatible ways. In turn this means that there's not really one > well-defined conversion to/from Unicode That's true. > --- which is the reason why > allowing custom conversions seemed like a good idea. I don't know > whether anyone over there is actually using custom conversions, but > I'd be hesitant to just nuke the feature altogether. By Googling I found an instance which is using CREATE CONVERSION (Japanese article). http://grep.blog49.fc2.com/blog-entry-87.html -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
RE: A new function to wait for the backend exit after termination
> > + ereport(WARNING, > > + (errmsg("could not wait for the termination of the > backend with PID %d within %ld milliseconds", > > + pid, timeout))); > > > The code use %ld to print int64 type. > > How about use INT64_FORMAT, which looks more appropriate. > > This is a translatable message, so INT64_FORMAT is no good -- we need > something that is the same across platforms. The current project standard > for this problem is to use "%lld" and explicitly cast the argument to long > long int to match that. Thank you for pointing that out, and sorry for not thinking of it. Yes, we can use %lld with a (long long int) cast on timeout. Best regards, houzj
Renaming cryptohashes.c to cryptohashfuncs.c
Hi all, Now that the tree has a new set of files for cryptohash functions as of src/common/cryptohash*.c, it is a bit weird to have another file called cryptohashes.c for the set of SQL functions. Any objections to renaming that to cryptohashfuncs.c? That would be much more consistent with the surroundings (cleaning up anything related to MD5 is on my TODO list). Patch is attached. Thoughts? -- Michael From d84ccefbf13f44c352d985dd87eadcc998cfb7b9 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 3 Dec 2020 11:02:39 +0900 Subject: [PATCH] Rename cryptohashes.c to cryptohashfuncs.c --- src/backend/utils/adt/Makefile | 2 +- src/backend/utils/adt/{cryptohashes.c => cryptohashfuncs.c} | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) rename src/backend/utils/adt/{cryptohashes.c => cryptohashfuncs.c} (98%) diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index b4d55e849b..f6ec7b64cd 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -22,7 +22,7 @@ OBJS = \ bool.o \ cash.o \ char.o \ - cryptohashes.o \ + cryptohashfuncs.o \ date.o \ datetime.o \ datum.o \ diff --git a/src/backend/utils/adt/cryptohashes.c b/src/backend/utils/adt/cryptohashfuncs.c similarity index 98% rename from src/backend/utils/adt/cryptohashes.c rename to src/backend/utils/adt/cryptohashfuncs.c index 5de294a7fd..47bc0b3482 100644 --- a/src/backend/utils/adt/cryptohashes.c +++ b/src/backend/utils/adt/cryptohashfuncs.c @@ -1,13 +1,13 @@ /*- * - * cryptohashes.c + * cryptohashfuncs.c * Cryptographic hash functions * * Portions Copyright (c) 2018-2020, PostgreSQL Global Development Group * * * IDENTIFICATION - * src/backend/utils/adt/cryptohashes.c + * src/backend/utils/adt/cryptohashfuncs.c * *- */ -- 2.29.2
Re: A new function to wait for the backend exit after termination
"Hou, Zhijie" writes: > + ereport(WARNING, > + (errmsg("could not wait for the termination of the > backend with PID %d within %ld milliseconds", > + pid, timeout))); > The code use %ld to print int64 type. > How about use INT64_FORMAT, which looks more appropriate. This is a translatable message, so INT64_FORMAT is no good -- we need something that is the same across platforms. The current project standard for this problem is to use "%lld" and explicitly cast the argument to long long int to match that. regards, tom lane
RE: A new function to wait for the backend exit after termination
Hi, I took a look at the patch; here are some comments. 1. + + ereport(WARNING, + (errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds", + pid, timeout))); + The code uses %ld to print an int64 value. How about using INT64_FORMAT, which looks more appropriate? 2. + if (timeout <= 0) + { + ereport(WARNING, + (errmsg("timeout cannot be negative or zero: %ld", timeout))); + PG_RETURN_BOOL(r); + } Same as 1. 3. +pg_terminate_backend_and_wait(PG_FUNCTION_ARGS) +{ + int pid = PG_GETARG_DATUM(0); +pg_wait_backend(PG_FUNCTION_ARGS) +{ + int pid = PG_GETARG_INT32(0); The code uses different macros to get the pid; how about using PG_GETARG_INT32(0) in both places? I changed the status to 'Waiting on Author'. The rest of the patch LGTM; I think it can be changed to Ready for Committer again once these comments are addressed. Best regards, houzj
Re: Deprecate custom encoding conversions
On Thu, Dec 03, 2020 at 12:54:56AM +, tsunakawa.ta...@fujitsu.com wrote: > I can't answer deeper questions because I'm not familiar with > character sets, but I saw some Japanese web articles that use CREATE > CONVERSION to handle external characters. So, I think we may as > well retain it. (OTOH, I'm wondering how other DBMSs support > external characters and what Postgres should implement it.) Tsunakawa-san, could you post a link to this article, if possible? I am curious about their problem and why they used CREATE CONVERSION as a way to solve it. That's fine even if it is in Japanese. -- Michael
Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
On Wed, Dec 02, 2020 at 12:03:49PM +0900, Michael Paquier wrote: > Thanks. 0001 has been applied and the buildfarm does not complain, so > it looks like we are good (I'll take care of any issues, like the one > Fujii-san has just reported). Attached are new patches for 0002, the > EVP switch. One thing I noticed is that we need to free the backup > manifest a bit earlier once we begin to use resource owner in > basebackup.c as there is a specific step that may do a double-free. > This would not happen when not using OpenSSL or on HEAD. It would be > easy to separate the resowner and cryptohash portions of the patch > here, but both are tightly linked, so I'd prefer to keep them > together. Attached is a rebased version to take care of the conflicts introduced by 91624c2f. -- Michael From a68819b3f843b4ce2883c35c008d5dd4fb47ee35 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 2 Dec 2020 11:51:43 +0900 Subject: [PATCH v8] Switch cryptohash_openssl.c to use EVP Postgres is two decades late for this switch. 
--- src/include/utils/resowner_private.h | 7 ++ src/backend/replication/basebackup.c | 8 +- src/backend/utils/resowner/resowner.c | 62 src/common/cryptohash_openssl.c | 132 -- src/tools/pgindent/typedefs.list | 1 + 5 files changed, 158 insertions(+), 52 deletions(-) diff --git a/src/include/utils/resowner_private.h b/src/include/utils/resowner_private.h index a781a7a2aa..c373788bc1 100644 --- a/src/include/utils/resowner_private.h +++ b/src/include/utils/resowner_private.h @@ -95,4 +95,11 @@ extern void ResourceOwnerRememberJIT(ResourceOwner owner, extern void ResourceOwnerForgetJIT(ResourceOwner owner, Datum handle); +/* support for cryptohash context management */ +extern void ResourceOwnerEnlargeCryptoHash(ResourceOwner owner); +extern void ResourceOwnerRememberCryptoHash(ResourceOwner owner, + Datum handle); +extern void ResourceOwnerForgetCryptoHash(ResourceOwner owner, + Datum handle); + #endif /* RESOWNER_PRIVATE_H */ diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 22be7ca9d5..79b458c185 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -729,11 +729,17 @@ perform_base_backup(basebackup_options *opt) errmsg("checksum verification failure during base backup"))); } + /* + * Make sure to free the manifest before the resource owners as + * manifests use cryptohash contexts that may depend on resource + * owners (like OpenSSL). 
+ */ + FreeBackupManifest(&manifest); + /* clean up the resource owner we created */ WalSndResourceCleanup(true); pgstat_progress_end_command(); - FreeBackupManifest(&manifest); } /* diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c index 8bc2c4e9ea..546ad8d1c5 100644 --- a/src/backend/utils/resowner/resowner.c +++ b/src/backend/utils/resowner/resowner.c @@ -20,6 +20,7 @@ */ #include "postgres.h" +#include "common/cryptohash.h" #include "common/hashfn.h" #include "jit/jit.h" #include "storage/bufmgr.h" @@ -128,6 +129,7 @@ typedef struct ResourceOwnerData ResourceArray filearr; /* open temporary files */ ResourceArray dsmarr; /* dynamic shmem segments */ ResourceArray jitarr; /* JIT contexts */ + ResourceArray cryptohasharr; /* cryptohash contexts */ /* We can remember up to MAX_RESOWNER_LOCKS references to local locks. */ int nlocks; /* number of owned locks */ @@ -175,6 +177,7 @@ static void PrintTupleDescLeakWarning(TupleDesc tupdesc); static void PrintSnapshotLeakWarning(Snapshot snapshot); static void PrintFileLeakWarning(File file); static void PrintDSMLeakWarning(dsm_segment *seg); +static void PrintCryptoHashLeakWarning(Datum handle); /* @@ -444,6 +447,7 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name) ResourceArrayInit(&(owner->filearr), FileGetDatum(-1)); ResourceArrayInit(&(owner->dsmarr), PointerGetDatum(NULL)); ResourceArrayInit(&(owner->jitarr), PointerGetDatum(NULL)); + ResourceArrayInit(&(owner->cryptohasharr), PointerGetDatum(NULL)); return owner; } @@ -553,6 +557,17 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, jit_release_context(context); } + + /* Ditto for cryptohash contexts */ + while (ResourceArrayGetAny(&(owner->cryptohasharr), &foundres)) + { + pg_cryptohash_ctx *context = + (pg_cryptohash_ctx *) PointerGetDatum(foundres); + + if (isCommit) +PrintCryptoHashLeakWarning(foundres); + pg_cryptohash_free(context); + } } else if (phase == RESOURCE_RELEASE_LOCKS) { @@ -725,6 +740,7 @@ 
ResourceOwnerDelete(ResourceOwner owner) Assert(owner->filearr.nitems == 0); Assert(owner->dsmarr.nitems == 0); Assert(owner->jitarr.nitems == 0); + Assert(owner->cryptohasharr.nitems == 0); Assert(owner->nlocks == 0 || owner->
Re: Deprecate custom encoding conversions
"tsunakawa.ta...@fujitsu.com" writes: > From: Heikki Linnakangas >> Any objections? Anyone using custom encoding conversions in production? > I can't answer deeper questions because I'm not familiar with character sets, > but I saw some Japanese web articles that use CREATE CONVERSION to handle > external characters. So, I think we may as well retain it. (OTOH, I'm > wondering how other DBMSs support external characters and what Postgres > should implement it.) I recall a discussion in which someone explained that things are messy for Japanese because there's not really just one version of SJIS; there are several, because various groups extended the original standard in not- quite-compatible ways. In turn this means that there's not really one well-defined conversion to/from Unicode --- which is the reason why allowing custom conversions seemed like a good idea. I don't know whether anyone over there is actually using custom conversions, but I'd be hesitant to just nuke the feature altogether. Having said that, it doesn't seem like the conversion API is necessarily any more set in stone than any of our other C-level APIs. We break things at the C API level whenever there's sufficient reason. regards, tom lane
Re: Get memory contexts of an arbitrary backend process
Fujii Masao writes: > I'm starting to study how this feature behaves. At first, when I executed > the following query, the function never returned. ISTM that since > the autovacuum launcher cannot respond to the request of memory > contexts dump, the function keeps waiting infinity. Is this a bug? > Probably we should exclude non-backend proceses from the target > processes to dump? Sorry if this was already discussed. > SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity; FWIW, I think this patch is fundamentally unsafe. It's got a lot of the same problems that I complained about w.r.t. the nearby proposal to allow cross-backend stack trace dumping. It does avoid the trap of thinking that it can do work in a signal handler, but instead it supposes that it can do work involving very high-level objects such as shared hash tables in anyplace that might execute CHECK_FOR_INTERRUPTS. That's never going to be safe: the only real expectation the system has is that CHECK_FOR_INTERRUPTS is called at places where our state is sane enough that a transaction abort can clean up. Trying to do things like taking LWLocks is going to lead to deadlocks or worse. We need not even get into the hard questions, such as what happens when one process or the other exits unexpectedly. I also find the idea that this should be the same SQL function as pg_get_backend_memory_contexts to be a seriously bad decision. That means that it's not possible to GRANT the right to examine only your own process's memory --- with this proposal, that means granting the right to inspect every other process as well. Beyond that, the fact that there's no way to restrict the capability to just, say, other processes owned by the same user means that it's not really safe to GRANT to non-superusers anyway. 
Even with such a restriction added, things are problematic, since for example it would be possible to inquire into the workings of a security-definer function executing in another process that nominally is owned by your user. Between the security and implementation issues here, I really think we'd be best advised to just reject the concept, period. regards, tom lane
Re: Huge memory consumption on partitioned table with FKs
Hi Hackers, >> I would embed all this knowledge in ri_BuildQueryKey though, without >> adding the new function ri_GetParentConstOid. I don't think that >> function has meaningful abstraction value, and instead it would make what I >> suggest more difficult. > That seems reasonable to me. Indeed, I tried to get the conparentid with ri_BuildQueryKey. (v2_reduce_ri_SPI-plan-hash.patch) > Somewhat of a detour, but in reviewing the patch for > Statement-Level RI checks, Andres and I observed that SPI > made for a large portion of the RI overhead. If this can be resolved by reviewing the Statement-Level RI checks, then I think it would be better. > BTW, one problem with Kuroda-san's patch is that it's using > conparentid as the shared key, which can still lead to multiple > queries being generated when using multi-level partitioning, because > there would be many (intermediate) parent constraints in that case. As Horiguchi-san also mentioned, in my patch ri_GetParentConstOid iterates on getting the constraint OID until conparentid == InvalidOid. This should allow getting the "root constraint OID". However, the names "conparentid" and "ri_GetParentConstOid" are confusing. We should use names like "constraint_root_id", as in Amit-san's patch. Best Regards, -- Keisuke Kuroda NTT Software Innovation Center keisuke.kuroda.3...@gmail.com
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Tue, Dec 01, 2020 at 03:10:13PM +0900, Michael Paquier wrote: > Well, my impression is that both of you kept exchanging patches, > touching and reviewing each other's patch (note that Alexei has also > sent a rebase of 0002 just yesterday), so I think that it is fair to > say that both of you should be listed as authors and credited as such > in the release notes for this one. OK, this one is now committed. As of this thread, I think that we are going to need to do a bit more once we add options that are not just booleans for both commands (the grammar rules do not need to be changed now): - Have a ReindexParams, similarly to VacuumParams except that we store the results of the parsing in a single place. With the current HEAD, I did not see yet the point in doing so because we just need an integer that maps to a bitmask made of ReindexOption. - The part related to ReindexStmt in utility.c is getting more and more complicated, so we could move most of the execution into indexcmds.c with some sort of ExecReindex() doing the option parsing job, and go to the correct code path depending on the object type dealt with. -- Michael
Re: Huge memory consumption on partitioned table with FKs
At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote wrote in > Hello, > > On Tue, Dec 1, 2020 at 9:40 AM Kyotaro Horiguchi > wrote: > > > > At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera > > wrote in > > > On 2020-Nov-26, Kyotaro Horiguchi wrote: > > > > > > > This shares RI_ConstraintInfo cache by constraints that shares the > > > > same parent constraints. But you forgot that the cache contains some > > > > members that can differ among partitions. > > > > > > > > Consider the case of attaching a partition that have experienced a > > > > column deletion. > > > > > > I think this can be solved easily in the patch, by having > > > ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if > > > they are equal then use the parent's constaint_id, otherwise use the > > > child constraint. That way, the cache entry is reused in the common > > > case where they are identical. > > > > *I* think it's the direction. After an off-list discussion, we > > confirmed that even in that case the patch works as is because > > fk_attnum (or contuple.conkey) always stores key attnums compatible > > to the topmost parent when conparent has a valid value (assuming the > > current usage of fk_attnum), but I still feel uneasy to rely on that > > unclear behavior. > > Hmm, I don't see what the problem is here, because it's not > RI_ConstraintInfo that is being shared among partitions of the same > parent, but the RI query (and the SPIPlanPtr for it) that is issued > through SPI. The query (and the plan) turns out to be the same no > matter what partition's RI_ConstraintInfo is first used to generate > it. What am I missing? I don't think you're missing something. As I (tried to..) mentioned in another branch of this thread, you're right. On the otherhand fk_attnums is actually used to generate the query, thus *issue* potentially affects the query. Although that might be paranoic, that possibility is eliminated by checking the congruency(?) of fk_attnums (or other members). 
We might even be able to share a riinfo among such children. > BTW, one problem with Kuroda-san's patch is that it's using > conparentid as the shared key, which can still lead to multiple > queries being generated when using multi-level partitioning, because > there would be many (intermediate) parent constraints in that case. > We should really be using the "root" constraint OID as the key, > because there would only be one root from which all constraints in a > given partition hierarchy would've originated. Actually, I had > written a patch a few months back to do exactly that, a polished > version of which I've attached with this email. Please take a look. I don't like that the function self-recurses but ri_GetParentConstrOid() actually climbs up to the root constraint, so the patch covers the multilevel partitioning cases. About your patch, it calculates the root constrid at the time an riinfo is created, but when the root-partition is further attached to another partitioned-table after the riinfo creation, constraint_root_id gets stale. Of course that dones't matter practically, though. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add docs stub for recovery.conf
On Wed, Dec 2, 2020 at 08:07:47PM -0500, Isaac Morland wrote: > On Wed, 2 Dec 2020 at 19:33, David G. Johnston > wrote: > > On Wed, Dec 2, 2020 at 5:26 PM Bruce Momjian wrote: > > I think the ideal solution is to create a section for all the rename > cases and do all the redirects to that page. The page would list the > old and new name for each item, and would link to the section for each > new item. > > > > Nothing prevents us from doing that for simple renames. For me, this > situation is not a simple rename and the proposed solution is appropriate > for what it is - changing the implementation details of an existing > feature. We can do both - though the simple rename page doesn't seem > particularly appealing at first glance. > > > I for one do not like following a bookmark or link and then being redirected > to > a generic page that doesn't relate to the specific link I was following. What > is being proposed here is not as bad as the usual, where all the old links > simply turn into redirects to the homepage, but it's still disorienting. I > would much rather each removed page be moved to an appendix (without renaming) > and edited to briefly explain what happened to the page and provide links to > the appropriate up-to-date page or pages. Yes, that is pretty much the same thing I was suggesting, except that each rename has its own _original_ URL link, which I think is also acceptable. My desire is for these items to all exist in one place, and an appendix of them seems fine. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Get memory contexts of an arbitrary backend process
On 2020/11/16 19:58, torikoshia wrote: On 2020-10-28 15:32, torikoshia wrote: On 2020-10-23 13:46, Kyotaro Horiguchi wrote: I think we might need to step back to the basic design of this feature since this patch seems to have unhandled corner cases that are difficult to find. I've written out the basic design below and attached a corresponding patch. I'm starting to study how this feature behaves. At first, when I executed the following query, the function never returned. ISTM that since the autovacuum launcher cannot respond to the request for a memory contexts dump, the function keeps waiting indefinitely. Is this a bug? Probably we should exclude non-backend processes from the target processes to dump? Sorry if this was already discussed. SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity; Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Add docs stub for recovery.conf
On Wed, 2 Dec 2020 at 19:33, David G. Johnston wrote: > On Wed, Dec 2, 2020 at 5:26 PM Bruce Momjian wrote: > >> I think the ideal solution is to create a section for all the rename >> cases and do all the redirects to that page. The page would list the >> old and new name for each item, and would link to the section for each >> new item. >> >> > Nothing prevents us from doing that for simple renames. For me, this > situation is not a simple rename and the proposed solution is appropriate > for what it is - changing the implementation details of an existing > feature. We can do both - though the simple rename page doesn't seem > particularly appealing at first glance. > I for one do not like following a bookmark or link and then being redirected to a generic page that doesn't relate to the specific link I was following. What is being proposed here is not as bad as the usual, where all the old links simply turn into redirects to the homepage, but it's still disorienting. I would much rather each removed page be moved to an appendix (without renaming) and edited to briefly explain what happened to the page and provide links to the appropriate up-to-date page or pages.
Re: pg_stat_statements oddity with track = all
On Wed, Dec 02, 2020 at 06:23:54AM -0800, Nikolay Samokhvalov wrote: > On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud wrote: > > > On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote: > > > If all top-level records in pg_stat_statements have "true" in the new > > > column (is_toplevel), how would this lead to the need to increase > > > pg_stat_statements.max? The number of records would remain the same, as > > > before extending pg_stat_statements. > > > > If the same query is getting executed both at top level and as a nested > > statement, two entries will then be created. That's probably unlikely for > > things like RI trigger queries, but I don't know what to expect for client > > application queries. > > > > Right, but this is how things already work. The extra field you've proposed > won't increase the number of records so it shouldn't affect how users > choose pg_stat_statements.max. The extra field I've proposed would increase the number of records, as it needs to be a part of the key. The only alternative would be what Fujii-san mentioned, i.e. to split plans and calls (for instance plans_toplevel, plans_nested, calls_toplevel, calls_nested) and let users compute an approximate value for toplevel counters.
RE: Deprecate custom encoding conversions
From: Heikki Linnakangas > I propose that we add a notice to the CREATE CONVERSION docs to say that > it is deprecated, and remove it in a few years. > > Any objections? Anyone using custom encoding conversions in production? I can't answer deeper questions because I'm not familiar with character sets, but I saw some Japanese web articles that use CREATE CONVERSION to handle external characters. So, I think we may as well retain it. (OTOH, I'm wondering how other DBMSs support external characters and how Postgres should implement them.) Also, the SQL standard has CREATE CHARACTER SET and CREATE TRANSLATION. I don't know how these are useful, but the mechanism of CREATE CONVERSION can be used to support them. CREATE CHARACTER SET [ AS ] [ ] CREATE TRANSLATION FOR TO FROM Regards Takayuki Tsunakawa
Re: Add docs stub for recovery.conf
On Wed, Dec 2, 2020 at 5:26 PM Bruce Momjian wrote: > I think the ideal solution is to create a section for all the rename > cases and do all the redirects to that page. The page would list the > old and new name for each item, and would link to the section for each > new item. > > Nothing prevents us from doing that for simple renames. For me, this situation is not a simple rename and the proposed solution is appropriate for what it is - changing the implementation details of an existing feature. We can do both - though the simple rename page doesn't seem particularly appealing at first glance. David J.
Re: Add docs stub for recovery.conf
On Wed, Dec 2, 2020 at 05:57:01PM -0500, Stephen Frost wrote: > * Bruce Momjian (br...@momjian.us) wrote: > > We were not going to use just redirects --- we were going to create a > > page that had all the renames listed, with links to the new names. > > Maybe I'm the one who is confused here, but I thought there was > objection to adding a new section/page which covers these topics (which > is what Craig's original patch does)...? If there isn't an objection to > that then it seems like we should move forward with it. > > If I'm following correctly, maybe there was some idea that we should > have more things added to this section than just the recovery.conf bits, > and perhaps we should, but that could certainly be done later. To be > clear though, I don't think we need to do this in all cases- the > existing flow for pg_xlogdump -> pg_waldump works pretty well. Maybe we > add in a note here too if someone wants to but I don't think it's > strictly necessary for the 'simple' rename cases. > > I also feel like that could be done once the section gets added, if > someone wants to. > > Was there something else that I'm missing here in terms of what the > concern is regarding Craig's patch..? I think the ideal solution is to create a section for all the rename cases and do all the redirects to that page. The page would list the old and new name for each item, and would link to the section for each new item. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: SELECT INTO deprecation
On Wed, Dec 02, 2020 at 03:35:46PM -0500, Tom Lane wrote: > Yeah, if we want to kill it let's just do so. The negative language in > the reference page has been there since (at least) 7.1, so people can > hardly say they didn't see it coming. +1. I got to wonder about the impact when migrating applications though. SELECT INTO has a different meaning in Oracle, but SQL server creates a new table like Postgres. -- Michael
Re: Commitfest 2020-11 is closed
On Wed, Dec 2, 2020 at 11:36 PM Andrew Dunstan wrote: > > > On 12/2/20 5:13 PM, Thomas Munro wrote: > > > > I'm experimenting with Github's built in CI. All other ideas welcome. > > > > I'd look very closely at gitlab. I was about to respond with the same idea. Feel free to ping me regarding another CI. There is also the possibility of moving to GitHub Actions (free for open-source CI). I have some experience with that as well. > > cheers > > > andrew
Re: Add docs stub for recovery.conf
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Wed, Dec 2, 2020 at 02:47:13PM -0500, Stephen Frost wrote: > > * David G. Johnston (david.g.johns...@gmail.com) wrote: > > > On Mon, Nov 30, 2020 at 11:42 AM Bruce Momjian wrote: > > > > The downside is you end up with X-1 dummy sections just to allow for > > > > references to old syntax, and you then have to find them all and remove > > > > them when you implement the proper solution. I have no intention of > > > > applying such an X-1 fix. > > > > > > X = 2; seems like a strong objection for such a minor issue. The status > > > quo seems worse than that. > > > > I've been thinking about this and I think I'm on Craig and David's side- > > having something cleaner, and clearer, than just http redirects and such > > would be good for these cases and I don't think we are going to end up > > with so many of them that it ends up becoming an issue. > > We were not going to use just redirects --- we were going to create a > page that had all the renames listed, with links to the new names. Maybe I'm the one who is confused here, but I thought there was objection to adding a new section/page which covers these topics (which is what Craig's original patch does)...? If there isn't an objection to that then it seems like we should move forward with it. If I'm following correctly, maybe there was some idea that we should have more things added to this section than just the recovery.conf bits, and perhaps we should, but that could certainly be done later. To be clear though, I don't think we need to do this in all cases- the existing flow for pg_xlogdump -> pg_waldump works pretty well. Maybe we add in a note here too if someone wants to but I don't think it's strictly necessary for the 'simple' rename cases. I also feel like that could be done once the section gets added, if someone wants to. Was there something else that I'm missing here in terms of what the concern is regarding Craig's patch..? 
Thanks, Stephen signature.asc Description: PGP signature
Re: pg_dump, ATTACH, and independently restorable child partitions
On Wed, Nov 25, 2020 at 06:35:19PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > 1. Maybe pg_restore ExecuteSqlCommandBuf() should (always?) call
> > ExecuteSimpleCommands() instead of ExecuteSqlCommand().  It doesn't seem to
> > break anything (although that surprised me).
>
> That certainly does break everything, which I imagine is the reason
> why the cfbot shows that this patch is failing the pg_upgrade tests.

Thanks for looking, I have tried this.

> What we'd need to do if we want this to work with direct-to-DB restore
> is to split off the ATTACH PARTITION command as a separate TOC entry.
> That doesn't seem amazingly difficult, and it would even offer the
> possibility that you could extract the partition standalone without
> having to ignore errors.  (You'd use -l/-L to select the CREATE TABLE,
> the data, etc, but not the ATTACH object.)

-- 
Justin

>From 0d0b013930199158306fd295eebbf36c4c4cb5b4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Thu, 26 Nov 2020 22:02:07 -0600
Subject: [PATCH v4] pg_dump: output separate "object" for ALTER TABLE..ATTACH
 PARTITION

..allowing table partitions to be restored separately and independently,
even if there's an error attaching to parent table.
---
 src/bin/pg_dump/common.c       | 28 +
 src/bin/pg_dump/pg_dump.c      | 76 --
 src/bin/pg_dump/pg_dump.h      |  8
 src/bin/pg_dump/pg_dump_sort.c | 48 +++--
 4 files changed, 118 insertions(+), 42 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 634ca86cfb..8450e22327 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -320,6 +320,34 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 		for (j = 0; j < numParents; j++)
 			parents[j]->interesting = true;
 	}
+
+		if (tblinfo[i].dobj.dump && tblinfo[i].ispartition)
+		{
+			/* We don't even need to store this anywhere, but maybe there's
+			 * some utility in making a single, large allocation earlier ?
+			 */
+			TableAttachInfo *attachinfo = palloc(sizeof(*attachinfo));
+
+			attachinfo->dobj.objType = DO_TABLE_ATTACH;
+			attachinfo->dobj.catId.tableoid = 0;
+			attachinfo->dobj.catId.oid = 0;
+			AssignDumpId(&attachinfo->dobj);
+			attachinfo->dobj.name = pg_strdup(tblinfo[i].dobj.name);
+			attachinfo->dobj.namespace = tblinfo[i].dobj.namespace;
+			attachinfo->parentTbl = tblinfo[i].parents[0];
+			attachinfo->partitionTbl = &tblinfo[i];
+
+			/*
+			 * We must state the DO_TABLE_ATTACH object's dependencies
+			 * explicitly, since it will not match anything in pg_depend.
+			 *
+			 * Give it dependencies on both the partition table and the parent
+			 * table, so that it will not be executed till both of those
+			 * exist.  (There's no need to care what order those are created
+			 * in.)
+			 */
+			addObjectDependency(&attachinfo->dobj, tblinfo[i].dobj.dumpId);
+			addObjectDependency(&attachinfo->dobj, tblinfo[i].parents[0]->dobj.dumpId);
+		}
 	}
 }

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index dc1d41dd8d..f66182bc73 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -201,6 +201,7 @@ static void dumpAgg(Archive *fout, AggInfo *agginfo);
 static void dumpTrigger(Archive *fout, TriggerInfo *tginfo);
 static void dumpEventTrigger(Archive *fout, EventTriggerInfo *evtinfo);
 static void dumpTable(Archive *fout, TableInfo *tbinfo);
+static void dumpTableAttach(Archive *fout, TableAttachInfo *tbinfo);
 static void dumpTableSchema(Archive *fout, TableInfo *tbinfo);
 static void dumpAttrDef(Archive *fout, AttrDefInfo *adinfo);
 static void dumpSequence(Archive *fout, TableInfo *tbinfo);
@@ -10110,6 +10111,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_TABLE:
 			dumpTable(fout, (TableInfo *) dobj);
 			break;
+		case DO_TABLE_ATTACH:
+			dumpTableAttach(fout, (TableAttachInfo *) dobj);
+			break;
 		case DO_ATTRDEF:
 			dumpAttrDef(fout, (AttrDefInfo *) dobj);
 			break;
@@ -15573,6 +15577,55 @@ createDummyViewAsClause(Archive *fout, TableInfo *tbinfo)
 	return result;
 }

+/*
+ * dumpTableAttach
+ *	  write to fout the commands to attach a child partition
+ *
+ * For partitioned tables, emit the ATTACH PARTITION clause.  Note
+ * that we always want to create partitions this way instead of using
+ * CREATE TABLE .. PARTITION OF, mainly to preserve a possible column
+ * layout discrepancy with the parent, but also to ensure it gets the
+ * correct tablespace setting if it differs from the parent's.
+ */
+static void
+dumpTableAttach(Archive *fout, TableAttachInfo *attachinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	char	   *qualrelname;
+	PQExpBuffer q;
+
+	if (dopt->dataOnly)
+		return;
+
+	if (attachinfo->partitionTbl->dobj.dump & DUMP_COMPONENT_DEFINITION == 0)
+		return;
+
+	/* With partitions there can only be one parent */
+	if (attachinfo->partitionTbl->numParents != 1)
+		fatal("invalid number of parents %d for table \"%s\"",
Re: Commitfest 2020-11 is closed
On 12/2/20 5:13 PM, Thomas Munro wrote: > > I'm experimenting with Github's built in CI. All other ideas welcome. I'd look very closely at gitlab. cheers andrew
Re: Commitfest 2020-11 is closed
On 12/2/20 5:13 PM, Thomas Munro wrote:
> On Thu, Dec 3, 2020 at 10:51 AM David Steele wrote:
>> On 12/2/20 4:15 PM, Thomas Munro wrote:
>>> On Thu, Dec 3, 2020 at 10:00 AM Tom Lane wrote:
>>>> This is actually a bit problematic, because now the cfbot is ignoring
>>>> those patches (or if it's not, I don't know where it's displaying the
>>>> results).  Please go ahead and move the remaining open patches, or
>>>> else re-open the CF if that's possible.
>>>
>>> As of quite recently, Travis CI doesn't seem to like cfbot's rate of
>>> build jobs.  Recently it's been taking a very long time to post
>>> results for new patches and taking so long to get around to retesting
>>> patches for bitrot that my "delete old results after a week" logic
>>> started wiping out some results while they are still the most recent,
>>> leading to the blank spaces where the results are supposed to be.
>>> D'oh.  I'm looking into a couple of options.
>>
>> pgBackRest test runs have gone from ~17 minutes to 3-6 hours over the
>> last two months.
>
> Ouch.
>
>> Also keep in mind that travis-ci.org will be shut down at the end of the
>> month.  Some users who have migrated to travis-ci.com have complained
>> that it is not any faster, but I have not tried myself (yet).
>
> Oh.
>
>> Depending on how you have Github organized migrating to travis-ci.com
>> may be a bit tricky because it requires full access to all private
>> repositories in your account and orgs administrated by your account.
>
> I'm experimenting with Github's built in CI.  All other ideas welcome.

We're leaning towards Github actions ourselves. The only thing that makes
us want to stay with Travis (at least for some tests) is the support for
the ppc64le, arm64, and s390x architectures. s390x in particular since it
is the only big-endian architecture we have access to.

Regards,
-- 
-David
da...@pgmasters.net
Re: Commitfest 2020-11 is closed
On Thu, Dec 3, 2020 at 10:51 AM David Steele wrote:
> On 12/2/20 4:15 PM, Thomas Munro wrote:
> > On Thu, Dec 3, 2020 at 10:00 AM Tom Lane wrote:
> >> This is actually a bit problematic, because now the cfbot is ignoring
> >> those patches (or if it's not, I don't know where it's displaying the
> >> results).  Please go ahead and move the remaining open patches, or
> >> else re-open the CF if that's possible.
> >
> > As of quite recently, Travis CI doesn't seem to like cfbot's rate of
> > build jobs.  Recently it's been taking a very long time to post
> > results for new patches and taking so long to get around to retesting
> > patches for bitrot that my "delete old results after a week" logic
> > started wiping out some results while they are still the most recent,
> > leading to the blank spaces where the results are supposed to be.
> > D'oh.  I'm looking into a couple of options.
>
> pgBackRest test runs have gone from ~17 minutes to 3-6 hours over the
> last two months.

Ouch.

> Also keep in mind that travis-ci.org will be shut down at the end of the
> month.  Some users who have migrated to travis-ci.com have complained
> that it is not any faster, but I have not tried myself (yet).

Oh.

> Depending on how you have Github organized migrating to travis-ci.com
> may be a bit tricky because it requires full access to all private
> repositories in your account and orgs administrated by your account.

I'm experimenting with Github's built in CI.  All other ideas welcome.
Re: Commitfest 2020-11 is closed
On 12/2/20 4:15 PM, Thomas Munro wrote:
> On Thu, Dec 3, 2020 at 10:00 AM Tom Lane wrote:
>> This is actually a bit problematic, because now the cfbot is ignoring
>> those patches (or if it's not, I don't know where it's displaying the
>> results).  Please go ahead and move the remaining open patches, or
>> else re-open the CF if that's possible.
>
> As of quite recently, Travis CI doesn't seem to like cfbot's rate of
> build jobs.  Recently it's been taking a very long time to post
> results for new patches and taking so long to get around to retesting
> patches for bitrot that my "delete old results after a week" logic
> started wiping out some results while they are still the most recent,
> leading to the blank spaces where the results are supposed to be.
> D'oh.  I'm looking into a couple of options.

pgBackRest test runs have gone from ~17 minutes to 3-6 hours over the last
two months.

Also keep in mind that travis-ci.org will be shut down at the end of the
month. Some users who have migrated to travis-ci.com have complained that
it is not any faster, but I have not tried myself (yet).

Depending on how you have Github organized migrating to travis-ci.com may
be a bit tricky because it requires full access to all private repositories
in your account and orgs administrated by your account.

Regards,
-- 
-David
da...@pgmasters.net
Proposed patch for key managment
Attached is a patch for key management, which will eventually be part of
cluster file encryption (CFE), called TDE (Transparent Data Encryption) by
Oracle.  It is an update of Masahiko Sawada's patch from July 31:

https://www.postgresql.org/message-id/ca+fd4k6rjwnvztro3q2f5hsdd8hgyuc4cuy9u3e6ran4c6t...@mail.gmail.com

Sawada-san did all the hard work, and I just redirected the patch.  The
general outline of this CFE feature can be seen here:

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

The currently planned progression for this feature is to allow secure
retrieval of key encryption keys (KEK) outside of the database, then use
those to encrypt data keys that encrypt heap/index/tmpfile files.

This patch was changed from Masahiko Sawada's version by removing
references to non-cluster file encryption because having SQL-level keys
stored in this way was considered to have limited usefulness.  I have also
removed references to SQL-level KEK rotation since that is best done as a
command-line tool.

If most people approve of this general approach, and the design decisions
made, I would like to apply this in the next few weeks, but this brings
complications.  The syntax added by this commit might not provide a useful
feature until PG 15, so how do we hide it from users?  I was thinking of
not applying the doc changes (or commenting them out) and commenting out
the --help output.

Once this patch is applied, I would like to use the /dev/tty file
descriptor passing feature for the ssl_passphrase_command parameter, so the
ssl passphrase can be prompted from the terminal.  (I am attaching
pass_fd.sh as a POC for how file descriptor handling works.)  I would also
then write the KEK rotation command-line tool.  After that, we can start
working on heap/index/tmpfile encryption using this patch as a base.
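The derive-and-verify pattern behind a passphrase-protected KEK can be
sketched briefly (Python for illustration only; the KDF choice, iteration
count, and verifier construction below are assumptions, not necessarily
what the patch does):

```python
import hashlib
import hmac
import os

def derive_kek(passphrase: bytes, salt: bytes) -> bytes:
    # Stretch the cluster passphrase into a key encryption key (KEK).
    # The PBKDF2 parameters here are illustrative, not the patch's.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000)

def make_verifier(kek: bytes) -> bytes:
    # A value that can be stored at initdb time so later startups can
    # check the passphrase without storing the KEK itself.
    return hmac.new(kek, b"kek-verifier", "sha256").digest()

salt = os.urandom(16)
kek = derive_kek(b"correct horse battery staple", salt)
verifier = make_verifier(kek)

# A wrong passphrase derives a different KEK, so verification fails --
# the moral equivalent of a "passphrase does not match" FATAL exit.
candidate = derive_kek(b"wrong passphrase", salt)
assert hmac.compare_digest(make_verifier(kek), verifier)
assert not hmac.compare_digest(make_verifier(candidate), verifier)
```

The point of the verifier is that the KEK never needs to be written to
disk; only the salt and a keyed digest do.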
Here is an example of the current patch in action, using a KEK like
'7CE7945EA2F7322536F103E4D7D91C21E52288089EF99D186B9A76D666EBCA66',
which is not echoed to the terminal:

$ initdb -R -c '/u/postgres/tmp/pass_fd.sh "Enter password:" %R'
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.
Cluster file encryption is enabled.

fixing permissions on existing directory /u/pgsql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/New_York
creating configuration files ... ok
running bootstrap script ... --> Enter password:ok
performing post-bootstrap initialization ... --> Enter password:ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /u/pgsql/data -l logfile start

$ pg_ctl -R -l /u/pg/server.log start
waiting for server to start... --> Enter password: done
server started

A non-matching passphrase looks like this:

$ pg_ctl -R -l /u/pg/server.log start
waiting for server to start... --> Enter password: stopped waiting
pg_ctl: could not start server
Examine the log output.
$ tail -2 /u/pg/server.log
2020-12-02 16:32:34.045 EST [13056] FATAL:  cluster passphrase does not match expected passphrase
2020-12-02 16:32:34.047 EST [13056] LOG:  database system is shut down

-- 
Bruce Momjian  https://momjian.us
EnterpriseDB   https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

key.diff.gz
Description: application/gzip

pass_fd.sh
Description: Bourne shell script
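The file-descriptor-passing mechanism behind %R can be sketched like this
(a minimal stand-in in Python, not the actual pass_fd.sh POC): the parent
opens a pipe, hands the child the write end's descriptor number in place of
%R, and reads the passphrase back, so the secret never crosses argv or
stdout.

```python
import os
import subprocess
import sys

# Parent ("server") side: create a pipe and launch the passphrase
# command with the write end's fd number as an argument (the %R slot).
r, w = os.pipe()
child = subprocess.Popen(
    [sys.executable, "-c",
     # Child ("pass_fd.sh") side: write the passphrase to the given fd.
     # A real script would prompt on /dev/tty instead of using a constant.
     "import os, sys; os.write(int(sys.argv[1]), b'my-kek-passphrase')",
     str(w)],
    pass_fds=(w,))
os.close(w)                    # parent keeps only the read end
passphrase = os.read(r, 1024)  # b'my-kek-passphrase'
child.wait()
os.close(r)
```

Closing the write end in the parent is what lets the read see EOF if the
command exits without answering, rather than hanging forever.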
Re: libpq compression
On Thu, Nov 26, 2020 at 8:15 AM Daniil Zakhlystov wrote:
> However, I don’t mean by this that we shouldn’t support switchable
> compression.  I propose that we can offer two compression modes: permanent
> (which is implemented in the current state of the patch) and switchable
> on-the-fly.  Permanent compression allows us to deliver a robust solution
> that is already present in some databases.  Switchable compression allows
> us to support more complex scenarios in cases when the frontend and
> backend really need it and can afford development effort to implement it.

I feel that one thing that may be getting missed here is that my
suggestions were intended to make this simpler, not more complicated. Like,
in the design I proposed, switchable compression is not a separate form of
compression and doesn't require any special support. Both sides are just
allowed to set the compression method; theoretically, they could set it
more than once. Similarly, I don't intend the possibility of using
different compression algorithms in the two directions as a request for an
advanced feature so much as a way of simplifying the protocol.

Like, in the protocol that you proposed previously, you've got a four-phase
handshake to set up compression. The startup packet carries initial
information from the client, and the server then sends CompressionAck, and
then the client sends SetCompressionMethod, and then the server sends
SetCompressionMethod. This system is fairly complex, and it requires some
form of interlocking. Once the client has sent a SetCompressionMethod
message, it cannot send any other protocol message until it receives a
SetCompressionMethod message back from the server. Otherwise, it doesn't
know whether the server actually responded with SetCompressionMethod as
well, or whether it sent say ErrorResponse or NoticeResponse or something.
In the former case it needs to send compressed data going forward; in the
latter, uncompressed; but it can't know which until it sees the server
message. And keep in mind that control isn't necessarily with libpq at this
point, because non-blocking mode could be in use. This is all solvable, but
the way I proposed it, you don't have that problem. You never need to wait
for a message from the other end before being able to send a message
yourself.

Similarly, allowing different compression methods in the two directions may
seem to make things more complicated, but I don't think it really is.
Arguably it's simpler. The instant the server gets the startup packet, it
can issue SetCompressionMethod. The instant the client gets
SupportedCompressionTypes, it can issue SetCompressionMethod. So there's
practically no hand-shaking at all. You get a single protocol message and
you immediately respond by setting the compression method and then you just
send compressed messages after that. Perhaps the time at which you begin
receiving compressed data will be a little different than the time at which
you begin sending it, or perhaps compression will only ever be used in one
direction. But so what? The code really doesn't need to care. You just need
to keep track of the active compression mode in each direction, and that's
it.

And again, if you allow the compression method to be switched at any time,
you just have to know what to do when you get a SetCompressionMethod. If
you only allow it to be changed once, to set it initially, then you have to
ADD code to reject that message the next time it's sent. If that ends up
avoiding a significant amount of complexity somewhere else then I don't
have a big problem with it, but if it doesn't, it's simpler to allow it
whenever than to restrict it to only once.

> 2. List of the compression algorithms which the frontend is able to
>
> For example:
> “zlib:1,3,5;zstd:7,8;uncompressed” means that frontend is able to:
> - decompress zlib with 1,3 or 5 compression levels
> - decompress zstd with 7 or 8 compression levels
> - “uncompressed” at the end means that the frontend agrees to receive
>   uncompressed messages.  If there is no “uncompressed” compression
>   algorithm specified it means that the compression is required.

I think that there's no such thing as being able to decompress some
compression levels with the same algorithm but not others. The level only
controls behavior on the compression side. So, the client can only send
zlib data if the server can decompress it, but the server need not
advertise which levels it can decompress, because it's all or nothing.

> Supported compression and decompression methods are configured using GUC
> parameters:
>
> compress_algorithms = ‘...’ // default value is ‘uncompressed’
> decompress_algorithms = ‘...’ // default value is ‘uncompressed’

This raises an interesting question which I'm not quite sure about. It
doesn't seem controversial to assert that the client must be able to
advertise which algorithms it does and does not support, and likewise for
the s
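The all-or-nothing point about levels is easy to demonstrate with any real
compressor (Python's zlib used here as a stand-in for whatever library the
patch ends up with):

```python
import zlib

data = b"libpq protocol traffic " * 200
fast = zlib.compress(data, 1)   # level 1: cheap to produce on the sender
best = zlib.compress(data, 9)   # level 9: more CPU spent on the sender

# The receiver needs no knowledge of the sender's level: one decompressor
# handles output produced at any level, so only the algorithm -- not the
# level -- has to be negotiated or advertised.
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
```

This is why a level list like "zlib:1,3,5" in the advertisement carries no
information the receiving side can act on.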
Re: Commitfest 2020-11 is closed
On Thu, Dec 3, 2020 at 10:00 AM Tom Lane wrote:
> This is actually a bit problematic, because now the cfbot is ignoring
> those patches (or if it's not, I don't know where it's displaying the
> results).  Please go ahead and move the remaining open patches, or
> else re-open the CF if that's possible.

As of quite recently, Travis CI doesn't seem to like cfbot's rate of build
jobs.  Recently it's been taking a very long time to post results for new
patches and taking so long to get around to retesting patches for bitrot
that my "delete old results after a week" logic started wiping out some
results while they are still the most recent, leading to the blank spaces
where the results are supposed to be.  D'oh.  I'm looking into a couple of
options.
Re: Commitfest 2020-11 is closed
Anastasia Lubennikova writes: > Commitfest 2020-11 is officially closed now. > Many thanks to everyone who participated by posting patches, reviewing > them, committing and sharing ideas in discussions! Thanks for all the hard work! > Today, me and Georgios will move the remaining items to the next CF or > return them with feedback. We're planning to leave Ready For Committer > till the end of the week, to make them more visible and let them get the > attention they deserve. This is actually a bit problematic, because now the cfbot is ignoring those patches (or if it's not, I don't know where it's displaying the results). Please go ahead and move the remaining open patches, or else re-open the CF if that's possible. regards, tom lane
Re: SELECT INTO deprecation
Stephen Frost writes: > * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote: >> While reading about deprecating and removing various things in other >> threads, I was wondering about how deprecated SELECT INTO is. There are >> various source code comments about this, but the SELECT INTO reference page >> only contains soft language like "recommended". I'm proposing the attached >> patch to stick a more explicit deprecation notice right at the top. > I don't see much value in this. Yeah, if we want to kill it let's just do so. The negative language in the reference page has been there since (at least) 7.1, so people can hardly say they didn't see it coming. regards, tom lane
Re: Add docs stub for recovery.conf
On Wed, Dec 2, 2020 at 02:47:13PM -0500, Stephen Frost wrote: > Greetings, > > * David G. Johnston (david.g.johns...@gmail.com) wrote: > > On Mon, Nov 30, 2020 at 11:42 AM Bruce Momjian wrote: > > > The downside is you end up with X-1 dummy sections just to allow for > > > references to old syntax, and you then have to find them all and remove > > > them when you implement the proper solution. I have no intention of > > > applying such an X-1 fix. > > > > X = 2; seems like a strong objection for such a minor issue. The status > > quo seems worse than that. > > I've been thinking about this and I think I'm on Craig and David's side- > having something cleaner, and clearer, than just http redirects and such > would be good for these cases and I don't think we are going to end up > with so many of them that it ends up becoming an issue. We were not going to use just redirects --- we were going to create a page that had all the renames listed, with links to the new names. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Minor documentation error regarding streaming replication protocol
On Tue, Dec 1, 2020 at 10:16:31AM -0800, Jeff Davis wrote: > On Thu, 2020-10-08 at 23:17 -0400, Bruce Momjian wrote: > > As I said before, it is more raw bytes, but > > we > > don't have an SQL data type for that. > > Sorry to jump in to this thread late. > > Can't we just set both "filename" and "content" to be BYTEA, but then > set the format code to 1 (indicating binary)? > > For clients that do look at the descriptor, they are more likely to get > it right; and for clients that don't look at the descriptor, there will > be no change. Yes, we could, but I thought the format code was not something we set at this level. Looking at byteasend() it is true it just sends the bytes. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
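For reference, the format code Jeff refers to is the last field of each
RowDescription entry; a quick sketch of that layout (hand-assembled here
from the documented wire format, not code taken from the server):

```python
import struct

BYTEA_OID = 17  # pg_type OID for bytea

def field_desc(name: bytes, type_oid: int, typlen: int, fmt: int) -> bytes:
    """One per-field entry of a RowDescription ('T') message: name,
    table OID, attribute number, type OID, type length, type modifier,
    and finally the 16-bit format code (0 = text, 1 = binary)."""
    return name + b"\x00" + struct.pack("!IhIhih", 0, 0, type_oid, typlen, -1, fmt)

text_desc = field_desc(b"content", BYTEA_OID, -1, 0)
bin_desc = field_desc(b"content", BYTEA_OID, -1, 1)

# Identical descriptors apart from the trailing format code, which is all
# a descriptor-aware client needs in order to switch to binary handling.
assert text_desc[:-2] == bin_desc[:-2]
assert struct.unpack("!h", bin_desc[-2:]) == (1,)
```

So flipping the advertised format code to 1 is a descriptor-only change,
which is consistent with byteasend() just emitting the raw bytes either
way.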
Re: Add table access method as an option to pgbench
Hello David,

Some feedback about v4.

It looks that the option is *silently* ignored when creating a partitioned
table, which currently results in an error, and not passed to partitions,
which would accept them.  This is pretty weird.

The input check is added with an error message when both partitions and
table access method are used.

Hmmm.  If you take the resetting-the-default approach, I do not think that
you should have to test anything?  AFAICT the access method is valid on
partitions, although not on the partitioned table declaration.  So I'd say
that you could remove the check.

They should also trigger failures, eg "--table-access-method=no-such-table-am",
to be added to the @errors list.

Not sure how to address this properly, since the table access method
potentially can be *any* name.

I think it is pretty unlikely that someone would choose the name
"no-such-table-am" when developing a new storage engine for Postgres inside
Postgres, so that it would make this internal test fail.  There are
numerous such names used in tests, eg "no-such-database",
"no-such-command".  So I'd suggest to add such a test to check for the
expected failure.

I do not understand why you duplicated all possible option entries.  A test
with just the table access method looks redundant if the feature is made to
work orthogonally to partitions, which should be the case.

Only one positive test case added using *heap* as the table access method
to verify the initialization.

Yep, only "heap" is currently available for tables.

-- 
Fabien.
Re: [HACKERS] [PATCH] Generic type subscripting
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote:
>> So ... one of the things that's been worrying me about this patch
>> from day one is whether it would create a noticeable performance
>> penalty for existing use-cases.  I did a small amount of experimentation
>> about that with the v35 patchset, and it didn't take long at all to
>> find that this:
>> ...
>> is about 15% slower with the patch than with HEAD.  I'm not sure
>> what an acceptable penalty might be, but 15% is certainly not it.

> I've tried to reproduce that, but get ~2-4% slowdown (with a pinned
> backend, no turbo etc).  Are there any special steps I've probably
> missed?

Hmm, no, I just built with --disable-cassert and otherwise my usual
development options.  I had experimented with some other variants of the
test case, where the repeated statement is

	a[i] := i;               -- about the same
	a[i] := a[i-1] + 1;      -- 7% slower
	a[i] := a[i-1] - a[i-2]; -- 15% slower

so it seems clear that the penalty is on the array fetch not array assign
side.  This isn't too surprising now that I think about it, because
plpgsql's array assignment code is untouched by the patch (which is a large
feature omission BTW: you still can't write jsonb['x'] := y; in plpgsql).

			regards, tom lane
Re: SELECT INTO deprecation
On Wed, Dec 02, 2020 at 12:58:36PM -0500, Stephen Frost wrote: > Greetings, > > * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote: > > While reading about deprecating and removing various things in > > other threads, I was wondering about how deprecated SELECT INTO > > is. There are various source code comments about this, but the > > SELECT INTO reference page only contains soft language like > > "recommended". I'm proposing the attached patch to stick a more > > explicit deprecation notice right at the top. > > I don't see much value in this. Users already have 5 years to adapt > their code to new major versions of PG and that strikes me as plenty > enough time and is why we support multiple major versions of PG for > so long. Users who keep pace and update for each major version > aren't likely to have issue making this change since they're already > used to regularly updating their code for new major versions, while > others are going to complain no matter when we remove it and will > ignore any deprecation notices we put out there, so there isn't much > point in them. +1 for removing it entirely and including this prominently in the release notes. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Off-Schedule Minor Release Consideration?
"David G. Johnston" writes:
> Given that we have already heard about 3 separate minor release
> regressions in the most recent update I'm putting forth for discussion
> whether an off-schedule release should be done.  I probably wouldn't care
> as much if it wasn't for the fact that the same release fixes two CVEs.

The release team had a discussion about this.  We're leaning towards not
doing anything out of the ordinary, but are open to public input.

To recap a bit: Actually there are four known regressions, because there
are two different CREATE TABLE LIKE bugs:

* CREATE TABLE LIKE gets confused if new table masks old one
  (see f7f83a55b)

* CREATE TABLE LIKE fails to handle a self-referential foreign key
  defined in the CREATE, if it depends on an index imported by LIKE
  (see 97390fe8a)

* psql \c with a connstring containing a password ignores the password
  (see 7e5e1bba0)

* LISTEN/NOTIFY can fail to empty its SLRU queue in v12 and earlier
  (see 9c83b54a9)

We felt that the first three of these are at best minor annoyances.

The LISTEN/NOTIFY issue is potentially more serious, in that it could cause
a service outage for a site that makes heavy use of NOTIFY.  However, the
race is between two listeners, so in the common (I think) usage pattern of
only one listener there's no issue.  Even if you hit the race condition,
you don't have a problem until 8GB more NOTIFY traffic has been sent, which
seems like a lot.  The complainants in that bug report indicated they'd
overrun that amount in a week or two, but that seems like unusually heavy
usage.

The other aspect that seems to mitigate the seriousness of the
LISTEN/NOTIFY issue is that a server restart is enough to clear the queue.
Given that you'd also need a server restart to install a hypothetical
off-cycle release, it's easy to see that there's no benefit except to sites
that anticipate logging more than 16GB of NOTIFY traffic (and thus needing
two or more restarts) before February.
So we are leaning to the position that the severity of these issues doesn't
justify the work involved in putting out an off-cycle release.  But if
there are people out there with high-NOTIFY-traffic servers who would be
impacted, please speak up!

(BTW, the calendar is also a bit unfriendly to doing an off-cycle release.
It's probably already too late to plan a Dec 10 release, and by the week
after that a lot of people will be in holiday mode.  If you start thinking
about January, it's not clear why bother, since we'll have minor releases
in February anyway.)

			regards, tom lane
Re: [PATCH] SET search_path += octopus
On Tue, Dec 1, 2020 at 08:19:34PM +0530, Abhijit Menon-Sen wrote: > I'm certainly not in favour of introducing multiple operators to express > the various list operations, like += for prepend and =+ for append. (By > the standard of assembling such things from spare parts, the right thing > to do would be to add M4 support to postgresql.conf, but I'm not much of > a fan of that idea either.) Yeah, when M4 is your next option, you are in trouble. ;-) -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Add docs stub for recovery.conf
Greetings, * David G. Johnston (david.g.johns...@gmail.com) wrote: > On Mon, Nov 30, 2020 at 11:42 AM Bruce Momjian wrote: > > The downside is you end up with X-1 dummy sections just to allow for > > references to old syntax, and you then have to find them all and remove > > them when you implement the proper solution. I have no intention of > > applying such an X-1 fix. > > X = 2; seems like a strong objection for such a minor issue. The status > quo seems worse than that. I've been thinking about this and I think I'm on Craig and David's side- having something cleaner, and clearer, than just http redirects and such would be good for these cases and I don't think we are going to end up with so many of them that it ends up becoming an issue. Thanks, Stephen signature.asc Description: PGP signature
Re: Log message for GSS connection is missing once connection authorization is successful.
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * vignesh C (vignes...@gmail.com) wrote: > > Thanks for testing this, I had missed testing this. The expression > > matching was not correct. Attached v6 patch which includes the fix for > > this. > > This generally looks pretty good to me. I did reword the commit message > a bit, run pgindent, and added the appropriate log message for the last > test (was there a reason you didn't include that..?). In general, this > looks pretty good to commit to me. > > I'll look at it again over the weekend or early next week and unless > there's objections, I'll push it. And committed. Thanks! Stephen signature.asc Description: PGP signature
Re: [HACKERS] [PATCH] Generic type subscripting
> On Wed, Dec 02, 2020 at 01:20:10PM -0600, Justin Pryzby wrote: > On Wed, Dec 02, 2020 at 08:18:08PM +0100, Dmitry Dolgov wrote: > > > On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote: > > > So ... one of the things that's been worrying me about this patch > > > from day one is whether it would create a noticeable performance > > > penalty for existing use-cases. I did a small amount of experimentation > > > about that with the v35 patchset, and it didn't take long at all to > > > find that this: > > > --- cut --- > > > > > > is about 15% slower with the patch than with HEAD. I'm not sure > > > what an acceptable penalty might be, but 15% is certainly not it. > > > > > > I'm also not quite sure where the cost is going. It looks like > > > 0001+0002 aren't doing much to the executor except introducing > > > one level of subroutine call, which doesn't seem like it'd account > > > for that. > > > > I've tried to reproduce that, but get ~2-4% slowdown (with a pinned > > backend, no turbo etc). Are there any special steps I've probably > > missed? At the same time, I remember had conducted this sort of tests > > before when you and others raised the performance degradation question > > and the main part of the patch was already more or less stable. From > > what I remember the numbers back then were also rather small. > > Are you comparing with casserts (and therefor MEMORY_CONTEXT_CHECKING) > disabled? Yep, they're disabled.
Re: proposal: unescape_text function
On Wed, Dec 2, 2020 at 07:30:39PM +0100, Pavel Stehule wrote: > postgres=# select > 'Arabic : ' || unistr( '\0627\0644\0639\0631\0628\064A\0629' ) || ' > Chinese : ' || unistr( '\4E2D\6587' ) || ' > English : ' || unistr( 'English' ) || ' > French : ' || unistr( 'Fran\00E7ais' ) || ' > German : ' || unistr( 'Deutsch' ) || ' > Greek : ' || unistr( '\0395\03BB\03BB\03B7\03BD\03B9\03BA\03AC' ) || ' > Hebrew : ' || unistr( '\05E2\05D1\05E8\05D9\05EA' ) || ' > Japanese : ' || unistr( '\65E5\672C\8A9E' ) || ' > Korean : ' || unistr( '\D55C\AD6D\C5B4' ) || ' > Portuguese : ' || unistr( 'Portugu\00EAs' ) || ' > Russian : ' || unistr( '\0420\0443\0441\0441\043A\0438\0439' ) || ' > Spanish : ' || unistr( 'Espa\00F1ol' ) || ' > Thai : ' || unistr( '\0E44\0E17\0E22' ) > as unicode_test_string; > ┌──┐ > │ unicode_test_string │ > ╞══╡ > │ Arabic : العربية ↵│ > │ Chinese : 中文 ↵│ > │ English : English ↵│ > │ French : Français ↵│ > │ German : Deutsch ↵│ > │ Greek : Ελληνικά ↵│ > │ Hebrew : עברית ↵│ > │ Japanese : 日本語 ↵│ > │ Korean : 한국어 ↵│ > │ Portuguese : Português↵│ > │ Russian : Русский ↵│ > │ Spanish : Español ↵│ > │ Thai : ไทย │ > └──┘ Offlist, this table output is super-cool! -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: [HACKERS] [PATCH] Generic type subscripting
On Wed, Dec 02, 2020 at 08:18:08PM +0100, Dmitry Dolgov wrote: > > On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote: > > So ... one of the things that's been worrying me about this patch > > from day one is whether it would create a noticeable performance > > penalty for existing use-cases. I did a small amount of experimentation > > about that with the v35 patchset, and it didn't take long at all to > > find that this: > > --- cut --- > > > > is about 15% slower with the patch than with HEAD. I'm not sure > > what an acceptable penalty might be, but 15% is certainly not it. > > > > I'm also not quite sure where the cost is going. It looks like > > 0001+0002 aren't doing much to the executor except introducing > > one level of subroutine call, which doesn't seem like it'd account > > for that. > > I've tried to reproduce that, but get ~2-4% slowdown (with a pinned > backend, no turbo etc). Are there any special steps I've probably > missed? At the same time, I remember had conducted this sort of tests > before when you and others raised the performance degradation question > and the main part of the patch was already more or less stable. From > what I remember the numbers back then were also rather small. Are you comparing with casserts (and therefore MEMORY_CONTEXT_CHECKING) disabled? -- Justin
Re: [HACKERS] [PATCH] Generic type subscripting
> On Wed, Dec 02, 2020 at 11:52:54AM -0500, Tom Lane wrote: > Dmitry Dolgov <9erthali...@gmail.com> writes: > >> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote: > >>> On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote: > >>> The idea of an opaque field in SubscriptingRef structure is more > >>> attractive to me. Could you please implement it? > > >> Sure, doesn't seem to be that much work. > > I just happened to notice this bit. This idea is a complete nonstarter. > You cannot have an "opaque" field in a parsetree node, because then the > backend/nodes code has no idea what to do with it for > copy/compare/outfuncs/readfuncs. The patch seems to be of the opinion > that "do nothing" is adequate, which it completely isn't. > > Perhaps this is a good juncture at which to remind people that parse > tree nodes are read-only so far as the executor is concerned, so > storing something there only at execution time won't work either. Oh, right, stupid of me. Then I'll just stick with Alexander's original suggestion.
Re: [HACKERS] [PATCH] Generic type subscripting
> On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote: > So ... one of the things that's been worrying me about this patch > from day one is whether it would create a noticeable performance > penalty for existing use-cases. I did a small amount of experimentation > about that with the v35 patchset, and it didn't take long at all to > find that this: > > --- cut --- > create or replace function arraytest(n int) returns void as > $$ > declare > a int[]; > begin > a := array[1, 1]; > for i in 3..n loop > a[i] := a[i-1] - a[i-2]; > end loop; > end; > $$ > language plpgsql stable; > > \timing on > > select arraytest(1000); > --- cut --- > > is about 15% slower with the patch than with HEAD. I'm not sure > what an acceptable penalty might be, but 15% is certainly not it. > > I'm also not quite sure where the cost is going. It looks like > 0001+0002 aren't doing much to the executor except introducing > one level of subroutine call, which doesn't seem like it'd account > for that. I've tried to reproduce that, but only got a ~2-4% slowdown (with a pinned backend, no turbo, etc.). Are there any special steps I may have missed? At the same time, I remember conducting this sort of test before, when you and others raised the performance degradation question and the main part of the patch was already more or less stable. From what I remember, the numbers back then were also rather small.
Re: proposal: unescape_text function
On Wed, Dec 2, 2020 at 11:37, Pavel Stehule wrote: > > > On Wed, Dec 2, 2020 at 9:23, Peter Eisentraut < > peter.eisentr...@enterprisedb.com> wrote: > >> On 2020-11-30 22:15, Pavel Stehule wrote: >> > I would like some supporting documentation on this. So far we only >> > have >> > one stackoverflow question, and then this implementation, and they >> are >> > not even the same format. My worry is that if there is not precise >> > specification, then people are going to want to add things in the >> > future, and there will be no way to analyze such requests in a >> > principled way. >> > >> > >> > I checked this and it is "prefix backslash-u hex" used by Java, >> > JavaScript or RTF - >> > https://billposer.org/Software/ListOfRepresentations.html >> >> Heh. The fact that there is a table of two dozen possible >> representations kind of proves my point that we should be deliberate in >> picking one. >> >> I do see Oracle unistr() on that list, which appears to be very similar >> to what you are trying to do here. Maybe look into aligning with that. >> > > unistr is a primitive form of proposed function. But it can be used as a > base. The format is compatible with our "4.1.2.3. String Constants with > Unicode Escapes". > > What do you think about the following proposal? > > 1. unistr(text) .. compatible with Postgres unicode escapes - it is > enhanced against Oracle, because Oracle's unistr doesn't support 6 digits > unicodes. > > 2. there can be optional parameter "prefix" with default "\". But with > "\u" it can be compatible with Java or Python. > > What do you think about it? > I thought about it a little bit more, and the prefix specification doesn't make much sense (even less so if we implement this functionality as a function named "unistr"). I removed the optional argument and renamed the function to "unistr". The functionality is the same. Now it supports the Oracle convention, Java and Python (for Python U) and \+XX. These formats were already supported.
The compatibility with Oracle is nice. postgres=# select 'Arabic : ' || unistr( '\0627\0644\0639\0631\0628\064A\0629' ) || ' Chinese: ' || unistr( '\4E2D\6587' ) || ' English: ' || unistr( 'English' ) || ' French : ' || unistr( 'Fran\00E7ais' ) || ' German : ' || unistr( 'Deutsch' ) || ' Greek : ' || unistr( '\0395\03BB\03BB\03B7\03BD\03B9\03BA\03AC' ) || ' Hebrew : ' || unistr( '\05E2\05D1\05E8\05D9\05EA' )|| ' Japanese : ' || unistr( '\65E5\672C\8A9E' ) || ' Korean : ' || unistr( '\D55C\AD6D\C5B4' ) || ' Portuguese : ' || unistr( 'Portugu\00EAs' )|| ' Russian: ' || unistr( '\0420\0443\0441\0441\043A\0438\0439' ) || ' Spanish: ' || unistr( 'Espa\00F1ol' ) || ' Thai : ' || unistr( '\0E44\0E17\0E22' ) as unicode_test_string; ┌──┐ │ unicode_test_string│ ╞══╡ │ Arabic : العربية↵│ │ Chinese: 中文 ↵│ │ English: English ↵│ │ French : Français ↵│ │ German : Deutsch ↵│ │ Greek : Ελληνικά ↵│ │ Hebrew : עברית↵│ │ Japanese : 日本語 ↵│ │ Korean : 한국어 ↵│ │ Portuguese : Português↵│ │ Russian: Русский ↵│ │ Spanish: Español ↵│ │ Thai : ไทย │ └──┘ (1 row) postgres=# SELECT UNISTR('Odpov\u011Bdn\u00E1 osoba'); ┌─┐ │ unistr │ ╞═╡ │ Odpovědná osoba │ └─┘ (1 row) New patch attached Regards Pavel > Pavel > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df29af6371..6ad8136523 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3553,6 +3553,34 @@ repeat('Pg', 4) PgPgPgPg + + + + unistr + +unistr ( string text ) +text + + +Evaluate escaped unicode chars (4 or 6 digits) without prefix or + with prefix u (4 digits) or with prefix +U (8 digits) to chars or with prefix ++ (6 digits). 
+ + +unistr('\0441\043B\043E\043D') +слон + + +unistr('d\0061t\+61') +data + + +unistr('d\u0061t\U0061') +data + + + diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c index be86eb37fe..cbddb61396 100644 --- a/src/backend/parser/parser.c +++ b/src/backend/parser/parser.c @@ -278,30 +278,6 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner) return cur_token; } -/* convert hex digit (caller should have verified that) to value */ -static unsigned int -hexval(unsigned char c) -
Re: Deprecate custom encoding conversions
Heikki Linnakangas writes: > On 02/12/2020 18:18, Tom Lane wrote: >> Heikki Linnakangas writes: >>> I propose that we add a notice to the CREATE CONVERSION docs to say that >>> it is deprecated, and remove it in a few years. >> While I agree that it's probably not that useful, what would we gain >> by removing it? If you intend to remove those catalogs, what lookup >> mechanism would replace them? We can't exactly drop the encoding >> conversion functionality. > I'm thinking of replacing the catalog with a hard-coded 2D array of > function pointers. Or something along those lines. I like the current structure in which the encoding functions are in separate .so's, so that you don't load the ones you don't need. It's not real clear how to preserve that if we hard-code things. > So I'm looking for refactoring the conversion routines to be more > convenient for the callers. But the current function signature is > exposed at the SQL level, thanks to CREATE CONVERSION. I'd be the first to agree that the current API for conversion functions is not terribly well-designed. But what if we just change it? That can't be any worse of a compatibility hit than removing CREATE CONVERSION altogether. regards, tom lane
Re: [HACKERS] [PATCH] Generic type subscripting
So ... one of the things that's been worrying me about this patch from day one is whether it would create a noticeable performance penalty for existing use-cases. I did a small amount of experimentation about that with the v35 patchset, and it didn't take long at all to find that this: --- cut --- create or replace function arraytest(n int) returns void as $$ declare a int[]; begin a := array[1, 1]; for i in 3..n loop a[i] := a[i-1] - a[i-2]; end loop; end; $$ language plpgsql stable; \timing on select arraytest(1000); --- cut --- is about 15% slower with the patch than with HEAD. I'm not sure what an acceptable penalty might be, but 15% is certainly not it. I'm also not quite sure where the cost is going. It looks like 0001+0002 aren't doing much to the executor except introducing one level of subroutine call, which doesn't seem like it'd account for that. I don't think this can be considered RFC until the performance issue is addressed. regards, tom lane
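For reference, the plpgsql loop in Tom's benchmark just fills the array with a period-6 sequence; a rough Python equivalent (an illustration only, not part of the benchmark) shows what each subscripted assignment computes:

```python
def arraytest(n):
    # Mirror of the plpgsql benchmark body: a[i] := a[i-1] - a[i-2],
    # i.e. repeated subscripted assignment into a growing array.
    # plpgsql arrays are 1-based while Python lists are 0-based,
    # hence the shifted indices below.
    a = [1, 1]
    for i in range(3, n + 1):
        a.append(a[i - 2] - a[i - 3])
    return a

print(arraytest(8))  # [1, 1, 0, -1, -1, 0, 1, 1]
```

In the plpgsql version the interesting cost is not the arithmetic but the subscripting machinery itself, which is what makes the loop a reasonable micro-benchmark for this patch.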
Re: SELECT INTO deprecation
Greetings, * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote: > While reading about deprecating and removing various things in other > threads, I was wondering about how deprecated SELECT INTO is. There are > various source code comments about this, but the SELECT INTO reference page > only contains soft language like "recommended". I'm proposing the attached > patch to stick a more explicit deprecation notice right at the top. I don't see much value in this. Users already have 5 years to adapt their code to new major versions of PG; that strikes me as plenty of time, and it is why we support multiple major versions of PG for so long. Users who keep pace and update for each major version aren't likely to have issues making this change, since they're already used to regularly updating their code for new major versions, while others are going to complain no matter when we remove it and will ignore any deprecation notices we put out there, so there isn't much point in them. > I also found some gratuitous uses of SELECT INTO in various tests and > documentation (not ecpg or plpgsql of course). Here is a patch to adjust > those to CREATE TABLE AS. If we aren't actually removing SELECT INTO then I don't know that it makes sense to just stop testing it. > I don't have a specific plan for removing top-level SELECT INTO altogether, > but there is a nontrivial amount of code for handling it, so there would be > some gain if it could be removed eventually. We should either remove it, or remove the comments that it's deprecated, not try to make it more deprecated or try to somehow increase the recommendation to not use it. I'd recommend we remove it and, at the same time, note in the back branches that it has been removed in v14, which will give those who actually pay attention and care that much more time before v14 comes out to update their code (not that I'm all that worried, as they'll also see it in the beta release notes too). 
Thanks, Stephen
Re: should INSERT SELECT use a BulkInsertState?
One loose end in this patch is how to check for volatile default expressions. COPY is a utility statement, so copyfrom.c can look at the parser's column list: COPY table(c1,c2)... However, for INSERT, in nodeModifyTable.c, parsing, rewriting, and planning are already done by that point, and I don't know of a good way to find that information there. The default expressions will have been rewritten into the planned statement. We need the list of columns whose default is volatile, excluding columns for which a non-default value is specified. INSERT INTO table (c1,c2) VALUES (1,default); We'd want the list of every column in the table with a volatile default, excluding column c1, but not excluding the explicit DEFAULT column c2 or any implicit default columns (c3, etc). Any ideas? -- Justin
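The set computation Justin describes is simple once the inputs are known; the hard part is recovering those inputs after rewriting. A minimal sketch of just the set logic (the function name and the list/set inputs are hypothetical, for illustration only):

```python
def firing_volatile_defaults(volatile_default_cols, explicit_value_cols):
    # A volatile default "fires" for every column that has one and is
    # not supplied an explicit non-DEFAULT value -- whether the column
    # is written as the DEFAULT keyword or omitted from the target
    # list entirely.
    return [c for c in volatile_default_cols
            if c not in explicit_value_cols]

# INSERT INTO table (c1, c2) VALUES (1, DEFAULT), where c1..c3 all have
# volatile defaults: c1 gets an explicit value, c2 is the DEFAULT
# keyword, c3 is omitted.
print(firing_volatile_defaults(['c1', 'c2', 'c3'], {'c1'}))  # ['c2', 'c3']
```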
Re: [HACKERS] [PATCH] Generic type subscripting
Dmitry Dolgov <9erthali...@gmail.com> writes: >> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote: >>> On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote: >>> The idea of an opaque field in SubscriptingRef structure is more >>> attractive to me. Could you please implement it? >> Sure, doesn't seem to be that much work. I just happened to notice this bit. This idea is a complete nonstarter. You cannot have an "opaque" field in a parsetree node, because then the backend/nodes code has no idea what to do with it for copy/compare/outfuncs/readfuncs. The patch seems to be of the opinion that "do nothing" is adequate, which it completely isn't. Perhaps this is a good juncture at which to remind people that parse tree nodes are read-only so far as the executor is concerned, so storing something there only at execution time won't work either. regards, tom lane
Re: Additional improvements to extended statistics
On 12/2/20 4:51 PM, Dean Rasheed wrote: > On Sun, 29 Nov 2020 at 21:02, Tomas Vondra > wrote: >> >> I wonder how much of the comment before clauselist_selectivity should >> move to clauselist_selectivity_ext - it does talk about range clauses >> and so on, but clauselist_selectivity does not really deal with that. >> But maybe that's just an implementation detail and it's better to keep >> the comment the way it is. > > I think it's better to keep it the way it is, so that the entirety of > what clauselist_selectivity() does (via clauselist_selectivity_ext()) > can be read in one place, but I have added separate comments for the > new "_ext" functions to explain how they differ. That matches similar > examples elsewhere. > +1 > >> I noticed this outdated comment: >> >> /* Always compute the selectivity using clause_selectivity */ >> s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo, > > Updated. > > >> Also, the comment at clauselist_selectivity_or seems to not follow the >> usual pattern, which I think is >> >> /* >> * function name >> * short one-sentence description >> * >> * ... longer description ... >> */ > > Hmm, it seems OK to me. The first part is basically copied from > clauselist_selectivity(). The "longer description" doesn't really have > much more to say because it's much simpler than > clauselist_selectivity(), but it seems neater to keep the two roughly > in sync. > I see. In that case it's OK, I guess. > > I've been hacking on this a bit more and attached is an updated > (hopefully final) version with some other comment improvements and > also a couple of other tweaks: > > The previous version had duplicated code blocks that implemented the > same MCV-correction algorithm using simple_sel, mcv_sel, base_sel, > other_sel and total_sel, which was quite messy. So I refactored that > into a new function mcv_combine_selectivities(). 
About half the > comments from statext_mcv_clauselist_selectivity() then move over to > mcv_combine_selectivities(). I also updated the comments for > mcv_clauselist_selectivity() and mcv_clause_selectivity_or() to > explain how their outputs are expected to be used by > mcv_combine_selectivities(). That hopefully makes for a clean > separation of concerns, and makes it easier to tweak the way MCV stats > are applied on top of simple stats, if someone thinks of a better > approach in the future. > > In the previous version, for an ORed list of clauses, the MCV > correction was only applied to the overlaps between clauses. That's OK > as long as each clause only refers to a single column, since the > per-column statistics ought to be the best way to estimate each > individual clause in that case. However, if the individual clauses > refer to more than one column, I think the MCV correction should be > applied to each individual clause as well as to the overlaps. That > turns out to be pretty straightforward, since we're already doing all > the hard work of computing the match bitmap for each clause. The sort > of queries I had in mind were things like this: > > WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2) > > I added a new test case along those lines and the new estimates are > much better than they are without this patch, but not for the reason I > thought --- the old code consistently over-estimated queries like that > because it actually applied the MCV correction twice (once while > processing each AND list, via clause_selectivity(), called from > clauselist_selectivity_simple(), and once for the top-level OR clause, > contained in a single-element implicitly-ANDed list). The way the new > code is structured avoids any kind of double application of extended > stats, producing a much better estimate, which is good. > Nice. 
> However, the new code doesn't apply the extended stats directly using > clauselist_selectivity_or() for this kind of query because there are > no RestrictInfos for the nested AND clauses, so > find_single_rel_for_clauses() (and similarly > statext_is_compatible_clause()) regards those clauses as not > compatible with extended stats. So what ends up happening is that > extended stats are used only when we descend down to the two AND > clauses, and their results are combined using the original "s1 + s2 - > s1 * s2" formula. That actually works OK in this case, because there > is no overlap between the two AND clauses, but it wouldn't work so > well if there was. > > I'm pretty sure that can be fixed by teaching > find_single_rel_for_clauses() and statext_is_compatible_clause() to > handle BoolExpr clauses, looking for RestrictInfos underneath them, > but I think that should be left for a follow-in patch. I have left a > regression test in place, whose estimates ought to be improved by such > a fix. > Yeah, I agree with leaving this for a separate patch. We can't do everything at the same time. > The upshot of all that is that the new code that applies the MCV > correction to the individ
Re: proposal: unescape_text function
On 12/02/20 09:55, Chapman Flack wrote: > In Perl, there is a useful extension to regexp substitution where > you specify the replacement not as a string or even a string with & > and \1 \2 ... magic, but as essentially a lambda that is passed the > match and returns a computed replacement. That makes conversions of > the sort discussed here generally trivial to implement. Python, I should have added, allows that also. Java too, since release 9. Regards, -Chap
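As a concrete illustration of the lambda-replacement style Chapman mentions (a minimal sketch, not the proposed server-side implementation): in Python, `re.sub` accepts a callable as the replacement, which makes an Oracle-style 4-digit `unistr` a few lines:

```python
import re

def unistr(s):
    # Decode Oracle-style \XXXX escapes (4 hex digits) by passing a
    # replacement function ("lambda") to re.sub instead of a fixed
    # replacement string.
    return re.sub(r'\\([0-9A-Fa-f]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  s)

print(unistr(r'Fran\00E7ais'))  # Français
```

Supporting the other prefix conventions discussed in the thread would just mean extending the pattern and the replacement function.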
Re: Deprecate custom encoding conversions
On 02/12/2020 18:18, Tom Lane wrote: Heikki Linnakangas writes: I propose that we add a notice to the CREATE CONVERSION docs to say that it is deprecated, and remove it in a few years. While I agree that it's probably not that useful, what would we gain by removing it? If you intend to remove those catalogs, what lookup mechanism would replace them? We can't exactly drop the encoding conversion functionality. I'm thinking of replacing the catalog with a hard-coded 2D array of function pointers. Or something along those lines. I had this idea while looking at the encoding conversions performed in COPY. The current conversion functions return a null-terminated, palloc'd string, which is a bit awkward for the callers. The caller needs to call strlen() on the result, and it would be nice to reuse the same buffer for all the conversions. And I've got a hunch that it'd be faster to convert the whole 64 kB raw input buffer in one go, rather than convert each line separately, but the current function signature doesn't make that easy either. So I'm looking for refactoring the conversion routines to be more convenient for the callers. But the current function signature is exposed at the SQL level, thanks to CREATE CONVERSION. - Heikki
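The lookup-table idea Heikki sketches can be illustrated in miniature (a toy sketch under the assumption of a fixed, built-in set of encodings; the function and table names are invented for illustration): instead of a catalog lookup, conversions are found in a hard-coded table keyed by (source, destination) encoding.

```python
# Hypothetical sketch: replace catalog-driven conversion lookup with a
# hard-coded table of conversion functions keyed by encoding pair.

def latin1_to_utf8(data: bytes) -> bytes:
    return data.decode('latin-1').encode('utf-8')

def utf8_to_latin1(data: bytes) -> bytes:
    return data.decode('utf-8').encode('latin-1')

CONVERSIONS = {
    ('LATIN1', 'UTF8'): latin1_to_utf8,
    ('UTF8', 'LATIN1'): utf8_to_latin1,
}

def convert(data: bytes, src: str, dst: str) -> bytes:
    # Same-encoding "conversion" is the identity; otherwise look up
    # the function directly, with no catalog access.
    if src == dst:
        return data
    return CONVERSIONS[(src, dst)](data)

print(convert(b'\xe9', 'LATIN1', 'UTF8'))  # b'\xc3\xa9'
```

In C the table would be a 2D array of function pointers indexed by encoding IDs, and the refactored signature could take a caller-supplied output buffer and return a length, avoiding the palloc-and-strlen pattern described above.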
Re: Deprecate custom encoding conversions
Heikki Linnakangas writes: > I propose that we add a notice to the CREATE CONVERSION docs to say that > it is deprecated, and remove it in a few years. While I agree that it's probably not that useful, what would we gain by removing it? If you intend to remove those catalogs, what lookup mechanism would replace them? We can't exactly drop the encoding conversion functionality. regards, tom lane
Re: SELECT INTO deprecation
On Wed, Dec 2, 2020 at 12:55, Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > While reading about deprecating and removing various things in other > threads, I was wondering about how deprecated SELECT INTO is. There are > various source code comments about this, but the SELECT INTO reference > page only contains soft language like "recommended". I'm proposing the > attached patch to stick a more explicit deprecation notice right at the > top. > > I also found some gratuitous uses of SELECT INTO in various tests and > documentation (not ecpg or plpgsql of course). Here is a patch to > adjust those to CREATE TABLE AS. > > I don't have a specific plan for removing top-level SELECT INTO > altogether, but there is a nontrivial amount of code for handling it, so > there would be some gain if it could be removed eventually. > +1 Pavel
Deprecate custom encoding conversions
Hi, PostgreSQL allows writing custom encoding conversion functions between any character encodings, using the CREATE CONVERSION command. It's pretty flexible, you can define default and non-default conversions, and the conversions live in schemas so you can have multiple conversions installed in a system and you can switch between them by changing search_path. However: We never use non-default conversions for anything. All code that performs encoding conversions only cares about the default ones. I think this flexibility is kind of ridiculous anyway. If a built-in conversion routine doesn't handle some characters correctly, surely we should fix the built-in function, rather than provide a mechanism for having your own conversion functions. If you truly need to perform the conversions differently than the built-in routines do, you can always perform the conversion in the client instead. Note that we don't support adding completely new custom encodings, all this is just for conversions between the built-in encodings that we have. I propose that we add a notice to the CREATE CONVERSION docs to say that it is deprecated, and remove it in a few years. Any objections? Anyone using custom encoding conversions in production? - Heikki
Re: Additional improvements to extended statistics
On Sun, 29 Nov 2020 at 21:02, Tomas Vondra wrote: > > I wonder how much of the comment before clauselist_selectivity should > move to clauselist_selectivity_ext - it does talk about range clauses > and so on, but clauselist_selectivity does not really deal with that. > But maybe that's just an implementation detail and it's better to keep > the comment the way it is. I think it's better to keep it the way it is, so that the entirety of what clauselist_selectivity() does (via clauselist_selectivity_ext()) can be read in one place, but I have added separate comments for the new "_ext" functions to explain how they differ. That matches similar examples elsewhere. > I noticed this outdated comment: > > /* Always compute the selectivity using clause_selectivity */ > s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo, Updated. > Also, the comment at clauselist_selectivity_or seems to not follow the > usual pattern, which I think is > > /* > * function name > * short one-sentence description > * > * ... longer description ... > */ Hmm, it seems OK to me. The first part is basically copied from clauselist_selectivity(). The "longer description" doesn't really have much more to say because it's much simpler than clauselist_selectivity(), but it seems neater to keep the two roughly in sync. I've been hacking on this a bit more and attached is an updated (hopefully final) version with some other comment improvements and also a couple of other tweaks: The previous version had duplicated code blocks that implemented the same MCV-correction algorithm using simple_sel, mcv_sel, base_sel, other_sel and total_sel, which was quite messy. So I refactored that into a new function mcv_combine_selectivities(). About half the comments from statext_mcv_clauselist_selectivity() then move over to mcv_combine_selectivities(). 
I also updated the comments for mcv_clauselist_selectivity() and mcv_clause_selectivity_or() to explain how their outputs are expected to be used by mcv_combine_selectivities(). That hopefully makes for a clean separation of concerns, and makes it easier to tweak the way MCV stats are applied on top of simple stats, if someone thinks of a better approach in the future. In the previous version, for an ORed list of clauses, the MCV correction was only applied to the overlaps between clauses. That's OK as long as each clause only refers to a single column, since the per-column statistics ought to be the best way to estimate each individual clause in that case. However, if the individual clauses refer to more than one column, I think the MCV correction should be applied to each individual clause as well as to the overlaps. That turns out to be pretty straightforward, since we're already doing all the hard work of computing the match bitmap for each clause. The sort of queries I had in mind were things like this: WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2) I added a new test case along those lines and the new estimates are much better than they are without this patch, but not for the reason I thought --- the old code consistently over-estimated queries like that because it actually applied the MCV correction twice (once while processing each AND list, via clause_selectivity(), called from clauselist_selectivity_simple(), and once for the top-level OR clause, contained in a single-element implicitly-ANDed list). The way the new code is structured avoids any kind of double application of extended stats, producing a much better estimate, which is good. However, the new code doesn't apply the extended stats directly using clauselist_selectivity_or() for this kind of query because there are no RestrictInfos for the nested AND clauses, so find_single_rel_for_clauses() (and similarly statext_is_compatible_clause()) regards those clauses as not compatible with extended stats. 
So what ends up happening is that extended stats are used only when we descend down to the two AND clauses, and their results are combined using the original "s1 + s2 - s1 * s2" formula. That actually works OK in this case, because there is no overlap between the two AND clauses, but it wouldn't work so well if there was. I'm pretty sure that can be fixed by teaching find_single_rel_for_clauses() and statext_is_compatible_clause() to handle BoolExpr clauses, looking for RestrictInfos underneath them, but I think that should be left for a follow-on patch. I have left a regression test in place, whose estimates ought to be improved by such a fix. The upshot of all that is that the new code that applies the MCV correction to the individual clauses in an ORed list doesn't help with queries like the one above at the moment, and it's not obvious whether it is currently reachable, but I think it's worth leaving in because it seems more principled, and makes that code more future-proof. I also think it's neater because now the signature of mcv_clause_selectivity_or() is more natural --- its primary return va
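The "s1 + s2 - s1 * s2" formula discussed above is the standard independence-based OR combination. A tiny numeric sketch shows both why it is acceptable for non-overlapping clauses and how it misestimates fully overlapping ones, which is the case the MCV correction targets:

```python
def or_selectivity(s1, s2):
    # P(A OR B) = P(A) + P(B) - P(A AND B), with P(A AND B)
    # approximated as s1 * s2 under an independence assumption.
    return s1 + s2 - s1 * s2

# Disjoint clauses, e.g. (a = 1 AND b = 1) OR (a = 2 AND b = 2) with no
# shared rows: the true selectivity is s1 + s2 = 0.2, and the formula's
# result (~0.19) is close, since the subtracted term is small.
print(or_selectivity(0.1, 0.1))

# Fully overlapping clauses (both match the same rows): the true
# selectivity is only 0.1, but the formula still returns ~0.19 --
# the overestimate that MCV-based corrections aim to repair.
```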
Re: [HACKERS] [PATCH] Generic type subscripting
> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote: > > On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote: > > > > > > My first question is whether we're > > > > able to handle different subscript types differently. For instance, > > > > one day we could handle jsonpath subscripts for jsonb. And for sure, > > > > jsonpath subscripts are expected to be handled differently from text > > > > subscripts. I see we can distinguish types during in prepare and > > > > validate functions. But it seems there is no type information in > > > > fetch and assign functions. Should we add something like this to the > > > > SubscriptingRefState for future usage? > > > > > > > > Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH]; > > > > Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH]; > > > > > > Yes, makes sense. My original idea was that it could be done within the > > > jsonpath support patch itself, but at the same time providing these > > > fields into SubscriptingRefState will help other potential extensions. > > > > > > Having said that, maybe it would be even better to introduce a field > > > with an opaque structure for both SubscriptingRefState and > > > SubscriptingRef, where every implementation of custom subscripting can > > > store any necessary information? In case of jsonpath it could keep type > > > information acquired in prepare function, which would be then passed via > > > SubscriptingRefState down to the fetch/assign. > > > > The idea of an opaque field in SubscriptingRef structure is more > > attractive to me. Could you please implement it? > > Sure, doesn't seem to be that much work. The attached implementation should be enough, I guess, since fetch/assign are executed in a child memory context of the one where prepare does its work. 
>From 6f7d9589cc827dfb3b1d82a3e0e629b482e4c77a Mon Sep 17 00:00:00 2001
From: erthalion <9erthali...@gmail.com>
Date: Thu, 31 Jan 2019 22:37:19 +0100
Subject: [PATCH v35 1/5] Base implementation of subscripting mechanism

Introduce all the required machinery for generalizing the subscripting
operation for different data types:

* subscripting handler procedure, to set up a relation between a data type
  and the corresponding subscripting logic.

* subscripting routines, that help generalize the subscripting logic, since
  it involves a few stages, namely: preparation (e.g. to figure out required
  types), validation (to check the input and return a meaningful error
  message), fetch (executed when we extract a value using subscripting),
  assign (executed when we update a data type with a new value using
  subscripting).

Without this it would be necessary to introduce more new fields to pg_type,
which would be too invasive. All ArrayRef related logic was removed and
landed as a separate subscripting implementation in the following patch from
this series. The rest of the code was rearranged, to e.g. store the type of
the assigned value for an assign operation.
Reviewed-by: Tom Lane, Arthur Zakirov --- .../pg_stat_statements/pg_stat_statements.c | 1 + src/backend/catalog/heap.c| 6 +- src/backend/catalog/pg_type.c | 7 +- src/backend/commands/typecmds.c | 77 +- src/backend/executor/execExpr.c | 26 +--- src/backend/executor/execExprInterp.c | 124 +++ src/backend/nodes/copyfuncs.c | 2 + src/backend/nodes/equalfuncs.c| 2 + src/backend/nodes/outfuncs.c | 2 + src/backend/nodes/readfuncs.c | 2 + src/backend/parser/parse_expr.c | 54 --- src/backend/parser/parse_node.c | 141 -- src/backend/parser/parse_target.c | 88 +-- src/backend/utils/adt/ruleutils.c | 21 +-- src/backend/utils/cache/lsyscache.c | 23 +++ src/include/c.h | 2 + src/include/catalog/pg_type.h | 9 +- src/include/executor/execExpr.h | 15 +- src/include/nodes/primnodes.h | 8 + src/include/nodes/subscripting.h | 42 ++ src/include/parser/parse_node.h | 6 +- src/include/utils/lsyscache.h | 1 + 22 files changed, 333 insertions(+), 326 deletions(-) create mode 100644 src/include/nodes/subscripting.h diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 1eac9edaee..31ba120fb2 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -2800,6 +2800,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node) JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr); JumbleExpr(jstate, (Node *) sbsref->refexpr); JumbleExpr(jstate, (Node *) sbsref->refassgnexpr); +APP_JUMB(sbsref->refnestedfunc); } break; case T_FuncExpr: diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
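The opaque-state idea discussed in this thread — prepare computes type information once, and fetch/assign read it back later — can be sketched in miniature like this (a hypothetical Python model; the real mechanism is the opaque field carried in the C-level SubscriptingRef/SubscriptingRefState structs):

```python
def subscript_prepare(subscripts):
    # Classify each subscript once, at "prepare" time; the returned dict is
    # the opaque state that later stages receive unchanged.
    return {"kinds": ["jsonpath" if s.startswith("$") else "text"
                      for s in subscripts]}

def subscript_fetch(container, subscripts, opaque):
    # "Fetch" consults the opaque state instead of re-deriving the types.
    value = container
    for sub, kind in zip(subscripts, opaque["kinds"]):
        if kind != "text":
            raise NotImplementedError("jsonpath evaluation elided")
        value = value[sub]
    return value

state = subscript_prepare(["a", "b"])
assert subscript_fetch({"a": {"b": 42}}, ["a", "b"], state) == 42
```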
Re: proposal: unescape_text function
On 12/02/20 05:37, Pavel Stehule wrote: > 2. there can be optional parameter "prefix" with default "\". But with "\u" > it can be compatible with Java or Python. Java's unicode escape form is one of those early ones that lack a six-digit form, and where any character outside of the basic multilingual plane has to be represented by two four-digit escapes in a row, encoding the two surrogates that would make up the character's representation in UTF-16. Obviously that's an existing form that's out there, so it's not a bad thing to have some kind of support for it, but it's not a great representation to encourage people to use. Python, by contrast, has both \u and \U where you would use the latter to represent a non-BMP character directly. So the Java and Python schemes should be considered distinct. In Perl, there is a useful extension to regexp substitution where you specify the replacement not as a string or even a string with & and \1 \2 ... magic, but as essentially a lambda that is passed the match and returns a computed replacement. That makes conversions of the sort discussed here generally trivial to implement. Would it be worth considering to add something of general utility like that, and then there could be a small library of pure SQL functions (or a wiki page or GitHub gist) covering a bunch of the two dozen representations on that page linked above? Regards, -Chap
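To make the Java-vs-Python distinction concrete, here is a small Python sketch (illustrative only, not the proposed SQL function) that decodes Java-style \uXXXX escapes, including surrogate pairs for non-BMP characters, using exactly the "replacement as a lambda" style described above for Perl:

```python
import re

def unescape_java(s):
    # Replace each \uXXXX escape with its UTF-16 code unit, using a
    # callable replacement (the Perl-style "lambda" substitution).
    units = re.sub(r'\\u([0-9a-fA-F]{4})',
                   lambda m: chr(int(m.group(1), 16)), s)
    # Join any surrogate pairs back into real characters: round-trip
    # through UTF-16, where surrogates are actually meaningful.
    return units.encode('utf-16', 'surrogatepass').decode('utf-16')

# U+1F600 lies outside the BMP, so Java must spell it as two escapes:
assert unescape_java(r'\uD83D\uDE00') == '\U0001F600'
# Python's \U00XXXXXX form can express it directly, hence the two
# schemes really are distinct.
assert unescape_java(r'\u0041BC') == 'ABC'
```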
Re: Corner-case bug in pg_rewind
On 02/12/2020 20:13, Heikki Linnakangas wrote:
> On 01/12/2020 16:52, Pavel Borisov wrote:
>> Status update for a commitfest entry. The patch has been Waiting on
>> Author for some time. As this is a bug fix, I am moving it to the next
>> CF. Ian, are you planning to continue working on it? As a reviewer, I
>> consider the patch useful and good overall. The comments I left were
>> purely cosmetic. It's a pity that this bugfix has been delayed for such
>> a small reason and has become outdated as a result. It would be nice to
>> complete this fix in the next CF.
>
> Yeah, we really should fix this..
>
> On 16/11/2020 04:49, Ian Lawrence Barwick wrote:
>> Also, I think Heikki's notion could be fulfilled. I spent a bit of time
>> looking at that suggestion but couldn't actually verify it was an issue
>> which needed fixing.
>
> Attached are two patches. The first patch is your original patch,
> unmodified (except for a cosmetic rename of the test file). The second
> patch builds on that, demonstrating and fixing the issue I mentioned.
> It took me a while to create a repro for it, it's easily masked by
> incidental full-page writes or because rows created by XIDs that are not
> marked as committed on the other timeline are invisible, but succeeded
> at last.

Aha, many thanks. I wasn't entirely sure what I was looking for there and
recently haven't had the time or energy to dig any further.

Regards

Ian Barwick

--
Ian Barwick                    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
macOS SIP, next try
Previous discussions on macOS SIP:

https://www.postgresql.org/message-id/flat/561E73AB.8060800%40gmx.net
https://www.postgresql.org/message-id/flat/CA%2BTgmoYGi5oR8Rvb2-SY1_WEjd76H5gCqSukxFQ66qR7MewWAQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/6a4d6124-41f0-756b-0811-c5c5def7ef4b%402ndquadrant.com

Tests that use a temporary installation (i.e., make check) use a platform-specific environment variable to make programs and libraries find shared libraries in the temporary installation rather than in the actual configured installation target directory. On macOS, this is DYLD_LIBRARY_PATH. Ever since System Integrity Protection (SIP) arrived, this hasn't worked anymore, because DYLD_LIBRARY_PATH gets disabled by the system (in somewhat obscure ways). The workaround was to either disable SIP or to do make install before make check.

The solution here is to use install_name_tool to edit the shared library path in the binaries (programs and libraries) after installation into the temporary prefix. The solution is, I think, correct in principle, but the implementation is hackish, since it hardcodes the names and versions of the interesting shared libraries in an unprincipled way. More refinement is needed here. Ideas welcome.

In order to allow all that, we need to link everything with the option -headerpad_max_install_names so that there is enough room in the binaries to put in the temporary path in place of the original one. (The temporary path is strictly longer than the original one.) This bloats the binaries, so it might only be appropriate for development builds and should perhaps be hidden behind some option. Ideas also welcome here.

The attached patch works for me. You can see that it disables setting DYLD_LIBRARY_PATH, but the tests still work. Verification on other system variations welcome, of course.

Comments welcome. I think this direction is promising, but some details need to be addressed.
From 28f2f9fc2234e3d993027fb014f1c85f7b3db6be Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 2 Dec 2020 14:49:53 +0100 Subject: [PATCH] WIP: Fix temp-install tests to work with macOS SIP Tests that use a temporary installation (i.e., make check) use a platform-specific environment variable to make programs and libraries find shared libraries in the temporary installation rather than in the actual configured installation target directory. On macOS, this is DYLD_LIBRARY_PATH. Ever since System Integrity Protection (SIP) arrived, this hasn't worked anymore, because DYLD_LIBRARY_PATH gets disabled by the system (in somewhat obscure ways). The workaround was to either disable SIP or to do make install before make check. The solution here is to use install_name_tool to edit the shared library path in the binaries (programs and libraries) after installation into the temporary prefix. XXX The solution is, I think, correct in principle, but the implementation is hackish, since it hardcodes the names and versions of the interesting shared libraries in an unprincipled way. In order to allow that, we need to link everything with the option -headerpad_max_install_names so that there is enough room in the binaries to put in the temporary path in place of the original one. (The temporary path is strictly longer than the original one.) This bloats the binaries, so it might only be appropriate for development builds and should perhaps be hidden behind some option. 
--- src/Makefile.global.in| 1 + src/interfaces/ecpg/test/Makefile | 1 + src/makefiles/Makefile.darwin | 13 - src/test/isolation/Makefile | 3 +++ 4 files changed, 17 insertions(+), 1 deletion(-) diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 7ca1e9aac5..a2349c814b 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -409,6 +409,7 @@ ifeq ($(MAKELEVEL),0) $(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log $(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 $(MAKE) -j1 $(if $(CHECKPREP_TOP),-C $(CHECKPREP_TOP),) checkprep >>'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1 + $(call temp_install_postprocess,'$(abs_top_builddir)/tmp_install/$(bindir)'/* '$(abs_top_builddir)/tmp_install/$(libdir)'/*.so '$(abs_top_builddir)/tmp_install/$(libdir)'/*.dylib) endif endif endif diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile index be53b7b94d..468e3076ad 100644 --- a/src/interfaces/ecpg/test/Makefile +++ b/src/interfaces/ecpg/test/Makefile @@ -80,6 +80,7 @@ endif REGRESS_OPTS = --dbname=ecpg1_regression,ecpg2_regression --create-role=regress_ecpg_user1,regress_ecpg_user2 $(EXTRA_REGRESS_OPTS) check: all + $(call temp_install_postprocess,$(patsubst %.pgc,%,$(wildcard compat_informix/*.pgc compat_oracle/*.pgc connect/*.pgc pgtypeslib/*.pgc preproc/*.pgc sql/*.pg
Re: pg_stat_statements oddity with track = all
On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud wrote:
> On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
> > If all top-level records in pg_stat_statements have "true" in the new
> > column (is_toplevel), how would this lead to the need to increase
> > pg_stat_statements.max? The number of records would remain the same as
> > before extending pg_stat_statements.
>
> If the same query is getting executed both at top level and as a nested
> statement, two entries will then be created. That's probably unlikely for
> things like RI trigger queries, but I don't know what to expect for client
> application queries.

Right, but this is how things already work. The extra field you've proposed won't increase the number of records, so it shouldn't affect how users choose pg_stat_statements.max.
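Concretely, the reason two entries can appear is that the top-level flag becomes part of the entry's identity, not just an extra display column. A toy model of the bookkeeping (hypothetical; pg_stat_statements' real hash key and C structures are more involved):

```python
stats = {}

def record_execution(queryid, toplevel):
    # The (queryid, toplevel) pair acts as the hash key, so the same
    # statement executed at top level and inside a function yields two
    # separate entries.
    key = (queryid, toplevel)
    stats[key] = stats.get(key, 0) + 1

record_execution(0xdeadbeef, True)   # run directly by the client
record_execution(0xdeadbeef, False)  # same query run from a function
assert len(stats) == 2               # one queryid, two entries
```

Whether this meaningfully inflates pg_stat_statements.max depends on how many statements actually run at both levels, which is the open question in this exchange.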
Re: pg_stat_statements oddity with track = all
Hello

> - add a parent_statement_id column that would be NULL for top level queries

That will generate too many entries... every FK for each different delete/insert, for example. But it would be very useful for databases with a lot of stored procedures, to find where a given query is called from.

Maybe a new mode, track = tree? Use NULL to indicate a top-level query (same as with track = tree) and some constant for any nested queries when track = all.

Also, will a top-level statement currently account for the buffer usage of the underlying statements?

regards, Sergei
Re: Autovacuum on partitioned table (autoanalyze)
Hello Alvaro,

Thank you for your comments.

> > On second thought about the reason for the "toprel_oid". It is perhaps
> > to avoid "wrongly" propagated values to ancestors after a manual
> > ANALYZE on a partitioned table. But the same happens after an
> > autoanalyze iteration if some of the ancestors of a leaf relation are
> > analyzed before the leaf relation in an autoanalyze iteration. That
> > can trigger an unnecessary analyze for some of the ancestors.
>
> I'm not sure I understand this point. I think we should only trigger
> this on analyzes of *leaf* partitions, not intermediate partitioned
> relations. That way you would never get these unnecessary analyzes.
> Am I missing something?
>
> (So with my proposal in the previous email, we would send the list of
> ancestor relations after analyzing a leaf partition. Whenever we
> analyze a non-leaf, then the list of ancestors is sent as an empty
> list.)

The problem Horiguchi-san mentioned is as follows:

create table p1 (i int) partition by range(i);
create table p1_1 partition of p1 for values from (0) to (500) partition by range(i);
create table p1_1_1 partition of p1_1 for values from (0) to (300);
insert into p1 select generate_series(0,299);

-- After auto analyze (first time)
postgres=# select relname, n_mod_since_analyze, last_autoanalyze
           from pg_stat_all_tables where relname in ('p1','p1_1','p1_1_1');
 relname | n_mod_since_analyze |       last_autoanalyze
---------+---------------------+-------------------------------
 p1      |                 300 |
 p1_1    |                 300 |
 p1_1_1  |                   0 | 2020-12-02 22:24:18.753574+09
(3 rows)

-- Insert more rows
postgres=# insert into p1 select generate_series(0,199);
postgres=# select relname, n_mod_since_analyze, last_autoanalyze
           from pg_stat_all_tables where relname in ('p1','p1_1','p1_1_1');
 relname | n_mod_since_analyze |       last_autoanalyze
---------+---------------------+-------------------------------
 p1      |                 300 |
 p1_1    |                 300 |
 p1_1_1  |                 200 | 2020-12-02 22:24:18.753574+09
(3 rows)

-- After auto analyze (second time)
postgres=# select relname, n_mod_since_analyze, last_autoanalyze
           from pg_stat_all_tables where relname in ('p1','p1_1','p1_1_1');
 relname | n_mod_since_analyze |       last_autoanalyze
---------+---------------------+-------------------------------
 p1      |                   0 | 2020-12-02 22:25:18.857248+09
 p1_1    |                 200 | 2020-12-02 22:25:18.661932+09
 p1_1_1  |                   0 | 2020-12-02 22:25:18.792078+09
(3 rows)

After the 2nd auto analyze, all relations' n_mod_since_analyze should be 0, but p1_1's is not. This is because p1_1 was analyzed before p1_1_1, so p1_1 will be analyzed again in the 3rd auto analyze. That is, propagating changes_since_analyze to *all ancestors* may cause unnecessary analyzes even if we do this only when analyzing leaf partitions. So I think we should track ancestors which are not analyzed in the current iteration, as Horiguchi-san proposed.

Regarding your idea:

> typedef struct PgStat_MsgAnalyze
> {
>     PgStat_MsgHdr   m_hdr;
>     Oid             m_databaseid;
>     Oid             m_tableoid;
>     bool            m_autovacuum;
>     bool            m_resetcounter;
>     TimestampTz     m_analyzetime;
>     PgStat_Counter  m_live_tuples;
>     PgStat_Counter  m_dead_tuples;
>     int             m_nancestors;
>     Oid             m_ancestors[PGSTAT_NUM_ANCESTORENTRIES];
> } PgStat_MsgAnalyze;

I'm not sure, but how about storing only ancestors that aren't analyzed in the current iteration in m_ancestors[PGSTAT_NUM_ANCESTORENTRIES]?

> > > > Regarding the counters in pg_stat_all_tables: maybe some of these
> > > > should be null rather than zero? Or else you should make an 0001
> > > > patch to fully implement this view, with all relevant counters, not
> > > > just n_mod_since_analyze, last_*analyze, and *analyze_count. These
> > > > are specifically misleading:
> > > >
> > > >     last_vacuum        |
> > > >     last_autovacuum    |
> > > >     n_ins_since_vacuum | 0
> > > >     vacuum_count       | 0
> > > >     autovacuum_count   | 0
> > >
> > > I haven't modified this part yet, but did you mean that we should set
> > > the vacuum counters to null because partitioned tables are not
> > > vacuumed?
> >
> > Perhaps because partitioned tables *cannot* be vacuumed. I'm not sure
> > what is the best way here. Showing null seems reasonable but I'm not
> > sure that doesn't break anything.
> > I agree that showing NULLs for the vacuum columns is better. Perhaps > the most reasonable way to do this is use -1 as an indicator that NULL > ought to be returned from pg_stat_get_vacuum_count() et al, and add a > boolean in PgStat_TableCounts next to t_truncated, maybe "t_nullvacuum" > that says to store -1 instead of 0 in pgstat_recv_tabstat. > Thank you for the advice. I'll fix it based on this idea. -- Best regards, Yuzuko Hosoya NTT Open Source Software
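The ordering hazard demonstrated in the repro above can be reproduced with a toy model of the counter bookkeeping (hypothetical Python; the real accounting lives in the statistics collector):

```python
# n_mod_since_analyze before the second autoanalyze iteration
counters = {"p1": 300, "p1_1": 300, "p1_1_1": 200}
# only the leaf propagates its pending changes upward
ancestors = {"p1_1_1": ("p1_1", "p1")}

def analyze(rel):
    # Analyzing a leaf adds its pending changes to every ancestor...
    for anc in ancestors.get(rel, ()):
        counters[anc] += counters[rel]
    # ...and analyzing any relation resets its own counter.
    counters[rel] = 0

# Order observed in the repro: p1_1 was analyzed before the leaf p1_1_1.
for rel in ("p1_1", "p1_1_1", "p1"):
    analyze(rel)

# p1_1 is left with 200 pending changes, so it gets analyzed again in the
# third iteration even though nothing new happened.
assert counters == {"p1": 0, "p1_1": 200, "p1_1_1": 0}
```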
Re: convert elog(LOG) calls to ereport
On 2020-Dec-02, Peter Eisentraut wrote:

> There are a number of elog(LOG) calls that appear to be user-facing, so
> they should be ereport()s. This patch changes them. There are more
> elog(LOG) calls remaining, but they all appear to be some kind of
> debugging support. Also, I changed a few elog(FATAL)s that were nearby,
> but I didn't specifically look for them.

> -       elog(LOG, "WSAIoctl(SIO_KEEPALIVE_VALS) failed: %ui",
> -            WSAGetLastError());
> +       ereport(LOG,
> +               (errmsg("WSAIoctl(SIO_KEEPALIVE_VALS) failed: %ui",
> +                       WSAGetLastError())));

Please take the opportunity to move the flag name out of the message in this one, also. I do wonder if it'd be a good idea to move the syscall name itself out of the message, too; that would reduce the number of messages to translate 50x, to just "%s(%s) failed: %m" instead of one message per distinct syscall.

Should fd.c messages do errcode_for_file_access() like elsewhere?

Overall, it looks good to me. Thanks
Re: wrong link in acronyms.sgml
On 2020-Dec-02, Erik Rijkers wrote: > Hi > > I just noticed that in > > https://www.postgresql.org/docs/13/acronyms.html > (i.e., doc/src/sgml/acronyms.sgml) > > there is under lemma 'HOT' a link with URL: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD > > Is this deliberate? Surely pointing 13-docs into HEAD-docs is wrong? Yeah, I noticed this while working on the glossary. This link works: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=REL_13_STABLE but the problem with this one is that we'll have to update once per release. > Otherwise it may be better to remove the link altogether and just have the > acronym lemma: 'HOT' and explanation 'Heap-Only Tuple'. Yeah, I don't think pointing to the source-code README file is a great resource. I think starting with 13 we should add an entry in the glossary for Heap-Only Tuple, and make the acronym entry point to the glossary. In fact, we could do this with several other acronyms too, such as DDL, DML, GEQO, LSN, MVCC, SQL, TID, WAL. The links to external resources can then be put in the glossary definition, if needed.
Re: A new function to wait for the backend exit after termination
On Wed, Dec 2, 2020 at 1:30 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> On Mon, Nov 30, 2020 at 8:10 PM Muhammad Usama wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world: tested, passed
> > Implements feature:      tested, passed
> > Spec compliant:          tested, passed
> > Documentation:           not tested
> >
> > I have tested the patch against the current master branch
> > (commit: 6742e14959a3033d946ab3d67f5ce4c99367d332)
> > Both functions work without a problem and as expected.
>
> Thanks!
>
> > Just a tiny comment/suggestion.
> > Specifying a -ve timeout in pg_terminate_backend rightly throws an error.
> > I am not sure if it would be a right or a wrong approach, but I guess we
> > can ignore a -ve timeout in the pg_terminate_backend function when wait
> > (the second argument) is false.
> >
> > e.g. pg_terminate_backend(12320, false, -1); -- ignore the -1 timeout
> > since wait is false
>
> IMO, that's not a good idea. I see it this way: for any function, first
> the input args have to be validated. If okay, then follows the use of
> those args and the main functionality. I can also see pg_promote(),
> which first does the input timeout validation, throwing an error if it
> is <= 0.
>
> We can retain the existing behaviour.

Agreed! Thanks

Best regards
Muhammad Usama

> > The new status of this patch is: Ready for Committer
>
> Thanks!
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com